Quote: As the authors used more and more markers to compare the three major racial groups (Europeans, East Asians, and sub-Saharan Africans), the less stringent clustering measurements rapidly fell to a 0% overlap, as expected from previous studies. What about the more stringent measurement “w”, which looks at comparisons between individuals, and does not consider group data? Once the authors reached 1,000 (or more) markers, the genetic overlap between these groups essentially reached zero. It is useful at this point to quote the authors about this fundamentally important finding: “This implies that, when enough loci are considered, individuals from these population groups will always be genetically more similar to members of their own group.” With respect to the question of whether individual members of one group may be genetically more similar to members of another group, they write:
However, if genetic similarity is measured over many thousands of loci, the answer becomes “never” when individuals are sampled from geographically separated populations.
Thus, the naïve “anti-racist” view, actually stated by some people, that it is possible for individual Europeans and Africans to be more genetically similar to each other than to members of their own race, is simply false. Any such “finding” is simply due to insufficient numbers of DNA markers being used. With an adequate methodology, individual members of the major racial groups will always be more similar to members of their own group than to members of other groups. Some may not like this, and deem it “racist,” but these are the scientific facts, nonetheless.
For whatever reason, the authors were not satisfied with ending their study with these findings and decided to repeat their data analysis incorporating populations they term “intermediate” or “admixed.” These included New Guineans, South Asians, Native Americans, African Americans and “Hispano-Latino” groups. Not unexpectedly, it became somewhat more difficult to distinguish between groups, with a given number of markers, when these additional “intermediate/admixed” populations were added. Even with more than 10,000 markers, the “w” measurement and the clustering measurements never quite reached zero with respect to overlap, although the numbers were low. For example, the authors state that with 1,000 or more markers the “w” measurement reached a value of 3.1%, meaning that even with the intermediate/admixed populations included, genetic overlap occurred at a frequency of less than 5%.
Do these latter findings mean that there will always be genetic overlap between members of more closely related groups, especially when so-called “intermediate” and “admixed” populations are considered? Although some people may fervently wish that 100% accurate classification will remain impossible, except for the most widely divergent groups, this may well not be the case. We are entering an era in which reasonably affordable whole genome sequencing will be possible, and with the proper methodologies, it will be possible to compare a number of markers considerably larger than what is used in the current paper. While 10,000 markers may not be sufficient to eliminate overlap between all groups completely – although it does reduce the overlap to very low levels – it is possible that larger numbers of markers, or even whole genome comparisons, could do so. With more data, it may well be possible to distinguish, with near 100% accuracy, between groups that still demonstrate a low level of “w” with current data.
Then we must consider the issue of genetic structure, not directly addressed in this study. Although structure can include such genetic phenomena as inversions, deletions, and copy number variation, the major component of genetic structure is the co-inheritance of specific genes. In other words, we must consider not only the frequencies of each gene taken in turn, but the frequencies of specific genes together. For example, there are genes that code for eye color, skin color, hair color, etc. One can examine the frequency of each gene on a one-by-one basis in an individual (or group) and do all the pairwise comparisons to another individual (or group) and determine “w.” But what are the frequencies of particular combinations of gene types inherited together? For example, what is the frequency of having genes for blue eyes and blonde hair and fair skin, etc. co-inherited, rather than measuring the frequencies of each of these genes in turn and averaging the results? Genetic structure superimposes further genetic differences on top of one-by-one consideration of genes; therefore, differences between groups are going to be larger when structure is considered compared to when only frequency differences of individual genes are measured and averaged.
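The distinction between one-by-one frequencies and co-inherited combinations can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the two toy groups, the 0/1 coding of gene types, and the helper names `marginal_frequencies` and `joint_frequencies` are hypothetical and do not come from the paper under discussion, and no real genetic data is used. It shows only the narrow statistical point that two groups can have identical frequencies at each position taken in turn while differing in which combinations occur together.

```python
from collections import Counter

def marginal_frequencies(group):
    """Frequency of the value 1 at each position, one position at a time."""
    n = len(group)
    positions = len(group[0])
    return [sum(ind[i] for ind in group) / n for i in range(positions)]

def joint_frequencies(group):
    """Frequency of each full combination of values carried together."""
    n = len(group)
    return {combo: count / n for combo, count in Counter(group).items()}

# Hypothetical toy data: each tuple is one individual, each entry one gene type.
group_a = [(1, 1), (0, 0), (1, 1), (0, 0)]  # the two values always travel together
group_b = [(1, 0), (0, 1), (1, 0), (0, 1)]  # the two values never travel together

print(marginal_frequencies(group_a), marginal_frequencies(group_b))
print(joint_frequencies(group_a), joint_frequencies(group_b))
```

In this toy case the per-position frequencies are the same in both groups, yet the combinations present in one group are absent from the other. How far this carries over to real populations is an empirical matter that the sketch does not address.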
To further explain the difference between genetic similarity and genetic structure, I present an analogy using colored marbles. Assume that individuals of different races each have a set of marbles, numbered from one to 100, with the marbles being of various colors. Genetic similarity (the basis of the “w” metric) would be analogous to comparing the marbles of two individuals one-by-one; first comparing the color of marble #1, then #2, then #3, and so forth, on an individual basis and then counting the total number of matches. Genetic structure, on the other hand, would be analogous to asking if the two individuals have similar, or even identical