
American Indian mtDNA and Y Chromosome Genetic Data: A Comprehensive Report of their Use in Migration and Other Anthropological Studies

by Peter N. Jones, July 31, 2004

________________________________________________________________________

A. INTRODUCTION

Over the past two decades, physical anthropologists and molecular geneticists have begun to use molecular data to answer several long-standing questions, especially questions about migrations in human history. This new line of research has been called genetic anthropology, molecular anthropology, and archaeogenetics (see Renfrew & Boyle, 2000); for the purposes of this report the term molecular anthropology will be used. At present, most of the new information in this field has come from DNA obtained from living populations, though a handful of studies use ancient DNA (aDNA), that is, molecular data recovered from historic bones, teeth, and other materials. Researchers in this field have used many procedures, some more robust than others, to draw conclusions about the relationships between populations, both past and present. On this basis, researchers have been able to form hypotheses about present relationships between populations and about their demographic histories.

This report is a comprehensive analysis of the history, theory, and current state of the field of molecular anthropology, focusing on the use of mitochondrial DNA (mtDNA) and Y chromosome genetic material of North American Indians. The report begins with a methodology section, followed by a brief history of the development of this new line of research. These two sections are followed by one covering the relevant terminology and concepts employed within the field. Penultimately, two sections discuss the major findings of the field: the first focuses on mtDNA research, and the second on Y chromosome research. Finally, the report ends with a summary and conclusion. An appendix is also included. It abstracts every study that could be located and gives each study's title, authors, citation, funding sources, and genetic materials used.

B. METHODS

This study was conducted using the standard methods of a systematic analysis. Systematic analyses provide a rational synthesis of the research base and offer clear advantages to decision-makers. They attempt to overcome the deficiencies of narrative reviews and polemics by applying rigorous standards. Good systematic reviews take great care to find all the relevant studies (published and unpublished), assess each study for the quality of its design and execution, and combine the findings from individual studies in an unbiased manner. In this way they aim to present a balanced and impartial summary of the existing research evidence.

As a result, a comprehensive, in-depth literature review was conducted using the following keywords: American Indians, mtDNA, Y chromosome(s), genetics, genes, migration, DNA, haplogroup, haplotype, affiliation, North America, and Americas. These keywords were used to search the following indexes and databases: Web of Science (ISI); ABI Inform (OCLC); OCLC First Search; SilverPlatter; EBSCO; Cambridge Scientific Abstracts; National Library of Medicine; University of Colorado Catalogue; ScienceDirect; Anthropology Plus; Annual Review of Anthropology; Social Sciences Abstracts; Biological Sciences; and MedLine.

Both single-word searches and combination and Boolean searches were conducted. Once all studies had been located using the above search criteria, the bibliography of each study was cross-referenced to see whether any studies had been missed. Furthermore, a general search was conducted on the World Wide Web using the following search engines: Google, Yahoo, AltaVista, Lycos, and MSN.

Once all studies were located, they were compiled, abstracted, and searched for information concerning funding sources and the genetic lines used in each study. All of the studies found are listed in alphabetical order in the Appendix.

C. MOLECULAR ANTHROPOLOGY: A REVIEW OF THE FIELD

The first attempt at reconstructing historical population movements on the basis of “classical” genetic data from samples of living populations was undertaken in 1963 by Cavalli-Sforza and Edwards. (Such movements are also referred to in the literature as demographic history, prehistoric or ancient migrations, prehistoric or ancient population affiliations, or phylogenetic relationships.) Their pioneering paper was entitled Analysis of human evolution (published in 1965). Classical genetic data are based on proteins and blood groups, whereas molecular genetic studies use mtDNA and Y chromosome data. Cavalli-Sforza went on, with Menozzi and Piazza, to compile the magisterial The History and Geography of Human Genes (Cavalli-Sforza, Menozzi, & Piazza, 1994), which relied primarily on classical genetic markers sampled on a worldwide basis. This volume has been taken to mark the end of the “first phase” in the development of molecular anthropology, at a time when classical genetic markers were being replaced by DNA studies.

The “second phase,” with which this report is concerned, is currently in full spate. It was initiated by the earliest papers that used DNA sequencing to reconstruct human population histories. One of the first of these, entitled Evolutionary relationships of human populations from an analysis of nuclear DNA polymorphisms (Wainscoat et al., 1986), used nuclear DNA. The important paper by Cann, Stoneking, and Wilson (1987), entitled Mitochondrial DNA and human evolution, was one of the first to exploit the potential of mitochondrial DNA for studying specific lineages through the female line. These studies focused on the larger questions of human evolution rather than on the demographic history of North American Indians.

Recent studies, however, have focused extensively on North American Indian demographic histories using both mtDNA and Y chromosome data. Some of these studies have used genetic data to claim a link between European/Western Asian populations and North American Indian populations, suggesting that the two share a recent common ancestry (Brown et al., 1998). Other studies claim to have identified a single wave of migration for the peopling of the Americas (Bianchi et al., 1997; Bianchi et al., 1998; Easton, Merriwether, Crews, & Ferrell, 1996; Merriwether & Ferrell, 1996; Merriwether, Hell, Vahlne, & Ferrell, 1996; Merriwether, Rothhammer, & Ferrell, 1995), as opposed to several (Karafet et al., 1997; Karafet et al., 1999). A few studies have concluded that some American Indian tribes moved into specific geographic areas only recently (Kaestle, 1995; 1997; 1998; Kaestle & Smith, 2001b), despite contrary evidence from oral history and archaeology.

Some of the most publicized uses of molecular anthropology in recent years concern the question of biological affiliation, as part of the compliance procedures of the Native American Graves Protection and Repatriation Act (NAGPRA). The Spirit Cave Mummy and Kennewick Man repatriation controversies are examples (Kaestle, 2000a; 2000b; Merriwether & Cabana, 2000; Merriwether, 2000; Tuross & Kolman, 2000). In these two cases the situation is complicated by the great antiquity of the skeletons, dated to 9,415 ± 25 and 8,410 ± 60 years ago, respectively (Napton, 1997; Chatters, 2000).

This new line of research, however, is not concerned only with migrations. Its potential benefits promise to be vast and highly valuable. They include:

- a better understanding of the genetic and evolutionary factors that influence populations;
- an understanding of maternally transmitted diseases such as blindness, epilepsy, dementias, cardiac and skeletal muscle diseases, diabetes mellitus, and movement disorders;
- the development of new metabolic and genetic therapies for mitochondrial diseases; and
- a better understanding of the geographic origin of anatomically modern humans.

Such studies are still rather rare, however, because of a lack of information, a gap only partially closed by the sequencing of the human genome.

Instead, most studies use the new molecular genetic data to investigate questions of biological affiliation and historic population movements. The earliest studies identified that use either mtDNA or Y chromosome material to explore historical population movements are those of Aquadro and Greenberg (1983), Johnson, Wallace, Ferris, Rattazzi, and Cavalli-Sforza (1983), and Salzano (1982). Erdtmann, Salzano, and Mattevi (1981) published an early paper on the use of Y chromosome data in South American Indians. However, the first studies to discuss North American Indians did not appear until the mid- to late 1980s (Paabo, Gifford, & Wilson, 1988; Wallace, Garrison, & Knowler, 1985).

D. CONCEPTS AND TERMINOLOGY

This new line of research draws on the fields of anthropology and molecular genetics and therefore uses many terms that may be unfamiliar to readers. This section covers the concepts and terminology found in the literature.

We all carry deoxyribonucleic acid (DNA) in every cell of our bodies, and this DNA has been passed down almost unchanged from our earliest ancestors. DNA is the messenger of heredity. A simple metaphor that helps explain much of the terminology of molecular anthropology is that of an instruction manual. DNA can be viewed as a set of written instructions on how to build a human, with the chromosomes acting as volumes of the manual. Not surprisingly, these instructions are immensely complicated and nowhere near fully understood. Nonetheless, the language of the instructions is very straightforward. As in many languages, the meaning is contained in a sequence of symbols or letters, and the genetic language has only four of them. These four symbols are the simple organic chemicals adenine, cytosine, guanine, and thymine, always referred to as A, C, G, and T. These nucleotide bases are joined one after another in a long molecular chain that forms DNA. The DNA molecule actually consists of two strands, the famous double helix. Each strand carries the same information in its sequence of bases, but in a complementary form: wherever A appears in one strand, it is opposite a T in the other, and G and C are matched in the same way.
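The pairing rule above can be written as a simple lookup table. This Python sketch is illustrative only. The `complement` function and the example strand are hypothetical, and the sketch ignores strand direction, treating the complement as a base-by-base substitution.

```python
# Watson-Crick pairing rule from the text: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand.

    Strand direction is ignored in this sketch; each base is simply
    replaced by the base it pairs with on the opposite strand.
    """
    return "".join(PAIR[base] for base in strand.upper())

template = "ATGCGTTA"  # hypothetical strand
copy = complement(template)
print(copy)                          # TACGCAAT
print(complement(copy) == template)  # True: complementing twice restores the original
```

Because complementing twice restores the original, either strand carries the full information and can serve as a template.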

When cells divide, DNA must be copied so that each daughter cell receives a full set of instructions. This is done by unwinding the double helix and using each single strand as a template to make two new, identical double helices. Because the bases are complementary, the sequence remains intact. The copying mechanism is remarkably exact, but it makes occasional mistakes, called mutations. Molecular anthropologists look for these randomly introduced mutations and compare them across individuals and populations.

Molecular anthropology does not compare the blood of one individual with that of another. Instead, it compares the frequencies of polymorphic genetic variants in one population with those in others. A polymorphism is the existence, within a population, of differences in genetic structure that arise from mutations. It implies the presence of two or more alleles (alternative variants that are similar but not identical) at a particular position, or locus, on a chromosome. The human genome (the collective name for all the DNA in each cell) is organized into chromosomes, separate “volumes” of the genome that reside in the cell nucleus. Together, the chromosomes contain the roughly three billion symbols that make up the human genome. There are twenty-four different chromosomes in the human genome. Twenty-two of them are the autosomes, of which each person carries two copies, one inherited from the mother and one from the father. The remaining two are the X and Y sex chromosomes (chromosomes 23 and 24). Females have a pair of X chromosomes, while males have one X and one Y chromosome.

The classic biochemical approach to investigating historic population movements consists of taking samples, usually blood samples, from a well-defined human population. The samples are tested for the presence or absence of alleles at the polymorphisms under investigation. The number of sampled individuals who possess a particular allele is then expressed as a gene (allele) frequency. A combination of alleles carried together on a single chromosome or mtDNA molecule is called a haplotype, and a group of related haplotypes that share particular mutations makes up a haplogroup.
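The counting step described above can be sketched in Python. The sketch assumes a uniparental marker (mtDNA or Y chromosome), so each person contributes a single call; for diploid autosomal markers one would count gene copies instead. The sample data are invented, and `variant_frequencies` is a hypothetical helper, not a function from any study cited here.

```python
from collections import Counter

def variant_frequencies(calls):
    """Proportion of sampled individuals carrying each variant.

    Assumes a uniparental (haploid) marker such as an mtDNA site or a Y
    chromosome marker, so each individual contributes exactly one call.
    """
    counts = Counter(calls)
    n = len(calls)
    return {variant: count / n for variant, count in sorted(counts.items())}

# Hypothetical sample of 10 people typed at one restriction site (+ intact, - absent).
site_calls = ["+", "+", "-", "+", "-", "+", "+", "+", "-", "+"]
print(variant_frequencies(site_calls))   # {'+': 0.7, '-': 0.3}

# The same counting applied to hypothetical haplogroup assignments.
haplogroups = ["A", "A", "B", "C", "D", "A", "C", "B", "A", "D"]
print(variant_frequencies(haplogroups))  # {'A': 0.4, 'B': 0.2, 'C': 0.2, 'D': 0.2}
```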

Molecular anthropologists document the mutations found in a population (the population's frequency for each allele) and compare these frequencies with those of other populations. In this way they can begin to reconstruct the historic movements of the populations under study. This is usually done with models of coalescence times and divergence times, which are explained in more detail below.
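As one illustration of comparing frequencies between populations, the sketch below computes Nei's standard genetic distance between two frequency profiles: D = -ln(J_xy / sqrt(J_x * J_y)). Here J_x and J_y are the sums of squared frequencies in each population, and J_xy is the sum of their products. The report does not specify which comparison measure the studies used. This choice of measure, the function name, and the invented haplogroup frequencies are all assumptions for illustration.

```python
import math

def nei_distance(freq_x, freq_y):
    """Nei's standard genetic distance for one locus (e.g. haplogroup frequencies).

    freq_x, freq_y: dicts mapping variant -> frequency in each population.
    Variants missing from one population are treated as frequency 0.
    Returns infinity when the two populations share no variants.
    """
    variants = set(freq_x) | set(freq_y)
    jx = sum(freq_x.get(v, 0.0) ** 2 for v in variants)
    jy = sum(freq_y.get(v, 0.0) ** 2 for v in variants)
    jxy = sum(freq_x.get(v, 0.0) * freq_y.get(v, 0.0) for v in variants)
    if jxy == 0:
        return math.inf
    return -math.log(jxy / math.sqrt(jx * jy))

# Invented haplogroup frequencies for two hypothetical populations.
pop_1 = {"A": 0.4, "B": 0.2, "C": 0.2, "D": 0.2}
pop_2 = {"A": 0.1, "B": 0.3, "C": 0.3, "D": 0.3}
print(round(nei_distance(pop_1, pop_2), 3))  # 0.241
```

Identical profiles give a distance of (numerically) zero, and the distance grows as the profiles share less.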

The human genome contains one other piece of DNA. It is located not in the nucleus but in the mitochondria, small structures in the cell cytoplasm, and is called mitochondrial DNA (mtDNA). It is much smaller than the nuclear genome, with just over 16,500 bases compared with the roughly 3,000 million bases of nuclear DNA. However, mtDNA's peculiar genetic characteristics have made it a central component of molecular anthropological research.

Several features make mtDNA particularly useful in the study of historic population movements. The first is that, unlike nuclear DNA, it is inherited from only one parent, the mother. This is because human eggs have a large cytoplasm full of mitochondria, while human sperm contain only a few, and those either do not enter the fertilized egg or are eliminated shortly afterwards. This has two major implications.

First, all people inherit all of their mtDNA from their mother, who inherited her mtDNA from her mother, and so forth. Therefore, in any past generation only one woman was an individual's maternal, and hence mtDNA, ancestor.

Second, mtDNA does not undergo genetic recombination. Recombination is the mechanism by which chromosomes shuffle their genes in every generation, which has the evolutionary advantage that new, favorable gene combinations occasionally emerge.

These two features allow molecular anthropologists to track the rare mutations that arise between generations of mtDNA and thus to document the mtDNA allele frequencies of particular maternal lines. These allele frequencies can then be compared with others to estimate when two maternal lines historically diverged.
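The point that only one woman per generation is an individual's mtDNA ancestor can be made concrete with a toy pedigree. In this hypothetical Python sketch (all names invented), following mothers alone yields a single line, while the number of genealogical ancestors doubles with each generation.

```python
# Hypothetical pedigree: each person maps to (mother, father).
parents = {
    "Ana": ("Beth", "Carl"),
    "Beth": ("Dora", "Ed"),
    "Carl": ("Fay", "Gus"),
    "Dora": ("Hana", "Ian"),
}

def maternal_line(person):
    """Follow mothers only: the single route by which mtDNA is inherited."""
    line = [person]
    while person in parents:
        person = parents[person][0]
        line.append(person)
    return line

def ancestors_at(person, generations):
    """All recorded genealogical ancestors exactly `generations` steps back."""
    current = [person]
    for _ in range(generations):
        current = [p for c in current if c in parents for p in parents[c]]
    return current

print(maternal_line("Ana"))    # ['Ana', 'Beth', 'Dora', 'Hana']
print(ancestors_at("Ana", 2))  # ['Dora', 'Ed', 'Fay', 'Gus']
```

Two generations back, Ana has four recorded ancestors, but her mtDNA comes from only one of them (Dora).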

The Y chromosome shares these two features of mtDNA, namely uniparental inheritance and a lack of recombination. Molecular anthropologists use mutation-derived variation in both mtDNA and the Y chromosome to trace populations.

Most mutations arise during the DNA-copying process before cell division. The simplest type, known as a point mutation, is the replacement of one base (A, C, G, or T) by another. A mutation always arises in a single cell of a single person. To be passed on to the next generation, it must occur in the germ-line cells, the precursors of eggs or sperm. All sorts of mutations occur in other body cells, but they are irrelevant to the study of historic population movements because they are not inherited. Furthermore, a mutation has to increase in frequency before it can be noticed at all. If the new mutation does not alter the biological fitness of its carriers (that is, if it is a neutral change), then whether it spreads or is eliminated is governed purely by chance, a process called genetic drift.

Take the Y chromosome as an example. Suppose a mutation occurs in the germ line of a male. If he has no sons, the new mutation is not passed on. A mutation in his body cells, by contrast, is never passed on, whether or not he has sons, because body cells are not transmitted between generations.

MtDNA is more complicated, because a new mutation in a female germ line is at first present in only one molecule. Each cell contains thousands of mtDNA molecules, and along the line of cell divisions leading to the mature egg the number of mitochondria is drastically reduced. There is therefore a chance that the new mutation will not be passed on to the next generation. If the mutated mtDNA does slip through this cellular “bottleneck”, the new allele may reach a high enough proportion in the egg cells to be noticed in, say, a blood sample from the individual. This transitional state, in which an individual carries a mixture of the old and new mtDNA variants, is known as heteroplasmy. It persists for half a dozen generations or so, until the new mutation either takes over the entire germ line (fixation) or disappears. Even when fixed, the new version is far from secure: if the women who carry it have no daughters, it will not be passed on any further.
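The role of chance described above can be explored with the Wright-Fisher model, a standard textbook model of genetic drift that the report itself does not specify. This Python sketch assumes a constant population of `n_males` men in which each son's father is drawn at random. It follows one new neutral Y chromosome variant until the variant is lost or fixed. For a neutral variant that starts as a single copy, theory predicts a fixation probability of 1/N, so the simulated fraction should come out near 1/20 here.

```python
import random

def wright_fisher(n_males, start_count, rng):
    """Follow one neutral Y-chromosome variant in a haploid Wright-Fisher population.

    Each generation, each of the n_males sons inherits the Y chromosome of a
    father drawn at random (with replacement) from the previous generation,
    so the variant's count changes by chance alone.
    Returns (final_count, generations) once the variant is lost or fixed.
    """
    count, generations = start_count, 0
    while 0 < count < n_males:
        p = count / n_males  # chance that a randomly drawn father carries the variant
        count = sum(1 for _ in range(n_males) if rng.random() < p)
        generations += 1
    return count, generations

def fixation_fraction(n_males, replicates, seed=1):
    """Fraction of replicates in which a single new variant reaches fixation."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        final_count, _generations = wright_fisher(n_males, 1, rng)
        fixed += final_count == n_males
    return fixed / replicates

print(fixation_fraction(20, 4000))
```

Increasing `n_males` makes fixation of a single new variant rarer. This shows why most new neutral mutations are lost, and why a surviving variant must drift upward by chance before it becomes common enough to detect.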

As mentioned, single point mutations are the commonest source of variation in mtDNA. The circular mitochondrial genome contains the genes for making components of aerobic metabolism. About 1,000 bases remain that, as far as researchers have found, do not code for anything. Because mitochondria do not recombine, they cannot correct copying errors. This has led researchers to theorize that mtDNA mutates at a constant rate, estimated to be about 20 times faster than that of nuclear DNA. The 1,000 bases of non-coding mtDNA, known as the control region (CR), accumulate mutations even faster than the rest of the mtDNA, making the control region by far the most variable stretch of DNA in the whole human genome. Sequencing this 1,000-base segment, or even just two segments within it called hypervariable segments I and II (HVS I and II), has proven a very productive way of studying mtDNA variation.
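Comparing a sequenced segment with a reference amounts to listing the positions where the bases differ. This Python sketch assumes the two sequences are already aligned, with no insertions or deletions. The 10-base sequences and the starting position 16200 are hypothetical; they are not real HVS I data.

```python
def point_differences(reference, sample, start=1):
    """List substitutions as (position, reference_base, sample_base).

    Assumes the two sequences are already aligned to the same length
    (no insertions or deletions); `start` numbers the first base.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (start + i, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]

reference = "ACCTACGTAC"  # hypothetical reference fragment
sample = "ACTTACGTGC"     # hypothetical sequenced fragment
print(point_differences(reference, sample, start=16200))
# [(16202, 'C', 'T'), (16208, 'A', 'G')]
```

The number of entries in the list is the count of point differences between the two sequences, which is the raw material for the frequency and divergence calculations discussed in this report.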

There are other variable bases outside the control region, and a selection of these is often used to help resolve the control region variation. Because these mutations are thinly spread, they are usually not sequenced directly. Instead, they are detected by their ability to create or destroy so-called restriction sites. Restriction sites are short DNA sequences, typically 4 to 6 bases long, that are recognized and cut by restriction enzymes. Gaining or losing a site changes the lengths of the DNA fragments produced by cutting, so this kind of variation is called Restriction Fragment Length Polymorphism (RFLP).
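The way a single point mutation can destroy a restriction site, and thereby change fragment lengths, can be sketched in Python. The example uses the recognition sequence of the enzyme AluI (AGCT, cut between G and C) purely as an illustration. The sequence is invented and is treated as a linear fragment, such as an amplified segment, rather than as the circular mitochondrial genome.

```python
def cut_positions(sequence, site, offset):
    """Cut points: `offset` bases into every occurrence of the recognition `site`."""
    positions = []
    i = sequence.find(site)
    while i != -1:
        positions.append(i + offset)
        i = sequence.find(site, i + 1)
    return positions

def fragment_lengths(sequence, site, offset):
    """Lengths of the pieces left after cutting a linear sequence at every site."""
    cuts = [0] + cut_positions(sequence, site, offset) + [len(sequence)]
    return [end - start for start, end in zip(cuts, cuts[1:])]

ALU_I = ("AGCT", 2)  # AluI recognizes AGCT and cuts between G and C (AG^CT)
with_site = "TTAGCTTTTTAGCTTT"                     # invented sequence
site_lost = with_site[:11] + "A" + with_site[12:]  # hypothetical G->A point mutation
print(fragment_lengths(with_site, *ALU_I))  # [4, 8, 4]
print(fragment_lengths(site_lost, *ALU_I))  # [4, 12]
```

The single base change merges two fragments into one. That change in fragment pattern is what an RFLP assay detects, without the region ever being sequenced.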

The Y chromosome also carries, along its length, a useful selection of RFLPs. These are now often called bi-allelic markers because they distinguish two alleles: one in which the restriction site is intact and one in which it is disrupted. Another useful source of variation on the Y chromosome is the microsatellites. For reasons not yet understood, certain very short DNA sequences, 2, 3, or 4 nucleotides long, tend to grow, so that the short sequence is repeated several times, sometimes hundreds of times. In some microsatellites, but not all, the number of repeats is unstable, and several versions of different lengths are found. This has proven an excellent source of variation, especially when combined with bi-allelic data from the same individual. Variable minisatellites are tandemly repeated units, often much longer than microsatellites and frequently with a complex internal structure. They are found more rarely, but when present, as with MSY1 on the Y chromosome, they can also be useful. The particular combinations of alleles observed at these regions of the mtDNA and Y chromosome are called haplotypes. Many haplotypes make up a haplogroup, and it is this larger entity that is used to compare one population with another.
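Scoring a microsatellite allele comes down to counting how many back-to-back copies of the short repeat unit a sequence carries. In this Python sketch, the tetranucleotide motif GATA, both alleles, and the function name are all hypothetical.

```python
def longest_repeat_run(sequence, motif):
    """Largest number of back-to-back copies of `motif` found anywhere in `sequence`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    k = len(motif)
    best = 0
    for start in range(len(sequence)):
        copies = 0
        while sequence.startswith(motif, start + copies * k):
            copies += 1
        best = max(best, copies)
    return best

# Two invented alleles of a hypothetical tetranucleotide (GATA) microsatellite.
allele_short = "TT" + "GATA" * 4 + "CC"
allele_long = "TT" + "GATA" * 6 + "CC"
print(longest_repeat_run(allele_short, "GATA"), longest_repeat_run(allele_long, "GATA"))  # 4 6
```

The two alleles differ only in repeat number (4 versus 6). This is the kind of length variation that makes microsatellites informative.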

1. HAPLOTYPES AND HAPLOGROUPS

Researchers have noted that studying only one gene has limitations (Chen et al., 2000; Karafet, Zegura, Vuturo-Brady, Posukh, Osipova, Wiebe, Romero, Long, Harihara, Jin, Dashnyam, Gerelsaikhan, Keiichi, & Hammer, 1997; Mountain & Cavalli-Sforza, 1997). Even so, most studies still rely on a single gene and its alleles, because differences are easy to identify within a restricted location, especially in non-recombining systems such as mtDNA and the Y chromosome. The allele sequences studied are called haplotypes. For American Indian mtDNA they currently fall into five recognized haplogroups (A, B, C, D, and X), which have been used in most studies of American Indian population genetics.

Several assumptions underlying the use of haplotypes and haplogroups need to be pointed out. First, many studies use within-local-population frequencies for the genetic sequences. These frequencies are strongly affected by each population's specific recent demographic history, so researchers may underestimate the nucleotide diversity of the population as a whole (Bonatto & Salzano, 1997a). Thus, the differing results obtained from CR (control region) sequences and from RFLP (restriction fragment length polymorphism) data cannot be explained by sample size or by the different ways haplogroup frequencies were treated. They are more probably due to differences in the populations or DNA regions studied.

Likewise, as previously noted, the only changes introduced into these sequences are point mutations, insertions, and deletions (insertions and deletions being rare compared with point mutations). This means that each possible founding lineage cluster can be thought of as the founding lineage haplotype plus a collection of that lineage's descendants. This assumption has several inherent problems. Notably, the original Y chromosome lineage can eventually die out, distorting estimates of time depth, haplotype frequencies, or relationships in the population under study (Bradman & Thomas, 1998). This can produce faulty results when a present population's haplotype frequencies are compared with those of an ancient population.

Bradman and Thomas (1998) illustrated this with the YAP (Y chromosome Alu polymorphism) insertion on the Y chromosome: an individual's descendants may not carry the same Y chromosome alleles even after a single generation. A descendant of the man who first acquired the YAP insertion may lack it, yet still be his descendant. The same is possible with mtDNA, since a person does not carry the mtDNA of his or her father's mother. When only specific alleles are examined, therefore, lineages marked by particular mutations, insertions, and deletions can appear to come from discontinuous populations.

Furthermore, as Bianchi et al. have pointed out, “the combination of a decrease in the effective population size and genetic hitch-hiking may have been the cause producing a single variety of Y-chromosomes in the earliest ancestors of extant Amerindians” (1997, p. 87). If this is correct, spurious results may arise when determining biological affiliation between populations.

Finally, as noted, the mitochondrial genome undergoes no recombination, so the 16,569-bp genome behaves evolutionarily as a single locus. As MacEachern (2000, p. 358) noted, “In particular, it appears that there may be significant variability in selection mechanisms on the genome itself and in the mitochondria and in rates of phylogenetic versus intergenerational mtDNA mutation that are only now being appreciated (Gibbons 1998; Parsons, Muniec, and Sullivan 1997).” Therefore, inferences from any one such locus lack robustness (Pamilo & Nei, 1998).

2. COALESCENT TREES AND GENE TREES

Once the haplotypes of a population have been determined, a unique gene tree can be constructed from the configuration of mutations. This assumes that a point mutation arises at a given site only once, without any back or parallel mutation. The coalescent tree is hypothetically a perfect phylogeny representing the mutation history of that haplotype back in time. Under this assumption the coalescent tree is equivalent to the DNA sequence data, and because it hypothetically represents the ancestry of the population, it is common to think of the sequence data as a phylogenetic tree. It is important to remember, however, that the data are not independent, because the sequences are related through shared ancestry.

The likelihood of a coalescent tree under a stochastic coalescent model of evolution can be found by advanced simulation techniques, which allows maximum likelihood estimation of the parameters using the full information in the data. Simulation can also give the distribution of the time to the most recent common ancestor and of the ages of mutations in the tree, conditional on its topology. Computing likelihoods for samples of DNA sequences under general models by computer-intensive methods is currently a very active research area. Approaches include:

- Importance Sampling (Bahlo & Griffiths, 2000; Fearnhead & Donnelly, 2001; Griffiths & Marjoram, 1996; Griffiths & Tavare, 1994a; 1994b; 1994c; Nielsen, 1997; Slade, 2000a; 2000b);
- Markov Chain Monte Carlo (MCMC), by Felsenstein and colleagues (1995; 1998; 1997); and
- Bayesian MCMC methods, by Wilson and Balding (1998), Beaumont (1999), and Markovtsova et al. (2000a; 2000b).
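The distribution of the time to the most recent common ancestor can be approximated by simulation under the standard Kingman (1982) coalescent. This Python sketch assumes a constant-size, neutral, non-recombining population. While k lineages remain, the waiting time to the next merger is exponential with rate k(k-1)/2 in coalescent time units. This is a simple Monte Carlo draw, not the importance sampling or MCMC methods cited above, and the sample size and replicate count are arbitrary.

```python
import random

def tmrca(n, rng):
    """One simulated time to the most recent common ancestor of n sampled genes.

    Time is in coalescent units. While k lineages remain, the waiting time to
    the next merger is exponential with rate k*(k-1)/2.
    """
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2)
    return t

def mean_tmrca(n, replicates, seed=0):
    """Monte Carlo average of tmrca over many independent genealogies."""
    rng = random.Random(seed)
    return sum(tmrca(n, rng) for _ in range(replicates)) / replicates

n = 10
print(mean_tmrca(n, 20000), "expected", 2 * (1 - 1 / n))
```

The simulated mean should approach the theoretical expectation 2(1 - 1/n). To convert coalescent units into generations, multiply by the relevant number of gene copies, for example 2N_e for a diploid autosomal locus. Uniparental loci such as mtDNA use a different scaling.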

One important quantity in classical population genetics is the inbreeding coefficient, a measure of mating between relatives. It is defined as the probability that a pair of genes at a locus are identical by descent. A pair of genes is identical by descent if both derive from the same gene in a common ancestor (Crow & Kimura, 1970). Gustave Malécot (1955) was one of the first to distinguish this concept clearly from identity in state.

The inbreeding coefficient is closely related to the effective number of individuals in a population, defined as the size of an idealized population that would show the same decrease in heterozygosity as the observed population. The effective number, or effective population size, is used when the population size fluctuates over time, when the number of progeny per parent is not binomially distributed, or when there is any other deviation from the assumed idealized model (Crow & Kimura, 1970). In a hypothetical randomly mating population of effective size $N_e$, the probability that two genes at a diploid locus derive from the same gene in the previous generation is $1/(2N_e)$; this is the per-generation increase in the inbreeding coefficient. Malécot (1955; 1967) showed that the expected heterozygosity decreases over time at this rate. He also showed that in a randomly mating population of $N_1$ males and $N_2$ females, the effective size is $N_e = 4N_1N_2/(N_1 + N_2)$. This theory was later formulated as the coalescent theory of modern population genetics (Kingman, 1982).

Under neutrality (Kimura, 1968), the amount of polymorphism is determined by the effective population size ($N_e$) and the hypothetical neutral mutation rate. The nucleotide diversity (Nei, 1987) and the number of segregating sites (Watterson, 1975) are the commonly used measures of DNA polymorphism in such a model. Both measures are simply related to a single population parameter, $\theta = 4N_e g\mu$, where $g$ is the generation time and $\mu$ is the neutral mutation rate per site per year. It is therefore possible to estimate $N_e$ from observed values of the nucleotide diversity, the number of segregating sites, or both. Note that this model assumes both random mating and constant mutation rates, two conditions that do not hold in human populations. Furthermore, neither the nucleotide diversity nor the number of segregating sites can be observed for an ancestral species, yet these ancestral values are essential for researching historic population movements. They, like any other ancestral quantities, must be inferred indirectly, and two methods have been proposed for precisely this purpose.
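The quantities in this paragraph can be computed directly. This Python sketch implements:

- Malécot's effective size for N1 males and N2 females;
- the expected decay of heterozygosity at rate 1/(2N_e);
- Watterson's estimator from the number of segregating sites;
- the nucleotide diversity; and
- the inversion of theta = 4 N_e g mu to solve for N_e.

The sequences are hypothetical. Like the model in the text, the sketch assumes random mating, neutrality, and a constant mutation rate.

```python
from itertools import combinations

def effective_size(n_males, n_females):
    """Malécot's effective size N_e = 4*N1*N2 / (N1 + N2) for N1 males, N2 females."""
    return 4 * n_males * n_females / (n_males + n_females)

def expected_heterozygosity(h0, ne, generations):
    """H_t = H_0 * (1 - 1/(2*N_e))**t: decay under random mating, no mutation."""
    return h0 * (1 - 1 / (2 * ne)) ** generations

def segregating_sites(seqs):
    """Number of aligned positions at which the sample is not monomorphic."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))

def watterson_theta(s, n):
    """Watterson's estimator for the whole sequence: S / (1 + 1/2 + ... + 1/(n-1))."""
    return s / sum(1 / i for i in range(1, n))

def nucleotide_diversity(seqs):
    """Average proportion of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * len(seqs[0]))

def ne_from_theta(theta_per_site, g_years, mu_per_site_per_year):
    """Solve theta = 4 * N_e * g * mu for N_e (diploid locus, as in the text)."""
    return theta_per_site / (4 * g_years * mu_per_site_per_year)

# Hypothetical aligned 10-base sequences from four individuals.
SEQS = ["ACGTACGTAC", "ACGTACGTAT", "ACTTACGTAC", "ACGTACCTAT"]
S = segregating_sites(SEQS)                             # 3 variable positions
theta_w = watterson_theta(S, len(SEQS)) / len(SEQS[0])  # per site, about 0.164
pi = nucleotide_diversity(SEQS)                         # 10 differences / (6 pairs * 10 sites)
print(effective_size(10, 10), effective_size(1, 99))    # 20.0 3.96
```

The skewed sex ratio (1 male, 99 females) shrinks the effective size to under 4, far below the census count of 100. The two diversity measures (about 0.164 and 0.167 per site here) roughly agree, as they should under the neutral model.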

The first method, called the trichotomy method, uses gene genealogies among three species that diverged from each other in close succession. Because it requires three such species, this method is not applicable to the study of prehistoric human population movements and will not be discussed.

The second method, called maximum likelihood, uses pairs of orthologous sequences sampled from only two species (Takahata & Satta, 1997; Takahata, Satta, & Klein, 1995). This is the preferred method for studying human populations. In studies of biological affiliation between two populations, the same principles have been assumed to hold for orthologous sequences sampled from two different populations rather than two different species. Such orthologous sequences, however, must have diverged before the populations split. It is therefore important to support the comparison with corroborating evidence, archaeological or linguistic, that the two populations were once genetically related.

The method rests on a two-stage model. While segregating in the ancestral population, two orthologous sequences are hypothesized to have accumulated nucleotide substitutions, forming an ancestral polymorphism. When the populations subsequently split, they are hypothesized to have differentiated further through population-specific substitutions. The number of substitutions between a pair of sequences before the split is therefore assumed to follow a geometric distribution, and the number after the split a Poisson distribution (which describes the number of events occurring within a given time interval). The method separates these two types of substitutions when a number of independent pairs of orthologous sequences are available. One important assumption is that the neutral mutation or substitution rate is constant across the nucleotide sites under study. Hence, synonymous sites in coding regions, introns, and intergenic regions are preferred. However, as already mentioned, researchers are not sure whether nucleotide sites undergo a constant mutation or substitution rate.
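The two-part model can be sketched numerically. The number of differences between one pair of orthologous sequences is treated as the sum of a geometric count (ancestral polymorphism, mean theta) and a Poisson count (post-split substitutions, mean lam). The parameters are then chosen to maximize the likelihood over many independent pairs. This Python sketch uses an assumed parameterization and a crude grid search; it is not Takahata and Satta's own implementation, and the difference counts are invented.

```python
import math

def pair_difference_pmf(k, theta, lam):
    """Probability of k differences between one pair of orthologous sequences.

    Assumed parameterization: differences that arose before the split are
    geometric with mean theta, those that arose after the split are Poisson
    with mean lam, and the observed count is their sum. Needs theta, lam > 0.
    """
    total = 0.0
    for j in range(k + 1):  # j = differences inherited from the ancestral population
        geometric = (1 / (1 + theta)) * (theta / (1 + theta)) ** j
        m = k - j  # differences accumulated after the split
        poisson = math.exp(-lam + m * math.log(lam) - math.lgamma(m + 1))
        total += geometric * poisson
    return total

def log_likelihood(differences, theta, lam):
    """Log-likelihood over independent pairs of orthologous sequences."""
    return sum(math.log(pair_difference_pmf(k, theta, lam)) for k in differences)

def grid_mle(differences, grid):
    """Crude maximum likelihood: the (theta, lam) grid point with the highest log-likelihood."""
    return max(
        ((theta, lam) for theta in grid for lam in grid),
        key=lambda params: log_likelihood(differences, *params),
    )

# Hypothetical difference counts, one per independent pair of orthologous sequences.
counts = [3, 5, 2, 7, 4, 6, 3, 5, 9, 1]
grid = [0.5 * i for i in range(1, 17)]  # 0.5, 1.0, ..., 8.0
theta_hat, lam_hat = grid_mle(counts, grid)
print(theta_hat, lam_hat)
```

The geometric part has variance theta + theta^2, while the Poisson part has variance equal to its mean. The two components are therefore separated mainly through the shape of the distribution of counts, which is why many independent sequence pairs are needed.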

Researchers have also concluded that frequent intragenic recombination leads to erroneous estimates of ancestral polymorphism. This creates two incompatible requirements. To infer an accurate gene genealogy, researchers must examine long stretches of DNA that contain a sufficiently large number of nucleotide differences. On the other hand, such long stretches are likely to undergo intragenic recombination, resulting in faulty genealogies.

Likewise, it is essential that researchers using genetic frequencies and coalescent times not assume that these are the same as the time of origin of the population under study or the time when one population split from another (i.e., biological affiliation). Although tracing the genealogy of mtDNA or Y chromosome alleles can theoretically lead to a single common ancestor, this is not evidence that the populations under study went through a period when only one ancestral breeding population was alive and reproducing. Tracing coalescent times leads to one ancestor of a unilineally transmitted set of markers (through either the maternal or the paternal line). However, the descendants of the original DNA will have had haplotype frequencies that differed from those of the entire population, so coalescent times rest on a biased sample of the total historic population’s frequencies. This is because working back in time does not account for the various branches of diversity that the historic population had, only for the lineal history of the specific marker being coalesced. Three primary assumptions arising from the use of coalescent times (Hoelzer, Wallman, & Melnick, 1998; Hudson, 1990; Templeton, 1993; 1998; 2002) that have been employed specifically in understanding American Indian historic population movements are:

  1. gene coalescence is a regular process of mutation accumulation in neutral systems, and therefore can be timed like a regularly ticking clock with an acceptable range of error;
  2. American Indian populations were isolated from each other after they originated or migrated to the Americas; and
  3. the history of particular gene systems is the history of the specific populations in which they are found.

However, as already mentioned, human populations are not neutral systems, and it is not clear if it is safe to assume that mutations occur in a regular, timely fashion. Furthermore, as much of the American Indian ethnographic, linguistic, and archaeological data demonstrates, American Indian populations were never isolated, either from each other or possibly from ancestral populations in Asia (for discussions on the latter aspect, see Akazawa, 1999; Anderson & Gillam, 2001; Bever, 2001; Ikawa-Smith, 1982; Tarazona-Santos & Santos, 2002).

One important requirement in coalescence theory is the use of random samples of genes from the population under study. However, this is extremely difficult to accomplish, particularly when studying historical relationships between ancient populations and their possible descendants. As Donnelly and Tavare (1995) point out,

In practice, genetic data are typically obtained from convenience samples rather than proper random samples. There is an obvious danger that such data may contain individuals who share relatively too much ancestry on the relevant timescales. The extent to which application of coalescent (or traditional) methods to such convenience samples may be misleading remains an open, and potentially serious, question. (p. 418)

Furthermore, most studies of American Indian historic population movements rely on the idea that American Indians came to the Americas from Asia in small groups (usually thought to have arrived in one to three migration waves) across the Bering Land Bridge in prehistoric times. If this is the case, coalescence times will be shorter than actual population divergence times, because smaller populations in the past are more likely to share ancestors (Donnelly & Tavare, 1995, p. 410). This would yield an artificially recent date of origin for American Indians, and thus would not correctly represent occupational time depth or biological affiliation.
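The statement that smaller ancestral populations share ancestors sooner can be illustrated with a minimal simulation. The sketch below assumes a haploid Wright-Fisher population of constant size, a simplification chosen because mtDNA and the Y chromosome are inherited through one parent. In that model two lineages find a common parent in any generation with probability 1/N, so the expected time to coalescence is N generations. The population sizes and replicate counts are hypothetical, and converting generations to years would require a further assumption about generation length.

```python
import random
import statistics

def coalescence_time(n_individuals: int, rng: random.Random) -> int:
    """Generations (looking backward) until two lineages share a parent in a
    haploid Wright-Fisher population of constant size n_individuals."""
    generations = 0
    while True:
        generations += 1
        # Each lineage draws its parent uniformly from the previous generation.
        if rng.randrange(n_individuals) == rng.randrange(n_individuals):
            return generations

def mean_coalescence_time(n_individuals: int, replicates: int = 2000, seed: int = 1) -> float:
    """Average over replicates; a fixed seed keeps the illustration reproducible."""
    rng = random.Random(seed)
    return statistics.fmean(coalescence_time(n_individuals, rng) for _ in range(replicates))

if __name__ == "__main__":
    for n in (20, 200):  # hypothetical effective population sizes
        # Theory: the waiting time is geometric with success probability 1/n, mean n.
        print(n, round(mean_coalescence_time(n), 1))
```

The tenfold smaller population coalesces roughly ten times sooner, which is the property the argument above relies on.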

Similarly, departures from random mating due to inbreeding, assortative mating, or population stratification can lead to non-random associations between genotypes, further complicating the interpretation of the data and of coalescent times. As Karafet et al. (1997) concluded, the presence of the 1T haplotype (a Y chromosome combination haplotype) in both northeastern Siberia and the Americas makes historic and prehistoric back-migration highly likely. Similar studies have also noted the possibility of gene transfer, or the “hitch-hiking theory,” among American Indian and Asian populations (Bianchi, Bailliet, Bravi, Carnese, Rothhammer, Martinez-Marignac, & Pena, 1997; Bradman & Thomas, 1998; Hudson, 1990). Because population-coalescence times frequently result from the fusion of several ancient phylogenetic clusters and do not necessarily reflect the age of individual populations (Watson, Forster, Richards, & Bandelt, 1997), faulty results may be reported. Therefore, using gene coalescent times as possible times of origin for American Indians can lead to spurious conclusions, for there is no evidence that 1) American Indians were ever part of a neutral system that can be timed like a regularly ticking clock, 2) American Indians were isolated from each other or from Asian populations, or 3) the current gene systems found in a particular population fully represent the historical diversity of that population.

The mtDNA and Y chromosome sections of the human genome have proven to be the most useful for studying historical population movements because they are easy to replicate and amplify, and because they are non-recombining. However, the larger theoretical assumptions underlying how molecular anthropologists reconstruct particular population allele frequencies are still nascent. Jeffreys et al. (1985) introduced individual-specific “fingerprints” for multiple loci, which were later applied to single-locus variable number of tandem repeats (VNTR) polymorphisms and short tandem repeats (STRs). In parallel with these molecular advances, which made DNA typing more sensitive and reliable, the mathematical theory became more precise. In rapid succession three main obstacles were overcome: population structure, kinship, and database trawls. All depend on probability theory, the first two deriving from Malecot’s work (1955; 1967; 1973; 1975), which demonstrated that weight of evidence is measured by the likelihood ratio (LR):

$$\mathrm{LR} = \frac{\Pr(E \mid S = C)}{\Pr(E \mid S \neq C)}$$

where E denotes the mtDNA or Y chromosome evidence, C is the mtDNA or Y chromosome frequency within that population (already discussed above), and S is a sample from the skeleton, bone sample, or individual in question. The null hypothesis H0 is that S ≠ C (different blood lines; no biological affiliation), and the alternative hypothesis is that S = C (the same blood line; biological affiliation). These probabilities are functions of gene frequencies, at least one parameter of population structure α, and perhaps N, the number of individuals in a database trawl. Certain conditions for mtDNA or Y chromosome identification lie outside probability theory but must also be taken into account in genetic affiliation studies. For example, the “chain of custody” must be preserved from the location where the sample was taken through testing, with adequate guarantees against tampering, misidentification, and contamination. There must also be guarantees that testing was performed without error and that probabilities under a given hypothesis are evaluated correctly. Likewise, gene frequencies must be estimated from appropriate random samples of the relevant population, and other parameters must be appropriate. Bayesian methods that specify prior probabilities for the various assumptions and hypotheses are too subjective to be considered in presenting the evidence.

A complicating factor in this line of research is that genotypes are not unique: monozygotic co-twins have the same genotype, and any other individual (whether related or not) has a nonzero probability of sharing the same genotype at a finite number of loci. This means that DNA evidence must be presented in terms of matching probabilities.
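For a single lineage marker such as an mtDNA or Y chromosome haplotype, the likelihood ratio LR = Pr(E | S = C) / Pr(E | S ≠ C) takes a simple form when the two samples match. The sketch below assumes a perfect match with no typing error, so that Pr(E | S = C) = 1. It then uses one common way of folding in the population-structure parameter α: the chance that an unrelated individual carries a haplotype already observed once is taken as α + (1 - α)p, where p is the haplotype frequency. The function names and example values are hypothetical, and other treatments of population structure are possible.

```python
def match_probability(p: float, alpha: float) -> float:
    """Chance that an unrelated individual carries a haplotype already seen once,
    under a simple population-structure correction (an assumed model)."""
    if not (0.0 < p <= 1.0 and 0.0 <= alpha < 1.0):
        raise ValueError("need 0 < p <= 1 and 0 <= alpha < 1")
    return alpha + (1.0 - alpha) * p

def likelihood_ratio(p: float, alpha: float = 0.0) -> float:
    """LR = Pr(E | S = C) / Pr(E | S != C) for matching haplotypes,
    assuming Pr(E | S = C) = 1 (no typing error)."""
    return 1.0 / match_probability(p, alpha)

if __name__ == "__main__":
    # Hypothetical haplotype carried by 10 percent of a reference population.
    print(likelihood_ratio(0.10))              # no structure correction
    print(likelihood_ratio(0.10, alpha=0.02))  # with a small structure correction
```

With these values the LR is 10 without correction and about 8.5 with α = 0.02. Common haplotypes therefore yield weak evidence of affiliation, and a frequency estimated from only a handful of individuals makes p itself unreliable.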

It is for the above reasons that adequate sample sizes of the populations under study must be used. Variations in population size are commonly attributed to bottlenecks and the so-called founder principle, in which a population undergoes a severe reduction in size or a few individuals colonize a new area, leaving only a small selection of the gene frequencies present in the original population. However, an important complication that makes it impossible to use the census size of a prehistoric human group as a direct estimate of its effective population size is that human populations have overlapping generations. Rogers and Jorde (1995) have shown that the only sense in which sequence diversity can be employed as a measure of chronological age is as an estimate of the time during which a particular population has expanded after experiencing a severe bottleneck. This is because we are dealing with allele frequencies (haplotypes), not with distinct populations. In fact, the error variance increases with time, and the earliest observations are the most precise. For example, computer simulations suggesting that the four major haplogroups found among American Indians underwent a bottleneck followed by a large population expansion may be questioned. These simulations are based primarily on the analysis of control region (CR) sequences from haplogroup A and do not take into account haplogroups B, C, D, and X. Similarly, although most studies investigating human population movements have used sequence diversity as a measure of age, few have investigated whether their samples met the very stringent assumptions required by this practice (Bonatto & Salzano, 1997a, p. 1417). Bonatto and Salzano (1997) have also noted that studies using RFLPs have found that haplogroup B has much lower diversity than the other four (A, C, D, X), which would lead to inaccurate computer simulations. Therefore, the current dates from mtDNA and Y chromosome studies contending that American Indians arrived in the “New World” around 35,000 years ago can be questioned (Bonatto & Salzano, 1997a; Bonatto & Salzano, 1997b; Brown, Hosseini, Torroni, Bandelt, Allen, Schurr, Scozzari, Cruciani, & Wallace, 1998). This figure actually represents the time during which American Indians theoretically experienced an expansion after a bottleneck. However, it is unknown whether this bottleneck took place in Asia, the generally accepted origin of American Indians, or in the Americas after their arrival. Nor is it known what effects subsequent migrations, and bottlenecks from disease and other factors, have had on this time estimate.
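The sense in which Rogers and Jorde allow diversity to measure time can be sketched with the mismatch distribution, the distribution of pairwise differences among sampled sequences. The example below assumes a sudden-expansion model in which the mismatch mean is m = θ0 + τ and the variance is v = θ0 + θ0² + τ. Solving these gives the moment estimates θ0 = sqrt(v - m) and τ = m - θ0, and the date then follows from τ = 2ut for an assumed per-sequence mutation rate u. The sequences, the rate, and the function names are hypothetical, and the estimator is undefined when v ≤ m.

```python
import math
import statistics
from collections import Counter
from itertools import combinations

def pairwise_differences(seqs):
    """Number of differing sites for every pair of aligned, equal-length sequences."""
    return [sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)]

def mismatch_distribution(seqs) -> Counter:
    """Number of differences -> number of sequence pairs with that many differences."""
    return Counter(pairwise_differences(seqs))

def moment_estimates(mean: float, var: float):
    """Sudden-expansion model: mean = theta0 + tau, var = theta0 + theta0**2 + tau."""
    if var <= mean:
        raise ValueError("estimator needs variance > mean")
    theta0 = math.sqrt(var - mean)
    return theta0, mean - theta0

def generations_since_expansion(tau: float, u: float) -> float:
    """tau = 2ut, with u an assumed mutation rate per sequence per generation."""
    return tau / (2.0 * u)

if __name__ == "__main__":
    seqs = ["ACGTACGTAC", "ACGTACGTTC", "ACCTACGTAC", "TCGTAGGTAC", "ACGAACGAAC"]  # hypothetical
    diffs = pairwise_differences(seqs)
    print(mismatch_distribution(seqs))
    m, v = statistics.fmean(diffs), statistics.pvariance(diffs)
    if v > m:
        theta0, tau = moment_estimates(m, v)
        print(generations_since_expansion(tau, u=0.001))
    else:
        print("variance <= mean: moment estimator undefined for these data")
```

The quantity recovered is a time since expansion, not a time of origin or of population splitting, and it inherits every uncertainty in u.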

Adequate sample sizes are also critical if the genetic frequencies used to characterize a population are to be considered reliable. Typically, in studies addressing American Indian historic population movements, sample sizes range between four and 30 individuals per tribal population, which is sufficient to detect little more than the most common haplotypes in each population. Although genetic samples from 50 males or 50 females of an individual population are necessary to accurately infer genetic demographic history, very few studies have met this threshold. The largest study to date on American Indians dealt with 2,198 males from 60 global populations, including 20 American Indian groups (Karafet et al., 1999; this study relied on large amounts of data gathered from previously published reports, and thus could not correct for those sample sizes). However, only the Inuit Eskimo and Navajo samples exceeded 50, at 62 and 56 males respectively. All others ranged from as high as 44 to as low as two individuals. It is unrealistic to assume that one can get an accurate picture of a tribe’s genetic frequencies using only two males. In fact, Weiss (1994, p. 834) suggests that we may not be able to distinguish the loss of lineages after a single migration from separate migrations out of a common source population, further stressing the critical need for adequate population sample sizes. As Ward et al. (1993) have noted, a sample size of 25 will detect ~63 percent of the lineages in a tribe with normal diversity. In tribes with extensive diversity, a sample size of 25 individuals will detect only ~40 percent of the lineages, and sample sizes of 70 or above are required to detect two-thirds of the lineages.
The fact that the majority of studies lack the sample sizes necessary to detect even 63 percent of the lineages in a normally diverse tribe calls many of their results into question, especially since most American Indian tribes are believed to have a high level of diversity (Ward, Redd, Valencia, Frazier, & Pääbo, 1993).
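Why small samples miss lineages follows from simple sampling arithmetic. If a lineage has frequency p in a tribe, the chance that a random sample of n individuals contains it at least once is 1 - (1 - p)^n, and the expected fraction of lineages detected is the average of that quantity across lineages. The sketch below assumes random sampling of unrelated individuals and uses hypothetical frequencies and function names. It is not intended to reproduce Ward et al.'s specific figures, which depend on the lineage frequency distributions they actually observed.

```python
import math

def detection_probability(p: float, n: int) -> float:
    """Chance that a lineage of frequency p appears at least once in n random draws."""
    return 1.0 - (1.0 - p) ** n

def expected_fraction_detected(freqs, n: int) -> float:
    """Average detection probability over all lineages in the population."""
    return sum(detection_probability(p, n) for p in freqs) / len(freqs)

def sample_size_for(p: float, confidence: float = 0.95) -> int:
    """Smallest n giving at least `confidence` chance of seeing a lineage of frequency p."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

if __name__ == "__main__":
    # Hypothetical tribe: three common lineages plus twenty rare ones (sums to 1.0).
    freqs = [0.30, 0.20, 0.10] + [0.02] * 20
    for n in (5, 25, 70):
        print(n, round(expected_fraction_detected(freqs, n), 2))
    print(sample_size_for(0.05))
```

Here `sample_size_for(0.05)` returns 59: roughly 60 individuals are needed just to have a 95 percent chance of sampling, even once, a lineage carried by 5 percent of a tribe.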

As has been discussed, prehistoric migrations are difficult to reconstruct from mtDNA and Y chromosome data. The most meaningful measure of migration from a genetic point of view is obtained by taking the generation as the time unit. Measuring the distribution of distances between the birthplaces of parents and offspring can, in theory, yield a statistical measure of migration. However, this method works only for a continuous model in which the population is constant, and it is not entirely satisfactory when the population is highly clustered, as most prehistoric populations are believed to have been (Cavalli-Sforza & Bodmer, 1971, p. 433). A related limitation in using such data to infer migrations is that exchange between non-neighboring clusters may have been frequent enough among prehistoric populations to violate the assumptions of the simplest stepping-stone models (Cavalli-Sforza and Bodmer 1971).
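The generation-based measure described above amounts to summarizing parent-offspring birthplace distances. The sketch below assumes birthplaces recorded as planar coordinates in kilometres, a hypothetical simplification since real records would need geographic distances. It reports the mean distance and the mean squared distance, the latter being related to the dispersal variance used in isolation-by-distance models. The function name and records are hypothetical. As a single summary it presumes exactly the continuous, constant population that the text identifies as the method's weakness.

```python
import math
import statistics

def dispersal_summary(pairs):
    """pairs: iterable of ((x_parent, y_parent), (x_child, y_child)) in km.
    Returns (mean distance, mean squared distance)."""
    dists = [math.dist(parent, child) for parent, child in pairs]
    return statistics.fmean(dists), statistics.fmean(d * d for d in dists)

if __name__ == "__main__":
    # Hypothetical parent/offspring birthplaces (km on a local grid).
    records = [((0, 0), (3, 4)), ((10, 2), (10, 2)), ((5, 5), (8, 9)), ((1, 7), (13, 12))]
    mean_d, mean_sq = dispersal_summary(records)
    print(f"mean distance {mean_d:.1f} km, mean squared distance {mean_sq:.1f} km^2")
```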

A final aspect of human DNA that confounds many current uses of these data to reconstruct human population movements is that human mtDNA variation is high. Likewise, genetic variation within populations is much greater than variation between populations (Wolpoff, 1999, p. 551). What this means is that mtDNA evolution, and possibly the evolution of other genetic systems, is not the same as the evolution of particular populations. As Scozzari et al. (1999) have noted, groups or tribes thought to have descended from a common ancestor more than 10,000 years ago may have lost even the shared-by-descent portion of their gene pool and can no longer be detected as affiliated through genetic analysis. Likewise, population-specific mutations and the gene trees inferred from these sequences are generally inconsistent with historic and prehistoric population affiliation. Page and Charleston (1990) have identified a method for visualizing and quantifying the relationship between a pair of gene and species trees that constructs a third, reconciled tree. Reconciled trees provide an optimal way of mapping the combined history of genes and populations. However, even this more accurate method of depicting gene and population trees has limitations, such as allele phylogenies and horizontal transfer, neither of which has been addressed in studies of American Indian historic population movements. In fact, many of the polymorphisms observed for mtDNA probably predate population separations (Mountain & Cavalli-Sforza, 1997) and would not be useful in constructing genetic, population, or reconciled trees. Mitochondrial DNA or Y chromosome lineages are not human populations. In order to estimate the significance of variation in gene frequencies between groups, it is necessary to estimate how large a sample must be in order to be representative of the group.
This can be accomplished only if an accurate estimate of the real variation to be expected in the gene frequencies is possible. This estimate is valid only for genes without dominance, in which case genes can be counted directly. However, if the people sampled from a given tribal village or town are closely related, a single source of variation may greatly inflate the estimated variance between populations (Cavalli-Sforza and Bodmer 1971, p. 422). Multivariate analysis, the use of more than one trait or gene and presently the most commonly employed method of analysis, poses more difficult problems, in that one must determine the maximum number of genes possible for each population in order to be accurate. Unfortunately, many authors have tested only a small set of markers on one gene (univariate) in their studies (Cavalli-Sforza, Menozzi, & Piazza, 1994, p. 22), combining their data with those of others to arrive at several sets of markers for multivariate analysis.
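For a haploid marker, where each sampled mtDNA or Y chromosome can be counted directly, the sampling variance of a haplotype frequency estimate under random sampling is p(1 - p)/n. The sketch below turns that into a standard error and into the sample size needed for a target standard error. It also adds a hypothetical `design_effect` multiplier as a crude way of showing how sampling close relatives inflates the required size. The function names and the design-effect value are illustrative assumptions, not estimates for any real population.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Binomial standard error of a haplotype frequency counted in n haploid samples."""
    return math.sqrt(p * (1.0 - p) / n)

def required_sample_size(p: float, target_se: float, design_effect: float = 1.0) -> int:
    """Samples needed so the standard error is at most target_se.
    design_effect > 1 is a crude allowance for non-independent (related) samples."""
    return math.ceil(design_effect * p * (1.0 - p) / target_se ** 2)

if __name__ == "__main__":
    print(round(standard_error(0.2, 25), 3))                   # haplotype at 20 percent, n = 25
    print(required_sample_size(0.2, 0.03))                     # unrelated individuals
    print(required_sample_size(0.2, 0.03, design_effect=2.0))  # hypothetical related sampling
```

For instance, a haplotype at 20 percent counted in 25 individuals has a standard error of 0.08, so the true frequency could plausibly lie anywhere from roughly 4 to 36 percent.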

As has been discussed, the field of molecular anthropology uses a wide array of terms and concepts derived from molecular genetics. This use of unfamiliar terms can result in a misunderstanding of the data, because the underlying assumptions that the data represent are not clearly described. Furthermore, this section has also attempted to elucidate some of the fundamental principles involved in the use of mtDNA and Y chromosome data and the investigation of American Indian historic population movements. With these points in mind, it is now possible to examine the various studies that have been conducted and what they have found.

E. AMERICAN INDIAN MTDNA AND Y CHROMOSOME STUDIES

As previously mentioned, the majority of studies involving American Indian mtDNA and Y chromosome data have been concerned with the initial peopling of the Americas. There have been only a handful of studies that have attempted to study human population movements within the Americas, which will be discussed below under the section covering ancient DNA (aDNA). First, however, studies using classical genetic markers and American Indians will be briefly reviewed to establish a baseline upon which mtDNA and Y chromosome data can be contextualized.

Studies have shown that American Indians resemble Siberian and other Asian populations in the kinds and frequencies of the various genetic markers of the blood they carry. For a more complete description of classical genetic markers and their use in tracing populational affinities and origins, see Crawford (1973). Szathmary (1993) has summarized the genetic diversity of North American Indian populations based upon classical frequency distributions. American Indian and northeastern Siberian populations have similar frequencies of many blood types, forms of serum proteins, and red-cell enzymes. More recent research has confirmed that mtDNA and Y chromosome haplotype and haplogroup distributions are also similar. When compared to other geographical populations of the world, on the basis of multivariate statistical analyses of gene frequencies, the Siberian or Asian populations tend to cluster together with those of the American Indians.

In 1988, Cavalli-Sforza et al. used an average linkage analysis of Nei’s genetic distances to construct a genetic tree based upon 120 alleles from 42 world populations. A bootstrap method (a resampling technique for obtaining standard errors) was utilized to test the reproducibility of the sequence of the splits in the phylogenetic tree (dendrogram). This tree showed two main branches, the African and non-African. The North Eurasian branch divided into Europeans (Caucasians) and Northeast Asians, including the American Indians. Thus, this multivariate approach to population affinities revealed a close genetic relationship between American Indians and Asian groups.
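The procedure just described, Nei's genetic distance followed by average-linkage clustering, can be sketched as follows. This is a minimal illustration of Nei's standard distance, D = -ln(J_XY / sqrt(J_X J_Y)), and of UPGMA clustering, not a reconstruction of Cavalli-Sforza et al.'s analysis. The population labels, the two loci, and the allele frequencies are hypothetical, and the bootstrap step (resampling loci and re-clustering) is omitted for brevity.

```python
import math

def nei_distance(x, y):
    """Nei's standard genetic distance between two populations.
    x, y: one list per locus, each holding allele frequencies in the same order."""
    n_loci = len(x)
    jx = sum(sum(a * a for a in locus) for locus in x) / n_loci
    jy = sum(sum(b * b for b in locus) for locus in y) / n_loci
    jxy = sum(sum(a * b for a, b in zip(lx, ly)) for lx, ly in zip(x, y)) / n_loci
    return -math.log(jxy / math.sqrt(jx * jy))

def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples."""
    clusters = {name: (name, 1) for name in names}  # label -> (subtree, size)
    d = {frozenset((a, b)): dist(a, b)
         for i, a in enumerate(names) for b in names[i + 1:]}
    while len(clusters) > 1:
        a, b = sorted(min(d, key=d.get))  # closest pair, in a stable order
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        for k in clusters:
            # Average linkage: size-weighted mean of the two old distances.
            d[frozenset((merged, k))] = (n_a * d[frozenset((a, k))]
                                         + n_b * d[frozenset((b, k))]) / (n_a + n_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        clusters[merged] = ((tree_a, tree_b), n_a + n_b)
    return next(iter(clusters.values()))[0]

if __name__ == "__main__":
    # Hypothetical allele frequencies at two biallelic loci for three populations.
    pops = {
        "Pop1": [[0.60, 0.40], [0.20, 0.80]],
        "Pop2": [[0.55, 0.45], [0.25, 0.75]],
        "Pop3": [[0.10, 0.90], [0.70, 0.30]],
    }
    print(upgma(list(pops), lambda a, b: nei_distance(pops[a], pops[b])))
```

A bootstrap in the spirit of the study would resample loci with replacement, rebuild the tree many times, and report how often each split recurs.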

However, not all classical genetic markers occur across the world in differing frequencies. Instead, some occur only in New World and Asian populations. These include the following: the Diego allele, DIA; gamma globulin allotypes, GMA T; Factor 13B3; transferrin, TFC4; and complement, C6B2 alleles. Szathmary (1993) added SGOT2 (glutamic oxaloacetic transaminase), TFD, GCTK1 (GC 1A9), and GCN (GC 1A3) to a list of markers that indicate an Asian connection. Although the Diego DIA gene is not present in all American Indian groups, when it is detected DIA occurs only in American Indians or Asians. The frequency of the immunoglobulin haplotype GMA T in Asian populations reaches 50 percent in central Mongolia but is lower in North American Indian groups. Similarly, GMA G is found at frequencies ranging from 86 percent in the Chukchi of Siberia (Schanfield, Crawford, Dossetor, & Gershowitz, 1990) to 56 percent among the Ainu of Japan (Matsumoto & Miyazaki, 1972). In North American Indian populations, this GM marker varies from 98 percent among the Northern Cree to 47 percent in a mixed Alaskan group (Schanfield, Crawford, Dossetor, & Gershowitz, 1990). Less is known about the geographic distribution of the complement B2 allele and Factor 13B3; however, preliminary analyses suggest that these alleles occur at higher frequencies in both American Indian and Asian groups.

In many of the other genetic systems, e.g., the human leukocyte antigen (HLA) system, the various blood groups, and even the mitochondrial DNA (mtDNA) Asian haplotypes, most of the forms occur in other populations of the world, but at different frequencies. American Indians share the four major haplogroups (A-D) with Asian populations (Torroni & Wallace, 1995). In addition, Siberian and American Indian populations share two identical mitochondrial DNA haplotypes, namely S26 (AM43) and S13 (AM88). The S and AM designations represent the same haplotypes, defined by the presence or absence of specific restriction sites, in Siberian and American Indian populations, respectively. From these two haplotypes, Torroni et al. (1993) attempted to reconstruct the time of divergence between Asian and American Indian mtDNA variation. These differences in the frequencies of some of the genetic markers led these researchers to conclude that American Indian populations are the result of small founding groups, unique historical events, and possibly the action of natural selection over a span of 15,000 to possibly 40,000 years.

William Boyd (1952) was one of the first to use classical genetic data to compare American Indians to other world populations. He believed that American Indians as a whole were distinct from other major continental populations in their blood-group frequencies. He proposed a single American Indian serological grouping, one of seven such major groupings. In the volume Biomedical Challenges Presented by the American Indian (1968) Miguel Layrisse summarized the distinctive patterns of frequencies in American Indian populations (see Table 1).

Table 1

High-incidence markers    Low-incidence or absent markers
ABOO                      ABOA
MNM                       ABOB
RHR1                      RHRO
FYA                       LUA
DIA                       KK
ABH*SE                    LEA+
                          Abnormal hemoglobin

Source: After Layrisse (1968)

During the past two decades, innovations in biochemical genetics and serology have produced a plethora of genetic markers that can be utilized to evaluate populational affinities and movements. Since Layrisse’s compilation of 1968, a number of new genetic markers have been identified through electrophoresis, isoelectric focusing and immunologic techniques. Two of the most informative sets of genetic markers encode the immunoglobulins (GMs and KMs) and the human leukocyte antigens (HLA system). The newer markers that distinguish American Indian populations are listed in Table 2.

Table 2

High-incidence                      Low-incidence or absences
HLAA2, HLAA9, HLAW28, HLAA10        HLAA1, HLAA3, HLA*A
HLABW15, HLABW16, HLABW40           HLAB29, HLA*B
GMA G, GMA T                        GMF B, GMA,F B
GC1S, GCIGLOOLIK                    BF*F
GC*CHIP
ALBMEX, ALBNASK
TFDCHI, TFBO-
CHE1S, CHE2+

According to Lampl and Blumberg (1979), the HLAA2 allele has the highest incidence among American Indians. In addition, HLAA9, HLAW28, and HLAW31 are common alleles. Bodmer and Bodmer (1973) note the absence of HLAA1, HLAA3, HLAA10, HLAA11, and HLAW29 from American Indian populations. Furthermore, North and South American Indian populations can be distinguished on the basis of HLAW31 and HLAW15, which occur at high frequencies among South American Indian groups, whereas HLAW28, HLAA9, and HLAW5 are North and Central American markers.

Other genetic systems that can be used to establish population affinities between Siberian and New World groups include group-specific component (GC), serum pseudocholinesterase (CHE1 and CHE2), properdin factor B (BF), transferrin (TF), and albumin (ALB). However, these classical markers are all rather crude tools for investigating historic population movements. DNA RFLPs and sequences of mitochondrial and Y chromosome DNA, on the other hand, provide finer discrimination among populations and individuals. Furthermore, with recent developments in the use of restriction fragment length polymorphisms (RFLPs), amplification through the polymerase chain reaction (PCR), and nucleotide sequencing, it is now possible to explore variation in the DNA molecule itself (Cann, 1985).

1. MITOCHONDRIAL DNA (mtDNA)

As already described, mitochondrial DNA (mtDNA) is a small, circular molecule of 16,569 nucleotide pairs (np) located in mitochondria of the cytoplasm (Anderson et al., 1981). This molecule evolves by the accumulation of mutations in the maternal lineages (Brown, Prager, Wang, & Wilson, 1979), and is believed to fix new mutations more than ten times faster than nuclear DNA (Wallace et al., 1987). This rapid evolutionary change allows molecular anthropologists to use it to study the evolutionary divergence of human populations.

Even though Allan Wilson and Rebecca Cann first applied mtDNA to the study of human phylogeny, Douglas Wallace and his research group were the first to use mtDNA to examine questions concerning the peopling of the New World. Wallace et al. (1985) used a series of restriction endonucleases (HpaI, BamHI, HaeII, MspI, AvaII, and HincII) to identify mtDNA haplotypes. Their initial restriction analysis of the mtDNA of the Pima and Papago Indians revealed a distinctive marker, HincII morph-6, in 42 percent of the people tested. This haplotype is observed at low frequency in the mtDNA of Asians, and its presence at high frequencies among American Indians can be explained by the founder effect (Schurr et al., 1990). Among molecular anthropologists there is some controversy surrounding the number of maternal lineages necessary to account for the observed variation in mtDNA haplotypes of American Indian populations. Initially, Schurr et al. (1990) argued for the presence of four different lineages, as follows: 1) the HincII morph-6 marker (site loss at nt 13,259 and an AluI site gain at nt 13,262; haplotype AM10), found in virtually every mtDNA from Pima, Maya, and Ticuna Indians and later termed haplogroup C; 2) the Asian-specific COII/tRNALys intergenic deletion (haplotype AM2), found in American Indians who lack the HincII morph, which is defined by a 9-bp deletion and is referred to as haplogroup B; 3) the HaeIII site polymorphism at nt 663 (haplotype AM6), first observed by Cann (1982) in Chicanos and Chinese and later noted in the Pima, Maya, and Ticuna, which is now referred to as haplogroup A; and 4) haplogroup D, defined by an AluI site loss at nt 5176, which Schurr et al. (1990) placed in the middle of the American Indian phylogenetic tree. Because most of these haplotypes can be derived from each other through the sequential accumulation of mutations, Wallace et al. (1985) initially claimed that all American Indians originated from a single lineage. Pääbo et al. (1988), after analyzing the mtDNA from a 7,000-year-old American Indian brain from Florida, proposed the existence of another founding lineage. However, Schurr et al. (1990) explained these results in terms of the loss of this founding haplotype from the three American Indian tribes they had studied.
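The marker definitions listed above amount to a simple decision rule. The sketch below encodes only the four defining sites given in this section: the HaeIII site gain at nt 663 for A, the 9-bp COII/tRNALys deletion for B, the HincII site loss at nt 13,259 together with the AluI site gain at nt 13,262 for C, and the AluI site loss at nt 5176 for D. The data structure and field names are hypothetical. Because the 9-bp deletion has arisen independently more than once and non-conforming variants exist, a real classification would rely on additional sites, so this is an illustration rather than a typing protocol.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RflpProfile:
    """Presence/absence calls at the defining sites (hypothetical field names)."""
    haeiii_663_gain: bool = False
    deletion_9bp: bool = False
    hincii_13259_loss: bool = False
    alui_13262_gain: bool = False
    alui_5176_loss: bool = False

def assign_haplogroup(p: RflpProfile) -> str:
    """Return 'A'-'D' when exactly one defining pattern is present."""
    hits = []
    if p.haeiii_663_gain:
        hits.append("A")
    if p.deletion_9bp:
        hits.append("B")
    if p.hincii_13259_loss and p.alui_13262_gain:  # haplogroup C needs both changes
        hits.append("C")
    if p.alui_5176_loss:
        hits.append("D")
    if len(hits) == 1:
        return hits[0]
    # No defining marker, or conflicting markers: leave for further analysis.
    return "other" if not hits else "ambiguous"

if __name__ == "__main__":
    sample = RflpProfile(hincii_13259_loss=True, alui_13262_gain=True)
    print(assign_haplogroup(sample))
```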

A more extensive analysis, using PCR amplification and 14 restriction endonucleases on 321 American Indians from 17 populations, confirmed the presence of the four lineages or haplogroups A, B, C, and D, which account for 96.9 percent of American Indian mtDNA variation and are of Asian ancestry (Torroni, Schurr et al., 1993; Torroni & Wallace, 1995). These findings supported the hypothesis that the four American Indian mtDNA haplogroups resulted from four separate demic expansions. Three of the four haplogroups (A, C, and D) observed in the Americas are present in indigenous Siberian populations (Torroni, Sukernik, Schurr, Starikovskaya, Cabell, Crawford, Comuzzie, & Wallace, 1993). Presently, no Siberian population has exhibited haplogroup B. The presence of haplogroup B in Asia and the Americas, and its absence from Siberia, suggests a separate expansion into the Americas, possibly prior to the peopling of Siberia (circa 20,000 years ago). When dates of divergence were calculated using the mtDNAs of American Indians and Siberians, they fell between 17,000 and 34,000 ybp (Torroni, Sukernik, Schurr, Starikovskaya, Cabell, Crawford, Comuzzie, & Wallace, 1993).

Torroni and Wallace (1995) reported that out of 743 American Indians tested to date, 25 individuals, scattered among eight tribal groups of North, Middle, and South America, displayed some mtDNA variants that differed from the four common haplogroups (A, B, C, D). They suggested that these variants may be the result of: 1) a second mutational event; 2) possible admixture with Europeans or Africans; or 3) additional Asian haplogroups brought into the New World by Siberians. Thus, Torroni and Wallace cautioned researchers against classifying mtDNAs from Old World populations by using only the primary variants found in American Indian mtDNA. They went on to point out that the 9-bp deletion had occurred independently in different regions of the world. This conclusion was supported by Soodyall et al. (1996), who discovered the presence of the so-called Asian-specific deletion in sub-Saharan Africa. From these data it appears that this 9-bp deletion arose independently at least twice, once in Asia and once in Africa. Furthermore, Bailliet et al. (1994) have proposed the existence of as many as ten possible mtDNA founder haplotypes in American Indian populations. However, others believe that some of these haplotypes may be due to mutations in American Indian populations and/or admixture with Europeans (Bianchi & Rothhammer, 1995).

Ward and his colleagues (1991) sequenced a 360-nucleotide segment of the mtDNA control region from 63 individuals of the Nuu-Chah-Nulth (or Nootka) from Vancouver Island. They identified 28 mtDNA lineages, defined by 26 variable positions within the control region. Ward et al. (1991) computed the average sequence divergence between the lineage clusters using a maximum rate of evolution of 33 percent divergence per million years for the control region. They obtained a range of 41,000-78,000 years, with an average of 60,000 years. These data suggest that the mitochondrial lineages within a single American Indian tribe diverged approximately 60,000 years ago. They interpreted these data as evidence that the lineages were established prior to the American Indian entry into the Americas, and they concluded that the founding populations of American Indians contained considerable genetic diversity.
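The conversion behind dates of this kind is a constant-rate clock: divergence time equals sequence divergence divided by the divergence rate. The sketch below shows only that arithmetic. It assumes the 33-percent-per-million-years control-region rate quoted above and a strictly constant clock, which is precisely the assumption questioned earlier in this report; the function names are hypothetical. Working backwards, an average of 60,000 years at that rate corresponds to roughly 2 percent sequence divergence (0.33 × 0.06 = 0.0198).

```python
def divergence_time_years(divergence: float, rate_per_myr: float = 0.33) -> float:
    """Years since divergence under a strict clock.
    divergence: proportion of sites differing (e.g. 0.02 for 2 percent).
    rate_per_myr: divergence accumulated per million years (0.33 = 33 percent)."""
    return divergence / rate_per_myr * 1_000_000

def divergence_for_time(years: float, rate_per_myr: float = 0.33) -> float:
    """Inverse: divergence expected after `years` at the same rate."""
    return rate_per_myr * years / 1_000_000

if __name__ == "__main__":
    d = divergence_for_time(60_000)
    print(round(d, 4))
    # Halving the assumed rate doubles the date: the result rests entirely on the clock.
    print(round(divergence_time_years(d)), round(divergence_time_years(d, rate_per_myr=0.165)))
```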

Merriwether et al. (1995) extensively investigated the geographic distribution of the four founding mtDNA lineage haplogroups in American Indian populations. They observed a north-south increase in the frequency of haplogroup B, accompanied by a north-south decrease in the frequency of haplogroup A. Based upon the extensive distribution of the four lineages in the Americas, Merriwether and his colleagues concluded that the pattern is consistent with a single migratory wave from Siberia into the Americas, followed by genetic divergence. However, these data can also be interpreted to represent a number of migrations from Siberia reintroducing the same haplogroups.

After this study, Merriwether et al. (1996) went on to compare mtDNA RFLPs from Mongolians of Ulan Bator with an array of frequencies of the founding lineage haplogroups in American, Asian, and Siberian populations, revealing considerable similarity between the Mongolian and American populations (Merriwether, Hell, Vahlne, & Ferrell, 1996). In this study the haplogroups were further subtyped into A1, A2, B1, B2, C1, C2, D1, D2, X6, X7, and “others.” Unlike the Northeastern Siberian populations, this Mongolian sample exhibited all four of the American primary haplogroups and shared the highest number of haplotypes with American Indian populations. As mentioned, haplogroup B has not been detected in any of the Siberian populations in closest proximity to the Bering Strait. However, this haplogroup occurs at a frequency of 75 percent among the Atacameno and 50 percent among the Pima, but is absent in a number of other American Indian groups (e.g., Makiritare, Dogrib, and Haida).

The vast majority of mtDNAs from modern American Indian populations belong to five haplogroups, designated A–D and X (Brown, Hosseini, Torroni, Bandelt, Allen, Schurr, Scozzari, Cruciani, & Wallace, 1998; Forster, Harding, Torroni, & Bandelt, 1996; Schurr, Ballinger, Gan, Hodge, Weiss, & Wallace, 1990; Torroni, 1994; Torroni, Chen et al., 1993; Torroni et al., 1994; Torroni, Schurr, Cabell, Brown, Neel, Larsen, Smith, Vullo, & Wallace, 1993; Torroni et al., 1992; Torroni, Sukernik, Schurr, Starikovskaya, Cabell, Crawford, Comuzzie, & Wallace, 1993). Each of these is distinguished by a unique combination of coding-region RFLPs and HVR-I sequence polymorphisms. Together, they comprise 95–100 percent of all mtDNAs in indigenous populations of the Americas (Schurr & Sherry, 2004; Schurr & Wallace, 2002). The same pattern of variation is also observed in ancient Amerindian samples (Carlyle, Parr, Hayes, & O'Rourke, 2000; Fox, 1996; Kaestle, 1995; 1997; 1998; Kaestle & Smith, 2001a; Lalueza, Perez-Perez, Prats, Cornudella, & Turbon, 1997; Merriwether, Rothhammer, & Ferrell, 1994; Monsalve, Edin, & Devine, 1998; O'Rourke, Hayes, & Carlyle, 2000b; Parr, Carlyle, & O'Rourke, 1996; Ribeiro-dos-Santos, Guerreiro, Santos, & Zago, 2001; Ribeiro-dos-Santos, Santos, Machado, Guapindaia, & Zago, 1997; Stone & Stoneking, 1998). These five haplogroups are therefore clearly the main founding mtDNA lineages of American Indian populations. However, a number of haplotypes that do not belong to these five maternal lineages have been detected in different American Indian groups (Bailliet, Rothhammer, Carnese, Bravi, & Bianchi, 1994; Easton, Merriwether, Crews, & Ferrell, 1996; Lorenz & Smith, 1996; Lorenz & Smith, 1997; Merriwether, Rothhammer, & Ferrell, 1995; Merriwether, Rothhammer, & Ferrell, 1994; Ribeiro-dos-Santos, Guerreiro, Santos, & Zago, 2001; Rickards, Martinez-Labarga, Lum, De Stefano, & Cann, 1999). Most of these have been shown to belong to haplogroup X, to derive from haplogroup A–D mtDNAs, or to result from non-native admixture (Schurr & Wallace, 1999; Schurr & Sherry, 2004; Schurr & Wallace, 2002; Smith, Malhi, Eshleman, Lorenz, & Kaestle, 1999). The remaining haplotypes have not been analyzed sufficiently to determine their haplogroup status (Bailliet, Rothhammer, Carnese, Bravi, & Bianchi, 1994). Haplogroup A–D mtDNAs are observed in indigenous populations from North, Central, and South America. Among Amerindians, haplogroup A generally occurs at higher frequencies in North America than in other regions, whereas haplogroups C and D generally occur at higher frequencies in South America. Haplogroup B does not appear to show a distinct clinal distribution, but it is virtually absent from northern North America (Fox, 1996; Malhi, Schultz, & Smith, 2001; Malhi & Smith, 2002; Schurr, Ballinger, Gan, Hodge, Weiss, & Wallace, 1990; Torroni, 1994; Torroni, Chen, Scott, Semino, Santachiara-Benerecetti, Lott, & Wallace, 1993; Torroni, Chen, Semino, Santachiara-Benerecetti, Scott, Lott, Winter, & Wallace, 1994; Torroni, Schurr, Cabell, Brown, Neel, Larsen, Smith, Vullo, & Wallace, 1993; Torroni, Schurr, Yang, Szathmary, Williams, Schanfield, Troup, Knowler, Lawrence, Weiss, et al., 1992).
In contrast, haplogroup X is found almost exclusively in North America (Brown, Hosseini, Torroni, Bandelt, Allen, Schurr, Scozzari, Cruciani, & Wallace, 1998; Malhi, Schultz, & Smith, 2001; Malhi & Smith, 2002; Torroni, Schurr, Cabell, Brown, Neel, Larsen, Smith, Vullo, & Wallace, 1993; Torroni, Schurr, Yang, Szathmary, Williams, Schanfield, Troup, Knowler, Lawrence, Weiss, et al., 1992). These distributions likely reflect both the original pattern of settlement of the Americas and the subsequent genetic differentiation of American Indian populations within these continental areas. Haplogroup A–D mtDNAs have also been detected in populations representing the three American Indian linguistic groups (Amerind, Na-Dene, and Eskimo-Aleut) proposed by Greenberg (1987). Although haplogroups A–D usually appear together in Amerindian populations, many tribes lack haplotypes from at least one of these haplogroups (Batista, Kolman, & Bermingham, 1995; Lorenz & Smith, 1996; Lorenz & Smith, 1997; Scozzari et al., 1997; Torroni, 1994; Torroni, Chen, Scott, Semino, Santachiara-Benerecetti, Lott, & Wallace, 1993; Torroni, Chen, Semino, Santachiara-Benerecetti, Scott, Lott, Winter, & Wallace, 1994; Torroni, Schurr, Cabell, Brown, Neel, Larsen, Smith, Vullo, & Wallace, 1993). This pattern likely reflects the extent to which genetic drift and founder events have shaped the distribution of mtDNA haplotypes in American populations. However, the ancestral populations of Na-Dene Indians and Eskimo-Aleuts may not have possessed all four of these haplogroups. Their haplogroup profiles differ from those of Amerindians, consisting largely of haplogroup A and D mtDNAs; they essentially lack haplogroup B and have very low frequencies of haplogroup C.
Moreover, none of them has haplogroup X (Rubicz, Schurr, Babb, & Crawford, 2003; Saillard, Forster, Lynnerup, Bandelt, & Norby, 2000; Starikovskaya, Sukernik, Schurr, Kogelnik, & Wallace, 1998; Ward, Redd, Valencia, Frazier, & Paabo, 1993). Thus, circumarctic groups appear to have experienced population histories different from those of Amerindians.
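The assignment logic described in this paragraph can be sketched as a lookup of diagnostic marker states. The four markers listed are the RFLP and deletion sites commonly cited for haplogroups A–D in the Torroni et al. studies, but the scheme is deliberately simplified: actual typing also checks HVR-I motifs and additional sites, and haplogroup X is not modeled, so a sample lacking all four states lands in an "unassigned" bin.

```python
from typing import Dict

# Simplified, illustrative diagnostic states for haplogroups A-D.
# Haplogroup X and HVR-I motifs are intentionally left out of this sketch.
DIAGNOSTIC_MARKERS: Dict[str, str] = {
    "A": "+663 HaeIII",
    "B": "9-bp COII/tRNALys deletion",
    "C": "-13259 HincII",
    "D": "-5176 AluI",
}


def assign_haplogroup(sample: Dict[str, bool]) -> str:
    """Assign a haplogroup from marker calls (True = diagnostic state observed)."""
    hits = [hg for hg, marker in DIAGNOSTIC_MARKERS.items() if sample.get(marker, False)]
    if len(hits) == 1:
        return hits[0]
    if not hits:
        # Could be haplogroup X, a derived type, or non-native admixture.
        return "unassigned"
    # More than one diagnostic state: the typing should be rechecked.
    return "conflicting"


print(assign_haplogroup({"+663 HaeIII": True}))                      # A
print(assign_haplogroup({"-5176 AluI": True, "+663 HaeIII": False}))  # D
print(assign_haplogroup({}))                                           # unassigned
```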

2. Y CHROMOSOME

The non-recombining (Y-specific) portion of the human Y chromosome has been of great interest to molecular anthropology for reconstructing human phylogeny. As the paternally inherited counterpart of mtDNA, the Y-specific portion also evolves through the accumulation of mutations, so markers in this region provide some indication of male migration and admixture. Initial research using these data was somewhat disappointing because of the paucity of variation on the Y chromosome. More recently, Y-specific polymorphisms have been used successfully to construct informative haplotypes that are specific to geographic regions and to possible historic population movements. In addition, Y-chromosome-specific deletions and transitions have been discovered that apparently arose only once in human evolution and therefore serve as markers for phylogenetic reconstruction (Karafet, Zegura, Vuturo-Brady, Posukh, Osipova, Wiebe, Romero, Long, Harihara, Jin, Dashnyam, Gerelsaikhan, Keiichi, & Hammer, 1997).

Underhill et al. (1996) reported a C → T point mutation at the DYS199 locus. To date, this mutation has been found only in Inuit and Navajo populations of North America and in other populations of South and Central America. It is believed to have arisen in Siberia and to have been brought to the Americas by the first Asian migrants. Using an average mutation rate of 1.5 × 10⁻⁴ and an average generation time of 27 years, Underhill and colleagues (1996) estimated the age of this transition at 30,000 years. However, they acknowledged that higher mutation rates yield more recent dates, as recent as 2,147 years ago. In the future, this polymorphism may shed light on the colonization of the Americas, particularly if it proves to be present in some Siberian populations and not in others (Santos et al., 1995; Santos, Rodriguez-Delfin, Pena, Moore, & Weiss, 1996).
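The paragraph above turns a mutation rate and a generation time into an age. The exact estimator used by Underhill and colleagues is not described in this report, so the sketch below assumes the common stepwise-mutation approach, in which the variance in microsatellite repeat counts grows by roughly μ per generation; the repeat counts are hypothetical. It shows why the date is so sensitive to the assumed rate: the estimated age is inversely proportional to μ.

```python
from typing import Sequence

GENERATION_YEARS = 27  # average generation time quoted above


def repeat_variance(repeats: Sequence[int]) -> float:
    """Population variance of microsatellite repeat counts."""
    if not repeats:
        raise ValueError("need at least one repeat count")
    mean = sum(repeats) / len(repeats)
    return sum((r - mean) ** 2 for r in repeats) / len(repeats)


def age_years(repeats: Sequence[int], mu: float, generation_years: float = GENERATION_YEARS) -> float:
    """Estimate lineage age, assuming variance ~= mu * generations (stepwise model)."""
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    generations = repeat_variance(repeats) / mu
    return generations * generation_years


# Hypothetical repeat counts on chromosomes carrying the derived allele.
sample = [12, 13, 13, 13, 14, 13, 12, 13]
slow = age_years(sample, mu=1.5e-4)
fast = age_years(sample, mu=1.5e-3)  # a tenfold faster rate
print(round(slow), round(fast))  # the faster rate gives an age one tenth as large
```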

Lin et al. (1994) used the 47z (DXYS5) probe to investigate variation in X- and Y-associated polymorphisms in Asian, European, and African-American populations. Although both the X1 and X2 alleles were detected in most of the populations, the Y1 and Y2 alleles were polymorphic only in the Japanese, Koreans, and the Hakas and Folo of Taiwan. To date, these X- and Y-associated markers have not been tested in American Indian populations.

In characterizing Y chromosome variation in American Indians, researchers have employed a number of different single nucleotide polymorphism (SNP) and short tandem repeat (STR) loci to define the paternal lineages present in these populations (Hammer, 1997; Karafet, de Knijff, & Wood, 1998; Karafet, Zegura, Vuturo-Brady, Posukh, Osipova, Wiebe, Romero, Long, Harihara, Jin, Dashnyam, Gerelsaikhan, Keiichi, & Hammer, 1997; Karafet, Zegura, Posukh, Osipova, Bergen, Long, Goldman, Klitz, Harihara, de Knijff, Wiebe, Griffiths, Templeton, & Hammer, 1999; Pena et al., 1995; Underhill, Jin, Zemans, Oefner, & Cavalli-Sforza, 1996; Underhill et al., 2001; Underhill et al., 2000). However, these research groups have not used the same combination of genetic markers, which has led to alternative and sometimes confusing nomenclatures for Y chromosome haplotypes and haplogroups. A recent synthesis of these data has produced a consensus nomenclature based on known SNPs (Consortium, 2002). This system identifies a Y chromosome haplogroup by its lineage label (a capital letter, followed by alternating numbers and lowercase letters for sublineages) together with the SNP that defines it (e.g., G-M201). American Indian Y chromosome haplotypes derive from a subset of the haplogroups present in Siberia. These include haplogroups Q-M3, R1a1-M17, P-M45, F-M89, and C-M130. Two of them, Q-M3 and P-M45, account for the majority of American Indian Y chromosomes. Q-M3 haplotypes appear at significant frequencies in most American Indian populations and are distributed in an increasing north-to-south cline within the Americas (Bianchi, Bailliet, Bravi, Carnese, Rothhammer, Martinez-Marignac, & Pena, 1997; Bianchi, Catanesi, Bailliet, Martinez-Marignac, Bravi, Vidal-Rioja, Herrera, & Lopez-Camelo, 1998; Lell, 2000; Lell et al., 1997; Lell et al., 2002; Santos et al., 1999b; Underhill, Jin, Zemans, Oefner, & Cavalli-Sforza, 1996). STR data from Q-M3 haplotypes also reveal significant differences in haplotype distributions between North/Central and South American populations, suggesting different population histories in the two major continental regions (Bianchi, Bailliet, Bravi, Carnese, Rothhammer, Martinez-Marignac, & Pena, 1997; Bianchi, Catanesi, Bailliet, Martinez-Marignac, Bravi, Vidal-Rioja, Herrera, & Lopez-Camelo, 1998; Lell, 2000; Lell, Brown, Schurr, Sukernik, Starikovskaya, Torroni, Moore, Troup, & Wallace, 1997; Lell, Sukernik, Starikovskaya, Su, Jin, Schurr, Underhill, & Wallace, 2002; Santos, Pandya, Tyler-Smith, Pena, Schanfield, Leonard, Osipova, Crawford, & Mitchell, 1999b; Underhill, Jin, Zemans, Oefner, & Cavalli-Sforza, 1996). Haplogroup P-M45 is also widely distributed among American Indian populations and represents 30 percent of their Y chromosome haplotypes (Bortolini et al., 2002; Bortolini et al., 2003; Lell, Sukernik, Starikovskaya, Su, Jin, Schurr, Underhill, & Wallace, 2002; Ruiz-Linares et al., 1999). In addition, phylogenetic analysis has revealed two distinct sets of P-M45 haplotypes in American Indian populations.
The first of these (M45a) is broadly distributed in populations from North, Central, and South America, whereas the second (M45b) appears only in North and Central American groups (Lell, Sukernik, Starikovskaya, Su, Jin, Schurr, Underhill, & Wallace, 2002). Most of the remaining Y chromosome haplotypes belong to one of several other haplogroups, which together comprise only 5 percent of American Indian Y chromosomes. In general, these haplotypes have limited distributions in the Americas. For example, C-M130 haplotypes have been detected only in the Na-Dene-speaking Tanana, Navajo, and Chipewyan and in the Amerindian Cheyenne (Bergen et al., 1999; Bortolini, Salzano, Bau, Layrisse, Petzl-Erler, Tsuneto, Hill, Hurtado, Castro-De-Guerra, Bedoya, & Ruiz-Linares, 2002; Bortolini, Salzano, Thomas, Stuart, Nasanen, Bau, Hutz, Layrisse, Petzl-Erler, Tsuneto, Hill, Hurtado, Castro-de-Guerra, Torres, Groot, Michalski, Nymadawa, Bedoya, Bradman, Labuda, & Ruiz-Linares, 2003; Karafet, Zegura, Posukh, Osipova, Bergen, Long, Goldman, Klitz, Harihara, de Knijff, Wiebe, Griffiths, Templeton, & Hammer, 1999). In addition, R1a1-M17 haplotypes have been observed only in the Guaymi (Ngobe), a Chibchan-speaking tribe from Costa Rica (Lell, Sukernik, Starikovskaya, Su, Jin, Schurr, Underhill, & Wallace, 2002). Neither of these haplogroups has been detected in South American Indian populations.
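The consensus labels described above pair a lineage name with its defining SNP. The minimal parser below, written as an illustration rather than as any official Y Chromosome Consortium tool, splits a label such as R1a1-M17 into those two parts and recovers the major haplogroup letter.

```python
from collections import Counter
from typing import NamedTuple


class YHaplogroup(NamedTuple):
    lineage: str  # e.g. "R1a1": capital letter plus optional sublineage code
    snp: str      # defining marker, e.g. "M17"

    @property
    def major(self) -> str:
        """Major haplogroup letter (the first character of the lineage)."""
        return self.lineage[0]


def parse_label(label: str) -> YHaplogroup:
    """Split a 'Lineage-SNP' label such as 'Q-M3' into its two parts."""
    lineage, sep, snp = label.partition("-")
    if not sep or not lineage or not snp or not lineage[0].isupper():
        raise ValueError(f"expected a label like 'Q-M3', got {label!r}")
    return YHaplogroup(lineage, snp)


# Haplogroups listed in the paragraph above.
labels = ["Q-M3", "R1a1-M17", "P-M45", "F-M89", "C-M130"]
parsed = [parse_label(s) for s in labels]
print([(h.lineage, h.snp) for h in parsed])
print(Counter(h.major for h in parsed))
```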

3. ANCIENT DNA

Although mtDNA and Y chromosome data have proven the most useful for investigating the peopling of the Americas, data from contemporary populations offer only a limited picture of historic population movements within the Americas themselves. For this reason, ancient DNA (aDNA) is preferred when reconstructing historic population movements within a specific geographic area over time. The genetic frequencies observed in aDNA samples are compared with those of contemporary American Indians from the same area: if the frequencies match, biological affiliation can reasonably be established; if they do not, a historic population movement may be inferred.
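One common way to ask whether ancient and modern haplogroup frequencies "match" is a chi-square test of homogeneity on the two sets of counts. The sketch below illustrates that general technique and is not necessarily the procedure used in the studies cited here; the counts are hypothetical, the hard-coded critical values cover tables of up to five haplogroups, and with the small samples typical of aDNA work (expected counts below about 5) an exact test would be more appropriate.

```python
from typing import Dict, Tuple

# Upper 5 percent points of the chi-square distribution, keyed by degrees of freedom.
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi2_homogeneity(ancient: Dict[str, int], modern: Dict[str, int]) -> Tuple[float, int]:
    """Chi-square statistic and degrees of freedom for a 2 x k table of haplogroup counts."""
    # Keep only haplogroups observed in at least one of the two samples.
    groups = sorted(g for g in set(ancient) | set(modern) if ancient.get(g, 0) + modern.get(g, 0) > 0)
    if len(groups) < 2:
        raise ValueError("need at least two observed haplogroups")
    table = [[ancient.get(g, 0) for g in groups], [modern.get(g, 0) for g in groups]]
    row_totals = [sum(row) for row in table]
    if 0 in row_totals:
        raise ValueError("each sample needs at least one typed individual")
    col_totals = [table[0][j] + table[1][j] for j in range(len(groups))]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, len(groups) - 1


def frequencies_match(ancient: Dict[str, int], modern: Dict[str, int]) -> bool:
    """True if homogeneity is not rejected at the 5 percent level."""
    stat, df = chi2_homogeneity(ancient, modern)
    return stat <= CHI2_CRITICAL_05[df]


ancient_counts = {"A": 2, "B": 12, "C": 2, "D": 4}  # hypothetical aDNA sample
modern_counts = {"A": 5, "B": 30, "C": 6, "D": 9}   # hypothetical modern sample
print(chi2_homogeneity(ancient_counts, modern_counts))
print(frequencies_match(ancient_counts, modern_counts))
```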

That DNA could be extracted from ancient specimens and characterized was first demonstrated in nonhuman material in 1984 by Higuchi and colleagues, who identified nucleic acids from a museum specimen of the extinct quagga and showed its phylogenetic affinity to the modern zebra (Higuchi, Bowman, Freiberger, Ryder, & Wilson, 1984). A year later, Paabo (1985a; 1985b) obtained DNA sequence data from a 2,400-year-old Egyptian mummy. This result was surprising not only because it showed that molecular genetic analysis was apparently possible on material of such antiquity, but also because of the large DNA fragment sequenced (>3 kb). Both of these early efforts relied on extracting aDNA fragments, cloning them into a vector, and then sequencing the cloned fragments. The polymerase chain reaction (PCR), developed at nearly the same time (Mullis & Faloona, 1987; Saiki, Gelfand, Stoffel, Scharf, & Higuchi, 1988), uses the complementary nature of DNA bases and an enzyme involved in DNA replication to produce millions of copies of a single, specific DNA target sequence. Following its development, a number of researchers began extracting and characterizing aDNA from geographically dispersed human samples.
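The "millions of copies" figure follows from the exponential growth of PCR: under ideal conditions each cycle doubles the number of target molecules. The sketch below uses the standard idealized model N = N0(1 + E)^n, where the per-cycle efficiency E is an assumed parameter; real reactions slow down and plateau in later cycles, which this model ignores.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the copy number by (1 + efficiency)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_to_reach(target: float, initial_copies: float = 1.0, efficiency: float = 1.0) -> int:
    """Smallest number of idealized cycles whose yield reaches the target."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must lie in (0, 1]")
    if initial_copies <= 0:
        raise ValueError("initial_copies must be positive")
    cycles, copies = 0, initial_copies
    while copies < target:
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles


print(pcr_copies(1, 20))           # 1048576.0: about a million copies from one molecule
print(cycles_to_reach(1_000_000))  # 20
print(pcr_copies(1, 20, 0.9))      # fewer copies when efficiency is only 90 percent
```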

Most ancient population samples consist of several individuals from a restricted geographic area who are separated by varying periods of time, and therefore they do not conform to standard definitions of a population. If the samples come from a geographically and temporally restricted prehistoric horizon and are associated with a uniform material culture, however, researchers have treated them as representing multiple, related, continuous lineages. It should be recognized at the outset that such a sample is not properly a population in the traditional sense, and that this sample composition compromises the assumptions of standard population-genetic analyses. It also means that reliable temporal provenience is essential for such samples. With the exception of the Fremont samples from the Eastern Great Basin (Parr, Carlyle, & O'Rourke, 1996), dating of samples for aDNA research has been neither widely nor uniformly practiced.

An additional problem in aDNA research is that marker typing does not succeed uniformly across samples. For example, when discrete markers such as those used to infer Amerindian haplogroups are typed, not every primer set is likely to work on every sample. This complicates the computation of haplogroup frequencies and can produce haplogroup frequencies that are discordant with the corresponding marker frequencies.
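A small illustration of how incomplete typing makes marker and haplogroup frequencies disagree: each marker frequency uses only the samples in which that marker was typed, whereas a haplogroup call in this sketch requires every marker. The marker names and calls are hypothetical, and `None` stands for a failed amplification.

```python
from fractions import Fraction
from typing import Dict, List, Optional

Sample = Dict[str, Optional[bool]]  # marker -> True/False, or None if typing failed

MARKERS = ["A_site", "D_site"]  # hypothetical diagnostic markers for two haplogroups


def marker_frequency(samples: List[Sample], marker: str) -> Fraction:
    """Fraction of successfully typed samples that carry the marker."""
    typed = [s[marker] for s in samples if s.get(marker) is not None]
    if not typed:
        raise ValueError(f"no sample was typed for {marker}")
    return Fraction(sum(typed), len(typed))


def haplogroup_frequency(samples: List[Sample], marker: str) -> Fraction:
    """Frequency of the haplogroup identified by `marker`, using complete cases only."""
    complete = [s for s in samples if all(s.get(m) is not None for m in MARKERS)]
    if not complete:
        raise ValueError("no completely typed samples")
    return Fraction(sum(1 for s in complete if s[marker]), len(complete))


samples: List[Sample] = [
    {"A_site": True, "D_site": False},
    {"A_site": None, "D_site": True},   # A_site assay failed
    {"A_site": False, "D_site": None},  # D_site assay failed
    {"A_site": False, "D_site": False},
]
print(marker_frequency(samples, "A_site"))      # 1/3
print(haplogroup_frequency(samples, "A_site"))  # 1/2
```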

Stone & Stoneking (1993; 1998) obtained DNA from skeletons of the relatively recent Oneota archaeological complex of western Illinois. Of the Oneota samples, 31 percent belonged to haplogroup A, 12 percent to haplogroup B, 42.6 percent to haplogroup C, and 8.3 percent to haplogroup D. Six specimens (5.5 percent) were inconsistent with any of the Amerindian haplogroups. Two of these were subsequently determined to be of exogenous origin, whereas the remainder represented a fifth founding haplogroup. Fifty-two of the samples were sequenced for the HVR-I region and were found to include a high proportion of singleton mtDNA types (73.9 percent). This is higher than typically observed in modern Amerindian populations. It may reflect the loss of rare lineages through drift in small modern populations (perhaps as a result of population declines after contact), or it may be a characteristic of ancient samples in general, because they sample lineages across time (Stone & Stoneking, 1998). Insufficient sequence data are available for other ancient populations to distinguish between these alternatives.
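The singleton statistic can be computed in more than one way, and the text does not specify which denominator underlies the 73.9 percent figure. The sketch below, using hypothetical haplotype labels, reports both the share of distinct mtDNA types observed only once and the share of individuals who carry such a type.

```python
from collections import Counter
from fractions import Fraction
from typing import Iterable, Tuple


def singleton_proportions(haplotypes: Iterable[str]) -> Tuple[Fraction, Fraction]:
    """Return (singleton types / distinct types, singleton carriers / individuals)."""
    counts = Counter(haplotypes)
    if not counts:
        raise ValueError("no haplotypes supplied")
    singletons = sum(1 for n in counts.values() if n == 1)
    individuals = sum(counts.values())
    # Each singleton type is carried by exactly one individual.
    return Fraction(singletons, len(counts)), Fraction(singletons, individuals)


# Hypothetical HVR-I haplotype labels for seven individuals.
sample = ["h1", "h1", "h2", "h3", "h4", "h4", "h5"]
by_type, by_individual = singleton_proportions(sample)
print(by_type, by_individual)  # 3/5 3/7
```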

Kaestle (1997; 1998) characterized a series of skeletal samples (~300–6000 ybp) from Pyramid Lake and Stillwater Marsh in the Western Great Basin. The samples from the two sites were genetically indistinguishable on the basis of mtDNA haplogroup analysis. They also proved to be genetically similar to modern Paiute/Shoshone and California Penutian samples, with low-to-moderate frequencies of haplogroups A and B, a low frequency of haplogroup C, and a high frequency of haplogroup D.

O'Rourke and colleagues assayed mtDNA variation in the Northern Fremont of Utah (Parr, Carlyle, & O'Rourke, 1996) and the Anasazi of the US Southwest (Carlyle, 2000). Of 43 Fremont samples, 40 have been directly dated, whereas 8 of 40 Anasazi specimens have been directly dated so far; both sets of samples date to approximately 1000–2000 ybp. The Anasazi samples are distributed over a larger geographic area and a slightly longer time frame than the Fremont materials. Nevertheless, the haplogroup profiles of these two geographically proximal ancient samples are similar.

Both are characterized by low to absent frequencies of haplogroup A, moderate-to-high (>50 percent) frequencies of haplogroup B, and low (<15 percent) frequencies of haplogroups C and D. Both the Anasazi and Fremont samples also include a few individuals who do not conform to the four traditional founding haplogroups; these are presumed to represent haplogroup X (Smith, Malhi, Eshleman, Lorenz, & Kaestle, 1999) or an as-yet-undetected contaminant.
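Statements that two samples have "similar" haplogroup profiles can be made concrete with a distance between frequency vectors. The sketch below uses total variation distance (half the sum of absolute frequency differences) purely as an illustrative choice; the cited studies may rely on other measures, and all profiles shown are hypothetical.

```python
from typing import Dict

Profile = Dict[str, float]  # haplogroup -> frequency (frequencies should sum to 1)


def total_variation(p: Profile, q: Profile) -> float:
    """Half the summed absolute differences: 0 = identical, 1 = no overlap."""
    groups = set(p) | set(q)
    return 0.5 * sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in groups)


def closest_reference(sample: Profile, references: Dict[str, Profile]) -> str:
    """Name of the reference profile nearest to the sample."""
    if not references:
        raise ValueError("no reference profiles supplied")
    return min(references, key=lambda name: total_variation(sample, references[name]))


ancient = {"A": 0.05, "B": 0.60, "C": 0.15, "D": 0.15, "X": 0.05}
references = {
    "reference_1": {"A": 0.10, "B": 0.55, "C": 0.15, "D": 0.10, "X": 0.10},
    "reference_2": {"A": 0.45, "B": 0.05, "C": 0.20, "D": 0.30},
}
print(closest_reference(ancient, references))  # reference_1
```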

Modern mtDNA variation among North American Indians is strongly geographically patterned (Lorenz & Smith, 1996), and the ancient samples studied to date appear to exhibit the same geographic structure (O'Rourke, Hayes, & Carlyle, 2000b). Accordingly, the Oneota (Stone & Stoneking, 1993; Stone & Stoneking, 1998) are most similar to modern populations currently inhabiting the central plains and eastern woodlands of North America, as well as to an archaeologically recovered Fort Ancient sample from West Virginia (Merriwether et al., 1997; Merriwether, Rothhammer, & Ferrell, 1994). The Western Great Basin samples (Kaestle, 1997; 1998) are most similar to modern populations in Northern California and the northwest Great Basin, whereas the Fremont and Anasazi share their mtDNA haplogroup profiles with modern southwestern populations. Thus, aDNA analyses confirm that the geographic structure of modern North American mtDNA variation has been temporally stable (>2,000 years) and apparently little affected by the dramatic disruptions that accompanied contact (O'Rourke, Hayes, & Carlyle, 2000a). The observed geographic and temporal stability of mtDNA discrete