Download Gene Flow and Natural Selection in Oceanic

Gene Flow and Natural Selection in Oceanic Human Populations Inferred from Genome-Wide SNP Typing Ryosuke Kimura,* Jun Ohashi, Yasuhiro Matsumura,à Minato Nakazawa,§ Tsukasa Inaoka,k Ryutaro Ohtsuka,{ Motoki Osawa,* and Katsushi Tokunaga *Department of Forensic Medicine, Tokai University School of Medicine, Kanagawa, Japan; Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan; àDivision of Health Informatics and Education, National Institute of Health and Nutrition, Tokyo, Japan; §Socio-Environmental Health Sciences, Graduate School of Medicine, Gunma University, Gunma, Japan; kDepartment of Environmental Sociology, Faculty of Agriculture, Saga University, Saga, Japan; and {National Institute for Environmental Studies, Ibaraki, Japan It is suggested that the major prehistoric human colonizations of Oceania occurred twice, namely, about 50,000 and 4,000 years ago. The first settlers are considered as ancestors of indigenous people in New Guinea and Australia. The second settlers are Austronesian-speaking people who dispersed by voyaging in the Pacific Ocean. In this study, we performed genome-wide single-nucleotide polymorphism (SNP) typing on an indigenous Melanesian (Papuan) population, Gidra, and a Polynesian population, Tongans, by using the Affymetrix 500K assay. The SNP data were analyzed together with the data of the HapMap samples provided by Affymetrix. In agreement with previous studies, our phylogenetic analysis indicated that indigenous Melanesians are genetically closer to Asians than to Africans and European Americans. Population structure analyses revealed that the Tongan population is genetically originated from Asians at 70% and indigenous Melanesians at 30%, which thus supports the so-called Slow train model. We also applied the SNP data to genome-wide scans for positive selection by examining haplotypic variation and identified many candidates of locally selected genes. Providing a clue to understand human adaptation to environments, our approach based on evolutionary genetics must contribute to revealing unknown gene functions as well as functional differences between alleles. Conversely, this approach can also shed some light onto the invisible phenotypic differences between populations. Introduction The peopling of Oceania has intrigued anthropologists because it is one of the most mysterious adventures in human history. The first colonization of New Guinea and Australia by modern humans is thought to have occurred by about 50 thousand years ago (KYA) when these lands formed a continent called Sahul (White and O’Connell 1979; Roberts et al. 1990). These first settlers are considered as ancestors of indigenous Melanesians (Papuans) and Australians, who are anthropologically classified into the Australoid. The second major migration to Oceania was made about 4 KYA by Austronesian-speaking people (Bellwood 1989, 1991). They voyaged to Polynesia during 1-3 KYA. As for the origin of Polynesians, essentially 2 opposite models have been proposed. One is called the ‘‘Express train’’ model, which supposes that Austronesian-speaking people rapidly dispersed to Polynesia with negligible admixture with indigenous Melanesians (Diamond 1988). The other is the ‘‘Entangled bank’’ model hypothesizing that Polynesians were derived from Melanesian populations affected by social and trade networks with people from Southeast Asia (Terrell 1988). Between these polar opposites, there are models such as the ‘‘Slow train’’ that suggests significant genetic contributions from both original Austronesian-speaking people and indigenous Melanesians. To elucidate the origin of these populations, genetic evidence is most direct and informative. It has been well known that indigenous Melanesians and Australians, who have the phenotypes similar to Africans in visible traits such as skin color and hair shape, are genetically closer to Asians Key words: adaptive evolution, gene flow, human genome, SNP, Oceania. E-mail: [email protected]. Mol. Biol. Evol. 25(8):1750–1761. 2008 doi:10.1093/molbev/msn128 Advance Access publication June 3, 2008 Ó The Author 2008. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected] than to Africans and Europeans (Nei and Roychoudhury 1993; Zhivotovsky et al. 2004). This fact indicates that the first settlers of Oceania have shared a common ancestry with Asians after the divergence from Europeans. The genetic origin of Polynesians is still controversial although a number of studies have focused on this point. The analyses of mitochondrial DNA (mtDNA) have supported an Asian origin of Polynesians without admixture with indigenous Melanesians as expected in the Express train model (Lum et al. 1994, 1998; Melton et al. 1995; Redd et al. 1995), whereas the analyses of the Y chromosome have revealed that indigenous Melanesians predominantly contributed to the genetic components of Polynesians as described in the Entangled bank model (Kayser et al. 2000; Su et al. 2000; Capelli et al. 2001). However, both mtDNA and the Y chromosome are haploids that transmit without recombination. Classical studies analyzing autosomal markers have shown different results depending on the marker used (Hill et al. 1985; Serjeantson 1985; O’Shaughnessy et al. 1990; Cavalli-Sforza et al. 1994; Martinson 1996; Serjeantson and Gao 1996). Therefore, analyses using a large number of autosomal loci are required for further elucidation. The first settlers of Oceania, that is, indigenous Melanesians and Australians, must have been exposed to various selective pressures due to environmental differences during the long migration from Africa and due to the uniqueness of environments in Oceania after the settlement. They were isolated until recent time and developed their own lifestyle, which might have thus resulted in new selective pressures. Especially, the slow growth, short stature, and lightweight characteristics of New Guineans are generally assumed to reflect an adaptation to the low energy and nutrient densities of diets in which tubers and root crops predominate (Norgan 1995). In contrast, a distinctive characteristic of Polynesians is their large body size. From this phenotype, there has been implied the presence of a ‘‘thrifty Gene Flow and Natural Selection in Oceania 1751 genotype’’ that is associated with saved energy expenditure and efficient fat storage (Neel 1962; Bindon and Baker 1997). An alternative explanation for Polynesian’s large body size is based on the ‘‘Bergman’s rule,’’ a principle that correlates body mass with environmental temperature (Houghton 1990; Bindon and Baker 1997). However, the validity of the thrifty genotype and Bergman’s rule hypotheses in Polynesians is still open to debate. Recent advances in DNA technologies have now enabled us to perform genome-wide single-nucleotide polymorphism (SNP) typing. The Affymetrix GeneChip Human 500K arrays used in this study are commercially provided DNA chips that can genotype about 500,000 SNPs for each individual. The preponderant number of typing data would assure us of accurate estimation of the admixture rate between Asian and Melanesian lineages in Polynesians. Moreover, genome-wide SNP data are applicable to genome-wide scans for genetic regions under positive selection. Several researchers have recently developed methods to identify signatures of positive selection from SNP data based on hitchhiking events and selective sweeps and have conducted genome-wide scans using SNP databases from the HapMap project and Perlegen Sciences (Kim and Stephan 2002; Sabeti et al. 2002; Nielsen et al. 2005; Voight et al. 2006; Wang et al. 2006; Kimura et al. 2007; Tang et al. 2007; Williamson et al. 2007). The strategy based on evolutionary genetics has provided cues to reveal genotype–phenotype association (Fujimoto et al. 2008; Kayser, Liu, et al. 2008). The present study investigates the peopling of Oceania with a special focus on the admixture rate between Asian and indigenous Melanesian lineages in Polynesians. We also performed genome-wide scans for positive selection on Oceanic populations. For these purposes, we subjected an indigenous Melanesian (Papuan) population, Gidra, and a Polynesian population, Tongans, for genome-wide SNP typing with the Affymetrix GeneChip Human 500K array set. from Beijing (CHB) and Japanese from Tokyo (JPT) (The International HapMap Consortium 2005). Here, East Asians (EAS) denote JPT and CHB together. We selected only unrelated individuals (parents in trios) from YRI and CEU and also removed a JPT individual that showed high inbreeding. In total, 209 HapMap individuals (60 YRI; 60 CEU; 45 CHB; and 44 JPT) were thus used for our analyses. Genotyping and Data Quality Control SNP genotyping was performed with the Affymetrix GeneChip Human 500K array set. In brief, genomic DNA (250 ng) was digested with a restriction enzyme (NspI or StyI) and ligated to adaptors that recognize the cohesive 4-bp overhangs. These fragments were amplified with polymerase chain reaction using a generic primer that recognizes the adaptor sequence. The amplified DNA was then fragmented, labeled, and hybridized to a microarray chip. The chip was scanned with Affimetrix GeneChip Scanner 3000. The genotypes were determined with GeneChip Genotyping Analysis Software based on the Dynamic Model algorithm, in which a strict confidence threshold of P 5 0.26 was selected. Only the autosomal SNPs (490,031 SNPs) were analyzed in this study. The SNPs were filtered with a criterion of missing rate ,0.25 in every population (supplementary table S1, Supplementary Material online). According to our typing, the missing rates for GDP and TGN were slightly high probably due to DNA quality. We excluded SNPs with P , 0.01 in chi-square test for the Hardy–Weinberg equilibrium, which accounted for 0.024–0.033 of the polymorphic loci (2.4–3.3 times higher than expected), because it was highly possible for these SNPs to be mistyped or to be located on copy number variations. We also removed those SNPs that were monomorphic in all the populations. Finally, 393,971 autosomal SNPs remained. Because all the SNPs covered 2.7 Gbp of the genome, the average SNP interval was 6.8 kbp/SNP. Materials and Methods Samples Individuals from 2 Oceanic populations, Gidra in Papua New Guinea (GDP samples, n 5 24) and Tongans from Nukualofa, Kingdom of Tonga (TGN samples, n 5 24), were subjected to our study. The Gidra are Papuan-speaking people that inhabit the lowlands of Western Province, Papua, New Guinea. This population has been reported to have a small size and to be isolated (Ohtsuka 1986). Tongans are Austronesian language–speaking people in Polynesia. Informed consent for participation was obtained from all the subjects in their own language. This study was approved by the Research Ethics Committee at The University of Tokyo. In addition to these populations, we referred to genotyping data of the HapMap samples that were publicly provided by Affymetrix (http://www.affymetrix.com). The HapMap samples included 30 trios (90 individuals) each from Yoruba from Ibadan in Nigeria (YRI) and US residents with ancestry from northern and western Europe (CEU) and 45 unrelated individuals each from Han Chinese FST between Populations and Phylogenetic Tree among Individuals For each SNP, we calculated FST between pairs of populations. The genetic distance between each pair of individuals was calculated simply from the average nucleotide difference of 2 chromosomes drawn at random from different individuals. For locus l, the nucleotide difference between individuals x and y is defined as hxy,l 5 (d11 þ d12 þ d21 þ d22)/4, where indicator dab is 1 when chromosome a in individual x is different from chromosome b in individual y and zero when otherwise. For biallelic loci, hxy,l can only take 3 values: 0 (e.g., AA:AA), 1/2 (e.g., AA:AB or AB:AB), and 1 (e.g., AA:BB). The average nucleotide difference between 2 individuals x and y (Hxy) can be obtained by averaging hxy,l over L analyzed loci. Suppose aX,l and bX,l are the frequency of the allele A and B, respectively, at locus l in the population X, E(Hxy) 5 R2aX,lbX,l/L ([DX) when individuals x and y are randomly extracted from the same 1752 Kimura et al. FIG. 1.—Analyses of population phylogeny, structure, and LD. (A and B) Neighbor-Joining trees were constructed with (A) and without (B) TGN individuals. (C) The proportion of membership for each individual in a STRUCTURE analysis. The number of groups (k) was assumed to be 2–5. (D and E) LD coefficients and physical distance. To calculate the LD coefficients, 48 chromosomes were used in each population. population X. On the other hand, E(Hxy) 5 R(1 aX,laY,l bX,lbY,l)/L ([DXY) when individuals x and y are derived from different populations X and Y. Therefore, under a large number of loci analyzed, every Hxy value becomes nearly equal to DX or DXY. From a distant matrix obtained, we constructed phylogenetic trees of individuals using the NeighborJoining method (Saitou and Nei 1987) with Molecular Evolutionary Genetics Analysis version 3.1 (Kumar et al. 2004). The length of the outer branch for an individual in the phylogenetic trees (fig. 1A and B) is nearly equal to DX/2, whereas the length of the inner branch between 2 populations is nearly equal to the Nei’s (1973) minimum genetic distance, Dm 5 DXY (DX þ DY)/2. We also performed multidimensional scaling (MDS) analyses using the distance matrix for individuals to observe the homogeneity of the populations (Kruskal and Wish 1978). Population Structure Analysis A cluster analysis for population structure was performed using the STRUCTURE version 2.0 software program (Pritchard et al. 2000). Because a number of SNPs were typed and neighboring SNPs were in strong linkage disequilibrium (LD) with each other, 1/100 of the ‘‘typed SNPs’’ (3,940 SNPs) were selected for a set, and 4 different sets were used for this analysis. In all runs of the STRUCTURE algorithm, we used 10,000 Markov chain Monte Carlo replications after a burn-in of length 10,000, with a model of correlated allele frequencies. The number of groups from k 5 2 to k 5 5 were tested in a population model with admixture. We averaged the proportions of memberships over the 4 different sets that showed almost the same results. Estimation of Haplotypes and Missing Genotypes The estimation of the haplotypes and missing genotypes was performed with fastPHASE version 1.2 (Scheet and Stephens 2006). We used 5 random starts of the expectation-maximization algorithm with population label information. An allele frequency spectrum for each population was drawn after estimating the missing genotypes. The LD coefficients, D# and r2, for each population were also calculated using 48 chromosomes when the physical distance between 2 SNPs was less than 250 kb. Although haplotype estimation may be inaccurate, especially for rare haplotypes in the presence of low LD, the accuracy in the frequency of major haplotypes would be retained to some extent. Therefore, the inaccuracy in haplotype estimation is thought to have only a slight effect on the following analyses for scanning positive selection. Gene Flow and Natural Selection in Oceania 1753 Modified Long-Range Haplotype Test The long-range haplotype (LRH) test (Sabeti et al. 2002) was modified and performed as described below. The extended haplotype homozygosity (EHH) statistic is defined as the probability that any 2 chromosomes of a particular core allele have the same extended haplotype. The unbiased estimate of this statistic is calculated as: m P eAi 2 i51 EHHA 5 ; cA 2 where cA denotes the number of chromosomes with a particular allele A, eAi denotes the number of chromosomes with ith extended haplotype, and m denotes the number of extended haplotypes. To detect an unusual increase in EHH, we can compare the target allele and the other allele at the core SNP. In this study, every SNP with a minor allele frequency 10% was subjected to EHH computation. The EHH value for the target allele (EHHT) was calculated in the range from the core SNP to the position just before EHHT drops below 0.4, where we do not need to use the physical (bp) or genetic (cM) distance to decide the range for calculation (supplementary fig. S1, Supplementary Material online). In comparison to the integrated EHH (iHH) reported previously (Voight et al. 2006) in which EHH are integrated until EHH reaches 0.05, our definition of EHH calculation hardly changes the detection power but dramatically reduces the computational time (data not shown). The EHH value for the other reference allele (EHHR) was calculated also in the same range. Here, EHHR/EHHT, which is reciprocal to the relative EHH defined generally, was used as a statistic because EHHR is zero when all the extended haplotypes for the reference allele are unique. The EHHR/EHHT values for both teromeric and centromeric sides of the target SNP were averaged. We thereafter further calculated the average of the relative EHH (AREHH) for the major allele (50% , frequency 90%) over continuous 15 loci. Because the number of loci for windows should be large to some extent to stabilize the AREHH value, we picked 15 SNPs windows for generating the AREHH values. A definition of the window by fixed physical size (such as 200 kbp) can generate windows with a small number of SNPs because of low SNP density in the DNA chip, which are prone to yield low AREHH values by chance. Although the physical size of the windows can be very large depending on the SNP density in our definition (supplementary fig. S1, Supplementary Material online), its effect is conservative in statistical testing. The windows across the genome were decided without overlap in each population. In the previous genome-wide scans based on EHH-related tests, values of the original statistic (such as unbiased iHS) for each bin of the allele frequency were standardized according to their empirical distribution (Voight et al. 2006; Sabeti et al. 2007; Tang et al. 2007). However, the standardization consumes time and, in addition, conceals the distribution of values of the original statistic, which disturbs fair comparison in the distribution between real data and simulation results. This can be a problem if the demographic model assumed in the simulation is not adequately imitated to the real situation. In our method, there is no necessity for standardizing the original statistic. Comparison of Haplotype Homozygosity between Populations To detect local selective sweeps, the haplotype variation was compared between the test and reference populations. In addition, the haplotype homozygosity (H) and homozygosity for the test population’s most frequent haplotype (HM) and their interpopulation ratio (RH and RM, respectively) were herein calculated for statistical purposes (Kimura et al. 2007). To determine the blocks for the calculation of these statistics, we used 2 ways: at least 2 SNPs with HM 0.9 (method 1) or HM 0.5 (method 2) in the test population. Thereafter, HM and H were calculated not only in the test population (HMT and HT) but also in the reference populations (HMR and HR) using the blocks defined with the test population. Because the haplotypes were estimated in this study, we used the expected haplotype homozygosities, that is, HM 5 p12 and H 5 Rpi2, where pi is the frequency of the ith frequent haplotype in the test population. The RM and RH between 2 populations were defined as HMR/HMT and HR/HT, respectively. Because hitchhiking events cause the rapid increase in the frequency of the haplotype in which advantageous mutation was generated, low RM and low RH values can indicate the high differentiation and low diversity of haplotypes, respectively, thus being signatures of strong selection in the test population. Computer Simulations for the Modified LRH Test To elucidate the behavior of AREHH, computer simulations were performed. Because the SNPs typed in the DNA chips were chosen according to the allele frequency in populations analyzed in large-scale projects, not in local populations analyzed in this study, it is not easy to reflect such a process in a general coalescent simulation. An important point is that DNA chips are expected to contain SNPs with decreased heterozygosity under selective sweeps in our studied populations but not to include SNPs specific to them. To control such bias, the simulations were divided into 2 phases: a neutral ancestral phase and a selection phase (supplementary fig. S2, Supplementary Material online). The neutral ancestral phase was operated with a coalescent simulation for choosing typed SNPs and creating a founder state of the selection phase. The selection phase was carried out with forward-time simulation. Another strong point of this strategy is that we can extract the results at any point of generations in the forward-time simulation. However, because of the computational load, the forwardtime simulation restricts the population size and the number of sites. Therefore, we assumed a small population size, N 5 1,000, and instead a high recombination rate, r 5 107 (per base pairs per generation), so that 4Nr 5 4 104. In the forward-time simulation of the selection phase, we simulated 81 loci (including the selected locus under positive selection at the center) with constant 6-kb intervals 1754 Kimura et al. without assuming new mutation. Here, ith locus of jth chromosome of the founder generation was denoted by (i, j), not by allelic state, and thus the identical-by-descent state at each generation could be obtained. We examined the strength of selection at s 5 0.15 or 0.075 (2Ns 5 300 or 150) for codominant selection conditions or at s 5 0 for neutral conditions, where s and s/2 are the selection coefficients for homozygotes and heterozygotes, respectively. In addition to the constant population model (N 5 1,000), a population decline (N 5 500) model in the selection phase was tested (supplementary fig. S2, Supplementary Material online). Under the selection condition, we assumed that an advantageous mutation generated in a single chromosome increases by positive selection. The simulation results were extracted at those generations where the advantageous allele frequencies become 15%, 25%, 35%, 45%, 55%, 65%, 75%, and 85% (supplementary fig. S2, Supplementary Material online), which are near to 100 and 200 generations in the case of s 5 0.15 and 0.075, respectively. Under neutral conditions, the simulation results were extracted at 100 or 200 generations. For each parameter setting, the simulation runs were replicated 500 times. The coalescent simulation of the neutral ancestral phase was operated with cosi program (Schaffner et al. 2005), which is a modification of Hudson’s ms program. In the simulation, we assumed a 500-kbp region, a constant population size of N 5 1,000, and a mutation rate of l 5 1.5 107 (per base pairs per generation) in which we have 4Nl 5 6 104 and sampled all the chromosomes in the population (2n 5 2N 5 2,000). To choose typed SNPs, we set 2-kb windows with 4-kb intervals between adjoining windows (total 80 windows). Thereafter, the SNPs with the highest minor allele frequency in every window were chosen. These SNPs were relocated to have constant 6-kbp intervals, which were used as the founder state of the selection phase. The coalescent simulation was then repeated to create 500 founder states. The results of the selection phase that denoted by (i, j) were connected to the results of the neutral ancestral phase one by one, and the denotations were replaced by allelic state. We calculated EHHR/EHHT values for major alleles with the frequency 90% as described above. In a few cases that the EHHT value did not decay below 0.4 at the end SNP (1st or 81st), the EHHR/EHHT value was calculated at the end SNP. To compute AREHH, the selected locus was excluded, and 15 SNPs around the selected locus were used. Although our simulation models may lack rigorousness to imitate the actual demographic history of populations, they are useful to estimate roughly the behavior of the statistic. To determine the null distribution of the EHH statistic across the genome under neutrality, we also performed a genome-size neutral simulation as previously reported (Kimura et al. 2007). In brief, a neutral coalescent simulation using cosi program was performed for African, European, and East Asian populations with a flexible recombination rate and a fitting demographic model proposed previously (Schaffner et al. 2005). To correct the ascertainment bias of the selected SNPs on the Affymetrix 500K chips, we extracted the typed SNPs from the simulation data using a re- jection method based on the allele frequency spectrum of the simulation and real data. Computer Simulations for RM and RH Test In the same manner as that described in a previous study (Kimura et al. 2007), we simulated the detection powers of RM and RH to see the effect of the SNP density, sample size, and the initial number of the advantageous alleles. We therefore designed 2 constant-size populations (N 5 1,000) that diverged for 200 generations and assumed s 5 0.15 (2Ns 5 300) for a model of complete selective sweeps. The frequency of the selected allele was set at a single chromosome or 20% when positive selection began to take effect. In addition, we examined a model of partial selective sweeps in which the advantageous mutation reaches an 80% frequency under the positive selection of s 5 0.085 (2Ns 5 170) for 200 generations. Results and Discussion Genetic Differentiation and Admixture between Populations FST values exhibited a genetic differentiation between GDP and another non-African population which was relatively high in comparison to that between any other nonAfrican pairs (supplementary fig. S3, Supplementary Material online). A Neighbor-Joining tree among individuals demonstrated the GDP individuals to have a small diversity within the population (fig. 1A). Taken together, these results are consistent with the fact that this population has been isolated and also possessed a small population size (Ohtsuka 1986). A close relationship between TGN and EAS was inferred from the MDS analyses (supplementary fig. S4, Supplementary Material online) as well as low FST values, whereas TGN individuals lay between EAS and GDP individuals in the phylogenetic tree (fig. 1A). Because the population admixture can distort the shape of the phylogenetic tree, we reconstructed another tree removing the TGN individuals. In the reconstructed tree (fig. 1B), the branch length changed but the topology was still retained. This means that indigenous Melanesians have a stronger genetic affinity with Asians than with Africans and European Americans as previously reported (Nei and Roychoudhury 1993; Zhivotovsky et al. 2004). The results of the STRUCTURE analyses clearly suggested Tongans to originate from an admixture population between Asians and indigenous Melanesians (fig. 1C). When the number of groups assumed (k) was 4 in the STRUCTURE analyses, then individuals in YRI, CEU, EAS, and GDP were assigned to 4 respective groups, which are thought to correspond to classical human races, that is, Negroid, Caucasoid, Mongoloid, and Australoid. These analyses suggested that the Tongan population is genetically derived from Mongoloid at 70.1%, from Australoid at 27.7%, and from the others at 2.2%, which are proportions that are similar to those estimated in some of the previous small-scale studies (Serjeantson 1985; Martinson 1996). Most recent studies analyzing a large number of Gene Flow and Natural Selection in Oceania 1755 autosomal microsatellites have also showed almost same genetic contributions of Asians and indigenous Melanesians to Polynesians (Friedlaender et al. 2008; Kayser, Lao, et al. 2008). Only a few individuals showed a small genetic contribution from Europeans, thus indicating relatively recent immigration. On the other hand, because the proportion of genetic contribution from Asians and Melanesians in Tongan individuals was homogeneous, it is suggested that the admixture occurred long ago and people have only randomly mated after that. This is also inferred from a tight cluster of TGN individuals in the MDS analysis for the 3 populations (TGN, GDP, and EAS) (supplementary fig. S4, Supplementary Material online). Our results support the Slow train model, obviously ruling out the Express train and Entangled bank models. In addition, the proportions observed in this study were compatible with the sex-biased contribution inferred from previous mtDNA and Y-chromosome data, that is, a nearly 100% Asian origin for maternal lineage and 35% Asian and 65% indigenous Melanesian origins for paternal lineage (Kayser et al. 2006). Linkage Disequilibrium The allele frequency spectra after the estimation of the haplotype phase and missing genotype with the fastPHASE algorithm are shown in supplementary figure S5 (Supplementary Material online). We calculated LD coefficients, D# and r2, in each population using 48 chromosomes (fig. 1D and E). Both of the coefficients were high in GDP and TGN, low in YRI, and intermediate in EAS and CEU, which is thought to reflect their past population sizes. As for TGN, the high LD coefficients can also be attributed partly to the population admixture. Scans for Selective Sweeps with an LRH Test To scan for partial selective sweeps in the genome, we employed a modified LRH test. Figure 2 represents the manner that the pattern of EHHR/EHHT for SNPs around the selected locus becomes bipolar as the frequency of the advantageous allele increases. This indicates that a hitchhiking allele, which generally has a higher frequency than the selected allele, showed a low EHHR/EHHT value and the other allele showed a high EHHR/EHHT value. When the frequency of the advantageous allele is still low, the distribution of the EHHR/EHHT values is similar to the neutral case, suggesting difficulty of detecting positive selection in such a case. However, after the selected allele becomes the major allele (over 50%), the major allele of neighboring SNPs also showed a very low EHHR/EHHT value. Therefore, we can detect such a signature of strong positive selection even without typing the locus under selection using the EHHR/EHHT values for the major allele of neighboring SNPs. In this study, we calculated AREHH, that is, the average of the EHHR/EHHT values for alleles having a 50–90% frequency over 15 continuous SNPs. Before we applied this method to real data, its performance was examined with a computer simulation. Figure 3 exhib- its the results of simulations for estimating the power of our method, which was affected by the frequency of the selected allele (fig. 3A). Under neutrality, the first percentile of the value of AREHH was 0.819. When the threshold was set at this value, then the advantageous mutation (2Ns 5 300) that reached frequencies of 55%, 65%, 75%, and 85% was detected at a probability of 72.8%, 87.2%, 92.8%, and 86.2%, respectively. Because the number of sampled chromosomes hardly altered the detection power (fig. 3B), we thought that the 24 individuals sampled in this study were therefore adequate. The strength of selection (2Ns 5 300 or 150) had a substantial effect on the detection power (fig. 3C), which may reflect the opportunity for recombination events that depend on the time needed to reach the examined frequency. In addition, a population decline caused a decrease in the detection power and an increase in the false positive rate at a certain threshold (fig. 3C), thus suggesting the limits of an approach based on a comparison between alleles. The real distributions of AREHH across genomic windows were considerably different among the populations (fig. 3D). To determine the null distribution of this statistic under the neutrality, we also carried out a genome-size coalescent simulation for East Asian, European, and African populations according to a validated demographic model reported previously (Schaffner et al. 2005) (fig. 3D). The results of the simulation demonstrated the demographic history of populations to affect the AREHH values, which can partly explain the interpopulation difference in the distribution. However, the real and null distributions were substantially deviated in CEU and EAS, whereas those matched well in YRI. This deviation may tell us how often humans have been exposed to selective pressures in the out-of-Africa duration. Alternatively, because the deviation is too large, it may be attributed to population substructures and more strong bottlenecks that were not considered in the simulation model used. To identify candidates, we set the threshold at the second percentile of the empirical distribution (AREHH below 0.484 in TGN, 0.452 in GDP, 0.512 in EAS, 0.579 in CEU, and 0.842 in YRI) although these thresholds may hold only a low power to detect positive selection especially in GDP. Nonetheless, our modified method could detect the selected genes that have been previously reported such as LCT in CEU and ALDH1A2 in EAS (Bersaglieri et al. 2004; Oota et al. 2004). The AREHH values are plotted on their chromosomal positions in supplementary figures S6 and S7 (Supplementary Material online). We also exhibit the rank of AREHH values in supplementary data S1–S5 (Supplementary Material online). Scans for Population-Specific Selective Sweeps The approach based on interallelic comparison in EHH is not applicable to scanning for loci fixed already, and it has only a low power when the population has experienced severe bottlenecks as described above. Although the composite likelihood test that is based on the allele frequency spectrum can detect complete selective sweeps (Kim and Stephan 2002; Nielsen et al. 2005; Williamson et al. 1756 Kimura et al. FIG. 2.—Change of the pattern of EHHR/EHHT values around the selected loci. The results of simulations (2Ns 5 300) under the various frequencies of the allele under selection (p) are shown. EHHR/EHHT values of SNPs within 200 kb around the selected loci were fractionated (y axis) and counted for each bin of the allele frequency (x axis). The data from 500 replications of simulation are put together. The vertical line represents P. 2007), similarly to the aforementioned approach, this test captures loci under positive selection even if it has operated in the common ancestral population. However, we are now most interested in the selective sweeps occurring locally in Oceanic populations. To detect population-specific selective sweeps, therefore, we calculated the interpopulation ratio of haplotype homozygosity, RH, and the interpopulation ratio of homozygosity for the test population’s most frequent haplotype, RM (Kimura et al. 2007). The RH value can be an indicator of nucleotide diversity and past recombination events, whereas the RM value can be an indicator of genetic differentiation like FST. Previous reports (Sabeti et al. 2007; Tang et al. 2007) have proposed similar approaches based on comparison between populations, which require calculation of the interpopulation ratio of EHH values for every allele. In the RH and RM test, we can avoid redundant tests for neighboring SNPs in strong LD with each other. In addition, the RM value measuring haplotypic Gene Flow and Natural Selection in Oceania 1757 FIG. 3.—The results of computer simulations for estimating the power of statistics. (A–D) Cumulative distributions of AREHH. Various values of the selected allele frequency (P 5 55%, 65%, 75%, or 85%) (A), the chromosome sample size (2n 5 120 or 48) (B), the strength of selection (2Ns 5 300 or 150) and the population size (constant or decline) (C) were simulated to examine the detection power. The 15 SNPs window for the AREHH calculation spanned but excluded the selected locus. In the panels (B) and (C), the cases of P 5 75% are shown for the selection condition. In the default setting, the population size (N) is 1,000, the number of chromosomes sampled (2n) is 120, and the strength of selection (2Ns) is 300 in the selection (Sel) condition or 0 in the neutral (Neu) condition . Under the population decline models, N is 500 in the selection phase. The number of generations after the population decline (G) is dependent on the condition of the simulations. Cumulative distributions for the real data and the data from the genome-size neutral simulation using a demographic model are also shown (D). (E–H) Estimation of the detection powers of RM and RH. Cumulative distributions of RM (E) and RH (F) in method 1 (block definition: HM 0.9) and those (G and H) in method 2 (block definition: HM 0.5) are shown. Different initial frequencies (IFs) of the selected allele at the start of the selection phase (single chromosome or 20%) and different final frequencies of the selected mutation (fixed or 80%) were tested using methods 1 and 2, respectively. Different values of the SNP density (2 or 6 kb/ SNP) and chromosome sample size (2n 5 120 or 48) were also simulated. 1758 Kimura et al. Table 1 The Number of Blocks Detected with RM and RH Test Test Population Method 1 (HM 0.9) TGN GDP Method 2 (HM 0.5) TGN GDP Method 2 (HM 0.5) GDP Total Blocks 30,822 37,010 48,037 42,510 42,510 versus EAS RM , 0.05 0 472 RM , 0.1 286 5,181 RM , 0.01 1,146 RH , 0.3 7 900 RH , 0.5 2,075 8,640 RH , 0.25 1,663 differentiation enables us to capture the differentiation of untyped polymorphisms more powerfully than the FST value for each SNP (Kimura et al. 2007). The block definition of HM 0.9, which we call method 1 here, is appropriate to detect complete selective sweeps in which advantageous alleles have reached (near) fixation. When we performed a simulation assuming 2 diverged populations with a constant size and 6 kb of SNP intervals that is the similar density as mounted on the Affymetrix 500K chips, thresholds of RM , 0.05 and RH , 0.3 realized approximately 80% power (fig. 3E and F). The block definition of HM 0.5, or method 2, is potentialized to detect the alleles under selection that have reached a frequency of over approximately 70%. Figure 3G and H shows the detection power for the cases in which a single advantageous mutation increased to 80% frequency, which is lower than the power for a complete selective sweep. For a selected allele with 80% frequency, the thresholds of RM , 0.1 and RH , 0.5 had approximately an 80% power. As previously reported (Kimura et al. 2007), the distributions of RM and RH shift depend on the demographic history of the populations. Especially, it should be noted that decline of the test population’s size results in downshift of the distributions in both cases of selection and neutrality. Therefore, if we test a population that has experienced a decline in size, then our simulations assuming a model with constant-size populations are thought to give conservative estimation of the detection power. Our first interest is to elucidate whether there are any mutations that were generated after the divergence from Asians and then reached fixation in Polynesians. For this purpose, we applied method 1 (HM 0.9) to the test for TGN using EAS as the reference population (TGN vs. EAS). As a result, we did not observe any block satisfying the thresholds of RM , 0.05 and RH , 0.3 (table 1). Taking into account the powerfulness of these thresholds (fig. 3E and F), the result suggests that there was no (or few, if any) mutation newly generated and fixed in Polynesians. Because the dispersal of Austronesian-speaking people is thought to be dated at 6 KYA at the most, then the divergence time would be too short for a new mutation on autosomes to reach fixation in Tongans. Although the near fixation of an Austronesian-specific type of mtDNA has previously been observed in Polynesians (Redd et al. 1995), this would be due to the small population size of mtDNA that is one-fourth of the autosomal population size. As for Polynesian-specific complete selective sweeps, it remains an alternative possibility that old-standing alleles versus CEU RM , 0.05 73 852 RM , 0.1 3,141 6,693 RM , 0.01 1,712 RH , 0.3 291 1,692 RH , 0.5 7,294 11,761 RH , 0.25 2,494 versus YRI RM , 0.05 952 2,516 RM , 0.1 8,592 11,362 RM , 0.01 4,342 RH , 0.3 2,551 5,316 RH , 0.5 17,121 19,582 RH , 0.25 7,246 All Filters 0 81 54 1,738 202 originated from Asians and/or from indigenous Melanesians have been fixed by positive selection in Polynesians. A looser threshold of the statistics may detect such loci, yet only a low power can be expected if the frequency of the selected allele was relatively high at the time when the selective pressure started to operate (fig. 3E and F). When we chose the threshold of RM , 0.25 in TGN versus EAS, most of such blocks showed high RM values in TGN versus GDP instead (supplementary data S6, Supplementary Material online). This indicates that when a haplotype with a low frequency in Asians reached either complete or near fixation in Polynesians, then the same haplotype from indigenous Melanesians may have also contributed to the fixation in most cases. Although these loci may be potential candidates for complete selective sweeps on standing alleles, careful interpretation is needed because most of these may be false positives generated by genetic drift. When we applied method 2 (HM 0.5) to TGN using the threshold of RM , 0.1 and RH , 0.5, numerous blocks were detected as candidates for loci where a single mutation gained a high frequency but did not reach fixation (table 1). As the reference population (EAS, CEU, or YRI) becomes genetically closer to TGN, the number of blocks with RM , 0.1 and RH , 0.5 becomes smaller (table 1). Giving an attention to the overlap of the results of TGN versus EAS, TGN versus CEU, and TGN versus YRI, we could then further narrow the candidates down to 54 regions (0.11% of the total blocks) (supplementary data S7, Supplementary Material online). In a scan for complete selective sweeps on GDP using method 1, blocks showing very low RM and RH values were abundant even when EAS was used as the reference population (table 1). This downshift in the distribution is thought to reflect the small population size as well as the long divergence time from the other populations (Kimura et al. 2007). The number of the blocks that satisfy RM , 0.05 and RH , 0.3 against the 3 reference populations (EAS, CEU, and YRI) was 81 regions across the genome (0.22% of the total blocks) (table 1 and supplementary data S8 [Supplementary Material online]). In method 2, the blocks with RM , 0.1 and RH , 0.5 were so many that we could not effectively reduce the candidates (1,738 regions) (table 1). Because these thresholds are thought to be too conservative for testing a small-size population (Kimura et al. 2007), we would be able to use more strict thresholds while keeping a high detection power. Therefore, we used RM , 0.01 and RH , 0.25, which correspond to approximately 50% power in our simulation Gene Flow and Natural Selection in Oceania 1759 assuming constant-size populations (fig. 3G and H). These conditions could reduce the candidates down to 202 regions (0.48% of the total blocks) (table 1 and supplementary data S9 [Supplementary Material online]). Candidate Regions under Selective Sweeps The methods used to scan for selective sweeps in this study have their own characteristics. The test using AREHH is potentialized to detect selective sweeps where the selected allele has gained a greater than 50% frequency, but it has not yet reached fixation. This test can detect selective sweeps occurring in the common ancestry of different populations as well as in a local population. In the method 1 of the RM and RH test (HM 0.9), the thresholds of RM , 0.05 and RH , 0.3 detect only loci fixed or nearly fixed by population-specific positive selection. If a looser threshold such as RM , 0.25 is used in the same test, we may identify positive selections that have acted on old-standing alleles, but only a low detection power and high false positive rate can be expected. Method 2 of the RM and RH test (HM 0.5) is applicable to a scan for those regions where the locally selected allele reached over approximately 70% frequency including fixation. The chromosomal positions of the candidate regions detected by the respective methods are exhibited in supplementary figure S8 (Supplementary Material online). In some regions, the signatures detected by different methods overlapped. Such regions are considered to have a higher possibility to be true positives. Other regions show the signature unique to one method, which may be attributed to the uniqueness of the characteristics of the methods or to type I and type II errors. Our scans suggested no private mutation to exist on the Tongan autosomes that had reached fixation. However, there remain alternative possibilities that old-standing alleles have reached fixation by local selective pressures and that newly generated advantageous mutations have gained a high frequency but have not yet reached fixation. The block showing the lowest RM value (0.076) in the test of TGN versus EAS using method 1 was located at 92788024–92838919 on chromosome 12 (supplementary data S6, Supplementary Material online), which is at 41-kb distance from the CRADD gene (fig. 4). It is worth noting that an approximately 500-kb deletion around this gene in mouse has been reported to cause a ‘‘high growth’’ mutant that shows a proportional increase in tissue and organ size without obesity (Horvat and Medrano 1998). Another candidate for the selected region in which an old-standing allele reached fixation was VLDLR (supplementary data S6, Supplementary Material online), which is involved in triglyceride and fatty acid metabolism (Tacken et al. 2001). In addition, overlapping signatures in both methods 1 and 2 (supplementary data S6 and S7, Supplementary Material online) were observed in the gene region of EXT2, which is a causal gene of the type II form of multiple exostoses, and it plays a crucial role in bone formation (Stickens and Evans 1997). These genes can be candidates that are associated with the large fat, muscle, and bone masses of Polynesians. A recent paper examining the interpopulation differentiation of the type II diabetes–associated genes FIG. 4.—A candidate of selective sweeps in Tongans. (A) RH values (TGN vs. EAS of the method 1) on chromosome 12. (B) CRADD region and frequencies of the Tongan’s major allele in all the populations. The region indicated by left right arrow shows a signature of selective sweep in Tongans. has suggested that a susceptible allele of PPARGC1A may play a role in the large difference in the prevalence of the disease between Polynesians and neighboring populations (Myles et al. 2007). However, our scans did not identify any signature of positive selection on the gene region of PPARGC1A. One of the strongest signatures of selective sweeps in GDP was located at the region including the LHX4 and ACBD6 genes on chromosome 1 (supplementary data S8, Supplementary Material online). LHX4 encodes a transcriptional regulator involved in the control of the development of the pituitary gland, and mutations in this gene are associated with syndromic short stature and pituitary defects (Machinis et al. 2001; Castro-Feijoo et al. 2005). ACBD6 is a binding protein of acyl-coenzyme A that has a role in fatty acid metabolism (Entrez Gene). The gene region of IGF1R, which is the receptor for insulin-like growth factor 1 (Castro-Feijoo et al. 2005), was also considered to be a candidate of the selected genes (supplementary data S8, Supplementary Material online). These genes may be involved in the slow growth, short stature, and lightweight characteristics of New Guineans. Future association studies between genotypes and phenotypes are indispensable. 1760 Kimura et al. Other candidates of selective sweeps in Oceanic populations included several interesting genes such as DDX58, SIAT4A (supplementary data S7, Supplementary Material online), and IVNS1ABP (supplementary data S8, Supplementary Material online), which code molecules related with infection of the influenza A viruses (Wolff et al. 1998; Shinya et al. 2006; Mibayashi et al. 2007; Nicholls et al. 2007). If we could identify a protective effect of the selected allele against the influenza, these kinds of signatures may therefore suggest evidence for the epidemic history of the virus in Oceania and human conquest of the disease by genetic adaptation. We observed the candidates of selective sweeps that include no gene or genes whose functions have not been known yet. The selected loci should have some phenotypic functions because natural selection acts on phenotypes. Therefore, the scans for signatures of selective sweeps can be a trigger to identify genes or DNA sequences with some important function as well as to determine the functional difference between alleles. Such an approach based on evolutionary genetics, which thus provide clues to understand how humans have adapted to our environments, are therefore also expected to help elucidate the genomic functions if further functional and association studies on the candidates are carried out. Conversely, this approach may also shed some light on the invisible phenotypic difference between populations. Our study demonstrated that genomewide SNP typing systems, which have exerted their power for identifying disease-associated polymorphisms (The Wellcome Trust Case Control Consortium 2007), are also useful for evolutionary study on human populations. Supplementary Material Supplementary table S1, figures S1–S8, and data S1–S9 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/). Acknowledgments We are grateful to the Gidra people in Papua New Guinea and the Tongan people for their kind cooperation in providing blood samples. We thank the staff of the Department of Health, Western Province of Papua New Guinea, Dr Tetsuro Hongo at Yamanashi Institute of Environmental Sciences, Dr Taniela Palu at Ministry of Health, Kingdom of Tonga, Dr Viliami Tangi at Diabetes Clinic, Kingdom of Tonga, Dr Kazumichi Katayama at Kyoto University for help in sample collection, and 2 anonymous reviewers for helpful comments. This study was partly supported by a Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science, and Technology of Japan. This research was done mainly at the Department of Human Genetics, Graduate School of Medicine, The University of Tokyo. Literature Cited Bellwood P. 1989. The colonization of the Pacific: some current hypotheses. In: Serjeantson SW, editor. The colonization of the Pacific: a genetic trail. Oxford (UK): Clarendon Press. Bellwood P. 1991. The Austronesian dispersal and the origin of languages. Sci Am. 265:88–93. Bersaglieri T, Sabeti PC, Patterson N, Vanderploeg T, Schaffner SF, Drake JA, Rhodes M, Reich DE, Hirschhorn JN. 2004. Genetic signatures of strong recent positive selection at the lactase gene. Am J Hum Genet. 74:1111–1120. Bindon JR, Baker PT. 1997. Bergmann’s rule and the thrifty genotype. Am J Phys Anthropol. 104:201–210. Capelli C, Wilson JF, Richards M, Stumpf MP, Gratrix F, Oppenheimer S, Underhill P, Pascali VL, Ko TM, Goldstein DB. 2001. A predominantly indigenous paternal heritage for the Austronesian-speaking peoples of insular Southeast Asia and Oceania. Am J Hum Genet. 68:432–443. Castro-Feijoo L, Quinteiro C, Loidi L, Barreiro J, Cabanas P, Arevalo T, Dieguez C, Casanueva FF, Pombo M. 2005. Genetic basis of short stature. J Endocrinol Invest. 28:30–37. Cavalli-Sforza LL, Menozzi P, Piazza A. 1994. The history and geography of human genes. Princeton (NJ): Princeton University Press. Diamond JM. 1988. Express train to Polynesia. Nature. 336: 307–308. Friedlaender JS, Friedlaender FR, Reed FA, et al. (12 co-authors). 2008. The genetic structure of Pacific Islanders. PLoS Genet. 4:e19. Fujimoto A, Kimura R, Ohashi J, et al. (15 co-authors). 2008. A scan for genetic determinants of human hair morphology: EDAR is associated with Asian hair thickness. Hum Mol Genet. 17:835–843. Hill AV, Bowden DK, Trent RJ, Higgs DR, Oppenheimer SJ, Thein SL, Mickleson KN, Weatherall DJ, Clegg JB. 1985. Melanesians and Polynesians share a unique alpha-thalassemia mutation. Am J Hum Genet. 37:571–580. Horvat S, Medrano JF. 1998. A 500-kb YAC and BAC contig encompassing the high-growth deletion in mouse chromosome 10 and identification of the murine Raidd/Cradd gene in the candidate region. Genomics. 54:159–164. Houghton P. 1990. The adaptive significance of Polynesian body form. Ann Hum Biol. 17:19–32. Kayser M, Brauer S, Cordaux R, et al. (15 co-authors). 2006. Melanesian and Asian origins of Polynesians: mtDNA and Y chromosome gradients across the Pacific. Mol Biol Evol. 23:2234–2244. Kayser M, Brauer S, Weiss G, Underhill PA, Roewer L, Schiefenhovel W, Stoneking M. 2000. Melanesian origin of Polynesian Y chromosomes. Curr Biol. 10:1237–1246. Kayser M, Lao O, Saar K, Brauer S, Wang X, Nurnberg P, Trent RJ, Stoneking M. 2008. Genome-wide analysis indicates more Asian than Melanesian ancestry of Polynesians. Am J Hum Genet. 82:194–198. Kayser M, Liu F, Janssens AC, et al. (22 co-authors). 2008. Three genome-wide association studies and a linkage analysis identify HERC2 as a human iris color gene. Am J Hum Genet. 82:411–423. Kim Y, Stephan W. 2002. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics. 160:765–777. Kimura R, Fujimoto A, Tokunaga K, Ohashi J. 2007. A practical genome scan for population-specific strong selective sweeps that have reached fixation. PLoS ONE. 2:e286. Kruskal JB, Wish M. 1978. Multidimensional scaling. New York: SAGE Publications. Kumar S, Tamura K, Nei M. 2004. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform. 5:150–163. Lum JK, Cann RL, Martinson JJ, Jorde LB. 1998. Mitochondrial and nuclear genetic relationships among Pacific Island and Asian populations. Am J Hum Genet. 63:613–624. Gene Flow and Natural Selection in Oceania 1761 Lum JK, Rickards O, Ching C, Cann RL. 1994. Polynesian mitochondrial DNAs reveal three deep maternal lineage clusters. Hum Biol. 66:567–590. Machinis K, Pantel J, Netchine I, et al. (11 co-authors). 2001. Syndromic short stature in patients with a germline mutation in the LIM homeobox LHX4. Am J Hum Genet. 69: 961–968. Martinson JJ. 1996. Molecular perspectives on the colonization of the Pacfic. In: Macie-Taylor CGN, editor. Molecular biology and human diversity. London: Cambridge University Press. p. 171–195. Melton T, Peterson R, Redd AJ, Saha N, Sofro AS, Martinson J, Stoneking M. 1995. Polynesian genetic affinities with Southeast Asian populations as identified by mtDNA analysis. Am J Hum Genet. 57:403–414. Mibayashi M, Martinez-Sobrido L, Loo YM, Cardenas WB, Gale M Jr, Garcia-Sastre A. 2007. Inhibition of retinoic acidinducible gene I-mediated induction of beta interferon by the NS1 protein of influenza A virus. J Virol. 81:514–524. Myles S, Hradetzky E, Engelken J, Lao O, Nurnberg P, Trent RJ, Wang X, Kayser M, Stoneking M. 2007. Identification of a candidate genetic variant for the high prevalence of type II diabetes in Polynesians. Eur J Hum Genet. 15:584–589. Neel JV. 1962. Diabetes mellitus: a ‘‘thrifty’’ genotype rendered detrimental by ‘‘progress. Am J Hum Genet. 14:353–362. Nei M. 1973. Analysis of gene diversity in subdivided populations. Proc Natl Acad Sci USA. 70:3321–3323. Nei M, Roychoudhury AK. 1993. Evolutionary relationships of human populations on a global scale. Mol Biol Evol. 10: 927–943. Nicholls JM, Chan MC, Chan WY, et al. (12 co-authors). 2007. Tropism of avian influenza A (H5N1) in the upper and lower respiratory tract. Nat Med. 13:147–149. Nielsen R, Williamson S, Kim Y, Hubisz MJ, Clark AG, Bustamante C. 2005. Genomic scans for selective sweeps using SNP data. Genome Res. 15:1566–1575. Norgan NG. 1995. Changes in patterns of growth and nutritional anthropometry in two rural modernizing Papua New Guinea communities. Ann Hum Biol. 22:491–513. Ohtsuka R. 1986. Low rate of population increase of the Gidra Papuans in the past: a genealogical-demographic analysis. Am J Phys Anthropol. 71:13–23. Oota H, Pakstis AJ, Bonne-Tamir B, et al. (14 co-authors). 2004. The evolution and population genetics of the ALDH2 locus: random genetic drift, selection, and low levels of recombination. Ann Hum Genet. 68:93–109. O’Shaughnessy DF, Hill AV, Bowden DK, Weatherall DJ, Clegg JB. 1990. Globin genes in Micronesia: origins and affinities of Pacific Island peoples. Am J Hum Genet. 46:144–155. Pritchard JK, Stephens M, Donnelly P. 2000. Inference of population structure using multilocus genotype data. Genetics. 155:945–959. Redd AJ, Takezaki N, Sherry ST, McGarvey ST, Sofro AS, Stoneking M. 1995. Evolutionary history of the COII/ tRNALys intergenic 9 base pair deletion in human mitochondrial DNAs from the Pacific. Mol Biol Evol. 12:604–615. Roberts RG, Jones R, Smith MA. 1990. Thermoluminescence dating of a 50,000-year-old human occupation site in northern Australia. Nature. 345:153–156. Sabeti PC, Reich DE, Higgins JM, et al. (17 co-authors). 2002. Detecting recent positive selection in the human genome from haplotype structure. Nature. 419:832–837. Sabeti PC, Varilly P, Fry B, et al. (244 co-authors). 2007. Genome-wide detection and characterization of positive selection in human populations. Nature. 449:913–918. Saitou N, Nei M. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 4:406–425. Schaffner SF, Foo C, Gabriel S, Reich D, Daly MJ, Altshuler D. 2005. Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 15:1576–1583. Scheet P, Stephens M. 2006. A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. 78:629–644. Serjeantson SW. 1985. Migration and admixture in the Pacific. In: Szathmary E, editor. Out of Asia: peopling the Americas and the Pacific. Canberra (Australia): The Journal of Pacific History. p. 133–145. Serjeantson SW, Gao X. 1996. The genetic prehistory of Australia and Oceania: new insights from DNA analyses. In: Szathmary EJE, editor. Prehistoric Mongoloid dispersals. Oxford: Oxford University Press. Shinya K, Ebina M, Yamada S, Ono M, Kasai N, Kawaoka Y. 2006. Avian flu: influenza virus receptors in the human airway. Nature. 440:435–436. Stickens D, Evans GA. 1997. Isolation and characterization of the murine homolog of the human EXT2 multiple exostoses gene. Biochem Mol Med. 61:16–21. Su B, Jin L, Underhill P, et al. (11 co-authors). 2000. Polynesian origins: insights from the Y chromosome. Proc Natl Acad Sci USA. 97:8225–8228. Tacken PJ, Hofker MH, Havekes LM, van Dijk KW. 2001. Living up to a name: the role of the VLDL receptor in lipid metabolism. Curr Opin Lipidol. 12:275–279. Tang K, Thornton KR, Stoneking M. 2007. A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biol. 5:e171. Terrell JE. 1988. History as a family tree, history as an entangled bank: constructing images and interpretations of prehistory in the South Pacific. Antiquity. 62:642–657. The International HapMap Consortium. 2005. A haplotype map of the human genome. Nature. 437:1299–1320. The Wellcome Trust Case Control Consortium. 2007. Genomewide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 447:661–678. Voight BF, Kudaravalli S, Wen X, Pritchard JK. 2006. A map of recent positive selection in the human genome. PLoS Biol. 4:e72. Wang ET, Kodama G, Baldi P, Moyzis RK. 2006. Global landscape of recent inferred Darwinian selection for Homo sapiens. Proc Natl Acad Sci USA. 103:135–140. White JP, O’Connell JF. 1979. Australian prehistory: new aspects of antiquity. Science. 203:21–28. Williamson SH, Hubisz MJ, Clark AG, Payseur BA, Bustamante CD, Nielsen R. 2007. Localizing recent adaptive evolution in the human genome. PLoS Genet. 3:e90. Wolff T, O’Neill RE, Palese P. 1998. NS1-binding protein (NS1BP): a novel human protein that interacts with the influenza A virus nonstructural NS1 protein is relocalized in the nuclei of infected cells. J Virol. 72:7170–7180. Zhivotovsky LA, Underhill PA, Cinnioglu C, et al. (17 coauthors). 2004. The effective mutation rate at Y chromosome short tandem repeats, with application to human populationdivergence time. Am J Hum Genet. 74:50–61. Yoko Satta, Associate Editor Accepted May 26, 2008

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Gene Flow and Natural Selection in Oceanic