The International HapMap Consortium. The International HapMap Project. Nature 426, 789-796 (2003)

The International HapMap Project
► Launched October 29, 2002
► Major initiative to map human genetic variation based on haplotype patterns.
► Characterize sequence variants, their frequencies, and the correlations between them.
► Serve as a key resource for finding genes that affect health, disease, and drug response.

Direct Approach: Laborious and Expensive
► Whole-genome sequencing of numerous patient samples to identify candidate variants
► Test each variant for correlation with a disease.
► Genotyping 3 million SNPs in 1,000 people = 3 billion separate genotyping assays

Indirect Approach: Efficient and Comprehensive
► A relatively small set of variants will capture most common variation patterns.
► Linkage disequilibrium (LD) among SNPs = few haplotypes in many chromosome regions
► A set of sequence variants serves as genetic markers to detect association between a particular genomic region and a disease.

HapMap
► In many chromosome regions, a few common haplotypes account for most of the variation in the human genome.
► The human genome can be divided into 200,000 haplotype blocks.
► Identify 200,000 to 1 million tag SNPs
► Efficient and comprehensive

SNPs, Haplotypes, and Tag SNPs
Genotyping 3 tag SNPs out of 20 SNPs is sufficient to distinguish one haplotype from another.

Haplotype Map: Search for Genes on Chromosome 5 Related to Crohn's Disease
Haplotype blocks contain 2-4 flavors of SNP combinations (orange, purple, etc.)
Dashed lines indicate relationships between blocks.
Percentages indicate the occurrence of each SNP set in patients.

DNA Samples and Populations
► Population sampling: samples chosen from particular populations based on ethnicity and geography.
Ancestry          Location          Number of Samples
N & W European    United States     90
African           Ibadan, Nigeria   90 (30 trios)
Japanese          Tokyo, Japan      45 (unrelated)
Chinese           Beijing, China    45 (unrelated)

These populations include a substantial amount of human genetic variation.
Trios and unrelated individuals: local LD patterns.
Unrelated DNA samples: sufficient to identify 99% of haplotypes with a frequency of 5% or greater in a population.

SNP Selection
► A high density of SNPs is needed to adequately describe genetic variation.
► LD and haplotype density vary 100-fold across the genome.
► A hierarchical strategy will allow regions of the genome with the least LD to be characterized with higher SNP density.

Priorities
► Verified SNPs with available allele frequency and genotyping data
► Double-hit SNPs, observed in two different DNA samples
► SNPs that cause amino acid changes

GENOTYPING
► 10 genotyping centers: Japan, UK, Canada, China, US, and Nigeria
► 5 high-throughput genotyping technologies
► Performance criteria:
1. Data produced must be 99.2% complete and 99.5% accurate.
2. All experiments must include samples for internal quality checks.
3. Samples of SNP genotypes from each center are re-genotyped by other centers.
Using several platforms allows comparison of accuracy, success rate, throughput, and cost.
Complete and reliable data production

DATA ANALYSIS
► Analyze LD between markers.
► Measure the proportion of common ancestral chromosomes that have not recombined.
► Sliding-window LD profiles
► LD unit maps
► Haplotype blocks
► Meiotic recombination rates
► Statistical methods, replication studies, and functional analyses of variants: confirm the findings and identify functionally significant SNPs.
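
The "Analyze LD between markers" step can be illustrated with a small sketch. The slides do not say which LD statistic is used; the code below assumes the two common pairwise measures, D' and r², computed from phased two-SNP haplotypes. The function name `pairwise_ld` and the 0/1 allele coding are hypothetical choices made for this illustration, not part of the HapMap pipeline.

```python
from collections import Counter


def pairwise_ld(haplotypes):
    """Return (D, |D'|, r^2) for two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) tuples, one per
    phased chromosome, with alleles coded 0 or 1 (an assumed encoding).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)

    # Frequency of allele 1 at each SNP, and of the 1-1 haplotype.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = counts[(1, 1)] / n

    # D: deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")

    # r^2: squared correlation between the alleles at the two SNPs.
    r2 = d * d / denom

    # D': D scaled by its largest possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max

    return d, d_prime, r2


if __name__ == "__main__":
    # Toy data: only two haplotypes occur, so the SNPs are in complete LD.
    sample = [(1, 1)] * 5 + [(0, 0)] * 5
    print(pairwise_ld(sample))
```

When only two of the four possible haplotypes occur, as in the toy sample, both D' and r² equal 1, so one SNP fully predicts the other. This is the situation in which a single tag SNP can stand in for its neighbors.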