The International HapMap Consortium. The International HapMap Project.
Nature 426, 789-796 (2003).
The International HapMap Project
► Launched October 29, 2002.
► Major initiative to map human genetic variation based on haplotype patterns.
► Characterize sequence variants, their frequencies, and correlations between them.
► Serve as a key resource for finding genes that affect health, disease, and drug response.
Direct Approach: Laborious and Expensive
► Whole-genome sequencing of numerous patient samples to identify candidate variations.
► Test each variant for correlation with a disease.
► Genotyping 3 million SNPs in 1000 people = 3 billion separate genotyping assays.
Indirect Approach: Efficient and Comprehensive
► A relatively small set of variants will capture most common variation patterns.
► Linkage disequilibrium (LD) among SNPs = few haplotypes in many chromosome regions.
► A set of sequence variants serves as genetic markers to detect association between a particular genomic region and disease.
In many chromosome regions, a few common haplotypes
account for most of the variation in the human genome.
The human genome can be divided into 200,000 haplotype blocks.
Identify 200,000 to 1 million tag SNPs.
Efficient and comprehensive
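The saving can be checked with a few lines of arithmetic. The sketch below assumes the same 1000-person cohort used in the direct-approach example and simply multiplies variants by people; `assay_count` and `PEOPLE` are illustrative names, not part of the original slides.

```python
def assay_count(n_variants: int, n_people: int) -> int:
    """One genotyping assay per variant per person."""
    return n_variants * n_people

PEOPLE = 1_000  # assumption: cohort size reused from the direct-approach slide

direct = assay_count(3_000_000, PEOPLE)         # all 3 million SNPs
indirect_low = assay_count(200_000, PEOPLE)     # 200,000 tag SNPs
indirect_high = assay_count(1_000_000, PEOPLE)  # 1 million tag SNPs

print(f"direct:   {direct:,} assays")
print(f"indirect: {indirect_low:,} to {indirect_high:,} assays")
print(f"reduction: {direct // indirect_high}x to {direct // indirect_low}x")
```

Under these assumptions the tag-SNP approach needs between 3 and 15 times fewer assays than genotyping every SNP.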
SNPs, Haplotypes, and Tag SNPs
Genotyping 3 tag SNPs out of 20 SNPs is sufficient to distinguish
one haplotype from another.
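The idea of tag SNPs can be illustrated with a brute-force search for the smallest set of SNP positions whose alleles give every haplotype a unique signature. This is a minimal sketch, not the method used by the Consortium: the four six-SNP haplotypes are made up for illustration (they are not the 20-SNP example on the slide), and `find_tag_snps` is a hypothetical helper.

```python
from itertools import combinations

def find_tag_snps(haplotypes):
    """Return the smallest tuple of SNP positions that distinguishes
    every haplotype from every other (exhaustive search)."""
    n_sites = len(haplotypes[0])
    for size in range(1, n_sites + 1):
        for sites in combinations(range(n_sites), size):
            # Alleles each haplotype carries at the candidate positions
            signatures = {tuple(h[i] for i in sites) for h in haplotypes}
            if len(signatures) == len(haplotypes):
                return sites
    return None  # only reached if two haplotypes are identical

# Hypothetical haplotypes over six biallelic SNPs (illustrative data)
haplotypes = ["GTTGCA", "ACCACA", "ACTGGA", "ACTGCT"]

tags = find_tag_snps(haplotypes)
print(tags)
```

For this toy data no pair of SNPs separates all four haplotypes, so the search settles on three positions. Exhaustive search is only practical for small regions; it is used here because it makes the definition of a tag set explicit.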
Haplotype Map: Search for Genes on Chromosome 5 Related to Crohn’s Disease
Haplotype blocks contain 2-4 flavors of SNP combinations (orange, purple, etc.).
Dashed lines indicate relationships between blocks.
Percentages indicate occurrence of each SNP set in patients.
DNA Samples and Populations
Population Sampling: samples chosen from particular populations
based on ethnicity and geography.
Population                                  Number of Samples
United States (N & W European ancestry)     90 (30 trios)
Ibadan, Nigeria                             90 (30 trios)
Tokyo, Japan                                45 (unrelated)
Beijing, China                              45 (unrelated)
Include a substantial amount of genetic variation
Trios and unrelated individuals: local LD patterns.
Unrelated DNA samples: identify 99% of haplotypes with a frequency of
5% or greater in a population.
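One way to see where a figure near 99% can come from is a simple sampling calculation. The sketch below assumes that 45 unrelated individuals contribute 90 independently sampled chromosomes and asks how likely it is that a haplotype at 5% frequency shows up at least once. This is an illustrative back-of-the-envelope model, not necessarily the calculation behind the slide; `prob_observed` is a hypothetical helper.

```python
def prob_observed(freq: float, n_chromosomes: int) -> float:
    """Probability that a haplotype with population frequency `freq`
    appears at least once among `n_chromosomes` independent draws."""
    return 1.0 - (1.0 - freq) ** n_chromosomes

# Assumption: 45 unrelated diploid individuals = 90 chromosomes
p = prob_observed(0.05, 2 * 45)
print(f"{p:.4f}")
```

Under these assumptions the probability comes out at roughly 0.99, and it only increases for haplotypes more common than 5%.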
SNP Selection
High density of SNPs to adequately describe genetic variation
LD and haplotype density vary 100-fold across the genome.
Hierarchical strategy will allow regions of the genome with the least
LD to be characterized with higher SNP density.
Verified SNPs with available allele frequency and genotyping data
Double-hit SNPs: seen in two different DNA samples
SNPs that cause amino acid changes
10 genotyping centers: Japan, UK, Canada, China, US, and
► 5 high-throughput genotyping technologies
► Performance criteria:
1. Data produced must be 99.2% complete & 99.5% accurate.
2. All experiments must include samples for internal quality
3. Samples of SNP genotypes from each center re-genotyped
by other centers.
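The completeness and accuracy thresholds above could be checked along the following lines. This is only a sketch under stated assumptions: completeness is taken to mean the fraction of attempted genotypes that produced a call, and accuracy is taken to mean concordance between a center's calls and re-genotyping of the same samples elsewhere. The slides do not define either term, and all function names and data here are hypothetical.

```python
def completeness(calls):
    """Fraction of attempted genotypes with a call (None = missing)."""
    return sum(c is not None for c in calls) / len(calls)

def concordance(calls, recalls):
    """Fraction of matching calls among genotypes called both times."""
    pairs = [(a, b) for a, b in zip(calls, recalls)
             if a is not None and b is not None]
    return sum(a == b for a, b in pairs) / len(pairs)

def passes_qc(calls, recalls, min_complete=0.992, min_accurate=0.995):
    """Apply the two thresholds quoted on the slide."""
    return (completeness(calls) >= min_complete
            and concordance(calls, recalls) >= min_accurate)

# Illustrative data: 1000 attempted genotypes, 5 missing, 1 discordant
calls = ["AA"] * 995 + [None] * 5
recalls = ["AG"] + ["AA"] * 994 + [None] * 5

print(completeness(calls), concordance(calls, recalls))
print(passes_qc(calls, recalls))
```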
Using several platforms allows comparison of accuracy, success
rate, throughput, and cost.
Complete and reliable data production
Analyze LD between markers.
Measure proportion of common ancestral chromosomes that have
not recombined
Sliding window LD profiles
LD unit maps
Haplotype blocks
Meiotic recombination rates
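LD between a pair of markers is commonly summarized with the statistics D, D' and r². The sketch below computes them from two-locus haplotype counts; the slides do not say which statistic was used, so the choice of D' and r² is an assumption, and the function name and counts are made up for illustration.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r^2) for two biallelic loci from haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total   # frequency of allele B at locus 2
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic")

    d = p_AB - p_A * p_B
    # Largest |D| possible given the allele frequencies
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2

# Illustrative counts: only three of the four possible haplotypes occur
d, d_prime, r2 = ld_stats(n_AB=40, n_Ab=10, n_aB=0, n_ab=50)
print(f"D={d:.3f}  D'={d_prime:.3f}  r2={r2:.3f}")
```

When one of the four haplotypes is absent, as in this toy data, D' equals 1, which is conventionally read as no evidence of historical recombination between the two markers.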
Statistical methods, replication studies, and functional analyses
of variants – confirm the findings and identify functionally
significant SNPs.