McVean_CGAT_Mar2013
Gil McVean

What makes us different?
Image: Wikimedia commons

The genetic axes
Axes: Strong to Weak; Inherited to Somatic
Labels: Genetic disorders, Complex disease, Cancer, Aging
Images: Wikimedia commons

Characterising individual genomes
Images: Wikimedia commons; Illumina Cambridge Ltd

Why 1000 genomes?
• To find all common (>5%) variants in the accessible human genome
• To find at least 95% of variants at 1% in populations of medical genetics interest
– 95% of variants at 0.1% in genes
• To provide a fully public framework for interpreting rare genetic variation in the context of disease
– Screening
– Imputation

The 1000 Genomes Project

1000 Genomes Project design
Population sequencing
Haplotypes
2x
10x

A map of shared variation
www.1000genomes.org
http://browser.1000genomes.org

Good, but not perfect

Variant type          Validation methods               Estimated FDR
Low-coverage SNPs     Sequenom, 454, PacBio            1.8%
Exome SNPs            454                              1.6%
LOF variants          454                              5.2%
Short indels          PCR, Sanger, array genotypes     36% -> 5.4%
Large deletions       PCR, array CGH, SNP genotype     2.1%
Other large SVs       PCR, array CGH, SNP genotype     1.4% – 3.7%

Slide annotations: Post-hoc filtering; Not genotyped

• 4 million sites that differ from the human reference genome
• 12,000 changes to proteins
• 100 changes that knock out gene function
• 5 rare variants that are known to cause disease

Most variation is common – Most common variation is cosmopolitan

Number of variants in typical genome
Found in all continents     92%
Found only in Europe        0.3%
Found only in the UK        0.1%
Found only in you           0.002%

Imputation from 1000 Genomes
• Imputation similar for all variant types across populations
• Comparable to imputation from high quality SNP haplotypes

…but it can work for common variants

The 1000 Genomes Sampling design

What have we learned about low-frequency genetic variation from the 1000 Genomes Project?
• How many rare (<0.5%) and low-frequency (0.5-5%) variants are there, how does their number vary between populations, and what does this tell us about demography?
• To what extent has natural selection shaped the distribution of rare variants within and between populations?
• What are the implications of these findings for the interpretation of genetic variation in individual genomes?

Populations differ in load of rare and common variants

Most rare variation is private

Rare variant differentiation within ancestry groupings increases as variant frequency decreases

Not all populations are equal

Rare variants identify recent historical links between populations
– ASW shows stronger sharing with YRI than LWK
– 48% of IBS variants shared with American populations

What about variants that affect gene function?

Conserved variant load per individual

The proportion of rare variants is predicted by conservation, with the exception of splice-disrupting and STOP+ variants

KEGG ‘pathways’ show variation in excess rare-variant load

Patterns of variation inform about selective constraint
CTCF-binding motif

Variants under selection showed elevated levels of population differentiation
Proportion of pairwise comparisons where nonsynonymous variants are more differentiated than synonymous ones

Rare variant differentiation can confound the genetic study of disease
Mathieson and McVean (2012)

Implications
• Rare variants have spatial and ancestry-related distributions that reflect recent demographic events and selection.
• Purifying selection elevates local differentiation of rare variants.
• The functional and aetiological interpretation of rare variants in the context of disease needs to take the local genetic background into account.
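
The frequency classes used in this talk (rare <0.5%, low-frequency 0.5-5%, common >5%) can be made concrete with a small classifier. The following is a minimal sketch, not the project's own pipeline: the function names, the "absent" label, the handling of the exact 0.5% and 5% boundaries (both assigned to low-frequency here), and the sample of 2,000 chromosomes are all assumptions made for illustration.

```python
from collections import Counter


def frequency_class(alt_count, n_chromosomes):
    """Classify a variant by the frequency of its non-reference allele.

    Thresholds follow the slides: rare < 0.5%, low-frequency 0.5-5%,
    common > 5%. Integer arithmetic avoids floating-point edge cases:
    0.5% is 1/200 and 5% is 1/20.
    """
    if n_chromosomes <= 0:
        raise ValueError("n_chromosomes must be positive")
    if not 0 <= alt_count <= n_chromosomes:
        raise ValueError("alt_count must lie between 0 and n_chromosomes")
    if alt_count == 0:
        return "absent"  # hypothetical label: allele not seen in the sample
    if alt_count * 200 < n_chromosomes:   # frequency < 0.5%
        return "rare"
    if alt_count * 20 <= n_chromosomes:   # 0.5% <= frequency <= 5%
        return "low-frequency"
    return "common"                       # frequency > 5%


def summarise(alt_counts, n_chromosomes):
    """Count how many variants fall into each frequency class."""
    return Counter(frequency_class(c, n_chromosomes) for c in alt_counts)


if __name__ == "__main__":
    # Illustrative allele counts in a hypothetical sample of 2,000 chromosomes.
    counts = [1, 9, 10, 100, 101, 1500]
    print(summarise(counts, 2000))
```

Applying the same function to each population separately, with that population's own chromosome count, gives the per-population rare and common variant loads that the following slides compare.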
The final resource – mid 2013

AFRICA
100 Gambian in Western Division, The Gambia (GWD)
Malawian in Blantyre, Malawi (MAB)
200 Mende in Sierra Leone (MSL)
Esan in Nigeria (ESN)

SOUTH ASIAN
100 Punjabi in Lahore, Pakistan (PJL)
100 Bengali in Bangladesh (BEB)
100 Sri Lankan Tamil in the UK (STU)
100 Indian Telugu in the UK (ITU)

AMERICAS
80 African American in Jackson, MS (AJM)

What more could we learn about human population genetics?
• There is a need to continue developing public resources that describe genetic variation across new populations, with high-resolution spatial information.
– This will not just shed light on population history and selection, but will also be important for interpreting (rare) genetic variation in individual genomes.
• The Phase 1 1000 Genomes data has made clear the extent of variation in conserved regulatory sequence within genomes
– How does this relate to variation in function in different cell types?
• Many of the most interesting parts of the genome (for the study of selection) are still poorly covered by HTS data
– Need to collect ‘bespoke’ data types for some genomic regions

The 1000 Genomes Project Consortium
http://www.1000genomes.org/