McVean_CGAT_Mar2013

Gil McVean
What makes us different?
Image: Wikimedia commons
The genetic axes
[Figure: genetic disorders, complex disease, cancer and aging placed on two axes, labelled Strong to Weak and Inherited to Somatic]
Images: Wikimedia Commons
Characterising individual genomes
Images: Wikimedia Commons; Illumina Cambridge Ltd
Why 1000 genomes?
• To find all common (>5%) variants in the accessible human genome
• To find at least 95% of variants at 1% frequency in populations of medical genetics interest
– 95% of variants at 0.1% frequency in genes
• To provide a fully public framework for interpreting rare genetic variation in the context of disease
– Screening
– Imputation
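Why sample size matters for the 1% and 0.1% targets can be sketched with a simple sampling argument. This is a minimal sketch, assuming random sampling of diploid individuals and perfect variant calling, so it gives only an upper bound on discovery; the function name is hypothetical. Under these assumptions an allele at 1% is present at least once in 100 individuals with probability about 0.87, while an allele at 0.1% needs around 1,000 individuals to reach a similar probability.

```python
def prob_allele_sampled(freq: float, n_individuals: int) -> float:
    """P(an allele at population frequency `freq` occurs at least once among the
    2 * n_individuals chromosomes of a random diploid sample).

    Hypothetical helper: ignores sequencing depth, calling errors and structure."""
    if not 0.0 <= freq <= 1.0 or n_individuals < 0:
        raise ValueError("freq must be in [0, 1] and n_individuals >= 0")
    n_chromosomes = 2 * n_individuals
    # Complement of "no sampled chromosome carries the allele".
    return 1.0 - (1.0 - freq) ** n_chromosomes


if __name__ == "__main__":
    for freq in (0.05, 0.01, 0.001):
        for n in (100, 1000):
            p = prob_allele_sampled(freq, n)
            print(f"frequency {freq:>5}: n = {n:>4} individuals -> P(sampled) = {p:.3f}")
```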
The 1000 Genomes Project
1000 Genomes Project design
[Figure: population sequencing design; labels: Haplotypes, 2x and 10x coverage]
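The contrast between 2x and 10x coverage can be illustrated with a textbook Poisson model of read depth. This is a sketch under stated assumptions (per-site depth is Poisson with mean equal to the nominal coverage; each read at a heterozygous site comes from either chromosome with probability 1/2); the function names are hypothetical. Under this model only about 63% of heterozygous sites in a single 2x genome show the alternative allele at all, against more than 99% at 10x, which is why low-coverage designs borrow strength across samples and shared haplotypes.

```python
import math


def p_site_covered(depth: float, min_reads: int = 1) -> float:
    """P(a site is covered by at least `min_reads` reads) when per-site read
    depth is Poisson-distributed with mean `depth` (an assumption)."""
    # P(X >= k) = 1 - sum_{i=0}^{k-1} e^{-depth} * depth^i / i!
    below = sum(math.exp(-depth) * depth**i / math.factorial(i) for i in range(min_reads))
    return 1.0 - below


def p_het_alt_seen(depth: float) -> float:
    """P(at least one read carries the non-reference allele at a heterozygous
    site), assuming each read samples either chromosome with probability 1/2,
    so alternative-allele reads are Poisson with mean depth / 2."""
    return 1.0 - math.exp(-depth / 2.0)


if __name__ == "__main__":
    for depth in (2, 10):
        print(f"{depth}x: P(covered) = {p_site_covered(depth):.3f}, "
              f"P(het alt allele seen) = {p_het_alt_seen(depth):.3f}")
```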
A map of shared variation
www.1000genomes.org
http://browser.1000genomes.org
Good, but not perfect
Variant type | Validation methods | Estimated FDR
Low-coverage SNPs | Sequenom, 454, PacBio | 1.8%
Exome SNPs | 454 | 1.6%
LOF variants | 454 | 5.2%
Short indels | PCR, Sanger, array genotypes | 36% before, 5.4% after post-hoc filtering
Large deletions | PCR, array CGH, SNP genotype | 2.1%
Other large SVs (not genotyped) | PCR, array CGH, SNP genotype | 1.4% to 3.7%
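An estimated false discovery rate (FDR) of this kind is the share of tested calls that fail validation. The sketch below computes it with a Wilson score interval for the binomial proportion; the interval is an illustrative choice, not necessarily how the project reported uncertainty, and the counts in the example are hypothetical, not the project's validation data.

```python
import math


def fdr_with_wilson_ci(n_tested: int, n_validated: int, z: float = 1.96):
    """Estimated FDR (fraction of tested calls that failed validation) with a
    Wilson score interval; z = 1.96 gives roughly 95% coverage."""
    if n_tested <= 0 or not 0 <= n_validated <= n_tested:
        raise ValueError("need n_tested > 0 and 0 <= n_validated <= n_tested")
    p = (n_tested - n_validated) / n_tested
    denom = 1.0 + z**2 / n_tested
    centre = (p + z**2 / (2 * n_tested)) / denom
    half = z * math.sqrt(p * (1 - p) / n_tested + z**2 / (4 * n_tested**2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts: 500 calls sent for validation, 491 confirmed.
fdr, lo, hi = fdr_with_wilson_ci(n_tested=500, n_validated=491)
print(f"FDR = {fdr:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

With only a few hundred validated sites per category, the interval is wide relative to the point estimate, which is worth bearing in mind when comparing rows of the table.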
In a typical genome:
• 4 million sites that differ from the human reference genome
• 12,000 changes to proteins
• 100 changes that knock out gene function
• 5 rare variants that are known to cause disease
Most variation is common; most common variation is cosmopolitan
Share of the variants in a typical genome that are:
Found in all continents | 92%
Found only in Europe | 0.3%
Found only in the UK | 0.1%
Found only in you | 0.002%
Imputation from 1000 Genomes
• Imputation is similar for all variant types across populations
• Comparable to imputation from high-quality SNP haplotypes
…but it can work for common variants
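The idea behind imputation from a reference panel can be shown with a deliberately simplified toy: fill a target haplotype's missing sites from the panel haplotypes that best match it at the observed sites. This is a hypothetical nearest-haplotype sketch, not the method used by imputation tools such as IMPUTE2, MaCH or Beagle, which use hidden Markov models of haplotype copying that allow switching between reference haplotypes along the genome. The panel and target below are made up.

```python
from typing import List, Optional

Haplotype = List[int]  # 0 = reference allele, 1 = alternative allele


def impute_nearest_haplotype(target: List[Optional[int]],
                             panel: List[Haplotype]) -> List[float]:
    """Fill missing sites (None) in `target` with the mean allele of the panel
    haplotypes that have the fewest mismatches at the observed sites.
    Returns allele dosages in [0, 1]; observed sites are copied through."""
    if not panel:
        raise ValueError("reference panel is empty")
    observed = [i for i, a in enumerate(target) if a is not None]
    mismatches = [sum(h[i] != target[i] for i in observed) for h in panel]
    best = min(mismatches)
    donors = [h for h, m in zip(panel, mismatches) if m == best]
    return [
        float(a) if a is not None else sum(h[i] for h in donors) / len(donors)
        for i, a in enumerate(target)
    ]


panel = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
]
print(impute_nearest_haplotype([1, 1, None, 0, None], panel))
```

Here two panel haplotypes match perfectly and disagree at the third site, so that site gets a dosage of 0.5, a crude stand-in for the posterior uncertainty a real method would report.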
The 1000 Genomes Sampling design
What have we learned about low-frequency genetic variation from the 1000 Genomes Project?
• How many rare (<0.5%) and low-frequency (0.5-5%) variants are there, how does this vary between populations, and what does it tell us about demography?
• To what extent has natural selection shaped the distribution of rare variants within and between populations?
• What are the implications of these findings for the interpretation of genetic variation in individual genomes?
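The frequency classes used here can be written down directly. This sketch assumes classification by minor allele frequency (folding to the less common allele) and puts exactly 0.5% and exactly 5% in the low-frequency class; both are assumptions, since the slides do not say how boundaries or ancestral states were handled. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def frequency_class(allele_count: int, n_chromosomes: int) -> str:
    """Bin a variant by minor allele frequency (MAF): rare (<0.5%),
    low-frequency (0.5-5%) or common (>5%)."""
    if n_chromosomes <= 0 or not 0 <= allele_count <= n_chromosomes:
        raise ValueError("need n_chromosomes > 0 and 0 <= allele_count <= n_chromosomes")
    # Fold on integer counts so an allele and its complement get identical MAFs.
    minor_count = min(allele_count, n_chromosomes - allele_count)
    maf = minor_count / n_chromosomes
    if maf < 0.005:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"


def tally_classes(allele_counts: Iterable[int], n_chromosomes: int) -> Counter:
    """Count variants per frequency class for one sample of n_chromosomes."""
    return Counter(frequency_class(ac, n_chromosomes) for ac in allele_counts)


print(tally_classes([1, 5, 10, 100, 500], n_chromosomes=2000))
```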
Populations differ in load of rare and common variants
Most rare variation is private
Rare variant differentiation within ancestry groupings increases as variant frequency decreases
Not all populations are equal
Rare variants identify recent historical links between populations
ASW shows stronger sharing with YRI than LWK
48% of IBS variants shared with American populations
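One common way to use rare variants to detect recent links between populations is to look at doubletons: variants whose allele is seen exactly twice in the whole sample, and ask which populations the two carriers come from. The sketch below is a minimal version of that idea, assuming unphased genotypes coded as alternative-allele counts and skipping doubletons carried by a single homozygous individual. The population codes are 1000 Genomes labels used for illustration only; the genotypes are invented.

```python
from collections import Counter
from typing import Sequence


def doubleton_sharing(genotypes: Sequence[Sequence[int]],
                      populations: Sequence[str]) -> Counter:
    """Count doubletons (alternative allele seen exactly twice in the sample,
    in two different individuals) by the unordered pair of carrier populations.

    genotypes[v][i]: alternative-allele count (0, 1 or 2) of individual i at
    variant v; populations[i]: population label of individual i."""
    counts: Counter = Counter()
    for row in genotypes:
        if sum(row) != 2:
            continue  # not a doubleton
        carriers = [i for i, g in enumerate(row) if g > 0]
        if len(carriers) != 2:
            continue  # both copies in one homozygous individual: skipped
        pair = tuple(sorted(populations[i] for i in carriers))
        counts[pair] += 1
    return counts


populations = ["YRI", "YRI", "LWK", "ASW"]
genotypes = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [2, 0, 0, 0],
    [1, 1, 1, 0],
]
print(doubleton_sharing(genotypes, populations))
```

Raw counts scale with sample sizes, so in practice they would need normalising (for example by the number of possible carrier pairs) before comparing populations of different size.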
What about variants that affect gene function?
Conserved variant load per individual
The proportion of rare variants is predicted by conservation, with the exception of splice-disrupting and STOP+ variants
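The relationship between conservation and the proportion of rare variants amounts to binning variants by a conservation score and computing the rare fraction in each bin. This is a minimal sketch, assuming a hypothetical per-site conservation score (higher means more conserved) and the <0.5% rare threshold used above; the scores and frequencies in the example are made up.

```python
from typing import List, Sequence, Tuple


def rare_proportion_by_bin(scores: Sequence[float], freqs: Sequence[float],
                           edges: Sequence[float],
                           rare_cutoff: float = 0.005) -> List[Tuple[float, float, float]]:
    """For each half-open score bin [lo, hi), return (lo, hi, proportion of
    variants in the bin with allele frequency below `rare_cutoff`).
    Empty bins are omitted."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [f for s, f in zip(scores, freqs) if lo <= s < hi]
        if in_bin:
            out.append((lo, hi, sum(f < rare_cutoff for f in in_bin) / len(in_bin)))
    return out


scores = [-1.0, 0.5, 1.5, 2.5, 3.5, 4.5]           # hypothetical conservation scores
freqs = [0.2, 0.01, 0.003, 0.1, 0.001, 0.002]      # hypothetical allele frequencies
edges = [-2, 2, 6]
print(rare_proportion_by_bin(scores, freqs, edges))
```

If purifying selection is acting, the rare fraction should rise with conservation, as it does in this toy example.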
KEGG ‘pathways’ show variation in excess rare-variant load
Patterns of variation inform about selective constraint
CTCF-binding motif
Variants under selection showed elevated levels of population differentiation
[Figure: proportion of pairwise comparisons where nonsynonymous variants are more differentiated than synonymous ones]
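Differentiation of this kind is usually measured with F_ST. The sketch below uses Hudson's estimator, combining sites as a ratio of averages, and then computes the proportion of population pairs in which nonsynonymous F_ST exceeds synonymous F_ST. It is an illustration under stated assumptions, not necessarily the estimator behind the figure; because F_ST depends strongly on allele frequency, a real comparison would match the two variant classes on frequency. All inputs in the example are hypothetical.

```python
from typing import Sequence, Tuple


def hudson_fst(p1: Sequence[float], p2: Sequence[float], n1: int, n2: int) -> float:
    """Hudson's F_ST between two populations, combined over sites as
    (sum of per-site numerators) / (sum of per-site denominators).

    p1, p2: sample allele frequencies per site;
    n1, n2: numbers of sampled chromosomes (each must be > 1)."""
    num = den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for within-sample variance.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den


def share_nonsyn_more_differentiated(pairs: Sequence[Tuple[float, float]]) -> float:
    """Given (F_ST nonsynonymous, F_ST synonymous) for each population pair,
    return the proportion of pairs where nonsynonymous F_ST is higher."""
    return sum(ns > syn for ns, syn in pairs) / len(pairs)


print(hudson_fst([0.1, 0.5], [0.3, 0.5], n1=100, n2=100))
print(share_nonsyn_more_differentiated([(0.05, 0.04), (0.03, 0.03), (0.06, 0.02), (0.01, 0.02)]))
```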
Rare variant differentiation can confound the genetic study of disease
Mathieson and McVean (2012)
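The general mechanism of confounding by ancestry can be shown with simple arithmetic: if cases and controls are drawn from populations in different proportions, a variant whose frequency differs between those populations appears associated with disease even when it has no effect. This is a hypothetical illustration with made-up frequencies and sampling proportions, not the analysis of Mathieson and McVean (2012).

```python
def mixed_allele_freq(frac_from_a: float, freq_a: float, freq_b: float) -> float:
    """Expected allele frequency in a group drawn from populations A and B."""
    return frac_from_a * freq_a + (1.0 - frac_from_a) * freq_b


def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Allelic odds ratio comparing allele frequency in cases and controls."""
    return (freq_cases / (1.0 - freq_cases)) / (freq_controls / (1.0 - freq_controls))


# Hypothetical scenario: a variant with NO effect on disease, at 1% in
# population A and 0.1% in population B; cases are 70% from A, controls 30%.
f_cases = mixed_allele_freq(0.7, 0.01, 0.001)
f_controls = mixed_allele_freq(0.3, 0.01, 0.001)
print(f"cases {f_cases:.4f}, controls {f_controls:.4f}, "
      f"OR = {allelic_odds_ratio(f_cases, f_controls):.2f}")
```

The spurious odds ratio of roughly 2 comes entirely from the frequency difference between A and B, so the more sharply differentiated a variant is, the larger this kind of artefact can be.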
Implications
• Rare variants have spatial and ancestry-related distributions that reflect recent demographic events and selection.
• Purifying selection elevates local differentiation of rare variants.
• The functional and aetiological interpretation of rare variants in the context of disease needs to take account of the local genetic background.
The final resource – mid 2013
AFRICA
100  Gambian in Western Division, The Gambia (GWD)
     Malawian in Blantyre, Malawi (MAB)
200  Mende in Sierra Leone (MSL)
     Esan in Nigeria (ESN)
SOUTH ASIAN
100  Punjabi in Lahore, Pakistan (PJL)
100  Bengali in Bangladesh (BEB)
100  Sri Lankan Tamil in the UK (STU)
100  Indian Telugu in the UK (ITU)
AMERICAS
80   African American in Jackson, MS (AJM)
What more could we learn about human population genetics?
• There is a need to continue developing public resources that describe genetic variation across new populations, with high-resolution spatial information.
– This will not only shed light on population history and selection, but will also be important for interpreting (rare) genetic variation in individual genomes.
• The Phase 1 1000 Genomes data has made clear the extent of variation in conserved regulatory sequence within genomes.
– How does this relate to variation in function in different cell types?
• Many of the most interesting parts of the genome (for the study of selection) are still poorly covered by high-throughput sequencing (HTS) data.
– We need to collect ‘bespoke’ data types for some genomic regions.
The 1000 Genomes Project Consortium
http://www.1000genomes.org/