Analyzing Copy Number Variation
in the Human Genome
Jeff Bailey
S5-432
Continuum of Genomic Variation
Forms of genetic variation, from single nucleotide to cytogenetics:

Nucleotide
• Single base-pair changes: point mutations (1 per 800 bp)
• Small insertions/deletions: frameshift, microsatellite, minisatellite

Structural Variants (SV)
• Mobile elements: retroelement insertions (300 bp - 10 kb)
• Large-scale genomic copy number variation (>10 kb): large-scale deletions, segmental duplications, local rearrangements

Cytogenetics
• Chromosomal variation: translocation, inversion, fusion

Method 1: Copy Number Variation:
Array Comparative Genomic Hybridization

Two genomic surveys of normal individuals identified 76
and 255 CNV regions by array CGH (Sebat et al. Science
2004; Iafrate et al. Nat Genet 2004)
[Figure: array CGH schematic showing copy number gain and loss. Modified from Feuk et al. Nat Rev Genet 2006]

30% of CNVs overlap duplicated regions (variant SD = CNV)
(Sebat et al. Science 2004)
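
As a rough illustration of how array CGH turns probe intensities into gain and loss calls, the sketch below labels a probe from its log2(test/reference) ratio. The function name and the 0.3 threshold are assumptions made for this example, not values from the cited studies.

```python
import math

def call_probe(test_intensity, ref_intensity, threshold=0.3):
    """Label one probe from its log2(test/reference) ratio.

    threshold is a hypothetical cutoff chosen for illustration only.
    """
    ratio = math.log2(test_intensity / ref_intensity)
    if ratio >= threshold:
        return "gain"
    if ratio <= -threshold:
        return "loss"
    return "neutral"

# Three copies against two in the reference gives log2(3/2), about 0.58.
print(call_probe(1500, 1000))
# One copy against two gives log2(1/2) = -1.
print(call_probe(500, 1000))
print(call_probe(1000, 1000))
```

Real analyses usually segment runs of adjacent probes rather than calling each probe alone; the single-probe rule is only meant to show the direction of the signal.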
Segmental Duplications (SD)
5.4% of the genome (>90% identity and >1 kb)
Properties:
•Clustered
•Complex regions
Example (chr22): 99.1% identical over 180 kb (VCF/DiGeorge syndrome in 1 in 3000 births)
Bailey and Eichler (2006) Nat Rev Genet
SDs predispose to copy number variation
Non-allelic Homologous Recombination (Lupski, 1999)
[Figure: two chromosomes, each Cen - D - I - D' - Tel, misalign at the duplications D and D'. Gametes: Cen - D - I - D'-D - I - D' - Tel and Cen - D-D' - Tel.]
Change in Dosage Sensitive Genes → phenotype or disease
Dynamic Regions – predisposed to further rearrangements
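
The non-allelic homologous recombination diagram can be mimicked with a toy model: a chromosome is a list of segment labels, and a crossover between D' on one homolog and D on the other yields one duplication product and one deletion product. The function and the list representation are assumptions for illustration only.

```python
def nahr_products(chrom, first="D", second="D'"):
    """Return the two recombinant products of a misaligned crossover.

    chrom is a list of segment labels, e.g. ["Cen", "D", "I", "D'", "Tel"].
    The crossover is modeled as joining the two duplicated segments whole.
    """
    i = chrom.index(first)
    j = chrom.index(second)
    # Left part of one homolog through D', continued by the other from D onward.
    duplication = chrom[:j] + [f"{second}-{first}"] + chrom[i + 1:]
    # Reciprocal product: everything between D and D' is lost.
    deletion = chrom[:i] + [f"{first}-{second}"] + chrom[j + 1:]
    return duplication, deletion

dup, dele = nahr_products(["Cen", "D", "I", "D'", "Tel"])
print(" ".join(dup))
print(" ".join(dele))
```

The first product carries two copies of the intervening segment I and the second carries none, which is the dosage change the slide refers to.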
Complex disease associations
1) Recurrent germline rearrangements causing congenital disease
2) Rare CNVs causing disease in a small proportion of affected individuals
in a Mendelian fashion
3) Common CNVs that are responsible for a proportion
of complex genetic risk in many individuals
CNV: Disease Association
CCL3L1: Decreased copies cause HIV/AIDS susceptibility (Gonzalez et al. 2005). Increased copies increase risk of rheumatoid arthritis (McKinney et al. 2008).
FCGR3B: Decreased copies increase risk for lupus nephritis (Aitman et al. 2006).
APP: Duplication leading to early-onset Alzheimer disease (Rovelet-Lecrux et al. 2006).
UGT2B17: Deletion associated with 2-fold increased risk of osteoporosis (Yang et al. 2008).
Synuclein: Triplication causes Parkinson disease (Singleton et al. 2003).
DEFB4: More than 5 copies of beta-defensins associated with 1.7-fold increased risk of psoriasis (Hollox et al. 2008). Fewer than 4 copies associated with 3-fold increased risk for Crohn disease (Fellermann et al. 2006).
LCE3B & LCE3C: Multigene deletion of late cornified envelope genes is associated with psoriasis (de Cid et al. 2009).
Method 2: End-Sequence Pair (ESP) Analysis
• ~1.1 million fosmid end-sequence pairs derived from a single donor (sequenced by MIT to help close gaps in the reference genome)
• Fosmid insert size tightly distributed around mean (40 kb)
• Compare fosmid optimal placements to detect deviations from expected.
[Figure: fosmid end placements on the reference genome. Classes: concordant, insertion, deletion, inversions. Insert < 32 kb: putative insertion within fosmid. Insert > 48 kb: putative deletion within fosmid.]
Dataset: 1,122,408 fosmid pairs preprocessed (15.5X genome coverage)
639,204 fosmid BEST pairs (8.8X genome coverage)
Results: Tuzun*, Bailey*, Sharp* et al. Nat Genet 2005
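
The end-sequence pair comparison can be sketched as a small classifier using the 32 kb and 48 kb bounds from the slide. The function name, the argument layout, and the "+" then "-" strand convention for a concordant pair are assumptions for this example, not the published pipeline.

```python
def classify_pair(left_pos, right_pos, left_strand, right_strand,
                  min_span=32_000, max_span=48_000):
    """Classify one fosmid end pair from its placement on the reference."""
    # Assumed convention: a concordant pair maps "+" then "-".
    if (left_strand, right_strand) != ("+", "-"):
        return "inversion"
    span = right_pos - left_pos
    if span < min_span:
        # Ends map too close together: the fosmid holds extra sequence.
        return "insertion"
    if span > max_span:
        # Ends map too far apart: the fosmid lacks reference sequence.
        return "deletion"
    return "concordant"

print(classify_pair(100_000, 140_000, "+", "-"))
print(classify_pair(100_000, 125_000, "+", "-"))
print(classify_pair(100_000, 160_000, "+", "-"))
print(classify_pair(100_000, 140_000, "+", "+"))
```

A single discordant pair is weak evidence, so in practice a site would be supported by several independent fosmids before being called.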
Fosmid SV Project
• Fosmid end sequencing of 8 HapMap individuals
• 1695 structural variants
• 525 novel insertion sequences
(Kidd et al. Nature 2008 453:56)
Mechanisms:
NAHR: non-allelic homologous recombination
NHEJ: repair of double strand breaks
VNTR: strand slippage
Retrotransposition: insertion of L1, SVA or Alu element
Method 3: Whole Genome Sequencing
• Genome resequencing studies
• SNPs: 3.2 M bases
• Non-SNP: 9.1 M bases
• Non-SNP variants: 22% of events, 74% of variant bases
(Levy et al. PLoS Biol 2007:e266)
• Read depth, mismapping pairs
• Future: perfect whole genome assembly
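
Read depth can be turned into a rough copy number estimate by scaling each window against a baseline. In the minimal sketch below, the function name, the window depths, and the assumption that the median window is diploid are all illustrative.

```python
from statistics import median

def copy_number_from_depth(window_depths, ploidy=2):
    """Estimate integer copy number per window from read depth.

    Assumes most windows are at normal ploidy, so the median depth
    stands in for the two-copy baseline.
    """
    baseline = median(window_depths)
    return [round(ploidy * depth / baseline) for depth in window_depths]

# Hypothetical per-window read depths across one region.
depths = [30, 31, 29, 45, 46, 30, 15, 30]
print(copy_number_from_depth(depths))
```

Real data would also need correction for GC content and mappability, which this sketch leaves out.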
Summary of Human Genome Copy Number Variation (12/2006)
Summary of recent analyses of structural variation in the human genome (12/06).

Reference        Analysis               # Individuals  # Events  Av. bp   Median (bp)  Total Mbp
Mills, 2006      Align trace data       36             415434    20       2            8.36
Hinds, 2006      Oligo arrayCGH                        1000      1379     947          0.14
McCarroll, 2006  HapMap SNP genotyping  269            538       16874    6887         9.08
Conrad, 2006     HapMap SNP genotyping  180*           609       34996    17217        21.31
Tuzun, 2005      Paired End-sequence    1              269       55706    25230        14.98
Redon, 2006      Affymetrix 500K data   269            980       165996   63140        162.68
Iafrate, 2004    BAC Array-CGH          55**           246       146189   150395       35.96
Sharp, 2006      BAC Array-CGH          47             124       170019   164704       21.08
Wong, 2006       BAC Array-CGH          105            1365***   185504   175314       253.21
Sebat, 2004      ROMA-CGH               20             72        350670   199800       25.25
Redon, 2006      BAC Array-CGH          269            913       349880   227889       319.44
All Vars         NA                     NA             323573    1901     2            615.10
All Vars > 1 kb  NA                     NA             4131      148578   93356        613.77

*- effectively independent individuals equal to number of trios
** - 39 healthy controls, 16 with karyotype abnormalities
*** - accounting for only those sites that showed in 2 or more individuals
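
The Total Mbp column appears to be the number of events multiplied by the average event size. The short check below works through three rows from the summary; the helper name is an assumption for this example.

```python
def total_mbp(events, avg_bp):
    """Total variant sequence in Mbp from event count and mean size in bp."""
    return events * avg_bp / 1e6

# (events, average bp, reported Total Mbp) as listed in the summary.
rows = {
    "Tuzun, 2005": (269, 55706, 14.98),
    "Iafrate, 2004": (246, 146189, 35.96),
    "Sebat, 2004": (72, 350670, 25.25),
}

for reference, (events, avg_bp, reported) in rows.items():
    print(reference, round(total_mbp(events, avg_bp), 2), reported)
```

For these rows the computed and reported totals agree to two decimal places.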
20% of the human genome is CNV?
• 3000+ genes with exons in these regions are CNV?
(Currently 30% of genome and 9473 genes)
How many genes are truly CNV?
• Lack of breakpoint precision?
BACs: 150-250 kb clones of which only a part of the sequence may be CNV
• False positives?
Multiple studies: increase the proportion of false positives since true positives tend to overlap
[Figure: a BAC clone spanning a CNV gene; true positives (TP) overlapping across Study #1, #2 and #3, with non-overlapping false positives (FP).]
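
The false-positive argument can be made concrete with an idealized calculation. It assumes every study finds the same true positives while each contributes its own non-overlapping false positives; the function name and the counts are hypothetical.

```python
def false_positive_fraction(true_pos, false_pos_per_study, n_studies):
    """Fraction of false positives in the union of n studies.

    Idealized assumption: true positives overlap completely across
    studies and false positives never overlap.
    """
    union_false = false_pos_per_study * n_studies
    return union_false / (true_pos + union_false)

# Hypothetical: 90 shared true positives, 10 false positives per study.
for n in (1, 2, 3):
    print(n, round(false_positive_fraction(90, 10, n), 3))
```

Under these assumptions the false-positive share rises from 10% for one study to 25% for the union of three, even though no single study became less accurate.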
Design of Custom oligonucleotide aCGH
•Equal number of probes per exon (exon size 3 bp – 10 Kb).
•Limitation: NimbleGen algorithm creates equally spaced
probes across a region.
1. Select genomic regions to target for probe design
2. Merge overlapping regions
3. Select oligonucleotide probe sequences (average 12/exon) and place on microarray
Bailey et al. Cytogenet Genome Res 2008
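
The step of merging overlapping target regions can be sketched with a standard interval merge. The function name and the coordinates are illustrative assumptions, not taken from the published design.

```python
def merge_regions(regions):
    """Merge overlapping (start, end) regions on one chromosome."""
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]

# Hypothetical exon targets, two of which overlap.
print(merge_regions([(400, 500), (100, 200), (150, 300)]))
```

Merging before probe selection avoids placing duplicate probes where two targets share sequence.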
Detection Method
[Figure: exon structure (exons 1-5) with probe regions and hybridization log2 probe intensity. Mean intensity difference for exons 1-5: -0.2 SD, +1.1 SD, +1.4 SD, +0.6 SD, +1.2 SD. Step #1: seed. Step #2: extension. Result: 4-exon partial-gene CNV.]
Bailey et al. Cytogenet Genome Res 2008
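
The seed and extension steps can be sketched on the per-exon values shown on the slide (-0.2, +1.1, +1.4, +0.6, +1.2 SD). The thresholds of 1.0 SD for a seed and 0.5 SD for extension are assumptions chosen for this example, not the published cutoffs, and only gains are handled.

```python
def seed_and_extend(exon_scores, seed=1.0, extend=0.5):
    """Return (first, last) exon index ranges called as gains.

    seed and extend are hypothetical thresholds in SD units.
    """
    calls = []
    covered = set()
    last = len(exon_scores) - 1
    for idx, score in enumerate(exon_scores):
        if score < seed or idx in covered:
            continue
        lo = hi = idx
        # Grow the call outward while neighbors pass the looser threshold.
        while lo > 0 and exon_scores[lo - 1] >= extend:
            lo -= 1
        while hi < last and exon_scores[hi + 1] >= extend:
            hi += 1
        calls.append((lo, hi))
        covered.update(range(lo, hi + 1))
    return calls

scores = [-0.2, 1.1, 1.4, 0.6, 1.2]
print(seed_and_extend(scores))
```

With these thresholds the call spans indices 1 through 4, that is exons 2 to 5, which matches the 4-exon partial-gene CNV in the figure.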
CNV in RHD
[Figure: Chr 1 (kb) 25,350-25,390. Tracks: gene model, exons, probe regions, samples GM12878, GM18517, GM18507, GM18956, GM19129, GM12156, GM18502, GM19240 and GM18555, and segmental duplications.]
Bailey et al. Cytogenet Genome Res 2008
Detecting CNVs >500 bp and >5% frequency
8,599 CNV regions: 3.7% of genome (112.7 Mb)
Between 2 genomes: 1,098 CNVs, 0.78% (24 Mb)
Conrad et al. 2009 Nature
Causal CNVs
Conrad, et al. 2009 Nature
Infectious Disease Genetics
[Figure: human genome, pathogen genome, vector genome, and environment.]
Complex interplay that results in infectious disease phenotype

Potential host defense responses and pathogen virulence are encoded
in their respective genomes.

SD and CNV represent key mechanisms for adaptation and
diversification of responses for both host and pathogen.

The study of SD and CNV is necessary to fully
understand the genetics and biology of infectious
disease pathogenesis.
Human CNV typing
and association studies

Comprehensive CNV Typing Chip (1st generation)

Collaboration with the Eichler Lab
• Preferentially targeting gene CNVs
(5,000 CNVs → 1000 genic regions → 30% host defense)
• Agilent and NimbleGen oligoarray platforms
• Defining copy number responsive probes
• Defining copy specific probes to remove cross-hybridization

Case-control studies to examine infectious disease
and immune phenotypes for association with CNVs
Human Malaria


Malaria: 2-3 million deaths per year
“strongest known force for evolutionary selection in
the recent history of the human genome” (Kwiatkowski
2005 Am J Hum Genet)

HbS, HbC, HbE, thalassemia, ABO, Duffy null, SE Asian
ovalocytosis, IL-4, CR1, HLA-DRB ...
Hypothesis: Strong selection will have impacted
CNVs

Testing case-control samples for CNV
associations with resistance to infection and
cerebral malaria.