
Analyzing Copy Number Variation in the Human Genome
Jeff Bailey, S5-432

Continuum of Genomic Variation
Forms of genetic variation, from smallest to largest:
• Single base-pair changes: nucleotide point mutations (about 1 per 800 bp)
• Small insertions/deletions: frameshifts, microsatellites, minisatellites
• Mobile elements: retroelement insertions (300 bp to 10 kb)
• Large-scale genomic copy number variation (>10 kb): deletions, segmental duplications, local rearrangements
• Chromosomal variation (cytogenetics): translocation, inversion, fusion
Structural variants (SV) and copy number variation (CNV) occupy the larger-scale end of this continuum.

Method 1: Copy Number Variation by Array Comparative Genomic Hybridization (array CGH)
• Two genomic surveys of normal individuals identified 76 and 255 CNV regions by array CGH (Sebat et al. Science 2004; Iafrate et al. Nat Genet 2004).
• Figure (modified from Feuk et al. Nat Rev Genet 2006): gains and losses appear as shifts in the green:red hybridization ratio (blue line).
• 30% of CNVs overlap duplicated regions, i.e. a variant SD is a CNV (Sebat et al. Science 2004).

Segmental Duplications (SD)
• 5.4% of the genome (>90% identity and >1 kb)
• Properties: clustered; complex regions
• Example on chr22: SDs 99.1% identical over 180 kb at the velocardiofacial (VCF)/DiGeorge syndrome region (1 in 3000 births)
Bailey and Eichler (2006) Nat Rev Genet

SDs Predispose to Copy Number Variation
• Figure: non-allelic homologous recombination (NAHR; Lupski, 1999) between duplicated copies D and D' produces gametes carrying either a duplication or a deletion of the intervening segment I.
• Change in dosage-sensitive genes → phenotype or disease
• Dynamic regions, predisposed to further rearrangements
• Complex disease associations

CNV and Disease
1) Recurrent germline rearrangements causing congenital disease
2) Rare CNVs causing disease in a small proportion of affected individuals in a Mendelian fashion
3) Common CNVs that account for a proportion of complex genetic risk in many individuals

CNV Disease Associations
• CCL3L1: lower copy number increases HIV/AIDS susceptibility (Gonzalez et al. 2005); higher copy number increases risk of rheumatoid arthritis (McKinney et al. 2008).
• FCGR3B: lower copy number increases risk of lupus nephritis (Aitman et al. 2006).
• APP: duplication leads to early-onset Alzheimer disease (Rovelet-Lecrux et al. 2006).
• UGT2B17: deletion is associated with a 2-fold increased risk of osteoporosis (Yang et al. 2008).
• SNCA (alpha-synuclein): triplication causes Parkinson disease (Singleton et al. 2003).
• DEFB4 (beta-defensin): more than 5 copies is associated with a 1.7-fold increased risk of psoriasis (Hollox et al. 2008); fewer than 4 copies is associated with a 3-fold increased risk of Crohn disease (Fellermann et al. 2006).
• LCE3B and LCE3C: a multigene deletion of these late cornified envelope genes is associated with psoriasis (de Cid et al. 2009).

Method 2: End-Sequence Pair (ESP) Analysis
• ~1.1 million fosmid end-sequence pairs derived from a single donor (sequenced by MIT to help close gaps in the reference genome)
• Fosmid insert size is tightly distributed around the mean (40 kb). An end pair spanning <32 kb of reference marks a putative insertion within the fosmid; a span of >48 kb marks a putative deletion within the fosmid.
• Optimal fosmid placements on the reference genome are compared with the expected placement to detect deviations: concordant pairs, insertions, deletions, and inversions.
• Dataset: 1,122,408 fosmid pairs preprocessed (15.5X genome coverage); 639,204 BEST pairs (8.8X genome coverage)
• Results: Tuzun*, Bailey*, Sharp* et al. Nat Genet 2005

Fosmid SV Project
• Fosmid end sequencing of 8 HapMap individuals
• 1,695 structural variants
• 525 novel insertion sequences (Kidd et al. Nature 2008 453:56)
Mechanisms:
• NAHR: non-allelic homologous recombination
• NHEJ: non-homologous end joining (repair of double-strand breaks)
• VNTR: variable number of tandem repeats (strand slippage)
• Retrotransposition: insertion of an L1, SVA, or Alu element

Method 3: Whole-Genome Sequencing
• Genome resequencing studies: 3.2 M SNP bases versus 9.1 M non-SNP bases, so non-SNP variants make up 22% of variant events but 74% of variant bases (Levy et al. PLoS Biol 2007:e266)
• Approaches: read depth, mismapping pairs
• Future: perfect whole-genome assembly

Summary of Human Genome Copy Number Variation (12/2006): recent analyses of structural variation in the human genome
Reference | Analysis | # Individuals | # Events | Av. bp | Median bp | Total Mbp
Mills, 2006 | Align trace data | 36 | 415,434 | 20 | 2 | 8.36
Hinds, 2006 | Oligo array-CGH | ? | 100 | 1,379 | 947 | 0.14
McCarroll, 2006 | HapMap SNP genotyping | 269 | 538 | 16,874 | 6,887 | 9.08
Conrad, 2006 | HapMap SNP genotyping | 180* | 609 | 34,996 | 17,217 | 21.31
Tuzun, 2005 | Paired end-sequence | 1 | 269 | 55,706 | 25,230 | 14.98
Redon, 2006 | Affymetrix 500K data | 269 | 980 | 165,996 | 63,140 | 162.68
Iafrate, 2004 | BAC array-CGH | 55** | 246 | 146,189 | 150,395 | 35.96
Sharp, 2006 | BAC array-CGH | 47 | 124 | 170,019 | 164,704 | 21.08
Wong, 2006 | BAC array-CGH | 105 | 1,365*** | 185,504 | 175,314 | 253.21
Sebat, 2004 | ROMA-CGH | 20 | 72 | 350,670 | 199,800 | 25.25
Redon, 2006 | BAC array-CGH | 269 | 913 | 349,880 | 227,889 | 319.44
All variants | NA | NA | 323,573 | 1,901 | 2 | 615.10
All variants >1 kb | NA | NA | 4,131 | 148,578 | 93,356 | 613.77
* Effectively independent individuals, equal to the number of trios
** 39 healthy controls, 16 with karyotype abnormalities
*** Counting only sites observed in 2 or more individuals

How Many Genes Are Truly CNV?
• Is 20% of the human genome CNV?
• Are the 3,000+ genes with exons in these regions CNV? (Currently 30% of the genome and 9,473 genes)
• Lack of breakpoint precision? BACs are 150-250 kb clones of which only part of the sequence may be CNV.
• False positives? Combining multiple studies increases the proportion of false positives: true positives from studies #1-#3 tend to overlap the same CNV gene, while each study's false positives add new, non-overlapping calls.

Design of Custom Oligonucleotide aCGH
• Equal number of probes per exon (exon size 3 bp to 10 kb)
• Limitation: the NimbleGen algorithm creates equally spaced probes across a region.
Steps:
1) Select genomic regions to target for probe design
2) Merge overlapping regions
3) Select oligonucleotide probe sequences (average 12 per exon) and place them on the microarray
Bailey et al. Cytogenet Genome Res 2008

Detection Method
• Figure: exons 1-5 with their probe regions; hybridization gives a mean log2 probe-intensity difference per exon of -0.2 SD, +1.1 SD, +1.4 SD, +0.6 SD, and +1.2 SD.
• Step 1: Seed
• Step 2: Extension
• Result: a 4-exon (exons 2-5) partial-gene CNV
Bailey et al. Cytogenet Genome Res 2008

CNV in RHD
• Figure: chr1 (~25,350-25,390 kb) showing the RHD gene model exons, probe regions, and segmental duplications, with array profiles for GM12878, GM18517, GM18507, GM18956, GM19129, GM12156, GM18502, GM19240, and GM18555.
Bailey et al. Cytogenet Genome Res 2008

Detecting CNVs >500 bp and >5% Frequency (Conrad et al. 2009 Nature)
• 8,599 CNV regions: 3.7% of the genome (112.7 Mb)
• Between 2 genomes: 1,098 CNVs, 0.78% of the genome (24 Mb)
Causal CNVs (Conrad et al. 2009 Nature)

Infectious Disease Genetics
• Figure: human genome, pathogen genome, vector genome, and environment: a complex interplay that results in the infectious disease phenotype.
• Potential host defense responses and pathogen virulence are encoded in the respective genomes.
• SD and CNV are key mechanisms for adaptation and diversification of responses in both host and pathogen.
• Studying SD and CNV is necessary to fully understand the genetics and biology of infectious disease pathogenesis.

Human CNV Typing and Association Studies
• Comprehensive CNV typing chip (1st generation), in collaboration with the Eichler Lab
• Preferentially targeting gene CNVs (5,000 CNVs → 1,000 genic regions → 30% host defense)
• Agilent and NimbleGen oligoarray platforms
• Defining copy-number-responsive probes
• Defining copy-specific probes to remove cross-hybridization
• Case-control studies to examine infectious disease and immune phenotypes for association with CNVs

Human Malaria
• Malaria: 2-3 million deaths per year
• "strongest known force for evolutionary selection in the recent history of the human genome" (Kwiatkowski 2005 Am J Hum Genet)
• Host genetic factors associated with malaria: HbS, HbC, HbE, thalassemia, ABO, Duffy null, SE Asian ovalocytosis, IL-4, CR1, HLA-DRB ...
Hypothesis: strong selection will have impacted CNVs.
• Testing case-control samples for CNV associations with resistance to infection and cerebral malaria.
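Method 1 (array CGH) reads copy number from the ratio of test to reference hybridization intensity. Against a diploid reference, ideal log2 ratios are -1.0 for one copy, 0 for two, about 0.58 for three, and 1.0 for four. Below is a minimal sketch of calling a region from the mean log2 ratio of its probes. The ±0.3 threshold and the function names are hypothetical. Real pipelines also normalize and segment the data.

```python
import math
from statistics import mean

# Hypothetical call threshold on the log2(test/reference) ratio; the lecture
# gives no value, and real pipelines use normalization and segmentation instead.
LOG2_THRESHOLD = 0.3


def log2_ratio(test: float, reference: float) -> float:
    """Log2 ratio of test to reference hybridization intensity."""
    return math.log2(test / reference)


def call_region(ratios: list[float], threshold: float = LOG2_THRESHOLD) -> str:
    """Call a region as gain, loss, or neutral from the mean log2 ratio of its probes."""
    m = mean(ratios)
    if m >= threshold:
        return "gain"
    if m <= -threshold:
        return "loss"
    return "neutral"


# Ideal ratios against a diploid reference: 1 copy -> -1.0, 2 -> 0.0, 3 -> ~0.58, 4 -> 1.0
for copies in (1, 2, 3, 4):
    r = log2_ratio(copies, 2)
    print(copies, round(r, 2), call_region([r, r, r]))
```

A single-copy gain gives a smaller shift (about +0.58) than a single-copy loss (-1.0). This is one reason gains are harder to detect than deletions on noisy arrays.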

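The ESP logic in Method 2 works because fosmid inserts cluster tightly around 40 kb. A correctly oriented end pair spanning <32 kb of reference suggests an insertion in the fosmid, and a span of >48 kb suggests a deletion. Below is a minimal sketch of that classification. The `EndPair` record and its fields are hypothetical. Span is approximated as the distance between the two mapped coordinates, with read length ignored. Pairs whose ends map to the same strand are labelled inversions. Inter-chromosomal placements and outward-facing ends are outside the lecture's scope and are returned as "other".

```python
from dataclasses import dataclass

# Bounds from the lecture: fosmid inserts cluster tightly around 40 kb, so a
# reference span below 32 kb or above 48 kb is treated as discordant.
MIN_SPAN = 32_000
MAX_SPAN = 48_000


@dataclass
class EndPair:
    """Hypothetical record for one fosmid end-sequence pair placed on the reference."""
    chrom1: str
    pos1: int      # mapped coordinate of end 1
    strand1: str   # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify(pair: EndPair) -> str:
    """Label a placed end pair as concordant or as a putative structural variant."""
    if pair.chrom1 != pair.chrom2:
        return "other"  # inter-chromosomal placements are not covered by this sketch
    # Sort the ends by coordinate so the checks do not depend on read order.
    (lo_pos, lo_strand), (hi_pos, hi_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if lo_strand == hi_strand:
        return "inversion"  # both ends on one strand: one end lies in an inverted segment
    if (lo_strand, hi_strand) != ("+", "-"):
        return "other"  # ends pointing away from each other; not handled here
    span = hi_pos - lo_pos  # approximates insert length, ignoring read length
    if span < MIN_SPAN:
        return "insertion"  # fosmid carries sequence absent from the reference
    if span > MAX_SPAN:
        return "deletion"  # fosmid lacks sequence present in the reference
    return "concordant"


pairs = [
    EndPair("chr1", 1_000, "+", "chr1", 41_000, "-"),
    EndPair("chr1", 1_000, "+", "chr1", 21_000, "-"),
    EndPair("chr1", 1_000, "+", "chr1", 61_000, "-"),
    EndPair("chr1", 1_000, "+", "chr1", 41_000, "+"),
]
for p in pairs:
    print(classify(p))
```

A single discordant pair is only a putative call. Requiring several independent pairs supporting the same event is the usual way to suppress chimeric clones and mapping errors.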
Top subcategories

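Method 3 lists read depth as one way to detect CNV from whole-genome sequencing. The idea is that a region's read count scales with its copy number. Below is a minimal sketch, assuming equal-sized windows, uniform mappability, and a median window at normal (diploid) copy number. The function name is hypothetical. GC-content and mappability corrections used in practice are omitted.

```python
from statistics import median


def copy_number_from_depth(window_counts: list[int], ploidy: int = 2) -> list[float]:
    """Estimate copy number per window by scaling read counts to the median window.

    Assumes equal-sized windows, uniform mappability, and that the median window
    is at the normal ploidy; GC and mappability corrections are omitted.
    """
    expected = median(window_counts)
    if expected <= 0:
        raise ValueError("median window count must be positive")
    return [ploidy * c / expected for c in window_counts]


counts = [100, 98, 102, 150, 148, 51, 100]
print([round(cn, 1) for cn in copy_number_from_depth(counts)])
# [2.0, 2.0, 2.0, 3.0, 3.0, 1.0, 2.0]
```

Here two adjacent windows at about 1.5 times the median depth read as a three-copy gain, and one window at half depth reads as a single-copy deletion.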
Top subcategories

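The custom oligonucleotide aCGH design proceeds in three steps: select target regions, merge overlapping regions, and place an equal number of probes on each exon. This avoids the equally spaced layout that spreads probes thinly over short exons. Below is a minimal sketch of steps 2 and 3, assuming half-open (chrom, start, end) coordinates. The 12 probes per region follows the lecture's average, but here it is a fixed default. The 60 bp probe length and all function names are hypothetical. Probe sequence selection, such as uniqueness filtering for copy-specific probes, is not modeled.

```python
def merge_regions(regions):
    """Merge overlapping or abutting (chrom, start, end) intervals (half-open)."""
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def probes_for_exon(chrom, start, end, n_probes=12, probe_len=60):
    """Place n_probes windows whose centres are evenly spread across one region.

    For very short exons the windows overlap heavily and extend past the exon.
    """
    length = end - start
    probes = []
    for i in range(n_probes):
        centre = start + (i + 0.5) * length / n_probes
        p_start = int(centre - probe_len / 2)
        probes.append((chrom, p_start, p_start + probe_len))
    return probes


def design(exons, n_probes=12, probe_len=60):
    """Steps 2 and 3: merge overlapping target regions, then probe each one equally."""
    return [
        probe
        for region in merge_regions(exons)
        for probe in probes_for_exon(*region, n_probes=n_probes, probe_len=probe_len)
    ]


exons = [("chr1", 1_000, 2_200), ("chr1", 2_100, 2_400), ("chr1", 5_000, 5_003)]
print(len(design(exons)))  # two merged regions x 12 probes
```

Allocating probes per region rather than per base gives a 3 bp exon as many probes as a 10 kb exon. This is what allows exon-level calls on the detection-method slide.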
Top subcategories

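The detection-method slide names two steps, seed and extension, without stating their thresholds. Below is a minimal sketch, assuming hypothetical cut-offs of 1.0 SD for a seed exon and 0.5 SD for extension. A seed is an exon whose mean probe-intensity difference clears the stricter threshold. The call then grows over neighbouring exons that clear the looser one. Applied to the slide's per-exon values, these thresholds reproduce the 4-exon partial-gene CNV.

```python
# Hypothetical thresholds (in SD units); the lecture names the seed and
# extension steps but does not state the cut-offs used.
SEED_SD = 1.0
EXTEND_SD = 0.5


def seed_and_extend(exon_scores, seed_sd=SEED_SD, extend_sd=EXTEND_SD, sign=1):
    """Return (first, last) 0-based exon ranges called as CNV.

    exon_scores: mean probe-intensity difference per exon, in SD units, in gene order.
    sign=1 looks for gains, sign=-1 for losses.
    """
    scores = [sign * s for s in exon_scores]
    n = len(scores)
    covered = [False] * n
    calls = []
    for i, s in enumerate(scores):
        # Step 1 (seed): an exon that clears the stricter threshold.
        if s < seed_sd or covered[i]:
            continue
        lo = hi = i
        # Step 2 (extension): grow over neighbours that clear the looser threshold.
        while lo > 0 and scores[lo - 1] >= extend_sd:
            lo -= 1
        while hi < n - 1 and scores[hi + 1] >= extend_sd:
            hi += 1
        for j in range(lo, hi + 1):
            covered[j] = True
        calls.append((lo, hi))
    return calls


# Per-exon values from the detection-method slide (exons 1-5).
scores = [-0.2, 1.1, 1.4, 0.6, 1.2]
print(seed_and_extend(scores))  # [(1, 4)] -> exons 2-5, a 4-exon partial-gene CNV
```

The two-threshold design lets a borderline exon (+0.6 SD) join a call only when it sits next to strong evidence, rather than triggering a call on its own.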
Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Analyzing Copy Number Variation in the Human Genome