By Patrick Brennock and Kylee Katsumata

What are Copy Number Variations (CNVs)?
● DNA segments that are present in a different number of copies than normal, due to deletions, insertions, or duplications.
● Arise from errors in homologous recombination (HR), segmental duplication, DNA damage, and duplication of microsatellites.
● When in genes, can result in a change in gene dosage.
● Not all genomes are the same!
● Ex: AMY1, salivary amylase

Why do we care about CNVs?
● Potentially disease-causing
    o Ex. Prader-Willi/Angelman syndromes
    o Cancer
● Need to better understand the human genome (How much variation exists? What is the “true” genome?)

Where are CNVs?
● Previous studies found many CNVs, but an exhaustive list did not exist.
● Objectives:
    o To create a map of CNVs and CNVRs
    o To determine their general distribution throughout the genome
    o To determine which types are most prevalent
    o To determine the inheritance and population differentiation patterns

Defining CNVs and CNVRs
Figure 3 | Defining CNVRs, CNVs and CNV ends.
● One CNV = a region within an individual that has been duplicated or deleted, giving a different copy number.
● One Copy Number Variable Region (CNVR) = all overlapping CNVs at a particular location.

Microarray Hybridization
● Reference genome (n=2): measured fluorescence = 2x
● Genome with duplication (n=3): measured fluorescence = 3x
● Genome with deletion (n=1): measured fluorescence = 1x

Comparing Fluorescence of test genome to reference genome
● log2 (sample genome fluorescence / reference genome fluorescence)
● Ex: log2 (4x/2x) = 1, twice as much fluorescence in test genome
● Ex: log2 (1x/2x) = -1, half as much fluorescence in test genome
● Put simply, a positive log2 value suggests a genomic duplication, and a negative log2 value suggests a genomic deletion

Figure 1 | Protocol outline for two CNV detection platforms.
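The log2 comparison above can be sketched in a few lines of Python. This is only an illustration of the arithmetic on the slide, not the calling algorithm used in the paper: the function names and the 0.3 cutoff are hypothetical, chosen so that small fluctuations around zero are not called as gains or losses.

```python
import math


def log2_ratio(sample_signal: float, reference_signal: float) -> float:
    """Return log2(sample fluorescence / reference fluorescence)."""
    if sample_signal <= 0 or reference_signal <= 0:
        raise ValueError("fluorescence signals must be positive")
    return math.log2(sample_signal / reference_signal)


def call_copy_number_change(ratio: float, threshold: float = 0.3) -> str:
    """Classify a log2 ratio; `threshold` is an illustrative cutoff, not from the paper."""
    if ratio >= threshold:
        return "duplication"
    if ratio <= -threshold:
        return "deletion"
    return "no change"


# The slide's two examples, plus the three-copy case from the hybridization slide.
for sample, reference in [(4.0, 2.0), (1.0, 2.0), (3.0, 2.0), (2.0, 2.0)]:
    ratio = log2_ratio(sample, reference)
    print(f"{sample}x / {reference}x -> log2 = {ratio:.2f} ({call_copy_number_change(ratio)})")
```

Note that a single extra copy (3x vs 2x) gives a log2 ratio of about 0.58, not 1, so real duplications often sit closer to zero than the 4x/2x example suggests.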
● Measured HapMap cell line genomes (~150 individuals, some parent-offspring trios, from different populations)

Figure 4 | Genomic distribution of CNVRs.
● 1447 discrete CNVRs were found; 66% were validated or had been found in previous studies.
● ~50 CNVs detected per person, depending on the platform used
● CNVRs cover 12% of the genome, i.e. these regions are susceptible to copy number variation
● Relatively evenly distributed throughout the genome
● Found that the 500K Affymetrix platform could detect smaller CNVs, whereas the WGTP platform was better at detecting CNVs in duplicated genomic regions

Figure 2 | Heritability of five CNVs in four HapMap trios.

Figure 5 | Classes of CNVs.
● Assumed the rarer allele was the mutant (CNV) allele
● Left = example CNV. Right = total number of each CNV class found by each platform
● The 500K EA platform had a higher resolution and thus was better able to map the exact breakpoints of CNVRs and to tell which sequences they overlapped
● On the 500K EA platform, CNVs were found significantly less often than expected in protein-coding genes and ultra-conserved elements.
● Also found that deletions are especially biased away from genes
● Still found thousands of genes that flank or fall within CNVs
    ○ A plurality of these genes, based on gene ontology, were involved in cell adhesion, neurophysiological processes, and sensory perception of smell

Why Study CNVs? Medical Relevance
● ~14.5% of genes in the OMIM morbid map had CNVs; some landed in regions vital for Mendelian and complex diseases.
● Examples:
    o Genetic: Angelman syndrome
    o Complex: Schizophrenia, psoriasis, cataracts

Why Study CNVs? Medical Relevance
● Difficult to connect diseases to their respective genotype in complex CNVRs.
    o Example: One region of ~1 Mb at 1q21.1 can lead to:
        Congenital heart defects
        Lens abnormalities
        Mental retardation
● The difference? Copy number variation.

How They Studied CNVs: Single Nucleotide Polymorphisms
How about comparing them to SNPs?
● At the time, known to be important to human diversity.
● First genome-wide association study (GWAS) in 2005
    o Deletions and duplications lead to disease.

How They Studied CNVs: SNPs
● Essentially extended what GWAS did.
    o Can CNVs be found with GWAS results?
    o Can we find CNVs with SNP markers?
● Used SNPs from the HapMap Phase 1 database as markers to hopefully find CNVs.
    o Measured linkage disequilibrium with r². High linkage disequilibrium = found together a lot.
    o If a CNV was strongly enough correlated with a nearby SNP marker (r² > 0.8), it was “tagged” by the marker.
● After looking at three populations from the database (European, African, or East Asian)...

How They Studied CNVs: SNPs
Figure 6.a | Patterns of linkage disequilibrium between CNVs and SNPs.
● r²
    ○ Pairwise linkage disequilibrium
    ○ “Tagged” = r² > 0.8
You can’t find CNVs with GWAS methods meant for SNPs.

How They Studied CNVs: Lower LD for CNVs
● They considered transposons and frequent mutations/reversions, but tests showed these were unlikely.
● Ultimately concluded that CNVs prefer areas with low SNP counts.
    o CNVs frequent highly dynamic regions (tandem repeats cross over unequally).
    o The database only had SNPs in stable regions.
    o Therefore, the low linkage disequilibrium is due to a lack of SNP coverage in those parts of the genome.

Can SNPs predict the number of copies in CNVs?
● R² values
    ○ 0 = no correlation
    ○ 1 = strong correlation; very predictive
● According to WGTP, the more accurate of the two arrays, they can’t.
Figure 6.b | Patterns of linkage disequilibrium between CNVs and SNPs.

CNVs are useful! Population genetics
● Population clustering for 67 biallelic CNVs in 210 individuals (dots).
● If CNV genotypes could not predict ancestry, these dots would be scattered all over.
● Clusters are clear: individuals with similar CNV genotypes group at the corners, so CNVs can predict ancestry.
Figure 7 | Population clustering from CNV genotypes.

CNVs are useful! Population genetics
● Lighter color = WGTP results
● Darker color = 500K EA results
● Vst ranges from 0 to 1; 1 means the CNV is population-specific.
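The slide gives only the 0 to 1 range for Vst, so the formula below is an assumption: a minimal sketch using the definition Vst = (VT - VS) / VT, where VT is the variance of log2 ratios across all individuals and VS is the average within-population variance weighted by population size. The function name and the toy log2 ratios are hypothetical and not taken from the paper's data.

```python
from statistics import pvariance
from typing import Mapping, Sequence


def vst(log2_ratios_by_population: Mapping[str, Sequence[float]]) -> float:
    """Vst = (VT - VS) / VT (assumed definition, see lead-in)."""
    all_values = [v for values in log2_ratios_by_population.values() for v in values]
    total_variance = pvariance(all_values)  # VT: variance over everyone
    if total_variance == 0:
        return 0.0  # no variation at all, so no differentiation
    # VS: within-population variance, weighted by population size
    within_variance = sum(
        len(values) * pvariance(values)
        for values in log2_ratios_by_population.values()
    ) / len(all_values)
    return (total_variance - within_variance) / total_variance


# Toy data: one CNV that separates two populations, one that does not.
population_specific = {"pop_A": [-1.0, -1.0, -1.0], "pop_B": [1.0, 1.0, 1.0]}
shared = {"pop_A": [0.0, 0.2, 0.0, 0.2], "pop_B": [0.0, 0.2, 0.0, 0.2]}
print(vst(population_specific))  # close to 1: population-specific
print(vst(shared))               # close to 0: no differentiation
```

Population variance (rather than sample variance) is used here so that the result stays within the 0 to 1 range the slide describes.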
Figure 8 | Population differentiation for copy number variation.
CNVs can be compared across populations, and some are highly specific to one population or another.

In the end...
● Bottom line? This paper is essentially a starting point.
    o CNVs are common
    o CNVs are worth studying
    o CNVs are in need of techniques to find their role in disease.
● They believed there would be advances in…
    o … discovering CNVs’ hand in diseases
    o … new techniques for identifying disease-causing CNVs
    o … mapping CNVs, < 1kb or not, with fewer pieces of technology.

Problems
● One Genome to Rule Them All?
    o Redon et al. relied heavily on a reference genome. However, there is no one true genome that could fairly represent human CNV.
● Not all CNVs were found!
    o Limitations of the technology of the time. And the cost!
    o The techniques used were inaccurate. Algorithms were adjusted so that 5% of the possible CNVs found were false positives. False negatives too.
    o CNVs < 1kb?

Updates?
A lot has changed since 2006.
● There have been many, many CNV studies.
● Diseases
    o Crohn’s disease, rheumatoid arthritis, and Type 1 and 2 diabetes.
● More techniques have been established to find CNVs < 1kb.

Further Readings?
DISEASE, DISEASE EVERYWHERE?
● Craddock, N., et al. 2010. Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls. Nature. 464: 713-720.
CNVs and new techniques!
● Zhao, M., Wang, Q., Wang, Q., Jia, P., & Zhao, Z. 2013. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: Features and perspectives. BMC Bioinformatics. 14(Suppl 11): S1.
History and some more articles, by one of the original authors of the paper.
● Scherer, Stephen W. "Proof of Extensive Copy Number Variation in The Human Genome."