http://www.youtube.com/watch?v=Co7dvbhtsJg

Smoking Habits
• Over 1 billion people worldwide smoke tobacco
• Of these, 5-6 million die each year
• Smoking raises the likelihood of developing lung cancer to 20 times that of a non-smoker

Gail Butler, Chris Scodeller, Julie Ward, & Lori Foster

Outline
• Sequencing of a SCLC (small-cell lung cancer) cell line
• Somatic mutation
• Mutation signatures in NCI-H209
• DNA repair pathways
• Genomic rearrangement, specifically CHD7

Sequencing of a SCLC cell line
Why use SCLC?
• Not surgically resected
Cell line NCI-H209
• Immortal cell line
• From a 55-year-old male with SCLC
• Smoking history not recorded
• Showed histologically typical small cells; >97% of such tumors are associated with tobacco smoking
• Taken before chemotherapy

Sequencing: The SOLiD Platform
• Massively parallel next-generation sequencing
• Greater than 99.94% accuracy
• Relatively inexpensive
• Allows for whole genome sequencing, targeted resequencing, and gene expression data

Sample preparation
• Fragment libraries or mate pair libraries
• Libraries are sheared and adaptor molecules are ligated to each unique molecule
• Each molecule is attached to a bead
• Amplified using emulsion PCR
• 3’ end modification
• Beads are covalently attached to a glass slide
• A universal sequencing primer, ligase, and a set of fluorescently labeled di-base probes are introduced
• Multiple cycles of ligation, detection, and cleavage are performed
• After the template has been read, the synthesized strand is removed
• A new primer attaches to the template, offset by 1 nucleotide

Coverage
• Figure 1A: minimum 30x coverage
• Figure 1B: 39x coverage for the tumour, 31x coverage for the normal cell line

Bioinformatics
• Identify somatically acquired mutations from sequence data
• 77 coding substitutions
• 333 random variants
• Indels are difficult to detect
• Supplementary Fig. 1

Somatically acquired genomic variants
22,910 somatically acquired (not inherited) mutations:
• 70% intergenic
• 28% intronic
• 0.8% non-coding translated
• 0.6% coding

Figure 1C. Somatic mutations of the NCI-H209 genome: deletions, insertions, heterozygous and homozygous substitutions, mis-sense, nonsense, and rearrangements

Point mutations in coding regions
• RB1 C706F point mutation: a nonconservative amino acid substitution that inhibits phosphorylation and abolishes protein function
• TP53 splice site disruption: TP53 encodes p53, a tumor suppressor
• The combination of RB1 and TP53 mutations is characteristic of SCLC

Non-synonymous vs. Synonymous
• Non-synonymous: codes for a different amino acid
• Synonymous: the amino acid produced is not modified
• An accumulation of fitness-increasing mutations would show up as an excess of non-synonymous substitutions
• The observed ratio is not different from that expected by chance
• This suggests that the majority of coding variants do not confer a selective advantage

Mutations in regulatory regions
• Little is known about mutations occurring on either side of transcription start sites
• Supplementary Fig. 2A: find somatic substitutions within 2 kb of known transcription start sites
• Apply hidden Markov models (trainable models that can learn to find sequences) to predict which substitutions might affect transcription factor binding sites
• Supplementary Fig. 2B: the observed distribution is no different from that seen in random “simulated sets” of mutations
• There may still be mutations that alter transcription factor binding and affect gene regulation
• Example, Supplementary Fig. 2C: a T>G substitution in the RAS oncogene family gene RAB42 disrupts a potential binding motif

Big picture of somatic mutations
• The data indicate that most of the mutations in the coding and promoter regions are passenger events
• Passenger events do not contribute to the development of cancer, but have occurred during cancer growth
• These mutations confer no selective advantage to the cells

Tobacco smoke contains more than 60 carcinogens that bind and chemically modify DNA. The carcinogens bind to DNA, forming bulky adducts at purine bases (guanine and adenine).
Bulky adducts:
• Distort the double helix
• Allow non-Watson–Crick pairing
• Get in the way

Most Common Substitutions
• G>T/C>A (34%)
• G>A/C>T (21%)
• A>G/T>C (19%)
The top three substitution classes are all changes at purines (G or A). Of these, only G>T/C>A is a transversion; the other two are transitions.
• This distribution is consistent with the literature, which shows that the mutational patterns are consistent
• Control for in vivo mutation

CpG Sites
• CpG: cytosine-phosphate-guanine
• G>T transversions (34% of total mutations) occur more frequently at methylated CpG dinucleotides
• In mammals, 70% to 80% of CpGs are methylated
• CpG islands are regions with a high CpG content (a high frequency of cytosine connected to guanine)
• They are in and near approximately 40% of promoters of mammalian genes

It’s getting complicated, so let’s recap:
• The most common substitution (34% of total) is G>T
• The G>T mutations often happen at CpG sites
• The CpG sites where G>T mutations happen are often methylated

Wait, what?
When looking at guanines in the genome, how often is the nucleotide preceding it a cytosine?
[Chart: how often a C is expected to precede a G in the genome, compared with how often a C precedes the G in G>T mutations]
[Diagram: a 5’ to 3’ strand of unspecified bases (N) marking the base preceding each G, including a CpG and a G>T site]
• The expected fraction of CpGs per guanine in genomic DNA
• The fraction of G>T mutations at CpGs per guanine in CpG islands
If everything were random, we would expect the G>T mutations to have the same CpG/G make-up as genomic DNA… but that is not so!

[Chart: how often a C precedes the G in G>A mutations, compared with the genome-wide expectation]
• G>A mutations often occur outside CpG islands
• The unusually high fraction is likely due to spontaneous deamination of methylated cytosine to thymine

[Chart: how often a C precedes the G in G>C mutations, compared with the genome-wide expectation]
• Similar to G>T, but these were significantly more likely to occur within CpG islands

WHAT DOES THIS ALL MEAN?
“Thus, the sequence context of the 23,000 mutations in the NCI-H209 genome provides tremendous power to identify multiple distinctive mutation signatures, not evident from targeted re-sequencing studies of limited genomic regions.”

It’s getting complicated (still), so let’s recap:
• The most common substitution (34% of total) is G>T
• The G>T mutations often happen at CpG sites
• The CpG sites where G>T mutations happen are often methylated

So how does methylation play into all this?
Only 10–20% of CpG dinucleotides in CpG islands are methylated, while 60–70% of CpG sites outside the islands are methylated. This provides a model to see how methylation of CpG sites affects C>T mutations. In other words, let’s compare the frequency of G>C mutations inside and outside CpG islands to see how methylation affects mutation.
• Non-CpG island: 60-70 percent methylated, more C>T mutations
• CpG island: 10-20 percent methylated, fewer C>T mutations
Fewer CpG mutations occur in CpG islands than at CpGs outside the islands.
There are fewer G>C mutations in the islands, and there is less methylation in the islands, suggesting that C>T mutations preferentially occur at methylated CpGs.

Can’t we fix this???
Bulky adducts on purines are the most common source of DNA damage from tobacco carcinogens. These bulky adducts get in the way of RNA polymerase. When RNA polymerase stalls, it recruits the nucleotide excision repair machinery, which excises the altered nucleotide and prevents mutation. The more expression, the more repair.
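
The CpG comparison in the recap above (how often a C precedes a G genome-wide versus at mutated guanines) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy sequence, the mutation positions, and the function names are hypothetical, and only one strand is scanned, whereas a real analysis would also handle the complementary strand and use real variant calls.

```python
from typing import Iterable


def cpg_fraction_of_guanines(seq: str) -> float:
    """Fraction of G bases (with a 5' neighbour) that are preceded by a C."""
    seq = seq.upper()
    g_positions = [i for i, base in enumerate(seq) if base == "G" and i > 0]
    if not g_positions:
        return 0.0
    in_cpg = sum(1 for i in g_positions if seq[i - 1] == "C")
    return in_cpg / len(g_positions)


def cpg_fraction_of_mutated_guanines(seq: str, mutated: Iterable[int]) -> float:
    """Same fraction, restricted to guanines at the given mutated positions."""
    seq = seq.upper()
    sites = [i for i in mutated if 0 < i < len(seq) and seq[i] == "G"]
    if not sites:
        return 0.0
    in_cpg = sum(1 for i in sites if seq[i - 1] == "C")
    return in_cpg / len(sites)


# Hypothetical toy data, not taken from the NCI-H209 genome.
toy_reference = "ACGTTGCGAAGTCGGA"
toy_g_to_t_sites = [2, 7, 10, 13]  # 0-based positions of mutated guanines

expected = cpg_fraction_of_guanines(toy_reference)
observed = cpg_fraction_of_mutated_guanines(toy_reference, toy_g_to_t_sites)
print(f"expected CpG/G: {expected:.2f}, observed among G>T: {observed:.2f}")
```

If mutations were random with respect to sequence context, the two fractions would be about equal; an observed fraction well above the expected one is the kind of enrichment the slides describe.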
Repair occurred less frequently in non-transcribed regions than in transcribed regions (good!).
G>A mutations
• Mutations occurred about equally on transcribed and non-transcribed strands
• Mutations on both strands were significantly reduced in more highly expressed genes
A>G mutations
• Transcribed-strand mutations decreased with higher gene expression
• Non-transcribed-strand mutations were relatively level
This suggests at least two separate DNA repair pathways, which in turn suggests “distinct physicochemical effects on DNA structure, with variable recognition and excision by the genome surveillance machinery.”

Genomic Rearrangements & Copy Number
The NCI-H209 genome has 58 somatic genome rearrangements:
• 18 deletions (31%)
• 9 tandem duplications (16%)
• 15 inverted intrachromosomal rearrangements (26%)
• 9 non-inverted intrachromosomal rearrangements (16%)
• 7 interchromosomal rearrangements (12%)

Figure 3. Rearrangements between chromosomes 1 & 4 (legend: intrachromosomal inversions, non-inverted intrachromosomal rearrangements, interchromosomal rearrangements)
Not classical inversions:
• Clear boundaries separate changes in copy number in genes on both chromosomes
• Breakpoints between chromosomes aren't reciprocal
• Unbalanced rearrangements

Oncogenic Fusion Genes
Oncogenic fusion gene: a hybrid gene formed from two previously separate genes.
Chromosomal rearrangements can result in an oncogenic fusion gene if:
• The 2 genes end up side by side
• The ORF is intact
• The genes are in the same orientation
NCI-H209 fusion gene: a 240 bp deletion on chromosome 16 joins:
• The first 2 exons of CREBBP
• The 3' portion of BTBD12
RT-PCR showed expression of the fusion transcript. It was not expressed in 55 other SCLCs. Direct further studies here?

Figure 4. CHD7 significance
CHD7 codes for a chromatin helicase DNA binding protein.
NCI-H209:
• 39.5 kb tandem duplication of exons 3-8 of CHD7 (Figure 4a & 4c)
NCI-H2171:
• Fusion gene of exons 1-3 of PVT1 (a non-coding RNA gene immediately downstream of MYC) and exons 4-38 of CHD7 (Figure 4c)
• MYC amplification
LU-135:
• Fusion gene of exon 1 of PVT1 and exons 14-38 of CHD7 (Figure 4c)
• MYC amplification
This suggests that CHD7 rearrangements are a recurrent phenomenon in SCLC.

Figure 4. LU-135
LU-135, studied by mate pair sequencing, showed:
• A fusion gene of exon 1 of PVT1 and exons 14-38 of CHD7
• A CHD7 amplicon linked to the MYC amplification
• MYC codes for a transcription factor that regulates the expression of multiple genes
• The rearrangements resulted in increased expression of MYC and of the 3' end of CHD7

Figure 4. NCI-H2171 & LU-135 show elevated levels of CHD7 expression
SCLCs in general have a greater normalized expression of CHD7 than non-SCLC and other tumor types.

CHD7 Summary
• CHD7 rearrangements were found in 3 SCLC cell lines
• LU-135 & NCI-H2171 have PVT1-CHD7 fusion genes plus MYC amplification
• PVT1 is downstream of MYC and may be a transcriptional target of the MYC protein
• Insertion of CHD7 with subsequent amplification results in increased copy number of the gene and its regulatory elements: OVEREXPRESSION
• NCI-H209 has a duplication of parts of the CHD7 gene
• CHD7 is a chromatin remodeller that promotes enhancer-mediated transcription through histone methylation
• Histone modifiers have previously been implicated as cancer genes
Rearrangements of CHD7 would make for an interesting extension of this paper.

Summary
Each carcinogen-induced mutation is the consequence of three processes:
• Chemical modification of a purine
• Failure to repair via surveillance pathways
• Incorrect nucleotide incorporation due to base distortion during DNA replication

Summary
Transcription-coupled repair
• Stalled RNA polymerase
• Observed with NCI-H209 A>G mutations
Expression-linked repair
• More effective in highly transcribed regions
• G>A mutations
• Combined G>T and A>T mutations

After Thought
• Lung cancer develops after about 50 pack-years of smoking (one pack-year is about 7,300 cigarettes)
• On average you acquire one mutation for every 15 cigarettes smoked

Questions?
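
The arithmetic behind the closing figure can be checked with a short sketch. It assumes a 20-cigarette pack and one pack per day for a pack-year (which gives the 7,300 cigarettes a year quoted above), and it divides by the 22,910 somatic mutations reported earlier for NCI-H209. The constant and function names are hypothetical, and the result is a rough estimate, not a measured rate.

```python
CIGARETTES_PER_PACK = 20   # assumption: standard pack size
DAYS_PER_YEAR = 365


def cigarettes_smoked(pack_years: float) -> float:
    """Total cigarettes for a given number of pack-years (1 pack/day for 1 year)."""
    return pack_years * CIGARETTES_PER_PACK * DAYS_PER_YEAR


def cigarettes_per_mutation(total_cigarettes: float, mutations: int) -> float:
    """Average number of cigarettes smoked per acquired mutation."""
    return total_cigarettes / mutations


per_year = cigarettes_smoked(1)                    # 7,300 cigarettes
total = cigarettes_smoked(50)                      # 365,000 cigarettes
per_mutation = cigarettes_per_mutation(total, 22_910)
print(f"{per_year:.0f} per year, {total:.0f} in total, "
      f"about {per_mutation:.1f} cigarettes per mutation")
```

Under these assumptions the estimate comes out between 15 and 16 cigarettes per mutation, in line with the "one mutation for every 15 cigarettes" figure.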