http://www.youtube.com/watch?v=Co7dvbhtsJg
Smoking Habits
• There are over 1 billion people in the world who smoke tobacco
• Of these, 5-6 million die each year
• This habit increases the likelihood of developing lung cancer to 20 times that of a non-smoker
Gail Butler, Chris Scodeller, Julie Ward, & Lori Foster
Outline
• Sequencing of an SCLC cell line
• Somatic mutations
• Mutation signatures in NCI-H209
• DNA repair pathways
• Genomic rearrangements, specifically CHD7
Sequencing of an SCLC cell line
• Why use SCLC?
– Not surgically resected, so a cell line is used instead
• Cell line: NCI-H209
– Immortal cell line
– From a 55-year-old male with SCLC
– Smoking history not recorded
– Showed histologically typical small cells
– >97% of such tumors are associated with tobacco smoking
– Taken before chemotherapy
Sequencing: The SOLiD Platform
• Massively parallel next-generation sequencing
• Greater than 99.94% accuracy
• Relatively inexpensive
• Allows for:
– Whole-genome sequencing
– Targeted resequencing
– Gene expression data
• Sample preparation
– Fragment library or mate-pair libraries
– DNA is sheared and adaptor molecules are ligated to each unique molecule
– Each molecule is attached to a bead
– Amplified using emulsion PCR
– 3’ end modification
– Beads are covalently attached to a glass slide
• A universal sequencing primer, ligase, and a set of fluorescently labeled di-base probes are introduced
• Multiple cycles of ligation, detection, and cleavage are performed
• After the template has been read, the synthesized strand is removed
• A new primer attaches to the template offset by 1 nucleotide, and the cycle repeats
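Because the probes read two bases at a time, each SOLiD read is recorded as a series of "colours", each encoding a pair of adjacent bases. The sketch below assumes the commonly described scheme in which A, C, G, T map to 0-3 and a colour is the XOR of two adjacent base codes; it illustrates the idea and is not the vendor's software. It also shows why a true SNP changes two adjacent colours, which is the basis of the platform's built-in error checking.

```python
# Two-base ("colour space") encoding sketch; assumes A=0, C=1, G=2, T=3
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(seq):
    """Return the known first base and one colour (0-3) per adjacent base pair."""
    colours = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colours


def decode(first_base, colours):
    """Rebuild the base sequence by walking the colours from the first base."""
    bases = [first_base]
    for colour in colours:
        bases.append(BASES[CODE[bases[-1]] ^ colour])
    return "".join(bases)


def changed_colours(seq_a, seq_b):
    """Indices of colours that differ between two equal-length sequences."""
    _, a = encode(seq_a)
    _, b = encode(seq_b)
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    first, colours = encode("ATGGC")
    print(first, colours)                     # A [3, 1, 0, 3]
    print(decode(first, colours))             # ATGGC
    print(changed_colours("ATGGC", "ATCGC"))  # [1, 2]: one SNP flips two adjacent colours
```

A single mis-read colour, by contrast, changes only one colour, so it does not look like a valid SNP and can be flagged as a likely error.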
Coverage
• Figure 1A: minimum 30x coverage
• Figure 1B: 39x coverage for the tumor and 31x coverage for the normal cell line
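Coverage figures like 39x and 31x are average read depths. A minimal sketch, assuming the idealized Lander-Waterman model (reads placed uniformly at random, so the depth at any base is Poisson-distributed); the read count and read length used below are hypothetical, not the study's.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Average depth C = N * L / G (Lander-Waterman)."""
    return n_reads * read_length / genome_size


def fraction_at_least(k, coverage):
    """Fraction of bases covered at least k times if depth ~ Poisson(coverage)."""
    term = math.exp(-coverage)  # P(depth == 0)
    below_k = 0.0
    for i in range(k):
        below_k += term             # add P(depth == i)
        term *= coverage / (i + 1)  # advance to P(depth == i + 1)
    return 1.0 - below_k


if __name__ == "__main__":
    # Hypothetical run: 2.4 billion 50-bp reads on a 3.1-Gb genome
    c = mean_coverage(2.4e9, 50, 3.1e9)
    print(round(c, 1))
    print(fraction_at_least(10, c))
```

Real sequencing depth is more uneven than a Poisson model predicts (GC bias, mappability), so the second number is an optimistic upper bound.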
Bioinformatics
• Identify somatically acquired mutations from sequence data
• 77 coding substitutions
• 333 random variants
• Indels difficult to detect
Supplementary Fig. 1
Somatically acquired genomic variants
• 22,910 somatically acquired (not inherited) mutations
– 70% intergenic
– 28% intronic
– 0.8% transcribed but non-coding
– 0.6% coding
• Figure 1C: somatic mutations of the NCI-H209 genome
– Deletions, insertions, heterozygous and homozygous substitutions, missense, nonsense, and rearrangements
Point mutations in coding regions
• RB1 C706F point mutation
– Nonconservative amino acid substitution
– Inhibits phosphorylation and abolishes protein function
• TP53 splice site disruption
– TP53 encodes p53, a tumor suppressor
• The combination of RB1 and TP53 mutations is characteristic of SCLC
Non-synonymous vs. Synonymous
• Non-synonymous
– Codes for a different amino acid
• Synonymous
– Amino acid produced is unchanged
• Selection for mutations that increase fitness would show up as an excess of non-synonymous mutations
• Observed ratio was not different from that expected by chance
– Suggests that the majority of coding variants do not confer a selective advantage
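The "expected by chance" baseline can be illustrated by enumerating every single-base change in every sense codon of the standard genetic code. This sketch assumes all codons and substitutions are equally likely, which is a simplification: a real analysis would weight changes by the observed mutation spectrum (here dominated by G>T). Function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons in TTT, TTC, TTA, ..., GGG order ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def expected_nonsynonymous_fraction():
    """Share of single-base changes in sense codons that alter the protein.

    Nonsense changes (to a stop codon) count as non-synonymous.
    Assumes every codon and every substitution is equally likely.
    """
    synonymous = nonsynonymous = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # skip stop codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == aa:
                    synonymous += 1
                else:
                    nonsynonymous += 1
    return nonsynonymous / (synonymous + nonsynonymous)


def observed_vs_expected(n_nonsyn, n_syn):
    """Observed NS:S ratio divided by the uniform-model expectation (near 1 = no selection signal)."""
    f = expected_nonsynonymous_fraction()
    return (n_nonsyn / n_syn) / (f / (1 - f))


if __name__ == "__main__":
    print(round(expected_nonsynonymous_fraction(), 3))
```

Because roughly three quarters of random coding changes are non-synonymous, a tumor with no selection would still show mostly non-synonymous mutations; only an excess over this baseline points to selection.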
Mutations in regulatory regions
• Little is known about mutations occurring on either side of transcription start sites
• Supplementary Fig. 2A
– Find somatic substitutions within 2 kb of known transcription start sites
– Apply hidden Markov models: statistical models that can be trained to recognize sequence motifs
– Predict which substitutions might affect transcription factor binding sites
• Supplementary Fig. 2B
– Observed distribution was no different from that of randomly simulated sets of mutations
– There may still be mutations that alter transcription factor binding and affect gene regulation
• Example: Supplementary Fig. 2C
– T>G in the RAS oncogene family gene RAB42
– Disrupts a potential binding motif
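The slides mention hidden Markov models; as a simpler stand-in, the sketch below scores a sequence with a position weight matrix (PWM) and reports how much a single substitution changes the best motif match. The motif counts and sequence are hypothetical (this is not the RAB42 site), and the pseudocount and uniform background are assumptions. Only one strand is scanned.

```python
import math

# Hypothetical motif with consensus TGAC: base counts per position (10 sites)
COUNTS = [
    {"A": 1, "C": 1, "G": 0, "T": 8},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 0, "C": 8, "G": 1, "T": 1},
]
PSEUDOCOUNT = 0.5
BACKGROUND = 0.25  # uniform base frequencies


def log_odds(column):
    """Convert one column of counts to log2(motif probability / background)."""
    total = sum(column.values()) + 4 * PSEUDOCOUNT
    return {b: math.log2((n + PSEUDOCOUNT) / total / BACKGROUND) for b, n in column.items()}


PWM = [log_odds(col) for col in COUNTS]


def best_score(seq):
    """Highest log-odds score over all windows of the motif length."""
    w = len(PWM)
    return max(
        sum(PWM[i][seq[start + i]] for i in range(w))
        for start in range(len(seq) - w + 1)
    )


def substitution_effect(seq, pos, alt):
    """Change in best motif score after replacing seq[pos] with alt (negative = weaker site)."""
    mutant = seq[:pos] + alt + seq[pos + 1:]
    return best_score(mutant) - best_score(seq)


if __name__ == "__main__":
    ref = "CCTGACCC"
    print(round(best_score(ref), 2))
    print(round(substitution_effect(ref, 2, "G"), 2))  # T>G inside the site
```

A large negative change flags a substitution as a candidate binding-site disruption; it does not show that the factor actually binds there.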
Big picture of somatic mutations
• Data indicate that most of the mutations in the coding and promoter regions are passenger events
– Events that don’t contribute to the development of cancer, but were acquired along the way as the cancer developed
• These mutations confer no selective advantage to the cells
Tobacco smoke contains more than 60 carcinogens, which bind to and chemically modify DNA.
The carcinogens bind to DNA, forming bulky adducts at purine bases (guanine and adenine). These adducts:
-Distort the double helix
-Allow non-Watson–Crick pairing
-Get in the way of polymerases
Most Common Substitutions
G>T/C>A (34%)
G>A/C>T (21%)
A>G/T>C (19%)
In all top 3 substitutions, the mutated base is a purine (G or A)…
• G>T is a transversion; G>A and A>G are transitions
• This distribution of substitutions is consistent with the literature
• Shows consistency with known mutational patterns
• Serves as a control: the cell line reflects in vivo mutation
• G>T transversions (34% of total mutations) occur more frequently at methylated CpG dinucleotides
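The distinction between transitions and transversions, and the habit of naming a change from its purine strand (G>T rather than C>A), can be made concrete. A small sketch with illustrative function names; note that much of the later mutational-signature literature instead names changes from the pyrimidine strand (C>A).

```python
from collections import Counter

PURINES = {"A", "G"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def substitution_class(ref, alt):
    """Transition = purine<->purine or pyrimidine<->pyrimidine; otherwise transversion."""
    return "transition" if (ref in PURINES) == (alt in PURINES) else "transversion"


def purine_label(ref, alt):
    """Name a change from the purine strand, so C>A is reported as G>T."""
    if ref not in PURINES:
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return f"{ref}>{alt}"


def spectrum(substitutions):
    """Percentage of each purine-strand class among (ref, alt) pairs."""
    counts = Counter(purine_label(r, a) for r, a in substitutions)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    for change in ["G>T", "G>A", "A>G"]:
        ref, alt = change.split(">")
        print(change, substitution_class(ref, alt))
    print(spectrum([("C", "A"), ("G", "T"), ("G", "A"), ("T", "C")]))
```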
CpG Sites
cytosine-phosphate-guanine
• In mammals, 70% to 80% of CpGs are methylated
• CpG islands are regions with a high frequency of cytosine followed by guanine (high CpG content).
• They are in or near approximately 40% of the promoters of mammalian genes.
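CpG islands are usually called with simple sequence thresholds. This sketch uses the classic Gardiner-Garden and Frommer style criteria (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6) as assumed defaults; genome annotations use sliding windows and refined cut-offs.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply length, GC-content and CpG observed/expected thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


if __name__ == "__main__":
    print(looks_like_cpg_island("CG" * 100))   # dense CpG repeat
    print(looks_like_cpg_island("CAGG" * 50))  # GC-rich but CpG-poor
```

The second example shows why GC content alone is not enough: a sequence can be GC-rich yet contain no CpG dinucleotides.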
It’s getting complicated, so let’s recap:
Most transversion mutations (34% of total) are G>T
The G>T mutations often happen at CpG sites
The CpG sites where G>T mutations happen are often methylated
Wait, what?
[Figure: fraction of guanines preceded by a cytosine (CpG/G), genomic DNA vs. G>T mutations]
• When looking at guanines in the genome, how often is the nucleotide preceding it a cytosine?
– This often: the expected fraction of CpGs per guanine in genomic DNA
• When looking at G>T mutations, how often is the nucleotide preceding the mutated guanine a cytosine?
– This often: the fraction of G>T mutations at CpGs per guanine in CpG islands
5’-N-N-N-N-?-G-N-N-N-N-N-N-N-C-G-N-N-N-N-?-G>T-N-N-N-N-N-?-G-N-N-N-3’
If everything were random, we would expect G>T mutations to have the same CpG/G make-up as genomic CpG/G…
…but that is not so!
Wait, what?
[Figure: for each mutation type, the fraction of mutated guanines preceded by a cytosine, compared with the genome]
• G>T mutations: this often, a C precedes the G
• G>A mutations: this often, a C precedes the G
– Often occur outside CpG islands
– Unusually high fraction, likely due to spontaneous deamination of methylated cytosine to thymine
• G>C mutations: this often, a C precedes the G
– Similar to G>T, but significantly more likely to occur within CpG islands
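The "how often is the mutated G preceded by a C" comparison can be sketched directly. These hypothetical helpers look only at guanines on the given strand (a full analysis would also count guanines on the opposite strand, i.e. cytosines here), and the toy sequence and mutation positions are made up.

```python
def cpg_fraction_genome(seq):
    """Fraction of guanines (after position 0) whose 5' neighbour is a cytosine."""
    g_positions = [i for i in range(1, len(seq)) if seq[i] == "G"]
    return sum(seq[i - 1] == "C" for i in g_positions) / len(g_positions)


def cpg_fraction_mutations(seq, mutated_positions):
    """Same fraction, restricted to 0-based positions of mutated guanines."""
    hits = [seq[p - 1] == "C" for p in mutated_positions if p > 0 and seq[p] == "G"]
    return sum(hits) / len(hits)


def cpg_enrichment(seq, mutated_positions):
    """Above 1 means mutations hit CpG guanines more often than chance predicts."""
    return cpg_fraction_mutations(seq, mutated_positions) / cpg_fraction_genome(seq)


if __name__ == "__main__":
    toy = "ACGTTGACGGAG"
    print(cpg_fraction_genome(toy))    # 2 of 5 guanines follow a C
    print(cpg_enrichment(toy, [2, 8]))
```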
WHAT DOES THIS ALL MEAN?
“Thus, the sequence context of the 23,000 mutations in the NCI-H209 genome provides tremendous power to identify multiple distinctive mutation signatures, not evident from targeted re-sequencing studies of limited genomic regions.”
It’s getting complicated (still), so let’s recap:
Most transversion mutations (34% of total) are G>T
The G>T mutations often happen at CpG sites
The CpG sites where G>T mutations happen are often methylated.
So how does methylation play into all this?
• Only 10–20% of CpG dinucleotides in CpG islands are methylated, while 60–70% of CpG sites outside the islands are methylated.
• This provides a model to see how methylation of CpG sites affects C>T mutations.
In other words, let’s compare the frequency of C>T mutations inside and outside CpG islands to see how methylation affects mutation.
[Figure: DNA strands contrasting a non-CpG-island region (60-70% of CpGs methylated) with a CpG island (10-20% methylated)]
[Figure: CpG mutations in non-CpG islands vs. CpG islands]
Fewer CpG mutations in CpG islands than at CpGs outside islands.
Non-CpG island: 60-70% methylated, more C>T mutation
CpG island: 10-20% methylated, less C>T mutation
Fewer C>T mutations in the islands… and there is less methylation in the islands…
…suggesting that C>T mutations preferentially occur at methylated CpGs
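Because CpG islands pack far more CpGs per kilobase, comparing raw mutation counts inside and outside islands would be misleading; rates need to be normalized by the number of CpG sites in each compartment. A minimal sketch with hypothetical counts and illustrative function names:

```python
def rate_per_cpg(mutations, cpg_sites):
    """C>T mutations per CpG site in one genomic compartment."""
    return mutations / cpg_sites


def outside_to_inside_ratio(mut_in, cpg_in, mut_out, cpg_out):
    """How many times higher the per-CpG C>T rate is outside CpG islands than inside."""
    return rate_per_cpg(mut_out, cpg_out) / rate_per_cpg(mut_in, cpg_in)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the calculation
    ratio = outside_to_inside_ratio(mut_in=5, cpg_in=1_000_000, mut_out=500, cpg_out=20_000_000)
    print(round(ratio, 2))
```

A ratio above 1, in the same direction as the methylation difference, is the pattern described above; a formal comparison would also need a statistical test.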
Can’t we fix this???
• Bulky adducts on purines are the most common source of DNA damage from tobacco carcinogens.
• These bulky adducts get in the way of RNA polymerase.
• When RNA polymerase stalls, it recruits nucleotide excision repair machinery, which excises a short stretch of DNA containing the altered nucleotide, preventing mutation.
• The more expression, the more repair.
• Repair occurred less frequently in non-transcribed regions than in transcribed regions, so transcribed regions carry fewer mutations (good!).
G>A mutations
• Mutations occurred about equally on the transcribed and non-transcribed strands
• Mutations on both strands were significantly reduced in more highly expressed genes
A>G mutations
• Transcribed-strand mutations decreased with higher gene expression
• Non-transcribed-strand mutations stayed relatively level
• This suggests at least two separate DNA repair pathways, reflecting “distinct physicochemical effects on DNA structure, with variable recognition and excision by the genome surveillance machinery.”
Genomic Rearrangements & Copy Number
The NCI-H209 genome has 58 somatic genome rearrangements
• 18 deletions (31%)
• 9 tandem duplications (16%)
• 15 inverted intrachromosomal rearrangements (26%)
• 9 non-inverted intrachromosomal rearrangements (16%)
• 7 interchromosomal rearrangements (12%)
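Rearrangement types like these are typically inferred from discordant read pairs. The sketch below assumes a forward-reverse (FR) paired-end convention, in which a normal pair has the left read on + and the right read on -; SOLiD mate-pair libraries use a different orientation, so a real pipeline would remap accordingly. It covers only deletion-type, tandem-duplication-type, inversion-type and interchromosomal pairs, not the finer categories listed above, and the function and threshold are hypothetical.

```python
from collections import Counter


def classify_pair(chrom1, pos1, strand1, chrom2, pos2, strand2, max_insert=1_000):
    """Classify a read pair under an assumed forward-reverse (FR) library convention."""
    if chrom1 != chrom2:
        return "interchromosomal"
    # Order the two reads by genomic position
    if pos1 > pos2:
        pos1, strand1, pos2, strand2 = pos2, strand2, pos1, strand1
    if strand1 == strand2:
        return "inversion"  # +/+ or -/-
    if strand1 == "+":  # +/- : reads face each other
        return "concordant" if pos2 - pos1 <= max_insert else "deletion"
    return "tandem duplication"  # -/+ : reads face away from each other


if __name__ == "__main__":
    pairs = [
        ("1", 100, "+", "1", 50_000, "-"),
        ("1", 100, "-", "1", 50_000, "+"),
        ("1", 100, "+", "4", 9_000, "-"),
    ]
    print(Counter(classify_pair(*p) for p in pairs))
```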
Figure 3. Rearrangements between chromosomes 1 & 4
[Legend: intrachromosomal inversions; non-inverted intrachromosomal rearrangements; interchromosomal rearrangements]
Not classical translocations:
• Clear boundaries separating changes in copy number in genes on both chromosomes
• Breakpoints between chromosomes aren't reciprocal
• Unbalanced rearrangements
Oncogenic Fusion Genes
Oncogenic fusion gene: a hybrid gene formed from two previously separate genes
Chromosomal rearrangements can result in an oncogenic fusion gene if:
• The 2 genes are brought side by side
• The ORF remains intact
• The genes are in the same orientation
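The "intact ORF" condition can be reduced to a reading-frame check at the junction. A sketch under stated assumptions: the breakpoints already fall within two genes, the 5' partner contributes coding sequence counted from its start codon, and the 3' partner's retained exons begin a known number of coding bases into its own ORF. Parameter names are hypothetical; when the 5' partner contributes only non-coding exons (as with PVT1), this check does not apply.

```python
def fusion_in_frame(five_prime_cds_bases, three_prime_cds_offset):
    """True if the 3' partner's codons stay in frame across the junction.

    five_prime_cds_bases: coding bases the 5' partner contributes, from its start codon.
    three_prime_cds_offset: coding bases of the 3' partner upstream of its first retained exon.
    """
    return five_prime_cds_bases % 3 == three_prime_cds_offset % 3


def could_form_fusion(same_orientation, five_prime_cds_bases, three_prime_cds_offset):
    """Combine the orientation and open-reading-frame conditions."""
    return same_orientation and fusion_in_frame(five_prime_cds_bases, three_prime_cds_offset)


if __name__ == "__main__":
    print(could_form_fusion(True, 300, 150))  # both junction phases are 0
    print(could_form_fusion(True, 301, 150))  # frameshift at the junction
```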
NCI-H209
Fusion gene from a 240 kb deletion on chromosome 16:
• 1st 2 exons of CREBBP
• 3' portion of BTBD12
RT-PCR showed expression of the fusion transcript
This fusion was not expressed in 55 other SCLCs
Should further studies be directed here????
Figure 4.
CHD7 significance
CHD7 codes for a chromodomain helicase DNA-binding protein
NCI-H209:
• 39.5 kb tandem duplication of exons 3-8 of CHD7 (Figure 4a & 4c)
NCI-H2171:
• Fusion gene of exons 1-3 of PVT1 (non-coding RNA gene immediately downstream of MYC) & exons 4-38 of CHD7 (Figure 4c), with MYC amplification
LU-135:
• Fusion gene of exon 1 of PVT1 (non-coding RNA gene immediately downstream of MYC) & exons 14-38 of CHD7 (Figure 4c), with MYC amplification
This suggests that CHD7 rearrangements are a recurrent phenomenon in SCLC
Figure 4.
LU-135
LU-135 studied by mate-pair sequencing showed:
• Fusion gene of exon 1 of PVT1 (non-coding RNA gene immediately downstream of MYC) & exons 14-38 of CHD7
• CHD7 amplicon linked to MYC amplification and expression
• MYC codes for a transcription factor that regulates expression of multiple genes
• Rearrangements resulted in increased expression of MYC & the 3' end of CHD7
Figure 4.
• NCI-H2171 & LU-135 show elevated levels of CHD7 expression
• SCLCs in general have greater normalized expression of CHD7 than non-SCLCs & other tumor types
CHD7 Summary
• CHD7 rearrangements found in 3 SCLC cell lines
• LU-135 & NCI-H2171: have PVT1-CHD7 fusion genes + MYC amplification
– PVT1 is downstream of MYC & may be a transcriptional target of the MYC protein
– Insertion of CHD7 with subsequent amplification increases copy number of the gene & its regulatory elements
– OVEREXPRESSION
• NCI-H209: duplication of parts of the CHD7 gene
• CHD7 is a chromatin remodeller that binds enhancers marked by histone methylation and promotes enhancer-mediated transcription
• Histone modifiers have been implicated as cancer genes previously
• Rearrangements of CHD7 would make for an interesting extension of this paper
Summary
• Each carcinogen-induced mutation is the consequence of three processes:
– Chemical modification of a purine
– Failure to repair via surveillance pathways
– Incorrect nucleotide incorporation due to base distortion during DNA replication
Summary
• Transcription-coupled repair
– Stalled RNA polymerase triggers repair (observed in NCI-H209)
– A>G mutations
• Expression-linked repair
– More effective in highly transcribed regions
– G>A mutations
• Combined
– G>T and A>T mutations
After Thought
• If lung cancer develops after 50 pack-years of smoking
– 1 pack-year = 7,300 cigarettes (a pack a day for a year)
• On average, you acquire one mutation for every 15 cigarettes smoked
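The one-mutation-per-15-cigarettes figure follows from simple arithmetic, assuming a pack holds 20 cigarettes and dividing by the 22,910 somatic mutations found in NCI-H209:

```latex
\begin{aligned}
7{,}300~\tfrac{\text{cigarettes}}{\text{pack-year}} &= 20~\tfrac{\text{cigarettes}}{\text{day}} \times 365~\text{days}\\
50~\text{pack-years} \times 7{,}300~\tfrac{\text{cigarettes}}{\text{pack-year}} &= 365{,}000~\text{cigarettes}\\
\frac{365{,}000~\text{cigarettes}}{22{,}910~\text{mutations}} &\approx 15.9~\tfrac{\text{cigarettes}}{\text{mutation}}
\end{aligned}
```

This rounds to roughly one mutation per 15-16 cigarettes, and it treats every somatic mutation as smoking-induced, which is itself an approximation.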
Questions?