ALEXA-Seq analysis reveals breast cell type specific mRNA isoforms
Malachi Griffith
Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, Canada
www.AlexaPlatform.org
30 Sept. 2010

In most genes, transcript diversity is generated by alternative expression
[Figure panels: Gene expression; Types of alternative expression]

Transcript variation is important to the study of human disease
• Alternative expression generates multiple distinct transcript variants from most human loci
• Specific transcript variants may represent useful therapeutic targets or diagnostic markers (Venables, 2006)

Massively parallel RNA sequencing
• Tissues/Cell Lines: Luminal, Myoepithelial, vHMECs, hESCs
• Isolate RNAs
• Generate cDNA, fragment, size select, add linkers
• Sequence ends
• Map to genome, transcriptome, and predicted exon junctions
• Discover isoforms and measure abundance
• 263 million paired reads; 21 billion bases of sequence

Pipeline overview

What is an ALEXA-Seq sequence ‘feature’?
Summary of features for human: ~4 million total (14% ‘known’)
• 37k genes
• 62k transcripts
• 278k exons
• 2,210k exon junctions
• 407k alternative exon boundaries
• 560k intron regions
• 227k intergenic regions

Data analyzed to date
• ALEXA-Seq processing: 19 projects
  – REMC + 18 others
• 105 libraries (200+ lanes)
• 3.9 billion paired-end reads
• 36-mers to 75-mers

Output
• Expression, differential expression and alternative expression values for 3.8 million features for each library processed
• Library quality analysis
• Number of features expressed (above background)
  – Genes, transcripts, exon regions, junctions, etc.
• Differential gene expression
  – Ranked lists
• Alternative expression
  – Ranked lists
  – Alternative isoforms involving exon skipping, alternative transcript initiation sites, etc.
  – Known or predicted novel isoforms
• Candidate peptides
  – Ranked lists

ALEXA-Seq data browser (using REMC analysis as an example)
• Goals
  – Visualization, interpretation, design of validation experiments, distribution of results to internal/external collaborators
• What kinds of questions does ALEXA-Seq allow us to ask/answer?
• http://www.alexaplatform.org/alexa_seq/Breast/Summary.htm

Is the RNA-Seq library suitable for alternative expression analysis?
• Library summary
• Read quality
• Tag redundancy
• End bias
• Mapping rates
• Signal-to-noise
• hnRNA & gDNA contamination
• Features detected

Is my favorite gene expressed? Alternatively expressed?

What are the most highly expressed genes, exons, etc. in each library?
• Expression
• Differential expression
• Alternative expression
• Provided for each feature type (gene, exon, junction, etc.)
• Ranked lists of events

e.g. most highly expressed genes

What are the top DE and AE genes for each tissue comparison?
• Candidate genes
• Each comparison
• DE or AE events
• Gains or losses

Summary page for vHMECs vs. Luminal

Candidate features gained in vHMECs
vHMECs vs. Luminal
[Figure label: CD10]

Which exons/junctions and corresponding peptides might be suitable for antibody design?

Candidate peptides gained in vHMECs
vHMECs vs.
Luminal

Example housekeeping gene (Actin; no change)

CD10 (used to sort myoepithelial cells)
[Figure tracks: Myoepithelial & vHMECs; Luminal]
422-fold higher in Myoepithelial than Luminal

CD227 (used to sort luminal epithelial cells)
[Figure tracks: Luminal; Myoepithelial]

Differential gene expression of CASP14 (Caspase 14 gained in vHMECs)

Novel skipping of PTEN exon 6

Exon 12 skipping of DDX5 (p68)

Tissue specific isoforms of CA12
[Figure tracks: Myoepithelial; vHMECs; Luminal]

Alternative first exons of INPP4B

Alternative first exons of SERPINB7

FERM domain containing proteins are alternatively expressed
• FRMD6, FRMD4A, FRMD4B are AE
• FRMD3, FRMD8 are DE

Novel isoforms observed only in vHMECs
[Figure labels: junctions E6-E10 and E7-E10]

How reliable are predictions from ALEXA-Seq?
• Are novel junctions real?
  – What proportion validate by RT-PCR and Sanger sequencing?
• Are differential/alternative expression changes observed between tissues accurate?
  – How well do DE values correlate with qPCR?
• To answer these questions we performed ~400 validations of ALEXA-Seq predictions from a comparison of two cell lines…

Validation (qualitative)
33 of 189 assays shown. Overall validation rate = 85%

Validation (quantitative)
qPCR of 192 exons identified as alternatively expressed by ALEXA-Seq
Validation rate = 88%

Conclusions
• The ALEXA-Seq approach provides a comprehensive global transcriptome profile
  – Input: paired-end RNA sequence data
  – Output: expression, differential expression, alternative expression, candidate peptides, etc.
• Detection of both known and novel isoforms
  – Subset that differ between conditions
• Predictions are highly accurate
  – 86% validation rate by RT-PCR, qPCR and Sanger sequencing
• www.AlexaPlatform.org

Acknowledgements
Griffith M, Griffith OL, Morin RD, Tang MJ, Pugh TJ, Ally A, Asano JK, Chan SY, Li I, McDonald H, Teague K, Zhao Y, Zeng T, Delaney AD, Hirst M, Morin GB, Jones SJM, Tai IT, Marra MA. Alternative expression analysis by RNA sequencing.
In review (Nature Methods).

• Supervisor: Marco Marra
• Committee: Joseph Connors, Stephane Flibotte, Steve Jones, Gregg Morin
• Bioinformatics: Obi Griffith, Ryan Morin, Rodrigo Goya, Allen Delaney, Gordon Robertson, Richard Corbett
• Sequencing: Martin Hirst, Thomas Zeng, Yongjun Zhao, Helen McDonald
• Laboratory: Trevor Pugh, Tesa Severson
• Neuroblastoma: Olena Morozova, Marco Marra
• Morgen: Pamela Hoodless, Jacquie Schein, Inanc Birol, Gordon Robertson, Shaun Jackman
• 5-FU resistance: Michelle Tang, Isabella Tai, Marco Marra
• Iressa and Sutent: Obi Griffith, Steven Jones
• Multiple Myeloma: Rodrigo Goya, Marco Marra
• Lymphoma: Ryan Morin, Marco Marra
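
Supplementary note: consistency of the validation rates
The 86% rate quoted in the conclusions can be approximated as an assay-weighted average of the qualitative rate (85% of 189 assays) and the quantitative rate (88% of 192 exons). The sketch below is not part of ALEXA-Seq. It assumes that the overall figure pools exactly these two assay sets, which the slides do not state explicitly, and it works from the rounded percentages shown. The function name is hypothetical.

```python
def pooled_validation_rate(groups):
    """Assay-weighted average of per-group validation rates.

    groups: list of (n_assays, rate) pairs, with rate as a fraction in [0, 1].
    (Hypothetical helper for illustration; not an ALEXA-Seq function.)
    """
    total = sum(n for n, _ in groups)
    if total == 0:
        raise ValueError("no assays")
    validated = sum(n * rate for n, rate in groups)
    return validated / total


# Figures from the validation slides; the percentages are rounded,
# so the pooled value is only approximate.
qualitative = (189, 0.85)   # RT-PCR / Sanger assays
quantitative = (192, 0.88)  # qPCR exons

overall = pooled_validation_rate([qualitative, quantitative])
print(f"{overall:.1%}")
```

With these rounded inputs the pooled rate comes out between 86% and 87%, which agrees with the reported 86% once rounding of the two input percentages is taken into account.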