Using Long Read Transcriptome Sequencing for LncRNA Prediction in Non-model Organisms
Richard Kuo
1/19/2016

What are lncRNAs?
• Arbitrarily defined as RNAs longer than 200 bp that do not code for protein
• Excludes pseudogenes and nonsense-mediated decay products
• Usually categorized by genomic relationship to protein coding genes
• Largely unknown territory (classes within this category may differ more from each other than from protein coding genes)
[Figure: lncRNA classes relative to a protein coding gene: sense overlapping, intronic, intergenic, anti-sense]

lncRNA in silico discovery
• Motif Method
  – Approaches and issues
    • Look for structural evidence
      – Structural possibility does not mean non-coding
    • Look for sequence binding motif
      – Could still be protein coding
    • Look for sequence similarity to known lncRNA
      – Not enough known about lncRNA and low evolutionary conservation
• Subtractive Method
  – Approaches and issues
    • Find protein coding evidence, if none then label lncRNA
      – Requires accurate full length transcript sequence
      – Assumes we know all proteins
      – In phylogenomic approaches need enough interspecies multiple sequence alignments

Short Read VS Long Read Data
• Short Read Data
  – Usually not stranded
  – Issues with constructing correctly spliced model
  – Issues with transcription start and end
• Long Read Data
  – Full length transcript sequence with accurate splice junctions

Long Read Technologies
• Nanopore Sequencing
  – Maximum read length of about 100 kb
  – Poorly characterized error rates
  – More expensive per base than PacBio
  – Still not widely available and issues with quality control
• Pacific Biosciences SMRT Sequencing (Isoseq)
  – 10 kb average with 30 kb possible
  – 15% error rate, mostly insertions/deletions
  – Circular sequencing
  – Size selection is advised

Using PacBio Isoseq
• Library preparation
  – Poly-A tail selected
  – Optional 5' cap selection
  – Size selection
• Analysis
  – Create Read of Insert (ROI) from circular sequences
  – Remove non-full length and chimeric sequences
  – Iterative Clustering for Error correction (ICE)
  – Map sequences to genome using GMAP
  – Resolve redundancies

Our lncRNA pipeline
• Three methods for finding protein coding evidence
  – Protein sequence similarity
  – Coding Potential Calculator (CPC)
  – Coding-Potential Assessment Tool (CPAT)
• If transcript does not have evidence from any of those methods, label as lncRNA
• Can continue to add different tools to the criteria
• Conservative pipeline (low sensitivity but high specificity)

Protein Sequence Similarity
• Make list of ORFs
• Rank list by length
• Convert ORFs to amino acid sequence
• Blastp against UniRef90
  – For non-model organisms some species specific proteins may not be represented but evidence may come from CPC or CPAT
• Any Blastp hits counted as protein coding evidence

Coding Potential Calculator
• Developed in 2007 by Lei Kong
• Uses 6 metrics for prediction
  – 3 ORF based
    • Log-odds score
    • Coverage of predicted ORF
    • Integrity of predicted ORF
  – 3 protein similarity based
    • Number of protein hits
    • Hit score
    • Frame score

Coding-Potential Assessment Tool
• Developed in 2013 by Liguo Wang
• Only uses sequence analysis
• 4 metrics
  – ORF (Open Reading Frame) size
  – ORF coverage
  – Fickett TESTCODE statistic
  – Hexamer usage bias
• Requires training data set with annotated protein coding and non-coding RNA
• Requires selection of arbitrary threshold

Image from: http://s3-production.bobvila.com/blogs/wp-content/uploads/2013/05/Wrench.jpg
Image from: https://upload.wikimedia.org/wikipedia/commons/3/3d/Casio-fx115ES-5564.jpg

Method Comparison
[Figure: comparison of protein coding evidence from Uniprot, CPAT and CPC. Values shown in the figure: 697, 1953, 427, 20539, 3610, 17069, 78]

lncRNA Summary
• From 2 sample types (Brain and Embryo)
• 20,539 lncRNA predicted in Chickens
• 1,822 Anti-sense to Ensembl gene

Ensembl 75
Species    Transcripts    lncRNA
Chicken    17954          0
Human      215170         12101
Mouse      94929          2538

lncRNA Lengths
[Figure: histogram; x-axis Transcript Length (kb), 0.5 to 6.5; y-axis # of Transcripts, 0 to 8000]

lncRNA Number of Exons
[Figure: pie chart]
Exons    Share of lncRNA
1        72%
2        16%
3        6%
4        3%
5        1%
6+       2%

LncRNA per Chromosome
[Figure: bar chart; x-axis Chromosome (1 to 28, W, Z); y-axis # lncRNA, 0 to 3500]
[Figure: bar chart, normalized; x-axis Chromosome (1 to 28, W, Z); y-axis # lncRNA/Chromosome Size, 0 to 0.00006]

lncRNA Functional SNP
[Figure]

Antisense lncRNA
[Figure]

Alternative Splice Variant lncRNA
[Figure]

Immune lncRNA
- Male Neg vs Male Inf
- Neg has 0 expression, Inf has significant expression
- Ensembl gene: tumor necrosis factor receptor superfamily member 21 precursor

Discussion
• Long read transcriptome sequencing is a powerful tool for predicting lncRNA using ORF methods
• Focus on high confidence lncRNA predictions first
• Compare lncRNA annotations with RNAseq sets to predict function
• More known lncRNAs will make it easier to find more (better training sets)

Acknowledgement
Thank you!
Professor Dave Burt
Professor Alan Archibald
Bob Paton
Dr. Jacqueline Smith
Dr. Lel Eory
Choon-Kiat Khoo
Pip Beard - TC Chair
Tom Freeman - Expert Advisor
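
Supplementary sketch: the subtractive rule from the "Our lncRNA pipeline" and "Protein Sequence Similarity" slides can be illustrated roughly as below. This is a minimal Python sketch under stated assumptions, not the pipeline used in the talk. The function names (`find_orfs`, `translate`, `label_transcript`) are hypothetical, ORFs are scanned on the forward strand only (assuming the long reads are already oriented), and the Blastp, CPC and CPAT results are passed in as plain booleans rather than computed, since those tools run externally.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def find_orfs(transcript, min_codons=2):
    """Return ATG-to-stop ORFs from the three forward frames, longest first.

    Assumption: forward strand only; min_codons is an arbitrary
    illustrative cutoff, not a value from the slides.
    """
    seq = transcript.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and CODON_TABLE.get(codon) == "*":
                if (i - start) // 3 >= min_codons:
                    orfs.append(seq[start:i + 3])
                start = None  # stop codon closes the ORF
    # "Rank list by length"
    return sorted(orfs, key=len, reverse=True)


def translate(orf):
    """Convert an ORF to its amino acid sequence (unknown codons become X)."""
    aas = [CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, len(orf) - 2, 3)]
    return "".join(aas).rstrip("*")


def label_transcript(evidence):
    """Subtractive rule: any protein coding evidence wins, otherwise lncRNA.

    evidence maps a method name (e.g. "blastp", "cpc", "cpat") to True when
    that method reported protein coding evidence. The values here are
    supplied by the caller; this sketch does not run the tools.
    """
    return "protein_coding" if any(evidence.values()) else "lncRNA"
```

In use, the peptides from `translate` would be written out for a Blastp search against UniRef90, and a transcript would be labeled with something like `label_transcript({"blastp": False, "cpc": False, "cpat": False})`. Adding another tool to the criteria only means adding another key to the dictionary, which mirrors the "can continue to add different tools" point.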