Alternative Splicing
As an introduction to microarrays

Human Genome
• About 90,000 human proteins; the number of genes was initially assumed to be near that figure (initial estimates 153,000)
• The 1000-cell roundworm Caenorhabditis elegans has 19,500 genes; corn has 40,000 genes
• Current estimates are 25,000 or fewer human genes
• Alternative splicing allows different tissue types to perform different functions with the same assortment of genes

Implications
• 75% of human genes are subject to alternative splicing
• Faulty gene splicing leads to cancer and congenital diseases
• Gene therapy can use splicing

Application
• We talked before about apoptosis, which is triggered when the cell determines it can't be repaired
• Bcl-x, a regulator of apoptosis, is alternatively spliced to produce either Bcl-x(L), which suppresses apoptosis, or Bcl-x(S), which promotes it

Spliceosome
• Five snRNA molecules (U1, U2, U4, U5, U6) combine with as many as 150 proteins to form the spliceosome
• It recognizes sites where introns begin and end
– Cuts introns out of pre-mRNA
– Joins exons
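The two spliceosome steps above (cut introns out, join exons) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the sequence is invented, the exon coordinates are hypothetical, and the function `splice` is not from any library. It shows only how keeping or skipping an exon yields two different mRNAs from one pre-mRNA; exon skipping is one form of alternative splicing among several.

```python
def splice(pre_mrna, exons, keep):
    """Join the kept exons of a pre-mRNA and drop everything else.

    pre_mrna: the unspliced transcript as a string.
    exons:    list of (start, end) half-open coordinates, in transcript order.
    keep:     indices into `exons` naming the exons this isoform retains.
    """
    return "".join(pre_mrna[exons[i][0]:exons[i][1]] for i in keep)


# Toy pre-mRNA (invented): uppercase = exon, lowercase = intron.
# Each intron starts with "gu" and ends with "ag", the usual splice-site dinucleotides.
pre_mrna = "AUGGCC" + "guaaguuuag" + "GAUUAC" + "guccccag" + "UGGUAA"
exons = [(0, 6), (16, 22), (30, 36)]  # hypothetical coordinates of the three exons

full = splice(pre_mrna, exons, keep=[0, 1, 2])  # every exon retained
skipped = splice(pre_mrna, exons, keep=[0, 2])  # middle exon skipped

print(full)     # AUGGCCGAUUACUGGUAA
print(skipped)  # AUGGCCUGGUAA
```

The real spliceosome recognizes splice sites from sequence signals; here the coordinates are simply given, which keeps the sketch short but skips the recognition step entirely.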
Spliceosome
• The 5’ splice site is at the beginning of the intron; the 3’ splice site is at the end
• The average human protein-coding gene is 28,000 nucleotides long, with 8.8 exons separated by 7.8 introns
• Exons are about 120 nucleotides long, while introns range from 100 to 100,000 nucleotides

Splicing errors
• Familial dysautonomia results from a single-nucleotide mutation that causes a gene to be alternatively spliced in nervous system tissue
• The resulting decrease in the IKBKAP protein leads to abnormal nervous system development (half of patients die before age 30)
• More than 15% of the gene mutations that cause genetic diseases and cancers do so through splicing errors

Why splicing
• Each gene generates, on average, 3 alternatively spliced mRNAs
• Why so much intron (1-2% of genome is exons)?
• Mouse and human differences are almost all splicing
• Half of the human genome is made up of transposable elements, Alus being the most abundant (1.4 million copies)
– They continue to multiply and insert themselves into the genome at the rate of one insertion per 100 human births
• Mutations in an Alu can create a 5’ or 3’ splice site in an intron, causing part of it to become an exon
• This mutation doesn’t impact existing exons
• It only has an effect when the new exon is alternatively spliced in

Microarrays For Alt. Splicing
• Use short oligonucleotides
• Get a guess at the rate of expression of the oligo
[Figure: a gene with Exon 1 through Exon 5]

Affymetrix Microarrays For Alt. Splicing
[Figure: a gene with Exons 1-5 and its two isoforms]
Isoform 1: Exon 1, Exon 2, Exon 4, Exon 5
Isoform 2: Exon 1, Exon 3, Exon 5
Probe types: Constitutive, Exon, Junction, Unique (“Cassette”)

Ideal Microarray Readings
[Figure: ideal expression readings for probes a-e on Isoform 1 (Exon 1, Exon 2, Exon 4, Exon 5) and Isoform 2 (Exon 1, Exon 3, Exon 5); axes: Probe, Expression; same probe types as above]

Motivation
• Why alternatively splice?
• How does it affect the resulting proteins?
• Look at domains:
– High-level summary of protein
– ~80% of eukaryotic proteins are multidomain
– Domains are big relative to an exon

Some Previous Work
• Signatures of domain shuffling in the human genome. Kaessmann, 2002. Intron phase symmetry around domain boundaries.
• The Effects of Alternative Splicing On Transmembrane Proteins in the Mouse Genome. Cline, 2004. Half of the TM proteins studied were affected by alternative splicing.

Method
• Predict Alternative Splicing
• Predict Protein Domains
• Look for effects of Alt-Splicing on predicted domains
– “Swapping”
– “Knockout”
– “Clipping”

Microarray Design
• Genes based on mRNA and EST data in mouse
• Mapped to Feb. 2002 mouse genome freeze
• ~500,000 probes (~66,000 sets)
• ~100,000 transcripts
• ~13,000 gene models

Technical work
[Figure: Genome Space. Provided data: gene models, transcripts, probes. Generated data: probe-to-transcript mapping, by overlap. Example identifiers: E@NM_021320, cc-chr10-000017.82.0, G6836022@J911445, cc-chr10-000017.91.1, G6807921@J911524_RC, cc-chr10-000018.4.0]

Predicting Alternative Splicing
• Using mouse alt-splicing microarrays
• Data from Manny Ares
– 8 tissues
– 3 replicates of each tissue

Predicting Alternative Splicing
• General Approach: Clustering, then Anti-Clustering
107 Clusters

Detail View

Gene Expression Measurement
• mRNA expression represents dynamic aspects of the cell
• mRNA expression can be measured with the latest technology
• mRNA is isolated and labeled with fluorescent protein
• mRNA is hybridized to the target; the level of hybridization corresponds to light emission, which is measured with a laser

Gene Expression Microarrays
The main types of gene expression microarrays:
• Short oligonucleotide arrays (Affymetrix)
• cDNA or spotted arrays (Brown/Botstein)
• Long oligonucleotide arrays (Agilent Inkjet)
• Fiber-optic arrays
• ...
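The "Technical work" slide describes mapping probes to transcripts by overlap in genome space. A minimal sketch of that idea follows. It is not the authors' pipeline: the names (`Interval`, `map_probes_to_transcripts`, `tx_A`, `probe_1`) and all coordinates are hypothetical, and each transcript is treated as a single genomic interval, which ignores its exon structure.

```python
from collections import namedtuple

# A named feature on a chromosome; coordinates are half-open [start, end).
Interval = namedtuple("Interval", "name chrom start end")


def overlaps(a, b):
    """True when two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def map_probes_to_transcripts(probes, transcripts):
    """Return {probe name: [names of transcripts the probe overlaps]}."""
    mapping = {p.name: [] for p in probes}
    for p in probes:
        for t in transcripts:
            if overlaps(p, t):
                mapping[p.name].append(t.name)
    return mapping


# Hypothetical data, invented for illustration only.
transcripts = [
    Interval("tx_A", "chr10", 100, 500),
    Interval("tx_B", "chr10", 450, 900),
]
probes = [
    Interval("probe_1", "chr10", 120, 145),  # inside tx_A only
    Interval("probe_2", "chr10", 460, 485),  # inside both transcripts
    Interval("probe_3", "chr10", 950, 975),  # overlaps nothing
]

mapping = map_probes_to_transcripts(probes, transcripts)
print(mapping)
# {'probe_1': ['tx_A'], 'probe_2': ['tx_A', 'tx_B'], 'probe_3': []}
```

The nested loop compares every probe with every transcript. At the scale quoted above (~500,000 probes, ~100,000 transcripts) a sorted sweep or an interval index would be the more practical choice.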
Affymetrix Microarrays
[Figure: raw image; labeled dimensions 1.28 cm and 50 µm]
• ~10^7 oligonucleotides; half Perfectly Match mRNA (PM), half have one Mismatch (MM)
• Raw gene expression is the intensity difference: PM - MM

Microarray Potential Applications
• Biological discovery
– new and better molecular diagnostics
– new molecular targets for therapy
– finding and refining biological pathways
• Recent examples
– molecular diagnosis of leukemia, breast cancer, ...
– appropriate treatment for genetic signature
– potential new drug targets

Microarray Data Analysis Types
• Gene Selection
– find genes for therapeutic targets
– avoid false positives (FDA approval?)
• Classification (Supervised)
– identify disease
– predict outcome / select best treatment
• Clustering (Unsupervised)
– find new biological classes / refine existing ones
– exploration
• …

Microarray Data Mining Challenges
• Too few records (samples), usually < 100
• Too many columns (genes), usually > 1,000
• Too many columns are likely to lead to false positives
• For exploration, a large set of all relevant genes is desired
• For diagnostics or identification of therapeutic targets, the smallest set of genes is needed
• The model needs to be explainable to biologists
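The raw expression value described on the Affymetrix slide, the intensity difference PM - MM, can be sketched as below. The slides give only the per-pair difference. Averaging the differences over the probe pairs of one probe set is an assumption added here to get a single number per gene, and the intensity values are invented.

```python
def raw_expression(pm, mm):
    """Average PM - MM difference over the probe pairs of one probe set.

    pm: perfect-match intensities, one per probe pair.
    mm: mismatch intensities, in the same order as `pm`.
    Averaging over pairs is an assumption; the slides state only PM - MM.
    """
    if len(pm) != len(mm):
        raise ValueError("pm and mm must have one value per probe pair")
    if not pm:
        raise ValueError("at least one probe pair is required")
    diffs = [p - m for p, m in zip(pm, mm)]
    return sum(diffs) / len(diffs)


# Invented intensities for a probe set with four probe pairs.
pm = [1200.0, 950.0, 1100.0, 1010.0]
mm = [300.0, 280.0, 350.0, 310.0]

print(raw_expression(pm, mm))  # 755.0
```

A difference can be negative when the mismatch probe is brighter than the perfect match, so downstream analysis needs a policy for such values before, for example, taking logarithms.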