Alternative Splicing
As an introduction to microarrays

Human Genome
• About 90,000 human proteins; the number of genes was initially assumed to be near that figure (initial estimates 153,000)
• The 1,000-cell roundworm Caenorhabditis elegans has 19,500 genes; corn has 40,000 genes
• Current estimates are 25,000 or fewer human genes
• Alternative splicing allows different tissue types to perform different functions with the same assortment of genes

Implications
• 75% of human genes are subject to alternative splicing
• Faulty gene splicing leads to cancer and congenital diseases
• Gene therapy can use splicing

Application
• We talked before about apoptosis, which occurs when the cell determines it cannot be repaired
• Bcl-x, a regulator of apoptosis, is alternatively spliced to produce either Bcl-x(L), which suppresses apoptosis, or Bcl-x(S), which promotes it

Spliceosome
• Five snRNA molecules (U1, U2, U4, U5, U6) combine with as many as 150 proteins to form the spliceosome
• It recognizes sites where introns begin and end
  – Cuts introns out of pre-mRNA
  – Joins exons
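
The cut-and-join step on the Spliceosome slide, and the way one gene can yield two products as with Bcl-x(L) and Bcl-x(S), can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy sequence, the exon coordinates, and the `splice` helper are invented, and real splice-site recognition by the spliceosome is far more involved than slicing a string.

```python
# Toy pre-mRNA: exons in upper case, introns in lower case.
# Each intron starts with "gu" and ends with "ag", the canonical boundary
# dinucleotides. The sequence itself is made up for this example.
PRE_MRNA = "AUGGCC" + "guaaguag" + "UUUAAA" + "guccccag" + "GGGUGA"

# Half-open (start, end) coordinates of the three exons, in 5' to 3' order.
EXONS = [(0, 6), (14, 20), (28, 34)]


def splice(pre_mrna, exons, keep=None):
    """Join the selected exons and discard everything between them.

    keep: indices of the exons to retain; None retains all exons.
    """
    if keep is None:
        keep = range(len(exons))
    return "".join(pre_mrna[exons[i][0]:exons[i][1]] for i in keep)


# Constitutive splicing: all three exons are joined.
full_isoform = splice(PRE_MRNA, EXONS)

# Alternative splicing by exon skipping: the middle exon is left out.
skipped_isoform = splice(PRE_MRNA, EXONS, keep=[0, 2])

print(full_isoform)
print(skipped_isoform)
```

The same pre-mRNA gives two different mature mRNAs depending only on which exons are kept, which is the sense in which one gene can encode proteins with opposite functions.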

Spliceosome
• The 5’ splice site is at the beginning of the intron; the 3’ splice site is at the end
• The average human protein-coding gene is 28,000 nucleotides long, with 8.8 exons separated by 7.8 introns
• Exons are about 120 nucleotides long, while introns range from 100 to 100,000 nucleotides

Splicing errors
• Familial dysautonomia results from a single-nucleotide mutation that causes a gene to be alternatively spliced in nervous system tissue
• The resulting decrease in the IKBKAP protein leads to abnormal nervous system development (half of patients die before age 30)
• More than 15% of the gene mutations that cause genetic diseases and cancers are caused by splicing errors

Why splicing
• Each gene generates 3 alternatively spliced mRNAs on average
• Why so much intron (1-2% of the genome is exons)?
• Mouse and human differences are almost all splicing
• Half of the human genome is made up of transposable elements, Alus being the most abundant (1.4 million copies)
  – They continue to multiply and insert themselves into the genome at the rate of one insertion per 100 human births
• Mutations in an Alu can create a 5’ or 3’ splice site in an intron, causing it to become an exon
• This mutation doesn’t impact existing exons
• It only has an effect when it is alternatively spliced in

Microarrays For Alt. Splicing
• Use short oligonucleotides
• Get a guess at the rate of expression of the oligo
[Figure: gene with Exon 1, Exon 2, Exon 3, Exon 4, Exon 5 (Affymetrix)]

Microarrays For Alt. Splicing
[Figure: gene with Exon 1, Exon 2, Exon 4, Exon 5 and alternative Exon 3]
• Isoform 1: Exon 1, Exon 2, Exon 4, Exon 5
• Isoform 2: Exon 1, Exon 3, Exon 5
• Probe types: constitutive, junction, exon-unique (“cassette”)

Ideal Microarray Readings
[Figure: expression per probe (a to e) for Isoform 1 (Exon 1, Exon 2, Exon 4, Exon 5) and Isoform 2 (Exon 1, Exon 3, Exon 5); probe types: constitutive, junction, exon-unique (“cassette”)]

Motivation
• Why alternatively splice?
• How does it affect the resulting proteins?
• Look at domains:
  – High-level summary of protein
  – ~80% of eukaryotic proteins are multidomain
  – Domains are big relative to an exon

Some Previous Work
• Signatures of domain shuffling in the human genome. Kaessmann, 2002. Intron phase symmetry around domain boundaries
• The Effects of Alternative Splicing on Transmembrane Proteins in the Mouse Genome. Cline, 2004. Half of the TM proteins studied were affected by alternative splicing

Method
• Predict alternative splicing
• Predict protein domains
• Look for effects of alt-splicing on predicted domains
  – “Swapping”
  – “Knockout”
  – “Clipping”

Microarray Design
• Genes based on mRNA and EST data in mouse
• Mapped to Feb. 2002 mouse genome freeze
• ~500,000 probes (~66,000 sets)
• ~100,000 transcripts
• ~13,000 gene models

Technical work
[Figure: genome space; provided data: gene models, transcripts, probes; overlap; generated data: probe-to-transcript mapping; example identifiers: E@NM_021320 cc-chr10-000017.82.0, G6836022@J911445 cc-chr10-000017.91.1, G6807921@J911524_RC cc-chr10-000018.4.0]

Predicting Alternative Splicing
• Using mouse alt-splicing microarrays
• Data from Manny Ares
  – 8 tissues
  – 3 replicates of each tissue

Predicting Alternative Splicing
• General approach: clustering, then anti-clustering
[Figure: 107 clusters; detail view]

Gene Expression Measurement
• mRNA expression represents dynamic aspects of the cell
• mRNA expression can be measured with the latest technology
• mRNA is isolated and labeled with fluorescent protein
• mRNA is hybridized to the target; the level of hybridization corresponds to light emission, which is measured with a laser

Gene Expression Microarrays
The main types of gene expression microarrays:
• Short oligonucleotide arrays (Affymetrix)
• cDNA or spotted arrays (Brown/Botstein)
• Long oligonucleotide arrays (Agilent Inkjet)
• Fiber-optic arrays
• ...
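
For short-oligonucleotide arrays, the probe design from the earlier "Microarrays For Alt. Splicing" slides (constitutive, junction, and exon-unique probes) can be sketched as a lookup of which probes would ideally report signal for each isoform. The two isoforms are the ones given on those slides; the probe representation (an exon number for an exon probe, a pair of exon numbers for a junction probe) and all function names are assumptions made for this illustration, not the actual array design.

```python
# Exon composition of the two isoforms, as listed on the slides.
ISOFORMS = {
    "isoform_1": [1, 2, 4, 5],
    "isoform_2": [1, 3, 5],
}


def junctions(exons):
    """Return the exon-exon junctions present in a spliced transcript."""
    return set(zip(exons, exons[1:]))


def expected_signal(probe, exons):
    """Ideal (noise-free) reading: True if the probe's target is in the transcript.

    probe is an int for an exon probe, or a (donor, acceptor) tuple
    for a junction probe. This encoding is hypothetical.
    """
    if isinstance(probe, tuple):
        return probe in junctions(exons)
    return probe in exons


def exon_class(exon):
    """Label an exon 'constitutive' if every isoform contains it, else 'cassette'."""
    present = [exon in exons for exons in ISOFORMS.values()]
    if all(present):
        return "constitutive"
    return "cassette" if any(present) else "absent"


# Hypothetical probe set: two exon probes and two junction probes.
for probe in [1, 3, (2, 4), (1, 3)]:
    readings = {name: expected_signal(probe, exons) for name, exons in ISOFORMS.items()}
    print(probe, readings)
```

A junction probe such as (2, 4) can only hybridize where Exon 2 is joined directly to Exon 4, so in this idealized picture it distinguishes Isoform 1 from Isoform 2 even though probes for Exon 1 and Exon 5 cannot.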

Affymetrix Microarrays
[Figure: raw image, with scale labels 1.28 cm and 50 µm]
~10^7 oligonucleotides; half perfectly match the mRNA (PM), half have one mismatch (MM)
Raw gene expression is the intensity difference: PM - MM

Microarray Potential Applications
• Biological discovery
  – new and better molecular diagnostics
  – new molecular targets for therapy
  – finding and refining biological pathways
• Recent examples
  – molecular diagnosis of leukemia, breast cancer, ...
  – appropriate treatment for genetic signature
  – potential new drug targets

Microarray Data Analysis Types
• Gene Selection
  – find genes for therapeutic targets
  – avoid false positives (FDA approval?)
• Classification (Supervised)
  – identify disease
  – predict outcome / select best treatment
• Clustering (Unsupervised)
  – find new biological classes / refine existing ones
  – exploration
• ...

Microarray Data Mining Challenges
• Too few records (samples), usually < 100
• Too many columns (genes), usually > 1,000
• Too many columns are likely to lead to false positives
• For exploration, a large set of all relevant genes is desired
• For diagnostics or identification of therapeutic targets, the smallest set of genes is needed
• The model needs to be explainable to biologists
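
The false-positive challenge above can be made concrete with a little arithmetic. The sketch below assumes each gene is tested independently at a fixed significance level and that no gene is truly differentially expressed; the 0.05 level, the Bonferroni correction, and the function names are illustrative choices, not something the slides prescribe.

```python
def expected_false_positives(n_genes, alpha):
    """Expected number of genes passing the test by chance alone."""
    return n_genes * alpha


def prob_at_least_one_false_positive(n_genes, alpha):
    """Chance of one or more false positives, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_genes


def bonferroni_threshold(n_genes, alpha):
    """Per-gene significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_genes


n_genes = 1000   # "too many columns (genes), usually > 1,000"
alpha = 0.05     # assumed per-test significance level

print(expected_false_positives(n_genes, alpha))
print(prob_at_least_one_false_positive(n_genes, alpha))
print(bonferroni_threshold(n_genes, alpha))
```

With 1,000 genes and a 0.05 threshold, about 50 genes would look significant by chance, which is why gene selection for diagnostics typically needs a stricter per-gene threshold or a false-discovery-rate procedure.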