Slide 1. Genes and Expression
51:123
Terry Braun

Slide 2. Today's Outline
• Gene structure
  – genomic structure vs mRNA structure
  – ESTs
  – coding and noncoding exons
  – introns
  – primary transcript processing
  – memory mnemonic
  – alternative splicing and differential polyadenylation

Slide 3. Genome (3 Gb) – zoom in
Adenine, Thymine, Guanine, Cytosine (ATGC)
purines: A, G
pyrimidines: C, T
www.ensembl.org

Slide 4. Central Dogma
• gene – portions of a genome that affect the transcription, translation, and expression of functionally active molecules (proteins, DNA [promoters], rRNA, mRNA, tRNA, etc.)
• gene – often used to describe the “coding” regions of genomes – the portions of DNA that are “made” into a protein (via transcription and translation)
• DNA -> pre-mRNA -> mRNA -> protein

Slide 5. Central Dogma
• DNA -> pre-mRNA -> mRNA -> protein
  – DNA is “transcribed” into pre-mRNA
  – “introns” are removed
    • lariat structure
  – “exons” remain (“spliced together”), also called the “coding regions”
  – the result is called mRNA
    • splice site junctions
  – mRNA is “translated” into protein

Slide 6.
Schellenberg MJ, Ritchie DB, MacMillan AM. Pre-mRNA splicing: a complex picture in higher definition. Trends Biochem Sci. 2008 Jun;33(6):243-6. Epub 2008 May 9. Review.

Slide 7. Gene Structure: gene to protein

Slide 8. Example of Gene in Genomic Context
Context of gene – BBS4 – in the human genome.
Scale = 72.28 Kb
Exons and introns
Note possible upstream gene, on other strand
Less than 3% of the genome is transcribed and translated into a protein.

Slide 9. Human Genome Project
• Problem – How do you find all of the genes in a sea of DNA?

Slide 10. Where’s the gene?
>BBS4 exon2
TAAAGTAACTCTATCACAATATGGATTTAATGGATTAATTGCATAATTGGTGAGCTACTG
ATTATTCTTGTTATTTGGATGCTTCTTTAAGTTAGCAAGTTTATATTGTGGTGCTTCAAT
ATAGACTACTTATTTCATTTCAGAGAACTCAATTTCCTGTATCTACTGAGTCTCAAAAAC
CCCGGCAGAAAAAAGGTCTGTATGCAGTTTCATGGTATGTGTATGTTTGCACAGACAGAT
TTCTCTTTTATTTATTTATTTATTTTTTTTTTTGGAGGCAGAGTCTCACTGTCACCCAGG
CTGGAGTGCAGTAGCACAATCTTGGCTCACTGCAACCTTTGCCTCTGGGGCTCAAGCAAT
TCTCCTGCCTCAGCCTCCCGAGTAGCTGGGATTACAGGTGCACGCCACCACACCTGGCTA

Slide 12. ESTs
• Expressed Sequence Tags
• If we could read the sequence at only the front (5') or end (3') of mRNAs (transcripts), or even in the middle, that would be conclusive evidence of a gene
  – Uniquely (?) identify all of the genes
  – Avoids the full expense of sequencing the whole gene (100s of nucleotides vs. 1000s)
  – Can observe differences of expression in tissues
  – Many questioned whether the complete genome should even be sequenced

Slide 13. ESTs at Iowa
• Approach
  – Harvest mRNAs and sequence them
  – Subtract out what you have already seen (serial subtraction)
• Rat gene discovery at Iowa (2003)
  – 233,890 3-prime ESTs, 50,075 5-prime ESTs
  – 57,822 clusters (8/26/2003)
  – novelty = 57,822/(233,890 + 50,075) = 0.20

Slide 15. C-Value Paradox
Hartl, “Molecular melodies in high and low C,” Nat. Rev. Genetics, Nov 2001
• refers to the massive, counterintuitive and seemingly arbitrary differences in genome size observed in eukaryotic organisms
  – Drosophila melanogaster 180 Mb
  – Podisma pedestris 18,000 Mb
  – difference is difficult to explain in view of apparently similar levels of evolutionary, developmental, and behavioral complexity
• more to a genome than coding sequences
  – example – Alu repeats ~ 250 nucleotides
  – humans, chimps, gorillas
  – Not in rat/mouse

Slide 16. Repetitive Elements
Element        % of genome    Number of elements
LINEs          20.4           868,000
SINEs          13.4           1,558,000
Alus           10.6           1,090,000
transposons    2.8            294,000
Sudbery 2002 Human Mol Genetics

Slide 17. Alternative Splicing
Every conceivable pattern of alternative splicing is found in nature. Exons can have multiple 5’ or 3’ splice sites that are alternatively used (a, b). Single cassette exons can reside between 2 constitutive exons such that the alternative exon is either included or skipped (c). Multiple cassette exons can reside between 2 constitutive exons such that the splicing machinery must choose between them (d). Finally, introns can be retained in the mRNA and become translated.
Graveley, “Alternative splicing: increasing diversity in the proteomic world.” Trends in Genetics, Feb., 2001.

Slide 18. Relevance to disease: changes L to I? Cysteine and disulfide bonds
Each amino acid contains an "amine" group (NH2) and a "carboxy" group (COOH) (shown in black in the diagram). The amino acids vary in their side chains (indicated in blue in the diagram). The eight amino acids in the orange area are nonpolar/hydrophobic. The other amino acids are polar/hydrophilic ("water loving"). The two amino acids in the purple box are acidic ("carboxy" group in the side chain). The three amino acids in the blue box are basic ("amine" group in the side chain).
Know the relationship between DNA, mRNA, and amino acids
nonpolar: internal; polar: external (interacts with H2O)

Slide 19.
A ala alanine          M met methionine
C cys cysteine         N asn asparagine
D asp aspartic acid    P pro proline
E glu glutamic acid    Q gln glutamine
F phe phenylalanine    R arg arginine
G gly glycine          S ser serine
H his histidine        T thr threonine
I ile isoleucine       V val valine
K lys lysine           W trp tryptophan
L leu leucine          Y tyr tyrosine

Slide 20. The Genetic Code (mRNA) Review
Codon table (degenerate code); relevant to gene prediction

1st position   2nd position (middle)                 3rd position
(5' end)       U        C        A        G          (3' end)
U              Phe F    Ser S    Tyr Y    Cys C      U
               Phe F    Ser S    Tyr Y    Cys C      C
               Leu L    Ser S    STOP     STOP       A
               Leu L    Ser S    STOP     Trp W      G
C              Leu L    Pro P    His H    Arg R      U
               Leu L    Pro P    His H    Arg R      C
               Leu L    Pro P    Gln Q    Arg R      A
               Leu L    Pro P    Gln Q    Arg R      G
A              Ile I    Thr T    Asn N    Ser S      U
               Ile I    Thr T    Asn N    Ser S      C
               Ile I    Thr T    Lys K    Arg R      A
               Met M    Thr T    Lys K    Arg R      G
G              Val V    Ala A    Asp D    Gly G      U
               Val V    Ala A    Asp D    Gly G      C
               Val V    Ala A    Glu E    Gly G      A
               Val V    Ala A    Glu E    Gly G      G

One codon: Met, Trp.
Two codons: Asn, Asp, Cys, Gln, Glu, His, Lys, Phe, Tyr.
Three codons: Ile, STOP ("nonsense").
Four codons: Ala, Gly, Pro, Thr, Val.
Five codons: none.
Six codons: Arg, Leu, Ser.

Slide 21. Mutations
• Mis-sense
• Non-sense
• www.hgvs.org
• http://www.hgvs.org/mutnomen/

Slide 22. From Slide 6…
ATG CCC TTC TCC AAC AGC    (GT -- splice donor)
 M   P   F   S   N   S
CCT GCC CCC CAT GCC TGA
 P   A   P   H   A  STOP

Delete CC:
ATG CCC TTC TAA CAG CCC
 M   P   F  Stop Q   P
TGC CCC CCA TGC CTG AGG GGC
 C   P   P   C   L   R   G   …?
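The "Delete CC" example above can be checked by translating both sequences with the standard genetic code. The following is a minimal sketch, not part of the original slides: the names `CODON_TABLE`, `translate`, `original`, and `mutant` are illustrative, and the table is written in DNA letters (T instead of U) so the slide's sequence can be used directly. Only the 36 bases shown on the slide are used, so the mutant translation stops short of the slide's trailing "AGG GGC" codons, which come from downstream sequence not shown there.

```python
from collections import Counter

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
# "*" marks a STOP codon.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"   # first base T (U in mRNA)
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def translate(seq, to_stop=False):
    """Translate a DNA or mRNA string in frame 0; leftover bases are ignored."""
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


# Sequence from the slide, written codon by codon and then joined.
original = "".join("ATG CCC TTC TCC AAC AGC CCT GCC CCC CAT GCC TGA".split())

# Delete the "CC" of the fourth codon (TCC), shifting the reading frame.
mutant = original[:10] + original[12:]

print(translate(original))              # reading frame as drawn on the slide
print(translate(mutant))                # frameshifted reading, stops included
print(translate(mutant, to_stop=True))  # truncated at the premature stop

# Codon degeneracy: how many codons encode each amino acid (or STOP).
degeneracy = Counter(AMINO)
```

The two-base deletion turns the fourth codon into TAA, so a protein translated from the mutant would end after M-P-F. `degeneracy` reproduces the "one codon / two codons / ... / six codons" grouping listed with the codon table.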
Slide 23. Codon Bias
• PAM1 (Point Accepted Mutations), Dayhoff 1978
  – global alignment of closely related proteins (85% identical)
  – <= 1% divergence between proteins
• Blosum62 (Blocks Substitution Matrix), Henikoff 1992
  – proteins across species containing “blocks” of homology with at least 62 percent identity were compared
  – a residue change measurement was computed based on observed residue changes
    • rare change = -4
    • common change = 11

Slide 24.
#  Matrix made by matblas from blosum62.iij
#  * column uses minimum score
#  BLOSUM Clustered Scoring Matrix in 1/2 Bit Units
#  Blocks Database = /data/blocks_5.0/blocks.dat
#  Cluster Percentage: >= 62
#  Entropy =   0.6979, Expected =  -0.5209
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1 -4
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3  4  1 -1 -4
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 -4
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2  0  3 -1 -4
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3 -1 -2 -1 -4
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3  0  0 -1 -4
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3 -3 -3 -1 -4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1 -4 -3 -1 -4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2  0  1 -1 -4
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1 -3 -1 -1 -4
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1 -3 -3 -1 -4
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2 -2 -1 -2 -4
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2  0  0  0 -4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0 -1 -1  0 -4
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3 -4 -3 -2 -4
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1 -3 -2 -1 -4
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4 -3 -2 -1 -4
B -2 -1  3  4 -3  0  1 -1  0 -3 -4  0 -3 -3 -2  0 -1 -4 -3 -3  4  1 -1 -4
Z -1  0  0  1 -3  3  4 -2  0 -3 -3  1 -1 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
X  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1 -1 -1 -4
* -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4  1

Slide 25. Expression
• A gene is expressed when the DNA sequence in the genome is transcribed into an mRNA molecule, and that mRNA molecule is correctly made into a protein (i.e., a string of amino acids forming a polypeptide).
• Note that evaluation of expression is often done by examining/counting the amount/number of mRNA molecules made by the cells of a particular tissue.

Slide 26. DNA/RNA/Protein and Strands
• promoters
  – anywhere from 1 to 10 kb to ??? upstream of a gene
  – many proteins and other molecules (RNAs) involved
  – largely unknown
  – “promoter bashing”
    • replace or delete regions of DNA in promoter
    • measure level of expression
• trans- and cis- regulatory elements
  – trans – not co-localized to the gene
  – cis – generally localized to the gene

Slide 27. Example -- LCR
An example of the functional potential of non-coding regions is the locus control region of the opsin gene cluster (Nathans et al. 1989), deletion of which was shown to cause 50% of the cases of blue cone monochromacy. The locus control region is approximately 4 kilobases upstream of the red opsin gene, and 43 kilobases upstream of the green opsin gene. The 579 base region was mapped to the X-chromosome using observed deletions upstream of the red-green opsin gene cluster in individuals with blue cone monochromacy. Blue cone monochromatism is characterized by poor central vision and color discrimination and nearly normal retinal appearance.

Slide 28. DNA/RNA/protein figure

Slide 29. End

Slide 30. Microarray Technology
• No genomics discussion would be complete without describing microarray technology.
• A powerful tool for genetic research which utilizes nucleic acid hybridization techniques and recent advancements in computing technology to evaluate the mRNA expression profile of thousands of genes in a single experiment.
• It has proven to be an extremely valuable method to better utilize the enormous amount of information provided by the completion of the Human Genome Project.

Slide 31. Gene Expression: Motivation
• Pattern of gene expression in a cell is characteristic of its current state
• Virtually all differences in cell state or type can be correlated with differences in mRNA expression levels
• Expression patterns can provide clues to gene function and metabolic pathway architecture

Slide 32. Potential Impact
• Preventative medicine
• Subtype diseases in order to design better drugs for a specific genotype
• More targeted drug treatment -- treat disease rather than symptoms

Slide 33. Steps involved in Designing Microarray Experiment
• Preparation of fluorescently labeled target from RNA isolated from the biological sample.
• Hybridization of the labeled target to the microarray.
• Washing, staining, and scanning of the array.
• Analysis of the scanned image.
• Generation of gene expression profiles.

Slide 34. Physical Spotting

Slide 35. DNA Array Technology
[Diagram labels: cDNA libraries and/or gene sequence data; cell lines; RNA; target; probe; surface; hybridization; data acquisition; expression levels; analysis]

Slide 36. Probe Example

Slide 38. Microarrays: What are they?

Slide 39. Microarray Experiment
326 Rat Heart Genes, 2x spotting

Slide 40. Affymetrix Technology

Slide 41. Affymetrix Chip

Slide 42. Hybridization/Microarray Tech.
• Very large scale
• multiples of 1K density for glass slides
  – cheap
  – custom
  – considered not as reliable
• Affy – U133
  – 2 chips
  – 45,000 probe sets
  – 39,000 transcripts
  – 33,000 genes
• SNP chip
  – 11,500 SNPs (single nucleotide polymorphisms, or genotypes)
  – 100,000 SNPs (another year?)
• Research and funding dilemma
  – NIH sponsored funding
  – only distilled data (if that) made available
  – confidentiality issues

Slide 43. Examples of Analysis
• simple filter – all up, all down
• clustering
  – Eisen diagrams
  – volcano plots
  – Mootha approach

Slide 44. End Expression

Slide 45. Polyadenylation (Poly-A)
• The addition of multiple adenines to a pre-mRNA; it is part of the end of the transcription process
• Three steps
  – 1) the RNA strand is cleaved at a particular site
  – 2) the addition of poly-A's to the 3' end
  – 3) the degradation of the remainder of the RNA transcript

Slide 46. Polyadenylation
[Figure labels: cut; polyadenylated (AAAn); degraded]

Slide 47. Poly-A Signal
• AAUAAA – specifies where the mRNA is cleaved and the poly-A is added
• Cleavage is typically 23 or 24 bases downstream of this signal
• 10-200 A's added
• Increases translatability by about 20-fold (mechanism unknown)
• Also thought to improve stability – protecting the end of the mRNA molecule from exonucleases

Slide 48. Alternative Polyadenylation
• common in human RNA (Edwards-Gilbert 1997)
• in many genes, 2 or more poly-A signals in 3’ UTR
  – alternative transcripts can show tissue specificity
• alternative poly-A signals may be brought into play following alternative splicing

Slide 49.
Edwards-Gilbert. Nucleic Acids Res, 13, 1997

Slide 50. End
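The poly-A signal rule described above (AAUAAA, with cleavage typically 23 or 24 bases downstream) can be sketched as a simple sequence scan. This is an illustrative sketch only: the function names and the example sequence `utr` are hypothetical, and "N bases downstream" is assumed here to mean N bases after the last base of the signal, which the slides do not specify. A transcript with two or more hits corresponds to the alternative polyadenylation case.

```python
import re

POLYA_SIGNAL = "AAUAAA"
# Offsets taken from the slide text; how they are counted is an assumption.
CLEAVAGE_OFFSETS = (23, 24)


def find_polya_signals(mrna):
    """Return 0-based start positions of every AAUAAA, overlaps included."""
    seq = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    # A zero-width lookahead lets re.finditer report overlapping matches.
    return [m.start() for m in re.finditer("(?=" + POLYA_SIGNAL + ")", seq)]


def predicted_cleavage_sites(mrna):
    """Map each signal start to candidate cleavage positions inside the sequence."""
    sites = {}
    for start in find_polya_signals(mrna):
        signal_end = start + len(POLYA_SIGNAL)  # index just past the signal
        sites[start] = [
            signal_end + offset
            for offset in CLEAVAGE_OFFSETS
            if signal_end + offset <= len(mrna)
        ]
    return sites


# Made-up 3' UTR with two signals; the second is too close to the end
# of the sequence for a cleavage site 23 or 24 bases downstream.
utr = "GCU" + "AAUAAA" + "GC" + "C" * 30 + "AAUAAA" + "G" * 10

print(find_polya_signals(utr))
print(predicted_cleavage_sites(utr))
```

Real poly-A site prediction also weighs signal variants and downstream sequence elements; this scan only applies the single rule stated on the slide.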