CS 6990 Bioinformatics
Fall 2004
Dr. Susan Bridges
Department of Computer Science and Engineering Bioinformatics

DNA, RNA, and Protein

Macromolecule   Repeating Unit                                          Role
DNA             Deoxyribonucleotides (A,C,G,T)                          Genome
RNA             Ribonucleotides (A,C,G,U)                               Genome, messenger, gene product
Protein         Amino acids (A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y)   Gene product

Central Dogma of Molecular Biology
• Replication: DNA → DNA
• Transcription: DNA → RNA
• Translation: RNA → Protein
• Reverse transcription: RNA → DNA

[Figure: DNA]

Base Pairs (bp)
• A, G: purines
• C, T: pyrimidines

[Figure: Base Pairs in Detail]

String Representation
5’ ……. 3’
3’ ……. TACTGAGGC 5’

How RNA differs from DNA
1. Sugar is ribose rather than deoxyribose
2. Thymine (T) replaced by uracil (U)
3. Does not typically form a double helix
4. Performs many functions

Nucleotide Codes

Code   Meaning                 Code   Meaning
A      Adenine                 W      Weak (A or T)
G      Guanine                 S      Strong (G or C)
C      Cytosine                M      Amino (A or C)
T      Thymine                 K      Keto (G or T)
U      Uracil                  B      Not A (G or C or T)
R      Purine (A or G)         H      Not G (A or C or T)
Y      Pyrimidine (C or T)     D      Not C (A or G or T)
N      Any nucleotide          V      Not T (A or G or C)

Biology Terms
• Prokaryotes:
    – “Primitive” organisms that do not have a nuclear membrane
    – Includes the bacteria
• Eukaryotes:
    – “Higher” organisms in which the genetic material is localized in the nucleus of the cells.
    – Includes plants, animals, fungi, and protozoa (e.g., corn, humans, yeast)

Protein
• The most “active” molecules in organisms are proteins
    – Structural
    – Enzymes
• Proteins are polymers of amino acids: a long string of amino acid residues
• 20 amino acids (+ a few unusual ones that occur occasionally)

Protein Backbone
• Backbone
• N-terminus (N-terminal) (amino group)
• C-terminus (C-terminal) (carboxyl group)

Genes and the Genetic Code
• Each chromosome is a long chain of DNA
• Certain sequences on the chromosome contain the code for a protein. These are called genes.
• A gene is composed of a sequence of codons
    – A codon is a nucleotide triplet (3 base sequence)
    – The first triplet in a gene is a special codon called the start codon (usually AUG)
    – The gene ends with a stop codon.
• The genetic code is the mapping from each 3-letter codon to an amino acid (or to a stop signal).

Features of the Genetic Code
• Written in linear form in terms of sequences of bases.
• Each “word” in the code is a sequence of 3 bases.
• The code is degenerate: most amino acids can be specified by more than one codon.
• The code contains start and stop signals but no internal punctuation (commaless).
• The code is non-overlapping (codons are read in a single reading frame).
• The code is nearly universal.
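
The features listed above can be illustrated with a short Python sketch. It is not part of the original slides: the names `CODON_TABLE` and `translate` are made up for this example, and the choice to start at the first AUG found in the string is a simplifying assumption. The table is the standard genetic code, written with stop codons as `*`.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order U, C, A, G
# (first base varies slowest, third base fastest). '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA string (letters A, C, G, U) into one-letter amino acids.

    Assumption for this sketch: translation begins at the first AUG in the
    string and reads non-overlapping triplets in that single reading frame
    until a stop codon or the end of the sequence.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop signal: end of the gene product
            break
        protein.append(aa)
    return "".join(protein)


# Degeneracy: count how many codons specify each amino acid (or stop).
codons_per_amino_acid = {}
for aa in CODON_TABLE.values():
    codons_per_amino_acid[aa] = codons_per_amino_acid.get(aa, 0) + 1
```

For example, `translate("AUGCCGGUU")` returns `"MPV"` (Met, Pro, Val), and `codons_per_amino_acid["L"]` is 6, which shows the degeneracy of the code: six different codons specify leucine.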

Analogy
Acoapzzcordkathedogatetheratpercliosidklancocoaiem
ifuzzclqzthecatandthehatareredpercopoqpooijcc9a8cjkal;c
ackcccjasoeuejlschjw8eicnxkdoaoejknthecrivhejelpauenvy
pzznccmqthecowranforthedogandthecatandthedogateper
cxqoicqickvperyerlcaperkcaeiakd

Messenger RNA has Copy of Message from DNA

Flow of Genetic Information
Gene: DNA template strand (antisense)   3’ TAC GGC CAA 5’
    ↓ transcription
Messenger RNA (mRNA)                    5’ AUG CCG GUU 3’
    ↓ translation on ribosomes
Protein                                 Met Pro Val

More terminology
• Promoter: a sequence that can be used to recognize the start of a gene
• DNA has 2 strands
    – Coding strand (sense): looks like mRNA
    – Template strand (anticoding or antisense): transcribed
• The DNA template strand is “read” from the 3’ end to the 5’ end to make mRNA
• mRNA is built from 5’ to 3’
• Upstream: before the start of the gene
• Downstream: after the end of the gene

RNA synthesis
2 complementary DNA strands
Coding strand     5’ ATGCCGTTAGACCGTTAGCGGACC 3’
Template strand   3’ TACGGCAATCTGGCAATCGCCTGG 5’
RNA               5’ AUGCCGUUAGACCGUUAGCGGACC 3’

In most eukaryotes, a gene is divided into coding sections (exons) and noncoding sections (introns).

Introns are spliced out of the mRNA.
Only exons are used to build the protein.

Web References
• Access Excellence Graphics Gallery, http://www.accessexcellence.org/AB/GG/
• http://bioinfo.mbb.yale.edu/course/classes/c2/ppframe.htm
• http://cmgm.stanford.edu/biochem218/01Representation.html
• http://www.math.tau.ac.il/~rshamir/algmb/01/algmb01.html
• http://www.hgmp.mrc.ac.uk/GenomeWeb/docs-bioinformatics.html
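
As a closing illustration of the RNA synthesis and splicing slides, the following Python sketch shows transcription as base-by-base complementation of the template strand, and splicing as concatenation of exon segments. It is only a sketch under stated assumptions: the function names and the exon coordinates in the usage note are hypothetical, and real splicing is directed by sequence signals that are not modeled here.

```python
# DNA template base -> complementary RNA base (U is used in place of T).
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Build mRNA (written 5'->3') from a template strand written 3'->5'.

    Reading the template from its 3' end while building the RNA from its
    5' end means the two strings line up position by position.
    """
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3_to_5)


def coding_to_mrna(coding_5_to_3):
    """The coding (sense) strand looks like the mRNA with T replaced by U."""
    return coding_5_to_3.replace("T", "U")


def splice(pre_mrna, exons):
    """Join exon segments and drop everything in between (the introns).

    `exons` is a list of (start, end) index pairs, end exclusive.
    These coordinates are hypothetical inputs chosen by the caller.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)
```

Applied to the strands from the RNA synthesis slide, `transcribe("TACGGCAATCTGGCAATCGCCTGG")` and `coding_to_mrna("ATGCCGTTAGACCGTTAGCGGACC")` both return `"AUGCCGUUAGACCGUUAGCGGACC"`. With made-up exon coordinates, `splice("AUGGUAAGCCG", [(0, 3), (8, 11)])` returns `"AUGCCG"`.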