Bioinformatics Research Center
BI I: Bioinformatics and Basic Molecular Biology
August 27, 2013

Announcements
• Remember: course website up and running
  • http://statgen.ncsu.edu/st590a/index.php
  • Login: students
  • Password: st590a

Sequences are letters in an alphabet
• Much of the class focuses on biological sequence analysis
• Sequences are strings of “letters”
  • Drawn from a biological “alphabet”
  • Most biological “alphabets” use ordinary Latin letters
    • DNA: A, C, G, T
    • RNA: A, C, G, U
    • Amino acids: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y

Sequences are words in a sentence
• Proper function depends on both spelling and grammar
  • Example sentence: “Class is interesting but it moves”
  • Both tell us how a sentence should look
  • But change either one and you change the meaning

Cells
• Complex system enclosed in a membrane
• Organisms can be unicellular
  • e.g. bacteria, baker’s yeast
• Organisms can be multicellular
  • Humans: about 60 trillion cells and 320 cell types
Figure: Example animal cell (source: www.ebi.ac.uk/microarray/biology_intro.htm)

A dichotomy of organisms, at least
• Eukaryotes
  • Contain a membrane-bound nucleus and organelles (plants, animals, fungi, …)
• Prokaryotes
  • Lack a true membrane-bound nucleus and organelles (single-celled; includes bacteria)
• Not all single-celled organisms are prokaryotes (baker’s yeast, for example, is a eukaryote)

Chromosomes
• In eukaryotes, the nucleus contains one or several double-stranded DNA molecules organized as chromosomes
• Humans:
  • 22 pairs of autosomes
  • 1 pair of sex chromosomes
Figure: Human karyotype (source: http://avery.rutgers.edu/WSSP/StudentScholars/Session8/Session8.html)

A peek inside the cell
Image source: www.biotec.or.th/Genome/whatGenome.html

What is DNA?
• DNA: deoxyribonucleic acid
• A single strand (oligomer, polynucleotide) is a chain of nucleotides
• Four different nucleotides, named for their bases:
  • Adenine (A)
  • Cytosine (C)
  • Guanine (G)
  • Thymine (T)
• Our first alphabet
• Question: Does a small alphabet require long words?
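One way to make the closing question concrete: an alphabet of n letters can spell n^k distinct words of length k. The short Python sketch below uses the alphabets listed on the earlier slide. The function names are illustrative, not from the course. It shows that two-letter DNA words give only 4^2 = 16 labels, too few to name 20 amino acids, while three-letter words give 4^3 = 64.

```python
# Sketch: how many distinct "words" can a small alphabet spell?
# Alphabets are the ones listed on the "Sequences are letters in an alphabet" slide.

DNA = "ACGT"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def words_of_length(alphabet_size: int, k: int) -> int:
    """Number of distinct strings of length k over an alphabet of the given size."""
    return alphabet_size ** k


def min_word_length(alphabet_size: int, n_items: int) -> int:
    """Smallest k for which alphabet_size ** k distinct words can label n_items items."""
    if alphabet_size < 2:
        raise ValueError("need at least two letters")
    k = 1
    while words_of_length(alphabet_size, k) < n_items:
        k += 1
    return k


if __name__ == "__main__":
    for k in range(1, 4):
        print(f"length {k}: {words_of_length(len(DNA), k)} DNA words, "
              f"{words_of_length(len(AMINO_ACIDS), k)} protein words")
    # 4**2 = 16 two-letter DNA words are too few for 20 amino acids; 4**3 = 64 suffice.
    print(min_word_length(len(DNA), len(AMINO_ACIDS)))  # → 3
```

So yes: with only four letters, DNA needs words at least three letters long to name every amino acid. The same arithmetic is behind the triplet codons of the genetic code.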
Nucleotide bases
• Purines (A and G)
• Pyrimidines (C and T)
• The difference is in base structure: purines have a double-ring base, pyrimidines a single ring
Image source: www.ebi.ac.uk/microarray/biology_intro.htm

Nucleotides chain together to form DNA
Figure labels: phosphate group, base, ribose or deoxyribose (deoxyribose shown)
• A nucleotide unit consists of a pentose sugar, a phosphate moiety (containing up to 3 phosphate groups) and a base
• Subunits are linked together by phosphodiester bonds to form a ‘sugar-phosphate backbone’

Single-stranded DNA polynucleotide
• Example polynucleotide: 5’ G→T→A→A→A→G→T→C→C→C→G→T→T→A→G→C 3’
• Or, more commonly: GTAAAGTCCCGTTAGC

Double-stranded DNA
• DNA can be single-stranded or double-stranded
• Double-stranded DNA
  • The second strand is the “reverse complement” strand
  • The reverse complement runs in the opposite direction, and its bases are complementary
• Complementary bases:
  • A complements T
  • C complements G

Double-stranded sequence
• Example double-stranded polynucleotide:
  5’ G→T→A→A→A→G→T→C→C→C→G→T→T→A→G→C 3’
     | | | | | | | | | | | | | | | |
  3’ C←A←T←T←T←C←A←G←G←G←C←A←A←T←C←G 5’
• Or, more commonly:
  GTAAAGTCCCGTTAGC
  CATTTCAGGGCAATCG
• Chromosomal DNA is double-stranded
• Why store redundant information?

H-bonding between complementary bases
• Why store redundant information?
• Nomenclature: a sequence hybridizes to its reverse complement

BI: Integrating your knowledge
• Hydrogen bonding holds the two strands together
  • 2 bonds between A and T
  • 3 bonds between G and C
• How much of an organism’s DNA is G-C vs. A-T?
• Fact: Heat can denature molecules, DNA included
• Would bacteria in a hot environment benefit from an excess of G-C base pairs?

An interesting research question

Comparative analysis of prokaryotes requires:
• To answer the research question, first we must ask:
  • Where do we get the sequences?
  • Which subset of the genome should we use?
  • For which prokaryotes are genome sequences known?
  • What if the prokaryotic genomes are different lengths?
  • How do we account for evolutionary relationships?
  • What are the other constraints on G-C content?
  • …

Replication yields two identical(?) double helices
• What if replication makes a mistake?
  • The birth of sequence variation!
  • This is the subject of our next class

Broad summary
• Thus far:
  • Storage of genetic information
  • Replication of genetic information
• Next up:
  • Execution of genetic information

Central dogma of molecular biology
Flow of information in living systems
• Genome = blueprint
• Proteins = building blocks
• DNA →(transcription)→ RNA →(translation)→ Protein
• DNA sequence implies structure, which implies function

Dogma detailed and depicted
Figure: transcription → mRNA → transport → translation at the ribosome → nascent polypeptide → post-translational modification → functional protein

RNA
• Ribonucleic acid
• Another alphabet
  • Similar to DNA
  • Thymine (T) is replaced by uracil (U)
  • Note that U is also complementary to A
• RNA can be:
  • Single-stranded
  • Double-stranded
  • Hybridized with DNA

RNA
• RNA is generally single-stranded
• Forms secondary or tertiary structures
  • When spelled correctly!
  • RNA folding will be discussed later
• Important in a variety of ways, including protein synthesis

mRNA
• Messenger RNA
  • Linear molecule encoding genetic information copied from DNA molecules
• Transcription
  • Process in which DNA is copied into an RNA molecule
  • mRNA is complementary to the DNA template from which it is transcribed, e.g. DNA CTGAAT pairs with mRNA GACUUA

(Some) types of RNA
• mRNA: messenger RNA
• tRNA: transfer RNA
• rRNA: ribosomal RNA
• snRNA: small nuclear RNA

RNA self-complementarity
• A single-stranded RNA can fold back on itself
• Complementary bases are “sticky”
Figure: GGUGCGGUAAGAGCGCACC folds into a hairpin; the 5’ arm GGUGCG pairs with the 3’ arm CGCACC, leaving the loop GUAAGAG
• Folded molecules “make things work”
  • We will see this with proteins

RNA secondary structure
• E. coli RNase P RNA secondary structure
• How might one predict whether or not a sequence of bases (A, C, G, U) is likely an RNA?
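The strand operations on the DNA and RNA slides above are easy to express in code. The Python sketch below computes a reverse complement, G-C content, and the base-by-base RNA complement from the mRNA slide. It also checks whether the two arms of the hairpin example pair perfectly. All function names are hypothetical, chosen for illustration. It assumes uppercase sequences and considers only Watson-Crick pairs. Real RNA also forms G-U wobble pairs, and a perfect-stem test does not answer the secondary-structure question.

```python
# Sketch: strand operations from the DNA and RNA slides.
# Assumptions: uppercase sequences over ACGT (DNA) or ACGU (RNA);
# only Watson-Crick pairs are considered (G-U wobble pairs are ignored).

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")
DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Complement every base, then reverse, so the result also reads 5'->3'."""
    table = RNA_COMPLEMENT if rna else DNA_COMPLEMENT
    return seq.translate(table)[::-1]


def complement_to_rna(template: str) -> str:
    """Base-by-base complement of a DNA template, written as RNA (A pairs with U).
    Orientation is kept as written, as in the CTGAAT / GACUUA example."""
    return template.translate(DNA_TO_RNA_COMPLEMENT)


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_perfect_stem(seq: str, stem_len: int) -> bool:
    """True if the first stem_len bases pair with the last stem_len bases,
    i.e. the 5' arm equals the reverse complement of the 3' arm."""
    if stem_len <= 0 or 2 * stem_len > len(seq):
        return False
    return seq[:stem_len] == reverse_complement(seq[-stem_len:], rna=True)


if __name__ == "__main__":
    top = "GTAAAGTCCCGTTAGC"                       # example strand from the slides
    print(reverse_complement(top))                 # → GCTAACGGGACTTTAC
    print(reverse_complement(top)[::-1])           # → CATTTCAGGGCAATCG
    print(gc_content(top))                         # → 0.5
    print(complement_to_rna("CTGAAT"))             # → GACUUA
    print(has_perfect_stem("GGUGCGGUAAGAGCGCACC", 6))  # → True
```

Reading the reverse complement backwards gives CATTTCAGGGCAATCG, the bottom strand exactly as the slides write it (3’ to 5’).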
Image source (RNase P RNA figure): www.mbio.ncsu.edu/JWB/MB409/lecture/lecture05/lecture05.htm

Life depends on three critical molecules
• DNAs
  • Hold information on how the cell works
• RNAs
  • Act to transfer short pieces of information to cell parts
  • Provide templates for protein synthesis
• Proteins
  • Form enzymes, send signals to other cells, and regulate gene activity
  • Form the body’s major components (e.g. hair, skin, etc.)

DNA, RNA, and the flow of information
• Replication, transcription, translation

Genes make proteins
• Proteins do all sorts of things
  • Catalysis, structure, movement, defense, regulation, transport, storage, stress response, …

But what is a gene?

Gene structure
• Genes must have:
  • Exons (usually protein-coding DNA to be translated)
  • A start site
  • A control region
Figure: Eukaryotic gene structure
• Proper gene function requires:
  • Parts spelled correctly, in grammatically correct order

Eukaryotic genes
Figure: enhancer, promoter, transcribed region and terminator → transcription by RNA polymerase II → primary transcript (5’ Exon1, Intron1, Exon2 3’) → cap (7mG), splice, cleave/polyadenylate (An) → transport → translation → polypeptide (N to C)

Prokaryotic genes
Figure: promoter, Cistron1, Cistron2, …, CistronN, terminator → transcription by RNA polymerase → one mRNA (5’ to 3’) carrying several cistrons → translation (ribosome, tRNAs, protein factors) → separate polypeptides (N to C), one per cistron
• Prokaryotic and eukaryotic genes have very different grammars

Prokaryotes versus eukaryotes
• In prokaryotes
  • The transcribed mRNA is ready to be translated into a protein (polypeptide) product
• In eukaryotes
  • The transcribed mRNA (pre-mRNA) must first be processed into mature mRNA
  • The protein-coding regions (exons) are interspersed with non-coding regions (introns), which must be excised

Key feature of eukaryotic gene structure
• Most eukaryotic genes are split: their coding sequence is interrupted by large intervening sequences (introns)
• Exon
  • Part of the gene that contributes to the mature mRNA
• Intron
  • Part of the gene that is transcribed into pre-mRNA but spliced out, so it is absent from the mature mRNA
• Introns are found in all types of genes, including those coding for RNAs

The structure of human PSA
• How might you find the exons?
  • There are grammatical rules
  • And exons and introns are spelled differently!

How does the gene “become” a protein?
• DNA →(transcription)→ RNA →(translation)→ Protein

Translation
• mRNA is used as a template to make proteins
• mRNA is bound by tRNA three bases at a time
  • Occurs at the ribosome
• Each bound tRNA also carries a specific amino acid
  • This amino acid extends the nascent protein sequence

mRNA carries the protein message encoded in DNA
• Exonic DNA is interpreted in groups of 3 (codons): AGT TTT GGG CCC AAA
• The 64 (4 × 4 × 4) codons correspond to actions to be taken at the ribosome
  • Start translation (begin a protein)
  • Add one of twenty amino acids (extend a protein)
  • Stop translation (end a protein)

Genetic code
• From RNA triplets to amino acids
  • 4 possible bases (A, C, G, U)
  • 3 bases in the codon
  • 4 × 4 × 4 = 64 possible codon sequences
  • Start codon: AUG (also encodes methionine)
  • Stop codons: UAA, UAG, UGA
  • 61 codons code for amino acids (AUG included)
  • 20 amino acids, so the genetic code is redundant
• Yet another alphabet!
  • This one spells proteins

Genetic code: 64 triplets code 22 tasks (20 amino acids, plus start and stop)
Image source: http://www.geneticengineering.org/chemis/Chemis-NucleicAcid/RNA.htm

Amino acids
• Building blocks for proteins (20 different)
  • Vary by side-chain groups
• The side-chain group gives an amino acid its physical and chemical properties
  • e.g. hydrophilic amino acids are water-soluble (vs. hydrophobic)
• Linked together via a single chemical bond (peptide bond)
  • Peptide: short linear chain of amino acids (< 30)
  • Polypeptide: long chain of amino acids (can be upwards of 4000 residues long)
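Here is a minimal Python sketch of translation with the standard genetic code, to make the genetic-code slides above concrete. It assumes the input is the coding (sense) strand, already written as RNA and uppercase. By default it reads in frame from the first base; optionally it starts at the first AUG. The 64-character string is the standard code listed in U, C, A, G order for the first, second and third codon positions. `translate` and `CODON_TABLE` are illustrative names, not a library API. A codon containing anything other than A, C, G or U raises a `KeyError`.

```python
# Sketch: translating an mRNA with the standard genetic code.
# Assumptions: coding strand, uppercase RNA, read in frame; '*' marks a stop codon.
from itertools import product

BASES = "UCAG"
AMINO_ACIDS_BY_CODON = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
# product() varies the third base fastest, matching the string's ordering.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS_BY_CODON)
}
START_CODON = "AUG"
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


def translate(mrna: str, from_first_start: bool = False) -> str:
    """Translate codon by codon until a stop codon or the end of the sequence.

    If from_first_start is True, begin at the first AUG instead of position 0.
    An incomplete trailing codon is ignored.
    """
    start = 0
    if from_first_start:
        start = mrna.find(START_CODON)
        if start == -1:
            return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    exonic_dna = "AGTTTTGGGCCCAAA"          # codon example from the slides
    mrna = exonic_dna.replace("T", "U")     # coding strand written as mRNA
    print(translate(mrna))                  # → SFGPK
    print(len(CODON_TABLE), len(STOP_CODONS))                 # → 64 3
    print(translate("GGAUGUUUUAAGG", from_first_start=True))  # → MF
```

Applied to the slide's codon example, treated as the coding strand, the sketch yields SFGPK. The table also reproduces the slide's counts: 64 codons, 3 stop codons, and 61 codons for 20 amino acids.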
The 20 naturally occurring amino acids
• Glycine (G, GLY)
• Alanine (A, ALA)
• Valine (V, VAL)
• Leucine (L, LEU)
• Isoleucine (I, ILE)
• Phenylalanine (F, PHE)
• Proline (P, PRO)
• Serine (S, SER)
• Threonine (T, THR)
• Cysteine (C, CYS)
• Methionine (M, MET)
• Tryptophan (W, TRP)
• Tyrosine (Y, TYR)
• Asparagine (N, ASN)
• Glutamine (Q, GLN)
• Aspartic acid (D, ASP)
• Glutamic acid (E, GLU)
• Lysine (K, LYS)
• Arginine (R, ARG)
• Histidine (H, HIS)

Peptide bonds link amino acids

Side chains distinguish amino acid properties
• Each AA has a unique side chain
  • Unique molecular properties
• Molecular properties of AAs determine protein structure
• Another alphabet!
  • More like a linguistic alphabet

Proteins
• Polypeptides having a three-dimensional structure
• Proteins can form interactions with:
  • Proteins (complexes, oligomers)
  • mRNA
  • DNA
• Proteins can bind to each other depending on their relative charges and structures

Four levels of protein structure

Secondary structure
• Parts of a protein may fold on themselves to form:
  • α-helix, β-sheet, random coil

Domains
• Domains are structural and/or functional modules within a protein that usually fold separately
• Examples: leucine zipper, EF hand, zinc finger

Recognizing domains
• Predict protein fold (secondary structure) based on primary sequence
  • Requires knowledge of spelling and grammar
• Key point:
  • Similar spelling often implies similar function

Tertiary interactions

Predicting protein function and structure
• Given the sequence of amino acids (primary structure) of an anonymous protein:
  • How might one predict its function?
  • How might one predict its folded structure?
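The three-letter and one-letter codes in the amino acid list above are two spellings of the same 20-letter alphabet. Both sets of codes are unique, so a dictionary and its inverse convert between them. The Python sketch below shows this. The function names are hypothetical, and it assumes well-formed input (an unknown code raises a `KeyError`).

```python
# Sketch: converting between three-letter and one-letter amino acid codes.
# The pairs are the 20 standard amino acids from the slide's list.

THREE_TO_ONE = {
    "GLY": "G", "ALA": "A", "VAL": "V", "LEU": "L", "ILE": "I",
    "PHE": "F", "PRO": "P", "SER": "S", "THR": "T", "CYS": "C",
    "MET": "M", "TRP": "W", "TYR": "Y", "ASN": "N", "GLN": "Q",
    "ASP": "D", "GLU": "E", "LYS": "K", "ARG": "R", "HIS": "H",
}
# Codes are unique in both directions, so the mapping can be inverted directly.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues: list[str]) -> str:
    """Map three-letter codes (any case) to a one-letter protein string."""
    return "".join(THREE_TO_ONE[r.upper()] for r in residues)


def to_three_letter(protein: str) -> list[str]:
    """Map a one-letter protein string to capitalized three-letter codes."""
    return [ONE_TO_THREE[aa].capitalize() for aa in protein.upper()]


if __name__ == "__main__":
    print(to_one_letter(["Met", "Ala", "Cys", "Tyr"]))  # → MACY
    print("-".join(to_three_letter("WHY")))             # → Trp-His-Tyr
```

Most sequence tools and databases store proteins in the compact one-letter form. The three-letter form is easier for people to read.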
Review of alphabetic BI data
• DNA
  • String from the 4-letter alphabet of nucleotides (A, G, C, T)
• RNA
  • String from the 4-letter alphabet of nucleotides (A, G, C, U)
• Coding sequence
  • String from the 64-letter alphabet of nucleotide triplets (AAA, …)
• Proteins
  • String from the 20-letter alphabet of amino acids (Ala, Cys, …)

From now until December
• Explore and exploit the grammar of the genome
• How does evolution shape sequence diversity?
  • Change over time, subject to constraints on grammar and spelling
  • The constraints themselves change over time!
• Distinguish random sequences from those structured by function
  • One million monkeys on one million typewriters?
  • Or one Shakespeare with vellum?

Biological sequence analysis in one sentence
• Use biological knowledge of (1) grammar, (2) spelling and (3) evolution to identify sequences with too much structure to have occurred by chance

Looking forward
• Two goals of this class
  • What are the important problems?
  • What tools and techniques can we use to address them?
• Next time
  • Origins of sequence diversity
  • Reading?
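The one-sentence summary of sequence analysis above hinges on what chance alone would produce. The Python sketch below compares an observed word count with a chance expectation under a deliberately simplified null model. Bases are independent and each occurs with probability 1/4, so a fixed word of length k is expected (L - k + 1)/4^k times in a sequence of length L. The toy sequence and function names are hypothetical. Real analyses use richer background models, for example ones that match a genome's G-C content.

```python
# Sketch: is a word over-represented relative to chance?
# Assumed null model: independent, equally likely bases (probability 1/4 each).


def count_overlapping(seq: str, word: str) -> int:
    """Number of (possibly overlapping) occurrences of word in seq."""
    return sum(seq.startswith(word, i) for i in range(len(seq) - len(word) + 1))


def expected_count(seq_len: int, word_len: int, alphabet_size: int = 4) -> float:
    """Expected occurrences of one fixed word in a uniform random sequence."""
    positions = max(seq_len - word_len + 1, 0)
    return positions / alphabet_size ** word_len


if __name__ == "__main__":
    seq = "GAATTC" * 5 + "ACGT" * 10   # hypothetical toy sequence (length 70)
    word = "GAATTC"
    observed = count_overlapping(seq, word)
    expected = expected_count(len(seq), len(word))
    print(observed, round(expected, 4))  # → 5 0.0159
```

Here the word appears 5 times where about 0.016 occurrences would be expected by chance. That is the kind of excess structure the course sets out to detect; in this toy case it is there because it was planted.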