BI I: Bioinformatics and Basic Molecular Biology
August 23, 2011

Announcements
• Remember: the course website is up and running
  • http://statgen.ncsu.edu/st590a/index.php
  • Login: students
  • Password: st590a

Sequences are letters in an alphabet
• This class focuses on biological sequence analysis
• Sequences are strings of “letters”
  • They come from a biological “alphabet”
  • Most biological “alphabets” look like the English alphabet
• DNA: A, C, G, T
• RNA: A, C, G, U
• Amino acids: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y

Sequences are words in a sentence
• Proper function depends on both spelling and grammar
  “Class is interesting but it moves”
  • Both tell us how a sentence should look
  • But change either one and you change the meaning

Goals of today
• Topics
  • Cells
  • Chromosomes
  • Basic alphabets: DNA, RNA, amino acids
  • Basic grammatical structures: genes, proteins
  • The genome, transcriptome, and proteome

Cells
• A complex system enclosed in a membrane
• Organisms can be unicellular
  • e.g. bacteria, baker’s yeast
• Organisms can be multicellular
  • Humans: 60 trillion cells, 320 cell types
[Figure: Example animal cell. Source: www.ebi.ac.uk/microarray/biology_intro.htm]

A dichotomy of organisms, at least
• Eukaryotes
  • contain a membrane-bound nucleus and organelles (plants, animals, fungi, …)
• Prokaryotes
  • lack a true membrane-bound nucleus and organelles (single-celled; includes bacteria)
• Not all single-celled organisms are prokaryotes

Chromosomes
• In eukaryotes, the nucleus contains one or several double-stranded DNA molecules organized as chromosomes
• Humans:
  • 22 pairs of autosomes
  • 1 pair of sex chromosomes
[Figure: Human karyotype. Source: http://avery.rutgers.edu/WSSP/StudentScholars/Session8/Session8.html]

A peek inside the cell
[Figure. Image source: www.biotec.or.th/Genome/whatGenome.html]

DNA densely packed into chromosomes
• How long is an unwound human chromosome? About 5 cm

Ex: Bioinformatics informs biology
• Discover a new grammar, publish in Nature

What is DNA?
• DNA: Deoxyribonucleic acid
• A single strand (oligomer, polynucleotide) is a chain of nucleotides
• Four different nucleotides, identified by their bases:
  • Adenine (A)
  • Cytosine (C)
  • Guanine (G)
  • Thymine (T)
• Our first alphabet
• Question: Does a small alphabet require long words?

Nucleotide bases
• Purines (A and G)
• Pyrimidines (C and T)
• The difference is in the base structure
[Figure. Image source: www.ebi.ac.uk/microarray/biology_intro.htm]

Nucleotides chain together to form DNA
[Figure: a nucleotide, labeled phosphate group, base, and ribose or deoxyribose (shown here)]
• A nucleotide unit consists of a pentose sugar, a phosphate moiety (containing up to 3 phosphate groups), and a base.
• Subunits are linked together by phosphodiester bonds to form a “sugar-phosphate backbone”

Single-stranded DNA polynucleotide
• Example polynucleotide: 5’ G→T→A→A→A→G→T→C→C→C→G→T→T→A→G→C 3’
• Or, more commonly: GTAAAGTCCCGTTAGC

Double-stranded DNA
• DNA can be single-stranded or double-stranded
• Double-stranded DNA
  • The second strand is the “reverse complement” strand
  • The reverse complement runs in the opposite direction, and its bases are complementary
• Complementary bases:
  • A complements T
  • C complements G

Double-stranded sequence
• Example double-stranded polynucleotide:
  5’ G→T→A→A→A→G→T→C→C→C→G→T→T→A→G→C 3’
     | | | | | | | | | | | | | | | |
  3’ C←A←T←T←T←C←A←G←G←G←C←A←A←T←C←G 5’
• Or, more commonly:
  GTAAAGTCCCGTTAGC
  CATTTCAGGGCAATCG
• Chromosomal DNA is double-stranded
• Why store redundant information?

The Double Helix
• Two complementary DNA strands form a stable DNA double helix
• Spring ’03 was the 50th anniversary of its discovery

DNA Replication
• “It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material”
  • Watson & Crick, Nature (1953)

Think: Unwind and copy

Replication yields two identical(?) double-helices

H-bonding between complementary bases
Why store redundant information?
• Nomenclature: a sequence hybridizes to its reverse complement

BI: Integrating your knowledge
• Hydrogen bonding holds the two strands together
  • 2 bonds between A and T
  • 3 bonds between G and C
• How much of an organism’s DNA is G-C vs. A-T?
• Fact: Heat can denature molecules, DNA included
• Would bacteria in a hot environment benefit from an excess of G-C base pairs?

An interesting research question

Comparative analysis of prokaryotes requires:
• To answer the research question, first we must ask:
  • Where do we get the sequences?
  • Which subset of the genome should we use?
  • For which prokaryotes are genome sequences known?
  • What if the prokaryotic genomes are different lengths?
  • How do we account for evolutionary relationships?
  • What are the other constraints on G-C content?
  • …

Replication yields two identical(?) double-helices
• What if replication makes a mistake?
• The birth of sequence variation!
• This is the subject of our next class

Replication precision in humans
• Error rate:
  • Once for every 10 billion operations! (???)
• DNA polymerase
  • Ensures that complementary bases are paired with one another as a strand is copied
  • Before DNA polymerase adds the next nucleotide, the previous nucleotide pair is “checked”
  • Incorrect pairs are clipped off and replaced

Broad summary
• Thus far:
  • Storage of genetic information
  • Replication of genetic information
• Next up:
  • Execution of genetic information

Central dogma of molecular biology
Flow of information in living systems
• Genome = blueprint
• Proteins = building blocks
DNA → (transcription) → RNA → (translation) → Protein
DNA sequence implies structure, which implies function

Dogma detailed and depicted
[Figure: transcription produces mRNA, which is transported and then translated at the ribosome into a nascent polypeptide; post-translational modification yields a functional protein]

RNA
• Ribonucleic acid
• Another alphabet
• Similar to DNA
  • Thymine (T) is replaced by uracil (U)
  • Note that U is also complementary to A
• RNA can be:
  • Single-stranded
  • Double-stranded
  • Hybridized with DNA

RNA
• RNA is generally single-stranded
• Forms secondary or tertiary structures
  • When spelled correctly!
  • RNA folding will be discussed later
• Important in a variety of ways, including protein synthesis

mRNA
• Messenger RNA
• A linear molecule encoding genetic information copied from DNA molecules
• Transcription
  • The process in which DNA is copied into an RNA molecule
• mRNA is complementary to the DNA from which it is transcribed:
  DNA:  CTGAAT
  mRNA: GACUUA

Types of RNA
• mRNA: messenger RNA
• tRNA: transfer RNA
• rRNA: ribosomal RNA
• snRNA: small nuclear RNA

RNA self-complementarity
• A single-stranded RNA can fold back on itself
• Complementary bases are “sticky”
  Example: GGUGCGGUAAGAGCGCACC
  [Figure: hairpin in which the 5’ end GGUGCG pairs with the 3’ end CGCACC, leaving GUAAGAG as the loop]
• Folded molecules “make things work”
  • We will see this with proteins

RNA secondary structure
• E. coli RNase P RNA secondary structure
• How might one predict whether or not a sequence of bases (A, C, G, U) is likely an RNA?
[Figure. Image source: www.mbio.ncsu.edu/JWB/MB409/lecture/lecture05/lecture05.htm]

Life depends on three critical molecules
• DNAs
  • Hold information on how the cell works
• RNAs
  • Act to transfer short pieces of information to cell parts
  • Provide templates for protein synthesis
• Proteins
  • Form enzymes that send signals to other cells and regulate gene activity
  • Form the body’s major components (e.g. hair, skin, etc.)

DNA, RNA, and the flow of information
[Figure: replication, transcription, translation]

Genes make proteins
• Proteins do all sorts of things
  • Catalysis, structure, movement, defense, regulation, transport, storage, stress response, …

But what is a gene?
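The string operations described in the DNA and RNA slides above (complementary strands, G-C content and hydrogen bonds, transcription from a template strand, and a self-complementary hairpin) can be sketched in a few lines of Python. This is a minimal illustration, not course code: all function names are hypothetical, only uppercase Watson-Crick pairs are handled (no G-U wobble, no ambiguity codes), and the hairpin check only asks whether the two ends of a sequence are reverse complements, which is far cruder than real RNA folding prediction.

```python
# Minimal sketch of the string operations from the DNA/RNA slides.
# All function names are hypothetical illustrations; only uppercase
# A, C, G, T (DNA) or A, C, G, U (RNA) are accepted.

DNA_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}
RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}
# Template DNA base -> mRNA base (in RNA, U pairs with A).
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def complement(dna: str) -> str:
    """Base-by-base complement, in the same (aligned, 3' -> 5') order."""
    return "".join(DNA_PAIR[b] for b in dna)


def reverse_complement(dna: str) -> str:
    """The partner strand read 5' -> 3': complement, then reverse."""
    return complement(dna)[::-1]


def gc_content(dna: str) -> float:
    """Fraction of bases that are G or C."""
    return (dna.count("G") + dna.count("C")) / len(dna)


def hydrogen_bonds(dna: str) -> int:
    """H-bonds in the duplex with its reverse complement: A-T = 2, G-C = 3."""
    if set(dna) - set(DNA_PAIR):
        raise ValueError("expected only A, C, G, T")
    return sum(3 if b in "GC" else 2 for b in dna)


def transcribe_template(template: str) -> str:
    """mRNA complementary to a template DNA strand, aligned as on the slide."""
    return "".join(TEMPLATE_TO_RNA[b] for b in template)


def has_hairpin_stem(rna: str, stem: int) -> bool:
    """True if the first `stem` bases pair with the last `stem` bases.

    Crude check: the 5' end must equal the reverse complement of the
    3' end. An assumed minimum loop of 3 unpaired bases must remain.
    """
    if len(rna) < 2 * stem + 3:
        return False
    tail = rna[-stem:]
    tail_rc = "".join(RNA_PAIR[b] for b in reversed(tail))
    return rna[:stem] == tail_rc


if __name__ == "__main__":
    top = "GTAAAGTCCCGTTAGC"
    print(complement(top))                       # CATTTCAGGGCAATCG
    print(reverse_complement(top))               # GCTAACGGGACTTTAC
    print(gc_content(top), hydrogen_bonds(top))  # 0.5 40
    print(transcribe_template("CTGAAT"))         # GACUUA
    print(has_hairpin_stem("GGUGCGGUAAGAGCGCACC", 6))  # True
```

The two ways of writing the partner strand differ only in direction: `complement` gives the aligned 3’→5’ line shown on the double-stranded slide, while `reverse_complement` gives the same strand in the conventional 5’→3’ reading.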
Gene structure
• Genes must have:
  • Exons (usually protein-coding DNA to be translated)
  • A start site
  • A control region
[Figure: Eukaryotic gene structure]
• Proper gene function requires:
  • Parts spelled correctly, in grammatically correct order

Eukaryotic genes
[Figure: a eukaryotic gene (enhancer, promoter, transcribed region containing Exon1, Intron1, and Exon2, terminator) is transcribed by RNA polymerase II into a primary transcript (5’ to 3’), which is capped (7mG), spliced, cleaved and polyadenylated (An), transported, and translated into a polypeptide (N to C)]

Prokaryotic genes
[Figure: a promoter followed by Cistron1, Cistron2, …, CistronN and a terminator is transcribed by RNA polymerase into a single mRNA (5’ to 3’); ribosomes, tRNAs, and protein factors translate each cistron into its own polypeptide (N to C)]
• Prokaryotic and eukaryotic genes have very different grammars

Prokaryotes versus eukaryotes
• In prokaryotes
  • The transcribed mRNA is ready to be translated into a protein (polypeptide) product
• In eukaryotes
  • The transcribed mRNA (pre-mRNA) must first be processed into mature mRNA
  • The protein-coding regions (exons) are interspersed with non-coding regions (introns), which must be excised

Key feature of eukaryotic gene structure
• Most eukaryotic genes are split: their exons are separated by long intervening sequences that are transcribed but later removed
• Exon
  • Part of the gene that contributes to the mature mRNA
• Intron
  • Part of the gene that is transcribed but spliced out of the pre-mRNA, so it is absent from the mature mRNA
• Introns are found in many genes, including some that code for RNAs

The structure of human PSA
• How might you find the exons?
  • There are grammatical rules
  • And exons and introns are spelled differently!

How does the gene “become” a protein?
DNA → (transcription) → RNA → (translation) → Protein

Translation
• mRNA is used as a template to make proteins
  • mRNA is bound by tRNA three bases at a time
  • This occurs at the ribosome
• The bound tRNA also carries a specific amino acid
  • This amino acid extends the nascent protein sequence

mRNA carries the protein message encoded in DNA
• Exonic DNA is interpreted in groups of 3 (codons)
  AGT TTT GGG CCC AAA
• The 64 (4 × 4 × 4) codons correspond to actions to be taken at the ribosome
  • Start translation (begin a protein)
  • Add one of twenty amino acids (extend a protein)
  • Stop translation (end a protein)

Genetic code
• From RNA triplets to amino acids
  • 4 possible bases (A, C, G, U)
  • 3 bases in each codon
  • 4 × 4 × 4 = 64 possible codon sequences
  • Start codon: AUG
  • Stop codons: UAA, UAG, UGA
  • 61 codons code for amino acids (including AUG, which also encodes methionine)
  • 20 amino acids, so the genetic code is redundant
• Yet another alphabet!
  • This one spells proteins

Genetic code: 64 triplets code for 22 tasks
[Figure. Image source: http://www.geneticengineering.org/chemis/Chemis-NucleicAcid/RNA.htm]

Amino acids
• Building blocks for proteins (20 different)
  • They vary by side-chain group
  • The side-chain group gives an amino acid its physical and chemical properties
  • e.g. hydrophilic amino acids are water-soluble (vs. hydrophobic)
• Linked together via a single chemical bond (peptide bond)
• Peptide: a short linear chain of amino acids (< 30)
• Polypeptide: a long chain of amino acids (which can be upwards of 4000 residues long)
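The codon rules on the translation and genetic-code slides amount to a lookup table read three bases at a time. Below is a minimal Python sketch, assuming the standard genetic code with bases ordered U, C, A, G; the 64-character string encodes that table as one amino acid letter per codon, with `*` marking the three stop codons. The function names are hypothetical. `translate` reads codons in a fixed frame and stops at the first stop codon, and `translate_from_start` first scans for AUG. To apply it to the exonic DNA example, we assume that DNA is the coding (sense) strand, so its mRNA is the same sequence with T replaced by U.

```python
from itertools import product

# Standard genetic code. Codons are enumerated in U, C, A, G order with
# the third base varying fastest: UUU, UUC, UUA, UUG, UCU, ...
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
START = "AUG"
STOP = "*"


def coding_dna_to_mrna(dna: str) -> str:
    """Assumes `dna` is the coding (sense) strand: mRNA has U in place of T."""
    return dna.replace("T", "U")


def translate(mrna: str, frame: int = 0) -> str:
    """Translate codon by codon starting at `frame`; stop at a stop codon.

    A trailing partial codon (fewer than 3 bases) is ignored.
    """
    protein = []
    for i in range(frame, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == STOP:
            break
        protein.append(aa)
    return "".join(protein)


def translate_from_start(mrna: str) -> str:
    """Begin at the first AUG (start codon); return '' if there is none."""
    start = mrna.find(START)
    return "" if start == -1 else translate(mrna, start)


if __name__ == "__main__":
    mrna = coding_dna_to_mrna("AGTTTTGGGCCCAAA")
    print(mrna)                                   # AGUUUUGGGCCCAAA
    print(translate(mrna))                        # SFGPK
    print(translate_from_start("GGAUGUUUUAAGC"))  # MF
```

The 61 non-stop entries of `CODON_TABLE` map onto only 20 distinct letters, which is the redundancy the genetic-code slide points out.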
The 20 naturally occurring amino acids
• Glycine (G, GLY)
• Alanine (A, ALA)
• Valine (V, VAL)
• Leucine (L, LEU)
• Isoleucine (I, ILE)
• Phenylalanine (F, PHE)
• Proline (P, PRO)
• Serine (S, SER)
• Threonine (T, THR)
• Cysteine (C, CYS)
• Methionine (M, MET)
• Tryptophan (W, TRP)
• Tyrosine (Y, TYR)
• Asparagine (N, ASN)
• Glutamine (Q, GLN)
• Aspartic acid (D, ASP)
• Glutamic acid (E, GLU)
• Lysine (K, LYS)
• Arginine (R, ARG)
• Histidine (H, HIS)

Peptide bonds link amino acids

Side-chains distinguish amino acid properties
• Each AA has a unique side-chain
  • Unique molecular properties
• Molecular properties of AAs determine protein structure
• Another alphabet!
  • More like a linguistic alphabet

Proteins
• Polypeptides having a three-dimensional structure
• Proteins can form interactions with:
  • Proteins (complexes, oligomers)
  • mRNA
  • DNA
• Proteins can bind to each other depending on their relative charges and structures

Four levels of protein structure

Secondary structure
• Parts of a protein may fold on themselves to form
  • α-helix, β-sheet, random coil

Domains
• Domains are structural and/or functional modules within the protein that usually fold separately
[Figure: leucine zipper, E-F hand, zinc finger]

Recognizing domains
• Predict protein fold (secondary structure) based on primary sequence
  • Requires knowledge of spelling and grammar
• Key point:
  • Similar spelling often implies similar function

Tertiary interactions

Predicting protein function and structure
• Given the sequence of amino acids (primary structure) of an anonymous protein
  • How might one predict its function?
  • How might one predict its folded structure?
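The amino acid list above gives each residue both a one-letter and a three-letter code, and the review slide writes proteins in the three-letter form (Ala, Cys, …). A minimal Python sketch of converting between the two, assuming exactly the one-letter codes from that list; `is_protein`, `to_three_letter`, and `to_one_letter` are hypothetical helper names, and nothing beyond the 20 standard residues is handled.

```python
# One-letter -> three-letter codes for the 20 standard amino acids,
# taken from the list of naturally occurring amino acids.
THREE_LETTER = {
    "G": "Gly", "A": "Ala", "V": "Val", "L": "Leu", "I": "Ile",
    "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "C": "Cys",
    "M": "Met", "W": "Trp", "Y": "Tyr", "N": "Asn", "Q": "Gln",
    "D": "Asp", "E": "Glu", "K": "Lys", "R": "Arg", "H": "His",
}
ONE_LETTER = {three: one for one, three in THREE_LETTER.items()}


def is_protein(seq: str) -> bool:
    """True if `seq` is non-empty and uses only the 20 one-letter codes."""
    return len(seq) > 0 and all(aa in THREE_LETTER for aa in seq)


def to_three_letter(seq: str, sep: str = "-") -> str:
    """Convert one-letter codes, e.g. 'MKY' -> 'Met-Lys-Tyr'."""
    return sep.join(THREE_LETTER[aa] for aa in seq)


def to_one_letter(seq: str, sep: str = "-") -> str:
    """Convert three-letter codes (any case), e.g. 'MET-LYS-TYR' -> 'MKY'."""
    return "".join(ONE_LETTER[code.capitalize()] for code in seq.split(sep))


if __name__ == "__main__":
    print(to_three_letter("SFGPK"))       # Ser-Phe-Gly-Pro-Lys
    print(to_one_letter("MET-LYS-TYR"))   # MKY
    print(is_protein("ACGT"))             # True
```

The last line is a reminder that alphabets overlap: A, C, G, and T are all valid one-letter amino acid codes, so a short DNA string also passes as a protein string, and the alphabet alone cannot always tell the two apart.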
Review of alphabetic BI data
• DNA
  • String from a 4-letter alphabet of nucleotides (A, G, C, T)
• RNA
  • String from a 4-letter alphabet of nucleotides (A, G, C, U)
• Coding sequence
  • String from a 64-letter alphabet of nucleotide triplets (AAA, …)
• Proteins
  • String from a 20-letter alphabet of amino acids (Ala, Cys, …)

From now until December
• Explore and exploit the grammar of the genome
• How does evolution shape sequence diversity?
  • Change over time, subject to constraints on grammar and spelling
  • The constraints themselves change over time!
• Distinguish random sequences from those structured by function
  • One million monkeys on one million typewriters?
  • Or one Shakespeare with vellum?

Biological sequence analysis in one sentence
Use biological knowledge of (1) grammar, (2) spelling, and (3) evolution to identify sequences with too much structure to have occurred by chance

Looking forward
• Two goals of this class
  • What are the important problems?
  • What tools and techniques can we use to address them?
• Next time
  • Origins of sequence diversity
  • Reading?
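The one-sentence summary, finding sequences with too much structure to have occurred by chance, can be made concrete with a shuffle (permutation) test: count a motif, then compare that count with the counts in many random rearrangements of the same letters. This is a minimal Python sketch of one such test, not a method taken from the course; the motif, the number of shuffles, and the seed are arbitrary choices, and because shuffling preserves only single-letter composition, it ignores dependencies between neighboring bases that real analyses often need to model.

```python
import random


def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of `motif`, allowing overlaps ('AA' in 'AAA' = 2)."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def shuffle_test(seq: str, motif: str, n_shuffles: int = 999, seed: int = 0):
    """Return (observed motif count, empirical p-value).

    The p-value is the fraction of sequences, counting the observed one,
    whose motif count is at least the observed count.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = count_motif(seq, motif)
    letters = list(seq)
    at_least = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # same letters, random order
        if count_motif("".join(letters), motif) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    structured = "CG" * 20  # 20 C's and 20 G's, perfectly alternating
    obs, p = shuffle_test(structured, "CG")
    print(obs, p)
```

Adding 1 to both the numerator and the denominator counts the observed sequence as one of the possible arrangements, which keeps the estimated p-value from ever being exactly zero.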