Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
CS273a A Zero-Knowledge Based Introduction to Biology Courtesy of George Asimenos Biology From the greek word βίος = life Timeline: 1683 1858 1865 1953 1955 1978 1983 1990 2000 – – – – – – – – – discovery of bacteria Darwin’s natural selection Mendel’s laws double helix suggested by Watson-Crick discovery of DNA and RNA polymerase sequencing of first genome (5kb virus) invention of PCR discovery of RNAi human genome (draft) Properties of Biology Still a lot to understand Experimental data… are non-discrete have errors Hard to model ∀ rule ∃ exception The Cell cell, nucleus, cytoplasm, mitochondrion © 1997-2005 Coriell Institute for Medical Research Building an Organism DNA cell Every cell has the same sequence of DNA Subsets of the DNA sequence determine the identity and function of different cells From DNA To Organism ? Proteins do most of the work in biology, and are encoded by subsequences of DNA, known as genes. How many? Cells in the human body: >1014 (100 trillion) Chromosomes histone, nucleosome, chromatin, chromosome, centromere, telomere telomere centromere nucleosome DNA H1 chromatin ~146bp H2A, H2B, H3, H4 How many? Chromosomes in a human cell: 46 (2x22 + X/Y) Nucleotide deoxyribose, nucleotide, base, A, C, G, T, purine, pyrimidine, 3’, 5’ purines to previous nucleotide O O P O- 5’ H O H C H Guanine (G) Thymine (T) Cytosine (C) to base O C Adenine (A) C H H C 3’ to next nucleotide C H pyrimidines H Let’s write “AGACC”! “AGACC” (backbone) “AGACC” (DNA) deoxyribonucleic acid (DNA) 3’ 5’ 3’ 5’ DNA is double stranded strand, reverse complement 5’ 3’ 3’ 5’ DNA is always written 5’ to 3’ AGACC or GGTCT RNA ribose, ribonucleotide, U purines to previous ribonucleotide O O P O- 5’ H O H C H Uracil (U) Cytosine (C) C H C 3’ Guanine (G) to base O C Adenine (A) H C OH to next ribonucleotide H pyrimidines How many? Nucleotides in the human genome: ~ 3 billion Genes & Proteins gene, transcription, translation, protein Double-stranded DNA 5’ TAGGATCGACTATATGGGATTACAAAGCATTTAGGGA...TCACCCTCTCTAGACTAGCATCTATATAAAACAGAA 3’ 3’ ATCCTAGCTGATATACCCTAATGTTTCGTAAATCCCT...AGTGGGAGAGATCTGATCGTAGATATATTTTGTCTT 5’ (transcription) Single-stranded RNA AUGGGAUUACAAAGCAUUUAGGGA...UCACCCUCUCUAGACUAGCAUCUAUAUAA (translation) protein How many? Genes in the human genome: ~ 20,000 – 25,000 Gene Transcription promoter 5’ 3’ G A T T A C A . . . C T A A T G T . . . 3’ 5’ Gene Transcription transcription factor, binding site, RNA polymerase 5’ 3’ G A T T A C A . . . C T A A T G T . . . 3’ 5’ Transcription factors recognize transcription factor binding sites and bind to them, forming a complex. RNA polymerase binds the complex. Gene Transcription 5’ 3’ 3’ 5’ The two strands are separated Gene Transcription 5’ 3’ 3’ 5’ An RNA copy of the 5’→3’ sequence is created from the 3’→5’ template Gene Transcription G A T T A C A . . . 5’ 3’ 3’ 5’ C T A A T G T . . . pre-mRNA 5’ G A U U A C A . . . 3’ RNA Processing 5’ cap, polyadenylation, exon, intron, splicing, UTR, mRNA 5’ cap poly(A) tail exon intron mRNA 5’ UTR 3’ UTR Gene Structure introns 5’ 3’ promoter 5’ UTR exons 3’ UTR coding non-coding How many? Exons per gene: ~ 8 on average (max: 148) Nucleotides per exon: 170 on average (max: 12k) Nucleotides per intron: 5,500 on average (max: 500k) Nucleotides per gene: 45k on average (max: 2,2M) Amino acid amino acid H N H O C C H R OH Alanine Arginine Asparagine Aspartate Cysteine Glutamate Glutamine Glycine Histidine Isoleucine Leucine Lysine Methionine Phenylalanine Proline Serine Threonine Tryptophan Tyrosine Valine There are 20 standard amino acids Proteins N-terminus, C-terminus to previous aa N H O C C to next aa H R H N-terminus OH C-terminus Translation ribosome, codon P site A site mRNA The ribosome synthesizes a protein by reading the mRNA in triplets (codons). Each codon is translated to an amino acid. The Genetic Code U C A G UUU Phenylalanine (Phe) UCU Serine (Ser) UAU Tyrosine (Tyr) UGU Cysteine (Cys) U UUC Phe UCC Ser UAC Tyr UGC Cys C UUA Leucine (Leu) UCA Ser UAA STOP UGA STOP A UUG Leu UCG Ser UAG STOP UGG Tryptophan (Trp) G CUU Leucine (Leu) CCU Proline (Pro) CAU Histidine (His) CGU Arginine (Arg) U CUC Leu CCC Pro CAC His CGC Arg C CUA Leu CCA Pro CAA Glutamine (Gln) CGA Arg A CUG Leu CCG Pro CAG Gln CGG Arg G AUU Isoleucine (Ile) ACU Threonine (Thr) AAU Asparagine (Asn) AGU Serine (Ser) U AUC Ile ACC Thr AAC Asn AGC Ser C AUA Ile ACA Thr AAA Lysine (Lys) AGA Arginine (Arg) A AUG Methionine (Met) or START ACG Thr AAG Lys AGG Arg G GUU Valine (Val) GCU Alanine (Ala) GAU Aspartic acid (Asp) GGU Glycine (Gly) U GUC Val GCC Ala GAC Asp GGC Gly C GUA Val GCA Ala GAA Glutamic acid (Glu) GGA Gly A GUG Val GCG Ala GAG Glu GGG Gly G U C A G Translation 5’ ...AUUAUGGCCUGGACUUGA... UTR Met Start Codon Ala Trp Thr Stop Codon 3’ Translation 5’ ...AUUAUGGCCUGGACUUGA... 3’ Translation Met 5’ Trp Ala ...AUUAUGGCCUGGACUUGA... 3’ Errors? mutation What if the transcription / translation machinery makes mistakes? What is the effect of mutations in coding regions? Reading Frames reading frame G C U U G U U U A C G A A U U A G G C U U G U U U A C G A A U U A G G C U U G U U U A C G A A U U A G G C U U G U U U A C G A A U U A G Synonymous Mutation synonymous (silent) mutation, fourfold site G G C U U G U U U A C G A A U U A G G C U U G U U U A C G A A U U A G Ala Cys Leu Arg Ile G C U U G U U U G C G A A U U A G Ala Cys Leu Arg Ile Missense Mutation missense mutation G G C U U G U U U A C G A A U U A G G C U U G U U U A C G A A U U A G Ala Cys Leu Arg Ile G C U U G G U U A C G A A U U A G Ala Trp Leu Arg Ile Nonsense Mutation nonsense mutation A G C U U G U U U A C G A A U U A G G C U U G U U U A C G A A U U A G Ala Cys Leu Arg Ile G C U U G A U U A C G A A U U A G Ala STOP Frameshift frameshift G C U U G U U U A C G A A U U A G G C U U G U U U A C G A A U U A G Ala Cys Leu Arg Ile G C U U G U U A C G A A U U A G Ala Tyr Cys Glu Leu Gene Expression Regulation regulation When should each gene be expressed? Regulate gene expression Examples: Make more of gene A when substance X is present Stop making gene B once you have enough Make genes C1, C2, C3 simultaneously Regulatory Mechanisms enhancer, silencer Transcription Factor Specificity: Enhancer: Silencer: The end?