Bio-Breaks, July 1st, 2009
Genes to Genomes

What does life do?

DNA: Deoxyribonucleic Acid
“Structure”: subunits (A, T, G, C) joined into a length of DNA

The Genome
Genome: All of the genes
Genes: Instructions
Proteins: Workhorses
And more order: DNA wraps around histones, is packed into a chromosome, and sits in the nucleus

Genes
Central Dogma: DNA -> RNA -> Protein
DNA sequence -> “Structure”

“We wish to suggest a structure for … [DNA]. This structure has novel features which are of considerable biological interest.” (Watson, Crick)

DNA Replication
The Watson and Crick strands separate.
Complementary base-pairing: Watson pairs with a new Crick; Crick pairs with a new Watson.

Base Pairing
Complementary base-pairing (“velcro”): A pairs with T, G pairs with C
Typical “sequence” of DNA:
    A G T A C G
    | | | | | |
    T C A T G C

Nucleotide
Each nucleotide: Phosphate, Sugar, Base
Sugar-phosphate links form the backbone of each strand, which runs 5’ to 3’.

DNA Double Helix
    5’ T A G C 3’
       | | | |
    3’ A T C G 5’
Length: 4 base-pairs long (4 bp)

Gene Expression
Instructions (DNA) for every function (proteins)
Central Dogma: DNA -> RNA -> Protein
Gene -> Gene Product (protein)
Sequence has great importance

DNA vs. RNA
RNA: Ribonucleic acid
Both are chains of nucleotides (Phosphate, Sugar, Base) running 5’ to 3’.
    DNA         RNA
    Guanine     Guanine
    Adenine     Adenine
    Thymine     Uracil
    Cytosine    Cytosine

Transcription
DNA -> RNA
Occurs in the nucleus
Complementary base-pairing between DNA and RNA: A pairs with U, T with A, G with C, C with G
Digital copy of digital info (19-bp)

Gene Structure - Chromatin
DNA: 100 million base pairs long
Nucleosome: 146 base-pairs of DNA wrap around the octamer
8 histone proteins (octamer): two each of H2A, H2B, H3, H4

a gene
Control Region | Coding Region
Smallest gene: ILGFII (252 bp)
Largest gene: Titin (>300,000 bp)

Genes
    Watson  A T T T A G G A G A C G A T T G G A T A C C T C T A G A G C
            | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
    Crick   T A A A T C C T C T G C T A A C C T A T G G A G A T C T C G

Control of Gene Expression
a gene: Control Region | Coding Region
Transcription Activators (regulatory proteins): GATA-1
Basal transcription machinery: RNA Polymerase (Pol II), TFIIB, TFIIH, Kin28; P marks phosphate groups
Transcription: ~30 nucleotides/second
Nucleotide sequence (DNA) -> Transcription -> Nucleotide sequence, 5’ to 3’: messenger RNA(s) (proper # of copies)
mRNA -> Translation -> Met•Ser•Ser•Val•Asn•Ala•Asn•Gly•Gly•Tyr…..Xxx
Proteins are made from the 20 different amino acids
(Proper sequence of amino acids): proper shape, proper amount

Control Region
    CCCCTATAGGGG
    ||||||||||||
    GGGGATATCCCC
Feelin’ Groovy!
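The pairing rules on these slides (A with T, G with C, and U in place of T in RNA) can be checked mechanically. What follows is a minimal Python sketch and not part of the original handout: the function names `complement`, `reverse_complement`, and `transcribe` are illustrative assumptions, and the sequences are copied from the Watson/Crick strands and the control-region site shown on the slides.

```python
# Pairing rules from the slides: A-T and G-C in DNA; U replaces T in RNA.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRS = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Strands copied from the "Genes" slide, written left to right as printed.
WATSON = "ATTTAGGAGACGATTGGATACCTCTAGAGC"
CRICK = "TAAATCCTCTGCTAACCTATGGAGATCTCG"


def complement(strand: str) -> str:
    """Return the pairing base for each position, in the same left-to-right order."""
    return "".join(DNA_PAIRS[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


def transcribe(template: str) -> str:
    """Return the RNA bases that pair with a DNA template, position by position."""
    return "".join(RNA_PAIRS[base] for base in template)


# The printed Crick strand is the position-by-position complement of Watson.
print(complement(WATSON) == CRICK)        # True

# The control-region site reads the same on both strands (5' to 3').
site = "CCCCTATAGGGG"
print(reverse_complement(site) == site)   # True

# DNA template ACGT pairs with RNA bases UGCA.
print(transcribe("ACGT"))                 # UGCA
```

A site that equals its own reverse complement, like the one in the control region, presents the same sequence on either strand, which is one reason such sites are convenient targets for sequence-specific binding proteins.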
Sequence specificity
    TTAACCGGGATATTAACCGG
    ||||||||||||||||||||
    AATTGGCCCTATAATTGGCC
Transcription factors: Sequence-specific binding proteins

How do we know the function of a gene?
Differences in WT & Mutant DNA sequences: mutations - functional aspects

Wild-type Gene
Control Region (promoter) | Coding Region
    A T T T C G G A G A C A A T T A T G T A C C T C C A T A G C
    | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
    T A A A G C C T C T G T T A A T A C A T G G A G G T A T C G
Trans________
Trans________
Met•Tyr•Leu•His•Ser•Ala•Asn•Gly•Gly•Tyr•Thr•Lys•Pro•Gln•Lys•Tyr…..
Normal (“wild-type”) protein

Mutations
Smallest change possible: 1 bp

Mutant Gene (mutant GENOTYPE)
    A T T T C G G A G A C A A T T A T G T A C C T C C A G A G C
    | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
    T A A A G C C T C T G T T A A T A C A T G G A G G T C T C G
One base pair differs from wild-type: T-A becomes G-C, so His becomes Gln.
Mutant protein: Met•Tyr•Leu•Gln•Ser•Ala•Asn•Gly•Gly•Tyr•Thr•Lys•Pro•Gln•Lys•Tyr…..
Mutant protein fails to function, results in a mutant PHENOTYPE

Sequence has great importance
PKU: 1 in 15,000 births
Normal sequence of PKU gene
MUTANT sequence of PKU gene
MUTANT sequence of PKU mRNA
MUTANT sequence of PKU polypeptide
Phenylalanine hydroxylase: Phenylalanine -> Tyrosine
Mutant Phenylalanine hydroxylase (non-functional): Phenylalanine -X-> Tyrosine

Human Genome Sequencing
Seven “reads” (1-7), each ~500 nucleotides long, overlap to form one contig.
Overlapping reads:
    GGGATCCCATAAATTTGGGCCC
              AAATTTGGGCCCTATATTGCGCCATAATGGGCGGCCC
                                          GGGCGGCCCTAAATA…
Contig:
    AAATTTGGGCCCTATATTGCGCCATAATGGGCGGCCCTAAATA…

    (watson) GGGTTTCCCTTT
             ||||||||||||
    (crick)  CCCAAAGGGAAA

Chromosomes: 22 autosomes, 1 sex chromosome
> 3,000 genes
200-400 genes, 47 M bp

2 million reads of ~500 nucleotides/read
Grouped into contigs of ~10,000 base pairs
> 29,000 contigs =
~ 3 billion base pairs

Various Genome Sizes (base pairs)
    Amoeba dubia             670 billion
    Bufo bufo                6.9 billion
    Homo sapiens             3 billion
    Muntiacus muntjak        2.5 billion
    Plasmodium falciparum    25 million
    Budding yeast            12 million

Bioinformatics
Bioinformatics = Molecular Biology + Computer Science

TCCAGTCCCCTCAAGTCCGAAGCCCCTACCCACTCTCACGCCAGGCAGGGGTGGGGGCCG
CCGGGGTCATATAACCGGGCCCCTTCTCTGCCTTGATGAGCTCCGTTAACGCAAATGGAG
GATATACCAAACCACAAAAATATGTGCCAGGGCCAGGTGATCCTGAACTTCCACCCCAAC
TATCCGAATTTAAAGATAAAACATCGGATGAAATCTTGAAAGAAATGAACAGAATGCCTT
TTTTCATGACCAAGTTGGATGAAACAGACGGTGCAGGTGGTGAAAACGTGGAGTTAGAAG
CTTTAAAGGCATTAGCTTATGAAGGCGAACCACACGAAATCGCTGAAAATTTCAAGAAGC
AAGGTAACGAACTATACAAAGCAAAAAGATTCAAGGATGCAAGGGAACTTTACTCAAAGG
GCTTGGCTGTAGAATGCGAAGATAAATCAATAAATGAGTCACTATATGCCAATAGAGCGG
CATGTGAGTTAGAGCTGAAAAATTACAGGAGGTGTATCGAGGACTGCAGTAAAGCTCTAA
CTATTAACCCCAAGAATGTTAAGTGCTACTATCGTACAAGCAAGGCTTTTTTCCAATTAA
ACAAGTTGGAGGAGGCCAAATCAGCCGCAACATTTGCCAATCAAAGGATTGACCCAGAGA
ACAAATCAATTTTGAATATGTTATCAGTGATTGATAGAAAAGAACAAGAATTGAAAGCAA
AAGAAGAAAAACAGCAAAGAGAAGCTCAGGAACGTGAAAACAAGAAAATTATGTTAGAGA
GCGCAATGACGCTGAGAAACATAACTAACATCAAAACTCACTCTCCAGTAGAGTTACTTA
ATGAGGGTAAAATAAGGCTAGAAGACCCAATGGATTTTGAATCTCAATTGATCTATCCCG
CATTAATTATGTACCCCACGCAAGATGAATTTGATTTTGTAGGTGAAGTAAGTGAGTTAA
CTACTGTGCAAGAACTTGTTGACCTAGTTTTGGAAGGGCCGCAAGAACGCTTCAAAAAAG
AAGGTAAGGAAAACTTCACACCAAAGAAAGTGTTGGTGTTCATGGAAACAAAGGCAGGTG
GTTTGATTAAAGCTGGTAAGAAACTGACATTTCACGATATCTTGAAGAAAGAGTCGCCAG
ATGTACCATTGTTCGATAACGCTTTGAAAATATATATTGTGCCAAAGGTAGAAAGTGAAG
GGTGGATTTCCAAGTGGGATAAGCAAAAAGCCTTAGAAAGAAGATCTGTGTGAGGGGGCC
CGGGGGACGTCTTCCCAGGGCTCACTAAAACCGGCCGGGAAGCCTGGGCTGCACTAGGAG
CCGGCGACCCTGGGGCGAGGGGCGGCCCGGAGCCCTGCGGGAGGAGCTGGCGGCCGCCCC
AGGTAGCAACCATCCTGCCTCCCGCTGGAGCGGCGTCTCCTCCCCGGGAGGAGGGCAGGG

What to find in the sequence:
Transcription start sites
Transcription termination sites
Translation start sites
Translation stop sites

Gene Hallmarks
DNA sequence:
    XXX TATA XXXXXXXXXXXX ATG XXXXXXXXXXXXXXXXXXXXXXXX TGA XXXXXXX TTAAA XXXX
Transcription start site: TATA
Translation start site: ATG (AUG in the mRNA)
Protein-coding region: between ATG and TGA
Translation STOP site: TGA (UGA in the mRNA)
Transcription termination site: TTAAA
DNA -> Transcription -> mRNA (AUG … UGA) -> Translation (ribosome) -> protein

On the same sequence, the slide marks three features:
Transcription Start Control Sequence
Translation Start Sequence
STOP Translation Sequence

CATGGGTCATATAACCGGGCCCCTTCTCTGCCTTGATGAGCTCCGTTAACGCAAATGGAG
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
GTACCCAGTATATTGGCCCGGGGAAGAGACGGAACTACTCGAGGCAATTGCGTTTACCTC

Chromosome Schematic: Level of Gene with ORFs
Control Region (promoter) | Region encoding protein
CNS1: 1,155 base pairs long
Chromosome II genes shown: ARA1, APD1, TBS1, SPP3, RIB7, CNS1, RPB5, AMN1, SLI15, YBR159W, ICS2
Axis: Length of DNA (x 1,000) (= kb): 200, 400, 600, 800

22,000 genes (DNA expressed as protein products)
Need experimental verification
Only ~2% of genome
Protein products of many are unknown
Most are closely similar to those of other organisms
Only 1% of human genes are unique to humans
February 2001

Comparative Genomics
Comparative Genomics I (different organisms)
    Species          Base Pairs     Genes
    Human            3 Billion      22,000
    Worm             100 Million    19,000
    Fruit Fly        120 Million    14,000
    Arabidopsis      125 Million    25,000
    Baker’s yeast    12 Million     6,000
    E. coli          4 Million      4,800

Pan troglodytes (painting: Krystii Melaine)
Nature, September 2005
No Alzheimer’s, little cancer
~99% of chimp DNA same as human
35 million differences: one every ~100 base pairs
~29% of all proteins have identical sequence
71% of proteins have some sequence difference
~1-2 aa differences between most proteins
~24% of the differences are in regions that control expression of a gene.

Human vs. Apes
    Human: a b c d e f g h i j k
    Apes:  a b f e d c g h i j k
Inversions on 9 chromosomes
Present in each of the 18 species of great apes
February 2001

Genomes “Sequenced”
    Organism      “Sequenced”
    Eukaryotes    > 150
    Bacteria      > 800
    Archaea       ~ 50
    Viruses       > 200
Genome bioinformatics suite

H: ~ $ Billion, 2.9 Billion BP
July 2004
Dog Genome (Tasha): ~ $30 Million, 2.4 Billion BP
Today: ~ $10,000-$50,000/human genome
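The comparative genomics table invites a quick calculation: how densely are genes packed in each genome, and how far apart are the human-chimp differences on average? The sketch below is an illustration added to these notes rather than part of the original slides. It uses only the rounded figures from the table and the chimp slide, and the names `GENOMES`, `genes_per_megabase`, and `average_spacing` are hypothetical.

```python
# Genome size (base pairs) and gene count, as rounded on the
# comparative genomics slide. These are the slide's figures, not
# current database values.
GENOMES = {
    "Human": (3_000_000_000, 22_000),
    "Worm": (100_000_000, 19_000),
    "Fruit Fly": (120_000_000, 14_000),
    "Arabidopsis": (125_000_000, 25_000),
    "Baker's yeast": (12_000_000, 6_000),
    "E. coli": (4_000_000, 4_800),
}


def genes_per_megabase(species: str) -> float:
    """Genes per million base pairs for one species in GENOMES."""
    base_pairs, genes = GENOMES[species]
    return genes / (base_pairs / 1_000_000)


def average_spacing(genome_bp: int, differences: int) -> float:
    """Average number of base pairs between differences spread over a genome."""
    return genome_bp / differences


# Print species from most to least gene-dense.
for species in sorted(GENOMES, key=genes_per_megabase, reverse=True):
    print(f"{species:<14} {genes_per_megabase(species):8.1f} genes per Mb")

# 35 million human-chimp differences over ~3 billion base pairs.
print(round(average_spacing(3_000_000_000, 35_000_000)))  # 86
```

With these figures the human genome comes out at roughly 7 genes per million base pairs against 500 for baker's yeast, which fits the point that protein-coding DNA is only a small fraction of the human genome. The spacing of about 86 base pairs agrees with the slide's "one every ~100 base pairs".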