Developmental Integration of Bioinformatics Activities at Different Levels of the Biology Curriculum
Jeff Newman
Lycoming College, Williamsport PA
August 5, 2016

Genomics & Bioinformatics Throughout the Curriculum
Jeffrey D. Newman, Lycoming College
November 10, 2006

Outline
• The starting line: Where we were.
• Philosophy: Use of bioinformatics & genome data is as important to a 21st century biologist as using a microscope!!!
• 3 phases – Incorporate Molecular Biology, Incorporate Genomics and Bioinformatics, Add New Upper-Level Courses.
– Introductory Biology
– Genetics
– Microbiology
– Upper-level courses – Biochemistry, Molecular Biology, Genome Analysis, Cell & Molecular Research Methods.
• Assessment Surveys – Knowledge, Skills, Attitudes
• Where to go from here?

Incorporation of Molecular Biology, Bioinformatics, Genomics
• Phase I (’96-’99) Integrate Molecular Biology into Introductory and core course labs.
– Introductory Biology – pGLO plasmid prep, transformation, restriction digest, gel.
– Genetics – PCR of Clotting Factor IX fragment from cheek cell DNA, cloning into pBS, blue-white screening
– Microbiology – PCR of unknown’s rRNA gene, sequence PCR product.

Incorporation of Molecular Biology, Bioinformatics, Genomics
• Phase II (’99-’04) Genomics & Bioinformatics added to many courses
– Introductory Biology – Comparative genomics, Human Genome Characteristics, 3D structures, DNA sequence analysis, Multiple sequence alignment, phylogenetic trees
– Genetics – Sequence construction, discussion of microarrays
– Microbiology – MSA, trees, consensus seq’s, Microbial Genome Papers, Metagenomics
– Molecular Biology – Microarrays (thanks to GCAT), Integrated Informatics Projects
• Pedagogical Approach – Increase sophistication of analysis as students progress through the curriculum
• Project assessment survey – Spring ’01, GCAT Spring ’02

Incorporation of Molecular Biology, Bioinformatics, Genomics
• Phase III (’04 - ?)
– New course development
  – Genome Analysis – Fall ’04, ’06
  – Cell and Molecular Research Methods, Fall ’06

Courses Taught (all have labs except Public Health)
• Fall
– Bio 110 – Introduction to Biology I (with 2-3 lab sections)
– Bio 150 – Public Health or Bio 432 – Molecular Biology or Bio 437 – Genome Analysis or Bio 447 – Research Methods
• Spring
– Bio 321 – Microbiology (with 2 lab sections)
– Bio/Chem 444 – Biochemistry
• Research lab with 5-15 students
– Research Methods,
– Independent Study & Honors students,
– Paid lab assistants
– High school student volunteers

Bio 110 – Introduction to Biology I (majors)
Lab activities designed to support course topics.
• Biomolecules – Lab #2 = 3D structures of molecules
• Cell Biology
• Enzymes & Metabolism – Lab #4c = Kinetic Analysis with Excel
• Information Flow – Lab #5a = Gene ID in a sequence, predicting traits from plasmid and genome, restriction mapping of plasmid.
• Cell signaling, Cell Cycle, Mutations, Cancer
• Meiosis, Mendelian Genetics – Lab #7 = OMIM for basis of traits
• Biotech, Genomics, Developmental Biology
• Evolution, Population Genetics – Lab #10 = Retrieve myoglobin protein sequences from different animals, align, create tree, ID lineages where mutations occurred.

Intro Bio Lab #2 = 3D structures of molecules
• Small Molecules using Biomodel-3, developed by Angel Herráez, lecturer in Biochemistry and Molecular Biology at the University of Alcalá de Henares (Spain). http://biomodel.uah.es/en/model3/inicio.htm
• Concepts
– pdb files, rendering structures in different ways, manipulating structures, standard color schemes for elements.
– # of bonds on atoms, chemical formula, atomic/molecular mass, functional groups
– Saturated vs unsaturated fatty acids, components of phospholipids, arrangement of phospholipids into a bilayer

Intro Bio Lab #2 = 3D structures of molecules
• DNA structure tutorial originally by Eric Martz (UMass)
• Concepts
– 5’, 3’ ends, antiparallelism
– Backbone vs bases, components of nucleotides
– AT vs GC base pairs, complementary H-bond donors and acceptors

Intro Bio Lab #2 = 3D structures of molecules
• Tripeptide Concepts – Amino acid structure, peptide bonds, Directionality
• Protein – oxyhemoglobin
• AA sequence – structure correlation
• 2° structure – alpha helix, H-bonding
• 3° structure – location of hydrophilic, hydrophobic residues
• 4° structure – intersubunit interfaces
• Ligand binding – interaction with heme group

Lab #4c = Kinetic Analysis with Excel
• Enzyme assay lab
• Week 1 – Protein extracted from raw wheat germ, measured with Bradford assay.
• Week 2 – Acid phosphatase enzyme activity compared between crude and purified, substrate concentration varied.
• Week 3 – Calculations and graphing of data in Excel

Lab #5a = Mr. Green Genes - Gene ID in a sequence, predicting traits from plasmid and genome, restriction mapping of plasmid.
• Lab developed using BioRad’s pGLO plasmid.
• Students provided with pGLO DNA sequence
– find genes
– develop hypotheses about traits of bacteria with plasmid
– design experiments to test hypothesis about function of the DNA
– find restriction sites
– develop hypotheses/predictions about fragment sizes after cutting with restriction enzyme
– tests physical properties of the DNA

Lab #5a = Mr. Green Genes - Gene ID in a sequence, predicting traits from plasmid and genome, restriction mapping of plasmid.
• Sequence pasted into NCBI ORF finder tool
• Concepts – Start, stop codons, genetic code, 5’→3’ directionality, 6 frame translations, ORF vs protein length

New and “Improved”?
ORF Finder

BLAST Search with translated ORFs
• Discuss principles of BLAST search, significance of E value and score.
• ID of AraC, GFP, Beta-lactamase.
• What traits?
• How to test?
• Controls?

Lab #5a = Mr. Green Genes - Gene ID in a sequence, predicting traits from plasmid and genome, restriction mapping of plasmid.
• RAST = Rapid Annotation using Subsystem Technology
• Students browse through a RAST annotation to see a similar approach applied to a whole genome.

Lab #5a = Mr. Green Genes - Gene ID in a sequence, predicting traits from plasmid and genome, restriction mapping of plasmid.
• pGLO sequence pasted into New England Biolabs NEBcutter to ID restriction sites
• Students predict what size fragments will be obtained when cutting pGLO with different enzymes
• Students construct map of plasmid for lab report
• Week 2 – students isolate plasmid (Qiagen), prep competent cells, do transformation and plating, set up restriction digest.
• Week 3 – students prep and run gel, observe and discuss transformation plates, photograph and discuss gel to compare with hypotheses/predictions

Lab #7 = OMIM for basis of traits
• PTC (non)tasting haplotype
• Skin Pigmentation – Look at type of mutation, global distribution of SNP
• Red Hair/Fair Skin – Multiple genes, phenotypes, signals, tanning response
• Colorblindness, Blood type….
• 23andMe

Lab #10 = Retrieve myoglobin protein sequences from different animals, align, create tree, ID mutations.
• Week 14 lab – Hybrid lab on Evolution – Watch video, compare hominid skulls, ape chromosome banding patterns, myoglobin sequences.
• UniProt used to retrieve myoglobin protein sequences from a diverse set of vertebrates, including Human
• After editing species names, MEGA used to create multiple sequence alignment & construct phylogenetic tree
• Students compare tree and alignment to ID ancestor where mutations occurred.
• Best evidence for evolution

[Figures: three student-built myoglobin phylogenetic trees, each with branch points numbered 1-5 and a 0.05 scale bar; taxa include whales and dolphins, primates, rodents, ungulates, birds, and sea turtles.]

Bioinformatics in Intro Biology - Summary
• Students spend 4.5 lab periods in the computer lab
• Advantages
– Students develop key skills, become experienced with basic bioinformatics tools and databases
– Abstract concepts become more concrete through hands-on analysis and visualization
– It’s free!!!!
• Disadvantages
– Fewer wet labs; frequent software and web site changes require regular revision of instructions
– Some students find computer work boring

The LycoMicro Unknown Microbe Lab
Week 8
– Analyze DNA sequence @ http://www.ezbiocloud.net/eztaxon
– Construct Phylogenetic Tree w/MEGA
– Literature Research (IJSEM)

[Figure: phylogenetic tree, scale bar 0.02, including Pantoea anthophila JJM alongside reference bacteria such as Escherichia coli, Pseudomonas aeruginosa, Helicobacter pylori, Bacillus subtilis, Staphylococcus aureus, Streptomyces coelicolor, Aquifex pyrophilus, and Chloroflexus aurantiacus.]

Bio/Chem 444 Protein Structure Lab
• Students use RCSB to examine Phenylalanine Hydroxylase.
• Concepts – amphipathic helix interactions, beta sheet, turn structure details, cofactor and substrate interactions and binding, paralogs, substrate analogs

Bio/Chem 444 Metabolic Reconstruction
• Students use RAST to reconstruct pathways in an organism, ID steps – must map 20 subsystems, all interconnected.

Bio 447 - Research Methods
• Complete & deposit 16S sequence
• Determine reference organisms from phylogenetic tree
• Sequence & compare genome(s)
• Obtain reference organisms
• Repeat experiments in parallel to determine differences and similarities
• Prepare poster for ASM
• Write a paper for IJSEM

[Figure labels: B. indicus, B. cibi, B. sp. SJS]

• Wetterstrand KA. DNA Sequencing Costs: Data from the NHGRI Large-Scale Genome Sequencing Program. Available at: www.genome.gov/sequencingcosts. Accessed [6-15-16].

GCAT / GCAT-SEEK
• Genome Consortium for Active Teaching (GCAT) founded in 2000 to bring Genomics (Microarrays) to the undergraduate curriculum.
• Multiple HHMI & NSF funded workshops
• GCAT-SEEKquence “spin-off” to bring NextGen sequencing to the undergraduate curriculum.
• 3 genomes (Ion Torrent & 454 as part of pilot)
• NSF Research Collaboration Network, Juniata’s HHMI Genomics Leadership Initiative

Shared MiSeq (2x300) Runs
• NextGen Instruments generate more data than most UG faculty can use or afford.
• November 2013 – 27 bacteria @$200 each (including Flavobacterium aquatile)
• April 2014 – Opened to Microedu Listserv, 35 Bacteria and Phage from 16 institutions @$190/sample
• October 2014 – 30 phage, viruses and bacteria @$175/sample.

Sample                                  Reads (est.)      Bases (est.)
GSF665-1-E_coli-C06b                         217,320       130,391,966
GSF665-2-Chryseobacterium-LO               1,317,872       790,723,170
GSF665-3-Linfield-KH                         809,893       485,935,870
GSF665-4-Linfield-NH                         301,171       180,702,758
GSF665-5-Exiguobacterium                     794,482       476,689,384
GSF665-6-Plesiomonas_shigelloides            656,143       393,685,659
GSF665-7-Halosimplex_carlsbadense            595,655       357,393,201
GSF665-8-Phage_Eapen                         573,447       344,068,354
GSF665-9-Phage_Aspire                        170,895       102,536,927
GSF665-10-strain_3572                        593,179       355,907,159
GSF665-11-Gracilibacillus_dipsosauri         986,925       592,154,880
GSF665-12-Serratia_S12                       827,533       496,519,794
GSF665-13-Rhodococcus_T1Sofl-14              297,153       178,292,067
GSF665-14-Janthinobacterium-BJB1             823,488       494,092,592
GSF665-15-Janthinobacterium-BJB349           883,287       529,972,260
GSF665-16-Janthinobacterium-BJB304         1,098,516       659,109,346
GSF665-17-Janthinobacterium-BJB317           549,616       329,769,324
GSF665-18-Iodobacter-BJB302                  206,973       124,183,611
GSF665-19-Asaia_bogorensis                 1,096,204       657,722,373
GSF665-20-Asaia_siamensis                    820,818       492,490,968
GSF665-21-Asaia_astilbes                     783,447       470,068,239
GSF665-22-Asaia_platycodi                    808,325       484,994,710
GSF665-23-Asaia_krungthepensis             1,152,811       691,686,698
GSF665-24-Asaia_prunellae                  1,035,414       621,248,288
GSF665-27-Serratia-DL                        129,258        77,554,903
GSF665-28-Phage-KitKat                        53,773        32,263,632
GSF665-29-Cyanobacterium-RC610               909,265       545,559,194
GSF665-30-Serratia_marcescens-RH             307,886       184,731,584
GSF665-31-Bacillus_cibi                      693,101       415,860,714
GSF665-32-Pedobacter-BMA                   1,200,365       720,218,713
GSF665-33-Flavobacterium-KMS                 185,975       111,585,274
GSF665-34-Flavobacterium_hibernum          1,432,517       859,510,422
GSF665-36-Flavobacterium_hydatis             744,893       446,935,512
GSF665-39-Kaistella_koreensis              1,238,892       743,334,928
GSF665-40-Kaistella_haifense               1,067,969       640,781,490
Total                                     25,364,460    15,218,675,963
Average                                      724,699       434,819,313

Assembly statistics – discussed in Intro and Micro
[SoftGenetics Assembler: Assembly Results Statistics Report]
• Total Reads Number: 2056329
• Matched Reads Number: 1983986
• Unmatched Reads Number: 72343
• Assembled Sequences Number: 61
• Average Sequence Length: 57497
• Minimum Sequence Length: 158
• Maximum Sequence Length: 641985
• N50 Length: 366076
[Final Contig Merge Results Statistics Report]
• Final Contig Merge Sequences Number: 13
• Final Contig Merge Average Sequence Length: 269063
• Final Contig Merge Minimum Sequence Length: 173
• Final Contig Merge Maximum Sequence Length: 856388
• Final Contig Merge N50 Length: 586767
• Matched Reads Count: 1977550
• Number of Matched Bases: 562514128
• Average Read Length: 285
• Average Coverage: 161
• Reference Length: 3507364

Phenotype Comparisons
Seed Viewer Sequence Based Comparison Tool

RAST – Sequence based comparison tool to ID orthologs
C. populense lacks carotenoid biosynthetic genes
[Figure labels: C. hispalense, C. populense]

Explain phenotypic differences – e.g. Pigment

“Landscapes”
[Figure labels: C. hispalense – carotenoid, flexirubin; C. populense, CF314 – Flexirubins only]

Sequence-Based Comparison color codes similarity

Sequence Based Comparison provides protein seq similarity
AAI

Sequence Based Comparison can ID unique and shared genes…..

Venn Diagram Tool

Venn Diagram Template
Identify Core, Genus or Family-Specific Genes

Links/Tools available at novelmicrobe.com
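
The N50 and average coverage figures in the assembly statistics report can be recomputed by hand, which may help students see what the numbers mean. The Python sketch below is a minimal illustration only: the function names `n50` and `average_coverage` are hypothetical, the toy contig lengths are made up, and it assumes coverage is simply matched bases divided by reference length (the assembler itself may count or round differently).

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


def average_coverage(matched_bases, reference_length):
    """Average depth of coverage, assumed here to be
    matched (aligned) bases divided by reference length."""
    return matched_bases / reference_length


# Toy contig set (hypothetical lengths, not taken from the report).
toy_contigs = [100, 200, 300, 400, 500]
print("Toy N50:", n50(toy_contigs))

# Values copied from the Final Contig Merge report:
# Number of Matched Bases and Reference Length.
depth = average_coverage(562_514_128, 3_507_364)
print("Average coverage: about", round(depth, 1))
```

With the report's values this simple ratio comes out a little above 160, in line with the reported average coverage of 161.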