Download slides - QUBES Hub

Developmental Integration of Bioinformatics Activities at
Different Levels of the Biology Curriculum
Jeff Newman
Lycoming College, Williamsport PA
August 5, 2016
Genomics & Bioinformatics
Throughout the Curriculum
Jeffrey D. Newman, Lycoming College
November 10, 2006
Outline
• The starting line: Where we were.
• Philosophy: Use of bioinformatics & genome data is as
important to a 21st century biologist as using a microscope!!!
• 3 phases – Incorporate Molecular Biology, Incorporate
Genomics and Bioinformatics, Add New Upper-Level Courses.
– Introductory Biology
– Genetics
– Microbiology
– Upper-level courses – Biochemistry, Molecular Biology,
Genome Analysis, Cell & Molecular Research Methods.
• Assessment Surveys – Knowledge, Skills, Attitudes
• Where to go from here?
Incorporation of Molecular Biology, Bioinformatics, Genomics
• Phase I (‘96-’99) Integrate Molecular Biology into
Introductory and core course labs.
– Introductory Biology – pGLO plasmid prep, transformation,
restriction digest, gel.
– Genetics – PCR of Clotting Factor IX fragment from cheek
cell DNA, cloning into pBS, blue-white screening
– Microbiology – PCR of unknown’s rRNA
gene, sequence PCR product.
Incorporation of Molecular Biology, Bioinformatics, Genomics
• Phase II (’99-’04) Genomics & Bioinformatics added to many courses
– Introductory Biology – Comparative genomics, Human Genome Characteristics,
3D structures, DNA sequence analysis, Multiple sequence alignment,
phylogenetic trees
– Genetics – Sequence construction, discussion of microarrays
– Microbiology – MSA, trees, consensus seq’s,
Microbial Genome Papers, Metagenomics
– Molecular Biology – Microarrays (thanks to GCAT),
Integrated Informatics Projects
• Pedagogical Approach – Increase sophistication of analysis
as students progress through the curriculum
• Project assessment survey – Spring ’01,
GCAT Spring ‘02
Incorporation of Molecular Biology, Bioinformatics, Genomics
• Phase III (’04 - ?) – New course development
– Genome Analysis – Fall ’04, ‘06
– Cell and Molecular Research Methods, Fall ‘06
Courses Taught (all have labs except Public Health)
• Fall
– Bio 110–Introduction to Biology I (with 2-3 lab sections)
– Bio 150–Public Health or Bio 432–Molecular Biology or
Bio 437 – Genome Analysis or Bio 447 – Research Methods
• Spring
– Bio 321 – Microbiology (with 2 lab sections)
– Bio/Chem 444 – Biochemistry
• Research lab with 5-15 students
– Research Methods
– Independent Study & Honors students
– Paid lab assistants
– High school student volunteers
Bio 110 – Introduction to Biology I (majors)
Lab activities designed to support course topics.
• Biomolecules – Lab #2 = 3D structures of molecules
• Cell Biology
• Enzymes & Metabolism – Lab #4c = Kinetic Analysis with Excel
• Information Flow – Lab #5a = Gene ID in a sequence, predicting
traits from plasmid and genome, restriction mapping of plasmid.
• Cell signaling → Cell Cycle → Mutations → Cancer
• Meiosis → Mendelian Genetics – Lab #7 = OMIM for basis of traits
• Biotech, Genomics, Developmental Biology
• Evolution, Population Genetics – Lab #10 = Retrieve myoglobin
protein sequences from different animals, align, create tree, ID
lineages where mutations occurred.
Intro Bio Lab #2 =
3D structures of molecules
• Small Molecules using Biomodel-3, developed by
Angel Herráez, lecturer in
Biochemistry and Molecular Biology at the University
of Alcalá de Henares (Spain).
http://biomodel.uah.es/en/model3/inicio.htm
• Concepts
– pdb files, rendering structures in different ways, manipulating
structures, standard color schemes for elements.
– # of bonds on atoms, chemical formula, atomic/molecular
mass, functional groups
– Saturated vs unsaturated fatty acids, components of
phospholipids, arrangement of phospholipids into a bilayer
Intro Bio Lab #2 =
3D structures of molecules
• DNA structure tutorial originally
by Eric Martz (UMass)
• Concepts
– 5’, 3’ ends, antiparallelism
– Backbone vs bases, components
of nucleotides
– AT vs GC base pairs,
complementary H-bond donors
and acceptors
Intro Bio Lab #2 =
3D structures of molecules
• Tripeptide Concepts - Amino acid structure,
peptide bonds, Directionality
• Protein – oxyhemoglobin
• AA sequence – structure correlation
• 2° structure – alpha helix, H-bonding
• 3° structure – location of hydrophilic,
hydrophobic residues
• 4° structure – intersubunit interfaces
• Ligand binding – interaction with heme
group
Lab #4c = Kinetic Analysis with Excel
• Enzyme assay lab
• Week 1 - Protein extracted from
raw wheat germ, measured with
Bradford assay.
• Week 2 - Acid phosphatase
enzyme activity compared
between crude and purified,
substrate concentration varied.
• Week 3 – Calculations and
graphing of data in Excel
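The Week 3 calculations can also be sketched outside Excel. The slides do not say which kinetic model the students fit, so the example below assumes Michaelis-Menten kinetics and a Lineweaver-Burk (double-reciprocal) straight-line fit. The function name and the data are made up for illustration: the velocities are generated from known Vmax and Km values rather than taken from the wheat germ lab.

```python
def lineweaver_burk(substrate, velocity):
    """Estimate (Vmax, Km) by least squares on 1/v versus 1/[S].

    Assumes Michaelis-Menten kinetics; this is an illustrative choice,
    not necessarily the analysis used in the lab handout.
    """
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in velocity]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept   # y-intercept of the plot is 1/Vmax
    km = slope * vmax        # slope of the plot is Km/Vmax
    return vmax, km


# Synthetic data (hypothetical): generated from Vmax = 10 and Km = 2.
TRUE_VMAX, TRUE_KM = 10.0, 2.0
substrate_conc = [0.5, 1.0, 2.0, 4.0, 8.0]
rates = [TRUE_VMAX * s / (TRUE_KM + s) for s in substrate_conc]

fit_vmax, fit_km = lineweaver_burk(substrate_conc, rates)
print(f"Vmax = {fit_vmax:.2f}, Km = {fit_km:.2f}")
```

With real, noisy measurements the double-reciprocal fit weights low-substrate points heavily, so the estimates are only a first approximation.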
Lab #5a = Mr. Green Genes - Gene ID in a sequence, predicting traits
from plasmid and genome, restriction mapping of plasmid.
• Lab developed using BioRad’s pGLO
plasmid.
• Students provided with pGLO DNA sequence
→ find genes → develop hypotheses about
traits of bacteria with plasmid → design
experiments to test hypothesis about
function of the DNA
→ find restriction sites → develop
hypotheses/predictions about fragment sizes
after cutting with restriction enzyme
→ tests physical properties of the DNA
Lab #5a = Mr. Green Genes - Gene ID in a sequence, predicting traits
from plasmid and genome, restriction mapping of plasmid.
• Sequence pasted into NCBI
ORF finder tool
• Concepts
– Start, stop codons, genetic
code, 5’→3’ directionality, 6
frame translations, ORF vs
protein length
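The concepts above (start and stop codons, the genetic code, and six reading frames) can be illustrated with a short script. This is a minimal teaching sketch, not the algorithm NCBI's ORF finder uses: it assumes the standard genetic code, counts only ATG as a start codon, and runs on a made-up 16-base sequence rather than the pGLO sequence. The names `find_orfs` and `toy_dna` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_codons=2):
    """Return (strand, frame, start_index, protein) for each ATG...stop ORF.

    start_index is 0-based on the strand being read; frame is 1, 2 or 3.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein, start = None, None
            for i in range(frame, len(seq) - 2, 3):
                aa = CODON_TABLE.get(seq[i:i + 3], "X")
                if protein is None:
                    if aa == "M":            # open an ORF at the first ATG
                        protein, start = "M", i
                elif aa == "*":              # a stop codon closes the ORF
                    if len(protein) >= min_codons:
                        orfs.append((strand, frame + 1, start, protein))
                    protein = None
                else:
                    protein += aa
    return orfs


toy_dna = "CCATGAAATTTTAGGG"  # hypothetical sequence, not pGLO
toy_orfs = find_orfs(toy_dna)
print(toy_orfs)
```

The ORF is nine bases of coding sequence plus a stop codon, but the protein is only three residues long, which is the "ORF vs protein length" distinction on the slide.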
New and “Improved”? ORF Finder
BLAST Search with translated ORFs
• Discuss principles
of BLAST search,
significance of E
value and score.
• ID of AraC, GFP,
Beta-lactamase.
• What traits?
• How to test?
• Controls?
Lab #5a = Mr. Green Genes - Gene ID in a sequence, predicting traits
from plasmid and genome, restriction mapping of plasmid.
• RAST = Rapid Annotation with Subsystem
Technology
• Students browse through a RAST annotation
to see similar approach with whole genome.
Lab #5a = Mr. Green Genes - Gene ID in a sequence, predicting traits
from plasmid and genome, restriction mapping of plasmid.
• pGLO sequence pasted into New England Biolabs NEBcutter to ID restriction sites
• Students predict what size fragments will be obtained when
cutting pGLO with different enzymes
• Students construct map of
plasmid for lab report
• Week 2 – students isolate plasmid (Qiagen),
Prep competent cells, do transformation and plating,
set up restriction digest.
• Week 3 – students prep and run gel,
observe and discuss transformation plates,
photograph and discuss gel to compare with
hypotheses/predictions
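The fragment-size prediction step can be sketched in a few lines. The example assumes a circular plasmid and a single recognition sequence, and it runs on a made-up 30-base "plasmid" rather than pGLO; `cut_sites` and `circular_fragments` are hypothetical helper names, not part of NEBcutter. Because every cut by one enzyme is offset from its site by the same amount, fragment sizes on a circle can be computed from site start positions alone.

```python
import re


def cut_sites(seq, site):
    """0-based start positions of a recognition site on a circular sequence."""
    seq, site = seq.upper(), site.upper()
    wrapped = seq + seq[:len(site) - 1]      # lets a site span the origin
    return [m.start() for m in re.finditer(f"(?={site})", wrapped)]


def circular_fragments(seq_len, positions):
    """Fragment sizes (largest first) after cutting a circle at each position."""
    positions = sorted(positions)
    if not positions:
        return []                            # uncut circle: no linear fragments
    sizes = [b - a for a, b in zip(positions, positions[1:])]
    sizes.append(seq_len - positions[-1] + positions[0])  # fragment across the origin
    return sorted(sizes, reverse=True)


# Hypothetical 30-base circular sequence with two EcoRI sites (GAATTC).
toy_plasmid = "AAA" + "GAATTC" + "CCCCC" + "GAATTC" + "A" * 10
sites = cut_sites(toy_plasmid, "GAATTC")
fragments = circular_fragments(len(toy_plasmid), sites)
print(sites, fragments)
```

A single site gives one fragment equal to the full plasmid length (the linearized circle), which is a useful prediction to compare against the gel.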
Lab #7 = OMIM for basis of traits → PTC (non)tasting haplotype
Lab #7 = OMIM for basis of traits → Skin Pigmentation
Look at type of mutation, global distribution of SNP
Lab #7 = OMIM for basis of traits → Red Hair/Fair Skin
Multiple genes, phenotypes, signals, tanning response
Lab #7 = OMIM for basis of traits → Colorblindness, Blood type… 23andMe
Lab #10 = Retrieve myoglobin protein sequences
from different animals, align, create tree, ID mutations.
• Week 14 lab – Hybrid lab on Evolution – Watch video, compare hominid skulls, ape
chromosome banding patterns, myoglobin sequences.
• UniProt used to retrieve myoglobin protein sequences
from a diverse set of vertebrates, including Human
• After editing species names, MEGA used
to create multiple sequence alignment &
construct phylogenetic tree
• Students compare tree and
alignment to ID ancestor where
mutations occurred.
• Best evidence for evolution
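The comparison of alignment and tree can be made concrete with a small script that reports which alignment columns vary and how different each pair of sequences is. This is only a sketch: it uses the simple proportion of differing sites (p-distance), which may not be the distance model chosen in MEGA, and the three short sequences are invented placeholders rather than real myoglobin sequences from UniProt.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned, ungapped positions at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def variable_columns(alignment):
    """1-based alignment columns where the sequences are not all identical."""
    return [i + 1 for i, column in enumerate(zip(*alignment.values()))
            if len(set(column)) > 1]


# Hypothetical aligned sequences (equal length); not real myoglobin data.
toy_alignment = {
    "species_A": "MKTAYLQE",
    "species_B": "MKTAYLQE",
    "species_C": "MRTAFLQE",
}

distances = {
    (a, b): p_distance(toy_alignment[a], toy_alignment[b])
    for a, b in combinations(toy_alignment, 2)
}
changed = variable_columns(toy_alignment)
print(distances)
print(changed)
```

Species that share a residue at a variable column while others differ are candidates for a mutation on their common ancestral branch, which is the reasoning students apply to the tree.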
[Figure: three student-built myoglobin phylogenetic trees, each with a 0.05 scale bar and branches numbered 1–5 marking where mutations occurred.]
Tree 1 (labeled Ayla): Long-Finned Pilot Whale, Killer Whale, Beluga Whale, Amazon River Dolphin, Sperm Whale, Humpback Whale, Olive Baboon, Human, Chimpanzee, Mouse, Rat, Three-Striped Night Monkey, Loggerhead Sea Turtle, Green Sea Turtle
Tree 2 (labeled Lindsay Folmar): Short-Beaked Common Dolphin, Killer Whale, Horse, Sheep, Bison, Buffalo, Pig, Night Monkey, Chimpanzee, Gorilla, Mouse, Rat, Chicken, Turkey
Tree 3 (labeled Gillian Barkell): short-beaked common dolphin, Dall's porpoise, finback whale, chimpanzee, dog, African wild dog, Middle East blind mole rat, mouse, rat, brown woolly monkey, common squirrel monkey, loggerhead sea turtle, green sea turtle
Bioinformatics in Intro Biology - Summary
• Students spend 4.5 lab periods in the computer lab
• Advantages
– Students develop key skills, become experienced with basic
bioinformatics tools and databases
– Abstract concepts become more
concrete through hands-on analysis
and visualization
– It’s free!!!!
• Disadvantages
– Fewer wet labs, frequent software and
web site changes require regular revision
of instructions
– Some students find computer work boring
The LycoMicro Unknown Microbe Lab
Week 8 – Analyze DNA sequence @
http://www.ezbiocloud.net/eztaxon ,
- Construct Phylogenetic Tree w/MEGA,
- Literature Research (IJSEM)
[Figure: phylogenetic tree (scale bar 0.02) placing the unknown, Pantoea anthophila JJM, among reference bacteria: Escherichia coli, Acinetobacter johnsonii, Pseudomonas aeruginosa, Neisseria gonorrhoeae, Aquaspirillum sinuosum, Helicobacter pylori, Bdellovibrio bacteriovorus, Blastopirellula marina, Cytophaga hutchinsonii, Sphingobacterium anhuiense, Chryseobacterium indologenes, Prochlorococcus marinus, Geovibrio ferrireducens, Lactococcus lactis, Streptococcus pyogenes, Exiguobacterium undae, Bacillus subtilis, Staphylococcus aureus, Oerskovia jenensis, Arthrobacter aurescens, Streptomyces coelicolor, Corynebacterium callunae, Nitrospira moscoviensis, Aquifex pyrophilus, Thermomicrobium roseum, Chloroflexus aurantiacus]
Bio/Chem 444 Protein Structure Lab
• Students use RCSB to examine
Phenylalanine Hydroxylase.
• Concepts – amphipathic helix interactions,
beta sheet, turn structure details,
cofactor and substrate
interactions and binding,
paralogs, substrate analogs
Bio/Chem 444 Metabolic Reconstruction
• Students use RAST
to reconstruct
pathways in an
organism, ID steps –
must map 20
subsystems, all
interconnected.
Bio447 - Research Methods
• Complete & deposit 16S sequence
• Determine reference organisms
from phylogenetic tree
• Sequence & compare genome(s)
• Obtain reference organisms
• Repeat experiments in
parallel to determine
differences and similarities
• Prepare poster for ASM
• Write a paper for IJSEM
[Figure labels: B. indicus, B. cibi, B. sp. SJS]
• Wetterstrand KA. DNA Sequencing Costs: Data from the NHGRI Large-Scale
Genome Sequencing Program. Available at: www.genome.gov/sequencingcosts.
Accessed [6-15-16].
GCAT → GCAT-SEEK
• Genome Consortium for Active Teaching (GCAT)
founded in 2000 to bring Genomics (Microarrays)
to the undergraduate curriculum.
• Multiple HHMI & NSF funded workshops
• GCAT-SEEKquence “spin-off” to bring
NextGen sequencing to the undergraduate
curriculum.
• 3 genomes (Ion Torrent & 454 as part of pilot)
• NSF Research Collaboration Network,
Juniata’s HHMI Genomics
Leadership Initiative
Shared MiSeq
(2x300) Runs
• NextGen Instruments generate more data than
most UG faculty can use or afford.
• November 2013 – 27 bacteria @$200 each
(including Flavobacterium aquatile)
• April 2014 – Opened to Microedu Listserv →
35 Bacteria and Phage from
16 institutions @$190/sample
• October 2014 – 30 phage, viruses and bacteria
@$175/sample.
Sample | Reads est. | Bases est.
GSF665-1-E_coli-C06b | 217,320 | 130,391,966
GSF665-2-Chryseobacterium-LO | 1,317,872 | 790,723,170
GSF665-3-Linfield-KH | 809,893 | 485,935,870
GSF665-4-Linfield-NH | 301,171 | 180,702,758
GSF665-5-Exiguobacterium | 794,482 | 476,689,384
GSF665-6-Plesiomonas_shigelloides | 656,143 | 393,685,659
GSF665-7-Halosimplex_carlsbadense | 595,655 | 357,393,201
GSF665-8-Phage_Eapen | 573,447 | 344,068,354
GSF665-9-Phage_Aspire | 170,895 | 102,536,927
GSF665-10-strain_3572 | 593,179 | 355,907,159
GSF665-11-Gracilibacillus_dipsosauri | 986,925 | 592,154,880
GSF665-12-Serratia_S12 | 827,533 | 496,519,794
GSF665-13-Rhodococcus_T1Sofl-14 | 297,153 | 178,292,067
GSF665-14-Janthinobacterium-BJB1 | 823,488 | 494,092,592
GSF665-15-Janthinobacterium-BJB349 | 883,287 | 529,972,260
GSF665-16-Janthinobacterium-BJB304 | 1,098,516 | 659,109,346
GSF665-17-Janthinobacterium-BJB317 | 549,616 | 329,769,324
GSF665-18-Iodobacter-BJB302 | 206,973 | 124,183,611
GSF665-19-Asaia_bogorensis | 1,096,204 | 657,722,373
GSF665-20-Asaia_siamensis | 820,818 | 492,490,968
GSF665-21-Asaia_astilbes | 783,447 | 470,068,239
GSF665-22-Asaia_platycodi | 808,325 | 484,994,710
GSF665-23-Asaia_krungthepensis | 1,152,811 | 691,686,698
GSF665-24-Asaia_prunellae | 1,035,414 | 621,248,288
GSF665-27-Serratia -DL | 129,258 | 77,554,903
GSF665-28-Phage-KitKat | 53,773 | 32,263,632
GSF665-29-Cyanobacterium-RC610 | 909,265 | 545,559,194
GSF665-30-Serratia_marcescens-RH | 307,886 | 184,731,584
GSF665-31-Bacillus_cibi | 693,101 | 415,860,714
GSF665-32-Pedobacter-BMA | 1,200,365 | 720,218,713
GSF665-33-Flavobacterium-KMS | 185,975 | 111,585,274
GSF665-34-Flavobacterium_hibernum | 1,432,517 | 859,510,422
GSF665-36-Flavobacterium_hydatis | 744,893 | 446,935,512
GSF665-39-Kaistella_koreensis | 1,238,892 | 743,334,928
GSF665-40-Kaistella_haifense | 1,067,969 | 640,781,490
Total | 25,364,460 | 15,218,675,963
Average | 724,699 | 434,819,313
Assembly statistics – discussed in Intro and Micro
[SoftGenetics Assembler: Assembly Results Statistics Report]
• Total Reads Number: 2056329
• Matched Reads Number: 1983986
• Unmatched Reads Number: 72343
• Assembled Sequences Number: 61
• Average Sequence Length: 57497
• Minimum Sequence Length: 158
• Maximum Sequence Length: 641985
• N50 Length: 366076
[Final Contig Merge Results Statistics Report]
• Final Contig Merge Sequences Number: 13
• Final Contig Merge Average Sequence Length: 269063
• Final Contig Merge Minimum Sequence Length: 173
• Final Contig Merge Maximum Sequence Length: 856388
• Final Contig Merge N50 Length: 586767
• Matched Reads Count: 1977550
• Number of Matched Bases: 562514128
• Average Read Length: 285
• Average Coverage: 161
• Reference Length: 3507364
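Two of the statistics in these reports are easy to work through by hand or in code. The sketch below defines N50 in the usual way (the contig length at which the running total of lengths, sorted longest first, reaches half the assembly) and recomputes coverage and read length from the numbers reported above. The contig lengths in the N50 example are invented, since the report lists only summary values, and the function name is hypothetical. How the assembler rounds its reported values is not stated, so small differences from the report are expected.

```python
def n50(lengths):
    """Length of the contig at which the cumulative sum, taken longest
    first, reaches at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths (the report gives only summary statistics).
toy_contigs = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_contigs)

# Values copied from the Final Contig Merge report above.
matched_bases = 562_514_128
matched_reads = 1_977_550
reference_length = 3_507_364

coverage = matched_bases / reference_length      # average depth per base
mean_read_length = matched_bases / matched_reads

print(toy_n50)
print(f"coverage = {coverage:.1f}x, mean read length = {mean_read_length:.1f}")
```

The computed values (about 160.4x and 284.5 bases) sit just below the reported 161 and 285.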
Phenotype Comparisons
Seed Viewer Sequence Based Comparison Tool
RAST – Sequence based comparison tool to ID orthologs
C.populense lacks
carotenoid biosynthetic genes
C.hispalense
C.populense
Explain phenotypic differences
– e.g. Pigment “Landscapes”
C.hispalense
– carotenoid
– flexirubin
C.populense CF314
– flexirubins only
Sequence-Based Comparison color codes similarity
Sequence Based Comparison provides
protein seq similarity → AAI
Sequence Based Comparison can ID
unique and shared genes…..
Venn Diagram Tool
Venn Diagram Template
Identify Core, Genus or Family-Specific Genes
Links/Tools available at novelmicrobe.com
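The logic behind the Venn diagram, sorting genes into core and genome-specific groups, reduces to set operations. The sketch below assumes each genome has already been reduced to a set of ortholog family names (for example from a sequence-based comparison). The genome names, family names and the function `core_and_unique` are all hypothetical placeholders, not output from RAST.

```python
def core_and_unique(genomes):
    """Return (core, unique): families present in every genome, and for
    each genome the families found in no other genome."""
    core = set.intersection(*genomes.values())
    unique = {}
    for name, families in genomes.items():
        others = set().union(*(f for n, f in genomes.items() if n != name))
        unique[name] = families - others
    return core, unique


# Hypothetical ortholog families per genome.
toy_genomes = {
    "genome_A": {"famA", "famB", "famC"},
    "genome_B": {"famA", "famB", "famD"},
    "genome_C": {"famA", "famE"},
}

core, unique = core_and_unique(toy_genomes)
print(sorted(core))
print({name: sorted(f) for name, f in unique.items()})
```

Families shared by some but not all genomes (here famB) fall in the overlapping regions of the diagram and are where genus- or family-specific genes would show up.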