CIS 4930/6930 – Recent Advances in Bioinformatics
Spring 2014
Tamer Kahveci
CISE Department
University of Florida

Vital Information
• Instructor: Tamer Kahveci
• Office: E566
• Time: Mon/Wed/Fri 9:35 - 10:25 AM
• Office hours: Mon/Thu 2:00 - 2:50 PM
• Course page:
  – http://www.cise.ufl.edu/~tamer/teaching/spring2014

Goals
• This course discusses cutting-edge developments in bioinformatics and computational biology. We will discuss recent publications in computational biology and bioinformatics in depth, with emphasis on computer science challenges and contributions, particularly in the area of biological networks.

Bioinformatics & Systems Biology
• Bioinformatics is the science in which computational and information sciences are used to understand biological data.
• Systems biology studies the interactions between the components of biological systems, and how these interactions give rise to the function and behavior of that system.

This Course Will
• Give you exposure to research topics in bioinformatics.
• Strongly encourage you to explore research problems and make contributions.

This Course Will Not
• Teach you biology or the fundamentals of bioinformatics.
• Teach you programming.
• Teach you how to be an expert user of off-the-shelf molecular biology software packages.

Course Outline
• Introduction to terminology
• Biological networks
• Comparison of biological networks
• Network motifs
• Essentiality in networks
• Network reconstruction

Grading
How can I get an A?
• Components: paper presentations, project, HW & quizzes
• 90+ = A- & above
• 80+ = B & above
• 70+ = C & above
• Bonus
  – 2.5% attendance
  – 2.5% project contribution

Expectations
• Required
  – Data structures and algorithms
  – Coding (C, Java)
• Encouraged
  – Actively participate in classroom discussions
  – Read the bioinformatics literature in general
  – Attend colloquia on campus
• Academic honesty

Textbook
• Not required, but recommended.
• Class notes + papers.

Where to Look?
• Journals
  – Bioinformatics
  – Genome Research
  – PLOS Computational Biology
  – Journal of Computational Biology
  – IEEE/ACM Transactions on Computational Biology and Bioinformatics
• Conferences
  – RECOMB
  – ISMB
  – ECCB
  – PSB
  – BCB

A Gentle Introduction to Molecular Biology

Goals
• Understand the major components of biological data
  – DNA, protein sequences, expression arrays, protein structures
• Get familiar with basic terminology
• Learn commonly used data formats

Genetic Material: DNA
• Deoxyribonucleic acid; its double-helix structure was described in the 1950s
  – Basis of inheritance
  – Eye color, hair color, …
• 4 nucleotides
  – A, C, G, T

Chemical Structure of Nucleotides
[Figure: pyrimidines (C, T) and purines (A, G)]

Making of Long Chains
[Figure: nucleotides joined into a chain running 5’ -> 3’]

DNA Structure
• Double stranded, helix (Watson & Crick)
• Complementary
  – A-T
  – G-C
• Antiparallel
  – One strand runs 5’ -> 3’ and its partner runs 3’ -> 5’
  – By convention, upstream is toward the 5’ end and downstream is toward the 3’ end
• Animation (ch. 3.1)

Base Pairs
[Figure: base pairs]

Question
• Given the strand 5’ - GTTACA - 3’, what is its partner strand, 5’ - XXXXXX - 3’?
• Answer: 5’ - TGTAAC - 3’
• The two strands are reverse complements of each other.

Repetitive DNA
• Tandem repeats: highly repetitive (total length / repeat-unit length)
  – Satellites: 100 kbp to 1 Gbp / a few hundred bp
  – Minisatellites: 1 to 20 kbp / 9 to 80 bp
  – Microsatellites: < 150 bp / 1 to 6 bp
  – Used in DNA fingerprinting
• Interspersed repeats: moderately repetitive
  – LINE (long interspersed nuclear elements)
  – SINE (short interspersed nuclear elements)
• Proteins contain repetitive patterns too

Genetic Material: an Analogy
• Nucleotide => letter
• Gene => sentence
• Contig => chapter
• Chromosome => book
  – Traits: sex, hair/eye color, …
  – Disorders: Down syndrome, Turner syndrome, …
  – Chromosome number varies across species
  – Humans have 46 (23 + 23) chromosomes
• Complete genome => volumes of an encyclopedia
• The Hershey & Chase experiment showed that DNA is the genetic material (ch. 14).

Functions of Genes 1/2
• Signal transduction: sensing a physical signal and turning it into a chemical signal
• Enzymatic catalysis: accelerating chemical transformations that would otherwise be too slow
• Transport: getting things into and out of separated compartments
  – Animation (ch. 5.2)

Functions of Genes 2/2
• Movement: contracting in order to pull things together or push things apart
• Transcription control: deciding when other genes should be turned ON/OFF
  – Animation (ch. 7)
• Structural support: creating the shape and pliability of a cell or set of cells

Central Dogma
[Figure: DNA -> RNA -> protein]

Introns and Exons 1/2
[Figure: introns and exons]

Introns and Exons 2/2
• Humans have about 25,000 genes; their coding sequence totals about 40,000,000 DNA bases, < 3% of the ~3,000,000,000 bases in the genome.
• The remaining ~2,960,000,000 bases are non-coding; part of this DNA carries control information (e.g., when, where, and for how long a gene is expressed).

[Figure: DNA (genotype) -> gene expression -> protein -> phenotype]

Gene Expression
• Building proteins from DNA
  – Promoter sequence: start of a gene (13 nucleotides)
• Positive regulation: proteins that bind to DNA near promoter sequences increase transcription.
• Negative regulation: proteins that bind near promoter sequences decrease transcription.

Microarray
• Animation on creating microarrays

Amino Acids
• 20 different amino acids
  – One-letter codes ACDEFGHIKLMNPQRSTVWY; the letters B, J, O, U, X, Z are not used for these 20
• ~300 amino acids in an average protein; hundreds of thousands of known protein sequences
• How many nucleotides are needed to encode one amino acid?
  – 4^2 = 16 < 20 < 64 = 4^3, so pairs of nucleotides are too few and triplets suffice
  – E.g., Q (glutamine) = CAG
  – Degeneracy: most amino acids are encoded by more than one codon
  – Triplet code (codon)

Triplet Code
[Figure: codon table]

Molecular Structure of Amino Acid
[Figure: amino acid with its side chain attached to the central C]
• Non-polar, hydrophobic (G, A, V, L, I, M, F, W, P)
• Polar, hydrophilic (S, T, C, Y, N, Q)
• Electrically charged (D, E, K, R, H)

Peptide Bonds
[Figure: peptide bonds]

Direction of Protein Sequence
• Written from the N-terminus to the C-terminus
• Animation on protein synthesis (ch. 15)

Data Formats
• GenBank
• EMBL (European Molecular Biology Laboratory)
• SwissProt
• FASTA
• NBRF (National Biomedical Research Foundation)
• Others
  – IG, GCG, Codata, ASN, GDE, plain ASCII

Primary Structure of Proteins
• Example sequence in FASTA format (PDB entry 2IC8, chain A):
>2IC8:A|PDBID|CHAIN|SEQUENCE
ERAGPVTWVMMIACVVVFIAMQILG
DQEVMLWLAWPFDPTLKFEFWRYFT
HALMHFSLMHILFNLLWWWYLGGA
VEKRLGSGKLIVITLISALLSGYVQQK
FSGPWFGGLSGVVYALMGYVWLRGER
DPQSGIYLQRGLIIFALIWIVAGWFD
LFGMSMANGAHIAGLAVGLAMAFVD
SLNA

Secondary Structure: Alpha Helix
• 1.5 Å translation per residue
• 100 degree rotation per residue
• Phi ≈ -60°
• Psi ≈ -60°

Secondary Structure: Beta Sheet
• Strands can be anti-parallel or parallel
• Phi ≈ -135°
• Psi ≈ 135°

Tertiary Structure
[Figure: backbone dihedral angles phi1, psi1, phi2, …]
• A backbone of N residues is described by about 2N angles (one phi and one psi per residue)

Tertiary Structure
• 3-D structure of a polypeptide sequence
  – Interactions between non-local atoms
[Figure: tertiary structure of myoglobin]

Ramachandran Plot
[Figure: Ramachandran plot of phi vs. psi angles]
• Sample PDB entry (http://www.rcsb.org/pdb/)

Quaternary Structure
• Arrangement of protein subunits
[Figures: quaternary structure of Cro; human hemoglobin tetramer]

Structure Summary
• 3-D structure is determined by the protein sequence
• Prediction remains a challenge
• Diseases caused by misfolded proteins
  – Mad cow disease
• Classification of protein structures

Systems Biology
• A biological system is made up of components (e.g., proteins, genes, compounds) that interact with and affect one another. Together, these interactions carry out the functions of the system.
• Internal factors can alter the network.
  – E.g., gene expression and regulation.
• External factors can alter the network.
  – E.g., drugs, radiation, food, temperature, bacteria, and viruses.
• We develop quantitative mathematical models that explain how the interactions take place.
  – E.g., Boolean, stochastic, ordinary differential equation, and probabilistic models.
• We develop algorithmic methods to analyze the networks under these models.

Signal Transduction Networks
• Vertices are proteins.
• A directed edge goes from vertex X to vertex Y if X changes the activity level of Y under certain conditions.

Transcription Regulation Networks
• Two types of vertices: proteins (transcription factors, or TFs) and genes
• Edges are directed from TFs to genes.
• An edge goes from TF X to gene Y if X regulates the transcription of Y.

Post-transcriptional Regulation
• Two types of vertices
  – RNA-binding proteins
  – RNA
• Edges are directed from proteins to RNA
[Figure: RNA-binding protein]

Metabolic Networks 1/2
• Various representations
  – Vertices are compounds and directed edges are biochemical reactions.
  – Two types of vertices, one for compounds and one for reactions. Directed edges go from one type to the other.

Metabolic Networks 2/2
• Reactions
  – Catabolism: breaking down large molecules, for example to harvest energy in cellular respiration
  – Anabolism: using energy to construct components of cells, such as proteins and nucleic acids

Protein-Protein Interaction (PPI) Network
• Vertices are proteins.
• An edge connects two vertices if the two proteins interact (e.g., form a protein complex).
• Edges are undirected.

Gene Expression Network
• Vertices are genes.
• An edge between two vertices implies that the corresponding genes have similar expression patterns.
• Edges are undirected.

Phylogenetic Networks
• Two types of vertices
  – Leaf nodes: taxonomic units (e.g., genes, proteins, organisms)
  – Internal nodes: inferred ancestors
• Directed, acyclic (often a rooted tree)
• An edge goes from X to Y if X can evolve into Y.

Ecological Networks
[Figure: example with zombie and human vertices]

Some Interaction Network Datasets
• KEGG http://www.genome.jp/kegg/
• BioCyc http://biocyc.org/
• MIPS http://mips.helmholtz-muenchen.de/proj/ppi/
• DIP http://dip.doe-mbi.ucla.edu/
• GRID http://biodata.mshri.on.ca/grid/servlet/Index
• BIND http://bind.ca/
• STRING http://www.bork.embl-heidelberg.de/STRING/
• IntAct http://www.ebi.ac.uk/intact/index.html
• MINT http://cbm.bio.uniroma2.it/mint/
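The Question slide on DNA structure (5’ - GTTACA - 3’ pairs with 5’ - TGTAAC - 3’) can be reproduced with a short Python sketch. It assumes the input contains only A, C, G, T (upper or lower case); an ambiguity code such as N would raise a KeyError. The name `reverse_complement` is illustrative, not from the slides.

```python
# Watson-Crick pairing from the DNA structure slide: A-T and G-C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the partner strand of `seq`, both written 5' -> 3'.

    Complementing each base gives the partner strand read 3' -> 5';
    reversing the result rewrites it in the 5' -> 3' convention.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


if __name__ == "__main__":
    print(reverse_complement("GTTACA"))  # TGTAAC
```

Applying the function twice returns the original strand, which mirrors the fact that each strand is the reverse complement of the other.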
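For the microsatellites listed on the Repetitive DNA slide (1 to 6 bp repeat units), here is a minimal sketch of a greedy scan for perfect tandem repeats. The function name, the default of at least 4 copies, and the tie-breaking rule are assumptions for illustration only; practical repeat finders also tolerate mismatches and indels.

```python
from typing import List, Optional, Tuple


def find_microsatellites(seq: str, min_unit: int = 1, max_unit: int = 6,
                         min_copies: int = 4) -> List[Tuple[int, str, int]]:
    """Greedy left-to-right scan for perfect tandem repeats.

    Returns (start, unit, copies) tuples. At each position every unit
    length from min_unit to max_unit is tried; the longest repeat span
    wins, and on ties the shorter unit is kept. Thresholds are
    illustrative assumptions.
    """
    seq = seq.upper()
    hits: List[Tuple[int, str, int]] = []
    i = 0
    while i < len(seq):
        best: Optional[Tuple[int, str, int]] = None  # (span, unit, copies)
        for k in range(min_unit, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:  # too close to the end for this unit length
                break
            copies = 1
            # Count how many times `unit` repeats back to back.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies and (best is None or copies * k > best[0]):
                best = (copies * k, unit, copies)
        if best is not None:
            hits.append((i, best[1], best[2]))
            i += best[0]  # skip past the reported repeat
        else:
            i += 1
    return hits


if __name__ == "__main__":
    print(find_microsatellites("GGCACACACACAGT"))  # [(2, 'CA', 5)]
```

Because shorter unit lengths are tried first and a longer unit must cover strictly more bases to replace them, a run like AAAAAAAA is reported as eight copies of A rather than four copies of AA.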
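The figures on the Introns and Exons 2/2 slide are consistent with one another, as a quick check shows. The per-gene value in the last line is only the average implied by the slide's two totals, not a measured gene length.

```latex
\begin{aligned}
40{,}000{,}000 + 2{,}960{,}000{,}000 &= 3{,}000{,}000{,}000 \text{ bases (whole genome)}\\
\frac{40{,}000{,}000}{3{,}000{,}000{,}000} &\approx 1.3\% < 3\%\\
\frac{40{,}000{,}000}{25{,}000} &= 1{,}600 \text{ coding bases per gene, on average}
\end{aligned}
```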
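The counting argument on the Amino Acids slide (4^2 < 20 < 4^3) and the degeneracy of the triplet code can both be checked in Python. The 64-letter string is the standard genetic code (NCBI translation table 1), with codons enumerated in T, C, A, G order; the helper names are illustrative.

```python
from collections import Counter
from itertools import product
from typing import Dict

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1): the i-th letter is the
# amino acid for the i-th codon TTT, TTC, TTA, TTG, TCT, ... ; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def min_code_length(n_symbols: int = 20, alphabet: int = 4) -> int:
    """Smallest word length L with alphabet**L >= n_symbols."""
    length = 1
    while alphabet ** length < n_symbols:
        length += 1
    return length


def translate(dna: str) -> str:
    """Translate codon by codon in frame 0; a trailing partial codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def codon_counts() -> Counter:
    """Number of codons per amino acid (the degeneracy of the code)."""
    return Counter(CODON_TABLE.values())


if __name__ == "__main__":
    print(min_code_length())       # 3, since 4**2 = 16 < 20 <= 4**3 = 64
    print(CODON_TABLE["CAG"])      # Q (glutamine), as on the slide
    print(translate("ATGCAGTAA"))  # MQ*
```

`codon_counts()` shows the degeneracy directly: leucine, serine, and arginine each have 6 codons, while methionine and tryptophan have only 1.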
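Using the three side-chain groups from the Molecular Structure of Amino Acid slide, a sequence can be summarized by the fraction of residues in each group. This is a sketch that follows the slide's grouping exactly. Other textbooks classify some residues (for example G, C, or H) differently, so treat the output as one possible summary rather than a standard measure. The function name is hypothetical.

```python
from typing import Dict

# Side-chain groups exactly as listed on the slide.
GROUPS = {
    "hydrophobic": set("GAVLIMFWP"),
    "polar": set("STCYNQ"),
    "charged": set("DEKRH"),
}


def group_composition(protein: str) -> Dict[str, float]:
    """Fraction of residues in each side-chain group.

    Letters outside the 20 standard one-letter codes (e.g., X) count
    toward the length but belong to no group.
    """
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    return {
        name: sum(residue in members for residue in protein) / len(protein)
        for name, members in GROUPS.items()
    }


if __name__ == "__main__":
    # First line of the 2IC8 chain A sequence from the Primary Structure slide.
    print(group_composition("ERAGPVTWVMMIACVVVFIAMQILG"))
```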
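The Primary Structure slide shows a FASTA record: a '>' header line followed by sequence lines wrapped at a fixed width. A minimal parser sketch follows. It assumes one sequence per header and no comment lines, and the function name is hypothetical. Libraries such as Biopython (`Bio.SeqIO`) provide full-featured readers.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}.

    A line starting with '>' opens a new record (the header is the rest
    of the line). Following lines are sequence data, possibly wrapped, and
    are concatenated. Blank lines are skipped. A repeated header would
    overwrite the earlier record in this simple sketch.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    text = """>2IC8:A|PDBID|CHAIN|SEQUENCE
ERAGPVTWVMMIACVVVFIAMQILG
DQEVMLWLAWPFDPTLKFEFWRYFT
"""
    for name, seq in parse_fasta(text.splitlines()).items():
        print(name, len(seq))  # 2IC8:A|PDBID|CHAIN|SEQUENCE 50
```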
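The two parameters on the Secondary Structure: Alpha Helix slide (1.5 Å rise and 100° rotation per residue) imply the familiar helix geometry:

```latex
\begin{aligned}
\text{residues per turn} &= \frac{360^\circ}{100^\circ} = 3.6\\
\text{pitch (rise per turn)} &= 3.6 \times 1.5\ \text{\AA} = 5.4\ \text{\AA}
\end{aligned}
```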
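The Systems Biology slide lists Boolean models among the modeling choices. Below is a minimal sketch of a synchronous Boolean network. Each gene is ON or OFF, and all genes update at once from the current state. The three-gene loop (C represses A, A activates B, B activates C) is a hypothetical example, not taken from the slides.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[bool, ...]

# Hypothetical three-gene loop, not from the slides:
# C represses A, A activates B, B activates C.
NODES = ("A", "B", "C")
RULES: Dict[str, Callable[[Dict[str, bool]], bool]] = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}


def step(state: State) -> State:
    """Synchronous update: every node reads the same current state."""
    current = dict(zip(NODES, state))
    return tuple(RULES[node](current) for node in NODES)


def find_attractor(start: State) -> List[State]:
    """Iterate from `start` until a state repeats; return the cycle reached.

    The state space is finite (2**len(NODES) states), so a repeat is guaranteed.
    """
    first_seen: Dict[State, int] = {}
    trajectory: List[State] = []
    state = start
    while state not in first_seen:
        first_seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state)
    return trajectory[first_seen[state]:]


if __name__ == "__main__":
    for start in [(False, False, False), (True, False, True)]:
        print(start, "-> cycle of length", len(find_attractor(start)))
```

Starting from all genes OFF, this loop cycles through 6 states. Starting from (A, B, C) = (ON, OFF, ON), it oscillates with period 2. Different initial conditions can therefore lead to different long-term behaviors of the same network.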
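In a signal transduction network, an edge X -> Y means X changes Y's activity. The proteins a signal starting at X could possibly influence are therefore those reachable from X along directed edges. Below is a breadth-first search sketch. The protein names are hypothetical placeholders.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical cascade: an edge (X, Y) means X changes the activity of Y.
EDGES = [
    ("Receptor", "KinaseA"),
    ("KinaseA", "KinaseB"),
    ("KinaseB", "TF1"),
    ("Phosphatase", "KinaseB"),
]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Directed adjacency list; every protein gets an entry, even with no out-edges."""
    graph: Dict[str, List[str]] = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    return graph


def downstream(graph: Dict[str, List[str]], source: str) -> Set[str]:
    """Proteins reachable from `source` by following edge directions (BFS)."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(source)  # report only other proteins
    return seen


if __name__ == "__main__":
    g = build_graph(EDGES)
    print(sorted(downstream(g, "Receptor")))  # ['KinaseA', 'KinaseB', 'TF1']
```

Reachability only says that a path exists. Whether a signal actually propagates depends on the conditions attached to each edge, which this sketch ignores.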
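The Metabolic Networks 1/2 slide mentions two representations. The sketch below stores reactions as (substrates, products) lists, builds the bipartite compound/reaction graph, and projects it onto a compound-only graph. The two reactions are simplified from the start of glycolysis (hexokinase, phosphoglucose isomerase) and treated as irreversible. The reaction labels are hypothetical identifiers.

```python
from typing import Dict, List, Set, Tuple

ReactionMap = Dict[str, Tuple[List[str], List[str]]]

# Two reactions from the start of glycolysis, simplified and treated as
# irreversible. Reaction labels are hypothetical identifiers.
REACTIONS: ReactionMap = {
    "R1_hexokinase": (["glucose", "ATP"], ["glucose-6-phosphate", "ADP"]),
    "R2_isomerase": (["glucose-6-phosphate"], ["fructose-6-phosphate"]),
}


def bipartite_edges(reactions: ReactionMap) -> Set[Tuple[str, str]]:
    """Compound -> reaction for substrates, reaction -> compound for products."""
    edges: Set[Tuple[str, str]] = set()
    for rxn, (substrates, products) in reactions.items():
        edges.update((s, rxn) for s in substrates)
        edges.update((rxn, p) for p in products)
    return edges


def compound_edges(reactions: ReactionMap) -> Set[Tuple[str, str]]:
    """Compound-only view: an edge s -> p whenever one reaction turns s into p."""
    return {
        (s, p)
        for substrates, products in reactions.values()
        for s in substrates
        for p in products
    }


if __name__ == "__main__":
    print(sorted(compound_edges(REACTIONS)))
```

The projection links ATP to ADP and also to glucose-6-phosphate. Links through such cofactors are an artifact of the compound-only view, and the bipartite form avoids them because it keeps track of which reaction each edge belongs to.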
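Because PPI edges are undirected, a natural data structure is one adjacency set per protein, with each interaction stored in both directions. A protein's degree is then the size of its set. Below is a sketch with hypothetical protein IDs P1 to P5. Degree is only one simple way to rank proteins.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_ppi(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected graph as adjacency sets; each interaction is stored both ways."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def top_by_degree(adj: Dict[str, Set[str]], k: int = 1) -> List[str]:
    """The k proteins with the most partners; ties broken alphabetically."""
    return sorted(adj, key=lambda p: (-len(adj[p]), p))[:k]


# Hypothetical interactions; ("P2", "P1") repeats ("P1", "P2") and adds no new edge.
PAIRS = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5"), ("P2", "P1")]

if __name__ == "__main__":
    adj = build_ppi(PAIRS)
    print({p: len(n) for p, n in sorted(adj.items())})
    print(top_by_degree(adj, 2))  # ['P1', 'P2']
```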
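The Gene Expression Network slide connects genes with "similar expression patterns" but does not fix a similarity measure. This sketch assumes one common choice: the Pearson correlation between expression profiles, with an edge when r is at least a cutoff (0.9 here, chosen arbitrarily). The gene names and values are made up.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # correlation undefined for a flat profile; treated as "no edge"
    return cov / (sd_x * sd_y)


def coexpression_edges(profiles: Dict[str, List[float]],
                       threshold: float = 0.9) -> Set[Tuple[str, str]]:
    """Undirected edges (names in sorted order) for gene pairs with r >= threshold."""
    return {
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if pearson(profiles[g1], profiles[g2]) >= threshold
    }


# Made-up expression values for four genes across five conditions.
PROFILES = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],
    "geneC": [5, 4, 3, 2, 1],
    "geneD": [1, 3, 2, 5, 4],
}

if __name__ == "__main__":
    print(coexpression_edges(PROFILES))  # {('geneA', 'geneB')}
```

Comparing the absolute value of r against the cutoff would also connect anti-correlated genes such as geneA and geneC. Which variant fits depends on the biological question.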
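Phylogenetic networks are required to be directed and acyclic. The sketch below checks acyclicity with Kahn's topological-sort algorithm. It also lists the leaves (taxa) and the reticulation nodes, which are nodes with more than one parent. A network with no reticulations and a single root is a rooted tree. All node names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]


def is_dag(nodes: Sequence[str], edges: Sequence[Edge]) -> bool:
    """Kahn's algorithm: repeatedly remove nodes with no incoming edges.
    The graph is acyclic iff every node is eventually removed."""
    indegree: Dict[str, int] = {n: 0 for n in nodes}
    children: Dict[str, List[str]] = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return removed == len(nodes)


def leaves(nodes: Sequence[str], edges: Sequence[Edge]) -> List[str]:
    """Nodes with no children: the taxonomic units."""
    parents = {p for p, _ in edges}
    return [n for n in nodes if n not in parents]


def reticulations(nodes: Sequence[str], edges: Sequence[Edge]) -> List[str]:
    """Nodes with more than one parent; a rooted tree has none."""
    parent_count = {n: 0 for n in nodes}
    for _, child in edges:
        parent_count[child] += 1
    return [n for n in nodes if parent_count[n] > 1]


# Hypothetical network: "hyb" inherits from both anc1 and anc2.
NODES = ["root", "anc1", "anc2", "hyb", "taxonA", "taxonB", "taxonC"]
EDGES = [("root", "anc1"), ("root", "anc2"), ("anc1", "taxonA"), ("anc1", "hyb"),
         ("anc2", "hyb"), ("anc2", "taxonC"), ("hyb", "taxonB")]

if __name__ == "__main__":
    print(is_dag(NODES, EDGES), leaves(NODES, EDGES), reticulations(NODES, EDGES))
```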