Topics
• Basis of Bioinformatics
• Goals of Bioinformatics
• Bioinformatics Jargon 101
Lecture 1, CS566

Basis of Bioinformatics
• What makes Bioinformatics possible?
  – Advances in biotechnology
    • PCR, sequencing, shotgun sequencing, large-scale data generation
  – Advances in computer hardware
  – Advent of the WWW
  – Representation of problems in forms amenable to Statistics and Computer Science
  – Evolutionary underpinnings of life

Biotechnology: Polymerase Chain Reaction
• The antithesis, happily, of NP-completeness
  – Used to form exact copies of a section of DNA
  – The template doubles every cycle, i.e., after n cycles there are 2^n copies of each starting DNA molecule
  – Advantages:
    • A precise subsequence can be selected using appropriate primers
    • Large amounts can be created from a small sample
    • Sine qua non for DNA sequencing projects, and for much of experimental biology

Biotechnology: Sequencing
• Analogy: reading a phrase
  – Assumption: we can read only one letter at a time
  – Start with copies of the phrase to be read
  – Allow several cycles of PCR to proceed
  – At any moment in time, the entire set of partial phrases is present (all having the same start point)
  – Freeze
  – Arrange the partial phrases by size and read just the terminal letter of each

Biotechnology: Sequencing
[Figure: the phrase “This is the best course I’ve ever taken” read two ways. Sequencing: the growing partial phrases T, Th, Thi, This, This i, This is, … Shotgun sequencing: the overlapping fragments “This is the best cou”, “the best course I’ve” and “I’ve ever taken” assembled into “This is the best course I’ve ever taken”.]

Shotgun Sequencing
• Analogy: reading a long sentence, indirectly
  – Randomly fragment a few copies of the sentence into phrases
  – Find the order of characters in each phrase
  – Find overlaps between phrases
  – Assemble the phrases into the original sentence
  – ‘Shotgun’ refers to the parallel sequencing of multiple ‘phrases’

Large-Scale Data Generation
• Sequencing robots permit complete sequences to be obtained in a short time
• Expression arrays allow simultaneous measurement of the activity of thousands of genes
• Mass spectrometric pipelines allow simultaneous identification of several proteins
• Autoanalyzers automate the measurement of numerous chemicals

Advances in Computer Hardware
• The exponential increase in biological data has been matched by Moore’s law: the periodic doubling of transistor counts and, roughly, of CPU speed
• Memory and disk sizes have kept pace with the increase in data volumes (from 1.44 MB floppy disks to petabytes)
• Clustering allows many of the parallel problems in biology to be handled (IBM’s many shades of blue...)

Role of the WWW
• A wide range of data and analysis tools is just a few clicks away (an oversimplification)
• Results and ideas are disseminated very quickly within and between disciplines
• The Web offers the potential for mining across several databases

Meat for Statistics and Computer Science
• A lot can be learnt from the string representation of biological molecules
• Data volumes now suffice for reliable statistical inference
• Computer hardware can now support the implementation of algorithms
• Challenges:
  – Stimulus for creating and refining statistical and computational approaches
  – Emulating biology, as well as learning strategies from it
• “Computer Science was invented for Bioinformatics” (Ewan Birney, GRC 2003)

Evolutionary Stochasticity
• “The chimpanzee is our cousin, but so is yeast, albeit billions of years removed”
• Building evolutionary trees is of considerable academic interest
• But the simple fact of evolutionary relationships is useful in many ways
  – Comparison across species is useful in understanding the biology of individual species

Goals in Bioinformatics
• Understand biology
  – Cataloguing biomolecules
  – Understanding what they do, in isolation
  – Understanding how things work together, at different levels of abstraction
• Cure disease
  – Drug target approach (classical)
  – Integrated approach (futuristic)
    • Multiple drugs for non-linear effects
    • Address the source of the problem, not its effect
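The PCR slide's claim that the template doubles every cycle is plain exponential growth. A minimal sketch, assuming each starting molecule is amplified independently; the function name `pcr_copies` and the `efficiency` parameter (fraction of templates copied per cycle) are illustrative assumptions, not part of the lecture, and `efficiency = 1.0` reproduces the slide's ideal 2^n.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected number of copies after `cycles` rounds of PCR.

    efficiency = 1.0 is the ideal case from the slide (every template doubles each cycle).
    Lower values model incomplete amplification; this parameter is an assumption added
    for illustration, not something the lecture specifies.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    # Each cycle multiplies the number of templates by (1 + efficiency).
    return initial_copies * (1.0 + efficiency) ** cycles


print(pcr_copies(1, 30))                  # 1073741824.0, i.e. 2**30
print(pcr_copies(1, 30, efficiency=0.9))  # fewer than 2**30 copies
```

Thirty ideal cycles turn a single molecule into roughly a billion copies, which is why PCR can create large amounts of DNA from a small sample.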
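The "reading a phrase" analogy for sequencing can be simulated directly: produce every partial phrase sharing the same start point, mix them, then arrange them by size and read only the terminal letter of each. This is a toy model of the analogy only, not of the laboratory chemistry, and the helper names `partial_phrases` and `read_by_terminal_letter` are hypothetical.

```python
import random


def partial_phrases(phrase: str) -> list[str]:
    """Every prefix of the phrase: the 'frozen' set of partial copies, all sharing one start point."""
    return [phrase[:i] for i in range(1, len(phrase) + 1)]


def read_by_terminal_letter(fragments: list[str]) -> str:
    """Arrange the partial phrases by size and read only the last letter of each."""
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)


phrase = "This is the best course I've ever taken"
pool = partial_phrases(phrase)
random.shuffle(pool)  # the frozen mixture has no inherent order
print(read_by_terminal_letter(pool) == phrase)  # True
```

Sorting recovers the phrase because every partial phrase has a distinct length, so its length alone fixes the position of its terminal letter.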
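The shotgun steps (find overlaps between phrases, then assemble them into the original sentence) can be sketched with greedy overlap merging, one simple heuristic among many; the lecture does not say which assembly method it has in mind, and the function names and the `min_len` threshold are assumptions. The input reads are the fragments shown in the sequencing figure.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Overlaps shorter than `min_len` are treated as chance matches and ignored (returns 0).
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns the resulting contigs; a single string means the sentence was fully assembled.
    """
    # Drop exact duplicates, then fragments wholly contained in another fragment.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: remaining pieces stay as separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


reads = ["This is the best cou", "the best course I've", "I've ever taken"]
print(greedy_assemble(reads))  # ["This is the best course I've ever taken"]
```

Greedy merging is easy to follow but can misassemble sequences containing repeats longer than the reads, so it illustrates the idea rather than serving as a practical assembler.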
Bioinformatics Jargon 101
• Nucleotide/Base/Phosphate
• DNA/cDNA
• RNA/mRNA/tRNA/rRNA
• Protein/Amino acid
• Sequence/Sequencing
• Homology/Orthology/Paralogy/Analogy
• Exon/Intron/Intergenic region
• Genetic code/Codon
• Splicing/Alternative splicing
• Species/System/Tissue/Organ/Cell/Organelle
• Genome/Chromosome/Chromatin/Histones/Gene/Allele/Diploid/Haploid
• Recombination/Mutation
• Replication/Transcription/Expression/Translation
• Eubacteria/Archaea/Eukaryotes/Viruses
• Maternal inheritance
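Several of the jargon terms (transcription, translation, codon, genetic code) fit together in a few lines of code. A minimal sketch, assuming the input is an already spliced DNA coding strand that starts at the first codon; the 64-letter string is the standard genetic code listed in U, C, A, G order, and `transcribe`/`translate` are hypothetical helper names.

```python
BASES = "UCAG"
# Standard genetic code: codon (first, second, third base in U, C, A, G order) -> amino acid.
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA: the same sequence with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read non-overlapping codons until a stop codon or the end of the sequence."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


mrna = transcribe("ATGTTTGGCAAATGGTAA")
print(mrna)             # AUGUUUGGCAAAUGGUAA
print(translate(mrna))  # MFGKW
```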