Bioinformatics for Research
Module 1: Introduction to Genomics and Bioinformatics
January 12, 2017
Mainlab Bioinformatics, Washington State University

Introduction to Genomics: Learning Outcomes
• Refresh your knowledge of basic genomic concepts and terminology
• Understand conceptually the different areas of genomic research
• Know the basic tools of genomics

Prokaryotes vs. Eukaryotes
Prokaryote | Eukaryote
No nucleus | Nucleus
Circular or linear chromosomes, plasmids | Chromosomes in nucleus; mitochondria and chloroplasts also have genomes
Polycistronic operons (multiple genes controlled by a single promoter) | Monocistronic genes (one gene, one promoter)
No introns | Introns

The central dogma of genetics
DNA → (transcription) → mRNA → (translation) → protein → (expression) → trait

DNA nucleotides
Standard Bases
Abbreviation | Base
A | Adenine
C | Cytosine
G | Guanine
T | Thymine
U | Uracil

Degenerate Bases
Abbreviation | Base
W | A, T
S | C, G
M | A, C
K | G, T
R | A, G
Y | C, T
B | C, G, T
D | A, G, T
H | A, C, T
V | A, C, G

Genes and ORFs
• Gene
  • A DNA segment that encodes a specific protein that contributes to the expression of a trait
• Open Reading Frame (ORF)
  • Section of mRNA without stop codons that is translated

Structure of a Gene
• Regulatory regions: up to 50 kb upstream of +1 site
• Exons: protein coding and untranslated regions (UTR)
  • 1 to 178 exons per gene (mean 8.8)
  • 8 bp to 17 kb per exon (mean 145 bp)
• Introns: splice acceptor and donor sites, other DNA
  • average 1 kb – 50 kb per intron

DNA to RNA to Protein
[Figure: http://www.carolguze.com/images/Human%20Genome/dna-rna-protein.jpg]

Amino Acids
• mRNA is translated into protein, which is a series of amino acids
• Each amino acid is coded for by a 3-nucleotide codon
• Each amino acid has a unique structure and chemical properties
http://www.carolguze.com/text/102-3biomolecules2.shtml

Amino Acid Codon Table
[Figure: http://upload.wikimedia.org/wikipedia/commons/c/cc/Codontable1.PNG]

Substitution Matrices
http://swift.cmbi.ru.nl/teach/ALIGN/Align_8.html

Protein structure
• Properties of amino acids determine the structure of the protein
• Structure is important for protein function
• Mutations that alter structure can destabilize/inactivate the protein
[Figure: http://www.accessexcellence.org/RC/VL/GG/images/protein.gif]

What is a Genome?
• The DNA content of an organism. It contains all the biological information needed to construct and maintain an organism
• In eukaryotic organisms, it is measured in haploid equivalents
• Size is most commonly measured in base pairs (e.g. Mb)
• Genome sizes vary widely and do not correspond to the complexity of an organism

Basic Genome Statistics
• Chromosome number and ploidy
• GC content
• Genome size
• Codon bias
• Gene content and order
What is the chromosome number of your favorite organism and how many genes does it have?

Genomics vs Genetics
• Genomics is the study of
• Genetics is the study of

Genomics Comprises
• Structural Genomics
  The study of genome structure and organization on a large scale
• Functional Genomics
  The study of gene (and protein) function on a large scale
• Translational Genomics
  The adaptation of information derived from genome technologies for organism improvement
What about Comparative Genomics?

Structural Genomics
• The study of genome structure and organization on a large scale
• Tools of Structural Genomics
  1.
  2.
  3.

Functional Genomics
• The study of gene function and expression on a large scale
• Tools of Functional Genomics
  • EST libraries (cDNAs)
  • RNA-Seq technology
  • Next Generation Sequencing Technology
  • Real-time PCR

Translational Genomics
• Transferring the knowledge gained from one species to another, or translating basic knowledge to applied knowledge.
• As we identify gene(s) associated with interesting traits, markers can be identified for marker-assisted selection.
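The base codes from the "DNA nucleotides" slide and the GC content statistic from the "Basic Genome Statistics" slide can be made concrete with a short Python sketch. The function names `gc_content` and `expand_degenerate` are hypothetical and chosen only for illustration; the code table is copied from the slide, and GC content is taken here to mean the fraction of G and C bases in a sequence.

```python
from itertools import product

# Degenerate base codes as listed on the "DNA nucleotides" slide,
# plus the four standard DNA bases mapping to themselves.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "M": "AC", "K": "GT",
    "R": "AG", "Y": "CT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def gc_content(seq):
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def expand_degenerate(seq):
    """List every concrete DNA sequence matched by a degenerate sequence."""
    choices = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    print(gc_content("ATGC"))
    print(expand_degenerate("AY"))
```

Note that the number of expansions grows multiplicatively with each degenerate position, so `expand_degenerate` is only practical for short sequences such as primers or motifs.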

Traditional Breeding
Wild species (undesirable fruit, low yield, disease resistance) X Cultivar (desirable fruit, high yield, disease susceptible)
→ Successive backcrosses (waiting years to select for trees)
→ Improved cultivar

Molecular Breeding
Wild species (undesirable fruit, low yield, disease resistance) X Cultivar (desirable fruit, high yield, disease susceptible)
→ Select desired progeny long before any fruit is grown, using molecular markers for the trait
→ Improved cultivar

Assignment – Extra Credit (10 Pts)
• There are many different types of “-omics” that have emerged in the last few years. Please define each of the types below in one paragraph (do not copy from a website). Email to Jodi by Friday Sept 4
  • Transcriptomics is the study of ….
  • Proteomics is the study of ….
  • Metabolomics is the study of ….
  • Phenomics is the study of ….

Overview of Bioinformatics: Learning Outcomes
• Understand the broad concept and approaches used in bioinformatics

What is Bioinformatics?
Bioinformatics Working Definition
• The application of information technology, computer science, mathematics and statistics to the organization, processing, storage, analysis, visualization and dissemination of genomic, genetic and breeding data.

What is the Range of Bioinformatics?
• Mathematical modeling of biological systems
• Developing algorithms for sequence and network analysis
• Building databases and web tools

Bioinformatics Approach
• Mathematical Modeling: Abstraction of biological systems - DNA is a “String”
• Developing Algorithms for Sequence Analysis - Analysis of “Strings”
  • Sequence alignment
  • Sequence composition
• Building databases and web tools - Dissemination and data mining of “Strings”

Bioinformatics Approach
• Mathematical Modeling: Abstraction of biological systems - DNA is a “String”
TAAGTTATTATTTAGTTAATACTTTTAACAATATT
ATTAAGGTATTTAAAAAATACTATTATAGTATTTA
ACATAGTTAAATACCTTCCTTAATACTGTTAAATT
ATATTCAATCAATACATATATAATATTATTAAAAT
ACTTGATAAGTATTATTTAGATATTAGACAAATAC
TAATTTTATATTGCTTTAATACTTAATAAATACTA
CTTATGTATTAAGTAAATATTACTGTAATACTAAT
AACAATATTATTACAATATGCTAGAATAATATTGC
TAGTATCAATAATTACTAATATAGTATTAGGAAAA
TACCATAATAATATTTCTACATAATACTAAGTTAA
TACTATGTGTAGAATAATAAATAATCAGATTAAAA
AAATTTTATTTATCTGAAACATATTTAATCAATTG
AACTGATTATTTTCAGCAGTAATAATTACATATGT
ACATAGTACATATGTAAAATATCATTAATTTCTGT
TATATATAATAGTATCTATTTTAGAGAGTATTAAT
TATTACTATAATTAAGCATTTATGCTTAATTATAA
GCTTTTTATGAACAAAATTA

Sequence Alignment
• Pairwise sequence comparison is the cornerstone of bioinformatics
• Infer function (homology)
  • Orthologs (occur in separate species, common ancestors)
  • Paralogs (gene duplication independent of speciation)
• Build evolutionary trees
• Do whole genome comparisons
• Infer structure
[Figure: alignment grid comparing two nucleotide sequences]

Assembly
Algorithms: Newbler, Velvet, Mira, Celera, CAP3, PHRAP, etc.
e.g.
GDR Unigenes

Multiple Sequence Alignment

Phylogenetic Analysis
Fragaria: Member of the distantly related Rosoideae (x = 7)
Malus and Prunus: Members of the Spiraeoideae (x = 17) and (x = 8) respectively

Domain Prediction

Genome Mapping/Comparison
[Figure labels: Prunus PC1–PC8; Malus MC1–MC17; Fragaria FC1–FC7; ancestral Rosaceae N = 9]
• The innermost circle represents the nine ancestral chromosomes of Rosaceae.
• The eight chromosomes of peach are repeated, each section showing the regions that are orthologous to each ancestral chromosome.
• The concentric circles enable us to identify the ancestral relationships and origins (breakage and fusion).

Genome Annotation

Visualization Tools
Comparative Mapping: CMap
Genome Browsers: GBrowse/JBrowse, etc.

Structural Bioinformatics

Statistical Analysis of Functional Genomics Data
Arrays
• What statistical measures can be used to quantify up- and down-regulation of genes?
• Technical and biological error
RNA-Seq

Database Resources

Database Similarity Searching
Primary databases and community databases

NCBI: Primary Database for Genomics Data
www.ncbi.nih.gov

Querying Databases
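
The "Sequence Alignment" slide calls pairwise sequence comparison the cornerstone of bioinformatics but does not name an algorithm. As one illustration of treating DNA as a "String", the sketch below computes a global alignment score with the Needleman-Wunsch dynamic programming recurrence. The algorithm choice, the function name `global_alignment_score`, and the scoring values (match +1, mismatch -1, gap -1) are assumptions made for this example, not taken from the slides.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score of the best global alignment of a and b.

    The scoring parameters are illustrative defaults, not values
    prescribed by the course material.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    # Row 0: aligning the empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = prev[j - 1] + pair   # align a[i-1] with b[j-1]
            up = prev[j] + gap              # gap inserted in b
            left = curr[j - 1] + gap        # gap inserted in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("ACGT", "ACT"))
```

Only two rows of the scoring matrix are kept, which is enough to obtain the score; recovering the alignment itself would require storing the full matrix and tracing back through it. For protein sequences, the fixed match and mismatch values would normally be replaced by a substitution matrix lookup.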