Genomics, Proteomics, and Bioinformatics
Biology 224
Instructor: Tom Peavy
August 30, 2010

What is bioinformatics?
• Interface of biology and computers
• Analysis of genomes, genes, mRNA and proteins using computer algorithms and computer databases

What is Genomics?
What is Proteomics?
What is the Transcriptome?

On bioinformatics
“Science is about building causal relations between natural phenomena (for instance, between a mutation in a gene and a disease). The development of instruments to increase our capacity to observe natural phenomena has, therefore, played a crucial role in the development of science - the microscope being the paradigmatic example in biology. With the human genome, the natural world takes an unprecedented turn: it is better described as a sequence of symbols. Besides high-throughput machines such as sequencers and DNA chip readers, the computer and the associated software becomes the instrument to observe it, and the discipline of bioinformatics flourishes.”
Martin Reese and Roderic Guigó, Genome Biology 2006 7(Suppl I):S1, introducing EGASP, the Encyclopedia of DNA Elements (ENCODE) Genome Annotation Assessment Project

What do you want out of this course?
Themes throughout the course: gene/protein families
Retinol-binding protein 4 (RBP4)
member of the lipocalin family
small, abundant carrier protein
We will study it in a variety of contexts including
--homologs in various species
--sequence alignment
--gene expression
--protein structure
--phylogeny

[Diagram: bioinformatics, medical informatics, and public health informatics; tool-users; tool-makers (algorithms, databases, infrastructure)]

DNA: genomic DNA databases
RNA: cDNA, ESTs, UniGene, microarrays
protein: protein sequence databases
phenotype

There are three major public DNA databases
EMBL: housed at EBI (European Bioinformatics Institute)
GenBank: housed at NCBI (National Center for Biotechnology Information)
DDBJ: housed in Japan

[Figure: Growth of GenBank, 1982-2002. Axes: year; sequences (millions); base pairs of DNA (billions). Updated 8-12-04: >40b base pairs]

[Figure: Growth of GenBank + Whole Genome Shotgun (1982-November 2008). Axes: year; number of sequences in GenBank (millions); base pairs of DNA in GenBank (billions); base pairs in GenBank + WGS (billions)]

Taxonomy at NCBI: ~200,000 species are represented in GenBank (11/08)
2010: 230,682 species
http://www.ncbi.nlm.nih.gov/Taxonomy/txstat.cgi

The most sequenced organisms in GenBank
Homo sapiens                    13.1 billion bases
Mus musculus                    8.4b
Rattus norvegicus               6.1b
Bos taurus                      5.2b
Zea mays                        4.6b
Sus scrofa                      3.6b
Danio rerio                     3.0b
Oryza sativa (japonica)         1.5b
Strongylocentrotus purpuratus   1.4b
Nicotiana tabacum               1.1b
Updated 11-6-08, GenBank release 168.0. Excluding WGS, organelles, metagenomics.

Go to NCBI website
http://www.ncbi.nlm.nih.gov/
• National Library of Medicine's search service
• 12 million citations in MEDLINE
• links to participating online journals
• PubMed Central has access to full articles
• Entrez integrates the scientific literature; DNA and protein sequence databases; 3D protein structure data; population study data sets; assemblies of complete genomes; etc.

Entrez is a search and retrieval system that
integrates NCBI databases

BLAST: Basic Local Alignment Search Tool
• NCBI's sequence similarity search tool
• supports analysis of DNA and protein databases
• 80,000 searches per day

Online Mendelian Inheritance in Man: catalog of human genes and genetic disorders
OMIA: Online Mendelian Inheritance in Animals

Structure site includes: Molecular Modelling Database (MMDB); biopolymer structures obtained from the Protein Data Bank (PDB); Cn3D (a 3D-structure viewer); vector alignment search tool (VAST); and other protein structure resources

Review of Genetics, Biochemistry & Evolution

Human Genome Project

What is a typical genomic structure for a eukaryotic gene?

Synonymous vs. nonsynonymous changes
Proline: CCT, CCC, CCA, CCG (four-fold degenerate amino acid)
Arginine: CGT
Synonymous changes
Nonsynonymous changes

Synonymous Substitution
Non-synonymous Substitution

Central Dogma
• DNA → RNA → protein
• sequence, structure, function, evolution

What kind of modifications are made to eukaryotic mRNAs?
RNA Modifications

What are cDNAs?

Protein structures
• X-ray crystallography and nuclear magnetic resonance (NMR)
• Primary structure – linear amino acid (AA) sequence
• Secondary structure – alpha helix and beta sheet
• Tertiary structure – 3-D structure that exposes binding domains, etc.

Linkage maps
• YAC (yeast artificial chromosome) and
• BAC (bacterial artificial chromosome)
  – used to clone large pieces of DNA
  – overlapping clones
• Are genes linked?

Organization of genomes
• Groups of genes within a species
  – Comparative genomics
• plastid genomes and mt genomes

How do we determine functions of genes?
• Expression patterns
  – Northerns
  – RT-PCR
  – SAGE
  – Microarrays
• Transgenics – insert genes; what results?
• Mutants
  – classical genetics
  – molecular genetics
• And functional protein assays

Charles Darwin
• Descent with modification – species change through time and are related to a common ancestor
• Natural selection is the process by which this change occurs

Understanding natural selection
• acts on individuals, though consequences occur in populations
  – an individual's phenotype is the reason it survived and reproduced
  – after a time this will change the distribution in the population
  – what ultimately changes?
    • Gene pool

New alleles
• Point change is all that is needed
  – not always a "big deal": neutral change
  – can be, as in sickle cell anemia

Gene duplication
• creates an additional copy of a gene
  – unequal cross-over
  – X-rays
• Are these duplicates maintained in populations?
  – Pseudogenes

Polyploidy
• additional set of chromosomes
  – Found in plants
  – Amphibians, invertebrates
• Through a type of parthenogenesis
  – Triploid
• Poor fertility
• Hybridization or meiosis malfunction

Homology
• study of likeness (literal)
• Similarity between species (or genes) that results from inheritance of traits from a common ancestor
  – Unless a common ancestor is known, be careful when using this word.

Orthologous vs Paralogous Genes
[Diagram: gene a undergoes gene duplication, giving a and b; speciation then gives a and b in Species 1 and a and b in Species 2]

Species
• All organisms alive today can trace their ancestry back to the origin of life some 3.8 billion years ago
  – Since then millions if not billions of branching events have occurred
• Mechanisms have to be in place for change to occur
  – genetic drift and natural selection
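
Worked example: point changes and synonymous vs. nonsynonymous substitutions

The review slides on synonymous vs. nonsynonymous changes and on new alleles can be illustrated with a short script. This is only a minimal sketch, not part of the original lecture: it assumes the standard genetic code and DNA (coding-strand) codons, and the names `CODON_TABLE`, `classify_point_change`, and `third_position_degeneracy` are hypothetical helpers made up for this illustration. The sickle cell case uses the well-known GAG to GTG change (glutamate to valine) in the beta-globin gene.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order.
# Assumption: DNA coding-strand codons; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_change(codon, position, new_base):
    """Change one base of a codon (position 0, 1, or 2) and classify it.

    Returns (mutated codon, amino acid before, amino acid after, kind),
    where kind is "synonymous" or "nonsynonymous".
    """
    codon = codon.upper()
    mutated = codon[:position] + new_base.upper() + codon[position + 1:]
    before = CODON_TABLE[codon]
    after = CODON_TABLE[mutated]
    kind = "synonymous" if before == after else "nonsynonymous"
    return mutated, before, after, kind


def third_position_degeneracy(codon):
    """Count how many third-position bases keep the same amino acid."""
    codon = codon.upper()
    amino_acid = CODON_TABLE[codon]
    return sum(CODON_TABLE[codon[:2] + base] == amino_acid for base in BASES)


if __name__ == "__main__":
    # Proline (CCT): any third-position change is synonymous.
    print(classify_point_change("CCT", 2, "A"))
    # Proline (CCT) to arginine (CGT): a nonsynonymous change.
    print(classify_point_change("CCT", 1, "G"))
    # Sickle cell change in beta-globin: GAG (Glu) to GTG (Val).
    print(classify_point_change("GAG", 1, "T"))
    # Proline codons are four-fold degenerate at the third position.
    print(third_position_degeneracy("CCT"))
```

The codon table is built from a single 64-letter string so the whole genetic code stays visible in one place. A larger analysis would more likely rely on an established library than on a hand-written table.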