Chapter 10
Comparative Genomics
Insights gained through comparison of genomes from different species
© 2005 Prentice Hall Inc. / A Pearson Education Company / Upper Saddle River, New Jersey 07458

Contents
History
Synteny
Conservation and function
Sequence similarity searches
Gene finding
Regulatory sequence identification
Interaction mapping
Genes and evolution

History
Human Genome Project decided to use smaller genomes as warm-up for human genome
Resulted in sequencing:
  Many bacteria
  Model organism genomes: yeast, C. elegans, Arabidopsis, Drosophila
Comparison of these genome sequences provided basis for field of “Comparative Genomics”

Early comparative genomics
Comparative genomics prior to obtaining full genome sequence:
Genome size
  Compared DNA content among species
Single copy and repetitive DNA
  Used hybridization kinetics
  Found amount of repetitive DNA differed greatly among species

Synteny
Synteny: genes that are in the same relative position on two different chromosomes
Genetic and physical maps compared between species
  Or between chromosomes of the same species
Closely related species generally have similar order of genes on chromosomes
Synteny can be used to identify genes in one species based on map position in another

Synteny of grass genomes
Synteny among crop genomes: rice, maize and wheat
Rice is smallest genome, in center
Wheat is largest, in outer circle
Genes found in similar places on chromosomes are indicated
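The synteny slides describe genes that keep the same relative order in two genomes. A minimal sketch of that idea follows, assuming each chromosome is given as an ordered list of gene identifiers that have already been matched between species. The gene names, the function name `syntenic_blocks` and the `min_len` parameter are hypothetical. The sketch only detects runs in the same orientation, so inverted blocks would be missed.

```python
def syntenic_blocks(order_a, order_b, min_len=2):
    """Return runs of genes that are consecutive and in the same order in both lists."""
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    blocks, current = [], []
    for gene in order_a:
        if gene not in pos_b:
            # Gene has no counterpart in the second genome: close any open run.
            if len(current) >= min_len:
                blocks.append(current)
            current = []
            continue
        if current and pos_b[gene] == pos_b[current[-1]] + 1:
            # Gene directly follows the previous one in both genomes.
            current.append(gene)
        else:
            if len(current) >= min_len:
                blocks.append(current)
            current = [gene]
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


# Hypothetical gene orders on one chromosome of each species.
human = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
mouse = ["g5", "g6", "g7", "x9", "g1", "g2", "g3"]
print(syntenic_blocks(human, mouse))
```

In this toy input the two blocks have swapped places between the species but each keeps its internal gene order, which is the pattern the slides call synteny.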

Synteny of sequenced genomes
When sequences from mouse and human genomes are compared:
Find regions of remarkable synteny
Genes are in almost identical order for long stretches along the chromosome
[Figure labels: Human Chr 14, Mouse Chr 14]

Mouse/human synteny
[Figure]

Comparing sequenced genomes
Comparison of genomic sequences from different species can help identify:
  Gene structure
  Gene function
  Regulatory sequences
  Interactions between gene products

Evolution and sequence conservation
Genome comparisons are based on the observation: conservation = function
If there are no constraints on a DNA sequence:
  Random mutations will occur
  Over tens of millions of years these random mutations will make two related sequences different

Function and sequence conservation
However, if there are constraints:
  e.g. DNA codes for protein
  Or transcription factor binds DNA
Then there will be sequence similarity when related sequences are compared
Basic rule when comparing two related sequences: sequence conservation = functional importance

Orthologs and paralogs
When comparing sequences from different genomes
Must distinguish between two types of closely related sequences:
  Orthologs are genes in different species that descend from a single gene in their common ancestor
  Paralogs are genes found in the same species that were created through gene duplication events

Orthologues and paralogues
[Figure labels: A, A’, A’’, B”, B’, B]
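The "conservation = function" rule on the slides above can be illustrated with a toy simulation. This is a sketch under strong assumptions, not a model of real evolution: mutations are uniform point substitutions, and any substitution inside a constrained region is simply rejected. The names `evolve` and `identity`, the region coordinates and the mutation counts are all hypothetical.

```python
import random


def evolve(seq, constrained, n_mutations, rng):
    """Apply random point substitutions; those hitting constrained positions are rejected."""
    seq = list(seq)
    for _ in range(n_mutations):
        i = rng.randrange(len(seq))
        if i in constrained:
            continue  # toy purifying selection: the change is never kept
        seq[i] = rng.choice([b for b in "ACGT" if b != seq[i]])
    return "".join(seq)


def identity(a, b, positions):
    """Fraction of the given positions at which the two sequences agree."""
    positions = list(positions)
    return sum(a[i] == b[i] for i in positions) / len(positions)


rng = random.Random(0)
ancestor = "".join(rng.choice("ACGT") for _ in range(200))
constrained = set(range(50, 100))  # hypothetical functional region
free = [i for i in range(200) if i not in constrained]

# Two lineages diverge independently from the same ancestor.
lineage1 = evolve(ancestor, constrained, 300, rng)
lineage2 = evolve(ancestor, constrained, 300, rng)

print("constrained identity:  ", identity(lineage1, lineage2, constrained))
print("unconstrained identity:", identity(lineage1, lineage2, free))
```

By construction the constrained region stays identical in both lineages while the unconstrained positions drift apart, which is the contrast that genome comparisons exploit.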

Sequence similarity and gene function
Sequence comparisons that implicate function are widely used:
  To determine if newly sequenced cDNA or genomic region encodes gene of known function
  Search for similar sequence in other species (or in same species)

Homology searches
Search databases of DNA sequences
Use computer algorithms to align sequences
Don’t require perfect matches between sequences
  Allow for insertions, deletions and base changes
Most commonly used algorithms:
  BLAST
  FASTA

Homology search example
The sea squirt, Ciona intestinalis, makes a coat primarily of cellulose
A BLAST search was performed on the Ciona genome using an Arabidopsis endoglucanase gene involved in cellulose synthesis
Extensive homology was found with a Ciona gene flanked by genes found in Drosophila and human
It is postulated that the Ciona endoglucanase gene may have arisen by lateral gene transfer

Discovery of endoglucanase gene in sea squirt genome
[Figure labels: Arabidopsis Korrigan; Transporter, Endoglucanase, Splicing factor; C. intestinalis cDNA; C. elegans and Drosophila; Human]

Homology search for the mouse genome
Homology search of all genes in the mouse genome:
  27% in other metazoans
  29% in other eukaryotes
  6% in other chordates
  14% in other mammals
  Less than 1% rodent specific
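The homology-search slides say that alignments need not be perfect and must allow insertions, deletions and base changes. One way to score such an alignment is Smith-Waterman local alignment, sketched below. This is not how BLAST or FASTA work internally: both use faster heuristics, while this is the plain dynamic-programming method. The scoring values (`match`, `mismatch`, `gap`) are arbitrary assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman score of the best local alignment between a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend diagonally (match or base change), or open a gap in either sequence.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(local_alignment_score("ACGTACGT", "ACGTTACGT"))  # one inserted base in the second sequence
print(local_alignment_score("AAAA", "TTTT"))           # no similarity
```

A database search would compute a score like this for the query against each database sequence and report the highest-scoring hits. Real tools also estimate how likely each score is to arise by chance.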

Problems of genome annotation
Identifying genes and regulatory regions in sequenced genomes is challenging
Open reading frames (ORFs) are usually a good indication of genes
Problem is: difficult to determine which ORFs belong to a gene
  Many mammalian genes have small exons and large introns
Regulatory sequences even more difficult

Computational approaches to gene identification
Computer programs analyze genomic sequence
  GRAIL, GeneFinder
Look for ORFs, splice sites, poly A addition sites, etc.
Predict gene structure
Frequently wrong
  Usually miss exons at beginning or end of gene
  Or predict an exon that doesn’t really exist

How genome comparisons help
When comparing genomes of different species
  Genes normally have same exon/intron structure
Look for conserved ORFs in both genomes
Frequently permits accurate identification of genes
  Fugu/human comparison found >1000 genes
  Mouse/human comparison indicates only 30,000 genes in genome

Sequence comparison example
Comparison of the human and mouse spermidine synthase genes
Revealed an additional intron in the human gene that is not found in the mouse homologue
[Figure labels: Human, Mouse; scale 5,500 bp]

Identifying small RNAs
Growing evidence that small RNAs can regulate gene expression
Small RNAs are 20-25 bases
Conservation between genomes suggests functionality
Example: small RNAs conserved in Arabidopsis and rice
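The gene-identification slides mention programs that look for ORFs. A minimal ORF scan is sketched below, assuming the standard stop codons TAA, TAG and TGA and an ATG start. It is not the method used by GRAIL or GeneFinder. It reads only the forward strand and ignores introns, which is one reason ORFs alone are a weak signal in mammalian genomes. The function name and the `min_codons` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    start is 0-based and end is exclusive, so seq[start:end] includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


dna = "CCATGAAATTTTAGGG"  # toy sequence
for start, end, frame in find_orfs(dna):
    print(frame, start, end, dna[start:end])
```

Running the same scan on two related genomes and keeping only ORFs that align with each other would be one simple way to apply the "look for conserved ORFs in both genomes" idea.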

Regulatory sequence identification
A large portion of the genome contains regulatory information
Regulatory sequence includes:
  Cis-regulatory elements: tell genes when and where to turn on
  Basal transcription machinery binding sites
  Enhancers
    Can be 5’ of gene, 3’ of gene or in intron

Regulatory sequences
[Figure labels: 5’, TATA, 3’]

Finding regulatory sequences
Regulatory sequences are difficult to identify using computer programs
Problem is: most enhancer sequences have yet to be identified
They are usually short: 6-10 base pairs
Those that are known are usually degenerate
  They can differ in one or more base pairs
  Still bind the cognate transcription factor

Comparisons to identify regulatory elements
Comparisons of genomes of different species can identify regulatory elements
Change in intergenic regions and introns is usually more rapid than in coding regions
Nevertheless, regulatory elements tend to be conserved
Conserved regions are called “phylogenetic footprints”

Phylogenetic footprint
Identifying conserved regulatory regions usually requires comparing genomes of closely related species
If too distantly related, very difficult to find conservation
Nevertheless, mouse/human sequence comparison has revealed many conserved cis-regulatory elements

Mouse/human comparison
[Figure]
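Phylogenetic footprinting, as described on the slides above, looks for short stretches that stay conserved inside otherwise diverged noncoding sequence. The sketch below assumes the two sequences have already been aligned to equal length, with `-` marking gaps, and slides a fixed window across the alignment. The window size, identity threshold, function name and toy sequences are assumptions for illustration only.

```python
def conserved_windows(aln_a, aln_b, window=6, min_identity=0.8):
    """Return merged (start, end) column ranges whose windowed identity meets the threshold."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 where the two aligned columns carry the same base, 0 for mismatches and gaps.
    match = [int(x == y and x != "-") for x, y in zip(aln_a, aln_b)]
    regions = []
    for start in range(len(match) - window + 1):
        if sum(match[start:start + window]) / window >= min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the overlapping region
            else:
                regions.append((start, end))
    return regions


# Toy alignment: diverged flanks around one shared 8-base element.
human = "ACGTTGCA" + "TATAAAGG" + "CCGATCGA"
mouse = "TGCAACGT" + "TATAAAGG" + "GGCTAGCT"
print(conserved_windows(human, mouse, window=6, min_identity=1.0))
```

As the slides on multiple-species comparison note, a conserved window is only a candidate: conservation alone is not enough, and the element still needs experimental validation.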

Using multiple species for phylogenetic footprinting
The location of regulatory sequences can also be found by comparing several related sequences
Multiple alignments performed
Better able to home in on important regions
Conservation alone is not enough; need to validate importance of elements

Interaction mapping
Protein-protein interactions include:
  The transfer of information in a genetic pathway
  Scaffolding to tether other proteins
  Enzymatic reactions
  Large molecular machines such as motors

Rosetta Stone
Observation: in some species, interacting proteins are encoded by a single gene
In other species the same proteins are encoded by two genes
Systematic search through sequenced genomes for these relationships should identify proteins that interact
Called the “Rosetta Stone” approach

Rosetta Stone example
Yeast: one protein, topoisomerase II
Equivalent in E. coli is two proteins: gyrase A and gyrase B
Suggests gyrase B and gyrase A interact
[Figure labels: Yeast topoisomerase II; E. coli gyrase B, gyrase A]

Rosetta Stone
[Figure labels: Escherichia coli, Haemophilus influenzae, Methanococcus jannaschii]

Higher level comparisons
Comparisons between genomes are not just to better identify genes and regulatory sequences
Evolution of adaptive traits occurs through:
  Evolution of new genes
  Changing when and where genes are expressed
Thus comparisons of genes found in genomes can provide information about mechanisms of evolution
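The Rosetta Stone search described above can be sketched as a set comparison. The sketch assumes every protein has already been assigned to one or more homology families, for example by a similarity search. That input format, the family labels and the function name are hypothetical. Two separate proteins in the query genome are predicted to interact if some protein in another genome carries the families of both.

```python
from itertools import combinations


def rosetta_stone_pairs(query_genome, other_genomes):
    """Predict interacting pairs in query_genome from fused homologs in other genomes.

    Each genome maps protein name -> set of homology-family labels (assumed input).
    Returns (protein_1, protein_2, species, fused_protein) tuples.
    """
    predictions = []
    for (p1, fams1), (p2, fams2) in combinations(sorted(query_genome.items()), 2):
        if fams1 & fams2:
            continue  # the two proteins are homologous to each other, not a fusion signal
        for species, proteins in other_genomes.items():
            for fused, fams in proteins.items():
                if fams1 <= fams and fams2 <= fams:
                    predictions.append((p1, p2, species, fused))
    return predictions


# Toy data modeled on the gyrase / topoisomerase II example; family labels are made up.
ecoli = {
    "gyrA": {"gyrase_A_like"},
    "gyrB": {"gyrase_B_like"},
    "lacZ": {"beta_gal_like"},
}
others = {"yeast": {"TOP2": {"gyrase_A_like", "gyrase_B_like"}}}
print(rosetta_stone_pairs(ecoli, others))
```

The unrelated `lacZ` entry produces no prediction because no protein in the other genome combines its family with either gyrase family.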

Genes and genomes
Comparison of total gene numbers in sequenced genomes:
  Smaller than originally expected
  Ex: human genome thought to have 100,000 genes
  Now think closer to 30,000-35,000 genes
Suggests that many new functions arise in gene expression
  Use old genes in new ways

Selective expansion of genes
Although comparisons show not as much difference in numbers of genes as expected
Still see striking differences in numbers of some gene families
Examples:
  Roundworm C. elegans has a large number of nuclear receptor genes
  Drosophila has a large number of zinc-finger transcription factors
  Plants have no G-protein coupled receptors

What is the difference between man and ape?
Man and chimpanzee have a genome-wide similarity of greater than 95%
What accounts for the differences between the species?
Recent study suggests they are due to specific gene expression differences
Striking differences found only in brain

Human/ape gene expression comparisons
[Figure labels: Human, Chimp, Rhesus; values 1.3, 1.0, 5.5]

Trait-to-gene
Methods being developed to identify genes involved in adaptive traits
Example: “Trait-to-gene”
Underlying reasoning:
  Organisms that have a particular trait either share related genes
  Or have developed new genes to perform same function

Relating traits to genes
[Figure labels: Species 1 to Species 5; Trait A; Gene; COG 3]
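The trait-to-gene reasoning above can be sketched as a comparison of presence/absence patterns across species. This is an illustration of the general idea, not the published method. It covers only the first branch of the reasoning (species with the trait share related genes) and would miss unrelated genes that perform the same function. The species names, family labels, function name and `min_agreement` threshold are hypothetical.

```python
def trait_associated_genes(trait_by_species, genes_by_species, min_agreement=1.0):
    """Return (family, score) for gene families whose presence tracks the trait.

    score is the fraction of species in which family presence equals trait presence.
    """
    species = sorted(trait_by_species)
    families = set().union(*genes_by_species.values())
    hits = []
    for fam in sorted(families):
        agree = sum(
            (fam in genes_by_species[s]) == trait_by_species[s] for s in species
        )
        score = agree / len(species)
        if score >= min_agreement:
            hits.append((fam, score))
    return hits


# Toy data: four of five species show the trait.
trait = {"sp1": True, "sp2": True, "sp3": True, "sp4": False, "sp5": True}
genes = {
    "sp1": {"famA", "famX"},
    "sp2": {"famA"},
    "sp3": {"famA", "famX"},
    "sp4": {"famX"},
    "sp5": {"famA"},
}
print(trait_associated_genes(trait, genes))
```

Lowering `min_agreement` below 1.0 would tolerate a few exceptions, which matters when many genomes are compared and some annotations are incomplete.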

Trait-to-gene
Comparisons made of bacterial genomes
  Need many genomes
Looked for genes involved in flagellar function
Identified 43 of 45 known genes
Found 5 additional genes that program said should be involved in flagellar function
Knocked out 3 and found that 2 resulted in bacteria with defective flagella

Trait-to-gene
[Figure labels: B. subtilis 168, yqeW, yuxH, B. subtilis 168]
Overnight growth at 37°C. Swim medium (LB + 0.25% agar). Similar results at 20°C (4 days) and 30°C (2 days).

The goal of comparative genomics
[Figure]

Summary
Synteny = similar relative positions of genes on chromosomes
Conservation = function
Homology searches
Gene structure prediction
Regulatory sequence identification
Interaction mapping
Genes and evolution