Download Genome - people.iup.edu
I. Investigating Genomes
• 6.1 Introduction to Genomics
• 6.2 Sequencing Genomes
• 6.3 Bioinformatics and Annotating Genomes

© 2015 Pearson Education, Inc.

6.1 Introduction to Genomics
• Genome
  • Entire complement of genetic information
  • Includes genes, regulatory sequences, and noncoding DNA
• Genomics
  • Discipline of mapping, sequencing, analyzing, and comparing genomes

6.1 Introduction to Genomics
• Several thousand prokaryotic genomes now sequenced and available
• RNA virus MS2
  • First genome; sequenced in 1976
  • 3,569 bp
• Haemophilus influenzae
  • First cellular genome; sequenced in 1995
  • 1,830,137 bp

6.2 Sequencing Genomes: Methods
• First-generation sequencing: Maxam–Gilbert chemical degradation
  • Controlled breakdown of DNA chains
  • No enzymes or cloning involved
  • Hazardous materials
  • Could not be widely applied

6.2 Sequencing Genomes: Methods
• First-generation sequencing: Sanger dideoxy method
  • Invented by Nobel Prize winner Fred Sanger
  • Chain termination method
  • Relies on extension of primer by DNA polymerase
  • Utilized cloned DNA

Figure 6.1: Normal deoxyribonucleotide compared with a dideoxyribonucleotide, which is missing the 3′-OH. Arrow shows the direction of DNA chain growth. With no free 3′-OH, replication will stop at this point.

Figure 6.2a: Sanger dideoxy sequencing. The DNA strand to be sequenced (3′ CGACTCGATTC 5′) is paired with a radioactive DNA primer (5′ GCTG 3′). DNA polymerase and a mixture of all four deoxyribonucleotide triphosphates are added, and the mixture is separated into four reaction tubes. Only one dideoxyribonucleotide triphosphate (ddGTP, ddATP, ddTTP, or ddCTP) is added to each tube and the reaction is allowed to proceed. Reaction products, in order of length: A (1), AG (2), AGC (3), AGCT (4), AGCTA (5), AGCTAA (6), AGCTAAG (7).
Figure 6.2c

6.2 Sequencing Genomes: Methods
• Second-generation DNA sequencing
  • Generates data 100× faster than Sanger method
  • Massively parallel methods
  • Large number of samples sequenced side by side
  • Uses increased computer power, robotics, and miniaturization

6.2 Sequencing Genomes: Methods
• Third-generation DNA sequencing
  • Continued technical improvement and miniaturization
  • Sequencing of single molecules of DNA

6.2 Sequencing Genomes: Methods
• Fourth-generation DNA sequencing
  • Optical detection no longer used
  • Detection of tiny ionic changes

6.2 Sequencing Genomes: Methods
• Fourth-generation DNA sequencing
  • Oxford Nanopore Technologies system (Figure 6.4b)
  • Passes DNA through nanoscale biological pores
  • Detector measures change in electric current
  • Extremely fast
  • Measures long chains of DNA

Figure 6.4b: Nanopore sequencing. Labels: double-stranded DNA, protein nanopore, single-stranded DNA, electrical signal to monitor. As DNA passes through the nanopore, base-specific electrical charges are emitted.

6.2 Sequencing Genomes: Strategies
• Two possible approaches
  • Map and order fragments first, then sequence (the “Gold Standard”)
  • Shotgun sequencing: sequence first, then map and order (“Fast and Dirty”)

6.2 Sequencing Genomes
• Virtually all genomic sequencing projects use shotgun sequencing strategy
• Entire genome is cloned or fragmented randomly and resultant pieces are sequenced
• Much of the sequencing is redundant
  • Generally 7- to 10-fold coverage
• Computer algorithms are used to look for replicate sequences and organize them
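
To make the "7- to 10-fold coverage" figure concrete, here is a minimal sketch of the arithmetic behind shotgun coverage. It is an illustration under stated assumptions, not part of the slides: the 500-bp read length is an assumed value, the function names are hypothetical, and the "fraction uncovered" estimate uses the common Poisson approximation for randomly placed reads.

```python
import math


def fold_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is sequenced: (reads x length) / genome."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target fold coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


def expected_fraction_uncovered(coverage: float) -> float:
    """Poisson approximation: chance that a given base is hit by zero reads."""
    return math.exp(-coverage)


if __name__ == "__main__":
    H_INFLUENZAE_BP = 1_830_137  # genome size quoted in the slides
    READ_LENGTH = 500            # assumed read length, for illustration only
    for c in (7, 10):
        n = reads_needed(c, READ_LENGTH, H_INFLUENZAE_BP)
        gap = expected_fraction_uncovered(c)
        print(f"{c}-fold coverage: {n} reads, ~{gap:.5%} of bases expected uncovered")
```

The redundancy is the point of the design: random fragments overlap, so high average coverage leaves very few bases unsequenced and gives the assembly algorithms overlapping reads to organize.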
6.2 Sequencing Genomes
• Genome assembly consists of connecting the DNA fragments in the correct order
• Occasionally assembly is not possible quickly
  • Closure can be pursued using PCR to target areas of the genome
• Closed vs. draft genome
  • Closed genome relies on direct human activity
  • More expensive
  • More information

6.3 Bioinformatics and Annotating Genomes
• Annotation: converting raw sequence data into a list of genes present in the genome
  • Annotation is the "bottleneck" in genomics
• Bioinformatics
  • Science that applies powerful computational tools to DNA and protein sequences
  • For the purpose of analyzing, storing, and accessing the sequences for comparative purposes

6.3 Bioinformatics and Annotating Genomes
• Functional ORF: an open reading frame that encodes a protein
• Computer algorithms used to search for ORFs (Figure 6.6)
  • Look for start/stop codons, Shine–Dalgarno sequences, and codon bias
• ORFs can be compared to ORFs in other genomes

Figure 6.6: Structure of an ORF (ribosomal binding site, start codon, coding sequence, stop codon) and the computer search for ORFs:
1. Computer finds possible start codons.
2. Computer finds possible stop codons.
3. Computer counts codons between start and stop.
4. Computer finds possible RBS.
5. Computer calculates codon bias in ORF.
6. Computer decides if ORF is likely to be genuine.
7. List of probable ORFs

6.3 Bioinformatics and Annotating Genomes
• Number of genes with role that can be clearly identified in a given genome is 70% or less of total ORFs detected
• Hypothetical proteins: uncharacterized ORFs; proteins that likely exist but whose function is currently unknown
  • Likely encode nonessential genes
  • In E. coli, many predicted to encode regulatory or backup proteins
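
The ORF search summarized in Figure 6.6 can be sketched in a few lines. This is a simplified illustration, not the algorithm used by any particular annotation tool: it covers only steps 1 to 3 and a crude version of step 6 (a minimum-length cutoff), scans the forward strand only, and skips the ribosome-binding-site and codon-bias checks. The function name, the start-codon set, and the `min_codons` threshold are assumptions made for the example.

```python
# Assumed codon sets for the sketch: ATG plus two alternative bacterial starts.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for forward-strand ORFs.

    start is the index of the first base of the start codon, end is the index
    just past the stop codon, and frame is 0, 1, or 2.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i                      # step 1: possible start codon
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3    # step 3: codons before the stop
                if n_codons >= min_codons:     # crude stand-in for step 6
                    orfs.append((start, i + 3, frame))
                start = None                   # step 2 done; look for next ORF
    return orfs


if __name__ == "__main__":
    toy = "CC" + "ATG" + "AAA" * 5 + "TAA" + "GG"
    print(find_orfs(toy, min_codons=3))
```

A real annotation pipeline would also scan the reverse complement and weigh the RBS and codon-bias evidence before listing an ORF as probable.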
6.3 Bioinformatics and Annotating Genomes
• Noncoding RNA: RNA that does not code for protein
  • Lack start codons and have multiple stop codons
  • Examples
    • Transfer RNA (tRNA)
    • Ribosomal RNA (rRNA)
    • Noncoding regulatory RNA molecules

II. Microbial Genomes
• 6.4 Genome Size and Content
• 6.5 Genomes of Organelles
• 6.6 Eukaryotic Microbial Genomes

6.4 Genome Size and Content
• Correlation between genome size and ORFs (Figure 6.7)
• On average, a prokaryotic gene is 1,000 bp long
• ~1,000 genes per megabase (1 Mbp = 1,000,000 bp)
• As genome size increases, gene content proportionally increases

Figure 6.7

6.4 Genome Size and Content
• Smallest cellular genomes belong to “parasitic” or endosymbiotic prokaryotes
• Genomes of obligate parasites may be as small as 490 kbp (Nanoarchaeum equitans) or 525 kbp (Mycoplasma genitalium)
• Endosymbionts can be smaller (e.g., 160-kbp genome of Carsonella ruddii, an endosymbiont of plant lice)
• Estimates suggest the minimum number of genes for a viable cell is 250–300 genes

6.4 Genome Size and Content
• Largest prokaryotic genomes comparable to those of some eukaryotes
• Sorangium cellulosum (Bacteria)
  • Soil microbe
  • Largest prokaryotic genome to date at >12.3 Mbp
• Largest archaeal genomes tend to be smaller (~5 Mbp)

6.4 Genome Size and Content
• Many genes can be identified by sequence similarity to genes found in other organisms (comparative analysis)
• Comparative analyses allow for predictions of metabolic pathways and transport systems
  • Example: Thermotoga maritima (Figure 6.9)

Figure 6.9

6.5 Genomes of Organelles
• Mitochondria and chloroplasts contain a “small” genome
• Cyanobacteria -> chloroplast (cp or ct)
  • Ct is one member of plastid family
• Rickettsiae (?????) -> mitochondrion (mt)
  • Small, obligate intracellular parasites
  • Human diseases: typhus, Rocky Mountain spotted fever (RMSF)

6.5 Genomes of Organelles
• Known chloroplast genomes
  • Circular (usually) DNA molecules (Figure 6.11)
  • Typically 120–170 kbp
  • Usually contain two inverted repeats of 6–76 kbp (rRNA, tRNA)
  • About 100 genes, many for photosynthesis or protein synthesis, transport
  • Introns common; primarily of self-splicing type

6.5 Genomes of Organelles
• Known mitochondrial genomes
  • Diverse structures; some linear and some circular
  • Primarily encode proteins for oxidative phosphorylation
  • Use simplified genetic codes rather than "universal" code
  • Some contain small plasmids
  • Human mitochondrial genome contains 37 genes in 16.5 kbp (Figure 6.12)

6.5 Genomes of Organelles
• Many genes in the nucleus encode proteins required for organelle function
• Organelle and nucleus must cooperate
• Examples: translational machinery, energy generation
• RubisCO

6.5 Genomes of Organelles
• Genome reduction in organelles
  • Cyanobacteria: 1,500 genes; chloroplasts: 100
  • Rickettsiae: 800–900 genes; mitochondria: 37
• Many insects and other invertebrates contain symbiotic bacteria (Figure 6.15)
  • Symbiont no longer capable of existing independently
  • Symbiont provides nutrients to host
  • Host cannot survive without symbiont

6.6 Eukaryotic Microbial Genomes
• Largest eukaryotic genome belongs to Trichomonas vaginalis
  • Parasite
  • Pathogen
  • ~60,000 genes estimated
  • Count likely to change
6.6 Eukaryotic Microbial Genomes
• Smallest eukaryotic cellular genome belongs to Encephalitozoon cuniculi
  • Intracellular fungal pathogen
  • Haploid genome contains 11 chromosomes
  • Genome size 2.9 Mbp; ~2,000 genes
• Smallest eukaryotic genome belongs to nucleomorph
  • Degenerate remains of a eukaryotic endosymbiont
  • Ranges in size from 0.45 to 0.85 Mbp

6.6 Eukaryotic Microbial Genomes
• The haploid yeast genome is more representative
  • Contains 16 chromosomes, ranging in size from 220 kbp to 2,352 kbp
  • Entire genome is ~13,400 kbp; encodes ~6,000 ORFs; ~4,000 encode proteins with known function
  • About 900 ORFs are essential
  • Contains a large amount of repetitive DNA

III. Functional Genomics
• 6.7 Microarrays and the Transcriptome
• 6.8 Proteomics and the Interactome
• 6.9 Metabolomics and Systems Biology
• 6.10 Metagenomics

6.7 Microarrays and the Transcriptome
• Transcriptome
  • The entire complement of RNA produced under a given set of conditions
• Hybridization techniques can be used in conjunction with genomic sequence data to measure gene expression

6.7 Microarrays and the Transcriptome
• Hybridization allows identification of complementary nucleic acid sequences

6.7 Microarrays and the Transcriptome
• Small solid-state supports to which genes or portions of genes are fixed and arrayed spatially in a known pattern
• Complementary matches light up specific spots
• Allows identification of mRNAs that come from specific genes
• Gene “X” hybridizes only to its own mRNA

Figure 6.17: Making and using a DNA chip. Synthesize short single-stranded oligonucleotides complementary to genes X, Y, and Z. Affix DNA to chip at known locations. Probe chip with labeled mRNA and scan chip. Growth condition 1: gene X expressed; genes Y and Z not expressed. Growth condition 2: gene X not expressed; genes Y and Z expressed.

6.7 Microarrays and the Transcriptome
• DNA segments on arrays are hybridized with mRNA from cells grown under specific conditions and analyzed to determine patterns of gene expression
• Arrays are large and dense enough that the transcription pattern of an entire genome can be analyzed (Figure 6.18)

Figure 6.18

6.7 Microarrays and the Transcriptome
• What can be learned from microarray experiments?
  • Global gene expression
  • Expression of specific groups of genes under different conditions
  • Expression of genes with unknown function; can yield clues to possible roles
  • Comparison of gene content in closely related organisms
  • Identification of specific organisms

6.8 Proteomics and the Interactome
• Proteomics
  • Genome-wide study of the structure, function, and regulation of an organism's proteins

6.8 Proteomics and the Interactome
• Two-dimensional (2-D) polyacrylamide gel electrophoresis
  • Technique for separating, identifying, and measuring all proteins present in a sample (Figure 6.20)
  • In first (horizontal) dimension, proteins are separated by differences in isoelectric points (charge)
  • In second (vertical) dimension, proteins are separated by size

6.8 Proteomics and the Interactome
• Generates pattern of spots; each spot is an individual protein
• Individual spot can be cut out and studied
• Mutant and wild-type cells can be compared to study protein function
• Newer technology: HPLC (high pressure liquid chromatography) and mass spectrometry
Figure 6.20

6.8 Proteomics and the Interactome
• Proteomics relies on genomics
  • Sequence the genome of the organism
  • Compare to genomes of other organisms
  • Identify similar genes
  • Different DNA sequence may not change protein sequence

6.8 Proteomics and the Interactome
• Proteins with >50% sequence similarity typically have similar functions
• Proteins with >70% sequence similarity almost certainly have similar functions
• Protein domains
  • Distinct structural modules within proteins
  • Have characteristic functions that can reveal much about a protein's role, even in the absence of complete sequence homology

6.8 Proteomics and the Interactome
• Interactome
  • Does not refer to physical things or molecules
  • Refers to complete set of interactions among molecules
  • Protein-protein, protein-RNA, etc.
  • Data expressed in the form of network diagrams
  • Simplified version (Figure 6.22)

Figure 6.22

6.9 Metabolomics and Systems Biology
• Metabolome
  • The complete set of metabolic intermediates and other small molecules produced in an organism
• Mass spectrometry is one of the primary techniques for monitoring metabolites
  • MALDI-TOF is well adapted for biomolecules
  • Can be used to identify unknown microbial species

6.9 Metabolomics and Systems Biology
• Systems biology
  • Integration of different fields of research (Figure 6.24)
    • Genomics
    • Proteomics
    • Transcriptomics
    • Metabolomics
    • Other

Figure 6.24: Systems biology sits at the top level; it compares data and builds a computer model of the system being studied.

IV. Evolution of Genomes
• 6.11 Gene Families, Duplications (and Divergence), Deletions
• 6.12 Horizontal Gene Transfer and Genome Stability
6.11 Gene Families, Duplications, and Deletions
• Gene duplications thought to be mechanism for evolution of most new genes (Figure 6.28)
• Gene analysis in the three domains of life suggests that many genes present in all organisms have common evolutionary roots
• Mechanisms for duplication:
  • Replication errors
  • Recombination

Gene duplication and divergence
• “Gene A” is the ancestral gene
• Gene duplication, then gene divergence, yields two related genes
• Gene A and Gene A′ are members of a gene family

Figure 6.28: After gene duplication, one duplicate accumulates some sequence changes but retains its original function (the enzyme retains its original role), while the other accumulates different sequence changes and evolves a new role (the enzyme catalyzes a novel reaction). Labels in the RubisCO example: RubisCO (large subunit) ancestor; Form I RubisCO; Form II RubisCO; Form III RubisCO; cyanobacteria; purple bacteria; methanogens; RLP alpha; RLP beta; RLP gamma; RLP beta and gamma ancestor; methionine metabolism; unknown function.

6.11 Gene Families, Duplications, and Deletions
• Mutations in duplicated gene may eliminate function of gene or gene product
• “Dead” genes are called pseudogenes
  • Eliminated very quickly from prokaryotic genomes
  • Due to pressure for genetic economy
• In general, the loss of nonfunctional or unnecessary genes is called gene deletion

6.11 Gene Families, Duplications, and Deletions
• Homologous: related sequence that implies common genetic ancestry
  • A homolog is a gene related to a second gene by descent from a common ancestral DNA sequence. The term may apply to the relationship between genes separated by the event of speciation (see ortholog) or to the relationship between genes separated by the event of genetic duplication (see paralog).
• Gene families: groups of gene homologs (Figure 6.27)
Paralogs: genes within an organism whose similarity to one or more genes in the same organism is the result of gene duplication
• Paralogs are genes related by duplication within a genome.
• Orthologs retain the same function in the course of evolution, whereas paralogs evolve new functions, even if these are related to the original one.
• Paralogs are homologs produced by duplication and divergence within a species.
• Paralogs have homologous origin but heterologous activities.

Orthologs: genes found in one organism that are similar to those in another organism but differ because of speciation
• Orthologs are genes in different species that evolved from a common ancestral gene by speciation.
• Normally, orthologs retain the same function in the course of evolution.
• Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes.
• Orthologs are homologs produced by speciation.
• Orthologs have homologous origin and homologous activity.

6.12 Horizontal Gene Transfer and Genome Stability
• Horizontal gene transfer (Figure 6.29)
  • Major force in evolution, distinct from duplication and divergence
  • The transfer of genetic information between organisms, as opposed to vertical inheritance from parental organism(s)
  • May be extensive in nature
  • May cross phylogenetic domain boundaries

Figure 6.29: Vertical gene transfer (chromosome; genome replication and cell division) compared with horizontal gene transfer.

Bacterial Transformation
• Direct uptake of DNA from environment
• Followed by recombination (crossover) into host DNA

Bacterial Conjugation
• Tra operon genes control DNA transfer
• Plasmid such as F plasmid may move alone
• Or may carry chromosomal genes as well
Bacterial Transduction
• Bacterial DNA carried from cell to cell by a virus or “phage”
• General or generalized transduction can involve any gene from the chromosome
• Special or specialized transduction involves only certain genes
  • Why? Because the virus that carries the genes has inserted itself into the host chromosome at a specific location

6.12 Horizontal Gene Transfer and Genome Stability
• Detecting horizontal gene flow
  • Presence of genes typically found only in distantly related species
  • Presence of DNA with GC content or codon bias that differs significantly from remainder of genome
  • “Footprints” of gene transfer in the genome
• Horizontally transferred genes typically do not encode core metabolic functions

6.12 Horizontal Gene Transfer and Genome Stability
• Insertion sequences, aka simple transposons: pieces of transposable DNA whose genes encode only transposition
• IS = insertion sequence

6.12 Horizontal Gene Transfer and Genome Stability
• Complex transposons: pieces of transposable DNA whose genes encode transposition genes and some other gene or genes
• Tn = transposon

Another example comparison of simple and complex transposons

A mechanism for transposition
• Results in target site duplication

6.12 Horizontal Gene Transfer and Genome Stability
• Transposons may transfer DNA between different organisms
• Transposons may also mediate large-scale chromosomal changes within a single organism
  • Presence of multiple insertion sequences (IS)
  • Recombination among identical IS can result in chromosomal rearrangements
  • Examples: deletions, inversions, or translocations
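
The GC-content "footprint" listed under detecting horizontal gene flow can be illustrated with a sliding-window scan. This is only a sketch under assumptions: the window size, step, and deviation threshold are arbitrary illustrative values, the function names are hypothetical, and real analyses would also examine codon bias and use statistical tests rather than a fixed cutoff.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_windows(genome: str, window: int = 1000, step: int = 500,
                     threshold: float = 0.10) -> list[tuple[int, int, float]]:
    """Return (start, end, gc) for windows whose GC fraction differs from the
    whole-genome GC fraction by at least `threshold`."""
    genome_gc = gc_fraction(genome)
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        window_gc = gc_fraction(genome[start:start + window])
        if abs(window_gc - genome_gc) >= threshold:
            hits.append((start, start + window, window_gc))
    return hits


if __name__ == "__main__":
    # Toy genome: AT-rich background with a GC-rich insert in the middle.
    toy = "AT" * 500 + "GC" * 100 + "AT" * 500
    for start, end, gc in atypical_windows(toy, window=200, step=200, threshold=0.2):
        print(f"window {start}-{end}: GC = {gc:.2f}")
```

Windows flagged this way are only candidates; a region can have unusual base composition for reasons other than horizontal transfer.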
6.13 Core Genome versus Pan Genome
• The "pan"/"core" concept: genomes of bacterial species consist of two components
  • Core genome: shared by all strains of the species (Figure 6.31)
  • Pan genome: includes all the optional extras present in some but not all strains of the species (Figure 6.31)

6.13 Core Genome versus Pan Genome
• Chromosomal islands
  • Region of bacterial chromosome of foreign origin that contains clustered genes for some extra property such as virulence or symbiosis
• Pathogenicity islands: chromosomal islands containing genes for virulence (Figure 6.33)

6.13 Core Genome versus Pan Genome
• Chromosomal islands believed to have a "foreign" origin based on several observations
  • Extra regions often flanked by inverted repeats
  • Base composition and codon usage in chromosomal islands often differ from rest of genome
  • Often found in some strains of a species but not others

6.13 Core Genome versus Pan Genome
• Chromosomal islands contribute specialized functions not essential to growth
  • Virulence
  • Biodegradation of recalcitrant compounds
    • For example, hydrocarbons and herbicides
  • Symbiosis
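
The core/pan distinction maps naturally onto set operations. The sketch below is illustrative only: the strain names and gene lists are made up, the function name is hypothetical, and it assumes each strain's gene content has already been reduced to a set of comparable gene identifiers. It reports the core genome (genes in every strain) and the optional extras (genes in some but not all strains).

```python
def core_and_extras(strains: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Split the genes of a species into core genes and optional extras.

    core:   genes present in every strain (set intersection)
    extras: genes present in at least one strain but not all (union minus core)
    """
    gene_sets = list(strains.values())
    if not gene_sets:
        return set(), set()
    core = set.intersection(*gene_sets)
    all_genes = set.union(*gene_sets)
    return core, all_genes - core


if __name__ == "__main__":
    # Hypothetical strains and gene names, for illustration only.
    strains = {
        "strain_A": {"dnaA", "rpoB", "toxin1"},
        "strain_B": {"dnaA", "rpoB", "degradation1"},
        "strain_C": {"dnaA", "rpoB"},
    }
    core, extras = core_and_extras(strains)
    print("core genome:", sorted(core))
    print("optional extras:", sorted(extras))
```

Genes carried on chromosomal islands, such as virulence genes found in only some strains, would fall into the extras set in this kind of comparison.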