Chapter 20: Genomics
BIOLOGICAL SCIENCE, FOURTH EDITION, SCOTT FREEMAN
Lectures by Stephanie Scher Pandolfi
© 2011 Pearson Education, Inc.

Introduction
• The complete DNA sequence of an organism is its genome. A draft sequence of the human genome was published in February 2001 as part of the Human Genome Project.
• Genomics is the scientific effort to sequence, interpret, and compare whole genomes.
• Genomics provides a list of the genes present in an organism. Functional genomics looks at when those genes are expressed and how their products interact.

Whole-Genome Sequencing
• Improved automation has increased the speed and reduced the cost of DNA sequencing.
• The primary international repositories for DNA sequence data now contain over 194 billion nucleotides.
• With about 3 billion nucleotides, humans have the largest haploid genome sequenced to date.
• The size of the database increases by about 30 percent every year.

How Are Complete Genomes Sequenced?
• Most genome sequencing projects use a whole-genome shotgun sequencing approach.
• In this process, the genome is broken up into a set of overlapping fragments that are sequenced, and these sequences are then put in order.

The Shotgun Sequencing Process
1. Sonication (use of high-frequency sound waves) breaks a genome into pieces approximately 160 kilobases (kb) long.
2. Each piece is inserted into a plasmid called a bacterial artificial chromosome (BAC). A BAC library is created by inserting each BAC into a different Escherichia coli cell. Each cell is allowed to grow into a colony, creating multiple copies of each BAC.
3. Each 160-kb DNA segment is broken into 1-kb segments.
4. Each 1-kb segment is cloned into a plasmid. These plasmids are then inserted into E. coli cells and replicated, producing shotgun clones.
5. The fragments from each clone are then sequenced and analyzed by computer programs.
6. The computer puts the sequences in order based on regions where they overlap, thus reconstructing the BACs.
7. The ends of the reconstructed BACs are similarly analyzed. The goal is to arrange each 160-kb segment in its correct position along the chromosome, based on regions of overlap.
• In essence, the shotgun strategy consists of breaking a genome into tiny fragments, sequencing the fragments, and then putting the sequence data back into the correct order.

The Role of Next-Generation Sequencing Strategies
• Pyrosequencing is a cheaper and faster alternative to traditional sequencing.
• It starts from a single DNA fragment rather than from multiple cloned copies of the same fragment.

How Are Complete Genomes Sequenced?
• Bioinformatics is the effort to manage, analyze, and interpret biological information. It is key to managing the vast quantity of data generated by genome sequencing.

Which Genomes Are Being Sequenced, and Why?
• The first genome of a free-living organism to be sequenced was that of the bacterium Haemophilus influenzae in 1995; it consists of about 1.8 million base pairs.
• The first eukaryotic genome to be sequenced was that of the yeast Saccharomyces cerevisiae in 1996.
• To date, complete genomes have been sequenced from over 800 species.
• Most of the organisms that have been sequenced cause disease or have other interesting biological properties.

Which Sequences Are Genes?
The most basic task in annotating (interpreting) a genome is to identify which bases constitute genes.
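In bacterial and archaeal genomes, this task starts with a scan for open reading frames (ORFs): stretches that begin with a start codon and end with an in-frame stop codon, with no stop codon in between. The Python sketch below is a minimal illustration, not a real gene finder. It assumes the standard start codon ATG and stop codons TAA, TAG, and TGA, an input containing only A, C, G, and T, and an arbitrary length filter (`min_codons`); the function names are hypothetical.

```python
# Hypothetical helper names; standard-code start/stop codons assumed.
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3' (assumes only A, C, G, T)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, str]]:
    """Return (strand, frame, orf) for each ORF on both strands.

    In each frame the first ATG after the previous stop opens an ORF, and the
    next in-frame stop closes it, so each ORF is reported at its longest.
    min_codons is an arbitrary filter (stop codon included in the count),
    since short ORFs arise by chance. Frames on the "-" strand are counted
    along the reverse complement.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):            # three reading frames per strand
            start = None                  # index of the open ATG, if any
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    orf = s[start:i + 3]
                    if len(orf) // 3 >= min_codons:
                        orfs.append((strand, frame, orf))
                    start = None          # look for the next ATG after this stop
    return orfs


# Usage: a toy sequence with one short ORF on the forward strand.
print(find_orfs("CCATGAAATTTTAAGG", min_codons=1))
# [('+', 2, 'ATGAAATTTTAA')]
```

Scanning the reverse complement covers the second strand of the double helix. Real gene-finding programs also weigh promoter-like sequences, codon usage, and homology to known genes, which this sketch does not attempt.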
• Identifying genes is relatively straightforward in bacteria and archaea but much more difficult in eukaryotes, which have many noncoding sequences in their genomes.

Identifying Genes in Bacterial and Archaeal Genomes
• Computer programs scan a genome sequence in both directions (that is, on both strands) to identify open reading frames (ORFs). ORFs are possible genes: long stretches of sequence that begin with a start codon and end with a stop codon, with no stop codon in between.
• The computer programs also look for sequences typical of promoters, operators, and other regulatory sites.
• Researchers can confirm that an ORF is actually a gene by analyzing its product or by finding that it is homologous (similar due to common ancestry) to a known gene.

Identifying Genes in Eukaryotic Genomes
• In eukaryotic organisms, genes contain introns, and most of the genome does not code for a product; as a result, scanning for ORFs is not an effective way to find genes.
• The most effective strategy for identifying genes is to use reverse transcriptase to produce a cDNA version of each mRNA, then sequence a portion of the resulting molecule to produce an expressed sequence tag, or EST. ESTs represent protein-coding genes that are actually expressed.

Bacterial and Archaeal Genomes
• By sequencing the genomes of various strains of the same prokaryotic species, researchers can now compare the genomes of closely related organisms that have different ways of life.

The Natural History of Prokaryotic Genomes
In bacteria, there is a general correlation between genome size and metabolic capability: larger genomes tend to encode a wider range of metabolic functions.
• The function of many bacterial genes is still unknown.
• There is tremendous genetic diversity among bacteria and archaea. About 15 percent of the genes in a prokaryotic genome are unique to its own species.
• Redundancy among genes is common: some genes are found multiple times within a prokaryotic genome.
• Multiple chromosomes and plasmids are more common than expected.
• In many bacterial and archaeal species, a significant portion of the genome appears to have been acquired from other, often distantly related, species.

Lateral Gene Transfer
• The movement of DNA from one species to another is called lateral gene transfer.
• Recent evidence suggests that over 50 percent of archaeal species and 30 to 50 percent of bacterial species have at least one gene acquired by lateral gene transfer.

Evidence for Lateral Gene Transfer
• Two general criteria support the hypothesis that sequences in bacterial or archaeal genomes originated in another species:
1. A gene is much more similar to genes in distantly related species than it is to those in closely related species.
2. The proportion of G-C base pairs to A-T base pairs in a particular gene or series of genes is markedly different from the base composition of the rest of the genome.

How Does Lateral Gene Transfer Occur?
• Lateral gene transfer often occurs when genes carried on plasmids move between cells.
• Lateral gene transfer also occurs through transformation, in which cells take up DNA fragments from the environment.
• Thus, mutation and genetic recombination within species are not the only sources of genetic variation in bacteria and archaea.

Eukaryotic Genomes
Many eukaryotic genomes are dominated by repeated DNA sequences that occur between genes or inside introns and do not code for products used by the organism.
• Sequencing eukaryotic genomes presents unique challenges:
  – Eukaryotic genomes are much larger than the genomes of bacteria and archaea.
  – Eukaryotic genomes contain many noncoding repetitive sequences.
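The second criterion for lateral gene transfer listed above, a base composition that differs markedly from the rest of the genome, can be checked with a short calculation. The Python sketch below is illustrative only: the function names are hypothetical, the 10-percentage-point cutoff (`threshold`) is an arbitrary assumption, and for simplicity the whole genome (including each gene) serves as the baseline rather than "the rest of the genome". A flagged gene is only a candidate; the first criterion would still need to be checked.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def flag_unusual_gc(genome: str, genes: dict[str, str],
                    threshold: float = 0.10) -> dict[str, float]:
    """Return {gene name: gene G-C fraction minus genome G-C fraction} for genes
    whose G-C content differs from the genome-wide value by more than threshold.

    Simplification (assumption): the whole genome, including the gene itself,
    is used as the baseline.
    """
    baseline = gc_content(genome)
    flagged = {}
    for name, seq in genes.items():
        difference = gc_content(seq) - baseline
        if abs(difference) > threshold:
            flagged[name] = round(difference, 3)
    return flagged


# Usage with made-up sequences: this toy genome is 50% G-C.
genome = "ATGC" * 250
genes = {
    "typical": "ATGCATGCGA",  # 50% G-C, matches the genome
    "gc_rich": "GGCCGGCCGA",  # 90% G-C
    "at_rich": "AATTAATTAT",  # 0% G-C
}
print(flag_unusual_gc(genome, genes))
# {'gc_rich': 0.4, 'at_rich': -0.5}
```

A real analysis would also use a sliding window along the chromosome, since transferred regions often span several adjacent genes.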
Parasitic and Repeated Sequences
• Protein-coding sequences constitute a very small percentage of the human genome, while repetitive sequences make up more than 50 percent. In contrast, over 90 percent of a typical prokaryotic genome consists of genes.
• Repeated sequences in the human genome are often the result of transposable elements: segments of DNA that can move from one location in a genome to another.

Characteristics of Transposable Elements
• Transposable elements are examples of selfish genes: parasitic DNA sequences that survive and reproduce but do not increase the fitness of the host genome.
• Transposable elements are classified as parasitic because they decrease their host’s fitness:
  – It takes time and resources to copy them along with the rest of the genome.
  – They can disrupt gene function when they insert in a new location.

Repeated Sequences
• Eukaryotic genomes have several thousand loci called short tandem repeats (STRs), where a short sequence is repeated many times in a row (in tandem). There are two types of STRs:
1. Microsatellites, or simple sequence repeats, have repeating units of 1 to 5 bases.
2. Minisatellites, or variable number tandem repeats (VNTRs), have repeating units of 6 to 500 bases.
• Repeated sequences are hypervariable: the number of repeats at a locus varies among individuals much more than any other type of sequence does.

Repeated Sequences and DNA Fingerprinting
• DNA fingerprinting refers to any technique for identifying individuals on the basis of unique features of their genomes.
• Because microsatellite and minisatellite loci vary so much among individuals, they are now the markers of choice for DNA fingerprinting.

DNA Fingerprinting Process
• A sample of DNA is acquired from the individual.
• PCR is performed using primers that flank a region containing an STR.
• The reaction produces many copies of that region.
• The amplified region can be analyzed, for example by measuring its length, to determine the number of repeats present.

Gene Families
• In eukaryotes, the major source of new genes is duplication of existing genes.
• Within a species, genes that are extremely similar to each other in structure and function are considered to be part of the same gene family.
• Genes that make up gene families are hypothesized to have arisen from a common ancestral sequence through gene duplication.

How Do Gene Families Arise?
• When gene duplication occurs, an extra copy of a gene is added to the genome.
• The most common type of gene duplication results from unequal crossing over during meiosis.
• The redundancy of duplicated genes may allow one copy to mutate into a new gene with a different function or regulation, possibly leading to the evolution of novel traits.

New Genes, New Functions?
• Gene duplication is important because the original gene is still functional and produces a normal product, so the extra copy is free to change.
• The duplicated gene may:
1. Retain its original function and provide additional quantities of the same product.
2. Undergo mutation that results in a beneficial altered protein, thus creating an important new gene.
3. Become a nonfunctional pseudogene: a remnant of a functional copy of the gene that does not produce a working product.

Why Do Humans Have So Few Genes?
• A surprising observation about eukaryotic genomes is that organisms with complex morphology and behavior do not appear to have large numbers of genes.
• Before the human genome was sequenced, scientists expected that humans would have at least 100,000 genes. However, the actual sequence revealed that we have only about 20,000 protein-coding genes.
• The alternative-splicing hypothesis proposes that certain multicellular eukaryotes do not need large numbers of genes because alternative splicing creates different proteins from the same gene.

Similarities between Human and Chimp Genomes
• At the level of base sequence, the human and chimpanzee genomes are 98.8 percent identical.
• This raises the question of how humans and chimps can be so similar genetically but so different in morphology and behavior.
• One hypothesis proposes that even though many structural genes (those that code for products) in humans and chimps are identical, regulatory genes (those that code for regulatory transcription factors) of the two species might have important differences.

Functional Genomics and Proteomics
• Whole-genome data can be used to answer fundamental questions about how organisms work.
• Large-scale analyses of gene expression are called functional genomics.
• One of the basic tools of functional genomics is a DNA microarray. Microarrays, used to study gene expression, consist of a large number of single-stranded DNAs that are permanently affixed to a glass slide.

How Are DNA Microarrays Used?
• mRNAs produced in two contrasting types of cells are isolated, and cDNAs produced from these mRNAs are labeled (for example, with different fluorescent dyes) and used to probe the microarray.
• Researchers can thus identify differences in which genes are expressed in the two cell types.
• A microarray allows researchers to study the expression of thousands of genes at a time, and to identify which sets of genes are expressed together under specific sets of conditions.

What Is Proteomics?
• A transcriptome is the complete set of genes that are transcribed in a particular cell.
• A proteome is the complete set of proteins produced in a particular cell.
• Proteomics is the large-scale study of protein function.
• Instead of studying individual proteins or how two proteins might interact, proteomics is based on studying all of the proteins present at once.

Applied Genomics: Understanding Cancer
Researchers are using tools created by advances in genomics to deepen our understanding of cancer:
  – Microarrays allow researchers to compare gene expression in normal versus cancerous cells.
  – The Human Genome Project has revealed common sets of genes that are mutated in cancerous cells.
  – Comparing the complete genome sequences of cancerous and noncancerous cells from the same person identified over 600 mutations in the cancerous cells.
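A microarray comparison of normal and cancerous cells comes down to comparing, spot by spot, how much labeled cDNA from each sample binds. A common summary is the base-2 logarithm of the ratio of the two signals, where +1 means twice as much signal in the cancer sample and -1 means half as much. The Python sketch below assumes one positive, background-corrected, normalized intensity per gene per sample and an arbitrary two-fold cutoff (`min_fold`); the function name and gene names are hypothetical.

```python
import math


def classify_expression(normal: dict[str, float], cancer: dict[str, float],
                        min_fold: float = 2.0) -> dict[str, str]:
    """Label each gene measured in both samples as 'up', 'down', or 'unchanged'
    in the cancer sample relative to the normal sample.

    Assumes one positive, background-corrected, normalized intensity per gene
    per sample; min_fold is an arbitrary cutoff (2.0 means a two-fold change).
    """
    cutoff = math.log2(min_fold)
    labels = {}
    for gene in sorted(normal.keys() & cancer.keys()):
        if normal[gene] <= 0 or cancer[gene] <= 0:
            raise ValueError(f"non-positive intensity for {gene}")
        # log2 ratio: +1 = twice the signal in cancer, -1 = half the signal
        log_ratio = math.log2(cancer[gene] / normal[gene])
        if log_ratio >= cutoff:
            labels[gene] = "up"
        elif log_ratio <= -cutoff:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


# Usage with made-up intensities for three hypothetical genes.
normal = {"geneA": 100.0, "geneB": 100.0, "geneC": 100.0}
cancer = {"geneA": 800.0, "geneB": 25.0, "geneC": 120.0}
print(classify_expression(normal, cancer))
# {'geneA': 'up', 'geneB': 'down', 'geneC': 'unchanged'}
```

Using a log ratio makes increases and decreases symmetric: an eight-fold increase (+3) and an eight-fold decrease (-3) are equally far from zero, which a plain ratio (8 versus 0.125) obscures.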