IB404 - 3. Bacterial Genomics - Jan 25

1. Fred Sanger sequenced the first complete genomes, e.g. the 5 kbp genome of the phiX174 phage in 1978 and the 16 kbp human mitochondrial genome in 1981, and then developed the method of whole genome shotgun cloning and sequencing to determine the 48 kbp lambda phage genome in 1982.
2. All alternatives, such as primer-walking, nested deletions, and transposon insertions, involve additional costs.
3. When faced with the 4.6 Mbp E. coli genome, Fred Blattner at the University of Wisconsin chose to map the genome physically as overlapping large clones before shotgun sequencing each clone to build the genome. The project took a decade, using mostly manual radioactive sequencing, and was finally published in 1997. The genome is annotated as containing about 4,500 genes, so roughly one gene per kb (generally true for bacteria and viruses, e.g. the 10 kb HIV genome has 10 genes).

[Figure: The E. coli genome. The origin and terminus of replication are shown as green lines, with blue arrows indicating replichores 1 and 2. A scale indicates the coordinates both in base pairs and in “minutes” of recombination. The distribution of genes is depicted on two outer rings: the orange boxes are genes located on the presented strand, and the yellow boxes are genes on the opposite strand. Red arrows show the location and direction of transcription of rRNA genes, and tRNA genes are shown as green arrows.]

4. Craig Venter had already shaken up the human genome field by generating large numbers of ESTs (expressed sequence tags) from the ends of randomly picked human cDNA clones at his new TIGR institute in Maryland (The Institute for Genomic Research, later run by his wife, Claire Fraser). In 1995 he tried a whole genome shotgun (WGS) approach to sequence the 1.8 Mbp genome of Haemophilus influenzae and the 0.58 Mbp genome of Mycoplasma genitalium, together with Hamilton Smith at Johns Hopkins (he grew up here, went to Uni and UIUC, and won a Nobel Prize for the first restriction endonuclease, found in H. influenzae).

[Photos: J. Craig Venter, Claire Fraser, Hamilton Smith]

[Figure: H. influenzae genome. The outer circle is genes in one direction, the inner circle the other. Colors are functional categories, e.g. enzyme, channel, receptor, repair, transporter, structural, replication, transcription, translation, etc. The arrowhead is the origin of replication.]

[Figure: Detail of the region around the origin of replication. Labels: Origin of replication; 60 kb total here.]

Note that there is little “spacer” DNA between genes, and that there are operons of multiple genes. Not all genes were named or had known functions, e.g. HIN0006, at least when this was done. Even today, ~100 of the 483 genes in M. genitalium have unknown functions.

Whole genome shotgun sequencing strategy

1. Randomly shear genomic DNA into small pieces, size-fractionate on a gel (e.g. only 2-3 kb or 9-11 kb pieces), and clone in a plasmid.
2. Sequence each randomly picked plasmid clone insert from each end using flanking primers that anneal to the plasmid vector sequence. These plasmid insert end sequences don’t usually overlap, but their orientation and rough distance apart are known: they are mate-pairs.
3. Do this enough times that you have generated 6-10X sequence coverage of the entire genome, usually from roughly 20-30X clone coverage.
4. Use an assembly program to build the genome, for bacteria usually circular, by first building contigs of contiguous overlapping sequence, and then linking these contigs into scaffolds using mate-pair information, leaving sequence gaps between contigs.
5. Finish sequence gaps, and any clone gaps between scaffolds, by directed methods, e.g. using PCR with primers to the ends of contigs or scaffolds to amplify across gaps, and sequence the purified PCR products, usually directly without cloning them.

[Figure: WGS schema. One plasmid clone with two mate-pairs sequenced from its ends; dots are unknown sequence. Labels: Contig1, Sequence gap, A scaffold, Contig2, Clone gap.]

Many bacterial genomes

1. Today there are >2000 genomes available and >200 from Archaea.
2. For example, Blattner sequenced several strains of E. coli, including the “hamburger” strain, and related Shigella and Salmonella species, yielding information on pathogenicity islands of genes implicated in causing disease.
3. Many others are famous pathogens, e.g. Borrelia burgdorferi, Helicobacter pylori, Treponema pallidum, Neisseria meningitidis, Yersinia pestis, and Vibrio cholerae.
4. Others exhibit unusual biology, e.g. Deinococcus radiodurans, Thermotoga maritima, and Methanococcus jannaschii.
5. They range in size from around 0.5 Mbp for various intracellular parasites, such as Buchnera species, to over 12 Mbp for Streptomyces species, which form colonies making antibiotics.
6. The small genomes of intracellular parasites result from gene loss, e.g. Rickettsia have only about 800 genes, while the genome of the aphid endosymbiont Buchnera is largely colinear with that of E. coli, but has lost about 4000 genes!
7. The phylogenetic trees derived from these genome sequences largely agree with the 3-domain 16S rRNA-based trees of Carl Woese, but only when the core set of replication, transcription, and translation proteins is employed.
8. When other gene sets are examined the result is usually a web rather than a tree, indicating that horizontal gene transfer between distantly related bacteria, and even archaea, but seldom eukaryotes, has been widespread.

Metagenomics

Venter and others have continued to push the envelope of bacterial genome sequencing, most prominently by doing metagenomics, in which genomic DNA is extracted from environmentally collected samples, e.g. ocean water or a mine dump or human skin, without trying to culture bacteria, and sequenced extensively. These studies have confirmed that there is an extraordinary diversity of uncultured Bacteria and Archaea out there, and that some have entirely novel metabolic abilities. They also confirm that there are only the known three domains of life.

When the sample is relatively simple, e.g. a few species from a toxic mine sample, entire circular genomes will sometimes assemble. Otherwise the assemblies generally yield long scaffolds containing multiple genes together in operons, which is often enough to define metabolic pathways. Today a major effort is underway to do this for human commensal bacteria, called the microbiome, including oral, gut, vaginal, and skin bacterial communities.

As an example of the kinds of findings from this work, last year a group published an analysis of the frequency of horizontal gene transfer (HGT) across bacteria that are human commensals versus those that are not. They had ~1000 genomes in each category, and looked for regions with 99% DNA sequence identity in species with <97% rRNA identity (so the species were not closely related). They found high levels of HGT across human commensals, and even higher HGT across species living in the same regions of the human body. Thus ecology facilitates or drives HGT.
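The HGT criterion described above (a shared region with at least 99% DNA identity between species whose rRNA is less than 97% identical) can be written as a simple filter. The following is only a minimal sketch of that rule, not the published study's pipeline, which these notes do not describe. The function names, the requirement that sequences are already aligned to equal length, and the toy sequences are all assumptions made for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption: alignment has already been done elsewhere; this just
    counts matching positions.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def is_candidate_hgt(region_identity: float, rrna_identity: float,
                     region_min: float = 99.0, rrna_max: float = 97.0) -> bool:
    """Apply the rule from the notes: a nearly identical region (>= 99%)
    in two species that are not closely related (rRNA identity < 97%)."""
    return region_identity >= region_min and rrna_identity < rrna_max


# Toy, made-up sequences (hypothetical data, not from any real genome).
rrna_a = "ACGT" * 25                  # 100 nt stand-in for a 16S rRNA gene
rrna_b = "TTTTT" + rrna_a[5:]         # same sequence with 4 mismatches
region_a = "GATTACA" * 30             # 210 nt stand-in for a shared region
region_b = region_a[:-1] + "C"        # same region with 1 mismatch

rrna_id = percent_identity(rrna_a, rrna_b)
region_id = percent_identity(region_a, region_b)
print(f"rRNA identity {rrna_id:.1f}%, region identity {region_id:.1f}%")
print("candidate HGT:", is_candidate_hgt(region_id, rrna_id))
```

The two thresholds are kept as parameters because they come from the lecture summary rather than from a stated method; a real analysis would also need to decide on a minimum region length and on how regions are found and aligned in the first place.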
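Returning to the whole genome shotgun strategy earlier in these notes, the relationship between sequence coverage (6-10X) and clone coverage (20-30X) is simple arithmetic, sketched below. This is a back-of-the-envelope illustration under stated assumptions: a read length of 500 bp (typical of Sanger-era reads, but not given in the notes), a 2.5 kb insert (the midpoint of the 2-3 kb size fraction), two end reads per clone, and a Poisson model of random read placement for the fraction of the genome left unsequenced. All function names are hypothetical.

```python
import math


def reads_needed(genome_bp: int, coverage: float, read_bp: int) -> int:
    """Reads required so that total sequenced bases = coverage * genome size."""
    return math.ceil(coverage * genome_bp / read_bp)


def clone_coverage(n_reads: int, insert_bp: int, genome_bp: int) -> float:
    """Physical coverage by clone inserts, assuming two end reads
    (one mate-pair) per clone."""
    n_clones = n_reads / 2
    return n_clones * insert_bp / genome_bp


def fraction_unsequenced(coverage: float) -> float:
    """Poisson estimate of the fraction of bases covered by no read,
    assuming reads land uniformly at random."""
    return math.exp(-coverage)


genome_bp = 1_800_000   # H. influenzae genome size, from the notes
read_bp = 500           # assumed read length (not stated in the notes)
insert_bp = 2_500       # assumed: midpoint of the 2-3 kb size fraction

for cov in (6, 8, 10):
    n = reads_needed(genome_bp, cov, read_bp)
    cc = clone_coverage(n, insert_bp, genome_bp)
    missed = fraction_unsequenced(cov) * genome_bp
    print(f"{cov}X: {n} reads, {cc:.0f}X clone coverage, "
          f"~{missed:.0f} bp expected in no read")
```

Under these assumptions clone coverage is several times the sequence coverage because each clone insert is much longer than the two reads taken from its ends. The expected unsequenced fraction falls off exponentially with coverage, which is one way to see why a directed finishing step is still needed to close the remaining gaps.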