* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Inquiry into Life Twelfth Edition
Epigenetics of neurodegenerative diseases wikipedia , lookup
Short interspersed nuclear elements (SINEs) wikipedia , lookup
Essential gene wikipedia , lookup
Human genetic variation wikipedia , lookup
Point mutation wikipedia , lookup
No-SCAR (Scarless Cas9 Assisted Recombineering) Genome Editing wikipedia , lookup
Gene desert wikipedia , lookup
Y chromosome wikipedia , lookup
Copy-number variation wikipedia , lookup
Neocentromere wikipedia , lookup
Oncogenomics wikipedia , lookup
Genetic engineering wikipedia , lookup
Segmental Duplication on the Human Y Chromosome wikipedia , lookup
Whole genome sequencing wikipedia , lookup
Transposable element wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Long non-coding RNA wikipedia , lookup
Quantitative trait locus wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Polycomb Group Proteins and Cancer wikipedia , lookup
Gene expression programming wikipedia , lookup
X-inactivation wikipedia , lookup
Biology and consumer behaviour wikipedia , lookup
Public health genomics wikipedia , lookup
Ridge (biology) wikipedia , lookup
Non-coding DNA wikipedia , lookup
Metagenomics wikipedia , lookup
Genomic imprinting wikipedia , lookup
Human Genome Project wikipedia , lookup
Helitron (biology) wikipedia , lookup
History of genetic engineering wikipedia , lookup
Microevolution wikipedia , lookup
Genomic library wikipedia , lookup
Designer baby wikipedia , lookup
Pathogenomics wikipedia , lookup
Epigenetics of human development wikipedia , lookup
Genome editing wikipedia , lookup
Human genome wikipedia , lookup
Gene expression profiling wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Genome (book) wikipedia , lookup
Lecture PowerPoint to accompany Molecular Biology Fourth Edition Robert F. Weaver Chapter 24 Genomics, Proteomics, and Bioinformatics Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 24.1 Positional Cloning • Positional cloning is a method for discovery of genes involved in genetic traits • Positional cloning was very difficult in the absence of genomic information • Begins with mapping studies to pin down the location of the gene of interest to a relatively small region of DNA 24-2 Classical Tools of Positional Cloning • Mapping depends on a set of landmarks to which gene position can be related • Restriction Fragment Length Polymorphisms (RFLP) are landmarks with lengths of restriction fragments given by a specific enzyme vary from one individual to another • Exon Traps use a special vector to help clone exons only • CpG Islands are DNA regions containing unmethylated CpG sequences 24-3 Detecting RFLPs 24-4 Exon Trapping 24-5 Identifying the Gene Mutated in a Human Disease • Using RFLps, geneticists mapped the Huntington disease gene (HD) to a region near the end of chromosome 4 • Used an exon trap to identify the gene itself • Mutation causing the disease is an expansion of a CAG repeat from the normal range of 11-34 copies to abnormal range of at least 38 copies • Extra repeats cause extra Glu inserted into huntingtin, product of the HD gene 24-6 24.2 Sequencing Genomes • What information can be gleaned from genome sequence? – Location of exact coding regions for all the genes – Spatial relationships among all the genes and exact distances between them • How is coding region recognized? – Contains an ORF long enough to code for a phage protein – ORF must • Start with ATG triplet • End with stop codon – Phage or bacterial ORF is the same as a gene’s coding region 24-7 Phage X174 Genome • First genome sequenced was a very simple one, phage X174 – Completed by Sanger in 1977 – 5375-nt complete • Note that some of these phage genes overlap 24-8 Genome Results • The base sequences of viruses and organisms that have been obtained range from: – – – – Phages Bacteria Animals Plants • A rough draft and finished versions of the human genome have also been obtained • Comparison of the genomes of closely related and more distantly related organisms can shed light on the evolution of these species 24-9 Sequencing Milestones 24-10 The Human Genome Project • In 1990, geneticists started to map and ultimately sequence the entire human genome • Original plan was systematic and conservative – Prepare genetic and physical maps of genome with markers to allow piecing DNA sequences together in proper order – Most sequencing would be done only after mapping was complete 24-11 1998 – Human Genome Project • Celera, a private, for-profit company, shocked genomic community by announcing Celera would complete a rough draft of human genome by 2000 • Method that would be used was shotgun sequencing, whole human genome would be chopped up and cloned – Clones sequenced randomly – Sequences would be pieced together using computer programs 24-12 Vectors for Large-Scale Genome Projects • Two high-capacity vectors have been used extensively in the Human Genome Project – Mapping was done mostly using the yeast artificial chromosome, accepts million base pairs – Sequencing with bacterial artificial chromosomes accepting about 300,000 bp 24-13 • BACs are more stable, easier to work with than YACs Clone-by-Clone Strategy • Mapping the human genome requires a set of landmarks to which we can relate the positions of genes • Some of these markers are genes, many more are nameless stretches of DNA – RFLPs – VNTRs, variable number tandem repeats – STSs, sequence-tagged sites, expressedsequence tags and microsatellites 24-14 Variable Number Tandem Repeats • VNTRs derive from minisatellites, stretches of DNA that contain a short core sequence repeated over and over in tandem (head to tail) • The number of repeats of the core sequence in a VNTR is likely to be different from one individual to another – So VNTRs are highly polymorphic – This makes them relatively easy to map – Disadvantage as genetic markers as they tend to bunch together at chromosome ends 24-15 Sequence-Tagged Sites • STSs are short sequences – 60-1000 bp long – Detectable by PCR • Can design short primers – Hybridize few hundred bp apart – Amplify a predictable length of DNA 24-16 Sequence-Tagged Sites Mapping 24-17 Microsatellites • STSs are very useful in physical mapping or locating specific sequences in the genome – Worthless as markers in traditional genetic mapping unless polymorphic • Microsatellites are a class of STSs that are highly polymorphic – Similar to minisatellites – Consist of a core sequence repeated over and over many times in a row – Core here is 2-4 bp long, much shorter 24-18 Contig • A set of clones used by geneticists in physically mapping or sequencing a given region is called a contig – Contains contiguous (or overlapping) DNAs spanning long distances – Used like putting together a jigsaw puzzle – Easier to complete with bigger pieces – Helpful to assemble in overlapping fashion 24-19 Shotgun Sequencing Massive sequencing projects can take two forms: 1. Map-then-sequence strategy – Produces physical map of genome including STSs – Sequences clones (mostly BACs) used in mapping – Places sequences in order to be pieced together 2. In the shotgun approach – Assembles libraries of clones with different size inserts – Sequences the inserts at random – Relies on computer program to find areas of overlap among sequences and piece them together 24-20 Shotgun-Sequencing Method 24-21 Sequencing Standards • A “working draft” may be: – Only 90% complete – Error rate of up to 1% • A “final draft” (less consensus): – Error rate of less than 0.01% – Should have as few gaps as possible • Some researchers require a “final draft” is not completely sequenced until every last gap is completed 24-22 Sequencing the Human Genome • First chromosome completed in the Human Genome Project was chromosome 22 in late 1999 • Second completed was chromosome 21 • These are the 2 smallest human autosomes, have very valuable sequence information 24-23 Chromosome 22 • Only the long arm (22q) was sequenced • Short arm (22p) is composed of pure heterochromatin, likely devoid of genes • 11 gaps remained in the sequence – 10 are gaps between contigs likely due to “unclonable” DNA – Other a 1.5-kb region of cloned DNA that resisted sequencing 24-24 Findings from Chromosome 22 1. We must learn to live with gaps in our sequence 2. 679 annotated genes categorized as: – 274 Known genes, previously identified – 150 Related genes, homologous to known genes – 148 Predicted genes, sequence homology to ESTs – 134 Pseudogenes, sequences are homologous to known genes, but contain defects that preclude proper expression 24-25 Contigs and Gaps 24-26 More From Chromosome 22 3. Coding regions of genes account for only tiny fraction of length of the chromosome • • • Annotated genes are 39% of total length Exons are only 3% Repeat sequences (Alu, LINEs, etc) are 41% 4. Rate of recombination varies across the chromosome • Long regions of low recombination interspersed with short regions where it is relatively frequent 24-27 Repetitive DNA Content 24-28 More From Chromosome 22 5. There are local and long-range duplications • Immunoglobin l locus • 36 gene segments are clustered together that can encode variable regions • 60-kb region is duplicated with greater than 90% fidelity almost 12 Mb away • Duplications found in few copies, low-copy repeats 6. Large chunks of human chromosome 22q are conserved in several different mouse chromosomes • 113 human genes with mouse orthologs mapped to mouse chromosomes 24-29 Homologs • Orthologs are homologous genes in different species that evolved from a common ancestor – 8 regions on 7 mouse chromosomes • Paralogs are homologous genes that evolved by gene duplication within a species • Homologs are any kind of homologous genes, both orthologs and paralogs 24-30 Regions of Conservation 24-31 Chromosome 21 • Human chromosome 21q, and some of 21p have been sequenced • Gaps remaining are relatively few and short • Sequence reveals a relative poverty of genes – 225 genes – 59 pseudogenes • All 24 genes known to be shared between mouse chromosome 10 and human chromosome 21 are in the same order in both chromosomes 24-32 The X Chromosome • The sequence of 151 Mb of human X chromosome (99.3% of its euchromatin) revealed 1098 protein-encoding genes – 168 genes governing X-linked phenotype – Genes for 173 noncoding RNAs • Chromosome is rich in LINE1 elements – May serve as way station for X inactivation mechanism in female cells 24-33 X Chromosome Orthologs • Comparison of the X chromosome sequence with the chicken whole genome confirmed that X (and partner Y) evolved from an ancestral pair of autosomes • Comparison of 3 mammalian X chromosome sequences demonstrate high degree of synteny among these chromosomes • This synteny likely reflects high degree of evolutionary pressue to keep order of genes on X chromosome relatively stable 24-34 Human Genome Project Status • Working draft of human genome reported by 2 groups allowed estimates that genome contains fewer genes than anticipated – 25,000 to 40,000 • About half the genome has derived from the action of transposons • Transposons themselves have contributed dozens of genes to the genome • Bacteria also have donated dozens of genes • Finished draft is much more accurate than working draft, but there are still gaps • Information also about gene birth and death during human evolution 24-35 Other Vertebrate Genomes • Comparing human genome with that of other vertebrates has taught us much about similarities and differences among genomes – Comparison has also helped to identify many human genes – In future, will likely help identify defective genes involved in human genetic diseases • Closely related species like mouse can be used to find when and where genes are expressed so predict when and where human genes are likely expressed 24-36 The Minimal Genome • It is possible to define the essential gene set of a simple organism – Mutate one gene at a time – See which genes are required for life • In theory, also possible to define the minimal genome= set of genes that is minimum required for life – Minimum genome likely larger than the essential gene set • In principle, possible to place minimal genome into a cell lacking genes of its own, create a new life form that can live and reproduce under lab 24-37 conditions The Barcode of Life • There is a movement which has begun to create a barcode to identify any species of life on earth • The first such barcode will consist of the sequence of a 648-bp piece of mitochondrial COI gene from each organism • This sequence is sufficient to identify uniquely almost any organism • Other sequences will be worked out for plants and perhaps later for bacteria 24-38 24.3 Applications of Genomics: Functional Genomics • Functional genomics refers to those areas that deal with the function or expression of genomes • All transcripts an organism makes at any given time is an organism’s transcriptome • Use of genomic information to block expression systematically is called genomic functional profiling • Study of structures and functions of the protein products of genomes is proteomics 24-39 Transcriptomics • This area is the study of all transcripts an organism makes at any given time • Create DNA microarrays and microchips that hold 1000s of cDNAs or oligos – Hybridize labeled RNAs from cells to these arrays or chips – Intensity of hybridization at each spot reveals the extent of expression of the corresponding gene • Microarray permits canvassing expression patterns of many genes at once • Clustering of expression of genes in time and space suggest products of these genes collaborate in some process 24-40 Oligonucleotides on a Glass Substrate 24-41 Serial Analysis of Gene Expression • Serial Analysis of Gene Expression (SAGE) allows us to determine: – Which genes are expressed in a given tissue – The extent of that expression • Short tags, characteristic of particular genes, are generated from cDNAs and ligated together between linkers • These ligated tags are then sequenced to determine which genes are expressed and how abundantly 24-42 SAGE 24-43 Whole Chromosome Transcription Mapping • High density whole chromosome transcriptional mapping studies have shown a majority of sequences in cytoplasmic poly(A)RNAs derive from non-exon regions of human chromosomes • Almost half of the transcription from these same chromosomes is nonpolyadenylated • Results indicate that great majority of stable nuclear and cytoplasmic transcripts in these chromosomes come from regions outside exons • Helps to explain the great differences between species whose exons are almost identical 24-44 Transcription Maps 24-45 Genomic Functional Profiling • Genomic functional profiling can be performed in several ways – A type of mutation analysis, deletion analysis mutants created by replacing genes one at a time with antibiotic resistance gene flanked by oligomers serving as barcode for that mutant – A functional profile can be obtained by growing the whole group of mutants together under various conditions to see which mutants disappear most rapidly 24-46 RNAi Analysis • Another means of genomic functional analysis on complex organisms can be done by inactivating genes via RNAi • An application of this approach targeting the genes involved in early embryogenesis in C. elegans has identified: – 661 important genes – 326 are involved in embryogenesis 24-47 Tissue-Specific Functional Profiling • Tissue-specific expression profiling can be done by examining spectrum of mRNAs whose levels are decreased by an exogenous miRNA • Then compare to the spectrum of expression of genes at the mRNA level in various tissues • If that miRNA causes decrease in levels of mRNAs naturally low in cells expressing the miRNA – Suggests that the miRNA is at least a partial cause of those natural low levels • This type of analysis has implicated – miR-124 in destabilizing mRNAs in brain tissue – miR-1 in destabilizing mRNAs in muscle tissue 24-48 Locating Target Sites for Transcription Factors • Chromatin immunoprecipitation followed by DNA microarray analysis can be used to identify DNAbinding sites for activators and other proteins • Small genome organisms - all of the intergenic regions can be included in the microarray • If genome is large, that is not practical • To narrow areas of interest can use CpG islands – These are associated with gene control regions – If timing/conditions of activator’s activity are known, control regions of genes known to be activated at those times, or under those conditions, can be used 24-49 In Situ Expression Analysis • The mouse can be used as a human surrogate in large-scale expression studies that would be ethically impossible to perform on humans • Scientists have studied the expression of almost all the mouse orthologs of the genes on human chromosome 21 – Expression followed through various stages of embryonic development – Catalogued the embryonic tissues in which these genes are expressed 24-50 Single-Nucleotide Polymorphisms • Single-nucleotide polymorphisms can probably account for many genetic conditions caused by single genes and even some by multiple genes • Might be able to predict response to a drug • Haplotype map with over 1 million SNPs makes it easier to sort out important SNPs from those with no effect 24-51 Structural Variation • Structural variation is a prominent source of variation in human genomes – – – – Insertions Deletions Inversions Rearrangements of DNA chunks • Some structural variation can in principle predispose certain people to contract diseases – Some variation is presumably benign – Some also is demonstrably beneficial 24-52 24.4 Proteomics • The sum of all proteins produced by an organism is its proteosome • Study of these proteins, even smaller subsets, is called proteomics • Such studies give a more accurate picture of gene expression than transcriptomics studies do 24-53 Protein Separations and Analysis • Current research in proteomics requires first that proteins be resolved, sometimes on a massive scale – Best tool for separation of many proteins at once is 2D gel electrophoresis • After separation, proteins must be identified – Best method of identification involves digestion of proteins one by one with proteases – Then identify the peptides by mass spectrometry • In the future, microchips with antibodies attached may allow analysis of proteins in complex mixtures without separation 24-54 MALDI-TOF Mass Spectrometry 24-55 Detecting Protein-Protein Interactions 24-56 Protein Interactions • Most proteins work with other proteins to perform their functions • Several techniques are available to probe these interactions • Yeast two-hybrid analysis has been used for some time, now other methods are available – Protein microarrays – Immunoaffinity chromatography with mass spectrometry – Other combinations 24-57 24.5 Bioinformatics • Bioinformatics involves the building and use of biological databases – Some of these databases contain the DNA sequences of genomes – Essential for mining the massive amounts of biological data for meaningful knowledge about gene structure and expression 24-58 Finding Regulatory Motifs in Mammalian Genomes Using computational biology techniques, Lander and Kellis have discovered highly conserved sequence motifs in 4 mammalian species, including humans: – In the promoter regions, these motifs probably represent binding sites for transcription factors – 3’-UTRs motifs probably represent binding sites for miRNAs 24-59 Using the Databases • The National Center for Biological Information (NCBI) website contains a vast store of biological information, including genomic and proteomic data • Start with a sequence and discover gene to which it belongs, then compare that sequence with that of similar genes • Query the database with a topic for information • View structures of protein in 3D by rotating the structure on your computer screen 24-60