Chapter 5: Understanding a Genome Sequence
Copyright © Garland Science 2007

Understanding Genome Sequence
• Most important step of genomics
• Genome annotation techniques
5-1. Locate genes
5-2. Function annotation
5-3. An example: yeast genome (microbial genomes will be covered later)

5-1. Locate genes by in silico analysis or experimental techniques
Figure 5.1 Genomes 3 (© Garland Science 2007)

Open reading frame (ORF) scanning
• ORF scanning looks for a start codon and an in-frame stop codon spanning >100 codons (in random sequence 3 of the 64 codons are stop codons, so ORFs longer than about 50 codons rarely arise by chance, whereas real genes average 317 codons in E. coli & 450 in human)
• The coding regions of genes are ORFs; either strand can be the coding strand, so a given dsDNA sequence has 6 possible reading frames
Figure 5.2 (Genomes 3)

• ORF scanning is effective (if not completely accurate) for bacterial genomes, e.g. the E. coli lactose operon. Red is the real gene, yellow is the predicted ORF.
Figure 5.3 (Genomes 3)

ORF scanning is not optimal for large eukaryotic genomes, because:
• Unlike bacterial genes (11% of the E. coli genome is intergenic), eukaryotic genes are widely separated by noncoding regions (62% in human)
• Unlike bacterial genes, eukaryotic genes are not continuous (they are split by introns) & sometimes overlap

ORF scans are complicated by introns:
• Line 2 is the real amino acid sequence. The intron is excised during RNA splicing.
• Line 3 is the amino acid sequence predicted without considering the intron; it is shorter than the real one.
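The ORF scanning idea above (six reading frames, a minimum length cut-off) can be illustrated with a minimal Python sketch. It is not the algorithm used by any particular annotation tool: the names `find_orfs`, `scan_strand` and `Orf` are made up for this example, the first ATG after a stop is taken as the start, and introns are ignored, so it only suits bacterial-style continuous genes.

```python
from typing import Iterator, NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    strand: str   # "+" for the given strand, "-" for its reverse complement
    frame: int    # 0, 1 or 2: offset of the reading frame on the scanned strand
    start: int    # 0-based index of the A of ATG on the scanned strand
    end: int      # index just past the stop codon on the scanned strand
    codons: int   # number of codons from ATG up to (not including) the stop


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan_strand(seq: str, strand: str, min_codons: int) -> Iterator[Orf]:
    """Yield ORFs (ATG ... in-frame stop) in the three frames of one strand."""
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == START_CODON:
                orf_start = i                      # open a candidate ORF
            elif orf_start is not None and codon in STOP_CODONS:
                n_codons = (i - orf_start) // 3    # length without the stop
                if n_codons >= min_codons:
                    yield Orf(strand, frame, orf_start, i + 3, n_codons)
                orf_start = None                   # close it, keep scanning


def find_orfs(dna: str, min_codons: int = 100) -> list[Orf]:
    """Scan all six reading frames; the 100-codon default is the slide's cut-off."""
    seq = dna.upper()
    forward = list(scan_strand(seq, "+", min_codons))
    reverse = list(scan_strand(reverse_complement(seq), "-", min_codons))
    return forward + reverse


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"   # toy sequence, not real data
    for orf in find_orfs(toy, min_codons=5):
        print(orf)
```

Coordinates for "-" strand hits are reported on the reverse-complemented sequence, which keeps the sketch short; a real tool would map them back to the original strand.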
Figure 5.4 (Genomes 3)

ORF scanning can be improved for eukaryotic genomes by considering:
• Codon bias: not all codons are used equally (the reason is not fully understood, but the bias helps the ORF search)
• Exon-intron boundaries: GT at the upstream end of the intron & AG at the downstream end (a consensus, not an absolute rule)
• Upstream regulatory sequences: distinctive (but variable) sequence features that mark where genes begin; the figure shows the upstream consensus where eukaryotic genes usually start
Figure 5.5 (Genomes 3)

5-1. Locate genes by in silico analysis
• Genes for functional RNA (rRNA & tRNA) are not ORFs, but their transcripts have very distinctive features: they form stem-loop structures (intramolecular base pairing)
Figure 5.6a (Genomes 3)

• Homology search & comparative genomics help gene location. Evolutionarily related genes share homologous regions in their coding sequences.
Figure 5.8 (Genomes 3)

• Homology search & comparative genomics help gene location (Cont.). A gene can be located by comparing closely related genomes (e.g. within the same species).
Figure 5.9 (Genomes 3)

• Computer-assisted genome annotation combines everything: ORF scanning, exon-intron boundaries, upstream regulatory sequences, homology tests, cDNA searches, etc. (figure: a 15-kb segment of the human genome annotated by Genotator)
Figure 5.10 (Genomes 3)

5-1. Locate genes by experimental techniques
• Northern hybridization detects RNA transcribed from genes, showing whether a DNA fragment contains transcribed sequences
• Note: one gene can give two or more transcripts of different lengths; some genes are not expressed under certain conditions
Figure 5.11 (Genomes 3)

5-1. Locate genes by experimental techniques (Cont.)
• Northern blotting gives no positional information about the gene. cDNA sequencing is therefore needed to map genes (find exon-intron boundaries) in DNA fragments.
• cDNA = copy of mRNA = leader + gene + trailer
1. Construct a cDNA library (containing all expressed genes)
2.
Use the target DNA fragment to hybridize with the cDNA library
3. Repeat the hybridization several times (to recover poorly expressed genes; called "cDNA capture")
• Accurate cDNA sequencing depends on reverse transcription of a complete mRNA
• Truncated cDNAs are common (synthesis often stops before the 5' end of the gene is copied)
• The 5' end of a transcript can be mapped precisely by Rapid Amplification of cDNA Ends (RACE)

5-1. Locate genes by RACE
Purpose: amplify a short/partial cDNA molecule that covers the complete 5' end
Prerequisite: basic knowledge of the gene
1. An internal primer anneals close to the 5' end
2. Reverse transcriptase (RT) synthesizes the cDNA
Figure 5.13 part 1 of 2 (Genomes 3)
3. Add a poly(A) tail
4. Anneal the anchor primer
5. Continue as a regular PCR
6. Sequence the PCR amplicon
End product: a fragment containing the 5' end of the mRNA. The 3' end can be analyzed in a similar way.
Figure 5.13 part 2 of 2 (Genomes 3)

5-1. Locate genes by heteroduplex analysis
Purpose: locate the start/end of a gene based on its mRNA
Prerequisite: an M13 library clone spanning the gene end is available
• S1 nuclease trims away the single-stranded regions, leaving the DNA-mRNA heteroduplex
Figure 5.14 (Genomes 3)

Locate exon boundaries by exon trapping
Purpose: find exon boundaries by using an exon-trap vector, followed by PCR & DNA sequencing analysis
Figure 5.15 (Genomes 3)

5-2. Determine the gene functions
• Once the genome is sequenced and putative genes (start + end) are identified, the work has only just started: how do these genes function?
• An example: E. coli K-12 has 4288 genes, but only 1853 (43%) had been identified in >100 years of research; for yeast the figure is 30%; human genes were largely unknown by 2006
• Therefore, the most important step is to study the functions of genes, referred to as functional genomics
5-2-1. Computer in silico analysis (mainly by homology search)
5-2-2. Experimental analysis (by gene inactivation or overexpression)

5-2-1.
Homology search
• Asks to what extent an unknown gene is similar to a known gene from a different organism
• Assumption: homologous genes share a common evolutionary ancestor
• Two categories: orthologous (the ancestor predates speciation) & paralogous (e.g. myoglobin & β-globin, duplicated 550 Myr ago)
Figure 5.16 (Genomes 3)

5-2-1. Homology search (Cont.)
• Translate to amino acid sequence & align (rather than simply aligning DNA sequence), then give an identity score
• In addition, consider the relatedness of the translated amino acids (e.g. give leucine & isoleucine a higher score than cysteine & tyrosine)
• BLAST & PSI-BLAST (example in the figure: 76% DNA identity vs. 28% amino acid identity)
Figure 5.18 (Genomes 3)

5-2-1. Homology search (Cont.)
• Identifying functional domains is an alternative
• Genes may diverge (low overall similarity) but still contain a conserved functional domain
• An example: the tudor domain (RNA metabolism) is conserved between fruit fly & human
Figure 5.19 (Genomes 3)

5-2-1. Homology search (Cont.)
• It is surprising how genetically close we are to the microbes that ferment beer
• Homology search helps find functionally conserved genes across genera & study human disease (e.g. many metabolic genes are conserved in yeast & human)
Table 5.1 (Genomes 3)

5-2-2. Experimental analysis of gene functions
• Most genes cannot be assigned a function by in silico comparison
• Need to reverse the process (from genotype to phenotype)
• Inactivate the gene by homologous recombination & look for the altered phenotype
Figure 5.20 (Genomes 3)

• Inactivate a yeast gene by homologous recombination with a deletion cassette (antibiotic resistance marker + two regions homologous to the target gene)
Figure 5.21 (Genomes 3)

• Inactivate a mouse gene by homologous recombination with a deletion cassette in embryonic stem cells & screen for a non-chimeric knockout mouse

5-2-2.
Experimental analysis of gene functions (Cont.)
Inactivate a gene by transposon tagging
• Most genomes contain transposons. Most are quiescent, a few are active.
• Figure: a genetically engineered yeast transposon that responds to an external stimulus (e.g. galactose)
Figure 5.22 (Genomes 3)

5-2-2. Experimental analysis of gene functions (Cont.)
• Transposon tagging is random & hard to target to specific genes
• Alternative method: RNA interference (RNAi), which occurs naturally in the regulation of gene expression; it degrades the mRNA instead of inactivating the gene by insertion
Figure 5.23 (Genomes 3)

• RNA interference was initially found in bacteria-eating worms
• The presence of dsRNA in a cell prevents protein synthesis & leads to cell death, but 21-22 bp siRNAs circumvent this
• Useful for 8K of 35K human genes, but the challenge is moving from in vitro to in vivo

5-2-2. Experimental analysis of gene functions (Cont.)
Gene overexpression
• Instead of making a gene disappear, what happens when a gene (or its product) is present in excess in a cell?
• The gene is multiplied to 40-200 copies per cell
• e.g. high-density bones were found
Figure 5.24 (Genomes 3)

• Sometimes the phenotypic effect of gene inactivation or overexpression is difficult to discern
• A list of phenotypes needs to be examined for the target organism
• e.g. mutation of the largest gene in yeast seemed to have no apparent effect, but the mutant was later found to be intolerant of low pH
Table 5.2 part 1 of 2 (Genomes 3)

5-2-2. Experimental analysis of gene functions (Cont.)
• Only 10% of the 19K genes in C. elegans have been found to cause phenotypic changes when inactivated
• The lack of discernible phenotypic changes makes it challenging to identify gene functions
Table 5.2 part 2 of 2 (Genomes 3)

5-2-2. Experimental analysis of gene functions (Cont.)
Site-directed mutagenesis
• Useful for replacing a target gene with a partially modified gene
• Two-step homologous recombination; the second step is detected by loss of the marker gene phenotype
• In this case, the mutated gene can still be expressed.
Figure 5.25 (Genomes 3)

5-2-2. Experimental analysis of gene functions (Cont.)
• Gene expression is sometimes restricted to a particular organ & developmental stage
• Reporter genes & immunocytochemistry can help locate where & when genes are expressed
Figure 5.27 (Genomes 3)

5-3. Annotation of yeast genome sequence
• Genome sequence completed in 1996
• 6274 ORFs identified with a 100-codon cut-off
• Categories shown in the figure: no homolog & no function assigned; homolog found but no function assigned; 30% are true genes (previously identified)
Figure 5.28 (Genomes 3)

5-3. Annotation of yeast genome sequence (Cont.)
• 100,000 ORFs are identified with a 15-codon cut-off
• Figure: a short ORF encoding a small protein (38 amino acids), the predicted eukaryotic homolog of prokaryotic ribosomal protein L36
• Finding short genes is a huge task; useful approaches include comparative genomics, evidence of transcription, transposon tagging, etc.

5-3. Annotation of yeast genome sequence (Cont.)
• In-frame fusion of a lacZ gene lacking its start codon: if lacZ is expressed (detected by the X-gal test), a functional gene is present
Figure 5.29 (Genomes 3)

5-3. Annotation of yeast genome sequence (Cont.)
• A barcode deletion strategy can be used for high-throughput screening of a mutant library
• The barcode sequence (20 bp) is different for each homologous recombination deletion
Figure 5.30 (Genomes 3)

Chapter 5 Summary
• A variety of methods are used to identify genes in a genome sequence, including computer-based analysis (e.g. ORF scanning or homology searching) & experimental techniques (cDNA sequencing or transcript mapping).
• Gene functions can be annotated by computer analysis (e.g. homology searching) & by experimental techniques (e.g. gene inactivation by transposon, RNA interference, gene overexpression, site-directed homologous recombination, reporter genes, etc.).
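As a rough back-of-envelope illustration of why the yeast ORF count jumps when the cut-off drops from 100 to 15 codons, the sketch below estimates how often a reading frame stays free of stop codons by chance. It assumes random sequence with equal base frequencies, so that 3 of the 64 codons are stops; that assumption and the function name `chance_open_frame` are introduced here for illustration and are not taken from the slides. Real genomes deviate from it, so the numbers only indicate a trend.

```python
STOP_FRACTION = 3 / 64  # assumption: random DNA, equal base frequencies


def chance_open_frame(n_codons: int, stop_fraction: float = STOP_FRACTION) -> float:
    """Probability that n_codons consecutive random codons contain no stop codon."""
    if n_codons < 0:
        raise ValueError("n_codons must be non-negative")
    return (1.0 - stop_fraction) ** n_codons


if __name__ == "__main__":
    # Compare the two cut-offs used for the yeast genome, plus 50 codons.
    for cutoff in (15, 50, 100):
        p = chance_open_frame(cutoff)
        print(f"{cutoff:>3} codons: P(no stop by chance) = {p:.4f}")
```

Under this simple model a 15-codon stretch stays open by chance nearly half the time, while a 100-codon stretch does so less than 1% of the time, which is consistent with a short cut-off producing far more spurious ORFs.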