7. Understanding a Genome Sequence

Learning outcomes

When you have read Chapter 7, you should be able to:

1. Describe the strengths and weaknesses of the computational and experimental methods used to analyze genome sequences
2. Describe the basis of open reading frame (ORF) scanning, and explain why this approach is not always successful in locating genes in eukaryotic genomes
3. Outline the various experimental methods used to identify parts of a genome sequence that specify RNA molecules
4. Define the term ‘homology’ and explain why homology is important in computer-based studies of gene function
5. Evaluate the limitations of homology analysis, using the yeast genome project as an example
6. Describe the methods used to inactivate individual genes in yeast and mammals, and explain how inactivation can lead to identification of the function of a gene
7. Give outline descriptions of techniques that can be used to obtain more detailed information on the activity of a protein coded by an unknown gene
8. Describe how the transcriptome and proteome are studied
9. Explain how protein interaction maps are constructed and indicate the key features of the yeast map
10. Evaluate the potential and achievements of comparative genomics as a means of understanding a genome sequence

7.1. Locating the Genes in a Genome Sequence
7.2. Determining the Functions of Individual Genes
7.3. Global Studies of Genome Activity
7.4. Comparative Genomics

7.1. Locating the Genes in a Genome Sequence

Figure 7.1. A double-stranded DNA molecule has six reading frames. Both strands are read in the 5′→3′ direction. Each strand has three reading frames, depending on which nucleotide is chosen as the starting position.

Figure 7.2. ORF scanning is an effective way of locating genes in a bacterial genome. The diagram shows 4522 bp of the lactose operon of Escherichia coli with all ORFs longer than 50 codons marked.
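
The six reading frames of Figure 7.1 and the length-filtered scan described for Figure 7.2 can be sketched in a few lines of Python. This is a minimal illustration rather than the method used to produce the figure: it assumes the standard genetic code's three stop codons, treats the first ATG in a frame as the start of an ORF, and ignores ORFs that run off the end of the sequence. The function names (`reverse_complement`, `orfs_in_frame`, `scan_orfs`) and the `min_codons` parameter are hypothetical, chosen only for this example.

```python
# Minimal six-frame ORF scan (illustrative sketch; all names are hypothetical).
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code
START_CODON = "ATG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement, so the second strand is also read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame, min_codons):
    """Yield (start, end) for each ATG...stop ORF in one frame of one strand.

    Coordinates are 0-based on `seq`; `end` is exclusive and includes the stop
    codon. Only ORFs longer than `min_codons` codons (stop excluded) are kept.
    """
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == START_CODON:
            start = i  # assumption: the first ATG in the frame opens the ORF
        elif start is not None and codon in STOP_CODONS:
            n_codons = (i - start) // 3  # codons from ATG up to, not including, the stop
            if n_codons > min_codons:
                yield start, i + 3
            start = None  # resume searching for the next ATG


def scan_orfs(seq, min_codons=50):
    """Scan all six reading frames; return a list of (strand, frame, start, end).

    For the "-" strand, coordinates refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(strand_seq, frame, min_codons):
                hits.append((strand, frame, start, end))
    return hits


if __name__ == "__main__":
    toy = "CCATGAAATTTTAGCC"  # made-up sequence with one 3-codon ORF in frame 2
    print(scan_orfs(toy, min_codons=2))
```

The default threshold of 50 codons mirrors the cutoff quoted for Figure 7.2. A length filter of this kind works for a compact bacterial genome, where spurious ORFs are short, but it takes no account of introns.
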
The lactose operon sequence contains two real genes, lacZ and lacY, indicated by the red lines. These real genes cannot be mistaken because they are much longer than the spurious ORFs, shown in blue. See Figure 2.20A for the detailed structure of the lactose operon.

Figure 7.3. ORF scans are complicated by introns. The nucleotide sequence of a short gene containing a single intron is shown. The correct amino acid sequence of the protein translated from the gene is given immediately below the nucleotide sequence: in this sequence the intron has been left out because it is removed from the transcript before the mRNA is translated into protein. In the lower line, the sequence has been translated without realizing that an intron is present. As a result of this error, the amino acid sequence appears to terminate within the intron. The amino acid sequences have been written using the one-letter abbreviations (see Table 3.1). The genetic code was described in Section 3.3.2; introns are covered in detail in Section 10.1.3.

Figure 7.4. Northern hybridization. An RNA extract is electrophoresed under denaturing conditions in an agarose gel (see Technical Note 4.4). After ethidium bromide staining, two bands are seen. These are the two largest rRNA molecules (Section 3.2.1), which are abundant in most cells. The smaller rRNAs, which are also abundant, are not seen because they are so short that they run out of the bottom of the gel. In most cells, none of the mRNAs (the transcripts of protein-coding genes) are abundant enough to form a band visible after ethidium bromide staining. The gel is blotted onto a nylon membrane and, in this example, probed with a radioactively labeled DNA fragment. A single band is visible on the autoradiograph, showing that the DNA fragment used as the probe contains part or all of one transcribed sequence.

Figure 7.5. Zoo-blotting. The objective is to determine if a fragment of human DNA hybridizes to DNAs from related species.
Samples of human, chimp, cow and rabbit DNAs are therefore prepared, restricted, and electrophoresed in an agarose gel. Southern hybridization is then carried out with a human DNA fragment as the probe. A positive hybridization signal is seen with each of the animal DNAs, suggesting that the human DNA fragment contains an expressed gene. Note that the hybridizing restriction fragments from the cow and rabbit DNAs are smaller than the hybridizing fragments in the human and chimp samples. This indicates that the restriction map around the transcribed sequence is different in cows and rabbits, but does not affect the conclusion that a homologous gene is present in all four species.

Figure 7.6. RACE (rapid amplification of cDNA ends). The RNA being studied is converted into a partial cDNA by extension of a DNA primer that anneals at an internal position not too distant from the 5′ end of the molecule. The 3′ end of the cDNA is further extended by treatment with terminal deoxynucleotidyl transferase (Section 4.1.4) in the presence of dATP, which results in a series of As being added to the cDNA. This series of As acts as the annealing site for the anchor primer. Extension of the anchor primer leads to a double-stranded DNA molecule which can now be amplified by a standard PCR. This is 5′-RACE, so-called because it results in amplification of the 5′ end of the starting RNA. A similar method, 3′-RACE, can be used if the 3′ end sequence is desired.

Figure 7.7. S1 nuclease mapping. This method of transcript mapping makes use of S1 nuclease, an enzyme that degrades single-stranded DNA or RNA polynucleotides, including single-stranded regions in predominantly double-stranded molecules, but has no effect on double-stranded DNA or on DNA-RNA hybrids. In the example shown, a restriction fragment that spans the start of a transcription unit is ligated into an M13 vector and the resulting single-stranded DNA hybridized with an RNA preparation.
After S1 treatment, the resulting heteroduplex has one end marked by the start of the transcript and the other by the downstream restriction site (R2). The size of the undigested DNA fragment is therefore measured by gel electrophoresis in order to determine the position of the start of the transcription unit relative to the downstream restriction site.

Figure 7.8. Exon trapping. The exon-trap vector consists of two exon sequences preceded by promoter sequences, the signals required for gene expression in a eukaryotic host (Section 9.2.2). New DNA containing an unmapped exon is ligated into the vector and the recombinant molecule introduced into the host cell. The resulting RNA transcript is then examined by RT-PCR to identify the boundaries of the unmapped exon.

7.2. Determining the Functions of Individual Genes

Figure 7.9. Two DNA sequences with 80% sequence identity.

Figure 7.10. Lack of homology between two sequences is often more apparent when comparisons are made at the amino acid level. Two nucleotide sequences are shown, with nucleotides that are identical in the two sequences given in red and non-identities given in blue. The two nucleotide sequences are 76% identical, as indicated by the asterisks. This might be taken as evidence that the sequences are homologous. However, when the sequences are translated into amino acids the identity decreases to 28%. Identical amino acids are shown in brown, and non-identities in green. The comparison between the amino acid sequences suggests that the genes are not homologous, and that the similarity at the nucleotide level was fortuitous. The amino acid sequences have been written using the one-letter abbreviations (see Table 3.1).

Figure 7.11. The tudor domain. The top drawing shows the structure of the Drosophila tudor protein, which contains ten copies of the tudor domain.
The domain is also found in a second Drosophila protein, homeless, and in the human A-kinase anchor protein (AKAP149), which plays a role in RNA metabolism. The proteins have dissimilar structures other than the presence of the tudor domains. The activity of each protein involves RNA in one way or another.

Figure 7.12. Categories of gene in the yeast genome.

Figure 7.13. Gene inactivation by homologous recombination. The chromosomal copy of the target gene recombines with a disrupted version of the gene carried by a cloning vector. As a result, the target gene becomes inactivated. For more information on recombination see Section 14.3.

Figure 7.14. The use of a yeast deletion cassette. The deletion cassette consists of an antibiotic-resistance gene preceded by the promoter sequences needed for expression in yeast, and flanked by two restriction sites. The start and end segments of the target gene are inserted into the restriction sites and the vector introduced into yeast cells. Recombination between the gene segments in the vector and the chromosomal copy of the target gene results in disruption of the latter. Cells in which the disruption has occurred are identified because they now express the antibiotic-resistance gene and so will grow on an agar medium containing geneticin. The gene designation ‘kanr’ is an abbreviation for ‘kanamycin resistance’, kanamycin being the family name of the group of antibiotics that include geneticin.

Figure 7.15. Artificial induction of transposition. Recombinant DNA techniques have been used to place a promoter sequence (Section 3.2.2) that is responsive to galactose upstream of a Ty1 element in the yeast genome. When galactose is absent, the Ty1 element is not transcribed and so remains quiescent. When the cells are transferred to a culture medium containing galactose, the promoter is activated and the Ty1 element is transcribed, initiating the transposition process (Smith et al., 1995).
For more information on activation of eukaryotic promoters, see Box 9.6 and for details of the retrotransposition process see Section 14.3.3.

Figure 7.16. RNA interference. The double-stranded RNA molecule is broken down by the Dicer ribonuclease into ‘short interfering RNAs’ (siRNAs) of 21–25 bp in length. One strand of each siRNA base pairs to the target mRNA, which is then degraded by the RDE-1 nuclease. For more details on RNA interference, see Section 10.4.2.

Figure 7.17. Fusion with liposomes can be used to deliver double-stranded RNA into a human cell.

Figure 7.18. Functional analysis by gene overexpression. The objective is to determine if overexpression of the gene being studied has an effect on the phenotype of a transgenic mouse. A cDNA of the gene is therefore inserted into a cloning vector carrying a highly active promoter sequence that directs expression of the cloned gene in mouse liver cells. The cDNA is used rather than the genomic copy of the gene because the former does not contain introns and so is shorter and easier to manipulate in the test tube.

Figure 7.19. Two-step gene replacement. See the text for details.

Figure 7.20. A reporter gene. The open reading frame of the reporter gene replaces the open reading frame of the gene being studied. The result is that the reporter gene is placed under control of the regulatory sequences that usually dictate the expression pattern of the test gene. For more information on these regulatory sequences, see Sections 9.2 and 9.3. Note that the reporter gene strategy assumes that the important regulatory sequences do indeed lie upstream of the gene. This is not always the case for eukaryotic genes.

Figure 7.21. Immunocytochemistry. The cell is treated with an antibody that is labeled with a blue fluorescent marker. Examination of the cell shows that the fluorescent signal is associated with the inner mitochondrial membrane.
A working hypothesis would therefore be that the target protein is involved in electron transport and oxidative phosphorylation, as these are the main biochemical functions of the inner mitochondrial membrane.

Technical Note 7.1. Site-directed mutagenesis

7.3. Global Studies of Genome Activity

Figure 7.22. SAGE. See the text for details. In this example, the first restriction enzyme to be used is AluI, which recognizes the 4-bp target site 5′-AGCT-3′ (see Table 4.3). The oligonucleotide that is ligated to the cDNA contains the recognition sequence for BsmFI, which cuts 10–14 nucleotides downstream, and so cleaves off a fragment of the cDNA. Fragments of different cDNAs are ligated to produce the concatamer that is sequenced. Using this method, the concatamer that is formed is made up partly of sequences derived from the BsmFI oligonucleotides. To avoid this, and so obtain a concatamer made up entirely of cDNA fragments, the oligonucleotide can be designed so that the end that ligates to the cDNA contains the recognition sequence for a third restriction enzyme. Treatment with this enzyme cleaves the oligonucleotide from the cDNA fragment.

Figure 7.23. Transcriptome analysis. (A) Transcriptome analysis with a DNA chip carrying oligonucleotides representing all the genes in a small genome. After adding labeled cDNA, the positions of the hybridization signals on the chip indicate which genes have contributed to the transcriptome under study. (B) With a larger genome, cDNA clones prepared from the transcriptome of one tissue are immobilized as a microarray and probed with cDNAs representing the same or a different transcriptome. By comparing the hybridization patterns, genes that are expressed differently in the tissues from which the transcriptomes are obtained can be identified.

Figure 7.24. Studying a proteome by two-dimensional gel electrophoresis followed by MALDI-TOF.
(A) After two-dimensional gel electrophoresis a protein of interest is excised from the gel and digested with a protease such as trypsin, which cuts immediately after arginine or lysine amino acids. This cleaves the protein into a series of peptides which can be analyzed by MALDI-TOF. (B) In the mass spectrometer the peptides are ionized by a pulse of energy from a laser and then accelerated down the column to the reflector and onto the detector. The time of flight of each peptide depends on its mass-to-charge ratio. The data are visualized as a spectrum (C). The computer contains a database of the predicted molecular weights of every trypsin fragment of every protein encoded by the genome of the organism under study. The computer compares the masses of the detected peptides with the database and identifies the most likely source protein.

Figure 7.25. Phage display. (A) The cloning vector used for phage display is a bacteriophage genome with a unique restriction site located within a gene for a coat protein. The technique was originally carried out with the gene III coat protein of the filamentous phage called f1, but has now been extended to other phages including λ. To create a display phage, the DNA sequence coding for the test protein is ligated into the restriction site so that a fused reading frame is produced, one in which the series of codons continues unbroken from the coat protein gene into the test gene. After transformation of Escherichia coli, this recombinant molecule directs synthesis of a hybrid protein made up of the test protein fused to the coat protein. Phage particles produced by these transformed bacteria therefore display the test protein in their coats. (B) Using a phage display library. The test protein is immobilized within a well of a microtiter tray and the phage display library added. After washing, the phages that are retained in the well are those displaying a protein that interacts with the test protein.

Figure 7.26. The yeast two-hybrid system. (A) On the left, a gene for a human protein has been ligated to the gene for the DNA-binding domain of a yeast activator. After transformation of yeast, this construct specifies a fusion protein, part human protein and part yeast activator. On the right, various human DNA fragments have been ligated to the gene for the activation domain of the activator: these constructs specify a variety of fusion proteins. (B) The two sets of constructs are mixed and cotransformed into yeast. A colony in which the reporter gene is expressed contains fusion proteins whose human segments interact, thereby bringing the DNA-binding and activation domains into proximity and stimulating the RNA polymerase. See Section 9.3.2 for more information on activators.

Figure 7.27. Using homology analysis to deduce protein-protein interactions. The 5′ region of the yeast HIS2 gene is homologous to Escherichia coli his2, and the 3′ region is homologous to E. coli his10.

Figure 7.28. The yeast protein interaction map. Each dot represents a protein, with connecting lines indicating interactions between pairs of proteins. Red dots are essential proteins: an inactivating mutation in the gene for one of these proteins is lethal. Mutations in the genes for proteins indicated by green dots are non-lethal; mutations in genes for proteins shown in orange lead to slow growth. The effects of mutation in genes for proteins shown as yellow dots are not known. From Jeong et al., Nature, 411, 41–42. Copyright 2001 Macmillan Magazines Limited.

7.4. Comparative Genomics