Chemistry 256
Name:

Exercise 5: A research project in biochemistry

In the winter of 1982, I had the good fortune to work as part of Eric Davidson’s molecular biology research group at Caltech. Through the subsequent months, under the tutelage of one of the postdocs in the group, Howard Jacobs (now Director of the Institute of Biotechnology in Helsinki), I was able to participate in the research problem below. Reading over this material from nearly three decades ago makes me wish that I had known the material of this course, Chemistry 256, much better before starting the project. The following questions are designed to have you figure out what motivated that part of the research and what we have found out since 1982.

Introduction (for a summer research proposal, submitted by T. Furutani, May, 1982)

Maternal RNA (mtRNA) is the term that describes all of the RNA present in the sea urchin (Strongylocentrotus purpuratus) egg. A large proportion of this RNA has properties that distinguish it from messenger RNA (mRNA). For instance, mtRNA is far longer (typically 5 to 10 kilobases) than conventional mRNAs, and the same piece of single-copy genomic DNA gives rise to several different maternal transcripts. Furthermore, this maternal RNA also includes many interspersed genomic repeat sequences (sequences of nucleotides which occur many times in the genome) covalently linked to regions of single-copy sequence. The transcribed repeats are found in embryonic nuclear RNAs but not in embryonic polysomal mRNAs. These interspersed transcripts contain almost all the different types of single-copy sequence represented in maternal RNA. We want to know the relationship of this class of maternal RNAs to the genes from which they are transcribed, and to the corresponding functional mRNAs from which cellular proteins are translated.
At least some of this maternal RNA cannot be translated by polysomes as a message for proteins: translational stop signals have been found in all three reading frames, in both the repeat and the single-copy portions of maternal transcripts. In such molecules, the actual message may be interspersed with nonsense sequences, so to form coding messages from them, some process (such as splicing parts of the RNA together, or trimming off sequences at the 5’ end) must occur during development to make the message translatable. By studying the structure of mtRNA, we can see how nonsense sequences and potentially functional sequences are arranged in it.

Question 1: In light of what has been learned since 1982, what would be another viable hypothesis for the existence of the “nonsense” sequences?

SpP154 is a gene of S. purpuratus. Although it is represented only once in the sea urchin genome, this gene gives rise to multiple transcripts in mtRNA: three major maternal transcripts of 7500, 1600 and 1400 nucleotides in length. Thus, SpP154 is a good model for studying developmental mechanisms in sea urchin mtRNA.

Question 2: Why would three RNA copies of the same portion of the genomic DNA be made?

The gene SpP154 was derived from a complementary DNA (cDNA) clone found in a pluteus-stage embryo cDNA library. λ154A and λ154B are cloned segments of sea urchin genomic DNA that contain the 3’ end of the gene; these segments were isolated by screening a genomic lambda phage “library” using SpP154 as a probe.

Question 3: Briefly describe this “screening” process. Hint: it will involve using the radioactive isotope ³²P. See page 64 in the text.

The 5’ end of the gene lies beyond the end of λ154B. In order to isolate the 5’ end, we carried out further screenings of other phage and cosmid libraries, which revealed only tentative positive clones. A genomic library is a set of clones constructed by ligating digested or partially digested genomic DNA into a phage or cosmid vector.
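How large a library must be for such screens to be reliable can be estimated with the Clarke–Carbon formula, N = ln(1 − P) / ln(1 − f), where f is the fraction of the genome carried by a single insert and P is the desired probability that a given single-copy fragment is present in at least one clone. The formula is a standard textbook estimate rather than something stated in the exercise, and the genome size (about 800 Mb) and insert sizes (15 kb for lambda, 40 kb for cosmids) in this Python sketch are illustrative assumptions. It also assumes random, unbiased cloning, which is exactly the assumption that fails for sequences that clone poorly.

```python
import math


def clones_needed(p, insert_bp, genome_bp):
    """Clarke-Carbon estimate: number of random clones N such that a given
    single-copy sequence is present in at least one clone with probability p."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    f = insert_bp / genome_bp  # fraction of the genome carried by one insert
    return math.ceil(math.log(1 - p) / math.log(1 - f))


def prob_present(n, insert_bp, genome_bp):
    """Probability that a given sequence appears in at least one of n random clones."""
    f = insert_bp / genome_bp
    return 1 - (1 - f) ** n


# Illustrative, hypothetical values (not taken from the exercise).
GENOME_BP = 8e8        # assumed haploid genome size, about 800 Mb
LAMBDA_INSERT = 15e3   # assumed typical lambda insert
COSMID_INSERT = 40e3   # assumed typical cosmid insert

for name, insert in [("lambda", LAMBDA_INSERT), ("cosmid", COSMID_INSERT)]:
    n = clones_needed(0.99, insert, GENOME_BP)
    print(f"{name}: about {n} clones for a 99% chance of any given fragment")
```

Larger inserts reduce N roughly in proportion, which is the practical appeal of cosmid vectors for covering a large genome.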
A sufficient number of recombinants were screened to give a high probability of finding any given single-copy fragment. The failure to find a clone containing the 5’ end of the SpP154 gene may be because, for various reasons, some DNA sequences are cloned less efficiently than others.

Question 4: What’s a “cosmid”? It’s not mentioned in the text.

Your project (a message from H. Jacobs to T. Furutani, April, 1982)

Your project will be to generate as much as possible of the primary sequence of the 7.5 kb transcript, using these cDNA clones as source material. These cDNA clones will be thoroughly mapped for restriction endonuclease sites by the time you start work: specific (and overlapping) restriction fragments from the cDNA clones will be subcloned in the M13 phage vectors mp8 and mp9. These permit the cloning of each fragment (asymmetric because it has two different restriction sites at its two ends!) in BOTH orientations. Thus, when ssDNA is synthesised in infected cells, these two vectors allow production of each of the two strands of any given fragment, and hence allow it to be sequenced in BOTH DIRECTIONS (necessary to be sure of the sequence).

Sequencing technology

The ss phage recombinant DNAs are sequenced by primer extension in four separate reactions, each containing a low concentration of one of the chain-terminating nucleotide analogues, the DIDEOXYNUCLEOTIDE TRIPHOSPHATES. Chains synthesised in the presence of ddATP, ddCTP, ddGTP and ddTTP will contain the populations of chains terminating at A, C, G and T, respectively. By sizing these chains we can infer the positions of each of the four residues in the sequence.

(insert circular DNA sketch here)

The products of the reaction are analysed on 5% polyacrylamide urea gels, which allow resolution of chains 1–250 nt long at the 1 nt level.
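The logic of reading a dideoxy ladder, and of checking one strand against the other, can be sketched in a few lines of Python. This is an idealised illustration, not the procedure used in the lab: it assumes a perfect gel (exactly one band per length, 1 nt resolution) and chain lengths counted from the end of the primer. The read is the newly synthesised strand, i.e. the complement of the single-stranded template, so reads from the two orientations (as the mp8/mp9 pair allows) should be reverse complements of each other. The function names are hypothetical, and the test fragment is simply the first 20 nt of the cDNA printed later in this exercise.

```python
from __future__ import annotations

# Complement each base; reversing the result gives the reverse complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_lanes(new_strand: str) -> dict[str, set[int]]:
    """Idealised gel: if the new strand has base X at position n (counted from
    the primer), the ddXTP reaction yields a terminated chain n nt long."""
    lanes: dict[str, set[int]] = {base: set() for base in "ACGT"}
    for n, base in enumerate(new_strand, start=1):
        lanes[base].add(n)
    return lanes


def read_ladder(lanes: dict[str, set[int]]) -> str:
    """Read the sequence by walking up the ladder one nucleotide at a time."""
    base_at: dict[int, str] = {}
    for base, lengths in lanes.items():
        for n in lengths:
            if n in base_at:
                raise ValueError(f"bands in two lanes at {n} nt: ambiguous call")
            base_at[n] = base
    longest = max(base_at, default=0)
    gaps = [n for n in range(1, longest + 1) if n not in base_at]
    if gaps:
        raise ValueError(f"no band at {gaps[0]} nt")
    return "".join(base_at[n] for n in range(1, longest + 1))


# Read a fragment from both orientations (one clone per strand) and compare.
fragment = "GAATTCATGAAACATTGGAG"
read_one_strand = read_ladder(simulate_lanes(fragment))
read_other_strand = read_ladder(simulate_lanes(reverse_complement(fragment)))
print(read_one_strand)
print(reverse_complement(read_other_strand) == read_one_strand)
```

On a real gel, compressions or a missing band would make the two reads disagree at that position, which is why reading both strands is worth the extra subcloning.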
(insert sample sequencing gel here)

Bands are detected by AUTORADIOGRAPHY (we include some ³²P-labelled dATP in the reaction).

For the extreme 5’ end of the transcript, we may need to use the genomic copy of the pP154 gene as source material. This is because full-length cDNA clones (going right to the 5’ end of the corresponding RNA) are a rarity.

Question 5: Wait, why is a full-length cDNA clone such a “rarity”? What about the technique of constructing a cDNA library makes a full-length clone difficult to obtain?

For this we have available, from the S.U. [sea urchin] genomic library, clones in phage lambda which cover the entire region of the transcript. We can detect where the 5’ end of the 7.5 kb transcript maps by blotting RNA and using restriction fragments from the λ clones as tracers.

What the sequence information will tell us

1. Is there an extended open reading frame somewhere near the 5’ end of the transcript (i.e., one that could be translated into a polypeptide)?
2. Are regions of open reading frame interrupted by regions containing stop signals (i.e., does the transcript have the structure of a pre-spliced precursor to mRNA, from which intervening sequences have not yet been removed)?
3. Does the IMPLIED amino acid sequence bear any relation to any known protein sequences (by computer search)?
4. What is the internal LOCATION and STRUCTURE (including translatability) of the repeat elements?

Summary of results, October 1982 (written by H. Jacobs in preparation for a manuscript submitted to Journal of Molecular Biology)

1. Maxam-Gilbert sequencing of the 3’-most fragment of SpP154 (cDNA) and of the corresponding fragment from genomic subclone pλ154RH2:
   a. SpP154 sequence compared with previous (Sanger) data: several changes of nucleotide assignment; at all such positions the M-G sequence is UNAMBIGUOUS. No frameshifts, so previous assumptions about reading frames were correct.
   b. Sequence of this fragment from EcoRI through AluI and the poly-(A) tail into the vector (HaeIII site) shows:
      • only 6 nucleotides of sequence beyond AluI before the poly-(A) tail.
      • no classical poly-(A) addition signal; therefore the cDNA was most likely internally primed from an oligo-(A) sequence.
      • canonical splice acceptor (Py)nTXCAG appears at EcoRI + 155 nt: TGCAG; other AGs at EcoRI + 169 (AluI site), EcoRI + 127, 88, 75, 69, 25 and 19 are all unlikely to be involved in generation of the 1.4 kb transcript, on the basis of RNA blots.
      • if this splice is functional, the mRNA generated is blocked in 2 frames; therefore it either lies in an untranslated region or defines a unique polypeptide LSELIK(K), assuming A6 is encoded.

Question 6: What is the purpose of the RNA having a poly-(A) tail?

Question 7: Consider the “canonical splice acceptor” referred to here. How well does it correspond with the splice sequences shown in figure 26-22 in the text? At which end (5’ or 3’) of the intron is the splice acceptor?

   c. Sequence from the genomic clone shows homology except for 5 single base changes in the putative intron (not significantly above expectation, given the extent of single-copy (SC) sequence polymorphism in S. purpuratus).
   d. There is also one deletion of 61 nt. There is no obvious reason for such a deletion to have occurred during cloning, so either it’s a bizarre cloning artefact or an even more bizarre genomic polymorphism. Irrelevant for the time being.

2. Genome blots with the 3’ end fragment from SpP154 (ER) and the corresponding fragment from genomic subclone pλ154RH2 (the fragments sequenced above). Individual #7: both fragments gave identical genome blot patterns, as follows (fragment sizes in kb):

      EcoRI             1.6 (different from Cyril!)
      HindIII           1.5
      BamHI             ≥30, 4.5
      BglII             ≥35
      PstI              ≥35, 15
      SalI              ≥35
      RH                1.15 (= pλ154RH2)
      RM, RB, RP, RS    1.6

   The original hypothesis about a 3’ end splice was almost certainly wrong. The gene is single copy, and there is no detectable splice at the 3’ end by genome blotting or sequencing.

3. Gastrula polysomal cDNA library in λgt70 screened with 154/RD probe. 2 positives were selected which rescreened (4 did not): λSpGP154A and λSpGP154B. These both have an insert of 300–400 nt and are almost certainly clones of each other. Subcloning is proceeding in pUC8 by ligating λSpGP154A or B/BM into pUC8/M, which should insert a 3.3 kb fragment bearing the insert. Selection by AmpR Xgal– and minipreps/R.

4. Screening genomic libraries for 154:
   a. #7/r library screened with 154/MH (no positives) and 154 total probe (many horrid positives, only one rescreened!).
   b. #7/u library screened with 154 total probe: no positives.

5. Screening library (Cyril/R) for mt-homology element: 3 positives (λ389B, λ389C and λ389E), but having difficulty plaque purifying them; they grow very poorly.

6. Screening #7/r library for cloned S. purpuratus mtDNA: 3 positives (λmt1, λmt2 and λmt3). Only λmt1 rescreened, but it gives invisible plaques. λmt2 and λmt3 are being rescreened at high density.

The cDNA sequence

   EcoRI site
   GAATTCATGA AACATTGGAG ATGAGTGGAA AAAATGTGAT GAACTTTGGT TTGTTTTTCT
   CTTTTGAAGA ACAAGAACAA TTATATAAGT ATCATAAATC TGTTATTAAT TTTGTTTTGA
   TATGAAGATG TGCAGACCTT CTATTCTAAA TTTATATTTT
   Alu I site
   TTATCTGAGC TCATAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
   AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA

Question 8: Find and underline the “stop” signals in each of the three reading frames. Identify each stop signal by its reading frame number (1, 2 or 3).

Question 9: The outlined area actually shows up in the mRNA transcript; the non-shaded area is the intron. What amino acid sequence does the mRNA code for?

Question 10: Do I have enough information to answer the first two of Jacobs’s questions (under “What the sequence information will tell us”), and thus satisfy the aims of my proposal in the Introduction? If so, what are the answers?

References:
E. Davidson, B. Hough-Evans and R. Britten, Molecular Biology of the Sea Urchin Embryo, Science 217 (1982), 17–26. Abstract at: http://www.sciencemag.org/cgi/content/abstract/217/4554/17
H. Jacobs and B. Grimes, Complete nucleotide sequences of the nuclear pseudogenes for cytochrome oxidase subunit I and the large mitochondrial ribosomal RNA in the sea urchin Strongylocentrotus purpuratus, Journal of Molecular Biology 187 (1986), 509–527.
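For Questions 8 and 9, a hand-marked answer can be checked by scanning the reading frames with a short program. The Python sketch below is an illustration added to the exercise, not part of the original work. Its assumptions: frame 1 starts at the first G of the EcoRI site (GAATTC), frames 2 and 3 start one and two nucleotides later, the standard genetic code applies, and the copy of the sequence stops six A's into the poly-(A) tail (the remaining A's form only AAA codons, which cannot be stops). The names `stop_codons` and `translate` are hypothetical helpers.

```python
# The cDNA as printed in the exercise, from the EcoRI site through the AluI
# site and the first six A's of the poly-(A) tail; the rest of the tail is omitted.
CDNA = (
    "GAATTCATGA" "AACATTGGAG" "ATGAGTGGAA" "AAAATGTGAT" "GAACTTTGGT"
    "TTGTTTTTCT" "CTTTTGAAGA" "ACAAGAACAA" "TTATATAAGT" "ATCATAAATC"
    "TGTTATTAAT" "TTTGTTTTGA" "TATGAAGATG" "TGCAGACCTT" "CTATTCTAAA"
    "TTTATATTTT" "TTATCTGAGC" "TCATAAAAAA"
)

# Standard genetic code, built from the usual TCAG ordering of codons ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def stop_codons(seq):
    """Map reading frames 1, 2, 3 to the 1-based start positions of their stop
    codons. Frame 1 is assumed to start at position 1 of seq."""
    stops = {}
    for frame in (1, 2, 3):
        stops[frame] = [
            i + 1
            for i in range(frame - 1, len(seq) - 2, 3)
            if CODON_TABLE[seq[i:i + 3]] == "*"
        ]
    return stops


def translate(seq, start):
    """Translate from 1-based position `start` until a stop codon or the last
    complete codon, whichever comes first."""
    peptide = []
    for i in range(start - 1, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


for frame, positions in stop_codons(CDNA).items():
    print(f"frame {frame}: stops at {positions}")
```

As a consistency check against the October 1982 notes, `translate(CDNA, 161)` reads frame 2 across the AluI site and returns LSELIK, the peptide named there; the final (K) of LSELIK(K) would come from poly-(A) that this truncated copy leaves out.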