Chemistry 256
Name:
Exercise 5: A research project in biochemistry
In the winter of 1982, I had the good fortune to work as part of Eric Davidson’s
molecular biology research group at Caltech. Through the subsequent months, under the
tutelage of one of the postdocs in the group, Howard Jacobs (now Director of the Institute
of Biotechnology in Helsinki), I was able to participate in the research problem below.
Reading over this material from nearly three decades ago makes me wish that I had
known the material of this course, Chemistry 256, much better before starting the project.
The following questions are designed to have you figure out what motivated that part of
the research and what we have found out since 1982.
Introduction (for a summer research proposal, submitted by T. Furutani, May, 1982)
Maternal RNA (mtRNA) is the term that describes all of the RNA present in the sea
urchin (Strongylocentrotus purpuratus) egg. A large proportion of this RNA has
properties that distinguish it from messenger RNA (mRNA). For instance, mtRNA is far
longer (typically 5 to 10 kilobases) than conventional mRNAs, and the same piece of
single-copy genomic DNA gives rise to several different maternal transcripts.
Furthermore, this maternal RNA also includes many interspersed genomic repeat
sequences (sequences of nucleotides which occur many times in the genome) covalently
linked to regions of single-copy sequence. The transcribed repeats are found in
embryonic nuclear RNAs but not in embryonic polysomal mRNAs. These interspersed
transcripts contain almost all the different types of single-copy sequence represented in
maternal RNA.
We want to know the relationship of this class of maternal RNAs to the genes
from which they are transcribed, and to the corresponding functional mRNAs from which
cellular proteins are translated. At least some of this maternal RNA cannot be translated
by polysomes as a message for proteins: translational stop signals have been found in all
frames in repeat and single-copy portions of maternal transcripts. In such molecules, the
actual message may be interspersed with nonsensical sequences, so to form coding
messages from them, some process (such as splicing parts of the RNA structure together,
or trimming off sequences at the 5’ end) must occur during development to make the
message translatable.
By studying the structure of mtRNA, we can see how nonsense sequences and
potentially functional sequences are arranged on it.
Question 1: Given what has been learned since 1982, what would be another viable
hypothesis for the existence of the “nonsense” sequences?
SpP154 is a gene of S. purpuratus. This gene gives rise to multiple transcripts in
mtRNA even though it is represented only once in the sea urchin genome. We know that
this gene gives rise to three major maternal transcripts of 7500, 1600 and 1400
nucleotides in length. Thus, SpP154 is a good model to study developmental mechanisms
in sea urchin mtRNA.
Question 2: What is the reason for making three RNA copies of the same portion of the
genomic DNA?
The gene SpP154 had been derived from a complementary DNA (cDNA) clone
found in a pluteus stage embryo cDNA library. λ154A and λ154B are cloned segments of
sea urchin genomic DNA that contain the 3’ end of the gene; these segments had been
isolated by screening a genomic lambda phage “library” using SpP154 as a probe.
Question 3: Briefly describe this “screening” process. Hint: it will involve using the
radioactive isotope 32P. See page 64 in the text.
The 5’ end of the gene is beyond the end of λ154B. In order to isolate the 5’ end,
we carried out further screenings of other phage and cosmid libraries which revealed only
tentative positive clones. A genomic library is a set of clones constructed by ligating
digested or partially digested genomic DNA into a phage or cosmid vector. A sufficient
number of recombinants were screened such that there would be a high probability of
finding any given single-copy fragment. The failure to find a clone containing the 5’ end
of the SpP154 gene may be due to the fact that for various reasons, some DNA sequences
are cloned less efficiently than others.
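The phrase "a sufficient number of recombinants" can be made quantitative. The sketch below assumes the standard library-coverage estimate N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome carried by one insert and clones are treated as independent random samples. The function name, the 15 kb insert size and the round 800 Mb genome size are illustrative assumptions, not figures from the 1982 proposal.

```python
import math


def clones_needed(insert_bp, genome_bp, probability):
    """Estimate how many random clones give the requested chance of
    containing one particular single-copy fragment.

    Assumes every sequence is cloned with equal efficiency, which the
    text notes is not always true in practice.
    """
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    if not 0 < insert_bp < genome_bp:
        raise ValueError("insert must be shorter than the genome")
    f = insert_bp / genome_bp  # fraction of the genome in one clone
    # log1p(-x) computes ln(1 - x) accurately when x is very small
    return math.ceil(math.log1p(-probability) / math.log1p(-f))


# Hypothetical inputs: 15 kb lambda inserts, 800 Mb genome, 99% confidence
print(clones_needed(15_000, 800_000_000, 0.99))
```

Because the estimate assumes unbiased cloning, it only gives a lower bound on the screening effort when a region is cloned inefficiently, as suggested for the 5’ end of SpP154.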
Question 4: What’s a “cosmid”? It’s not mentioned in the text.
Your project (a message from H. Jacobs to T. Furutani, April, 1982)
Your project will be to generate as much as possible of the primary sequence of the 7.5
kb transcript – using these cDNA clones as source material. These cDNA clones will be
thoroughly mapped for restriction endonuclease sites by the time you start work: specific
(and overlapping) restriction fragments from the cDNA clones will be subcloned in the
M13 phage vectors mp8 and mp9. These permit the cloning of each fragment
(asymmetric because it has two different restriction sites at its two ends!) in BOTH
orientations. Thus, when ssDNA is synthesised in infected cells, these two vectors allow
production of each of the two strands of any given fragment, and hence allow it to be
sequenced in BOTH DIRECTIONS (necessary to be sure of the sequence).
Sequencing technology
The ss phage recombinant DNAs are sequenced by primer extension in four separate
reactions, each containing a low concentration of one of the chain-terminating nucleotide
analogues, the DIDEOXYNUCLEOTIDE TRIPHOSPHATES. Chains synthesised in the presence
of ddATP, ddCTP, ddGTP and ddTTP respectively will terminate at a given nucleotide
(A, C, G and T). By sizing these chains we can infer the positions of each of the four
residues in the sequence.
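The step from chain lengths to sequence can be sketched in a few lines. This is an idealised illustration that assumes every chain length shows up as a band in exactly one of the four lanes. The function names and the example sequence are hypothetical.

```python
def run_dideoxy_reactions(new_strand):
    """Simulate the four reactions: for each dideoxy base, list the
    lengths (in nt past the primer) of chains terminated at that base."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand.upper(), start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Reconstruct the synthesised strand by ordering bands by size,
    shortest chain first, and noting which lane each band is in."""
    calls = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in calls:
                raise ValueError(f"ambiguous band at length {length}")
            calls[length] = base
    return "".join(calls[n] for n in sorted(calls))


lanes = run_dideoxy_reactions("GATTACA")  # hypothetical sequence
print(lanes)
print(read_gel(lanes))
```

The sequence read this way is that of the newly synthesised strand, which is complementary to the cloned single-stranded template.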
(insert circular DNA sketch here)
The products of the reaction are analysed on 5% polyacrylamide urea gels which allow
resolution of chains 1–250 nt long at the 1 nt level.
(insert sample sequencing gel here)
Bands are detected by AUTORADIOGRAPHY (we include some 32P labelled dATP in
the reaction).
For the extreme 5’ end of the transcript, we may need to use the genomic copy of the
SpP154 gene as source material. This is because full length cDNA clones (going right to
the 5’ end of the corresponding RNA) are a rarity.
Question 5: Wait, why is a full-length cDNA clone such a “rarity”? What about the
technique of constructing a cDNA library makes a full-length clone difficult?
For this we have available, from the S.U. [sea urchin] genomic library, clones in phage
lambda which cover the entire region of the transcript. We can detect where the 5’ end of
the 7.5 kb transcript maps by blotting RNA and using restriction fragments from the λ
clones as tracers.
What the sequence information will tell us
1. Is there an extended open reading frame somewhere near the 5’ end of the transcript
(i.e., which could translate to give a polypeptide)?
2. Are regions of open reading frame interrupted by regions containing stop signals (i.e.,
does the transcript have the structure of a pre-spliced precursor to mRNA, from which
intervening sequences have not yet been removed)?
3. Does the IMPLIED amino acid sequence bear any relation to any known protein
sequences (by computer search)?
4. What is the internal LOCATION and STRUCTURE (including translatability) of the
repeat elements?
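Points 1 and 2 above come down to locating stop codons in each of the three forward reading frames. A minimal sketch of such a scan follows, assuming the standard genetic code (stop codons TAA, TAG and TGA in DNA notation). The function name and the short test sequence are hypothetical and are not taken from the SpP154 data.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA notation


def stops_by_frame(seq):
    """Return {frame: [1-based start positions of stop codons]} for the
    three forward reading frames of a DNA sequence."""
    seq = seq.upper()
    stops = {1: [], 2: [], 3: []}
    for frame in (1, 2, 3):
        # Step through complete codons only, starting at the frame offset
        for i in range(frame - 1, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                stops[frame].append(i + 1)
    return stops


def blocked_frames(seq):
    """List the frames that contain at least one stop codon."""
    return [frame for frame, hits in stops_by_frame(seq).items() if hits]


toy = "ATGAAATAGCTGACC"  # hypothetical 15 nt sequence
print(stops_by_frame(toy))   # {1: [7], 2: [2, 11], 3: []}
print(blocked_frames(toy))   # [1, 2]
```

A long stretch with no entries in one frame is a candidate open reading frame. A region with stops in all three frames cannot be translated as it stands.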
Summary of results October 1982 (written by H. Jacobs, in preparation of a manuscript
submitted to Journal of Molecular Biology)
1. Maxam-Gilbert sequencing of 3’-most fragment of SpP154 (cDNA) and of
corresponding fragment from genomic subclone pλ154RH2:
a. SpP154 sequence with respect to previous (Sanger) data – several changes of
nucleotide assignment – at all such positions M-G sequence is UNAMBIGUOUS. No
frameshifts, so previous assumptions about reading frames were correct.
b. Sequence of this fragment from EcoRI through AluI and poly-(A) tail into vector
(HaeIII site) shows:
• only 6 nucleotides of sequence beyond AluI before poly-(A) tail.
• no classical poly-(A) addition signal, therefore most likely the cDNA was internally
primed from an oligo-(A) sequence.
• canonical splice acceptor (Py)nTXCAG appears at EcoRI + 155 nt: TGCAG; other AGs
at EcoRI + 169 (AluI site), EcoRI + 127, 88, 75, 69, 25 and 19, all unlikely to be
involved in generation of 1.4 kb transcript on basis of RNA blots.
• if this splice is functional, the mRNA generated is blocked in 2 frames, therefore either
is in untranslated region, or defines a unique polypeptide LSELIK(K) assuming A6 is
encoded.
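Landmarks such as the EcoRI, AluI and HaeIII sites mentioned above can be located in a finished sequence by a plain string search for their recognition sequences (GAATTC, AGCT and GGCC respectively). The sketch below is illustrative only. The function name and the 20 nt example sequence are hypothetical, and positions are reported as the 1-based start of the recognition sequence rather than the cut position.

```python
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "AluI": "AGCT",
    "HaeIII": "GGCC",
}


def find_sites(seq, sites=RECOGNITION_SITES):
    """Return {enzyme: [1-based start positions]} for every occurrence
    of each recognition sequence, including overlapping ones."""
    seq = seq.upper()
    found = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start + 1)
            start = seq.find(motif, start + 1)
        found[enzyme] = positions
    return found


toy = "GAATTCATGAGCTCATAAAA"  # hypothetical sequence
print(find_sites(toy))  # {'EcoRI': [1], 'AluI': [10], 'HaeIII': []}
```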
Question 6: What is the purpose of the RNA having a poly-(A) tail?
Question 7: The “canonical splice acceptor” referred to here; how well does it
correspond with splice sequences shown in figure 26-22 in the text? At which end (5’ or
3’ of the intron) is the splice acceptor?
c. Sequence from genomic clone shows homology except for 5 single base changes in the
putative intron (not significantly above expectation, given the extent of SC polymorphism
in S. purpuratus).
d. – and one deletion of 61 nt – no obvious reason for such to have occurred during
cloning, so either it’s a bizarre cloning artefact or an even more bizarre genomic
polymorphism. Irrelevant for the time being.
2. Genome blots with 3’ end fragment from SpP154 (ER) and corresponding fragment
from genomic subclone pλ154RH2 (fragments sequenced above):
Individual #7: Both fragments gave identical G blot patterns — as follows:
EcoRI = 1.6 (different from Cyril!)
HindIII = 1.5
BamHI = ≥30, 4.5
BglII = ≥35
PstI = ≥35, 15
SalI = ≥35
RH = 1.15 (=pλ154RH2)
RM, RB, RP, RS = 1.6
Original hypothesis about a 3’ end splice was almost certainly wrong. The gene is single
copy and there is no detectable splice at the 3’ end by genome blotting or sequencing.
3. Gastrula polysomal cDNA library in λgt70 screened with 154/RD probe.
2 positives selected which rescreened (4 did not) = λSpGP154A and λSpGP154B. These
both have an insert of 3-400 nt and are almost certainly clones of each other.
Subcloning proceeding in pUC8 by ligating λSpGP154A or B/BM into pUC8/M —
should insert a 3.3 kb fragment bearing the insert.
Selection by AmpRXgal– and minipreps/R.
4. Screening genomic libraries for 154:
1. #7/r library screened with 154/MH (no positives) and 154 total probe (many horrid
positives, only one rescreened!).
2. #7/u library screened with 154 total probe – no positives.
5. Screening library (Cyril/R) for mt-homology element — 3 positives: λ389B, λ389C
and λ389E — but having difficulty plaque purifying. Grows very poorly.
6. Screening #7/r library for cloned S. purpuratus mtDNA: 3 positives: λmt1, λmt2 and
λmt3. Only λmt1 rescreened but gives invisible plaques. λmt2, λmt3 being rescreened at
high density.
The cDNA sequence
EcoRI site
GAATTCATGA
AACATTGGAG
ATGAGTGGAA
AAAATGTGAT
GAACTTTGGT
TTGTTTTTCT
CTTTTGAAGA
ACAAGAACAA
TTATATAAGT
ATCATAAATC
TGTTATTAAT
TTTGTTTTGA
TATGAAGATG
TGCAGACCTT
CTATTCTAAA
TTTATATTTT
AluI site
TTATCTGAGC
TCATAAAAAA
AAAAAAAAAA
AAAAAAAAAA
AAAAAAAAAA
AAAAAAAAAA
AAAAAAAAAA
AAAAAAAAAA
AAAAAAAAAA
AAAAAAAAAA
AAAAAAAAAA
Question 8: Find and underline the “stop” signals in each of the three reading frames.
Identify each stop signal by the reading frame number (1, 2 or 3).
Question 9: The outlined area actually shows up in the mRNA transcript; the non-shaded
area is the intron. What amino acid sequence does the mRNA code for?
Question 10: Do I have enough information to answer the first two of Jacobs’s questions
(under “What the sequence information will tell us”) and thus satisfy my proposal in the
Introduction? If so, what are the answers?
References:
E. Davidson, B. Hough-Evans and R. Britten, Molecular Biology of the Sea Urchin
Embryo, Science 217 (1982), 17 – 26. Abstract at:
http://www.sciencemag.org/cgi/content/abstract/217/4554/17
H. Jacobs and B. Grimes, Complete nucleotide sequences of the nuclear pseudogenes for
cytochrome oxidase subunit I and the large mitochondrial ribosomal RNA in the sea
urchin Strongylocentrotus purpuratus, Journal of Molecular Biology 187 (1986), 509 – 527.