Positional Cloning
查向东
School of Life Sciences, Anhui University
2007.8
Positional Cloning
Before the genomics era, geneticists seeking
the genes responsible for human genetic
disorders frequently faced a problem: They did
not know the identity of the defective protein, so
they were looking for a gene without knowing its
function. Thus, they had to identify the gene by
finding its position on the human genetic map.
Positional cloning: the process that
commences with a search for markers linked
to a particular inherited trait, then uses those
markers to identify the approximate location of
the gene responsible for the trait, and then
uses various cloning strategies to identify,
isolate, and characterize the gene.
The strategy of positional cloning
 begins with the study of a family or families afflicted
with the disorder, with the goal of finding one or more
markers that are tightly linked to the gene causing the
disease. Because the position of the marker is known, the
disease gene can be pinned down to a relatively small
region of the genome. However, that "relatively small"
region usually contains about a million base pairs, so the
job is not over.
 The next step is to search through the million or so
base pairs to find a gene that is the likely culprit. Some
tools have traditionally been used in the search.
1. Searching for Markers
RFLP linked to flower color
RFLP and phylogeny
RFLP
Demonstration of RFLPs by Southern blot analysis. A representation of five
different Southern blot genotypes detected with a single probe is shown. The
alleles present in each genotype are distinguished by numbers at the top of
each lane. Each numbered allele in this Figure corresponds to the identically
numbered genomic restriction map shown in Figure 8.3.
Figure 8.3 RFLP generation by different molecular events. Chromosome 1 is
the ancestral state, shown with three TaqI restriction sites (TCGA) and a fourth
non-restriction site (TAGA). Chromosome 2 has undergone a C to T change
(indicated by an asterisk) that destroys one of the restriction sites.
Chromosome 3 has undergone an A to C change that creates a TaqI site.
Chromosome 4 has had a 0.5 kb insertion. The boxed-in region on each
chromosome is the restriction fragment that will be recognized on a Southern
blot probed with the fragment indicated in the diagram.
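The three kinds of events in Figure 8.3 can be sketched in a few lines of Python. This is only an illustration: the toy sequences, the spacing between sites, and the helper names `build` and `digest` are assumptions made for the example, and the 0.5 kb insertion is scaled down to 5 bases. Only the TaqI site (TCGA), the TAGA non-site, and the types of change come from the caption.

```python
TAQI_SITE = "TCGA"  # TaqI recognition site; the enzyme cuts after the T

def digest(seq, site=TAQI_SITE, cut_offset=1):
    """Return the fragment lengths obtained by cutting seq at every site."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

def build(site2="TCGA", site3="TAGA", insert=""):
    """Toy chromosome (hypothetical layout): filler A's between four sites."""
    return ("A" * 10 + "TCGA" + "A" * 20 + site2 + "A" * 15 + insert
            + "A" * 15 + site3 + "A" * 15 + "TCGA" + "A" * 5)

chrom1 = build()                  # ancestral: three TaqI sites plus TAGA
chrom2 = build(site2="TTGA")      # C to T change destroys a site
chrom3 = build(site3="TCGA")      # A to C change turns TAGA into a site
chrom4 = build(insert="G" * 5)    # insertion lengthens one fragment

for name, chrom in [("1", chrom1), ("2", chrom2), ("3", chrom3), ("4", chrom4)]:
    print("chromosome", name, digest(chrom))
```

Each chromosome gives a different set of fragment lengths, which is what a Southern blot detects as different band patterns.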
1 cM is equivalent to around 500 kb.
(Bourgain C, Genin E, Quesneville H, Clerget-Darpoux F: Search for multifactorial disease
susceptibility genes in founder populations. Annals of Human Genetics, vol 64, issue 3, p 255, May 2000)
1 cM is roughly equivalent to 1.8 Mbp.
(Pollet N, Mazabraud A: Insights from Xenopus Genomes. In Volff J-N (ed): Vertebrate Genomes.
Genome Dyn. Basel, Karger, 2006, vol 2, pp 138-153. DOI: 10.1159/000095101)
Human genome (3 × 10^9 bp) = 1 m = 10^6 μm
3 × 10^9 bp / 10^6 μm = 3 × 10^3 bp/μm = 3 kb/μm
The 2-micron (2 μm) plasmid is 6 kb in length
1 μm ≈ 3 kb
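The same arithmetic can be checked with a short Python sketch. The variable names are chosen for the example; the numbers are the ones on the slide.

```python
GENOME_BP = 3 * 10**9        # haploid human genome size used on the slide
GENOME_LENGTH_UM = 10**6     # 1 m of DNA expressed in micrometres

bp_per_um = GENOME_BP // GENOME_LENGTH_UM   # base pairs per micrometre
kb_per_um = bp_per_um / 1000                # kilobases per micrometre
plasmid_kb = 2 * kb_per_um                  # contour length of 2 um in kb

print(kb_per_um, "kb per micrometre")
print(plasmid_kb, "kb for a 2-micron circle")
```

The result matches the stated 6 kb size of the 2-micron plasmid.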
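2 μm (2-micron) plasmid: some yeast cells contain extrachromosomal
circular DNA molecules with a circumference of 2 μm (2 microns),
which can serve as vectors for genetic engineering.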
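The human genome contains base repeats of the form (CA)n.
Suppose the father carries a repeat with n=12 and the mutant
gene lies close to (CA)12, while the mother carries the normal
gene with n=15 and n=17. A child who inherits the disease gene
from the father will then also show the n=12 repeat.
To study this, researchers look for the largest possible
families affected by the disease, which makes the gene easier
to track. They search the family members' genomes for
characteristic repeats: DNA segments that appear only in
affected members and not in healthy members.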
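The search for a repeat allele that tracks with the disease can be sketched as a set comparison. This is a minimal illustration, not an actual linkage analysis: the father's second allele (n=14), the children, and the function name `cosegregating_alleles` are assumptions made for the example. Only n=12 for the father and n=15, n=17 for the mother come from the text.

```python
def cosegregating_alleles(family):
    """family: list of (is_affected, set of (CA)n repeat counts).

    Returns alleles carried by every affected member and by no healthy member.
    """
    affected = [alleles for is_affected, alleles in family if is_affected]
    healthy = [alleles for is_affected, alleles in family if not is_affected]
    if not affected:
        return set()
    shared_by_affected = set.intersection(*affected)
    seen_in_healthy = set().union(*healthy)
    return shared_by_affected - seen_in_healthy

# Hypothetical family: (affected?, repeat counts on the two chromosomes)
family = [
    (True,  {12, 14}),   # father; 14 is an assumed second allele
    (False, {15, 17}),   # mother
    (True,  {12, 15}),   # affected child
    (True,  {12, 17}),   # affected child
    (False, {14, 15}),   # healthy child
    (False, {14, 17}),   # healthy child
]

linked = cosegregating_alleles(family)
print("alleles that track with the disease:", linked)
```

With larger families, fewer alleles pass this filter by chance, which is why large pedigrees are preferred.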
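In 1983, James Gusella was fortunate to find, among 8 DNA
samples, a segment present only in patients. That segment had
to contain the Huntington's disease gene. At that time,
however, marker analysis of the human genome was still
rudimentary, and the segment was too large. The Huntington
gene was found within it only 10 years later.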
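Whole-genome scan
The whole-genome scan strategy first types markers
distributed at a certain density across the genome, and then
compares how those markers are distributed in a patient group
and in a reference group in order to localize the gene.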
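A minimal sketch of that comparison, assuming each marker is summarized as a 2 × 2 table (allele present or absent, in patients and in the reference group) and scored with a plain chi-square statistic. The marker names and counts are invented for illustration, and real scans use more careful statistics and multiple-testing correction.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

# Hypothetical counts per marker:
# (patients with allele, patients without, reference with, reference without)
markers = {
    "M1": (30, 70, 28, 72),
    "M2": (60, 40, 25, 75),
    "M3": (45, 55, 50, 50),
}

scores = {name: chi_square_2x2(*counts) for name, counts in markers.items()}
best = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(name, round(score, 2))
print("marker with the strongest difference:", best)
```

The marker whose allele distribution differs most between the two groups points to the region to examine first.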
Polygenic (multifactorial) genetic diseases
2. Gene Identification
Three general approaches to gene
identification based on three corresponding
characteristics of mammalian genes: i. the
occurrence of introns in nearly all mammalian
genes; ii. the presence of "CpG" islands at the
5'-ends of most mammalian genes; and iii. the
evolutionary conservation of nearly all
mammalian genes from mice to humans and
sometimes beyond.
Locating and finding genes with the
help of the markers
(1) chromosome walking;
(2) finding exons with exon traps;
(3) locating the CpG islands that tend
to be associated with genes.
(1) Chromosome walking
The strategy of map-based cloning is to find
molecular markers very closely linked to the
gene of interest. Those molecular markers can
serve as the starting point for chromosome
walking or jumping to the gene.
Contig: a set of overlapping clones that
provide a physical map of a portion of a
chromosome. The term refers to a contiguous map.
In the diagram, the walk begins with a clone containing
mkrB. The ends of the clone (boxed) are used to probe
a library. Clones from adjacent genome segments are
thus identified and isolated. The distal ends of those
clones are used to reprobe the library. These steps are
continued until a clone contains either mkrA or mkrC
sequences.
Clones between mkrB and mkrC must then be
evaluated for the presence of yfg (the gene of interest).
Chromosome walking has been used in the isolation of
centromere sequences, among others.
Fig. 13. Cloning a Disease Gene by Chromosome Walking. After a marker is
linked to within 1 cM of a disease gene, chromosome walking can be used to
clone the disease gene itself. A probe is first constructed from a genomic
fragment identified from a library as being the closest linked marker to the
gene. A restriction fragment isolated from the end of the clone near the disease
locus is used to reprobe the genomic library for an overlapping clone. This
process is repeated several times to walk across the chromosome and reach the
flanking marker on the other side of the disease-gene locus.
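The walking procedure can be sketched as a loop over overlapping clones. This is a simplified model, not a laboratory protocol. Clones are represented as (start, end) intervals on a coordinate that would not be known in practice, probing the library with a clone end is modelled as finding clones that contain that end position, and the walk goes in one direction only. The clone names, coordinates, and the function `walk` are assumptions for the example.

```python
def walk(library, start_clone, target_pos):
    """Walk from start_clone towards higher coordinates until a clone
    covers target_pos. Returns the ordered list of clone names (the contig)."""
    path = [start_clone]
    current = start_clone
    while not (library[current][0] <= target_pos <= library[current][1]):
        end_probe = library[current][1]
        # Clones that hybridize to the end probe and extend beyond it.
        candidates = [name for name, (s, e) in library.items()
                      if s <= end_probe < e]
        if not candidates:
            raise ValueError("gap in the library: the walk cannot continue")
        # Take the clone that reaches furthest, to minimise the number of steps.
        current = max(candidates, key=lambda name: library[name][1])
        path.append(current)
    return path

# Hypothetical genomic library (positions in kb)
library = {
    "c1": (0, 40),     # contains mkrB
    "c2": (30, 75),
    "c3": (35, 60),
    "c4": (70, 120),
    "c5": (110, 150),  # contains mkrC
}

contig = walk(library, "c1", target_pos=140)
print(" -> ".join(contig))
```

Each step moves only as far as one clone insert reaches, which is why walking across a 1 cM interval takes many rounds of probing.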
(2) Exon Traps
Once we have a contig stretching over hundreds of
kilobases, how do we sort out the genes from the other
DNA? If that DNA region has not yet been sequenced, we
can sequence it and look for ORFs, but that is very
laborious. Several more efficient methods are available,
including a procedure invented by Alan Buckler called
exon amplification or exon trapping.
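For comparison, the laborious option of looking for ORFs in a sequenced region can be sketched as follows. This is a deliberately naive scan: it covers the forward strand only, requires an ATG start, and ignores introns, so it would miss most mammalian genes. The function name `find_orfs` and the test sequence are assumptions for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end) index pairs of ORFs on the forward strand.

    An ORF here runs from an ATG to the first in-frame stop codon
    (stop included in the interval) and has at least min_codons codons
    before the stop.
    """
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                found_stop = j + 3 <= len(seq)
                if found_stop and (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3))
                    i = j + 3          # continue after the stop codon
                    continue
            i += 3
    return orfs

toy = "CCATGAAATTTGGGTAACC"   # made-up sequence with one short ORF
for start, end in find_orfs(toy):
    print(start, end, toy[start:end])
```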
Figure 24.14 shows how an exon trap works. We begin
with a plasmid vector such as pSPL1, which Buckler
designed for this purpose. This vector contains a
chimeric gene under the control of the SV40 early
promoter. The gene was derived
from the rabbit β-globin gene by removing its second
intron and substituting a foreign intron from the human
immunodeficiency virus (HIV), with its own 5'- and
3'-splice sites. We splice human genomic DNA fragments
into a restriction site within the intron of this plasmid,
then insert the recombinant vector into monkey cells
(COS-7 cells) that can transcribe the gene from the
SV40 promoter.
Now if any of the genomic DNA fragments we placed
into the intron are complete exons, with their own 5'- and
3'-splice sites, this exon will become part of the
processed transcript in the COS cells. We purify the
RNA made by the COS cells, reverse transcribe it to
make cDNA, then subject this cDNA to amplification by
PCR, using primers designed to amplify any new exon.
Finally, we clone the PCR products, which should
represent only exons. Any other piece of DNA inserted
into the intron will not have splicing signals; thus, after
being transcribed, it will be spliced out along with the
surrounding intron and will be lost.
Figure 24.14 Exon trapping. Begin with a cloning vector, such as
pSPL1, shown here in slightly simplified form. This vector has an
SV40 promoter (P), which drives expression of a hybrid gene
containing the rabbit β-globin gene (orange), interrupted by part
of the HIV tat gene, which includes two exon fragments (blue)
surrounding an intron (yellow). The exon-intron borders contain
5'- and 3'-splice sites (ss). The tat intron contains a cloning
site, into which random DNA fragments can be inserted.
In step 1, an exon (red) has been inserted, flanked by
parts of its own introns, and its own 5'- and 3'-splice
sites. In step 2, insert this construct into COS cells, where
it can be transcribed and then the transcript can be
spliced. Note that the foreign exon (red) has been
retained in the spliced transcript, because it had its own
splice sites. Finally (steps 3 and 4), subject the
transcripts to reverse transcription and PCR
amplification, with primers indicated by the arrows.
This gives many copies of a DNA fragment containing
the foreign exon, which can now be cloned and
examined. Note that a nonexon will not have splice sites
and will therefore be spliced out of the transcript along
with the intron. It will not survive to be amplified in
step 3, so one does not waste time studying it.
(3) CpG Islands
Definition of CpG island:
Short region of DNA in which the frequency
of the CG sequence is higher than in other
regions. "p" indicates that "C" and "G" are
connected by a phosphodiester bond. CpG
islands are often located around the
promoters of housekeeping genes (which are
essential for general cell functions) or other
genes frequently expressed in a cell.
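A rough way to quantify "higher CG frequency than elsewhere" is to compare the observed number of CpG dinucleotides in a window with the number expected from its C and G content. The sketch below assumes the commonly quoted thresholds of GC content above 50% and an observed/expected ratio above 0.6. Those thresholds, the function names, and the toy windows are not from this lecture and are given for illustration only.

```python
def cpg_stats(window):
    """Return (GC content, observed/expected CpG ratio) for a DNA window."""
    n = len(window)
    c = window.count("C")
    g = window.count("G")
    observed = window.count("CG")
    gc_content = (c + g) / n
    expected = c * g / n            # CpGs expected if C and G were independent
    ratio = observed / expected if expected else 0.0
    return gc_content, ratio

def looks_like_island(window, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed thresholds to a single window."""
    gc_content, ratio = cpg_stats(window)
    return gc_content > min_gc and ratio > min_ratio

rich = "CG" * 10      # toy CpG-dense window
poor = "CAGT" * 5     # toy window with C and G but no CpG

print(cpg_stats(rich), looks_like_island(rich))
print(cpg_stats(poor), looks_like_island(poor))
```

In practice the test is applied to sliding windows of a few hundred base pairs along the sequence.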
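CpG island: the large number of CpG sites found in and
around the promoters of mammalian genes.
In fact, 60% to 90% of the CpGs in the genome are methylated.
Unmethylated CpGs cluster into CpG islands, which lie at the
core promoter sequences and transcription start sites of
structural genes. Experiments have shown that hypermethylation
represses transcription.
In vertebrates, CpG islands surround the promoters at the 5' ends
of genes and are an important clue for finding genes.
The human genome contains nearly 30,000 CpG islands. On most
chromosomes there are on average 5 to 15 CpG islands per
million bases, and more than 18,000 of the islands have a GC
content of 60% to 70%. In general, CpG islands are not only a
landmark for genes; they also take part in regulating gene
expression and influence chromatin structure.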
Active human genes tend to be associated with
unmethylated CpG sequences, whereas the CpGs in
inactive regions are almost always methylated.
Furthermore, the restriction enzyme HpaII cuts at
the sequence CCGG, but only if the second C is
unmethylated. Thus, geneticists can scan large
regions of DNA for "islands" of sites that could be
cut with HpaII in a "sea" of other DNA sequences
that could not be cut. Such a site is called a CpG
island, or an HTF island because it yields HpaII tiny
fragments.
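The way HpaII reveals unmethylated regions can be sketched with a toy digest. This is a simplified model: methylation is given as a set of sequence positions, a CCGG site is cut only when its second C is not in that set, and the sequence and positions are invented for the example.

```python
def hpaii_fragments(seq, methylated):
    """Fragment lengths after HpaII digestion.

    methylated: set of 0-based positions of methylated cytosines.
    A CCGG site is cut (after the first C) only if its second C is unmethylated.
    """
    cuts = []
    pos = seq.find("CCGG")
    while pos != -1:
        if (pos + 1) not in methylated:
            cuts.append(pos + 1)
        pos = seq.find("CCGG", pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Toy sequence: one isolated site, then a cluster of three sites
seq = ("A" * 20 + "CCGG" + "A" * 20 + "CCGG" + "A" * 3
       + "CCGG" + "A" * 3 + "CCGG" + "A" * 20)

island_unmethylated = hpaii_fragments(seq, methylated={21})
fully_methylated = hpaii_fragments(seq, methylated={21, 45, 52, 59})

print(island_unmethylated)   # the clustered sites give tiny fragments
print(fully_methylated)      # nothing is cut
```

The run of very short fragments marks the unmethylated cluster, which is the pattern the name "HpaII tiny fragments" describes.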
89% of all NotI sites (GCGGCCGC) are located in
CpG islands, as is the case for 74% of all EagI
(CGGCCG), SacII (CCGCGG), and BssHII sites
(GCGCGC).
Partial digestion of the clone is performed with each of
the double CpG enzymes just described and the
resulting DNA is separated by PFGE, blotted and
probed sequentially with fragments from each of the
YAC arms. The appearance of bands of the same size
in digests obtained with two or more enzymes is highly
suggestive of a CpG island.
If NotI and one of these other enzymes both recognize
sites within one or two kilobases of each other (below
the resolution of PFGE), the presence of a CpG island
can be assumed with a probability of 97%.
Once a putative CpG island is identified,
various PCR-based methods can then be
used to clone the DNA adjacent to the
island, and these sequences can be
examined thoroughly to characterize the
associated transcription unit.
SUMMARY Several methods are available for identifying
the genes in a large contig. One of these is the exon trap,
which uses a special vector to help clone exons only.
Another is to use methylation-sensitive restriction enzymes
to search for CpG islands: DNA regions containing
unmethylated CpG sequences.
Two classical positional cloning
experiments: finding the genes responsible
for Huntington's disease and cystic
fibrosis.
But these tools are rapidly
being superseded by the Human
Genome Project. Now, once a scientific
team has done the initial linkage
study, and they have located the gene
to a small region of the genome, they
can simply look up that genome region
in the database and find the genes it
contains, along with their sequences.
That should usually provide all the
clues needed to find the gene of
interest.
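The database lookup described here amounts to an interval query. The sketch below uses a small in-memory annotation table. The gene names, chromosomes, coordinates, and the function `genes_in_region` are hypothetical, and a real lookup would go through a genome browser or annotation database instead.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int

# Hypothetical annotation records, for illustration only
ANNOTATION = [
    Gene("geneA", "chr4", 1_000_000, 1_050_000),
    Gene("geneB", "chr4", 2_900_000, 3_100_000),
    Gene("geneC", "chr4", 3_400_000, 3_450_000),
    Gene("geneD", "chr7", 3_000_000, 3_200_000),
]

def genes_in_region(annotation, chrom, start, end):
    """Names of genes that overlap the interval [start, end] on chrom."""
    return [g.name for g in annotation
            if g.chrom == chrom and g.start <= end and g.end >= start]

# Region delimited by the linkage study (assumed coordinates)
candidates = genes_in_region(ANNOTATION, "chr4", 3_000_000, 3_500_000)
print(candidates)
```

The resulting list is the set of candidate genes to test against the disease phenotype.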
Definition of genome:
The entire complement of genetic material in a
chromosome set. The entire genetic complement
of a prokaryote, virus, mitochondrion or
chloroplast or the haploid nuclear genetic
complement of a eukaryotic species.
Definition of Human Genome Project (HGP):
Formerly titled Human Genome Initiative.
Collective name for several projects
begun in 1986 by DOE to create an ordered
set of DNA segments from known
chromosomal locations, develop new
computational methods for analyzing genetic
map and DNA sequence data, and develop
new techniques and instruments for detecting
and analyzing DNA. This DOE initiative is now
known as the Human Genome Program. The
joint national effort, led by DOE and NIH, is
known as the Human Genome Project.
Genomics: the study of
how genes and genetic information are
organized within the genome, and how this
organization determines their function.
Since the completion of the Human Genome Project,
finding genes from linked markers has become much simpler.
Two-dimensional PAGE combines isoelectric
focusing in the first dimension with SDS-PAGE
in the second dimension to reveal
heterogeneity of charge in proteins and
can be used to show genetic
polymorphisms in a population.
Be familiar with the following concepts:
SNP
SSCP
Haplotype
PFGE
RFLP