Positional Cloning

Zha Xiangdong (查向东), School of Life Sciences, Anhui University, 2007.8

Before the genomics era, geneticists seeking the genes responsible for human genetic disorders frequently faced a problem: they did not know the identity of the defective protein, so they were looking for a gene without knowing its function. Thus, they had to identify the gene by finding its position on the human genetic map.

Positional cloning: the process that begins with a search for markers linked to a particular inherited trait, then uses those markers to identify the approximate location of the gene responsible for the trait, and finally uses various cloning strategies to identify, isolate and characterize the gene.

The strategy of positional cloning begins with the study of a family or families afflicted with the disorder, with the goal of finding one or more markers that are tightly linked to the gene causing the disease. Because the position of the marker is known, the disease gene can be pinned down to a relatively small region of the genome. However, that "relatively small" region usually contains about a million base pairs, so the job is not over. The next step is to search through the million or so base pairs to find a gene that is the likely culprit. Several tools have traditionally been used in this search.

1. Searching for Markers

Slide topics: RFLP linkage with flower color; RFLP and phylogeny; RFLP.

Demonstration of RFLPs by Southern blot analysis. A representation of five different Southern blot genotypes detected with a single probe is shown. The alleles present in each genotype are distinguished by numbers at the top of each lane. Each numbered allele in this figure corresponds to the identically numbered genomic restriction map shown in Figure 8.3.

Figure 8.3 RFLP generation by different molecular events. Chromosome 1 is the ancestral state, shown with three TaqI restriction sites (TCGA) and a fourth non-restriction site (TAGA).
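Chromosome 2 has undergone a C to T change (indicated by an asterisk) that destroys one of the restriction sites. Chromosome 3 has undergone an A to C change that creates a TaqI site. Chromosome 4 has had a 0.5 kb insertion. The boxed-in region on each chromosome is the restriction fragment that will be recognized on a Southern blot probed with the fragment indicated in the diagram.

1 cM is equivalent to around 500 kb.
C. Bourgain, E. Genin, H. Quesneville, F. Clerget-Darpoux. Search for multifactorial disease susceptibility genes in founder populations. Annals of Human Genetics, Vol. 64, Issue 3, p. 255, May 2000.

1 cM is roughly equivalent to 1.8 Mbp.
Pollet N, Mazabraud A. Insights from Xenopus Genomes. In: Volff J-N (ed): Vertebrate Genomes. Genome Dyn. Basel, Karger, 2006, vol 2, pp 138-153 (DOI: 10.1159/000095101).

Human genome: 3 × 10^9 bp ≈ 1 m = 10^6 μm, so 3 × 10^9 bp / 10^6 μm = 3 kb/μm, that is, 1 μm ≈ 3 kb. By the same measure, the 2 micron (2 μm) plasmid is about 6 kb in length.

2 μm plasmid: an extrachromosomal circular DNA molecule found in some yeast cells, with a contour length of 2 μm (2 micron). It can be used as a vector in genetic engineering.

The human genome contains base repeats of the form (CA)n. Suppose a father carries a repeat fragment with n = 12 and the mutant gene lies near this (CA)12 repeat, while the mother carries the normal gene with n = 15 and n = 17. Offspring who inherit the disease gene from the father will then also carry the n = 12 repeat fragment.

When studying such a disorder, researchers look for the largest possible affected families, which makes the gene easier to trace. They search the family members' genomes for repeat patterns, that is, DNA fragments that appear only in affected members and not in healthy members.

In 1983, James Gusella was fortunate to find, among only 8 DNA samples tested, a fragment that was present only in patients. The Huntington's disease gene had to lie in the region marked by that fragment. At that time, however, marker analysis of the human genome was still rudimentary and the region was too large. It took another 10 years to find the Huntington's disease gene within it.

Whole-genome scan

The "whole-genome scan" strategy first establishes a set of markers distributed at a certain density across the genome, and then maps a gene by comparing how these markers are distributed in a patient population and in a reference population.

Polygenic (multifactorial) genetic diseases

2. Gene Identification

Three general approaches to gene identification are based on three corresponding characteristics of mammalian genes:
i. the occurrence of introns in nearly all mammalian genes;
ii. the presence of "CpG" islands at the 5'-ends of most mammalian genes; and
iii. the evolutionary conservation of nearly all mammalian genes from mice to humans and sometimes beyond.

Locating and finding genes with the help of markers:
(1) chromosome walking;
(2) finding exons with exon traps;
(3) locating the CpG islands that tend to be associated with genes.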
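As an aside on the RFLP mechanism of Figure 8.3, the three molecular events (site loss, site gain, insertion) can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the sequences, spacer lengths, the choice of which site is destroyed, and the helper names `fragment_lengths` and `build_chromosome` are all hypothetical and are not taken from the figure. The only biological facts used are that TaqI recognizes TCGA and cuts between the T and the C.

```python
import re

# TaqI recognizes TCGA and cuts between T and C (T^CGA).
TAQI_SITE = "TCGA"
CUT_OFFSET = 1  # the cut falls one base into the recognition site


def fragment_lengths(seq):
    """Fragment lengths from a complete TaqI digest of a linear sequence."""
    # A lookahead pattern reports every site position without consuming it.
    cuts = [m.start() + CUT_OFFSET
            for m in re.finditer(f"(?={TAQI_SITE})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def build_chromosome(site2="TAGA", site3="TCGA", insertion=""):
    """Hypothetical toy chromosome: poly-A spacers around four candidate sites."""
    return ("A" * 10 + "TCGA" + "A" * 20 + site2 + "A" * 30 + site3
            + "A" * 20 + insertion + "A" * 20 + "TCGA" + "A" * 10)


chromosomes = {
    1: build_chromosome(),                   # ancestral: three sites plus TAGA
    2: build_chromosome(site3="TTGA"),       # C to T change destroys a site
    3: build_chromosome(site2="TCGA"),       # A to C change creates a site
    4: build_chromosome(insertion="A" * 5),  # insertion lengthens one fragment
}

for number, seq in chromosomes.items():
    print(number, fragment_lengths(seq))
```

Each variant changes the set of fragment lengths relative to chromosome 1, which is what a Southern blot detects as a band shift. A real blot shows only the fragment that overlaps the probe; the sketch lists all fragments for simplicity.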
(1) Chromosome walking

The strategy of map-based cloning is to find molecular markers very closely linked to the gene of interest. Those molecular markers can serve as the starting point for chromosome walking or jumping to the gene.

Contig: a set of overlapping clones that provide a physical map of a portion of a chromosome. The term is short for "contiguous map".

In the diagram, the walk begins with a clone containing mkrB. The ends of the clone (boxed) are used to probe a library. Clones from adjacent genome segments are thus identified and isolated. The distal ends of those clones are used to reprobe the library. These steps are continued until a clone contains either mkrA or mkrC sequences. Clones between mkrB and mkrC must then be evaluated for the presence of yfg (the gene of interest). Chromosome walking has been used in the isolation of centromere sequences, among others.

Fig. 13. Cloning a Disease Gene by Chromosome Walking. After a marker is linked to within 1 cM of a disease gene, chromosome walking can be used to clone the disease gene itself. A probe is first constructed from a genomic fragment identified from a library as being the closest linked marker to the gene. A restriction fragment isolated from the end of the clone near the disease locus is used to reprobe the genomic library for an overlapping clone. This process is repeated several times to walk across the chromosome and reach the flanking marker on the other side of the disease-gene locus.

(2) Exon Traps

Once we have a contig stretching over hundreds of kilobases, how do we sort out the genes from the other DNA? If that DNA region has not yet been sequenced, we can sequence it and look for ORFs, but that is very laborious. Several more efficient methods are available, including a procedure invented by Alan Buckler called exon amplification or exon trapping. Figure 24.14 shows how an exon trap works. We begin with a plasmid vector such as pSPL1, which Buckler designed for this purpose.
This vector contains a chimeric gene under the control of the SV40 early promoter. The gene was derived from the rabbit β-globin gene by removing its second intron and substituting a foreign intron from the human immunodeficiency virus (HIV), with its own 5'- and 3'-splice sites. We splice human genomic DNA fragments into a restriction site within the intron of this plasmid, then insert the recombinant vector into monkey cells (COS-7 cells) that can transcribe the gene from the SV40 promoter. If any of the genomic DNA fragments placed into the intron is a complete exon, with its own 5'- and 3'-splice sites, this exon will become part of the processed transcript in the COS cells. We purify the RNA made by the COS cells, reverse transcribe it to make cDNA, then subject this cDNA to amplification by PCR, using primers designed to amplify any new exon. Finally, we clone the PCR products, which should represent only exons. Any other piece of DNA inserted into the intron will not have splicing signals; thus, after being transcribed, it will be spliced out along with the surrounding intron and will be lost.

Figure 24.14 Exon trapping. Begin with a cloning vector, such as pSPL1, shown here in slightly simplified form. This vector has an SV40 promoter (P), which drives expression of a hybrid gene containing the rabbit β-globin gene (orange), interrupted by part of the HIV tat gene, which includes two exon fragments (blue) surrounding an intron (yellow). The exon-intron borders contain 5'- and 3'-splice sites (ss). The tat intron contains a cloning site, into which random DNA fragments can be inserted. In step 1, an exon (red) has been inserted, flanked by parts of its own introns and its own 5'- and 3'-splice sites. In step 2, insert this construct into COS cells, where it can be transcribed and then the transcript can be spliced. Note that the foreign exon (red) has been retained in the spliced transcript, because it had its own splice sites.
Finally (steps 3 and 4), subject the transcripts to reverse transcription and PCR amplification, with primers indicated by the arrows. This gives many copies of a DNA fragment containing the foreign exon, which can now be cloned and examined. Note that a nonexon will not have splice sites and will therefore be spliced out of the transcript along with the intron. It will not survive to be amplified in step 3, so one does not waste time studying it.

(3) CpG Islands

Definition of CpG island: a short region of DNA in which the frequency of the CG sequence is higher than in other regions. The "p" indicates that "C" and "G" are connected by a phosphodiester bond. CpG islands are often located around the promoters of housekeeping genes (which are essential for general cell functions) or other genes frequently expressed in a cell.

A CpG island is the large cluster of CpG sites found in and around the promoters of mammalian genes. In fact, 60% to 90% of the CpGs in the genome are methylated; the unmethylated CpGs cluster into CpG islands, which lie at the core promoter sequences and transcription start sites of structural genes. Experiments have shown that hypermethylation represses transcription.

In vertebrates, the promoters at the 5' ends of genes are surrounded by CpG islands, which makes these islands an important clue when searching for genes.

The human genome contains nearly 30,000 CpG islands. On most chromosomes there are on average 5 to 15 CpG islands per million bases, and more than 18,000 of the islands have a GC content of 60% to 70%. These CpG islands are not only a hallmark of genes; they also take part in regulating gene expression and influence chromatin structure.

Active human genes tend to be associated with unmethylated CpG sequences, whereas the CpGs in inactive regions are almost always methylated. Furthermore, the restriction enzyme HpaII cuts at the sequence CCGG, but only if the second C is unmethylated. Thus, geneticists can scan large regions of DNA for "islands" of sites that can be cut with HpaII in a "sea" of other DNA sequences that cannot be cut. Such a cluster is called a CpG island, or an HTF island because it yields HpaII tiny fragments.

89% of all NotI sites (GCGGCCGC) are located in CpG islands, as is the case for 74% of all EagI (CGGCCG), SacII (CCGCGG), and BssHII (GCGCGC) sites. Partial digestion of the clone is performed with each of the double-CpG enzymes just described, and the resulting DNA is separated by PFGE, blotted and probed sequentially with fragments from each of the YAC arms.
The appearance of bands of the same size in digests obtained with two or more enzymes is highly suggestive of a CpG island. If NotI and one of these other enzymes both recognize sites within one or two kilobases of each other (below the resolution of PFGE), the presence of a CpG island can be assumed with a probability of 97%. Once a putative CpG island is identified, various PCR-based methods can be used to clone the DNA adjacent to the island, and these sequences can be examined thoroughly to characterize the associated transcription unit.

SUMMARY

Several methods are available for identifying the genes in a large contig. One of these is the exon trap, which uses a special vector to help clone exons only. Another is to use methylation-sensitive restriction enzymes to search for CpG islands, which are DNA regions containing unmethylated CpG sequences.

Two classical positional cloning experiments: finding the genes responsible for Huntington's disease and cystic fibrosis.

But these tools are rapidly being superseded by the Human Genome Project. Now, once a scientific team has done the initial linkage study and located the gene to a small region of the genome, they can simply look up that genome region in the database and find the genes it contains, along with their sequences. That should usually provide all the clues needed to find the gene of interest.

Definition of genome: the entire complement of genetic material in a chromosome set; the entire genetic complement of a prokaryote, virus, mitochondrion or chloroplast, or the haploid nuclear genetic complement of a eukaryotic species.

Definition of Human Genome Project (HGP): formerly titled Human Genome Initiative. Collective name for several projects begun in 1986 by DOE to create an ordered set of DNA segments from known chromosomal locations, develop new computational methods for analyzing genetic map and DNA sequence data, and develop new techniques and instruments for detecting and analyzing DNA.
This DOE initiative is now known as the Human Genome Program. The joint national effort, led by DOE and NIH, is known as the Human Genome Project.

Genomics: the study of how genes and genetic information are organized within the genome, and how this organization determines their function.

Since the completion of the Human Genome Project, finding genes from markers has become much simpler.

Two-dimensional PAGE combines isoelectric focusing in the first dimension with SDS-PAGE in the second dimension to reveal heterogeneity of charge in proteins, and can be used to show genetic polymorphisms in a population.

Be familiar with the following concepts: SNP, SSCP, haplotype, PFGE, RFLP.
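To make the CpG-island definition in section (3) more concrete, here is a minimal computational sketch. It is an assumption-laden analogue of the idea, not the HpaII or PFGE procedure described above: it slides a window along a sequence and flags windows that are GC-rich and carry more CG dinucleotides than their base composition would predict. The window size, step, both thresholds, and the function names `window_stats` and `find_cpg_windows` are illustrative choices, not values taken from these notes.

```python
def window_stats(window):
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    n = len(window)
    c = window.count("C")
    g = window.count("G")
    cpg = window.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    expected = c * g / n      # CpGs expected if C and G were placed at random
    ratio = cpg / expected if expected else 0.0
    return gc_fraction, ratio


def find_cpg_windows(seq, size=200, step=100, min_gc=0.6, min_ratio=0.6):
    """Start positions of windows passing both (assumed) thresholds."""
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - size + 1, step):
        gc_fraction, ratio = window_stats(seq[start:start + size])
        if gc_fraction >= min_gc and ratio >= min_ratio:
            hits.append(start)
    return hits


# Hypothetical test sequence: an AT-rich "sea" around a CG-rich "island".
sea = "ATTA" * 100    # 400 bp with no C or G
island = "CG" * 100   # 200 bp made only of CpG dinucleotides
toy_genome = sea + island + sea

print(find_cpg_windows(toy_genome))
```

On this toy input only the window that coincides with the island passes both thresholds; windows that straddle the island boundary fall below the GC threshold. Real sequences are far noisier, so the thresholds would need tuning against known islands.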