
Methods of Analysis and Resources Available for Genetic Trait Mapping

J. Ott

Methods of genetic linkage analysis are reviewed and put in context with other mapping techniques. Sources of information are outlined (books, web sites, computer programs). Special consideration is given to statistical problems in canine genetic mapping (heterozygosity, inbreeding, marker maps).

From Rockefeller University, 1230 York Avenue, New York, NY 10021-6399. This paper was delivered at the International Workshop on Canine Genetics at the College of Veterinary Medicine, Cornell University, Ithaca, New York, July 12–13, 1997. Support through grant HG00008 from NHGRI is gratefully acknowledged. © 1999 The American Genetic Association 90:68–70

Genetic linkage analysis is a technique for measuring the genetic distance between two loci on a chromosome. Alleles at loci in close proximity to each other tend to be coinherited as a “package” from parents to offspring. Linkage analysis is most often used to localize a disease gene by virtue of its linkage to a genetic marker locus on the gene map. For Mendelian diseases (mode of inheritance well known), the appropriate technique is the LOD score (maximum likelihood) approach, which may be applied with the aid of computer programs such as LINKAGE, MENDEL, MapMaker, etc. This typically results in a “LOD score curve” along the chromosome whose maximum identifies the estimated position of the disease locus and whose magnitude is indicative of the strength of linkage.

For so-called complex traits (mode of inheritance unknown), it should first be established that the trait runs in families. Then a search for disease genes may be carried out by a variety of techniques. The LOD score method is often applied under the assumption of a single disease gene residing at a given genomic position.
Another approach is the affected sib-pair method, which does not require any assumptions on mode of inheritance and only looks for deviations in the inheritance of marker alleles from parents to affected offspring. Variations of this identity by descent (IBD) allele sharing technique include, for example, the affected pedigree member (APM) method. Many nonparametric approaches estimate IBD allele sharing by maximum likelihood and then present results in terms of LOD scores. This does not mean, however, that these methods are parametric in the sense that disease models are specified and recombination fractions are estimated.

Computer Programs for Linkage Analysis

Among the likelihood methods for the joint analysis of multiple loci, two basic algorithms exist (Idury and Elston 1997). In the Elston–Stewart (ES) algorithm (Elston and Stewart 1971), computational effort increases linearly with the number of individuals in a pedigree but exponentially with the number of marker loci, while this is reversed for the Lander–Green (LG) algorithm (Lander and Green 1987). An extensive list of linkage programs, including references and brief descriptions, may be found on our web site http://linkage.rockefeller.edu/soft/list.html (no literature references are provided in this outline). Here some practical aspects of computer programs are discussed.

The ES algorithm has been implemented in the LIPED program and extensions of it in the LINKAGE programs. A fast version of the LINKAGE programs is FastLINK, which is the program of choice for large pedigrees and a small number of loops. However, only a limited number of markers (up to five or six) may be analyzed simultaneously. An extremely efficient program is VITESSE, but it is currently available only for nonlooped pedigrees. The MENDEL program is similar in its capabilities to LINKAGE/FastLINK but its method of handling loops is more sophisticated. It is also more flexible in parameter selection.
On the other hand, MENDEL requires far more computing resources than FastLINK.

The LG algorithm forms the basis of programs such as MapMaker, CRI-MAP, and GENEHUNTER. The latter represents the current state of the art, particularly the version modified by Kong and Cox (1997). The GENEHUNTER program can analyze all markers on a chromosome jointly. However, family size is limited to approximately 12 nonfounder individuals. MapMaker is specifically designed for marker mapping (see below). CRI-MAP is similar to MapMaker but has been modified to allow for limited disease gene mapping. A useful option in CRI-MAP shows all crossovers occurring on chromosomes (so does the GENEHUNTER program).

While the ES algorithm is good for large pedigrees with a small number of genetic loci, and the LG algorithm can handle large numbers of loci with only small families, there is no exact method available for large families combined with large numbers of loci. For those cases, and when large numbers of loops are present (a typical case in animal pedigrees), computer simulation methods have been developed that furnish approximate rather than exact results. One of the most flexible and versatile programs in this category is LOKI, which is part of the PANGAEA package.

For heritable traits not following a Mendelian mode of inheritance, many researchers apply nonparametric analysis methods. Several computer programs for such analyses are available, for example, SIBPAL of the S.A.G.E. package, and ANALYZE. The former implements a specific procedure (Haseman–Elston) for affected sib-pair analysis, but is restricted to working with one marker locus at a time. ANALYZE is a suite of programs developed by Dr. Joseph Terwilliger and allows for affected sibling and disequilibrium analyses, where some of the programs can work in a limited multimarker fashion. GENEHUNTER can also carry out nonparametric analysis. As it works with all markers on a chromosome, it is the program of choice for many researchers.
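
As a rough illustration of the LOD score that parametric programs such as LINKAGE report, the following Python sketch evaluates the two-point LOD score in the simplest textbook setting: a number of phase-known, fully informative meioses, some of which are recombinant. The function names and the grid of recombination fractions are assumptions made for this example; they are not taken from any of the programs named above, which handle general pedigrees, unknown phase, and several markers at once.

```python
import math


def _count_times_log10(count, prob):
    """Return count * log10(prob), treating 0 * log10(0) as 0."""
    if count == 0:
        return 0.0
    if prob == 0.0:
        return float("-inf")
    return count * math.log10(prob)


def lod_score(theta, recombinants, meioses):
    """Two-point LOD score at recombination fraction theta.

    Assumes phase-known, fully informative meioses (a simplification):
    LOD(theta) = log10[ theta^k (1 - theta)^(n - k) / 0.5^n ].
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    nonrecombinants = meioses - recombinants
    log_lik_linked = (_count_times_log10(recombinants, theta)
                      + _count_times_log10(nonrecombinants, 1.0 - theta))
    log_lik_unlinked = meioses * math.log10(0.5)
    return log_lik_linked - log_lik_unlinked


def lod_curve(recombinants, meioses, steps=50):
    """LOD scores on an evenly spaced grid of theta values in [0, 0.5]."""
    thetas = [0.5 * i / steps for i in range(steps + 1)]
    return [(theta, lod_score(theta, recombinants, meioses)) for theta in thetas]


def max_lod(recombinants, meioses, steps=50):
    """Grid point with the highest LOD score, as a (theta, lod) pair."""
    return max(lod_curve(recombinants, meioses, steps), key=lambda pair: pair[1])
```

For instance, with 2 recombinants among 20 meioses, `max_lod(2, 20)` peaks at a recombination fraction of 0.1 with a LOD score of roughly 3.2, and the score at 0.5 (no linkage) is zero by construction.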
Resources

Many of the programs developed at academic institutions are freely available. Exceptions are, for example, the S.A.G.E. programs. Among the textbooks of interest for genetic linkage analysis, in addition to my own (Ott 1991), Hartl (1988) gives a nice introduction to population genetics. A handbook with practical examples for LINKAGE and other programs is that by Terwilliger and Ott (1994). Courses in linkage analysis are regularly conducted at Rockefeller University and the University of Zurich (Switzerland) on a basic and advanced level, one each in Zurich and New York, once a year. These courses are listed on our web site.

A wealth of information is available from many web sites. For example, the NCBI (National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov/) provides information and links to many databases relevant for genomics. Also, the MEDLINE list of publication references is available through NCBI (http://www4.ncbi.nlm.nih.gov/PubMed/).

Marker Ordering

To localize disease genes by linkage analysis, a good genetic marker map is essential. Canine marker maps are only now beginning to be constructed (Werner et al. 1997). Several dog reference families exist on which researchers are working to create maps; information on canine maps is available on the internet (http://mendel.berkeley.edu/dog.html; http://tiberius.fhcrc.org/home/dog.html; http://ubeclu.unibe.ch/itz/dogmap.html).

A potential problem for genetic mapping is the relatively high inbreeding in dogs compared with that in humans (inbreeding in humans is essentially negligible). As outlined by Dr. Acland in his meeting presentation, the inbreeding coefficient is $F = 0.5$ in highly inbred dogs, $F = 0.2$ in a typical purebred dog, and $F = 0.1$ in crossbred dogs. An important property of a genetic marker is its heterozygosity, which indicates the proportion of individuals in a population who are heterozygous for that marker.
Under Hardy–Weinberg equilibrium, as is well known, heterozygosity can be calculated from the frequencies $p_i$ of the marker alleles, where $p_i$ is the frequency of the $i$th allele, as $H_0 = 1 - \sum_i p_i^2$. With inbreeding, expected heterozygosity is reduced to $H = (1 - F)H_0 = (1 - F)(1 - \sum_i p_i^2)$ (Hartl 1988). For example, a marker with heterozygosity $H_0 = 70\%$ in humans will have expected heterozygosities of $H = 35\%$ in highly inbred dogs, $H = 56\%$ in purebred dogs, and $H = 63\%$ in crossbred dogs.

Clearly a dense map is better than a sparse map. However, it takes more observations to order closely spaced markers than loosely spaced markers. An approximate approach to the question of marker ordering may be found in my book (Ott 1991, p. 137). Assume a map of markers with spacing $x$, where $x$ is the relative interval length, that is, $x$ is approximately equal to 1 divided by the number of markers on a chromosome. To establish local order among three markers, we must see at least one crossover in each of the two adjacent intervals. If we assume that in each meiosis only one crossover occurs, randomly placed on the map, then the number of meioses required for establishing local order with power 90% is given by $n = \log(1 - \sqrt{0.9})/\log(1 - P)$, where $P$ is the relative interval length (denoted $x$ above). For example, with a chromosome length of 100 cM, ordering markers with a spacing of 20 cM ($P = 0.2$) requires $n = 13.3$ meioses, but a spacing of 10 cM ($P = 0.1$) requires $n = 28.2$ meioses, or about double the number for a 20 cM map. While these figures should not be taken as accurate indications of the numbers of meioses required, they do provide relative numbers of meioses needed for ordering maps with different spacing. More detailed analyses of such questions (Bishop 1985; Ott and Lathrop 1987) show that markers are easiest to order when they are approximately 20 cM apart; denser as well as sparser maps are more difficult to order.
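
The two calculations in this section are easy to reproduce. The Python sketch below computes heterozygosity from allele frequencies, applies the inbreeding correction, and evaluates the approximate number of meioses needed to order markers. The function names are chosen for this example only. The derivation noted in the comments (treating the two adjacent intervals as independent, so that the per-interval success probability is the square root of the desired power) is one reading of where the square root in the formula comes from, not a statement taken from the text.

```python
import math


def heterozygosity(allele_freqs):
    """Hardy-Weinberg heterozygosity, H0 = 1 - sum of squared allele frequencies."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


def inbred_heterozygosity(h0, inbreeding_coefficient):
    """Expected heterozygosity under inbreeding, H = (1 - F) * H0."""
    return (1.0 - inbreeding_coefficient) * h0


def meioses_for_local_order(relative_interval, power=0.9):
    """Approximate number of meioses needed to order three markers.

    Assumed reading of the formula: each of the two adjacent intervals must
    show at least one crossover, with probability 1 - (1 - P)^n per interval.
    Requiring (1 - (1 - P)^n)^2 = power and solving for n gives
    n = log(1 - sqrt(power)) / log(1 - P).
    """
    if not 0.0 < relative_interval < 1.0:
        raise ValueError("relative interval length must lie in (0, 1)")
    return math.log(1.0 - math.sqrt(power)) / math.log(1.0 - relative_interval)


if __name__ == "__main__":
    # Inbreeding coefficients quoted in the text for three kinds of dogs.
    for label, f in [("highly inbred", 0.5), ("purebred", 0.2), ("crossbred", 0.1)]:
        print(label, round(inbred_heterozygosity(0.70, f), 2))
    # Marker spacings of 20 cM and 10 cM on a 100 cM chromosome.
    for spacing in (0.2, 0.1):
        print(spacing, round(meioses_for_local_order(spacing), 1))
```

Run as a script, this reproduces the heterozygosities of 35%, 56%, and 63% and the figures of 13.3 and 28.2 meioses quoted above.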
To build a map for all marker loci on a chromosome, the ideal situation would be to consider all possible orders and, for each order, estimate marker distances and associated maximum likelihood. As the total number of orders prohibits such an approach, various ordering algorithms have been incorporated in computer programs. For example, MapMaker’s ordering algorithm is different from the one used by Généthon. Typically one starts with a pair of markers and successively adds more markers, reestimating all distances at each step.

Much of the success of the human gene mapping project is based on the fact that a specific set of families (the CEPH reference families) is used for mapping and that DNA from these families is made available worldwide. The structure of these families is simple: one pair of parents, up to four grandparents, and a rather large number of children. This family type allows for streamlined ways of computing pedigree likelihoods that are much faster than the ones employed for general pedigrees. For example, such algorithms have been implemented in MapMaker and LINKAGE (this version of LINKAGE can work with a large number of markers jointly).

In addition to genetic ordering methods, physical ordering techniques are available. A common method is radiation hybrid mapping. Also, excellent cytogenetic methods (e.g., FISH) are now available for marker ordering. The MULTIMAP program is a kind of expert system that creates a map of markers either by genetic mapping or radiation hybrid mapping. It is based on the CRI-MAP program as its compute engine.

Disease Gene Mapping

To localize a disease gene on a genetic map, it is important that markers be relatively dense. However, high marker polymorphism may substitute for low marker density and vice versa.
Analytical investigations have been carried out to determine the number of meioses required to map a trait locus to the midpoint of a map of eight equally distant markers with equal heterozygosities (Terwilliger et al. 1992). A phase-known fully informative situation was assumed. For intermarker distances of $x$ cM and given heterozygosities $H$, the expected number of meioses $M$ for a maximum LOD score of 3 was calculated. Results may be represented approximately by the relationship $M = 10 + x(1 - H)$. Clearly it is easier to map a new locus into an interval of length 20 cM than one of length 10 cM. If $M_{20}$ denotes the number of meioses required for interval lengths of 20 cM and $M_{10}$ refers to interval lengths of 10 cM, then the ratio is approximately $M_{20}/M_{10} = [10 + 20(1 - H)]/[10 + 10(1 - H)] = (3 - 2H)/(2 - H)$. For a marker heterozygosity of $H = 0.3$, this ratio is 1.41, that is, it takes about 40% more data to localize a new locus into a 10 cM interval than into a 20 cM interval. With marker heterozygosities of $H = 0.7$, it only takes 23% more data, that is, higher marker heterozygosity increases the ability to localize a gene onto a dense map. These calculations show, approximately, the increase in observations required for decreasing marker heterozygosity and map density.

In animals, mapping a disease gene may be facilitated by an appropriate choice of an efficient mating (breeding) design. For example, consider a fully penetrant recessive disease and two possible mating types (family structures). Assume two alleles at the disease locus (d = disease allele, + = normal allele) and alleles numbered 1, 2, etc., at a marker locus. Mating type 1 is the common form, d1/+2 × d3/+4, that is, both parents are phase-known doubly heterozygous (phase must be established through grandparents). Offspring are affected (d/d, frequency 25%) or unaffected (d/+ or +/+, frequency 75%).
Mating type 2 is d1/+2 × d3/d3, that is, one parent is phase-known doubly heterozygous, while the other is doubly homozygous (affected). Offspring are affected (d/d, frequency 50%) or unaffected (d/+, frequency 50%). Which mating type is more efficient for linkage? (I am grateful to Dr. Frode Lingaas for pointing out these mating types to me.) This question is not easy to answer. In mating type 1, both parents are informative for linkage but a large portion of offspring have ambiguous disease genotypes; in mating type 2, on the other hand, only one parent is informative yet all offspring have known disease genotypes. Through standard analytical methods (Ott 1991), expected LOD scores may be computed which furnish the following results: For equal offspring numbers, mating type 2 provides more linkage information than mating type 1, although the difference is not dramatic. For example, if disease and marker loci are 1 cM apart, it takes 25% more offspring for mating type 1 to be as informative as mating type 2.

The Journal of Heredity 1999:90(1)

One of the motivations for genetic mapping in animals is the expectation that it will be possible to detect synteny between portions of genomes in different organisms. Several authors have contributed methodology to this area of research (Hannenhalli et al. 1995; Nadkarni 1997, 1998), but a discussion of these methods is beyond the scope of this article.

Complex traits are diseases without a known Mendelian mode of inheritance; they are generally taken to be under the control of multiple underlying (and perhaps interacting) genes. In addition, phenotype definition may be unclear in the sense that the genetically relevant diagnosis is unknown. Gene mapping for most of these traits is carried out under the assumption of a single gene located at the genomic point being analyzed.
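
The comparison of the two mating types discussed earlier in this section can be checked numerically. The Python sketch below enumerates parental gametes for a fully penetrant recessive disease, collapses offspring into observable classes (affection status plus the marker allele received from each parent), and computes the expected LOD score per offspring at the true recombination fraction. It is a minimal sketch under stated assumptions: phase is known, all four marker alleles are distinguishable, a recombination fraction of 0.01 stands in for a distance of 1 cM, and all function names are invented for this example rather than taken from Ott (1991).

```python
import math
from collections import defaultdict
from itertools import product


def het_parent_gametes(theta, linked_marker, other_marker):
    """Gametes of a phase-known parent with haplotypes d-linked_marker / +-other_marker."""
    return {
        ("d", linked_marker): (1.0 - theta) / 2.0,  # nonrecombinant
        ("+", other_marker): (1.0 - theta) / 2.0,   # nonrecombinant
        ("d", other_marker): theta / 2.0,           # recombinant
        ("+", linked_marker): theta / 2.0,          # recombinant
    }


def mating_type_1(theta):
    """d1/+2 x d3/+4: both parents doubly heterozygous."""
    return het_parent_gametes(theta, 1, 2), het_parent_gametes(theta, 3, 4)


def mating_type_2(theta):
    """d1/+2 x d3/d3: second parent affected and doubly homozygous."""
    return het_parent_gametes(theta, 1, 2), {("d", 3): 1.0}


def observation_probs(gametes_a, gametes_b):
    """Probability of each observable class: (affected, marker from A, marker from B)."""
    probs = defaultdict(float)
    for ((allele_a, marker_a), p_a), ((allele_b, marker_b), p_b) in product(
            gametes_a.items(), gametes_b.items()):
        affected = allele_a == "d" and allele_b == "d"  # recessive, fully penetrant
        probs[(affected, marker_a, marker_b)] += p_a * p_b
    return probs


def expected_lod(theta, mating):
    """Expected LOD per offspring when the true recombination fraction is theta."""
    linked = observation_probs(*mating(theta))
    unlinked = observation_probs(*mating(0.5))
    return sum(p * math.log10(p / unlinked[obs])
               for obs, p in linked.items() if p > 0.0)


if __name__ == "__main__":
    theta = 0.01  # assumed stand-in for 1 cM
    elod_1 = expected_lod(theta, mating_type_1)
    elod_2 = expected_lod(theta, mating_type_2)
    print(elod_1, elod_2, elod_2 / elod_1)
```

Under these assumptions the ratio of expected LOD scores comes out close to 1.25, in line with the statement that mating type 1 needs about 25% more offspring to be as informative as mating type 2.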
Both the LINKAGE and MENDEL programs come in versions that allow for the presence of two disease loci, but they are rarely used in practice as they make heavy demands on computing resources. In humans, only a few nonparametric two-locus analyses have been carried out, for example, for different diabetes loci (Cordell et al. 1995).

With parametric linkage methods, analysis of complex traits is almost surely carried out under “wrong” assumptions (e.g., single gene, while in reality there must be multiple genes). As a result, the recombination fraction tends to be overestimated. While this is not a problem in two-point analysis, in multipoint analysis it tends to place disease loci outside the marker map even when disease genes reside inside the map (e.g., Risch and Giuffra 1992). Thus, two-point analysis is the method of choice until clear evidence for linkage has been obtained.

References

Bishop DT, 1985. The information content of phase-known matings for ordering genetic loci. Genet Epidemiol 2:349–361.
Cordell HJ, Todd JA, Bennett ST, Kawaguchi Y, and Farrall M, 1995. Two-locus maximum LOD score analysis of a multifactorial trait: joint consideration of IDDM2 and IDDM4 with IDDM1 in type I diabetes. Am J Hum Genet 57:920–934.
Elston RC and Stewart J, 1971. A general model for the genetic analysis of pedigree data. Hum Hered 21:523–542.
Hannenhalli S, Chappey C, Koonin EV, and Pevzner PA, 1995. Genome sequence comparison and scenarios for gene rearrangements: a test case. Genomics 30:299–311.
Hartl DL, 1988. A primer of population genetics, 2nd edition. Sunderland, MA: Sinauer.
Idury RM and Elston RC, 1997. A faster and more general hidden Markov model algorithm for multipoint likelihood calculations. Hum Hered 47:197–202.
Kong A and Cox NJ, 1997. Allele-sharing models: LOD scores and accurate linkage tests. Am J Hum Genet 61:1179–1188.
Lander ES and Green P, 1987. Construction of multilocus genetic maps in humans. Proc Natl Acad Sci USA 84:2363–2367.
Nadkarni PM, 1997. Mapdiff: determining differences between two genomic maps. Comput Appl Biosci 13:217–225.
Nadkarni PM, 1998. Mapmerge: merge genomic maps. Bioinformatics 14:310–316.
Ott J, 1991. Analysis of human genetic linkage. Baltimore: Johns Hopkins University Press.
Ott J and Lathrop GM, 1987. Goodness of fit tests for locus order in three-point mapping. Genet Epidemiol 4:51–57.
Risch N and Giuffra L, 1992. Model misspecification and multipoint linkage analysis. Hum Hered 42:77–92.
Terwilliger JD, Ding Y, and Ott J, 1992. On the relative importance of marker heterozygosity and intermarker distance in gene mapping. Genomics 13:951–956.
Terwilliger JD and Ott J, 1994. Handbook of human genetic linkage. Baltimore: Johns Hopkins University Press.
Werner P, Raducha MG, Prociuk U, Henthorn PS, and Patterson DF, 1997. Physical and linkage mapping of human chromosome 17 loci to dog chromosomes 9 and 5. Genomics 42:74–82.

Corresponding Editor: Gregory M. Acland
