Methods of Analysis and Resources Available
for Genetic Trait Mapping
J. Ott
Methods of genetic linkage analysis are reviewed and put in context with other
mapping techniques. Sources of information are outlined (books, web sites, computer programs). Special consideration is given to statistical problems in canine
genetic mapping (heterozygosity, inbreeding, marker maps).
From Rockefeller University, 1230 York Avenue, New
York, NY 10021-6399. This paper was delivered at the
International Workshop on Canine Genetics at the College of Veterinary Medicine, Cornell University, Ithaca,
New York, July 12–13, 1997. Support through grant
HG00008 from NHGRI is gratefully acknowledged.
© 1999 The American Genetic Association 90:68–70
Genetic linkage analysis is a technique for
measuring the genetic distance between
two loci on a chromosome. Alleles at loci
in close proximity with each other will be
coinherited as a “package” from parents to
offspring. Linkage analysis is most often
used to localize a disease gene by virtue
of its linkage to a genetic marker locus on
the gene map. For Mendelian diseases
(mode of inheritance well known), the appropriate technique is the LOD score
(maximum likelihood) approach, which
may be applied with the aid of computer
programs such as LINKAGE, MENDEL,
MapMaker, etc. This typically results in a
“LOD score curve” along the chromosome
whose maximum identifies the estimated
position of the disease locus and whose
magnitude is indicative of the strength of
linkage. For so-called complex traits
(mode of inheritance unknown), it should
first be established that the trait runs in
families. Then a search for disease genes
may be carried out by a variety of techniques. The LOD score method is often applied under the assumption of a single disease gene residing at a given genomic position.
Another approach is the affected sib-pair
method, which does not require any assumptions on mode of inheritance and
only looks for deviations in the inheritance of marker alleles from parents to affected offspring. Variations of this identity
by descent (IBD) allele sharing technique
are, for example, the affected pedigree
member (APM) method. Many nonparametric approaches estimate IBD allele
sharing by maximum likelihood and then
present results in terms of LOD scores.
This does not mean, however, that these
methods are parametric in the sense that
disease models are specified and recombination fractions are estimated.
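As a minimal illustration of the LOD score idea, the sketch below assumes the simplest possible setting, which is not spelled out in the text: n phase-known, fully informative meioses of which k are recombinant. The likelihood is then L(theta) = theta^k (1 - theta)^(n - k), and the LOD score is log10 of L(theta)/L(0.5). The function names and the grid search are illustrative choices and are not taken from LINKAGE, MENDEL, or MapMaker.

```python
import math


def lod_score(theta, recombinants, meioses):
    """LOD score at recombination fraction theta for k recombinants
    among n phase-known, fully informative meioses (assumed setting)."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    k, n = recombinants, meioses
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_l_null = n * math.log10(0.5)  # likelihood under free recombination
    return log_l_theta - log_l_null


def max_lod(recombinants, meioses, grid=500):
    """Scan theta on an evenly spaced grid over (0, 0.5] and return
    (theta_hat, maximum LOD). A grid scan stands in for a proper optimizer."""
    thetas = [0.5 * i / grid for i in range(1, grid + 1)]
    best = max(thetas, key=lambda t: lod_score(t, recombinants, meioses))
    return best, lod_score(best, recombinants, meioses)


if __name__ == "__main__":
    theta_hat, z_max = max_lod(recombinants=1, meioses=20)
    print(f"theta_hat = {theta_hat:.3f}, max LOD = {z_max:.2f}")
```

With 1 recombinant in 20 meioses the curve peaks at theta = 0.05 with a LOD score of about 4.3; evaluating the same function over a range of theta values gives a "LOD score curve" of the kind described above.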
Computer Programs for Linkage Analysis
Among the likelihood methods for the joint
analysis of multiple loci, two basic algorithms exist (Idury and Elston 1997). In the
Elston–Stewart (ES) algorithm (Elston and
Stewart 1971), computational effort increases linearly with the number of individuals in
a pedigree but exponentially with the number of marker loci, while this is reversed for
the Lander–Green (LG) algorithm (Lander
and Green 1987). An extensive list of these
programs, including references and brief descriptions, may be found on our web site
http://linkage.rockefeller.edu/soft/list.html
(no literature references are provided in this
outline). Here some practical aspects of
computer programs are discussed.
The ES algorithm has been implemented
in the LIPED program and extensions of it
in the LINKAGE programs. A fast version
of the LINKAGE programs is FastLINK,
which is the program of choice for large
pedigrees and a small number of loops.
However, only a limited number of markers (up to five or six) may be analyzed simultaneously. An extremely efficient program is VITESSE, but it is currently available only for nonlooped pedigrees. The
MENDEL program is similar in its capabilities to LINKAGE/FastLINK but its method
of handling loops is more sophisticated. It
is also more flexible in parameter selection. On the other hand, MENDEL requires
much more computing resources than
FastLINK.
The LG algorithm forms the basis of programs such as MapMaker, CRI-MAP, and
GENEHUNTER. The latter represents the
current state of the art, particularly the
version modified by Kong and Cox (1997).
The GENEHUNTER program can analyze
all markers on a chromosome jointly. However, family size is limited to approximately 12 nonfounder individuals. MapMaker is
specifically designed for marker mapping
(see below). CRI-MAP is similar to MapMaker but has been modified to allow for
limited disease gene mapping. A useful option in CRI-MAP shows all crossovers occurring on chromosomes (so does the
GENEHUNTER program).
While the ES algorithm is good for large
pedigrees with a small number of genetic
loci, and the LG algorithm can handle
large numbers of loci with only small families, there is no method available for large
families combined with large numbers of loci. For those
cases, and when large numbers of loops
are present (a typical case in animal pedigrees), computer simulation methods
have been developed that furnish approximate rather than exact results. One of the
most flexible and versatile programs in
this category is LOKI, which is part of the
PANGAEA package.
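To make the trade-off between the two algorithms concrete, the sketch below counts the states each one has to sum over, used here only as a rough proxy for computing effort. It assumes the same number of alleles at every locus for the ES count, and it uses the inheritance-vector count 2^(2n - f) for n nonfounders and f founders (2n meiosis bits, reduced by the founder-phase symmetry) for the LG count. Both function names are hypothetical, and neither reproduces what any of the programs named above does internally.

```python
def elston_stewart_states(n_loci, alleles_per_locus):
    """Number of multilocus genotypes (unordered haplotype pairs) per person.
    Grows exponentially with the number of loci."""
    haplotypes = alleles_per_locus ** n_loci
    return haplotypes * (haplotypes + 1) // 2


def lander_green_states(n_nonfounders, n_founders):
    """Number of inheritance vectors per locus for one pedigree.
    Grows exponentially with the number of nonfounders."""
    return 2 ** (2 * n_nonfounders - n_founders)


if __name__ == "__main__":
    for loci in (2, 4, 6):
        print(f"ES, {loci} loci, 4 alleles each: "
              f"{elston_stewart_states(loci, 4)} genotypes per person")
    for nonfounders in (4, 8, 12):
        print(f"LG, {nonfounders} nonfounders, 4 founders: "
              f"{lander_green_states(nonfounders, 4)} inheritance vectors")
```

The ES total scales with the number of individuals times a quantity of the first kind, the LG total with the number of loci times a quantity of the second kind, which is why neither approach copes with large pedigrees and many loci at once.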
For heritable traits not following a Mendelian mode of inheritance, many researchers apply nonparametric analysis
methods. Several computer programs for
such analyses are available, for example,
SIBPAL of the S.A.G.E. package, and ANALYZE. The former implements a specific
procedure (Haseman–Elston) for affected
sib-pair analysis, but is restricted to working with one marker locus at a time. ANALYZE is a suite of programs developed by
Dr. Joseph Terwilliger and allows for affected sibling and disequilibrium analyses,
where some of the programs can work in
a limited multimarker fashion. Also,
GENEHUNTER can carry out nonparametric analysis. As it works with all markers
on a chromosome, it is the program of
choice for many researchers.
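The allele sharing idea behind affected sib-pair analysis can be sketched with the simple "mean test". This is a different and simpler procedure than the Haseman–Elston method implemented in SIBPAL, and it is not the algorithm of any program named here. The sketch assumes that the number of marker alleles shared identical by descent (0, 1, or 2) is known without ambiguity for each affected pair. Without linkage the shared proportion has mean 1/2 and variance 1/8 per pair, so excess sharing can be tested with a one-sided normal approximation. The function name and the example counts are made up for illustration.

```python
import math


def asp_mean_test(ibd_counts):
    """Mean IBD sharing test for affected sib pairs.

    ibd_counts: one entry (0, 1, or 2) per pair, the number of alleles
    shared identical by descent, assumed to be known exactly.
    Returns (mean sharing proportion, z statistic, one-sided p value).
    """
    n = len(ibd_counts)
    if n == 0:
        raise ValueError("need at least one sib pair")
    if any(c not in (0, 1, 2) for c in ibd_counts):
        raise ValueError("IBD counts must be 0, 1, or 2")
    mean_prop = sum(c / 2.0 for c in ibd_counts) / n
    # Null hypothesis: mean 1/2, variance 1/8 per pair.
    z = (mean_prop - 0.5) / math.sqrt(1.0 / (8.0 * n))
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper normal tail
    return mean_prop, z, p_one_sided


if __name__ == "__main__":
    # Hypothetical data: 15 pairs share 0, 45 share 1, 40 share 2 alleles.
    pairs = [0] * 15 + [1] * 45 + [2] * 40
    prop, z, p = asp_mean_test(pairs)
    print(f"sharing = {prop:.3f}, z = {z:.2f}, p = {p:.5f}")
```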
Resources
Many of the programs developed at academic institutions are freely available. Exceptions are, for example, the S.A.G.E. programs.
Among the textbooks of interest for genetic linkage analysis, in addition to my
own (Ott 1991), Hartl (1988) gives a nice
introduction to population genetics. A
handbook with practical examples for
LINKAGE and other programs is that by
Terwilliger and Ott (1994). Courses in linkage analysis are regularly conducted at
Rockefeller University and the University
of Zurich (Switzerland) on a basic and advanced level, one each in Zurich and New
York, once a year. These courses are listed
on our web site.
A wealth of information is available from
many web sites. For example, the NCBI (National Center for Biotechnology Information,
http://www.ncbi.nlm.nih.gov/) provides information and links to many databases relevant for genomics. Also, the MEDLINE list
of publication references is available
through NCBI (http://www4.ncbi.nlm.nih.gov/PubMed/).
Marker Ordering
To localize disease genes by linkage
analysis, a good genetic marker map is
essential. Canine marker maps are only
now beginning to be constructed (Werner et al. 1997). Several dog reference
families exist on which researchers are
working to create maps; information on
canine maps is available on the internet
(http://mendel.berkeley.edu/dog.html; http://tiberius.fhcrc.org/home/dog.html; http://ubeclu.unibe.ch/itz/dogmap.html). A potential problem for genetic mapping is the relatively high inbreeding in dogs compared
with that in humans (inbreeding in humans is essentially negligible). As outlined
by Dr. Acland in his meeting presentation,
the inbreeding coefficient in highly inbred
dogs is $F = 0.5$, $F = 0.2$ in a typical purebred dog, and $F = 0.1$ in crossbred dogs.
An important property of a genetic
marker is its heterozygosity, which indicates the proportion of individuals in a
population who are heterozygous for that
marker. Under Hardy–Weinberg equilibrium, as is well known, heterozygosity can
be calculated from the frequencies $p_i$ of the marker alleles
as $H_0 = 1 - \sum_i p_i^2$. With
inbreeding, expected heterozygosity is reduced to $H = (1 - F)H_0 = (1 - F)(1 - \sum_i p_i^2)$ (Hartl 1988). For example, a marker
with heterozygosity $H_0 = 70\%$ in humans
will have expected heterozygosities of $H = 35\%$ in highly inbred dogs, $H = 56\%$ in
purebred dogs, and $H = 63\%$ in crossbred
dogs.
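The heterozygosity figures quoted above can be reproduced with a few lines of code. The sketch below computes the Hardy–Weinberg heterozygosity as one minus the sum of squared allele frequencies, then scales it by (1 - F) for an inbreeding coefficient F. The function names and the example allele frequencies are illustrative assumptions.

```python
def hw_heterozygosity(allele_freqs):
    """Expected heterozygosity under Hardy-Weinberg equilibrium:
    one minus the sum of squared allele frequencies."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


def inbred_heterozygosity(h0, f):
    """Expected heterozygosity with inbreeding coefficient f."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("inbreeding coefficient must lie in [0, 1]")
    return (1.0 - f) * h0


if __name__ == "__main__":
    # A hypothetical marker with four equally frequent alleles.
    print(f"H0 for four equal alleles: {hw_heterozygosity([0.25] * 4):.2f}")
    # The three inbreeding levels mentioned in the text, applied to H0 = 0.70.
    for label, f in (("highly inbred", 0.5), ("purebred", 0.2), ("crossbred", 0.1)):
        print(f"{label}: H = {inbred_heterozygosity(0.70, f):.2f}")
```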
Clearly a dense map is better than a
sparse map. However, it takes more observations to order closely spaced markers
than loosely spaced markers. An approximate approach to the question of marker
ordering may be found in my book (Ott
1991, p. 137). Assume a map of markers
with spacing x, where x is the relative interval length, that is, x is approximately
equal to 1 divided by the number of markers on a chromosome. To establish local
order among three markers, we must see
at least one crossover in each of the two
adjacent intervals. If we assume that in
each meiosis only one crossover occurs,
randomly placed on the map, then the
number of meioses required for establishing local order with power 90% is given by
$n = \log(1 - \sqrt{0.9})/\log(1 - P)$, where $P$ is the relative interval length $x$. For example, with a chromosome length of 100 cM,
ordering markers with a spacing of 20 cM
($P = 0.2$) requires $n = 13.3$ meioses, but
with a spacing of 10 cM ($P = 0.1$), $n = 28.2$
meioses, or about double the number for
a 20 cM map, are required. While these figures should not be taken as accurate indications of the numbers of meioses required, they do provide relative numbers
of meioses needed for ordering maps with
different spacing. More detailed analyses
of such questions (Bishop 1985; Ott and
Lathrop 1987) show that markers are easiest to order when they are approximately
20 cM apart; denser as well as sparser
maps are more difficult to order.
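The approximate meiosis counts for local ordering can be checked numerically. The sketch below assumes the reasoning outlined above: a single crossover per meiosis falls into a given interval with probability P, at least one crossover is needed in each of two adjacent intervals, and the two events are treated as independent, so that the power is [1 - (1 - P)^n]^2. Solving for n at a power of 0.9 gives n = log(1 - sqrt(0.9))/log(1 - P). The function name is an illustrative choice.

```python
import math


def meioses_for_local_order(p_interval, power=0.9):
    """Approximate number of meioses needed to see at least one crossover
    in each of two adjacent intervals with the given power.

    p_interval: relative interval length (interval length divided by
    chromosome length), taken as the per-meiosis crossover probability.
    """
    if not 0.0 < p_interval < 1.0:
        raise ValueError("p_interval must lie in (0, 1)")
    if not 0.0 < power < 1.0:
        raise ValueError("power must lie in (0, 1)")
    return math.log(1.0 - math.sqrt(power)) / math.log(1.0 - p_interval)


if __name__ == "__main__":
    for spacing_cm in (20, 10):
        p = spacing_cm / 100.0  # 100 cM chromosome, as in the text
        print(f"{spacing_cm} cM spacing: n = {meioses_for_local_order(p):.1f}")
```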
To build a map for all marker loci on a
chromosome, the ideal situation would be
to consider all possible orders and, for
each order, estimate marker distances and
associated maximum likelihood. As the total number of orders prohibits such an approach, various ordering algorithms have
been incorporated in computer programs.
For example, MapMaker’s ordering algorithm is different from the one used by Généthon. Typically one starts with a pair of
markers and successively adds more
markers, reestimating all distances at each
step.
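The size of the search space explains why exhaustive evaluation of all orders is impractical. A linear arrangement of m markers can be read from either end, so there are m!/2 distinct orders. The short sketch below tabulates this count; the function name is an illustrative choice.

```python
import math


def distinct_marker_orders(n_markers):
    """Number of distinct linear orders of n markers, counting an order
    and its mirror image as the same map."""
    if n_markers < 0:
        raise ValueError("number of markers must be non-negative")
    if n_markers < 2:
        return 1
    return math.factorial(n_markers) // 2


if __name__ == "__main__":
    for m in (3, 5, 10, 20):
        print(f"{m} markers: {distinct_marker_orders(m)} possible orders")
```

Already with 10 markers there are more than 1.8 million orders, each of which would need its own maximum likelihood evaluation.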
Much of the success of the human gene
mapping project is based on the fact that
a specific set of families (the CEPH reference families) is used for mapping and
that DNA from these families is made
available worldwide. The structure of
these families is simple: One pair of parents, up to four grandparents, and a rather
large number of children. This family type
allows for streamlined ways of computing
pedigree likelihoods that are much faster
than the ones employed for general pedigrees. For example, such algorithms have
been implemented in MapMaker and LINKAGE (this version of LINKAGE can work
with a large number of markers jointly).
In addition to genetic ordering methods,
physical ordering techniques are available. A common method is radiation hybrid mapping. Also, excellent cytogenetic
methods (e.g., FISH) are now available for
marker ordering. The MULTIMAP program
is a kind of expert system that creates a
map of markers either by genetic mapping
or radiation hybrid mapping. It is based
on the CRI-MAP program as its compute
engine.
Disease Gene Mapping
To localize a disease gene on a genetic
map, it is important that markers be relatively dense. However, high marker polymorphism may substitute for low marker
density and vice versa. Analytical investigations have been carried out to determine the number of meioses required to
map a trait locus to the midpoint of a map
of eight equally distant markers with equal
heterozygosities (Terwilliger et al. 1992).
A phase-known fully informative situation
was assumed. For intermarker distances
of x cM and given heterozygosities H, the
expected number of meioses $M$ for a maximum LOD score of 3 was calculated. Results may be represented approximately
by the relationship $M = 10 + x(1 - H)$.
Clearly it is easier to map a new locus into
an interval of length 20 cM than one of
length 10 cM. If $M_{20}$ denotes the number of
meioses required for interval lengths of 20
cM and $M_{10}$ refers to interval lengths of 10
cM, then the ratio $M_{20}/M_{10}$ is approximately given by $(3 - 2H)/(2 - H)$. For a marker
heterozygosity of $H = 0.3$, this ratio is
1.41, that is, it takes 40% more data to localize a new locus into a 10 cM interval
than into a 20 cM interval. With marker
heterozygosities of $H = 0.7$, it only takes
23% more data, that is, higher marker heterozygosity increases the ability to localize a gene onto a dense map. These calculations show, approximately, the increase in observations required for decreasing marker heterozygosity and map
density.
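The approximate relationship M = 10 + x(1 - H) and the ratio derived from it can be evaluated directly. The sketch below simply encodes that relationship as quoted, with x in cM and H as a proportion, and confirms that the ratio for 20 cM versus 10 cM spacing reduces to (3 - 2H)/(2 - H). The function names are illustrative, and the formula is only the rough summary given in the text, not the full analysis of Terwilliger et al. (1992).

```python
def meioses_for_lod3(spacing_cm, heterozygosity):
    """Approximate expected number of meioses for a maximum LOD score of 3,
    using the relationship M = 10 + x(1 - H) quoted in the text."""
    if spacing_cm < 0:
        raise ValueError("spacing must be non-negative")
    if not 0.0 <= heterozygosity <= 1.0:
        raise ValueError("heterozygosity must lie in [0, 1]")
    return 10.0 + spacing_cm * (1.0 - heterozygosity)


def ratio_20_to_10(heterozygosity):
    """Ratio of the approximate meiosis counts for 20 cM and 10 cM spacing."""
    return meioses_for_lod3(20, heterozygosity) / meioses_for_lod3(10, heterozygosity)


if __name__ == "__main__":
    for h in (0.3, 0.7):
        m20 = meioses_for_lod3(20, h)
        m10 = meioses_for_lod3(10, h)
        print(f"H = {h}: M20 = {m20:.0f}, M10 = {m10:.0f}, "
              f"ratio = {ratio_20_to_10(h):.2f}")
```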
In animals, mapping a disease gene may
be facilitated by an appropriate choice of
an efficient mating (breeding) design. For
example, consider a fully penetrant recessive disease and two possible mating
types (family structures). Assume two alleles at the disease locus (d = disease allele, + = normal allele) and alleles numbered 1, 2, etc., at a marker locus. Mating
type 1 is the common form, d1/+2 × d3/+4, that is, both parents are phase-known
doubly heterozygous (phase must be established through grandparents). Offspring are affected (d/d, frequency 25%) or
unaffected (d/+ or +/+, frequency 75%).
Mating type 2 is d1/+2 × d3/d3, that is,
one parent is phase-known doubly heterozygous, while the other is doubly homozygous (affected). Offspring are affected
(d/d, frequency 50%) or unaffected (d/+,
frequency 50%). Which mating type is
more efficient for linkage? (I am grateful
to Dr. Frode Lingaas for pointing out these
mating types to me.) This question is not
easy to answer. In mating type 1, both parents are informative for linkage but a large
portion of offspring have ambiguous disease genotypes; in mating type 2, on the
other hand, only one parent is informative
yet all offspring have known disease genotypes. Through standard analytical
methods (Ott 1991), expected LOD scores
may be computed which furnish the following results: For equal offspring numbers, mating type 2 provides more linkage
information than mating type 1, although
the difference is not dramatic. For example, if disease and marker loci are 1 cM
apart, it takes 25% more offspring for mating type 1 to be as informative as mating
type 2.
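The comparison of the two mating types can be illustrated with a small expected LOD score (ELOD) calculation. The sketch below is one possible version of such a calculation, not necessarily the one used for the figure above. It assumes a fully penetrant recessive disease, phase-known parents, a marker that identifies the transmitted allele from each informative parent, and an ELOD evaluated at the true recombination fraction. For each offspring it sums, over all marker and phenotype outcomes, the outcome probability times log10 of the likelihood ratio against free recombination. The function names are illustrative.

```python
import math


def elod_mating_type_2(theta):
    """ELOD per offspring when one parent is phase-known doubly heterozygous
    and the other is affected: one fully informative meiosis per offspring."""
    return (math.log10(2.0)
            + theta * math.log10(theta)
            + (1.0 - theta) * math.log10(1.0 - theta))


def elod_mating_type_1(theta):
    """ELOD per offspring when both parents are phase-known doubly
    heterozygous and unaffected offspring have ambiguous disease genotypes."""
    elod = 0.0
    # Probability that a parent passes on d, given the marker allele it
    # transmitted: 1 - theta if that allele is in coupling with d, else theta.
    for p_d_parent1 in (1.0 - theta, theta):
        for p_d_parent2 in (1.0 - theta, theta):
            p_affected = p_d_parent1 * p_d_parent2
            # Under free recombination: 25% affected, 75% unaffected.
            for p_pheno, p_null in ((p_affected, 0.25), (1.0 - p_affected, 0.75)):
                joint = 0.25 * p_pheno  # each marker combination has probability 1/4
                if joint > 0.0:
                    elod += joint * math.log10(p_pheno / p_null)
    return elod


if __name__ == "__main__":
    theta = 0.01  # roughly 1 cM between disease and marker loci
    e1, e2 = elod_mating_type_1(theta), elod_mating_type_2(theta)
    print(f"type 1: {e1:.4f}, type 2: {e2:.4f}, ratio: {e2 / e1:.2f}")
```

Under these assumptions the ratio of the two per-offspring ELODs at a recombination fraction of 0.01 comes out close to 1.25, in line with the 25% figure quoted above.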
One of the motivations for doing genetic
mapping in animals assumes that it will be
possible to detect synteny between portions of genomes in different organisms.
Several authors have contributed methodology to this area of research (Hannenhalli et al. 1995; Nadkarni 1997, 1998), but
a discussion of these methods is beyond
the scope of this article.
Complex traits are diseases without a
known Mendelian mode of inheritance;
they are generally taken to be under the
control of multiple underlying (and perhaps interacting) genes. In addition, phenotype definition may be unclear in the
sense that the genetically relevant diagnosis is unknown. Gene mapping for most
of these traits is carried out under the assumption of a single gene located at the
genomic point being analyzed. Both the
LINKAGE and MENDEL programs come in
versions that allow for the presence of two
disease loci, but they are rarely used in
practice as they make heavy demands on
computing resources. In humans, only a
few nonparametric two-locus analyses
have been carried out, for example, for different diabetes loci (Cordell et al. 1995).
With parametric linkage methods, analysis of complex traits almost surely is carried out under “wrong” assumptions (e.g.,
single gene, while in reality there must be
multiple genes). As a result, the recombination fraction tends to be overestimated.
While this is not a problem in two-point
analysis, in multipoint analysis it tends to
place the estimated disease locus outside the marker
map even when disease genes reside inside the map (e.g., Risch and Giuffra 1992).
Thus, two-point analysis is the method of
choice until clear evidence for linkage has
been obtained.
References
Bishop DT, 1985. The information content of phase-known matings for ordering genetic loci. Genet Epidemiol 2:349–361.
Cordell HJ, Todd JA, Bennett ST, Kawaguchi Y, and Farrall M, 1995. Two-locus maximum LOD score analysis
of a multifactorial trait: joint consideration of IDDM2
and IDDM4 with IDDM1 in type I diabetes. Am J Hum
Genet 57:920–934.
Elston RC and Stewart J, 1971. A general model for the
genetic analysis of pedigree data. Hum Hered 21:523–
542.
Hannenhalli S, Chappey C, Koonin EV, and Pevzner PA,
1995. Genome sequence comparison and scenarios for
gene rearrangements: a test case. Genomics 30:299–
311.
Hartl DL, 1988. A primer of population genetics, 2nd
edition. Sunderland, MA: Sinauer.
Idury RM and Elston RC, 1997. A faster and more general hidden Markov model algorithm for multipoint
likelihood calculations. Hum Hered 47:197–202.
Kong A and Cox NJ, 1997. Allele-sharing models: LOD
scores and accurate linkage tests. Am J Hum Genet 61:
1179–1188.
Lander ES and Green P, 1987. Construction of multilocus genetic maps in humans. Proc Natl Acad Sci USA
84:2363–2367.
Nadkarni PM, 1997. Mapdiff: determining differences
between two genomic maps. Comput Appl Biosci 13:
217–225.
Nadkarni PM, 1998. Mapmerge: merge genomic maps.
Bioinformatics 14:310–316.
Ott J, 1991. Analysis of human genetic linkage. Baltimore: Johns Hopkins University Press.
Ott J and Lathrop GM, 1987. Goodness of fit tests for
locus order in three-point mapping. Genet Epidemiol 4:
51–57.
Risch N and Giuffra L, 1992. Model misspecification and
multipoint linkage analysis. Hum Hered 42:77–92.
Terwilliger JD, Ding Y, and Ott J, 1992. On the relative
importance of marker heterozygosity and intermarker
distance in gene mapping. Genomics 13:951–956.
Terwilliger JD and Ott J, 1994. Handbook of human genetic linkage. Baltimore: Johns Hopkins University
Press.
Werner P, Raducha MG, Prociuk U, Henthorn PS, and
Patterson DF, 1997. Physical and linkage mapping of
human chromosome 17 loci to dog chromosomes 9 and
5. Genomics 42:74–82.
Corresponding Editor: Gregory M. Acland