MEDG520
Block 4
Mendelian Genetics
Concepts
 Linkage Analysis
 LOD scores
o A simple example of how to calculate a LOD score
o LOD Score Pitfalls
 CentiMorgan (cM)
 Genetic Association
 Linkage disequilibrium
 Compare and contrast linkage and association
 Transmission Disequilibrium Test (TDT)
 Dominant and recessive inheritance
o Autosomal Dominant Inheritance
o Autosomal Recessive Inheritance
o X-linked Inheritance
o Mitochondrial Inheritance
 Penetrance and expressivity
 Genetic Heterogeneity
 Haplotype
o Haplotype block
o Ancestral haplotype blocks
o Phase
o Informative/Uninformative
o Haplotype Mapping
 SNPs, synonymous vs non-synonymous
 Advantages / disadvantages of SNPs vs. other types of variants
 RFLP
 Microsatellites
 VNTR
 Founder effect, founder population
 Genetic Drift
 Genotype/Phenotype
 Pharmacogenetics and pharmacogenomics
 Simple vs complex traits
 Simple genetic disorders/disease
 Complex genetic disorders/disease
 Polygenic
 Complementation
o Intergenic vs intragenic complementation
o Complementation group
o Complications in Complementation analysis
 Homozygosity mapping
 Hardy-Weinberg equilibrium
 Evolutionary conservation
 Useful References
Linkage Analysis
 The only method that allows genetic mapping of genes (including disease genes) that are
detectable only as phenotypic traits. (No biochemical or molecular basis known for the gene of
interest).
 Methodology: Study the segregation of the disease in large families using polymorphic markers
from each chromosome. Eventually, one or more markers will be identified that cosegregate
with the disease more often than would be expected by chance (the two loci are linked).
 Genetic Markers: Characteristic loci located at the same place on a pair of homologous
chromosomes that allow us to distinguish one homologue from the other. They are usually DNA
sequence polymorphisms that can be detected by PCR.
 Synteny: Genes that reside on the same chromosome are said to be syntenic (whether linked or
unlinked)
 Recombination fraction (θ): A measure of the distance separating two loci or an indication of the
likelihood that a crossover will occur between them.
 If two loci are located very close to each other on a chromosome, with no recombination ever
occurring, and are transmitted together all the time, they are in complete linkage, i.e. θ ~ 0
 When equal numbers of recombinant and nonrecombinant genotypes are seen, the two loci are
said to be unlinked and θ = 0.5
 Linkage analysis is mostly used for mapping genes for classical mendelian traits.
LOD scores
 A method of measuring linkage
 The question is whether two loci are linked, that is, whether the recombination fraction (θ) between
them differs significantly from the 0.5 expected for unlinked loci.
 To answer it, we use a statistical tool called the likelihood odds ratio. We examine a set of
actual family data, count the number of children who do or do not show recombination
between the loci, and calculate the likelihood of observing the data at different values of θ (0 to 0.5)
relative to the likelihood of observing the same data if the loci were unlinked (θ = 0.5).
 The lod score is the logarithm to base 10 of this ratio.
 A lod score of +3 or greater is taken as confirmation of linkage. A lod score of -2 or less excludes linkage.
 The use of logarithms allows us to add results from different families together to get a stronger lod
score.
 A simple example of how to calculate a LOD score is shown in Figure 1.
Figure 1: A simple example of how to calculate a LOD score to measure linkage.
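As a rough illustration of the calculation described above, the following Python sketch computes a lod score for phase-known meioses and scans θ from 0 to 0.5 for the value that maximizes it. The function names and the example family (1 recombinant, 9 nonrecombinants) are assumptions made for illustration. They are not taken from Figure 1.

```python
import math

def lod_score(recombinants, nonrecombinants, theta):
    """Lod score for phase-known meioses at recombination fraction theta."""
    total = recombinants + nonrecombinants
    # Likelihood of the data if the loci are linked at this theta.
    linked = (theta ** recombinants) * ((1.0 - theta) ** nonrecombinants)
    # Likelihood of the same data if the loci are unlinked (theta = 0.5).
    unlinked = 0.5 ** total
    if linked == 0.0:
        return float("-inf")  # the data are impossible at this theta
    return math.log10(linked / unlinked)

def best_theta(recombinants, nonrecombinants, steps=50):
    """Scan theta over [0, 0.5] and return the (theta, lod) pair with the highest lod."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    scored = ((t, lod_score(recombinants, nonrecombinants, t)) for t in grid)
    return max(scored, key=lambda pair: pair[1])

# Hypothetical family: 1 recombinant and 9 nonrecombinant children.
theta_hat, z_max = best_theta(1, 9)
print(f"max lod {z_max:.2f} at theta = {theta_hat:.2f}")
```

Because lod scores are logarithms, the scores of independent families at the same θ can simply be added, which is why a single small family like this one rarely reaches +3 on its own.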
Lod Score Pitfalls
Lod score testing really only works well for traits with Mendelian inheritance. More complex
models are necessary for more complex traits as in cases of:
1) Genetic heterogeneity: the phenotype is affected by many loci, or different loci in different
families. Polygenic models can be tried to see if they give a better fit.
2) Incomplete penetrance: individuals carrying a gene may not show a phenotype. The analysis
can still proceed under the assumption that affected individuals carry the gene, but that unaffected
individuals are less informative.
3) Strategies for enrichment depend on the genetic causes, and different causes may suggest
opposite strategies.
For example:
a) In cases of low penetrance, families with high incidence give the clearest results.
b) If mutations have to be present at all of several loci to produce the trait, then the clearest
results will come from families with low incidence (because high incidence implies that more
than one copy of the alleles is in the genealogy).
CentiMorgan (cM)
 Unit used to measure genetic distance between two loci
 Definition: Genetic length over which one observes recombination 1 % of the time
 Example: A recombination fraction (θ) of 20% between two loci translates into an estimated
genetic distance of 20 cM between the two loci. This estimate is however only valid if the
number of offspring is sufficient to give us a confident and significant ratio of 80:20
(nonrecombinants to recombinants) compared to the 50:50 expected ratio for loci that assort
randomly.
Genetic Association
 Unlike linkage analysis, association is a method used mostly for complex traits to identify loci of
moderate or low effect.
 In this method, we choose a specific known locus and test which alleles are statistically associated
with the disease phenotype.
 It requires a much smaller number of samples than linkage studies to detect a contributing effect of
a gene.
 Association studies are a form of case control studies, in which the frequency of a particular allele
at a locus is compared among affected and unaffected individuals in the population.
 The strength of an association is measured by the odds ratio.
 Example:

                With allele    Without allele
    Patients    a              b
    Controls    c              d

The odds ratio is calculated from the frequency of a specific allele in patients and controls:
OR = (a/c) / (b/d) = ad/bc
The main disadvantage of association studies is population stratification. False positive results may
also arise by chance when screening a large number of candidate loci. In other words, an increased
odds ratio for an allele at a specific locus does not prove that the allele is involved in the
pathogenesis of the disease.
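The odds ratio from the example above can be computed directly from the four cell counts. This is a minimal sketch. The function name and the counts used in the example call are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio ad/bc for a 2x2 case-control table.

    a: patients with the allele,  b: patients without the allele
    c: controls with the allele,  d: controls without the allele
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)

# Hypothetical counts: 60 of 100 patients and 30 of 100 controls carry the allele.
print(odds_ratio(60, 40, 30, 70))
```

An odds ratio above 1 indicates that the allele is more common among patients than among controls. It says nothing by itself about whether the difference is statistically significant or causal.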
Linkage disequilibrium
 The association of two linked alleles more frequently than would be expected by chance.
 The demonstration of Linkage disequilibrium in a particular disease suggests that the mutation
which has caused the disease occurred relatively recently and that the marker locus being studied
is very closely linked to the disease locus.
 It is counter-intuitive, but linkage does not require linkage disequilibrium; only association
does. LD refers to specific alleles (which is what association is interested in). Linkage only
requires that a particular locus be linked to the disease. It could be defined by different marker
alleles, linked to the disease in different families.
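One common way to put a number on "more frequently than would be expected by chance" is the disequilibrium coefficient D, the difference between the observed frequency of a two-allele haplotype and the frequency expected if the alleles were independent. These notes do not define D, so the sketch below, its function name, and the example frequencies should all be read as illustrative assumptions.

```python
def ld_coefficient(p_ab, p_a, p_b):
    """D = observed haplotype frequency minus the frequency expected by chance.

    p_ab: frequency of chromosomes carrying both allele A and allele B
    p_a:  frequency of allele A,  p_b: frequency of allele B
    """
    return p_ab - p_a * p_b

# Hypothetical frequencies: both alleles at 0.5, but found together on 40% of chromosomes.
print(ld_coefficient(0.40, 0.5, 0.5))
```

A value of zero corresponds to no allelic association (linkage equilibrium). A positive value means the two alleles occur together more often than chance would predict.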
Is Linkage disequilibrium the only possible explanation for an association?
 Linkage disequilibrium is not the only possible reason for an association between a disease D and
allele A. Possible causes include the following:
o Direct causation - having allele A makes you susceptible to disease D. Possession of A is
neither necessary nor sufficient for somebody to develop D, but it increases the likelihood.
In this case one would expect to see the same allele A associated with the disease in any
population studied (unless the causes of the disease vary from one population to another).
o Natural selection - people who have disease D might be more likely to survive and have
children if they also have allele A.
o Population stratification - the population contains several genetically distinct subsets. Both
the disease and allele A happen to be particularly frequent in one subset. Lander and
Schork (1994) give the example of the association in the San Francisco Bay area between
HLA-A1 and ability to eat with chopsticks. HLA-A1 is more frequent among Chinese than
among Caucasians.
o Statistical artefact - association studies often test a range of loci, each with several alleles,
for association with a disease. The raw p values need correcting for the number of
questions asked (Section 12.5.1). In the past, researchers often applied inadequate
corrections, and associations were reported that could not be replicated in subsequent
studies.
o Linkage disequilibrium - close linkage can produce allelic association at the population
level, provided that most disease-bearing chromosomes in the population are descended
from one or a few ancestral chromosomes. If linkage disequilibrium is the cause of the
association, there should be a gene near to the A locus that has mutations in people with
disease D. The particular allele at the A locus that is associated with disease D may be
different in different populations
Compare and contrast linkage and association
 In principle, linkage and association are totally different phenomena. Association is simply a
statistical statement about the co-occurrence of alleles or phenotypes. Allele A is associated with
disease D if people who have D also have A more (or maybe less) often than would be predicted
from the individual frequencies of D and A in the population. For example, HLA-DR4 is found in
36% of the general UK population but 78% of people with rheumatoid arthritis. An association
can have many possible causes, not all genetic (see above). Linkage, on the other hand, is a
specific genetic relationship between loci (not alleles or phenotypes). Linkage does not of itself
produce any association in the general population. The STR45 locus is linked to the dystrophin
locus. Within a family where a dystrophin mutation is segregating, we would expect affected
people to have the same allele of STR45, but over the whole population the distribution of STR45
alleles is just the same in people with and without muscular dystrophy. Thus linkage creates
associations within families, but not among unrelated people. However, if two supposedly
unrelated people with disease D have actually inherited it from a distant common ancestor, they
may well also tend to share particular ancestral alleles at loci closely linked to D. Where the
family and the population merge, linkage and association merge
Linkage
 Deals with specific loci.
 Makes use of LOD score.
 A measure of how alleles segregate within families.
 Linkage requires knowledge of phase. Need to be able to distinguish recombinants from nonrecombinants in order to calculate a LOD score.
 Linkage makes use of well characterized pedigrees to identify haplotypes that are inherited intact
over several generations.
 Linkage analysis, combined with positional cloning, is a very powerful method for the detection of
loci responsible for simple Mendelian phenotypes. Historically it has had a very low false-positive
rate when a stringent LOD score of 3.0 is used (Risch, 2000). However, linkage tends to
identify very large regions encompassing hundreds or even thousands of genes. It is less effective
for the detection of genes with more subtle effects, such as those responsible for most complex
common diseases. So far, all genes identified by linkage and positional cloning, even those for
complex diseases, display Mendelian or near-Mendelian inheritance (reviewed in Risch, 2000). In
the case of complex diseases, linkage analysis tends to identify rare mutations or polymorphisms
unique to only a small subset of the diseased population (i.e. rare simple causes for otherwise
complex diseases).
Association
 Deals with specific alleles.
 A measure of how alleles and disease travel together.
 Depends on linkage disequilibrium.
 Association relies on retention of adjacent DNA variants over many generations (in historic
ancestries) and does not require specific knowledge of pedigrees.
 Association studies are generally much more powerful than linkage analyses when it comes to
predicting genetic components of complex diseases. Disease associated regions will be much
smaller than in linkage analysis, often encompassing only one gene or gene fragment. However,
there have been difficulties with reproducing these association studies. This could be the result of
poor study designs, incorrect assumptions about the underlying genetics of the population, or
overinterpretation of the data. To be effective, association studies often require very large sample
sizes, as many as several thousand patient samples (Cardon and Bell, 2001). Many mutations
or polymorphisms in a gene can create the same phenotype. Each will have its own ancestral
haplotypes and thus, reduce the power to detect associations between the phenotype and a specific
allele.
Transmission Disequilibrium Test (TDT)
 TDT starts with couples who have one or more affected offspring. It is irrelevant whether either
parent is affected or not. To test whether marker allele M1 is associated with the disease, we select
those parents who are heterozygous for M1. The test simply compares the number of such parents
who transmit M1 to their affected offspring with the number who transmit their other allele.
 The result is unaffected by population stratification. The TDT can be used when only one parent is
available, but this may bias the result (Schaid, 1998).

There has been some argument about whether the TDT is a test of linkage or association. Since it
asks questions about alleles and not loci, it is fundamentally a test of association. The associated
allele may itself be a susceptibility factor, or it may be in linkage disequilibrium with a
susceptibility allele at a nearby locus. The TDT cannot detect linkage if there is no disequilibrium
- a point to remember when considering schemes to use the TDT for whole-genome scans.
How is a TDT performed?
 Affected probands are ascertained.
 The probands and their parents are typed for the marker.
 Those parents who are heterozygous for marker allele M1 are selected. They may or may not
themselves be affected.
 Let a be the number of times a heterozygous parent transmits M1 to the affected offspring, and b
be the number of times the other allele is transmitted. The TDT test statistic is (a - b)^2 / (a + b). This
has a χ² distribution with 1 degree of freedom, provided the numbers are reasonably large.
 Other alleles at the M locus can be tested using the same set of families. If n marker alleles are
tested, each individual p value must be corrected by multiplying by (n-1).
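The steps above can be sketched in a few lines of Python. The function name, its parameters, and the transmission counts are hypothetical. The p value uses the identity that, for 1 degree of freedom, the upper tail of the χ² distribution equals erfc(sqrt(x/2)). The multiplication by (n-1) follows the correction stated above, capped at 1.

```python
import math

def tdt(a, b, n_alleles_tested=1):
    """Transmission disequilibrium test from transmission counts.

    a: times heterozygous parents transmitted M1 to an affected child
    b: times they transmitted their other allele
    Returns (statistic, raw p value, corrected p value).
    """
    if a + b == 0:
        raise ValueError("no informative transmissions")
    statistic = (a - b) ** 2 / (a + b)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    # Correction from the notes: multiply by (n - 1); assumed to be 1 when only one allele is tested.
    factor = max(n_alleles_tested - 1, 1)
    return statistic, p_value, min(1.0, p_value * factor)

# Hypothetical data: M1 transmitted 60 times, the other allele 40 times, 3 alleles tested.
stat, p_raw, p_corrected = tdt(60, 40, n_alleles_tested=3)
print(f"statistic = {stat:.2f}, p = {p_raw:.4f}, corrected p = {p_corrected:.4f}")
```

Only heterozygous parents enter the counts, which is why the result is not distorted by population stratification.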
Explain the symbols used for pedigrees
Figure 3.1. Main symbols used in pedigrees. Generations are usually labeled in Roman numerals,
and individuals within each generation in Arabic numerals; III-7 or III7 is the seventh person from
the left (unless explicitly numbered otherwise) in generation III. An arrow can be used to indicate
the proband or propositus (female: proposita) through whom the family was ascertained.
Dominant and recessive inheritance
Autosomal Dominant Inheritance
• All affected individuals should have an affected parent
• Both sexes should be equally affected
• Roughly 50% of the offspring of an affected individual should also be affected
Autosomal Recessive Inheritance
• Usually there is no previous family history
• The most likely place to find a second affected child is a sibling of the first
• Inbreeding increases the chance of observing an autosomal recessive condition
X-linked Inheritance
• Usually only males affected
• No cases of male to male transmission
• All the affected males can be linked through unaffected carrier females
Mitochondrial Inheritance
More examples of inheritance patterns
Figure 3.2. Basic mendelian pedigree patterns. (A) Autosomal dominant; (B) autosomal recessive;
(C) X-linked recessive; (D) X-linked dominant; (E) Y-linked. The risk for the individuals marked
with a query are (A) 1 in 2, (B) 1 in 4, (C) 1 in 2 males or 1 in 4 of all offspring, (D) negligibly low
for males, 100% for females. See Section 3.2 and Figure 3.5 for complications to these basic
patterns.
Penetrance and Expressivity
Terms that define aspects of the variation in the expression of genes.
Penetrance
 The probability of a gene having any phenotypic expression at all.
 The likelihood, or probability, that a condition or disease phenotype will, in fact, appear when a
given genotype known to produce the phenotype is present. In other words, a genotype may give
rise to a particular phenotype only in a proportion of individuals: penetrance.
 If the frequency of expression of a phenotype is less than 100%, then some of those who have the
gene may completely fail to express it. The gene is said to show reduced penetrance.
 80% penetrance: 80% of heterozygotes express the condition.
 It can be described in a statistical manner. The probability that a particular phenotype P will
be observed in an individual of genotype G, Pr(P|G), is the penetrance.
 If every person carrying a gene for a dominantly inherited disorder has the mutant phenotype, then
the gene is said to have 100% penetrance. Similarly, if only 30% of those carrying the mutant
exhibit the mutant phenotype, the penetrance is 30%.
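Read as a conditional probability, Pr(P|G), penetrance can be estimated in the simplest case as the fraction of known carriers who show the phenotype. The sketch below does only that. The function names and counts are hypothetical, and the second function assumes an autosomal dominant condition in which an affected heterozygous parent transmits the mutant allele to half of the offspring.

```python
def penetrance(affected_carriers, total_carriers):
    """Naive estimate of Pr(phenotype | genotype) from carrier counts."""
    if total_carriers <= 0:
        raise ValueError("need at least one carrier")
    return affected_carriers / total_carriers

def expected_affected_offspring(penetrance_value, transmission=0.5):
    """Expected fraction of affected offspring of a heterozygous carrier (dominant trait)."""
    return transmission * penetrance_value

# Hypothetical data: 24 of 30 known heterozygotes express the condition.
f = penetrance(24, 30)
print(f, expected_affected_offspring(f))
```

With 80% penetrance, only about 40% of the offspring of an affected heterozygote would be expected to show the condition, rather than the 50% seen with full penetrance.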
Expressivity
 The range of variation that is seen in a phenotype; it refers to the degree of expression of a given
trait or combination of traits that is associated with a gene. Conditions may have severe or mild
symptoms; they may have symptoms that show up in one organ or combination of organs in one
person but not in the same locations in other persons.
Estimation of Penetrance:
 Family data, such as the genetic relationships of all individuals within each family.
 Phenotypic data, such as disease status, age at first diagnosis, or age last known to be unaffected.
 Mutation status of all tested individuals.
 The penetrance is then estimated using a logistic regression model, usually designed to allow for
correlations within families and to include both genetic and non-genetic factors.
Genetic Heterogeneity
A single disorder, trait, or pattern of traits caused by genetic factors in some cases and non-genetic factors
in others. For example, in many complex diseases, the effect of genetic risk factors can depend on the
presence of specific environmental factors. In clinical settings, genetic heterogeneity refers to the presence
of a variety of genetic defects that cause the same disease, often due to mutations at different loci or at
different sites within the same gene, a finding common to many human diseases including Alzheimer's disease,
cystic fibrosis, lipoprotein lipase deficiency and polycystic kidney disease.
Haplotype
 A haplotype is the particular combination of alleles (usually identified by SNPs) on one
chromosome or a part of a chromosome.
 Haplotypes can be exploited for the fine mapping of disease genes. The principle of haplotype
mapping is shown in the figure. A new mutation responsible for a genetic disease always enters
the population within an existing haplotype, which is termed the ancestral haplotype. Over several
generations, recombination events may occur within the haplotype but the disease allele and the
closest SNPs still tend to be inherited as a group. If this haplotype can be identified in a group of
patients with the disease, typing the alleles within the haplotype allows a conserved region to be
identified, which pinpoints the mutation responsible for the disease. Due to the abundance of
SNPs, this technique has the potential to map genes very accurately. There is therefore much
interest in developing a haplotype map of the entire human genome.
Haplotype block
 Some SNPs may be in linkage disequilibrium and are inherited in blocks. A haplotype block is
thus a discrete chromosome region of high linkage disequilibrium and low haplotype diversity. It
is expected that all pairs of polymorphisms within a block will be in strong linkage disequilibrium,
whereas other pairs will show much weaker association.
 Blocks are hypothesized to be regions of low recombination flanked by recombination hotspots.
 Blocks may contain a large number of SNPs, but a few SNPs are enough to uniquely identify the
haplotypes in a block. The HapMap is a map of these haplotype blocks and the specific SNPs that
identify the haplotypes are called tag SNPs.
Ancestral haplotype blocks
 An ancestral haplotype block is passed from generation to generation just like a familial haplotype
block, but it is also found at higher than expected frequencies in the population at large, among
people who are not closely related, i.e. it arose in some distant common ancestor.
Phase
 Linked alleles on the same chromosome are said to be in coupling
 Alleles on different homologues are said to be in repulsion
 The alleles in coupling at a set of closely linked markers constitute what is known as the haplotype
for those loci
Informative/Uninformative
Informative markers:
 highly polymorphic (> 75% heterozygosity)
Uninformative markers:
 low heterozygosity, so that most people have the same genotype
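How polymorphic a marker is can be summarized by its expected heterozygosity, one minus the sum of the squared allele frequencies. This formula is not stated in these notes. It assumes random mating, and the function name and the allele frequencies below are hypothetical.

```python
def expected_heterozygosity(allele_freqs):
    """Expected fraction of heterozygotes, 1 - sum(p_i^2), assuming random mating."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)

# Hypothetical markers: four equally common alleles versus one dominant allele.
print(expected_heterozygosity([0.25, 0.25, 0.25, 0.25]))
print(expected_heterozygosity([0.9, 0.1]))
```

The first marker reaches the 75% level mentioned above. The second leaves most people homozygous for the same allele and so is rarely informative.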
Informative pedigrees:
 allow phase to be determined
 heterozygous parents with different genotypes (i.e. father: 1,2; mother: 3,4) or heterozygous for the
same alleles but offspring is homozygous (i.e. father: 1,2; mother: 1,2; offspring: 1,1)
Uninformative pedigrees:
 phase cannot be determined
 parents homozygous (i.e. father: 1,1; mother: 1,1 or father: 1,1; mother: 2,2)
 parents heterozygous and offspring heterozygous for the same alleles (i.e. father: 1,2; mother: 1,2;
offspring: 1,2)
A meiosis is informative if we can identify whether the gamete is recombinant.
Assume father has dominant condition, inherited along with A1
A) Uninformative: the father's marker alleles are homozygous and cannot be distinguished
B) Uninformative: child could have inherited A1 or A2 from either parent
C) Informative: child inherited A1 from the father
D) Informative: child inherited A1 from the father
Haplotype Mapping
Fig. Principle of haplotype mapping. A new mutation (X) arises in the proximity of six single nucleotide
polymorphisms, with the ancestral haplotype signature TATCAT. Over several generations, the haplotype
signature may be eroded by recombination. For example, contemporary haplotype 1 was produced by
recombination between the first and second SNPs; the new alleles are shown in pink. However, the
smallest conserved haplotype signature in all patients carrying the disease allele places the disease
between SNPs 3 and 4. This technique provides a candidate region of about 10,000 bp, which is smaller
than most human genes.
Useful review:
Cardon, LR and Abecasis GR. 2003. Using haplotype blocks to map human complex trait loci.
19(3):135-140
SNP, synonymous, non-synonymous
A SNP or single nucleotide polymorphism is a change in which a single base in the DNA differs from the
usual base at that position.
 The SNP consortium (http://snp.cshl.org/) has discovered and characterized nearly 1.8 million SNPs
to date.
 Some SNPs such as that which causes sickle cell are responsible for disease. Other SNPs are normal
variations in the genome.
 SNPs account for 90% of all DNA polymorphisms.
 SNPs can result from either the transition or transversion of nucleotide bases
o Transitions are base changes to the same type of base – that is, a change between A and G
(purines) or between C and T (pyrimidines)
o Transversions are base changes to a different type of base, that is, a change from a purine
to a pyrimidine or vice versa
 Nucleotide substitutions occurring in protein-coding regions are classified as synonymous or
non-synonymous according to their effect on the resulting protein
 A substitution is synonymous if it causes no amino acid change
o This can occur because the genetic code is degenerate, so more than one triplet sequence
can code for the same amino acid
 A non-synonymous substitution results in alteration in the encoded amino acid
 The nonsynonymous mutations can be further classified into missense and nonsense mutations
o A missense mutation results in amino acid changes due to the change of codon
o A nonsense mutation results in a termination codon.
 In general, the higher the frequency of a SNP allele, the older the mutation that produced it, so
high-frequency SNPs largely predate human population diversification
Synonymous Substitution = A mutation that replaces one codon with another without changing the
amino acid that is specified. This can occur because the genetic code is degenerate, so more than one
codon can code for the same amino acid.
Conservative Substitution = A mutation that causes one codon to be replaced by another that specifies a
different amino acid, but one that has similar chemical properties to the original amino acid.
Non-conservative Substitution = Replacement of one codon by another, which specifies an amino acid
with different chemical properties.
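The classifications above can be expressed as a short Python sketch that labels a single-base change as a transition or transversion and labels a codon change as synonymous, missense, or nonsense. It uses the standard genetic code written in the compact TCAG ordering. The function names are hypothetical, and the conservative versus non-conservative distinction is left out because it depends on a chosen grouping of amino acids.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
PURINES = {"A", "G"}

def substitution_type(ref_base, alt_base):
    """Transition if both bases are purines or both are pyrimidines, else transversion."""
    same_class = (ref_base in PURINES) == (alt_base in PURINES)
    return "transition" if same_class else "transversion"

def coding_effect(ref_codon, alt_codon):
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"  # note: a lost stop codon would also land here in this sketch

# The sickle cell change GAG -> GTG (glutamate to valine) is an A-to-T substitution.
print(substitution_type("A", "T"), coding_effect("GAG", "GTG"))
# A third-position change that leaves the amino acid unchanged.
print(substitution_type("A", "G"), coding_effect("GAA", "GAG"))
```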
Synonymous vs. Non-Synonymous
Synonymous:
- high frequency of occurrence
- no selection pressure
- mostly due to 3rd base wobble
Non-Synonymous:
  Nonsense mutations:
  - results in a stop codon
  - leads to decreased gene function
  - increased selection pressure
  - decreased frequency of occurrence
  Missense mutations:
  - AA → different AA
    Conservative:
    - AA → AA that is chemically similar
    - does not usually alter gene function
    Non-conservative:
    - AA → AA (dissimilar side chain)
    - different charge
    - different polar side chains
Web source:
http://www-hto.usc.edu/~cbmp/2001/SNP/index.html
Advantages / disadvantages of SNPs vs. other types of variants
                 Microsatellites               SNPs
Polymorphism     Highly polymorphic            Not highly polymorphic (only 4 possible outcomes)
Frequency        Less common                   More common
Use              Map disease, haplotype map    Haplotype map
Restriction Fragment Length Polymorphism (RFLP)
Polymorphism refers to the DNA sequence variation between individuals of a species. If the sequence
variation occurs at the restriction sites, it could result in RFLP. The most well known example is the
RFLP due to β-globin gene mutation.
Restriction Fragment Length Polymorphism (RFLP) resulting from β-globin gene mutation. In the
normal cell, the sequence corresponding to the 5th to 7th amino acids of the β-globin peptide is
CCTGAGGAG, which can be recognized by the restriction enzyme MstII. In the sickle cell, one base is
mutated from A to T, making the site unrecognizable by MstII. Thus, MstII will generate 0.2 kb and 1.2
kb fragments in the normal cell, but a single 1.4 kb fragment in the sickle cell. These different fragments
can be detected by Southern blotting.
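The loss of the restriction site can be checked with a small pattern match. MstII recognizes CCTNAGG, where N is any base. The sketch below tests the two nine-base sequences from the text. The function name is hypothetical, and the fragment sizes are simply those quoted above rather than computed from a real sequence.

```python
import re

# MstII recognition site CCTNAGG, with N standing for any base.
MSTII_SITE = re.compile(r"CCT[ACGT]AGG")

def is_cut_by_mstii(sequence):
    """Return True if the sequence contains an MstII recognition site."""
    return MSTII_SITE.search(sequence.upper()) is not None

def expected_fragments_kb(sequence):
    """Fragment sizes quoted in the notes: two fragments if the site is present, one if not."""
    return [0.2, 1.2] if is_cut_by_mstii(sequence) else [1.4]

normal = "CCTGAGGAG"   # codons 5 to 7 of the normal beta-globin gene
sickle = "CCTGTGGAG"   # A -> T change destroys the site
print(expected_fragments_kb(normal), expected_fragments_kb(sickle))
```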
Microsatellites
 A microsatellite is a short sequence of DNA, usually 1 to 4 basepairs that is repeated in a row along
the DNA molecule. Many repeats tend to be concatenated at the same locus.
 There are hundreds of places in human DNA that contain microsatellites. The number of repeats at a
particular locus is highly polymorphic between individuals of the same species. The hypervariability
arises because the repeated simple sequences cause a high frequency of loss or insertion of
additional repeats by confusing the DNA replication machinery.
 This hypervariability allows microsatellite sequences to be used for genetic fingerprinting and
paternity testing. Most loci of the genome, even non-coding parts, would be too similar to allow
individuals to be reliably distinguished.
 Microsatellites may also be used as genetic markers to track inheritance in families and investigate
genetic associations with disease.
VNTR
VNTR is an acronym for "variable number of tandem repeats". These are short identical segments of
DNA aligned head to tail in a repeating fashion and are highly variable between individuals.
In any particular chromosome the repeat number may vary from one to thirty repeats. Since these repeat
regions are usually bounded by specific restriction enzyme sites, it is possible to cut out the segment of
the chromosome containing this variable number of tandem repeats or VNTR's, run the total DNA on
a gel, and identify the VNTR's by hybridization with a probe specific for the DNA sequence of the repeat.
Shown to the right at the top are the chromosomes of the two parental individuals of the pedigree
below. The first individual has one chromosome with 4 repeated sequences and one chromosome with 6
repeated sequences. The other individual has one chromosome with 3 repeated sequences and one
chromosome with 5 repeated sequences.
At the bottom of the figure is a pedigree of the mating between these two individuals and their four
children. The DNA of each of the individuals has been analyzed for the VNTR repeat number and the gels
are shown below each individual along with the genotype for each individual. Notice that each of the six
people is distinguishable from each other by the VNTR's at this one genetic locus. If several VNTR loci
were used, the uniqueness of each individual would become even more distinct.
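The genotypes possible among the children of the two parents described above (4 and 6 repeats, 3 and 5 repeats) can be enumerated by taking one allele from each parent. This is a minimal sketch with a hypothetical function name. It lists the possible genotypes only and does not reproduce the gel in the figure.

```python
from itertools import product

def offspring_genotypes(parent1, parent2):
    """All distinct genotypes formed by one allele from each parent."""
    return sorted({tuple(sorted(pair)) for pair in product(parent1, parent2)})

# Repeat numbers at the VNTR locus for the two parents in the pedigree.
children = offspring_genotypes((4, 6), (3, 5))
print(children)
```

Because the parents share no alleles and each is heterozygous, there are four possible offspring genotypes, all different from one another and from both parents, which is consistent with every person in the pedigree being distinguishable at this one locus.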
Founder effect, founder population
 Founder effect: a form of genetic drift in which one of the original founders of a new group just
happens to carry a relatively rare allele. This allele will then have a much higher frequency in the
new group than it had in the larger group from which the new group was derived.
 Outcome of a founder effect: Each population may be characterized by its own particular
molecular mutations as well as by an increase or decrease in specific diseases.
Genetic Drift
 The fluctuation in allele frequency due to chance operating on the small gene pool contained
within a small population. It causes high frequencies for deletrious disease alleles in a population
 fluctuation in allele frequency due to chance affecting the gene pool
 One type of genetic drift is the founder effect. It occurs when a small group breaks off from a
larger population to found a new colony. The resulting gene pool is skewed because of the lack of
random mating and because of the small population size. Alleles that were rare may become
more frequent. For example, the population of Newfoundland has a much higher incidence of
certain genetic disorders: the population was isolated and began as a small sample of a
larger population.
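Drift can be illustrated with a minimal simulation. The sketch below uses the Wright-Fisher model, which is an assumption and not something these notes name: each generation, 2N alleles are drawn at random from the previous generation's gene pool. The starting frequency, population sizes and seed are arbitrary illustrative values.

```python
import random

def wright_fisher(p0, n_individuals, generations, seed=0):
    """Track one allele's frequency under random sampling alone.

    Assumed model (Wright-Fisher): every generation, 2N alleles are drawn
    independently from the previous generation's allele frequency.
    """
    rng = random.Random(seed)
    n_alleles = 2 * n_individuals
    p = p0
    freqs = [p]
    for _ in range(generations):
        # Each of the 2N alleles is a copy of the tracked allele with probability p.
        count = sum(1 for _ in range(n_alleles) if rng.random() < p)
        p = count / n_alleles
        freqs.append(p)
    return freqs

small = wright_fisher(0.05, n_individuals=20, generations=50)
large = wright_fisher(0.05, n_individuals=2000, generations=50)

print("small population:", [round(f, 3) for f in small[::10]])
print("large population:", [round(f, 3) for f in large[::10]])
```

In a small population the frequency tends to wander widely and the allele may be lost or fixed; in a large one it usually stays close to its starting value.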
Genotype/Phenotype
Genotype is defined as the combination of alleles or genes that affect a particular trait. Phenotype is
defined as the physical and physiological traits of an individual resulting from genotype and environment.
The phenotype is the distinctive expression of a genotype in a given environment.
An organism's genotype is the largest influencing factor in the development of its phenotype, but it is not
the only one. Even two organisms with identical genotypes normally differ in their phenotypes. This is
clearly illustrated by phenotypic discordance in monozygous twins. Monozygous twins share the same
genotype, since their genomes are identical; but they never have the same phenotype, although their
phenotypes may be very similar.
The concept of phenotypic plasticity describes the degree to which an organism's phenotype is
determined by its genotype. A high level of plasticity means that environmental factors have a strong
influence on the particular phenotype that develops. Consequently, a gene that predisposes a person to a
certain form of cancer might not cause the disease in an individual depending on their lifestyle or other
environmental factors. If there is little plasticity, the phenotype of an organism can be reliably predicted
from knowledge of the genotype, regardless of environmental peculiarities during development.
In contrast to phenotypic plasticity, the concept of genetic canalization addresses the extent to which an
organism's phenotype allows conclusions about its genotype. A phenotype is said to be canalized if
mutations do not noticeably affect the physical properties of the organism. This means that a canalized
phenotype may form from a large variety of different genotypes, in which case it is not possible to exactly
predict the genotype from knowledge of the phenotype. If canalization is not present, small changes in the
genome have an immediate effect on the phenotype that develops.
Pharmacogenetics and pharmacogenomics
These are both terms used to describe the study of how variation in human genes leads to variation in
response to drugs. The difference between the two is one of scale:
Pharmacogenetics is used in reference to one or a few genes.
Pharmacogenomics refers to large-scale genomic approaches, where the whole genome or a
substantial number of genes is involved.
So essentially, both terms describe genetic determinants of drug disposition and response and are often
used interchangeably.
Many examples show that the way different people respond to drugs can be attributed, at
least in part, to differences in genes encoding
 Drug metabolizing enzymes: ~20-30 enzymes can interact with nearly every chemical to which
the body is exposed
 Drug transporters
 Drug targets such as receptors and enzymes.
These are, however, in addition to non-genetic factors such as age, race, sex, renal and liver function, drug
interaction, the severity of the disease and other lifestyle variables like smoking and alcohol consumption.
There are also applications in monitoring drug delivery because the concentration of circulating free drug
is dependent on absorption, distribution, metabolism and elimination – these are pharmacokinetic
processes and vary between individuals.
Example:
Mercaptopurine causes severe myelosuppression in some patients (~1 in 300).
This is because some people have very low levels of thiopurine methyltransferase, the enzyme that
metabolizes mercaptopurine.
Since this correlation was discovered, a clinical test has been introduced that measures an individual’s
enzyme level before the drug is administered.
Simple vs complex traits
Simple genetic disorders are so called because a single gene underlies them. Thus, a mutation in this
gene is sufficient to cause manifestation of the disease. These disorders generally follow a Mendelian
pattern of inheritance.
Examples include Huntington’s, cystic fibrosis and sickle cell anaemia.
Complex genetic disorders
Some disorders are determined by changes in more than one gene. These disorders, known as
complex disorders, do not follow the same predicted pattern of inheritance seen in autosomal or X-linked
dominant and recessive disorders. Sometimes changes in these genes must be in combination with certain
environmental factors, such as exposure to certain chemicals or medications or maybe even diet. This type
of inheritance is often referred to as multifactorial because many different factors, genetic and/or
environmental, are involved. A person will have a complex disorder if he or she has the right combination
of changed genes and environmental exposures. Sometimes these disorders are caused by changes in one
or more genes that make a person susceptible to developing the disorder after exposure to specific
environmental factors. The close relatives of someone with a complex disorder have a higher chance of
later developing the disorder than the close relatives of someone who does not have the disorder.
Diabetes, heart disease, neural tube defects, autism, Alzheimer disease, and many cancer syndromes are
examples of disorders that can be caused by multifactorial, or complex, inheritance.
Polygenic
A term used to describe diseases that are caused by changes in more than one gene. Most common
human diseases are considered polygenic (e.g., asthma, diabetes, obesity, osteoporosis, etc.).
The next scientific frontier will be those polygenic disorders involving a combination of gene
polymorphisms, each of which contributes in some small way to pathology. Examples of such conditions
include a variety of mental and behavioral problems (alcoholism, schizophrenia, depression), as well as
physiological disorders that involve complex interactions between genetics and environment
(atherosclerosis, hypertension). These problems will require the ability to look at patterns of gene
expression at multiple stages of disease/disorder progression and under a variety of physiological and
environmental conditions. Array technologies may provide just such capabilities.
Complementation
Figure 3.3. Complementation: parents with autosomal recessive profound hearing loss often have
children with normal hearing. II6 and II7 are offspring of unaffected but consanguineous parents,
and each has affected sibs, making it likely that each has autosomal recessive hearing loss. All their
children are unaffected, showing that II6 and II7 have nonallelic mutations.
A complementation analysis asks if two putative alleles [1], when in the same cell [2] and acting
independently [3], can supply all functions necessary [4] for a wild-type phenotype [5]. Complementation is
therefore a test of function. The numbered notes in this definition are explained below:
1. The "two putative alleles" refers to two versions of the same region of the chromosome, each of
which separately confers a mutant phenotype. They are termed "putative" alleles since it is their
very "allelism" which will be determined in this test (they are allelic if they are in the same
complementation group). Each allele ought to be present in a single copy in the cell, and it
is crucial that the entire relevant region be present in diploid form in the cell. Such a partially diploid
cell is termed merodiploid. An inverse genetics application is "cloning by complementation", but
this will have many of the same concerns as standard complementation with the added concern of
copy effects if multi-copy plasmids are used.
2. The two alleles can either be present on the chromosome or on extra-chromosomal elements. If
either version is in more than one copy, there can be both regulatory complications (e.g. titration
of a regulatory factor) and difficulties in interpretation (e.g. you do not know if a positive result is
due to inappropriate quantities of the product encoded by the multi-copy gene).
3. Care must be taken that the mutations cannot recombine to form a wild-type genotype so that
typically Rec- strains are used.
4. Only functions absolutely necessary for the desired phenotype, under the conditions used, are
"demanded" by a complementation test. Mutations affecting genes whose products are not
essential for the desired phenotype will not be tested for in complementation analysis.
5. The "wild-type phenotype" demanded by this analysis should be more rigorously called an
"apparently wild-type phenotype under the conditions used". The phenotype is typically scored as
"growth" or "no growth", but biochemical assays of the encoded gene product can be performed
for more precise quantitation.
Figure 29 gives an idea of the results one could expect from straightforward complementation tests. In
these examples when the two mutations in the separate mutant alleles affect the same gene, then neither is
capable of generating a wild-type product of that gene and the resultant merodiploid strain is mutant in
phenotype. On the other hand, if the two mutations affect different genes, so that each copy of the region
is able to generate some of the gene products required (and between them all necessary gene products are
synthesized) then the resulting strain is phenotypically wild-type. One problem with this set of examples
is that no one (in doing bacterial genetics) routinely puts the two alleles in the "cis"-configuration as a
control for complementation (you do build such strains for other purposes, however). It is too hard (for
reasons we will cover when we get to "mapping") and it provides very little information, since the
presence of the wild-type allele on the other copy will nearly always be dominant. It is, however, often
appropriate to consider effects of a mutation on genes in cis, but this is not the same as generating "double
mutants" affected in the same small region.
There are three sorts of controls useful in analyzing the results of complementation experiments (see Fig.
30): (a) If either copy of the merodiploid contains a wild-type region, the phenotype of the resulting strain
should be a wild-type phenotype, and the wild type is said to be dominant to the mutant. If it is not, the
mutant allele is said to be trans-dominant to the wild-type (see section VII B). In either case the
merodiploid has the phenotype of whichever allele is dominant. (b) A merodiploid strain constructed with
the same mutant allele in each copy should display the mutant phenotype. If it does not, it suggests that
mere diploidy for the region of interest can confer a wild-type phenotype. One way this could occur
would be if the mutation conferred a leaky phenotype, so that a double dose might yield a pseudo-wild-type
response. (c) The result should not depend on the location of the alleles; i.e. the same result should
obtain no matter which allele is on the chromosome. If this is not true, it indicates that the two locations
are not equivalent and therefore the test has marginal validity. This is a variation on the concerns noted
for multi-copy plasmids above.
                   Chromosomal Allele
Plasmid Allele     1-   2-   3-   4-   wt
1-                 -    -    -    -
2-                 -    +    +    +    +
3-                 -    +    -    +    +
4-                 -    +    -    -    +
wt                 -    +    +    +    +
Intergenic vs intragenic complementation
Complementation test
 A mating test to determine whether two different recessive mutations (a1; a2) on opposite
chromosomes (trans, a1+/+a2) of a diploid cell fail to complement each other (i.e. give a mutant
phenotype), while the same two recessive mutations on the same chromosome (cis, a1a2/++) in a
diploid or partial diploid give a wild-type phenotype; a test for allelism. In other words, a test to
determine whether two mutant sites are in the same functional unit or gene.
 complementation requires no knowledge of affected genes or proteins, just the ability to examine
cells for the correction of mutant phenotype
 represents mixing of gene products, not changes in genotypes of individual chromosomes
 complementation can be:
o Intergenic: mutual correction of phenotype, affected genes are different
o Intragenic: complementation tests in heterokaryons yield positive results even though the
mutations in the mutant cells are known to affect the same gene
 demonstrates that patients have different but allelic mutations
 occurs when affected proteins are homomultimers, indicating that mutant subunit
from one allele interacts with mutant subunit from other allele to improve function
of protein.
Complementation group
 Complementation group: when two groups complement, they are genetically different, with either
different mutations in the same gene (intragenic) or mutations in different genes (intergenic)
 A group of mutants (or affected individuals) that do not complement each other.
 also known as a cistron (determined by the cis-trans complementation test); this term is hardly used
anymore
 Example: the Fanconi anaemia studies have identified 8 different complementation groups. If
cells from two mutants complement when fused together, the mutants are put into separate
complementation groups.
 Example: Xeroderma pigmentosum (XP)
o autosomal recessive disease associated with increased frequency of sunlight induced skin
cancer
o caused by mutation in any one of 8 genes, 7 of which encode components of DNA repair mechanisms
o diagnosis is made based on assignment to a complementation group according to the
fusion of xeroderma pigmentosum fibroblasts
Intergenic complementation is more common, e.g. the cobalamin complementation groups have all
mapped to different genes
Intragenic complementation probably involves mutations affecting different protein domains and is less common
Complementation Studies
 tests whether mutant sites are responsible for the same phenotype
 cells from two different people with (a) the same inherited disorder or (b) different inherited
diseases are fused
 cell fusion is induced with polyethylene glycol
 this creates 1 cell with 1 nucleus containing 2 sets of chromosomes
 therefore, get 3 cell combinations:
o cells from individual 1 fuse
o cells from individual 2 fuse
o 2 cell types fuse together
 with two cell types fusing together, if this rescues the phenotype, the cells complement and have
mutations at different sites
 even though cells may complement, could still affect the same step but from a different angle
 a cell can also be fused with a clone: if the clone complements, it corrects the phenotype and
therefore contains the gene(s) that are defective in the cell
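The sorting of mutants into complementation groups from pairwise fusion results can be sketched as follows. The mutant names and results are hypothetical, and the sketch assumes the simple case in which failure to complement always means "same group" (no intragenic complementation and no dominant mutations).

```python
def complementation_groups(mutants, complements):
    """Group mutants so that members of a group fail to complement each other.

    `complements` maps a frozenset of two mutant names to True (fusion rescues
    the phenotype) or False (fused cells are still mutant).
    Assumption: non-complementation is transitive, so groups can be merged.
    """
    parent = {m: m for m in mutants}

    def find(m):
        # Follow links up to the representative of m's group.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for pair, rescued in complements.items():
        if not rescued:
            a, b = tuple(pair)
            parent[find(a)] = find(b)  # same group: merge

    groups = {}
    for m in mutants:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical fusion results for four patient cell lines.
results = {
    frozenset({"P1", "P2"}): False,  # no rescue: same group
    frozenset({"P1", "P3"}): True,
    frozenset({"P1", "P4"}): True,
    frozenset({"P2", "P3"}): True,
    frozenset({"P2", "P4"}): True,
    frozenset({"P3", "P4"}): False,  # no rescue: same group
}

groups = complementation_groups(["P1", "P2", "P3", "P4"], results)
print(groups)
```

With these made-up results the four cell lines fall into two complementation groups.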
Complications in Complementation Analysis
The above examples would seem to suggest that if two mutations complement each other, then they must
affect different genes and gene products. This would suggest that the results of complementation analysis
would be to define the number of genes in the region. In fact, what complementation analysis does is to
define the number of cistrons or complementation groups. More often than not, the number of cistrons
will be coincident with the number of genes, but there are a number of special cases where this correlation
will not hold. The complications that give rise to these special cases are discussed below and they fall into
two general classes: when non-complementing mutations actually do map to separate
genes (paragraphs 1 and 2 below), and when complementing mutations actually map to
the same gene (paragraph 3 below). Examples of the first class will be detected when
the appropriate controls are done, as described above. The second class will be seen as an anomaly in the
actual complementation results.
1. Cis-dominant mutations are a reasonably common type of complication in complementation
analyses. Cis-dominant mutations are those that affect the expression of genes encoded on the
same piece of DNA (as the mutation itself), typically transcriptionally downstream, regardless of
the nature of the trans copy. Such mutations exert their effect, not because of altered products they
encode, but because of a physical blockage or inhibition of RNA transcription. There are two
dissimilar examples of these sorts of mutations: (a) If a mutation in a transcriptionally upstream
gene exhibits strong polarity onto downstream genes, then that mutation has the property of
eliminating more than one gene product function. (b) Similarly, a mutation in the promoter or in
other regulatory regions outside the translated area, may well eliminate transcription of the entire
operon and thus be negative in complementation for all gene functions encoded by that operon. In
each of these cases, the mutation is eliminating the function of genes that are themselves
genotypically wild type. The mutations are said to be cis-dominant because the expression of the
genes downstream on the same piece of DNA will be turned off regardless of the genotype present
in the trans copy.
2. Negative complementation. Another complication involves the very rare phenomenon known as
negative complementation or trans-dominant mutations with mutant phenotypes. Mutations of this
type cause the resultant merodiploid strain to have a mutant phenotype even when the other copy
of the region is genotypically wild type. The phenotype of the mutant allele is thus trans-dominant
to the wild type (obviously the reason that wild type is dominant to most mutants is because it
supplies the function that they have lost by mutation). There are three general schemes that can be
envisaged for mutations causing this sort of phenotype. In each of them, it is necessary to propose
that the mutant allele generates a product that, while not wild type, nevertheless possesses some
activity that leads to the mutant phenotype. Possibilities include (a) multimeric enzymes where the
merodiploid strain would generate multimers whose subunits come from both the mutant and
wild-type genes in a random assortment. As shown in figure 31, if the protein was a tetramer, and
if any multimer containing one or more mutant subunits was completely inactive, then the
presence of the mutant chain would decrease the amount of functional wild-type gene product by
approximately 8-fold (this number ignores regulation and assumes a two-fold dosage of the
product due to a two-fold dosage of the gene). (b) The mutant gene might cause the generation of
an altered protein that interfered in some reaction with the cell and thus caused a deleterious
phenotype. In this case, the presence of a wild-type allele would restore the function missing in the
mutant but would not eliminate the deleterious phenotype caused by the mutant protein. Thus, the
mutant phenotype would be dominant to wild type. (c) It is also conceivable that the mutant copy
generates an altered protein that, while it could not carry out the wild-type function, might be
competitive with the wild-type gene product. In each case, an altered product is responsible for the
trans dominance. Remember, these are rare, special cases: in general, the wild-type allele is
dominant to the mutant since the latter typically involves loss of function which is "replaced" by
the product of the wild-type gene. Such trans-dominant mutants are very appropriate for further
biochemical analysis because the protein product has altered function, rather than merely a lack of
function.
3. Intragenic complementation is yet another possible complication in complementation analysis.
This term refers to cases where two mutations that do affect the same gene, and therefore the same
gene product, are able nonetheless to give a wild-type phenotype in a complementation analysis.
There are two general cases of such a phenomenon: (a) If the product of the gene in question is a
bi-functional protein, especially when those functions are independent of one another, then the
gene itself will often show intragenic complementation. Such an example is easiest to understand
if the product is pictured as "two beads on a string". If each "bead" had an independent enzymatic
function, one could imagine that a mutation affecting either (but not both) of the two functions
might well leave the other function intact. If two such mutations were put in a merodiploid
situation, each would be able to produce one of the two required enzymatic functions, giving rise
to a wild-type phenotype. In the case of such a gene, intragenic complementation would be fairly
common, since many mutations would affect only one of the two functional regions. This
model also predicts that mutations affecting each of the two functions would cluster at either end
of the gene creating two clear complementation groups. (b) It is also possible, though less likely,
for pairs of complementing mutants to occur in cases where the gene product is a multimeric
protein. In such cases, a particular mutant allele might give rise to a protein product that can only
function when allowed to aggregate with the product of another particular mutant allele. Such an example is
sketched below. In this case, unlike the case of bifunctional protein above, instances of intragenic
complementation will be rare, limited to specific pairs of mutants. Further, there is no a priori
reason to predict any clustering of complementing or noncomplementing mutations. Would such a
case, where two mutations out of 100 in a given gene are capable of complementation, be
sufficient to say the gene had two complementation groups? This question is largely a semantic
one, but in general, unless intragenic complementation is fairly common, the few exceptional
complementing pairs would not be said to define separate complementation groups.
4. "Unimportant" genes. Since complementation analysis treats only those functions necessary to
generate the required phenotype, it does not allow the detection of complementation groups unless
their products are required for the phenotype in question. If, for example, a region encoding such
an "unimportant" product (at least under the conditions of the selection) is transcriptionally polar
onto an "important" function, that pair of genes has the complementation properties of a single
complementation group. This reflects the fact that the only mutations detected in the
transcriptionally upstream gene would be ones polar onto the functionally important gene
downstream.
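The "approximately 8-fold" figure in the tetramer example of paragraph 2 can be reproduced with a short calculation. The sketch uses the assumptions stated there: subunits assort at random, mutant and wild-type subunits are made in equal amounts, any multimer with a mutant subunit is inactive, and the merodiploid makes twice the haploid amount of product. The function name is illustrative.

```python
from fractions import Fraction

def relative_wild_type_activity(subunits, mutant_fraction=Fraction(1, 2), gene_dosage=2):
    """Active multimer level in the merodiploid, relative to a haploid wild type.

    Assumes random assortment of subunits and that only multimers made
    entirely of wild-type subunits are active.
    """
    all_wild_type = (1 - mutant_fraction) ** subunits  # chance every subunit is wild type
    return gene_dosage * all_wild_type

tetramer = relative_wild_type_activity(4)
dimer = relative_wild_type_activity(2)

print("tetramer:", tetramer)  # 1/8 of the haploid wild-type level
print("dimer:", dimer)        # 1/2 of the haploid wild-type level
```

Only 1 tetramer in 16 is all wild type; doubling for gene dosage gives 1/8, the 8-fold reduction quoted above.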
Homozygosity mapping
 use inbred affected individuals to map rare recessive diseases
 use affected inbred individuals to define the region of the genome where they are all homozygous
 inbred affected individuals are more likely to receive 2 identical alleles from related parents;
therefore even unrelated affected individuals should each have a region identical by descent (IBD)
near the disease locus
 in inbred families the areas of IBD are larger, but different individuals are still unlikely to share
the same IBD region by chance
 from a first cousin consanguineous marriage the region of homozygosity is expected to be 28 cM
 for a second cousin consanguineous marriage, the region is expected to be 22 cM
 3 affected offspring of a first cousin marriage can achieve a lod score of +3.0
 underestimates of allele frequencies can lead to over estimations of lod scores
 therefore, allele frequencies should be estimated in the study population
 advantageous when multiple affected sibs aren’t available
 powerful strategy for mapping rare recessive traits in children of (usually) consanguineous
matings since rare recessive traits are more prevalent in these families than in the general
population
 based on the principle that a fraction of the genome of offspring of consanguineous matings will be
identical by descent on both chromosomes; on average 1/16 of the genome is expected to be
homozygous by descent in offspring of first cousin matings
 appearance of these alleles is known as homozygosity by descent (HBD) or identical by descent
(IBD)
 involves locating the gene causing a rare recessive trait by using multi-point linkage analysis to find
regions of IBD shared among affected individuals
 uses DNA of affected children and RFLPs
 useful to use related affected individuals but can also be done with unrelated affected individuals
(depending on degree of genetic heterogeneity)
 regions of homozygosity are expected to be random between different individuals of these matings,
except at the common disease locus shared by affected offspring
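The 1/16 figure for first cousin matings can be derived by path counting. The formula below is the standard one for the inbreeding coefficient and is not given in these notes: each common ancestor of the two parents contributes (1/2)^(n1 + n2 + 1), where n1 and n2 are the numbers of generations from each parent back to that ancestor. It assumes the common ancestors are not themselves inbred.

```python
from fractions import Fraction

def inbreeding_coefficient(paths):
    """Expected fraction of the offspring's genome that is homozygous by descent.

    `paths` holds one (n1, n2) pair per common ancestor of the parents:
    the generations from each parent back to that ancestor.
    Assumes the common ancestors are not inbred themselves.
    """
    return sum(Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths)

# First cousins share two grandparents; second cousins share two great-grandparents.
first_cousins = inbreeding_coefficient([(2, 2), (2, 2)])
second_cousins = inbreeding_coefficient([(3, 3), (3, 3)])

print("first cousins:", first_cousins)
print("second cousins:", second_cousins)
```

The value drops four-fold with each further degree of cousinship, so the genome-wide background of homozygosity is smaller in offspring of more distant relatives.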
Problems with Homozygosity Mapping
1) unexpected genetic heterogeneity: region containing disease locus was missed as a result of
pooling
o can be overcome with use of larger numbers of consanguineous families and statistical
methods to detect heterogeneity
2) identification of homozygous IBD region unrelated to disease locus
3) potential for inflation of LOD scores as a result of underestimating extent of inbreeding
Hardy-Weinberg equilibrium
 used as a starting null hypothesis
 formulated in 1908 to explain why dominant traits do not displace recessive ones
 means that sexual reproduction does not cause a constant reduction in genetic variation in each
generation
 amount of variation remains constant generation after generation in absence of disturbing factors
 direct consequence of segregation of alleles at meiosis in heterozygotes
 The simple relationship between gene frequencies and genotype frequencies that is found in a
population under certain conditions
 If we pick a person at random from the population, this is equivalent to picking two genes at
random from the gene pool. If p and q are the frequencies of alleles A1 and A2, the chance the
person is A1A1 is p^2, the chance they are A1A2 is 2pq, and the chance they are A2A2 is q^2. This
simple relationship between gene frequencies and genotype frequencies holds whenever a person's
two genes are drawn independently and at random from the gene pool. A1 and A2 may be the only
alleles at the locus (in which case p + q = 1) or there may be other alleles and other genotypes
(p + q < 1). For X-linked loci, males, being hemizygous (only one allele), are A1 or A2 with
frequencies p and q respectively, while females can be A1A1, A1A2 or A2A2.
Hardy-Weinberg equilibrium equation
p^2 + 2pq + q^2 = 1
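A common use of the equation is estimating carrier frequency from the incidence of a recessive disease. The sketch below assumes two alleles, random mating and an incidence equal to q^2; the incidence of 1 in 2500 is only an illustrative number, and the function names are hypothetical.

```python
import math

def genotype_frequencies(p):
    """Hardy-Weinberg genotype frequencies for a two-allele locus (q = 1 - p)."""
    q = 1 - p
    return {"A1A1": p * p, "A1A2": 2 * p * q, "A2A2": q * q}

def carrier_frequency(incidence):
    """Heterozygote frequency 2pq, taking the disease incidence as q^2."""
    q = math.sqrt(incidence)
    return 2 * (1 - q) * q

freqs = genotype_frequencies(0.7)
print(freqs, "sum =", sum(freqs.values()))

carriers = carrier_frequency(1 / 2500)  # illustrative incidence
print(f"carrier frequency: {carriers:.4f} (about 1 in {round(1 / carriers)})")
```

With an incidence of 1 in 2500, q is 0.02 and roughly 1 person in 25 is a carrier, which shows how much more common carriers are than affected individuals.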
Limitations of the Hardy-Weinberg distribution
 These simple calculations break down if the underlying assumption, that a person's two genes are
picked independently from the gene pool, is violated. In particular, there is a problem if there has
not been random mating. Assortative mating can take several forms, but the most generally
important is inbreeding. If you marry a relative you are marrying somebody whose genes resemble
your own. This increases the likelihood of your children being homozygous and decreases the
likelihood that they will be heterozygous. Rare recessive conditions are strongly associated with
parental consanguinity, and Hardy-Weinberg calculations that ignore this will overestimate the
carrier frequency in the population at large.
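The size of this overestimate can be sketched numerically. The adjustment used here is the standard one for inbreeding and is not given in these notes: with inbreeding coefficient F, the frequency of affected homozygotes becomes q^2 + Fq(1 - q) instead of q^2. The incidence and the value of F are illustrative, and treating every affected child as the offspring of first cousins (F = 1/16) is a deliberately extreme assumption.

```python
import math

def allele_frequency(incidence, f=0.0):
    """Solve incidence = q^2 + f*q*(1 - q) for q, assuming 0 <= f < 1.

    Rearranged as (1 - f)*q^2 + f*q - incidence = 0; the positive root is returned.
    """
    a, b, c = 1 - f, f, -incidence
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

incidence = 1 / 10000                              # illustrative disease incidence
q_naive = allele_frequency(incidence)              # ignores consanguinity
q_adjusted = allele_frequency(incidence, f=1 / 16)

# Carrier frequency in the randomly mating population at large is 2q(1 - q).
for label, q in (("ignoring inbreeding", q_naive), ("allowing for inbreeding", q_adjusted)):
    print(f"{label}: q = {q:.5f}, carriers = {2 * q * (1 - q):.5f}")
```

Ignoring inbreeding gives q = 0.01, while the adjusted estimate is several-fold lower, so the naive calculation overstates how common carriers are in the general population.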
Evolutionary conservation
 The presence of similar genes, portions of genes, or chromosome segments in different species,
reflecting both the common origin of species and an important functional property of the
conserved element.
Useful References:
Nussbaum RL et al. Thompson & Thompson’s Genetics in Medicine
Mueller RF and Young ID. Emery’s Elements of Medical Genetics
Strachan T and Read AP. Human Molecular Genetics