MEDG520 Block 4
Mendelian Genetics

Concepts
• Linkage Analysis
• LOD scores
  o A simple example of how to calculate a LOD score
  o LOD Score Pitfalls
• CentiMorgan (cM)
• Genetic Association
• Linkage disequilibrium
• Compare and contrast linkage and association
• Transmission Disequilibrium Test (TDT)
• Dominant and recessive inheritance
  o Autosomal Dominant Inheritance
  o Autosomal Recessive Inheritance
  o X-linked Inheritance
  o Mitochondrial Inheritance
• Penetrance and expressivity
• Genetic Heterogeneity
• Haplotype
  o Haplotype block
  o Ancestral haplotype blocks
  o Phase
  o Informative/Uninformative
  o Haplotype Mapping
• SNPs, synonymous vs non-synonymous
• Advantages / disadvantages of SNPs vs. other types of variants
• RFLP
• Microsatellites
• VNTR
• Founder effect, founder population
• Genetic Drift
• Genotype/Phenotype
• Pharmacogenetics and pharmacogenomics
• Simple vs complex traits
• Simple genetic disorders/disease
• Complex genetic disorders/disease
• Polygenic
• Complementation
  o Intergenic vs intragenic complementation
  o Complementation group
  o Complications in Complementation analysis
• Homozygosity mapping
• Hardy-Weinberg equilibrium
• Evolutionary conservation
• Useful References

Linkage Analysis

The only method that allows genetic mapping of genes (including disease genes) that are detectable only as phenotypic traits, i.e. where no biochemical or molecular basis is known for the gene of interest.

Methodology: Study the segregation of the disease in large families together with polymorphic markers from each chromosome. One or more markers will eventually be identified that co-segregate with the disease more often than would be expected by chance (the two loci are linked).

Genetic Markers: Characteristic loci located at the same place on a pair of homologous chromosomes that allow us to distinguish one homologue from the other. They are usually DNA sequence polymorphisms that can be detected by PCR.
Synteny: Genes that reside on the same chromosome are said to be syntenic (whether linked or unlinked).

Recombination fraction (θ): A measure of the distance separating two loci, or an indication of the likelihood that a crossover will occur between them.
• If two loci are located very close to each other on a chromosome, with no recombination ever occurring between them, and are always transmitted together, they are in complete linkage, i.e. θ ~ 0.
• When equal numbers of recombinant and nonrecombinant genotypes are seen, the two loci are said to be unlinked and θ = 0.5.

Linkage analysis is mostly used for mapping genes for classical Mendelian traits.

LOD scores

A method of measuring linkage. To determine whether two loci are linked, we ask whether the recombination fraction (θ) between them differs significantly from the value of 0.5 expected for unlinked loci. To do that, we use a statistical tool called the likelihood odds ratio: we examine a set of actual family data, count the number of children who do or do not show recombination between the loci, and calculate the likelihood of observing the data at different values of θ (0 to 0.5) relative to the likelihood at θ = 0.5 (no linkage). The LOD score is the logarithm to base 10 of this ratio.
• A LOD score of +3 or greater is taken as confirmation of linkage.
• A LOD score of –2 or less excludes linkage.
• The use of logarithms allows us to add results from different families together to get a stronger LOD score.

A simple example of how to calculate a LOD score is shown in Figure 1.

Figure 1: A simple example of how to calculate a LOD score to measure linkage.

LOD Score Pitfalls

LOD score testing really only works well for traits with Mendelian inheritance. More complex models are necessary for more complex traits, as in cases of:
1) Genetic heterogeneity: the phenotype is affected by many loci, or by different loci in different families. Polygenic models can be tried to see if they give a better fit.
2) Incomplete penetrance: individuals carrying a gene may not show a phenotype.
The analysis can still proceed under the assumption that affected individuals carry the gene, but that unaffected individuals are less informative.
3) Strategies for enrichment depend on the genetic causes, and different causes may suggest opposite strategies. For example:
a) With low penetrance, families with high incidence give the clearest results.
b) If mutations have to be present at all of several loci to produce the trait, the clearest results will come from families with low incidence (because high incidence implies more than one copy of the alleles is in the genealogy).

CentiMorgan (cM)

Unit used to measure the genetic distance between two loci.
Definition: The genetic length over which one observes recombination 1% of the time.
Example: A recombination fraction (θ) of 20% between two loci translates into an estimated genetic distance of 20 cM between them. This estimate is, however, only valid if the number of offspring is large enough to give a confident and significant ratio of 80:20 (nonrecombinants to recombinants) compared with the 50:50 ratio expected for loci that assort independently.

Genetic Association

Unlike linkage analysis, association is a method used mostly for complex traits, to identify loci of moderate or low effect. In this method, we choose a specific known locus and test which alleles are statistically associated with the disease phenotype. It requires a much smaller number of samples than linkage studies to detect a contributing effect of a gene. Association studies are a form of case-control study, in which the frequency of a particular allele at a locus is compared between affected and unaffected individuals in the population. The strength of an association is measured by the odds ratio.

Example:

              With allele    Without allele
Patients           a               b
Controls           c               d

The odds ratio is calculated from the frequency of a specific allele in patients and controls:

OR = (a/c) : (b/d) = ad/bc

The main disadvantage of association studies is population stratification.
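As a minimal sketch of the odds ratio calculation above, the following Python function takes the four cells of the 2x2 table (a, b, c, d as defined there) and returns ad/bc. The function name and the counts in the example are hypothetical and chosen only for illustration; they are not data from these notes.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 allele-by-status table.

    a: patients with the allele     b: patients without the allele
    c: controls with the allele     d: controls without the allele
    """
    if b == 0 or c == 0:
        # ad/bc is undefined when either denominator cell is empty
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Hypothetical counts, for illustration only:
# 60 of 100 patients and 30 of 100 controls carry the allele.
example_or = odds_ratio(a=60, b=40, c=30, d=70)
print(example_or)  # 3.5
```

An odds ratio above 1 suggests the allele is more common among patients than controls, but on its own it does not say whether the association reflects causation, stratification or chance.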
False positive results may also arise by chance when screening a large number of candidate loci. In other words, an increased odds ratio for an allele at a specific locus does not prove that this allele is involved in the pathogenesis of the disease.

Linkage disequilibrium

The association of two linked alleles more frequently than would be expected by chance. The demonstration of linkage disequilibrium in a particular disease suggests that the mutation which caused the disease occurred relatively recently and that the marker locus being studied is very closely linked to the disease locus.

It is counter-intuitive, but linkage does not require linkage disequilibrium; only association does. LD refers to specific alleles (which is what association is interested in). Linkage only requires that a particular locus be linked to the disease. It could be defined by different marker alleles, linked to the disease in different families.

Is linkage disequilibrium the only possible explanation for an association?

Linkage disequilibrium is not the only possible reason for an association between a disease D and allele A. Possible causes include the following:
o Direct causation - having allele A makes you susceptible to disease D. Possession of A is neither necessary nor sufficient for somebody to develop D, but it increases the likelihood. In this case one would expect to see the same allele A associated with the disease in any population studied (unless the causes of the disease vary from one population to another).
o Natural selection - people who have disease D might be more likely to survive and have children if they also have allele A.
o Population stratification - the population contains several genetically distinct subsets. Both the disease and allele A happen to be particularly frequent in one subset. Lander and Schork (1994) give the example of the association in the San Francisco Bay area between HLA-A1 and ability to eat with chopsticks.
HLA-A1 is more frequent among Chinese than among Caucasians.
o Statistical artefact - association studies often test a range of loci, each with several alleles, for association with a disease. The raw p values need correcting for the number of questions asked (Section 12.5.1). In the past, researchers often applied inadequate corrections, and associations were reported that could not be replicated in subsequent studies.
o Linkage disequilibrium - close linkage can produce allelic association at the population level, provided that most disease-bearing chromosomes in the population are descended from one or a few ancestral chromosomes. If linkage disequilibrium is the cause of the association, there should be a gene near the A locus that has mutations in people with disease D. The particular allele at the A locus that is associated with disease D may be different in different populations.

Compare and contrast linkage and association

In principle, linkage and association are totally different phenomena. Association is simply a statistical statement about the co-occurrence of alleles or phenotypes. Allele A is associated with disease D if people who have D also have A more (or maybe less) often than would be predicted from the individual frequencies of D and A in the population. For example, HLA-DR4 is found in 36% of the general UK population but in 78% of people with rheumatoid arthritis. An association can have many possible causes, not all genetic (see the list above).

Linkage, on the other hand, is a specific genetic relationship between loci (not alleles or phenotypes). Linkage does not of itself produce any association in the general population. The STR45 locus is linked to the dystrophin locus. Within a family where a dystrophin mutation is segregating, we would expect affected people to have the same allele of STR45, but over the whole population the distribution of STR45 alleles is just the same in people with and without muscular dystrophy.
Thus linkage creates associations within families, but not among unrelated people. However, if two supposedly unrelated people with disease D have actually inherited it from a distant common ancestor, they may well also tend to share particular ancestral alleles at loci closely linked to D. Where the family and the population merge, linkage and association merge.

Linkage
• Deals with specific loci.
• Makes use of the LOD score.
• A measure of how alleles segregate within families.
• Linkage requires knowledge of phase: we need to be able to distinguish recombinants from nonrecombinants in order to calculate a LOD score.
• Linkage makes use of well-characterized pedigrees to identify haplotypes that are inherited intact over several generations.
• Linkage analysis, combined with positional cloning, is a very powerful method for the detection of loci responsible for simple Mendelian phenotypes. Historically it has had a very low false-positive rate when a stringent LOD score of 3.0 is used (Risch, 2000).
• However, linkage tends to identify very large regions encompassing hundreds or even thousands of genes.
• It is less effective for the detection of genes with more subtle effects, such as those responsible for most complex common diseases. So far, all genes identified by linkage and positional cloning, even those for complex diseases, display Mendelian or near-Mendelian inheritance (reviewed in Risch, 2000).
• In the case of complex diseases, linkage analysis tends to identify rare mutations or polymorphisms unique to only a small subset of the diseased population (i.e. rare simple causes for otherwise complex diseases).

Association
• Deals with specific alleles.
• A measure of how alleles and disease travel together.
• Depends on linkage disequilibrium.
• Association relies on retention of adjacent DNA variants over many generations (in historic ancestries) and does not require specific knowledge of pedigrees.
• Association studies are generally much more powerful than linkage analyses when it comes to predicting genetic components of complex diseases.
• Disease-associated regions will be much smaller than in linkage analysis, often encompassing only one gene or gene fragment.
• However, there have been difficulties with reproducing these association studies. This could be the result of poor study design, incorrect assumptions about the underlying genetics of the population, or overinterpretation of the data.
• To be effective, association studies often require very large sample sizes, as many as several thousand patient samples (Cardon and Bell, 2001).
• Many mutations or polymorphisms in a gene can create the same phenotype. Each will have its own ancestral haplotype and thus reduce the power to detect associations between the phenotype and a specific allele.

Transmission Disequilibrium Test (TDT)

TDT starts with couples who have one or more affected offspring. It is irrelevant whether either parent is affected or not. To test whether marker allele M1 is associated with the disease, we select those parents who are heterozygous for M1. The test simply compares the number of such parents who transmit M1 to their affected offspring with the number who transmit their other allele. The result is unaffected by population stratification. The TDT can be used when only one parent is available, but this may bias the result (Schaid, 1998).

There has been some argument about whether the TDT is a test of linkage or of association. Since it asks questions about alleles and not loci, it is fundamentally a test of association. The associated allele may itself be a susceptibility factor, or it may be in linkage disequilibrium with a susceptibility allele at a nearby locus. The TDT cannot detect linkage if there is no disequilibrium, a point to remember when considering schemes to use the TDT for whole-genome scans.

How is a TDT performed?
• Affected probands are ascertained.
• The probands and their parents are typed for the marker.
• Those parents who are heterozygous for marker allele M1 are selected. They may or may not themselves be affected.
• Let a be the number of times a heterozygous parent transmits M1 to the affected offspring, and b be the number of times the other allele is transmitted.
• The TDT test statistic is (a - b)²/(a + b). This has a χ² distribution with 1 degree of freedom, provided the numbers are reasonably large.
• Other alleles at the M locus can be tested using the same set of families. If n marker alleles are tested, each individual p value must be corrected by multiplying by (n - 1).

Explain the symbols used for pedigrees

Figure 3.1. Main symbols used in pedigrees. Generations are usually labeled in Roman numerals, and individuals within each generation in Arabic numerals; III-7 or III7 is the seventh person from the left (unless explicitly numbered otherwise) in generation III. An arrow can be used to indicate the proband or propositus (female: proposita) through whom the family was ascertained.

Dominant and recessive inheritance

Autosomal Dominant Inheritance
• All affected individuals should have an affected parent
• Both sexes should be equally affected
• Roughly 50% of the offspring of an affected individual should also be affected

Autosomal Recessive Inheritance
• Usually there is no previous family history
• The most likely place to find a second affected child is a sibling of the first
• Inbreeding increases the chance of observing an autosomal recessive condition

X-linked Inheritance
• Usually only males are affected
• No cases of male-to-male transmission
• All the affected males can be linked through unaffected carrier females

Mitochondrial Inheritance

More examples of inheritance patterns

Figure 3.2. Basic mendelian pedigree patterns. (A) Autosomal dominant; (B) autosomal recessive; (C) X-linked recessive; (D) X-linked dominant; (E) Y-linked.
The risks for the individuals marked with a query are (A) 1 in 2, (B) 1 in 4, (C) 1 in 2 males or 1 in 4 of all offspring, (D) negligibly low for males, 100% for females. See Section 3.2 and Figure 3.5 for complications to these basic patterns.

Penetrance and Expressivity

Terms that define aspects of the variation in the expression of genes.

Penetrance

The probability of a gene having any phenotypic expression at all: the likelihood that a condition or disease phenotype will, in fact, appear when a given genotype known to produce the phenotype is present. In other words, a genotype may give rise to a particular phenotype in only a proportion of individuals.

If the frequency of expression of a phenotype is less than 100%, then some of those who have the gene may completely fail to express it. The gene is said to show reduced penetrance. For example, 80% penetrance means that 80% of heterozygotes express the condition.

Penetrance can be described in a statistical manner. The probability that a particular phenotype P will be observed in an individual of genotype G, Pr(P|G), is the penetrance. If every person carrying a gene for a dominantly inherited disorder has the mutant phenotype, then the gene is said to have 100% penetrance. Similarly, if only 30% of those carrying the mutant allele exhibit the mutant phenotype, the penetrance is 30%.

Expressivity

The range of variation that is seen in a phenotype; it refers to the degree of expression of a given trait or combination of traits that is associated with a gene. Conditions may have severe or mild symptoms; they may have symptoms that show up in one organ or combination of organs in one person but not in the same locations in other persons.

Estimation of Penetrance: Use family data such as
• Genetic relationships of all individuals within each family.
• Phenotypic data such as disease status, age at first diagnosis or age last known to be unaffected.
• Mutation status of all tested individuals.
The penetrance is then estimated using a logistic regression model, usually designed to allow for correlations within families and to include both genetic and non-genetic factors.

Genetic Heterogeneity

A single disorder, trait, or pattern of traits caused by genetic factors in some cases and non-genetic factors in others. For example, in many complex diseases, the effect of genetic risk factors can depend on the presence of specific environmental factors.

In clinical settings, genetic heterogeneity refers to the presence of a variety of genetic defects that cause the same disease, often due to mutations at different sites in the same gene, a finding common to many human diseases including Alzheimer's disease, cystic fibrosis, lipoprotein lipase deficiency and polycystic kidney disease.

Haplotype

A haplotype is the particular combination of alleles (usually identified by SNPs) on one chromosome or a part of a chromosome. Haplotypes can be exploited for the fine mapping of disease genes. The principle of haplotype mapping is shown in the figure.

A new mutation responsible for a genetic disease always enters the population within an existing haplotype, which is termed the ancestral haplotype. Over several generations, recombination events may occur within the haplotype, but the disease allele and the closest SNPs still tend to be inherited as a group. If this haplotype can be identified in a group of patients with the disease, typing the alleles within the haplotype allows a conserved region to be identified, which pinpoints the mutation responsible for the disease. Due to the abundance of SNPs, this technique has the potential to map genes very accurately. There is therefore much interest in developing a haplotype map of the entire human genome.

Haplotype block

Some SNPs may be in linkage disequilibrium and are inherited in blocks. A haplotype block is thus a discrete chromosome region of high linkage disequilibrium and low haplotype diversity.
It is expected that all pairs of polymorphisms within a block will be in strong linkage disequilibrium, whereas other pairs will show much weaker association. Blocks are hypothesized to be regions of low recombination flanked by recombination hotspots. Blocks may contain a large number of SNPs, but a few SNPs are enough to uniquely identify the haplotypes in a block. The HapMap is a map of these haplotype blocks, and the specific SNPs that identify the haplotypes are called tag SNPs.

Ancestral haplotype blocks

An ancestral haplotype block is passed from generation to generation just like familial haplotype blocks, but it is also found at higher than expected frequencies in the population at large, among people who are not closely related, i.e. all copies arise from some distant ancestor.
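To make "strong linkage disequilibrium" between a pair of polymorphisms more concrete, here is a small Python sketch. It uses the standard population-genetics quantities D = p(AB) - p(A)p(B) and r² = D² / [p(A)(1 - p(A)) p(B)(1 - p(B))], which are not defined in these notes and are brought in here as an assumption about how LD would typically be quantified. The function name and the haplotype counts are hypothetical.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2) for two biallelic SNPs from phased haplotype counts.

    n_AB, n_Ab, n_aB, n_ab are the numbers of chromosomes carrying each
    combination of alleles (A/a at the first SNP, B/b at the second).
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    # D: excess of the AB haplotype over what independence would predict
    D = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    # r2 is undefined if either SNP is monomorphic in the sample
    r2 = (D * D) / denom if denom > 0 else float("nan")
    return D, r2


# Hypothetical counts: only AB and ab haplotypes are seen (complete LD)
print(linkage_disequilibrium(50, 0, 0, 50))    # (0.25, 1.0)
# Hypothetical counts: all four haplotypes equally common (no LD)
print(linkage_disequilibrium(25, 25, 25, 25))  # (0.0, 0.0)
```

Within a haplotype block one would expect most SNP pairs to give r² values near 1, whereas pairs spanning a recombination hotspot would give values closer to 0.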
Phase

• Linked alleles on the same chromosome are said to be in coupling.
• Alleles on different homologues are said to be in repulsion.
• The alleles in coupling at a set of closely linked markers constitute what is known as the haplotype for those loci.

Informative/Uninformative

• Informative markers: highly polymorphic (more than 75% heterozygosity)
• Uninformative markers: low heterozygosity, so that most people have the same genotype
• Informative pedigrees: allow phase to be determined
  o heterozygous parents with different genotypes (e.g. father 1,2; mother 3,4), or
  o parents heterozygous for the same alleles but offspring homozygous (e.g. father 1,2; mother 1,2; offspring 1,1)
• Uninformative pedigrees: phase cannot be determined
  o parents homozygous (e.g. father 1,1; mother 1,1 or father 1,1; mother 2,2)
  o parents heterozygous and offspring heterozygous for the same alleles (e.g. father 1,2; mother 1,2; offspring 1,2)

A meiosis is informative if we can identify whether the gamete is recombinant.

Assume the father has a dominant condition, inherited along with A1:
A) Uninformative: homozygous marker alleles in the father are indistinguishable
B) Uninformative: child could have inherited A1 or A2 from either parent
C) Informative: child inherited A1 from the father
D) Informative: child inherited A1 from the father

Haplotype Mapping

Fig. Principle of haplotype mapping. A new mutation (X) arises in the proximity of six single nucleotide polymorphisms, with the ancestral haplotype signature TATCAT. Over several generations, the haplotype signature may be eroded by recombination. For example, contemporary haplotype 1 was produced by recombination between the first and second SNPs; the new alleles are shown in pink. However, the smallest conserved haplotype signature in all patients carrying the disease allele places the disease between SNPs 3 and 4. This technique provides a candidate region of about 10,000 bp, which is smaller than most human genes.

Useful review: Cardon, LR and Abecasis GR. 2003.
Using haplotype blocks to map human complex trait loci. 19(3):135-140

SNP, synonymous, non-synonymous

A SNP, or single nucleotide polymorphism, is a change in which a single base in the DNA differs from the usual base at that position.

The SNP consortium (http://snp.cshl.org/) has discovered and characterized nearly 1.8 million SNPs to date.

Some SNPs, such as the one that causes sickle cell disease, are responsible for disease. Other SNPs are normal variations in the genome.

SNPs account for 90% of all DNA polymorphisms.

SNPs can result from either the transition or the transversion of nucleotide bases:
o Transitions are base changes to the same type of base, that is, a change between A and G (purines) or between C and T (pyrimidines)
o Transversions are base changes to a different type of base, that is, a change from a purine to a pyrimidine or vice versa

Nucleotide substitutions occurring in protein-coding regions are classified as synonymous or non-synonymous according to their effect on the resulting protein:
• A substitution is synonymous if it causes no amino acid change
  o This can occur because the genetic code is degenerate, so more than one triplet sequence can code for the same amino acid
• A non-synonymous substitution results in an alteration of the encoded amino acid
• Non-synonymous mutations can be further classified into missense and nonsense mutations
  o A missense mutation results in an amino acid change due to the change of codon
  o A nonsense mutation results in a termination codon

In general, the higher the frequency of a SNP allele, the older the mutation that produced it, so high-frequency SNPs largely predate human population diversification.

Synonymous Substitution = A mutation that replaces one codon with another without changing the amino acid that is specified. This can occur because the genetic code is degenerate, so more than one codon can code for the same amino acid.
Conservative Substitution = A mutation that causes one codon to be replaced by another that specifies a different amino acid, but one that has similar chemical properties to the original amino acid.

Non-conservative Substitution = Replacement of one codon by another, which specifies an amino acid with different chemical properties.

Synonymous vs. Non-Synonymous

Synonymous
- high frequency of occurrence
- no selection pressure
- mostly due to 3rd base wobble

Non-Synonymous
- leads to decreased gene function
- increased selection pressure
- decreased frequency of occurrence
  Non-Sense Mutations
  - results in a stop codon
  Missense Mutations
  - AA → different AA
    Conservative
    - AA → AA that is chemically similar
    - does not usually alter gene function
    Non-Conservative
    - AA → AA (dissimilar side chain)
    - different charge
    - different polar side chains

Web source: http://www-hto.usc.edu/~cbmp/2001/SNP/index.html

Advantages / disadvantages of SNPs vs. other types of variants

Microsatellites                 SNPs
Highly polymorphic              Not polymorphic: only 4 possible outcomes
Less common                     More common
Map disease, haplotype map      Haplotype map

Restriction Fragment Length Polymorphism (RFLP)

Polymorphism refers to the DNA sequence variation between individuals of a species. If the sequence variation occurs at a restriction site, it can result in an RFLP. The best-known example is the RFLP due to the β-globin gene mutation.

Figure: Restriction Fragment Length Polymorphism (RFLP) resulting from β-globin gene mutation. In the normal cell, the sequence corresponding to the 5th to 7th amino acids of the β-globin peptide is CCTGAGGAG, which can be recognized by the restriction enzyme MstII. In the sickle cell, one base is mutated from A to T, making the site unrecognizable by MstII. Thus, MstII will generate 0.2 kb and 1.2 kb fragments in the normal cell, but a single 1.4 kb fragment in the sickle cell. These different fragments can be detected by Southern blotting.
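The loss of the MstII site in the sickle cell allele can be sketched in a few lines of Python. Two things here are assumptions rather than statements from these notes: the recognition pattern CCTNAGG (N = any base) is the commonly cited MstII site, and the mutated base is taken to be the A in the middle codon (GAG to GTG). The function names are hypothetical; the fragment sizes are the ones quoted above.

```python
import re

# Assumption: MstII recognizes CCTNAGG, where N is any base.
MSTII_SITE = re.compile(r"CCT[ACGT]AGG")


def has_mstii_site(seq):
    """True if the sequence contains at least one MstII recognition site."""
    return MSTII_SITE.search(seq.upper()) is not None


def expected_fragments_kb(seq):
    """Fragment sizes (kb) quoted in the notes for the beta-globin example."""
    return [0.2, 1.2] if has_mstii_site(seq) else [1.4]


normal = "CCTGAGGAG"  # codons 5 to 7 of normal beta-globin
sickle = "CCTGTGGAG"  # assumed A -> T change in the middle codon (GAG -> GTG)

print(has_mstii_site(normal), expected_fragments_kb(normal))  # True [0.2, 1.2]
print(has_mstii_site(sickle), expected_fragments_kb(sickle))  # False [1.4]
```

The single base change leaves the sequence length unchanged; it is the disappearance of the cut site, not any change in DNA length, that produces the different band pattern on the Southern blot.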
Microsatellites

A microsatellite is a short sequence of DNA, usually 1 to 4 base pairs, that is repeated in a row along the DNA molecule. Many repeats tend to be concatenated at the same locus. There are hundreds of places in human DNA that contain microsatellites.

The number of repeats at a particular locus is highly polymorphic between individuals of the same species. The hypervariability arises because the repeated simple sequences cause a high frequency of loss or insertion of additional repeats by confusing the DNA replication machinery.

This hypervariability allows microsatellite sequences to be used for genetic fingerprinting and paternity testing. Most loci of the genome, even non-coding parts, would be too similar to allow individuals to be reliably distinguished. Microsatellites may also be used as genetic markers to track inheritance in families and to investigate genetic associations with disease.

VNTR

VNTR is an acronym for "variable number of tandem repeats". These are short identical segments of DNA aligned head to tail in a repeating fashion, and they are highly variable between individuals. On any particular chromosome the repeat number may vary from one to thirty repeats. Since these repeat regions are usually bounded by specific restriction enzyme sites, it is possible to cut out the segment of the chromosome containing the VNTR, run the total DNA on a gel, and identify the VNTRs by hybridization with a probe specific for the DNA sequence of the repeat.

Shown to the right at the top are the chromosomes of the two parental individuals of the pedigree below. The first individual has one chromosome with 4 repeated sequences and one chromosome with 6 repeated sequences. The other individual has one chromosome with 3 repeated sequences and one chromosome with 5 repeated sequences. At the bottom of the figure is a pedigree of the mating between these two individuals and their four children.
The DNA of each of the individuals has been analyzed for the VNTR repeat number, and the gels are shown below each individual along with that individual's genotype. Notice that each of the six people is distinguishable from the others by the VNTRs at this one genetic locus. If several VNTR loci were used, the uniqueness of each individual would become even more distinct.

Founder effect, founder population

Founder Effect: a form of genetic drift. One of the original founders of a new group just happens to carry a relatively rare allele. This allele will have a much higher frequency in the new group than it had in the larger group from which the new group was derived.

Outcome of a founder effect: Each population may be characterized by its own particular molecular mutations as well as by an increase or decrease in specific diseases.

Genetic Drift

The fluctuation in allele frequency due to chance operating on the small gene pool contained within a small population. It can cause high frequencies of deleterious disease alleles in a population.

A type of genetic drift is the founder effect. This occurs when a small group breaks off from a larger population to found a new colony. This creates a skewed gene pool due to the lack of random mating and the small population size. Alleles that might have been rare become more frequent. For example, the population of Newfoundland has a much higher incidence of certain genetic disorders: the population was isolated and was at one time a small sample of a larger population.

Genotype/Phenotype

Genotype: defined as the combination of alleles or genes that affect a particular trait.

Phenotype: defined as the physical and physiological traits of an individual resulting from genotype and environment. The phenotype is the distinctive expression of a genotype in a given environment.
An organism's genotype is the largest influencing factor in the development of its phenotype, but it is not the only one. Even two organisms with identical genotypes normally differ in their phenotypes. This is clearly illustrated by phenotypic discordance in monozygous twins. Monozygous twins share the same genotype, since their genomes are identical; but they never have the same phenotype, although their phenotypes may be very similar.

The concept of phenotypic plasticity describes the degree to which an organism's phenotype is determined by its genotype. A high level of plasticity means that environmental factors have a strong influence on the particular phenotype that develops. Consequently, a gene that predisposes a person to a certain form of cancer might not cause the disease in an individual, depending on their lifestyle or other environmental factors. If there is little plasticity, the phenotype of an organism can be reliably predicted from knowledge of the genotype, regardless of environmental peculiarities during development.

In contrast to phenotypic plasticity, the concept of genetic canalization addresses the extent to which an organism's phenotype allows conclusions about its genotype. A phenotype is said to be canalized if mutations do not noticeably affect the physical properties of the organism. This means that a canalized phenotype may form from a large variety of different genotypes, in which case it is not possible to exactly predict the genotype from knowledge of the phenotype. If canalization is not present, small changes in the genome have an immediate effect on the phenotype that develops.

Pharmacogenetics and pharmacogenomics

These are both terms used to describe the study of how variation in human genes leads to variation in response to drugs. The difference between the two is one of scale: Pharmacogenetics is used in reference to one or a few genes.
Pharmacogenomics refers to large-scale genomic approaches, in which the whole genome or a substantial number of genes are involved. Essentially, both terms describe genetic determinants of drug disposition and response, and they are often used interchangeably.

There are many examples showing that the way different people respond to drugs can be attributed, at least in part, to differences in genes encoding:
- Drug metabolizing enzymes: ~20-30 enzymes can interact with nearly every chemical to which the body is exposed
- Drug transporters
- Drug targets such as receptors and enzymes

These genetic factors act in addition to non-genetic factors such as age, race, sex, renal and liver function, drug interactions, the severity of the disease and lifestyle variables like smoking and alcohol consumption.

There are also applications in monitoring drug delivery, because the concentration of circulating free drug depends on absorption, distribution, metabolism and elimination. These are pharmacokinetic processes and they vary between individuals.

Example: Mercaptopurine causes severe myelosuppression in some patients (~1 in 300). This is because some people have very low levels of thiopurine methyltransferase, the enzyme that metabolizes mercaptopurine. Since this correlation was discovered, a clinical test has been introduced that measures an individual’s level of the enzyme before the drug is administered.

Simple vs complex traits

Simple genetic disorders are so called because a single gene underlies them. Thus, a mutation in this gene is sufficient to cause manifestation of the disease. These disorders generally follow a Mendelian pattern of inheritance. Examples include Huntington’s, cystic fibrosis and sickle cell anaemia.

Complex genetic disorders

Some disorders are determined by changes in more than one gene. These disorders, known as complex disorders, do not follow the same predicted pattern of inheritance seen in autosomal or X-linked dominant and recessive disorders.
Sometimes changes in these genes must occur in combination with certain environmental factors, such as exposure to certain chemicals or medications, or even diet. This type of inheritance is often referred to as multifactorial because many different factors, genetic and/or environmental, are involved. A person will have a complex disorder if he or she has the right combination of changed genes and environmental exposures. Sometimes these disorders are caused by changes in one or more genes that make a person susceptible to developing the disorder after exposure to specific environmental factors. The close relatives of someone with a complex disorder have a higher chance of later developing the disorder than the close relatives of someone who does not have the disorder. Diabetes, heart disease, neural tube defects, autism, Alzheimer disease, and many cancer syndromes are examples of disorders that can be caused by multifactorial, or complex, inheritance.

Polygenic

A term used to describe diseases that are caused by changes in more than one gene. Most common human diseases are considered polygenic (e.g., asthma, diabetes, obesity, osteoporosis, etc.). The next scientific frontier will be those polygenic disorders involving a combination of gene polymorphisms, each of which contributes in some small way to pathology. Examples of such conditions include a variety of mental and behavioral problems (alcoholism, schizophrenia, depression), as well as physiological disorders that involve complex interactions between genetics and environment (atherosclerosis, hypertension). These problems will require the ability to look at patterns of gene expression at multiple stages of disease/disorder progression and under a variety of physiological and environmental conditions. Array technologies may provide just such capabilities.

Complementation

Figure 3.3. Complementation: parents with autosomal recessive profound hearing loss often have children with normal hearing.
II6 and II7 are offspring of unaffected but consanguineous parents, and each has affected sibs, making it likely that each has autosomal recessive hearing loss. All their children are unaffected, showing that II6 and II7 have nonallelic mutations.

A complementation analysis asks if two putative alleles [1], when in the same cell [2] and acting independently [3], can supply all functions necessary [4] for a wild-type phenotype [5]. Complementation is therefore a test of function. The numbered markers in this definition are explained below:

1. The "two putative alleles" refers to two versions of the same region of the chromosome, each of which separately confers a mutant phenotype. They are termed "putative" alleles since it is their very "allelism" which will be determined in this test (they are allelic if they are in the same complementation group). Each allele ought to be present in a single copy in the cell, and it is crucial that the entire relevant region be diploid in the cell. Such a partially diploid cell is termed merodiploid. An inverse genetics application is "cloning by complementation", but this will have many of the same concerns as standard complementation, with the added concern of copy effects if multi-copy plasmids are used.

2. The two alleles can be present either on the chromosome or on extra-chromosomal elements. If either version is in more than one copy, there can be both regulatory complications (e.g. titration of a regulatory factor) and difficulties in interpretation (e.g. you do not know if a positive result is due to inappropriate quantities of the product encoded by the multi-copy gene).

3. Care must be taken that the mutations cannot recombine to form a wild-type genotype, so Rec- strains are typically used.

4. Only functions absolutely necessary for the desired phenotype, under the conditions used, are "demanded" by a complementation test.
Mutations affecting genes whose products are not essential for the desired phenotype will not be tested for in complementation analysis.

5. The "wild-type phenotype" demanded by this analysis should more rigorously be called an "apparently wild-type phenotype under the conditions used". The phenotype is typically scored as "growth" or "no growth", but biochemical assays of the encoded gene product can be performed for more precise quantitation.

Figure 29 gives an idea of the results one could expect from straightforward complementation tests. In these examples, when the two mutations in the separate mutant alleles affect the same gene, neither is capable of generating a wild-type product of that gene and the resultant merodiploid strain is mutant in phenotype. On the other hand, if the two mutations affect different genes, so that each copy of the region is able to generate some of the gene products required (and between them all necessary gene products are synthesized), then the resulting strain is phenotypically wild-type. One problem with this set of examples is that no one (in doing bacterial genetics) routinely puts the two alleles in the "cis"-configuration as a control for complementation (you do build such strains for other purposes, however). It is too hard (for reasons we will cover when we get to "mapping") and it provides very little information, since the presence of the wild-type allele on the other copy will nearly always be dominant. It is, however, often appropriate to consider effects of a mutation on genes in cis, but this is not the same as generating "double mutants" affected in the same small region.

There are three sorts of controls useful in analyzing the results of complementation experiments (see Fig. 30):

(a) If either copy of the merodiploid contains a wild-type region, the phenotype of the resulting strain should be a wild-type phenotype, and the wild type is said to be dominant to the mutant.
If it is not, the mutant allele is said to be trans-dominant to the wild-type (see section VII B). In either case the merodiploid has the phenotype of whichever allele is dominant.

(b) A merodiploid strain constructed with the same mutant allele in each copy should display the mutant phenotype. If it does not, it suggests that mere diploidy for the region of interest can confer a wild-type phenotype. One way this could occur is if the mutation conferred a leaky phenotype, so that a double dose might yield a pseudo-wild-type response.

(c) The result should not depend on the location of the alleles; i.e. the same result should be obtained no matter which allele is on the chromosome. If this is not true, it indicates that the two locations are not equivalent and therefore the test has marginal validity. This is a variation on the concerns noted for multi-copy plasmids above.

                        Chromosomal allele
                    1-    2-    3-    4-    wt
Plasmid      1-     -     -     -     -     -
allele       2-     -     +     +     +     +
             3-     -     +     -     +     +
             4-     -     +     -     -     +
             wt     -     +     +     +     +

Intergenic vs intragenic complementation

Complementation test

A mating test to determine whether two different recessive mutations (a1, a2) are allelic. The mutations are allelic if they fail to complement each other (i.e. give a mutant phenotype) when on opposite chromosomes of a diploid cell (trans, a1+/+a2), while the same two mutations on the same chromosome (cis, a1a2/++) in a diploid or partial diploid give a wild-type phenotype. It is thus a test to determine whether two mutant sites are in the same functional unit or gene.
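The logic of sorting mutants by pairwise complementation results can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mutant names and results are hypothetical, and the sketch assumes that failure to complement is clean and transitive (no trans-dominant, leaky or intragenically complementing alleles), which the controls described above are meant to check.

```python
from itertools import combinations


def complementation_groups(mutants, complements):
    """Group mutants so that members of a group fail to complement each other.

    `complements` maps frozenset({a, b}) to True when the pair restores a
    wild-type phenotype and False when the mutant phenotype persists.
    """
    parent = {m: m for m in mutants}

    def find(m):
        # Follow parent links to the representative of m's group.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in combinations(mutants, 2):
        if not complements[frozenset((a, b))]:
            # Non-complementing pairs are merged into one group.
            parent[find(a)] = find(b)

    groups = {}
    for m in mutants:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical pairwise results for four recessive mutants.
mutants = ["m1", "m2", "m3", "m4"]
results = {
    frozenset(("m1", "m2")): False,
    frozenset(("m1", "m3")): True,
    frozenset(("m1", "m4")): True,
    frozenset(("m2", "m3")): True,
    frozenset(("m2", "m4")): True,
    frozenset(("m3", "m4")): False,
}
groups = complementation_groups(mutants, results)
print(groups)  # [['m1', 'm2'], ['m3', 'm4']]
```

Here m1 and m2 fall into one group and m3 and m4 into another, so under these assumptions the four mutations would be taken to define two functional units.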
- complementation requires no knowledge of the affected genes or proteins, just the ability to examine cells for correction of the mutant phenotype
- it represents mixing of gene products, not changes in the genotypes of individual chromosomes
- complementation can be:
  o Intergenic: mutual correction of phenotype; the affected genes are different
  o Intragenic: complementation tests in heterokaryons yield positive results even though the mutations in the mutant cells are known to affect the same gene. This demonstrates that the patients have different but allelic mutations. It occurs when the affected proteins are homomultimers, indicating that a mutant subunit from one allele interacts with a mutant subunit from the other allele to improve the function of the protein.

Complementation group

- A group of individuals that do not complement each other.
- When two groups complement, they are genetically different: the differences are either in the same gene (intragenic) or in different genes (intergenic).
- Also known as a cistron (determined by the cis-trans complementation test), although this term is hardly used anymore.
- Example: studies of Fanconi anaemia have identified 8 different complementation groups.
- If cells from two mutants complement when fused together, put them into separate complementation groups.
Example: Xeroderma pigmentosum (XP)
  o autosomal recessive disease associated with increased frequency of sunlight-induced skin cancer
  o caused by mutation in any one of 8 genes, 7 of which encode proteins involved in DNA repair
  o diagnosis is based on assignment to a complementation group by fusion of xeroderma pigmentosum fibroblasts

- Intergenic complementation is more common, e.g. the cobalamin complementation groups have all mapped to different genes
- Intragenic complementation probably involves mutations affecting different protein domains and is less common

Complementation Studies

- test whether mutant sites are responsible for the same phenotype
- cells from two different people with (a) the same inherited disorder or (b) different inherited diseases are fused
- cell fusion is induced with polyethylene glycol
- this creates 1 cell with 1 nucleus containing 2 sets of chromosomes
- therefore, 3 cell combinations are obtained:
  o cells from individual 1 fuse with each other
  o cells from individual 2 fuse with each other
  o the 2 cell types fuse together
- when the two cell types fuse together and this rescues the phenotype, the cells complement and have mutations at different sites
- even though cells may complement, the mutations could still affect the same step but from a different angle
- a cell can also be fused with a clone: if the clone complements, it corrects the phenotype and therefore carries the gene(s) that are defective in the cell

Complications in Complementation Analysis

The above examples would seem to suggest that if two mutations complement each other, then they must affect different genes and gene products. This would suggest that the result of complementation analysis is to define the number of genes in the region. In fact, what complementation analysis does is to define the number of cistrons or complementation groups. More often than not, the number of cistrons will coincide with the number of genes, but there are a number of special cases where this correlation will not hold.
The complications that give rise to these special cases are discussed below and they fall into two general classes: when the non-complementing mutations actually do map to separate complementation groups (paragraphs 1 and 2 below), and when complementing mutations actually map to the same complementation group (paragraph 3 below). Examples of the first class will be detected when the appropriate controls are done, as described above. The second class will be seen as an anomaly in the actual complementation results.

1. Cis-dominant mutations are a reasonably common type of complication in complementation analyses. Cis-dominant mutations are those that affect the expression of genes encoded on the same piece of DNA (as the mutation itself), typically transcriptionally downstream, regardless of the nature of the trans copy. Such mutations exert their effect, not because of altered products they encode, but because of a physical blockage or inhibition of RNA transcription. There are two dissimilar examples of these sorts of mutations:

(a) If a mutation in a transcriptionally upstream gene exhibits strong polarity onto downstream genes, then that mutation has the property of eliminating more than one gene product function.

(b) Similarly, a mutation in the promoter or in other regulatory regions outside the translated area may well eliminate transcription of the entire operon and thus be negative in complementation for all gene functions encoded by that operon.

In each of these cases, the mutation is eliminating the function of genes that are themselves genotypically wild type. The mutations are said to be cis-dominant because the expression of the genes downstream on the same piece of DNA will be turned off regardless of the genotype present in the trans copy.

2. Negative complementation. Another complication involves the very rare phenomenon known as negative complementation or trans-dominant mutations with mutant phenotypes.
Mutations of this type cause the resultant merodiploid strain to have a mutant phenotype even when the other copy of the region is genotypically wild type. The phenotype of the mutant allele is thus trans-dominant to the wild type (obviously the reason that wild type is dominant to most mutants is because it supplies the function that they have lost by mutation). There are three general schemes that can be envisaged for mutations causing this sort of phenotype. In each of them, it is necessary to propose that the mutant allele generates a product that, while not wild type, nevertheless possesses some activity that leads to the mutant phenotype. Possibilities include (a) multimeric enzymes where the merodiploid strain would generate multimers whose subunits come from both the mutant and wild-type genes in a random assortment. As shown in figure 31, if the protein was a tetramer, and if any multimer containing one or more mutant subunits was completely inactive, then the presence of the mutant chain would decrease the amount of functional wild-type gene product by approximately 8-fold (this number ignores regulation and assumes a two-fold dosage of the product due to a two-fold dosage of the gene). (b) The mutant gene might cause the generation of an altered protein that interfered in some reaction with the cell and thus caused a deleterious phenotype. In this case, the presence of a wild-type allele would restore the function missing in the mutant but would not eliminate the deleterious phenotype caused by the mutant protein. Thus, the mutant phenotype would be dominant to wild type. (c) It is also conceivable that the mutant copy generates an altered protein that, while it could not carry out the wild-type function, might be competitive with the wild-type gene product. In each case, an altered product is responsible for the trans dominance. 
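The approximately 8-fold figure in case (a) can be checked with a short calculation. The sketch below assumes, as the text does, random assortment of subunits, equal amounts of mutant and wild-type subunit, complete inactivity of any multimer containing a mutant subunit, and a two-fold gene dosage; the function name and parameters are illustrative only.

```python
def active_multimer_level(subunits=4, mutant_share=0.5, gene_dosage=2.0):
    """Level of fully wild-type multimers relative to a haploid wild-type cell.

    subunits     -- number of subunits per multimer (4 for a tetramer)
    mutant_share -- fraction of the subunit pool that is mutant
    gene_dosage  -- total subunit production relative to a single gene copy
    """
    if not 0.0 <= mutant_share <= 1.0:
        raise ValueError("mutant_share must be between 0 and 1")
    # Probability that every randomly drawn subunit is wild type.
    all_wild_type = (1.0 - mutant_share) ** subunits
    return gene_dosage * all_wild_type


level = active_multimer_level()
fold_decrease = 1.0 / level
print(level, fold_decrease)  # 0.125 8.0
```

Only 1 tetramer in 16 is built entirely from wild-type subunits; doubling the total amount of protein brings the active level to 1/8 of that in a haploid wild-type cell.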
Remember, these are rare, special cases: in general, the wild-type allele is dominant to the mutant since the latter typically involves loss of function which is "replaced" by the product of the wild-type gene. Such trans-dominant mutants are very appropriate for further biochemical analysis because the protein product has altered function, rather than merely a lack of function.

3. Intragenic complementation is yet another possible complication in complementation analysis. This term refers to cases where two mutations that do affect the same gene, and therefore the same gene product, are able nonetheless to give a wild-type phenotype in a complementation analysis. There are two general cases of such a phenomenon:

(a) If the product of the gene in question is a bi-functional protein, especially when those functions are independent of one another, then the gene itself will often show intragenic complementation. Such an example is easiest to understand if the product is pictured as "two beads on a string". If each "bead" had an independent enzymatic function, one could imagine that a mutation affecting either (but not both) of the two functions might well leave the other function intact. If two such mutations were put in a merodiploid situation, each would be able to produce one of the two required enzymatic functions, giving rise to a wild-type phenotype. In the case of such a gene, intragenic complementation would be fairly common, since many mutations would affect only one of the two functional regions. This model also predicts that mutations affecting each of the two functions would cluster at either end of the gene, creating two clear complementation groups.

(b) It is also possible, though less likely, for pairs of complementing mutants to occur in cases where the gene product is a multimeric protein.
In such cases, a particular mutant allele might give rise to a protein product that can only function when allowed to aggregate with the product of another particular mutant allele. Such an example is sketched below. In this case, unlike the case of the bifunctional protein above, instances of intragenic complementation will be rare, limited to specific pairs of mutants. Further, there is no a priori reason to predict any clustering of complementing or noncomplementing mutations. Would such a case, where two mutations out of 100 in a given gene are capable of complementation, be sufficient to say the gene had two complementation groups? This question is largely a semantic one, but in general, unless intragenic complementation is fairly common, the few exceptional complementing pairs would not be said to define separate complementation groups.

4. "Unimportant" genes. Since complementation analysis treats only those functions necessary to generate the required phenotype, it does not allow the detection of complementation groups unless their products are required for the phenotype in question. If, for example, a region encoding such an "unimportant" product (at least under the conditions of the selection) is transcriptionally polar onto an "important" function, that pair of genes has the complementation properties of a single complementation group. This reflects the fact that the only mutations detected in the transcriptionally upstream gene would be ones polar onto the functionally important gene downstream.
Homozygosity mapping

- use inbred affected individuals to map rare recessive diseases
- use affected inbred individuals to define the region of the genome where they are all homozygous
- inbred affected individuals are more likely to receive 2 identical alleles from related parents; therefore unrelated affected individuals should have a region identical by descent (IBD) near the disease locus
- with inbred families, the areas of IBD are larger, but individuals are still unlikely to share the same IBD region by chance
- from a first cousin consanguineous marriage the region of homozygosity is expected to be 28 cM
- for a second cousin consanguineous marriage, the region is expected to be 22 cM
- 3 affected offspring of a first cousin marriage can achieve a lod score of +3.0
- underestimates of allele frequencies can lead to overestimation of lod scores; therefore, allele frequencies should be estimated in the study population
- advantageous when multiple affected sibs aren’t available
- powerful strategy for mapping rare recessive traits in children of (usually) consanguineous matings, since rare recessive traits are more prevalent in these families than in the general population
- based on the principle that a fraction of the genome of offspring of consanguineous matings will be identical by descent on both chromosomes; on average 1/16 of the genome is expected to be so in the offspring of first cousin matings
- the appearance of these alleles is known as homozygosity by descent (HBD) or identity by descent (IBD)
- involves locating the gene causing a rare recessive trait by using multi-point linkage analysis to find regions of IBD shared among affected individuals
- uses DNA of affected children and RFLPs
- useful to use related affected individuals but can also be done with unrelated affected individuals (depending on the degree of genetic heterogeneity)
- regions of homozygosity are expected to be random between different individuals of these matings, except at the common disease locus shared by affected offspring

Problems with Homozygosity Mapping

1) unexpected genetic heterogeneity: the region containing the disease locus is missed as a result of pooling
  o can be overcome with use of larger numbers of consanguineous families and statistical methods to detect heterogeneity
2) identification of a homozygous IBD region unrelated to the disease locus
3) potential for inflation of LOD scores as a result of underestimating the extent of inbreeding

Hardy-Weinberg equilibrium

- used as a starting null hypothesis
- formulated in 1908 to explain why dominant traits do not overtake recessive ones
- means that sexual reproduction does not cause a constant reduction in genetic variation in each generation
- the amount of variation remains constant generation after generation in the absence of disturbing factors
- direct consequence of segregation of alleles at meiosis in heterozygotes

The simple relationship between gene frequencies and genotype frequencies that is found in a population under certain conditions.

If we pick a person at random from the population, this is equivalent to picking two genes at random from the gene pool. If p and q are the frequencies of alleles A1 and A2, the chance the person is A1A1 is $p^2$, the chance they are A1A2 is $2pq$, and the chance they are A2A2 is $q^2$. This simple relationship between gene frequencies and genotype frequencies holds whenever a person's two genes are drawn independently and at random from the gene pool. A1 and A2 may be the only alleles at the locus (in which case p + q = 1) or there may be other alleles and other genotypes (p + q < 1). For X-linked loci, males, being hemizygous (only one allele), are A1 or A2 with frequencies p and q respectively, while females can be A1A1, A1A2 or A2A2.

Hardy-Weinberg equilibrium equation: $p^2 + 2pq + q^2 = 1$

Limitations of the Hardy-Weinberg distribution

These simple calculations break down if the underlying assumption, that a person's two genes are picked independently from the gene pool, is violated. In particular, there is a problem if there has not been random mating. Assortative mating can take several forms, but the most generally important is inbreeding.
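The Hardy-Weinberg relationship can be illustrated with a minimal Python sketch. It assumes a two-allele autosomal locus and random mating; the function names, the allele frequency of 0.9 and the incidence of 1 in 2500 are hypothetical values chosen only for illustration.

```python
import math


def genotype_frequencies(p):
    """Hardy-Weinberg genotype frequencies for alleles A1 (p) and A2 (q = 1 - p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p
    return {"A1A1": p * p, "A1A2": 2 * p * q, "A2A2": q * q}


def carrier_frequency(recessive_incidence):
    """Estimate the carrier (heterozygote) frequency from the incidence of a
    recessive condition, assuming incidence = q^2 under random mating."""
    q = math.sqrt(recessive_incidence)
    return 2 * (1.0 - q) * q


freqs = genotype_frequencies(0.9)        # hypothetical allele frequency
carriers = carrier_frequency(1 / 2500)   # hypothetical incidence
print(freqs)
print(round(carriers, 4))
```

With an incidence of 1 in 2500, q is 0.02 and the estimated carrier frequency is about 0.039, roughly 1 person in 25. The estimate is only as good as the random-mating assumption behind it.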
If you marry a relative you are marrying somebody whose genes resemble your own. This increases the likelihood of your children being homozygous and decreases the likelihood that they will be heterozygous. Rare recessive conditions are strongly associated with parental consanguinity, and Hardy-Weinberg calculations that ignore this will overestimate the carrier frequency in the population at large.

Evolutionary conservation

The presence of similar genes, portions of genes, or chromosome segments in different species, reflecting both the common origin of species and an important functional property of the conserved element.

Useful References:
- Nussbaum RL et al. Thompson & Thompson’s Genetics in Medicine.
- Mueller RF and Young ID. Emery’s Elements of Medical Genetics.
- Strachan T and Read AP. Human Molecular Genetics.