1. Explain the different ways in which linkage disequilibrium is exploited to identify the
parts of the genome that affect a quantitative trait.
[[[LD is relevant to the use of HapMap data and any other genome chip, as the idea is to survey SNPs plus the loci in LD
with them. It is key to admixture analysis, linkage analysis, and GWAS. It generates the patterns seen in selective sweep
analysis, in Fst outlier analysis, and in hybrid zone analysis. Almost everything.]]]
 Introduction
Linkage disequilibrium (LD), the non-random association of alleles at different loci, is a
sensitive indicator of the population genetic forces that structure a genome. Because methods
for assessing genetic variation at a fine scale have grown explosively, evolutionary biologists
and human geneticists are increasingly exploiting linkage disequilibrium to understand
past evolutionary and demographic events, to map genes that are associated with
quantitative characters and inherited diseases, and to understand the joint evolution of
linked sets of genes. Association mapping, also known as linkage disequilibrium mapping, is a
method of mapping quantitative trait loci (QTLs) that takes advantage of historic linkage
disequilibrium to link phenotypes to genotypes. Linkage disequilibrium underlies the use
of HapMap data and SNP chips in genome-wide association studies (GWAS). It is also key to
admixture analysis and linkage analysis, and it generates the patterns seen in several other
types of analysis, including selective sweep analysis.
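The definition above can be made concrete with the standard two-locus LD statistics. The following is a minimal sketch, not taken from the essay: the function name `ld_stats` and the example frequencies are illustrative assumptions, and the formulas are the textbook definitions of D, |D'| and r² for two biallelic loci.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab      -- frequency of haplotypes carrying both allele A and allele B
    p_a, p_b  -- frequencies of allele A and of allele B
    """
    # D is the excess of the AB haplotype over its expectation under independence
    d = p_ab - p_a * p_b
    # The largest |D| possible given the allele frequencies depends on the sign of D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the alleles at the two loci
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Made-up example: both alleles at 50%, AB haplotype at 40% instead of the expected 25%
d, d_prime, r2 = ld_stats(0.40, 0.50, 0.50)
print(round(d, 3), round(d_prime, 3), round(r2, 3))  # 0.15 0.6 0.36
```

When the AB haplotype sits exactly at its expected frequency (0.25 in this example), all three statistics are zero, which is linkage equilibrium.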
 Association mapping
Association mapping is based on the idea that a trait variant that has entered a population
only recently will still be linked to the surrounding genetic sequence of the original ancestor;
in other words, it will more often be found within a given haplotype than outside of it. It is
most often performed by scanning the entire genome for significant associations between a
panel of SNPs and a particular phenotype. These associations must then be independently
verified in order to show that the SNPs either contribute to the trait of interest directly or
are in linkage disequilibrium with a quantitative trait locus that contributes to the trait of
interest. Association studies have identified the genes underlying many rare, single-gene
disorders after localization to a candidate region. They have also identified loci associated
with numerous complex diseases, including Alzheimer disease, inflammatory bowel disease,
diabetes and schizophrenia.
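The genome scan described above can be sketched as one test per SNP followed by a multiple-testing correction. This is an illustrative sketch under stated assumptions, not the procedure of any particular study: genotypes are coded as 0, 1 or 2 copies of one allele, the test statistic is n times the squared genotype-trait correlation (treated as chi-square with 1 degree of freedom, a large-sample approximation), and significance uses a Bonferroni threshold. The names `trend_test` and `scan` and the toy data are hypothetical.

```python
import math


def trend_test(genotypes, trait):
    """Score test for association between allele dosage and a quantitative trait.

    Returns (statistic, p_value); the statistic is n * r^2, compared with a
    chi-square distribution with 1 degree of freedom (large-sample approximation).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_t = sum(trait) / n
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)
    s_tt = sum((t - mean_t) ** 2 for t in trait)
    s_gt = sum((g - mean_g) * (t - mean_t) for g, t in zip(genotypes, trait))
    if s_gg == 0 or s_tt == 0:
        return 0.0, 1.0  # a monomorphic SNP or a constant trait carries no signal
    stat = n * s_gt * s_gt / (s_gg * s_tt)
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


def scan(panel, trait, alpha=0.05):
    """Test every SNP in the panel; keep those passing a Bonferroni threshold."""
    threshold = alpha / len(panel)
    hits = []
    for name, genotypes in panel.items():
        stat, p = trend_test(genotypes, trait)
        if p < threshold:
            hits.append((name, p))
    return sorted(hits, key=lambda hit: hit[1])


# Toy data: snp1 tracks the trait closely, snp2 does not
trait = [1.0, 1.2, 0.9, 1.1, 2.0, 2.1, 1.9, 2.2, 3.0, 3.1, 2.9, 3.2]
panel = {
    "snp1": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    "snp2": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
}
print(scan(panel, trait))  # only snp1 passes the threshold
```

A hit from such a scan says only that the SNP is associated with the trait. It may be causal itself or merely in LD with the causal locus, which is why independent verification is needed.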
 Tag SNPs
Genome scanning with SNPs has been successful for myocardial infarction, and gene
identification by genome scans now competes with studies of candidate regions. Regions of
the genome in high LD that harbor a specific set of SNPs inherited together are known as
haplotype blocks. Tag SNPs are representative of all SNPs within a haplotype block. When
haplotypes are used for genome-wide association studies, it is important to note the
population being studied, because different populations often have different patterns of LD.
One example of differing patterns is African-descended populations versus European- and
Asian-descended populations. Since humans originated in Africa and later spread into Europe,
Asia and the Americas, African populations are the most genetically diverse and have
smaller regions of LD, while European- and Asian-descended populations have larger regions
of LD due to founder effects. When LD patterns differ between populations, SNPs that are
associated in one population can be dissociated in another because the haplotype blocks
differ. This means that tag SNPs, as representatives of the haplotype blocks, are
population-specific, and population differences should be taken into account when
performing association studies.
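The population dependence of tag SNPs can be illustrated with a simple greedy selection. This is a sketch of one common heuristic, not a method described in the essay: repeatedly pick the SNP that is in high LD (r² at or above a threshold) with the most SNPs that are not yet tagged. The function names, the 0.8 threshold and both r² tables are made-up assumptions.

```python
def symmetric(pairs):
    """Build a symmetric lookup r2[a][b] from a dict keyed by (a, b) pairs."""
    r2 = {}
    for (a, b), value in pairs.items():
        r2.setdefault(a, {})[b] = value
        r2.setdefault(b, {})[a] = value
    return r2


def greedy_tag_snps(r2, threshold=0.8):
    """Pick tags so that every SNP is a tag or has r^2 >= threshold with a tag."""
    snps = sorted(r2)
    # Each SNP covers itself plus every SNP in strong LD with it
    covers = {
        s: {s} | {t for t in snps if t != s and r2[s].get(t, 0.0) >= threshold}
        for s in snps
    }
    untagged = set(snps)
    tags = []
    while untagged:
        # Greedy step: the SNP covering the most still-untagged SNPs
        best = max(snps, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Hypothetical population with one long block: all four SNPs in strong LD
long_ld = symmetric({("A", "B"): 0.90, ("A", "C"): 0.85, ("A", "D"): 0.82,
                     ("B", "C"): 0.90, ("B", "D"): 0.85, ("C", "D"): 0.90})
# Hypothetical population with shorter blocks: only A-B and C-D in strong LD
short_ld = symmetric({("A", "B"): 0.90, ("A", "C"): 0.30, ("A", "D"): 0.20,
                      ("B", "C"): 0.40, ("B", "D"): 0.30, ("C", "D"): 0.90})

print(greedy_tag_snps(long_ld))   # ['A']
print(greedy_tag_snps(short_ld))  # ['A', 'C']
```

The same four SNPs need one tag in the long-block population and two in the short-block population, so a tag set chosen in one population can leave SNPs untagged in another.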
 HapMap
Because whole-genome sequencing of individuals is still cost-prohibitive, the International
HapMap Project was set up with the goal of mapping the human genome into haplotype
groupings (haplotype blocks) that describe common patterns of human genetic variation.
Once the entire genome is mapped to haplotype blocks, tag SNPs can be identified to
represent the blocks examined by genetic studies. An important factor to consider when
planning a genetic study is the frequency of specific alleles and the risk they confer. These
factors can vary between populations, so the HapMap Project used a variety of sequencing
techniques to discover and catalog SNPs from several different populations.
According to Musunuru et al. (2008), HapMap data have proven useful in genetic mapping
for identifying novel contributors to cardiovascular traits, including diabetes, lipid levels
and myocardial infarction. A comprehensive test of common SNPs would theoretically
involve genotyping all 11 million common SNPs in patients with disease and in individuals
free of disease. However, the correlation structure among SNPs provided by HapMap allows
investigators to genotype far fewer SNPs while still retaining statistical power to find
regions of the genome associated with disease.
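The saving from the correlation structure can be expressed as coverage: the fraction of known SNPs that are either genotyped directly or in strong LD with a genotyped SNP. The sketch below is illustrative only. The function name `coverage`, the r² threshold of 0.8 and the r² table are assumptions, not HapMap data.

```python
def coverage(r2, genotyped, threshold=0.8):
    """Fraction of SNPs in r2 that are genotyped or captured by a genotyped SNP.

    r2        -- dict mapping each SNP to a dict of its r^2 values with other SNPs
    genotyped -- names of the SNPs actually placed on the assay
    """
    genotyped = set(genotyped)
    captured = 0
    for snp, partners in r2.items():
        directly_typed = snp in genotyped
        # Captured indirectly if any genotyped SNP is a strong enough proxy
        has_proxy = any(partners.get(g, 0.0) >= threshold for g in genotyped)
        if directly_typed or has_proxy:
            captured += 1
    return captured / len(r2)


# Made-up LD structure: s1-s3 form a tight block, s4 and s5 are only loosely linked
r2 = {
    "s1": {"s2": 0.95, "s3": 0.85, "s4": 0.1, "s5": 0.1},
    "s2": {"s1": 0.95, "s3": 0.90, "s4": 0.1, "s5": 0.1},
    "s3": {"s1": 0.85, "s2": 0.90, "s4": 0.1, "s5": 0.1},
    "s4": {"s1": 0.1, "s2": 0.1, "s3": 0.1, "s5": 0.6},
    "s5": {"s1": 0.1, "s2": 0.1, "s3": 0.1, "s4": 0.6},
}
print(coverage(r2, ["s1"]))        # 0.6
print(coverage(r2, ["s1", "s4"]))  # 0.8
```

Genotyping one SNP out of five already captures three of them here. The loosely linked pair is where extra markers are needed.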
 SNP chips / GWAS
Genome-wide association studies (GWAS) use SNPs to identify genetic associations with
clinical conditions and phenotypic traits. They are hypothesis-free and use a whole-genome
approach to investigate traits by comparing a large group of individuals who express a
phenotype with a large group of individuals who do not. The ultimate goal of GWAS is to
determine genetic risk factors that can be used to predict who is at risk for a disease, to
understand the biological underpinnings of disease susceptibility, and to create new
prevention and treatment strategies. Because of the large number of known SNP variants
(almost 13 million as of June 2008 [10]), it is still very expensive to genotype every SNP.
That is why GWAS use customizable arrays (SNP chips) to genotype only a subset of the
variants, identified as tag SNPs. Commercially available SNP chips provide greater than 60%
coverage of the genome (Barrett 2006).
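The case-versus-control comparison at a single SNP can be sketched as a chi-square test on allele counts. This is a minimal illustration rather than the analysis used in any cited study: the function names and the counts are made up, all row and column totals are assumed to be non-zero, and the Bonferroni threshold for 500,000 tests is shown only to indicate the scale of the multiple-testing burden.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """1-df chi-square test on a 2x2 table of allele counts (cases vs controls)."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_sums)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele frequency were the same in both groups
            expected = row_sums[i] * col_sums[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Made-up counts: 100 cases and 100 controls, i.e. 200 alleles per group
stat, p = allelic_chi_square(case_alt=60, case_ref=140, control_alt=40, control_ref=160)
print(round(stat, 2), p < 0.05, p < bonferroni_threshold(0.05, 500_000))
```

In this example the SNP would look significant if tested alone but falls far short of the corrected threshold, which is one reason GWAS need large samples.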
 Admixture
Admixture arises if the population being sampled is the result of two populations that have
recently begun to intermarry. If the two populations have different allele frequencies at both
the trait and marker loci, these differences cause the observed haplotype frequencies to
differ from the expected ones. Disequilibrium due to admixture can be detected even if the
marker and trait loci are unlinked, and it will be replicated in repeated sampling from the
same population.
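A small calculation shows how mixing alone produces disequilibrium. This is a worked sketch under explicit assumptions that are not in the essay: each source population is itself in linkage equilibrium, the mixture is observed immediately after mixing, and later decay follows the standard random-mating result D_t = D_0 (1 - c)^t, where c is the recombination fraction (0.5 for unlinked loci). All names and frequencies are illustrative.

```python
def admixture_d(m, p_a1, p_b1, p_a2, p_b2):
    """D in a fresh mixture drawing a fraction m from population 1.

    p_a1, p_b1 -- frequencies of alleles A and B in population 1
    p_a2, p_b2 -- frequencies of alleles A and B in population 2
    Assumes D = 0 within each source population.
    """
    # Haplotype and allele frequencies in the mixture are weighted averages
    p_ab = m * p_a1 * p_b1 + (1 - m) * p_a2 * p_b2
    p_a = m * p_a1 + (1 - m) * p_a2
    p_b = m * p_b1 + (1 - m) * p_b2
    return p_ab - p_a * p_b


def admixture_d_closed_form(m, p_a1, p_b1, p_a2, p_b2):
    """Equivalent closed form: D = m (1 - m) (pA1 - pA2) (pB1 - pB2)."""
    return m * (1 - m) * (p_a1 - p_a2) * (p_b1 - p_b2)


def ld_after(d0, c, generations):
    """Expected D after random mating for the given number of generations."""
    return d0 * (1 - c) ** generations


d0 = admixture_d(0.5, 0.9, 0.8, 0.1, 0.2)
print(round(d0, 4))                    # 0.12
print(round(ld_after(d0, 0.5, 3), 4))  # 0.015 for unlinked loci after 3 generations
print(round(ld_after(d0, 0.01, 3), 4)) # closely linked loci keep most of their LD
```

D is non-zero only when allele frequencies differ between the source populations at both loci, and under these assumptions it decays quickly for unlinked loci but slowly for closely linked ones.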
Mapping by admixture linkage disequilibrium (MALD), also known as admixture mapping, is a
genetic association strategy that makes use of one of the consequences of admixture. The
gene flow that takes place during admixture results in the temporary generation of long
haplotype blocks, which include polymorphic variants, inducing a phenomenon known as
admixture linkage disequilibrium (ALD). MALD takes advantage of long-range haplotypes
that are generated by gene flow among recently admixed ethnic groups, such as
African-Americans and Latinos. Under ideal circumstances, MALD will have more power to
detect some genetic variants than other types of genome-wide association study that are
carried out among more ethnically homogeneous populations. It also requires 200–500 times
fewer markers, providing a significant economic advantage. The idea of the MALD approach
is to screen across the genome in a population of individuals of mixed ancestry, specifically
in individuals who are affected by the disease of interest. The strategy looks for regions with
an unusually high representation of a particular chromosomal segment from the parental
population with the higher risk for the disease (Fig. 1). A particular chromosomal segment
reaches such a high frequency because the disease-gene allele is located within it.
[[FIGURE a | The strategy that is used to assess the ancestral origin of chromosomal segments in mapping by admixture linkage disequilibrium (MALD).
Genotyping MALD markers is used to assess parental ancestry across a single chromosome in multiple cases (individuals with the disease of interest) versus
matched healthy controls. The region indicated by the star is derived more often from one of the parental populations only in the disease cases, indicating
that this region contains a disease-susceptibility locus. In the controls, the same region has an equal probability of originating from either parental population.
b | A theoretical example of how an admixture signal can be detected using the MALD method for a disease with a higher incidence in one parental
population (population A). The proportion of ancestry from population A in multiple individuals (both with the disease (cases) and without the disease
(controls)) is shown schematically for different positions on a single chromosome. An elevated ancestry proportion from population A in cases is evident at the
peak (marked by an arrow), which indicates the involvement of the corresponding genomic region in the disease. The peak can be identified by the higher (or
lower; not shown) level of ancestry that is seen in cases relative to the same region in controls, and/or relative to the remainder of the genome in cases (only
the neighbouring chromosomal region is shown here).]]
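The comparison described in the figure caption can be sketched as a per-marker ancestry scan. This is a simplified illustration, not the statistical model used by MALD software: it assumes local ancestry has already been estimated for each individual as 0, 1 or 2 chromosome copies from population A at each marker, and it reports the raw case-minus-control difference without any significance test. All names and data are hypothetical.

```python
def ancestry_proportion(copies):
    """Proportion of population-A ancestry at one marker (each person has 2 copies)."""
    return sum(copies) / (2 * len(copies))


def mald_scan(cases, controls):
    """Per-marker ancestry from population A in cases and in controls.

    cases, controls -- dicts mapping marker name to a list of 0/1/2 copy counts
    Returns a list of (marker, case_proportion, control_proportion, difference).
    """
    results = []
    for marker in cases:
        case_prop = ancestry_proportion(cases[marker])
        control_prop = ancestry_proportion(controls[marker])
        results.append((marker, case_prop, control_prop, case_prop - control_prop))
    return results


def peak(results):
    """Marker with the largest absolute ancestry difference (higher or lower)."""
    return max(results, key=lambda row: abs(row[3]))


# Toy data: ancestry from population A is elevated at m2 in cases only
cases = {"m1": [1, 1, 0, 2, 1, 1], "m2": [2, 2, 1, 2, 2, 1], "m3": [1, 0, 1, 1, 2, 1]}
controls = {"m1": [1, 1, 1, 0, 2, 1], "m2": [1, 1, 0, 2, 1, 1], "m3": [1, 1, 1, 1, 0, 2]}

marker, case_prop, control_prop, diff = peak(mald_scan(cases, controls))
print(marker, round(case_prop, 3), round(control_prop, 3))  # m2 0.833 0.5
```

The case-only variant mentioned in the caption would instead compare each marker's proportion in cases with the average over the rest of the genome in the same cases.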
 Selective sweeps
An excess of linkage disequilibrium is known to be a signature of a selective sweep. While LD
between neutral loci is eliminated by a selective sweep at an intervening site, if the selected
site does not separate the neutral loci, LD can be increased or decreased depending on their
proximity to the selected site. A further complication is that the age of the neutral
mutations relative to the selected one has critical consequences for the magnitude of LD. If
both neutral loci are closely linked to the selected site, mutations older than the selected
one will typically show strong LD and younger mutations will typically have little or no LD.
When both features are combined, the result is a nonmonotonic relationship between the
proximity of a pair of neutral loci to a selected one and the strength of LD. Whether
patterns of linkage disequilibrium can be used directly to make effective inferences about
selective sweeps remains debatable because of the complications outlined above.
 Conclusion / Future for LD studies
In human population genetics, the future of linkage disequilibrium is now. Very large-scale
GWA studies are underway. With technologies allowing the efficient genotyping of 500,000
or more SNPs and genotyping costs declining, re-sequencing studies, including the 1000
Genomes Project, are becoming feasible. With increased resolution of linkage
disequilibrium patterns, the study of human history will shift from understanding the
average history of populations to understanding the history of different genomic regions.