Linkage analysis and eQTL studies
Tom Price
MRC SGDP Centre, Institute of Psychiatry
Systems Biomedicine Graduate Programme 2008/9
Genetic Linkage Studies
• Use the inheritance of markers within families to identify
chromosomal regions where disease genes may lie
[Figure: chromosome carrying genetic markers M1 to M7 and a disease susceptibility gene]
Linkage Pedigree
[Figure: pedigree showing disease cases and the marker genotype (alleles 1, 2 and 3) of each family member]
Random chance?
Or linkage between marker and disease locus?
The Possibilities
SIMPLE, CONTINUOUS
• Multiple alleles of a single gene
• Different alleles, different effects
• e.g. trinucleotide repeat diseases
COMPLEX, CONTINUOUS
• Quantitative traits
• Multiple genes and environment
• e.g. height
SIMPLE, DISCRETE
• Mendelian
• One gene = one trait
• e.g. cystic fibrosis
COMPLEX, DISCRETE
• Non-Mendelian
• Multiple genes and environment
• e.g. epilepsy, liability to stroke
One Gene, One Trait?
• Laws of heredity discovered by Mendel, 1865
– Three laws of heredity
Mendel’s Laws
1. Dominance
• When two contrasting characters are crossed, only one
appears in the next generation
2. Segregation
• For each trait, a gamete carries only one of the two
parental alleles
3. Independent assortment
• Alleles for different traits are inherited independently
of each other
Dominance for Hair Colour
Segregation
[Figure: parental genotypes AB and CD; each gamete carries only one of a parent's two alleles]
Independent Assortment
• Eye colour IS NOT predictable from hair colour
– Blonde hair and brown or blue eyes
– Brown hair and blue or brown eyes
Independent Assortment
• Eye colour IS often predictable from hair colour
– Blonde hair and blue eyes
– Brown hair and dark eyes
What is Linkage?
• Linkage analysis is a method to map the relative positions of
two or more loci using genetic markers
– Linkage occurs because nearby loci do not obey Mendel’s third
law
Breaking the Third Law
[Figure: pedigree showing ABO blood group genotypes of affected and unaffected family members]
• A, B, O = blood group alleles
• ABO locus predicts D locus
Adapted from Phillip McClean http://www.ndsu.nodak.edu/instruct/mcclean/plsc431/linkage/
Genetics for Card Players
♠ We can think of genetic information as a
deck of cards.
♥ The closer 2 cards are, the less likely it is
that they will separate during shuffling.
♣ If not much shuffling has occurred, more
distant cards can act as markers.
Linkage Groups
• If inheritance of two loci is independent
– They are unlinked
• If inheritance of two loci is dependent
– They are in the same linkage group
– Linkage groups correspond to the physical
structures called chromosomes
Chromosomes
• Chromosomes are
NOT inherited as a
single block
• Recombination occurs
at meiosis
– Affects co-inheritance
of alleles
Recombination and Meiosis
• Nearby loci A and B
are likely to cosegregate during
meiosis.
• Distant loci B and C
are less likely to cosegregate during
meiosis.
Recombination
• For any pair of markers
– Parental pattern = NR (non-recombinant)
– Mixed pattern = R (recombinant)
[Figure: one parent has genotype Aa at the first marker and Bb at the second, with A and B on one chromosome and a and b on the other; the other parent has genotypes cc and dd]
Offspring genotypes:
Marker 1   Marker 2   Pattern
Ac         Bd         NR
Ac         Bd         NR
Ac         bd         R
ac         Bd         R
Recombination Fraction
= The proportion of offspring that are
recombinant between two loci
• RF = 0.5 between unlinked loci (e.g.
different chromosomes)
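As a minimal sketch of the definition above, the recombination fraction can be estimated by counting offspring scored as recombinant (R) or non-recombinant (NR). The function name and the example counts are illustrative assumptions, not data from the lecture.

```python
def recombination_fraction(labels):
    """Proportion of scored offspring that are recombinant ('R')."""
    if not labels:
        raise ValueError("need at least one scored offspring")
    return sum(1 for label in labels if label == "R") / len(labels)

# Hypothetical family data: 3 recombinants among 20 scored offspring
offspring = ["R"] * 3 + ["NR"] * 17
print(recombination_fraction(offspring))  # 0.15
```

A value near 0 suggests tightly linked loci; a value near 0.5 is what unlinked loci would give.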
Parametric Linkage Analysis
• Uses pedigree information to estimate
recombination fraction between markers and
disease
• Assumes a particular model of inheritance
(additive, dominant, recessive)
• Useful for Mendelian disorders (single gene)
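The simplest parametric case can be sketched as follows. This assumes every meiosis is fully informative and phase-known, so each offspring can be scored as R or NR; real pedigrees rarely allow this, and programs such as those listed later handle the general case. The function name and counts are hypothetical.

```python
import math

def two_point_lod(recombinants, non_recombinants, theta):
    """LOD score at recombination fraction theta versus no linkage (theta = 0.5).

    Assumes phase-known, fully informative meioses, so the likelihood is
    theta**R * (1 - theta)**NR.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    n = recombinants + non_recombinants
    log_lik_linked = (recombinants * math.log10(theta)
                      + non_recombinants * math.log10(1.0 - theta))
    log_lik_unlinked = n * math.log10(0.5)
    return log_lik_linked - log_lik_unlinked

# Hypothetical counts: 2 recombinants among 20 scored meioses
r, nr = 2, 18
theta_hat = r / (r + nr)  # maximum-likelihood estimate of theta
print(round(two_point_lod(r, nr, theta_hat), 2))
```

Evaluating the function over a grid of theta values and taking the maximum gives the usual LOD curve for a marker.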
Allele Sharing
• People with rare diseases are more highly
related to each other near the disease-causing
gene than you would typically expect.
• This is because nearby markers tend to be
inherited together with the disease locus.
→We can look for excess allele sharing as a
signal that a disease locus is nearby.
Identity By State
• When two individuals
possess the same alleles
at a locus, they are said
to be identical by state
(IBS).
• For example, these
affected sibs, with genotypes ac and ad, share one
allele IBS, the allele a.
Identity By State
• But if the parental genotypes
are unknown, we do not
know whether the offspring
have inherited the a allele
from the same parent or from
different parents.
• We can’t establish shared
inheritance, so IBS allele
sharing is useless for linkage
analysis.
[Figure: parents with unknown genotypes (??); affected sibs with genotypes ac and ad]
Identity By Descent
• Individuals who share
copies of a common
ancestral allele are said to
be identical by descent
(IBD).
• For example, these affected
sibs share one allele IBD.
The paternal allele a has
been transmitted to both
offspring.
[Figure: parents with genotypes ab and cd; affected sibs with genotypes ac and ad]
Allele Sharing in Affected Sib Pair
[Figure: parents with genotypes ab and cd; first affected sib ac, second affected sib unknown (??)]
Sibling genotypes    Alleles shared IBD    Expected probability
ac ac                2                     ¼
ac ad or ac bc       1                     ½
ac bd                0                     ¼
Expected probabilities assume random transmission of marker alleles.
But what if the marker lies near a disease gene?
Affected siblings are more likely to share marker alleles IBD.
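The expected sharing probabilities under random transmission can be checked by brute-force enumeration. This is a sketch that assumes the four parental alleles are all distinct, as in the ab × cd example; the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def ibd_distribution(father=("a", "b"), mother=("c", "d")):
    """Enumerate all equally likely transmissions to two sibs and tally IBD sharing.

    Each sib receives one paternal and one maternal allele. With four distinct
    parental alleles, the number of alleles shared IBD is the number of
    matching transmissions.
    """
    counts = Counter()
    sib_genotypes = list(product(father, mother))  # 4 possible genotypes per sib
    for sib1, sib2 in product(sib_genotypes, repeat=2):
        shared = (sib1[0] == sib2[0]) + (sib1[1] == sib2[1])
        counts[shared] += 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}

print(ibd_distribution())
# {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```

Linkage to a disease locus shows up as a departure from this ¼, ½, ¼ baseline among affected pairs.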
Non-parametric Linkage Analysis
• Uses information on IBD allele sharing
– Usually between affected sibs
• Do not need to specify the model of
inheritance at any locus
• Useful for complex traits (multiple genes,
different modes of inheritance)
Linkage Statistic for Affected Sib Pairs
Alleles IBD    Expected (no linkage)    Observed under linkage
0              0.25                     Z0
1              0.50                     Z1
2              0.25                     Z2
Suppose x families share 0 alleles IBD,
y families share 1 allele IBD,
z families share 2 alleles IBD.
Under a multinomial model, the probability of these counts assuming no linkage is
$$P(x, y, z \mid \text{no linkage}) = \frac{(x+y+z)!}{x!\,y!\,z!}\; 0.25^{x}\, 0.5^{y}\, 0.25^{z}$$
The LOD score compares the estimated sharing Z0, Z1, Z2 with the sharing expected under no linkage:
$$\mathrm{LOD} = \log_{10} \frac{P(\text{marker data} \mid \text{estimated sharing } Z_0, Z_1, Z_2)}{P(\text{marker data} \mid \text{sharing } 0.25, 0.5, 0.25)} = \log_{10} \frac{Z_0^{x}\, Z_1^{y}\, Z_2^{z}}{0.25^{x}\, 0.5^{y}\, 0.25^{z}}$$
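One step worth spelling out: the multinomial coefficient is identical under both hypotheses, so it cancels in the ratio. Taking the estimated sharing proportions to be the observed proportions (an assumption here; it is the unconstrained maximum-likelihood estimate), the statistic reduces to a sum of three terms:

```latex
\hat{Z}_0 = \frac{x}{n}, \qquad \hat{Z}_1 = \frac{y}{n}, \qquad \hat{Z}_2 = \frac{z}{n},
\qquad n = x + y + z

\mathrm{LOD}
  = \log_{10}
    \frac{\dfrac{n!}{x!\,y!\,z!}\; \hat{Z}_0^{x}\, \hat{Z}_1^{y}\, \hat{Z}_2^{z}}
         {\dfrac{n!}{x!\,y!\,z!}\; 0.25^{x}\, 0.5^{y}\, 0.25^{z}}
  = x \log_{10}\frac{\hat{Z}_0}{0.25}
  + y \log_{10}\frac{\hat{Z}_1}{0.5}
  + z \log_{10}\frac{\hat{Z}_2}{0.25}
```

Each term is positive when a sharing category is over-represented relative to its baseline and negative when it is under-represented.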
Example: 200 ASPs
Sharing among 200 affected sibling pairs
Alleles IBD         0       1       2
Observed sharing    36      90      74
Expected sharing    50      100     50
Baseline values     0.25    0.5     0.25
• Z0 = 36/200 = 0.18
• Z1 = 90/200 = 0.45
• Z2 = 74/200 = 0.37
$$\mathrm{LOD} = \log_{10} \frac{0.18^{36}\; 0.45^{90}\; 0.37^{74}}{0.25^{36}\; 0.5^{90}\; 0.25^{74}} = 3.35$$
STRONG EVIDENCE FOR LINKAGE
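The arithmetic in this example can be reproduced with a few lines of Python. This is a sketch: the function name is hypothetical, and it uses the observed proportions as the estimated sharing, exactly as in the example.

```python
import math

def asp_lod(n0, n1, n2):
    """LOD score for affected sib pairs sharing 0, 1 and 2 alleles IBD.

    Compares observed sharing proportions with the no-linkage proportions
    (0.25, 0.5, 0.25). A category with a zero count contributes nothing.
    """
    counts = (n0, n1, n2)
    null = (0.25, 0.5, 0.25)
    total = sum(counts)
    lod = 0.0
    for count, p_null in zip(counts, null):
        if count > 0:
            lod += count * math.log10((count / total) / p_null)
    return lod

print(round(asp_lod(36, 90, 74), 2))  # 3.35
```

Passing the expected counts, `asp_lod(50, 100, 50)`, gives a LOD of zero, since observed and baseline sharing then coincide.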
Complications of Linkage Analysis
[Figure: nuclear family in which some genotypes are unknown (??)]
• With unknown parental genotypes, allele sharing must be estimated using
population allele frequencies
• Families with fewer than four distinct alleles may give unclear sharing
• Multipoint linkage analysis, using information from adjacent markers, will
increase power to detect genes
• Computationally intensive: use computer programs to calculate LOD scores
• Other problems arise from non-paternity, genotyping errors, sample mix-ups and poor
phenotype definition
Software
• Several programs are available, including:
– Parametric:
LINKAGE
MLINK
– Non-parametric:
MERLIN
GENEHUNTER
Linkage Study Design
Candidate gene search: dense marker genotyping within a
region of positional or functional interest
Genome search:
– Aim to identify several susceptibility genes
• Families are genotyped on polymorphic markers across all
chromosomes
• 300-400 microsatellite markers across the genome, spaced
about 10 cM apart
(or, more recently, 10,000 SNP markers)
– tighter marker spacing gives more information
– few markers make it difficult to reconstruct haplotypes,
particularly without parental genotypes
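The marker counts quoted above follow from simple arithmetic. The sketch below assumes a sex-averaged human genetic map of roughly 3,500 cM, a figure that is not given in the lecture, and ignores chromosome ends.

```python
GENOME_LENGTH_CM = 3500  # assumed sex-averaged human map length, roughly 3,500 cM

def markers_needed(spacing_cm, genome_length_cm=GENOME_LENGTH_CM):
    """Approximate marker count for an evenly spaced genome scan."""
    return round(genome_length_cm / spacing_cm)

def average_spacing(n_markers, genome_length_cm=GENOME_LENGTH_CM):
    """Average spacing in cM when n_markers are spread evenly."""
    return genome_length_cm / n_markers

print(markers_needed(10))       # 350
print(average_spacing(10_000))  # 0.35
```

Under this assumption a 10 cM scan needs about 350 markers, within the 300-400 range quoted, while 10,000 SNPs give an average spacing well under 1 cM.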
Significance Level
• Lander and Kruglyak (1995) suggested criteria for affected
sibling pair studies in complex diseases
LOD score > 2.2 suggestive linkage
LOD score > 3.6 significant linkage
• By chance alone, these LOD scores are expected to occur once
and 0.05 times (1 in 20) per genome search, respectively
• Many studies of complex disease do not reach these cut-offs
• Another approach is to report highest LOD scores even if
they are below these thresholds and look for replication
across studies
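A small helper makes the thresholds concrete. The labels and function name are illustrative; only the two cut-offs come from the slide.

```python
SUGGESTIVE_LOD = 2.2   # expected about once per genome search by chance
SIGNIFICANT_LOD = 3.6  # expected about once per 20 genome searches by chance

def classify_lod(lod):
    """Label an affected-sib-pair LOD score using the thresholds quoted above."""
    if lod > SIGNIFICANT_LOD:
        return "significant"
    if lod > SUGGESTIVE_LOD:
        return "suggestive"
    return "below threshold"

for score in (1.5, 3.35, 4.1):
    print(score, classify_lod(score))
```

By these genome-wide criteria the LOD of 3.35 from the 200 ASP example would count as suggestive rather than significant.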
Does It Work?
• Very powerful for mapping single gene
disorders, e.g. early-onset Alzheimer’s
Disease, many forms of mental retardation…
• …but many non-replications for complex
traits
Linkage v Association
                    Linkage                             Association
Usual sample        Families                            Unrelated individuals (e.g. case control)
Good for finding    Rare variants with large effects    Common variants with small effects
Identifies          Broad chromosomal region            Narrow region usually within a single gene
Break
• Next up: application of linkage analysis to
gene expression phenotypes.
Central Dogma
DNA → mRNA → protein
Finding Disease Pathways
1. Conduct linkage/association study to find candidate
2. Determine candidate gene function experimentally
Problems:
• Markers only give regional information, the identity
of the causal variations remains obscure
• Many GWAS hits are nowhere near any genes
• Reliance on animal and in vitro models to probe
function
Genetics of Gene Expression
• Linkage study or GWAS using mRNA abundance as
the phenotype
Motivation:
• mRNA abundance as ‘endophenotype’
– Lies on causal path between genetic variation and disease
• Hits (‘eQTLs’) may have less complex inheritance
– Larger effect sizes, fewer causal variants?
• We may already know which transcripts are
dysregulated in diseased tissues
– eQTLs can provide a link to finding susceptibility genes
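To illustrate what treating mRNA abundance as the phenotype means in the association setting, the sketch below regresses one transcript's expression on allele dosage at one SNP. It assumes unrelated individuals and an additive model; the data and function name are hypothetical, and family data would need a method that accounts for relatedness.

```python
def eqtl_regression(dosages, expression):
    """Least-squares fit of expression on allele dosage (0, 1 or 2 copies).

    Returns (slope, r_squared). r_squared is the share of expression
    variance explained by the marker under an additive model.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched lists with at least three samples")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype and expression must both vary")
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared

# Hypothetical data: one SNP and one transcript in eight unrelated people
dosages = [0, 0, 1, 1, 1, 2, 2, 2]
expression = [5.1, 4.9, 5.6, 5.8, 5.5, 6.2, 6.4, 6.1]
slope, r2 = eqtl_regression(dosages, expression)
print(f"slope per allele = {slope:.2f}, variance explained = {r2:.0%}")
```

A genome-wide eQTL scan repeats a test of this kind for every marker and transcript pair, which is why multiple-testing control matters.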
The First eQTL Study
Cis Regulation
• Genetic variation near the gene locus
that influences its expression
• What we think of as “functional”
polymorphisms fall into this category
Trans Regulation
• Genetic variation away from the gene
locus that influences its expression
• e.g. polymorphisms in “hub” genes
that act as master regulators
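In practice the cis/trans distinction is usually made with a distance cut-off. The sketch below uses a 1 Mb window purely as an illustrative assumption; the lecture does not specify a threshold, and the coordinates are hypothetical.

```python
def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end,
                  cis_window=1_000_000):
    """Label an eQTL 'cis' if the variant lies within cis_window bp of the gene.

    The 1 Mb default is an illustrative convention, not a value from the lecture.
    """
    if snp_chrom != gene_chrom:
        return "trans"
    if gene_start - cis_window <= snp_pos <= gene_end + cis_window:
        return "cis"
    return "trans"

# Hypothetical coordinates
print(classify_eqtl("6", 133_050_000, "6", 133_000_000, 133_040_000))  # cis
print(classify_eqtl("6", 20_000_000, "6", 133_000_000, 133_040_000))   # trans
print(classify_eqtl("1", 133_010_000, "6", 133_000_000, 133_040_000))  # trans
```

Narrowing `cis_window` makes the cis definition stricter and moves more same-chromosome hits into the trans category.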
Microarray Experiment
Human eQTL Studies
1st Author   Year   Journal   Population   Sample                 Tissue          Measure        Genotyping
Morley       2004   Nature    CEPH         14 pedigrees           LCL             8K Affy        Linkage scan
Monks        2004   AJHG      CEPH         15 pedigrees           LCL             25K oligo      Linkage scan
Dixon        2007   Nat Gen   MRC-A        206 families           LCL             54K Affy       Linkage scan
Göring       2007   Nat Gen   SAFHS        1240 individuals       Lymphocytes     47K Illumina   Illumina 100K
Stranger     2007   Science   HapMap       270 individuals        LCL             47K Illumina   2M SNPs + 7K CNVs
Emilsson     2008   Nature    IFB/IFA      1002/673 individuals   Blood/Adipose   25K oligo      Illumina 370K + Linkage scan
Selected list
• Largest human eQTL study to date
• 1,240 subjects from extended pedigrees
• Blood lymphocytes, not lymphocyte cell
lines
• 47K Illumina WG-6 Series I microarray
• Expression adjusted for age, sex
Heritability
85% of 19,648 transcripts detected
were heritable (FDR 5%)
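The slide does not say which procedure was used to control the false discovery rate. As one common choice, the Benjamini-Hochberg step-up rule can be sketched as follows; the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking which tests pass the BH step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * fdr
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant

# Hypothetical heritability-test p-values for six transcripts
p_values = [0.001, 0.008, 0.039, 0.040, 0.041, 0.60]
print(benjamini_hochberg(p_values))
# [True, True, True, True, True, False]
```

Note the step-up behaviour: 0.039 and 0.040 miss their own rank thresholds but are still accepted, because the fifth-ranked p-value meets its threshold.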
Cis Regulation
• Single LOD score calculated at gene locus
to identify cis-regulated transcripts
• 1,345 (6.8%) cis-regulated transcripts
detected (FDR 5%)
• eQTL effect size overall:
median 1.8%, mean 5.0%
• eQTL effect size in significant loci:
median 24.6%, mean 29.1%
Trans Regulation
• Much lower power
• No evidence of master regulators:
only 58 transcripts had 2+ peaks with LOD > 3
Gene Discovery Using eQTLs
Promoter variants in VNN1 are associated with
transcript abundance and HDL-C concentration
Consistency of Results
• Morley cis eQTLs confirmed by Göring,
but not trans eQTLs.
• This is consistent with tissue specificity of
trans regulation, but also with lower power
to detect trans effects.
• Linkage & association study
• Icelandic subjects
• Blood and adipose tissue samples
• Expression adjusted for age, sex, BMI
Tissue Specificity
         Blood    Adipose    Both
Cis      2,529    1,489      762
Trans    52       25         ?
• Linkage eQTLs (FDR 5%) for 20,877 expression
traits, 10,364 of them heritable (FDR 5%)
Proximity of Cis Acting Variants
• Association eSNPs were within 100kb of
the probe for 96% of expression traits with
strong cis-acting effects
Potential Problem
• Microarray probes overlapping SNPs can
give rise to spurious cis eQTLs
• Older studies did not have as much
resequencing data available to identify
probes containing SNPs
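A filter for this problem can be sketched as a simple interval check. The data layout (tuples of probe name, chromosome, start, end, and SNP chromosome and position) and all coordinates are hypothetical.

```python
def probes_overlapping_snps(probes, snps):
    """Return names of probes whose target interval contains a known SNP.

    probes: iterable of (name, chrom, start, end), inclusive coordinates
    snps:   iterable of (chrom, position)
    """
    snps_by_chrom = {}
    for chrom, pos in snps:
        snps_by_chrom.setdefault(chrom, []).append(pos)
    flagged = []
    for name, chrom, start, end in probes:
        if any(start <= pos <= end for pos in snps_by_chrom.get(chrom, [])):
            flagged.append(name)
    return flagged

# Hypothetical probe and SNP coordinates
probes = [("probe_1", "1", 1000, 1049), ("probe_2", "1", 5000, 5049),
          ("probe_3", "2", 1000, 1049)]
snps = [("1", 1025), ("2", 2000)]
print(probes_overlapping_snps(probes, snps))  # ['probe_1']
```

Flagged probes can then be dropped or their cis eQTLs treated with caution, since a SNP under the probe may alter hybridisation rather than expression.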
Further Directions
• Animal models (e.g. mouse F2 crosses)
• Other tissues (e.g. mouse brain)
• Evoked phenotypes
– genetics of expression response to e.g.
ionizing radiation, drug/hormone treatment
• Causal modelling
– Genotype data can establish whether expression
changes cause disease or are a consequence of it
Schadt et al. (2005) Nature Genetics 37: 710-717.
Website
http://tomprice.net/
Go to “Presentations”
Download slides