Download 696 KB - Imaging Genetics Conference

SOLAR 101:
Intro to Theory & Intro to Software
Laura Almasy
Outline on web site
Part 1: 2:00- 3:00pm (John Blangero and Laura Almasy)
I. Introduction to SOLAR: Download, Installation, Registration
II. Creating a SOLAR project: family vs. epidemiological sample,
pedigree file
III. Loading traits and pedigree files
IV. Manipulation with traits: statistics, normalization, regression
Break (10 minutes)
Part 2: 3:10- 4:10pm (John Blangero)
I. Multivariate linear model: Assumptions, departures, and basic
construction
II. Polygenic model, and heritability
III. Statistical Inference
IV. Basic example: transformation, heritability estimation, hypothesis
testing
V. Fundamentals of GWAS and linkage
Outline - revised
Part 1: THEORY
I. Basic concepts of variance component models for genetics
II. Heritability
III. Covariates
IV. Association
V. A little bit about linkage
Break
Part 2: SOFTWARE
I. SOLAR: download, registration, documentation, user support
II. Pedigree file – what if you don’t have families?
III. Phenotype file – polygenic, heritability, covariates, normalization
IV. SNP and map files – linkage disequilibrium, association analysis
V. Just a little bit about linkage
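Variance Decomposition
[Figure: trait distribution labeled with μ and σ²]
$\sigma_p^2 = \sigma_g^2 + \sigma_e^2$
$\sigma_p^2$ = Total phenotypic variance
$\sigma_g^2 = \sigma_a^2 + \sigma_d^2$
$\sigma_a^2$ = Additive genetic variance
$\sigma_d^2$ = Dominance variance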
[Figure: genotypic value scale with AA at -a, AB at d, and BB at +a]
If the heterozygote is half way between the two
homozygotes, there’s a “dose-response” effect, d
is zero, and there is no dominance.
$\sigma_a^2 = 2pq[a + d(q-p)]^2$
$\sigma_d^2 = (2pqd)^2$
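The two formulas above can be checked numerically with a minimal Python sketch. The function names are hypothetical, and the slide does not say which allele has frequency p; the sketch assumes the usual convention that p is the frequency of the allele whose homozygote has value +a, with q = 1 - p.

```python
def additive_variance(p, a, d):
    """Additive genetic variance 2pq[a + d(q - p)]^2 for a biallelic locus.

    Assumption: p is the frequency of the allele whose homozygote is +a.
    """
    q = 1.0 - p
    return 2.0 * p * q * (a + d * (q - p)) ** 2


def dominance_variance(p, d):
    """Dominance variance (2pqd)^2 for a biallelic locus."""
    q = 1.0 - p
    return (2.0 * p * q * d) ** 2


# Illustrative values only, not taken from any data set.
no_dominance = (additive_variance(0.3, 1.0, 0.0), dominance_variance(0.3, 0.0))
with_dominance = (additive_variance(0.3, 1.0, 0.5), dominance_variance(0.3, 0.5))
print(no_dominance)
print(with_dominance)
```

With d = 0 the additive variance reduces to 2pq·a² and the dominance variance is zero, which is the "no dominance" case described above.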
$\sigma_g^2 = \sigma_a^2 + \sigma_d^2 + \sigma_i^2$
$\sigma_i^2$ = Interaction variance (epistasis)
[Figure: genotypic value scale with AA at -a, AB at d, and BB at +a]
Interaction variance exists when a (or d) is a
function of the genotypes at another locus.
$\sigma_e^2 = \sigma_c^2 + \sigma_{ue}^2$
$\sigma_c^2$ = Common or shared environment
$\sigma_{ue}^2$ = Unique environment
Heritability (h2): the proportion of
the phenotypic variance in a trait
that is attributable to the additive
effects of genes.
Broad sense heritability
$h^2 = \sigma_g^2 / \sigma_p^2$
Additive genetic
(narrow sense) heritability
$h^2 = \sigma_a^2 / \sigma_p^2$
Modeling the Phenotype
$p = \mu + \sum_i \beta_i x_i + a + e$
μ Baseline mean
β Regression coefficients
x Scaled covariates
a Additive genetic effects
e Random environmental effects
Modeling phenotypic
covariance
$\Omega = 2\Phi\sigma_a^2 + I\sigma_e^2$
$\sigma_a^2$ = additive genetic variance
$\sigma_e^2$ = variance due to unique environmental influences
Relationship              2φ
Self                      1
MZ twin pair              1
Parent-offspring          1/2
Siblings                  1/2
Grandparent-grandchild    1/4
Avuncular                 1/4
Half-siblings             1/4
1st cousins               1/8
2nd cousins               1/32
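As an illustration of where the 2φ values come from and how they enter Ω, here is a minimal Python sketch that computes kinship coefficients recursively from a pedigree and then builds Ω = 2Φσa² + Iσe². The pedigree, the identifiers, and the variance values are hypothetical, and this is not SOLAR's implementation. It assumes each person has either both parents or neither recorded.

```python
from functools import lru_cache

# Hypothetical pedigree: id -> (father, mother); (None, None) marks a founder.
PEDIGREE = {
    "gf": (None, None),
    "gm": (None, None),
    "dad": ("gf", "gm"),
    "aunt": ("gf", "gm"),
    "mom": (None, None),
    "kid": ("dad", "mom"),
}


@lru_cache(maxsize=None)
def generation(person):
    """0 for founders, otherwise one more than the deepest parent."""
    fa, mo = PEDIGREE[person]
    if fa is None:
        return 0
    return 1 + max(generation(fa), generation(mo))


@lru_cache(maxsize=None)
def kinship(i, j):
    """Kinship coefficient phi between i and j (None means 'no such parent')."""
    if i is None or j is None:
        return 0.0
    if i == j:
        fa, mo = PEDIGREE[i]
        return 0.5 * (1.0 + kinship(fa, mo))
    # Recurse through the parents of the person in the later generation,
    # so we never step through someone who is an ancestor of the other.
    if generation(i) < generation(j):
        i, j = j, i
    fa, mo = PEDIGREE[i]
    return 0.5 * (kinship(fa, j) + kinship(mo, j))


def covariance_matrix(ids, var_a, var_e):
    """Omega = 2*Phi*var_a + I*var_e for the listed individuals."""
    return [
        [2.0 * kinship(i, j) * var_a + (var_e if i == j else 0.0) for j in ids]
        for i in ids
    ]


for pair in [("dad", "kid"), ("dad", "aunt"), ("gf", "kid"), ("aunt", "kid")]:
    print(pair, 2 * kinship(*pair))

for row in covariance_matrix(["dad", "aunt", "kid"], var_a=0.6, var_e=0.4):
    print(row)
```

The diagonal of Ω is σa² + σe², the total phenotypic variance; off-diagonal entries shrink with the relationship coefficient, so the avuncular pair covaries half as much as the sibling or parent-offspring pair.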
Hypothesis Testing:
Null hypothesis h2 = 0
Parameters Estimated
Model       σ²a   σ²e
Sporadic    0     +
Additive    +     +
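Twice the difference in ln likelihoods between the two
models is distributed as a 50:50 mixture of a chi-square
with 1 df and a point mass at zero, because the null
value of h2 lies on the boundary of the parameter space.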
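A minimal Python sketch of this test follows. It assumes the standard result for a single variance component tested at its boundary, namely a 50:50 mixture of a point mass at zero and a chi-square with 1 df, so the p-value is half the usual 1 df tail probability. The function names and the log-likelihood values are hypothetical.

```python
import math


def chi2_1df_upper_tail(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def heritability_lrt_pvalue(loglik_sporadic, loglik_additive):
    """P-value for H0: h2 = 0 under the assumed 50:50 mixture distribution."""
    lrt = 2.0 * (loglik_additive - loglik_sporadic)
    lrt = max(lrt, 0.0)  # guard against tiny negative values from optimization
    return 0.5 * chi2_1df_upper_tail(lrt)


# Hypothetical ln likelihoods from the sporadic and additive models.
print(heritability_lrt_pvalue(-105.0, -101.5))
```

For a covariate test, where the null value of β is not on a boundary, the plain 1 df tail probability from chi2_1df_upper_tail would be used without halving.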
Trait               h²
Alcoholism          0.39
NIDDM               0.49
Body mass index     0.51
HDL cholesterol     0.52
Thrombosis          0.61
Height              0.81
Limitation
This model assumes that the only source of
correlation among family members is genetic.
Solution
Add more components to the model.
Shared environment
Modeling the Phenotype:
$p = \mu + \sum_i \beta_i x_i + a + c + e$
μ Baseline mean
β Regression coefficients
x Scaled covariates
a Additive genetic effects
c Shared environmental effects
e Random environmental effects
Variance Decomposition
$\sigma_p^2 = \sigma_a^2 + \sigma_c^2 + \sigma_e^2$
Shared environmental
(household) effects
$c^2 = \sigma_c^2 / \sigma_p^2$
Modeling phenotypic
covariance
$\Omega = 2\Phi\sigma_a^2 + H\sigma_c^2 + I\sigma_e^2$
$\sigma_a^2$ = additive genetic variance
$\sigma_c^2$ = variance due to shared environmental influences
$\sigma_e^2$ = variance due to unique environmental influences
Assumption of VC analysis:
Trait is normally distributed.
What happens if this assumption is violated?
4th Central Moment:
Kurtosis
Data Transformation: Waist
Circumference, Serum Leptin
What about those
covariates?
Modeling the Phenotype:
$p = \mu + \sum_i \beta_i x_i + a + c + e$
μ Baseline mean
β Regression coefficients
x Scaled covariates
a Additive genetic effects
c Shared environmental effects
e Random environmental effects
What good are covariates?
[Figure: total trait variance with portions attributed to genes G1 to G5]
$h^2 = (G_1 + G_2 + G_3 + G_4 + G_5) / \text{Total}$
Covariates absorb variance
[Figure: the same diagram with Age and Sex added alongside G1 to G5]
$h^2 = (G_1 + G_2 + G_3 + G_4 + G_5) / (\text{Total} - \text{Age} - \text{Sex})$
Hypothesis Testing:
Null hypothesis: regression coefficient
for covariate = 0
Parameters Estimated
Model       β
Null        0
Alternate   +
Twice the difference in ln likelihoods between the two
models is distributed as a chi-square with 1 df.
Caution! In general, you only
want to use covariates that are
demographic or environmental.
[Figure: variance diagram with BMI, Age, and Sex alongside G1 to G5]
$h^2 = (G_3 + G_4) / (\text{Total} - \text{Age} - \text{Sex} - \text{BMI})$
If we include BMI as a covariate,
we reduce our power to detect G1,
G2, or G5, genes that influence
both BMI and the trait of interest.
Standard association analysis uses
genotype as a covariate
[Figure: variance diagram with Age, Sex, and the A39T genotype alongside G1, G2, G3, and G5]
Measured genotype
association tests whether the
trait mean differs by genotype,
taking into account the nonindependence between family
members.
QTN-Specific Heritability
$h^2_{qtn} = \sigma_{qtn}^2 / \sigma_p^2$
Additive genetic variance due to a QTN
assuming additivity:
AA = -a, AB = d = 0, BB = +a
$\sigma_{qtn}^2 = H a^2$, where
a = 1/2 of phenotypic difference between homozygotes
H = 2p(1 - p), i.e. heterozygosity
Genotypes as covariates
If effect of QTL is
modeled as additive:
Genotype    Cov
AA          0
Aa          1
aa          2
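The additive coding in this table amounts to counting copies of one allele. A minimal Python sketch follows; the function name is hypothetical, and it is only an illustration of the coding, not how SOLAR generates its covariate files.

```python
def additive_covariate(genotype, counted_allele):
    """Return the number of copies (0, 1, or 2) of counted_allele in a genotype.

    genotype is a two-character string such as "Aa". A blank string is treated
    as missing and returns None.
    """
    if genotype == "":
        return None
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
    return genotype.count(counted_allele)


for g in ("AA", "Aa", "aa"):
    print(g, additive_covariate(g, "a"))
```

Which allele is counted only changes the sign of the estimated regression coefficient, not the test of whether it differs from zero.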
Power for Association Studies
Power for association studies is a function of the sample
size, the family configuration, the QTN variance, and the
LD between a QTN and a genotyped marker.
effective $h^2_{snp}$ = actual $h^2_{qtn} \times r^2$
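To make the QTN variance and the LD attenuation concrete, here is a minimal Python sketch under the additive model (d = 0). All function names and numbers are hypothetical illustrations, not values from the studies shown in this talk.

```python
def qtn_variance(mean_low_homozygote, mean_high_homozygote, p):
    """Additive QTN variance H * a^2, with H = 2p(1 - p)."""
    a = (mean_high_homozygote - mean_low_homozygote) / 2.0
    heterozygosity = 2.0 * p * (1.0 - p)
    return heterozygosity * a ** 2


def qtn_heritability(mean_low_homozygote, mean_high_homozygote, p, total_variance):
    """Proportion of phenotypic variance due to the QTN."""
    return qtn_variance(mean_low_homozygote, mean_high_homozygote, p) / total_variance


def effective_snp_heritability(h2_qtn, r2):
    """QTN heritability seen through a marker in LD r2 with the QTN."""
    return h2_qtn * r2


# Hypothetical trait: homozygote means 90 and 110, allele frequency 0.2,
# total phenotypic variance 400, marker-QTN r2 of 0.5.
h2_qtn = qtn_heritability(90.0, 110.0, 0.2, 400.0)
print(h2_qtn, effective_snp_heritability(h2_qtn, 0.5))
```

In this made-up case the QTN explains about 8% of the variance, but a marker with r² = 0.5 behaves like a variant explaining about 4%, which is the quantity that drives power.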
Example: FXII levels by F12 46C/T genotype
Genotype    FXII levels
CC          128.88
CT          92.23
TT          55.58
$p < 1 \times 10^{-7}$
Prothrombin levels by G20210A genotype
[Figure: prothrombin activity levels (%) by genotype (G/G, G/A, A/A); $p < 1 \times 10^{-7}$]
Linkage analysis
$\Omega = \hat{\Pi}\sigma_{qtl}^2 + 2\Phi\sigma_a^2 + I\sigma_e^2$
Identity By Descent
Want to know the proportion of alleles
shared by a relative pair that derive from
a common ancestral source (IBD). This
is in contrast to alleles shared identical
by state (IBS) in which the ancestral
source of the alleles is not considered.
IBS = association
IBD = linkage
$\pi_{ij}$ = IBD probability for individuals i and j
$\pi_{ij} = \tfrac{1}{2}\Pr(\text{1 allele shared IBD}) + \Pr(\text{2 alleles shared IBD})$
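The definition of πij can be sketched in a few lines of Python. The function name is hypothetical; the sharing probabilities used below are the prior (genome-wide expected) values for each relative type, before any marker data are considered.

```python
def ibd_proportion(pr_one_shared, pr_two_shared):
    """pi_ij = 1/2 * Pr(1 allele IBD) + Pr(2 alleles IBD)."""
    if not 0.0 <= pr_one_shared + pr_two_shared <= 1.0:
        raise ValueError("IBD probabilities must sum to at most 1")
    return 0.5 * pr_one_shared + pr_two_shared


# Prior sharing probabilities (Pr 1 allele, Pr 2 alleles) by relationship.
print("siblings", ibd_proportion(0.5, 0.25))
print("parent-offspring", ibd_proportion(1.0, 0.0))
print("MZ twins", ibd_proportion(0.0, 1.0))
```

With these prior probabilities πij equals the 2φ value for the relationship; marker data move πij above or below that expectation at a specific location, which is what gives linkage analysis its information.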
Variance component-based
linkage analysis
In a region containing a QTL influencing
the trait, relatives who are phenotypically
similar will share more alleles IBD than
relatives who are phenotypically dissimilar
Linkage and association analyses
[Figure: diagram relating linkage analysis and association analysis, with a time axis]
Break Time!
Next up: How to actually run these
models/analyses in SOLAR
Downloading SOLAR
http://solar.txbiomedgenetics.org/
SOLAR is available for the following systems:
Linux: Intel or AMD cpu
Solaris 10+: Sun SPARC workstation
Mac OS X 10.4+: Intel cpu
Mac OS X 10.4+: G3-G5 PPC
Solaris x86 10+ OS: Intel or AMD cpu
Windows XP or later (using VMWare Player)
SOLAR Registration
To maximize any likelihood model, you
will need a SOLAR key, obtained by
emailing [email protected].
SOLAR is funded by an NIH grant and
registration allows us to provide numbers
of users and justify our funding. We do
not give out the SOLAR mailing list and
will send you email maybe once a year or
so letting you know of updates or bugs.
SOLAR is helpful
Documentation is online, but also
available interactively from within SOLAR.
COMMANDS:
solar> help
solar> help XXX
SOLAR data files
Two general file formats:
1) Comma delimited text file
2) PEDSYS files
Five types of files:
1) Pedigree
2) Phenotype
3) Marker
4) Map
5) Freq
Pedigree file
ID, FA, MO, SEX, [FAMID],[MZTWIN],[HHID]
ID,MO,FA,SEX
1,0,0,2
2,0,0,1
3,1,2,1
4,1,2,1
5,0,0,2
6,0,0,1
7,5,6,2
8,5,6,2
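[Figure: pedigree drawing of two nuclear families: parents 1 and 2 with children 3 and 4; parents 5 and 6 with children 7 and 8]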
What if you don’t have families?
ID,FA,MO,SEX
sam,,,M
bob,,,M
joshua,,,M
aliesha,,,F
sophie,,,F
ralph,,,M
gladys,,,F
Loading the pedigree file
COMMANDS:
solar> pedigree load
solar> pedigree show
Phenotype file
Must contain ID [and FAMID if needed].
May have anything else you want.
CRUCIAL: MISSING DATA = BLANK,
YES/NO or AFF/UNAFF CODED 1/0 (or 1/2)
ID,LDL,AGE,SMOKING
sam,157,42.5,1
gladys,200,65.34,0
sophie,127,22.1,0
joshua,146,38.5,0
aliesha,,46.2,1
Loading the phenotype file
COMMANDS:
solar> pheno load phenos
solar> stats -all
solar> pheno
SOLAR output files
Housekeeping files: pedindex.out
pedindex.cde pedigree.info
phenotypes.info phi2.gz
Copies of the results shown on the
screen: stats.out
The polygenic model
COMMANDS:
solar> trait LDL
solar> covariate age sex
solar> polygenic [-s]
How polygenic -s deals with
covariates
Parameters Estimated
Model         β1    β2    β3
Full model    +     +     +
Covar 1       0     +     +
Covar 2       +     0     +
Each covariate is dropped out one by one and the
likelihood is compared with that of the full model.
Looking under the hood
COMMAND:
solar> model
Normalization
VC models assume a normally distributed
trait and can be sensitive to kurtosis in
the trait distribution.
COMMANDS:
solar> define LDLnorm = inorm_LDL
Define can be used for general
manipulation of phenotypes:
solar> define newthing = (LDL + AGE)^2
What does inorm really do?
The trait values are sorted, and for any value
V found at position I in the sorted list, a
quantile is computed for it by the formula
I/(N+1). The inverse normal cumulative
density function is computed for each
quantile and stored. When the value V
occurs multiple times, the inverse normal is
computed for each applicable quantile,
averaged, then the average is what is stored.
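The procedure described above can be sketched in Python as follows. This is an illustration of the description, not SOLAR's own code; the function name is hypothetical and the input is assumed to contain no missing values.

```python
from statistics import NormalDist


def inverse_normal_transform(values):
    """Rank-based inverse normal scores using quantile I/(N+1), ties averaged."""
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])  # indices in sorted order
    std_normal = NormalDist()
    scores = [0.0] * n
    pos = 0
    while pos < n:
        # Find the run of tied values starting at sorted position pos.
        end = pos
        while end + 1 < n and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Sorted positions are 1-based: pos + 1 through end + 1.
        tied = [std_normal.inv_cdf(rank / (n + 1)) for rank in range(pos + 1, end + 2)]
        average = sum(tied) / len(tied)
        for k in order[pos:end + 1]:
            scores[k] = average
        pos = end + 1
    return scores


# The four non-missing LDL values from the example phenotype file.
print(inverse_normal_transform([157, 200, 127, 146]))
```

Because the scores depend only on ranks, outliers and heavy tails in the raw trait no longer inflate kurtosis, at the cost of discarding the original measurement scale.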
Marker file
ID [and FAMID if needed]
and genotypes only
ID,rs14756,rs93456,rs34526
bob,AA,GC,AT
gladys,TT,CC,AT
joshua,TA,CG,TT
ralph,AT,CC,TT
sophie,AA,CC,TA
sam,AA,CG,TT
aliesha,TA,CG,AT
Loading the marker file
COMMANDS:
solar> snp load mysnps
solar> snp show
Map file
Header line specifying type of map (cM for
linkage or basepair position) then each
line has marker name and location,
separated by spaces.
SNP basepair
rs14756 45667823
rs34526 40693821
rs93456 45692598
Preparing SNPs for analysis
COMMANDS:
solar> snp covar [-nohaplos]
solar> snp ld [plot]
solar> snp effnum
Simplest way to run association
COMMAND:
solar> mgassoc -files snp.genocov
Measured genotype model
Asks ‘does the trait mean differ by
genotype’?
Likelihood ratio test comparing the
likelihood of a model where a regression
parameter for genotype is estimated to a
model where it is fixed to zero. 1 df chi-sq
ASSUMPTION: additive model of gene
action where heterozygotes are midway
between the two homozygotes
If you have families
COMMANDS:
solar> snp qtldcov
solar> pheno load phenos snp.qtldcov
solar> qtld
Runs measured genotype but also
provides quantitative trait TDT and a test
for stratification.
Making life easier
You can use most Unix commands from
inside SOLAR. Wildcards (*) are an
exception.
You can use TCL scripts to automate any
series of commands and build your own
custom analyses. (See also the ‘toscript’
command.)
Power users – you can directly specify the
mean and variance equations.
Linkage analysis
To quote Monty Python: “Not dead yet!”
Requires families
Obtain MLEs for allele frequencies
Calculate and store estimates of identity by
descent (IBD) allele sharing
Run twopoint or multipoint
See SOLAR tutorial for detailed walk-through.
Email [email protected] for help with
MIBD estimation.
Recap
Download, documentation, etc:
http://solar.txbiomedgenetics.org/
User support:
[email protected]