Genetic Epidemiological Strategies
in the Search for Genes
Tuan V. Nguyen
University of New South Wales
Faculty of Medicine
Genes and Diseases
• Many diseases have their roots in both genes and
environment.
• Currently, >4000 diseases, including sickle cell
anemia and cystic fibrosis, are known to be
genetic and are passed on in families.
Genes and Medical Sciences
The central question for the medical sciences is
the extent to which it will be possible to relate
events at the molecular level with the clinical
findings or phenotypes of patients with
particular diseases.
Contents
• Genes and DNA
• Detection of genetic effects
• Search for specific genes
Chromosomes
Each human cell contains 23 pairs of chromosomes
(distinguished by size and banding pattern). The karyotype shown is male (XY);
females have two X chromosomes.
DNA and Genes
• DNA carries the instructions
that allow cells to make
proteins.
• DNA is made up of 4
chemical bases (A, T, G, C).
• The bases make “words”:
AGT CTC GAA TAA
• Words make “sentences” =
genes:
<AGT CTC GAA TAA>
Genes, Alleles, and Genotypes
• The location of a gene is called its locus.
• Alleles are alternate forms of a gene. Example: A, a.
• Genotype: the maternal and paternal alleles of an
individual at a locus define the genotype of the
individual at that locus. Example: AA, Aa, aa.
How Do Genes Work?
• Genes tell cells how to make
molecules called proteins.
• Proteins allow cells to perform
specific functions.
• If the instructions are intact, the cell
functions normally. If the instructions
are changed (mutated), an
abnormality may result.
Inheritance
• The passing of genes from parents to child is the
basis of inheritance.
• We are not identical to our parents: half of our
genes are from our mothers and half from our
fathers.
• Each brother and sister inherits a different
combination of chromosomes. With 23 chromosome pairs, each parent
can pass on N = 2^23 = 8,388,608 combinations.
• Identical twins receive exactly the same
combination of genes from their parents.
Genetic effects
• Three types of gene action: additive, dominant, and
epistatic.
• Additive effect.
– AA = 9, Aa = 7, aa = 5.
• Dominant effect.
– AA = 9, Aa = 9, aa = 5.
• Epistasis: interaction of alleles at 2 loci.
– For locus 1: AA = 9, Aa = 7, aa = 5.
– For locus 2: AA = 5, Aa = 5, aa = 9.
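The additive and dominant examples can be reproduced with a short sketch. The function `genotypic_value` and its parameters (midpoint `m`, additive deviation `a`, dominance deviation `d`) are illustrative assumptions that follow a common quantitative-genetics parameterisation; they do not appear on the slide.

```python
def genotypic_value(copies_of_A, m, a, d):
    """Genotypic value for a genotype carrying 0, 1 or 2 copies of allele A.

    m = midpoint of the two homozygotes, a = additive deviation,
    d = dominance deviation of the heterozygote (hypothetical names).
    """
    if copies_of_A == 2:
        return m + a
    if copies_of_A == 0:
        return m - a
    return m + d


GENOTYPES = {"AA": 2, "Aa": 1, "aa": 0}

# Additive action: the heterozygote sits exactly at the midpoint (d = 0).
additive = {g: genotypic_value(n, m=7, a=2, d=0) for g, n in GENOTYPES.items()}
# Complete dominance: the heterozygote equals the AA homozygote (d = a).
dominant = {g: genotypic_value(n, m=7, a=2, d=2) for g, n in GENOTYPES.items()}

print("additive:", additive)
print("dominant:", dominant)
```

With m = 7 and a = 2 the sketch returns the values listed for the additive and dominant cases.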
How to detect genetic effects?
Clues to Genetics and Environment
Characteristic                     Environment
Epidemiological characteristics
  Geographic variation             +
  Ethnic variation                 +
  Temporal variation               +
  Epidemics                        +
  Social class variation           +
  Gender variation                 +
  Age                              +
Family variables
  History of disease               +
  Birth order                      +
  Birth interval                   +
  Co-habitation                    +
Genetics column entries, in source order (row alignment not recoverable): +, +, +/+, +/+, +/-
Methods of Investigation of Genetic Traits
• Family studies. Examine phenotypes (diseases) in the relatives of affected subjects (probands).
• Twin studies. Compare the intraclass correlation in MZ twins (who share 100% of their genes) with that in DZ twins (who share on average 50% of their genes).
• Adoption studies. Seek to distinguish genetic from environmental effects by testing whether adopted children resemble their biological parents more closely than their adoptive parents.
• Offspring of discordant MZ twins. Control for environmental effect; test for large genetic contribution to etiology.
Basic Genetic-Environmental Model
Phenotype (P) = Genetics + Environment
Genetics = Additive (A) + Dominant (D)
Environment = Common (C) + Specific (E)
=> P = A + D + C + E
Statistical Genetic Model
$\mathrm{Cov}(Y_i, Y_j) = 2\Phi_{ij}\sigma^2_a + \Delta_{ij}\sigma^2_d + \gamma_{ij}\sigma^2_c + \delta_{ij}\sigma^2_e$
$\Phi_{ij}$: kinship coefficient
$\Delta_{ij}$: Jacquard's coefficient of identity by descent
$\gamma_{ij}$: probability of sharing environmental factors
$\delta_{ij}$: residual coefficient
$V_P = V_A + V_D + V_C + V_E$
V = variance; P = phenotype; A, D, C, E = as defined
Kinship coefficients
Expected coefficient for each variance component:
Relative             $\sigma^2_a$   $\sigma^2_d$   $\sigma^2_c$
Spouse-spouse        0              0              1
Parent-offspring     1/2            0              1
Full sibs            1/2            1/4            1
Half-sibs            1/4            0              1
Aunt-niece           1/4            0              1
First cousins        1/8            0              0
Dizygotic twins      1/2            1/4            1
Monozygotic twins    1              1              1
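As a rough illustration of how the coefficients listed above enter the covariance model, the sketch below computes the expected phenotypic covariance for each type of relative pair. The variance components used are hypothetical values chosen only for the example, the function name `expected_covariance` is an assumption, and the residual term is taken as zero for two different individuals (also an assumption).

```python
# Expected coefficients for (additive, dominance, common-environment) variance,
# copied from the kinship coefficient table.
COEFFICIENTS = {
    "spouse-spouse":     (0.0,   0.0,  1.0),
    "parent-offspring":  (0.5,   0.0,  1.0),
    "full sibs":         (0.5,   0.25, 1.0),
    "half-sibs":         (0.25,  0.0,  1.0),
    "aunt-niece":        (0.25,  0.0,  1.0),
    "first cousins":     (0.125, 0.0,  0.0),
    "dizygotic twins":   (0.5,   0.25, 1.0),
    "monozygotic twins": (1.0,   1.0,  1.0),
}


def expected_covariance(pair, var_a, var_d, var_c):
    """Expected covariance between two relatives of the given type."""
    coef_a, coef_d, coef_c = COEFFICIENTS[pair]
    return coef_a * var_a + coef_d * var_d + coef_c * var_c


# Hypothetical variance components, for illustration only.
VAR_A, VAR_D, VAR_C = 0.5, 0.1, 0.2

for pair in COEFFICIENTS:
    cov = expected_covariance(pair, VAR_A, VAR_D, VAR_C)
    print(f"{pair:18s} {cov:.4f}")
```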
Heritability ($H^2$)
$\mathrm{Cov}(Y_i, Y_j) = 2\Phi_{ij}\sigma^2_a + \Delta_{ij}\sigma^2_d + \gamma_{ij}\sigma^2_c + \delta_{ij}\sigma^2_e$
$V_P = V_A + V_D + V_C + V_E$
Broad-sense heritability: $H^2 = (V_A + V_D) / V_P$
Narrow-sense heritability: $h^2 = V_A / V_P$
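A minimal sketch of the two definitions, assuming the four variance components have already been estimated. The numbers passed in are hypothetical and the function name `heritability` is introduced only for this example.

```python
def heritability(v_a, v_d, v_c, v_e):
    """Return (broad-sense, narrow-sense) heritability from variance components."""
    v_p = v_a + v_d + v_c + v_e      # total phenotypic variance
    broad = (v_a + v_d) / v_p        # all genetic variance over total
    narrow = v_a / v_p               # additive genetic variance over total
    return broad, narrow


# Hypothetical components, for illustration only.
broad, narrow = heritability(v_a=0.5, v_d=0.1, v_c=0.2, v_e=0.2)
print(f"broad-sense = {broad:.2f}, narrow-sense = {narrow:.2f}")
```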
Statistical Methods for Estimating Heritability
• Simple linear regression of offspring on parent
$Y_{\mathrm{offspring}} = b\,Y_{\mathrm{parent}} + e$
$H^2 = 2b$
• Twin concordance
Intraclass correlation: $r_{MZ}$ and $r_{DZ}$
$H^2 = 2(r_{MZ} - r_{DZ})$
• Path analysis and variance component model
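The first two estimators can be sketched in a few lines. The twin correlations used here (0.73 and 0.47) are the femoral neck values quoted elsewhere in this talk; the parent and offspring values are hypothetical, and the function names are assumptions made for the example. Note that this simple twin formula need not agree with heritability estimates obtained from a variance component model.

```python
def falconer_h2(r_mz, r_dz):
    """Heritability from twin intraclass correlations: 2 * (rMZ - rDZ)."""
    return 2 * (r_mz - r_dz)


def regression_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    return s_xy / s_xx


def h2_from_parent_offspring(parent, offspring):
    """Heritability from regression of offspring on a single parent: 2 * b."""
    return 2 * regression_slope(parent, offspring)


h2_twins = falconer_h2(0.73, 0.47)

# Hypothetical parent and offspring phenotypes, for illustration only.
parent = [1.0, 2.0, 3.0, 4.0]
offspring = [1.0, 1.25, 1.5, 1.75]
h2_regression = h2_from_parent_offspring(parent, offspring)

print(f"twin estimate = {h2_twins:.2f}, regression estimate = {h2_regression:.2f}")
```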
Path Model for Twin Data
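[Path diagram: latent factors A1, D1, C1 and E1 act on the phenotype of Twin 1, and A2, D2, C2 and E2 on that of Twin 2, through path coefficients a, d, c and e. Correlations marked between the twins' latent factors: r = 1; r = .5 / .25; r = 1 / .5.]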
A=additive; D=dominant; C=common environment; E=specific environment
Intraclass Correlation:
Femoral neck bone mass
[Figure: scatter plots of Twin 2 against Twin 1, both axes running from 0.5 to 1.4. MZ pairs: $r_{MZ} = 0.73$. DZ pairs: $r_{DZ} = 0.47$.]
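The intraclass correlation reported for each twin group can be computed from paired measurements. The sketch below uses the one-way analysis-of-variance estimator for pairs, which is one common choice and not necessarily the estimator used in the original study; the bone mass values are hypothetical.

```python
def intraclass_correlation(pairs):
    """One-way ANOVA intraclass correlation for a list of (twin1, twin2) values."""
    n = len(pairs)
    grand_mean = sum(a + b for a, b in pairs) / (2 * n)
    pair_means = [(a + b) / 2 for a, b in pairs]
    # Between-pair mean square (n - 1 degrees of freedom, 2 members per pair).
    ms_between = 2 * sum((m - grand_mean) ** 2 for m in pair_means) / (n - 1)
    # Within-pair mean square (n degrees of freedom).
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, pair_means)
    ) / n
    return (ms_between - ms_within) / (ms_between + ms_within)


# Hypothetical twin pairs, for illustration only.
example_pairs = [(1.0, 1.2), (0.8, 0.9), (1.1, 1.0), (0.7, 0.8)]
icc = intraclass_correlation(example_pairs)
print(f"intraclass correlation = {icc:.3f}")
```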
Genetic Determination of Lean, Fat and Bone Mass
Trait               rMZ            rDZ            H2 (%)
Lumbar spine BMD    0.74 (0.06)    0.48 (0.10)    77.8
Femoral neck BMD    0.73 (0.06)    0.47 (0.11)    76.4
Total body BMD      0.80 (0.05)    0.48 (0.10)    78.6
Lean mass           0.72 (0.06)    0.32 (0.12)    83.5
Fat mass            0.62 (0.08)    0.30 (0.12)    64.8
rMZ, rDZ: intraclass correlation for MZ and DZ twins
Multivariate Analysis:
The Cholesky Decomposition Model
[Path diagram: genetic factors G1 to G5 and environmental factors E1 to E5 acting on lean mass, fat mass, LS BMD, FN BMD and TB BMD]
LS=lumbar spine, FN=femoral neck, TB=total body, BMD = bone mineral density
Genetic and Environmental Correlation between
Lean, Fat and Bone Mass
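[Correlation matrix with rows and columns lean mass (LM), fat mass (FM), lumbar spine BMD (LS), femoral neck BMD (FN) and total body BMD (TB). The cell layout did not survive extraction; values are given in source order.]
0.52, 0.39, 0.23, 0.51, 0.41, 0.36, 0.70, 0.57, 0.70
Fat mass (FM): 0.16
Lumbar spine BMD (LS): 0.08, 0.02
Femoral neck BMD (FN): 0.16, 0.05, 0.64
Total body BMD (TB): 0.09, 0.31, 0.75
0.61, 0.58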
Strategies for finding genes
How many genes?
• Initial estimate: 120,000.
• DNA sequence: 60,000 - 70,000.
• HGP: 32,000 - 39,000 (including nonfunctional genes = inactive genes).
Distribution of the number of genes
[Figure: number of genes plotted against effect size, with regions labelled polygenes, oligogenes and major genes]
Finding genes: a challenge
One of the most difficult challenges ahead is to
find genes involved in diseases that have a
complex pattern of inheritance, such as those
that contribute to osteoporosis, diabetes,
asthma, cancer and mental illness.
Why Search for Genes?
• Scientific value
– Study genes’ actions at the molecular level
• Therapeutic value
– Gene product and development of new drugs
– Gene therapy
• Public health
– Identification of “high-risk” individuals
– Interaction between genes and environment
Genome-wide screening vs
candidate gene approach
• Genome-wide screening
– No physiological assumption
– Systematic screening for chromosomal regions of interest in the entire genome
• Candidate gene
– Proven or hypothetical physiological mechanism
– Direct test for individual genes
Linkage vs Association
• Linkage
– Transmission of genes within pedigrees
• Association
– Difference in allele frequencies between cases and unrelated controls
Statistical models
• Linkage analysis traces cosegregation and recombination between observed markers and an unobserved putative trait locus. Significance is expressed as a LOD (log-odds) score.
• Association analysis compares allele frequencies between unrelated cases (diseased) and controls.
• Transmission disequilibrium test (TDT) examines the transmission of alleles from heterozygous parents to children exhibiting the phenotype of interest.
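For the TDT, the usual test statistic compares how often heterozygous parents transmit a given allele to affected children with how often they do not, in the manner of a McNemar test. The sketch below assumes that standard form; the transmission counts are hypothetical and the function names are introduced only for illustration.

```python
import math


def tdt_statistic(transmitted, not_transmitted):
    """Chi-square statistic (1 df) for the transmission disequilibrium test."""
    b, c = transmitted, not_transmitted
    return (b - c) ** 2 / (b + c)


def chi2_1df_p_value(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical counts: heterozygous parents transmitted the allele 60 times
# and failed to transmit it 40 times.
stat = tdt_statistic(60, 40)
p_value = chi2_1df_p_value(stat)
print(f"TDT chi-square = {stat:.2f}, p = {p_value:.4f}")
```

Under no linkage or association, the allele is expected to be transmitted half the time, so the two counts would be roughly equal.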
Two-point linkage analysis: an example
[Pedigree figure: individuals labelled with disease alleles (D, d) and marker genotypes drawn from alleles 134, 138, 142, 146 and 154. Each of the eight offspring is classified as Non or Rec: six non-recombinants and two recombinants.]
Non = non-recombination; Rec = recombination
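No linkage
       D      d
134    1/4    1/4
142    1/4    1/4
Complete linkage
       D      d
134    0      1/2
142    1/2    0
Incomplete linkage
       D                 d
134    $\theta/2$        $(1-\theta)/2$
142    $(1-\theta)/2$    $\theta/2$
$\mathrm{LOD} = \log_{10} \dfrac{\left(\frac{1-\theta}{2}\right)^{6} \left(\frac{\theta}{2}\right)^{2}}{\left(\frac{1}{4}\right)^{8}}$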
Estimation of $\theta$
[Figure: LOD score (y-axis, -6 to +6) plotted against the estimated value of $\theta$ (x-axis, 0 to 0.5); the peak of the curve marks the maximum LOD score]
Basic linkage model
LR: likelihood ratio
$\mathrm{LR}(\theta) = L(\mathrm{data} \mid \theta) / L(\mathrm{data} \mid \theta = 0.5)$
$\mathrm{LOD} = \log_{10} \max_{\theta} \mathrm{LR}(\theta)$
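The basic linkage model can be illustrated with the counts from the two-point example in this talk, where six offspring are labelled Non and two are labelled Rec. The sketch assumes phase-known, fully informative meioses and finds the maximum by a simple grid search; the function names and the grid resolution are assumptions made for the example.

```python
import math


def lod(theta, n_nonrec, n_rec):
    """LOD score at recombination fraction theta for phase-known meioses."""
    likelihood = (1 - theta) ** n_nonrec * theta ** n_rec
    null_likelihood = 0.5 ** (n_nonrec + n_rec)   # theta = 0.5, no linkage
    return math.log10(likelihood / null_likelihood)


def max_lod(n_nonrec, n_rec):
    """Grid search over theta = 0.01, 0.02, ..., 0.50."""
    grid = [i / 100 for i in range(1, 51)]
    best_theta = max(grid, key=lambda t: lod(t, n_nonrec, n_rec))
    return best_theta, lod(best_theta, n_nonrec, n_rec)


best_theta, best_lod = max_lod(n_nonrec=6, n_rec=2)
print(f"theta = {best_theta:.2f}, max LOD = {best_lod:.3f}")
```

The maximum falls at the observed recombinant fraction, 2/8 = 0.25, and by construction the LOD is zero at theta = 0.5.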
Haseman-Elston model
(allele sharing method)
$X_{i1}$ = value of sib 1; $X_{i2}$ = value of sib 2 (sib pair i)
$D_i = |X_{i1} - X_{i2}|^2$
$\pi_i$ = proportion of genes shared identical-by-descent
$E(D_i \mid \pi_i) = \alpha + \beta\pi_i$
If $\beta = 0$ => $\sigma^2_g = 0$; $\theta = 0.5$, i.e. no linkage
If $\beta < 0$ => $\sigma^2_g > 0$; $\theta \neq 0.5$, i.e. linkage
Behav Genet 1972; 2:3-19
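A minimal sketch of the Haseman-Elston regression: the squared sib-pair difference is regressed on the proportion of alleles shared identical-by-descent, and a negative slope is taken as evidence of linkage. The data are hypothetical and the helper `least_squares` is introduced only for this example; a real analysis would also test whether the slope differs significantly from zero.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the ordinary least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


# Hypothetical sib pairs: proportion of alleles shared IBD at the marker ...
ibd_proportion = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
# ... and the squared difference in the trait within each pair.
squared_difference = [4.0, 3.0, 2.5, 2.0, 1.0, 0.5]

intercept, slope = least_squares(ibd_proportion, squared_difference)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print("suggests linkage" if slope < 0 else "no evidence of linkage")
```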
Identical-by-descent (IBD)
Parents: 126 / 130 and 134 / 138
Offspring: A = 126 / 134; B = 126 / 138; C = 130 / 134; D = 130 / 138; E = 126 / 138
Alleles are ibd if they are identical and descended from the same ancestral allele
• A and D share no alleles
• Pairs sharing 1 allele ibd: A and B, A and E (126); C and D (130); A and C (134); B and D, D and E (138)
• B and E share 2 alleles (126 and 138) ibd
Identical-by-state (IBS)
Parents: 126 / 126 and 126 / 138
Offspring: A = 126 / 126; B = 126 / 138; C = 126 / 138; D = 126 / 126
Alleles are ibs if they are identical, but their ancestral derivation is unclear
• A and D share 1 allele (126) ibs
• B and C share 126 ibs, 138 ibd
Sibpair linkage analysis:
allele-sharing method
[Figure: squared difference in BMD among siblings plotted against the number of alleles shared IBD (0, 1, 2)]
[Figure: intrapair difference (%), axis from 0 to 25, by number of alleles shared IBD (0, 1, 2)]
Linkage between VDR gene and lumbar spine bone mineral density
in a sample of 78 DZ twin pairs.
Nature 1994; 367:284-287
Association analysis
• Presence/absence of an allele in a phenotype.
Genotype    Fx     No Fx
BB          50     10
Bb          30     30
bb          20     60
Total       100    100
Frequency of allele B among fx: (50×2 + 30) / (100×2) = 0.65
Frequency of allele B among no fx: (10×2 + 30) / (100×2) = 0.25
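The allele frequency arithmetic can be checked with a short sketch using the genotype counts above. The allelic odds ratio is added as one possible summary of the association; it is not shown on the slide, and the function names are assumptions made for the example.

```python
def allele_frequency(n_BB, n_Bb, n_bb):
    """Frequency of allele B from genotype counts."""
    total_alleles = 2 * (n_BB + n_Bb + n_bb)
    return (2 * n_BB + n_Bb) / total_alleles


def allelic_odds_ratio(case_counts, control_counts):
    """Odds ratio for carrying allele B (cases vs controls), from genotype counts."""
    case_B = 2 * case_counts[0] + case_counts[1]
    case_b = 2 * case_counts[2] + case_counts[1]
    control_B = 2 * control_counts[0] + control_counts[1]
    control_b = 2 * control_counts[2] + control_counts[1]
    return (case_B * control_b) / (case_b * control_B)


fx = (50, 30, 20)       # BB, Bb, bb among subjects with fracture
no_fx = (10, 30, 60)    # BB, Bb, bb among subjects without fracture

freq_fx = allele_frequency(*fx)
freq_no_fx = allele_frequency(*no_fx)
odds_ratio = allelic_odds_ratio(fx, no_fx)

print(f"B frequency: fx = {freq_fx:.2f}, no fx = {freq_no_fx:.2f}")
print(f"allelic odds ratio = {odds_ratio:.2f}")
```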
Association analysis: an example
[Figure: bone mineral density (g/cm2, axis from 0.8 to 1.1) by VDR genotype (BB, Bb, bb)]
Association between vitamin D receptor gene and bone mineral density
Association analysis
• Three conditions of association
– The genetic marker is the putative gene
– The marker is in linkage disequilibrium (association) with the putative gene or with a nearby locus
– Random artefact, population admixture
Linkage and association
• Linkage without association
– Many trait-causing loci
– Association between a marker and a locus can be weak or absent
• Association without linkage
– A minor effect of the genetic marker
– Poor discriminant power for phenotype within a pedigree
Statistical issues
Diagnostic reasoning
Test         Disease is really present    Disease is really absent
+ve          True +ve                     False +ve
-ve          False -ve                    True -ve
Statistical reasoning
Stat test    Ho is not true               Ho is true
Reject Ho    No error                     Type I ($\alpha$)
Accept Ho    Type II ($\beta$)            No error
Study design: minimize type I and type II errors
No. of sibpairs required to establish linkage
for a single gene and recombination = 0
$\lambda$    LOD = 3    LOD = 4
1.1    7460       8931
1.2    2048       2566
1.3    1033       1299
1.5    489        615
2.0    199        242
1.5    191        154
3.0    88         115
$\lambda$ = familial relative risk
Strategies for improvement of power
• Population and sampling
• Phenotypes
• Statistical analysis
Population and sampling
• Population
– Homogeneous populations
• Sampling units
– Related members
– Large, multigenerational families (rather than sibpairs)
• Phenotypes
– Low-level, intermediate
– Well-defined and highly reproducible
Statistical analyses
• Multivariate analysis vs. univariate analysis
• Variance component model
• Power
– Locus-specific power: probability of detecting an individual locus associated with the trait, e.g. $1 - \beta_i$
– Genomewide power: probability of detecting any of the k loci, e.g. $1 - \beta_1 \beta_2 \beta_3 \cdots \beta_k$
– Studywise power: probability of detecting all k loci, e.g. $(1-\beta_1)(1-\beta_2)(1-\beta_3) \cdots (1-\beta_k)$
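The three power definitions can be compared numerically. The sketch below assumes independent loci, as the product formulas imply, and uses hypothetical locus-specific powers; `math.prod` requires Python 3.8 or later.

```python
import math


def genomewide_power(locus_powers):
    """Probability of detecting at least one locus: 1 - product of type II errors."""
    return 1 - math.prod(1 - p for p in locus_powers)


def studywise_power(locus_powers):
    """Probability of detecting every locus: product of locus-specific powers."""
    return math.prod(locus_powers)


# Hypothetical locus-specific powers (1 - beta_i) for k = 3 loci.
powers = [0.8, 0.7, 0.6]

any_locus = genomewide_power(powers)
all_loci = studywise_power(powers)
print(f"genomewide power = {any_locus:.3f}, studywise power = {all_loci:.3f}")
```

Even with respectable power at each locus, the chance of finding all of them in one study is considerably lower than the chance of finding at least one.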
Summary
• Most diseases are regulated by genes and
environment.
• Genetic dissection of multifactorial diseases
is a challenge.
• Gene-hunting is a major endeavour in
epidemiological research.
• Substantial progress in statistical models.
Perspective
• Can genes be found?
• The Human Genome Project
• Influences of biotechnology
• Should “epidemiology” become “genetic epidemiology”?
Further readings
• BMJ 2001; 322: 28 April. Special issue on genetics.
• Nguyen TV, Eisman JA. Genetics of fracture:
challenges and opportunities. J Bone Miner Res
2000; 15:1253-1256.
• Nguyen TV, Blangero J, Eisman JA. Genetic
epidemiological approaches to the search for
osteoporosis genes. J Bone Miner Res 2000;
15:392-401.
• Nguyen TV, et al. Bone mass, lean mass and fat
mass: same genes or same environment. Amer J
Epidemiol 1998; 147:3-16.