Genetic Epidemiological Strategies in the Search for Genes
Tuan V. Nguyen
University of New South Wales, Faculty of Medicine

Genes and Diseases
• Many diseases have their roots in both genes and environment.
• Currently, more than 4000 diseases, including sickle cell anemia and cystic fibrosis, are known to be genetic and are passed on in families.

Genes and Medical Sciences
The central question for the medical sciences is the extent to which it will be possible to relate events at the molecular level to the clinical findings or phenotypes of patients with particular diseases.

Contents
• Genes and DNA
• Detection of genetic effects
• Search for specific genes

Chromosomes
Each human cell contains 23 pairs of chromosomes (distinguished by size and banding pattern).
[Figure: karyotype of a male; females have two X chromosomes.]

DNA and Genes
• DNA carries the instructions that allow cells to make proteins.
• DNA is made up of 4 chemical bases (A, T, G, C).
• The bases make “words”: AGT CTC GAA TAA
• Words make “sentences” = genes: <AGT CTC GAA TAA>

Genes, Alleles, and Genotypes
• The location of a gene is called its locus.
• Alleles are alternate forms of a gene. Example: A, a.
• Genotype: the maternal and paternal alleles of an individual at a locus define that individual's genotype at the locus. Example: AA, Aa, aa.

How Do Genes Work?
• Genes tell cells how to make molecules called proteins.
• Proteins allow cells to perform specific functions.
• If the instructions are intact, things are normal. If the instructions are changed (mutated), an abnormality results.

Inheritance
• The passing of genes from parents to child is the basis of inheritance.
• We are not identical to our parents: half of our genes come from our mother and half from our father.
• Each brother and sister inherits a different combination of chromosomes. From each parent, N = 2^23 = 8,388,608 combinations are possible.
• Identical twins receive exactly the same combination of genes from their parents.
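The figure of 2^23 on the Inheritance slide can be checked with a few lines of Python. This is a minimal sketch that assumes independent assortment of the 23 chromosome pairs and ignores crossing over; the function name `gamete_combinations` and the `per_couple` figure are illustrative additions, not part of the original slides.

```python
# Count the chromosome combinations one parent can pass on.
# Assumption: the 23 pairs assort independently and there is no crossing over.
N_PAIRS = 23


def gamete_combinations(n_pairs: int = N_PAIRS) -> int:
    """Each pair contributes one of its two chromosomes, giving 2**n_pairs."""
    return 2 ** n_pairs


per_parent = gamete_combinations()
# One combination comes from each parent, so the counts multiply.
per_couple = per_parent ** 2

print(f"Per parent: {per_parent:,}")   # 8,388,608
print(f"Per couple: {per_couple:,}")
```

Under the same assumptions, a couple can produce 2^46 distinct chromosome combinations, which is why siblings other than identical twins are so unlikely to match.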
Genetic effects
• Three types of gene action: additive, dominant, and epistatic.
• Additive effect
 – AA = 9, Aa = 7, aa = 5.
• Dominant effect
 – AA = 9, Aa = 9, aa = 5.
• Epistasis: interaction of alleles at 2 loci
 – For locus 1: AA = 9, Aa = 7, aa = 5.
 – For locus 2: AA = 5, Aa = 5, aa = 9.

How to detect genetic effects?

Clues to Genetics and Environment
Epidemiological characteristics: geographic variation, ethnic variation, temporal variation, epidemics, social class variation, gender variation, age
Family variables: history of disease, birth order, birth interval, co-habitation
Genetics: + + +/+ +/+ +/-
Environment: + + + + + + + + + + +

Methods of Investigation of Genetic Traits
• Family studies. Examine phenotypes (diseases) in the relatives of affected subjects (probands).
• Twin studies. Compare the intraclass correlation in MZ twins (who share 100% of their genes) with that in DZ twins (who share, on average, 50%).
• Adoption studies. Seek to distinguish genetic from environmental effects by testing whether children resemble their biological parents more closely than their adoptive parents.
• Offspring of discordant MZ twins. Control for environmental effects; test for a large genetic contribution to etiology.
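The twin-study bullet above rests on the intraclass correlation within twin pairs. The sketch below shows one common way to compute it, the one-way ANOVA estimator for groups of size two; the slides do not say which estimator the author used, so this choice is an assumption. The function name and both data sets are hypothetical and serve only as illustration.

```python
def intraclass_correlation(pairs):
    """One-way ANOVA intraclass correlation for twin pairs (group size 2).

    ICC = (MSB - MSW) / (MSB + MSW), where MSB is the between-pair and
    MSW the within-pair mean square.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    grand_mean = sum(x1 + x2 for x1, x2 in pairs) / (2 * n)
    # Between-pair mean square: 2 observations per pair, n - 1 degrees of freedom.
    msb = 2 * sum(((x1 + x2) / 2 - grand_mean) ** 2 for x1, x2 in pairs) / (n - 1)
    # Within-pair mean square: the within sum of squares of a pair is
    # (x1 - x2)**2 / 2, with n degrees of freedom in total.
    msw = sum((x1 - x2) ** 2 / 2 for x1, x2 in pairs) / n
    if msb + msw == 0:
        raise ValueError("no variation in the data")
    return (msb - msw) / (msb + msw)


# Hypothetical measurements (not data from the slides).
mz_pairs = [(1.00, 1.02), (0.85, 0.80), (1.20, 1.15), (0.70, 0.78)]
dz_pairs = [(1.00, 0.80), (0.85, 1.10), (1.20, 0.95), (0.70, 0.90)]

print("rMZ =", round(intraclass_correlation(mz_pairs), 2))
print("rDZ =", round(intraclass_correlation(dz_pairs), 2))
```

A markedly higher correlation in MZ than in DZ pairs is the pattern that points toward a genetic contribution to the trait.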
Basic Genetic-Environmental Model
Phenotype (P) = Genetics + Environment
Genetics = Additive (A) + Dominant (D)
Environment = Common (C) + Specific (E)
=> P = A + D + C + E

Statistical Genetic Model
$\mathrm{Cov}(Y_i, Y_j) = 2\Phi_{ij}\,\sigma^2(a) + \Delta_{ij}\,\sigma^2(d) + \gamma_{ij}\,\sigma^2(c) + \delta_{ij}\,\sigma^2(e)$
$\Phi_{ij}$: kinship coefficient
$\Delta_{ij}$: Jacquard's coefficient of identity by descent
$\gamma_{ij}$: probability of sharing environmental factors
$\delta_{ij}$: residual coefficient
$V_P = V_A + V_D + V_C + V_E$
V = variance; P = phenotype; A, D, C, E = as defined above

Kinship coefficients
Expected coefficient for:
Relative            $\sigma^2(a)$   $\sigma^2(d)$   $\sigma^2(c)$
Spouse-spouse       0               0               1
Parent-offspring    1/2             0               1
Full sibs           1/2             1/4             1
Half-sibs           1/4             0               1
Aunt-niece          1/4             0               1
First cousins       1/8             0               0
Dizygotic twins     1/2             1/4             1
Monozygotic twins   1               1               1

Heritability ($H^2$)
$V_P = V_A + V_D + V_C + V_E$
Broad-sense heritability: $H^2 = (V_A + V_D) / V_P$
Narrow-sense heritability: $H^2 = V_A / V_P$

Statistical Methods for Estimating Heritability
• Simple linear regression of offspring on parent: $Y_{\text{offspring}} = b\,Y_{\text{parent}} + e$; $H^2 = 2b$
• Twin concordance, using the intraclass correlations $r_{MZ}$ and $r_{DZ}$: $H^2 = 2(r_{MZ} - r_{DZ})$
• Path analysis and variance component model

Path Model for Twin Data
[Path diagram: latent factors A1, D1, C1, E1 act on Twin 1 and A2, D2, C2, E2 act on Twin 2 through paths a, d, c, e. Correlations shown between the twins' latent factors: r = 1; r = .5 / .25; r = 1 / .5.]
A = additive; D = dominant; C = common environment; E = specific environment

Intraclass Correlation: Femoral neck bone mass
[Figure: scatter plots of Twin 2 against Twin 1. MZ panel: $r_{MZ}$ = 0.73; DZ panel: $r_{DZ}$ = 0.47.]

Genetic Determination of Lean, Fat and Bone Mass
                    $r_{MZ}$      $r_{DZ}$      $H^2$ (%)
Lumbar spine BMD    0.74 (0.06)   0.48 (0.10)   77.8
Femoral neck BMD    0.73 (0.06)   0.47 (0.11)   76.4
Total body BMD      0.80 (0.05)   0.48 (0.10)   78.6
Lean mass           0.72 (0.06)   0.32 (0.12)   83.5
Fat mass            0.62 (0.08)   0.30 (0.12)   64.8
$r_{MZ}$, $r_{DZ}$: intraclass correlations for MZ and DZ twins

Multivariate Analysis: The Cholesky Decomposition Model
[Path diagram: genetic factors G1 to G5 and environmental factors E1 to E5 acting on lean mass, fat mass, LS BMD, FN BMD and TB BMD.]
LS = lumbar spine, FN = femoral neck, TB = total body, BMD = bone mineral density

Genetic and Environmental Correlation between Lean, Fat and Bone Mass
[Correlation matrix among lean mass (LM), fat mass (FM), lumbar spine BMD (LS), femoral neck BMD (FN) and total body BMD (TB).]
Rows below the diagonal:
Fat mass (FM)           0.16
Lumbar spine BMD (LS)   0.08   0.02
Femoral neck BMD (FN)   0.16   0.05   0.64
Total body BMD (TB)     0.09   0.31   0.75
Remaining values, in the order extracted: 0.52 0.39 0.23 0.51 0.41 0.36 0.70 0.57 0.70 0.61 0.58

Strategies for finding genes

How many genes?
• Initial estimate: 120,000.
• DNA sequence: 60,000 - 70,000.
• HGP: 32,000 - 39,000 (including nonfunctional, i.e. inactive, genes).

Distribution of the number of genes
[Figure: number of genes plotted against effect size, with regions labeled polygenes, oligogenes and major genes.]

Finding genes: a challenge
One of the most difficult challenges ahead is to find genes involved in diseases that have a complex pattern of inheritance, such as those that contribute to osteoporosis, diabetes, asthma, cancer and mental illness.

Why Search for Genes?
• Scientific value
 – Study genes' actions at the molecular level
• Therapeutic value
 – Gene products and development of new drugs
 – Gene therapy
• Public health
 – Identification of “high-risk” individuals
 – Interaction between genes and environment

Genomewide screening vs Candidate gene approach
• Genomewide screening
 – No physiological assumption
 – Systematic screening for chromosomal regions of interest in the entire genome
• Candidate gene
 – Proven or hypothetical physiological mechanism
 – Direct test for individual genes

Linkage vs Association
• Linkage
 – Transmission of genes within pedigrees
• Association
 – Difference in allele frequencies between cases and unrelated controls

Statistical models
• Linkage analysis traces cosegregation and recombination between observed markers and an unobserved putative trait gene. Significance is expressed as a LOD (log-odds) score.
• Association analysis compares allele frequencies between unrelated cases (diseased) and controls.
• Transmission disequilibrium test (TDT) examines the transmission of alleles from heterozygous parents to children who exhibit the phenotype of interest.

Two-point linkage analysis: an example
[Pedigree diagram: marker genotypes (alleles 134 to 154) and disease alleles D and d across generations. Offspring are labeled Non (non-recombination) or Rec (recombination): 6 Non and 2 Rec.]

Expected haplotype frequencies by marker allele (134, 142) and disease allele (D, d):
No linkage
        D      d
134     1/4    1/4
142     1/4    1/4
Complete linkage
        D      d
134     0      1/2
142     1/2    0
Incomplete linkage
        D                 d
134     $\theta/2$        $(1-\theta)/2$
142     $(1-\theta)/2$    $\theta/2$

$\mathrm{LOD}(\theta) = \log_{10} \dfrac{\left[(1-\theta)/2\right]^{6}\,\left[\theta/2\right]^{2}}{(1/4)^{8}}$

Estimation of $\theta$
[Figure: LOD score (vertical axis, -6 to +6) against the estimated value of $\theta$ (0 to 0.5); the peak of the curve marks the maximum LOD score.]

Basic linkage model
LR: likelihood ratio
$\mathrm{LR}(\theta) = L(\text{data} \mid \theta) \,/\, L(\text{data} \mid \theta = 0.5)$
$\mathrm{LOD} = \log_{10} \max_{\theta} \mathrm{LR}(\theta)$

Haseman-Elston model (allele sharing method)
$X_{i1}$ = value of sib 1; $X_{i2}$ = value of sib 2
$D_i = |X_{i1} - X_{i2}|^2$
$\pi_i$ = proportion of genes shared identical-by-descent
$E(D_i \mid \pi_i) = \alpha + \beta\,\pi_i$
If $\beta = 0$ => $\sigma^2(g) = 0$; $\theta = 0.5$, i.e. no linkage
If $\beta < 0$ => $\sigma^2(g) > 0$; $\theta \neq 0.5$, i.e. linkage
Behav Genet 1972; 2:3-19

Identical-by-descent (IBD)
Parents: 126/130 and 134/138
Offspring: A = 126/134, B = 126/138, C = 130/134, D = 130/138, E = 126/138
Alleles are ibd if they are identical and descended from the same ancestral allele.
• A and D share no alleles.
• A shares 1 allele (126) ibd with each of B and E; likewise C and D (130), A and C (134), and D with each of B and E (138).
• B and E share 2 alleles (126 and 138) ibd.

Identical-by-state (IBS)
Parents: 126/126 and 126/138
Offspring: A = 126/126, B = 126/138, C = 126/138, D = 126/126
Alleles are ibs if they are identical but their ancestral derivation is unclear.
• A and D share 1 allele (126) ibs.
• B and C share 126 ibs and 138 ibd.

Sibpair linkage analysis: allele-sharing method
[Figure: squared difference in BMD among siblings plotted against the number of alleles shared IBD (0, 1, 2).]
[Figure: intrapair difference (%), axis 0 to 25, by alleles shared IBD (0, 1, 2).]
Linkage between VDR gene and lumbar spine bone mineral density in a sample of 78 DZ twin pairs.
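Two of the linkage calculations in the slides above lend themselves to a short sketch: the two-point LOD score and the Haseman-Elston regression. The code takes 6 non-recombinant and 2 recombinant offspring, as read from the pedigree labels, and assumes phase-known meioses; the grid search over θ and the ordinary least-squares slope are simplifications chosen for illustration. The function names and the sib-pair data are hypothetical and are not taken from the slides or from any published analysis.

```python
import math


def lod_score(theta, n_rec, n_nonrec):
    """Two-point LOD for phase-known meioses: log10[L(theta) / L(0.5)]."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    n = n_rec + n_nonrec
    return (n_rec * math.log10(theta)
            + n_nonrec * math.log10(1.0 - theta)
            - n * math.log10(0.5))


def haseman_elston_slope(sib_pairs, ibd_share):
    """Least-squares slope of squared sib-pair differences on IBD sharing."""
    d = [(x1 - x2) ** 2 for x1, x2 in sib_pairs]
    n = len(d)
    mean_pi = sum(ibd_share) / n
    mean_d = sum(d) / n
    sxy = sum((p - mean_pi) * (y - mean_d) for p, y in zip(ibd_share, d))
    sxx = sum((p - mean_pi) ** 2 for p in ibd_share)
    return sxy / sxx


# LOD curve for 2 recombinants among 8 offspring, on a grid of theta values.
thetas = [i / 100 for i in range(1, 51)]
best_theta = max(thetas, key=lambda t: lod_score(t, 2, 6))
best_lod = lod_score(best_theta, 2, 6)
print(f"max LOD = {best_lod:.3f} at theta = {best_theta}")

# Hypothetical sib pairs: trait values and proportion of alleles shared IBD.
pairs = [(1.0, 0.6), (0.9, 1.2), (1.1, 0.9), (0.8, 0.7), (1.0, 1.0)]
pi_hat = [0.0, 0.0, 0.5, 0.5, 1.0]
slope = haseman_elston_slope(pairs, pi_hat)
print(f"Haseman-Elston slope = {slope:.3f}")
```

The LOD curve peaks at the observed recombination fraction, 2/8 = 0.25, and a negative slope in the second calculation is the pattern the Haseman-Elston slide reads as evidence of linkage. With so few meioses the maximum LOD stays well below the conventional threshold of 3.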
Nature 1994; 367:284-287

Association analysis
• Presence/absence of an allele in a phenotype.

Genotype   Fx     No Fx
BB         50     10
Bb         30     30
bb         20     60
Total      100    100
Fx = fracture

Frequency of allele B among fx: (50×2 + 30) / (100×2) = 0.65
Frequency of allele B among no fx: (10×2 + 30) / (100×2) = 0.25

Association analysis: an example
[Figure: bone mineral density (g/cm2, axis 0.8 to 1.1) by VDR genotype (BB, Bb, bb).]
Association between vitamin D receptor gene and bone mineral density

Association analysis
• Three conditions of association
 – The genetic marker is the putative gene
 – The marker is in linkage disequilibrium (association) with the putative gene or with a nearby locus
 – Random artefact, population admixture

Linkage and association
• Linkage without association
 – Many trait-causing loci
 – Association between a marker and a locus can be weak or absent
• Association without linkage
 – A minor effect of the genetic marker
 – Poor discriminant power for phenotype within a pedigree

Statistical issues

Diagnostic reasoning
Test    Disease is really
        Present      Absent
+ve     True +ve     False +ve
-ve     False -ve    True -ve

Statistical reasoning
Stat test    Null hypothesis (Ho) is
             Not true            True
Reject Ho    No error            Type I ($\alpha$)
Accept Ho    Type II ($\beta$)   No error

Study design: minimize type I and type II errors

No. of sibpairs required to establish linkage for a single gene and recombination = 0
$\lambda$    LOD = 3   LOD = 4
1.1          7460      8931
1.2          2048      2566
1.3          1033      1299
1.5          489       615
2.0          199       242
1.5          191       154
3.0          88        115
$\lambda$ = familial relative risk

Strategies for improvement of power
• Population and sampling
• Phenotypes
• Statistical analysis

Population and sampling
• Population
 – Homogeneous populations
• Sampling units
 – Related members
 – Large, multigenerational families (rather than sibpairs)
• Phenotypes
 – Low-level, intermediate
 – Well-defined and highly reproducible

Statistical analyses
• Multivariate analysis vs. univariate analysis
• Variance component model
• Power
 – Locus-specific power: probability of detecting an individual locus associated with the trait, e.g. $1 - \beta_i$
 – Genomewide power: probability of detecting any of the k loci, e.g. $1 - \beta_1 \times \beta_2 \times \beta_3 \times \dots \times \beta_k$
 – Studywise power: probability of detecting all k loci, e.g. $(1-\beta_1) \times (1-\beta_2) \times (1-\beta_3) \times \dots \times (1-\beta_k)$

Summary
• Most diseases are regulated by genes and environment.
• Genetic dissection of multifactorial diseases is a challenge.
• Gene-hunting is a major endeavour in epidemiological research.
• Substantial progress has been made in statistical models.

Perspective
• Can genes be found?
• The Human Genome Project
• Influences of biotechnology
• Should “epidemiology” become “genetic epidemiology”?

Further readings
• BMJ 2001; 322: 28 April. Special issue on genetics.
• Nguyen TV, Eisman JA. Genetics of fracture: challenges and opportunities. J Bone Miner Res 2000; 15:1253-1256.
• Nguyen TV, Blangero J, Eisman JA. Genetic epidemiological approaches to the search for osteoporosis genes. J Bone Miner Res 2000; 15:392-401.
• Nguyen TV, et al. Bone mass, lean mass and fat mass: same genes or same environment. Amer J Epidemiol 1998; 147:3-16.
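As a worked illustration of the three definitions of power on the Statistical analyses slide, the sketch below evaluates them for a hypothetical study of three loci, each with a type II error rate of 0.2. These values are assumptions chosen for the example, as are the function names; the product formulas also assume that detection of one locus is independent of detection of the others, which the slide's expressions imply but do not state.

```python
from math import prod


def locus_power(beta):
    """Probability of detecting one locus with type II error rate beta."""
    return 1.0 - beta


def genomewide_power(betas):
    """Probability of detecting at least one of the loci (independence assumed)."""
    return 1.0 - prod(betas)


def studywise_power(betas):
    """Probability of detecting every locus (independence assumed)."""
    return prod(1.0 - b for b in betas)


# Hypothetical type II error rates for k = 3 loci.
betas = [0.2, 0.2, 0.2]

print("locus-specific:", [round(locus_power(b), 3) for b in betas])
print("genomewide:    ", round(genomewide_power(betas), 3))
print("studywise:     ", round(studywise_power(betas), 3))
```

With these inputs the genomewide power is about 0.99 while the studywise power is about 0.51, which shows how much harder it is to detect all contributing loci than to detect any one of them.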