Introduction to Linkage Analysis
March 2002

3 Stages of Genetic Mapping
  Are there genes influencing this trait?
    Epidemiological studies
  Where are those genes?
    Linkage analysis
  What are those genes?
    Association analysis

Outline
  How is genetic information organized?
    Chromosomes
    Sequence
  Examples of genetic variation
    Changes that have observable effects
    Genetic markers
  Linkage analysis
    Strategy for surveying variation in families

Genetic Information
  Human Genome
    22 autosomes, plus X and Y
    Sequence of 3 x 10^9 base-pairs
    ~17-20 bp can identify a unique sequence in the genome
  Variation
    Most sequence is conserved across individuals
    1 in 10^3 base-pairs differs between chromosomes

DNA
  Polymer of 4 bases
    Purines: Adenine (A), Guanine (G)
    Pyrimidines: Cytosine (C), Thymine (T)
  Double helix
  Complementary strands
  Hydrogen bonds

Some Types of DNA Sequence
  Genes (~30,000 in humans)
    Exons, translated into protein
    Introns, transcribed into RNA, but not protein
    Promoters
    Enhancers
  Repeat DNA
  Pseudogenes

Genetic Code
  DNA -> RNA -> Protein
  DNA: 4 bases (A, T, C, G)
  RNA: 4 bases (A, U, C, G)
  Proteins: 20 amino-acids
  Universal genetic code
    Translation between DNA/RNA and protein
    Three bases code for one amino-acid

Genetic Code: Example of CFTR Variants

  Position   Mutation                     Effect
  482        G->C                         Arg-117 -> His-117
  1609       C->T                         Gln-493 -> STOP
  1654       Deletion of 3 nucleotides    Deletion of Phe-508
  2566       AT insertion                 Frameshift
  3659       C deletion                   Frameshift

Phenotype vs. Genotype
  Genotype
    Underlying genetic constitution
  Phenotype
    Observed manifestation of a genotype
  Different changes within CFTR all lead to the cystic fibrosis phenotype

Common Types of DNA Variants
  Tandem repeats
  Microsatellites
  Single nucleotide polymorphisms
  Insertions
  Deletions

Repeat Length Polymorphisms
  Variable Number Tandem Repeats (VNTRs)
    Typical repeat units of 10 to 100s of bp
    E.g.: ~110 bp repeat in the IL1RN gene
  Microsatellites
    Simple repeat sequences
    Most popular are 2, 3 or 4 bp
    E.g.: ACACACAC ...
    D naming scheme (e.g., D2S160)

Microsatellites
  Most popular markers for linkage analysis
  Large number of alleles (10 is common)
    Can distinguish and track individual chromosomes in families
  Relatively abundant
    ~15,000 mapped loci

SNPs (Single Nucleotide Polymorphisms)
  Change one nucleotide
    Insert
    Delete
    Replace it with a different nucleotide
  Many have no phenotypic effect
  Some can disrupt or affect gene function

A Little More on SNPs
  Most SNPs have only two alleles
  Easy to automate their scoring
  Becoming extremely popular
  Typing methods
    Sequencing
    Restriction site
    Hybridization

Classifying Genotypes
  Each individual carries two alleles
  If there are n alternative alleles, there will be n (n + 1) / 2 possible genotypes
    3 possible genotypes for SNPs, typically more for microsatellites and VNTRs
  Homozygotes
    The two alleles are the same
  Heterozygotes
    The two alleles are different

Genes in an Individual
  Sexual reproduction
    One copy inherited from father
    One copy inherited from mother
  Each individual has
    2 copies of each chromosome
    2 copies of each gene
  These copies may be similar or different

Meiosis
  Leads to formation of haploid gametes from diploid cells
  Assortment of genetic loci
  Recombination or crossover

What Happens in Meiosis...
  [Figure: meiosis]

Recombination
  [Figure: non-recombinant gametes and recombinant gametes]

Recombination
  Actual
    No. of recombinants between two locations
    An average of one per Morgan
  Observed
    Usually, only whether the number of crossovers between two locations is odd or even can be established

Recombination and Map Distance
  [Figure: observed recombination (0.00 to 1.00) plotted against distance (0.00 to 1.00)]

Intuition for Linkage Analysis
  Millions of variations could be responsible for disease
    Impractical to investigate individually
  Within families, they are organized into a limited number of haplotypes
  Sample a modest number of markers to determine whether each stretch of chromosome is shared

Tracing Chromosomes
  [Figure: pedigree labeled with marker alleles 1 to 6]

IBD
  At each location, try to establish whether siblings (or twins) share 0, 1 or 2 chromosomes
  Inference may be probabilistic

Example of Scoring IBD
  Parental genotypes are available
    Parents: A/C, A/C
    Siblings: A/A, A/A
  Siblings are IBD = 2
    Share maternal and paternal chromosomes

Example of Scoring IBD II
  Parental genotypes unavailable
    Siblings: A/A, A/A
  IBD between siblings may be 0, 1 or 2
  Likelihood of each outcome depends on the frequency of allele A

Example of Scoring IBD III
  Looking at multiple consecutive markers helps infer IBD
    Especially without parental genotypes
  IBD = 2 may be quite likely
    Sibling 1: A/A  C/G  A/T  G/G
    Sibling 2: A/A  C/G  A/T  G/G

Notation
  $\pi$: IBD sharing (0, ½ and 1)
  $Z_0$: probability that $\pi = 0$
  $Z_1$: probability that $\pi = ½$
  $Z_2$: probability that $\pi = 1$
  $\hat{\pi} = Z_2 + \tfrac{1}{2} Z_1$, estimated IBD sharing

Typical IBD Information

  Pair        Chr.   Pos (cM)   z0     z1     z2     pi-hat
  5378-5479   3      10         0.00   0.01   0.99   0.995
  5378-5479   3      20         0.00   0.01   0.99   0.995
  5378-5479   3      30         0.00   0.50   0.50   0.750
  5378-5479   3      40         0.00   1.00   0.00   0.500
  5378-5479   3      50         0.01   0.98   0.01   0.500

Model
  [Figure: twin path diagram. Latent factors Q, A, C and E influence Twin 1 and Twin 2. The Q factors of the two twins are correlated 0.0, 0.5 or 1.0 (the estimated IBD sharing), the A factors 0.5 [DZ] or 1.0 [MZ], and the C factors 1.0.]
  [Figure panels: No Linkage; Linkage]

Hypothesis
  Test evidence for a linked genetic effect
  Fit two models
    Full model (Q, A, C, E)
    Restricted model (A, C, E)
  Maximum likelihood test
    Compare likelihoods using $\chi^2$

Analysis
  Estimate $\hat{\pi}$
    For example, using Genehunter or Merlin
  Test hypothesis at each location along the chromosome
    Chi-squared is a 50:50 mixture of 1 df and a point mass at zero
  Summarize results in a linkage curve

Lod Scores
  Often, results are reported as lod scores
    $\mathrm{LOD} = \log_{10} \frac{L(Q, A, C, E)}{L(A, C, E)} = \frac{\chi^2}{4.6}$
  Genome is large, many locations tested
  Threshold for significance is usually LOD > ~3

Sample Linkage Curve
  [Figure: linkage curve, LOD score along the chromosome]
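
The genotype count quoted on the Classifying Genotypes slide, n (n + 1) / 2, can be checked by enumerating unordered allele pairs. The sketch below is only an illustration; the function names `possible_genotypes` and `genotype_count` are made up for this example and do not come from the slides.

```python
from itertools import combinations_with_replacement


def possible_genotypes(alleles):
    """List every unordered pair of alleles (genotype) at one marker."""
    return list(combinations_with_replacement(alleles, 2))


def genotype_count(n):
    """Closed-form count n(n + 1) / 2 for n alternative alleles."""
    return n * (n + 1) // 2


# A SNP with two alleles: two homozygotes and one heterozygote.
snp_genotypes = possible_genotypes(["A", "C"])
print(snp_genotypes)

# A microsatellite with 10 alleles (hypothetical labels 0..9).
microsat_genotypes = possible_genotypes(range(10))
print(len(microsat_genotypes), genotype_count(10))
```

Enumerating pairs with replacement counts A/C and C/A as the same genotype, which matches the slide's treatment of a genotype as an unordered pair of alleles.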
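
The second IBD scoring example (both siblings A/A, parents untyped) says the likelihood of each IBD state depends on the frequency of allele A. One way to make that concrete is a small Bayes calculation. This is a sketch under assumptions that the slides do not state: full siblings with prior IBD probabilities 1/4, 1/2, 1/4, Hardy-Weinberg proportions, and a single marker considered on its own. The function name `sib_ibd_posterior` is hypothetical.

```python
def sib_ibd_posterior(p):
    """Posterior P(IBD = 0), P(IBD = 1), P(IBD = 2) for two full sibs
    who are both A/A, given population frequency p of allele A.

    Assumes Hardy-Weinberg proportions and untyped parents.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("allele frequency must be in (0, 1]")
    prior = (0.25, 0.50, 0.25)
    # Number of independent A alleles that must be drawn from the population:
    # 4 when nothing is shared, 3 when one allele is shared, 2 when both are.
    likelihood = (p ** 4, p ** 3, p ** 2)
    joint = [pr * lk for pr, lk in zip(prior, likelihood)]
    total = sum(joint)
    return tuple(j / total for j in joint)


for freq in (0.01, 0.5, 0.9):
    z0, z1, z2 = sib_ibd_posterior(freq)
    print(f"p = {freq:.2f}: z0 = {z0:.3f}, z1 = {z1:.3f}, z2 = {z2:.3f}")
```

When A is rare, two A/A siblings are almost certainly IBD = 2; when A is common, the genotypes carry little information and the posterior stays close to the prior.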
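
The estimated IBD sharing defined on the Notation slide (pi-hat equals Z2 plus half of Z1) can be recomputed from the z0, z1, z2 columns of the Typical IBD Information table. The following sketch does that for the five positions listed; the helper name `pi_hat` and the tuple layout are choices made for this example only.

```python
def pi_hat(z0, z1, z2):
    """Estimated proportion of alleles shared IBD: Z2 + 0.5 * Z1."""
    if abs(z0 + z1 + z2 - 1.0) > 1e-6:
        raise ValueError("IBD probabilities should sum to 1")
    return z2 + 0.5 * z1


# (position in cM, z0, z1, z2) for pair 5378-5479 on chromosome 3,
# copied from the table on the Typical IBD Information slide.
ibd_rows = [
    (10, 0.00, 0.01, 0.99),
    (20, 0.00, 0.01, 0.99),
    (30, 0.00, 0.50, 0.50),
    (40, 0.00, 1.00, 0.00),
    (50, 0.01, 0.98, 0.01),
]

estimates = [pi_hat(z0, z1, z2) for _, z0, z1, z2 in ibd_rows]
for (pos, *_), est in zip(ibd_rows, estimates):
    print(f"{pos:>3} cM  pi-hat = {est:.3f}")
```

The recomputed values agree with the pi-hat column of the table, which is a useful consistency check on the way the flattened table was read.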
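
The Lod Scores slide relates the likelihood-ratio chi-squared to the lod score through the constant 4.6, which is 2 ln 10, and the Analysis slide notes that the test statistic follows a 50:50 mixture of a 1 df chi-squared and a point mass at zero. The sketch below works through both pieces of arithmetic. It assumes natural-log likelihoods as input; the function names and the example log-likelihood values are hypothetical and are not output from Genehunter or Merlin.

```python
import math


def lod_from_loglik(loglik_full, loglik_restricted):
    """Lod score from natural-log likelihoods of the full (Q, A, C, E)
    and restricted (A, C, E) models: chi2 / (2 ln 10), about chi2 / 4.6."""
    chi2 = max(2.0 * (loglik_full - loglik_restricted), 0.0)
    return chi2 / (2.0 * math.log(10.0))


def mixture_p_value(chi2):
    """P-value under a 50:50 mixture of chi-squared (1 df) and zero.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    if chi2 <= 0.0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical log-likelihoods chosen so that the lod score is exactly 3.
ll_restricted = -100.0 - 3.0 * math.log(10.0)
ll_full = -100.0
lod = lod_from_loglik(ll_full, ll_restricted)
chi2 = lod * 2.0 * math.log(10.0)
p_value = mixture_p_value(chi2)
print(f"LOD = {lod:.2f}, chi2 = {chi2:.2f}, p = {p_value:.1e}")
```

A lod score of 3 corresponds to a pointwise p-value on the order of 1 in 10,000, which gives some intuition for why such a strict threshold is used when many locations across the genome are tested.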