Improving accuracy of DNA-based methods for population estimation: Incorporating genotype uncertainty into Mark-Recapture models
Janine Wright, Richard Barker, Matthew Schofield (Department of Mathematics and Statistics, University of Otago, NZ)
Andrea Byrom (Landcare Research, Lincoln, NZ)
Dianne Gleeson (Ecological Genetics Laboratory, Landcare Research, Auckland, NZ)

Why use DNA to monitor wildlife?
• DNA can be collected from non-invasive samples in the field (e.g. hair, faeces)
• Methods for developing molecular markers are relatively inexpensive
• Enables identification of individuals in a wildlife population
• Makes it feasible to generate count data for cryptic, low-density, or hard-to-trap species
• Free from bias

Avoiding bias by using DNA
• Trappability varies among individual animals
• This can bias estimates of trap-catch after control
• Aims:
  – Develop an unbiased method of measuring the absolute number of animals present
  – Identify potential biases in trap-catch estimates
  – Develop a set of reliable microsatellite markers for pests in New Zealand, starting with stoats and possums and extending to other species (e.g. cats, rats, pigs, goats)

Molecular Markers – What is a microsatellite?
• A repeated sequence of 2–5 nucleotides, e.g. ACACACACACACACAC = (AC)₈
• Usable repeat lengths are 8–40 copies
• Occur in many locations in the genome, usually in non-coding regions
• Mutation-prone through replication slippage (high mutation rate, 10⁻² to 10⁻⁵)
• Thus any given population may contain variants of differing sizes
• Size variants = 'alleles'
• Typical vertebrate populations have 5–15 alleles at each locus (locus = position in the genome)
• Each individual possesses two alleles at each locus (maternally and paternally inherited)
• So we can see whether an individual is homozygous or heterozygous at each locus:
  – homozygous = both alleles identical
  – heterozygous = different alleles
• This gives a 'genotype' (tag) for each individual

Laboratory Procedure (Part 1)
• Extract DNA from a possum faecal pellet or ear tissue
• Quantify the DNA by running the sample (twice) on a real-time PCR machine alongside known DNA standards
• Calculate the amount (>200 pg) to add to the PCR reaction

Laboratory Procedure (Part 2)
• Use PCR to amplify microsatellite products at 7 loci (repeated twice)
• Run on an agarose gel to confirm that amplification succeeded and to determine the amount required for sequencing
• Run on the sequencer
• Analyse using GeneMapper software and by eye

How many loci to study?
• Between 4 and 12, depending on the species being studied (species differ in variability)
• More loci = less chance of genotypes matching by chance, but more chance of error

Challenges for DNA methods
Four types of error can occur:
1. Laboratory/recording error – thought to be negligible
2. Sample contamination – also unlikely
3. 'Shadow effect' – too few loci/alleles are used, so several individuals share the same genetic tag
4. 'Allelic drop-out'

Allelic drop-out: What is it?
• Failure of DNA amplification at one or more loci
• More likely with a lower concentration of DNA
• Only one allele is detected in a heterozygous individual (so the observed number of homozygous individuals is too high)

Allelic drop-out: Why is it a problem?
Leads to overestimation of population size (may be >5-fold; Creel et al., 2003):
1. Incorrect genotypes lead to encounters with 'new' individuals
2. False decrease in the probability of recapture (an animal is recaptured but thought to be a new individual)

Overcoming allelic drop-out
• Quantitative PCR approach designed to measure the amount of amplifiable nuclear DNA present in possum faecal samples
• A possum-specific piece of DNA is used as the target sequence, with specific TaqMan assay primers and probe
• Duplicate standards of known DNA amounts are included in each set of samples to produce a standard curve
[Figure: qPCR standard curve, with the 200 pg threshold marked]

How does this relate to mark-recapture?
• The rejection rate of samples screened by quantitative PCR may be high (samples are rejected if they contain <200 pg of DNA): 53% for possum faecal samples
• These low-concentration samples still contain 'usable' information in the form of partial genotypes, even though tag (genotype) identification is less than 99% accurate
• The error rate (allelic drop-out) can be linked to DNA quantity and quality
• How can this be built into MR models? Start with the data.
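The overcounting mechanism described under "Why is it a problem?" can be illustrated with a toy simulation. This is a minimal sketch, not the authors' analysis: it assumes each heterozygous locus independently loses one allele (chosen at random) with a fixed probability, and it treats every distinct scored genotype as an apparent individual. The dropout rate, the number of samples per animal and the function names are hypothetical; the five true genotypes are those of samples PP1, PP2, PP3, PP105 and PP439 from the project's sample data.

```python
import random

# True genotypes: one (allele, allele) pair per locus, in the order
# Tv54, Tv16, Tv58, Tv27, Tv12, Tv53, Tv19 (samples PP1, PP2, PP3, PP105, PP439).
TRUE_GENOTYPES = [
    ((113, 113), (136, 138), (137, 137), (174, 180), (234, 234), (244, 272), (274, 274)),
    ((113, 113), (128, 136), (137, 137), (174, 192), (232, 232), (244, 266), (261, 275)),
    ((99, 99), (136, 138), (145, 149), (174, 190), (232, 232), (266, 266), (261, 263)),
    ((99, 125), (126, 138), (159, 161), (174, 198), (232, 234), (260, 266), (263, 273)),
    ((113, 113), (132, 136), (143, 153), (192, 196), (232, 232), (240, 268), (255, 261)),
]


def apply_dropout(genotype, dropout_rate, rng):
    """Score a genotype under a hypothetical allelic drop-out model.

    At each heterozygous locus, with probability `dropout_rate`, one allele
    fails to amplify and the locus is scored as homozygous for the other.
    Homozygous loci are never changed.
    """
    scored = []
    for a, b in genotype:
        if a != b and rng.random() < dropout_rate:
            kept = rng.choice((a, b))
            scored.append((kept, kept))
        else:
            scored.append((a, b))
    return tuple(scored)


def apparent_individuals(genotypes, samples_per_animal, dropout_rate, seed=1):
    """Count distinct scored genotypes, i.e. the number of apparent individuals."""
    rng = random.Random(seed)
    seen = set()
    for genotype in genotypes:
        for _ in range(samples_per_animal):
            seen.add(apply_dropout(genotype, dropout_rate, rng))
    return len(seen)


print(apparent_individuals(TRUE_GENOTYPES, samples_per_animal=10, dropout_rate=0.0))  # 5
print(apparent_individuals(TRUE_GENOTYPES, samples_per_animal=10, dropout_rate=0.3))
```

With no dropout the count equals the five true animals. With any positive dropout rate the count can only grow for these five, because every pair of them has a locus at which they share no allele, so dropout cannot merge two animals; each extra scored variant shows up as a 'new' individual, the inflation of u. and N described in the slides.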
Example sample genotypes (two alleles scored at each of the seven loci):

Sample #  ID     Tv54     Tv16     Tv58     Tv27     Tv12     Tv53     Tv19
1         PP1    113/113  136/138  137/137  174/180  234/234  244/272  274/274
2         PP2    113/113  128/136  137/137  174/192  232/232  244/266  261/275
3         PP3    99/99    136/138  145/149  174/190  232/232  266/266  261/263
4         PP4    99/99    136/138  145/149  174/190  232/232  266/266  261/263
92        PP105  99/125   126/138  159/161  174/198  232/234  260/266  263/273
113       PP135  99/125   126/138  159/161  174/198  232/234  260/266  263/273
205       PP439  113/113  132/136  143/153  192/196  232/232  240/268  255/261
206       PP440  113/113  132/136  143/153  192/196  232/232  240/268  255/261

What We Would Like to Do
$$O = \begin{pmatrix} g_{11} & \cdots & g_{17} \\ \vdots & & \vdots \\ g_{S1} & \cdots & g_{S7} \end{pmatrix} \longrightarrow (G^{obs}, X^{obs})$$
Example: 5 samples containing 3 individuals
$$X^{obs} = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$
$G^{obs}$ contains the genotypes of the sampled animals.

Completing the Data
• What about the animals never sampled?
$$X = \begin{pmatrix} X^{obs} \\ 0 \end{pmatrix}$$
• $X^{obs}$ is $(u_\cdot \times S)$, where $0$ is the sample history for the $N - u_\cdot$ animals never sampled
• Completing the data introduces $N$ as an unknown (parameter)

Likelihood
$$L(N, \beta; X^{obs}, z) = [X^{obs} \mid N, \beta, z] = \binom{N}{u_\cdot} \frac{S!}{c_1! \cdots c_{u_\cdot}!} \prod_{j} \prod_{i=1}^{N} \pi_{ij}(\beta, z)^{X_{ij}}$$
where $\pi_{ij}$ is the probability that sample $j$ comes from animal $i$, $c_i = \sum_j X_{ij}$ is the number of samples from animal $i$, and $u_\cdot$ is the number of distinct animals sampled.
• If we assume $\pi_{ij} = 1/N$ for all $i, j$:
$$[X^{obs} \mid N, \beta, z] = \binom{N}{u_\cdot} \frac{S!}{c_1! \cdots c_{u_\cdot}!} \prod_{j} \prod_{i=1}^{N} \left(\frac{1}{N}\right)^{X_{ij}}$$

Missing and Corrupted Data
• With no genotyping error:
$$(G, X) \xrightarrow{\text{sampling}} O \rightarrow (G^{obs}, X^{obs})$$
  – Only one way to map $G^{obs}$ and $X^{obs}$ to $G$ and $X$
  – $G$ is irrelevant for estimation (ancillary for $N$ and $\beta$)
• With allelic dropout (corruption):
$$(G, X) \xrightarrow{\text{sampling}} \xrightarrow{\text{corruption}} O \rightarrow (G^{obs}, X^{obs})$$
  – Now there are many ways to map $G^{obs}$ and $X^{obs}$ to $G$ and $X$
  – Each has a different set of capture statistics ($\{c_i\}$ and $u_\cdot$)
  – With dropout, the observed $u_\cdot$ is usually too high; this leads to overestimation of $N$

Knowns and Unknowns
• Known (observed): $O$, $z$
• Unknown: $G$, $X$, $N$, $\theta$ and $\beta$
• Bayesian inference: find [Unknowns | Knowns]
$$\begin{aligned}
[G, X, N, \theta, \beta \mid O, z] &\propto \underbrace{[O, X, G \mid N, \theta, \beta, z]}_{\text{complete data likelihood}}\,[N, \theta, \beta] \\
&= \underbrace{[O \mid X, G, \theta]}_{\text{data corruption}}\,\underbrace{[G \mid N, \theta]}_{\text{allocation of genotypes}}\,\underbrace{[X \mid N, \beta, z]}_{\text{field sampling}}\,\underbrace{[N]\,[\theta]\,[\beta]}_{\text{independent priors}}
\end{aligned}$$

Posterior Sampling
• Use McMC to draw a sample from the joint posterior
  – Calculate importance ratios using the probability model
• E.g. updating genotypes
  – Propose $G^{*}$ using the current $G$ and a proposal distribution $J(G^{*} \mid G)$
  – Terms that don't involve $G$ cancel:
$$r = \frac{[O \mid X, G^{*}, \theta]\,[G^{*} \mid N, \theta]\,J(G \mid G^{*})}{[O \mid X, G, \theta]\,[G \mid N, \theta]\,J(G^{*} \mid G)}$$

One Small Problem
• $\theta$ is not identifiable from $O$ and $z$.
  – $\theta$ includes the dropout rate and genotype frequencies
  – Either need some different data or some strong assumptions
• Andrea also has data from ear samples
  – These contain so much DNA that Pr(dropout) is effectively zero (of order 10⁻⁹)
  – If the ear sample is a random sample from the same population, it provides information on genotype frequencies and the dropout rate
  – Jointly modelling ear and pellet data makes all unknowns identifiable
  – Can also test for Hardy–Weinberg equilibrium

To Do
• Come up with more realistic models for $[X \mid N, \beta, z]$
  – Good progress on this
• Combine models for ear data and pellet data
• Incorporate information on the amount of DNA
  – As a covariate for the dropout rate
• Consider other applications
  – Being able to impute $G$ has implications for genetic models and for open-population MR models

Acknowledgements
Funding:
• Animal Health Board
• Foundation for Research, Science & Technology
Landcare Research staff:
• Robyn Howitt
• Denise Jones
• Dave Morgan
• Graham Nugent
• Nick Poutu
• Casey Sole
• Caroline Thomson
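The genotype update on the "Posterior Sampling" slide can be illustrated with a toy Metropolis–Hastings sampler. This is a minimal sketch under stated assumptions, not the authors' sampler: one locus and one sample, with X and N held fixed so that only [O | X, G, θ] and [G | N, θ] remain in the ratio; Hardy–Weinberg genotype probabilities for [G | N, θ]; at most one allele dropping out at a heterozygous locus; and a uniform, hence symmetric, proposal J. The allele frequencies, dropout rate, observed genotype and function names are all hypothetical.

```python
import itertools
import random

# Hypothetical single-locus example; none of these values come from the study.
ALLELE_FREQS = {99: 0.5, 113: 0.3, 125: 0.2}  # genotype-frequency part of theta
DROPOUT = 0.2                                   # dropout-rate part of theta
OBSERVED = (113, 113)                           # sample scored as homozygous

# All unordered genotypes (a, b) with a <= b.
GENOTYPES = list(itertools.combinations_with_replacement(sorted(ALLELE_FREQS), 2))


def genotype_prior(g):
    """[G | N, theta] for one animal: Hardy-Weinberg genotype probability."""
    a, b = g
    pa, pb = ALLELE_FREQS[a], ALLELE_FREQS[b]
    return pa * pa if a == b else 2 * pa * pb


def obs_likelihood(obs, g):
    """[O | X, G, theta]: probability of the scored genotype given the true one.

    Assumes at most one allele drops out, each allele equally likely.
    """
    a, b = g
    if a == b:
        return 1.0 if obs == g else 0.0
    if obs == g:
        return 1.0 - DROPOUT
    if obs in ((a, a), (b, b)):
        return DROPOUT / 2
    return 0.0


def proposal_prob(to, frm):
    """J(to | frm): uniform proposal over all genotypes, so it is symmetric."""
    return 1.0 / len(GENOTYPES)


def mh_ratio(g_star, g):
    """Ratio r from the slide; terms not involving G have already cancelled."""
    num = obs_likelihood(OBSERVED, g_star) * genotype_prior(g_star) * proposal_prob(g, g_star)
    den = obs_likelihood(OBSERVED, g) * genotype_prior(g) * proposal_prob(g_star, g)
    return num / den


def sample_genotype(n_iter=100_000, seed=1):
    """Metropolis-Hastings draws of the true genotype; returns visit frequencies."""
    rng = random.Random(seed)
    g = OBSERVED  # start at a genotype with positive posterior probability
    counts = {gt: 0 for gt in GENOTYPES}
    for _ in range(n_iter):
        g_star = rng.choice(GENOTYPES)
        if rng.random() < min(1.0, mh_ratio(g_star, g)):
            g = g_star
        counts[g] += 1
    return {gt: c / n_iter for gt, c in counts.items()}


def exact_posterior():
    """Normalise [O | G][G] directly, to check the sampler."""
    weights = {gt: obs_likelihood(OBSERVED, gt) * genotype_prior(gt) for gt in GENOTYPES}
    total = sum(weights.values())
    return {gt: w / total for gt, w in weights.items()}


print(round(exact_posterior()[OBSERVED], 3))  # 0.682
print(round(sample_genotype()[OBSERVED], 3))
```

Because J is symmetric here its terms cancel, but they are kept in `mh_ratio` to mirror the slide's ratio. Proposals with zero likelihood (genotypes lacking the observed allele) are never accepted, and comparing the sampler's visit frequencies with `exact_posterior()` is a quick check that the ratio is coded correctly.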