DNA BARCODING
IMBB 2016, BecA-ILRI Hub, Nairobi, May 9 – 20, 2016
Joyce Nzioki

DNA Barcoding: a new diagnostic tool for rapid species recognition, identification and discovery

DNA barcoding: towards an inventory of life
A DNA barcode is a short gene sequence taken from a standardized portion of the genome, used to identify species.

What is DNA Barcoding?
• A way of identifying samples to species based on a short standardized gene region
• Keywords:
– Identify
– Samples
– Species
– Gene
– Short
– Standardized

An Internal ID System for All Animals
DNA Barcoding with Cytochrome Oxidase subunit 1 (CO1)

The ideal gene to study
• Present in all species
• Variable, but not too variable
• Standardized among scientists around the world

The CO1 Gene
✓ All eukaryotes contain mitochondria; CO1 encodes a mitochondrial protein needed for cells to make ATP.
✓ CO1 is almost identical within a species but varies between different species.
✓ Scientists agree that the CO1 gene is used for animal barcoding.

Non-CO1 regions for other taxa
• Land plants:
o Chloroplast matK and rbcL
o 70-75% resolving ability, higher in angiosperms
o Non-coding plastid and nuclear regions being explored
• Fungi:
o Nuclear ITS region including coding and non-coding regions
o 72% effective at species level; supplementary regions used
• Bacteria:
o 16S ribosomal gene

Why we need barcoding
• Identifying specimens – recognition of named (described) species
• Discovering new species – speeds up discovery of the remaining biodiversity, since traditional (morphology-based) taxonomy is too slow.
Image credit: Barcoding Institute of Ontario

The Vision
Credit: iBOL

Potential applications
a) Controlling agricultural pests – identifying them at any life stage makes control easier before crop damage occurs.
b) Identifying disease vectors – allows identification of disease vectors affecting animals and humans.
c) Sustaining natural resources – by monitoring illegal trade in products made from natural resources such as hardwood.
d) Protecting endangered species – primate populations in Africa have been reduced by 90% due to bushmeat hunting.
e) Monitoring water quality – the health of lakes, rivers and streams can be measured by studying the organisms living in them.
f) Routine authentication of natural health products.
g) Biosecurity
h) Identifying plants from their leaves even when flowers and fruits are not available.

The Barcode of Life Data System (BOLD)
Navigating the system

Databases
• Public data portal: a database of all public sequences on BOLD.
• Barcode Index Numbers (BINs) database: BINs are an interim taxonomic system for animals.
• Primer database: a database of barcode primers and primer statistics.
• Publication database: a community-maintained database of barcode papers.

Taxonomy
A publicly available resource which displays images, distribution maps and other details for each taxon on BOLD.

Identification
The animal, plant and fungal identification engines are based on the CO1, matK/rbcL and ITS genes respectively.

Workbench
The workbench provides access to manage and contribute to DNA barcode projects, as well as to the BOLD data analysis tools.

Resources
Technical documentation, user support and additional resources.

Searching Public Data
• Users can combine search terms to refine their search, e.g. Lepidoptera Canada will return all of the Lepidoptera records collected in Canada.
• Quotation marks must be used for exact-match retrieval of multi-word terms, e.g. "United States" Aves will return results for US birds.
• A minus (-) operator omits certain results from the search, e.g. "Biodiversity Institute of Ontario" Sesiidae -Manitoba will deliver results for the Sesiidae stored at the Biodiversity Institute of Ontario but not collected in Manitoba.

BOLD Identification Engine
• The BOLD ID engine accepts sequences from the 5' region of the CO1 gene and returns species-level identification (when possible).
• BOLD uses the BLAST algorithm to identify single-base indels before aligning the protein translation through profile to a Hidden Markov Model of the CO1 protein.
• In the BOLD engine, ITS is the default identification marker for fungal barcodes, and rbcL/matK for plant barcodes.

Description of the 6 types of identification databases on BOLD

Database name | Description | Database size
All Barcode Records | Every CO1 sequence on BOLD > 500 bp | 4,407,257 sequences
Species Barcode Records | Every CO1 sequence > 500 bp with species-level identification | 2,573,278 sequences
Public Barcode Records | Every public CO1 sequence > 500 bp | 980,022 sequences
Full Length Barcode Records | Every CO1 sequence on BOLD > 640 bp | 1,633,770 sequences
Fungal Records | Every ITS sequence on BOLD > 100 bp | >15,000 sequences
Plant Records | Every rbcL and matK sequence on BOLD > 500 bp | >95,000 and >70,000 sequences respectively

Taxonomy browser

Primer database

BARCODE Data Standards
• A set of required elements for a reserved keyword ('BARCODE') in GenBank
• Ensure data longevity by archiving in GenBank
• Enable comparisons among records from approved BARCODE gene regions.
• Ensure minimum quality of sequences
• Enable georeferencing
• Provide traceability to voucher specimen
• Ensure access to raw sequencer data
• Pave the way for regulatory and forensic use.

BARCODE Data Standards (continued)
• Include at least 500 contiguous unambiguous base pairs from bi-directional sequencing within the approved barcode region.
• Include no more than 1% ambiguous sites for the entire submitted sequence.
• Include the name of the gene region used.
• Be associated with the trace file submitted to the NCBI Trace Archive or the Ensembl Trace Server.
• The latitude and longitude, name of the identifier, name of the collector and date of collection are also recommended but not required.

How DNA Barcodes should not be used
"It is expected that DNA barcodes will contribute to the discovery and formal recognition of new species. However, DNA barcodes should not be used as the sole criterion for description of new species, which instead require analysis of diverse data, including morphology, ecology and behavior, as well as genetics."

The End
Some slides were adopted from Mark Wamalwa and David E. Schindel
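
Worked example: checking a sequence against the BARCODE data standards
The two sequence-level requirements listed under the BARCODE data standards (at least 500 contiguous unambiguous base pairs, and no more than 1% ambiguous sites over the whole submitted sequence) can be sketched as a simple check. This is only an illustration, not an official validator: the function names are hypothetical, and it assumes that "unambiguous" means the plain bases A, C, G and T, that every other character (N and the other IUPAC codes) counts as ambiguous, and that the sequence has already been confirmed to lie within the approved barcode region.

```python
import re

# Assumption: only A, C, G and T count as unambiguous bases.
UNAMBIGUOUS = set("ACGT")


def longest_unambiguous_run(seq: str) -> int:
    """Length of the longest contiguous stretch made only of A, C, G, T."""
    runs = re.findall(r"[ACGT]+", seq.upper())
    return max((len(run) for run in runs), default=0)


def ambiguous_fraction(seq: str) -> float:
    """Fraction of sites in the whole sequence that are not A, C, G or T."""
    seq = seq.upper()
    if not seq:
        return 0.0
    ambiguous = sum(1 for base in seq if base not in UNAMBIGUOUS)
    return ambiguous / len(seq)


def check_barcode(seq: str, min_run: int = 500, max_ambiguous: float = 0.01) -> dict:
    """Hypothetical helper: report both sequence-level checks for one record."""
    run = longest_unambiguous_run(seq)
    fraction = ambiguous_fraction(seq)
    return {
        "longest_unambiguous_run": run,
        "ambiguous_fraction": fraction,
        "meets_length": run >= min_run,
        "meets_ambiguity": fraction <= max_ambiguous,
    }


if __name__ == "__main__":
    # Synthetic sequence for illustration only, not real CO1 data.
    example = "ACGT" * 150
    print(check_barcode(example))
```

The remaining requirements (gene region name, trace files, voucher and collection metadata) concern the record rather than the sequence string, so they are outside this sketch.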