Creation and Maintenance of GeneKeyDB

Research being conducted by Kevin Kastner
Under the direction of Dr. Erich Baker

The Problem

- There exist thousands of biomedical data sources.
- In 2006, there were ~557 relevant public resources in molecular biology.
- This number is growing rapidly.
  - 203 sources in 1999
  - 226 sources in 2000
  - 277 sources in 2001

The Problem

- Traditional database approaches are too structured.
- Scientific objects change identifiers over time.
- Gene names change over time.
  - The Human Genome Nomenclature Database (HUGO) contains 13,594 active symbols, 9,635 literature aliases, and 2,739 withdrawn symbols.
  - SIR2L1 (withdrawn) is a synonym for SIRT1 and sir2like 1.

Scientific Object Identities

Hugo Name   GDB        GenAtlas   OMIM   GeneCards   LocusLink
TP53        1          33         52     22          13
P53         1 (same)   17         188    69          63
SIRT1       1          0          5      1           2
SIR2L1      0          0          1      1 (same)    1 (same)

The Solution

- GeneKeyDB
  - A gene-centered relational database developed to enhance data mining in biological data sets.
  - GeneKeyDB relies primarily on existing database identifiers derived from community databases (NCBI, GO, Ensembl, and others) as well as the known relationships among those identifiers.
- Version 1 is already out!
  - http://www.biomedcentral.com/1471-2105/6/72

Weaknesses of Version 1

- It can no longer be updated.
- Complex queries must be made to the database in order to obtain the desired information.

Complex Queries

    SELECT ll_xp_cdd.cdd_name, ll_np_cdd.cdd_name, organism
    FROM ll_xp_cdd, ll_np_cdd, ll_locus
    WHERE ll_xp_cdd.cdd_score = ll_np_cdd.cdd_score
      AND ll_id IN (SELECT ll_id
                    FROM ll_refseq_xm
                    WHERE ll_refseq_xm_id IN (SELECT ll_refseq_xm_id
                                              FROM ll_xp_cdd, ll_np_cdd
                                              WHERE ll_xp_cdd.cdd_score = ll_np_cdd.cdd_score))
      AND ll_id IN (SELECT ll_id
                    FROM ll_refseq_nm
                    WHERE ll_refseq_nm_id IN (SELECT ll_refseq_nm_id
                                              FROM ll_xp_cdd, ll_np_cdd
                                              WHERE ll_xp_cdd.cdd_score = ll_np_cdd.cdd_score));

Current Research

- Creation of APIs to validate data in the database and to make querying much easier for the user.
- One-step updating of the database and the information it contains.

API Alternative

    // fxn(search_params, desired_info), returns ll_id
    curated.cdd(score[ ], null)
    curated_score[ ] ← score[ ]
    locus_id1[ ] ← gaa.cdd((name[ ], score[ ]), score[ ])
    gaa_name[ ] ← name[ ]
    gaa_score[ ] ← score[ ]
    locus_id2[ ] ← curated.cdd(name[ ], score[ ])
    curated_name[ ] ← name[ ]
    locus_id[ ] ← intersect(locus_id1[ ], locus_id2[ ])
    locus(organism[ ], locus_id[ ])
    print(gaa_name[ ], curated_name[ ], organism[ ])

External Implementations

- Some databases have APIs as well.
  - Ensembl
  - These APIs are written in Perl.
- APIs for GeneKeyDB will be written in Java.
  - More structured language.
  - Easier to read.

The Future of GeneKeyDB

- GeneKeyDB will join even more external and widely used databases together.
- Code for updating GeneKeyDB will tie into database information that changes in expected ways.
  - This lowers the required number of code rewrites.
- GeneKeyDB will be dynamically updated.

The Future of GeneKeyDB

- APIs will also be written in Perl.
  - Biologists use Perl often, almost exclusively.
  - The Perl APIs can tie into the Java APIs, rather than all being written from scratch.

Comments? Questions?

- http://genereg.ornl.gov/gkdb/
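
The API Alternative slide replaces one nested SQL statement with a few small calls whose locus IDs are intersected. A minimal sketch of that idea follows, written in Python only for brevity (the deck plans Java and Perl). Everything in it is an assumption made for illustration: the CddHit record, the toy rows, the organism names, and the helper names cdd, intersect and matching_domains are hypothetical and are not the GeneKeyDB API or schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CddHit:
    """One conserved-domain hit attached to a locus (hypothetical record)."""
    locus_id: int
    cdd_name: str
    cdd_score: int


# Toy rows invented for this sketch; not real GeneKeyDB content.
CURATED_HITS = [
    CddHit(101, "domain_A", 50),
    CddHit(102, "domain_B", 75),
]
GAA_HITS = [
    CddHit(101, "domain_A_pred", 50),
    CddHit(103, "domain_C_pred", 75),
]
ORGANISM_BY_LOCUS = {101: "organism_X", 102: "organism_Y", 103: "organism_Z"}


def cdd(hits, scores):
    """Return the hits whose score appears in `scores`."""
    wanted = set(scores)
    return [h for h in hits if h.cdd_score in wanted]


def intersect(ids_a, ids_b):
    """Return the locus IDs present in both lists, sorted."""
    return sorted(set(ids_a) & set(ids_b))


def matching_domains(curated, gaa, organisms):
    """List (gaa_name, curated_name, organism) for loci whose hits share a score."""
    # Step 1: collect the scores seen in the curated set.
    curated_scores = {h.cdd_score for h in curated}
    # Step 2: keep the hits from the second source that have one of those scores.
    gaa_matches = cdd(gaa, curated_scores)
    # Step 3: keep the curated hits that share a score with those matches.
    curated_matches = cdd(curated, {h.cdd_score for h in gaa_matches})
    # Step 4: intersect the locus IDs from both sides.
    locus_ids = set(intersect(
        [h.locus_id for h in gaa_matches],
        [h.locus_id for h in curated_matches],
    ))
    # Step 5: report name pairs and the organism for each shared locus.
    rows = []
    for g in gaa_matches:
        for c in curated_matches:
            same_locus = g.locus_id == c.locus_id and g.locus_id in locus_ids
            if same_locus and g.cdd_score == c.cdd_score:
                rows.append((g.cdd_name, c.cdd_name, organisms[g.locus_id]))
    return sorted(rows)


if __name__ == "__main__":
    for row in matching_domains(CURATED_HITS, GAA_HITS, ORGANISM_BY_LOCUS):
        print(*row)
```

Each step is a plain function call that can be tested on its own, which is the readability gain the slide argues for. A real implementation would issue database queries inside cdd rather than filter in-memory lists.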
