Implementing Reference Linking in PROLA
Mark Doyle
Manager, Product Development
The American Physical Society
http://prola.aps.org/
September 26, 2002
CrossRef - Boston, MA

The American Physical Society
  40,000+ members
  Founded in 1899
  Mission: “diffusion and advancement of knowledge of physics”
  Publisher of Physical Review journals and Reviews of Modern Physics
  14,500 articles per year (100,000 pages per year)

What is PROLA?
  Physical Review Online Archive
  Covers all APS journals from 1893 to the present, but only 1893-1998 is available
  Separate subscription from the current-content journals
  One year is “migrated” each year
  APS corpus is 330,000 articles

The Basic Problem
  References in an article’s bibliography need to be linked to the full-text article
  Citation metadata given: author, journal, volume, page (or other enumeration)
  Identify metadata, query linking partners, store results, create links for end users
  Keep links up to date, keep the system robust and fast, keep costs low

Three General Approaches
  Static: query for links at time of publication, create a static HTML file with the appropriate links, and serve that
  Dynamic: store linking information in a live database that is queried when the user requests the web page
  Semi-dynamic: pre-query links, update them periodically, and generate the HTML with links dynamically

Semi-Dynamic Approach
  Lower investment in database technology
  Lower costs to mirror
  Fast for the user
  High availability
  Scales well with usage

APS Process Overview
  [Diagram components: Full Text SGML/XML; Bibliogr. XML; linking partners XREF, CAS, AIP, ADS; Linking Database (Oracle); XML Link Metadata; Filesystem; Apache mod_perl; HTML; End User]

XML File
  <references>
  ….
  <citation cid="C3"><ref><article><refauth>J. J. Boland</refauth>, <journal>Phys. Rev. Lett.</journal> <volume>67</volume>, <pages>1539</pages> (<date>1991</date>);</article></ref>
  <ref abbrev="prevau"><article><refauth>J. J. Boland</refauth> , <journal>J. Vac. Sci. Technol. A</journal> <volume>10</volume>, <pages>2458</pages> (<date>1992</date>).</article></ref></citation>
  …..

Parse XML Bibliographic Record
  Parse the XML-tagged references
  The article’s DOI suffix becomes the primary key
  Journal, volume, and page information becomes a reference ID (J. Vac. Sci. Technol. A 10, 2458 is mapped to JVacSciTechnolA.10.2458)
  One table holds DOI, reference ID, citation number, and reference number
  A second table holds article metadata for the querying process

Database Schema
  ARTICLES (Phys. Rev. DOI, citation number, reference number, reference id)
  ARTICLE_DATA (ref_id, first author, journal, volume, issue, enumeration, year)
  ARTICLE_LINKS (ref_id, link type, link data)
  QUERY_DATES (ref_id, link type, query date)

Query CrossRef and others
  Nightly query of CrossRef for new references that don’t have a DOI
  Track batches in a Scheduler application
  A table tracks the link source (XREF, ADS, CAS, SPIN, INSPEC) and the linking data (the DOI for XREF) for each reference ID
  A query dates table tracks when we last queried something that didn’t match
  Periodically rerun queries that haven’t matched

Links in the Database
  SQL> select link_type,link_data from article_links where ref_id='JVacSciTechnolA.10.2458';

  LINK_TYPE  LINK_DATA
  ---------  ------------------------------
  XREF       10.1116/1.577984
  INSPEC     JVTAD600001000000400245800000B
  SPIN       JVTAD6000010000004002458000001
  ADS        1992JVST...10.2458B
  CAS        1:CAS:528:DyaK38XltlygtLg%3D

Statistics
  330,000 articles (1893-present)
  6.4 million (journal) references
  3 million Phys. Rev. references
  1.4 million unique non-APS references
  210,000 CrossRef links (1.8 million links total)
  Counting the APS references that are also in CrossRef, about 30% of our references are in CrossRef

XML Linking File
  <?xml version="1.0"?>
  <apslinks>
  <citlink cid="1" rid="1">
  <link ref_id="PhysRevLett.62.567" type="APS">PhysRevLett.62.567</link></citlink>
  …
  <citlink cid="3" rid="2">
  <link ref_id="JVacSciTechnolA.10.2458" type="XREF">10.1116/1.577984</link>
  <link ref_id="JVacSciTechnolA.10.2458" type="INSPEC">JVTAD600001000000400245800000B</link>
  ….</apslinks>

Rendered Links

Conclusions
  Simple and pragmatic solutions work
  Marked-up content makes it all fit together (it obviates the need for extensive labor)
  Modest resources are needed to implement and maintain the system
  The scheme is easily expanded to include other linking targets

Contact information
  http://prola.aps.org/
  [email protected]
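
Illustration: building reference IDs

The “Parse XML Bibliographic Record” slide describes mapping journal, volume, and page to a reference ID, for example J. Vac. Sci. Technol. A 10, 2458 to JVacSciTechnolA.10.2458. The Python sketch below is one way that step might look; it is not the APS implementation, which the slides place in an Apache/mod_perl environment. The function names `make_ref_id` and `citation_rows` are hypothetical, and the normalization rule (drop every non-alphanumeric character from the journal abbreviation) is an assumption inferred from the two IDs shown in the slides.

```python
import re
import xml.etree.ElementTree as ET


def make_ref_id(journal: str, volume: str, page: str) -> str:
    """Build a reference ID such as JVacSciTechnolA.10.2458."""
    # Assumed rule: remove spaces, periods, and any other
    # non-alphanumeric characters from the journal abbreviation.
    journal_key = re.sub(r"[^A-Za-z0-9]", "", journal)
    return f"{journal_key}.{volume.strip()}.{page.strip()}"


def citation_rows(citation_xml: str):
    """Return (citation id, reference number, reference ID) tuples
    for one <citation> element, numbering its <ref> children from 1."""
    citation = ET.fromstring(citation_xml)
    rows = []
    for number, ref in enumerate(citation.findall("ref"), start=1):
        article = ref.find("article")
        ref_id = make_ref_id(
            article.findtext("journal"),
            article.findtext("volume"),
            article.findtext("pages"),
        )
        rows.append((citation.get("cid"), number, ref_id))
    return rows


# The citation shown on the "XML File" slide.
SAMPLE = (
    '<citation cid="C3"><ref><article><refauth>J. J. Boland</refauth>, '
    "<journal>Phys. Rev. Lett.</journal> <volume>67</volume>, "
    "<pages>1539</pages> (<date>1991</date>);</article></ref> "
    '<ref abbrev="prevau"><article><refauth>J. J. Boland</refauth> , '
    "<journal>J. Vac. Sci. Technol. A</journal> <volume>10</volume>, "
    "<pages>2458</pages> (<date>1992</date>).</article></ref></citation>"
)

if __name__ == "__main__":
    for row in citation_rows(SAMPLE):
        print(row)
```

Each tuple corresponds loosely to one row of the ARTICLES table (minus the citing article's DOI). The resulting reference ID is the key under which links from CrossRef and the other partners would be stored in ARTICLE_LINKS.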