The Web-Enabled Research Commons: Applications, Goals, and Trends
Thinh Nguyen
October 2009

Use Case #1
NeuroCommons Project: a Science Commons project using the Semantic Web to link massive amounts of data

[Figure: map of linked data sources: PDSPki, Reactome, Gene Ontology, BAMS, NeuronDB, Entrez Gene, Antibodies, Allen Brain Atlas, Literature, BrainPharm, SWAN, Homologene, PubChem, AlzGene, Mammalian Phenotype, MESH; some sources are annotated with paper counts of 27,266, 128,437, 41,985, 4,563, and 10,365. Credit: W3C HCLS]

Making computers understand linkages
• The WWW: a Web page links to another Web page
• Directed, contextual links: receptor → is located in → Cell membrane
• "URI" (unique names for things on the web): http://ontology.foo.org/receptor → http://ontology.foo.org/is_located_in → http://ontology.foo.org/compartment (Cell membrane)
[Figure: neuron → has → Cell membrane; receptor → is located in → Cell membrane; channel → is located in → Cell membrane]

Using the web to integrate data and databases
• Different databases call the same thing "compartment", "container", or "doohickey"; all of them map to Cell membrane, http://ontology.foo.org/compartment

Better answers through better formats:

    prefix go: <http://purl.org/obo/owl/GO#>
    prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    prefix owl: <http://www.w3.org/2002/07/owl#>
    prefix mesh: <http://purl.org/commons/record/mesh/>
    prefix sc: <http://purl.org/science/owl/sciencecommons/>
    prefix ro: <http://www.obofoundry.org/ro/ro.owl#>

    select ?genename ?processname
    where
    { graph <http://purl.org/commons/hcls/pubmesh>
        { ?paper ?p mesh:D017966 .
          ?article sc:identified_by_pmid ?paper.
          ?gene sc:describes_gene_or_gene_product_mentioned_by ?article.
        }
      graph <http://purl.org/commons/hcls/goa>
        { ?protein rdfs:subClassOf ?res.
          ?res owl:onProperty ro:has_function.
          ?res owl:someValuesFrom ?res2.
          ?res2 owl:onProperty ro:realized_as.
          ?res2 owl:someValuesFrom ?process.
          graph <http://purl.org/commons/hcls/20070416/classrelations>
            {{?process <http://purl.org/obo/owl/obo#part_of> go:GO_0007166}
             union
             {?process rdfs:subClassOf go:GO_0007166 }}
          ?protein rdfs:subClassOf ?parent.
          ?parent owl:equivalentClass ?res3.
          ?res3 owl:hasValue ?gene.
        }
      graph <http://purl.org/commons/hcls/gene>
        { ?gene rdfs:label ?genename }
      graph <http://purl.org/commons/hcls/20070416>
        { ?process rdfs:label ?processname}
    }

Sources combined by the query:
• MeSH: Pyramidal Neurons
• PubMed: Journal Articles
• Entrez Gene: Genes
• GO: Signal Transduction

• Reformat what we already have
• Reformat into a commons, not a closed system
• Get the materials into the emerging research web

What data sharing protocol (legal and policy) best enables use of Web technology?

“Licensing” Archetypes
• Public Domain: no restrictions on use or distribution, no contracts, copyright waived
• Community Licenses: standard "open access" licenses, a range of rights, some rights reserved, available to all
• Private Licenses: custom agreements, vary by institution, privately negotiated, may be offered only to some

Goals
• Interoperable: data from many sources can be combined without restriction
• Reusable: data can be repurposed into new and interesting contexts
• Administrative Burden: low transaction and administrative costs over time
• Legal Certainty: users can rely on the legal usability of the data
• Community Norms: consistent with community expectations and usages

Interoperability
• Public Domain **** – Can be combined with other data sources with ease
• Community Licenses *** / ** – Depends on the type of license: share-alike or copyleft licenses are unsuitable, but attribution-only licenses are less problematic
• Private Licenses * / ** – Depends on the restrictions, but does not scale; the number of license permutations is too large

Reusable
• Public Domain **** – No restrictions on subsequent use
• Community Licenses *** – Depends on the license; some terms, such as NC (NonCommercial) or ND (NoDerivatives), can be restrictive
• Private Licenses ** – Depends on the license, but typically restrictive

Administrative Burden
• Public Domain **** – No paperwork or legal review needed
• Community Licenses *** – Little paperwork, but some legal review needed (attribution stacking issues)
• Private Licenses * – Large amounts of paperwork, frequent legal review needed

Legal Certainty
• Public Domain **** / *** – Clear rights; generally irrevocable (copyright should be addressed)
• Community Licenses *** – Generally credible; good track record with open access and open source licenses
• Private Licenses ** – Must be considered individually; few private licenses have been tested over time

Community Norms
• Public Domain *** – Traditional method for scientific data sharing (citation)
• Community Licenses *** – Relatively new, but familiar to computer scientists and the open source community (attribution)
• Private Licenses ** – Tend to emphasize private or individual interests rather than community norms

Overall Grade
• Public Domain *** – Easiest and least restrictive form of sharing
• Community Licenses ** – Can be used to implement community expectations, but can be burdensome or restrictive
• Private Licenses * – High transaction costs, burdensome, unpredictable

Convergence

CC0
• Released by Creative Commons in 2009
• Result of a 3-year policy exploration process
• Not a license but a waiver of copyright

Why is it needed?
• "Borderline" copyright
• European sui generis database rights
• Varying legal standards for copyright protection in different countries

CC0
[Figure: CC0 deed]

CC0
• Waiver of copyright
• Waiver of sui generis database rights
• Waiver of "neighboring rights"
• Does not affect trademarks or patents
• Only affects the rights of the person making the assertion

Use Case #2
• Coordination and Sustainability of International Mouse Informatics Resources (CASIMIR), an EU project
• A commentary in a Letter to Nature (September 2009) recommends public domain (PD) sharing and the use of CC0 for mouse genomic data
• The recommendations were endorsed by scientists, NIH representatives, Jackson Labs, and editors of top scientific journals

Use Case #3
• Personal Genome Project: a personalized medicine project from the George Church lab
• Adopted CC0 to release sequence and medical data collected from volunteers

Summary
• Solving some bioinformatics problems requires the ability to integrate massive quantities of data from diverse sources
• Public domain sharing best fits this need
• The CC0 waiver can be used to enrich the public domain and provide clarity

Thank You
• Thinh Nguyen ([email protected])
• On the Web: http://www.sciencecommons.org
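The "directed, contextual links" and shared-URI slides in Use Case #1 describe data as subject, predicate, object links whose terms are URIs, so that facts from separate databases line up once they use the same names. The toy sketch below illustrates that idea in plain Python. It is not the NeuroCommons implementation or a real RDF store: triples are tuples of strings, variables start with `?` as in SPARQL, and a query is a list of triple patterns joined on shared variables (roughly what each `graph { ... }` block in the SPARQL query does, minus named graphs, `union`, and OWL semantics). The two sources `receptor_db` and `channel_db`, and the helpers `match` and `select`, are hypothetical; `http://ontology.foo.org/` is the example namespace from the slides, and the label predicate is the standard `rdfs:label` URI.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Example namespace from the slides; rdfs:label is the standard RDF Schema URI.
ONT = "http://ontology.foo.org/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
IS_LOCATED_IN = ONT + "is_located_in"
COMPARTMENT = ONT + "compartment"  # one shared URI for "Cell membrane"

# Two hypothetical sources. Each uses its own local name for the membrane,
# but both point at the same URI, so their facts can be merged directly.
receptor_db: List[Triple] = [
    (ONT + "receptor", IS_LOCATED_IN, COMPARTMENT),
    (COMPARTMENT, RDFS_LABEL, "container"),
]
channel_db: List[Triple] = [
    (ONT + "channel", IS_LOCATED_IN, COMPARTMENT),
    (COMPARTMENT, RDFS_LABEL, "doohickey"),
]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a variable, as in SPARQL
            if extended.get(term, value) != value:
                return None  # variable is already bound to a different value
            extended[term] = value
        elif term != value:
            return None  # constant term does not match
    return extended


def select(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns (a basic graph pattern)."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Merging is a plain set union, because the shared URIs make the facts line up.
merged = sorted(set(receptor_db) | set(channel_db))

# "What else is located in the same compartment as the receptor?"
neighbours = select(
    [(ONT + "receptor", IS_LOCATED_IN, "?place"), ("?other", IS_LOCATED_IN, "?place")],
    merged,
)
# Every local name the sources use for that compartment.
names = select([(COMPARTMENT, RDFS_LABEL, "?name")], merged)

if __name__ == "__main__":
    for b in neighbours:
        print(b["?other"])
    print(sorted(b["?name"] for b in names))
```

Running it prints the channel and receptor URIs, then `['container', 'doohickey']`. The join on `?place` needs one fact from each source and succeeds only because both named the membrane with the same URI; had each source used its local label ("container", "doohickey") as the identifier, the merged data would hold two unrelated nodes and the query would find only the receptor.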