The Future of Scientific Knowledge Discovery in Open Networked Environments: Legal Considerations

Michael Madison
Professor of Law
Faculty Director, Innovation Practice Institute
University of Pittsburgh

Board on Research Data and Information
in collaboration with Computer Science and Telecommunications Board
National Academy of Sciences
Washington, DC
March 10, 2011

Assume that access to the database/dataset is permitted in the first instance.

What legal challenges does the researcher in the middle face?

Dataset #1, with no label
Dataset #2, labeled “Public domain/CC0”
Dataset #3, labeled “All rights reserved”
Dataset #4, labeled “Data Mining Not Allowed”

Legal issues

1. Status of the data, coding, formats, and datasets: copyright law basics
2. Making sense of contracts and licenses
3. Managing the results of data collection

Status of the data, coding, formats, and datasets: copyright law default rules

1. A specific datum (observation, research result, nucleotide sequence, and so on) is a noncopyrightable fact and is in the public domain.
2. Datasets and other collections of facts are covered automatically by copyright as “compilations” if their “selection, coordination, or arrangement” demonstrates “minimal” (human) creativity. Coding, formats, and interpretations are likely covered by copyright.
3. Compilations arranged for ease of use, to comply with standard taxonomies or disciplinary standards, or in other obvious, routine, or mechanical ways cannot be protected by copyright.
4. Compilation copyrights are thin, meaning only verbatim copying is prohibited.
5. In Europe, EU database rights may apply.
6. Fair use or other limitations protecting scientific research may apply.

Making sense of contracts and licenses

1. In the absence of a contract, license, or notice, default copyright rules apply.
2. Data and datasets may be offered to the public domain.
   Or, the compiler may dedicate the material to the public domain (e.g., CC0 licenses). A CC0 license irrevocably commits copyrighted content to the PD. Noncopyrighted material may be labeled PD for clarity.
3. Licenses specifying the scope of authorized access, use, data mining, and recombination may be bundled with contracts, or may be unilateral. No assent may be required. Notice of the terms may be limited. Terms may be human readable but not machine readable. Or the reverse.
4. License terms may be custom designed by the dataset provider/host institution, leading to overlapping or inconsistent legal obligations.
5. License scope may enable some onward collaboration, sharing, or redistribution of data, but not all. Enforceability of private unilateral notices/licenses is unclear.
6. Government-mandated data sharing/licensing provides consistency and enforceability.

Managing the results of data collection

Forward-looking issues related to governing a data commons:

1. Data/dataset integrity.
2. Translation and interoperability of data from different sources, in different formats: designing and enforcing standards; managing and maintaining data consistency. (Ensuring PD status of data may be inadequate to deal with this challenge.)
3. Who has access to the new collection of data, and for what purposes?
4. What are participants’ duties and rights regarding standardization and data consistency, and re-sharing, re-combining, and re-using data?
5. How is compliance monitored and enforced?
6. When do those duties and rights pass to downstream parties who did not obtain the data in the first place?
7. Compare the costs and benefits of government-sponsored enforcement with those of private enforcement via licenses and contracts, and with those of informal/community enforcement.

Summary

1. Difficulty of achieving clarity, simplicity, and flexibility.
2. Even with sponsor support and the best of intentions.
3. The legal system is designed to promote securing things, not sharing knowledge.
4. We’re unlikely to find one-size-fits-all legal solutions.
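
The point that license terms may be human readable but not machine readable can be made concrete with the four datasets from the opening scenario. Below is a minimal Python sketch, assuming a researcher's tool sees only each dataset's label. The names `Dataset`, `Regime`, and `starting_regime` are hypothetical, and the mapping is a simplification of the default rules listed in this talk, not a legal determination.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Regime(Enum):
    # Hypothetical categories paraphrasing the talk's default rules.
    DEFAULT_COPYRIGHT = "no contract, license, or notice: default copyright rules apply"
    PUBLIC_DOMAIN = "dedicated to, or labeled as, the public domain"
    NOTICE_OR_LICENSE = "provider's notice or license terms apply; enforceability may be unclear"


@dataclass(frozen=True)
class Dataset:
    name: str
    label: Optional[str]  # the human-readable label, if any


# Assumed spellings of public-domain labels; real labels vary widely.
PUBLIC_DOMAIN_LABELS = {"public domain", "cc0", "public domain/cc0"}


def starting_regime(ds: Dataset) -> Regime:
    """Pick the rule set a researcher would start from, based only on the label."""
    if ds.label is None or not ds.label.strip():
        return Regime.DEFAULT_COPYRIGHT
    if ds.label.strip().lower() in PUBLIC_DOMAIN_LABELS:
        return Regime.PUBLIC_DOMAIN
    # Any other label is treated as a unilateral notice or license.
    return Regime.NOTICE_OR_LICENSE


datasets = [
    Dataset("Dataset #1", None),
    Dataset("Dataset #2", "Public domain/CC0"),
    Dataset("Dataset #3", "All rights reserved"),
    Dataset("Dataset #4", "Data Mining Not Allowed"),
]

for ds in datasets:
    print(f"{ds.name}: {starting_regime(ds).value}")
```

Datasets #3 and #4 fall into the same bucket here because a free-text label tells software nothing about what is actually permitted; sorting them further would need terms published in a machine-readable form.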