EMBL-EBI

Pragmatics

Goal of pragmatics group
• Expose existing practice of databases and discuss it in the context of database methods.

Four domain overviews
• Engineering: the maintenance of lists of components with different versions used in electronic engineering products
• Astronomy: data from sky surveys at different wavelengths and catalogues of objects
• Environmental simulation: data generated by simulations of the marine environment in a river mouth
• Bioinformatics: shared data collections of biomolecular information

Domain | Key driver for provenance | Key issues
Engineering | Ensuring correctness | Recording “used in” information to allow updates
Astronomy | Evoking scientific trust | Data transformation from raw data
Simulation | Ensuring comparability of data, liability issues | Tweaking of details of simulation creates incompatibilities
Bioinformatics | Evoking trust, data quality, attribution | Assertions based on whole database comparisons

Motivations for capturing provenance information
• Quality/Trust
• Attribution
• Priority
• Interpretation
• Interoperability
• Roll-back
• Emphasis differed from discipline to discipline.

Cost/Benefit/Risk (1)
• Cheap, reproducible data carry little risk and can be redetermined, so there is less pressure to record provenance.
• High-aggregation provenance (e.g., information derived from comparison with large, changing databases) is expensive, so suppliers give up on provenance.
• Expensive data (e.g., sky surveys, macromolecular structures) create pressure to record provenance, which is cheap by comparison with recollecting the data.

Cost/Benefit/Risk (2)
• Unique event data (e.g., cosmic events) cannot be verified after the fact, so provenance is important.
• Life-critical data (e.g., drug ingredient pedigree, aircraft component versions) are high risk, so we are prepared to tolerate the cost of capturing the information.
Reducing the cost of provenance information

Obvious
• Only create provenance information when it’s essential or easy?
• Archive rather than serve provenance
• Automate provenance collection

…and the non-obvious
• Can the task of recording the provenance of a complex and ever-changing database be turned into a database engineering problem rather than a domain problem?
• Could the very technology used to maintain the database also maintain the provenance information?
The domain specialists look to the computer scientists for help here.
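
One possible reading of "the technology used to maintain the database also maintains the provenance" is sketched below. It is only an illustration, not something proposed in the slides: it assumes a relational store (SQLite through Python's standard sqlite3 module) and uses triggers so that every insert or update is logged automatically. The table names entry and provenance, the column names, and the sample values are all hypothetical.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry (
    id     TEXT PRIMARY KEY,
    value  TEXT NOT NULL,
    source TEXT NOT NULL          -- who or what supplied this value
);

CREATE TABLE provenance (
    seq         INTEGER PRIMARY KEY AUTOINCREMENT,
    entry_id    TEXT NOT NULL,
    action      TEXT NOT NULL,
    old_value   TEXT,
    new_value   TEXT,
    source      TEXT NOT NULL,
    recorded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- The database itself records provenance on every change.
CREATE TRIGGER entry_insert AFTER INSERT ON entry
BEGIN
    INSERT INTO provenance (entry_id, action, old_value, new_value, source)
    VALUES (NEW.id, 'insert', NULL, NEW.value, NEW.source);
END;

CREATE TRIGGER entry_update AFTER UPDATE ON entry
BEGIN
    INSERT INTO provenance (entry_id, action, old_value, new_value, source)
    VALUES (NEW.id, 'update', OLD.value, NEW.value, NEW.source);
END;
""")

# Ordinary database maintenance; no explicit provenance calls are made.
conn.execute("INSERT INTO entry VALUES ('P1', 'kinase', 'curator_a')")
conn.execute(
    "UPDATE entry SET value = 'protein kinase', source = 'pipeline_v2' "
    "WHERE id = 'P1'"
)

# The change history of an entry can be read back later.
history = conn.execute(
    "SELECT action, old_value, new_value, source "
    "FROM provenance WHERE entry_id = 'P1' ORDER BY seq"
).fetchall()

for row in history:
    print(row)
```

In this sketch the provenance table is written only by the triggers, which corresponds to "automate provenance collection", and it is queried only on demand, which is closer to archiving provenance than serving it. It does not address the harder case raised above, assertions derived from comparison with whole, changing databases.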