The Web of Data
emerging industries
Michalis Vafopoulos
04/04/2013

Contents
① The Web of documents vs. Web of data
  – Some technology
  – Some economics
  – ..and action
② PSGR project
③ and more…

The Web of Documents
• Simple, big and unstructured
• Organized in silos
But humans:
• are interested in Things, not documents, & these Things might be in docs or elsewhere
• have limited capacity to extract meaning...

The Web of Data
• Analogy: a global file system --> global database
• Designed for: human consumption --> machines first, humans later
• Primary objects: documents --> things (or descriptions of things)
• Links between: documents --> things
• Degree of structure in objects: fairly low --> high
• Semantics of content and links: implicit --> explicit
(Tom Heath)

The Web of Data: why?
• encourages reuse
• reduces redundancy
• maximizes its (real and potential) interconnectedness
• enables network effects to add value to data

The Web of Data: how? – current state on the Web
• Relational Databases
• APIs
• XML
• CSV
• XLS
Computers can’t consume data because:
• Different formats & models
• Not inter-connected

The Web of Data: how? – we need to create a standard way of publishing Data on the Web (like HTML for docs)
This is the Resource Description Framework (RDF)
(a simple example here from Juan F. Sequeda; more next semester!)
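RDF, as the example that follows shows, states every fact as a subject, predicate, object triple. As a rough illustration of that data model only, here is a minimal plain-Python sketch that stores two triples from the slides and answers pattern queries over them. The `Triple` type and the `match` and `birth_region` helpers are hypothetical names invented for this sketch; they are not part of RDF or of any RDF library, and the labels are the informal ones used on the slides rather than real URIs.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: subject, predicate, object (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# The two statements from the slides, with informal labels instead of URIs.
TRIPLES = [
    Triple("Isidoro", "was born in", "Chios"),
    Triple("Chios", "is part of", "Greece"),
]


def match(
    triples: Iterable[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that agree with every field given; None is a wildcard."""
    for t in triples:
        if subject is not None and t.subject != subject:
            continue
        if predicate is not None and t.predicate != predicate:
            continue
        if obj is not None and t.obj != obj:
            continue
        yield t


def birth_region(triples: Iterable[Triple], person: str) -> Iterator[str]:
    """Follow 'was born in' and then 'is part of' across two triples."""
    facts = list(triples)  # materialize so the data can be scanned twice
    for birth in match(facts, subject=person, predicate="was born in"):
        for part in match(facts, subject=birth.obj, predicate="is part of"):
            yield part.obj


if __name__ == "__main__":
    for region in birth_region(TRIPLES, "Isidoro"):
        print(region)
```

Following one triple's object into another triple's subject is what makes the data a graph; a real deployment would use URIs as identifiers and a query language such as SPARQL instead of hand-written loops.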
Resource Description Framework (RDF)
• A data model
  – A way to model data
  – Inspired by relational databases and logic
• RDF is a triple data model
• Labeled graph (semantic networks)
• Subject, Predicate, Object
<Isidoro> <was born in> <Chios>
<Chios> <is part of> <Greece>

Example: Document on the Web
Databases back up documents
THINGS have PROPERTIES: a Book has a Title, an Author, …

Isbn                Title                          Author         PublisherID   ReleasedData
978-0-596-15381-6   Programming the Semantic Web   Toby Segaran   1             July 2009
…                   …                              …              …             …

This is a THING: a book titled “Programming the Semantic Web” by Toby Segaran, …

PublisherID   PublisherName
1             O’Reilly Media
…             …

Data representation in RDF
<book> <title> <Programming the Semantic Web>
<book> <author> <Toby Segaran>
<book> <isbn> <978-0-596-15381-6>
<book> <publisher> <Publisher>
<Publisher> <name> <O’Reilly>

Everything on the web is identified by a URI!
Link the data to other data
<http://…/isbn978> <title> <Programming the Semantic Web>
<http://…/isbn978> <author> <Toby Segaran>
<http://…/isbn978> <isbn> <978-0-596-15381-6>
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> <O’Reilly>

Consider the data from Revyu.com
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> <Awesome Book>
<http://…/review1> <reviewer> <http://…/reviewer>
<http://…/reviewer> <name> <Juan Sequeda>

Start to link data
The book URI in the review data is declared the same as the book URI in the book data:
<http://…/isbn978> <sameAs> <http://…/isbn978>

Juan Sequeda publishes data too
<http://juansequeda.com/id> <livesIn> <http://dbpedia.org/Austin>
<http://juansequeda.com/id> <name> <Juan Sequeda>

Let’s link more data
<http://…/reviewer> <sameAs> <http://juansequeda.com/id>

Linked data = internet + http + RDF
[Figure: the book, review and reviewer triples above joined into a single graph]

Linked Data Principles
1. Use URIs as names for things
2. Use URIs so that people can look up (dereference) those names.
3. When someone looks up a URI, provide useful information.
4. Include links to other URIs so that they can discover more things.

Web as a database
Linked Data makes the web exploitable as ONE GIANT HUGE GLOBAL DATABASE!
Is there any query language like SQL? SPARQL…

[Figure: May 2007]

What is a Linked Data application/service?
Software system that makes use of data on the Web from multiple datasets and that benefits from links between the datasets

Characteristics of Linked Data Applications
• Consume data that is published on the web following the Linked Data principles: an application should be able to request, retrieve and process the accessed data
• Discover further information by following the links between different data sources: the fourth principle enables this.
• Combine the consumed linked data with data from other sources (not necessarily Linked Data)
• Expose the combined data back to the web following the Linked Data principles
• Offer value to end-users

The 5 stars of open linked data
★ make your stuff available on the Web (whatever format)
★★ make it available as structured data (e.g. Excel instead of image scan of a table)
★★★ non-proprietary format (e.g. CSV instead of Excel)
★★★★ use URLs to identify things, so that people can point at your stuff
★★★★★ link your data to other people’s data to provide context
http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/

Two magics of Web Science: the case of Linked Data

The (practical) question
contextualized & hands-on experience in Semantic Web & Business 3.0 on a unique, fast evolving and semantified dataset

PSGR project: the answer
The first attempt to generate, curate, interlink and distribute daily updated public spending data in LOD formats that can be useful to both expert (i.e. scientists and professionals) and naïve users.

The context first…

Economy after the Web
New form of property
• Public, Private, Peer (e.g. Wikipedia)
The right to:
• Use-modify-benefit-transfer resources
• Energetic & connected consumption
• Pro-sumption

Research question
Web economy: from potential to actual
Enable new virtuous cycles in the economy through Linked Open Data

Outline
① EU Unification: institutions-technology
② Why Linked Open Data?
③ Economic LOD
  o the story so far
  o how to start
  o use cases
  o engineering
④ Government Budget
⑤ Tenders
⑥ Spending
⑦ Business Information
⑧ Next steps

EU Unification: the institutions
Best in theory – poor in practice
a (complicated) market example
• monetary policy, currency, eurozone
• European Single Market
• fiscal policy FORTHCOMING

EU Unification: the technology
Linked Data or Web of data
• “publish once, use many times”.
• different consumers extract different slices of the data for different purposes
• publish in context: value & “meaning”

EU Unification: the technology
• Linked Data (LD) + Open Data = LOD
• Economic LOD as “data currency”

Why LOD?
• Transparency & innovation
Network effects: enabling users to
• bidirectional & massively processable interconnections among data
• re-using the existing infrastructure in the government and business spheres

Economic LOD: the story so far
• Isolated/fragmented behind technological & institutional barriers
• General statistics: Eurostat etc.
• LOD2 case
• LOTED (Linked Open Tenders Electronic Daily)

Economic LOD: how to start
A general model

Economic LOD: use cases
• Business applications on top
• Users: citizens, gov., EU, business
• track the life-cycle of every financial flow: evaluate budget allocation, tenders, spending and their efficiency
• pre-allocate resources on provisional public works
• receive & submit information in real-time

Economic LOD: engineering

Government Budget
• heterogeneous repositories & methods (mainly PDF)

Tenders
• Closed data in HTML
• Public Contracts Ontology (PCO), e.g.
  – pco:Contract and pco:AwardCriterion
• Common Procurement Vocabulary
• now working on linking our ontology to:
  – Payments Ontology
  – GoodRelations
  – FOAF

Spending
• most dynamic & open part
• increasing number of countries/cities
• raw & structured data
• leader: the Greek Clarity project
• spending decisions ex-ante to execution
• Actually every decision

www.publicspending.gr (*****)
• based on Greek Clarity & Tax information
• semantify, interconnect, clean, visualize, SPARQL endpoint, daily update
• PSGR ontology
Links to
  – WESO products classif.
  – UK Payments Ontology
  – DBpedia and Geonames
  – …more to come

Business Information
• Registries: mainly closed
• Key standards
  – Classification of Products by Activity (CPA)
  – eXtensible Business Reporting Language (XBRL)

Business Information

Next steps
• Working on our basic ontology
• Real-life examples & apps
• Bad news: A long way to go
• Good news: we have started

PSGR
① why Linked Open Data (LOD)
② LOD in Greece
③ issues
④ WHERE MY MONEY GOES App
⑤ local spending in EU demo
⑥ to the future

Why public spending LOD
o more & better information
o objective and processable information for economic/political “dialogue”
  • to promote competition
  • to decrease cost
  • to judge the efficiency of policy mixtures
  • to enable participation

LOD in Greece: current status
• in its infancy – NO Apps yet
• 2-3 stars
• Open not Linked
• very limited public awareness

LOD in Greece: why it is important
• quality of information during economic crisis
• transparency & efficiency in funding development

Issues
o how can we initiate the virtuous cycle of creation?
  demonstrate LOD’s added value
o how to get the most out of data?
  local & global interconnections

In few words,
Apps, Apps, Apps…..

WHERE MY MONEY GOES in Greece
publicspending.gr
• the first LOD App in Greece
• daily updates
• open spending linked data, endpoint & visualizations

WHERE MY MONEY GOES in Greece
publicspending.gr
• Input
1. “Diavgeia” (all public spending decisions online daily): API, average data quality, rich information
  • Payer, payee (amount, VAT number, name)
  • CPA 2008: Classification of Products by Activity
  • CPV 2008: Common Procurement Vocabulary
  • Original decision text in PDF
2. TAXIS (official Tax Information System): VAT number validation and profile request

Checklist
① Ontology – enriching with core vocab.
② Basic visualizations
③ SPARQL endpoint - thedatahub
④ Interconnections
  – Product classifications
  – Open Corporates
  – Greek LOD (e-proc, geodata, dbpedia)
  – EU and US (CPV -> NAICS)
⑤ Demos & services
⑥ Public awareness - working with the media, hackathons, courses, theses

Architecture

publicspending.gr ontology

Network analysis
Betweenness Centrality: how often a node appears on shortest paths between nodes in the network

Size: Betweenness Cent. Color: HUB (HITS)

Node size: Weighted In-Degree Cent., Node color: PageRank

Competition in telecoms

Comments, ideas and more

Additional material

History of LD
• Linked Data Design Issues by TimBL July 2006
• Linked Open Data Project WWW2007
• First LOD Cloud May 2007
• 1st Linked Data on the Web Workshop WWW2008
• 1st Triplification Challenge 2008
• How to Publish Linked Data Tutorial ISWC2008
• BBC publishes Linked Data 2008
• 2nd Linked Data on the Web Workshop WWW2009
• NY Times announcement SemTech2009 - ISWC09
• 1st Linked Data-a-thon ISWC2009
• 1st How to Consume Linked Data Tutorial ISWC2009
• Data.gov.uk publishes Linked Data 2010
• 2nd How to Consume Linked Data Tutorial WWW2010
• 1st International Workshop on Consuming Linked Data COLD2010

More Examples
• http://data-gov.tw.rpi.edu/wiki
• http://dbrec.net/
• http://fanhu.bz/
• http://data.nytimes.com/schools/schools.html
• http://sig.ma
• http://visinav.deri.org/semtech2010/

References
• Weaving the Economic Linked Open Data
• The Web Economy: Goods, Users, Models, and Policies
• Public Spending: Interconnecting and Visualizing Greek Public Expenditure Following Linked Open Data Directives
• A Framework for Linked Data Business Models