The Web of Data: emerging industries
Michalis Vafopoulos
04/04/2013
Contents
① The Web of documents vs. the Web of data
– Some technology
– Some economics
– …and action
② PSGR project
③ and more…
The Web of Documents
• Simple, big and unstructured
• Organized in silos
But humans:
• are interested in things, not documents,
and these things might be in documents or elsewhere
• Limited capacity to extract meaning...
The Web of Data
• Analogy: a global file system --> a global database
• Designed for: human consumption --> machines first, humans later
• Primary objects: documents --> things (or descriptions of things)
• Links between: documents --> things
• Degree of structure in objects: fairly low --> high
• Semantics of content and links: implicit --> explicit
(Tom Heath)
The Web of Data: why?
• encourages reuse
• reduces redundancy
• maximizes the (real and potential) interconnectedness of data
• enables network effects to add value to data
The Web of Data: how?
– current state on the Web:
• Relational databases
• APIs
• XML
• CSV
• XLS
Computers can’t consume this data because the sources:
• use different formats & models
• are not interconnected
The Web of Data: how?
– we need a standard way of publishing data on the Web (as HTML is for documents)
This is the Resource Description Framework (RDF)
(A simple example from Juan F. Sequeda follows; more next semester!)
Resource Description Framework (RDF)
• A data model
– A way to model data
– Inspired by relational databases and logic
• RDF is a triple data model
• Labeled Graph (semantic networks)
• Subject, Predicate, Object
<Isidoro> <was born in> <Chios>
<Chios> <is part of> <Greece>
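To make the triple model concrete, here is a minimal Python sketch (an illustration, not part of the original slides) that stores the two example statements as (subject, predicate, object) tuples and walks the labeled graph from Isidoro to Greece. Plain strings stand in for URIs, and the helper `objects` is hypothetical.

```python
# A minimal sketch: each RDF statement is a (subject, predicate, object) tuple.
# Plain strings stand in for the URIs a real RDF dataset would use.
Triple = tuple[str, str, str]

graph: set[Triple] = {
    ("Isidoro", "was born in", "Chios"),
    ("Chios", "is part of", "Greece"),
}


def objects(g: set[Triple], subject: str, predicate: str) -> list[str]:
    """Return every object linked to `subject` by an edge labeled `predicate`."""
    return sorted(o for s, p, o in g if s == subject and p == predicate)


# Walk two labeled edges of the graph: Isidoro -> Chios -> Greece.
birthplace = objects(graph, "Isidoro", "was born in")[0]
country = objects(graph, birthplace, "is part of")[0]
print(birthplace, country)  # prints: Chios Greece
```

Because every fact has the same three-part shape, combining data from different sources is just a set union of triples.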
Example: Document on the Web
Documents are backed by databases.
THINGS have PROPERTIES: a book has a title, an author, …
| Isbn | Title | Author | PublisherID | ReleaseDate |
| 978-0-596-15381-6 | Programming the Semantic Web | Toby Segaran | 1 | July 2009 |
| … | … | … | … | … |
This is a THING: a book titled “Programming the Semantic Web” by Toby Segaran, …
| PublisherID | PublisherName |
| 1 | O’Reilly Media |
| … | … |
Data representation in RDF
The rows of the two tables above, drawn as a graph:
<book> <title> "Programming the Semantic Web"
<book> <author> "Toby Segaran"
<book> <isbn> "978-0-596-15381-6"
<book> <publisher> <Publisher>
<Publisher> <name> "O’Reilly"
Everything on the Web is identified by a URI!
Link the data to other data:
<http://…/isbn978> <title> "Programming the Semantic Web"
<http://…/isbn978> <author> "Toby Segaran"
<http://…/isbn978> <isbn> "978-0-596-15381-6"
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> "O’Reilly"
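The step from tables to URI-named triples can be sketched as follows, assuming a hypothetical base URI `http://example.org/` in place of the elided `http://…` and hypothetical URI patterns built from the ISBN and the PublisherID. Each row becomes a subject, each column a predicate, and the foreign key `PublisherID` becomes a link to another resource rather than a literal value.

```python
# Sketch: map rows of the Book and Publisher tables to RDF-style triples.
# BASE and the URI patterns below are hypothetical; the slides elide them as "http://…".
BASE = "http://example.org/"

books = [
    {"Isbn": "978-0-596-15381-6", "Title": "Programming the Semantic Web",
     "Author": "Toby Segaran", "PublisherID": 1, "ReleaseDate": "July 2009"},
]
publishers = [{"PublisherID": 1, "PublisherName": "O'Reilly Media"}]


def book_uri(isbn: str) -> str:
    # One URI per book, minted from its ISBN (hypothetical pattern).
    return f"{BASE}isbn{isbn.replace('-', '')}"


def publisher_uri(publisher_id: int) -> str:
    # One URI per publisher, minted from its primary key (hypothetical pattern).
    return f"{BASE}publisher{publisher_id}"


def rows_to_triples() -> list[tuple[str, str, str]]:
    triples = []
    for row in publishers:
        triples.append((publisher_uri(row["PublisherID"]), "name", row["PublisherName"]))
    for row in books:
        s = book_uri(row["Isbn"])
        triples.append((s, "title", row["Title"]))
        triples.append((s, "author", row["Author"]))
        triples.append((s, "isbn", row["Isbn"]))
        # The foreign key becomes a link to the publisher's URI, not a plain value.
        triples.append((s, "publisher", publisher_uri(row["PublisherID"])))
        # ReleaseDate is not mapped, mirroring the graph on the slide.
    return triples


for t in rows_to_triples():
    print(t)
```

Once both tables share URIs, the join that SQL performs at query time is already materialized as an edge in the graph.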
Consider the data from Revyu.com:
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> "Awesome Book"
<http://…/review1> <reviewer> <http://…/reviewer>
<http://…/reviewer> <name> "Juan Sequeda"
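Start to link data:
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> "Awesome Book"
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> "Juan Sequeda"
<http://…/isbn978> <sameAs> <http://…/isbn978>   (Revyu’s URI for the book = the book data’s URI)
<http://…/isbn978> <title> "Programming the Semantic Web"
<http://…/isbn978> <author> "Toby Segaran"
<http://…/isbn978> <isbn> "978-0-596-15381-6"
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> "O’Reilly"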
Juan Sequeda publishes data too:
<http://juansequeda.com/id> <name> "Juan Sequeda"
<http://juansequeda.com/id> <livesIn> <http://dbpedia.org/Austin>
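Let’s link more data:
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> "Awesome Book"
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> "Juan Sequeda"
<http://…/reviewer> <sameAs> <http://juansequeda.com/id>
<http://juansequeda.com/id> <name> "Juan Sequeda"
<http://juansequeda.com/id> <livesIn> <http://dbpedia.org/Austin>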
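Linked Data = Internet + HTTP + RDF
<http://…/isbn978> <title> "Programming the Semantic Web"
<http://…/isbn978> <author> "Toby Segaran"
<http://…/isbn978> <isbn> "978-0-596-15381-6"
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> "O’Reilly"
<http://…/isbn978> <sameAs> <http://…/isbn978>   (Revyu’s URI for the book = the book data’s URI)
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> "Awesome Book"
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> "Juan Sequeda"
<http://…/reviewer> <sameAs> <http://juansequeda.com/id>
<http://juansequeda.com/id> <name> "Juan Sequeda"
<http://juansequeda.com/id> <livesIn> <http://dbpedia.org/Austin>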
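A rough sketch of why the sameAs links matter: merging the three datasets and collapsing sameAs aliases with a union-find lets a single question cross all of them. The `pub:` and `rv:` prefixes abbreviate the elided `http://…` URIs and are hypothetical; treating sameAs as plain identity is a simplification of what an OWL reasoner would do.

```python
# Sketch: merge three independently published datasets and resolve sameAs links.
# Prefixes "pub:" and "rv:" abbreviate the elided "http://…" URIs (hypothetical).
Triple = tuple[str, str, str]

book_data: set[Triple] = {
    ("pub:isbn978", "title", "Programming the Semantic Web"),
    ("pub:isbn978", "author", "Toby Segaran"),
    ("pub:isbn978", "publisher", "pub:publisher1"),
    ("pub:publisher1", "name", "O'Reilly"),
}
revyu_data: set[Triple] = {
    ("rv:isbn978", "hasReview", "rv:review1"),
    ("rv:review1", "description", "Awesome Book"),
    ("rv:review1", "hasReviewer", "rv:reviewer"),
    ("rv:reviewer", "name", "Juan Sequeda"),
    ("rv:isbn978", "sameAs", "pub:isbn978"),
    ("rv:reviewer", "sameAs", "http://juansequeda.com/id"),
}
juan_data: set[Triple] = {
    ("http://juansequeda.com/id", "name", "Juan Sequeda"),
    ("http://juansequeda.com/id", "livesIn", "http://dbpedia.org/Austin"),
}


def merge_same_as(triples: set[Triple]) -> set[Triple]:
    """Union-find over sameAs links; every alias is rewritten to one representative."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "sameAs":
            a, b = find(s), find(o)
            if a != b:
                parent[max(a, b)] = min(a, b)
    # Rewrite all terms to their representative and drop the sameAs triples.
    return {(find(s), p, find(o)) for s, p, o in triples if p != "sameAs"}


def objs(g: set[Triple], s: str, p: str) -> set[str]:
    return {o for s2, p2, o in g if s2 == s and p2 == p}


merged = merge_same_as(book_data | revyu_data | juan_data)
# "Where does the reviewer of 'Programming the Semantic Web' live?"
book = next(s for s, p, o in merged if p == "title" and o == "Programming the Semantic Web")
homes = {home
         for review in objs(merged, book, "hasReview")
         for reviewer in objs(merged, review, "hasReviewer")
         for home in objs(merged, reviewer, "livesIn")}
print(homes)  # prints: {'http://dbpedia.org/Austin'}
```

Without the merge step, the publisher's book URI has no `hasReview` triple, so the question cannot be answered from any single dataset.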
Linked Data Principles
1. Use URIs as names for things
2. Use URIs so that people can look up
(dereference) those names.
3. When someone looks up a URI,
provide useful information.
4. Include links to other URIs so that
they can discover more things.
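Principles 2 to 4 together support "follow your nose" discovery. The sketch below assumes a hypothetical `MOCK_WEB` dictionary standing in for HTTP dereferencing, so it runs offline; a real client would fetch each URI over HTTP and parse the RDF it gets back. The `limit` parameter (also an assumption) keeps the traversal from running forever on the open Web.

```python
from collections import deque

# Hypothetical stand-in for the Web: looking up (dereferencing) a URI returns the
# triples published about it. A real client would send an HTTP GET asking for an
# RDF serialization instead of reading this dictionary.
MOCK_WEB: dict[str, list[tuple[str, str, str]]] = {
    "http://example.org/isbn978": [
        ("http://example.org/isbn978", "title", "Programming the Semantic Web"),
        ("http://example.org/isbn978", "publisher", "http://example.org/publisher1"),
        ("http://example.org/isbn978", "hasReview", "http://example.org/review1"),
    ],
    "http://example.org/publisher1": [
        ("http://example.org/publisher1", "name", "O'Reilly"),
    ],
    "http://example.org/review1": [
        ("http://example.org/review1", "hasReviewer", "http://juansequeda.com/id"),
    ],
    "http://juansequeda.com/id": [
        ("http://juansequeda.com/id", "livesIn", "http://dbpedia.org/Austin"),
    ],
}


def dereference(uri: str) -> list[tuple[str, str, str]]:
    # Principles 2 and 3: a URI can be looked up and returns useful information.
    return MOCK_WEB.get(uri, [])


def crawl(seed: str, limit: int = 10) -> tuple[list[str], list[tuple[str, str, str]]]:
    """Breadth-first 'follow your nose' traversal starting from one URI."""
    visited: list[str] = []
    collected: list[tuple[str, str, str]] = []
    queue, seen = deque([seed]), {seed}
    while queue and len(visited) < limit:
        uri = queue.popleft()
        visited.append(uri)
        for triple in dereference(uri):
            collected.append(triple)
            obj = triple[2]
            # Principle 4: objects that are URIs are links to more things.
            if obj.startswith("http://") and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return visited, collected


visited, collected = crawl("http://example.org/isbn978")
print(visited)
```

Starting from the book alone, the crawler reaches the reviewer's home page data and the DBpedia URI for Austin without knowing about either in advance.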
Web as a database
Linked Data makes the Web exploitable as ONE GIANT GLOBAL DATABASE!
Is there a query language like SQL?
SPARQL…
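SPARQL queries are built from triple patterns that contain variables. As a rough illustration of how a basic graph pattern is matched (not how a production SPARQL engine is implemented), this Python sketch joins one pattern at a time over an in-memory set of triples; the `ex:` names and the data are hypothetical.

```python
from typing import Optional

# The SPARQL query this sketch mimics (hypothetical ex: namespace):
#   PREFIX ex: <http://example.org/>
#   SELECT ?name ?place WHERE {
#     ?person ex:name ?name .
#     ?person ex:livesIn ?place .
#   }
Triple = tuple[str, str, str]
Binding = dict[str, str]

graph: set[Triple] = {
    ("ex:juan", "ex:name", "Juan Sequeda"),
    ("ex:juan", "ex:livesIn", "dbpedia:Austin"),
    ("ex:toby", "ex:name", "Toby Segaran"),  # no livesIn, so no result row
}


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a variable
            if extended.setdefault(term, value) != value:
                return None  # already bound to a different value
        elif term != value:  # a constant that does not match
            return None
    return extended


def select(g: set[Triple], where: list[Triple], variables: list[str]) -> list[tuple[str, ...]]:
    """Evaluate a basic graph pattern by joining one triple pattern at a time."""
    bindings: list[Binding] = [{}]
    for pattern in where:
        next_bindings = []
        for b in bindings:
            for t in g:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return sorted(tuple(b[v] for v in variables) for b in bindings)


rows = select(graph,
              [("?person", "ex:name", "?name"), ("?person", "ex:livesIn", "?place")],
              ["?name", "?place"])
print(rows)  # prints: [('Juan Sequeda', 'dbpedia:Austin')]
```

The shared variable `?person` plays the role of a join key, much like a foreign key in an SQL join.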
May 2007
What is a Linked Data application/service?
A software system that uses data on the Web from multiple datasets and benefits from the links between those datasets
Characteristics of Linked Data Applications
• Consume data that is published on the web following the
Linked Data principles: an application should be able to
request, retrieve and process the accessed data
• Discover further information by following the links
between different data sources: the fourth principle
enables this.
• Combine the consumed Linked Data with data from other
sources (not necessarily Linked Data)
• Expose the combined data back to the web following the
Linked Data principles
• Offer value to end-users
The 5 stars of Linked Open Data
★ make your stuff available on the Web (whatever format)
★★ make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ use a non-proprietary format (e.g. CSV instead of Excel)
★★★★ use URLs to identify things, so that people can point at your stuff
★★★★★ link your data to other people’s data to provide context
http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
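Because each star builds on the ones before it, the scheme can be read as a cumulative checklist. A small Python sketch, with hypothetical field names for the five criteria:

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical checklist for one published dataset (field names are illustrative)."""
    on_web: bool       # ★ available on the Web, whatever the format
    structured: bool   # ★★ structured data (e.g. Excel, not an image scan)
    open_format: bool  # ★★★ non-proprietary format (e.g. CSV, not Excel)
    uses_uris: bool    # ★★★★ URLs/URIs identify things
    links_out: bool    # ★★★★★ linked to other people's data


def stars(d: Dataset) -> int:
    """Count stars cumulatively: a level only counts if all lower levels are met."""
    score = 0
    for met in (d.on_web, d.structured, d.open_format, d.uses_uris, d.links_out):
        if not met:
            break
        score += 1
    return score


scanned_pdf = Dataset(True, False, False, False, False)
csv_file = Dataset(True, True, True, False, False)
linked_rdf = Dataset(True, True, True, True, True)
print(stars(scanned_pdf), stars(csv_file), stars(linked_rdf))  # prints: 1 3 5
```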
Two magics of Web Science:
the case of Linked Data
The (practical) question: how to gain contextualized & hands-on experience in Semantic Web & Business 3.0 on a unique, fast-evolving and semantified dataset?
PSGR project: the answer
The first attempt to generate, curate,
interlink and distribute daily updated
public spending data in LOD formats that
can be useful to both expert (i.e.
scientists and professionals) and naïve
users.
The context first…
Economy after the Web
New form of property
• Public, Private, Peer (e.g. Wikipedia)
The right to:
• Use-modify-benefit-transfer resources
• Energetic & connected consumption
• Pro-sumption
Research question
Web economy: from potential to
actual
Enable new virtuous cycles
in the economy
through Linked Open Data
Outline
① EU Unification: institutions-technology
② Why Linked Open Data?
③ Economic LOD
o the story so far
o how to start
o use cases
o engineering
④Government Budget
⑤Tenders
⑥Spending
⑦Business Information
⑧Next steps
EU Unification: the institutions
Best in theory – poor in practice
a (complicated) market example
• monetary policy, currency, eurozone
• European Single Market
• fiscal policy FORTHCOMING
EU Unification: the technology
Linked Data or Web of data
• “publish once, use many times”.
• different consumers extract different
slices of the data for different purposes
• publish in context:
value & “meaning”
EU Unification: the technology
• Linked Data (LD) + Open Data = LOD
• Economic LOD as “data currency”
Why LOD?
• Transparency & innovation
Network effects, enabling users to:
• create bidirectional & massively processable interconnections among data
• re-use the existing infrastructure in the government and business spheres
Economic LOD: the story so far
• Isolated/fragmented behind
technological & institutional barriers
• General statistics: Eurostat etc.
• LOD2 case
• LOTTED
(Linked Open Tenders Electronic Daily)
Economic LOD: how to start
A general model
Economic LOD: use cases
• Business applications on top
• Users: citizens, gov., EU, business
• track the life-cycle of every financial flow:
evaluate budget allocation, tenders,
spending and their efficiency
• pre-allocate resources on provisional
public works
• receive & submit information in real-time
Economic LOD: engineering
Government Budget
• heterogeneous repositories & methods (mainly PDF)
Tenders
• Closed data in HTML
• Public Contracts Ontology (PCO), e.g.
– pco:Contract and pco:AwardCriterion
• Common Procurement Vocabulary (CPV)
• now working on linking our ontology to:
– Payments Ontology
– GoodRelations
– FOAF
Spending
• most dynamic & open part
• increasing number of countries/cities
• raw & structured data
• leader: the Greek Clarity project
• spending decisions published ex ante (before execution)
• actually, every decision
www.publicspending.gr (*****)
• based on Greek Clarity & Tax information
• semantify, interconnect, clean, visualize,
SPARQL endpoint, daily update
• PSGR ontology Links to
– WESO product classifications
– UK Payments Ontology
– DBpedia and Geonames
– …more to come
Business Information
• Registries: mainly closed
• Key standards
– Classification of Products by Activity (CPA)
– eXtensible Business Reporting Language (XBRL)
Business Information
Next steps
• Working on our basic ontology
• Real-life examples & apps
• Bad news: a long way to go
• Good news: we have started
PSGR
① why Linked Open Data (LOD)
② LOD in Greece
③ issues
④ WHERE MY MONEY GOES App
⑤ local spending in EU demo
⑥ to the future
Why public spending LOD
o more & better information
o objective and processable information
for economic/political “dialogue”
• to promote competition
• to decrease cost
• to judge the efficiency of policy mixtures
• to enable participation
LOD in Greece: current status
• in its infancy – NO apps yet
• 2-3 stars
• open, not linked
• very limited public awareness
LOD in Greece: why it is important
• quality of information during economic
crisis
• transparency & efficiency in funding
development
Issues
o how can we initiate the virtuous cycle of creation?
– demonstrate LOD’s added value
o how can we get the most out of data?
– local & global interconnections
In a few words:
Apps, Apps, Apps…
WHERE MY MONEY GOES in Greece
publicspending.gr
• the first LOD App in Greece
• daily updates
• open spending linked data, endpoint &
visualizations
WHERE MY MONEY GOES in Greece
publicspending.gr
• Input
1.“Diavgeia” (all public spending decisions online daily)
API, average data quality, rich information
• Payer, payee (amount, VAT number, name)
• CPA 2008: Classification of products by Activity
• CPV 2008: Common Procurement Vocabulary
• Original decision text in pdf
2. TAXIS
(official Tax Information System)
VAT number validation and profile request
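The input step might be sketched as below: one spending decision becomes a handful of triples, and organizations get URIs keyed by VAT number so that payments from different days interconnect. Everything here is an assumption for illustration: the `http://example.org/psgr#` namespace, the property names and the `Decision` fields are hypothetical, not the actual PSGR ontology or the Diavgeia API. The VAT check uses the commonly published AFM check-digit rule as a local cleaning step; it does not replace the TAXIS validation described above.

```python
import re
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical namespace: NOT the actual publicspending.gr ontology URIs.
PS = "http://example.org/psgr#"


def afm_is_valid(afm: str) -> bool:
    """Check-digit rule for a 9-digit Greek VAT number (AFM).

    A local sanity check only; it does not replace the TAXIS lookup.
    """
    if not re.fullmatch(r"\d{9}", afm) or afm == "000000000":
        return False
    # Weights 2^8, 2^7, ..., 2^1 for the first eight digits.
    total = sum(int(digit) * 2 ** (8 - i) for i, digit in enumerate(afm[:8]))
    return total % 11 % 10 == int(afm[8])


@dataclass
class Decision:
    decision_id: str  # hypothetical identifier of the published decision
    payer_vat: str
    payee_vat: str
    payee_name: str
    amount: Decimal
    cpv_code: str     # Common Procurement Vocabulary code


def decision_to_triples(d: Decision) -> list[tuple[str, str, str]]:
    # Clean: reject records whose VAT numbers fail the checksum.
    for vat in (d.payer_vat, d.payee_vat):
        if not afm_is_valid(vat):
            raise ValueError(f"invalid VAT number: {vat!r}")
    decision = f"{PS}decision/{d.decision_id}"
    payer, payee = f"{PS}org/{d.payer_vat}", f"{PS}org/{d.payee_vat}"
    # Interconnect: the same VAT number always yields the same organization URI,
    # so decisions published on different days link to one node per organization.
    return [
        (decision, f"{PS}payer", payer),
        (decision, f"{PS}payee", payee),
        (payee, f"{PS}name", d.payee_name),
        (decision, f"{PS}amount", str(d.amount)),
        (decision, f"{PS}cpv", d.cpv_code),
    ]


# Illustrative record; the VAT numbers and CPV-style code are made up.
example = Decision("D-001", "100000003", "123456783", "Example S.A.",
                   Decimal("1500.00"), "30192000-1")
for triple in decision_to_triples(example):
    print(triple)
```

Keying organizations by VAT number is what turns a stream of isolated decisions into a payer/payee network that can be analysed as a graph.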
Checklist
① Ontology – enriching with core vocabularies
② Basic visualizations
③ SPARQL endpoint – thedatahub
④ Interconnections
– Product classifications
– Open Corporates
– Greek LOD (e-proc, geodata, DBpedia)
– EU and US (CPV -> NAICS)
⑤ Demos & services
⑥ Public awareness – working with the media, hackathons, courses, theses
Architecture
publicspending.gr ontology
Network analysis
Betweenness Centrality: how often a node appears on shortest
paths between nodes in the network
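Betweenness centrality can be computed with Brandes' algorithm. The sketch below is a simplification that assumes an unweighted, undirected graph given as adjacency lists, with a made-up toy network; a directed, amount-weighted payer/payee network would need the weighted (Dijkstra-based) variant of the same algorithm.

```python
from collections import deque


def betweenness(adj: dict[str, list[str]]) -> dict[str, float]:
    """Brandes' algorithm for an unweighted, undirected graph (adjacency lists)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # 1) BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {v: -1 for v in adj}
        dist[s] = 0
        preds: dict[str, list[str]] = {v: [] for v in adj}
        order: list[str] = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # 2) Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair {s, t} was counted once from s and once from t.
    return {v: score / 2 for v, score in bc.items()}


# Hypothetical toy network: "hub" sits between most pairs, "c" bridges to "d".
network = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "d"],
    "d": ["c"],
}
print(betweenness(network))
```

Nodes with high scores act as brokers: many flows between other nodes have to pass through them.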
Node size: betweenness centrality
Node color: hub score (HITS)
Node size: weighted in-degree centrality
Node color: PageRank
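The two measures on this slide can be sketched for a small payer-to-payee network. Edges are assumed to point from payer to payee and to be weighted by amount; the node names, amounts and the damping factor 0.85 are illustrative assumptions, not PSGR data.

```python
Edge = tuple[str, str, float]  # (payer, payee, amount); direction is an assumption


def weighted_in_degree(edges: list[Edge]) -> dict[str, float]:
    """Total amount received by each node (0 for pure payers)."""
    degree: dict[str, float] = {}
    for payer, payee, amount in edges:
        degree.setdefault(payer, 0.0)
        degree[payee] = degree.get(payee, 0.0) + amount
    return degree


def pagerank(edges: list[Edge], d: float = 0.85, iterations: int = 100) -> dict[str, float]:
    """Weighted PageRank by power iteration; the scores sum to 1."""
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    out_weight = {n: 0.0 for n in nodes}
    for u, _, w in edges:
        out_weight[u] += w
    n = len(nodes)
    rank = {x: 1.0 / n for x in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread uniformly.
        dangling = sum(rank[x] for x in nodes if out_weight[x] == 0)
        new_rank = {x: (1 - d) / n + d * dangling / n for x in nodes}
        for u, v, w in edges:
            # Each payer passes on its rank in proportion to the amounts it pays.
            new_rank[v] += d * rank[u] * w / out_weight[u]
        rank = new_rank
    return rank


payments = [("ministry", "firmA", 100.0), ("ministry", "firmB", 50.0),
            ("municipality", "firmA", 30.0)]
print(weighted_in_degree(payments))
print(pagerank(payments))
```

Weighted in-degree only counts money received, while PageRank also rewards receiving money from payers that are themselves central.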
Competition in telecoms
Comments, ideas and more
Additional material
History of LD
• Linked Data Design Issues by TimBL, July 2006
• Linked Open Data Project, WWW2007
• First LOD Cloud, May 2007
• 1st Linked Data on the Web Workshop, WWW2008
• 1st Triplification Challenge, 2008
• How to Publish Linked Data Tutorial, ISWC2008
• BBC publishes Linked Data, 2008
• 2nd Linked Data on the Web Workshop, WWW2009
• NY Times announcement, SemTech2009 - ISWC09
• 1st Linked Data-a-thon, ISWC2009
• 1st How to Consume Linked Data Tutorial, ISWC2009
• Data.gov.uk publishes Linked Data, 2010
• 2nd How to Consume Linked Data Tutorial, WWW2010
• 1st International Workshop on Consuming Linked Data, COLD2010
More Examples
• http://data-gov.tw.rpi.edu/wiki
• http://dbrec.net/
• http://fanhu.bz/
• http://data.nytimes.com/schools/schools.html
• http://sig.ma
• http://visinav.deri.org/semtech2010/