Data Cloud
Yury Lifshits
Yahoo! Research
http://yury.name
My Beliefs
The key challenge in web search is structured search
Part 1: What is structured search?
The key challenge in structured search is collecting data
Part 2: Data distribution & idea of Data Cloud
Part 3: Demo: numeric data distribution
The key challenge in collecting data is incentive design
Part 4: Economics of data distribution
Structured Search
Data
Data = data of entities + data of content
Structured data
Entity unit:
• Identifier
• Metadata:
– Explicit key-value pairs
– Relational properties
– Evaluation
Semi-structured data
Content unit:
• Body: text, video, audio, or image
• Metadata:
– Explicit key-value pairs
– Relational properties
– Evaluation
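The two unit types above can be sketched as plain records. This is only an illustration: the class names, field names, and sample values below are assumptions made for the example, not part of the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Metadata:
    # Explicit key-value pairs, e.g. {"city": "SF"}
    pairs: Dict[str, str] = field(default_factory=dict)
    # Relational properties as (relation name, target identifier) pairs
    relations: List[Tuple[str, str]] = field(default_factory=list)
    # Evaluation, stored here as one number (an assumption; could be votes, ratings, ...)
    evaluation: Optional[float] = None


@dataclass
class EntityUnit:
    # Structured data: an identifier plus metadata
    identifier: str
    metadata: Metadata = field(default_factory=Metadata)


@dataclass
class ContentUnit:
    # Semi-structured data: a body (text, or a reference to video, audio, or image) plus metadata
    body: str
    metadata: Metadata = field(default_factory=Metadata)


# Hypothetical sample records
concert = EntityUnit(
    identifier="event:example-concert",
    metadata=Metadata(
        pairs={"city": "SF", "price_usd": "15"},
        relations=[("performed_by", "artist:example-band")],
        evaluation=4.5,
    ),
)
review = ContentUnit(
    body="Great show last night.",
    metadata=Metadata(relations=[("about", concert.identifier)]),
)
```

In this sketch the content unit points at the entity unit through a relational property, which is what lets a structured content search find "all comments about X".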
Structured Search
Factoid search
"what's the value of property X of object Y"
Entity hubs
– Domain hubs
Structured object search
"all concerts this weekend in SF under $20, sorted by popularity"
– Time focus
– Ranking focus
– Relations focus
Structured content search
"all videos with Tom Brady"
"all comments and blog posts about Bing"
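To make the structured object search example concrete, here is a minimal sketch of the concert query as a filter and sort over records. The records, field names, and dates are invented for illustration; a real system would run this against an index rather than a Python list.

```python
from datetime import date

# Hypothetical event records; every field and value is made up for this example.
events = [
    {"name": "Indie Night", "type": "concert", "city": "SF",
     "date": date(2009, 6, 13), "price": 15, "popularity": 80},
    {"name": "Jazz Brunch", "type": "concert", "city": "SF",
     "date": date(2009, 6, 14), "price": 10, "popularity": 95},
    {"name": "Arena Show", "type": "concert", "city": "SF",
     "date": date(2009, 6, 13), "price": 60, "popularity": 99},
    {"name": "Open Mic", "type": "concert", "city": "Oakland",
     "date": date(2009, 6, 13), "price": 5, "popularity": 40},
    {"name": "Late Show", "type": "concert", "city": "SF",
     "date": date(2009, 6, 20), "price": 12, "popularity": 60},
]


def search_events(records, kind, city, start, end, max_price):
    """Filter by type, city, date range (time focus) and price,
    then sort by popularity, highest first (ranking focus)."""
    matches = [
        r for r in records
        if r["type"] == kind
        and r["city"] == city
        and start <= r["date"] <= end
        and r["price"] < max_price
    ]
    return sorted(matches, key=lambda r: r["popularity"], reverse=True)


weekend = search_events(events, "concert", "SF",
                        date(2009, 6, 13), date(2009, 6, 14), 20)
names = [e["name"] for e in weekend]
```

The query only works because price, date, and city are explicit metadata, which is the point of the slide: free-text search cannot express "under $20" or "sorted by popularity".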
Yury’s Wishlist
Business-generated data
• Products, services, news, wishlists, contact data
Reality stream, sensors
• What has happened, and where
Expert knowledge
• Glossary, issues, typical solutions, object databases, related objects graph
Events
• Sport, concerts, education, corporate, community, private
Market graph & signals
• Like, interested, use, following, want to buy; votes and ratings
Search as a Platform
[Diagram labels: Query analysis; Classic search; Web index; App 1, App 2, App 3, App 4; Structured Data; Post analysis]
Data Cloud
How to collect all structured data in one place?
Data Producers
• People: forums, wikis, mail groups, blogs, social networks
• Enterprises: product profiles, corporate news, professional content
• Sensors: GPS modules, web cameras, traffic sensors, RFID
• Transactional data
Data Distributors
Data distributor: any technical solution that accumulates, organizes, and provides access to structured and semi-structured data
Data publisher: the original distributor of some data
Data retailer: a consumer-facing distributor of some data
Data Consumers
• Humans
– Email
– Aggregators: news, friend feeds, RSS readers
– Search
– Browsing / random walks
• Intelligence projects
– Recommendation systems
– Trend mining
Data Cloud
Data Cloud is a centralized, fully functional data distribution service
Success metric for data cloud strategy = the total “value” of data on the cloud
To-Cloud Solutions
• Extraction
– DBpedia.org, “web tables”
• Semantic markup, data APIs
– Yahoo! SearchMonkey
• Feeds
– Yahoo! Shopping
– Disqus.com, js-kit.com, Facebook Connect
• Direct publishing
On-Cloud Solutions
• Ontology maintenance
– Freebase
• Normalization, de-duplication, antispam
• Named entity recognition, metadata inference, ranking
• Data recycling (cross-references)
– Amazon Public Data Sets
– Viral license
• Hosted search
– Yahoo! BOSS
From-Cloud Solutions
• Search, audience
– Y! SearchMonkey, Google Base
• Data API, dump access, update stream
• Custom notifications
– Gnip.com
• Data cloud as a primary backend
• Access control
– Ad distribution (AT&T and Yahoo! Local deal)
Demo:
webNumbr.com
Joint work with Paul Tarjan
webNumbr.com: Import
• Crawl numbers from the web: URL + XPath + regex
• Create “numbr pages”
• Update their values every hour
• Keep the history
Anyone can create a numbr
http://webnumbr.com/create
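The import recipe (URL + XPath + regex) can be sketched as follows. This is not webNumbr's actual code: the function name, the sample page, and the pattern are assumptions, and the fetch step is replaced by an inline, well-formed snippet so the example runs offline. Real pages are rarely valid XML, so a production crawler would likely need a tolerant HTML parser instead of `xml.etree.ElementTree`.

```python
import re
import xml.etree.ElementTree as ET


def extract_number(page_source, xpath, pattern):
    """Locate a node with XPath, then pull a number out of its text with a regex.

    Returns a float, or None if the node or the number is missing.
    """
    root = ET.fromstring(page_source)
    node = root.find(xpath)  # ElementTree supports a limited XPath subset
    if node is None:
        return None
    text = "".join(node.itertext())
    match = re.search(pattern, text)
    if match is None:
        return None
    # Drop thousands separators before converting
    return float(match.group(1).replace(",", ""))


# Stand-in for the downloaded page (hypothetical content)
PAGE = """<html><body>
<div class="stats"><span id="followers">Followers: 1,234</span></div>
</body></html>"""

value = extract_number(PAGE, ".//span[@id='followers']", r"([\d,]+(?:\.\d+)?)")

# "Keep the history": append one (timestamp, value) sample per crawl.
# The timestamp is an arbitrary example value.
history = [("2009-06-13T10:00", value)]
```

Running the same extraction every hour and appending to `history` gives the time series that the export side can graph or publish.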
webNumbr.com: Export
• Embed code
• Graphs
• Search & browse
• RSS
Economics of Data Distribution
Joint work with Ravi Kumar and Andrew Tomkins
Network Effect in Two-Sided Markets
Two-sided market = every product serves consumers of two types, A and B
Cross-side network effect: the more type-A users product X has, the more attractive it is for type-B consumers, and vice versa
Examples: operating systems, credit cards, e-commerce marketplaces
Two-sided network effects: A theory of information product design
G. Parker, M.W. Van Alstyne, N. Bulkley, M. Van Alstyne
Basic model
• Distributors D_1, …, D_k
• Each producer/consumer joins only one distributor
• Initial shares (p_1, c_1), …, (p_k, c_k)
• A new consumer selects a distributor with probability proportional to p_i
• A new producer selects a distributor with probability proportional to c_i
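A minimal simulation sketch of the basic model above. The slide does not say how arrivals alternate between the two sides, so the 50/50 split between new consumers and new producers is an assumption, as are the function name and the starting counts.

```python
import random


def simulate(producers, consumers, steps, rng):
    """Simulate newcomers joining exactly one distributor each.

    producers[i], consumers[i] are the current counts at distributor i.
    """
    p = list(producers)
    c = list(consumers)
    k = len(p)
    for _ in range(steps):
        if rng.random() < 0.5:  # assumption: each side arrives with probability 1/2
            # New consumer: chooses distributor i with probability proportional to p_i
            i = rng.choices(range(k), weights=p)[0]
            c[i] += 1
        else:
            # New producer: chooses distributor i with probability proportional to c_i
            i = rng.choices(range(k), weights=c)[0]
            p[i] += 1
    return p, c


rng = random.Random(0)  # fixed seed so the run is repeatable
p, c = simulate([3, 2, 1], [1, 2, 3], steps=10000, rng=rng)
p_shares = [x / sum(p) for x in p]
c_shares = [x / sum(c) for x in c]
```

Printing `p_shares` and `c_shares` at several values of `steps` is a quick way to watch how the shares evolve in this toy version of the model.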
[Figure: basic model diagram with labels a1, a2, a3, a4]
Market Shares Dynamics
Theorem 1
Market shares will stabilize
Theorem 2
With a super-linear preference rule, the market will tip to one of the distributors
Theorem 3
With a sub-linear preference rule, market shares will flatten
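The slides do not spell out the form of the super-linear and sub-linear rules. One common way to model them, assumed here purely for illustration, is to weight each distributor by its count raised to a power alpha: alpha = 1 is the proportional rule, alpha > 1 is super-linear, alpha < 1 is sub-linear.

```python
def choice_probabilities(counts, alpha):
    """Probability that a newcomer picks each distributor when the
    weight of distributor i is counts[i] ** alpha (assumed rule)."""
    weights = [x ** alpha for x in counts]
    total = sum(weights)
    return [w / total for w in weights]


counts = [6, 3, 1]
linear = choice_probabilities(counts, 1.0)       # proportional rule
superlinear = choice_probabilities(counts, 2.0)  # leader is favored beyond its share
sublinear = choice_probabilities(counts, 0.5)    # leader is favored less than its share
```

Under this assumed rule the leader attracts more than its current share of newcomers when alpha > 1 and less when alpha < 1, which gives some intuition for why one case tips and the other flattens.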
External Factor
Preference rule with external factor:
e_i + c_i/(c_1 + … + c_k)
Theorem 4
Market shares will stabilize at e_1 : e_2 : … : e_k
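A rough numerical check of Theorem 4, under strong simplifications that are not in the slides: only one side of the market is tracked, and instead of random arrivals each step adds one unit of newcomers split in expectation according to the preference rule (a mean-field approximation). The function name and the starting values are hypothetical.

```python
def mean_field_shares(external, initial_shares, steps):
    """Iterate expected shares under the rule: weight_i = e_i + share_i.

    Each step distributes one unit of newcomers in proportion to the weights.
    """
    counts = list(initial_shares)  # start from total mass 1
    for _ in range(steps):
        total = sum(counts)
        weights = [e + c / total for e, c in zip(external, counts)]
        weight_sum = sum(weights)
        counts = [c + w / weight_sum for c, w in zip(counts, weights)]
    total = sum(counts)
    return [c / total for c in counts]


external = [3.0, 2.0, 1.0]   # external factors e_1 : e_2 : e_3
start = [0.1, 0.1, 0.8]      # deliberately far from the 3 : 2 : 1 ratio
shares = mean_field_shares(external, start, steps=10000)
target = [e / sum(external) for e in external]
```

In this simplified setting the shares approach `target`, matching the e_1 : e_2 : … : e_k ratio. The fixed point can be seen directly: a share s_i is stationary when (e_i + s_i)/(E + 1) = s_i with E = e_1 + … + e_k, which gives s_i = e_i/E.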
Coalition
Data Cloud
Coalitions
Theorem 5
If all market shares are below 1/sqrt(k), a coalition (sharing data) is profitable for all distributors
Corollary
Coalitions are not monotone
Example: 5 : 4 : 1 : 1
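The condition in Theorem 5 is easy to check for a given market. The sketch below only tests that threshold; it does not model profits or explain the non-monotonicity in the corollary. Treating the ratio from the example as a single vector of distributor sizes is an assumption, as is the function name.

```python
import math


def below_coalition_threshold(sizes):
    """Return True if every distributor's share is below 1/sqrt(k),
    the sufficient condition stated in Theorem 5."""
    k = len(sizes)
    total = sum(sizes)
    threshold = 1 / math.sqrt(k)
    return all(size / total < threshold for size in sizes)


# Ratio from the slide: shares 5/11, 4/11, 1/11, 1/11; threshold 1/sqrt(4) = 0.5
example = below_coalition_threshold([5, 4, 1, 1])

# A hypothetical market with one dominant distributor: 8/11 is above 0.5
dominated = below_coalition_threshold([8, 1, 1, 1])
```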
Model Variations
• Same-side network effect
• Different p-to-c and c-to-p rules
• Multi-homing (overlapping audiences)
• n^2 vs. n log n revenue models
• Mature market: newcomer rate = departing rate
• Diverse market (many types of producers and consumers)
• Entering and departing distributors
• Directed coalitions
Challenges
Marketing
• Data demand?
• Data offerings?
• Requirements for distribution technology?
Incentive design
• Incentives for data sharing?
• Centralized or distributed?
– For profit or non-profit?
• Data licensing and ownership?
• Monetizing data cloud?
More Challenges
Prototyping:
• Data marketplace: open data & data demand
• Search plugins: related objects, glossaries, object timelines
• Publishing tools for structured data
• Data client: structured news, bookmarking, notifications
Tech design:
• Access management
• Namespace design
User interface:
• Structured search UI
• Discovery UI
Thanks!
Follow my research:
http://twitter.com/yurylifshits
http://yury.name/blog