Variations in
Searching for Information
CMPT 455/826 - Week 11, Day 2
Approximate Query Processing
• Abstract¹
– This article describes query processing in the DBO database system.
– Like other database systems designed for ad hoc analytic processing, DBO is
able to compute the exact answers to queries over a large relational database in
a scalable fashion.
– Unlike any other system designed for analytic processing, DBO can constantly
maintain a guess as to the final answer to an aggregate query throughout
execution, along with statistically meaningful bounds for the guess’s accuracy.
– As DBO gathers more and more information, the guess gets more and more
accurate, until it is 100% accurate as the query is completed.
– This allows users to stop the execution as soon as they are happy with the query
accuracy, and thus encourages exploratory data analysis.
1. Scalable Approximate Query Processing with the DBO Engine by Chris Jermaine, Subramanian Arumugam, Abhijit Pol, and Alin Dobra
Approximate Query Processing
• Purpose:
– To get fast intermediate results for queries whose exact answer
would take longer to compute than the extra precision is worth
• Technique:
– Uses random sampling rather than sequential processing, so that
the running estimate becomes more and more exact as data accumulates
• Comments:
– The paper is very technical, but the concept is what is important
to consider
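The concept can be sketched in a few lines of Python. This is a minimal illustration only, not the DBO algorithm: it assumes a single in-memory column, a SUM aggregate, simple random sampling without replacement, and a normal-approximation bound. The function name `running_estimate` and its parameters are hypothetical.

```python
import math
import random

def running_estimate(values, checkpoints, z=1.96, seed=0):
    """Estimate SUM(values) from a growing random sample.

    Yields (rows_seen, estimate, half_width) at each checkpoint, where
    estimate +/- half_width is an approximate confidence interval.
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)              # reading in random order = sampling without replacement
    n_total = len(order)
    total = total_sq = 0.0
    for seen, v in enumerate(order, start=1):
        total += v
        total_sq += v * v
        if seen not in checkpoints:
            continue
        mean = total / seen
        # Unbiased sample variance; undefined for a single row, so use 0.
        var = 0.0
        if seen > 1:
            var = max(total_sq - seen * mean * mean, 0.0) / (seen - 1)
        # Finite population correction: the bound shrinks to 0 once every row is read.
        fpc = (n_total - seen) / n_total
        half_width = z * n_total * math.sqrt(var * fpc / seen)
        yield seen, n_total * mean, half_width

if __name__ == "__main__":
    for seen, estimate, half_width in running_estimate(range(1, 1001), {10, 100, 1000}):
        print(seen, estimate, half_width)
```

A user could stop consuming the generator as soon as the half-width is small enough; reading every row gives the exact sum with a zero-width bound.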
Inconsistent Databases
• Abstract²
– Query answering from inconsistent databases
• amounts to finding “meaningful” answers to queries posed over database
instances
• that do not satisfy integrity constraints specified over their schema.
– A declarative approach to this problem relies on
• the notion of repair,
• that is, a database that satisfies integrity constraints
• and is obtained from the original inconsistent database
• by “minimally” adding and/or deleting tuples.
2. Repair Localization for Query Answering from Inconsistent Databases by Thomas Eiter, Michael Fink, Gianluigi Greco, and Domenico Lembo
Inconsistent Databases
• Purpose:
– A database may become inconsistent in many ways
• This is particularly challenging in the context of data integration,
– where a number of data sources, heterogeneous and widely
distributed, must be presented to the user as if they were a single
(virtual) centralized database, which is often equipped with a rich set of
constraints expressing important semantic properties of the application
at hand.
– Since, in general, the integrated sources are autonomous, the data
resulting from the integration are likely to violate these constraints.
– The standard approach through data cleaning
• may be insufficient
• even if only a few inconsistencies are present in the data
Inconsistent Databases
• Technique:
– The notion of a repair for an inconsistent database
• a repair is a new database which satisfies the constraints in the
schema and minimally differs from the original one.
– The suitability of a possible repair depends on
» the underlying semantics adopted for the inconsistent database,
» and on the kinds of integrity constraints allowed on the schema.
• multiple repairs might be possible
• the standard way of answering a user query is
– to compute the answers that are true in every possible repair
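A small sketch of "answers that are true in every possible repair", under strong simplifying assumptions: the only constraint is a primary key, repairs are obtained by deleting tuples only, and all repairs are enumerated by brute force (exponential in the number of conflicts). It does not reproduce the localization technique of the paper; the names `repairs` and `consistent_answers` are hypothetical.

```python
from itertools import product

def repairs(rows, key):
    """Yield every repair of `rows` under a primary-key constraint.

    `key` is a tuple of column positions. A repair keeps exactly one
    tuple from each group of tuples that share the same key value.
    """
    groups = {}
    for row in rows:
        groups.setdefault(tuple(row[k] for k in key), []).append(row)
    for choice in product(*groups.values()):
        yield list(choice)

def consistent_answers(rows, key, query):
    """Return the answers to `query` that hold in every repair."""
    answer_sets = [set(query(db)) for db in repairs(rows, key)]
    return set.intersection(*answer_sets)

if __name__ == "__main__":
    # Hypothetical Emp(name, dept) relation; "ann" violates the key on name.
    emp = [("ann", "sales"), ("ann", "hr"), ("bob", "it")]
    print(consistent_answers(emp, (0,), lambda db: {dept for _, dept in db}))
    print(consistent_answers(emp, (0,), lambda db: {name for name, _ in db}))
```

With this data, "ann" is still a consistent answer to "which employees exist", but neither of her departments is a consistent answer to "which departments have staff".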
Inconsistent Databases
• Comments:
– The major problem here is having inconsistent information in a
database.
• A more important problem is the reason behind the inconsistency in
information throughout the database.
– It is difficult to decide what form information should be represented in
when combining differing database schemes.
• If this is not done carefully it is likely that the database will end up with
misleading or inconsistent data.
– The query is checked against all the possible repairs to the database.
• The answer is based on some evaluation between the repairs that are
available, but how likely is it that the query was answered in the desired
way?
– Instead of doing extra work rewriting queries as they are asked,
• why not use the information found by these techniques to determine a
more permanent fix for the inconsistency of the data?
– If a consistent answer can be determined from an inconsistent database, then it
seems likely that the information could be made consistent in the database for
future queries.
Dynamic Spatial Queries
• Abstract³
– Conventional spatial queries are usually meaningless in dynamic
environments
• since their results may be invalidated
• as soon as the query or data objects move.
– In this paper we formulate two novel query types,
• A time-parameterized query
• A continuous query
3. Spatial Queries in Dynamic Environments by Yufei Tao and Dimitris Papadias
Dynamic Spatial Queries
• Purpose:
– As opposed to traditional, “instantaneous”, queries
• that are evaluated only once to return a single result,
– continuous queries
• may require constant evaluation and updates of the results
• as the query conditions or database contents change
Dynamic Spatial Queries
• Technique:
– A time-parameterized query returns:
• the objects that satisfy the corresponding spatial query at the time when the
query is issued
• the expiry time of the result given the current motion of the query and
database objects
• the change that causes the expiration of the result
– A continuous query retrieves
• tuples of the form <result, interval>,
• where each result is accompanied by a future interval, during which it is
valid.
• NOTE: A continuous query can be answered by repetitive execution of TP
queries until some termination clause is satisfied.
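The two query types can be sketched as follows. This is a toy version under stated assumptions: points move along a line with constant velocity, the query window [lo, hi] is static, and every object is scanned (no spatial index). The names `presence_interval`, `tp_query`, and `continuous_query` are hypothetical.

```python
import math

def presence_interval(pos, vel, lo, hi):
    """Time span [enter, leave) during which pos + vel * t lies in [lo, hi]."""
    if vel == 0:
        return (-math.inf, math.inf) if lo <= pos <= hi else (math.inf, math.inf)
    t_lo, t_hi = (lo - pos) / vel, (hi - pos) / vel
    return min(t_lo, t_hi), max(t_lo, t_hi)

def tp_query(objects, lo, hi, now):
    """Time-parameterized window query: (result, expiry time, changes at expiry)."""
    result, events = set(), []
    for name, (pos, vel) in objects.items():
        enter, leave = presence_interval(pos, vel, lo, hi)
        if enter <= now < leave:
            result.add(name)
        if now < enter < math.inf:
            events.append((enter, name, "enters"))
        if now < leave < math.inf:
            events.append((leave, name, "leaves"))
    if not events:
        return result, math.inf, []
    expiry = min(t for t, _, _ in events)
    changes = sorted((name, what) for t, name, what in events if t == expiry)
    return result, expiry, changes

def continuous_query(objects, lo, hi, start, end):
    """Answer a continuous query by repeating TP queries until time `end`."""
    answers, now = [], start
    while now < end:
        result, expiry, _ = tp_query(objects, lo, hi, now)
        until = min(expiry, end)
        answers.append((result, (now, until)))
        now = until
    return answers

if __name__ == "__main__":
    # Hypothetical objects: name -> (position at time 0, velocity).
    cars = {"a": (0.0, 1.0), "b": (5.0, -1.0)}
    print(tp_query(cars, 2.0, 6.0, 0.0))
    print(continuous_query(cars, 2.0, 6.0, 0.0, 10.0))
```

Each expiry time is only valid for the motion known when the query ran; if an object changes velocity, the TP query has to be reissued from that moment.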
Dynamic Spatial Queries
• Comments:
– In addition to getting the correct result from the spatial queries,
the paper should have addressed how a dynamic database could be
updated.
• E.g., a dynamic environment such as an automated car park involves
both vehicles moving in and out of the parking lot and the database
being updated on the number of available lots at a given time.
– There are issues with how expiry time is dealt with:
• what happens when the entity changes direction or velocity? Does
the expiry time remain valid?
Querying the Semantic Web
• Abstract⁴
– The Resource Description Framework (RDF)
• enables the creation and exchange of metadata as any other Web data.
– There is a need for sufficiently expressive declarative query languages
• for querying Web pages that make use of RDF
– We propose RQL, a new query language
• adapting the functionality of semistructured or XML query languages
• to the peculiarities of RDF
• but also extending this functionality
• in order to uniformly query both RDF descriptions and schemas.
4. Querying the Semantic Web with RQL by G. Karvounarakis, A. Magkanaraki, S. Alexaki, V. Christophides, D. Plexousakis, M. Scholl, and K. Tolle
Querying the Semantic Web
• Purpose:
– RQL adapts the functionality
• of semistructured or XML query languages
• to the peculiarities of RDF
• but also extends this functionality in order
• to uniformly query both RDF descriptions and schemas.
– With RQL users are able to query resources
• described according to their preferred schema,
• while discovering how the same resources
• are also described using another classification schema.
Querying the Semantic Web
• Technique:
– We introduce a formal data model and type system
• for description bases created according to the RDF Model & Syntax
and Schema specifications
– In order to support superimposed RDF descriptions,
• the main modeling challenge is
– to represent properties as self-existent individuals,
– as well as to introduce a graph instantiation mechanism permitting
multiple classification of resources.
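To make "uniformly query both RDF descriptions and schemas" concrete, here is a minimal Python sketch. It is not RQL and uses no RDF library: schema statements (rdfs:subClassOf) and data statements (rdf:type) simply sit in one list of triples, and the same functions read both. The `ex:` resources and the function names are hypothetical.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

def subclasses(triples, cls):
    """Schema query: `cls` together with all of its transitive subclasses."""
    found, frontier = {cls}, [cls]
    while frontier:
        parent = frontier.pop()
        for s, p, o in triples:
            if p == SUBCLASS and o == parent and s not in found:
                found.add(s)
                frontier.append(s)
    return found

def instances(triples, cls):
    """Data query that consults the schema: resources typed by `cls` or a subclass."""
    wanted = subclasses(triples, cls)
    return {s for s, p, o in triples if p == TYPE and o in wanted}

def classes_of(triples, resource):
    """All classes a resource is directly classified under (may be several)."""
    return {o for s, p, o in triples if s == resource and p == TYPE}

if __name__ == "__main__":
    triples = [
        ("ex:Painter", SUBCLASS, "ex:Artist"),     # schema
        ("ex:Sculptor", SUBCLASS, "ex:Artist"),    # schema
        ("ex:picasso", TYPE, "ex:Painter"),        # data
        ("ex:picasso", TYPE, "ex:Sculptor"),       # data: multiple classification
        ("ex:rodin", TYPE, "ex:Sculptor"),         # data
    ]
    print(instances(triples, "ex:Artist"))
    print(classes_of(triples, "ex:picasso"))
```

Because one resource may carry several rdf:type statements, `classes_of` shows how the same resource is described under more than one classification.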
Querying the Semantic Web
• Comments:
– The type system used for RQL is extremely useful
• in that it is actually read from the RDF schema: the type system is
specific to the schema being used.
– However, all types fit into a finite list of types,
• which contains literal types, resource types, class types, property
types and others.
– The discussion on typing as it relates to RDF
• would be useful in considering various other approaches to typing
for other means of modeling (ER or class diagrams).
– In ER modeling this could be achieved
• through choosing property names/attributes for a relationship and
including them in the diagram (and not just “is-a”).
Entity Search Engine
• Abstract⁵
– The Web has become a rich collection of data-rich pages,
• on the “surface Web” of static URLs
• as well as the “deep Web” of database-backed contents
– The richness of data,
• while a promising opportunity,
• has challenged us to effectively find data we need,
• from one or multiple sources.
– We are motivated by the need of
• large scale on-the-fly integration for online structured data.
5. Entity Search Engine: Towards Agile Best Effort Information Integration over the Web by Tao Cheng and Kevin Chen-Chuan Chang
Entity Search Engine
• Purpose:
– How do we identify and integrate the structured data
• embedded in unstructured result pages?
Entity Search Engine
• Technique:
– Search engines, such as Google, Yahoo, or MSN, search for pages by
keywords.
• While being “IR-style” with a scalable text processing framework,
• they are not data aware.
– Integration services, such as Expedia.com or PriceGrabber.com, exist
online for specific domains.
• They provide “DB-style” precise querying,
• but they can hardly scale to the amount of data and the number of
sources on the Web.
– We propose a solution
• where the two extremes meet,
• with a synergistic “marriage” in the middle.
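One way to picture the middle ground is a keyword search that returns typed values instead of pages. The sketch below is an illustration only, not the system from the paper: it assumes the entity type is a phone number matched by a simple regular expression, that "near" means within a fixed character window, and that ranking is by the number of pages that agree. The name `entity_search` and its parameters are hypothetical.

```python
import re
from collections import Counter

# Assumed entity type: phone numbers written as 555-123-4567.
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

def entity_search(pages, keyword, pattern=PHONE, window=40):
    """Rank entity values by how many pages mention them near `keyword`."""
    votes = Counter()
    needle = keyword.lower()
    for text in pages:
        seen = set()                      # count each value once per page
        for match in pattern.finditer(text):
            start = max(0, match.start() - window)
            context = text[start:match.end() + window].lower()
            if needle in context:
                seen.add(match.group())
        votes.update(seen)
    return votes.most_common()

if __name__ == "__main__":
    pages = [
        "Call Acme support at 555-123-4567 today.",
        "Acme support line: 555-123-4567.",
        "The weather is nice.",
    ]
    print(entity_search(pages, "acme"))
```

The keyword matching is IR-style and needs no per-site wrapper, while the output is a ranked list of values rather than a list of URLs.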
Entity Search Engine
• Comments:
– There are still problems with sites that embed their data in
inaccessible formats that cannot be queried