Inexact Querying of XML

XML Data May be Irregular
• Relational data is regular and organized. XML may be very different.
  – Data is incomplete: Missing values of attributes in elements
  – Data has structural variations: Relationships between elements are represented differently in different parts of the document
  – Data has ontology variations: Different labels are used to describe nodes of the same type
• (Note: In some of the upcoming slides, we have labels on edges instead of on nodes.)

Incomplete Data
[Figure: Movie Database graph with Movie, Actor, Film, T.V. Series, Name, Title, and Year nodes. One movie has a year attribute; the year of another movie is missing.]

Variations in Structure
[Figure: the same Movie Database graph. In one part Actor is below Movie; in another part Movie is below Actor.]

Ontology Variations
[Figure: the same Movie Database graph. One node carries a movie label and another carries a film label.]

• Data is contributed by many users in a variety of designs
  – The query should deal with different structures of data
• The structure of the database is changed frequently
  – Queries must be rewritten frequently
• The description of the schema is large (e.g., a DTD of XML)
  – It is difficult to use the schema when formulating queries
• Need to allow the user to write an “approximate query” and have the query processor deal with it

The Problem
• In many different domains, we are given the option to query some source of information
• Usually, the user only gets results if the query can be completely answered (satisfied)
• In many domains, this is not appropriate, e.g.,
  – The user is not familiar with the database
  – The database does not contain complete information
  – There is a mismatch between the ontology of the user and that of the database

Example 1
• Town: Be'er Sheva. Area code: 03.
  – The selected town does not appear in the selected area code!
• Boarding: Haifa, Technion. Alighting: Eilat.
  – There is no direct line connecting the selected points.
• Boarding: [blank]. Alighting: Eilat.
• Course details: Databases.
  – No matching courses were found.

What Do Users Need?
• Users need a way to get interesting partial answers to their queries, especially if a complete answer does not exist
• These partial answers should contain maximal information
• Problem:
  – It is easy to define when an answer satisfies a query
  – It is hard to say when an answer that does not satisfy a query is of interest
  – It is hard to say which incomplete answers are better than others

Modeling a Database and a Query
• It is useful to model both databases and queries as labeled directed graphs
  – Clean mathematical modeling!
  – Captures the essentials of XPath, XQuery

University Database
[Figure: University database graph with the labels University, Name, Dept, Faculty, Professor, Lecturer, and Teaches, and the values Technion, Computer Science, Biology, Chana Israeli, Avi Levy, Databases, Bioinformatics, and Molecular Biology.]

Query
• Exact answers are defined by exact matchings, i.e., subgraph homomorphisms
• This query asks for the names of all faculty members (of any type)
[Figure: query graph with the labels University, Dept, Faculty, and Name.]
How would you write this in XPath?

Exact Answers
[Figure, two slides: the query graph (University, Dept, Faculty, Name) shown alongside the University database graph.]

Slightly More Complex Query
• Returns faculty members only from the Biology Department
[Figure: query graph with the labels University, Dept, Faculty, and Name, and the value Biology.]

Exact Answers Are Not Always Useful
• Problems with exact answers:
  – labels are not always known
  – content may be unknown, misspelled, etc.
  – structure may be unknown, or may vary from one representation to another
  – we may actually want to perform a search, since the query is a vague hypothesis
  – exact answers do not allow users to get partial/vague answers where none better exist

Manually Adding Inexactness
• One can use language constructs in order to get more flexible queries
• Example: Suppose we want to find courses, together with the teachers that teach them, but we don’t know which hierarchy exists in the database:
  – for each teacher, there is a list of courses, or
  – for each course, there is a list of teachers,
  – or both…

Query Needed:
[Figure: University database graph using one of the two hierarchies, with the matching query over Teacher and Course.]

Query Needed:
[Figure: University database graph using the other hierarchy, with the matching query over Course and Teacher.]

Manually Adding Inexactness (cont.)
• If we don’t know the hierarchy, we need: [Course, Teacher] Union [Teacher, Course]
• If we don’t know what exactly the labels are, we might need: [Teacher or Lecturer or Professor, Course or Seminar or Lab] Union [Course or Seminar or Lab, Teacher or Lecturer or Professor]

Help!
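To make the cost of manual inexactness concrete, here is a minimal Python sketch of the union query described above. It is an illustration only: the sample XML document, the assumption that names are stored in `Name` child elements, the pairing of teachers with courses, and the helper `teacher_course_pairs` are all hypothetical, and the code uses the limited path support in Python's `xml.etree.ElementTree` rather than full XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: one department nests courses under teachers,
# the other nests teachers under courses and uses different labels.
DOC = """
<University>
  <Dept>
    <Teacher>
      <Name>Chana Israeli</Name>
      <Course><Name>Databases</Name></Course>
    </Teacher>
  </Dept>
  <Dept>
    <Seminar>
      <Name>Bioinformatics</Name>
      <Lecturer><Name>Avi Levy</Name></Lecturer>
    </Seminar>
  </Dept>
</University>
"""

TEACHER_LABELS = ("Teacher", "Lecturer", "Professor")
COURSE_LABELS = ("Course", "Seminar", "Lab")


def teacher_course_pairs(root):
    """Union of both nesting orders over every label alternative."""
    pairs = set()
    for t_label in TEACHER_LABELS:
        for c_label in COURSE_LABELS:
            # Shape 1: the course is a child of the teacher.
            for teacher in root.iter(t_label):
                for course in teacher.findall(c_label):
                    pairs.add((teacher.findtext("Name"), course.findtext("Name")))
            # Shape 2: the teacher is a child of the course.
            for course in root.iter(c_label):
                for teacher in course.findall(t_label):
                    pairs.add((teacher.findtext("Name"), course.findtext("Name")))
    return pairs


print(sorted(teacher_course_pairs(ET.fromstring(DOC))))
# [('Avi Levy', 'Bioinformatics'), ('Chana Israeli', 'Databases')]
```

With three teacher labels, three course labels, and two nesting orders, the user is already enumerating 18 query variants by hand, which is the burden the "Help!" refers to.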
Intuition
• Users write regular queries, stating what they are looking for
• The query processor uses a built-in strategy to find answers that satisfy the query either exactly or inexactly
• Burden is on the query processor, not on the user

Inexact Answers
• Many different definitions have been given
  – For each definition, query processing algorithms have been defined
• Examples:
  – Allow some of the nodes of the query to be unmatched
  – Allow edges in the query to be matched to paths in the database
  – Allow nodes to be matched to nodes with labels that have a similar meaning
• Be careful so that answers are meaningful!

Allow Unmatched Nodes: Bezeq Query
[Figure: query graph with Name (Shmulevitz), Phone Number, City (Be'er Sheva), and Area Code (03).]

Matching Edges to Paths: Egged Query
[Figure: query graph with Source (Technion-Haifa) and Destination (Eilat).]

Similar Meaning Labels
[Figure: query graph with Course, Name (Databases), and Details.]

Other Types of Inexactness
• Many other definitions have been given, e.g.,
  – allow permutations of nodes in the query
  – allow child nodes to be promoted
  – interconnection
• Summary: Inexactness basically means that we relax some of the query requirements!
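The three relaxations listed under Inexact Answers can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not one of the published definitions or algorithms: the `Node` class, the functions `matched_nodes` and `size`, the `synonyms` table, and the small database (loosely based on the University example, with values stored as leaf nodes) are all hypothetical. Labels sit on nodes, and both the query and the database are assumed to be trees.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


def descendants(node):
    """Yield every node strictly below `node`."""
    for child in node.children:
        yield child
        yield from descendants(child)


def size(query):
    """Number of nodes in the query tree."""
    return 1 + sum(size(c) for c in query.children)


def matched_nodes(query, data, *, edge_to_path=False, synonyms=None):
    """Largest number of query nodes matched when the query root maps to `data`.

    Returns 0 if the roots do not match. Two query nodes may map to the same
    database node, as in a homomorphism.
    """
    synonyms = synonyms or {}
    same = query.label == data.label or data.label in synonyms.get(query.label, ())
    if not same:
        return 0
    # Relaxation: a query edge may match a whole path in the database.
    candidates = list(descendants(data)) if edge_to_path else data.children
    total = 1
    for q_child in query.children:
        # A query child that matches nowhere contributes 0 (left unmatched).
        total += max(
            (matched_nodes(q_child, c, edge_to_path=edge_to_path, synonyms=synonyms)
             for c in candidates),
            default=0,
        )
    return total


def person(kind, name):
    return Node(kind, [Node("Name", [Node(name)])])


# Hypothetical database; values are stored as leaf nodes.
db = Node("University", [
    Node("Dept", [Node("Faculty", [person("Professor", "Chana Israeli")])]),
    Node("Dept", [Node("Faculty", [person("Lecturer", "Avi Levy")])]),
])

# Query 1: University / Dept / Faculty / Name
q_names = Node("University", [Node("Dept", [Node("Faculty", [Node("Name")])])])
# Query 2: University / Dept / Faculty / Teacher / Name
q_teacher = Node("University", [Node("Dept", [Node("Faculty", [Node("Teacher", [Node("Name")])])])])

print(size(q_names), matched_nodes(q_names, db))          # 4 3
print(matched_nodes(q_names, db, edge_to_path=True))      # 4
print(size(q_teacher), matched_nodes(q_teacher, db))      # 5 3
print(matched_nodes(q_teacher, db, synonyms={"Teacher": {"Professor", "Lecturer"}}))  # 5
```

A score equal to `size(query)` with both relaxations switched off corresponds to an exact answer. A lower score is a partial answer with some query nodes unmatched, and ranking answers by score is one simple way to prefer answers with more information. Whether such answers are meaningful still has to be judged separately, as the slide warns.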