The Lowell Database Research Self Assessment
2005-09-21
Tamkang University, 周清江
Summary

Senior Database Researchers Meeting
* Senior database researchers have gathered every few years to assess the state of database research and to recommend problems and problem areas that deserve additional focus.
  * Laguna Beach, Calif. in 1989 [1]
  * Palo Alto, Calif. (“Lagunita”) in 1990 [2] and 1995 [3]
  * Cambridge, Mass. in 1996 [4]
  * Asilomar, Calif. in 1998 [5]
  * Lowell, Mass. in 2003

Focus
* Information storage, organization, management, and access
* The field is driven by new applications, technology trends, new synergies with related fields, and innovation within the field itself

Sources of information and information-processing demands
* Internet and web
  * Cross-enterprise vs. intra-enterprise
  * Require stronger facilities for security and information integration
* Science
  * Large and complex data sets
  * Pipeline of data products produced by data analysis
  * Storing and querying “ordered” data
  * Integrating with the world-wide data grid
* eCommerce
* To come: cheap micro-sensor technology that will enable most things to report their status in real time

Major changes in the traditional DBMS topics
* Technology advances require us to re-assess:
  * Data models, access methods, query processing algorithms, concurrency control, recovery, query languages, user interfaces
  * Ex: Storage is improving in capacity and cost, so storage management and query-processing algorithms have to be re-assessed
  * Cache-aware
* Maturation of related technologies, like data mining, web search engines, and artificial intelligence (speech, natural language, reasoning with uncertainty, machine learning)
* Personal information manager

Next Generation Infrastructure
1. Integration of Text, Data, Code, and Streams
2. Information Fusion
3. Sensor Data and Sensor Networks
4. Multimedia Queries
5. Reasoning about Uncertain Data
6. Personalization
7. Data Mining
8. Self Adaptation
9. Privacy
10. Trustworthy Systems
11. New User Interfaces
12. One-Hundred-Year Storage
13. Query Optimization

Integration of Text, Data, Code, and Streams
* The Web has demonstrated the importance of more sophisticated data types, like text, temporal, spatial, sound, image, or video data.
* Make the following “first-class citizens” of the DBMS:
  * Uncertainty management (like information retrieval)
  * User-defined procedural data
  * Text, space, time, image, and multimedia data
  * Structured data
  * Triggers (scalable)
  * Data streams (from micro-sensor devices) and queues
  * Scientific data sets
* XML Schema and XQuery are too complex to be the basis for this sort of new architecture

Information Fusion
* The typical approach to information integration is an extract-transform-load (ETL) tool that builds a data warehouse and data marts for a single corporation
* With Internet-enabled integration among different enterprises, data must stay at the sources and be accessed at query time
* ETL tools cannot be applied to sensor data sets
* A solution to web semantic heterogeneity is still elusive
* At web scale, query execution must move to probabilistic evidence accumulation
* When integrating across autonomous enterprises, each query must reveal only the minimal information necessary, in conformance with security policies

Sensor Data and Sensor Networks
* Self-powered, wireless devices
* A device draws more power when communicating than when computing
* It is preferable to distribute query computation to the individual nodes
* Query execution on sensor networks requires the ability to adapt to rapidly changing configurations
* How to deduce high-level facts from very low-level signals

Multimedia Queries
* How to create easy ways to analyze, summarize, search, and view the “electronic shoebox” of a person’s multimedia information
* Ex: how to prepare a multimedia presentation about a child

Reasoning about Uncertain Data
* Non-business data is essentially uncertain or imprecise
  * Scientific measurements have standard errors
  * GPS data involves uncertainty in current position
  * Sequence, image, and text similarity are approximate metrics
* The “lineage” of the data must be tracked
* Query processing must move to a stochastic model, where evidence accumulation is performed to obtain a better and better answer
* Must handle imprecise queries
* Must be able to characterize the accuracy offered

Personalization
* Query answers should depend on personal profiles
* Relevance and relevance feedback should also depend on the person and the context
* A framework is needed for including and exploiting appropriate metadata for mass personalization
* Personalization and uncertainty leave one with a need to verify that the answer is “correct”

Data Mining
* Historically, data mining has focused on efficient ways to discover models of existing data sets
* Data warehouse users have only one data mining query: “something interesting”
* Need to develop algorithms and structures to look for “unexpected pearls” while running in the background and consuming excess system resources
* Need to integrate data mining with querying, optimization, and other DB facilities such as triggers

Self Adaptation
* Modern DBMSs are too complex
* To simplify DB administration:
  1. It should be possible to perform tuning using a combination of a rule-based system and a database of knob settings and configuration data. This needs more sophisticated models of user behaviors and workloads.
  2. DBMSs need to recognize internal malfunctions and malfunctions of communicating components, identify data corruption, detect application failures, and do something about them.

Privacy
* Data-oriented security needs to be revitalized
* Need to address the concerns, policies, and mechanisms to support multiple individual options and controls on information held by third parties
* Access decisions should be based not only on who is requesting the data but also on the use to which it will be put

Trustworthy Systems
* Safely store data, protect it from unauthorized disclosure, protect it from loss, and make it always available to authorized users
* Digital rights management
* Ensuring the correctness of query results and data-intensive computation for embedded systems
* Use logical inference technology in validating correctness

New User Interfaces
* Sophisticated visualization systems
* Keyword-based query and browsing
* Use speech or natural language to query through the semantic web and ontologies

One-Hundred-Year Storage
* A need for indefinite electronic storage of information
* Requires mechanisms for migration and for emulation
* Requires metadata for lineage and context

Query Optimization
* Optimization of information integrators, of semi-structured query languages like XQuery, of stream processors, of sensor networks, and of other domains
* Inter-query optimization involving large numbers of queries

Next Steps and Discussions
* Generate a test bed and collection of integration tasks
  * Classroom scheduling
* At which level should information integration occur: DB or application?
* Will web services make any progress on addressing semantic heterogeneity?
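
The Privacy slide's point that access decisions should depend on the intended use of the data, not only on the requester, can be illustrated with a small sketch. This is not a mechanism from the Lowell report: the policy table, the role and purpose names, and the `is_allowed` function are all hypothetical, chosen only to show a check that takes both inputs into account.

```python
from dataclasses import dataclass

# Hypothetical policy table (an assumption for illustration only):
# for each column, the set of (role, purpose) pairs that may read it.
POLICY = {
    "diagnosis": {("physician", "treatment"), ("researcher", "aggregate_statistics")},
    "address": {("physician", "treatment"), ("billing_clerk", "billing")},
}


@dataclass(frozen=True)
class Request:
    role: str     # who is requesting the data
    purpose: str  # the declared use the data will be put to
    column: str   # the data item being requested


def is_allowed(request: Request) -> bool:
    """Grant access only if the (role, purpose) pair is listed for the column."""
    allowed_pairs = POLICY.get(request.column, set())
    return (request.role, request.purpose) in allowed_pairs


if __name__ == "__main__":
    # Same requester, same column, different declared purpose.
    print(is_allowed(Request("physician", "treatment", "diagnosis")))
    print(is_allowed(Request("physician", "marketing", "diagnosis")))
```

Keying the policy on the pair rather than on the role alone means the same physician is granted access for treatment and refused for marketing. Columns that are not listed in the table are denied by default.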