The Lowell Database Research Self Assessment
2005-09-21
Tamkang University (淡江大學), 周清江

Summary
• Senior database researcher meetings
  • Senior database researchers have gathered every few years to assess the state of database research and to recommend problems and problem areas that deserve additional focus.
    • Laguna Beach, Calif. in 1989 [1]
    • Palo Alto, Calif. (“Lagunita”) in 1990 [2] and 1995 [3]
    • Cambridge, Mass. in 1996 [4]
    • Asilomar, Calif. in 1998 [5]
    • Lowell, Mass. in 2003

Focus
• Information storage, organization, management, and access
• The field is driven by new applications, technology trends, new synergies with related fields, and innovation within the field itself

Sources of information and information-processing demands
• Internet and web
  • Cross-enterprise vs. intra-enterprise
  • Require stronger facilities for security and information integration
• Science
  • Large and complex data sets
  • Pipeline of data products produced by data analysis
  • Storing and querying “ordered” data
  • Integrating with the world-wide data grid
• eCommerce
• To come: cheap micro-sensor technology that will enable most things to report their status in real time

Major changes in the traditional DBMS topics
• Technology advances require us to re-assess:
  • Data models, access methods, query processing algorithms, concurrency control, recovery, query languages, user interfaces
  • Ex: Storage capacity is growing and its cost is falling. Thus, storage management and query-processing algorithms have to be re-assessed.
  • Cache-aware
• Maturation of related technologies, like data mining, web search engines, artificial intelligence (speech, natural language, reasoning with uncertainty, machine learning)
• Personal information manager

Next Generation Infrastructure
1. Integration of Text, Data, Code, and Streams
2. Information Fusion
3. Sensor Data and Sensor Networks
4. Multimedia Queries
5. Reasoning about Uncertain Data
6. Personalization
7. Data Mining
8. Self Adaptation
9. Privacy
10. Trustworthy Systems
11. New User Interfaces
12. One-Hundred-Year Storage
13. Query Optimization

Integration of Text, Data, Code, and Streams
• The Web has demonstrated the importance of more sophisticated data types, like text, temporal, spatial, sound, image, or video data.
• Make the following “first-class citizens” of the DBMS:
  • Uncertainty management (like information retrieval)
  • User-defined procedure data
  • Text, space, time, image, and multimedia data
  • Structured data
  • Triggers (scalable)
  • Data streams (from micro-sensor devices) and queues
  • Scientific data sets
• XML Schema and XQuery are too complex to be the basis for this sort of new architecture

Information Fusion
• The typical approach to information integration is to use an extract-transform-load (ETL) tool to build data warehouses and data marts for a single corporation
• With the Internet enabling integration among different enterprises, data must stay at the sources and be accessed at query time
• ETL tools cannot be applied to sensor data sets
• A solution to semantic heterogeneity on the Web is still elusive
• At web scale, query execution must move to probabilistic evidence accumulation
• When integrating across autonomous enterprises, query processing must reveal only the minimal information necessary, in conformance with security policies

Sensor Data and Sensor Networks
• Self-powered, wireless devices
  • Draw more power when communicating than when computing
  • It is preferable to distribute query computation to the individual nodes
• Query execution on sensor networks requires the ability to adapt to rapidly changing configurations
• How to deduce high-level facts from very low-level signals

Multimedia Queries
• How to create easy ways to analyze, summarize, search, and view the “electronic shoebox” of a person’s multimedia information
• Ex: how to prepare a multimedia presentation about a child

Reasoning about Uncertain Data
• Non-business data is essentially uncertain or imprecise
  • Scientific measurements have standard errors
  • GPS data involves uncertainty in current position
  • Sequence, image, and text similarity are approximate metrics
• The “lineage” of the data must be tracked
• Query processing must move to a stochastic model, where evidence accumulation is performed to obtain a better and better answer
• Must handle imprecise queries
• Must be able to characterize the accuracy offered

Personalization
• Query answers should depend on personal profiles
• Relevance and relevance feedback should also depend on the person and the context
• A framework is needed for including and exploiting appropriate metadata for mass personalization
• Personalization and uncertainty leave one with a need to verify that the answer is “correct”

Data Mining
• Historically, data mining has focused on efficient ways to discover models of existing data sets
• Data warehouse users have only one data mining query: “something interesting”
• Need to develop algorithms and structures to look for “unexpected pearls”, while running in the background and consuming excess system resources
• Need to integrate data mining with querying, optimization, and other DB facilities such as triggers

Self Adaptation
• Modern DBMSs are too complex
• To simplify DB administration:
  1. It should be possible to perform tuning using a combination of a rule-based system and a database of knob settings and configuration data. This needs more sophisticated models of user behaviors and workloads.
  2. DBMSs need to recognize internal malfunctions and malfunctions of communicating components, identify data corruption, detect application failures, and do something about them

Privacy
• Data-oriented security needs to be revitalized
• Need to address the concerns, policies, and mechanisms to support multiple individual options and controls on information held by third parties
• Access decisions should be based not only on who is requesting the data but also on the use to which it will be put

Trustworthy Systems
• Safely store data, protect it from unauthorized disclosure, protect it from loss, and make it always available to authorized users
• Digital rights management
• Ensuring the correctness of query results and data-intensive computation for embedded systems
• Use logical inference technology in validating correctness

New User Interfaces
• Sophisticated visualization systems
• Keyword-based query and browsing
• Use speech or natural language to query through the semantic web and ontologies

One-Hundred-Year Storage
• A need for indefinite electronic storage of information
  • Requires mechanisms for migration and for emulation
  • Requires metadata for lineage and context

Query Optimization
• Optimization for information integrators, semi-structured query languages like XQuery, stream processors, sensor networks, and other domains
• Inter-query optimization involving large numbers of queries

Next steps and Discussions
• Generate a test bed and collection of integration tasks
  • Classroom scheduling
• At which level should information integration occur?
  • DB or application
• Will web services make any progress on addressing semantic heterogeneity?
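
Illustration (not part of the original slides): the Reasoning about Uncertain Data slide says query processing must move to a stochastic model in which evidence accumulation yields a better and better answer, with the accuracy characterized. A minimal sketch of that idea is a running average whose error bound is reported as more rows are scanned. Everything named here is an assumption for illustration: the function `progressive_mean`, the simulated sensor readings, the reporting interval, and the normal-approximation 95% interval. It also presumes rows arrive in random order, and it is not the method of the Lowell report.

```python
import math
import random


def progressive_mean(values, report_every=100, z=1.96):
    """Yield (rows_seen, estimate, half_width) as more rows are scanned.

    Uses Welford's running mean/variance update, so no rows are stored.
    half_width is z times the standard error of the mean (assumed normal).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            std_err = math.sqrt(m2 / (n - 1) / n)
            yield n, mean, z * std_err


# Hypothetical data: 1000 simulated temperature readings (seeded for repeatability).
rng = random.Random(0)
readings = [rng.gauss(20.0, 2.0) for _ in range(1000)]

reports = list(progressive_mean(readings))
for rows_seen, estimate, half_width in reports:
    print(f"{rows_seen:5d} rows: {estimate:.3f} +/- {half_width:.3f}")
```

Each report is an answer the user could act on immediately. The interval half-width is one simple way to "characterize the accuracy offered", and in this run it is smaller after 1000 rows than after 100.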