Data Engineering & Management (DEM)
Proposal for ACM SAC 2007

Track Chairs

William Perrizo
IACC 258
Computer Science Department
North Dakota State University
Fargo, ND 58105
[email protected]
(701) 231-7284

Imad Rahal
211 PE Science Center
Computer Science Department
College of Saint Benedict / Saint John’s University
Collegeville, MN 56321
[email protected]
(320) 363-2837

Baoying Wang
406 Stewart Science Hall
Mathematics, Computer Science, and Physics
Waynesburg College
Waynesburg, PA 15370
[email protected]
(724) 852-3285

Description

The huge volumes of data being generated in numerous application areas such as bioinformatics, agriculture, medicine, business, and networks have created a widespread demand for techniques that can manage, query, and analyze such data while coping directly with its volume. Data engineering techniques are highly potent processes capable of organizing, managing, querying, and retrieving large data sets and of discovering and extracting important and useful patterns from huge volumes of raw data. Such data engineering processes lie at the intersection of a number of popular, older research areas, such as databases, information retrieval, machine learning, artificial intelligence, and statistics. In short, data engineering as a research direction has been motivated mainly by the availability of huge amounts of data and a universal need to manage it and to uncover any embedded “interesting” information and knowledge that might play a useful role in decision-making processes.

The DEM track will cover a broad range of topics including theory, methods, applications, and tools. The track particularly welcomes contributions at the junction of theory and practice that disseminate basic research with immediate impact on practical applications.
The core emphasis will be on high-performance pragmatic and theoretical answers to the two infamous data problems of this day and age: the curse of scalability and the curse of high dimensionality. Learning patterns from large volumes of data, learning and predicting events, adapting to situations, rationalizing, providing autonomous control, and assisting in executive decision-making are some of the state-of-the-art potential directions in data engineering.

Rationale

An unfortunate situation facing the research community is the lack of communication among the various factions of researchers working on similar problems but in different contexts, such as databases, data mining, and information retrieval, each with its own specialized conferences (ACM SIGMOD/PODS, ACM CIKM, ACM SIGIR, ACM SIGKDD, IEEE ICDM, and IEEE ICDE, to name a few). These factions share an implicit objective: to devise scalable solutions to data problems from different aspects such as management, retrieval, and analysis. We expect this track to provide a venue that facilitates communication among researchers from these factions and helps them focus on the problems rather than the contexts.

One rather new data engineering direction is vertical data organization, which permits a number of transactions accessing the same data units to execute concurrently. Such an organization also makes caching work well, makes compression easy to achieve, and may greatly speed up I/O operations, since only the participating attributes are retrieved instead of whole records, resulting in significantly improved data management, retrieval, and analysis systems. The DEM track acknowledges vertical data organizations as highly potent solutions to many scalability and dimensionality issues and thus particularly welcomes contributions that use vertical approaches to improve scalability and alleviate dimensionality problems.
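The I/O argument for vertical data organization can be illustrated with a minimal sketch. It is a toy in-memory model, not any particular system's storage format: the relation, its attribute names, and the helper functions (to_vertical, mean_horizontal, mean_vertical) are all hypothetical and chosen only for illustration. It counts how many stored values a single-attribute aggregate has to touch under each layout.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy relation (illustrative names and values only).
COLUMN_NAMES: Tuple[str, ...] = ("id", "yield_tons", "rainfall_mm", "soil_ph")
ROWS: List[Tuple[float, ...]] = [
    (1, 3.2, 410.0, 6.5),
    (2, 2.8, 380.0, 6.9),
    (3, 3.9, 455.0, 6.2),
]


def to_vertical(rows: Sequence[Sequence[float]],
                names: Sequence[str]) -> Dict[str, List[float]]:
    """Split horizontal records into one list per attribute."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def mean_horizontal(rows: Sequence[Sequence[float]],
                    index: int) -> Tuple[float, int]:
    """Average one attribute by reading whole records.

    Returns (mean, number of stored values touched).
    """
    total = 0.0
    touched = 0
    for row in rows:
        touched += len(row)  # the whole record is fetched
        total += row[index]
    return total / len(rows), touched


def mean_vertical(columns: Dict[str, List[float]],
                  name: str) -> Tuple[float, int]:
    """Average one attribute by reading only its column.

    Returns (mean, number of stored values touched).
    """
    column = columns[name]
    return sum(column) / len(column), len(column)


if __name__ == "__main__":
    columns = to_vertical(ROWS, COLUMN_NAMES)
    _, touched_rows = mean_horizontal(ROWS, COLUMN_NAMES.index("yield_tons"))
    _, touched_cols = mean_vertical(columns, "yield_tons")
    print("values touched, horizontal layout:", touched_rows)
    print("values touched, vertical layout:", touched_cols)
```

With three records of four attributes each, the horizontal scan touches 12 values while the vertical scan touches 3. The gap grows with the number of attributes that a query does not need, which is why the benefit is tied to high-dimensional data.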
Topics

Large-scale Database Integration and Interoperability
Query Processing and Optimization for Large Databases
Data Structures for Storing Large Data Sets
Semi-structured and XML Databases
Distributed, Parallel, Peer-to-Peer Large-scale Databases and Data Operations
Scientific, Biological, Stream, Temporal, Multimedia, Sensor and Business Data Systems
Data Grids, Data Warehousing and OLAP
Database System Internals and Performance
Large-scale Data Mining (cardinality and dimensionality scalability)
Traditional Data Mining (e.g., Classification, Clustering, Association Rule Mining, and Outlier Analysis)
Data Management, Retrieval and Analysis Complexity and Efficiency (e.g., quality and interestingness metrics)
Data Pre-processing (e.g., data reduction, feature selection, transformation)
High Performance and Parallel/Distributed Data Mining
Security and Privacy in Data Mining and Databases
Intelligent Systems
Information Retrieval Techniques and Systems
Search Engines
Distributed Information Retrieval
Distributed Data Management

Undertaken Activities

We plan to take full charge of the following activities:
(1) disseminate the call for papers for the track on popular advertising media such as DBWorld;
(2) fully and promptly manage the review process (including managing paper submissions, assigning papers to reviewers based on their backgrounds and preferences, collecting reviewer feedback in a timely manner, and notifying authors of decisions), most probably by employing a conference management system;
(3) collect final materials from the authors of accepted papers; and
(4) design the track program and supervise it on site.