Data Engineering & Management (DEM)
Proposal for ACM SAC 2007
Track Chairs

William Perrizo
IACC 258
Computer Science Department
North Dakota State University
Fargo, ND 58105
[email protected]
(701) 231-7284

Imad Rahal
211 PE Science Center
Computer Science Department
College of Saint Benedict / Saint John’s University
Collegeville, MN 56321
[email protected]
(320) 363-2837

Baoying Wang
406 Stewart Science Hall
Mathematics, Computer Science, and Physics
Waynesburg College
Waynesburg, PA 15370
[email protected]
(724) 852-3285

Description
The huge volumes of data being generated in numerous application areas such as bioinformatics, agriculture, medicine, business, networks, and the like have created a widespread demand for techniques capable of managing, querying, and analyzing such data at their full scale. Data engineering techniques are highly potent processes capable of organizing, managing, querying, and retrieving large data sets, and of discovering and extracting important and useful patterns from huge volumes of raw data. Such data engineering processes lie at the intersection of a number of popular, but older, research areas, such as databases, information retrieval, machine learning, artificial intelligence, statistics, and the like. In short, data engineering as a research direction has been motivated mainly by the availability of huge amounts of data and a universal need to manage it and to uncover any embedded “interesting” information and knowledge that might play a useful role in decision-making processes.
The DEM track will cover a broad range of topics including theory, methods, applications, and tools. The track particularly welcomes contributions at the junction of theory and practice that disseminate basic research with immediate impact on practical applications. The core emphasis will be on high-performance pragmatic and theoretical answers to the two infamous data problems of this day and age: the curse of scalability and the curse of high dimensionality. Learning patterns from large volumes of data, learning and predicting events, adapting to situations, rationalizing, providing autonomous control, and assisting in executive decision-making are some of the state-of-the-art directions in data engineering.
Rationale
An unfortunate situation facing the research community is the lack of communication among groups of researchers working on similar problems in different contexts, such as databases, data mining, and information retrieval, each with specialized conferences like ACM SIGMOD/PODS, ACM CIKM, ACM SIGIR, ACM SIGKDD, IEEE ICDM, and IEEE ICDE, to name a few. These communities share an implicit objective: to devise scalable solutions to data problems from different perspectives such as management, retrieval, and analysis. We expect this track to provide a venue that facilitates communication among researchers from these communities and encourages them to focus on the problems rather than the contexts.
One rather new data engineering direction is vertical data organization, which permits a number of transactions accessing the same data units to execute concurrently. Such an organization also makes caching work well, makes compression easy to achieve, and may greatly speed up I/O operations, since only the participating attributes are retrieved instead of whole records, thus resulting in significantly improved data management, retrieval, and analysis systems. The DEM track acknowledges vertical data organizations as highly potent solutions to many scalability and dimensionality issues and thus particularly welcomes contributions that use vertical approaches to improve scalability and alleviate dimensionality problems.
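As a rough illustration of the vertical organization described above, the following minimal Python sketch contrasts a horizontal (row-wise) table with a vertical (column-wise) one. The sample records, the attribute names (id, crop, yield_t), and the helper functions to_columns and run_length_encode are hypothetical and chosen only for illustration; they are not taken from any particular system, and the sketch assumes every record has the same attributes.

```python
from itertools import groupby

# Hypothetical sample records (illustrative only, not from the proposal).
rows = [
    {"id": 1, "crop": "wheat", "yield_t": 3.2},
    {"id": 2, "crop": "wheat", "yield_t": 2.9},
    {"id": 3, "crop": "corn", "yield_t": 5.1},
    {"id": 4, "crop": "corn", "yield_t": 4.8},
]


def to_columns(records):
    """Convert a horizontal (row-wise) table into a vertical (column-wise) one.

    Assumes every record carries the same set of attributes.
    """
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress a column into (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# Horizontal layout: every whole record is visited to read one attribute.
total_from_rows = sum(record["yield_t"] for record in rows)

# Vertical layout: only the participating column is read.
total_from_columns = sum(columns["yield_t"])

# Values of one attribute sit next to each other, so runs compress easily.
crop_runs = run_length_encode(columns["crop"])
```

Both totals agree, but the vertical version reads a single list rather than every record. The run-length step hints at why compression tends to be simpler when the values of one attribute are stored contiguously.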
Topics

Large-scale Database Integration and Interoperability
Query Processing and Optimization for Large Databases
Data Structures for Storing Large Data Sets
Semi-structured and XML Databases
Distributed, Parallel, and Peer-to-Peer Large-scale Databases and Data Operations
Scientific, Biological, Stream, Temporal, Multimedia, Sensor, and Business Data Systems
Data Grids, Data Warehousing, and OLAP
Database System Internals and Performance
Large-scale Data Mining (cardinality and dimensionality scalability)
Traditional Data Mining (e.g., Classification, Clustering, Association Rule Mining, and Outlier Analysis)
Data Management, Retrieval, and Analysis Complexity and Efficiency (e.g., quality and interestingness metrics)
Data Pre-processing (e.g., data reduction, feature selection, transformation)
High Performance and Parallel/Distributed Data Mining
Security and Privacy in Data Mining and Databases
Intelligent Systems
Information Retrieval Techniques and Systems
Search Engines
Distributed Information Retrieval
Distributed Data Management
Undertaken Activities
We plan to take full charge of the following activities: (1) disseminate the call for papers for the track through popular channels such as DBWorld; (2) manage the review process fully and in a timely manner (including handling paper submissions, assigning papers to reviewers based on their backgrounds and preferences, collecting reviewer feedback on time, and notifying authors of decisions), most probably by employing a conference management system; (3) collect final materials from the authors of accepted papers; and (4) design the track program and supervise it on site.