Download Emerging Research Directions in DBs/ISs

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Mobile business intelligence wikipedia , lookup

Collaborative decision-making software wikipedia , lookup

Data vault modeling wikipedia , lookup

Business intelligence wikipedia , lookup

Transcript
Emerging Research
Directions in DBs/ISs
Outline








Mobile Databases
Multimedia Databases
Geographic Information Systems
Bioinformatics
XML
Data Mining
Data Warehousing
Introduction to ASIS Lab
2
Mobile Databases
 Recent advances in portable and wireless technology led to
mobile computing, a new dimension in data communication
and processing.
 Portable computing devices coupled with wireless
communications allow clients to access data from virtually
anywhere and at any time.
 There are a number of hardware and software problems that
must be resolved before the capabilities of mobile computing
can be fully utilized.
 Some of the software problems – which may involve data
management, transaction management, and database
recovery – have their origins in distributed database
systems.
3
Mobile Databases(2)
 In mobile computing, the problems are more
difficult, mainly:
• The limited and intermittent connectivity afforded by
wireless communications.
• The limited life of the power supply(battery).
• The changing topology of the network.
• In addition, mobile computing introduces new
architectural possibilities and challenges.
4
Mobile Computing Architecture
5
Mobile Computing Architecture(2)
 It is distributed architecture where a number of
computers, generally referred to as Fixed Hosts
and Base Stations are interconnected through a
high-speed wired network.
• Fixed hosts are general purpose computers configured to
manage mobile units.
• Base stations function as gateways to the fixed network
for the Mobile Units.
6
Data Management Issues
 From a data management standpoint, mobile computing may
be considered a variation of distributed computing. Mobile
databases can be distributed under two possible scenarios:
• The entire database is distributed mainly among the wired
components, possibly with full or partial replication.
→A base station or fixed host manages its own database with a
DBMS-like functionality, with additional functionality for locating
mobile units and additional query and transaction management
features to meet the requirements of mobile environments.
• The database is distributed among wired and wireless
components.
→Data management responsibility is shared among base stations
or fixed hosts and mobile units.
7
Data Management Issues(2)
 Data management issues as it is applied to mobile
databases:
•
•
•
•
•
•
•
•
Data distribution and replication
Transactions models
Query processing
Recovery and fault tolerance
Mobile database design
Location-based service
Division of labor
Security
 M-Commerce
8
Outline








Mobile Databases
Multimedia Databases
Geographic Information Systems
Bioinformatics
XML
Data Mining
Data Warehousing
Introduction to ASIS Lab
9
Multimedia Databases
 In the years ahead multimedia information systems
are expected to dominate our daily lives.
• Our houses will be wired for bandwidth to handle
interactive multimedia applications.
• Our high-definition TV/computer workstations will have
access to a large number of databases, including digital
libraries, image and video databases that will distribute
vast amounts of multisource multimedia content.
10
Multimedia Databases (2)
 DBMSs have been constantly adding to the types of
data they support.
 Today many types of multimedia data are available
in current systems.
11
Multimedia Databases(3)
 Types of multimedia data are available in current
systems
• Text: May be formatted or unformatted. For ease of
parsing structured documents, standards like SGML and
variations such as HTML are being used.
• Graphics: Examples include drawings and illustrations
that are encoded using some descriptive standards (e.g.
CGM, PICT, postscript).
12
Multimedia Databases(4)
 Types of multimedia data are available in current
systems (contd.)
• Images: Includes drawings, photographs, and so forth,
encoded in standard formats such as bitmap, JPEG, and
MPEG. Compression is built into JPEG and MPEG.
→These images are not subdivided into components. Hence
querying them by content (e.g., find all images containing circles)
is nontrivial.
• Animations: Temporal sequences of image or graphic
data.
13
Multimedia Databases(5)
 Types of multimedia data are available in current
systems (contd.)
• Video: A set of temporally sequenced photographic data
for presentation at specified rates– for example, 30
frames per second.
• Structured audio: A sequence of audio components
comprising note, tone, duration, and so forth.
14
Multimedia Databases(6)
 Types of multimedia data are available in current
systems (contd.)
• Audio: Sample data generated from aural recordings in a
string of bits in digitized form. Analog recordings are
typically converted into digital form before storage.
15
Multimedia Databases(7)
 Types of multimedia data are available in current
systems (contd.)
• Composite or mixed multimedia data: A combination of
multimedia data types such as audio and video which
may be physically mixed to yield a new storage format or
logically mixed while retaining original types and formats.
Composite data also contains additional control
information describing how the information should be
rendered.
16
Data Management Issues
 Multimedia applications dealing with thousands of
images, documents, audio and video segments, and
free text data depend critically on
• Appropriate modeling of the structure and content of data
• Designing appropriate database schemas for storing and
retrieving multimedia information.
17
Outline









Mobile Databases
Multimedia Databases
Geographic Information Systems
Bioinformatics
XML
Data Mining
Data Warehousing
Introduction to ASIS Lab
Revision
18
Geographic Information Systems
 Geographic information systems(GIS) are used to
collect, model, and analyze information describing
physical properties of the geographical world.
19
Geographic Information Systems(2)
 The scope of GIS broadly encompasses two types of data:
• Spatial data, originating from maps, digital images,
administrative and political boundaries, roads, transportation
networks, physical data, such as rivers, soil characteristics,
climatic regions, land elevations, and
• Non-spatial data, such as socio-economic data (like census
counts), economic data, and sales or marketing information.
GIS is a rapidly developing domain that offers highly innovative
approaches to meet some challenging technical demands.
20
Geographic Information Systems(3)
21
Spatial data
22
GIS Applications
 It is possible to divide GISs into three categories:
• Cartographic applications
• Digital terrain modeling applications
• Geographic objects applications
23
GIS Applications(2)
GIS Applications
Digital Terrain Modeling
Applications
Cartographic
Geographic Objects
Applications
Irrigation
Crop yield
analysis
Land
Evaluation
Planning and
Facilities
management
Earth
science
Civil engineering and
military evaluation
Soil Surveys
Air and water
pollution studies
Landscape
studies
Flood Control
Traffic pattern
analysis
Water resource
management
Car navigation
systems
Geographic
market analysis
Utility
distribution and
consumption
Consumer product
and services –
economic analysis
24
Data Management Requirements of GIS
 The functional requirements of the GIS applications
above translate into the following database
requirements.
25
Data Management Requirements of GIS (2)
Data Modeling and Representation
 GIS data can be broadly represented in two
formats:
• Vector data represents geometric objects such as points,
lines, and polygons.
26
Data Management Requirements of GIS (3)
 Data Modeling and Representation (contd.):
• Raster data is characterized as an array of points, where
each point represents the value of an attribute for a realworld location.
→Informally, raster images are n-dimensional array where each
entry is a unit of the image and represents an attribute. Twodimensional units are called pixels, while three-dimensional units
are called voxels.
→Three-dimensional elevation data is stored in a raster-based
digital elevation model (DEM) format.
27
Data Management Requirements of GIS (4)
Data Integration
 GISs must integrate both vector and raster data from a
variety of sources.
• Sometimes edges and regions are inferred from a raster image
to form a vector model, or conversely, raster images such as
aerial photographs are used to update vector models.
• Several coordinate systems such as Universal Transverse
Mercator (UTM), latitude/longitude, and local cadastral systems
are used to identify locations.
• Data originating from different coordinate systems requires
appropriate transformations.
28
Specific GIS Data Operations
 GIS applications are conducted through the use of
special operators such as the following:
•
•
•
•
•
Interpolation
Interpretation
Proximity analysis
Raster image processing
Analysis of networks
29
Specific GIS Data Operations(2)
 The functionality of a GIS database is also subject to other
considerations:
• Extensibility
• Data quality control
• Visualization
 Such requirements clearly illustrate that standard RDBMSs
or ODBMSs do not meet the special needs of GIS.
• Therefore it is necessary to design systems that support the
vector and raster representations and the spatial functionality
as well as the required DBMS features.
30
Outline









Mobile Databases
Multimedia Databases
Geographic Information Systems
Bioinformatics
XML
Data Mining
Data Warehousing
Introduction to ASIS Lab
Revision
31
Bioinformatics
 Bioinformatics: The study of genetics can be divided into
three branches:
• Mendelian genetics is the study of the transmission of traits between
generations
• Molecular genetics is the study of the chemical structure and function
of genes at the molecular level
• Population genetics is the study of how genetic information varies
across populations of organisms
 Bioinformatics addresses information management of
genetic information with special emphasis on DNA sequence
analysis
 Interdisciplinary research field
32
Outline









Mobile Databases
Multimedia Databases
Geographic Information Systems
Bioinformatics
XML
Data Mining
Data Warehousing
Introduction to ASIS Lab
Revision
33
XML: Extensible Markup Language
 Although HTML is widely used for formatting and structuring
Web documents, it is not suitable for specifying structured
data that is extracted from databases.
 A new language—namely XML (eXtended Markup
Language) has emerged as the standard for structuring and
exchanging data over the Web.
• XML can be used to provide more information about the
structure and meaning of the data in the Web pages rather
than just specifying how the Web pages are formatted for
display on the screen.
 The formatting aspects are specified separately—for
example, by using a formatting language such as XSL
(eXtended Stylesheet Language).
34
XML (2)
 Example1:
 Example2:
35
XML (3)
 The basic object is XML is the XML document.
 There are two main structuring concepts that are
used to construct an XML document:
• Elements
• Attributes
 Attributes in XML provide additional information that
describe elements.
36
XML(4)
 As in HTML, elements are identified in a document by their
start tag and end tag.
• The tag names are enclosed between angled brackets <…>, and
end tags are further identified by a backslash </…>.
 Complex elements are constructed from other elements
hierarchically, whereas simple elements contain data
values.
 It is straightforward to see the correspondence between the
XML textual representation and the tree structure.
• In the tree representation, internal nodes represent complex
elements, whereas leaf nodes represent simple elements.
• That is why the XML model is called a tree model or a
hierarchical model.
37
Outline









Mobile Databases
Multimedia Databases
Geographic Information Systems
Bioinformatics
XML
Data Mining
Data Warehousing
Introduction to ASIS Lab
Revision
38
Definitions of Data Mining
 The discovery of new information in terms of
patterns or rules from vast amounts of data.
 The process of finding interesting structure in data.
 The process of employing one or more computer
learning techniques to automatically analyze and
extract knowledge from data.
39
Knowledge Discovery in Databases
(KDD)
 Data mining is actually one step of a larger process
known as knowledge discovery in databases
(KDD).
 The KDD process model comprises six phases
•
•
•
•
•
•
Data selection
Data cleansing
Enrichment
Data transformation or encoding
Data mining
Reporting and displaying discovered knowledge
40
Outline









Mobile Databases
Multimedia Databases
Geographic Information Systems
Bioinformatics
XML
Data Mining
Data Warehousing
Introduction to ASIS Lab
Revision
41
Data Warehousing
 The data warehouse is a historical database
designed for decision support.
 Data mining can be applied to the data in a
warehouse to help with certain types of decisions.
 Proper construction of a data warehouse is
fundamental to the successful use of data mining.
 W. H Inmon characterized a data warehouse as:
• “A subject-oriented, integrated, nonvolatile, timevariant collection of data in support of management’s
decisions.”
42
Data Warehousing (2)
 Purpose of Data Warehousing
• Traditional databases are not optimized for data access
only they have to balance the requirement of data access
with the need to ensure integrity of data.
• Most of the times the data warehouse users need only
read access but, need the access to be fast over a large
volume of data.
• Most of the data required for data warehouse analysis
comes from multiple databases and these analysis are
recurrent and predictable to be able to design specific
software to meet the requirements.
• There is a great need for tools that provide decision
makers with information to make decisions quickly and
reliably based on historical data.
• The above functionality is achieved by Data Warehousing
and Online analytical processing (OLAP)
43
Data Warehousing (3)
 Applications that data warehouse supports are:
• OLAP (Online Analytical Processing) is a term used to
describe the analysis of complex data from the data
warehouse.
• DSS (Decision Support Systems) also known as EIS
(Executive Information Systems) supports organization’s
leading decision makers for making complex and
important decisions.
• Data Mining is used for knowledge discovery, the
process of searching data for unanticipated new
knowledge.
44
Conceptual Structure of Data Warehouse
 Data Warehouse processing involves
• Cleaning and reformatting of data
• OLAP
• Data Mining
45
Comparison with Traditional Databases
 Data Warehouses are mainly optimized for appropriate data
access.
• Traditional databases are transactional and are optimized for both
access mechanisms and integrity assurance measures.
 Data warehouses emphasize more on historical data as their
main purpose is to support time-series and trend analysis.
 Compared with transactional databases, data warehouses
are nonvolatile.
 In transactional databases transaction is the mechanism
change to the database. By contrast information in data
warehouse is relatively coarse grained and refresh policy is
carefully chosen, usually incremental.
46
Outline









Mobile Databases
Multimedia Databases
Geographic Information Systems
Bioinformatics
XML
Data Mining
Data Warehousing
Introduction to ASIS Lab
Revision
47
Introduction to ASIS Lab
 Advances in Security & Information Systems Lab
(www.cse.hcmut.edu.vn/~asis )
 Research Directions (2006-2010)
• Information Systems Security:
→Database Security
→Security Issues in E-/M-Commerce
→Security and Privacy in Location-Based Applications
→Security Issues in Outsourced Databases Services
→DBs/ISs Security Visualization
→E-Learning Systems Security
→Digital Watermarking and Steganography
→Privacy and Identity Management
48
Introduction to ASIS_Lab(2)
 Research Directions (2006-2010) (cont.)
• Advanced Information Systems:
→E-/M-Commerce
→SOA-Based Modern Information Systems
→Large Database Systems
→Web Information Systems
→Modern Information Retrieval Systems
→Stream Data Management*
→Bioinformatics*
49
Outline








Mobile Databases
Multimedia Databases
Geographic Information Systems
Bioinformatics
XML
Data Mining
Data Warehousing
Introduction to ASIS Lab
50
Q&A
51