Download 4.1 Genome Data Management(3)

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Molecular ecology wikipedia , lookup

Transcript
New kinds of databases

1. Distributed Databases
2. Warehouse Architecture
3. Mobile databases
4. GIS

5. Multimedia Databases

6. Genome Data Management



Distributed Databases
Distributed Database System
DBMS
DBMS
DBMS
DBMS
data
data
data
data
CS 245
Notes 12
3
Parallelism: Pipelining

Example:


CS 245
T1  SELECT *
FROM A WHERE cond
T2  JOIN T1 and B
select
join
A
B
(with index)
Notes 12
4
Parallelism: Concurrent Operations

Example: SELECT * FROM A WHERE cond
merge
data location is
important...
select
select
select
A where
A where
A where
A.x < 10
CS 245
10  A.x < 20
Notes 12
20  A.x
5
ATM

Two Phase Commit
Bank
Mainframe
ATM
CS 245
Notes 12
6
What is a Warehouse?

Collection of diverse data







subject oriented
aimed at executive, decision maker
often a copy of operational data
with value-added data (e.g., summaries, history)
integrated
time-varying
non-volatile
more
CS 245
Notes11
7
What is a Warehouse?

Collection of tools





gathering data
cleansing, integrating, ...
querying, reporting, analysis
data mining
monitoring, administering warehouse
CS 245
Notes11
8
Warehouse Architecture
Client
Client
Query & Analysis
Metadata
Warehouse
Integration
Source
CS 245
Source
Notes11
Source
9
Motivating Examples




Forecasting
Comparing performance of units
Monitoring, detecting fraud
Visualization
CS 245
Notes11
10
Mobile Databases
Recent advances in portable and wireless technology led to mobile
computing, a new dimension in data communication and
processing.
Portable computing devices coupled with wireless communications
allow clients to access data from virtually anywhere and at any
time.
2 Multimedia Databases
In the years ahead multimedia information systems are expected to
dominate our daily lives. Our houses will be wired for
bandwidth to handle interactive multimedia applications. Our
high-definition TV/computer workstations will have access to a
large number of databases, including digital libraries, image
and video databases that will distribute vast amounts of
multisource multimedia content.
2.1 Multimedia Databases
DBMSs have been constantly adding to the types of data they
support. Today the following types of multimedia data are
available in current systems.

Text: May be formatted or unformatted. For ease of parsing
structured documents, standards like SGML and variations
such as HTML are being used.

Graphics: Examples include drawings and illustrations that are
encoded using some descriptive standards (e.g. CGM, PICT,
postscript).
2.1 Multimedia Databases(2)


Images: Includes drawings, photographs, and so forth,
encoded in standard formats such as bitmap, JPEG, and
MPEG. Compression is built into JPEG and MPEG. These
images are not subdivided into components. Hence querying
them by content (e.g., find all images containing circles) is
nontrivial.
Animations: Temporal sequences of image or graphic data.
2.1 Multimedia Databases(3)



Video: A set of temporally sequenced photographic data for
presentation at specified rates– for example, 30 frames per
second.
Structured audio: A sequence of audio components comprising
note, tone, duration, and so forth.
Audio: Sample data generated from aural recordings in a string
of bits in digitized form. Analog recordings are typically
converted into digital form before storage.
2.1 Multimedia Databases(4)

Composite or mixed multimedia data: A combination of
multimedia data types such as audio and video which may be
physically mixed to yield a new storage format or logically
mixed while retaining original types and formats. Composite
data also contains additional control information describing
how the information should be rendered.
2.1 Multimedia Databases(5)
Nature of Multimedia Applications: Multimedia data may be stored,
delivered, and utilized in many different ways. Applications
may be categorized based on their data management
characteristics as follows:

Repository applications: A large amount of multimedia data as
well as metadata is stored for retrieval purposes. Examples
include repositories of satellite images, engineering drawings
and designs, space photographs, and radiology scanned
pictures.
2.1 Multimedia Databases(6)

Presentation applications: A large amount of applications
involve delivery of multimedia data subject to temporal
constraints; simple multimedia viewing of video data, for
example, requires a system to simulate VCR-like functionality.
Complex and interactive multimedia presentations involve
orchestration directions to control the retrieval order of
components in a series or in parallel. Interactive environments
must support capabilities such as real-time editing analysis or
annotating of video and audio data.
2.3 Multimedia Database Applications
Large-scale applications of multimedia databases can be expected
encompasses a large number of disciplines and enhance
existing capabilities.

Documents and records management

Knowledge dissemination

Education and training

Marketing, advertising, retailing, entertainment, and travel

Real-time control and monitoring
3 Geographic Information Systems
Geographic information systems(GIS) are used to collect, model,
and analyze information describing physical properties of the
geographical world. The scope of GIS broadly encompasses
two types of data:
1.
spatial data, originating from maps, digital images,
administrative and political boundaries, roads, transportation
networks, physical data, such as rivers, soil characteristics,
climatic regions, land elevations, and
3 Geographic Information Systems(2)
2.
nonspatial data, such as socio-economic data (like census
counts), economic data, and sales or marketing information.
GIS is a rapidly developing domain that offers highly innovative
approaches to meet some challenging technical demands.
3.1 GIS Applications
It is possible to divide GISs into three categories:

cartographic applications,

digital terrain modeling applications, and

geographic objects applications
3.1 GIS Applications(2)
GIS Applications
Geographic Objects
Digital Terrain
Applications
Modeling
Irrigation Applications
Car
Earth
navigation
science
Crop yield
systems
Geographic
analysis
Civil engineering
market analysis
and military
Land
Utility
evaluation
Evaluation
Soil Surveys
distribution
Planning and
Air and water
and
Facilities
pollution studies
Consumer product
consumption
management
and services –
Landscape
Flood Control
economic analysis
studies
Cartographic
Traffic pattern
analysis
Water resource
management
4.1 Genome Data Management
Biological Sciences and Genetics: The biological sciences
encompass an enormous variety of information. Environmental
science gives us a view of how species live and interact in a
world filled with natural phenomena. Biology and ecology study
particular species. Anatomy focuses on the overall structure of
an organism, documenting the physical aspects of individual
bodies. Traditional medicine and physiology break the
organism into systems and tissues and strive to collect
information on the workings of these systems and the
organism as a whole.
4.1 Genome Data Management(2)
Histology and cell biology delve into the tissue and cellular levels
and provide knowledge about the inner structure and function
of the cell. This wealth of information that has been generated,
classified, and stored for centuries has only recently become a
major application of database technology.
4.1 Genome Data Management(3)
Genetics has emerged as an ideal field for the application of
information technology. In a broad sense, it can be taught of as
the construction of models based on information about genes –
which can be defined as units of heredity – and population and
the seeking out of relationships in that information.
4.1 Genome Data Management(4)
The study of genetics can be divided into three branches:
1.
Mendelian genetics is the study of the transmission of traits
between generations.
2.
Molecular genetics is the study of the chemical structure and
function of genes at the molecular level.
3.
Population genetics is the study of how genetic information
varies across populations of organisms.
4.1 Genome Data Management(5)
The origins of molecular genetics can be traced to two important
discoveries:
1.
In 1869 when Friedrich Miescher discovered nuclein and its
primary component, deoxyribonucleic acid (DNA). In
subsequent research DNA and a related compound,
ribonucleic acid , were found to be composed of nucleotides (a
sugar, a phosphate, and a base which combined to form
nucleic acid) linked into long polymers via the sugar and
phosphate.
4.1 Genome Data Management(6)
2.
The second discovery was the demonstration in 1944 by
Oswald Avery that DNA was indeed the molecular substance
carrying genetic information.
4.1 Genome Data Management(7)
Genes were shown to be composed of chains of nucleic acids
arranged linearly on chromosomes and to serve three primary
functions:
1.
replicating genetic information between generations,
2.
providing blueprints for the creation of polypeptides, and
3.
accumulating changes– thereby allowing evolution to occur.
Watson and Crick found the double-helix structure of the DNA in
1953, which gave molecular biology a new direction.
Summary

From this lecture you can learn some new
kinds of databases
Any Questions?
If there are any outstanding questions you can ask me
one-to-one after the lecture OR privately in my office.
Exercises