Database Technology
Prof. Hyoung-Joo Kim
Internet Database Lab
School of Computer Sci & Eng
Seoul National University
Contents
• A general survey of DBMS
• History of DBMS
• Database market share
• The current trend
• Research DBMS in IDB Lab.
What is a Database?(1/10)
DBMS
A software system that provides an environment for storing and retrieving massive data effectively
What is a Database?(2/10)
A large collection of data
Data + Programs
[Figure: many data items stored into a database]
What is a Database?(3/10)
• Registration and course information for 40,000 students of Seoul National University
45 courses, 10K bytes of records per student
Record fields: course, term, register, grade, prof
10K Byte * 40,000 = 400M Byte
Others: library, health center, S-card, …
What is a Database?(4/10)
• SAT management information
Record fields: profile, answer, rate, ranking, …
8K bytes of records per student
Year 2006: 550,000
Year 2005: 570,000
8K Byte * 550,000 = 4.4G Byte (10^9)
What is a Database?(5/10)
• Mobile phone call information
Record fields: phone number, station, time, …
60-byte record per call
39M * 60 Byte * 5 calls/day * 365 days = 4T Byte
Korea: 39M as of 2006.7; China: 370M in 2005
What is a Database?(6/10)
• Resident registration information
10KB record per person
Record fields: SSN, name, addr, domicile, …
10K Byte * 47M = 470G Byte (47 million residents)
What is a Database?(7/10)
• Google database
Manages 8 billion websites and 2 billion index terms
Usenet archive = 700 million messages * 20KB/message = 14 TB
What is a Database?(8/10)
• Hubble space telescope data from Mars
Data accumulated by 2005: over 12 TB
Produces and sends out 3~5 GB of data daily
What is a Database?(9/10)
• NCBI (National Center for Biotechnology Information)
GenBank
• manages information on 165,000 species
• adds 3 million new DNA sequences monthly
What is a Database?(10/10)
• Genome map of Koreans
Venture “MacroGen”
SNU Medical School
Early version: 900G Byte
Final product: 15T Byte
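The back-of-the-envelope sizes on the "What is a Database?" slides are all the same multiplication: record size times number of records. A minimal sketch that reproduces four of them follows. The helper names `estimate_bytes` and `human` are illustrative, and decimal units (1 KB = 1000 bytes) are an assumption chosen to match the slide arithmetic.

```python
def estimate_bytes(record_bytes, *multipliers):
    """Multiply a record size by each count (people, calls per day, days, ...)."""
    total = record_bytes
    for m in multipliers:
        total *= m
    return total


def human(n):
    """Format a byte count with decimal units, as the slides do."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n < 1000 or unit == "TB":
            return f"{n:.1f} {unit}"
        n /= 1000


estimates = {
    "student records": estimate_bytes(10_000, 40_000),
    "SAT records": estimate_bytes(8_000, 550_000),
    "mobile calls per year": estimate_bytes(60, 39_000_000, 5, 365),
    "Usenet archive": estimate_bytes(20_000, 700_000_000),
}

for name, size in estimates.items():
    print(f"{name}: {human(size)}")
```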
What do we do with Database?(1/2)
• Record search
Retrieve math grade of the student whose SSN is “840101-12121”
740,000 * 5 records = 3.7 M records
12ms to fetch a record and check content
3.7M * 12ms = 44.4K seconds = over 12 hours
If we use DBMS, it will be less than 0.1sec!
• Statistical processing for population census
• Search for the correlation between gene and disease
• Search for the purchase pattern on customer groups
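The gap between "over 12 hours" and "less than 0.1sec" comes from not having to touch every record: a DBMS can keep an index and jump straight to the matching rows. The sketch below illustrates this with Python's built-in `sqlite3` module. The table `grade`, the index `grade_by_ssn`, and the generated data are hypothetical stand-ins, and SQLite is used only as a convenient example DBMS.

```python
import sqlite3

# Sequential scan estimate from the slide: 3.7M records at 12 ms each.
scan_seconds = 3_700_000 * 12 // 1000   # 44,400 seconds
scan_hours = scan_seconds / 3600        # a little over 12 hours

# Hypothetical table: five subject grades per student, as on the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grade (ssn TEXT, subject TEXT, score INTEGER)")
subjects = ["math", "korean", "english", "science", "history"]
rows = [(f"{i:06d}-12121", s, (i + k) % 101)
        for i in range(20_000) for k, s in enumerate(subjects)]
conn.executemany("INSERT INTO grade VALUES (?, ?, ?)", rows)

# The index lets the DBMS find one student's row without reading the others.
conn.execute("CREATE INDEX grade_by_ssn ON grade (ssn, subject)")

query = "SELECT score FROM grade WHERE ssn = ? AND subject = ?"
args = ("000042-12121", "math")
plan = conn.execute("EXPLAIN QUERY PLAN " + query, args).fetchall()
plan_detail = " ".join(row[-1] for row in plan)   # mentions the index used
score = conn.execute(query, args).fetchone()[0]

print(f"sequential scan estimate: {scan_hours:.1f} hours")
print("query plan:", plan_detail)
print("math score:", score)
```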
What do we do with Database?(2/2)
• Most (all?) computing applications use some type of a database
Examples: CRM, ERP, Data Warehouse, MIS, OLTP, EDPS
Database Management System (DBMS) (1/3)
Warehouse
Database Management System (DBMS) (2/3)
Warehouse
Warehouse keeper
Database Management System (DBMS) (3/3)
Application: management of on-line orders, management of wages, management of manager info.
DBMS
Database: profile, product, customer, user, stock, sale
DBMS Architecture
Users: naive users, application programmers, casual users, database administrator
Interfaces: application programs, system calls, query, database scheme
Compilers: data manipulation language pre-compiler, query processor, data definition language compiler
application programs object
DBMS: database manager, file manager
Disk storage
A Sample Relational Database
SQL
• SQL: widely used commercial query language
E.g. find the name of the customer with customer-id 192-83-7465
select customer.customer-name
from customer
where customer.customer-id = '192-83-7465'
E.g. find the balances of all accounts held by the customer with
customer-id 192-83-7465
select account.balance
from depositor, account
where depositor.customer-id = '192-83-7465' and
depositor.account-number = account.account-number
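The two queries can be tried end to end with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the rows are made-up sample data, and the hyphenated column names from the slide are written with underscores (`customer_id`, `account_number`) because an unquoted hyphen is not valid in an SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer  (customer_id TEXT PRIMARY KEY, customer_name TEXT);
CREATE TABLE account   (account_number TEXT PRIMARY KEY, balance INTEGER);
CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
-- Hypothetical sample rows
INSERT INTO customer  VALUES ('192-83-7465', 'Johnson'), ('019-28-3746', 'Smith');
INSERT INTO account   VALUES ('A-101', 500), ('A-201', 900), ('A-215', 700);
INSERT INTO depositor VALUES ('192-83-7465', 'A-101'),
                             ('192-83-7465', 'A-201'),
                             ('019-28-3746', 'A-215');
""")

# Query 1: name of the customer with the given customer-id.
names = [row[0] for row in conn.execute(
    "SELECT customer.customer_name FROM customer "
    "WHERE customer.customer_id = ?", ("192-83-7465",))]

# Query 2: balances of all accounts held by that customer (a join).
balances = [row[0] for row in conn.execute(
    "SELECT account.balance FROM depositor, account "
    "WHERE depositor.customer_id = ? "
    "AND depositor.account_number = account.account_number "
    "ORDER BY account.balance", ("192-83-7465",))]

print(names)
print(balances)
```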
Major Commercial DBMS in 2006(1/3)
Oracle 10g
Market leader
Stability
Mass storage capability
Famous CEO
Major Commercial DBMS in 2006(2/3)
Integration with Windows NT/XP
PC based (Windows NT)
Microsoft!!!
Major Commercial DBMS in 2006(3/3)
Stability
Mainframe
Acquisition of Informix
IBM
Database Companies in the World
Contents
• A general survey of DBMS
• History of DBMS
• Database market share
• The current trend
• Research DBMS in IDB Lab.
Hierarchical, Network DBMS
The early 1970s
IMS (IBM), System 2000 (MRI)
DMS 1100 (Sperry), Total (Cincom)
Advantage: quick data access using links
Drawback: impossible to build applications that are independent of the data structure
Network Database example
Root Record
Customer records: (Lowery, Maple, Queens), (Shiver, North, Bronx), (Hodges, SideHill, Brooklyn)
Amount records: 900 (Lowery); 556, 647 (Shiver); 801, 647 (Hodges)
Query
What’s the total balance of Mr. Shiver in Bronx?
Network DB query example
sum := 0;
get first customer
    where customer.name = "Shiver"
    and customer.city = "Bronx";
while DB_status = 0 do
begin
    sum := sum + customer.amount;
    get next customer
        where customer.name = "Shiver"
        and customer.city = "Bronx";
end
print(sum);
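The record-at-a-time style of this query can be imitated in Python. This is only a sketch of the navigation pattern, not a real network DBMS interface: the `CustomerCursor` class and its `db_status` and `get_next` members are hypothetical stand-ins for `get first`, `get next` and `DB_status`.

```python
class CustomerCursor:
    """Hypothetical cursor that hands back one matching record per call."""

    def __init__(self, records, name, city):
        self._matches = iter(
            r for r in records if r["name"] == name and r["city"] == city
        )
        self.current = None
        self.db_status = 0      # 0 means "a record was found"

    def get_next(self):
        try:
            self.current = next(self._matches)
            self.db_status = 0
        except StopIteration:
            self.current = None
            self.db_status = 1  # non-zero means "no more records"


records = [
    {"name": "Lowery", "city": "Queens", "amount": 900},
    {"name": "Shiver", "city": "Bronx", "amount": 556},
    {"name": "Shiver", "city": "Bronx", "amount": 647},
    {"name": "Hodges", "city": "Brooklyn", "amount": 801},
    {"name": "Hodges", "city": "Brooklyn", "amount": 647},
]

cursor = CustomerCursor(records, "Shiver", "Bronx")
total = 0
cursor.get_next()               # plays the role of "get first"
while cursor.db_status == 0:
    total += cursor.current["amount"]
    cursor.get_next()           # plays the role of "get next"

print(total)
```

The application itself drives the traversal and checks the status after every step, which is why such programs are tied to how the records are linked.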
Relational DBMS
The late 1970s and early 1980s
• E.F. Codd, 1970 CACM paper, “A Relational Model of Data for Large Shared Data Banks”
• Relational Algebra & Calculus
• The Spartan Simplicity!
• SQL: Structured Query Language
• System R - 1976, first RDBMS prototype from industry (IBM)
• Ingres - 1976, first academic RDBMS
Relational DBMS example
name     street     city       amount
Lowery   Maple      Queens     900
Shiver   North      Bronx      556
Shiver   North      Bronx      647
Hodges   SideHill   Brooklyn   801
Hodges   SideHill   Brooklyn   647
select sum(amount)
from customer
where customer.name = 'Shiver'
and customer.city = 'Bronx';
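For comparison with the navigational loop, the relational version states only what is wanted and leaves the traversal to the DBMS. A minimal sketch with Python's built-in `sqlite3` module, loading the five rows of the example table (SQLite is an assumption, used here only as a convenient relational engine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (name TEXT, street TEXT, city TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        ("Lowery", "Maple", "Queens", 900),
        ("Shiver", "North", "Bronx", 556),
        ("Shiver", "North", "Bronx", 647),
        ("Hodges", "SideHill", "Brooklyn", 801),
        ("Hodges", "SideHill", "Brooklyn", 647),
    ],
)

# One declarative statement replaces the explicit get-first/get-next loop.
total = conn.execute(
    "SELECT sum(amount) FROM customer "
    "WHERE customer.name = 'Shiver' AND customer.city = 'Bronx'"
).fetchone()[0]

print(total)
```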
The advent of new DB applications in the 1980s (1/4)
CAD/CASE/CAM: massive design data
Artificial Intelligence: Expert systems
Telecommunication
Multimedia: IMAGE, TEXT, AUDIO, VIDEO, etc.
Need for a richer data model & DBMS functions
The advent of new DB applications in the 1980s (2/4)
• Massive design data in CAD/CASE/CAM
[Figure: previous DATA (the customer table: name, street, city, amount) contrasted with CAD DATA]
The advent of new DB applications in the 1980s (3/4)
• Artificial Intelligence: Expert systems
Vehicle disorder
Symptoms: Control, Brake, Drive, Handle, Gearbox, Engine
Conclusion: engine ECU disorder
[Figure: previous DATA (the customer table: name, street, city, amount) contrasted with Expertise DATA]
The advent of new DB applications in the 1980s (4/4)
• Multimedia: image, audio, video
[Figure: previous DATA (the customer table: name, street, city, amount) contrasted with MULTIMEDIA DATA]
Advent of Object Oriented DBMS
The mid-1980s ~ mid-1990s
Research prototypes:
ORION, POSTGRES, ENCORE/ObServer
Commercial Products:
O2, ObjectStore, Objectivity, Versant, etc.
ODMG-93 OODB standard
Features of Object Oriented DBMS
Persistent programming language
Long-duration transaction
Large object
Semantic Data Model extension
Version & Composite object
Object-Oriented Paradigm support: object, object identity, class hierarchy, inheritance
Go back to traversal as in Network DB?
Object Oriented Database example
[Figure: the customer data (name, street, city, amount) modeled as objects with Is-part-of and ISA relationships]
OQL query of Object Oriented DBMS
select sum(customer.deposit.balance)
from Customer customer
where customer.name = "Shiver"
and customer.deposit.branch.city = "Bronx";
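The path expressions in this query (`customer.deposit.branch.city`) follow object references instead of joining tables. A rough Python analogue is sketched below. The classes `Customer`, `Deposit` and `Branch` and their sample values are assumptions made for illustration, and treating `deposit` as a list of deposits per customer is one possible reading of the schema.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    city: str


@dataclass
class Deposit:
    balance: int
    branch: Branch


@dataclass
class Customer:
    name: str
    deposits: list = field(default_factory=list)


bronx, queens = Branch("Bronx"), Branch("Queens")
customers = [
    Customer("Lowery", [Deposit(900, queens)]),
    Customer("Shiver", [Deposit(556, bronx), Deposit(647, bronx)]),
]

# Follow references: customer -> deposit -> branch -> city.
total = sum(
    deposit.balance
    for customer in customers
    if customer.name == "Shiver"
    for deposit in customer.deposits
    if deposit.branch.city == "Bronx"
)

print(total)
```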
Object Relational DBMS
1980 – 1985: ORDBMS Research Prototype
POSTGRES by UC Berkeley
System/R Engineering Extension
Relational DBMS with Object Oriented functions
Extension within SQL & Tables!
The early 1990s: downfall of OODBMS (Illustra, UniSQL, Matisse)
1997: advent of ORDBMS from the Big 3
Object Relational Database example
name     street     city       amount
Lowery   Maple      Queens     900
Shiver   North      Bronx      556
Shiver   North      Bronx      647
Hodges   SideHill   Brooklyn   801
Hodges   SideHill   Brooklyn   647
Principal functions of Object Relational DBMS
• Abstract Data Type support
• LOB (large object) support
• User defined type & Stored procedure support
• SQL procedure extension
• Application domain specific extension support
• Type Inheritance support
• Rule/trigger System support
Product of Object Relational DBMS
ORACLE-8 Universal Server
Informix Universal Server
IBM DB2 Universal Database
Sybase Adaptive Server
Microsoft Access
Contents
• A general survey of DBMS
• History of DBMS
• Database market share
• The current trend
• Research DBMS in IDB Lab.
DBMS market share(1/2)
• Worldwide market share for biggest sellers of corporate databases, 2005
Oracle 48.6%
IBM 22%
Microsoft 15%
Source: Gartner Dataquest
DBMS market share(2/2)
• Worldwide sales for biggest sellers of corporate databases, 2005 (billions of dollars)
Oracle $6.7
IBM $3.0
Microsoft $2.1
Source: Gartner Dataquest
Domestic DBMS market share
source : Report for database industry and perspective in Korea, 2004
Domestic DBMS market sales
• Domestic sales for biggest sellers of corporate databases, 2004 (billions of won)
Vendors: Oracle, IBM, Microsoft
Values shown: ₩57.2, ₩25.1, ₩45.3
Source: Gartner Dataquest, South Korea (2005)
Preference in domestic market
Others 3%
source : Report for database industry and perspective in Korea, 2004
Contents
• A general survey of DBMS
• History of DBMS
• Database market share
• The current trend
• Research DBMS in IDB Lab.
XML Technology(1/2)
• The late 1990s to the present
• What is XML (eXtensible Markup Language)?
• Developed by the W3C
• Semi-structured text for dissemination and publication
• Self-describing
HTML: tagging for display
<tr>
<td>
<font color="red">Name
</font>
</td>
<td>Hong Gil-dong</td>
</tr>
<tr>
<td>
<b>Address</b>
</td>
…
XML: tagging for structure and semantics
<person>
<name>Hong Gil-dong</name>
<city>Seoul</city>
<age>20</age>
…
</person>
XML Technology(2/2)
• Why XML
• Standard data format for storage and exchange
XML
<person>
<name>Hong Gil-dong</name>
<city>Seoul</city>
…
</person>
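Because the tags name the data rather than its appearance, a program can read a `<person>` record without knowing how it was displayed, and write it back out for another system. A minimal sketch using Python's standard `xml.etree.ElementTree` module follows. The document string is a completed version of the slide's example with romanized values and the elided part left out.

```python
import xml.etree.ElementTree as ET

doc = ("<person><name>Hong Gil-dong</name>"
       "<city>Seoul</city><age>20</age></person>")

# Parse: the tag names themselves tell us what each value means.
person = ET.fromstring(doc)
record = {child.tag: child.text for child in person}

# Serialize again, e.g. to hand the record to another system.
exchanged = ET.tostring(person, encoding="unicode")
received = {child.tag: child.text for child in ET.fromstring(exchanged)}

print(record)
print(exchanged)
```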
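Semantic Web(1/2)
• The existing web:
1) A patient searches for dental clinics with a search engine
2) The patient finds the home page of a dental clinic near their location
3) The patient checks the clinic's appointment schedule and books if a time slot fits
• Many repetitive steps are needed before an appointment is made
[Figure: Patient, search engine, clinic’s web pages, appointment schedule]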
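Semantic Web(2/2)
• Semantic web:
• The following information has already been published as Semantic Web data:
the patient's personal schedule, each dental clinic's location, treatment specialties, treatment …
1) The patient asks a software agent to make an appointment
2) Even if the clinics' home pages differ in content or structure, the software agent analyzes the Semantic Web data of the patient and the clinics, and books a clinic that is available at the patient's time and location
[Figure: Patient, Software Agents, clinic’s web pages (with Semantic web), appointment schedule]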
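The agent's job in this scenario reduces to matching machine-readable descriptions. The sketch below is a toy illustration only: plain Python dictionaries stand in for Semantic Web data, and the field names, clinic names and time slots are all hypothetical.

```python
# Hypothetical machine-readable descriptions of the patient and the clinics.
patient = {
    "location": "downtown",
    "free_slots": {"Mon 10:00", "Tue 14:00"},
}
clinics = [
    {"name": "Clinic A", "location": "uptown", "specialty": "dentistry",
     "open_slots": ["Mon 10:00"]},
    {"name": "Clinic B", "location": "downtown", "specialty": "dentistry",
     "open_slots": ["Mon 09:00", "Tue 14:00"]},
    {"name": "Clinic C", "location": "downtown", "specialty": "dermatology",
     "open_slots": ["Mon 10:00"]},
]


def book(patient, clinics, specialty):
    """Return the first clinic and slot matching place, specialty and time."""
    for clinic in clinics:
        if clinic["location"] != patient["location"]:
            continue
        if clinic["specialty"] != specialty:
            continue
        for slot in clinic["open_slots"]:
            if slot in patient["free_slots"]:
                return {"clinic": clinic["name"], "slot": slot}
    return None  # no clinic fits the patient's constraints


appointment = book(patient, clinics, "dentistry")
print(appointment)
```

The point of the shared data format is that `book` never has to scrape or interpret each clinic's differently structured web page.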
Knowledge discovery
Database, Data Warehouse
Knowledge Discovery Processing: Data mining
Useful, interesting, hidden information
Apply to decisions
Data warehouse(1/2)
• Stores data over time
• Analyzes patterns over time
• Summarized data
• Views data from various viewpoints
• Non-volatile
Need for a new data model: Dimensional model
Data warehouse(2/2)
Sales Volumes (cube with three dimensions)
time: Jan, Feb, Mar
Product: A, B, C
Sales person: Wong, Dewitt, Stonebreaker
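In a dimensional model each sales figure is a fact addressed by its coordinates (time, product, sales person), and "viewing from various viewpoints" means summing the facts along a chosen subset of dimensions. A minimal sketch follows. The sales volumes are invented numbers and `roll_up` is an illustrative helper, not part of any particular warehouse product.

```python
from collections import defaultdict

DIMENSIONS = ("time", "product", "sales_person")

# Hypothetical facts: (time, product, sales person, sales volume)
facts = [
    ("Jan", "A", "Wong", 10),
    ("Jan", "B", "Dewitt", 20),
    ("Feb", "A", "Wong", 5),
    ("Feb", "C", "Stonebreaker", 7),
    ("Mar", "B", "Dewitt", 8),
    ("Mar", "A", "Stonebreaker", 4),
]


def roll_up(facts, *dims):
    """Total sales volume grouped by the requested dimensions."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[p] for p in positions)
        totals[key] += row[3]
    return dict(totals)


by_month = roll_up(facts, "time")
by_product = roll_up(facts, "product")
by_month_and_person = roll_up(facts, "time", "sales_person")
grand_total = roll_up(facts)

print(by_month)
print(by_product)
print(grand_total)
```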
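Data mining(1/2)
• Broad sense
• Covers everything from the step of extracting the target data, through refining and interpreting the discovered patterns, to the step of expressing them in a form people can understand [text, pictures, graphics]
• Narrow sense
• The use of various algorithms [data mining algorithms] or software that extract interesting, human-understandable patterns and regularities from large volumes of data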
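Data mining(2/2)
Pattern discovery
80% of people who buy bread and snacks also buy milk
74% of people who buy baby formula and diapers also buy beer
Decision making
Beer consumption affects the consumption of baby formula and diapers
A price increase for bread and snacks affects milk consumption
Business application
Display (bread, snacks, milk) and (baby formula, diapers, beer) together on the shelves
Adjust bread and snack prices to control milk consumption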
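A statement such as "80% of people who buy bread and snacks also buy milk" is the confidence of an association rule: among the baskets containing bread and snacks, the fraction that also contain milk. The sketch below computes it over a made-up list of ten baskets chosen so that the rule comes out at 80%. The function names are illustrative.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the share with `consequent`."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= t]
    with_both = [t for t in with_antecedent if consequent <= t]
    return len(with_both) / len(with_antecedent)


# Hypothetical shopping baskets
transactions = [
    {"bread", "snacks", "milk"},
    {"bread", "snacks", "milk"},
    {"bread", "snacks", "milk"},
    {"bread", "snacks", "milk"},
    {"bread", "snacks"},
    {"formula", "diapers", "beer"},
    {"formula", "diapers"},
    {"milk"},
    {"beer"},
    {"bread"},
]

rule_confidence = confidence(transactions, {"bread", "snacks"}, {"milk"})
rule_support = support(transactions, {"bread", "snacks", "milk"})

print(f"confidence = {rule_confidence:.0%}, support = {rule_support:.0%}")
```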
The emerging challenges
Environment
Rapid spread of Web and Internet: millions of users connected on the Web
Rapid development of H/W: disk and RAM size, access time, bandwidth
New areas emerging: sensor streams, scientific data, uncertain data, information privacy
The Emerging Challenges
• Sophisticated data type support
New DBMS: structured data, unstructured data, sound, video, image, temporal, spatial
The Emerging Challenges
• Sensor streams
• Battery constraint, communication cost
• Rapidly changing configuration (sensors die or disconnect)
• Complex forms of information integration
“Locate a person from the heat, sound and vibration sensors”
The Emerging Challenges
• Reasoning about uncertain data
• Scientific measurement errors
• Location data for moving objects
• Sequence, image and text similarity
[Figure: scientific measurement, location data, sequence data]
The Emerging Challenges
• Personalization
• Different person, different answer
• WEB CRM example
Web Site Entry
Page Views
Event: Select product, Insert item to Shopping Cart
Recommendation Engine
Personalized View of Recommendation
The Emerging Challenges
• Privacy
• How to support the protection of personal or sensitive information
• Access by user and usage
• Include purpose description in query
“We just want the statistics of the income, not the personal information!”
Name | income | …
Alice | 25K | …
John | 40K | …
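One way to read "include purpose description in query" is that every request states why it wants the data, and the system answers only in a form that the purpose justifies. The sketch below is a toy illustration under that assumption. The function `query_income`, the purpose name `"statistics"`, and the policy itself are hypothetical and do not describe an existing DBMS feature.

```python
from statistics import mean

# The two rows from the slide (25K and 40K).
incomes = {"Alice": 25_000, "John": 40_000}


def query_income(purpose, name=None):
    """Answer an income query only in the form its stated purpose allows."""
    if purpose == "statistics" and name is None:
        # Aggregate only: no individual's income is revealed.
        return mean(incomes.values())
    raise PermissionError(
        f"purpose {purpose!r} does not permit access to personal income"
    )


average_income = query_income("statistics")
print(average_income)

try:
    query_income("marketing", name="Alice")
except PermissionError as exc:
    print("denied:", exc)
```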