Database Technology
Prof. Hyoung-Joo Kim
Internet Database Lab, School of Computer Science & Engineering
Seoul National University

Contents
• A general survey of DBMS
• History of DBMS
• Database market share
• The current trend
• Research DBMS in IDB Lab.

What is a Database? (1/10)
• DBMS: a software system that provides an environment for storing and retrieving massive amounts of data efficiently

What is a Database? (2/10)
• A large collection of data
• Data + programs are stored in the database (figure: many data items flowing into a database)

What is a Database? (3/10)
• Course registration information for the 40,000 students of Seoul National University
  - About 45 courses and about 10 KB of records per student (course, term, registration, grade, professor)
  - 10 KB × 40,000 = 400 MB
  - Other university data: library, health center, S-card, …

What is a Database? (4/10)
• SAT management information: profile, answer, rate, ranking, …
  - About 8 KB of records per student
  - Test takers: 550,000 in 2006; 570,000 in 2005
  - 8 KB × 550,000 = 4.4 GB (4.4 × 10^9 bytes)

What is a Database? (5/10)
• Mobile phone call records: phone number, base station, time, …
  - About 60 bytes per call record
  - Korea (July 2006): 39M subscribers × 60 bytes × 5 calls/day × 365 days ≈ 4 TB per year
  - China: 370M subscribers in 2005

What is a Database? (6/10)
• Resident registration information: SSN, name, address, domicile, …
  - About 10 KB per resident
  - 10 KB × 47 million residents = 470 GB

What is a Database? (7/10)
• Google's database: about 8 billion web pages and about 2 billion index terms
• Usenet archive: 700 million messages × 20 KB/message = 14 TB

What is a Database? (8/10)
• Hubble Space Telescope data (including observations of Mars)
  - Data accumulated by 2005: over 12 TB
  - 3 to 5 GB of new data produced and sent out every day

What is a Database? (9/10)
• GenBank at NCBI (National Center for Biotechnology Information)
  - Manages sequence information for 165,000 species
  - Adds about 3 million new DNA sequences every month

What is a Database? (10/10)
• Genome map of Koreans: the venture company "MacroGen" with the SNU Medical School
  - Early version: 900 GB
  - Final product: 15 TB

What do we do with a Database? (1/2)
• Record search: retrieve the math grade of the student whose SSN is "840101-12121"
  - Without a DBMS: scan 740,000 × 5 = 3.7 million records at 12 ms each (fetch a record and check its content): 3.7M × 12 ms = 44,400 s, i.e. over 12 hours
  - With a DBMS: less than 0.1 s!
• Statistical processing for the population census
• Searching for correlations between genes and diseases
• Searching for purchase patterns in customer groups

What do we do with a Database? (2/2)
• Most (all?) computing applications use some kind of database: CRM, ERP, data warehouses, MIS, OLTP, EDPS, …

Database Management System (DBMS) (1/3)
• A database is like a warehouse

Database Management System (DBMS) (2/3)
• The DBMS is the warehouse keeper

Database Management System (DBMS) (3/3)
• Applications (online order management, wage management, stock management, manager information management, sales) access the database (profiles, products, customers, users) through the DBMS

DBMS Architecture (figure)
• Users: naive users, application programmers, casual users, database administrator
• Interfaces: application programs, system calls, queries, database schema
• DBMS components: DML pre-compiler, application program object code, query processor, DDL compiler, database manager, file manager
• Disk storage

A Sample Relational Database (figure)

SQL
• SQL: the widely used commercial query language
• E.g., find the name of the customer with customer_id 192-83-7465:
    select customer.customer_name
    from customer
    where customer.customer_id = '192-83-7465'
• E.g., find the balances of all accounts held by the customer with customer_id 192-83-7465:
    select account.balance
    from depositor, account
    where depositor.customer_id = '192-83-7465'
      and depositor.account_number = account.account_number

Major Commercial DBMS in 2006 (1/3)
• Oracle (10g): market leader, stability, mass storage, literacy, famous CEO

Major Commercial DBMS in 2006 (2/3)
• Microsoft: integration with Windows NT/XP; PC-based (Windows NT)

Major Commercial DBMS in 2006 (3/3)
• IBM: stability, mainframe, purchased Informix

Database Companies in the World (figure)

Contents: History of DBMS

Hierarchical and Network DBMS (early 1970s)
• IMS (IBM), System 2000 (MRI), DMS 1100 (Sperry), Total (Cincom)
• Advantage: fast data access by following links
• Drawback: applications cannot be made independent of the data structure

Network Database example (figure)
• A root record points to the customer records (Lowery, Maple, Queens; Shiver, North, Bronx; Hodges, SideHill, Brooklyn), each linked to its amount records (Lowery: 900; Shiver: 556, 647; Hodges: 801, 647)
• Query: what is the total balance of Mr. Shiver in the Bronx?

Network DB query example
    sum := 0;
    get first customer
        where customer.name = "Shiver" and customer.city = "Bronx";
    while DB_status = 0 do
    begin
        sum := sum + customer.amount;
        get next customer
            where customer.name = "Shiver" and customer.city = "Bronx";
    end;
    print(sum);

Relational DBMS (late 1970s and early 1980s)
• E. F. Codd, 1970 CACM paper, "A Relational Model of Data for Large Shared Data Banks"
• Relational algebra and calculus
• The Spartan simplicity!
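The Spartan simplicity is easiest to see side by side. Below is a minimal sketch that answers "total balance of Mr. Shiver in the Bronx" twice over the sample data from the network example. It assumes Python's built-in sqlite3 module and an in-memory table. The first function steps through matching records one at a time, loosely emulating the get first / get next loop. The second states the whole request as one SQL query. The flat table layout and the function names are hypothetical choices for illustration, not part of the original systems.

```python
import sqlite3

# Sample data from the network and relational examples: (name, street, city, amount).
# Storing one row per amount record is an assumption made for this sketch.
ROWS = [
    ("Lowery", "Maple", "Queens", 900),
    ("Shiver", "North", "Bronx", 556),
    ("Shiver", "North", "Bronx", 647),
    ("Hodges", "SideHill", "Brooklyn", 801),
    ("Hodges", "SideHill", "Brooklyn", 647),
]


def make_db() -> sqlite3.Connection:
    """Create an in-memory 'customer' table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (name TEXT, street TEXT, city TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", ROWS)
    return conn


def navigational_total(records, name: str, city: str) -> int:
    """Record-at-a-time: the program itself steps through matching records,
    loosely emulating the 'get first / get next' loop of the network query."""
    matches = (r for r in records if r[0] == name and r[2] == city)
    total = 0
    record = next(matches, None)        # get first customer where ...
    while record is not None:           # while DB_status = 0
        total += record[3]              # sum := sum + customer.amount
        record = next(matches, None)    # get next customer where ...
    return total


def declarative_total(conn: sqlite3.Connection, name: str, city: str) -> int:
    """Set-at-a-time: one SQL statement; the DBMS decides how to find the rows."""
    (total,) = conn.execute(
        "SELECT sum(amount) FROM customer WHERE name = ? AND city = ?",
        (name, city),
    ).fetchone()
    return total or 0  # sum() over zero rows yields NULL (None)


if __name__ == "__main__":
    conn = make_db()
    print(navigational_total(ROWS, "Shiver", "Bronx"))  # 1203
    print(declarative_total(conn, "Shiver", "Bronx"))   # 1203
```

Both calls return 1203 (556 + 647). The difference is who does the navigating. In the first function, the application program encodes the access path. In the second, the query only says which rows are wanted.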
Relational DBMS (continued)
• SQL: Structured Query Language
• System R (IBM, 1976): the first industrial RDBMS prototype
• Ingres (UC Berkeley, 1976): the first academic RDBMS

Relational DBMS example
    name     street    city      amount
    Lowery   Maple     Queens    900
    Shiver   North     Bronx     556
    Shiver   North     Bronx     647
    Hodges   SideHill  Brooklyn  801
    Hodges   SideHill  Brooklyn  647

    select sum(amount)
    from customer
    where customer.name = 'Shiver' and customer.city = 'Bronx';

New DB Applications in the 1980s (1/4)
• CAD/CASE/CAM: massive design data
• Artificial intelligence: expert systems
• Telecommunications
• Multimedia: image, text, audio, video, etc.
• These call for a richer data model and richer DBMS functionality

New DB Applications in the 1980s (2/4)
• Massive design data in CAD/CASE/CAM (figure: conventional tabular data, such as the customer table above, vs. CAD design data)

New DB Applications in the 1980s (3/4)
• Artificial intelligence: expert systems (figure: diagnosing a vehicle fault from symptoms in the controls, brakes, drive, steering, gearbox and engine leads to the conclusion "engine ECU fault"; conventional data vs. expertise data)

New DB Applications in the 1980s (4/4)
• Multimedia: image, audio, video (figure: conventional data vs. multimedia data)

Advent of Object-Oriented DBMS (mid 1980s to mid 1990s)
• Research prototypes: ORION, POSTGRES, ENCORE/ObServer
• Commercial products: O2, ObjectStore, Objectivity, Versant, etc.
• ODMG-93: the OODB standard

Features of Object-Oriented DBMS
• Persistent programming language
• Long-duration transactions
• Large objects
• Semantic data model extensions
• Versions and composite objects
• Object-oriented paradigm support:
  - Objects and object identity (navigation by traversal: back to network DBs?)
  - Class hierarchy and inheritance

Object-Oriented Database example (figure: customer objects connected by is-part-of and ISA relationships)

OQL query in an Object-Oriented DBMS
    select sum(customer.deposit.balance)
    from Customer customer
    where customer.name = "Shiver"
      and customer.deposit.branch.city = "Bronx";

Object-Relational DBMS
• 1980s: ORDBMS research prototypes
  - POSTGRES (UC Berkeley)
  - System R Engineering Extension
• Relational DBMS with object-oriented functions: extensions within SQL and tables!
• Early 1990s: Illustra, UniSQL, Matisse; decline of OODBMS
• 1997: advent of the "Big 3" ORDBMSs

Object-Relational Database example (figure)

Principal functions of Object-Relational DBMS
• Abstract data type support
• LOB (large object) support
• User-defined types and stored procedures
• SQL procedural extensions
• Application-domain-specific extension support
• Type inheritance support
• Rule/trigger system support

Object-Relational DBMS products
• Oracle8 Universal Server
• Informix Universal Server
• IBM DB2 Universal Database
• Sybase Adaptive Server
• Microsoft Access

Contents: Database market share

DBMS market share (1/2)
Worldwide market share of the biggest sellers of corporate databases, 2005
    Oracle      48.6%
    IBM         22%
    Microsoft   15%
Source: Gartner Dataquest

DBMS market share (2/2)
Worldwide sales of the biggest sellers of corporate databases, 2005 (billions of dollars)
    Oracle      $6.7
    IBM         $3.0
    Microsoft   $2.1
Source: Gartner Dataquest

Domestic DBMS market share (figure)
Source: Report for database industry and perspective in Korea, 2004

Domestic DBMS market sales
Domestic sales of the biggest sellers of corporate databases, 2004 (billions of won)
• Vendors charted: Oracle, IBM, Microsoft
• Values shown: ₩57.2, ₩25.1 and ₩45.3 billion
Source: Gartner Dataquest, South Korea (2005)

Preference in domestic market (figure)
• Others: 3%
Source: Report for database industry and perspective in Korea, 2004

Contents: The current trend

XML Technology (1/2) (late 1990s to now)
• What is XML (eXtensible Markup Language)?
  - Developed by the W3C
  - Semi-structured text for dissemination and publication
  - Self-describing
• HTML uses tagging for display:
    <tr>
      <td><font color="red">Name</font></td>
      <td>Hong Gildong</td>
    </tr>
    <tr>
      <td><b>Address</b></td>
      …
• XML uses tagging for structure and semantics:
    <person>
      <name>Hong Gildong</name>
      <city>Seoul</city>
      <age>20</age>
      …
    </person>

XML Technology (2/2)
• Why XML? A standard data format for storage and exchange
    <person>
      <name>Hong Gildong</name>
      <city>Seoul</city>
      …
    </person>

Semantic Web (1/2)
• The current web:
  1) The patient searches for dental clinics with a search engine
  2) The patient finds the home pages of clinics near his or her location
  3) The patient checks each clinic's treatment schedule and books an appointment if the times match
  - Many repetitive steps are needed before an appointment is made
  (figure: patient, search engine, clinics' web pages, appointment schedule)

Semantic Web (2/2)
• With the Semantic Web, the following information is available as Semantic Web data: the patient's personal schedule and each clinic's location, specialties and treatment schedule
  1) The patient asks a software agent to make an appointment
  2) Even if the clinics' home pages differ in content or structure, the software agent analyzes the Semantic Web data of the patient and the clinics and books a clinic that can treat the patient at a suitable time and location
  (figure: patient, software agents, clinics' web pages with Semantic Web data, appointment schedule)

Knowledge Discovery
• Database → data warehouse → knowledge discovery processing (data mining) → useful, interesting, hidden information → applied to decisions

Data warehouse (1/2)
• Stores time-variant data, so patterns can be analyzed over time
• Summarized data
• Data observed from various viewpoints
• Non-volatile
• Requires a new data model: the dimensional model

Data warehouse (2/2)
• Figure: a cube of sales volumes along three dimensions: time (Jan, Feb, Mar), product (A, B, C) and salesperson (Wong, DeWitt, Stonebraker)

Data mining (1/2)
• Broad sense: everything from extracting the target data to refining and interpreting the discovered patterns and presenting them in a form people can understand (text, pictures, graphics)
• Narrow sense: the use of various algorithms (data mining algorithms) or software that extract interesting, human-understandable patterns and regularities from large volumes of data

Data mining (2/2)
• Pattern discovery:
  - 80% of people who buy bread and snacks also buy milk
  - 74% of people who buy baby formula and diapers also buy beer
• Decision making:
  - Beer consumption is influenced by formula and diaper consumption
  - Raising bread and snack prices affects milk consumption
• Business application:
  - Display (bread, snacks, milk) and (formula, diapers, beer) together on the shelves
  - Adjust bread and snack prices to control milk consumption

The Emerging Challenges
• Rapid spread of the Web and the Internet
  - Millions of users connected in a Web environment
• Rapid development of hardware
  - Disk and RAM size, access time, bandwidth
• New areas emerging
  - Sensor streams, scientific data
  - Uncertain data, information privacy

The Emerging Challenges: sophisticated data types
• A new DBMS must support structured and unstructured data: sound, video, image, temporal and spatial data

The Emerging Challenges: sensor streams
• Battery constraints and communication cost
• Rapidly changing configuration (sensors die or disconnect)
• Complex forms of information integration, e.g., "Locate a person from the heat, sound and vibration sensors"

The Emerging Challenges: reasoning about uncertain data
• Scientific measurement errors
• Location data for moving objects
• Sequence, image and text similarity

The Emerging Challenges: personalization
• Different person, different answer
• Web CRM example (figure): site entry → page views → events (select a product, insert the item into the shopping cart) → recommendation engine → personalized view of recommendations

The Emerging Challenges: privacy
• How to protect personal or sensitive information
  - Control access by user and by usage
  - Include a purpose description in the query
• Example:
    Name  | Income | …
    Alice | 25K    | …
    John  | 40K    | …
  "We just want the statistics of the income, not the personal information!"
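One way to read the last two bullets is as a query interface that requires a declared purpose and answers a statistics purpose with aggregates only. The sketch below is a hypothetical illustration, not a mechanism described in the slides. A small policy table, whose purpose names are invented, decides whether a purpose may see individual rows or only the count and average of the income column. It uses Python's sqlite3 module with the two example rows.

```python
import sqlite3

# Hypothetical purpose policy, for illustration only: each declared purpose
# maps to the kind of answer it may receive. Not taken from the slides.
POLICY = {
    "statistics": "aggregate",  # may see only count and average income
    "payroll": "rows",          # may see individual (name, income) rows
}


def make_db() -> sqlite3.Connection:
    """In-memory table holding the two example rows from the slide."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (name TEXT, income INTEGER)")
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [("Alice", 25_000), ("John", 40_000)],
    )
    return conn


def query_income(conn: sqlite3.Connection, purpose: str):
    """Answer an income query according to the purpose declared with it."""
    access = POLICY.get(purpose)
    if access is None:
        raise PermissionError(f"purpose {purpose!r} is not permitted")
    if access == "aggregate":
        # Statistics only: no names or individual incomes leave the database.
        count, avg = conn.execute(
            "SELECT count(*), avg(income) FROM person"
        ).fetchone()
        return {"count": count, "avg": avg}
    return conn.execute("SELECT name, income FROM person ORDER BY name").fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(query_income(conn, "statistics"))  # {'count': 2, 'avg': 32500.0}
```

A real design would also have to stop aggregates over very small groups from revealing individuals. With only two rows, the average and one known income give away the other. The sketch leaves that out.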
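Returning to the data mining slide: a statement such as "80% of people who buy bread and snacks also buy milk" is the confidence of the association rule {bread, snacks} → {milk}. That is the number of baskets containing all three items divided by the number containing bread and snacks. The minimal sketch below computes support and confidence by brute force over a small, made-up list of baskets. The slides give only the percentages, so the data and function names are hypothetical. Real mining algorithms such as Apriori also search for the frequent itemsets rather than checking a single given rule.

```python
# Made-up transactions for illustration; the slide reports only percentages.
TRANSACTIONS = [
    {"bread", "snacks", "milk"},
    {"bread", "snacks", "milk"},
    {"bread", "snacks", "milk"},
    {"bread", "snacks", "milk"},
    {"bread", "snacks"},
    {"bread", "milk"},
    {"milk"},
]


def count(itemset, transactions) -> int:
    """Number of transactions containing every item of itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset, transactions) -> float:
    """Fraction of all transactions that contain itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(lhs, rhs, transactions) -> float:
    """confidence(lhs -> rhs) = count(lhs ∪ rhs) / count(lhs)."""
    n_lhs = count(lhs, transactions)
    if n_lhs == 0:
        return 0.0
    return count(set(lhs) | set(rhs), transactions) / n_lhs


if __name__ == "__main__":
    print(confidence({"bread", "snacks"}, {"milk"}, TRANSACTIONS))  # 0.8
    print(support({"bread", "snacks", "milk"}, TRANSACTIONS))
```

Here 4 of the 5 baskets with bread and snacks also contain milk, which reproduces the 80% figure on toy data.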