2016/6/14
Advanced Data Engineering
Haruo Yokota
Office: West 8(E)‐705 (ex. 3505)
Email: [email protected]
What is Data Engineering?
• An academic/research area managing “Data”
– How to store, search for, retrieve and process data efficiently
• The term “Data Engineering” has been in use since the 1980s
– As the name of a society (Japan) / a conference (US)
• Tightly related to “Databases”
– The concept of Data Engineering is much broader
• Other expressions:
– Management of Data (also name of a group in society)
– Data Intensive Processing (vs. Computation Intensive)
Advanced Data Engineering (©H. Yokota)
Databases are everywhere
[Figure: system diagram with a Load Balancer, several Web Server / App Server pairs, and a DB Server]
Targets of Data Engineering (DE)
• One of the most essential technologies for the information era
– We need data (information)
• Information is the source of power
– Alvin Toffler, “The Third Wave”, 1980
– We need to keep the data for a long time
– We need to retrieve the data efficiently
• The amount of information is increasing rapidly
– The so‐called “Information Explosion” or “Big Data”
Units
• 10^1   Deka (da)     • 10^-1   Deci (d)
• 10^2   Hecto (h)     • 10^-2   Centi (c)
• 10^3   Kilo (k)      • 10^-3   Milli (m)
• 10^6   Mega (M)      • 10^-6   Micro (μ)
• 10^9   Giga (G)      • 10^-9   Nano (n)
• 10^12  Tera (T)      • 10^-12  Pico (p)
• 10^15  Peta (P)      • 10^-15  Femto (f)
• 10^18  Exa (E)       • 10^-18  Atto (a)
• 10^21  Zetta (Z)     • 10^-21  Zepto (z)
• 10^24  Yotta (Y)     • 10^-24  Yocto (y)
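As a rough illustration of how the prefixes above are applied to data sizes, the following sketch scales a raw byte count to the largest decimal prefix that fits. The function name `format_si` and the table `SI_PREFIXES` are hypothetical names introduced here for illustration; they are not part of the lecture. The sketch assumes decimal (powers of ten) prefixes, not the binary units (KiB, MiB) that some tools report.

```python
# Decimal SI prefixes from kilo upward, as (exponent, symbol), largest first.
SI_PREFIXES = [
    (24, "Y"), (21, "Z"), (18, "E"), (15, "P"),
    (12, "T"), (9, "G"), (6, "M"), (3, "k"),
]


def format_si(value, unit="B"):
    """Scale a non-negative value to the largest SI prefix that fits."""
    for exponent, symbol in SI_PREFIXES:
        scale = 10 ** exponent
        if value >= scale:
            # ':g' drops trailing zeros, so 4.0 is shown as 4.
            return f"{value / scale:g} {symbol}{unit}"
    # Values below 10^3 are shown without a prefix.
    return f"{value:g} {unit}"


print(format_si(2_500_000_000_000))  # 2.5 TB
print(format_si(4 * 10**21))         # 4 ZB
```

Deka and Hecto are omitted from the lookup table because they are rarely used for data volumes; adding them would only require two more entries.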
Big Data is 'Buzzword du Jour;' CS Academics 'Have the Best Job'
• Communications of the ACM
– Michael Stonebraker (MIT)
• Big data can mean:
– Big volumes of data, but “small analytics”
– Big analytics on big volumes of data
– Big velocity
– Big variety
• We can do many things based on database technologies
http://cacm.acm.org/blogs/blog-cacm/155468-what-does-big-data-mean/fulltext
Activities in Data Engineering
• This research area is very active!
• To follow the activities of a research area
– Keep an eye on the conferences and other events of related societies
• Both international and domestic
– They reflect the activities of the research area
– They are also good indicators of recent research trends
Top 3 Major Int’l Conferences
• SIGMOD: International Conference on Management of Data
– Annual conference from 1975 sponsored by ACM
• VLDB: International Conference on Very Large Data Bases
– Annual conference from 1975 sponsored by VLDB Endowment
• ICDE: International Conference on Data Engineering
– Annual conference from 1984 sponsored by IEEE Computer Society
Other Int’l Conferences
• PODS: Principles of Database Systems
• EDBT: International Conference on Extending Database Technology
• DASFAA: Database Systems for Advanced Applications
• DEXA: Database and Expert Systems Applications
• ER: International Conference on Conceptual Modeling
– Formerly: International Conference on the Entity Relationship Approach
• ICDT: International Conference on Database Theory
• ICDM: International Conference on Data Mining
• …
There are many conferences related to data engineering
Japanese Related Organizations
• Related Organizations in Japan
– Special Interest Group on Database Systems (1973‐)
• In IPSJ (The Information Processing Society of Japan)
– Technical Group on Data Engineering (1986‐)
• In IEICE (The Institute of Electronics, Information and Communication Engineers)
– ACM SIGMOD Japan Chapter (1994‐)
– DBSJ: The Database Society of Japan (2002‐)
Related Domestic Events
• DEIM Forum (formerly DEWS from 1990)
– Data Engineering and Information Management
– DEIM2015: in Koriyama, Fukushima, 3/2‐4, 2015
• Presentation: 332, Poster: 191, Attendance: 545
– DEIM2016: in Fukuoka, 2/29‐3/2, 2016
• WebDB Forum (formerly DBWeb from 2000)
– WebDB2014: at Shibaura IT. 11/19‐20, 2014
• Submission: 44, Presentation: 26, Attendance: 298
– WebDB2015: at Shibaura IT. 11/24‐25, 2015
• SoC (Social Computing Symposium)
– SoC2015: at Recruit Tech. (Yaesu) 6/20, 2015
• Presentation: 12, Attendance: 127
• Workshops
• Tutorials
Dynamism of Research Area
• Disappeared (Several Years Ago)
– Materialized Views, E‐commerce
• Current Trend / New Area
– Data on the Cloud
– Social Media and Crowd
• Still Remaining
– Data Mining, XML
• Continuously Appearing
– Traditional Database Technologies
• Indexing, Query Processing, Storage
In This Course
• Both Traditional DB & New DE Topics
– Data Warehouse & OLAP (Data Mining)
– Storage & Advanced Indexing
– Cost Estimation of Data Engineering Operations
– Parallel Database Processing
– Skew Handling
– Distributed Database Operations
– Databases in the Cloud
– NoSQL & Key Value Storage
– XML Database, RDF Database
Background Requirements
• Assuming basic knowledge of
– Relational Databases
• Relational Algebra, SQL, etc.
– Transaction Processing Concepts
• ACID Properties, Logging, etc.
– Computer Architecture
• Parallel & Distributed Computer Architecture
• Prepare Short Course of Databases
Reference Books
• Since these topics are recent, no textbook covers them. The following books are for reference only.
• “Transaction Processing: Concepts and Techniques”
– Jim Gray and Andreas Reuter
– Morgan Kaufmann Publishers, 1993
– ISBN: 1‐55860‐190‐2
• “An Introduction to Database Systems”
– C. J. Date
– Addison‐Wesley, 7th edition, 2000
– ISBN: 0‐201‐38590‐2
• “A First Course in Database Systems”
– J. D. Ullman, J. Widom
– Prentice Hall, 1997
– ISBN: 0‐13‐861337‐0
Additional Information
• Scheduled Class Cancellations
– The classes on Jun. 26 and July 12 are canceled
• Make‐up Classes
– Make‐up classes will be held on Aug. 5 and Aug. 9
• Evaluation
– Assignments (given in each class) and Final Report
• OCW
– http://www.ocw.titech.ac.jp/