Database overview

CS 292 Special Topics on Big Data
Yuan Xue
([email protected])
Part I Relational Database
Discussion

- Have you ever encountered a data management problem?
  - Experimental data from a homework?
  - Personal data?
  - Other data?
- How did you manage your data?
Database

- Database: an integrated collection of related data
  - Usually stored on secondary storage (as files)
  - Can also be held in memory (in-memory database)
- Examples of databases
  - Vanderbilt student database; course registration and grading database (backend of YES)
  - Amazon’s products and customer database; eBay’s products and transaction database
  - Facebook’s user and message database
  - And more…
Database Management System (DBMS)

- DBMS: a collection of software programs
  - Designed to assist in creating and managing databases
  - Supports defining, constructing, manipulating, and sharing databases
- Examples of DBMSs
  - Relational DBMSs: commercial: Oracle, IBM (DB2, Informix), Microsoft (SQL Server, Access); open source: MySQL, PostgreSQL
  - NoSQL and NewSQL: Bigtable/HBase, Cassandra, Redis, Riak, MongoDB, Dynamo, DynamoDB, Spanner
  - Other: object-oriented databases, etc.
Database System Environment
[Figure comparing the environment without a DBMS and with a DBMS; labels: Users, Application, DBMS, Data, Database]
Benefits of a DBMS
- Development convenience
  - Reduces application development time
- Data independence
  - Application programs do not depend on data representation and storage details
- Data integrity and consistency
  - Enforces consistency constraints on data
- Data sharing and concurrency control
  - Data is better utilized (discovered and reused); redundancy of data is minimized
  - Avoids undesirable race conditions that arise with simultaneous access/updates to data
- Centralized control
  - The DBA tunes the database to balance users' needs
- Security
  - Prevents unauthorized access
- Crash recovery
  - Ensures the integrity of data in the presence of failures
Example Application – MiniTwitter
- What data do we need?
- What capabilities on the data do we need?
Example Application – MiniTwitter

- What data do we need? Information required to record system state
  - User profile info: ID, password, email, display name, picture, people I follow, people who follow me
  - Tweets: author, time, content (topic), replies (author, time, content), favorites (author, time)
- What capabilities on the data do we need? Operations that update and retrieve system state
  - Register a new user
  - Follow/unfollow a user (approve following request)
  - Post/delete a tweet
  - Read/update in real time all the tweets from the people I follow
  - Show the number of tweets I posted, the number of people following me, and the number of people I follow
  - Trend information
Three-Level Architecture

- Key question: how to describe data?
  - Conceptual data model: entities, attributes, relationships (entity-relationship model)
  - Logical data model: coming next
  - Physical data model: storage, data structures
Database Model
- Logical data model: logical structure of data organization
- Types of data model
  - Relational model: table
  - Semistructured data model (XML/JSON): tree
  - Various data models in NoSQL systems: key-value pair, column-family, graph
  - Object-oriented model: object, class, inheritance; a layer over the relational model
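To make the difference between these models concrete, the sketch below holds the same MiniTwitter user in three shapes: a relational row, a JSON tree, and a key-value pair. It is only an illustration: the sample values are borrowed from the MiniTwitter tables later in the deck, and the `Following` field and the `user:Alice00` key format are assumptions made up for this example.

```python
import json

# Relational model: a row (tuple) in a table with fixed, named columns.
user_columns = ("ID", "Name", "Email")
user_row = ("Alice00", "Alice", "alice00@gmail.com")

# Semistructured model: a tree, here a JSON-style document with a nested list.
user_tree = {
    "ID": "Alice00",
    "Name": "Alice",
    "Email": "alice00@gmail.com",
    "Following": ["Bob2013", "Cathy123"],  # hypothetical nested field
}

# Key-value model: an opaque value looked up by key (key format is hypothetical).
kv_store = {"user:Alice00": json.dumps(user_tree)}

# The same fact read back through each representation.
name_from_row = user_row[user_columns.index("Name")]
name_from_tree = user_tree["Name"]
name_from_kv = json.loads(kv_store["user:Alice00"])["Name"]
print(name_from_row, name_from_tree, name_from_kv)
```

In the relational form the column names live in the schema, while the tree and key-value forms carry their structure inside each value.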
Relational Data Model
- Schema: structural description of relations in database
- Instance: actual contents (the data in the database) at a given point in time

ID       Name   Email            Password
Alice00  Alice  [email protected]  Aadf1234
Bob2013  Bob    [email protected]  qwer6789
Relational Data Model
- Database = set of named relations (or tables)
- Each relation has a set of named attributes (or columns)
- Each tuple (or row) has a value for each attribute
- Each attribute has a type (or domain)
- Schema = structural description of relations in database
- Instance = actual contents at given point in time

ID       Name   Email            Password
Alice00  Alice  [email protected]  Aadf1234
Bob2013  Bob    [email protected]  qwer6789
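The schema/instance distinction can be observed directly in a running database. The following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the choice of SQLite and of `TEXT` for every column are assumptions for illustration, and the email values are taken from the MiniTwitter design tables later in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the structural description of the relation.
conn.execute("CREATE TABLE User (ID TEXT, Name TEXT, Email TEXT, Password TEXT)")

# Instance: the rows stored at this point in time.
conn.executemany(
    "INSERT INTO User VALUES (?, ?, ?, ?)",
    [
        ("Alice00", "Alice", "alice00@gmail.com", "Aadf1234"),
        ("Bob2013", "Bob", "bob13@gmail.com", "qwer6789"),
    ],
)

def read_schema():
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    return [(col[1], col[2]) for col in conn.execute("PRAGMA table_info(User)")]

schema = read_schema()
instance = conn.execute("SELECT * FROM User ORDER BY ID").fetchall()
print(schema)
print(instance)

# Adding a row changes the instance but leaves the schema untouched.
conn.execute("INSERT INTO User VALUES ('Cathy123', 'Cathy', 'cath@vandy', 'Tyuoa~!@')")
schema_after = read_schema()
row_count_after = conn.execute("SELECT COUNT(*) FROM User").fetchone()[0]
```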
Discussion
- How should we design relations (tables) for MiniTwitter?
  - What aspects do we need to consider?

Design – Version 0.1
User
ID        Name   Email              Password
Alice00   Alice  alice00@gmail.com  Aadf1234
Bob2013   Bob    bob13@gmail.com    qwer6789
Cathy123  Cathy  cath@vandy         Tyuoa~!@
(The Password values are pretending to be md5 hashcodes ;)

Follow
Followee  Follower  Timestamp
Alice00   Bob2013   2011.1.1.3.6.6
Bob2013   Cathy123  2012.10.2.6.7.7
Alice00   Cathy123  2012.11.1.2.3.3
Cathy123  Alice00   2012.11.1.2.6.6
Bob2013   Alice00   2012.11.1.2.6.7

Tweet
ID    Timestamp           Author   Content
0001  2013.12.20.11.20.2  Alice00  Hello
0002  2013.12.20.11.23.6  Bob2013  Nice weather
0003  2014.1.6.1.25.2     Alice00  @Bob Not sure..
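One way to check a design is to see whether it supports the required operations. The sketch below loads the Follow rows above into an in-memory SQLite database and computes "number of people following me" and "number of people I follow". It assumes that a row (Followee, Follower) means that Follower follows Followee; the function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Follow (Followee TEXT, Follower TEXT, Timestamp TEXT)")
conn.executemany(
    "INSERT INTO Follow VALUES (?, ?, ?)",
    [
        ("Alice00", "Bob2013", "2011.1.1.3.6.6"),
        ("Bob2013", "Cathy123", "2012.10.2.6.7.7"),
        ("Alice00", "Cathy123", "2012.11.1.2.3.3"),
        ("Cathy123", "Alice00", "2012.11.1.2.6.6"),
        ("Bob2013", "Alice00", "2012.11.1.2.6.7"),
    ],
)

def count_followers(user_id):
    """Number of people following user_id (rows where user_id is the followee)."""
    sql = "SELECT COUNT(*) FROM Follow WHERE Followee = ?"
    return conn.execute(sql, (user_id,)).fetchone()[0]

def count_following(user_id):
    """Number of people user_id follows (rows where user_id is the follower)."""
    sql = "SELECT COUNT(*) FROM Follow WHERE Follower = ?"
    return conn.execute(sql, (user_id,)).fetchone()[0]

for uid in ("Alice00", "Bob2013", "Cathy123"):
    print(uid, count_followers(uid), count_following(uid))
```

Both counts come from the same table, which is one argument for storing the follow relationship once rather than as two lists per user.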
Relational Data Model
Key: an attribute whose value is unique in each tuple, or a set of attributes whose combined values are unique

User
ID        Name   Email              Password
Alice00   Alice  alice00@gmail.com  Aadf1234
Bob2013   Bob    bob13@gmail.com    qwer6789
Cathy123  Cathy  cath@vandy         Tyuoa~!@

Follow
ID        Follower  timestamp
Alice00   Bob2013   2011.1.1.3.6.6
Bob2013   Cathy123  2012.10.2.6.7.7
Alice00   Cathy123  2012.11.1.2.3.3
Cathy123  Alice00   2012.11.1.2.6.6
Bob2013   Alice00   2012.11.1.2.6.7

Tweet
ID    timestamp           Author   Content
0001  2013.12.20.11.20.2  Alice00  Hello
0002  2013.12.20.11.23.6  Bob2013  Nice weather
0003  2014.1.6.1.25.2     Alice00  @Bob Not sure..
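A DBMS can enforce key uniqueness for us. The sketch below declares a single-attribute key on User and a composite key on Follow in SQLite. Choosing the pair (Followee, Follower) as the Follow key is an assumption made here for illustration (the first Follow column is shown as ID on this slide), as are the `try_insert` helper and the rejected sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE User (ID TEXT PRIMARY KEY, Name TEXT, Email TEXT, Password TEXT)"
)
conn.execute(
    "CREATE TABLE Follow (Followee TEXT, Follower TEXT, Timestamp TEXT, "
    "PRIMARY KEY (Followee, Follower))"
)
conn.execute("INSERT INTO User VALUES ('Alice00', 'Alice', 'alice00@gmail.com', 'Aadf1234')")
conn.execute("INSERT INTO Follow VALUES ('Alice00', 'Bob2013', '2011.1.1.3.6.6')")

def try_insert(sql):
    """Run an INSERT and report whether a key constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

# Same ID as an existing user: violates the single-attribute key.
dup_user = try_insert("INSERT INTO User VALUES ('Alice00', 'Other', 'other@example.com', 'x')")
# Same Followee but a different Follower: the combined value is still unique.
same_followee = try_insert("INSERT INTO Follow VALUES ('Alice00', 'Cathy123', '2012.11.1.2.3.3')")
# Same (Followee, Follower) pair again: violates the composite key.
dup_pair = try_insert("INSERT INTO Follow VALUES ('Alice00', 'Bob2013', '2014.1.1.0.0.0')")
print(dup_user, same_followee, dup_pair)
```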
Relational Data Model
Foreign key: an attribute or set of attributes in one table that points to the primary key of another

User
ID        Name   Email              Password
Alice00   Alice  alice00@gmail.com  Aadf1234
Bob2013   Bob    bob13@gmail.com    qwer6789
Cathy123  Cathy  cath@vandy         Tyuoa~!@

Follow
ID        Follower  timestamp
Alice00   Bob2013   2011.1.1.3.6.6
Bob2013   Cathy123  2012.10.2.6.7.7
Alice00   Cathy123  2012.11.1.2.3.3
Cathy123  Alice00   2012.11.1.2.6.6
Bob2013   Alice00   2012.11.1.2.6.7

Tweet
ID    timestamp           Author   Content
0001  2013.12.20.11.20.2  Alice00  Hello
0002  2013.12.20.11.23.6  Bob2013  Nice weather
0003  2014.1.6.1.25.2     Alice00  @Bob Not sure..
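As a rough illustration of what a foreign key buys us, the sketch below declares Tweet.Author as a reference to User.ID in SQLite. Treating Author as the foreign key is an assumption read off the tables, and the user `Dave999` and tweet `0004` are made up. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE User (ID TEXT PRIMARY KEY, Name TEXT)")
conn.execute(
    """
    CREATE TABLE Tweet (
        ID TEXT PRIMARY KEY,
        Timestamp TEXT,
        Author TEXT REFERENCES User(ID),  -- foreign key into User
        Content TEXT
    )
    """
)
conn.execute("INSERT INTO User VALUES ('Alice00', 'Alice')")
conn.execute("INSERT INTO Tweet VALUES ('0001', '2013.12.20.11.20.2', 'Alice00', 'Hello')")

def attempt(sql):
    """Run a statement and report whether the foreign key constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

# A tweet whose author does not exist in User (hypothetical user Dave999).
orphan_result = attempt("INSERT INTO Tweet VALUES ('0004', '2014.1.7.0.0.0', 'Dave999', 'Hi')")
# Deleting a user who still has tweets would leave a dangling reference.
delete_result = attempt("DELETE FROM User WHERE ID = 'Alice00'")
print(orphan_result, delete_result)
```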
More on Relational Data Model

- NULL: special value for “unknown” or “undefined”
- Relational model constraint summary
  - Domain constraints
  - Key constraints
  - Integrity constraints
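NULL behaves differently from ordinary values, which is easy to get wrong in queries. The sketch below, using SQLite, shows that comparing with `= NULL` never matches, that `IS NULL` is the test to use, and that a `NOT NULL` declaration rejects missing values. Pretending that Cathy's email is unknown is an assumption made only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User (ID TEXT PRIMARY KEY, Name TEXT NOT NULL, Email TEXT)")
conn.executemany(
    "INSERT INTO User VALUES (?, ?, ?)",
    [
        ("Alice00", "Alice", "alice00@gmail.com"),
        ("Cathy123", "Cathy", None),  # Python None is stored as SQL NULL
    ],
)

# NULL = NULL evaluates to NULL (unknown), not true.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Comparing a column with = NULL therefore selects nothing.
eq_rows = conn.execute("SELECT ID FROM User WHERE Email = NULL").fetchall()

# IS NULL is the correct way to find unknown values.
is_null_rows = conn.execute("SELECT ID FROM User WHERE Email IS NULL").fetchall()

# A NOT NULL column refuses an unknown value.
try:
    conn.execute("INSERT INTO User VALUES ('Bob2013', NULL, 'bob13@gmail.com')")
    not_null_result = "ok"
except sqlite3.IntegrityError:
    not_null_result = "rejected"

print(null_equals_null, eq_rows, is_null_rows, not_null_result)
```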
Relational Data Model and Database

- Relational model
  - Simple representation
  - Efficient implementation
  - Driven by relational algebra and relational calculus
  - Up-front definition of schemas and types that the data will thereafter adhere to
  - High-level, simple yet expressive query language
- Relational databases
  - Proven success for both open source and proprietary systems
  - Provide full ACID guarantees
  - SQL as the widely used, standard way of interacting with the database
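The sketch below illustrates only one letter of ACID, atomicity, using SQLite transactions. Posting a tweet touches two tables; if the second statement fails, the first is undone as well. The `TweetCount` table and the `post_tweet` function are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tweet (ID TEXT PRIMARY KEY, Author TEXT, Content TEXT)")
conn.execute("CREATE TABLE TweetCount (Author TEXT PRIMARY KEY, N INTEGER)")
conn.execute("INSERT INTO TweetCount VALUES ('Alice00', 0)")
conn.commit()

def post_tweet(tweet_id, author, content):
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("UPDATE TweetCount SET N = N + 1 WHERE Author = ?", (author,))
        conn.execute("INSERT INTO Tweet VALUES (?, ?, ?)", (tweet_id, author, content))

post_tweet("0001", "Alice00", "Hello")

try:
    # Reusing ID 0001 violates the primary key, so the INSERT fails
    # after the counter has already been incremented.
    post_tweet("0001", "Alice00", "Duplicate ID")
except sqlite3.IntegrityError:
    pass

# The failed transaction left no trace: counter and table still agree.
count = conn.execute("SELECT N FROM TweetCount WHERE Author = 'Alice00'").fetchone()[0]
tweets = conn.execute("SELECT COUNT(*) FROM Tweet").fetchone()[0]
print(count, tweets)
```

Without the transaction, the counter would read 2 while only one tweet exists.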
Creating and Using a Relational Database

- Steps in creating and using a (relational) database
  1. Design the schema (using DDL, the data definition language)
  2. Initialization: “bulk load” the initial data
  3. Operation: execute queries and modifications
[Figure labels: Meta-data (database definition), Data]
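The three steps can be sketched end to end with SQLite, using the MiniTwitter tables from the design slides. This is one possible rendering, not a prescribed one: the column types, the `timeline` function, and the reading of a Follow row as "Follower follows Followee" are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 1: design the schema with DDL.
conn.executescript(
    """
    CREATE TABLE Follow (Followee TEXT, Follower TEXT, Timestamp TEXT);
    CREATE TABLE Tweet (ID TEXT PRIMARY KEY, Timestamp TEXT, Author TEXT, Content TEXT);
    """
)

# Step 2: bulk load the initial data.
conn.executemany(
    "INSERT INTO Follow VALUES (?, ?, ?)",
    [
        ("Alice00", "Bob2013", "2011.1.1.3.6.6"),
        ("Bob2013", "Cathy123", "2012.10.2.6.7.7"),
        ("Alice00", "Cathy123", "2012.11.1.2.3.3"),
        ("Cathy123", "Alice00", "2012.11.1.2.6.6"),
        ("Bob2013", "Alice00", "2012.11.1.2.6.7"),
    ],
)
conn.executemany(
    "INSERT INTO Tweet VALUES (?, ?, ?, ?)",
    [
        ("0001", "2013.12.20.11.20.2", "Alice00", "Hello"),
        ("0002", "2013.12.20.11.23.6", "Bob2013", "Nice weather"),
        ("0003", "2014.1.6.1.25.2", "Alice00", "@Bob Not sure.."),
    ],
)
conn.commit()

# Step 3: execute queries and modifications.
def timeline(user_id):
    """All tweets written by the people user_id follows, ordered by tweet ID."""
    return conn.execute(
        """
        SELECT Tweet.ID, Tweet.Author, Tweet.Content
        FROM Tweet JOIN Follow ON Tweet.Author = Follow.Followee
        WHERE Follow.Follower = ?
        ORDER BY Tweet.ID
        """,
        (user_id,),
    ).fetchall()

bob_timeline = timeline("Bob2013")            # Bob follows Alice only
alice_timeline = timeline("Alice00")          # Alice follows Cathy and Bob
conn.execute("DELETE FROM Tweet WHERE ID = '0002'")  # a modification
alice_timeline_after_delete = timeline("Alice00")
print(bob_timeline, alice_timeline, alice_timeline_after_delete)
```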