Data, Dataset and Database
Dr. Saed Sayad
2010
http://chem-eng.utoronto.ca/~datamining/
Data, Dataset and Database
• Data is information, typically the result of measurement (numerical) or counting (categorical).
• A dataset is a collection of data, usually presented in tabular form. Each column represents a particular variable and each row corresponds to a given member of the dataset.
• A database collects, stores and manages information so that users can retrieve, add, update or remove it.
Data Types
Data
• Numerical (measurement): Ratio, Interval
• Categorical (counting): Ordinal, Nominal
Data Sources
            Text Files    Relational Database             Multi-dimensional Database
Entities    File          Table                           Cube
Attributes  Row and Col   Record, Field, Index            Dimension, Level, Measurement
Methods     Read, Write   Select, Insert, Update, Delete  Drill down, Drill up, Drill through
Language    -             SQL                             MDX
Dataset
Columns are fields, rows are records, and ID is the unique key.

ID  Outlook   Temp  Humidity  Windy  Play Golf
1   Rainy     85    92        False  No
2   Rainy     80    88        True   No
3   Overcast  83    86        False  Yes
4   Sunny     70    80        False  Yes
5   Sunny     68    ?         False  Yes
6   Sunny     65    58        True   No
7   Overcast  64    62        True   Yes
8   Rainy     72    95        ?      No
9   Rainy     ?     70        False  Yes
10  Sunny     75    72        False  Yes
11  Rainy     75    74        True   Yes
12  ?         72    78        True   Yes
13  Overcast  81    66        False  Yes
14  Sunny     71    79        True   No
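As a rough illustration of the dataset idea, the table can be read as a flat text file with one record per row. The sketch below uses Python's standard `csv` module; the comma-separated layout, the helper name `load_rows`, and the reading of "?" as a missing value are assumptions made for the example, not part of the slides.

```python
import csv
import io

# The weather table from the slide as comma-separated text.
# "?" is kept exactly as it appears in the table.
RAW = """ID,Outlook,Temp,Humidity,Windy,Play Golf
1,Rainy,85,92,False,No
2,Rainy,80,88,True,No
3,Overcast,83,86,False,Yes
4,Sunny,70,80,False,Yes
5,Sunny,68,?,False,Yes
6,Sunny,65,58,True,No
7,Overcast,64,62,True,Yes
8,Rainy,72,95,?,No
9,Rainy,?,70,False,Yes
10,Sunny,75,72,False,Yes
11,Rainy,75,74,True,Yes
12,?,72,78,True,Yes
13,Overcast,81,66,False,Yes
14,Sunny,71,79,True,No
"""


def load_rows(text):
    """Parse the text into one dict per row; "?" becomes None (assumed missing)."""
    reader = csv.DictReader(io.StringIO(text))
    return [{col: (None if val == "?" else val) for col, val in row.items()}
            for row in reader]


rows = load_rows(RAW)
columns = list(rows[0].keys())

# Count the missing entries in each column (field).
missing_per_column = {col: sum(r[col] is None for r in rows) for col in columns}

# A unique key must have a different value in every row (record).
ids_are_unique = len({r["ID"] for r in rows}) == len(rows)

print(len(rows), "rows,", len(columns), "columns")
print(missing_per_column)
print("ID is a unique key:", ids_are_unique)
```

Each dict is one row/record and each dict key is one column/field, which mirrors the labels on the slide.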
Dataset – Text (Flat) File
Dataset – Table (Database)
SQL Data Definition Language (DDL)
The Data Definition Language (DDL) permits database tables to be created, altered or deleted. We can also define indexes (keys), specify links between tables, and impose constraints between database tables.
The most important DDL statements are:
o CREATE TABLE - creates a new table
o ALTER TABLE - alters a table
o DROP TABLE - deletes a table
o CREATE INDEX - creates an index
o DROP INDEX - deletes an index
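The five DDL statements can be tried out with a minimal sketch like the one below. It assumes SQLite through Python's standard `sqlite3` module with an in-memory database; the table name `weather`, its columns, the index name, and the helper `table_columns` are hypothetical choices for the example, and other database systems differ in the details of their DDL syntax.

```python
import sqlite3

# In-memory SQLite database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE TABLE - creates a new table (hypothetical name and columns).
cur.execute("""
    CREATE TABLE weather (
        id      INTEGER PRIMARY KEY,
        outlook TEXT,
        temp    INTEGER
    )
""")

# ALTER TABLE - alters a table, here by adding a column.
cur.execute("ALTER TABLE weather ADD COLUMN humidity INTEGER")

# CREATE INDEX - creates an index on one column.
cur.execute("CREATE INDEX idx_weather_outlook ON weather (outlook)")


def table_columns(cursor, table):
    """Return the column names of a table using SQLite's table_info pragma."""
    return [row[1] for row in cursor.execute(f"PRAGMA table_info({table})")]


columns_after_alter = table_columns(cur, "weather")
print(columns_after_alter)

# DROP INDEX - deletes an index; DROP TABLE - deletes a table.
cur.execute("DROP INDEX idx_weather_outlook")
cur.execute("DROP TABLE weather")

# After both drops the schema catalog holds no tables or indexes.
remaining = cur.execute(
    "SELECT name FROM sqlite_master WHERE type IN ('table', 'index')"
).fetchall()
print(remaining)
conn.close()
```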
Data Manipulation Language (DML)
• DML is a language that enables users to access and manipulate data.
• The main DML functions are:
o SELECT: retrieval of data from the database.
o INSERT INTO: insertion of new data into the database.
o UPDATE: modification of data in the database.
o DELETE: deletion of data from the database.
• Structured Query Language (SQL) is a computer language designed for manipulating and managing data.
Table Relationships
One to One and One to Many
• Customers to Transactions: 1 to N
• Customers to Loyalty Score: 1 to 1
Table Relationships
One to One and One to Many

Customers
Customer ID  Age  Married
1            25   N
2            38   Y
3            46   Y

Loyalty Score (1 to 1 with Customers)
Customer ID  Score  Club
1            653    Silver
2            890    Gold
3            230    Bronze

Transactions (1 to N with Customers)
Transaction ID  Customer ID  Purchased Amount
1               1            250
2               1            125
3               2            100
4               2            85
5               2            24
6               3            400
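One possible way to express these two relationships in a relational database is with key constraints. The sketch below assumes SQLite via Python's `sqlite3` module; the snake_case table and column names and the helper `try_insert` are hypothetical, and the two rejected rows are invented to show the constraints at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        age         INTEGER,
        married     TEXT
    );
    -- 1 to 1: customer_id is both the primary key and a foreign key,
    -- so each customer can have at most one loyalty row.
    CREATE TABLE loyalty_score (
        customer_id INTEGER PRIMARY KEY REFERENCES customers (customer_id),
        score       INTEGER,
        club        TEXT
    );
    -- 1 to N: many transactions may point at the same customer.
    CREATE TABLE transactions (
        transaction_id   INTEGER PRIMARY KEY,
        customer_id      INTEGER NOT NULL REFERENCES customers (customer_id),
        purchased_amount INTEGER
    );
""")

conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, 25, "N"), (2, 38, "Y"), (3, 46, "Y")])
conn.executemany("INSERT INTO loyalty_score VALUES (?, ?, ?)",
                 [(1, 653, "Silver"), (2, 890, "Gold"), (3, 230, "Bronze")])
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)",
                 [(1, 1, 250), (2, 1, 125), (3, 2, 100),
                  (4, 2, 85), (5, 2, 24), (6, 3, 400)])


def try_insert(sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Invented rows: a second loyalty score for customer 1, and a
# transaction for a customer (99) that does not exist.
second_score_ok = try_insert("INSERT INTO loyalty_score VALUES (?, ?, ?)", (1, 700, "Gold"))
orphan_ok = try_insert("INSERT INTO transactions VALUES (?, ?, ?)", (7, 99, 10))

# Customer 2 legitimately has several transactions (the N side).
n_for_customer_2 = conn.execute(
    "SELECT COUNT(*) FROM transactions WHERE customer_id = ?", (2,)
).fetchone()[0]

print(second_score_ok, orphan_ok, n_for_customer_2)
conn.close()
```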
Copy and Aggregate
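• Copy: Customers → Transactions (customer attributes are copied onto each transaction row)
• Aggregate: Transactions → Customers (transactions are summarized into one row per customer)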
Data Preparation - Copy
Transaction ID  Customer ID  Purchased Amount  Age  Married
1               1            250               25   N
2               1            125               25   N
3               2            100               38   Y
4               2            85                38   Y
5               2            24                38   Y
6               3            400               46   Y
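The copy step can be read as a join: each transaction row picks up the Age and Married values of its customer. The sketch below is one way to reproduce the table above, assuming SQLite via Python's `sqlite3` module; the snake_case table and column names are hypothetical, and the slides do not say which tool was used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, age INTEGER, married TEXT);
    CREATE TABLE transactions (
        transaction_id   INTEGER PRIMARY KEY,
        customer_id      INTEGER,
        purchased_amount INTEGER
    );
    INSERT INTO customers VALUES (1, 25, 'N'), (2, 38, 'Y'), (3, 46, 'Y');
    INSERT INTO transactions VALUES
        (1, 1, 250), (2, 1, 125), (3, 2, 100), (4, 2, 85), (5, 2, 24), (6, 3, 400);
""")

# Copy: join every transaction to its customer, so the customer's
# attributes are repeated on each of that customer's transactions.
copied = conn.execute("""
    SELECT t.transaction_id, t.customer_id, t.purchased_amount, c.age, c.married
    FROM transactions AS t
    JOIN customers AS c ON c.customer_id = t.customer_id
    ORDER BY t.transaction_id
""").fetchall()

for row in copied:
    print(row)
conn.close()
```

The result keeps one row per transaction, so customer 2's attributes appear three times.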
Data Preparation - Aggregate
Customer ID  Age  Married  Purchased Count  Purchased Total
1            25   N        2                375
2            38   Y        3                209
3            46   Y        1                400
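The aggregate step goes the other way: the transactions of each customer are collapsed into a count and a total. A minimal sketch with `GROUP BY`, assuming SQLite via Python's `sqlite3` module and hypothetical snake_case names, could look like this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, age INTEGER, married TEXT);
    CREATE TABLE transactions (
        transaction_id   INTEGER PRIMARY KEY,
        customer_id      INTEGER,
        purchased_amount INTEGER
    );
    INSERT INTO customers VALUES (1, 25, 'N'), (2, 38, 'Y'), (3, 46, 'Y');
    INSERT INTO transactions VALUES
        (1, 1, 250), (2, 1, 125), (3, 2, 100), (4, 2, 85), (5, 2, 24), (6, 3, 400);
""")

# Aggregate: one output row per customer, with the number of
# transactions (Purchased Count) and their sum (Purchased Total).
aggregated = conn.execute("""
    SELECT c.customer_id, c.age, c.married,
           COUNT(t.transaction_id)  AS purchased_count,
           SUM(t.purchased_amount)  AS purchased_total
    FROM customers AS c
    JOIN transactions AS t ON t.customer_id = c.customer_id
    GROUP BY c.customer_id, c.age, c.married
    ORDER BY c.customer_id
""").fetchall()

for row in aggregated:
    print(row)
conn.close()
```

An inner join is used here because every customer in the example has at least one transaction; a customer without any would need a `LEFT JOIN` to stay in the result.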
Aggregate Functions
Aggregation
• Categorical: Count, Count%
• Numeric: Count, Sum, Mean, Std, Min, Max
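These aggregate functions can be sketched with Python's standard library. The function names `aggregate_categorical` and `aggregate_numeric` are hypothetical, and the slide does not say whether "Std" means the sample or the population standard deviation; the sketch assumes the sample version (`statistics.stdev`). The inputs are the Married column of the Customers table and the Purchased Amount column of the Transactions table.

```python
from collections import Counter
from statistics import mean, stdev


def aggregate_categorical(values):
    """Return {category: (count, count%)} for a categorical column."""
    total = len(values)
    return {cat: (n, 100.0 * n / total) for cat, n in Counter(values).items()}


def aggregate_numeric(values):
    """Return count, sum, mean, std, min and max for a numeric column."""
    return {
        "count": len(values),
        "sum": sum(values),
        "mean": mean(values),
        "std": stdev(values),  # assumption: sample standard deviation
        "min": min(values),
        "max": max(values),
    }


married = ["N", "Y", "Y"]                # Customers.Married
amounts = [250, 125, 100, 85, 24, 400]   # Transactions.Purchased Amount

married_summary = aggregate_categorical(married)
amount_summary = aggregate_numeric(amounts)

print(married_summary)
print(amount_summary)
```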
Data Preparation - Summary
One Row per Subject
Questions?