Data, Dataset and Database
Dr. Saed Sayad
2010
http://chem-eng.utoronto.ca/~datamining/

Data, Dataset and Database
• Data is information, typically the results of measurement (numerical) or counting (categorical).
• Dataset is a collection of data, usually presented in tabular form. Each column represents a particular variable and each row corresponds to a given member of the data.
• Database collects, stores and manages information so users can retrieve, add, update or remove such information.

Data Types
• Numerical (measurement)
  o Ratio
  o Interval
• Categorical (counting)
  o Ordinal
  o Nominal

Data Sources
           | Text Files  | Relational Database            | Multi-dimensional Database
Entities   | File        | Table                          | Cube
Attributes | Row and Col | Record, Field, Index           | Dimension, Level, Measurement
Methods    | Read, Write | Select, Insert, Update, Delete | Drill down, Drill up, Drill through
Language   | -           | SQL                            | MDX

Dataset
Columns are fields, rows are records, and ID is the unique key.
ID | Outlook  | Temp | Humidity | Windy | Play Golf
1  | Rainy    | 85   | 92       | False | No
2  | Rainy    | 80   | 88       | True  | No
3  | Overcast | 83   | 86       | False | Yes
4  | Sunny    | 70   | 80       | False | Yes
5  | Sunny    | 68   | ?        | False | Yes
6  | Sunny    | 65   | 58       | True  | No
7  | Overcast | 64   | 62       | True  | Yes
8  | Rainy    | 72   | 95       | ?     | No
9  | Rainy    | ?    | 70       | False | Yes
10 | Sunny    | 75   | 72       | False | Yes
11 | Rainy    | 75   | 74       | True  | Yes
12 | ?        | 72   | 78       | True  | Yes
13 | Overcast | 81   | 66       | False | Yes
14 | Sunny    | 71   | 79       | True  | No

Dataset – Text (Flat) File

Dataset – Table (Database)

SQL
Data Definition Language (DDL)
The Data Definition Language (DDL) permits database tables to be created, altered or deleted. We can also define indexes (keys), specify links between tables, and impose constraints between database tables.
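The DDL capabilities described above (creating, altering and deleting tables, defining indexes, linking tables and imposing constraints) can be sketched with Python's standard-library sqlite3 module. This is only an illustration under assumptions: SQLite is used because it ships with Python, and the table, column and index names (customers, transactions, club, idx_transactions_customer) are hypothetical choices, not something the slides prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces links between tables when this setting is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# CREATE TABLE: a primary key plus simple constraints on two columns.
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        age         INTEGER CHECK (age >= 0),
        married     TEXT CHECK (married IN ('Y', 'N'))
    )
""")

# A second table linked to the first through a foreign key.
conn.execute("""
    CREATE TABLE transactions (
        transaction_id   INTEGER PRIMARY KEY,
        customer_id      INTEGER NOT NULL REFERENCES customers (customer_id),
        purchased_amount REAL NOT NULL
    )
""")

# CREATE INDEX: speeds up lookups of transactions by customer.
conn.execute(
    "CREATE INDEX idx_transactions_customer ON transactions (customer_id)"
)

# ALTER TABLE: add a column to an existing table (hypothetical column name).
conn.execute("ALTER TABLE customers ADD COLUMN club TEXT")


def table_names(connection):
    """Return the names of all tables currently defined in the database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


tables_before = table_names(conn)
customer_columns = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]

# The link between tables rejects a transaction for a customer that does not exist.
try:
    conn.execute("INSERT INTO transactions VALUES (1, 99, 250)")
    link_enforced = False
except sqlite3.IntegrityError:
    conn.rollback()
    link_enforced = True

# DROP INDEX and DROP TABLE: remove the index, then the table itself.
conn.execute("DROP INDEX idx_transactions_customer")
conn.execute("DROP TABLE transactions")
tables_after = table_names(conn)

print(tables_before, customer_columns, link_enforced, tables_after)
```

Other database engines accept broadly similar statements, but details such as column types and how foreign keys are switched on differ between products.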
The most important DDL statements are:
  o CREATE TABLE - creates a new table
  o ALTER TABLE - alters a table
  o DROP TABLE - deletes a table
  o CREATE INDEX - creates an index
  o DROP INDEX - deletes an index

Data Manipulation Language (DML)
• DML is a language which enables users to access and manipulate data.
• DML main functions:
  o SELECT: retrieval of data from the database.
  o INSERT INTO: insertion of new data into the database.
  o UPDATE: modification of data in the database.
  o DELETE: deletion of data in the database.
• Structured Query Language (SQL) is a computer language designed for manipulating and managing data.

Tables Relationship: One to One and One to Many
• 1 to N: Customers to Transactions
• 1 to 1: Customers to Loyalty Score

Customers
Customer ID | Age | Married
1           | 25  | N
2           | 38  | Y
3           | 46  | Y

Loyalty Score (1 to 1 with Customers)
Customer ID | Score | Club
1           | 653   | Silver
2           | 890   | Gold
3           | 230   | Bronze

Transactions (N to 1 with Customers)
Transaction ID | Customer ID | Purchased Amount
1              | 1           | 250
2              | 1           | 125
3              | 2           | 100
4              | 2           | 85
5              | 2           | 24
6              | 3           | 400

Copy and Aggregate
• Copy: Customers to Transactions
• Aggregate: Transactions to Customers

Data Preparation - Copy
Transaction ID | Customer ID | Purchased Amount | Age | Married
1              | 1           | 250              | 25  | N
2              | 1           | 125              | 25  | N
3              | 2           | 100              | 38  | Y
4              | 2           | 85               | 38  | Y
5              | 2           | 24               | 38  | Y
6              | 3           | 400              | 46  | Y

Data Preparation - Aggregate
Customer ID | Age | Married | Purchased Count | Purchased Total
1           | 25  | N       | 2               | 375
2           | 38  | Y       | 3               | 209
3           | 46  | Y       | 1               | 400

Aggregate Functions
• Categorical: Count, Count%
• Numeric: Count, Sum, Mean, Std, Min, Max

Data Preparation - Summary
One Row per Subject

Questions?
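The four DML statements and the Copy and Aggregate steps from the Data Preparation slides can be sketched as follows, again with Python's standard-library sqlite3 module. The rows are the Customers and Transactions data shown on the slides. Expressing Copy as a JOIN and Aggregate as a GROUP BY is an assumption made for this sketch, as are the snake_case table and column names and the throwaway transaction with ID 7; the slides do not name the SQL operations behind either step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        age         INTEGER,
        married     TEXT
    );
    CREATE TABLE transactions (
        transaction_id   INTEGER PRIMARY KEY,
        customer_id      INTEGER REFERENCES customers (customer_id),
        purchased_amount INTEGER
    );
""")

# INSERT INTO: load the rows shown on the slides.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 25, "N"), (2, 38, "Y"), (3, 46, "Y")],
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, 1, 250), (2, 1, 125), (3, 2, 100), (4, 2, 85), (5, 2, 24), (6, 3, 400)],
)

# UPDATE and DELETE, shown on a throwaway row so the slide data stays intact.
conn.execute("INSERT INTO transactions VALUES (7, 3, 10)")
conn.execute("UPDATE transactions SET purchased_amount = 15 WHERE transaction_id = 7")
updated_amount = conn.execute(
    "SELECT purchased_amount FROM transactions WHERE transaction_id = 7"
).fetchone()[0]
conn.execute("DELETE FROM transactions WHERE transaction_id = 7")

# Copy: one row per transaction, with the customer's attributes copied onto it.
copied = conn.execute("""
    SELECT t.transaction_id, t.customer_id, t.purchased_amount, c.age, c.married
    FROM transactions AS t
    JOIN customers AS c ON c.customer_id = t.customer_id
    ORDER BY t.transaction_id
""").fetchall()

# Aggregate: one row per customer, with transactions summarised by count and sum.
aggregated = conn.execute("""
    SELECT c.customer_id, c.age, c.married,
           COUNT(t.transaction_id) AS purchased_count,
           SUM(t.purchased_amount) AS purchased_total
    FROM customers AS c
    LEFT JOIN transactions AS t ON t.customer_id = c.customer_id
    GROUP BY c.customer_id, c.age, c.married
    ORDER BY c.customer_id
""").fetchall()

for row in aggregated:
    print(row)
# (1, 25, 'N', 2, 375)
# (2, 38, 'Y', 3, 209)
# (3, 46, 'Y', 1, 400)
```

The aggregated result has one row per customer, matching the "One Row per Subject" summary. The LEFT JOIN is a design choice in this sketch: a customer with no transactions would still appear, with a count of 0.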