Chapter 10
Data and Knowledge Management
Agenda
• Information processing
• Database
• Data Administrator
• The DBMS
• Distributing data
• Data warehousing and data mining
Data
• Set of discrete, objective facts about
events
• Business - structured records of
transactions
• By itself, carries little relevance or purpose
Information
• Message with sender and receiver
• Meant to change the way the receiver perceives something
• Has an impact on the receiver's judgment / behavior
Data Processing
• Contextualize - why was data gathered?
• Categorize - what are its key
components?
• Calculate - analyze mathematically
• Condense - summarize in more concise
form
Information Processing
• Compare - in kind and in time
• Consequences - how used in decisions /
actions
• Connections - relation to other
information
• Conversation - what other people think
about this information
Database
• Element
• Types
• Structure
• Models
• Creation
• Topology
Element
• Bit, byte, field, record, file, database
• Entity, attribute, key field
• Relation
• Class, object
Database Types
• Business database
• Geographical information database
• Knowledge database / deductive database
• Multimedia database
• Data warehouse
• Data marts
• Multimedia and hypermedia database
• Object-oriented database
Database Structure
• Data definition language
– Schema & subschema
• Data manipulation language
– Structured Query Language (SQL)
– Query By Example (QBE)
• Data dictionary
Database Models
• Hierarchical
– One to many
– TPS or routine MIS
• Network
– Many to many
– TPS or routine MIS
• Relational
– Normalization
– Ad hoc reports or DSS
• Object-oriented
– E-commerce
Database Creation
• Conceptual design
– Logical view
– Entity-relationship (ER) diagram
– Normalization
Entity Relationship Diagram
• Entity: object or concept
• Relationship: meaningful association between objects
• Attribute: property of an object
– Simple & Composite
– Single-valued & multi-valued
– Derived
• Key
– Primary key
– Foreign key
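The entity, attribute, and key terms above can be sketched as table definitions. This is a minimal illustration, assuming Python's built-in sqlite3 module as a convenient way to run SQL; the table layout, the sample rows, and the column name course (used in place of course#, since # is not accepted in unquoted identifiers by every SQL dialect) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Entity "student": stuid is the primary key, the other columns are attributes.
conn.execute("CREATE TABLE student (stuid TEXT PRIMARY KEY, stuname TEXT, major TEXT)")

# Relationship "enrollment": stuid is a foreign key that refers back to student.
conn.execute(
    "CREATE TABLE enrollment ("
    " stuid TEXT REFERENCES student(stuid),"
    " course TEXT,"
    " grade TEXT,"
    " PRIMARY KEY (stuid, course))"
)

conn.execute("INSERT INTO student VALUES ('S114', 'Ann', 'MIS')")
conn.execute("INSERT INTO enrollment VALUES ('S114', 'MIS301', 'A')")

# An enrollment row for a student that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO enrollment VALUES ('S999', 'MIS301', 'B')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

The foreign key is what lets the database itself refuse an enrollment that points at no student.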
Normalization
• A technique for identifying a true primary
key for a relation
• Types
– First normal form: no repeating groups
– Second normal form: every non-primary-key attribute is fully functionally dependent on the entire primary key
– Third normal form: no transitive dependencies
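A small worked case may help with second normal form. The sketch below uses plain Python data structures and invented sample rows; it assumes a flat relation keyed by (stuid, course) in which stuname depends on stuid alone, which is a partial dependency.

```python
# Flat rows keyed by (stuid, course). stuname depends on stuid alone,
# which is a partial dependency and so violates second normal form.
flat = [
    ("S114", "Ann", "MIS301", "A"),
    ("S114", "Ann", "ACT201", "B"),
    ("S115", "Bob", "MIS301", "C"),
]

# Decompose: student facts are stored once per stuid ...
student = {stuid: stuname for stuid, stuname, _, _ in flat}
# ... and enrollment keeps only attributes that depend on the whole key.
enrollment = [(stuid, course, grade) for stuid, _, course, grade in flat]

# Rejoining the two relations reproduces the original rows,
# so nothing was lost by splitting them.
rejoined = [(s, student[s], c, g) for s, c, g in enrollment]
```

After the split, a student's name is stored in one place, so renaming a student touches a single entry rather than every enrollment row.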
Structured Query Language
• Select
• Join
SQL DML - SELECT
SELECT [DISTINCT | ALL] {* | [colexpr [AS newname]] [, ...]}
FROM table-name [alias] [, ...]
[WHERE condition]
[GROUP BY colm [, colm]
[HAVING condition]]
[ORDER BY colm [, colm]]
SQL DML - SELECT
• SELECT attributes (or calculations: +, -, /, *)
FROM relation
• SELECT DISTINCT attributes
FROM relation
Examples
• SELECT stuname
FROM student;
• SELECT stuid, stuname, credit
FROM student;
• SELECT stuid, stuname, credit+10
FROM student;
• SELECT DISTINCT major
FROM student;
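The examples above can be tried against a throwaway table. This is a minimal sketch using Python's built-in sqlite3 module; the column types and the three sample rows are assumptions, and ORDER BY is added only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stuid TEXT, stuname TEXT, major TEXT, credit INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [("S114", "Ann", "MIS", 30), ("S115", "Bob", "ACT", 45), ("S116", "Cy", "MIS", 60)],
)

# One column from every row.
names = conn.execute("SELECT stuname FROM student").fetchall()

# A calculation in the select list: each credit value plus 10.
plus_ten = conn.execute(
    "SELECT stuid, credit + 10 FROM student ORDER BY stuid"
).fetchall()

# DISTINCT collapses the two MIS rows into one.
majors = conn.execute("SELECT DISTINCT major FROM student ORDER BY major").fetchall()
```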
SQL DML - SELECT
• SELECT attributes (or * wild card)
FROM relation
WHERE condition
Examples
• SELECT *
FROM student;
• SELECT stuname, major, credit
FROM student
WHERE stuid = 'S114';
• SELECT *
FROM faculty
WHERE dept = 'MIS';
SELECT - WHERE condition
• AND
• OR
• NOT
• IN
• NOT IN
• BETWEEN
• IS NULL
• IS NOT NULL
• LIKE '%': matches any run of zero or more characters
• LIKE '_': matches exactly one character
Examples
• SELECT *
FROM faculty
WHERE dept = 'MIS' AND rank = 'full professor';
• SELECT *
FROM faculty
WHERE dept = 'MIS' OR rank = 'full professor';
• SELECT *
FROM faculty
WHERE dept = 'MIS' AND NOT rank = 'full professor';
• SELECT *
FROM class
WHERE room LIKE 'B_S%';
• SELECT *
FROM class
WHERE room NOT LIKE 'BUS%';
• SELECT productid, productname
FROM inventory
WHERE onhand BETWEEN 50 AND 100;
• SELECT companyid, companyname
FROM company
WHERE companyname BETWEEN 'G' AND 'K';
• SELECT productid, productname
FROM inventory
WHERE onhand NOT BETWEEN 50 AND 100;
• SELECT companyid, companyname
FROM company
WHERE companyname NOT BETWEEN 'G' AND 'K';
• SELECT facname
FROM faculty
WHERE dept IN ('MIS', 'ACT');
• SELECT facname
FROM faculty
WHERE rank NOT IN ('assistant', 'lecture');
• SELECT customername
FROM customer
WHERE emailadd IS NOT NULL;
• SELECT customername
FROM customer
WHERE creditlimit IS NULL;
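A few of the WHERE conditions above behave in ways that are easy to misread, so a runnable sketch may help. It assumes Python's built-in sqlite3 module and invented sample rows. Note that BETWEEN on text compares whole strings, so a name such as 'Kmart' sorts after 'K' and falls outside the range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class (course TEXT, room TEXT)")
conn.executemany(
    "INSERT INTO class VALUES (?, ?)",
    [("MIS301", "BUS101"), ("ACT201", "BAS20"), ("ENG101", "HUM5")],
)
conn.execute("CREATE TABLE company (companyid INTEGER, companyname TEXT)")
conn.executemany(
    "INSERT INTO company VALUES (?, ?)",
    [(1, "Acme"), (2, "Gap"), (3, "IBM"), (4, "Kmart")],
)
conn.execute("CREATE TABLE customer (customername TEXT, emailadd TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Lee", "lee@example.com"), ("Kim", None)],
)

# '_' stands for exactly one character, '%' for any run of characters.
like_rooms = [r[0] for r in conn.execute(
    "SELECT room FROM class WHERE room LIKE 'B_S%' ORDER BY room")]

# Text range: 'Kmart' is greater than 'K', so it is excluded.
between_names = [r[0] for r in conn.execute(
    "SELECT companyname FROM company "
    "WHERE companyname BETWEEN 'G' AND 'K' ORDER BY companyname")]

# NULL is tested with IS / IS NOT, never with = or <>.
with_email = [r[0] for r in conn.execute(
    "SELECT customername FROM customer WHERE emailadd IS NOT NULL")]
```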
SELECT - aggregate functions
• COUNT(*)
• COUNT
• SUM
• AVG
• MIN
• MAX
Examples
• SELECT COUNT(*)
FROM student;
• SELECT COUNT(major)
FROM student;
• SELECT COUNT(DISTINCT major)
FROM student;
• SELECT COUNT(stuid), SUM(credit),
AVG(credit), MAX(credit),
MIN(credit)
FROM student;
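The difference between the three COUNT forms shows up only when a column contains NULLs or repeated values. The sketch below assumes Python's built-in sqlite3 module and four invented rows, one of which has no major.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stuid TEXT, major TEXT, credit INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("S114", "MIS", 30), ("S115", "ACT", 45), ("S116", "MIS", 60), ("S117", None, 15)],
)

# COUNT(*) counts rows, COUNT(major) skips NULLs,
# COUNT(DISTINCT major) counts each non-NULL value once.
summary = conn.execute(
    "SELECT COUNT(*), COUNT(major), COUNT(DISTINCT major), "
    "SUM(credit), AVG(credit), MAX(credit), MIN(credit) "
    "FROM student"
).fetchone()
```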
SELECT - GROUP
• GROUP BY
• HAVING
Examples
• SELECT major, AVG(credit)
FROM student
GROUP BY major
HAVING COUNT(*) > 2;
• SELECT course#, COUNT(stuid)
FROM enrollment
GROUP BY course#
HAVING COUNT(*) > 2;
• SELECT major, AVG(credit)
FROM student
WHERE major IN ('MIS', 'ACT')
GROUP BY major
HAVING COUNT(*) > 2;
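The last example combines WHERE and HAVING, which filter at different stages: WHERE removes rows before grouping, HAVING removes whole groups afterwards. A minimal sketch, assuming Python's built-in sqlite3 module and invented rows (three MIS, two ACT, three FIN students):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stuid TEXT, major TEXT, credit INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        ("S1", "MIS", 30), ("S2", "MIS", 60), ("S3", "MIS", 90),
        ("S4", "ACT", 40), ("S5", "ACT", 50),
        ("S6", "FIN", 10), ("S7", "FIN", 20), ("S8", "FIN", 30),
    ],
)

# HAVING alone: ACT is dropped because its group has only two rows.
all_groups = conn.execute(
    "SELECT major, AVG(credit) FROM student "
    "GROUP BY major HAVING COUNT(*) > 2 ORDER BY major"
).fetchall()

# WHERE first discards FIN rows, then HAVING discards the small ACT group.
filtered = conn.execute(
    "SELECT major, AVG(credit) FROM student "
    "WHERE major IN ('MIS', 'ACT') "
    "GROUP BY major HAVING COUNT(*) > 2 ORDER BY major"
).fetchall()
```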
SELECT - ORDER BY
• ORDER BY
• ORDER BY ... DESC
Examples
• SELECT facname, rank
FROM faculty
ORDER BY facname;
• SELECT facname, rank
FROM faculty
ORDER BY rank DESC, facname;
SELECT - JOIN Tables
• Multiple tables in FROM clause
• MUST have join conditions; without them the result is a Cartesian product of the tables
Examples
• SELECT stuname, grade
FROM student, enrollment
WHERE student.stuid = enrollment.stuid;
• SELECT enrollment.course#, stuname, major
FROM class, enrollment, student
WHERE class.course# = enrollment.course#
AND enrollment.stuid = student.stuid
AND facid = 'F114'
ORDER BY enrollment.course#;
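To see why the join condition matters, the first example can be run with and without it. This sketch assumes Python's built-in sqlite3 module, invented sample rows, and a column named course in place of course#.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stuid TEXT, stuname TEXT)")
conn.execute("CREATE TABLE enrollment (stuid TEXT, course TEXT, grade TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)", [("S114", "Ann"), ("S115", "Bob")])
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?)",
    [("S114", "MIS301", "A"), ("S114", "ACT201", "B"), ("S115", "MIS301", "C")],
)

# No join condition: every student row is paired with every enrollment row.
unjoined = conn.execute("SELECT stuname, grade FROM student, enrollment").fetchall()

# With the join condition, only rows for the same student are paired.
joined = conn.execute(
    "SELECT stuname, grade FROM student, enrollment "
    "WHERE student.stuid = enrollment.stuid "
    "ORDER BY stuname, grade"
).fetchall()
```

With two students and three enrollments, the unjoined query returns 2 x 3 = 6 rows, half of which pair a student with someone else's grade.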
SUBQUERY, EXISTS, NOT EXISTS
• SELECT s.stuname, major
FROM student s
WHERE EXISTS
(SELECT *
FROM enrollment e
WHERE s.stuid = e.stuid);
• SELECT s.stuname, major
FROM student s
WHERE NOT EXISTS
(SELECT *
FROM enrollment e
WHERE s.stuid = e.stuid);
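The two correlated subqueries above split the student table into those with at least one enrollment and those with none. A minimal sketch, assuming Python's built-in sqlite3 module and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stuid TEXT, stuname TEXT, major TEXT)")
conn.execute("CREATE TABLE enrollment (stuid TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("S114", "Ann", "MIS"), ("S115", "Bob", "ACT"), ("S116", "Cy", "MIS")],
)
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?)",
    [("S114", "MIS301"), ("S114", "ACT201"), ("S115", "MIS301")],
)

# The subquery is evaluated per student row; EXISTS is true if it returns any row.
enrolled = [r[0] for r in conn.execute(
    "SELECT s.stuname FROM student s "
    "WHERE EXISTS (SELECT * FROM enrollment e WHERE s.stuid = e.stuid) "
    "ORDER BY s.stuname")]

not_enrolled = [r[0] for r in conn.execute(
    "SELECT s.stuname FROM student s "
    "WHERE NOT EXISTS (SELECT * FROM enrollment e WHERE s.stuid = e.stuid) "
    "ORDER BY s.stuname")]
```

Ann appears once even though she has two enrollments, because EXISTS only asks whether a matching row is present.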
Database Creation
• Physical design
– Physical view
– Data topology (organization)
• Centralized
• Distributed database
– Replicated database
– Partitioned
• Organization & access method
– Sequential file
– Indexed sequential file
– Direct or random file
• Security
– Logical, physical, and transmission
Selection Criteria
• Users' needs (type of application)
• Compatibility
• Portability
• Reliability
• Cost
• Features
• Performance
• Vendor's support
• Others?
Data Administrator
• Clean up data definitions
• Control shared data
• Manage distributed data
• Maintain data quality
Clean Up Definitions
• Synonyms / aliases
• Standard data definitions
– Names and formats
• Data Dictionary
– Active
– Integrated
Control Shared Data
• Local - used by one unit
• Shared - used by two or more activities
• Impact of proposed program changes on
shared data
• Program-to-data element matrix
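A program-to-data element matrix can be sketched as a mapping from each program to the data elements it uses. The program and element names below are hypothetical; the point is that the matrix answers two questions from this slide: which elements are shared, and which programs a proposed change would touch.

```python
# Hypothetical matrix: each program maps to the data elements it reads or writes.
matrix = {
    "payroll": {"empid", "salary"},
    "directory": {"empid", "empname"},
    "inventory": {"productid", "onhand"},
}

def programs_using(element):
    """Return the programs affected if this data element changes."""
    return sorted(p for p, elements in matrix.items() if element in elements)

# Shared data: elements used by two or more programs.
all_elements = set().union(*matrix.values())
shared = sorted(e for e in all_elements if len(programs_using(e)) >= 2)
```

Here a change to empid would affect both directory and payroll, while salary is local to payroll.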
Manage Distributed Data
• Geographically dispersed
– Whether shared data or not
• Different levels of detail
– Different management levels
Maintain Data Quality
• Put owners in charge of data
– Verify data accuracy and quality
• Purge old data
The DBMS
Database management system (DBMS): software that permits a firm to:
– Centralize data
– Manage them efficiently
– Provide access to applications
• Such as payroll, inventory
DBMS Components
• Data Definition Language (DDL)
• Data Manipulation Language (DML)
• Inquiry Language (IQL)
• Teleprocessing Interface (TP)
Definitions
• Views:
– Physical - how stored
– Logical - how viewed and used by users
• Schema - Overall logical layout of records
and fields in a database
• Subschema: Individual user’s logical
portion of database (view)
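One common way to realize a subschema is an SQL view. The sketch below assumes Python's built-in sqlite3 module; the employee table, the directory view, and the sample rows are hypothetical. The view gives a directory user a logical portion of the data with the salary column left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the overall logical layout of records and fields.
conn.execute("CREATE TABLE employee (empid TEXT, empname TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [("E1", "Lee", "MIS", 70000), ("E2", "Kim", "ACT", 65000)],
)

# Subschema: a view exposing only what one class of user needs.
conn.execute("CREATE VIEW directory AS SELECT empname, dept FROM employee")

cursor = conn.execute("SELECT * FROM directory ORDER BY empname")
directory_columns = [col[0] for col in cursor.description]
directory_rows = cursor.fetchall()
```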
Distributing Data
• Centralized files
• Fragmented files
– Distribute data without duplication
– Users unaware of where data is located
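The idea of fragmented files can be illustrated with a toy lookup. This is only a sketch in plain Python: the two sites and their records are invented, and a real distributed DBMS would route requests through a catalog rather than scanning every site.

```python
# Two hypothetical sites, each holding a disjoint fragment of the customer file.
site_east = {"C1": "Lee", "C2": "Kim"}
site_west = {"C3": "Ortiz"}
fragments = [site_east, site_west]

def find_customer(custid):
    """Look up a customer without the caller knowing which site holds the record."""
    for fragment in fragments:
        if custid in fragment:
            return fragment[custid]
    return None

# Distribution without duplication: no key is stored at both sites.
no_duplicates = not (site_east.keys() & site_west.keys())
```

The caller asks for a customer ID only; which site answers is hidden behind find_customer.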
Distributing Data
• Replicated files
– Data duplicated
– One site has master file
– Problem with data synchronization
• Decentralized files
– Local data autonomy
Distributing Data
• Distributed files
– Client / server systems
– Stored centrally
– Portion downloaded to workstation
– Workstation can change data
– Changes uploaded to central computer
Data Warehousing
• Collect large amounts of data from
multiple sources over several years
• Classify each record into multiple
categories
– Age
– Location
– Gender
Data Warehousing
• Rapidly select and retrieve by multiple
dimensions
– All females in Chicago under 25 years of
age
• Provide tailored, on-demand reports
• Data mart: a replicated subset of the data
warehouse
– A functional or regional area
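Selecting by multiple dimensions, and carving out a data mart, can both be sketched in SQL. The example assumes Python's built-in sqlite3 module; the customer_fact table, the chicago_mart table, and the rows are hypothetical stand-ins for a real warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_fact (custid TEXT, gender TEXT, city TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customer_fact VALUES (?, ?, ?, ?)",
    [
        ("C1", "F", "Chicago", 22),
        ("C2", "F", "Chicago", 31),
        ("C3", "M", "Chicago", 24),
        ("C4", "F", "Boston", 19),
    ],
)

# Three dimensions at once: all females in Chicago under 25 years of age.
selected = conn.execute(
    "SELECT custid FROM customer_fact "
    "WHERE gender = 'F' AND city = 'Chicago' AND age < 25"
).fetchall()

# A regional data mart: a copied subset of the warehouse for one area.
conn.execute(
    "CREATE TABLE chicago_mart AS SELECT * FROM customer_fact WHERE city = 'Chicago'"
)
mart_size = conn.execute("SELECT COUNT(*) FROM chicago_mart").fetchone()[0]
```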
Data Mining
• Fitting models to, or determining
patterns from, warehoused data
• Purposes:
– Analyze large amounts of data
– Find critical points of knowledge
– Perform automatic analyses
Data Mining Terms
• Data Visualization
• Drill-down Analysis
– Hierarchical structure
– Leads to increasing level of detail
• Expert System (ES) methodology
– e.g., neural networks
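Drill-down analysis, as described above, moves from a summary to the detail beneath it along a hierarchy. A minimal sketch, assuming Python's built-in sqlite3 module and a hypothetical sales table with a region-to-city hierarchy:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Midwest", "Chicago", 100),
        ("Midwest", "Chicago", 50),
        ("Midwest", "Detroit", 70),
        ("East", "Boston", 40),
    ],
)

# Top level of the hierarchy: totals per region.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Drill down into one region: the same total broken out by city.
midwest_by_city = conn.execute(
    "SELECT city, SUM(amount) FROM sales "
    "WHERE region = ? GROUP BY city ORDER BY city",
    ("Midwest",),
).fetchall()
```

The city totals for the Midwest add up to the regional figure, which is what makes each step a more detailed view of the same number.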
Applications
• Finance - fraud detection
• Stock Market - forecasting
• Real estate - property evaluation
• Airlines - customer retention
• Retail - customer targeting
Data Mining Example
• What types of customers are buying specific products?
• When are customers most likely to shop?
• What types of products can be sold
together?
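The last question, which products can be sold together, can be approached in its simplest form by counting how often pairs of products appear in the same purchase. The sketch below uses plain Python with invented baskets; it is a basic co-occurrence count, not a full association-rule algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records, one list of products per transaction.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "milk"],
]

# Count every unordered pair of distinct products bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
```

Sorting each basket first means ("bread", "milk") and ("milk", "bread") are counted as the same pair.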
Points to Remember
• Information processing
• Database
• Data Administrator
• The DBMS
• Distributing data
• Data warehousing and data mining
Discussion Questions
• How can a database help an organization?
• Why is normalization so important when building a database?
• Do you see any problems with the database in your organization?
Discussion Questions
• What kind of database model is most suitable for
– School?
– Department store?
– Police?
• Some organizations are hesitant to distribute data.
These organizations feel that they may lose control.
– Do they lose control? Why?
– Could you suggest a “good” tactic?
• Could Data Mining pose a threat to individual
privacy?
– Why or why not?
– If so, how can we mitigate that threat?
– Do the advantages outweigh the disadvantages?
Assignment
• Review chapter 10
• Read chapters 8, 9, and 11
• Group assignment
• Research paper