Chapter 7: Databases and Data Warehouses
Oz (5th edition)
Ideas From the First Part of Chapter 7
• Problems with the traditional file approach
(pre-database)
– Data redundancy
– Data integrity
– Data security
– Program data dependence
– Programmers are required to access data
• Advantages of the database approach
– Reductions in data redundancy
– Application-data independence
– Better control; better security
– Flexibility
More Ideas
• Object-oriented database model
– Successor to the relational model
– Integration of data and programs
– Handles wider variety of field types
• Entity-relationship (ER) diagrams
– Graphical method of displaying relationships
between tables
– An ER diagram is an example of a schema
(conceptual model of the database)
– Tool for IS professionals
CREATING A DATABASE ENVIRONMENT
An Entity-Relationship Diagram
Physical versus Logical Views
• In managing information, physical deals with the
structure of information as it resides on various
storage media.
• Logical deals with how knowledge workers view
their information needs, and includes such terms
as:
– CHARACTER - our smallest unit of
information.
– FIELD - group of related characters.
– RECORD - group of related fields.
– FILE - group of related records.
– DATABASE - group of logically associated
files.
– DATA WAREHOUSE - information from many
databases.
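The logical hierarchy above can be sketched in a few lines of Python. This is only an illustration: the record layout, the names, and the sample values are hypothetical and do not come from the textbook.

```python
from dataclasses import dataclass

# Hypothetical record layout: each attribute is a FIELD (a group of characters).
@dataclass
class CustomerRecord:
    customer_id: str
    name: str
    city: str

# A FILE is a group of related RECORDS (here, a plain list).
customer_file = [
    CustomerRecord("C001", "Ada Park", "Denver"),
    CustomerRecord("C002", "Luis Ortiz", "Boise"),
]

# A DATABASE is a group of logically associated files.
database = {"customers": customer_file, "orders": []}

first = database["customers"][0]
print(first.name)      # one FIELD of one RECORD
print(first.name[0])   # one CHARACTER of that field
```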
Other Logical Structures in a Database
• DATA DICTIONARY - contains the logical structure
of information in a database.
– Definitions of all fields, records, and tables
– Relationships between tables
– Who is responsible for maintaining data in the
database
– Descriptions of who is authorized to access
different parts of the database
• The data dictionary contains metadata (data about
the data)
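As a rough illustration of metadata, the sketch below reads the catalog that SQLite keeps about its own tables. It assumes SQLite through Python's built-in sqlite3 module (the slides use Access), and the customer table is hypothetical. SQLite's catalog covers only field and table definitions; it does not record who maintains the data or who may access it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only to have something to describe.
conn.execute(
    "CREATE TABLE customer ("
    "customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)

# sqlite_master is SQLite's built-in catalog of tables and other objects.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per field.
columns = [
    (row[1], row[2], bool(row[3]))
    for row in conn.execute("PRAGMA table_info(customer)")
]

print(tables)
for name, col_type, required in columns:
    print(name, col_type, "required" if required else "optional")
```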
Sample Data Dictionary Report
Components of a DBMS
• Data definition subsystem (language; DDL)
– Defines the structure of the database tables
(design view in Access)
– Creates and maintains the data dictionary
– Defines the relationships between tables
– Add, delete, or modify field properties
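A minimal sketch of data definition statements, assuming SQLite through Python's sqlite3 module rather than the Access design view mentioned above. The department and employee tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces relationships only when asked

conn.executescript("""
-- Define the structure of two tables.
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL
);
-- REFERENCES defines the relationship between the tables.
CREATE TABLE employee (
    emp_id   INTEGER PRIMARY KEY,
    emp_name TEXT NOT NULL,
    dept_id  INTEGER REFERENCES department(dept_id)
);
-- Modify the structure later by adding a field.
ALTER TABLE employee ADD COLUMN hire_date TEXT;
""")

employee_columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
print(employee_columns)
```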
More Components of a DBMS
• Data manipulation subsystem (language; DML)
– Add, delete, and modify data in the database
– Contains the query languages (QBE or SQL)
for the database. SQL is both a DML and DDL
– Contains report generation capability
• Data administration subsystem
– Manage the overall database environment by
providing facilities for:
• Backup and recovery
• Security management
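The sketch below shows the data manipulation statements (add, modify, delete, query) and one administration task (backup). It assumes SQLite through Python's sqlite3 module; the product table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, price REAL)")

# DML: add, modify, and delete data.
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("A1", 9.50), ("B2", 20.00), ("C3", 4.25)],
)
conn.execute("UPDATE product SET price = 10.00 WHERE sku = 'A1'")
conn.execute("DELETE FROM product WHERE sku = 'C3'")
conn.commit()

# DML: query the data with SQL.
rows = conn.execute("SELECT sku, price FROM product ORDER BY sku").fetchall()
print(rows)

# Data administration: copy the whole database to a second connection.
backup_conn = sqlite3.connect(":memory:")
conn.backup(backup_conn)
backup_rows = backup_conn.execute(
    "SELECT sku, price FROM product ORDER BY sku"
).fetchall()
```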
Data Warehouse
– Definition: a database, with supporting tools, that stores
current and historical data and is designed to
support the business analysis activities and decision-making
tasks of managers; typically a relational
database model is used
– Benefits
• improved access
• improved information
• isolation from operational systems
• tools permit advanced data analysis
– Users
– Data marts
Building a Data Warehouse (ETL)
• Extraction phase – create files on the computer that
will store the data warehouse and move transaction
data to this machine; data may come from many
sources or parts of the organization
• Transformation phase – cleanse and standardize the
data. Why is this necessary?
• Load phase – transfer the data from the
transformation phase into the data warehouse
• The ETL process becomes automated to make
regular transfers of transaction data into the data
warehouse
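The three phases can be sketched as follows. This is an assumption-laden illustration, not the textbook's procedure: the two source systems, their field names, and the sales_fact table are all hypothetical. The transform step hints at why cleansing is needed: the sources name the same fields differently and record amounts in different units.

```python
import sqlite3

# Hypothetical extracts from two source systems with inconsistent formats.
store_sales = [{"cust": " alice ", "amount": "19.99", "date": "2024-01-05"}]
web_sales = [{"customer": "BOB", "total_cents": 2500, "day": "2024-01-06"}]

def extract():
    """Extraction: gather transaction data from every source."""
    return store_sales, web_sales

def transform(store_rows, web_rows):
    """Transformation: cleanse names and standardize amounts to dollars."""
    cleaned = []
    for r in store_rows:
        cleaned.append((r["cust"].strip().title(), float(r["amount"]), r["date"]))
    for r in web_rows:
        cleaned.append((r["customer"].strip().title(), r["total_cents"] / 100, r["day"]))
    return cleaned

def load(conn, rows):
    """Load: transfer the standardized rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(*extract()))
loaded = warehouse.execute(
    "SELECT customer, amount, sale_date FROM sales_fact ORDER BY sale_date"
).fetchall()
print(loaded)
```

Automating the process would amount to running the same three calls on a schedule.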
Comparison of Data in a Data Warehouse
and Operational Data
• Operational Data
– Data is on many systems
– Current operational data
– Inconsistent data definitions
– Functionally organized data
– Data are constantly changing
– Support OLTP
• Warehouse Data
– Integrated in one enterprise-wide system
– Recent and historical data
– Consistent data definitions
– Data are organized around business entities
– Data are stabilized
– Support OLAP
Data-Mining and Data-Mining Tools
• Data-mining is the process of selecting, exploring, and
modeling large amounts of data to discover previously
unknown relationships that support decision making.
• Traditional data mining tools answer questions about
variables that we think are related
– Query languages (QBE or SQL)
– Report generators
– Multidimensional analysis tools (OLAP or pivot
tables)
– Standard statistical procedures (regression,
ANOVA)
• Knowledge discovery tools are data-mining tools for
finding relationships that are not discernible to the
human eye (see next slide)
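As an example of a traditional tool, the sketch below fits a simple least-squares regression between two variables that the analyst already suspects are related. The variables and data points are hypothetical, and the fit is written out by hand instead of using a statistics package.

```python
# Hypothetical observations: advertising spend (x) and unit sales (y).
ad_spend = [1.0, 2.0, 3.0, 4.0]
unit_sales = [3.0, 5.0, 7.0, 9.0]

def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares(ad_spend, unit_sales)
print(f"unit_sales ~ {slope:.2f} * ad_spend + {intercept:.2f}")
```

The analyst chose both variables in advance; a knowledge discovery tool would instead search for relationships nobody thought to test.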
Data-Mining as Knowledge Discovery:
Selected Examples
Figure 8.22 Potential applications of data-mining
Multidimensionality
• Multidimensional data analysis (OLAP) enables
users to view data using various dimensions,
measures, and time frames
– dimensions: categories such as products, business
units, country, industry
– measures: money, unit sales, head count,
variances
– time: daily, weekly, monthly, quarterly, yearly
• This type of analysis also provides the ability to
view data in different ways (tables, charts, 3-D,
geographically)
• OLAP tools provide for this
• Pivot tables in Excel or Access
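A pivot table can be approximated in plain Python to show how one measure is viewed along different dimensions. The fact rows and the pivot helper below are hypothetical; real OLAP tools add drill-down, charting, and much larger data volumes.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, country, quarter, unit_sales).
facts = [
    ("Widget", "US", "Q1", 100),
    ("Widget", "US", "Q2", 120),
    ("Widget", "CA", "Q1", 40),
    ("Gadget", "US", "Q1", 70),
    ("Gadget", "CA", "Q2", 30),
]
DIMENSIONS = {"product": 0, "country": 1, "quarter": 2}

def pivot(rows, row_dim, col_dim):
    """Sum the unit_sales measure for each (row_dim, col_dim) cell."""
    table = defaultdict(lambda: defaultdict(int))
    r, c = DIMENSIONS[row_dim], DIMENSIONS[col_dim]
    for row in rows:
        table[row[r]][row[c]] += row[3]
    return {key: dict(cells) for key, cells in table.items()}

# The same measure viewed along two different pairs of dimensions.
by_product_quarter = pivot(facts, "product", "quarter")
by_country_quarter = pivot(facts, "country", "quarter")
print(by_product_quarter)
print(by_country_quarter)
```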
Examples of OLAP Tools
• Go to www.fedscope.opm.gov
– Under data cubes on entry page click on
employment
– Demonstrate drill down and adding charts
– Data for this example comes from the Central
Personnel Data File (CPDF) of the federal
government
– The OLAP tool used to build this site is from a
company named Cognos (PowerPlay)
• OLAP tools based on Excel
– http://wLCubed.com
– http://www.cubularity.com
Multidimensionality
Database Architecture: The Physical and Logical Layout
of the Hardware, Data, and Applications
• Centralized databases with remote access
• Distributed Databases
– With replication a full copy of the entire database
is stored at all sites
– With fragmentation the database is partitioned
• Parts of database are stored where they are
most often accessed
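The difference between replication and fragmentation can be sketched with plain Python lists standing in for sites. The customer rows, the site names, and the rule of fragmenting by region are assumptions made for illustration.

```python
# Hypothetical customer rows and site names.
customers = [
    {"id": 1, "name": "Ada", "region": "east"},
    {"id": 2, "name": "Luis", "region": "west"},
    {"id": 3, "name": "Mei", "region": "east"},
]
SITES = ["east", "west"]

def replicate(rows, sites):
    """Replication: every site stores a full copy of the database."""
    return {site: list(rows) for site in sites}

def fragment(rows, sites):
    """Fragmentation: each site stores only the rows it uses most (its region)."""
    return {site: [r for r in rows if r["region"] == site] for site in sites}

replicated = replicate(customers, SITES)
fragmented = fragment(customers, SITES)

print({site: len(rows) for site, rows in replicated.items()})
print({site: [r["id"] for r in rows] for site, rows in fragmented.items()})
```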
Web Databases
• The ease of use of Web browsers enables firms to
link their databases to the Web
• Ease of use enables users to
– Access and retrieve information from a database
– Enter information into the database
• The user requires no special training in a DBMS to
perform the above activities; prior to the browser
and the Web this would not have been feasible
• What does this mean?
Federal Trade Commission’s Fair
Information Practice Principles* (1973)
• Notice/awareness – disclosure of practices before
collecting data
• Choice/consent – opt in/opt out for consumers
• Access/participation – consumers can review and
contest data for accuracy and completeness
• Security – data collectors must take steps to keep
data accurate and to secure it against unauthorized use
• Enforcement – there must be a mechanism in place
to enforce FIP principles
• *Laws enforce these principles for data collected by
federal agencies; not so in the private sector
Spreadsheets Versus DBMS
• Linkage between elements
– spreadsheet - between cells in same table
– DBMS - between elements in different tables
• Orientation
– spreadsheet is toward calculations
– DBMS is tilted toward organization and linkage
of data elements in different tables
• Capabilities
– DBMS has extensive querying and reporting
power
– spreadsheet is limited
• Memory requirements
– entire spreadsheet table must be in memory
– not true for the database table
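The linkage and querying points can be illustrated with a query that joins two tables, something a spreadsheet confined to one table cannot express directly. The sketch assumes SQLite through Python's sqlite3 module; the customer and orders tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customer VALUES (1, 'Ada'), (2, 'Luis');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")

# Link elements in different tables through the shared customer_id field.
totals = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(totals)
```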