Turban, Aronson, and Liang
Decision Support Systems and Intelligent Systems,
Seventh Edition
Chapter 5
Business Intelligence: Data
Warehousing, Data Acquisition, Data
Mining, Business Analytics, and
Visualization
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-1
Learning Objectives
• Describe the issues in management of data.
• Understand the concepts and use of DBMS.
• Learn about data warehousing and data marts.
• Explain business intelligence/business analytics.
• Examine how decision making can be improved
through data manipulation and analytics.
• Understand the interaction between the Web and
database technologies.
• Explain how database technologies are used in
business analytics.
• Understand the impact of the Web on business
intelligence and analytics.
Data, Information, Knowledge
• Data
– Items that are the most elementary descriptions
of things, events, activities, and transactions
– May be internal or external
• Information
– Organized data that has meaning and value
• Knowledge
– Processed data or information that conveys
understanding or learning applicable to a
problem or activity
Data
• Raw data collected manually or by
instruments
• Quality is critical
– Quality determines usefulness
• Contextual data quality
• Intrinsic data quality
• Accessibility data quality
• Representation data quality
– Often neglected or casually handled
– Problems exposed when data is summarized
Data
• Cleanse data
– When populating warehouse
– Data quality action plan
– Best practices for data quality
– Measure results
• Data integrity issues
– Uniformity
– Version
– Completeness check
– Conformity check
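The completeness and conformity checks listed above could be sketched roughly as follows. This is only an illustration, not a method from the text: the record layout, the field names (`customer_id`, `state`, `order_date`), and the two-letter state rule are all assumptions chosen for the example.

```python
import re

# Hypothetical records; field names and values are invented for illustration.
records = [
    {"customer_id": 1, "state": "NY", "order_date": "2004-03-15"},
    {"customer_id": 2, "state": "new york", "order_date": "2004-03-16"},
    {"customer_id": 3, "state": "CA", "order_date": None},
]

REQUIRED_FIELDS = ("customer_id", "state", "order_date")
STATE_PATTERN = re.compile(r"[A-Z]{2}")  # assumed conformity rule


def completeness_errors(record):
    """Return the required fields that are missing or empty."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


def conformity_errors(record):
    """Return the fields whose values break the assumed format rule."""
    errors = []
    state = record.get("state")
    if state is not None and not STATE_PATTERN.fullmatch(state):
        errors.append("state")
    return errors


# One small report per record, keyed by the (hypothetical) customer id.
report = {
    r["customer_id"]: {
        "incomplete": completeness_errors(r),
        "nonconforming": conformity_errors(r),
    }
    for r in records
}
```

Running such checks while the warehouse is being populated is one way to measure the results of a data quality plan.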
Describe the role of the Internet in MSS data
management and business intelligence.
• The role of the Internet in MSS data
management and business intelligence is
increasing. Currently, database vendors are
providing Web hooks that allow their
databases to provide data directly in
HTML or XML format, and Web browsers
are used to access databases. Most
business intelligence tools permit access
to data warehouses via the Internet or
company intranet.
• List the major categories of data
sources for an MSS/BI.
Internal sources, usually the reporting
systems of the functional areas;
external sources (commercial
databases, government and industry
reports, etc.); and personal data.
• Describe the benefits of commercial
databases.
They provide external data in a timely
manner and at a reasonable cost.
Because of economies of scale, such
services are comprehensive and
inexpensive.
Database Management Systems
• Supplements operating system
• Manages data
• Queries data and generates reports
• Data security
• Combines with modeling language for
construction of DSS
Database Models
• Hierarchical
– Top down, like inverted tree
– Fields have only one “parent”, each “parent” can have multiple
“children”
– Fast
• Network
– Relationships created through linked lists, using pointers
– “Children” can have multiple “parents”
– Greater flexibility, substantial overhead
• Relational
– Flat, two-dimensional tables with multiple access queries
– Examines relations between multiple tables
– Flexible, quick, and extendable with data independence
• Object oriented
– Data analyzed at conceptual level
– Inheritance, abstraction, encapsulation
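The difference between the hierarchical model (one parent per child) and the network model (several parents per child) can be illustrated with plain Python dictionaries. This is a rough sketch only: the record names and the helper functions are hypothetical, and real hierarchical or network DBMSs use pointer structures rather than dictionaries.

```python
# Hierarchical model: each child record has exactly one parent.
# All record names here are hypothetical, chosen only for illustration.
hierarchical_parent = {
    "Order-1": "Customer-A",
    "Order-2": "Customer-A",
    "Order-3": "Customer-B",
}

# Network model: a child record may be linked to several parents.
network_parents = {
    "Part-X": ["Supplier-1", "Supplier-2"],
    "Part-Y": ["Supplier-2"],
}


def children_of(parent, parent_map):
    """List the children of one parent in a hierarchical (one-parent) map."""
    return sorted(c for c, p in parent_map.items() if p == parent)


def children_of_network(parent, parents_map):
    """List the children linked to one parent in a network (many-parent) map."""
    return sorted(c for c, ps in parents_map.items() if parent in ps)


customer_a_orders = children_of("Customer-A", hierarchical_parent)
supplier_2_parts = children_of_network("Supplier-2", network_parents)
```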
Database Models, continued
• Multimedia Based
– Multiple data formats
• JPEG, GIF, bitmap, PNG, sound, video, virtual reality
– Requires specific hardware for full feature
availability
• Document Based
– Document storage and management
• Define document management.
Document management involves managing
what were once paper documents in a firm. It is
generally a computerized system that provides
access to the most recent versions of important
documents (policies, methods, etc.), restricts
access to appropriate employees, allows updates
by key people, and performs archiving.
• Define object-oriented database
management.
Based on object-oriented
programming: using symbols and
icons, it can handle very complex data
structures and show hierarchies and
complex relationships.
• What is SQL? Why is it important?
SQL (Structured Query Language) is a
nonprocedural language for data
manipulation in a relational DBMS. It
can be used to query a database, to
exercise DBMS operations, and to
perform database administration
functions. It is a standard used by
database vendors to permit access to
relational databases.
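A short sketch of what "nonprocedural" means in practice, using Python's built-in `sqlite3` module as a stand-in relational DBMS. The `orders` table and its rows are invented for illustration.

```python
import sqlite3

# In-memory database; the table and its rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("East", 120.0), ("West", 80.0), ("East", 50.0)],
)

# Nonprocedural: the query states WHAT is wanted (total per region),
# not HOW to loop over the rows to compute it.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
conn.close()
```

The same `SELECT` statement would run largely unchanged on other relational products, which is the practical benefit of SQL being a vendor standard.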
What is the difference between a database and a data
warehouse?
Technically, a data warehouse is a database. However, a
data warehouse is an integrated, time-variant, nonvolatile,
subject-oriented repository of detail and summary data used for
decision support and business analytics within an organization.
“Database” is the term typically used to describe operational
data stores, which are transactional in structure. As a result,
databases are usually highly normalized, whereas data
warehouses are highly denormalized.
Data Warehouse
• A data warehouse is a database that is physically
separate from a company’s operational
environments. Its purpose is to provide decision
support from its data repository, making
operational data accessible in a form that is readily
acceptable for decision support and other users’
applications. Data warehousing is the process of
taking internal data, cleansing it, and storing it in a
data warehouse where it can be accessed by
various decision makers in the decision-making
process. External information is also brought into
the data warehouse.
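The take, cleanse, and store sequence described above might look roughly like this in miniature. It is a sketch under stated assumptions: the raw rows, the cleansing rules in `cleanse`, and the `sales` table are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical rows pulled from an operational system; values are invented.
operational_rows = [
    ("  Acme Corp ", "ny", "1,200.50"),
    ("Acme Corp", "NY", "300"),
    ("Globex", "ca", "75.25"),
]


def cleanse(row):
    """Standardize one raw row: trim names, uppercase states, parse amounts."""
    name, state, amount = row
    return (name.strip(), state.strip().upper(), float(amount.replace(",", "")))


# The warehouse is a separate database from the operational source.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, state TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)", [cleanse(r) for r in operational_rows]
)

# Decision makers query the cleansed copy, not the operational system.
loaded = warehouse.execute(
    "SELECT customer, state, amount FROM sales ORDER BY customer, amount"
).fetchall()
warehouse.close()
```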
Data Warehouse
• Subject oriented
• Scrubbed so that data from heterogeneous sources are
standardized
• Nonvolatile
– Read only
• Summarized
• Not normalized; may be redundant
• Data from both internal and external sources is present
• Metadata included
– Data about data
• Business metadata
• Semantic metadata
Architecture
• May have one or more tiers
– Determined by warehouse, data
acquisition (back end), and client (front
end)
• One tier, where all run on same platform, is
rare
• Two tier usually combines DSS engine
(client) with warehouse
– More economical
• Three tier separates these functional parts
Data Warehouse Design
• Dimensional modeling
– Retrieval based
– Implemented by star schema
• Central fact table
• Dimension tables
Star Schema
• A star schema is a technique used to define the
structure of a data warehouse. It consists of two
components: dimension tables (which define the
criteria by which data will be retrieved; e.g.,
location, product, time) and fact tables (the data
that is of interest to the organization). Facts can
be highly summarized or detail data.
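A minimal star schema can be sketched with Python's `sqlite3` module. The table names (`dim_product`, `dim_location`, `dim_time`, `fact_sales`), the columns, and the sample rows are assumptions made for illustration. They follow the location, product, and time dimensions mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: the criteria by which data is retrieved.
    CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_time     (time_id     INTEGER PRIMARY KEY, year INTEGER);

    -- Central fact table: the measures, keyed by the dimensions.
    CREATE TABLE fact_sales (
        product_id  INTEGER REFERENCES dim_product(product_id),
        location_id INTEGER REFERENCES dim_location(location_id),
        time_id     INTEGER REFERENCES dim_time(time_id),
        units       INTEGER
    );

    INSERT INTO dim_product  VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_location VALUES (1, 'Boston'), (2, 'Denver');
    INSERT INTO dim_time     VALUES (1, 2003), (2, 2004);
    INSERT INTO fact_sales   VALUES (1, 1, 1, 10), (1, 2, 2, 5), (2, 1, 2, 7);
    """
)

# Retrieve facts by joining the fact table to one of its dimensions.
units_by_product = conn.execute(
    """
    SELECT p.name, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
    """
).fetchall()
conn.close()
```

Swapping `dim_product` for `dim_location` or `dim_time` in the join retrieves the same facts by a different criterion, which is the point of the star layout.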
• Describe the role that a data warehouse can play
in MSS. List its benefits.
The data contained in a data warehouse has
been cleansed and thus has little redundancy and
a higher level of integrity. This gives a higher level
of confidence in the decisions made based on the
data contained in the warehouse. Benefits include
a common storage format, quick access to data for
strategic use, and accurate data.
Data Marts
A data mart is a small data warehouse
designed for a strategic business unit
(SBU) or a department. Data marts can
be either dependent or independent. They
are important because they can be a
cost-effective way to determine the benefits of a
data warehouse to an organization.
Data Marts
• Dependent
– Created from warehouse
– Replicated
• Functional subset of warehouse
• Independent
– Scaled down, less expensive version of data
warehouse
– Designed for a department or SBU
– Organization may have multiple data marts
• Difficult to integrate
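A dependent data mart, being a replicated functional subset of the warehouse, could be sketched as below. The `warehouse_sales` and `marketing_mart` tables and their rows are hypothetical. For brevity both live in one in-memory SQLite database, whereas a real data mart would usually sit in its own database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales (department TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("Marketing", "Widget", 100.0),
        ("Finance", "Audit", 40.0),
        ("Marketing", "Gadget", 60.0),
    ],
)

# Dependent data mart: a functional subset replicated from the warehouse.
conn.execute(
    "CREATE TABLE marketing_mart AS "
    "SELECT product, amount FROM warehouse_sales WHERE department = 'Marketing'"
)
mart_rows = conn.execute(
    "SELECT product, amount FROM marketing_mart ORDER BY product"
).fetchall()
conn.close()
```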
Business Intelligence and Analytics
• Business intelligence
– Acquisition of data and information for
use in decision-making activities
• Business analytics
– Models and solution methods
• Data mining
– Applying models and methods to data to
identify patterns and trends
OLAP
• OLAP is the “online analytical
processing” of data. It allows a user to
tap into raw data and perform detailed
and complex analysis directly on the
client machine, without resorting to
back-end processing.
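Two typical OLAP-style operations, roll-up and slice, can be sketched in plain Python running entirely on the client side. The fact rows, the dimension names, and the helper functions `roll_up` and `slice_rows` are assumptions made for illustration. Actual OLAP tools provide far richer operations.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, product, quarter, amount).
facts = [
    ("East", "Widget", "Q1", 100),
    ("East", "Gadget", "Q1", 40),
    ("West", "Widget", "Q1", 70),
    ("East", "Widget", "Q2", 90),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(rows, by):
    """Sum the amount over every dimension except those named in `by`."""
    idx = [DIMENSIONS.index(d) for d in by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_rows(rows, dimension, value):
    """Keep only the rows where one dimension equals a fixed value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in rows if row[i] == value]


# Roll up to region level, then slice on East and view by quarter.
by_region = roll_up(facts, by=("region",))
east_by_quarter = roll_up(slice_rows(facts, "region", "East"), by=("quarter",))
```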
OLAP
• Activities performed by end users in online
systems
– Specific, open-ended query generation
• SQL
– Statistical analysis
– Building DSS applications
• Special class of tools
– DSS/BI front ends
– Data access front ends
– Database front ends
– Visual information access systems
Data Mining
• Organizes and employs information and
knowledge from databases
• Statistical, mathematical, artificial
intelligence, and machine-learning
techniques
• Automatic and fast
• Tools look for patterns
– Simple models
– Intermediate models
– Complex models
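As one example of a simple pattern-finding model, the sketch below counts which pairs of items appear together often in a set of transactions. The technique (frequent pair counting with a minimum support threshold) is chosen here for illustration and is not named in the text. The transactions and the 0.5 threshold are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "milk", "eggs"},
]

MIN_SUPPORT = 0.5  # assumed threshold: pair appears in at least half the baskets

# Count every unordered pair of items that occurs in the same basket.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Keep the pairs whose support (share of baskets) meets the threshold.
frequent_pairs = {
    pair: count / len(transactions)
    for pair, count in pair_counts.items()
    if count / len(transactions) >= MIN_SUPPORT
}
```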
• Differentiate data mining, text mining, and Web
mining.
Text mining involves analyzing vast amounts of
textual data to determine patterns or correlations
within the text. Data mining is a broader subject
encompassing all types of information contained
within an organization. Web mining extends data
mining to include Web resources in the
determination of correlations or patterns with
organizational data.
Data Visualization
• Technologies supporting visualization
and interpretation
– Digital imaging, GIS, GUI, tables,
multidimensions, graphs, VR, 3D,
animation
– Identify relationships and trends
• Data manipulation allows a real-time
look at performance data