CHAPTER 4
Data, Information, and Knowledge Management
1
Opening Case
COFCO China Foods Limited Adopts a New
Database Management System
COFCO Group is one of China’s largest enterprises, with over
100,000 employees, and it specializes in importing and exporting food
products. One of its subsidiaries, China Foods Limited, is the market
leader in the food industry, selling wines, oils, confectionery, and other
food products.
The Business Problem
China Foods Limited grew rapidly through a number of mergers and
acquisitions. This created several problems, especially in
integrating the information systems of the newly acquired
subsidiaries. Data errors, duplication, and inconsistencies across
subsidiaries became everyday problems.
2
Opening Case
Discussion

Why has data management become so important for today’s
organizations?

What are the potential benefits of investing in database management
technology?

What are the technological and managerial challenges of managing a
large database management system such as the one implemented at
COFCO?
3
Opening Case
What we learned from this case

The case of COFCO’s China Foods Ltd. represents the very real
problems that almost every business faces as it grows, regarding
one of its most valuable resources: data.

Data problems become even more pronounced when we consider
the incredibly rapid increase in the amount of data that
organizations capture and store. The opportunity for errors in the
data is increasing exponentially as businesses expand.

More and more businesses, like COFCO, are investing
in database technology to manage their valuable data so that they
can better support their business processes and ultimately improve
profitability.
4
Agenda
4.1 Managing Data
4.1.1 Difficulties in managing data
4.1.2 Data life cycle
4.2 The Database Approach
4.2.1 Database Management Systems (DBMS)
4.2.2 The data hierarchy
4.2.3 Designing the database
4.3 Relational Database Management Systems
4.3.1 The Relational Database Model
4.3.2 Query Languages
4.3.3 Data Dictionary
4.3.4 Normalization
5
Agenda
4.4 Data Warehousing
4.4.1 Definition and characteristics
4.4.2 Benefits and issues
4.4.3 Data Marts
4.5 Data Governance
4.6 Knowledge Management
4.6.1 Concepts and definitions
4.6.2 Knowledge Management Systems (KMSs)
4.6.3 Knowledge management system cycle
6
LEARNING OBJECTIVES
1. Recognize the importance of data, the issues involved in managing
them and understand the data life cycle. (4.1)
2. Explain the advantages of the database approach. (4.2)
3. Describe the main characteristics of the relational database
model. (4.3)
4. Explain how a data warehouse operates and how it supports
decision making. (4.4)
5. Define data governance and explain how it helps produce
high-quality data. (4.5)
6. Define knowledge, and differentiate between explicit and tacit
knowledge. (4.6)
7
CHAPTER OVERVIEW
8
4.1 Managing Data
4.1.1 Difficulties in managing data
4.1.2 Data life cycle
9
4.1.1 Difficulties in managing data
• Amount of data increases exponentially
• Data are scattered and collected by many individuals
using various methods and devices
• Data come from many sources
• Data security, quality, and integrity are critical
10
Data source examples
• Credit card swipes
• E-mails
• RFID tags
• Digital video surveillance
• Radiology scans
• Blogs
11
4.1.2 Data life cycle

Businesses run on data that have been processed into information
and knowledge. Managers then apply this knowledge to business
problems and opportunities. Businesses transform data into
knowledge and solutions in several ways. The general process is
illustrated in Figure 4.1 and is referred to as the data life cycle.
12
Figure 4.1 illustrates the processing of data into information and then knowledge.
13
4.2 The Database Approach
4.2.1 Database Management Systems (DBMS)
4.2.2 The data hierarchy
4.2.3 Designing the database
14
4.2.1 Database management systems
• Definition
• Benefits
• Issues

15
Definition

A database management system (DBMS) is a set of programs
that provide users with tools to add, delete, access, and
analyze data stored in one location.

A DBMS interfaces with the database and provides all users
with integrated access to the data.

There are a number of different database architectures, but
we focus on the relational database model because it is
popular and easy to use. Other database models are also
available, such as the hierarchical, network, and
object-oriented models.
16
17
Benefits
In general, database management systems help
minimize the following problems:
• Data redundancy
• Data isolation
• Data inconsistency
18

Data redundancy: The same data are stored in many
places.

Data isolation: Applications cannot access data
associated with other applications.

Data inconsistency: Various copies of the data do not
agree.
19
Issues
In addition, database systems maximize the following:
• Data security
• Data integrity
• Data independence
20

Data security: Because data are essential to organizations,
databases have extremely high security measures in place to
deter mistakes and attacks (recall our discussion in Chapter
3).

Data integrity: Data meet certain constraints, such as no
alphabetic characters in a social insurance number field.

Data independence: Applications and data are independent
of one another (that is, applications and data are not tied
to each other, so different applications can be designed to
access the same data).
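The data integrity point above can be sketched with a constraint that the DBMS itself enforces. This is a minimal illustration using Python's built-in sqlite3 module, not part of the original slides: the `employee` table, the `sin` column, and the nine-digit length are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table. The CHECK constraint rejects any value that is not
# exactly nine characters long or that contains a non-digit character.
conn.execute(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        sin TEXT NOT NULL
            CHECK (length(sin) = 9 AND sin NOT GLOB '*[^0-9]*')
    )
    """
)


def try_insert(sin):
    """Return True if the DBMS accepts the value, False if it rejects it."""
    try:
        conn.execute("INSERT INTO employee (sin) VALUES (?)", (sin,))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("123456789")  # digits only
rejected = try_insert("12345678A")  # contains an alphabetic character
print(accepted, rejected)  # True False
```

Because the rule lives in the database rather than in one application, every application that writes to the table is held to the same constraint.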
21
4.2.2 The data hierarchy

Data are organized in a hierarchy that begins with bits
and proceeds all the way to databases.
Figure 4.3 Hierarchy of data for a computer-based file
22






• A bit (binary digit) represents the smallest unit of data a computer
can process. The term “binary” means that a bit can consist only of
a 0 or a 1.
• A group of eight bits, called a byte, represents a single character. A
byte can be a letter, a number, or a symbol.
• A logical grouping of characters into a word, a small group of
words, or an identification number is called a field.
• A logical grouping of related fields, such as the student’s name, the
courses taken, the date, and the grade, is called a record.
• A logical grouping of related records is called a file or table.
• A logical grouping of related tables constitutes a database.
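The hierarchy described above can be walked through in a few lines of Python. This is only an illustrative sketch: the student names, course codes, and dictionary layout are hypothetical, and a real DBMS stores these levels very differently.

```python
# Bit -> byte: eight bits encode one character.
bits = "01000001"
byte_value = int(bits, 2)      # 65
character = chr(byte_value)    # "A"

# Field: a logical grouping of characters (hypothetical student name).
name_field = "Ana Lopez"

# Record: a logical grouping of related fields.
record = {"name": name_field, "course": "MIS 101", "date": "2011-04-30", "grade": "A"}

# File or table: a logical grouping of related records.
grades_table = [
    record,
    {"name": "Ben Ito", "course": "MIS 101", "date": "2011-04-30", "grade": "B"},
]

# Database: a logical grouping of related tables.
database = {
    "grades": grades_table,
    "courses": [{"course": "MIS 101", "title": "Intro to Information Systems"}],
}

print(character, len(database["grades"]))  # A 2
```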
23
4.2.3 Designing the database
Data must be organized so that users can retrieve, analyze and
understand it. A key to effectively designing a database is the data
model.
Data model: a diagram that represents the entities in the database
and their relationships.
◦ Entity
◦ Attribute
◦ Primary key
◦ Secondary keys
24

An entity is a person, place, thing, or event about which
information is maintained. A record generally describes an entity.

An attribute is a particular characteristic or quality of a particular
entity.

The primary key is a field that uniquely identifies a record.

Secondary keys are other fields that have some identifying
information but typically do not identify the record with complete
accuracy.
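The difference between a primary key and a secondary key can be sketched with Python's built-in sqlite3 module. The `student` table, its columns, and the sample rows are hypothetical; treating `major` as the secondary key is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,  -- primary key: unique per record
        name TEXT NOT NULL,              -- attribute of the entity
        major TEXT NOT NULL              -- secondary key: identifies a group
    )
    """
)
# An index on the secondary key speeds up lookups by major.
conn.execute("CREATE INDEX idx_student_major ON student (major)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ana", "MIS"), (2, "Ben", "MIS"), (3, "Chen", "Finance")],
)

# A primary key lookup returns exactly one record.
by_primary = conn.execute(
    "SELECT name FROM student WHERE student_id = 2"
).fetchall()

# A secondary key lookup may return several records.
by_secondary = conn.execute(
    "SELECT name FROM student WHERE major = 'MIS' ORDER BY name"
).fetchall()

# The DBMS refuses a second record with an existing primary key.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Dup', 'MIS')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(by_primary, by_secondary, duplicate_allowed)
# [('Ben',)] [('Ana',), ('Ben',)] False
```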
25
Entity-Relationship Modeling


Entity classes are groups of entities of a certain type.
Database designers plan the database design in a process called
entity-relationship (ER) modeling.
26

ER diagrams consist of entities, attributes, and relationships.
◦ Entity classes
◦ Instance
◦ Identifiers

An instance of an entity class is the representation of a particular
entity.

Entity instances have identifiers, which are attributes that are
unique to that entity instance.
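As a rough analogy, an entity class behaves like a class definition and an entity instance like one object of that class. The sketch below uses a hypothetical `Student` entity class with `student_id` as its identifier; none of these names come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:        # entity class: all entities of one type
    student_id: int   # identifier: unique to each instance
    name: str         # attribute
    major: str        # attribute


# Entity instances: representations of particular students.
instances = [
    Student(1, "Ana", "MIS"),
    Student(2, "Ben", "MIS"),
]

# Identifiers must not repeat across instances, even when other
# attributes (here, the major) are shared.
identifiers = [s.student_id for s in instances]
identifiers_are_unique = len(identifiers) == len(set(identifiers))
print(identifiers_are_unique)  # True
```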
27
Entity-Relationship Diagram Model
28
4.3 Relational Database Management
Systems
4.3.1 The Relational Database Model
4.3.2 Query Languages
4.3.3 Data Dictionary
4.3.4 Normalization
29
4.3.1 The Relational Database Model

The relational database model is based on the concept of
two-dimensional tables.

A relational database generally is not one big table—usually
called a flat file—that contains all of the records and
attributes. Such a design would entail far too much data
redundancy. Instead, a relational database is usually designed
with a number of related tables. Each of these tables contains
entities (as records listed in rows) and attributes (as fields
listed in columns).
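A small sketch of related tables, using Python's built-in sqlite3 module, shows how a relational design avoids the redundancy of a flat file. The `student` and `enrollment` tables and their rows are hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Each student's name is stored once, in its own table.
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Enrollments refer to students by key instead of repeating the name.
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student (student_id),
        course TEXT NOT NULL,
        grade TEXT
    );
    INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO enrollment VALUES
        (1, 'MIS 101', 'A'),
        (1, 'ACC 201', 'B'),
        (2, 'MIS 101', 'B');
    """
)

# A join recombines the related tables when a combined view is needed.
rows = conn.execute(
    """
    SELECT s.name, e.course, e.grade
    FROM student AS s
    JOIN enrollment AS e ON e.student_id = s.student_id
    ORDER BY s.name, e.course
    """
).fetchall()

for row in rows:
    print(row)
# ('Ana', 'ACC 201', 'B')
# ('Ana', 'MIS 101', 'A')
# ('Ben', 'MIS 101', 'B')
```

In a flat file, the name "Ana" would be repeated on every enrollment row; here it appears once, so a correction needs to be made in only one place.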
30
Relational Databases
31
4.3.2 Query Languages

Requesting information from a database is the most
commonly performed operation.

Structured query language (SQL) is the most popular query
language used to request information. SQL allows people to
perform complicated searches by using relatively simple
statements or key words.

Another way to find information in a database is to use
query by example (QBE). In QBE, the user fills out a grid or
template (also known as a form) to construct a sample or
description of the data he or she wants.
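The two approaches can be compared side by side. The sketch below runs a plain SQL statement and then imitates QBE with a hypothetical helper, `query_by_example`, that turns a filled-in template into a query. Real QBE tools are graphical, so this is only an analogy; the table and its contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name TEXT,
        major TEXT,
        gpa REAL
    );
    INSERT INTO student VALUES
        (1, 'Ana', 'MIS', 3.7),
        (2, 'Ben', 'MIS', 3.1),
        (3, 'Chen', 'Finance', 3.9);
    """
)

# SQL: a short statement built from key words (SELECT, FROM, WHERE).
sql_result = conn.execute(
    "SELECT name FROM student WHERE major = 'MIS' AND gpa > 3.5"
).fetchall()

COLUMNS = ("student_id", "name", "major", "gpa")


def query_by_example(template):
    """Hypothetical QBE stand-in: filled-in cells become search conditions."""
    filled = {c: v for c, v in template.items() if c in COLUMNS and v is not None}
    where = " AND ".join(f"{c} = ?" for c in filled) or "1 = 1"
    sql = f"SELECT name FROM student WHERE {where} ORDER BY name"
    return conn.execute(sql, tuple(filled.values())).fetchall()


# The user fills in only the cell they care about and leaves the rest blank.
qbe_result = query_by_example({"name": None, "major": "Finance", "gpa": None})

print(sql_result, qbe_result)  # [('Ana',)] [('Chen',)]
```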
32
4.3.3 Data Dictionary

When a relational model is created, the data dictionary
defines the format necessary to enter the data into the
database. The data dictionary provides information on
each attribute, such as its name, whether it is a key or
part of a key, the type of data expected (alphanumeric,
numeric, dates, and so on), and valid values.
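Many DBMSs expose part of this information programmatically. As a sketch, SQLite's `PRAGMA table_info` reports each attribute's name, declared type, whether a value is required, and whether it is part of the primary key. The `student` table here is hypothetical, and valid values declared in a CHECK constraint are not included in this particular output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        enrolled_on TEXT,
        grade TEXT CHECK (grade IN ('A', 'B', 'C', 'D', 'F'))
    )
    """
)

# Each row of table_info is (cid, name, type, notnull, dflt_value, pk).
dictionary = [
    {"name": name, "type": col_type, "required": bool(notnull), "is_key": pk > 0}
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        "PRAGMA table_info(student)"
    )
]

for entry in dictionary:
    print(entry)
```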
33
4.3.4 Normalization

Normalization is a method for analyzing and reducing a
relational database to its most streamlined form for:
◦ Minimum redundancy
◦ Maximum data integrity
◦ Best processing performance

Data are normalized when the attributes in a table depend
only on the primary key.
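A minimal sketch of the idea, with hypothetical order and supplier data: in the flat table, the supplier's name and city depend on `supplier_id` rather than on the order's primary key, so they are moved into a table of their own.

```python
# Unnormalized rows: supplier details are repeated on every order.
flat_orders = [
    {"order_id": 1, "part": "bolt", "supplier_id": 10,
     "supplier_name": "Acme", "supplier_city": "Toronto"},
    {"order_id": 2, "part": "nut", "supplier_id": 10,
     "supplier_name": "Acme", "supplier_city": "Toronto"},
    {"order_id": 3, "part": "gear", "supplier_id": 20,
     "supplier_name": "Borealis", "supplier_city": "Calgary"},
]

suppliers = {}  # keyed by supplier_id, the primary key of the new table
orders = []     # each remaining attribute depends only on order_id

for row in flat_orders:
    suppliers[row["supplier_id"]] = {
        "name": row["supplier_name"],
        "city": row["supplier_city"],
    }
    orders.append(
        {"order_id": row["order_id"], "part": row["part"],
         "supplier_id": row["supplier_id"]}
    )

print(len(flat_orders), len(orders), len(suppliers))  # 3 3 2
```

After the split, a supplier's city is recorded once, so a change of address cannot leave two orders disagreeing about it.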
34
Normalization Produces Order
35
4.4 Data Warehousing
4.4.1 Definition and characteristics
4.4.2 Benefits and issues
4.4.3 Data Marts
36
4.4.1 Definition and characteristics
Definition
◦ A data warehouse is a repository of historical data organized by
subject to support decision makers in the organization. Data
warehouses facilitate business intelligence activities, such as data
mining, decision support, and query applications.
Characteristics
◦ Organized by business dimension or subject
◦ Consistent
◦ Historical
◦ Nonvolatile
◦ Multidimensional
◦ Relationship with relational databases
A Data Cube
37

The data cube has three dimensions: customer, product, and time.

Historical data in data warehouses can be used for identifying
trends, forecasting, and making comparisons over time.

Online analytical processing (OLAP) involves the analysis of
accumulated data by end users (usually in a data warehouse).

In contrast to OLAP, online transaction processing (OLTP)
typically involves a database, where data from business transactions
are processed online as soon as they occur.
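The kind of analysis OLAP performs on a data cube can be sketched as grouping accumulated facts by one or more of the three dimensions. The sales figures, region names, and the `roll_up` helper below are hypothetical and stand in for what a real OLAP tool does at much larger scale.

```python
from collections import defaultdict

# Hypothetical fact rows: (customer, product, year, sales amount).
facts = [
    ("East", "Nuts", 2009, 50),
    ("East", "Bolts", 2009, 40),
    ("West", "Nuts", 2009, 60),
    ("East", "Nuts", 2010, 70),
    ("West", "Bolts", 2010, 30),
]


def roll_up(rows, *dims):
    """Total sales grouped by the chosen dimensions.

    Dimension positions: 0 = customer, 1 = product, 2 = time.
    """
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[3]
    return dict(totals)


by_product = roll_up(facts, 1)
by_customer_year = roll_up(facts, 0, 2)

print(by_product)
# {('Nuts',): 180, ('Bolts',): 70}
print(by_customer_year[("East", 2009)])
# 90
```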
38
Figure 4.8 Data Warehouse Framework & Views
39
4.4.2 Benefits and issues
Benefits
• End users can access data quickly and easily via Web
browsers because the data are located in one place
• End users can conduct extensive analysis with data in
ways that may not have been possible before
• End users have a consolidated view of organizational
data
40
Issues
• Data warehouses can be very expensive to build and to maintain.
• Incorporating data from obsolete mainframe systems
can be difficult and expensive.
• People in one department might be reluctant to share
data with other departments.
• Data transferred from other systems may go through
a cleansing process that changes the information, so
the data are no longer a true historical record and do
not fully represent what is in the source accounting
systems.
41
4.4.3 Data Marts
A data mart is a small data warehouse designed for
the end-user needs in a strategic business unit (SBU) or
a department.
42

Data marts are far less costly than data warehouses. A typical data
mart costs less than $100,000, compared with $1 million or more
for a data warehouse.

Data marts can be implemented more quickly, often in less than 90
days.

Data marts contain less information than data warehouses, so they
respond more rapidly and are easier for end users to learn and
navigate.

Data marts support local rather than central control by giving
power over the data to the users and user groups who work with it.
43
4.5 Data Governance
Data governance is an approach to managing
information across an entire organization. It involves a
formal set of business processes and policies that are
designed to ensure that data are handled in a certain,
well-defined fashion.
44

One strategy for implementing data governance is
master data management.

Master data management is a process that spans all
organizational business processes and applications. It
provides companies with the ability to store, maintain,
exchange, and synchronize a consistent, accurate, and
timely “single version of the truth” for the company’s
core master data.
45

Master data are a set of core data, such as customer,
product, employee, vendor, geographic location, and so
on, that span the enterprise information systems.

Transaction data, which are generated and captured by
operational systems, describe the activities, or
transactions, of the business.

In contrast, master data are applied to multiple
transactions and used to categorize, aggregate, and
evaluate the transaction data.
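The relationship between the two kinds of data can be sketched as follows. The product records play the role of master data and the sales play the role of transaction data; all identifiers, names, and amounts are hypothetical.

```python
from collections import defaultdict

# Master data: core records shared across many transactions.
products = {
    "P1": {"name": "Olive oil", "category": "Oils"},
    "P2": {"name": "Red wine", "category": "Wines"},
    "P3": {"name": "White wine", "category": "Wines"},
}

# Transaction data: captured by an operational system, one row per sale.
transactions = [
    {"product_id": "P1", "store": "S1", "amount": 12},
    {"product_id": "P2", "store": "S1", "amount": 20},
    {"product_id": "P3", "store": "S2", "amount": 15},
    {"product_id": "P2", "store": "S2", "amount": 20},
]

# Master data are used to categorize and aggregate the transactions.
sales_by_category = defaultdict(int)
for sale in transactions:
    category = products[sale["product_id"]]["category"]
    sales_by_category[category] += sale["amount"]

print(dict(sales_by_category))  # {'Oils': 12, 'Wines': 55}
```

If two systems held different category values for the same product, the totals would disagree, which is the problem a "single version of the truth" is meant to prevent.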
46
4.6 Knowledge Management
4.6.1 Concepts and definitions
4.6.2 Knowledge management systems (KMSs)
4.6.3 Knowledge management system cycle
47
4.6.1 Concepts and definitions

Knowledge management is a process that helps organizations
manipulate important knowledge that is part of the organization’s
memory, usually in an unstructured format.

Knowledge is information that is contextual, relevant, and
actionable.

Intellectual capital is another term often used for knowledge.
Explicit knowledge (above the waterline)
Tacit knowledge (below the waterline)
48

Explicit knowledge: objective, rational, technical knowledge that
has been documented.
Examples: policies, procedural guides, reports, products, strategies,
goals, core competencies

Tacit knowledge: cumulative store of subjective or experiential
learning.
Examples: experiences, insights, expertise, know-how, trade secrets,
understanding, skill sets, and learning
49
4.6.2 Knowledge Management Systems

Knowledge Management systems refer to the use of
information technologies to systematize, enhance, and
expedite intra-firm and inter-firm knowledge
management.

Best practices are the most effective and efficient
methods, techniques, activities, or processes for
doing tasks in a business.
50
4.6.3 Knowledge Management System Cycle
51
Closing Case
Document Management at P&G
The consumer goods giant Procter & Gamble (P&G) is a huge firm with
reported sales of US$79 billion in 2009. Its portfolio includes Crest, Tide,
Gillette, Pampers, and Charmin.
The Business Problem
Even though it used advanced IT and business processes, P&G faced
problems managing the vast amounts of paper required for a company
that develops drugs and over-the-counter medications. Regulatory issues,
research and development (R&D), and potential litigation generate even
more paper documents and files. As a result, P&G wanted to gain control
of its company documents, reduce administrative oversight of its paper
documents, reduce costs, accelerate R&D initiatives, and improve tracking
and signature compliance.
52
Closing Case
Discussion
1. Company documents are one type of data that companies must
manage. Compare the benefits of P&G’s document management
system with the benefits of database technology. Do you notice
any differences? Support your answer.
2. This case has described numerous advantages of P&G’s move to
electronic documents. Describe the disadvantages of electronic
documents.
53
Closing Case
The Results
Today, once a digital signature is added to a file, an auditor can
immediately view the document and all activity related to the
document. The auditor right-clicks on the signature and views the
entire audit trail. The signature can also be appended as a last page
of the file so that it can be shared externally when necessary, such
as in a court of law.

The electronic document management system is estimated to save
each employee an average of 30 minutes of signing and archiving
time per week. That doesn’t seem like much, but in a huge company
like P&G, it is expected to add up to a savings of tens of millions of
dollars in productivity gains. The system also saves time because P&G
can quickly retrieve the large volumes of data that government
regulators or business partners may request.
54
Copyright
Copyright © 2011 John Wiley & Sons Canada, Ltd. All rights reserved.
Reproduction or translation of this work beyond that permitted by
Access Copyright (the Canadian copyright licensing agency) is unlawful.
Requests for further information should be addressed to the
Permissions Department, John Wiley & Sons Canada, Ltd. The purchaser
may make back-up copies for his or her own use only and not for
distribution or resale. The author and the publisher assume no
responsibility for errors, omissions, or damages caused by the use of
these files or programs or from the use of the information contained
herein.