8. Managing Data Resources - College of Business Administration

Managing Data Resources
9th Edition
Problems with the Traditional File Environment
• Data redundancy: the presence of duplicate data in
multiple data files, so that the same data are stored
in more than one place or location
• Data inconsistency: the same attribute may have
different values in different files
• Program-data dependence: the coupling of data
stored in files and the specific programs required to
update and maintain those files
• Lack of flexibility: traditional file systems can
deliver routine scheduled reports, but cannot deliver
ad-hoc reports or respond to unanticipated
requirements.
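The redundancy and inconsistency problems above can be sketched in a few lines. This is only an illustration: the two "files", the customer IDs, and the helper find_inconsistencies are hypothetical and do not come from the slides.

```python
# Two departmental "files" that each keep their own copy of customer data.
# All names and values are made up for illustration.
sales_file = {
    "C001": {"name": "Ann Lee", "city": "Boston"},
    "C002": {"name": "Raj Patel", "city": "Denver"},
}
billing_file = {
    "C001": {"name": "Ann Lee", "city": "Chicago"},  # stale duplicate
    "C002": {"name": "Raj Patel", "city": "Denver"},
}


def find_inconsistencies(file_a, file_b):
    """Return (key, attribute, value_a, value_b) for every shared attribute that differs."""
    problems = []
    for key in sorted(file_a.keys() & file_b.keys()):
        for attr in sorted(file_a[key].keys() & file_b[key].keys()):
            if file_a[key][attr] != file_b[key][attr]:
                problems.append((key, attr, file_a[key][attr], file_b[key][attr]))
    return problems


conflicts = find_inconsistencies(sales_file, billing_file)
print(conflicts)
```

Because each file stores its own copy of the city, nothing forces the two copies to agree. That is the gap a shared database is meant to close.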
Problems with the Traditional File Environment (Continued)
• Lack of data sharing and availability: Information
cannot flow freely across different functional areas
or different parts of the organization. Users find
different values of the same piece of information in
two different systems.
• Poor security: Because there is little control or
management of data, management will have no
knowledge of who is accessing or even making
changes to the organization’s data.
Other Database Concepts
• Object-oriented database model
– Successor to the relational model
– Integration of data and programs
– Handles wider variety of field types
• Entity-relationship diagrams
– Graphical method of displaying relationships
between tables
– Tool for IS professionals
Types of Database Models
• Hierarchical
• Network
• Relational
• Object-oriented
– Extension of the relational model
– Stores both data and the procedures that act
on the data
– Stores more complex types of information
(graphics)
CREATING A DATABASE ENVIRONMENT
Figure 7-12: An Entity-Relationship Diagram
Physical versus Logical Views
• In managing information, the physical view deals
with the structure of information as it resides on
various storage media.
• The logical view deals with how knowledge workers
view their information needs, and includes such
terms as:
– CHARACTER - our smallest unit of
information.
– FIELD - group of related characters.
– RECORD - group of related fields.
– FILE - group of related records.
– DATABASE - group of logically associated
files.
– DATA WAREHOUSE - information from many
databases.
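As a rough illustration, the hierarchy from character up to database can be mirrored with nested Python values. The student and course data are hypothetical, and a real DBMS stores these structures very differently.

```python
# Hypothetical data, used only to illustrate the logical hierarchy.
field = "Ann Lee"                                    # FIELD: group of related characters
character = field[0]                                 # CHARACTER: smallest unit
record = {"id": 1, "name": field, "major": "MIS"}    # RECORD: group of related fields
student_file = [                                     # FILE: group of related records
    record,
    {"id": 2, "name": "Raj Patel", "major": "Finance"},
]
course_file = [{"code": "MIS301", "title": "Databases"}]
database = {                                         # DATABASE: logically associated files
    "students": student_file,
    "courses": course_file,
}

print(character, len(database["students"]))
```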
Other Logical Structures in a Database
• DATA DICTIONARY - contains the logical structure of
information in a database.
• An INTEGRITY CONSTRAINT is a rule that helps assure
the quality of the information in a database.
– A registration database at your school includes
integrity constraints concerning prerequisites for
certain classes.
– Designating primary keys, enforcing referential
integrity, using input masks, and applying validation
rules are ways to establish integrity constraints.
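A minimal sketch of three of these constraints (primary key, referential integrity, and a validation rule) using Python's built-in sqlite3 module. The student and enrollment tables, the 1 to 6 credit range, and the helper try_insert are assumptions for illustration. The prerequisite rule mentioned above would need extra logic that is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course     TEXT NOT NULL,
    credits    INTEGER NOT NULL CHECK (credits BETWEEN 1 AND 6),
    PRIMARY KEY (student_id, course)
);
""")


def try_insert(sql, params):
    """Run an INSERT and report whether the DBMS accepted or rejected it."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert("INSERT INTO student VALUES (?, ?)", (1, "Ann Lee")),
    try_insert("INSERT INTO student VALUES (?, ?)", (1, "Duplicate")),         # primary key
    try_insert("INSERT INTO enrollment VALUES (?, ?, ?)", (1, "MIS301", 3)),
    try_insert("INSERT INTO enrollment VALUES (?, ?, ?)", (99, "MIS301", 3)),  # referential integrity
    try_insert("INSERT INTO enrollment VALUES (?, ?, ?)", (1, "MIS302", 12)),  # validation rule
]
print(results)
```

The point of the sketch is that the rules live in the database definition, so every program that writes to these tables is held to them.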
Sample Data Dictionary Report
Components of a DBMS
• DBMS ENGINE - accepts logical requests from the various
other DBMS subsystems, converts them to their physical
equivalent, and actually accesses the database and data
dictionary as they exist on a storage device.
• DATA DEFINITION SUBSYSTEM - helps you create and
maintain the data dictionary and define the structure of the
files in a database
– You use this subsystem to define the logical structure
of the information when you first create a database.
– Once you’ve created a database, you use this
subsystem to define new fields, delete fields, or change
field properties.
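A small sketch of what the data definition subsystem does, using SQLite through Python's sqlite3 module as a stand-in DBMS. The customer table and the helper describe are hypothetical. PRAGMA table_info is SQLite's own way of reading column definitions and plays the role of a very simple data dictionary report here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define the logical structure when the database is first created.
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
# Later, define a new field on the existing table.
conn.execute("ALTER TABLE customer ADD COLUMN city TEXT")


def describe(table):
    """Return (column, type, required, is_primary_key) for each column."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        (name, col_type, bool(notnull), bool(pk))
        for _cid, name, col_type, notnull, _default, pk in rows
    ]


dictionary = describe("customer")
for entry in dictionary:
    print(entry)
```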
More Components of a DBMS
• DATA MANIPULATION SUBSYSTEM - helps you
add, change, and delete information in a database
and mine it for valuable information
– Tools in this subsystem include views, report
generators, and query languages (QBE and SQL)
– SQL serves as both a data manipulation language
(DML) and a data definition language (DDL)
• APPLICATION GENERATION SUBSYSTEM -
contains facilities to help you develop
transaction-intensive applications.
– Programming languages specific to a
particular DBMS
– Interfaces to commonly used programming
languages (e.g., COBOL or C++).
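The add, change, delete, and query operations of the data manipulation subsystem might look like this in SQL, run here through Python's sqlite3 module. The orders table, its rows, and the customer_totals view are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# Add, change, and delete information.
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Ann", 120.0), ("Raj", 80.0), ("Ann", 40.0)],
)
conn.execute("UPDATE orders SET amount = 100.0 WHERE customer = 'Raj'")
conn.execute("DELETE FROM orders WHERE amount < 50")

# A view is a stored query that can be read like a table.
conn.execute("""
CREATE VIEW customer_totals AS
SELECT customer, SUM(amount) AS total
FROM orders
GROUP BY customer
""")

totals = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
print(totals)
```

The CREATE statements are DDL and the INSERT, UPDATE, DELETE, and SELECT statements are DML, so one language covers both roles.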
More Components of a DBMS
• DATA ADMINISTRATION SUBSYSTEM - helps you
manage the overall database environment by
providing facilities for:
– Backup and recovery
– Security management
Database Architectures - Centralized
• A centralized database uses a single central
processor or multiple processors in a
client/server network. The major feature is that
the database is in a single physical location.
– Advantages of this design are that security
tends to be higher and risks are lower
– When data access demands are highly
decentralized, this design tends to be
costly and inflexible
Database Architectures - Distributed
• Databases can be decentralized either by
partitioning or by replicating
• Partitioned database: Database is divided into
segments or regions. For example, a customer
database can be divided into Eastern customers and
Western customers, and two separate databases
maintained in the two regions.
• Replicated (duplicated) database: The database is
duplicated at two or more locations. The separate
databases are synchronized during off hours on a
batch basis.
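The two approaches can be sketched with plain Python lists standing in for databases. The customer rows and the functions partition_by_region and batch_sync are hypothetical. A real distributed DBMS would also have to handle conflicts and network failures, which this sketch ignores.

```python
# Hypothetical customer rows held at a central site.
customers = [
    {"id": 1, "name": "Ann", "region": "East"},
    {"id": 2, "name": "Raj", "region": "West"},
    {"id": 3, "name": "Mia", "region": "East"},
]


def partition_by_region(rows):
    """Partitioning: each region holds only its own segment of the data."""
    parts = {}
    for row in rows:
        parts.setdefault(row["region"], []).append(row)
    return parts


def batch_sync(primary, replica):
    """Replication: overwrite the replica with a full copy of the primary.

    Returns the number of rows copied. Intended to run off hours as a batch job.
    """
    replica.clear()
    replica.extend(dict(row) for row in primary)
    return len(replica)


partitions = partition_by_region(customers)
replica = []
copied = batch_sync(customers, replica)
print(sorted(partitions), copied)
```

Partitioning keeps one copy of each row but splits the rows by site. Replication keeps every row at every site and pays for it with the synchronization step.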
Distributed Databases
Ensuring Data Quality
• Corporate and government databases have
unexpectedly poor levels of data quality.
• National consumer credit reporting databases
have error rates of 20-35%.
• 32% of the records in the FBI’s Computerized
Criminal History file are inaccurate, incomplete,
or ambiguous.
• Gartner Group estimates that consumer data in
corporate databases degrades at the rate of 2% a
month.
Ensuring Data Quality (Continued)
• The quality of decision making in a firm is directly
related to the quality of data in its databases.
• Data Quality Audit: Structured survey of the
accuracy and level of completeness of the data in
an information system
• Data Cleansing: Consists of activities for
detecting and correcting data in a database or file
that are incorrect, incomplete, improperly
formatted, or redundant
• Integrity constraints (mentioned earlier)
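A data quality audit and a cleansing pass can be sketched as follows. The records, the choice of required fields, and the duplicate rule (same name and phone after standardizing) are assumptions made for this example and are not a prescribed method.

```python
import re

# Hypothetical customer records with typical quality problems.
records = [
    {"id": 1, "name": " ann lee ", "phone": "(617) 555-0101"},  # badly formatted
    {"id": 2, "name": "Raj Patel", "phone": ""},                # incomplete
    {"id": 3, "name": "Ann Lee", "phone": "617-555-0101"},      # redundant with id 1
]


def audit(rows, required=("name", "phone")):
    """Audit: fraction of rows in which every required field is filled in."""
    complete = sum(
        1 for r in rows if all(str(r.get(f, "")).strip() for f in required)
    )
    return complete / len(rows)


def cleanse(rows):
    """Cleansing: standardize formats, then drop rows that repeat an earlier one."""
    seen, cleaned = set(), []
    for r in rows:
        name = " ".join(r["name"].split()).title()  # trim and normalize case
        phone = re.sub(r"\D", "", r["phone"])       # keep digits only
        key = (name, phone)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"id": r["id"], "name": name, "phone": phone})
    return cleaned


completeness = audit(records)
cleaned = cleanse(records)
print(round(completeness, 2), [r["id"] for r in cleaned])
```

The audit only measures the problem. The cleansing step changes the data, which is why the two are listed as separate activities.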
Data Warehouse
• Definition - a database, with tools, that stores
current and historical data and is designed to support
business analysis activities and decision-making
tasks of managers; typically a relational database
model is used
• Benefits
– improved access
– improved information
– isolation from operational systems
– tools permit advanced data analysis
• Users
• Data marts
Comparison of Data in a Data Warehouse and Operational Data
• Operational Data
– Data is on many systems
– Current operational data
– Inconsistent data definitions
– Functionally organized data
– Data are constantly changing
• Warehouse Data
– Integrated in one enterprise-wide system
– Recent and historical data
– Consistent data definitions
– Data are organized around business entities
– Data are stabilized
Building a Data Warehouse (ETL)
• Extraction phase – create files on the computer that
will store the data warehouse and move transaction
data to this machine; data may come from many
sources or parts of the organization
• Transformation phase – cleanse and standardize the
data. Why is this necessary?
• Load phase – transfer the data from the
transformation phase into the data warehouse
• The ETL process becomes automated to make
regular transfers of transaction data into the data
warehouse
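The three phases can be sketched as three small functions. The two regional sources, their differing date and product formats, and the sales_fact table are hypothetical. The differing formats also suggest one answer to the question above: data from separate sources rarely agree on formats or definitions until they are standardized.

```python
import sqlite3

# Hypothetical transaction data from two parts of the organization.
east_sales = [("2024-01-05", "widget", "19.99"), ("2024-01-06", "Widget ", "5.00")]
west_sales = [("01/07/2024", "gadget", "7.50")]  # different date format


def extract():
    """Extraction: gather transaction rows from every source."""
    return [("east", r) for r in east_sales] + [("west", r) for r in west_sales]


def transform(rows):
    """Transformation: cleanse and standardize dates, product names, and amounts."""
    out = []
    for source, (date, product, amount) in rows:
        if "/" in date:  # MM/DD/YYYY -> YYYY-MM-DD
            month, day, year = date.split("/")
            date = f"{year}-{month}-{day}"
        out.append((source, date, product.strip().lower(), float(amount)))
    return out


def load(conn, rows):
    """Load: transfer the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(source TEXT, sale_date TEXT, product TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
summary = warehouse.execute(
    "SELECT product, SUM(amount) FROM sales_fact GROUP BY product ORDER BY product"
).fetchall()
print(summary)
```

Without the transformation step, "widget" and "Widget " would be counted as different products and the dates could not be sorted together.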
Data-Mining and Data-Mining Tools
• Data-mining is the process of selecting, exploring, and
modeling large amounts of data to discover previously
unknown relationships that support decision making.
• Traditional data mining tools answer questions about
variables that we think are related
– Query languages (QBE or SQL)
– Report generators
– Multidimensional analysis tools (OLAP and pivot
tables)
– Standard statistical procedures (regression,
ANOVA)
• Knowledge-discovery data-mining tools look for
relationships that are not discernible to the human
eye (see next slide)
Data-Mining
Multidimensionality
• Multidimensional data analysis (OLAP) enables users
to view data using various dimensions, measures and
time frames
– dimensions: products, business units, country,
industry (categories)
– measures: money, unit sales, head count, variances
– time: daily, weekly, monthly, quarterly, yearly
• This type of analysis also provides the ability to view
data in different ways (tables, charts, 3-D,
geographically)
• OLAP tools provide for this
• Pivot tables in Excel or Access
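A very small sketch of the idea behind a data cube: one measure (unit sales) is summed along whichever dimensions the user chooses to keep. The facts and the roll_up function are hypothetical. Real OLAP tools precompute and index these aggregates rather than scanning the rows each time.

```python
from collections import defaultdict

# Hypothetical facts: (product, country, quarter, unit_sales).
facts = [
    ("widget", "US", "Q1", 10),
    ("widget", "US", "Q2", 15),
    ("widget", "CA", "Q1", 4),
    ("gadget", "US", "Q1", 7),
]
DIMENSIONS = ("product", "country", "quarter")


def roll_up(rows, *dims):
    """Sum unit sales, keeping only the named dimensions."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]  # last column is the measure
    return dict(totals)


by_product = roll_up(facts, "product")
by_product_quarter = roll_up(facts, "product", "quarter")
print(by_product)
print(by_product_quarter)
```

Going from by_product to by_product_quarter is a drill down, and the reverse is a roll up. A pivot table offers the same two moves interactively.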
A Data Cube
Examples of OLAP Tools
• Go to www.fedscope.opm.gov
– Under data cubes on entry page click on
employment
– Demonstrate drill down and adding charts
– Data for this example comes from the Central
Personnel Data File (CPDF) of the federal
government
– The OLAP tool used to build this site is from a
company named Cognos (PowerPlay)
• OLAP tools based on Excel
– http://wLCubed.com
– http://www.cubularity.com
Databases and the Web
• Physical relationship of the hardware
• The role of middleware (conversion of HTML to SQL;
conversion of query result back to HTML).
• Using the Web
– The browser is a virtual standard and easy to use
– The browser does not require training in a
database query tool
– The use of the browser requires no change to the
internal database; this enables firms to provide
access to internal databases with little cost thus
leveraging their investment in older systems.
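The middleware role described above can be sketched as a single function that takes the query string a browser would send, runs a parameterized SQL query, and returns HTML. The product table, the max_price parameter, and handle_request are hypothetical, and a real deployment would sit behind a web server.

```python
import sqlite3
from html import escape
from urllib.parse import parse_qs

# Stand-in for an internal database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)", [("widget", 19.99), ("gadget", 7.5)]
)


def handle_request(query_string):
    """Convert a browser request to SQL, then convert the result back to HTML."""
    params = parse_qs(query_string)
    max_price = float(params.get("max_price", ["1e9"])[0])
    # Parameterized query: the user's value is never pasted into the SQL text.
    rows = conn.execute(
        "SELECT name, price FROM product WHERE price <= ? ORDER BY name",
        (max_price,),
    ).fetchall()
    cells = "".join(
        f"<tr><td>{escape(name)}</td><td>{price:.2f}</td></tr>" for name, price in rows
    )
    return f"<table>{cells}</table>"


page = handle_request("max_price=10")
print(page)
```

The database itself is untouched by this layer, which is why a browser front end can be added to an older system at little cost.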
Linking Internal Databases to the Web
Management Opportunities and Challenges
• Effectively managing an organization’s data
resources is more than selecting a logical database
design
– Ongoing commitment requiring discipline
– Requires organizational and conceptual changes
– Management commitment and understanding
required
– Huge opportunities to improve performance by
managing data better
• Obstacles
– Cost/benefit analysis is difficult; costs are upfront
and benefits are in the future
Solutions
• Data administration function
– Data are the property of the organization
– Establish a group to administer data
• Data-planning and modeling methodology
– Enterprise planning for data using a common
methodology
• Database technology, management, and users
– New software requires new personnel trained on
the software
– Database administration
– Increased training for end users
Key Organizational Elements in the Database Environment
Spreadsheets Versus DBMS
• Linkage between elements
– spreadsheet - between cells in same table
– DBMS - between elements in different tables
• Orientation
– spreadsheet is toward calculations
– DBMS is tilted toward organization and linkage
of data elements in different tables
• Capabilities
– DBMS has extensive querying and reporting
power
– spreadsheet is limited
• Memory requirements
– entire spreadsheet table must be in memory
– a DBMS does not need the entire database table
in memory
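The "linkage between elements in different tables" point can be illustrated with a join, run here through Python's sqlite3 module. The customer and orders tables and their rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    amount      REAL
);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Raj');
INSERT INTO orders VALUES (10, 1, 120.0), (11, 1, 40.0), (12, 2, 80.0);
""")

# One query links rows in two tables through the shared customer_id.
report = conn.execute("""
SELECT c.name, COUNT(*) AS order_count, SUM(o.amount) AS total
FROM customer AS c
JOIN orders AS o ON o.customer_id = c.customer_id
GROUP BY c.name
ORDER BY c.name
""").fetchall()
print(report)
```

A spreadsheet can approximate this with lookup formulas, but the link is then a property of individual cells rather than of the data itself.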