8. Managing Data Resources - College of Business Administration
Managing Data Resources
9th Edition

Problems with the Traditional File Environment
• Data redundancy and inconsistency: the presence of duplicate data in multiple data files, so that the same data are stored in more than one place or location
• Data inconsistency – the same attribute may have different values
• Program-data dependence: the coupling of data stored in files and the specific programs required to update and maintain those files
• Lack of flexibility: traditional file systems can deliver routine scheduled reports, but cannot deliver ad hoc reports or respond to unanticipated requirements.

Problems with the Traditional File Environment (Continued)
• Lack of data sharing and availability: Information cannot flow freely across different functional areas or different parts of the organization. Users find different values of the same piece of information in two different systems.
• Poor security: Because there is little control or management of data, management has no knowledge of who is accessing or even changing the organization’s data.

Other Database Concepts
• Object-oriented database model
– Successor to the relational model
– Integration of data and programs
– Handles a wider variety of field types
• Entity-relationship diagrams
– Graphical method of displaying relationships between tables
– Tool for IS professionals

Types of Database Models
• Hierarchical
• Network
• Relational
• Object-oriented
– Extension of the relational model
– Stores both data and the procedures that act on the data
– Stores more complex types of information (graphics)

CREATING A DATABASE ENVIRONMENT
An Entity-Relationship Diagram
Figure 7-12

Physical versus Logical Views
• In managing information, physical deals with the structure of information as it resides on various storage media.
• Logical deals with how knowledge workers view their information needs, and includes such terms as:
– CHARACTER - the smallest unit of information.
– FIELD - group of related characters.
– RECORD - group of related fields.
– FILE - group of related records.
– DATABASE - group of logically associated files.
– DATA WAREHOUSE - information from many databases.

Other Logical Structures in a Database
• DATA DICTIONARY - contains the logical structure of information in a database.
• An INTEGRITY CONSTRAINT is a rule that helps assure the quality of the information in a database.
– A registration database at your school includes integrity constraints concerning prerequisites for certain classes.
– Designating primary keys, enforcing referential integrity, using input masks, and applying validation rules are ways to establish integrity constraints.

Sample Data Dictionary Report

Components of a DBMS
• DBMS ENGINE - accepts logical requests from the various other DBMS subsystems, converts them to their physical equivalent, and actually accesses the database and data dictionary as they exist on a storage device.
• DATA DEFINITION SUBSYSTEM - helps you create and maintain the data dictionary and define the structure of the files in a database.
– You use this subsystem to define the logical structure of the information when you first create a database.
– Once you’ve created a database, you use this subsystem to define new fields, delete fields, or change field properties.

More Components of a DBMS
• DATA MANIPULATION SUBSYSTEM - helps you add, change, and delete information in a database and mine it for valuable information
– Tools in this subsystem include views, report generators, and query languages (QBE and SQL)
– SQL is both a DML and a DDL
• APPLICATION GENERATION SUBSYSTEM - contains facilities to help you develop transaction-intensive applications.
– Programming languages specific to a particular DBMS
– Interfaces to commonly used programming languages (e.g., COBOL or C++).
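
The slides above note that SQL serves as both a DDL and a DML, and that primary keys, referential integrity, and validation rules are ways to establish integrity constraints. The following is a minimal sketch of those ideas using Python's built-in sqlite3 module. The table names (course, enrollment), the column names, the 1-to-6 credit range, and the helper names are all hypothetical choices for illustration; they do not come from the slides.

```python
import sqlite3


def build_registration_db():
    """Create an in-memory database whose structure is defined with DDL."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    # DDL: define the logical structure, including integrity constraints.
    conn.execute(
        """
        CREATE TABLE course (
            course_id TEXT PRIMARY KEY,                 -- primary key
            credits   INTEGER NOT NULL
                      CHECK (credits BETWEEN 1 AND 6)   -- validation rule (assumed range)
        )
        """
    )
    conn.execute(
        """
        CREATE TABLE enrollment (
            student_id INTEGER NOT NULL,
            course_id  TEXT NOT NULL
                       REFERENCES course(course_id),    -- referential integrity
            PRIMARY KEY (student_id, course_id)
        )
        """
    )
    return conn


def try_statement(conn, sql, params=()):
    """Run one DML statement; report whether the DBMS accepted or rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


if __name__ == "__main__":
    db = build_registration_db()
    # DML: add rows; the constraints decide which ones are allowed in.
    print(try_statement(db, "INSERT INTO course VALUES (?, ?)", ("MIS301", 3)))
    print(try_statement(db, "INSERT INTO course VALUES (?, ?)", ("MIS301", 4)))
    print(try_statement(db, "INSERT INTO enrollment VALUES (?, ?)", (1, "NOPE")))
```

In this sketch the database itself, not the application program, refuses the duplicate course and the enrollment that points at a nonexistent course, which is the sense in which a constraint "helps assure the quality of the information."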
More Components of a DBMS
• DATA ADMINISTRATION SUBSYSTEM - helps you manage the overall database environment by providing facilities for:
– Backup and recovery
– Security management

Database Architectures - Centralized
• Centralized databases use a single central processor or multiple processors in a client/server network. The major feature is that the database is in a single physical location.
– Advantages of this design are that security tends to be higher and risks are lower
– When demand for data access is highly decentralized, this design tends to be costly and inflexible

Database Architectures - Distributed
• Databases can be decentralized either by partitioning or by replicating
• Partitioned database: Database is divided into segments or regions. For example, a customer database can be divided into Eastern customers and Western customers, and two separate databases maintained in the two regions.
• Duplicated database: The database is duplicated at two or more locations. The separate databases are synchronized in off hours on a batch basis.

Distributed Databases

Ensuring Data Quality
• Corporate and government databases have unexpectedly poor levels of data quality.
• National consumer credit reporting databases have error rates of 20-35%.
• 32% of the records in the FBI’s Computerized Criminal History file are inaccurate, incomplete, or ambiguous.
• Gartner Group estimates that consumer data in corporate databases degrades at the rate of 2% a month.

Ensuring Data Quality (Continued)
• The quality of decision making in a firm is directly related to the quality of data in its databases.
• Data Quality Audit: Structured survey of the accuracy and level of completeness of the data in an information system
• Data Cleansing: Consists of activities for detecting and correcting data in a database or file that are incorrect, incomplete, improperly formatted, or redundant
• Integrity constraints (mentioned earlier)

Data Warehouse
• Definition - a database, with tools, that stores current and historical data and is designed to support the business analysis activities and decision-making tasks of managers; typically a relational database model is used
• Benefits
– improved access
– improved information
– isolation from operational systems
– tools permit advanced data analysis
• Users
• Data marts

Comparison of Data in a Data Warehouse and Operational Data
Operational Data
• Data is on many systems
• Current operational data
• Inconsistent data definitions
• Functionally organized data
• Data are constantly changing
Warehouse Data
• Integrated in one enterprise-wide system
• Recent and historical data
• Consistent data definitions
• Data are organized around business entities
• Data are stabilized

Building a Data Warehouse (ETL)
• Extraction phase – create files on the computer that will store the data warehouse and move transaction data to this machine; data may come from many sources or parts of the organization
• Transformation phase – cleanse and standardize the data. Why is this necessary?
• Load phase – transfer the data from the transformation phase into the data warehouse
• The ETL process becomes automated to make regular transfers of transaction data into the data warehouse

Data-Mining and Data-Mining Tools
• Data-mining is the process of selecting, exploring, and modeling large amounts of data to discover previously unknown relationships that support decision making.
• Traditional data mining tools answer questions about variables that we think are related
– Query languages (QBE or SQL)
– Report generators
– Multidimensional analysis tools (OLAP and pivot tables)
– Standard statistical procedures (regression, ANOVA)
• Knowledge discovery: data-mining tools look for relationships that are not discernible to the human eye (see next slide)

Data-Mining Multidimensionality
• Multidimensional data analysis (OLAP) enables users to view data using various dimensions, measures, and time frames
– dimensions: products, business units, country, industry (categories)
– measures: money, unit sales, head count, variances
– time: daily, weekly, monthly, quarterly, yearly
• This type of analysis also provides the ability to view data in different ways (tables, charts, 3-D, geographically)
• OLAP tools provide for this
• Pivot tables in Excel or Access

A Data Cube

Examples of OLAP Tools
• Go to www.fedscope.opm.gov
– Under data cubes on the entry page, click on employment
– Demonstrate drill down and adding charts
– Data for this example comes from the Central Personnel Data File (CPDF) of the federal government
– The OLAP tool used to build this site is from a company named Cognos (PowerPlay)
• OLAP tools based on Excel
– http://wLCubed.com
– http://www.cubularity.com

Databases and the Web
• Physical relationship of the hardware
• The role of middleware (conversion of HTML to SQL; conversion of query result back to HTML).
• Using the Web
– The browser is a virtual standard and easy to use
– The browser does not require training in a database query tool
– The use of the browser requires no change to the internal database; this enables firms to provide access to internal databases at little cost, thus leveraging their investment in older systems.
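
The middleware role described above (turning a browser request into SQL, then turning the query result back into HTML) can be sketched in a few lines of Python using only the standard library. This is an illustrative assumption about how such a layer might look, not the design of any particular product: the customer table, its columns, the region parameter, and the function names are all hypothetical.

```python
import html
import sqlite3
from urllib.parse import parse_qs


def make_demo_db():
    """Stand-in for an internal database that already exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (name TEXT, region TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [("Acme", "East"), ("Bolt & Sons", "West"), ("Cobalt", "East")],
    )
    conn.commit()
    return conn


def handle_request(conn, query_string):
    """Middleware step: browser request -> SQL -> HTML page fragment."""
    # 1. Read the value the browser sent, e.g. "region=East".
    params = parse_qs(query_string)
    region = params.get("region", [""])[0]
    # 2. Convert the request to SQL; the ? placeholder keeps user input as data.
    rows = conn.execute(
        "SELECT name, region FROM customer WHERE region = ? ORDER BY name",
        (region,),
    ).fetchall()
    # 3. Convert the query result back to HTML, escaping special characters.
    cells = "".join(
        f"<tr><td>{html.escape(name)}</td><td>{html.escape(reg)}</td></tr>"
        for name, reg in rows
    )
    return f"<table><tr><th>Name</th><th>Region</th></tr>{cells}</table>"


if __name__ == "__main__":
    print(handle_request(make_demo_db(), "region=East"))
```

Note that the internal database is not changed to serve the Web; only the thin translation layer is added in front of it, which is the cost advantage the slide points to.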
Linking Internal Databases to the Web

Management Opportunities and Challenges
• Effectively managing an organization’s data resources is more than selecting a logical database design
– Ongoing commitment requiring discipline
– Requires organizational and conceptual changes
– Management commitment and understanding required
– Huge opportunities to improve performance by managing data better
• Obstacles
– Cost/benefit analysis is difficult; costs are upfront and benefits are in the future

Solutions
• Data administration function
– Data are the property of the organization
– Establish a group to administer data
• Data-planning and modeling methodology
– Enterprise planning for data using a common methodology
• Database technology, management, and users
– New software requires new personnel trained on the software
– Database administration
– Increased training for end users

Key Organizational Elements in the Database Environment

Spreadsheets Versus DBMS
• Linkage between elements
– spreadsheet - between cells in the same table
– DBMS - between elements in different tables
• Orientation
– spreadsheet is oriented toward calculations
– DBMS is oriented toward organization and linkage of data elements in different tables
• Capabilities
– DBMS has extensive querying and reporting power
– spreadsheet is limited
• Memory requirements
– entire spreadsheet table must be in memory
– not true for the database table
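
To make the "linkage between elements in different tables" point concrete, here is a small sketch using Python's built-in sqlite3 module. The customer and sale tables, their columns, and the sample figures are invented purely for illustration. A single query links the two tables through a shared key and summarizes the result, which is the kind of cross-table querying the comparison above attributes to a DBMS.

```python
import sqlite3


def sales_by_region():
    """Link two tables through a shared key and total the sales per region."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            region      TEXT NOT NULL
        );
        CREATE TABLE sale (
            sale_id     INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            amount      REAL NOT NULL
        );
        -- Hypothetical sample rows
        INSERT INTO customer VALUES (1, 'East'), (2, 'West'), (3, 'East');
        INSERT INTO sale VALUES (1, 1, 100.0), (2, 2, 250.0),
                                (3, 3, 50.0),  (4, 1, 25.0);
        """
    )
    # The JOIN expresses the linkage between elements in different tables.
    rows = conn.execute(
        """
        SELECT c.region, SUM(s.amount)
        FROM sale AS s
        JOIN customer AS c ON c.customer_id = s.customer_id
        GROUP BY c.region
        ORDER BY c.region
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for region, total in sales_by_region():
        print(region, total)
```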