Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Data Protection Act, 2012 wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Data center wikipedia , lookup
Forecasting wikipedia , lookup
Data analysis wikipedia , lookup
3D optical data storage wikipedia , lookup
Clusterpoint wikipedia , lookup
Information privacy law wikipedia , lookup
Data vault modeling wikipedia , lookup
Chapter 7: Databases and Data Warehouses Oz (5th edition) Ideas From the First Part of Chapter 7 • Problems with the traditional file approach (pre database) – Data redundancy – Data integrity – Data security – Program data dependence – Programmers are required to access data • Advantages of the database approach – Reductions in data redundancy – Application-data independence – Better control; better security – Flexibility More Ideas • Object-oriented database model – Successor to the relational model – Integration of data and programs – Handles wider variety of field types • Entity-relationship (ER) diagrams – Graphical method of displaying relationships between tables – An ER diagram is an example of a schema (conceptual model of the database) – Tool for IS professionals CREATING A DATABASE ENVIRONMENT An Entity-Relationship Diagram Physical versus Logical Views • In managing information, physical deals with the structure of information as it resides on various storage media. • Logical deals with how knowledge workers view their information needs, and includes such terms as: – CHARACTER - our smallest unit of information. – FIELD - group of related characters. – RECORD - group of related fields. – FILE - group of related records. – DATABASE - group of logically associated files. – DATA WAREHOUSE - information from many databases. Other Logical Structures in a Database • DATA DICTIONARY - contains the logical structure of information in a database. – Definitions of all fields, records, and tables – Relationships between tables – Who is responsible for maintaining data in the database – Descriptions of who is authorized to access different parts of the database • Data dictionary contains meta data (data about the data) Sample Data Dictionary Report Components of a DBMS • Data definition subsystem (language; DDL) – Defines the structure of the database tables (design view in Access) – Creates and maintains the data dictionary – Defines the relationships between tables – Add, delete, or modify field properties More Components of a DBMS • Data manipulation subsystem (language; DML) – Add, delete, and modify data in the database – Contains the query languages (QBE or SQL) for the database. SQL is both a DML and DDL – Contains report generation capability • Data administration subsystem – Manage the overall database environment by providing facilities for: • Backup and recovery • Security management Data Warehouse – Definition- a database with tools that stores current and historical data that is designed to support business analysis activities and decisionmaking tasks of managers; typically a relational database model is used – Benefits • improved access • improved information • isolation from operational systems • tools permit advanced data analysis – Users – Data marts Building a Data Warehouse (ETL) • Extraction phase – create files on the computer that will store the data warehouse and move transaction data to this machine; data may come from many sources or parts of the organization • Transformation phase – cleanse and standardize the data. Why is this necessary? • Load phase – transfer the data from the transformation phase into the data warehouse • The ETL process becomes automated to make regular transfers of transaction data into the data warehouse Comparison of Data in a Data Warehouse and Operational Data • Operational Data • Data is on many systems • Current operational data • Inconsistent data definitions • Functionally organized data • Data are constantly changing • Support OLTP • Warehouse Data • Integrated in one enterprise-wide system • Recent and historical data • Consistent data definitions • Data are organized around business entities • Data are stabilized • Support OLAP Data-Mining and Data-Mining Tools • Data-mining is the process of selecting, exploring, and modeling large amounts of data to discover previously unknown relationships that support decision making. • Traditional data mining tools answer questions about variables that we think are related – Query languages (QBE or SQL) – Report generators – Multidimensional analysis tools (OLAP or pivot tables) – Standard statistical procedures (regression, ANOVA) • Knowledge discovery tools are data-mining tools for finding relationships that are not discernable to the human eye (see next slide); Data-Mining as Knowledge Discovery: Selected Examples Figure 8.22 Potential applications of data-mining Multidimensionality • Multidimensional data analysis (or OLAP) enables users to view data using various dimensions, measures and time frames (i. e., OLAP) – dimensions: products, business units, country, industry (e.g., categories) – measures: money, unit sales, head count, variances – time: daily, weekly, monthly, quarterly, yearly) • This type of analysis also provides the ability to view data in different ways (tables, charts, 3-D, geographically) • OLAP tools provide for this • Pivot tables in Excel or Access Examples of OLAP Tools • Go to www.fedscope.opm.gov – Under data cubes on entry page click on employment – Demonstrate drill down and adding charts – Data for this example comes from the Central Personnel Data File (CPDF) of the federal government – The OLAP tool used to build this site is from a company named Cognos (PowerPlay) • OLAP tools based on Excel – http://wLCubed.com – http://www.cubularity.com Multidimensionality Database Architecture: The Physical and Logical Layout of the Hardware, Data, and Applications • Centralized databases with remote access • Distributed Databases – With replication a full copy of the entire database is stored at all sites – With fragmentation the database is partitioned • Parts of database are stored where they are most often accessed Web Databases • The ease of use of Web browsers enables firms to link their databases to the Web • Ease of use enables users to – Access and retrieve information from a database – Enter information into the database • The user requires no special training in a DBMS to perform the above activities; prior to the browser and the Web this would not have been feasible • What does this mean? Federal Trade Commission’s Fair Information Practice Principles* (1973) • Notice/awareness – disclosure of practices before collecting data • Choice/consent – opt in/opt out for consumers • Access/participation – consumers can review and contest data for accuracy and completeness • Security – data collectors must take steps to secure data for accuracy and unauthorized use • Enforcement – there must be a mechanism in place to enforce FIP principles • *Laws enforce these principles for data collected by federal agencies; not so in the private sector Spreadsheets Versus DBMS • Linkage between elements – spreadsheet - between cells in same table – DBMS - between elements in different tables • Orientation – spreadsheet is toward calculations – DBMS is tilted toward organization and linkage of data elements in different tables • Capabilities – DBMS has extensive querying and reporting power – spreadsheet is limited • Memory requirements – entire spreadsheet table must be in memory – not true for the database table