6.1 A grouping of characters into a word, a group of words, or a complete number (such as a person’s name or age) is called a field. A group of related fields, such as a student’s name, the course taken, the date, and the grade, makes up a record; a group of records of the same type is called a file. A group of related files makes up a database. A record describes an entity: a person, place, thing, or event on which we store and maintain information. Each characteristic or quality describing a particular entity is called an attribute.

Data redundancy wastes storage resources and also leads to data inconsistency, where the same attribute may have different values in different files. Program-data dependence refers to the coupling of data stored in files and the specific programs required to update and maintain those files, such that changes in programs require changes to the data.

6.2 A more rigorous definition of a database is a collection of data organized to serve many applications efficiently by centralizing the data and controlling redundant data. A database management system (DBMS) is software that permits an organization to centralize data, manage them efficiently, and provide access to the stored data by application programs.

The most popular type of DBMS today, for PCs as well as for larger computers and mainframes, is the relational DBMS. Relational databases represent data as two-dimensional tables (called relations). The information about a single supplier stored in a table is called a row. Rows are commonly referred to as records or, in very technical terms, as tuples. The Supplier_Number field in the SUPPLIER table uniquely identifies each record so that the record can be retrieved, updated, or sorted; such a field is called a key field. Each table in a relational database has one field (or, in some designs, a combination of fields) designated as its primary key.
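The idea of a key field can be sketched with Python's built-in sqlite3 module. This is only an illustration: apart from the SUPPLIER table and its Supplier_Number field, every column name and sample value below is a hypothetical placeholder.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# SUPPLIER and Supplier_Number come from the text; the other
# columns are assumptions made for this example.
conn.execute(
    """
    CREATE TABLE SUPPLIER (
        Supplier_Number INTEGER PRIMARY KEY,
        Supplier_Name   TEXT NOT NULL,
        Supplier_City   TEXT
    )
    """
)

# Each tuple is one row (record); the values are made up.
suppliers = [
    (101, "Acme Supply", "Dayton"),
    (102, "Bright Parts", "Cleveland"),
]
conn.executemany("INSERT INTO SUPPLIER VALUES (?, ?, ?)", suppliers)

# The key field lets us retrieve exactly one record.
row = conn.execute(
    "SELECT * FROM SUPPLIER WHERE Supplier_Number = ?", (102,)
).fetchone()

# Reusing an existing primary key value is rejected by the DBMS.
try:
    conn.execute(
        "INSERT INTO SUPPLIER VALUES (?, ?, ?)", (102, "Copycat Co.", "Akron")
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Here `row` comes back as a Python tuple, which mirrors the technical name for a row, and the second insert with key 102 fails because a primary key value must be unique.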
The primary key is the unique identifier for all the information in any row of the table, and it cannot be duplicated. When the field Supplier_Number appears in the PART table, it is called a foreign key; it is essentially a lookup field for finding data about the supplier of a specific part.

An object-oriented DBMS stores data and the procedures that act on those data as objects that can be automatically retrieved and shared. Hybrid object-relational DBMS are now available that provide the capabilities of both object-oriented and relational DBMS.

A DBMS has a data definition capability to specify the structure of the content of the database. It is used to create database tables and to define the characteristics of the fields in each table. A data dictionary is an automated or manual file that stores definitions of data elements and their characteristics. Most DBMS have a specialized language, called a data manipulation language, that is used to add, change, delete, and retrieve the data in the database. The most prominent data manipulation language today is Structured Query Language, or SQL.

The process of creating small, stable, yet flexible and adaptive data structures from complex groups of data is called normalization. Relational database systems try to enforce referential integrity rules to ensure that relationships between coupled tables remain consistent. Database designers document their data model with an entity-relationship diagram, for example one showing the relationships among the entities SUPPLIER, PART, LINE_ITEM, and ORDER, in which boxes represent entities.

6.3 A data warehouse is a database that stores current and historical data of potential interest to decision makers throughout the company. A data mart is a subset of a data warehouse in which a summarized or highly focused portion of the organization’s data is placed in a separate database for a specific population of users.
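As a rough illustration of how a data mart relates to a data warehouse, the sketch below again uses Python's built-in sqlite3 module. The table names warehouse_sales and east_mart, their columns, and all sample rows are hypothetical, and a single in-memory table stands in for what would normally be a much larger warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: a hypothetical warehouse table holding both
# current and historical sales.
conn.execute(
    """
    CREATE TABLE warehouse_sales (
        sale_year INTEGER NOT NULL,
        region    TEXT    NOT NULL,
        product   TEXT    NOT NULL,
        units     INTEGER NOT NULL
    )
    """
)

# Data manipulation: load made-up rows spanning two years.
rows = [
    (2022, "East", "Nut", 50),
    (2022, "West", "Nut", 40),
    (2023, "East", "Nut", 60),
    (2023, "East", "Nut", 15),
    (2023, "East", "Bolt", 30),
    (2023, "West", "Bolt", 20),
]
conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)", rows)

# Data mart: a summarized, focused portion of the warehouse
# (East region only), stored as its own table for one user group.
conn.execute(
    """
    CREATE TABLE east_mart AS
    SELECT sale_year, product, SUM(units) AS total_units
    FROM warehouse_sales
    WHERE region = 'East'
    GROUP BY sale_year, product
    """
)

mart_rows = conn.execute(
    "SELECT sale_year, product, total_units "
    "FROM east_mart ORDER BY sale_year, product"
).fetchall()
# mart_rows == [(2022, 'Nut', 50), (2023, 'Bolt', 30), (2023, 'Nut', 75)]
```

The mart holds fewer, pre-summarized rows than the warehouse, which is the point: users in one area query a small, focused table rather than the full history.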
To answer complex questions that combine several dimensions of the data, you would need online analytical processing (OLAP). OLAP supports multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions. Each aspect of information (product, pricing, cost, region, or time period) represents a different dimension.

Data mining is more discovery-driven than OLAP. It provides insights into corporate data that cannot be obtained with OLAP by finding hidden patterns and relationships in large databases and inferring rules from them to predict future behavior. The patterns and rules are used to guide decision making and forecast the effect of those decisions. The types of information obtainable from data mining include associations, sequences, classifications, clusters, and forecasts. Predictive analytics uses data mining techniques, historical data, and assumptions about future conditions to predict outcomes of events, such as the probability that a customer will respond to an offer or purchase a specific product.

Text mining tools are now available to help businesses analyze unstructured text data. These tools can extract key elements from large unstructured data sets, discover patterns and relationships, and summarize the information. The discovery and analysis of useful patterns and information from the World Wide Web is called Web mining.

In a client/server environment, the DBMS resides on a dedicated computer called a database server. The DBMS receives the SQL requests and provides the required data. Middleware transfers information from the organization’s internal database back to the Web server for delivery to the user in the form of a Web page.

6.4 An information policy specifies the organization’s rules for sharing, disseminating, acquiring, standardizing, classifying, and inventorying information.
Information policy lays out specific procedures and accountabilities, identifying which users and organizational units can share information, where information can be distributed, and who is responsible for updating and maintaining the information. Data administration is responsible for the specific policies and procedures through which data can be managed as an organizational resource. These responsibilities include developing information policy, planning for data, overseeing logical database design and data dictionary development, and monitoring how information systems specialists and end-user groups use data.

You may hear the term data governance used to describe many of these activities. Promoted by IBM, data governance deals with the policies and processes for managing the availability, usability, integrity, and security of the data employed in an enterprise, with special emphasis on promoting privacy, security, data quality, and compliance with government regulations.

In close cooperation with users, the database design group establishes the physical database, the logical relations among elements, and the access rules and security procedures. The functions this group performs are called database administration.

Analysis of data quality often begins with a data quality audit, which is a structured survey of the accuracy and level of completeness of the data in an information system. Data quality audits can be performed by surveying entire data files, surveying samples from data files, or surveying end users for their perceptions of data quality. Data cleansing, also known as data scrubbing, consists of activities for detecting and correcting data in a database that are incorrect, incomplete, improperly formatted, or redundant. Data cleansing not only corrects errors but also enforces consistency among different sets of data that originated in separate information systems.
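The four problem types named above (incorrect, incomplete, improperly formatted, redundant) can be illustrated with a minimal Python sketch. The `cleanse` function, the record fields, and the validity rule (an e-mail address must contain "@") are all assumptions chosen for the example, not a description of any particular data cleansing product.

```python
def cleanse(records):
    """Return (clean, rejected) for a list of {'name', 'email'} dicts.

    Hypothetical rules for this sketch:
    - improperly formatted: normalize whitespace and letter case
    - incomplete: reject records missing a name or an e-mail
    - incorrect: reject e-mails without an '@' (a deliberately crude check)
    - redundant: keep only the first record for each e-mail address
    """
    clean_by_email = {}
    rejected = []
    for rec in records:
        # Fix formatting first so later checks see consistent values.
        name = " ".join((rec.get("name") or "").split()).title()
        email = (rec.get("email") or "").strip().lower()

        if not name or not email:   # incomplete
            rejected.append(rec)
            continue
        if "@" not in email:        # incorrect
            rejected.append(rec)
            continue
        if email not in clean_by_email:  # skip redundant duplicates
            clean_by_email[email] = {"name": name, "email": email}
    return list(clean_by_email.values()), rejected


# Made-up records, as if merged from two separate systems.
raw = [
    {"name": "  jane   doe ", "email": "JANE@Example.com "},
    {"name": "Jane Doe", "email": "jane@example.com"},       # redundant
    {"name": "John Roe", "email": ""},                       # incomplete
    {"name": "Sam Poe", "email": "sam.example.com"},         # incorrect
]

clean, rejected = cleanse(raw)
# clean == [{'name': 'Jane Doe', 'email': 'jane@example.com'}]
# len(rejected) == 2
```

In this sketch only formatting problems are corrected automatically; incomplete and incorrect records are set aside for review, since fixing them usually needs information the program does not have.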