Chapter 10: Databases: Controlling the Information

Every week, we have the capacity to double our current store of information.

The Computer Continuum 10-1

In this chapter:
• Why has the last third of the 20th century been dubbed “The Information Age”?
• What computer readable media are used to collect data?
• How are statistical methods used to transform data into information?
• What are the elements of a database?
• How are Database Management Systems used to organize data?
• What are some advantages and disadvantages of databases?
• What are some advantages of Web-based databases?

Introduction: Information Overload

The last third of the 20th century was dubbed “The Information Age”.
• We are inundated with information: facts, figures, opinions, stories, pictures, records, predictions, etc.
• Internet users generate a terabyte of information daily, equivalent to 70 million 300-page books per month.
• We no longer store the bulk of it on paper:
  – Computer technology provides many types of storage media that are economical, longer-lasting, and easier to access.

We will examine the role computers play in:
• Collecting and manipulating large amounts of data.
• Accessing that data in a timely fashion.
• Analyzing and formatting it for easy understanding.

The terms data and information are generally used interchangeably. Here, however, each has a specific meaning:
• Data: A given thing or fact. (Computers process data.)
• Information: Data repackaged into a meaningful form that we can understand and use.

In order to include whatever data might be needed, a database must be carefully designed.
• Database programmers and the users of the databases must answer the following questions:
  – How can we efficiently collect large amounts of relevant data?
  – How can we reliably store that data for later use?
  – Who will use the data?
  – How often will they need to access the data?
  – What format will they need the data in (text, numerical, visual)?
  – How will they use the data?

The Technology of Data Collection

Early computers provided no means to collect data or to store it from one program execution to the next.
• Each piece of data needed for a calculation had to be input during the actual execution of the program.
• The earliest form of computer data was collected and stored on paper.
  – Hollerith card: Used for external storage of data.
    • Each card consisted of 80 columns of digits from 0-9. (Each column represented one character.)
    • Data was entered using a keypunch machine and read with a card reader.

Data collected on paper in a form that the computer could read directly:
• Mark-Sensor Data Collection Sheets: A sheet of paper used to collect responses to multiple choice questions.
  – Using a graphite pencil, the responder fills in small spaces indicating answers.
  – A computer scans the forms to read the marks.

Remote electronic data sensing:
• Uses a remote sensing device such as a satellite to collect information.
• Satellites collect millions of times more data in one day than an entire 10-year census.
• The Hubble Space Telescope collects 85 million bytes per second.
• (Figure: satellite image of Washington, D.C.)

Bar codes:
• Used for data collection.
• Use a combination of thick and thin lines to identify a specific item in an inventory.

Data probe tools:
• A key-like instrument which, when inserted into a meter, electronically reads and records the meter location and usage numbers into a computer.

Voice recognition data entry:
• The collector of information speaks into a microphone of a portable terminal.
  The receiving computer accepts the voice transmission and transcribes it into a form the computer can store.
  – A laptop can be used as a remote terminal.

Online Interactive Data Entry:
• New data can be entered into a database from a Web browser screen.
• A company provides a user with an on-screen form in a secure environment.
  – Secure environment: A Web site that protects the security of the data being entered or displayed, and the privacy of the user to whom the data belongs.
  – Most common use is in electronic commerce.
    • Order taken, payment made, shipment made, confirmation sent.

Retrieving Data

Effective data retrieval depends on how well the files were organized.
• FBI Fingerprint Processing: A Study in Data Collection, Storage, and Retrieval.
  – The FBI maintains a database of over 270 million sets of fingerprints.
  – The FBI fingerprint collection process:
    • Fingertips are inked and the prints are collected on a card.
    • Completed cards, including personal information, are sent to the FBI.
    • 800+ technicians determine each print’s Henry classification, which classifies prints according to ridge patterns.
• The FBI fingerprint storage process:
  – The information and images on the cards are scanned into the computer’s memory.
    • The computer operator:
      » Adjusts for location and orientation.
      » Computes the center points.
    • The computer scans and determines the classification.
  – Until the mid-1970s, all fingerprint data was stored in file cabinets on cards.
• The FBI fingerprint retrieval process:
  – After the information and images on the cards have been scanned into the computer’s memory and classified:
    • The set is compared to others of the same classification.
    • If sufficient points of identification are found, a match is declared.
    • A final check is performed by a human technician to verify the match.
• Before computers: 1400 fingerprint technicians processed 24,000 requests per day.
• After computers: Over 30,000 sets are processed with fewer than half the technicians.

Visualization of Information
• Uses many different techniques to transform any form of data into a visual image.
• (Figure: population densities in 1979.)

Ultrasound Imaging: A medical diagnostic technique that provides visual images constructed from the sounds that reflect off various organs of the body.

The Role of Statistics: Transforming Data/Information

The retrieval process involves the examination, summarization, and manipulation of data into information.
• The most commonly used method of transforming data is statistical analysis.
  – Some important statistical concepts include:
    • Percents
    • Probability
    • Selecting data for statistical analysis (Sampling)
    • Normal distribution
    • Correlation

Percents: A special type of fraction: the number of parts in question out of a total of 100 parts.
• 25% of 100 people would be 25 people.
• 20% of 10 dogs would be 2 dogs (20/100 of 10).

Probability: Deals with our ability to predict whether certain events will occur or not.
• “50% chance of showers this evening”: This prediction is based on the probability that certain patterns of observed weather will continue.

Gathering data on a particular topic:
• Sample: A small group used to represent the much larger group.
• Sampling: The technique of predicting a total situation using comparatively few isolated representatives.
** A sample must be carefully selected to avoid predetermined statistical results.
**
• Skewed sample: The selection of the group participating in a survey supports some predetermined outcome.
  – If a sample is randomly chosen, we can expect some semblance of a normal distribution.
• Normal distribution: About 68% of all data values fall within a limited range near the center of the distribution (within one standard deviation of the mean).

Performing Statistical Analyses on data:
• Correlation: A connection or relation linking two or more pieces of information.
• Example: There is a correlation between index finger length and whether a person is male or female.
  – Index finger measurements of 100 males and 100 females:

    Gender    Mean Index Finger Length
    Male      72 millimeters
    Female    68 millimeters

• False Correlation or False Relevance: Involves the creation of a cause and effect relationship between two facts that seem to be related but are not. (The facts might be true, but the relationship is not!)
  – Fact 1: All human beings breathe oxygen.
  – Fact 2: All human beings must die sometime.
  – False relevance: Oxygen must be toxic, because 100% of those breathing oxygen today will die in the future.

Creating a Custom Database

Database: An organized collection of information.
• The arrangement of a typical database:
  – Field: A location which contains one specific piece of information.
  – Record: A collection of related data items.
  – File: A group of records, all of the same type.
• Example fields in a record: Last, First, ID, Phone#, DOB.

DBMS (Database Management System): A software application that allows you to store, organize and retrieve data from one or more databases.
• Combines (into a complete package):
  – Structural elements of a database (fields, records, files).
  – A query language.
  – Programs for data modification.
  – Programs for statistical analysis.
  – Report writing.

Using a Relational Database System
• Relational Database: A commonly used DBMS based on the relational model.
  – Uses two-dimensional tables, called relations, to store data.
    • Relations are linked to each other by common fields.
  – The relational model has two important features:
    • Its structure is simple and direct. (Data is stored in tables.)
    • Its structure is well suited to the client/server environment.
      » Involves two computers connected by a network. The database resides on one (server), and the software needed to access the data resides on the other (client).

Steps to create a new database using a DBMS:
1. Decide what information you might need about the subject.
2. Define the structure of your database. (Setting up the fields)
3. Enter the information about each item.
4. Select exactly the information you wish to extract.
5. Update the database.
6. Print out all or any part of the database in a format of your choice.

Using information to enhance targeted marketing in the business world.
• Data warehouses or data marts: The collection and consolidation of data from many individual sources into centralized “warehouses”.
  – Can apply the data to:
    • Provide better customer service.
    • Do better marketing analysis.
    • Spot problems or opportunities.
• Data mining: Searching collections of databases to discover relationships and global patterns that exist among them, and applying these patterns to assist in management decisions.

Database Advantages
• Space saver.
  – Only need one copy.
• Increase accuracy.
  – Less chance of human error.
• Multiple use of data.
• Data integrity.
  – Securing data is easier.
• Time saver due to search abilities.
• Easier to use the data.
  – Different questions can be asked of the data.

Ethical Hazards of Database Systems:
• Misrepresentation of data.
  – Can you “trust” the source of the statistical analyses?
• Invasion of privacy.
  – Large databases hold personal information about each of us.
  – UID (Universal Identifier): The collection of all citizens’ data obtained from a single source.

Web-Database Connectivity

Dynamically generated Web sites
• Web-database connectivity: The interaction between one or more Web pages and the contents of a specific database.
  – The Web pages are designed as templates.
    • Image areas and text boxes to be filled in with data from the database.
    • Separates the design and layout of the Web page from the content to be displayed on the screen.
• Query-based programming: Uses a 4th-generation query language to select Web content from a database and display it on a dynamically generated Web page.

Key Advantages of Web-Database Connectivity
• Database information is universally available.
  – Access requires a Web browser and Internet connection.
• Web site design and development are easier and faster.
• Web site maintenance is easier and more efficient.
  – A change to the template updates all pages.
• Adjustments can be made by any authorized person rather than a Web developer.
• Web site can display updated calculations and multimedia information.

Getting Started: Connecting a database to the Web
• You need middleware: Software that acts as an intermediary between a Web server and a database (e.g., ColdFusion, Java).
  – Middleware accesses most databases by using ODBC.
    • ODBC (Open Database Connectivity): A set of standards allowing information to be passed from a database to a dynamic Web page.
• The use of a query language: The language that the database can understand.
  – Example: SQL (Structured Query Language), supported by most database software.
  – Query Wizard: Like other wizards, this tool helps frame requests for retrieving specific data from a database.
• And an appropriate database program.

Oracle WebDB: An Integrated Solution
• A complete, integrated software solution (high-end database program) for building, loading and monitoring Web database applications and content-driven Web sites.
  – Can help:
    • Create and manage database objects.
    • Develop HTML components.
    • Build and maintain content-driven Web sites.
    • Track Web site and database connectivity performance.
    • Manage database security.
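
Several ideas from this chapter (fields and records, relations linked by a common field, an SQL query, and a Web page template filled in from a database) can be seen together in one small sketch. The example below is illustrative only and is not taken from the slides: it uses Python's built-in sqlite3 module in place of a separate database server and ODBC middleware, and the table names (people, orders), the sample rows, and the function names are all made-up assumptions.

```python
import html
import sqlite3
from string import Template


def build_database():
    """Create an in-memory database with two relations (tables).

    The schema and rows are hypothetical sample data. The fields of
    `people` loosely mirror the record layout named in the chapter
    (Last, First, ID, Phone#, DOB).
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE people (
            id    INTEGER PRIMARY KEY,
            last  TEXT,
            first TEXT,
            phone TEXT,
            dob   TEXT
        );
        CREATE TABLE orders (
            order_id  INTEGER PRIMARY KEY,
            person_id INTEGER REFERENCES people(id),  -- the common field
            item      TEXT
        );
        """
    )
    # Each tuple is one record; each position in the tuple is one field.
    conn.executemany(
        "INSERT INTO people VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Smith", "Pat", "555-0100", "1980-01-15"),
            (2, "Jones", "Lee", "555-0101", "1975-06-30"),
        ],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, "notebook"), (11, 2, "stapler"), (12, 1, "pencil")],
    )
    return conn


def items_for(conn, last_name):
    """Run an SQL query that links the two relations by their common field."""
    rows = conn.execute(
        "SELECT o.item FROM orders AS o "
        "JOIN people AS p ON o.person_id = p.id "
        "WHERE p.last = ? ORDER BY o.order_id",
        (last_name,),
    )
    return [row[0] for row in rows]


# The page template holds the layout; the content comes from the database.
PAGE = Template("<h1>Orders for $name</h1><ul>$items</ul>")


def render_page(conn, last_name):
    """Fill the template with the query results to generate a page."""
    items = "".join(
        f"<li>{html.escape(item)}</li>" for item in items_for(conn, last_name)
    )
    return PAGE.substitute(name=html.escape(last_name), items=items)


if __name__ == "__main__":
    connection = build_database()
    print(render_page(connection, "Smith"))
```

Because the layout lives in the template and the content lives in the tables, adding a row to orders changes the generated page without touching the template, which is the separation of design from content described above. A production site would normally use a Web server plus middleware rather than a single script.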