LECTURE THREE: Database System Environment

TYPES OF DATABASES
• Hierarchical databases: The earliest DBMSs were based on a hierarchical method of storing data, and the earliest systems were an extension of the COBOL file structure. This method starts from the claim that business data exhibits a hierarchical relationship. For example, a small office without computers might store data in filing cabinets.
• The cabinets would be organized by customer. Each customer section would contain folders for individual orders, and each order would list the items being purchased. To store or retrieve data, the database must start at the top, with a customer in this example. When the database stores the customer data, it stores the rest of the hierarchical data with it.
• A hierarchical database is relatively fast as long as you only want to access the data from the top. The serious problem arises when searching for items from the bottom or the middle. For example, to find all customers who ordered a specific item, the database would have to inspect each customer, every order, and each item. Many of these early database approaches still survive, partly because it is difficult to throw away applications that still work.
• Network databases: This model has nothing to do with physical networks such as Local Area Networks (LANs). The network model is named after the network of connections between the data elements. Its primary goal was to solve the hierarchical problem of searching for data from different perspectives. The following figure illustrates this.
[Figure: network model]
First, notice that the items are now physically separated and connected by arrows. There are also entry points, which are predefined items that can be searched.
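The search problem described for hierarchical databases, and the network model's answer to it, can be sketched in a few lines of Python. This is a hypothetical, in-memory illustration. The customer, order, and item names are invented, and the "arrows" are modeled as a precomputed dictionary from each item to the orders that contain it. That dictionary is a simplification of the network model, not how a real network DBMS stores its links.

```python
# Hypothetical sample data (invented for illustration): customer -> orders -> items.
hierarchy = {
    "Acme": {"O1": ["bolt", "nut"], "O2": ["washer"]},
    "Bravo": {"O3": ["nut"]},
    "Cobalt": {"O4": ["bolt"]},
}


def orders_for_customer(customer):
    """Top-down access: fast, because the search starts at the root (the customer)."""
    return hierarchy[customer]


def customers_who_ordered_hierarchical(item):
    """Bottom-up question in a hierarchy: inspect every customer, order, and item."""
    inspected = 0
    found = set()
    for customer, orders in hierarchy.items():
        for order_items in orders.values():
            for ordered in order_items:
                inspected += 1
                if ordered == item:
                    found.add(customer)
    return sorted(found), inspected


def build_item_links(tree):
    """Network-style 'arrows': item -> (customer, order) pairs, built before any question."""
    links = {}
    for customer, orders in tree.items():
        for order_id, order_items in orders.items():
            for ordered in order_items:
                links.setdefault(ordered, []).append((customer, order_id))
    return links


item_links = build_item_links(hierarchy)


def customers_who_ordered_network(item):
    """Enter at the item (an entry point) and follow its arrows back to the customers."""
    return sorted({customer for customer, _order in item_links.get(item, [])})


print(customers_who_ordered_hierarchical("nut"))  # (['Acme', 'Bravo'], 5)
print(customers_who_ordered_network("nut"))       # ['Acme', 'Bravo']
```

Look at the `build_item_links` step. It has to run before the question is asked, and it takes extra storage. That is the cost of anticipating questions in the network approach.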
The purpose of the arrows is to show that once you enter the database, the DBMS can follow the arrows to find and display the matching data.
• Although this approach solves the search problem, it is complex and costly. The developer must anticipate every question a user might ask about the data, because the arrows (indexes) have to be built physically before the question is asked. Building and maintaining these indexes requires large amounts of time and storage space.
• Relational databases: The relational database originated in the 1970s. The key idea is that the tables (called "relations") are sets of data. Each table stores, in its columns, attributes that describe specific entities. The tables are not physically connected to each other; the connections exist through matching data stored in each table. The following figure illustrates this.
[Figure: relational model]
• The strength of the relational approach is that the designer/developer does not need to know which questions might be asked of the data. If the data is carefully defined, the database can answer virtually any question efficiently. This flexibility and efficiency are the primary reasons for the dominance of the relational model. This course focuses on building applications for relational databases.
• Object-oriented databases: This is a new and evolving method of organizing data. It began as a new method of developing programs. The goal is to create objects that can be reused in many programs, saving time and reducing errors. An object has three major components: a name, a set of properties (attributes), and methods (functions).
• The properties describe the object, just as attributes describe an entity in a relational database.
• The methods are the true innovation of the OO approach. They are short programs that define the actions that each object can take.
For example, code to add a customer could be stored in the "Customer" object.
Object-Oriented DBMS example (object: properties; methods):
– Customer: CustomerID, Name, …; Add Customer, Drop Customer, Change Address
– Commercial Customer: ContactName, ContactPhone, …; NewContact
– Order: OrderID, CustomerID, …; NewOrder, DeleteOrder, …
– OrderItem: OrderID, ItemID, …; OrderItem, DropOrderItem, …
– Item: ItemID, Description, …; New Item, Sell Item, Buy Item, …
• There are two approaches to handling true object-oriented data:
1. Extend the relational model so that it can handle OO features.
2. Create a new object-oriented DBMS.
Most commercially successful database systems follow the first approach, adding object features to the relational model.

Examples of Commercial Systems
• Oracle
• Informix (Unix)
• DB2, SQL/DS (IBM)
• Access (Microsoft)
• SQL Server (Microsoft +)
• Many older systems (Focus, IMS, ...)
• MySQL
• PostgreSQL

Database System Environment
Stored Data Manager
• The database and the database catalogue are stored on disk.
• Access to the disk is handled by the operating system.
• A higher-level stored data manager controls access to DBMS information that is stored on disk, whether it is part of the database or the catalogue.
• The stored data manager may use basic OS services for carrying out low-level data transfer, such as handling buffers.
• Once data is in buffers, the other DBMS modules, as well as other application programs, can process it.
Data Definition Language (DDL) Compiler
• Processes the schema definitions and stores the descriptions (metadata) in the catalogue.
Runtime Database Processor
• Handles database access at runtime.
• Receives retrieval or update operations and carries them out on the database.
• Access to the disk goes through the stored data manager.
Query Compiler
• Handles high-level queries entered interactively.
• Parses, analyzes, and interprets a query, then generates calls to the runtime processor for execution.
Precompiler
• Extracts Data Manipulation Language (DML) commands from an application program written in a host language.
• The commands are sent to the DML compiler to be compiled into code for database access; the rest of the program is sent to the host language compiler.
Client Program
• The client program that accesses the DBMS runs on a separate computer from the one on which the database resides. The former is called the client computer, and the latter is the database server. In some cases, a middle tier called the application server sits between them.

Database System Utilities
DBMSs have database utilities that help the DBA manage the system. Functions include:
• Loading – loads existing text or sequential files into the database. The source format and the desired target file structure are specified to the utility, which reformats the data and loads it into a table.
• Backup – creates a backup copy of the database, usually by dumping the database onto tape. The copy can be used to restore the database after a failure. Incremental backups, which record only the changes since the last backup, can also be used.
• File reorganization – reorganizes database files into different file organizations to improve performance.
• Performance monitoring – monitors database usage and provides statistics to the DBA, who uses them for decision-making.

Data Dictionary
• Data dictionary system – stores catalog information about schemas and constraints, as well as design decisions, usage standards, application program descriptions, and user information. It is also called an information repository and can be accessed directly by the DBA or by users when needed.

Application development
• Application development environments – (e.g. JBuilder) provide an environment for developing database applications, and include facilities to help with database design, GUI development, querying and updating, and application development.
• CASE tools – used in the design phase to help speed up the development process.

Communication Facilities
• Communication software – allows users at remote locations to access the database through computer terminals, workstations, or personal computers. These are connected to the database through data communications hardware such as phone lines, local area networks, etc.

Centralized DBMS Architecture
• Mainframes provided the main processing for user application programs, user interface programs, and DBMS functionality.
• Users accessed the system via "dumb" computer terminals that only provided display capabilities, with no processing capabilities.
• All processing was performed remotely on the computer system, and only display information was sent to the terminals, which were connected via a network.
• Dumb terminals were later replaced with workstations, which led to the client/server architecture.
[Figure: Centralized DBMS architecture. Terminals (display monitors) connect over a network to a mainframe. The mainframe's software includes application programs, the DBMS, text editors, compilers, etc. Its hardware includes the CPU, controller, memory, disk, and I/O devices.]
[Figure: Client-server. Clients run the front-end user interface; servers hold the shared database.]

Client Server Architecture
• Defines specialized servers with specific functionalities (file servers, print servers, web servers, database servers).
• Many client machines can access the resources provided by a specialized server.
• Some machines are client sites, with client software installed, and other machines are dedicated servers.
• Client machines provide users with the appropriate interfaces to utilize servers, as well as with local processing power to run local applications.
• Client – a user machine that provides user interface capabilities and local processing.
• Server – a machine that provides services to client machines, such as file access, printing, and database access.

Two-Tier Client/Server Architecture for DBMSs
• In relational DBMSs, user interfaces and application programs were first moved to the client side.
• SQL provided a standard language, which was a logical dividing point between client and server.
• Query and transaction functionality remained on the server side. In this architecture, the server is called a query server or a transaction server.
• In relational DBMSs, the server is called an SQL server, because most RDBMSs use SQL.
• In such systems, the user interface and application programs run on the client. When DBMS access is needed, the program establishes a connection to the DBMS on the server side; once the connection is created, the client can communicate with the DBMS.
• ODBC (Open Database Connectivity) is a standard that provides an application programming interface (API) which allows client-side programs to call the DBMS, as long as both sides have the required software. Most database vendors provide ODBC drivers for their systems.
• Client programs can connect to several RDBMSs and send query and transaction requests using the ODBC API.
• Query requests are sent from the client to the server; the server processes each request and sends the result back to the client.
• A related Java standard is JDBC, which allows Java programs to access the DBMS through a standard interface.
• These systems are called two-tier architectures because the software components are distributed over two systems, the client and the server.
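ODBC and JDBC are not Python APIs. Python's standard database interface (the DB-API 2.0, PEP 249) follows the same client-side pattern: connect, send a query, fetch the result. The sketch below uses the built-in `sqlite3` module only as a stand-in. SQLite is an embedded engine, so no separate server exists here; a real two-tier client would load a driver for the server's DBMS instead.

The query also shows the relational idea from earlier in this lecture: the two tables are linked only by matching `customer_id` values. The table names, column names, and rows are hypothetical.

```python
import sqlite3


def run_client(conn):
    """Client side: send SQL through a standard interface and read back the result set."""
    cur = conn.cursor()
    # Hypothetical tables: no physical link, only matching customer_id values.
    cur.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER)")
    cur.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Acme"), (2, "Bravo")])
    cur.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 2), (12, 2)])
    conn.commit()
    # Query request: the DBMS matches rows on customer_id and returns only the result.
    cur.execute(
        "SELECT c.name, COUNT(o.order_id) "
        "FROM customer AS c JOIN orders AS o ON o.customer_id = c.customer_id "
        "GROUP BY c.name ORDER BY c.name"
    )
    return cur.fetchall()


# ":memory:" stands in for the connection details of a real database server.
connection = sqlite3.connect(":memory:")
try:
    print(run_client(connection))  # [('Acme', 1), ('Bravo', 2)]
finally:
    connection.close()
```

Only the SQL text and the result rows cross the client/server boundary. This is why SQL made a natural dividing point between the two tiers.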
Three-Tier Client Server Architecture for Web Applications
• Many web applications use a three-tier architecture, which adds an intermediate layer between the client and the database server.
• The middle tier is called the application server or the web server. It plays an intermediate role by storing the business rules (procedures/constraints) used to access data from the database.
• It can improve database security by checking the client's credentials before forwarding a request to the database server.
• Clients contain GUI interfaces and application-specific rules.
• The intermediate server accepts requests from the client, processes them, and sends database commands to the database server. It then passes the data from the database server to the client, where it may be processed further and filtered.
• The three tiers are: user interface, application rules, and data access.
[Figure: three tiers. Client: GUI, web interface. Application server: application programs, web pages. Database server: DBMS.]

Three-Tier Client-Server
• Server: databases
• Client: front-end
• Middle: locate databases, business rules, program code
[Figure: database servers (databases, transactions, legacy applications, database links), middleware (business rules, program code, application), and client (front-end, user interface).]

Distributed Databases
• A distributed database consists of multiple independent databases that operate on two or more connected computers. The databases are usually in different physical locations. Each database is controlled by an independent DBMS, which is responsible for maintaining the integrity of its own database.
• In extreme situations, the databases might be installed on different hardware, use different operating systems, and even use DBMSs from different vendors. This is a complex environment.
Most current distributed databases function better if all of the environments are running DBMS software from the same vendor.

Distributed Database Definition
• Multiple independent databases.
– Each DBMS is a complete DBMS (engine, queries, locking, transactions, etc.).
– Usually on different machines.
– Usually in different locations.
• Connected by a network.
• Might be different environments:
– Hardware
– Operating system
– DBMS software
[Figure: three connected databases on machines named Zeus (England), Apollo (France), and Athena (United States).]
• In the example above, a company could have offices in England, France, and the USA. Workers in the USA would rarely need to see the daily operations of workers in France. On the other hand, workers in France and England could be working on a large international project. The network and the distributed databases would enable them to share data and treat the project as if all the information were in one place.
• Distributed databases can have different configurations. The most popular is the client/server approach. The server computers are more powerful and provide data to the client computers, which could be PCs with a GUI. The role of the client is to provide an interface to the user, collect and display data, and return data to the appropriate server.
• An important rule of distributed databases is that the user should not know or care that the database is distributed. A user should be able to create and run queries as if the database were on one computer. Behind the scenes, the DBMS might connect to several computers, collect data, and format the results. The user does not need to know about these steps.

Distributed Database Rules
• The user should not know or care that the database is distributed.
– Local autonomy.
– No reliance on a central site.
– Continuous operation.
– Location independence.
– Fragmentation independence (physical storage).
– Replication independence.
– Distributed query processing.
– Distributed transaction management.
– Hardware independence.
– Operating system independence.
– Network independence.
– DBMS independence.
• The main advantage of the distributed database approach is that it matches the organization's function. Business operations are distributed across different locations, and most updates and queries are performed locally. Each office retains local control of and responsibility for its data. Yet the system enables anyone with the proper authority to retrieve data from any part of the company when the need arises.
• The advantages of distributed databases are as follows:
– Each database can continue to run even if a portion fails.
– Data and hardware can be moved without affecting operations or users, for example when expanding operations or addressing performance issues.
– System expansion and upgrades: new sections can be added without affecting others, and the hardware, network, and DBMS can be upgraded.

Advantages of Distributed Databases
• Business operations are often distributed:
– Work and data are segmented by department.
– Work and data are segmented by geographical location.
• Improved performance:
– Most updates and queries are performed locally.
– Local control and responsibility over data are maintained.
• Data can still be combined across the system.
• Scalability and expansion:
– Future expansion is handled by adding on, not by replacement.

Data Warehousing
• A data warehouse is where information is organized for quick retrieval.
Its data is gathered from different sources (usually databases) that were set up for different purposes.

Differences from Traditional Databases
• Data is organized around major subjects rather than individual transactions.
• Summarized data is used rather than detailed data.
• Data is framed for long-term decision-making.
• Warehouses are organized for quick queries rather than for efficient storage.
• They are optimized for complex queries, known as OLAP (online analytical processing), which allow managers to look at a database along different dimensions.
• They allow easy access via data mining software that searches for patterns and can identify relationships.
• They include multiple databases that have been processed so that the data is uniform (clean data).
• They include data from outside sources as well as data generated internally.
• Building a warehouse is complex. An analyst gathers information from a variety of sources and translates it into a common form. For example, one database might record gender as "male"/"female", another as "M"/"F", and a third as "0"/"1".
• Once the data is clean, the analyst has to decide how to summarize it and predict the types of queries that might be asked (details are usually lost during summarization). The warehouse is then designed both logically and physically.
• Note: the analyst must know a lot about the business.
• Because of its size, a warehouse is expensive.

Data Mining
• Data mining can identify patterns that humans are unable to detect. Data mining algorithms search data warehouses for patterns. Data mining is also known as Knowledge Discovery in Databases (KDD).
• It is the process of discovering meaningful patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.
These patterns and trends, extracted as information, can then be applied to prediction or classification models by identifying relations within the data records or between databases. They can then guide decision-making and forecast the effects of those decisions. An example is predicting the buying habits of customers based on past patterns.

Software for Data Mining
Known decision aids include:
• Statistical analysis software
• Neural networks
• Fuzzy networks
• Intelligent agents
• Logic and data visualization

Patterns that decision makers try to identify include:
• Associations: patterns that occur together at the same time. For example, a person who buys milk usually buys bread.
• Sequences: actions that take place over a period of time. For example, if a family buys a house this year, they will most likely buy a fridge and a cooker next year.
• Clustering: a pattern that develops among a group of people. For example, customers who live in a particular area tend to buy a particular product.
• Trends: patterns that are noticed over a period of time. For example, customers may move from buying processed food to natural foods (herbal products) or African attire.

Classification
• Classification: Suppose we have a set of items that belong to several classes, together with past instances (training instances) and their associated classes. Classification is the process of predicting the class of a new item, that is, identifying which class the new item belongs to.
• Example: A bank wants to classify its home loan customers into groups according to their response to bank advertisements. The bank might use the classes "Responds Rarely", "Responds Sometimes", and "Responds Frequently".
• The bank will then attempt to find rules about the customers who respond frequently and sometimes.
• The rules could be used to predict the needs of potential customers.
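The association pattern listed above (a person who buys milk usually buys bread) is commonly measured with two numbers, support and confidence. The slides do not define these measures; the sketch below assumes their usual definitions. Support is the fraction of baskets that contain all the products. Confidence is, among the baskets containing the first product, the fraction that also contain the second. The shopping baskets are invented.

```python
from fractions import Fraction

# Hypothetical transactions: each basket is the set of products bought together.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "sugar"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]


def support(itemset, transactions):
    """Fraction of baskets that contain every product in itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return Fraction(0)  # antecedent never bought: no evidence for the rule
    return support(antecedent | consequent, transactions) / base


print(support({"milk", "bread"}, baskets))        # 3/5
print(confidence({"milk"}, {"bread"}, baskets))   # 3/4
```

In this invented data, 3 of the 4 milk buyers also bought bread. A real data mining tool would search for such rules automatically across a warehouse instead of testing one rule by hand.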
Technique for Classification: Decision-Tree Classifiers
[Figure: decision tree predicting the credit risk (Good or Bad) of a person. It splits first on job (Engineer, Carpenter, Doctor) and then on income, with separate income thresholds per job (<30K Bad / >50K Good; <40K Bad / >90K Good; <50K Bad / >100K Good).]
Predicting the credit risk of a person with the jobs specified.
• Data mining is also used to target customers, on the assumption that past behavior is a good predictor of the future. A large amount of data is captured about a particular person, and companies share this information. Credit companies have taken advantage of this to target customers.

Uses of Data Mining
• Sales/Marketing: diversify the target market; identify clients' needs to increase response rates.
• Risk assessment: identify customers who pose a high credit risk.
• Fraud detection: identify people misusing the system, e.g. people who have two Social Security Numbers.
• Customer care: identify customers likely to change providers; identify customer needs.

Problems with Data Mining
• The cost could be too high to justify data mining.
• Coordinating several customer departments could be problematic.
• Customers could resent the invasion of their privacy and reject the offers that come their way.
• Erroneous profiles of people could be created, stored, and never deleted. The police could act on these profiles without ever meeting the people.

Ethical Issues
• Analysts should take responsibility for considering the ethical aspects of any proposed data mining project.
• The length of time the material is kept.
• The privacy safeguards that should be installed.
• The confidentiality of the material.
• The uses to which inferences are put should be raised and considered with the client.
• The opportunities for abuse are apparent and must be guarded against. For consumers, data mining is a push technology, and if consumers do not want to be pushed, data mining efforts could backfire.
[Figure: Data warehousing architecture. Internal data sources (operational, accounting, customer, and manufacturing databases), historical databases, and external data sources (external databases) feed a data extraction and transformation step (extract, filter, transform, classify, aggregate, summarize). The result is a data warehouse (customer data, product data, sales data) whose data is integrated, subject-oriented, time-variant, and nonvolatile. Data access and analysis tools on top of the warehouse provide business intelligence: OLAP, data mining, querying, and reporting.]
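The extract, transform, and summarize steps in the data warehousing diagram can be sketched as a tiny pipeline. The sketch also uses the gender-code cleaning example given earlier. The three source lists, their field names, and the code mapping are hypothetical, and the mapping assumes that "0" means male and "1" means female. A real warehouse load would read from the operational databases and would usually use a dedicated ETL tool.

```python
from collections import Counter

# Extract: three hypothetical sources that encode gender differently.
source_a = [{"customer": "Ann", "gender": "female", "amount": 120}]
source_b = [{"customer": "Ben", "gender": "M", "amount": 80},
            {"customer": "Cat", "gender": "F", "amount": 50}]
source_c = [{"customer": "Dan", "gender": "0", "amount": 30}]

# Transform: map every source's code onto one common form.
# Assumption: in source C, "0" means male and "1" means female.
GENDER_CODES = {"male": "male", "female": "female", "M": "male", "F": "female",
                "0": "male", "1": "female"}


def clean(records):
    """Return copies of the records with a uniform gender value."""
    return [{**r, "gender": GENDER_CODES[r["gender"]]} for r in records]


def load_summary(*sources):
    """Aggregate/summarize: total sales per gender (per-customer detail is dropped)."""
    totals = Counter()
    for source in sources:
        for r in clean(source):
            totals[r["gender"]] += r["amount"]
    return dict(totals)


print(load_summary(source_a, source_b, source_c))  # {'female': 170, 'male': 110}
```

The summary no longer contains individual customers. This is the loss of detail during summarization that the analyst must plan for when predicting which queries will be asked.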