LABORATORY MANUAL

CONTENTS

This manual is intended for second-year students of Information Technology in the subject of Advanced Database Management System. It contains practical/lab sessions covering various aspects of the subject to enhance understanding. Although the syllabus prescribes the study of SQL, we have made an effort to cover further aspects of Advanced Database Management System, including DDL, DML, triggers, functions, ER and EER diagrams, and OLAP operations, with elaborated, understandable concepts and conceptual visualization.

Students are advised to go through this manual thoroughly rather than only the topics mentioned in the syllabus, as practical aspects are the key to understanding and conceptual visualization of the theoretical aspects covered in the books.

Good luck for your enjoyable laboratory sessions.

Prof. Pradip Mane
HOD, IT Department

ONLY FOR ARMIET STUDENTS

Prof. Pradip Mane
Lecturer, IT Department

Do's and Don'ts in Laboratory:
1. Make an entry in the log book as soon as you enter the laboratory.
2. All students should sit according to their roll numbers, starting from their left to right.
3. All students are supposed to enter the terminal number in the log book.
4. Do not change the terminal on which you are working.
5. All students are expected to get at least the algorithm of the program/concept to be implemented.
6. Strictly observe the instructions given by the teacher/lab instructor.

Instruction for STUDENTS
1. Work completed in a lab session should be submitted during the next lab session. Arrangements for printouts related to the submission should be made on the day of the practical assignment.
2. Students should be taught to take printouts under the observation of the lab teacher.
3. Promptness of submission should be encouraged through marking and evaluation patterns that benefit sincere students.
PREREQUISITES

Define Data.
Define Database.
How to install Oracle?
What are a Trigger and an Assertion?
How to use Microsoft Excel?
What are a Warehouse and a star schema?

SUBJECT INDEX

Sr. No.  NAME OF EXPERIMENT
1        Problem Definition and draw ER/EER diagram
2        Creation of the database: using constraints and triggers
3        Advanced SQL – must cover Views, nested and recursive queries
4        Implementing an application and integrating with the database using JDBC, Dynamic and embedded SQL
5        Any one Database Hashing technique
6        Implementing an index using B or B+ trees
7        Creating and querying an Object database – Use ODL and OQL
8        Demonstration of database security techniques – SQL injection, inference attacks etc.
9        Problem Definition for a Data Warehouse, Construction of Star Schema Model
10       Creation of a DW and running OLAP operations on them (Roll up, Drill down, Slice, Dice, Pivot)

EXPERIMENT NO 1

Aim:- To study Problem Definition and draw ER/EER diagram.

Theory:- In software engineering, an entity–relationship model (ER model) is a data model for describing the data or information aspects of a business domain or its process requirements, in an abstract way that lends itself to ultimately being implemented in a database such as a relational database. The main components of ER models are entities (things) and the relationships that can exist among them.

An entity may be defined as a thing capable of an independent existence that can be uniquely identified. An entity is an abstraction from the complexities of a domain. When we speak of an entity, we normally speak of some aspect of the real world that can be distinguished from other aspects of the real world. An entity is a thing that exists either physically or logically.
An entity may be a physical object such as a house or a car (they exist physically), an event such as a house sale or a car service, or a concept such as a customer transaction or order (they exist logically, as a concept). Although the term entity is the one most commonly used, following Chen we should really distinguish between an entity and an entity-type. An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type. There are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.

Entities can be thought of as nouns. Examples: a computer, an employee, a song, a mathematical theorem.

A relationship captures how entities are related to one another. Relationships can be thought of as verbs, linking two or more nouns. Examples: an "owns" relationship between a company and a computer, a "supervises" relationship between an employee and a department, a "performs" relationship between an artist and a song, a "proved" relationship between a mathematician and a theorem.

The model's linguistic aspect described above is utilized in the declarative database query language ERROL, which mimics natural language constructs. ERROL's semantics and implementation are based on reshaped relational algebra (RRA), a relational algebra that is adapted to the entity–relationship model and captures its linguistic aspect.

Entities and relationships can both have attributes. Examples: an employee entity might have a Social Security Number (SSN) attribute; the "proved" relationship may have a date attribute.

Every entity (unless it is a weak entity) must have a minimal set of uniquely identifying attributes, which is called the entity's primary key.

Entity–relationship diagrams don't show single entities or single instances of relations.
Rather, they show entity sets (all entities of the same entity type) and relationship sets (all relationships of the same relationship type). Example: a particular song is an entity. The collection of all songs in a database is an entity set. The "eaten" relationship between a child and her lunch is a single relationship. The set of all such child-lunch relationships in a database is a relationship set. In other words, a relationship set corresponds to a relation in mathematics, while a relationship corresponds to a member of the relation. Certain cardinality constraints on relationship sets may be indicated as well.

Mapping natural language

Chen proposed the following "rules of thumb" for mapping natural language descriptions into ER diagrams:

English grammar structure    ER structure
Common noun                  Entity type
Proper noun                  Entity
Transitive verb              Relationship type
Intransitive verb            Attribute type
Adjective                    Attribute for entity
Adverb                       Attribute for relationship

The physical view shows how data is actually stored.

Relationships, roles and cardinalities

In his original paper, Chen gives an example of a relationship and its roles. He describes a relationship "marriage" and its two roles, "husband" and "wife". A person plays the role of husband in a marriage (relationship) and another person plays the role of wife in the (same) marriage. These words are nouns. That is no surprise; naming things requires a noun.

However, as is quite usual with new ideas, many eagerly appropriated the new terminology but then applied it to their own old ideas. Thus the lines, arrows and crow's feet of their diagrams owed more to the earlier Bachman diagrams than to Chen's relationship diamonds. And they similarly misunderstood other important concepts.
FIG no 1:- ER diagram for Inventory Management System

The enhanced entity–relationship (EER) model (or extended entity–relationship model) in computer science is a high-level or conceptual data model incorporating extensions to the original entity–relationship (ER) model, used in the design of databases. It was developed to reflect more precisely the properties and constraints that are found in more complex databases, such as in engineering design and manufacturing (CAD/CAM), telecommunications, complex software systems and geographic information systems (GIS).

The EER model includes all of the concepts introduced by the ER model. Additionally, it includes the concepts of a subclass and superclass (Is-a), along with the concepts of specialization and generalization. Furthermore, it introduces the concept of a union type or category, which is used to represent a collection of objects that is the union of objects of different entity types.

Subclass and superclass

Entity type Y is a subtype (subclass) of an entity type X if and only if every Y is necessarily an X. A subclass entity inherits all attributes and relationships of its superclass entity. This property is called attribute and relationship inheritance. A subclass entity may have its own specific attributes and relationships (together with all the attributes and relationships it inherits from the superclass).

A common superclass example is Vehicle, with subclasses Car and Truck. The attributes common to a car and a truck belong to the superclass, while the attributes specific to a car or to a truck (such as max payload, truck type...) belong to the two subclasses.

Conclusion:- Hence we have studied ER/EER diagram successfully.

EXPERIMENT NO 2

Aim:- To study and implement Creation of the database: using constraints and triggers.
Theory:- A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. Triggers are mostly used for maintaining the integrity of the information in the database. For example, when a new record (representing a new worker) is added to the employees table, new records should also be created in the taxes, vacations and salaries tables.

Oracle

In addition to triggers that fire when data is modified, Oracle 9i supports triggers that fire when schema-level objects (that is, tables) are modified and when user logon or logoff events occur. These trigger types are referred to as "schema-level triggers".

Schema-level triggers:
After Creation
Before Alter
After Alter
Before Drop
After Drop
Before Logoff
After Logon

The four main types of triggers are:
1. Row Level Trigger: This gets executed before or after any column value of a row changes.
2. Column Level Trigger: This gets executed before or after the specified column changes.
3. For Each Row Type: This trigger gets executed once for each row of the result set affected by an insert/update/delete.
4. For Each Statement Type: This trigger gets executed only once for the entire result set, but fires each time the statement is executed.

Microsoft SQL Server

Microsoft SQL Server supports triggers either after or instead of (but not before: http://msdn.microsoft.com/en-us//library/ms189799.aspx) an insert, update or delete operation. They can be set on tables and views, with the constraint that a view can be referenced only by an INSTEAD OF trigger.

Microsoft SQL Server 2005 introduced support for Data Definition Language (DDL) triggers, which can fire in reaction to a very wide range of events, including:
Drop table
Create table
Alter table
Login events
A full list is available on MSDN.
Performing conditional actions in triggers (or testing data following modification) is done by accessing the temporary Inserted and Deleted tables.

Integrity Constraints:- A constraint describes conditions that every legal instance of a relation must satisfy. Inserts, deletes and updates that violate ICs are disallowed. ICs can be used to:
• ensure application semantics (e.g., sid is a key), or
• prevent inconsistencies (e.g., sname has to be a string, age must be < 200)

Types of ICs:
Fundamental: domain constraints, primary key constraints, foreign key constraints
General constraints: check constraints, table constraints and assertions

Check or Table Constraints:

    CREATE TABLE Sailors (
        sid INTEGER,
        sname CHAR(10),
        rating INTEGER,
        age REAL,
        PRIMARY KEY (sid),
        CHECK ( rating >= 1 AND rating <= 10 ))

Explicit Domain Constraints:

    CREATE DOMAIN values-of-ratings INTEGER DEFAULT 1
        CHECK ( VALUE >= 1 AND VALUE <= 10)

Triggers (Active database):- A trigger is a procedure that starts automatically if specified changes occur to the DBMS. It is analogous to a "daemon" that monitors a database for certain events to occur.

A trigger has three parts:
Event (activates the trigger)
Condition (tests whether the trigger should run) [Optional]
Action (what happens if the trigger runs)

Semantics: When the event occurs and the condition is satisfied, the action is performed.

Events can be: BEFORE|AFTER INSERT|UPDATE|DELETE ON <tableName>, e.g.: BEFORE INSERT ON Professor

The condition is an SQL expression or even an SQL query (a query with a non-empty result means TRUE).

The action can be one of many different choices: SQL statements, the body of a PSM, and even DDL and transaction-oriented statements like "commit".
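The event-condition-action semantics described above can be illustrated outside a DBMS with a toy in-memory model. The sketch below is only an illustration under stated assumptions: the class names (TriggerSketch, SailorsTable, BeforeInsertTrigger) and the sample rows are hypothetical, and it reuses the Sailors rating range from the CHECK constraint as the trigger condition. It is not how any particular database implements triggers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Toy in-memory model of event-condition-action semantics; not real DBMS code.
class TriggerSketch {
    // One row of the Sailors relation (sid, sname, rating, age).
    static class Sailor {
        final int sid;
        final String sname;
        final int rating;
        final double age;

        Sailor(int sid, String sname, int rating, double age) {
            this.sid = sid;
            this.sname = sname;
            this.rating = rating;
            this.age = age;
        }
    }

    // A BEFORE INSERT trigger: a condition plus the action to run when it holds.
    static class BeforeInsertTrigger {
        final Predicate<Sailor> condition;
        final Consumer<Sailor> action;

        BeforeInsertTrigger(Predicate<Sailor> condition, Consumer<Sailor> action) {
            this.condition = condition;
            this.action = action;
        }
    }

    static class SailorsTable {
        private final List<Sailor> rows = new ArrayList<>();
        private final List<BeforeInsertTrigger> triggers = new ArrayList<>();

        void addTrigger(BeforeInsertTrigger trigger) {
            triggers.add(trigger);
        }

        // Event: an INSERT. Every trigger whose condition holds runs its action
        // before the row is stored; an action can veto the insert by throwing.
        void insert(Sailor s) {
            for (BeforeInsertTrigger t : triggers) {
                if (t.condition.test(s)) {
                    t.action.accept(s);
                }
            }
            rows.add(s);
        }

        int size() {
            return rows.size();
        }
    }

    public static void main(String[] args) {
        SailorsTable sailors = new SailorsTable();
        // Condition: rating outside 1..10. Action: reject the row.
        sailors.addTrigger(new BeforeInsertTrigger(
            s -> s.rating < 1 || s.rating > 10,
            s -> { throw new IllegalStateException("rating out of range: " + s.rating); }));

        sailors.insert(new Sailor(1, "sailorA", 7, 45.0));       // condition false, row stored
        try {
            sailors.insert(new Sailor(2, "sailorB", 12, 55.5));  // condition true, action vetoes
        } catch (IllegalStateException e) {
            System.out.println("Insert rejected: " + e.getMessage());
        }
        System.out.println("Rows stored: " + sailors.size());
    }
}
```

The point of the sketch is the ordering: the event arrives first, the optional condition is tested next, and the action runs only when the condition is true.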
Example: Assume our DB has the relation schema Professor (pNum, pName, salary). We want to write a trigger that ensures that any new professor inserted has salary >= 60000:

    CREATE TRIGGER minSalary
    BEFORE INSERT ON Professor
    FOR EACH ROW
    BEGIN
        IF (:new.salary < 60000) THEN
            RAISE_APPLICATION_ERROR (-20004, 'Violation of Minimum Professor Salary');
        END IF;
    END;

Conclusion:- Hence we have studied and implemented triggers and constraints on a database successfully.

EXPERIMENT NO 3

Aim:- To study and implement Advanced SQL – must cover Views, nested and recursive queries.

Theory:- In database theory, a view is the result set of a stored query on the data, which the database users can query just as they would a persistent database collection object. This pre-established query command is kept in the database dictionary. Unlike ordinary base tables in a relational database, a view does not form part of the physical schema: as a result set, it is a virtual table computed or collated dynamically from data in the database when access to that view is requested. Changes applied to the data in a relevant underlying table are reflected in the data shown in subsequent invocations of the view. In some NoSQL databases, views are the only way to query data.

Views can provide advantages over tables:
Views can represent a subset of the data contained in a table. Consequently, a view can limit the degree of exposure of the underlying tables to the outer world: a given user may have permission to query the view, while being denied access to the rest of the base table.
Views can join and simplify multiple tables into a single virtual table.
Views can act as aggregated tables, where the database engine aggregates data (sum, average, etc.) and presents the calculated results as part of the data.
Views can hide the complexity of data. For example, a view could appear as Sales2000 or Sales2001, transparently partitioning the actual underlying table.
Views take very little space to store; the database contains only the definition of a view, not a copy of all the data that it presents.
Depending on the SQL engine used, views can provide extra security.

Just as a function (in programming) can provide abstraction, so can a database view. In another parallel with functions, database users can manipulate nested views, so one view can aggregate data from other views. Without the use of views, the normalization of databases above second normal form would become much more difficult. Views can make it easier to create lossless join decomposition.

Just as rows in a base table lack any defined ordering, rows available through a view do not appear with any default sorting. A view is a relational table, and the relational model defines a table as a set of rows. Since sets are, by definition, not ordered, neither are the rows of a view. Therefore, an ORDER BY clause in the view definition is meaningless; the SQL standard (SQL:2003) does not allow an ORDER BY clause in the subquery of a CREATE VIEW command, just as it is refused in a CREATE TABLE statement. However, sorted data can be obtained from a view in the same way as from any other table: as part of a query statement on that view. Nevertheless, some DBMSs (such as Oracle Database) do not abide by this SQL standard restriction.

A view is equivalent to its source query. When queries are run against views, the query is modified. For example, if there exists a view named accounts_view with the following content:

    accounts_view:
    -------------
    SELECT name,
           money_received,
           money_sent,
           (money_received - money_sent) AS balance,
           address,
           ...
    FROM table_customers c
    JOIN accounts_table a ON a.customer_id = c.customer_id

then the application could run a simple query such as:

    Simple query
    ------------
    SELECT name, balance
    FROM accounts_view

The RDBMS then takes the simple query, substitutes the view's definition for the view name, and sends the following to the query optimizer:

    Preprocessed query:
    ------------------
    SELECT name, balance
    FROM (SELECT name,
                 money_received,
                 money_sent,
                 (money_received - money_sent) AS balance,
                 address,
                 ...
          FROM table_customers c
          JOIN accounts_table a ON a.customer_id = c.customer_id)

The optimizer then removes unnecessary fields and complexity (for example, it is not necessary to read the address, since the parent invocation does not make use of it) and then sends the query to the SQL engine for processing.

A subquery (also called an inner query or nested query) is a query within another SQL query, embedded within the WHERE clause. A subquery is used to return data that will be used in the main query as a condition to further restrict the data to be retrieved.

Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along with operators like =, <, >, >=, <=, IN, BETWEEN etc.

There are a few rules that subqueries must follow:
Subqueries must be enclosed within parentheses.
A subquery can have only one column in the SELECT clause, unless multiple columns are in the main query for the subquery to compare its selected columns.
An ORDER BY cannot be used in a subquery, although the main query can use an ORDER BY. The GROUP BY can be used to perform the same function as the ORDER BY in a subquery.
Subqueries that return more than one row can only be used with multiple value operators, such as the IN operator.
The SELECT list cannot include any references to values that evaluate to a BLOB, ARRAY, CLOB, or NCLOB.
A subquery cannot be immediately enclosed in a set function.
The BETWEEN operator cannot be used with a subquery; however, the BETWEEN operator can be used within the subquery.

Subqueries with the SELECT Statement:

Subqueries are most frequently used with the SELECT statement. The basic syntax is as follows:

    SELECT column_name [, column_name ]
    FROM table1 [, table2 ]
    WHERE column_name OPERATOR
        (SELECT column_name [, column_name ]
         FROM table1 [, table2 ]
         [WHERE])

Example: Consider the CUSTOMERS table having the following records:

    +----+----------+-----+-----------+----------+
    | ID | NAME     | AGE | ADDRESS   | SALARY   |
    +----+----------+-----+-----------+----------+
    |  1 | Ramesh   |  35 | Ahmedabad |  2000.00 |
    |  2 | Khilan   |  25 | Delhi     |  1500.00 |
    |  3 | kaushik  |  23 | Kota      |  2000.00 |
    |  4 | Chaitali |  25 | Mumbai    |  6500.00 |
    |  5 | Hardik   |  27 | Bhopal    |  8500.00 |
    |  6 | Komal    |  22 | MP        |  4500.00 |
    |  7 | Muffy    |  24 | Indore    | 10000.00 |
    +----+----------+-----+-----------+----------+

Now, let us check the following subquery with a SELECT statement:

    SQL> SELECT *
         FROM CUSTOMERS
         WHERE ID IN (SELECT ID
                      FROM CUSTOMERS
                      WHERE SALARY > 4500);

This would produce the following result:

    +----+----------+-----+---------+----------+
    | ID | NAME     | AGE | ADDRESS | SALARY   |
    +----+----------+-----+---------+----------+
    |  4 | Chaitali |  25 | Mumbai  |  6500.00 |
    |  5 | Hardik   |  27 | Bhopal  |  8500.00 |
    |  7 | Muffy    |  24 | Indore  | 10000.00 |
    +----+----------+-----+---------+----------+

Subqueries with the INSERT Statement:

Subqueries can also be used with INSERT statements. The INSERT statement uses the data returned from the subquery to insert into another table. The selected data in the subquery can be modified with any of the character, date or number functions.
The basic syntax is as follows:

    INSERT INTO table_name [ (column1 [, column2 ]) ]
        SELECT [ * | column1 [, column2 ] ]
        FROM table1 [, table2 ]
        [ WHERE VALUE OPERATOR ]

Example: Consider a table CUSTOMERS_BKP with a structure similar to the CUSTOMERS table. To copy the complete CUSTOMERS table into CUSTOMERS_BKP, use the following statement:

    SQL> INSERT INTO CUSTOMERS_BKP
         SELECT * FROM CUSTOMERS
         WHERE ID IN (SELECT ID
                      FROM CUSTOMERS);

Subqueries with the UPDATE Statement:

The subquery can be used in conjunction with the UPDATE statement. Either single or multiple columns in a table can be updated when using a subquery with the UPDATE statement. The basic syntax is as follows:

    UPDATE table
    SET column_name = new_value
    [ WHERE OPERATOR [ VALUE ]
        (SELECT COLUMN_NAME
         FROM TABLE_NAME
         [ WHERE ]) ]

Example: Assume we have a CUSTOMERS_BKP table available which is a backup of the CUSTOMERS table. The following example multiplies SALARY by 0.25 in the CUSTOMERS table for all the customers whose AGE is greater than or equal to 27:

    SQL> UPDATE CUSTOMERS
         SET SALARY = SALARY * 0.25
         WHERE AGE IN (SELECT AGE
                       FROM CUSTOMERS_BKP
                       WHERE AGE >= 27);

This would impact two rows (Ramesh and Hardik), and finally the CUSTOMERS table would have the following records:

    +----+----------+-----+-----------+----------+
    | ID | NAME     | AGE | ADDRESS   | SALARY   |
    +----+----------+-----+-----------+----------+
    |  1 | Ramesh   |  35 | Ahmedabad |   500.00 |
    |  2 | Khilan   |  25 | Delhi     |  1500.00 |
    |  3 | kaushik  |  23 | Kota      |  2000.00 |
    |  4 | Chaitali |  25 | Mumbai    |  6500.00 |
    |  5 | Hardik   |  27 | Bhopal    |  2125.00 |
    |  6 | Komal    |  22 | MP        |  4500.00 |
    |  7 | Muffy    |  24 | Indore    | 10000.00 |
    +----+----------+-----+-----------+----------+

Subqueries with the DELETE Statement:

The subquery can be used in conjunction with the DELETE statement, just as with the other statements mentioned above.
The basic syntax is as follows:

    DELETE FROM TABLE_NAME
    [ WHERE OPERATOR [ VALUE ]
        (SELECT COLUMN_NAME
         FROM TABLE_NAME
         [ WHERE ]) ]

Example: Assume we have a CUSTOMERS_BKP table available which is a backup of the CUSTOMERS table. The following example deletes records from the CUSTOMERS table for all the customers whose AGE is greater than or equal to 27:

    SQL> DELETE FROM CUSTOMERS
         WHERE AGE IN (SELECT AGE
                       FROM CUSTOMERS_BKP
                       WHERE AGE >= 27);

This would impact two rows, and finally the CUSTOMERS table would have the following records:

    +----+----------+-----+---------+----------+
    | ID | NAME     | AGE | ADDRESS | SALARY   |
    +----+----------+-----+---------+----------+
    |  2 | Khilan   |  25 | Delhi   |  1500.00 |
    |  3 | kaushik  |  23 | Kota    |  2000.00 |
    |  4 | Chaitali |  25 | Mumbai  |  6500.00 |
    |  6 | Komal    |  22 | MP      |  4500.00 |
    |  7 | Muffy    |  24 | Indore  | 10000.00 |
    +----+----------+-----+---------+----------+

Conclusion:- Hence we have studied and implemented Views and nested queries in SQL successfully.

EXPERIMENT NO 4

Aim:- To study and implement an application and integrate it with the database using JDBC, Dynamic and embedded SQL.

Theory:- The programming involved in establishing a JDBC connection is fairly simple. It takes these four steps:

Import JDBC Packages: Add import statements to your Java program to import the required classes in your Java code.
Register JDBC Driver: This step causes the JVM to load the desired driver implementation into memory so it can fulfill your JDBC requests.
Database URL Formulation: This is to create a properly formatted address that points to the database to which you wish to connect.
Create Connection Object: Finally, code a call to the DriverManager object's getConnection() method to establish the actual database connection.

Import JDBC Packages:

The import statements tell the Java compiler where to find the classes you reference in your code and are placed at the very beginning of your source code.
To use the standard JDBC package, which allows you to select, insert, update, and delete data in SQL tables, add the following imports to your source code:

    import java.sql.* ;   // for standard JDBC programs
    import java.math.* ;  // for BigDecimal and BigInteger support

Register JDBC Driver:

You must register your driver in your program before you use it. Registering the driver is the process by which the Oracle driver's class file is loaded into memory so it can be utilized as an implementation of the JDBC interfaces.

You need to do this registration only once in your program. You can register a driver in one of two ways.

Approach (I) - Class.forName():

The most common approach to registering a driver is to use Java's Class.forName() method to dynamically load the driver's class file into memory, which automatically registers it. This method is preferable because it allows you to make the driver registration configurable and portable.

The following example uses Class.forName() to register the Oracle driver:

    try {
        Class.forName("oracle.jdbc.driver.OracleDriver");
    }
    catch (ClassNotFoundException ex) {
        System.out.println("Error: unable to load driver class!");
        System.exit(1);
    }

You can use the newInstance() method to work around noncompliant JVMs, but then you'll have to code for two extra exceptions as follows:

    try {
        Class.forName("oracle.jdbc.driver.OracleDriver").newInstance();
    }
    catch (ClassNotFoundException ex) {
        System.out.println("Error: unable to load driver class!");
        System.exit(1);
    }
    catch (IllegalAccessException ex) {
        System.out.println("Error: access problem while loading!");
        System.exit(2);
    }
    catch (InstantiationException ex) {
        System.out.println("Error: unable to instantiate driver!");
        System.exit(3);
    }

Approach (II) - DriverManager.registerDriver():

The second approach you can use to register a driver is the static DriverManager.registerDriver() method.
You should use the registerDriver() method if you are using a non-JDK compliant JVM, such as the one provided by Microsoft.

The following example uses registerDriver() to register the Oracle driver:

    try {
        Driver myDriver = new oracle.jdbc.driver.OracleDriver();
        DriverManager.registerDriver( myDriver );
    }
    catch (SQLException ex) {  // registerDriver() reports failures as SQLException
        System.out.println("Error: unable to load driver class!");
        System.exit(1);
    }

Database URL Formulation:

After you've loaded the driver, you can establish a connection using the DriverManager.getConnection() method. For easy reference, here are the three overloaded DriverManager.getConnection() methods:

    getConnection(String url)
    getConnection(String url, Properties prop)
    getConnection(String url, String user, String password)

Each form requires a database URL. A database URL is an address that points to your database.

Formulating a database URL is where most of the problems associated with establishing a connection occur.

The following table lists popular JDBC driver names and database URL formats:

    RDBMS    JDBC driver name                   URL format
    MySQL    com.mysql.jdbc.Driver              jdbc:mysql://hostname/databaseName
    ORACLE   oracle.jdbc.driver.OracleDriver    jdbc:oracle:thin:@hostname:portNumber:databaseName
    DB2      COM.ibm.db2.jdbc.net.DB2Driver     jdbc:db2:hostname:portNumber/databaseName
    Sybase   com.sybase.jdbc.SybDriver          jdbc:sybase:Tds:hostname:portNumber/databaseName

In each URL format, the fixed prefix (for example, jdbc:oracle:thin:@) is static; you need to change only the remaining parts (hostname, port number, database name) to match your database setup.

Create Connection Object:

Using a database URL with a username and password:

Three forms of the DriverManager.getConnection() method for creating a connection object were listed above. The most commonly used form of getConnection() requires you to pass a database URL, a username, and a password.

Assuming you are using Oracle's thin driver, you'll specify a host:port:databaseName value for the database portion of the URL.
If you have a host at TCP/IP address 192.0.0.1 with a host name of amrood, your Oracle listener is configured to listen on port 1521, and your database name is EMP, then the complete database URL would be:

    jdbc:oracle:thin:@amrood:1521:EMP

Now you have to call the getConnection() method with an appropriate username and password to get a Connection object as follows:

    String URL = "jdbc:oracle:thin:@amrood:1521:EMP";
    String USER = "username";
    String PASS = "password";
    Connection conn = DriverManager.getConnection(URL, USER, PASS);

Using only a database URL:

A second form of the DriverManager.getConnection() method requires only a database URL:

    DriverManager.getConnection(String url);

However, in this case, the database URL includes the username and password and has the following general form:

    jdbc:oracle:driver:username/password@database

So the above connection can be created as follows:

    String URL = "jdbc:oracle:thin:username/password@amrood:1521:EMP";
    Connection conn = DriverManager.getConnection(URL);

Using a database URL and a Properties object:

A third form of the DriverManager.getConnection() method requires a database URL and a Properties object:

    DriverManager.getConnection(String url, Properties info);

A Properties object holds a set of keyword-value pairs. It is used to pass driver properties to the driver during a call to the getConnection() method.

To make the same connection as in the previous examples, use the following code:

    import java.util.*;

    String URL = "jdbc:oracle:thin:@amrood:1521:EMP";
    Properties info = new Properties( );
    info.put( "user", "username" );
    info.put( "password", "password" );
    Connection conn = DriverManager.getConnection(URL, info);

Closing JDBC connections:

At the end of your JDBC program, you are required to explicitly close all the connections to the database to end each database session. However, if you forget, Java's garbage collector will close the connection when it cleans up stale objects.
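The URL formulation, the Properties form of getConnection(), and the requirement to close connections can be combined in one small program. The following is a minimal sketch, not a definitive implementation: the class name JdbcConnectSketch and its helper methods are hypothetical, the host, port, database name and credentials are the placeholder values used in the text, and try-with-resources is used as one way to guarantee that the connection is closed. Without an Oracle driver on the classpath, the connection attempt simply fails and the failure is reported.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

// Sketch only: helper names are illustrative, not part of JDBC.
class JdbcConnectSketch {
    // Builds an Oracle thin-driver URL of the form jdbc:oracle:thin:@host:port:databaseName.
    static String oracleThinUrl(String host, int port, String databaseName) {
        return "jdbc:oracle:thin:@" + host + ":" + port + ":" + databaseName;
    }

    // Packs the credentials into the Properties object accepted by getConnection(url, info).
    static Properties credentials(String user, String password) {
        Properties info = new Properties();
        info.put("user", user);
        info.put("password", password);
        return info;
    }

    // Opens a connection, checks it, and closes it automatically:
    // try-with-resources calls conn.close() even when an exception is thrown.
    static boolean canConnect(String url, Properties info) {
        try (Connection conn = DriverManager.getConnection(url, info)) {
            return conn.isValid(5);   // wait at most 5 seconds for the validity check
        } catch (SQLException ex) {
            System.out.println("Connection failed: " + ex.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        String url = oracleThinUrl("amrood", 1521, "EMP");
        System.out.println("URL: " + url);
        System.out.println("Connected: " + canConnect(url, credentials("username", "password")));
    }
}
```

Keeping the URL construction in one helper makes the most error-prone step easy to check in isolation before any connection is attempted.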
Relying on garbage collection, especially in database programming, is very poor programming practice. You should make a habit of always closing the connection with the close() method associated with the connection object.

To ensure that a connection is closed, you could provide a finally block in your code. A finally block always executes, regardless of whether an exception occurs.

To close the connection opened above, call the close() method as follows:

    conn.close();

Explicitly closing a connection conserves DBMS resources, which will make your database administrator happy.

Conclusion:- Hence we have studied and implemented an application and integrated it with the database using JDBC, Dynamic and embedded SQL successfully.

EXPERIMENT NO 5

Aim:- To study and implement Database Hashing technique.

Theory:- For a very large database, it is sometimes not feasible to search through all the levels of an index to reach the data block that holds the desired data. Hashing is an effective technique for calculating the direct location of a data record on the disk without using an index structure.

It uses a function, called a hash function, which generates an address when called with the search key as its parameter. The hash function computes the location of the desired data on the disk.

Hash Organization

Bucket: A hash file stores data in bucket format. A bucket is considered a unit of storage. A bucket typically stores one complete disk block, which in turn can store one or more records.

Hash Function: A hash function h is a mapping function that maps the set of all search keys K to the addresses where the actual records are placed. It is a function from search keys to bucket addresses.

Static Hashing

In static hashing, when a search-key value is provided, the hash function always computes the same address. For example, if a mod-4 hash function is used, it can generate only 4 values (0 to 3). The output address is always the same for a given key.
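The mod-4 example above can be made concrete with a short program. This is a minimal sketch under simplifying assumptions: the class name StaticHashSketch and the sample keys are made up for illustration, search keys are plain integers, and each bucket is an in-memory list with no capacity limit, so bucket overflow is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of static hashing with h(K) = K mod 4; buckets are unbounded lists.
class StaticHashSketch {
    static final int BUCKET_COUNT = 4;   // fixed for the lifetime of the file
    private final List<List<Integer>> buckets = new ArrayList<>();

    StaticHashSketch() {
        for (int i = 0; i < BUCKET_COUNT; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Bucket address = h(K). floorMod keeps the result in 0..3 even for negative keys.
    static int h(int key) {
        return Math.floorMod(key, BUCKET_COUNT);
    }

    // Insertion: store the key in the bucket chosen by the hash function.
    void insert(int key) {
        buckets.get(h(key)).add(key);
    }

    // Search: the same hash function leads straight to the only bucket that can hold the key.
    boolean contains(int key) {
        return buckets.get(h(key)).contains(key);
    }

    public static void main(String[] args) {
        StaticHashSketch file = new StaticHashSketch();
        int[] keys = {12, 7, 25, 18, 33};
        for (int k : keys) {
            file.insert(k);
            System.out.println("key " + k + " -> bucket " + h(k));
        }
        System.out.println("contains 25? " + file.contains(25));
        System.out.println("contains 26? " + file.contains(26));
    }
}
```

Keys 25 and 33 both map to bucket 1, which shows why a fixed number of buckets eventually needs a strategy for full buckets.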
The number of buckets provided remains the same at all times.

[Image: Static Hashing]

Operation:

Insertion: When a record is to be entered using static hashing, the hash function h computes the bucket address for search key K, where the record will be stored.

    Bucket address = h(K)

Search: When a record needs to be retrieved, the same hash function is used to compute the address of the bucket where the data is stored.

Delete: This is simply a search followed by a deletion operation.

Bucket Overflow:
The condition of bucket overflow is known as collision. This is a fatal state for any static hash function. In this case overflow chaining can be used.

Overflow Chaining: When a bucket is full, a new bucket is allocated for the same hash result and is linked after the previous one. This mechanism is called Closed Hashing.

[Image: Overflow chaining]

Linear Probing: When the hash function generates an address at which data is already stored, the next free bucket is allocated to it. This mechanism is called Open Hashing.

[Image: Linear Probing]

For a hash function to work efficiently and effectively, the following must hold:
1. The distribution of records should be uniform.
2. The distribution should be random rather than following any ordering.

Dynamic Hashing

The problem with static hashing is that it does not expand or shrink dynamically as the size of the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and on demand. Dynamic hashing is also known as extended (extendible) hashing. In dynamic hashing, the hash function is made to produce a large number of values, of which only a few are used initially.

[Image: Dynamic Hashing]

Organization

A prefix of the entire hash value is taken as the hash index. Only a portion of the hash value is used for computing bucket addresses.
Every hash index has a depth value, which tells how many bits of the hash value are used for computing bucket addresses. With n bits, 2^n buckets can be addressed. When all these bits are consumed, that is, all buckets are full, the depth value is increased by one and twice as many buckets are allocated.

Operation

Querying: Look at the depth value of the hash index and use that many bits to compute the bucket address.

Update: Perform a query as above and update the data.

Deletion: Perform a query to locate the desired data and delete it.

Insertion: Compute the address of the bucket.
o If the bucket is already full:
    Add more buckets.
    Add an additional bit to the hash value.
    Re-compute the hash function.
o Else:
    Add the data to the bucket.
o If all buckets are full, perform the remedies of static hashing.

Hashing is not favorable when the data is organized in some ordering and queries require a range of data. When data is discrete and random, hashing performs best. Hashing algorithms and their implementation are more complex than indexing. Hash operations take constant time on average, provided that bucket overflows are rare.

Conclusion:- Hence we have studied and implemented hashing technique for database.

EXPERIMENT NO 6

Aim:- Implementing an index using B or B+ trees.

Theory:- A B+ tree is an n-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children. A B+ tree can be viewed as a B-tree in which each node contains only keys (not key-value pairs), and to which an additional level is added at the bottom with linked leaves. The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage context, in particular filesystems.
This is primarily because, unlike binary search trees, B+ trees have very high fanout (number of pointers to child nodes in a node,[1] typically on the order of 100 or more), which reduces the number of I/O operations required to find an element in the tree.

The NTFS, ReiserFS, NSS, XFS, JFS, ReFS, and BFS filesystems all use this type of tree for metadata indexing; BFS also uses B+ trees for storing directories. Relational database management systems such as IBM DB2, Informix, Microsoft SQL Server, Oracle 8, Sybase ASE, and SQLite support this type of tree for table indices. Key-value database management systems such as CouchDB and Tokyo Cabinet support this type of tree for data access.

Search

The root of a B+ tree represents the whole range of values in the tree, and every internal node represents a subinterval. Suppose we are looking for a value k in the B+ tree. Starting from the root, we look for the leaf which may contain the value k. At each node, we figure out which internal pointer we should follow. An internal B+ tree node has at most d ≤ b children, each of which represents a different sub-interval. We select the corresponding child by searching on the key values of the node.

    Function: search(k)
        return tree_search(k, root);

    Function: tree_search(k, node)
        if node is a leaf then
            return node;
        switch k do
            case k < k_0
                return tree_search(k, p_0);
            case k_i ≤ k < k_{i+1}
                return tree_search(k, p_{i+1});
            case k_d ≤ k
                return tree_search(k, p_{d+1});

This pseudocode assumes that no duplicates are allowed.

Prefix key compression

It is important to increase fan-out, as this directs searches to the leaf level more efficiently. Index entries only 'direct traffic', so we can compress them.

Insertion

Perform a search to determine which bucket the new record should go into.
If the bucket is not full (at most b - 1 entries after the insertion), add the record.
Otherwise, split the bucket.
o Allocate a new leaf and move half the bucket's elements to the new bucket.
o Insert the new leaf's smallest key and address into the parent.
o If the parent is full, split it too.
o Add the middle key to the parent node.
Repeat until a parent is found that need not split.
If the root splits, create a new root which has one key and two pointers. (That is, the value that gets pushed to the new root gets removed from the original node.)

B-trees grow at the root and not at the leaves.

Deletion

Start at the root and find the leaf L where the entry belongs.
Remove the entry.
o If L is at least half-full, done!
o If L has fewer entries than it should:
    Try to re-distribute, borrowing from a sibling (an adjacent node with the same parent as L).
    If re-distribution fails, merge L and the sibling.
If a merge occurred, the entry pointing to L or the sibling must be deleted from the parent of L.
A merge could propagate to the root, decreasing the height.

Merging propagates to sink. But … the occupancy factor of L dropped below 50% (d=2), which is not acceptable. Thus, L needs to be either (i) merged with its sibling or (ii) redistributed with its sibling.

Bulk-loading

Given a collection of data records, we want to create a B+ tree index on some key field. One approach is to insert each record, one at a time, into an initially empty tree. However, this is quite expensive, because each entry requires us to start from the root and go down to the appropriate leaf page. An efficient alternative is to use bulk-loading.

The first step is to sort the data entries according to the search key. We allocate an empty page to serve as the root, and insert a pointer to the first page of entries into it. When the root is full, we split the root and create a new root page. We keep inserting entries into the right-most index page just above the leaf level until all entries are indexed.
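The idea of building the index from sorted entries can be sketched in Java. Note that this is a simplified bottom-up variant, not the right-most-page splitting procedure described above: it sorts the keys, packs them into leaf pages, and then builds each index level from the smallest key of every page below it. The class name BulkLoadSketch, the page capacity and the sample keys are hypothetical, and each index page here stores one key per child, which a real B+ tree would not do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BulkLoadSketch {
    // Cuts a list into consecutive pages holding at most `capacity` items each.
    static <T> List<List<T>> pack(List<T> items, int capacity) {
        List<List<T>> pages = new ArrayList<>();
        for (int i = 0; i < items.size(); i += capacity) {
            int end = Math.min(i + capacity, items.size());
            pages.add(new ArrayList<>(items.subList(i, end)));
        }
        return pages;
    }

    // Returns all levels, leaf level first and root level last.
    // Assumes capacity >= 2, otherwise the levels would never shrink.
    static List<List<List<Integer>>> bulkLoad(int[] keys, int capacity) {
        int[] sorted = keys.clone();
        Arrays.sort(sorted);                          // step 1: sort the data entries
        List<Integer> entries = new ArrayList<>();
        for (int k : sorted) {
            entries.add(k);
        }
        List<List<List<Integer>>> levels = new ArrayList<>();
        List<List<Integer>> level = pack(entries, capacity);   // leaf pages
        levels.add(level);
        while (level.size() > 1) {
            // One index entry per child page: the smallest key of that page.
            List<Integer> firstKeys = new ArrayList<>();
            for (List<Integer> page : level) {
                firstKeys.add(page.get(0));
            }
            level = pack(firstKeys, capacity);
            levels.add(level);
        }
        return levels;
    }

    public static void main(String[] args) {
        int[] keys = {9, 3, 7, 1, 5, 8, 2, 6, 4};     // made-up key values
        List<List<List<Integer>>> levels = bulkLoad(keys, 3);
        for (int i = 0; i < levels.size(); i++) {
            System.out.println("level " + i + ": " + levels.get(i));
        }
    }
}
```

Because the entries are sorted once up front, every page is written exactly once and no root-to-leaf descent is needed per record, which is the saving that bulk-loading aims for.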
Note that (1) when the right-most index page above the leaf level fills up, it is split; (2) this action may, in turn, cause a split of the right-most index page one step closer to the root; and (3) splits only occur on the right-most path from the root to the leaf level.

Characteristics

For a b-order B+ tree with h levels of index:
The maximum number of records stored is $n_{max} = b^h - b^{h-1}$
The minimum number of records stored is $n_{min} = 2\lceil b/2 \rceil^{h-1} - 2\lceil b/2 \rceil^{h-2}$
The minimum number of keys is $n_{kmin} = 2\lceil b/2 \rceil^{h-1} - 1$
The maximum number of keys is $n_{kmax} = b^h - 1$
The space required to store the tree is $O(n)$
Inserting a record requires $O(\log_b n)$ operations
Finding a record requires $O(\log_b n)$ operations
Removing a (previously located) record requires $O(\log_b n)$ operations
Performing a range query with k elements occurring within the range requires $O(\log_b n + k)$ operations
Performing a pagination query with page size s and page number p requires $O(p \cdot s)$ operations

Implementation

The leaves (the bottom-most index blocks) of the B+ tree are often linked to one another in a linked list; this makes range queries or an (ordered) iteration through the blocks simpler and more efficient (though the aforementioned upper bound can be achieved even without this addition). This does not substantially increase space consumption or maintenance on the tree. This illustrates one of the significant advantages of a B+ tree over a B-tree; in a B-tree, since not all keys are present in the leaves, such an ordered linked list cannot be constructed. A B+ tree is thus particularly useful as a database system index, where the data typically resides on disk, as it allows the B+ tree to actually provide an efficient structure for housing the data itself (this is described as index structure "Alternative 1").

If a storage system has a block size of B bytes, and the keys to be stored have a size of k, arguably the most efficient B+ tree is one where b=(B/k)-1.
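As a worked instance of this formula, the following Java sketch computes b for one block size and key size. The values of 4096 bytes per block and 8 bytes per key are assumptions chosen only for illustration, and the class name BPlusOrder and method order are hypothetical.

```java
class BPlusOrder {
    // b = (B / k) - 1, with B and k in bytes; integer division drops any remainder.
    static int order(int blockSizeBytes, int keySizeBytes) {
        return blockSizeBytes / keySizeBytes - 1;
    }

    public static void main(String[] args) {
        // Assumed sizes: 4096-byte block, 8-byte key.
        System.out.println(order(4096, 8));   // prints 511
    }
}
```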
Although theoretically the one-off is unnecessary, in practice there is often a little extra space taken up by the index blocks (for example, the linked list references in the leaf blocks). Having an index block which is slightly larger than the storage system's actual block causes a significant performance decrease; therefore erring on the side of caution is preferable.

If nodes of the B+ tree are organized as arrays of elements, then it may take considerable time to insert or delete an element, as half of the array will need to be shifted on average. To overcome this problem, elements inside a node can be organized in a binary tree or a B+ tree instead of an array.

B+ trees can also be used for data stored in RAM. In this case a reasonable choice for block size would be the size of the processor's cache line.

The space efficiency of B+ trees can be improved by using compression techniques. One possibility is to use delta encoding to compress the keys stored in each block. For internal blocks, space can be saved by compressing either keys or pointers. For string keys, space can be saved by using the following technique: normally the ith entry of an internal block contains the first key of block i+1. Instead of storing the full key, we could store the shortest prefix of the first key of block i+1 that is strictly greater (in lexicographic order) than the last key of block i. There is also a simple way to compress pointers: if we suppose that some consecutive blocks i, i+1...i+k are stored contiguously, then it will suffice to store only a pointer to the first block and the count of consecutive blocks.

All the above compression techniques have some drawbacks. First, a full block must be decompressed to extract a single element. One technique to overcome this problem is to divide each block into sub-blocks and compress them separately.
In this case searching or inserting an element will only need to decompress or compress a sub-block instead of a full block. Another drawback of compression techniques is that the number of stored elements may vary considerably from one block to another, depending on how well the elements are compressed inside each block.

Conclusion:- Hence we have implemented and studied B+ tree algorithm successfully.

EXPERIMENT NO 7

Aim:- Creating and querying an Object database. Use ODL and OQL.

Theory:- Object Query Language (OQL) is a query language standard for object-oriented databases modeled after SQL. OQL was developed by the Object Data Management Group (ODMG). Because of its overall complexity, no vendor has ever fully implemented the complete OQL. OQL has influenced the design of some of the newer query languages like JDOQL and EJB QL, but they can't be considered as different flavors of OQL.

The following rules apply to OQL statements:
1. All complete statements must be terminated by a semi-colon.
2. A list of entries in OQL is usually separated by commas but not terminated by a comma (,).
3. Strings of text are enclosed by matching quotation marks.

Examples

Simple query
The following example illustrates how one might retrieve the CPU speed of all PCs with more than 64MB of RAM from a fictional PC database:

    SELECT pc.cpuspeed
    FROM PCs pc
    WHERE pc.ram > 64;

Query with grouping and aggregation
The following example illustrates how one might retrieve the average amount of RAM on a PC, grouped by manufacturer:

    SELECT manufacturer, AVG(SELECT part.pc.ram FROM partition part)
    FROM PCs pc
    GROUP BY manufacturer: pc.manufacturer;

Note the use of the keyword partition, as opposed to aggregation in traditional SQL.

Object Definition Language (ODL) is the specification language defining the interface to object types conforming to the ODMG Object Model.
Class declarations

    Interface < name > { elements = attributes, relationships, methods }

Element declarations

    attributes ( < type > : < name > );
    relationships ( < rangetype > : < name > );

Example

    Type Date Tuple (year, day, month)
    Type year, day, month integer

    Class Manager attributes {
        id : string unique
        name : string
        phone : string
        set employees : Tuple ( [Employee], Start_Date : Date )
    }

    Class Employee attributes {
        id : string unique
        name : string
        Start_Date : Date
        manager : [Manager]
    }

Conclusion:- Hence we have studied and implemented ODL & OQL successfully.

EXPERIMENT NO 8

Aim:- Demonstration of database security techniques: SQL injection.

Theory:- SQL injection is a code injection technique, used to attack data-driven applications, in which malicious SQL statements are inserted into an entry field for execution (e.g. to dump the database contents to the attacker). SQL injection must exploit a security vulnerability in an application's software, for example, when user input is either incorrectly filtered for string literal escape characters embedded in SQL statements or user input is not strongly typed and unexpectedly executed. SQL injection is mostly known as an attack vector for websites but can be used to attack any type of SQL database.

In a 2012 study, security company Imperva observed that the average web application received 4 attack campaigns per month, and retailers received twice as many attacks as other industries.

Incorrectly filtered escape characters

This form of SQL injection occurs when user input is not filtered for escape characters and is then passed into a SQL statement. This allows the end-user of the application to manipulate the statements performed on the database.
The following line of code illustrates this vulnerability:

    statement = "SELECT * FROM users WHERE name ='" + userName + "';"

This SQL code is designed to pull up the records of the specified username from its table of users. However, if the "userName" variable is crafted in a specific way by a malicious user, the SQL statement may do more than the code author intended. For example, setting the "userName" variable as:

    ' or '1'='1

or using comments to even block the rest of the query (there are three types of SQL comments[13]). All three lines have a space at the end:

    ' or '1'='1' -- 
    ' or '1'='1' ({ 
    ' or '1'='1' /* 

renders one of the following SQL statements by the parent language:

    SELECT * FROM users WHERE name = '' OR '1'='1';
    SELECT * FROM users WHERE name = '' OR '1'='1' -- ';

If this code were to be used in an authentication procedure, then this example could be used to force the selection of a valid username, because the evaluation of '1'='1' is always true.

The following value of "userName" in the statement below would cause the deletion of the "users" table as well as the selection of all data from the "userinfo" table (in essence revealing the information of every user), using an API that allows multiple statements:

    a';DROP TABLE users; SELECT * FROM userinfo WHERE 't' = 't

This input renders the final SQL statement as follows:

    SELECT * FROM users WHERE name = 'a';DROP TABLE users; SELECT * FROM userinfo WHERE 't' = 't';

While most SQL server implementations allow multiple statements to be executed with one call in this way, some SQL APIs such as PHP's mysql_query() function do not allow this for security reasons. This prevents attackers from injecting entirely separate queries, but doesn't stop them from modifying queries.

Incorrect type handling

This form of SQL injection occurs when a user-supplied field is not strongly typed or is not checked for type constraints.
This could take place when a numeric field is to be used in a SQL statement, but the programmer makes no checks to validate that the user-supplied input is numeric. For example:

    statement := "SELECT * FROM userinfo WHERE id =" + a_variable + ";"

It is clear from this statement that the author intended a_variable to be a number correlating to the "id" field. However, if it is in fact a string, then the end-user may manipulate the statement as they choose, thereby bypassing the need for escape characters. For example, setting a_variable to

    1;DROP TABLE users

will drop (delete) the "users" table from the database, since the SQL becomes:

    SELECT * FROM userinfo WHERE id=1;DROP TABLE users;

Conclusion:- Hence we have studied and implemented SQL injection attack successfully.

EXPERIMENT NO 9

Aim:- Problem Definition for a Data Warehouse, Construction of Star Schema Model.

Theory:- The foundation of each data warehouse is a relational database built using a dimensional model. A dimensional model consists of dimension and fact tables and is typically described as a star or snowflake schema.

A star schema resembles a star; one or more fact tables are surrounded by the dimension tables. Dimension tables aren't normalized: even if you have repeating fields such as name or category, no extra table is added to remove the redundancy. For example, in a car dealership scenario you might have a product dimension that looks like this:

    Product_key
    Product_category
    Product_subcategory
    Product_brand
    Product_make
    Product_model
    Product_year

In a normalized relational system such a design would be clearly unacceptable, because product category (car, van, truck) can be repeated for multiple vehicles, and so could product brand (Toyota, Ford, Nissan), product make (Camry, Corolla, Maxima) and model (LE, XLE, SE and so forth).
So a vehicle table in a relational system is likely to have foreign keys relating to vehicle category, vehicle brand, vehicle make and vehicle model. However, in the dimensional star schema model you simply list out the names of each vehicle attribute.

A star schema also contains the entire dimension hierarchy within a single table. A dimension hierarchy provides a way of aggregating data from the lowest to the highest levels within a dimension. For example, Camry LE and Camry XLE sales roll up to the Camry make, the Toyota brand and the cars category.

Here is what a star schema diagram could look like:

[Image: Star schema diagram]

Notice that each dimension table has a primary key. The fact table has foreign keys to each dimension table. Although a data warehouse does not require creating primary and foreign keys, it is highly recommended to do so for two reasons:
1. Dimensional models that have primary and foreign keys provide superior performance, especially for processing Analysis Services cubes.
2. Analysis Services requires creating either physical or logical relationships between fact and dimension tables. Physical relationships are implemented through primary and foreign keys. Therefore, if the keys exist, you save a step when building cubes.

A snowflake schema resembles a snowflake because dimension tables are further normalized or have parent tables. For example, we could extend the product dimension in the dealership warehouse to have product_category and product_subcategory tables. Product categories could include trucks, vans, sport utility vehicles, etc. Product subcategory tables could contain subcategories such as leisure vehicles, recreational vehicles, luxury vehicles, industrial trucks and so forth.

Here is what the snowflake schema would look like with the extended product dimension:

[Image: Snowflake schema with extended product dimension]

A snowflake schema generates more joins than a star schema during cube processing, which translates into longer-running queries.
Therefore it is normally recommended to choose the star schema design over the snowflake schema for optimal performance. The snowflake schema does have the advantage of providing more flexibility, however. For example, if you were working for an auto parts store chain, you might wish to report on car parts (car doors, hoods, engines) as well as subparts (door knobs, hood covers, timing belts and so forth). In such cases you could have both part and subpart dimensions; however, some attributes of subparts might not apply to parts and vice versa. For example, the thread size attribute would apply to a tire but not to the nuts and bolts that go on the tire. If you wish to aggregate your sales by part, you will need to know which subparts roll up to each part, as in the following:

    Dim_subpart
        subpart_key
        subpart_name
        subpart_SKU
        subpart_size
        subpart_weight
        subpart_color
        part_key

    Dim_part
        part_key
        part_name
        part_SKU

With such a design you could create reports that show a breakdown of your sales by each type of engine, as well as by each part that makes up the engine.

Conclusion:- Hence we have studied Data Warehouse, Construction of Star Schema Model.

EXPERIMENT NO 10

Aim:- Creation of a DW and running OLAP operations on it (Roll up, Drill down, Slice, Dice, Pivot).

Theory:- In computing, online analytical processing, or OLAP, is an approach to answering multi-dimensional analytical (MDA) queries swiftly. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications coming up, such as agriculture. The term OLAP was created as a slight modification of the traditional database term Online Transaction Processing ("OLTP").
OLAP tools enable users to analyze multidimensional data interactively from multiple perspectives. OLAP consists of three basic analytical operations: consolidation (roll-up), drill-down, and slicing and dicing. Consolidation involves the aggregation of data that can be accumulated and computed in one or more dimensions. For example, all sales offices are rolled up to the sales department or sales division to anticipate sales trends. By contrast, drill-down is a technique that allows users to navigate through the details. For instance, users can view the sales by the individual products that make up a region's sales. Slicing and dicing is a feature whereby users can take out (slicing) a specific set of data of the OLAP cube and view (dicing) the slices from different viewpoints.

Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad hoc queries with a rapid execution time. They borrow aspects of navigational databases, hierarchical databases and relational databases.

Conceiving data as a cube with hierarchical dimensions leads to conceptually straightforward operations that facilitate analysis. Aligning the data content with a familiar visualization enhances analyst learning and productivity. The user-initiated process of navigating by calling for page displays interactively, through the specification of slices via rotations and drill down/up, is sometimes called "slice and dice". Common operations include slice and dice, drill down, roll up, and pivot.

OLAP slicing

Slice is the act of picking a rectangular subset of a cube by choosing a single value for one of its dimensions, creating a new cube with one fewer dimension. The picture shows a slicing operation: the sales figures of all sales regions and all product categories of the company in the year 2004 are "sliced" out of the data cube.
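The slice operation can be sketched in Java on a tiny in-memory cube. This is only an illustration under stated assumptions: the cube is modelled as an array of cells with the three dimensions region, category and year, and the class name SliceSketch, the method names and all sample figures are hypothetical. A real data warehouse would perform the same step with a query against the fact table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SliceSketch {
    // One cell of a three-dimensional cube: (region, category, year) -> sales.
    static final class Cell {
        final String region;
        final String category;
        final int year;
        final double sales;

        Cell(String region, String category, int year, double sales) {
            this.region = region;
            this.category = category;
            this.year = year;
            this.sales = sales;
        }
    }

    // Made-up sample data, used only for this illustration.
    static Cell[] sampleCube() {
        return new Cell[] {
            new Cell("North", "Tents", 2003, 100.0),
            new Cell("North", "Tents", 2004, 120.0),
            new Cell("South", "Boots", 2004, 80.0),
            new Cell("South", "Tents", 2005, 60.0)
        };
    }

    // Slice: fix the year dimension to one value. The result is keyed only by
    // region and category, so it has one fewer dimension than the cube.
    static Map<String, Double> sliceByYear(Cell[] cube, int year) {
        Map<String, Double> slice = new LinkedHashMap<>();
        for (Cell c : cube) {
            if (c.year == year) {
                slice.merge(c.region + "/" + c.category, c.sales, Double::sum);
            }
        }
        return slice;
    }

    public static void main(String[] args) {
        System.out.println(sliceByYear(sampleCube(), 2004));
    }
}
```

With the sample data, the 2004 slice keeps only the North/Tents and South/Boots cells; the year no longer appears in the result because it has been fixed to a single value.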
OLAP dicing

Dice: The dice operation produces a subcube by allowing the analyst to pick specific values of multiple dimensions. The picture shows a dicing operation: the new cube shows the sales figures of a limited number of product categories, while the time and region dimensions cover the same range as before.

OLAP drill-up and drill-down

Drill down/up allows the user to navigate among levels of data ranging from the most summarized (up) to the most detailed (down). The picture shows a drill-down operation: the analyst moves from the summary category "Outdoor-Schutzausrüstung" (outdoor protective equipment) to see the sales figures for the individual products.

Roll-up: A roll-up involves summarizing the data along a dimension. The summarization rule might be computing totals along a hierarchy or applying a set of formulas such as "profit = sales - expenses".

OLAP pivoting

Pivot allows an analyst to rotate the cube in space to see its various faces. For example, cities could be arranged vertically and products horizontally while viewing data for a particular quarter. Pivoting could replace products with time periods to see data across time for a single product. The picture shows a pivoting operation: the whole cube is rotated, giving another perspective on the data.

Summary functions in value fields

The data in the values area summarize the underlying source data in the PivotTable report. For example, the following source data:

Produces the following PivotTable and PivotChart reports. If you create a PivotChart report from the data in a PivotTable report, the values in that PivotChart report reflect the calculations in the associated PivotTable report.

In the PivotTable report, the Month column field provides the items March and April. The Region row field provides the items North, South, East, and West.
The value at the intersection of the April column and the North row is the total sales revenue from the records in the source data that have Month values of April and Region values of North.

In a PivotChart report, the Region field might be a category field that shows North, South, East, and West as categories. The Month field could be a series field that shows the items March, April, and May as series represented in the legend. A Values field named Sum of Sales could contain data markers that represent the total revenue in each region for each month. For example, one data marker would represent, by its position on the vertical (value) axis, the total sales for April in the North region.

To calculate the value fields, the following summary functions are available for all types of source data except Online Analytical Processing (OLAP) source data.

Conclusion:- Hence we have studied and implemented OLAP operations successfully.