* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Document
Concurrency control wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Relational algebra wikipedia , lookup
Clusterpoint wikipedia , lookup
ContactPoint wikipedia , lookup
A database consists of an organized collection of interrelated data for one or more uses, typically in digital form. Digital databases are managed using database management systems, which store database contents, allowing data creation and maintenance, search and other access. Purpose of Database: =>A database system should be a repository of data needed for an organization’s data processing. =>This data should be accurate ,private and protected from damage. it should be organized so that diverse applications with different data requirements can employ the data. Different application programmers and different end users will have different views of the data which must be derived from a common overall data structure . The ways in which ends users want to utilize existing data will constantly change, and in some cases demands for new uses of the data will arise rapidly and urgently. The extent to which these demands can be satisfied determines the overall value of the database system. Database concept Disadvantage: 1.inconsistency 2.unsharable data 3.unstandardized data 4.unsecure data 5.incorrect data. Database management system It provides a centralized control of the data IT is a software that responsible for storing, maintaining and utilizing database. A database along with a DBMS is referred to as a database system. Advantages 1.Databases reduces the data redundancy to a large extent: =>Duplication of data is known as Data Redundancy. The database systems do not maintain separate copies of the same data. Rather, all the data are kept at one place and all the application that required data refer to centrally maintained database If any change is to be made to data, it will be made at just one place and the same changed information will be available to all the application referring to it. Thus redundancy gets controlled and so are problems associated with it. Database Management System Database End User Centrally Controlled Database System Application Programs 2. Databases can control data inconsistency to a large extent. Inconsistent database provide incorrect or conflict information. When the redundancy is not controlled, there may be occasions on which the two entries about the same data do not agree (when one of them stores the updated information and the other does not). Data base said to be inconsistent. By controlling redundancy, the inconsistency is also be controlled. Even if there is some redundancy retained in the database due to some technical reasons, the database management system ensures that any change made to either of the two entries is automatically made to the other . This process is known as propagating up dates 3.Databases facilitate sharing of data: Individual pieces of data in database may be shared by different users and users may have access to the same piece of data for different purposes. The database management system makes sure that data not only the existing application can may be satisfied without having to create any new stored files 4.Databases enforce standards The database management system can ensures that all the data (that is stored centrally ) follow the applicable standards . There may be certain standards laid by the company or organization using database. Standards storing data formats is particularly desirable as an aid to data interchange or migration between system. 5.Databases can ensure data security: The information stored inside a database is sometimes of great value to a corporations. Data Security: It refers to protection of data against accidental or intentional disclosure to unauthorized persons, or unauthorized modification or destruction. Privacy of Data :refers to the rights of individuals and organizations to determine for themselves when “how” and to what extent information about them is to be transmitted to others. 6.Integrity can be maintained through database: By integrated database we mean unification of several otherwise distinct data files, with any redundancy among those files partially or whole elimiminated. 1. A collection of interrelated files and a set of programs that allow users to access and modify these files is known as a data management system. 2.Database system is to provide the users only that much information that is required by them. 3. A database abstraction layer is an application programming interface which unifies the communication between a computer application and databases such as MySQL, PostgreSQL, Oracle or SQL. Traditionally, all database vendors provide their own interface tailored to their products which leaves it to the application programmer to implement code for all database interfaces he would like to support. Database abstraction layers reduce the amount of work by providing a consistent API to the developer and hide the database specifics behind this interface as much as possible. There exist many abstraction layers with different interfaces in numerous programming languages. Database management system (DBMS) A collection of interrelated files and a set of programs that allow users to access and modify these files is known as a DBMS. End user is a person who is not a computer-trained An application system analyst is the one who person . is concerned about the all of the database of logical level. what are the relationships between the dataentities . Physical Storage system analyst is concerned with the physical implementation details of the database. i.e. How would the database the stored on which storage device. what will be the starting address of the database. =>Physical storage system analyst It concerned with the physical implementation details of the database .i.e. how would the database be stored on which storage device? Levels of Database Implementation INTERNAL LEVEL (Physical Level): The lowest level of abstraction, the internal level, is the one closest to physical storage. This level is also sometimes termed as physical level. It describes how the data are stored on the storage medium CONCEPTUAL LEVEL: This level of describes what data are actually stored in the database. It also describes the relationships exists among data. The level are not concerned with how these logical in terms of simple data structure EXTERNAL LEVEL : This is the level closest to the users and is concerned with the way in which the data are viewed by individual user Sales officer View 1 Item-Name Price Purchase Officer View 1 Item-Name Price Conceptual Level Item-number Character(6) Item-Name Characer(20) Internal Level External Level Application Programs are used to fetch the desired information Stored-Item Length=40 Item # Type=Byte(6),offset=0 Index=Ix Name Type=Byte(20),offset=6 Data Independence: The ability to modify a scheme definition in one level without affecting a scheme in the next higher level is called Data independence. There are two Levels of data independence: 1.Physical Data Independence: The ability to change the physical schema without changing the logical schema is called physical data independence. For example, a change to the internal schema, such as using different file organization or storage structures, storage devices, or indexing strategy, should be possible without having to change the conceptual. 2.LogicalData Independence: The ability to change the logical (conceptual) schema without changing the External schema (User View) is called logical data independence. For example, the addition or removal of new entities, attributes, or relationships to the conceptual schema should be possible without having to change existing external schemas. =>It more difficult to achieve logical data independence than physical data independence. There are several models for databases: • Tabular ("flat file") – data in a single table, eg. spreadsheet • Hierarchical – e.g.. company departments • Relational – Each row is a record (patient, child etc.) – Each column is an attribute (age, sex etc.) =>Relational Data Model A relational database is a set of related tables, each concerning a specific topic. Its central idea was to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values. The data is organized into tables (i.e. row and columns). Tables are called Relations. A Row in a table represents a relationship among a set of values. since a table is a collection of such relationships For example Suppliers (Supp#, supp-name, Status, City) Underlined fields indicate primary key Advantages: – Reduction of data redundancy – Eliminates inconsistencies (errors). – High data integrity and quality. – Data can be descriptive. – Allows implementation of security. Primary-key is the key-field that can uniquely identify a row in a relation. Suppliers Supp# Supp-Name Status City S1 Britannia 10 Delhi S2 New Bakery 30 Mumbai S3 10 Delhi S4 Mother Dairy Cook 50 Bangalore S5 Haldiram 40 Jaipur Row of relations are known as tuples and columns are known as Attribute The relational data model is based on a collection of tables (relations) Conceptual design Logical design Physical design Logical Data Model Logical Process Model Track 01 Country TR Physical Implementation Process Database creation CREATE DATABASE CREATE TABLE LOAD Current systems evaluation Development of Standards Technological feasability Operational feasability Economical feasability Network Data Model: The network model is a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice. A record is a collection of field (attribute), each of which contains only one data values. Link mean that it is an association between precisely two records S1 Britannia 10 I2 20 Cake I3 30 Bread I6 Cream Biscuit ADVANTAGES: • Network Model is able to model complex relationships and represents semantics of add/delete on the relationships. • Can handle most situations for modeling using record types and relationship types. • Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET etc. Programmers can do optimal navigation through the database. DISADVANTAGES: •Navigational and procedural nature of processing •Database contains a complex array of pointers that thread through a set of records. • Little scope for automated "query optimization” A hierarchical data model is a data model in which the data is organized into a tree-like structure. The structure allows repeating information using parent/child relationships: each parent can have many children but each child only has one parent. All attributes of a specific record are listed under an entity type. Example: Serial Num Type User EmpNo 3009734-4 Computer 100 10-L 3-23-283742 Monitor 100 Schultz 20-B 2-22-723423 Monitor 100 Drachmei er 20-B 232342 Printer 100 Emp No First Name Last Name Dept. Num 100 Sally Baker 10-L 101 Jack Douglas 102 Sarah 103 David Hierarchical Form (Supplier Superior to items) S1 Britannia I2 I3 I6 10 Cake Delhi 5.00 Bread 9.00 10 Cram Biscuit 10.00 20 20 ADVANTAGES: • Hierarchical Model is simple to construct and operate on • Corresponds to a number of natural hierarchically organized domains - e.g., assemblies in manufacturing, personnel organization in companies • Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT etc. DISADVANTAGES: •Navigational and procedural nature of processing •Database is visualized as a linear arrangement of records •Little scope for "query optimization" A relational database matches data by using common characteristics found within the data set. The resulting groups of data are organized and are much easier for many people to understand. For example, a data set containing all the real-estate transactions in a town can be grouped by the year the transaction occurred; or it can be grouped by the sale price of the transaction; or it can be grouped by the buyer's last name; and so on. Relational Model developed by E.F. Codd (IBM) Relation: A relation is a table i.e. data arranged in rows and columns. Domain : A domain is a pool of values from which the actual values appearing in a given column are drawn. A domain describes the set of possible values for a given attribute, and can be considered a constraint on the value of the attribute. A domain is said to be if atomic domain are considered to be indivisible domain. Tuple: The row of tables (relations). Attributes: The columns of tables (relations). The number of attributes in a relation is called Degree and the number of row in a relation is known as Cardinality. View: A view is a table that does not really exist in its own right but is instead derived from one or more underlying base table (s). The concept of views ‘name of view Select * from Items ‘ Name of base table where Price>12 ‘ the condition Create VIEW GoodItems AS =>GoodItems (A virtual table based on Items table) Item# Item-Name Price I1 Milk 15.00 I4 Milk Bread 14.00 I7 Ice Cream 16.00 I9 Namkeen 15.00 Primary Key: It is a set of one or more attributes that can uniquely identify tuples within the relation. Supp# is the primary key for suppliers. Primary key consists of more than one attribute ,it is called composite-primary key. Candidate keys: All attribute combinations inside a relation that can serve as primary key are candidate keys as they are candidates for the primary key position. For Example: Supp# and supp-Name are two candidate key in the suppliers relation. and also Item# and Item-Name are candidate key. In case of two or more candidate keys, only one of them serves as the primary key. the rest of them are alternates key. Alternate Key: A candidate key that is not the primary key is called an alternate key. In suppliers table, supp-name is the alternate key and in Item table ItemName is the alternate key Key : A non-key attribute, whose values are derived from Foreign-key: A non –key attribute, whose values are derived from the primary key of some other table, is known as Foreign-key in its current table. Primary key of one table is a foreign key of other table. For Example: Sid Sname Email 01 A [email protected] 02 B [email protected] Sid Total Fee Fee Paid FeeBal 01 6000 4000 2000 02 6000 2000 4000 Referential Integrity is a system of rules that a DBMS uses to ensure that relationships between records in related tables are valid, and that users don’t accidentally delete or change related data. Relational Algebra: The relational algebra is a collection of operations on relations. Each operation takes one or more relations as its operand(s) and produces another relation as its result. Select operation: The Select operation selects tuples(horizontal subset) from a relation that satisfy a given condition. It is denoted by lower letter Greek letter Sigma ( σ ). For Example :To select those tuples from Items relation where the price is more than 14.00. We shall write; σ Price >14.00(Items) Item# I1 I7 Item-Name Milk Ice Cream Price 15.00 16.00 I9 Namkeen 15.00 i. ii. iii. Project Operation: The project operation yields a “vertical” subset of a given relation . In Result, Duplicating tuples are automatically removed It is denoted by Greek letter Pi(л) . For Example: л Supp-Name, city(Suppliers) Britannia City Delhi New Bakery Mumbai Mother Dairy Delhi Cook Haldiram Bangalore Supp-Name Jaipur Supp# S1 S2 S3 S4 S5 SuppName Britanni a New Baker y Mother Dairy Cook Status City Haldira m 40 10 Delhi 30 Mumba i 10 Delhi 50 Bangal ore Jaipur Cartesian Product operation: The Cartesian product is a binary operation. It is denoted by a cross(X). The Cartesian product of two relations A and B is written as A X B. It produce a new relation which has a degree equal to the sum of the degree of the two relations operated upon. All tuples of first relation are concatenated with all the tuples of second relation to form the tuples of the new relation. Student Stud# S001 S002 S003 StudName Suraj Preeti Asra Hosteler Y N N Instructor Inst# Inst-Name Subject I01 KVS, jabalpur IP I02 KVS, kurnool Math Stud# StudName Hosteler Inst# InstName Subject S001 Suraj Y I01 KVS, IP jabalpur S001 Suraj Y I02 KVS kurnool Math S002 Preeti N I01 IP S002 Preeti N I02 S003 Asra N I01 S003 Asra N I02 KVS, jabalpur KVS kurnool KVS, jabalpur KVS kurnool Math IP Math Union operation: The union operation is a binary operation that requires two relations as its operands. It produces a third relation that contains tuples from both the operand relations It is denoted by U. Union of two relations x and y is x U y Condition for union operation 1.The relations A and B must be of the same degree. that is, they must have the same number of attribute. 2. The domains of the ith attributes of A and ith attribute of B must be the same. Examples Drama Song Roll No Name Age 13 Rush 15 17 Swati 14 Roll No 2 10 13 Name Manya Rishabh Rush Result of Drama U Song will be Roll No 2 10 13 17 Name Manya Rishabh Rush Swati Age 15 15 13 14 A B Age 15 15 13 The Set Difference Operation: The set difference operation gives tuples that are in one relation but not in another It is denoted by –(minus). The expression A-B results in a relation containing those tuples in A but in B. Result of Drama – Song will be Roll No Name Age 17 Swati 14 A B The set Intersection operation The set intersection operation finds tuples that are common to the two operand relations It is denoted by ∏. Result of Drama ∏ Song will be Roll No 13 Name kush Age 15 END The Join operation The join operation joins two relations to form a new relation on the basis of common column the two operand relations have. Suppliers Supp# Supp-Name S1 S2 Britannia 10 New Bakery 30 Delhi Mumbai S3 Mother Dairy Cook Haldiram 10 Delhi 50 Bangalore 40 Jaipur S4 S5 Status City Clients relation Clients# C101 C102 C104 Client-Name ABC Co. XYZ Co. Zigs Co. City Delhi Jaipur Jaipur If tables suppliers and Clients are joined over City columns ; the results will be Supp # SuppName Stat City u s Clients# ClientName City S1 Britannia 10 Delhi C101 ABC Co. Delhi S3 Mother Dairy 10 Delhi C104 Zigs Co. Jaipur S5 Haldiram 40 Jaipur C102 XYZ Co. Jaipur S5 Haldiram 40 C104 C104 Zigs Co. Jaipur Type of Joins 1.Inner or equi –join: Inner join is used to give matching records from joined tables. Normalization: It is a process by which one goes from to another preferable form through a set of rule and observations. The objective of normalization on tables is to remove the redundancy of data. First normal form: A relation R is in First Normal form(1NF) if and only if all underlying domains of the relation atomic (indivisible) values. In every tuples of the relation R, no attribute should have repeating groups. In every tuple of the relation R, each attribute must have a value and that too an atomic(indivisible) values Method of Bringing Unnormalized Relation into 1NF 1.Remove all repeating groups from the relation 2. Decompose non-atomic attributes to atomic attributes. Student Stud Name # 101 Shobhit 102 Subject English Maths Biology Upasana Economic s English Hindi Stud Name # 101 Shobhit 101 Shobhit Subject 101 Shobhit Biology English Maths 102 Upasana Econom ics 102 Upasana English 102 Upasana Hindi Second Normal Form(2NF) Functional Dependence: Functional Dependence is a relationship that exists between any two fields. JK(K is functionally dependent on j) JL(L is not functionally dependent on j) J K L X 1 0 X 1 6 Y 4 1 Y 4 9 Z 3 5 Second Normal Form(2NF): A relation R is in Second Normal form(2NF) if and only if it is 1NF and every non-key attribute is fully dependent on the primary key. An attribute is a non-key if it does not participate in the primary key. Partial dependence means a non-key attribute is dependent upon a portion of the compositeprimary- key and not the whole primary key. Fully functional dependency: A FD X Y is a fully FD if remove of any attribute A from X means that the dependency does not hold good any more. Teacher T-Name Subject Child Name Gurinder Comp.Sc Saran Child Age 5 Anand Child Sex F Electroni Vivek 7 cs Shridhar Physics Sandhya 4 M Maya Comp.Sc Vinayaka 7 M Maya Comp.Sc Sangeeta 5 F Maya Comp.Sc Ananth M 3 F Teacher T-Name Subject Gurinder Comp.Sc Anand Electronics Shridhar Physics Maya Comp.Sc Child TName Gurind er Anand Child Name Saran Child Child Age Sex 5 F Vivek 7 M Shridh Sandh 4 ar ya Maya Vinaya 7 ka Maya Sange 5 eta Maya Anant 3 h F M F M The third Normal Form: A Relation R is said to be in Third Normal Form(3NF) if and only if it is in 2NF and every non-key attribute is non-transitively dependent upon the primary key Every non-key attribute depends upon the primary key. Teacher T-Name Gurinder Anand Shridhar Maya Subject Comp.Sc Electronics Physics Comp.Sc Classes 5 4 5 2 Teacher T-Name Gurinder Anand Shridhar Maya Sbcl Subject Comp.Sc Electronics Physics Comp.Sc Subject Comp.Sc Electronics Physics Comp.Sc Classes 5 4 5 2 Boyce- Codd Normal Form(BCNF): A relation is in BCNF if it is in 3NF and all of its determinants (i.e. the attributes upon which other attributes depend) are candidate key (i.e. the attributes that posses the properties to be a primary key but could not have been chosen as keys). Fields – different types of data (number or text) Records Queries Reports Searching for novel patterns, rules or relationships in data, e.g.: •correlations •classification •clustering •visualization Versus traditional statistics: hypothesis testing States Data Mart - Cognos