Week 1. Database Life Cycle. Data and data modeling

Terms to remember
- Conceptual design, logical design, physical design
- Hierarchical, network, relational data model, RDBMS
- Accuracy, consistency, integrity, redundancy, accessibility of data

DBLC (Database Life Cycle) phases compared to SDLC (System Development Life Cycle)

DBLC
Database initial study
- list company objectives, operations, structure
- define problems and constraints
- define the database system objectives
- define the scope and boundaries of the project
Database design
- conceptual design (ERD)
- DBMS software selection
- logical design (RDB)
- physical design
Implementation and loading
- install the DBMS, create the database, load the data
Testing and evaluation
- testing, debugging, installation, fine-tuning
Operation
Database maintenance and evolution
- evaluation, maintenance, enhancement, change

SDLC
Planning
- initial assessment and feasibility study
- system analysis
- the conclusion of this stage determines whether the database needs to be reassessed (the database initial study)
Analysis
- user requirements, existing system evaluation, logical system design
- this stage may involve Data Flow Diagrams (DFD), Hierarchical Input Process Output (HIPO) diagrams, or Use Case Diagrams
Detailed design
- detailed system specification made
- database design created
Implementation
- coding and installation
Testing and evaluation
- testing, debugging, installation, fine-tuning
Operation
Database maintenance and evolution
- evaluation, maintenance, enhancement, change

Within an information system we transform data into useful information. The information system itself can be considered a model of business processes, including data, logic, and algorithms. A database is first of all a model of information, but logic and algorithms affect the information model as well. This means that a database is not something that, once developed, will remain in that state forever.
It has to be developed and maintained along with the information system.

Data are raw facts. After some processing they become useful for operation and decision making: they produce information. To collect data for computing, the data must be structured and arranged for storage, extraction, and processing. The specialized tool that supports data storage is known as a Database Management System (DBMS), and the storage itself is called a database.

There are some conceptual requirements we have to remember:
- Accuracy. Data must be accurate to be useful. This means that each particular raw fact must be correct.
- Consistency. Any change to a raw fact must drive the relevant changes to other raw facts for the data to remain useful.
- Integrity. The relationships that exist in the data are preserved in its electronic version.
- Accessibility. Data must be easily accessible to support real-life processes.
- Shared resource. Data must be arranged as a shared resource to support multi-user access and remain consistent.

To meet these requirements we have to resolve a number of problems.

Data redundancy. This is not only a matter of getting rid of something that is not needed for a particular system. More serious problems appear when useful data are placed into more than one storage unit. This causes data anomalies during:
- insertion
- deletion
- modification
Any data anomaly sets the stage for data inconsistency, which can make the data unusable. Thus the goal of computerized data storage, known as a database, is to bring data redundancy under control.

Data accessibility. This requirement raises a number of questions to resolve. To achieve the goal:
- The information space must be structured so that we can see the data units and how they interact with each other.
  These data units may be named differently: “entity” (usually in a relational database system), “object” (in an object-oriented system), or “facts and dimensions” (in data warehousing). In each case, however, they represent something important for operation support or decision making.
- Data units, whatever they are, must be identified.
- Data units must be described with a set of characteristics important for operation.
- Relationships between data units must be determined and described.
- All of the above must be done efficiently to support proper data access performance.

Data as a shared resource creates specific problems of data management. A DBMS provides the tools to implement database management: to provide effective data accessibility while supporting data integrity, accuracy, and consistency in a shared environment. However, BEFORE being implemented, the database must be designed to meet the requirements of effective data accessibility in a consistent and accurate data state.

The biggest challenge of database design is that there is no “cookbook” for it. We can provide the methodology and the technique, but for each particular system effective data structuring remains an issue. Database design is an art.

The process of data structuring and modeling can be described roughly as a sequence of stages and steps.

Conceptual data modeling:
- On the basis of system and data analysis, discover the data units (entities).
- For each entity, define its descriptive characteristics (attributes).
- Find out how each entity can be uniquely identified.
- Discover the relationships among entities needed to support business processes.
- Find out the type of each relationship; there are several (dependency, association, aggregation, recursive, etc.).

The result is a conceptual data model, which provides an understanding of the data independent of implementation methods or tools.
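
The conceptual modeling steps can be written down in a tool-independent way before any DBMS is chosen. The following is only a minimal sketch: the Entity and Relationship classes and the Customer/Order example are hypothetical illustrations, not part of any standard notation or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A data unit with its attributes and its unique identifier."""
    name: str
    attributes: tuple[str, ...]
    identifier: tuple[str, ...]  # attributes that uniquely identify an instance

    def __post_init__(self):
        # The identifier must be built from the entity's own attributes.
        missing = set(self.identifier) - set(self.attributes)
        if missing:
            raise ValueError(f"identifier uses unknown attributes: {sorted(missing)}")


@dataclass(frozen=True)
class Relationship:
    """A relationship between two entities, with its type and cardinality."""
    name: str
    left: Entity
    right: Entity
    cardinality: str           # "1:1", "1:M" or "M:N"
    kind: str = "association"  # e.g. dependency, association, aggregation, recursive

    def __post_init__(self):
        if self.cardinality not in {"1:1", "1:M", "M:N"}:
            raise ValueError(f"unknown cardinality: {self.cardinality}")


# Hypothetical example: a customer places many orders.
customer = Entity("Customer", ("customer_id", "name", "city"), ("customer_id",))
order = Entity("Order", ("order_id", "order_date"), ("order_id",))
places = Relationship("places", customer, order, "1:M")
```

Nothing in this sketch mentions tables, keys, or pointers; those decisions belong to logical and physical design.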
Conceptual modeling relates to business processes and rules, and it is the least formalized stage compared with the others, logical and physical design.

Logical design assumes knowledge of the type of the expected database model. Historically, data storage started with flat file systems, which were not really databases in the full sense. The basic models incorporated into commercial DBMSs are hierarchical, network, and relational. This classification is based on how the model implements association relationships between entities by relationship type: 1:1, 1:M, M:N. The models differ first of all in their data navigation mechanisms. Each model has its own advantages and disadvantages in implementing the relationships.

Hierarchical model. Data items are organized as trees (in other words, parent-child relationships). Good for supporting 1:M relationships that are fixed over time. In other cases, for example when implementing M:N relationships, this model causes substantial data redundancy. Application programming and use are complicated in any case. Data navigation is predetermined by the hierarchical structure. It is very expensive to change the data structure. Examples: IBM IMS; ADABAS is also sometimes cited, although it is more commonly classified as an inverted-list DBMS.

Network model. Allows relationships between data items to be organized as a network. Navigation is possible in any direction. More flexible, but still difficult to program and use. Flexibility is achieved at the expense of the volume of meta-information needed to support the network pointers. Changes to the data structure are still difficult to implement; in fact they mean restructuring the database. Example: CODASYL-based DBMSs.

Relational model. Data items are organized as rectangular tables. Data navigation is implemented through SQL-based queries. The most effective of the models used in commercial DBMSs. Supports any type of relationship, is flexible to change, and is easier to use. Disadvantages: a complex RDBMS with high demands on computer resources, which is no longer a problem.
The problem is that this model is more vulnerable to hidden design mistakes. Whereas mistakes in hierarchical and network data modeling often make it impossible to obtain a correct data processing result at all, relational design mistakes in most cases do not prevent data processing from running, so the problem stays behind the scenes. Mistakes discovered later can be quite painful, especially when they concern data consistency or performance. That is why it is very important to learn relational database design technique. Examples of relational DBMSs: Oracle, MS SQL Server, Informix, DB2.

Other models

Flat file system. Relationships and data structure are supported by application software. There is no DBMS. It is practically impossible to implement the basic paradigm of a database: SHARED data.

Object-oriented model. Implements a different approach to computer system design. The conventional approach treats data and data processing procedures as two distinct things and separates them in the implementation. The object-oriented approach joins both into a single unit, the object. The requirements for a pure object-oriented DBMS are described in the Object-Oriented Database System Manifesto.

Data warehousing brings a special data and data processing model, OLAP (Online Analytical Processing), targeted at implementing Decision Support Systems. It incorporates special features for collecting large volumes of data for analytical processing. Example of an OLAP system: Cognos.

LDAP (Lightweight Directory Access Protocol): a specialized data model used by directory services.
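
Returning to the relational model: the point that design mistakes do not stop processing from running, while redundancy quietly produces inconsistent data, can be seen in a small experiment. The sketch below uses Python's built-in sqlite3 module. The table names, column names, and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Redundant design: the customer's city is repeated on every order row.
cur.execute(
    "CREATE TABLE orders_flat (order_id INTEGER PRIMARY KEY, customer TEXT, city TEXT)"
)
cur.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?)",
    [(1, "Acme", "Riga"), (2, "Acme", "Riga")],
)

# Modification anomaly: only one copy of the city is updated.
# The statement runs without any error, so the mistake stays hidden.
cur.execute("UPDATE orders_flat SET city = 'Tallinn' WHERE order_id = 1")
flat_cities = cur.execute(
    "SELECT DISTINCT city FROM orders_flat WHERE customer = 'Acme' ORDER BY city"
).fetchall()

# Controlled redundancy: the city is stored once, and orders refer to the customer.
cur.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, city TEXT NOT NULL)"
)
cur.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customer(customer_id))"
)
cur.execute("INSERT INTO customer VALUES (1, 'Acme', 'Riga')")
cur.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1), (2, 1)])

# The same change now touches a single row, so every order sees it.
cur.execute("UPDATE customer SET city = 'Tallinn' WHERE customer_id = 1")
norm_cities = cur.execute(
    "SELECT DISTINCT c.city FROM orders o "
    "JOIN customer c ON c.customer_id = o.customer_id"
).fetchall()

conn.close()
```

In the first design the same customer ends up with two different cities (flat_cities holds both 'Riga' and 'Tallinn'), and nothing in the DBMS complains. In the second design norm_cities holds only 'Tallinn', because the 1:M relationship between customer and orders keeps the fact in one place.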