DATABASE DESIGN

Conventional Files versus the Database

File – a collection of similar records.
– Files are unrelated to each other except in the code of an application program.
– Data storage is built around the applications that use the files.

Database – a collection of interrelated files.
– Records in one file (or table) are physically related to records in another file (or table).
– Applications are built around the integrated database.

Pros and Cons of Conventional Files

Pros
– Easy to design because of their single-application focus
– Excellent performance due to optimized organization for a single application

Cons
– Harder to adapt to sharing across applications
– Harder to adapt to new requirements
– Need to duplicate attributes in several files

Pros and Cons of Databases

Pros
– Data independence from applications increases adaptability and flexibility
– Superior scalability
– Ability to share data across applications
– Less, and controlled, redundancy (total nonredundancy is not achievable)

Cons
– More complex than file technology
– Somewhat slower performance
– Investment in DBMS and database experts
– Need to adhere to design principles to realize benefits
– Increased vulnerability due to consolidating data in a centralized database

Previous file design methods required that the analyst specify precisely how the records in a file should be:
– Sequenced (file organization)
– Accessed (file access)

Database technology usually predetermines and/or limits this.
– A trained database administrator may be given some control over organization, storage location, and access methods for performance tuning.
Data architecture – a definition of:
– How files and databases are to be developed and used to store data
– The file and/or database technology to be used
– How the administrative structure is set up to manage the data resource

Data is stored in some combination of:
– Conventional files
– Operational databases – databases that support day-to-day operations and transactions for an information system. Also called transactional databases.
– Data warehouses – databases that store data extracted from operational databases, to support data mining
– Personal databases
– Work group databases

A Modern Data Architecture

Data administrator – a database specialist responsible for data planning, definition, architecture, and management.

Database administrator – a specialist responsible for database technology, database design, construction, security, backup and recovery, and performance tuning.
– A database administrator will administer one or more databases.

Why Use A Database?

Data overload is a common problem in business today. Corporations and individuals have plenty of raw data, but can't always find it or aren't aware that they even have it. Raw data must be filtered and organized to become useful information. Databases are a primary tool for the task; a tool which takes advantage of the speed and power of modern computers.

Some terms in DB Design

Entity - the principal data object about which information is to be collected. Entities are usually recognizable concepts, either concrete or abstract, such as persons, places, things, or events which have relevance to the database. Some specific examples of entities are EMPLOYEES, PROJECTS, INVOICES. An entity is analogous to a table in the relational model. Name your entities in singular form and in ALL CAPS. For example, an entity that contains data about your company's employees would be named EMPLOYEE.

Attribute - a descriptive or quantitative characteristic of an entity.
Attributes describe the entity with which they are associated. The physical counterpart of an attribute is a database column (or field). Name your attributes in singular form with either Initial Capital Letters or in all lower case. For example, some attribute names for your EMPLOYEE entity might be: EmployeeId (or employee_id) and BirthDate (or birthdate).

Relationship - a logical link between two entities; an association between two or more entities. It represents a business rule and can be expressed as a verb phrase. Most relationships between entities are of the "one-to-many" type, in which one instance of the parent entity relates to many instances of the child entity. Examples of relationships are:
i) employees are assigned to projects
ii) projects have subtasks
iii) departments manage one or more projects

The second type of relationship is the "many-to-many" relationship. In a "many-to-many" relationship, many instances of one entity relate to many instances of the other entity. "Many-to-many" relationships need to be resolved in order to avoid data redundancy, typically by introducing an intermediate entity that has a one-to-many relationship with each side.

Database Design

Database design has two parts:
– Data Model: focuses on what data should be stored in the database. The data model is used to design the relational tables.
– Function Model: deals with how the data is processed. The function model is used to design the queries that will access and perform operations on the tables designed at the data model stage.

Planning and Analysis

Planning defines the goals of the database, explains why the goals are important, and sets out the path by which the goals will be reached.

Analysis involves determining the requirements of the database. This is typically done by examining existing documentation and interviewing users.

Data modeling is preceded by planning and analysis. The effort devoted to this stage is proportional to the scope of the database.
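Returning to the many-to-many case described under Relationship above: the following is a minimal sketch, assuming Python's built-in sqlite3 module and the example "employees are assigned to projects". The ASSIGNMENT table and all column names are hypothetical, chosen only for illustration. One common way to resolve a many-to-many relationship is to introduce such an intermediate table, so that each side has a one-to-many relationship with it.

```python
import sqlite3


def build_schema(conn):
    """Create two entity tables plus a junction table linking them."""
    conn.executescript("""
        CREATE TABLE EMPLOYEE (
            employee_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE PROJECT (
            project_id INTEGER PRIMARY KEY,
            title      TEXT NOT NULL
        );
        -- Hypothetical junction table: one row per (employee, project) pair.
        CREATE TABLE ASSIGNMENT (
            employee_id INTEGER NOT NULL REFERENCES EMPLOYEE(employee_id),
            project_id  INTEGER NOT NULL REFERENCES PROJECT(project_id),
            PRIMARY KEY (employee_id, project_id)
        );
    """)


def projects_for(conn, employee_id):
    """Return the titles of all projects an employee is assigned to."""
    rows = conn.execute(
        """SELECT p.title
           FROM PROJECT p
           JOIN ASSIGNMENT a ON a.project_id = p.project_id
           WHERE a.employee_id = ?
           ORDER BY p.title""",
        (employee_id,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
build_schema(conn)
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", [(1, "Ann"), (2, "Ben")])
conn.executemany("INSERT INTO PROJECT VALUES (?, ?)", [(10, "Payroll"), (20, "Website")])
# Ann works on both projects; Ben works on one. No names are duplicated.
conn.executemany("INSERT INTO ASSIGNMENT VALUES (?, ?)", [(1, 10), (1, 20), (2, 20)])

print(projects_for(conn, 1))  # ['Payroll', 'Website']
```

Each employee name and project title is stored once; only the pair of keys is repeated, which is the redundancy the resolution is meant to avoid.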
The planning and analysis of a database intended to serve the needs of an enterprise will require more effort than one intended to serve a small workgroup. An accurate and up-to-date data model can serve as an important reference tool for DBAs, developers, and other members of a JAD (joint application development) team.

Data Modeling

The process of creating a data model helps the team uncover additional questions to ask of end users. Effective database design also allows the team to develop applications that perform well from the beginning. By building quality into the project, the team reduces the overall time it takes to complete the project, which in turn reduces project development costs.

An effective data model completely and accurately represents the data requirements of the end users. It is simple enough to be understood by the end user yet detailed enough to be used by a database designer to build the database. The model eliminates redundant data, is independent of any hardware and software constraints, and can be adapted to changing requirements with a minimum of effort.

Data modeling is a bottom-up process. A basic model, representing entities and relationships, is developed first. Then detail is added to the model by including information about attributes and business rules. The information needed to build a data model is gathered during the requirements analysis.

The Requirements Analysis

Goals:
a) to determine the data requirements of the database in terms of primitive objects
b) to classify and describe the information about these objects
c) to identify and classify the relationships among the objects
d) to determine the types of transactions that will be executed on the database and the interactions between the data and the transactions
e) to identify rules governing the integrity of the data

The modeler works with the end users of an organization to determine the data requirements. The requirements analysis is usually done at the same time as the data modeling.
As information is collected, data objects are identified and classified as either entities, attributes, or relationships; assigned names; and defined using terms familiar to the end users. The objects are then modeled and analyzed using an ER diagram. The diagram can be reviewed for completeness and accuracy and modified as needed. The review and edit cycle continues until the model is certified as correct.

Points to note
a) Talk to the end users about their data in "real-world" terms. Users do not think in terms of entities, attributes, and relationships but about the actual people, things, and activities they deal with daily.
b) Take the time to learn the basics about the organization and its activities that you want to model. Having an understanding of the processes will make it easier to build the model.
c) End users typically think about and view data in different ways according to their function within an organization. Therefore, it is important to interview the largest number of people that time permits.

Steps In Building the Data Model
i. Identification of data objects and relationships
ii. Drafting the initial ER (entity-relationship) diagram with entities and relationships
iii. Refining the ER diagram
iv. Adding key attributes to the diagram
v. Adding non-key attributes
vi. Diagramming generalization hierarchies
vii. Validating the model through normalization
viii. Adding business and integrity rules to the model

N.B: In practice, model building is not a strict linear process.

Identification of data objects and relationships

In order to begin constructing the basic model, the modeler must analyze the information gathered during the requirements analysis for the purpose of:
– classifying data objects as either entities or attributes
– identifying and defining relationships between entities
– naming and defining identified entities, attributes, and relationships
– documenting this information in the data document

What makes an object an entity or attribute?
For example, given the statement "employees work on projects", should employees be classified as an entity or an attribute? Very often, the correct answer depends upon the requirements of the database. In some cases, employee would be an entity; in others it would be an attribute.

Some commonly given guidelines are:
– entities contain descriptive information
– attributes either identify or describe entities
– relationships are associations between entities

The Entity-Relationship Model

The ER model is a conceptual data model that views the real world as a construct of entities and associations between entities. A basic component of the model is the Entity-Relationship diagram, which is used to visually represent data objects.

Achieving a Well-Designed Database
– A table should have an identifier.
– A table should store only data for a single type of entity.
– A table should avoid nullable columns.
– A table should not have repeating values or columns.

Some Common Database Design Mistakes
1. Poor design/planning
2. Ignoring normalization
3. Poor naming standards
4. Lack of documentation
5. Lack of testing

1. Poor Design/Planning

"If you don't know where you are going, any road will take you there" – George Harrison

2. Ignoring Normalization

Normalization defines a set of methods to break down tables to their constituent parts until each table represents one and only one "thing", and its columns serve to fully describe only the one "thing" that the table represents.

Normalization is a database design approach that seeks the following four objectives:
i. minimization of data redundancy,
ii. minimization of data restructuring,
iii. minimization of I/O by reduction of transaction sizes, and
iv. enforcement of referential integrity.

Consider an example Customer table that also contains payment columns. A payment does not describe a Customer and should not be stored in the Customer table.
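A minimal sketch of the point above, assuming Python's built-in sqlite3 module. The table layout and the column names (customer_id, amount, paid_on, purpose) are assumptions made for illustration and are not taken from the example table. Payments live in their own table and refer back to the customer by key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- CUSTOMER holds only columns that describe a customer.
    CREATE TABLE CUSTOMER (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- PAYMENT holds one row per payment and points back to its customer.
    CREATE TABLE PAYMENT (
        payment_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES CUSTOMER(customer_id),
        amount      REAL NOT NULL,
        paid_on     TEXT NOT NULL,
        purpose     TEXT
    );
""")

conn.execute("INSERT INTO CUSTOMER VALUES (1, 'Ann')")
conn.executemany(
    "INSERT INTO PAYMENT (customer_id, amount, paid_on, purpose) VALUES (?, ?, ?, ?)",
    [(1, 50.0, "2024-01-05", "deposit"), (1, 25.0, "2024-02-05", "balance")],
)


def total_paid(conn, customer_id):
    """Sum all payments recorded for one customer (0 if there are none)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM PAYMENT WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0]


print(total_paid(conn, 1))  # 75.0
```

With this split, a customer can have any number of payments without adding columns to CUSTOMER, and the foreign key rejects a payment that refers to a customer who does not exist.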
Details of payments should be stored in a Payment table, in which you could also record extra information about the payment, like when the payment was made and what the payment was for.

3. Poor Naming Standards

Consistency is key. The names you choose are not just to enable you to identify the purpose of an object, but to allow all future programmers, users, and so on to quickly and easily understand how a component part of your database was intended to be used, and what data it stores.

Present to the users clear, simple, descriptive names, such as Customer and Address. Avoid names such as:
- colVarcharAddress
- X304_DSCR
These mean nothing to the user. The use of dashes, spaces, digits, and special characters is discouraged.

4. Lack of Documentation

Poorly documented code is a synonym for "job security." Your goal should be to provide enough information that when you turn the database over to a support programmer, they can figure out your minor bugs and fix them.

In many cases, you may want to include sample values, where the need arose for the object, and anything else that you may want to know in a year or two when "future you" has to go back and make changes to the code.

5. Lack of Testing

A proper test plan takes into consideration all possible types of failures, codes them into an automated test, and tries them over and over. Good testing won't find all of the bugs, but it will get you to the point where most of the issues that correspond to the original design are ironed out.

DATABASE SECURITY

SECURITY CONCERNS AND MEASURES

Classes of Vulnerabilities

Keep your confidential data secure from (internal or external) intruders.

Vendor bugs – programming errors that result in users executing commands that they are not allowed to execute. Downloading and applying patches fixes this problem.

Database worms – (A worm is a self-replicating computer program.
It uses a network to send copies of itself to other nodes (computer terminals on the network) and it may do so without any user intervention. Unlike a virus, it does not need to attach itself to an existing program. Worms almost always harm the network (if only by consuming bandwidth), whereas viruses almost always infect or corrupt files on a targeted computer).

Misconfiguration – caused by not locking down databases, that is, setting configuration options in a way that compromises security.

Poor architecture – not factoring security into the design of how the application works, e.g. use of a weak form of encryption. These flaws are the hardest to fix since they require a major rework by the vendor.

Database Security Measures

Some security measures:
a) Encrypt data and packages
b) Audit access to sensitive data regardless of access – ensures that any issues are dealt with in good time
d) Server security – use of firewalls
e) User authentication security – use of passwords
f) Physical security – location of the server holding the database

Database Security Summary
– Stay aware of data security holes
– Explore possible third-party options
– Perform audits and pen tests on your databases regularly
– Encrypt data in motion
– Encrypt data at rest within the database
– Monitor your log files
– Implement intrusion detection
P.S. Provide multiple levels of security.

The data stored in a database is managed by a Database Management System (DBMS). The DBMS is responsible for adding, modifying, and deleting data from the database. The DBMS is also responsible for providing access to the data for viewing and reporting. Open source DBMSs include MySQL, Postgres, and Berkeley DB. Commercial DBMSs include Oracle, DB2, Sybase, Informix, and Microsoft SQL Server.

Effective database design can help the development team reduce overall development time and costs.
Undertaking the process of database design and creating a data model helps the team better understand the users' requirements and thus enables them to build a system that better reflects those requirements and business rules.

Data Warehousing

A data warehouse is where information is organized for quick retrieval. Data is obtained from different sources (usually databases) set up for different purposes.

Differences to Traditional Database
– Data is organized around major subjects rather than individual transactions
– Summarized data is used rather than detailed data
– Data is framed for long-term decision making
– They are organized for quick queries, not so much for efficient storage
– Optimized for complex queries known as OLAP (online analytical processing), which allows managers to look at a database along different dimensions
– Allows easy access via data mining software that searches for patterns and is able to identify relationships
– Include multiple databases that have been processed so that data is uniform (clean data)
– They include data from outside sources as well as data generated internally

Building a warehouse is complex. An analyst gathers information from a variety of sources and translates it into a common form; e.g. one database could record gender as "male" and "female", another as "M" and "F", while a third could use "0" and "1". Once the data is clean, the analyst has to decide how to summarize it and predict the type of queries that might be asked (details are usually lost during summarization). The warehouse is then designed both logically and physically.

Note: the analyst must know a lot about the business. Because of its size, a warehouse is expensive.

Data Mining

Data mining can identify patterns that a human is unable to detect. Data mining algorithms search data warehouses for patterns. Data mining is also known as Knowledge Discovery in Databases (KDD).
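The gender example from the warehouse-building step above can be sketched as follows. This is only an illustration: the source names (crm, billing, legacy), the target codes "M" and "F", and in particular which of "0" and "1" stands for which gender are all assumptions.

```python
# Per-source lookup tables. Which digit means which gender in the
# hypothetical "legacy" source is an assumption made for this sketch.
SOURCE_CODES = {
    "crm": {"male": "M", "female": "F"},
    "billing": {"m": "M", "f": "F"},
    "legacy": {"0": "M", "1": "F"},
}


def clean_gender(source, raw):
    """Translate one source-specific gender value into the common form."""
    key = str(raw).strip().lower()
    try:
        return SOURCE_CODES[source][key]
    except KeyError:
        # Fail loudly instead of guessing, so bad values are not loaded.
        raise ValueError(
            f"unrecognized gender value {raw!r} from source {source!r}"
        ) from None


records = [("crm", "Female"), ("billing", "M"), ("legacy", 1), ("legacy", "0")]
cleaned = [clean_gender(source, raw) for source, raw in records]
print(cleaned)  # ['F', 'M', 'F', 'M']
```

Keeping one mapping per source makes the assumption about each source explicit, which matters because the same raw value can mean different things in different databases.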
Software for Data Mining

Known as decision aids, these include:
– Statistical analysis software
– Neural networks
– Fuzzy networks
– Intelligent agents
– Logic and data visualization

Patterns that decision makers try to identify include:
– Associations: patterns that occur together at the same time. For example, a person who buys milk usually buys bread.
– Sequences: actions that take place over a period of time, e.g. if a family buys a house this year, they will most likely buy a fridge and cooker next year.
– Clustering: a pattern that develops among a group of people, e.g. customers who live in a particular area tend to buy a particular product.
– Trends: patterns that are noticed over a period of time, e.g. customers may move from buying processed food to natural foods (herbal products) or African attire.

Data mining is also used to target customers, on the assumption that past behavior is a good predictor of future behavior. A large amount of data is captured about a particular person, and companies share this information. Credit companies have taken advantage of this to target customers.

Problems with Data Mining
– Cost could be too high to justify data mining
– Coordination of several departments could be problematic
– Customers could resent their privacy being invaded and reject the offers that are coming their way
– Erroneous profiles could be made of people, stored, and not deleted. The police could act on these profiles without meeting the people

Ethical Issues

Analysts should take responsibility for considering the ethical aspects of any data mining projects that are proposed. The following should be asked about and considered with the client:
– Length of time the material is kept
– Privacy safeguards to be installed
– Confidentiality of the material
– The uses to which inferences are put

The opportunities for abuse are apparent and must be guarded against. For consumers, data mining is a push technology, and if consumers do not want to be pushed, data mining efforts could backfire.
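The "Associations" pattern described above (a person who buys milk usually buys bread) can be quantified with two simple measures. The sketch below is illustrative only: the baskets are invented sample data, and the function names support and confidence are chosen here, although the two measures themselves are standard in association analysis.

```python
def support(baskets, a, b):
    """Fraction of all baskets that contain both items."""
    both = sum(1 for basket in baskets if a in basket and b in basket)
    return both / len(baskets)


def confidence(baskets, a, b):
    """Of the baskets that contain a, the fraction that also contain b."""
    with_a = [basket for basket in baskets if a in basket]
    if not with_a:
        return 0.0
    return sum(1 for basket in with_a if b in basket) / len(with_a)


# Invented purchase data: each set is one customer's basket.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk"},
    {"bread", "tea"},
]

print(support(baskets, "milk", "bread"))     # 0.6
print(confidence(baskets, "milk", "bread"))  # 0.75
```

Here three of the four milk baskets also contain bread, so "milk buyers usually buy bread" holds with confidence 0.75 on this sample; a real mining tool would compute such figures over the warehouse data for many item pairs.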
[Figure: Data Warehousing. Internal and external data sources (operational, accounting, customer, manufacturing, historical, and external databases) feed data extraction and transformation (extract, filter, transform, classify, aggregate, summarize) into data warehouses (customer data, product data, sales data; integrated, subject-oriented, time-variant, non-volatile data), which support data access and analysis (OLAP, data mining, querying, reporting, business intelligence).]

THANK YOU