Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Overall objective of class: Enhance knowledge about the management of data resources Conceptual Objectives: What you “know” when you finish the class Understand what it means to “compete on analytics”. Understand the issues that affect the success and failure of BI Know the basic components of a business intelligence/data warehousing system. Understand the objectives and design goals of a transaction database vs. a data warehouse. Understand the role of an IT professional in facilitating the use of BI within an organization. Skill Objectives: What you can “do” when you finish the class Analyze an organization’s approach to business intelligence. Use ERD modeling tool to document the design of a transaction database and data warehouse. Design a data warehouse. Design a data mart. Use SQL Server to develop a prototype BI solution for an organization. Class Objectives – Next 3 Weeks Better understand the processes in the database design life cycle. Enhance personal knowledge of database design. Enhance personal ability to design databases for transaction processing. Learn how to design a database targeted at decision support (data warehouse and data mart). Database Design Life Cycle Requirements Analysis Database Modification Logical Design Metadata Repository Database Implementation Physical Design Data Distribution Metadata repository Database to store metadata. Repository contains: Business information: context, business rules, policies, data owner. Technical information: field names, data types, sizes, security, currency. Transformation information: calculations, aggregations. Application information: Processes used to create and update data, processes used to access data, processes used to download data, data quality, data edit/validation. Best collected automatically rather than through human responsibility and input. Administration of the Repository Requires an automated repository with a database and tools for creation, update and access. Requires an administrator to: Decide what should be stored and develop the structure for storage. Create policies and procedures for metadata maintenance. Enforce maintenance of metadata. Requirements Analysis Identify what data needs to be stored. New data is needed due to: Data strategy. Assessment identified current and future data needs for transaction processing and decision making. New applications. New technical applications are identified and data needs follow the new processes. New decisions. Environmental changes trigger the need for new decisions which require new, or differently configured, information. Methods used to determine requirements “Give me what I say I want, then I can tell you what I really want.” (W. H. Inmon) Give me what I am thinking about but can’t explain, then I may be able to tell you want I actually want. Give me something that will help my business but I don’t know what that is. Give me something quickly (agile methodologies) so that I have time to change my mind. The requirements will change. Change is a fact of life in database design in general, and BI in particular. Logical Design Development of a general data model. Uses an entity-relationship diagram as the communication tool. Identifies: Entities. Relationships. Primary and Foreign Keys. Attributes. Tasks during Logical Design Identify needed data. Develop user-oriented views of data requirements Integrate the views into a global schema. Normalize the global schema. Physical Design Forecast performance requirements. Change database as necessary to accommodate performance considerations. May require de-normalization of database for performance considerations. Convert logical design to SQL CREATE TABLE statements. Identify and create necessary indexes. Data Distribution Identify geographic considerations for data structures. Partition data as necessary to accommodate geographic considerations. Database Implementation Extract existing data. Transform existing data. Load existing data. Develop policies and procedures for data security. Develop policies and procedures for data privacy. Database Modification Monitor performance. Monitor data access. Monitor data usage. Modify structure as necessary to accommodate changes from performance, access and usage. Our databases are intended to be implemented on a relational database management system. Rules of a relational database: A database that calls itself “relational” must abide by all the rules mandated here. All data is represented by twodimensional tables – both the logical and physical design of the data are depicted in 2 dimensional tables. All data is accessible by table name and column name. Only one value is stored in an intersection of any given row and column, called a “cell”. Null values are supported as distinct values. The data dictionary is available in the same way as all user data. There is one comprehensive language to define, create, update and access data (SQL). Data can be updated and accessed at the table level. Changes to application programs do not affect data structures. Changes to data structures do not affect application programs. Data validation and integrity is done by the database. Programs cannot bypass data validation and integrity rules. Changes to the distribution of the data do not affect application programs. Logical design review Goals of database design Components of a database design Structure and purpose of a model Purpose of normalization Goals of database design Goal 1: Correctly identify data necessary to store. Goal 2: Store data without redundancy Goal 3: Store data so that information can be accessed effectively. What is data redundancy? (Micro) Orderid Order Date Cust Name Item id ItemName Item cost Sell price Quantity Ordered 1234 01/20/2012 Jones A72 Martini Glasses – 4 pack 7.50 19.95 12 1234 01/20/2012 Jones A43 Tumbler, 12 oz – 8 pack 9.20 22.95 16 3900 01/13/2012 Smith A72 Martini Glasses – 4 pack 7.50 19.95 5 3900 01/13/2012 Smith B33 Ketchup Dispensers – 12 pack 8.70 25.95 22 3900 01/13/2012 Smith B97 Salt Shakers – 2 pack 2.20 9.95 3 8911 01/20/2012 Resco A43 Tumbler, 12 oz – 8 pack 9.20 22.95 35 8911 01/20/2012 Resco A72 Martini Glasses – 4 pack 7.50 19.95 235 What is data redundancy? (Macro) SupportRequest Order PK PK OrderID SupportID RequestType RequestComments CustomerName CustomerAddress CustomerCity CustomerState CustomerZip CustomerPhone CustomerFax CustomerMobile OrderDate RequiredDeliveryDate BillingType CustomerBillName CustomerBillAddress CustomerBillCity CustomerBillState CustomerBillZip CustomerPhone MarketingContact PK contactID Name Address City State Zip Phone Mobile Fax PreferredContact LastContactDate Comments Components of a database design (1) Entity: person, place or thing about which we want to store data. Synonyms: table, file. Attribute: descriptive characteristic of an entity. Synonyms: column, field. Relationship: association between entities created by a business rule. Relationship cardinality: The number of instances of one entity that can (or must) be related with each instance of another entity. Components of a database design (2) Constraints Rules or limits. Constraints depicted on a logical model Primary key. Referential integrity. Order Customer PK customerID LastName PK orderID OrderDate DiscountDescription Item PK ItemID Description ListPrice DiscountCode PK DiscountCodeID Description Order Customer PK PK customerID LastName HomeAddress FK1 FK2 OrderLine orderID OrderDate DiscountDescription customerID DiscountCodeID BillingAddress PK PK,FK1 PK,FK2 shippingaddress orderID ItemID quantity price Item PK ItemID Description ListPrice Normalization Normalization is: The process of decomposing relations with anomalies to produce wellstructured relations. A process accomplished in stages, or “normal forms”. Normalization identifies data dependencies. The purpose of normalization is to: Provide greater structure to the design process. Reduce data redundancy. Simplify the enforcement of referential integrity. Make it easier to maintain data. What is an entity? How is an entity denoted on an ERD? Strong vs. weak entity? Parent vs. child entity? Associative entity? Intersection entity? Entity instance? Examples of all… What is a relationship? How is a relationship denoted on an ERD? What is a binary relationship? What is the cardinality of a relationship? How is cardinality denoted on an ERD? What is a ternary relationship? What is an n-ary relationship? All about keys… What is a primary key? Surrogate primary key vs. natural primary key Primary key during logical design vs. physical design How is it denoted on an ERD? What is a foreign key? How is it denoted on an ERD? What is the difference between a single attribue key and a composite key? What kinds of entities use composite keys? Review about attributes What is an attribute? Single-valued vs. multi-valued attribute? Stored vs. derived attribute? Atomic vs. composite attribute?