Chapter 18
Ali Parandian, Ashira Khera

OLAP
- Stands for On-Line Analytical Processing.
- A category of software technology used mainly for business reporting.
- Using OLAP, businesses can analyze data in many different ways, including budgeting, planning, simulation, data warehouse reporting, and trend analysis.
- Gives a multidimensional view of the data, allowing a manager to pull data from an OLAP database in broad or specific terms.

A data warehouse is a repository of information gathered from multiple sources, stored under a unified schema, at a single site.
- A single data model and query language can be used to retrieve data from the data warehouse.
- Access to information for decision support is separate from the organization's operational systems, so data can be retrieved quickly without slowing those systems down.
- Once gathered, data is stored for a long time, which provides access to historical data.

- Data Sources (operational systems and flat files)
- Staging Area (where data from the sources goes before the warehouse)
- Warehouse (metadata, summary data, and raw data)
- Users (analysis, reporting, and mining)

Data Warehouse Schema
Star schema: a central fact table linked to one descriptor table per dimension.
- Fact table: Item_id, Store_id, Customer_id, date, Number, price
- Descriptor table keyed by Store_id: City, State, country
- Descriptor table keyed by Item_id: Itemname, Color, size
- Descriptor table keyed by date: Month, Quarter, year
- Descriptor table keyed by Customer_id: Name, Street, State, zip

Extended Aggregation Cube Example
    SELECT Type, Store, SUM(Number) AS Number
    FROM Pets
    GROUP BY Type, Store WITH CUBE

ROLLUP Example
    SELECT Time, Region, Department, SUM(Profit) AS Profit
    FROM sales
    GROUP BY ROLLUP(Time, Region, Department)

Cube and Rollup in a nutshell
ROLLUP enables a SELECT statement to calculate multiple levels of subtotals across a specified group of dimensions. It also calculates a grand total.
CUBE enables a SELECT statement to calculate subtotals for all possible combinations of a group of dimensions.
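As a rough illustration of how the two operators differ, the Python sketch below lists the grouping sets each one implies and sums a measure over them. The sample rows, the function names (rollup_sets, cube_sets, aggregate), and the use of None to mark a rolled-up column are assumptions made for this sketch; a real database computes these results inside the query engine.

```python
from collections import defaultdict
from itertools import combinations


def rollup_sets(dims):
    """Grouping sets implied by ROLLUP: every prefix of dims, ending with the empty set."""
    return [tuple(dims[:n]) for n in range(len(dims), -1, -1)]


def cube_sets(dims):
    """Grouping sets implied by CUBE: every subset of dims, ending with the empty set."""
    return [c for n in range(len(dims), -1, -1) for c in combinations(dims, n)]


def aggregate(rows, grouping_sets, measure):
    """Sum `measure` once per grouping set; columns left out of a set become None."""
    dims = grouping_sets[0]  # the first grouping set names all dimensions
    totals = defaultdict(int)
    for gset in grouping_sets:
        for row in rows:
            key = tuple(row[d] if d in gset else None for d in dims)
            totals[key] += row[measure]
    return dict(totals)


# Hypothetical rows standing in for the Pets table (not data from the slides).
pets = [
    {"Type": "Dog", "Store": "Miami", "Number": 12},
    {"Type": "Cat", "Store": "Miami", "Number": 18},
    {"Type": "Dog", "Store": "Naples", "Number": 5},
    {"Type": "Cat", "Store": "Naples", "Number": 9},
]

cube = aggregate(pets, cube_sets(["Type", "Store"]), "Number")
rollup = aggregate(pets, rollup_sets(["Type", "Store"]), "Number")

for key, total in cube.items():
    print(key, total)
```

With two dimensions, CUBE yields four grouping sets (both columns, each column alone, and the empty set for the grand total), while ROLLUP yields three because it never groups by Store alone. The key (None, None) holds the grand total in both results.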
Like ROLLUP, CUBE also calculates a grand total.

The term data mining refers loosely to the process of semiautomatically analyzing large databases to find useful patterns. Data mining attempts to discover rules and patterns from data.
Difference between Data Mining and AI
Data mining differs from knowledge discovery in AI in that it deals with large volumes of data stored primarily on disk; it is knowledge discovery in the database.

Data Mining Continued
Data mining consists of five major elements:
1. Extract, transform, and load transaction data onto the data warehouse system.
2. Store and manage the data in a multidimensional database system.
3. Provide data access to business analysts and information technology professionals.
4. Analyze the data with application software.
5. Present the data in a useful format, such as a graph or table.

Applications of Data Mining
Prediction
Example: when a person applies for a credit card, the credit card company predicts the credit risk from known attributes such as age, income, and credit history.
Association
Example: a customer purchasing books online tends to buy related merchandise at the same time.
Associations, clusters, classes, and sequential patterns are examples of descriptive patterns.

Weaknesses of Data Mining
- Data dredging: scanning the data for any relationship and then, when one is found, coming up with an interesting explanation for it. For example, if we test 100 random patterns, we expect one of them to look "interesting" with statistical significance at the 0.01 level.
- Preprocessing and postprocessing of data are extremely time consuming.
- There is no cross-industry standard practice for how classification functions deal with ties in the data.
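The data dredging figure on the weaknesses slide can be checked with a short Python sketch. It assumes the 100 tests are independent and that every pattern is pure noise, so each p-value is uniform on [0, 1); the function names and the simulation settings are illustrative choices, not part of the original slides.

```python
import random


def expected_false_positives(num_tests, alpha):
    """Expected number of noise patterns that pass a significance test at level alpha."""
    return num_tests * alpha


def prob_at_least_one(num_tests, alpha):
    """Chance that at least one of num_tests independent noise patterns looks significant."""
    return 1 - (1 - alpha) ** num_tests


def simulate(num_tests, alpha, trials, seed=0):
    """Average count of 'significant' results per trial when all patterns are noise."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under the null hypothesis a p-value is uniformly distributed on [0, 1).
        hits += sum(rng.random() < alpha for _ in range(num_tests))
    return hits / trials


print(expected_false_positives(100, 0.01))
print(prob_at_least_one(100, 0.01))
print(simulate(100, 0.01, trials=2000))
```

Under these assumptions the expected count is 100 × 0.01 = 1, and the chance of seeing at least one spurious "interesting" pattern is roughly 63%.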
Other Types Of Mining
Data visualization: a system for examining large volumes of data and detecting patterns visually, through maps, charts, and other graphical representations. Data visualization systems do not detect patterns automatically; they provide system support that helps users detect patterns.

Decision Trees
In operations research, specifically in decision analysis, a decision tree (or tree diagram) is a decision support tool that uses a graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. A decision tree is used to identify the strategy most likely to reach a goal.
In data mining and machine learning, a decision tree is a predictive model. A classification tree is one example of a decision tree.

Example
(figure not preserved)

Advantages of decision trees
- They are simple to understand and interpret.
- They have value even with little hard data.
- They use a white box model.
- They can be combined with other decision techniques.

References
1. http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm