Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Clusterpoint wikipedia , lookup
Data Protection Act, 2012 wikipedia , lookup
Data center wikipedia , lookup
Forecasting wikipedia , lookup
Database model wikipedia , lookup
Data analysis wikipedia , lookup
Data vault modeling wikipedia , lookup
3D optical data storage wikipedia , lookup
Database Access and Techniques, most notably Data Mining. DSS Humans: Decision Making Process Tools: Computers and IT. VB, VBA, Excel, InterDev, Etc. DSS Data: Facts pertinent to the decision at hand. Algorithms: Math/Flow Chart stuff that helps the tools help the humans make decisions. Warehouses & Marts • Data Warehouse – a database designed to support decision-making in an organization. It is batch-updated and structured for fast online queries and exploration. Data warehouses may aggregate enormous amounts of data from many different operational systems. • Data Mart – a database focused on addressing the concerns of a specific problem or business unit (e.g. Marketing, Engineering). Size doesn’t define data marts, but they tend to be smaller than data warehouses. Data Warehouses & Data Marts 3rd party data Data Mart (Marketing) TPS & other operational systems Data Warehouse = operational clients = query, OLAP, mining, etc. Data Mart (Engineering) Data Warehousing • Physical separation of operational and decision support environments • Purpose: to establish a data repository making operational data accessible • Transforms operational data to relational form • Only data needed for decision support come from the TPS • Data are transformed and integrated into a consistent structure • Data warehousing (or information warehousing): a solution to the data access problem • End users perform ad hoc query, reporting analysis and visualization Data Warehousing Benefits • Increase in knowledge worker productivity • Supports all decision makers’ data requirements • Provide ready access to critical data • Insulates operation databases from ad hoc processing • Provides high-level summary information • Provides drill down capabilities Yields – – – – – Improved business knowledge Competitive advantage Enhances customer service and satisfaction Facilitates decision making Help streamline business processes DW Suitability For organizations where • Data are in different systems • Information-based approach to management in use • Large, diverse customer base • Same data have different representations in different systems • Highly technical, messy data formats Transform Data from TPS to Warehouse • Consolidate data – e.g. from multiple TPS around the country/world • “Scrub” the data – keep definitions consistent (e.g. translate part numbers/product names if they differ per country) • Calculate fields (decrease processor load) • Summarize fields (decrease processor load) • De-normalize data (ease of use) OLAP: Data Access and Mining, Querying and Analysis Online Analytical processing (OLAP) – DSS and EIS computing done by end-users in online systems – Versus online transaction processing (OLTP) OLAP Activities – – – – Generating queries Requesting ad hoc reports Conducting statistical analyses Building multimedia applications OLAP uses the data warehouse and a set of tools, usually with multidimensional capabilities • Query tools • Spreadsheets • Data mining tools • Data visualization tools Using SQL for Querying • SQL (Structured Query Language) Data language English-like, nonprocedural, very user friendly language Free format Example: SELECT FROM WHERE Name, Salary Employees Salary >2000 Sample of SQL Statements Natural Language SQL List of all purchases of L.B. University since January of 1996, in terms of products, prices, and quantities SELECT PRODUCTS PURCH PRICE QUANTITY FROM PURCHASE-HIST WHERE CUST-NAME EQ L.B. UNIVERSITY AND PURCH-DATE GE 01/01/96 List the price of cotton shirts, SELECT PRICE, medium size, with short sleeves and AMOUNT- AVAIL FROM white color PRODUCT WHERE PROD-NAME EQ COTTON SHIRT AND SIZE EQ MEDIUM AND STYLE EQ SHORT SLEEVES AND COLOR EQ WHITE Query Tools & OLAP • Query Tools – user-lead discovery. Can return individual records or summaries. Requests are formulated in advance (e.g. “show me all delinquent accounts in the northeast region during Q1”). • OLAP - Online Analytical Processing – user-lead discovery. Data is explored via “drill down” into the data by selecting variables to summarize on. Results are usually reported in a cross-tab report or graph (e.g. “show me a tabular breakdown of sales by business unit, product type, and year”). OLAP • Online Analytical Processing. (example of cross-tab results presented below) 1. business unit 2. product type 3. year D a ta S o u rc e s Bu sin ess Com m u n ica t ion Qu er yin g In t er n a l Da t a Sou r ces E xt er n a l Da t a Sou r ces Da t a Acqu isit ion , E xt r a ct ion , Deliver y Tr a n sfor m a t ion Repor t Gen er a t ion Da t a Wa r eh ou se Spr ea dsh eet F or eca st in g An a lysis Modelin g Da t a P r esen t a t ion and Visu a liza t ion Mu lt im edia E IS, Ot h er s On lin e An a lyt ica l P r ocessin g Data Mining • automated information discovery process, uncovers important patterns in existing data – can use neural networks or other approaches. Requires ‘clean’, reliable, consistent data. Historical data must reflect the current environment. • e.g. “What are the characteristics that identify when we are likely to lose a customer?” Data Mining For • • • • • • • Knowledge discovery in databases Knowledge extraction Data archeology Data exploration Data pattern processing Data dredging Information harvesting Data Mining Examples • • • • • • Market Segmentation - e.g. Dayton Hudson Direct Marketing - e.g. Chase Market basket analysis - e.g. Wal-Mart Customer Churn - e.g. Fleet Bank Fraud Detection - e.g. Bank of America Cost Reduction Prospecting - e.g. Merk Medco. Major Data Mining Characteristics and Objectives • Data are often buried deep • Client/server architecture • Sophisticated new tools--including advanced visualization tools--help to remove the information “ore” • Massaging and synchronizing data • Usefulness of “soft” data • End-user minor is empowered by “data drills” and other power query tools with little or no programming skills • Often involves finding unexpected results • Tools are easily combined with spreadsheets etc. • Parallel processing for data mining Data Mining Application Areas • • • • • • • • • • • • • Marketing Banking Retailing and sales Manufacturing and production Brokerage and securities trading Insurance Computer hardware and software Government and defense Airlines Health care Broadcasting Law Enforcement … almost everywhere!!! Stupid Data-Miner Tricks • Ad-Hoc Theories – when an oddity jumps out of the data, it’s tempting to develop a theory for it. Sometimes findings are just statistical flukes. • Using Too Many Variables – the more factors considered, the more likely a relationship will be found - valid or not. • Not Taking No for an Answer – it’s OK to stop looking if you can’t find anything. There are no silver bullets. • Limited or incorrect interpretation