Introduction
Instructor: Cengiz Örencik
E-mail: [email protected]
Course materials: myweb.sabanciuniv.edu/cengizo/courses

Reference Books
◦ Veri Madenciliği: Kavram ve Algoritmaları (Data Mining: Concepts and Algorithms), Assoc. Prof. Dr. Gökhan Silahtaroğlu, 2013
◦ Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber, 2010

1 midterm: 30%
2 in-class quizzes: 20%
1 final: 50%
Homework: ?

Fundamental data mining tools and concepts
Classification, clustering, and association and correlation algorithms
Real-life examples and implementations

Data preprocessing
Data warehouses
◦ Data from different sources and with different structures, unified under one schema and residing at a single site
◦ Periodic data summaries
Associations and correlations
◦ Market basket analysis, etc.
Classification and prediction
◦ E.g. is an applicant trustworthy enough for a credit application?
Cluster analysis
◦ People with similar spending patterns
Text and Web mining
Privacy-preserving data mining
◦ Protect personal information

“Necessity is the mother of invention” (Plato)
Petabytes of new data are produced continuously
◦ 90% of the world's data was generated over the last two years
◦ Twitter, Facebook, online shopping, MOBESE cameras, etc.
Data is easy to access and store
◦ E.g. customer voice records
◦ Web crawlers, e.g. tweets that contain the terms “election” and “party”
The hard part is getting knowledge from the data

Data mining is the extraction of non-trivial (previously unknown) and valid knowledge from large amounts of data, knowledge that can be used in decision making
Non-trivial
◦ Getting predictable information at huge cost is pointless
◦ The goal is not to prove something you already know
Diaper-beer correlation
Large data
Decision making
◦ Validity

Databases
◦ Query: suitable; SQL on a relational DB
◦ Output: known; a subset of the data
Data Mining
◦ Query: not suitable; no common language
◦ Output: not known; not a subset of the data
Data
◦ Static
◦ Dynamic

Database queries
◦ List the people who have a boat at Kalamış Marina and are named “Ahmet”
◦ Credit card owners under 30 who spend more than 5000 TL per month
Data mining queries
◦ Credit applications with low risk (classification)
◦ Card owners with similar buying patterns (clustering)
◦ Products purchased together with PS4 games (association rules)

[Figure: knowledge discovery process. Labels: Databases, Cleaning, Data Warehouse, Selection and transformation, Data Mining, Patterns, Evaluation and Presentation, Knowledge]

[Figure: layers ordered by increasing potential to support business decisions, top to bottom]
◦ Decision Making (End User)
◦ Data Presentation: Visualization Techniques (Business Analyst)
◦ Data Mining: Information Discovery (Data Analyst)
◦ Data Exploration: Statistical Summary, Querying, and Reporting
◦ Data Preprocessing/Integration, Data Warehouses
◦ Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems (DBA)

Market analysis
◦ Target audience, customer relations
Risk analysis
◦ Resource management, monitoring competing enterprises
Fraud detection
◦ Insurance, banking
◦ Modeling using historical data
Document similarity
◦ Plagiarism

The aim is to fit the data into a model
Predictive mining
◦ Classify people who may not make their mortgage payments
◦ Predict which people will leave your company for another
◦ Predict the stock market
Descriptive mining
◦ Shows hidden information
◦ Shows your best customers
◦ Which products sell together
◦ Which customers have similar shopping trends
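
The database-query side of the comparison above can be made concrete with a small sketch. It runs the "credit card owners under 30 who spend more than 5000 TL per month" query against an in-memory SQLite database. The table name `card_owners`, its columns, the sample rows, and the helper `young_high_spenders` are all hypothetical and invented for illustration; they do not come from the course materials.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE card_owners (name TEXT, age INTEGER, monthly_spending_tl REAL)"
)
conn.executemany(
    "INSERT INTO card_owners VALUES (?, ?, ?)",
    [
        ("Ahmet", 27, 6200.0),
        ("Elif", 34, 7100.0),
        ("Mehmet", 22, 1500.0),
        ("Zeynep", 29, 5400.0),
    ],
)


def young_high_spenders(connection, max_age=30, min_spending=5000.0):
    """Return names of owners younger than max_age spending more than min_spending."""
    rows = connection.execute(
        "SELECT name FROM card_owners "
        "WHERE age < ? AND monthly_spending_tl > ? "
        "ORDER BY name",
        (max_age, min_spending),
    ).fetchall()
    # Each row is a 1-tuple; unpack it into a plain list of names.
    return [name for (name,) in rows]


result = young_high_spenders(conn)
print(result)
```

Every name returned here is already stored in the table, which is what "output is a subset of the data" means. A data mining query such as "card owners with similar buying patterns" has no such ready-made filter condition, so it cannot be written as a single `WHERE` clause in this way.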
Classification [Predictive]
Clustering [Descriptive]
Association Rules [Descriptive]
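
As a rough illustration of the association-rule task, the sketch below scores the rule "diapers -> beer" on a handful of made-up shopping baskets. The slides do not name any measure; support (the share of baskets containing all items of the rule) and confidence (the share of baskets with the antecedent that also contain the consequent) are the standard ones and are assumed here. The basket data and function names are hypothetical.

```python
# Hypothetical market baskets, invented for illustration only.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"diapers", "beer", "chips"},
]


def count_containing(transactions, items):
    """Count the transactions that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for basket in transactions if wanted <= basket)


def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    return count_containing(transactions, items) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions with `antecedent` that also contain `consequent`."""
    with_antecedent = count_containing(transactions, antecedent)
    if with_antecedent == 0:
        return 0.0  # The rule never applies, so report zero confidence.
    both = count_containing(transactions, set(antecedent) | set(consequent))
    return both / with_antecedent


rule_support = support(baskets, {"diapers", "beer"})
rule_confidence = confidence(baskets, {"diapers"}, {"beer"})
print(rule_support, rule_confidence)
```

In these toy baskets, three of the five contain both items and four contain diapers, so the rule has support 0.6 and confidence 0.75. Real association-rule algorithms search over many candidate item sets rather than scoring one hand-picked rule.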