SAS TENNIS STATISTICS

Team Members
• SRI LATHA BUYYANAPRAGADA
• AKSHAY BHINGE
• SREEKANTH REDDY PATHI REDDY

Guided By Prof. Meiliu Lu

OVERVIEW
• Introduction
• Process
• Schedule
• Data Warehouse Implementation
• Data Mining Implementation
• Future Scope
• References

INTRODUCTION
• The objective is to create a data mart on tennis data and apply OLAP operations to it.
• Try to predict information from the dataset using a data mining tool.
• We are interested in sports, especially tennis, and so wanted to analyze a tennis dataset further.
• Get more information about tennis players and their statistics.
• Implement the OLAP operations learnt during the course.
• Implement various classification and clustering algorithms using a data mining tool such as RapidMiner.

PROCESS
Pre-processing
• Preprocessed the data and changed the values of some attributes.
• Made player names consistent across all Grand Slams.
• Removed unwanted attributes from the dataset.
• Added the attributes required for the data mart implementation.

SCHEDULE
• Week 11: Data Preprocessing
• Week 12: Data Warehouse Implementation
• Week 13: Data Mining Implementation
• Week 14: Presentation and Report Preparation

DATA WAREHOUSE

STAR SCHEMA
• Player (dimension): Id (PK), Name, Gender
• Time (dimension): Id (PK), Year
• Grand Slam (dimension): Id (PK), Name
• Tennis Fact_Table: Player_Id (PK), Time_Id (PK), Slam_Id (PK), Ace, Dbf, Wnr, Ufe, Bpc, Bpw, Npa, Npw, Tpw, NoOfSets, NoOfMatches

IMPLEMENTATION
• Created the dimension tables and the fact table in SQL Server.
• Implemented basic OLAP operations on the data.
• Implemented a web interface that shows the basic OLAP operations and other useful statistics pictorially.

Technologies Used:
• ASP.NET MVC 4 using the Visual Studio IDE
• HTML, jQuery
• Database: SQL Server

QUESTIONS ANSWERED
• Each player's aggregate statistics for the entire year: Roll Up
• Each player's aggregate statistics for each Grand Slam in a year: Roll Down (drill down)
• Each player's statistics for a specific Grand Slam: Slice
• Statistics for the French Open and US Open of players whose Id is less than 10: Dice

DATA WAREHOUSE DEMO

QUIZ
Q) Based on our data warehouse demo, on which attribute did we apply the roll-up operation?
• Grand Slam Id
• Time Id
• Player Id
Answer: Player Id

DATA MINING

NEED OF DATA MINING
• Computers have become cheaper and more powerful.
• Automated data collection tools and mature database technologies have led to a tremendous amount of data stored in databases: web data, music data, games data, bank and credit card transactions, etc.
• We are drowning in data, but starving for knowledge.
• Solution: data mining.

DATA MINING
Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases.

Data Mining Tasks:
• Prediction tasks: use some variables to predict unknown or future values of other variables.
• Description tasks: find human-interpretable patterns that describe the data.

IMPLEMENTATION
• Based on the errors and faults made, we can analyze when they occur and how they affect player performance.
• Classified players into Good, Average and Below Average categories based on their performance statistics.
• From the classified data, predicted which players have the best chance of winning a Grand Slam next year.
• Predicted the player with the highest chance of winning.

Algorithms implemented using RapidMiner:
• Decision Tree
• Naïve Bayes
• ID3
• K-means
• KNN
We also tried FP-Growth, SVM and a few others to see what results they give on the data.

DATA MINING DEMO

QUIZ
Q) Which classification algorithm is used to classify players?
• K-NN
• Decision Tree
• Rule Induction
• Naïve Bayes
Answer: Decision Tree

FUTURE SCOPE
• Develop a mobile app to make the statistics more available and accessible.
• Create an API to bridge the gap between data production and data utilization.

REFERENCES
• http://archive.ics.uci.edu/ml/datasets/Tennis+Major+Tournament+Match+Statistics (Data Source)
• http://www.asp.net/mvc/overview/getting-started/introduction/getting-started
• https://www.youtube.com/watch?v=EyygHzSVZpM&list=PLLYiNNLBO1EvVz2WJLWfbp_JWgg5It1O6 (RapidMiner Tutorial)

THANK YOU

QUESTIONS?

APPENDIX 1. SCREENSHOTS

DATA MINING SCREENSHOTS
• Decision Tree
• Naïve Bayes: Simple Distribution
• Naïve Bayes: Chart Representation
• ID3: Tree
• ID3: Tree Description
• ID3: Radial
• ID3: Balloon
• ID3: FRLayout
• ID3: Circle
• Naïve Bayes: Performance Vector
• Naïve Bayes: Simple Distribution for Win Prediction
• Win Prediction: Histogram Representation
• Win Prediction: Bar Representation
• Win Prediction: Pie 3D Representation
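
The roll-up, roll-down, slice and dice questions from the data warehouse part of the talk can be illustrated with a small sketch. This is not the project's SQL Server or ASP.NET code. It assumes an in-memory SQLite database, keeps only the Ace and Dbf measures from the fact table, and uses made-up players ("Player A", "Player B") with made-up numbers. The function names are hypothetical.

```python
import sqlite3


def build_mart():
    """Create a tiny star schema in memory (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Player    (Id INTEGER PRIMARY KEY, Name TEXT, Gender TEXT);
        CREATE TABLE Time      (Id INTEGER PRIMARY KEY, Year INTEGER);
        CREATE TABLE GrandSlam (Id INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Tennis_Fact (
            Player_Id INTEGER REFERENCES Player(Id),
            Time_Id   INTEGER REFERENCES Time(Id),
            Slam_Id   INTEGER REFERENCES GrandSlam(Id),
            Ace INTEGER, Dbf INTEGER,
            PRIMARY KEY (Player_Id, Time_Id, Slam_Id)
        );
    """)
    conn.executemany("INSERT INTO Player VALUES (?, ?, ?)",
                     [(1, "Player A", "M"), (2, "Player B", "F")])
    conn.execute("INSERT INTO Time VALUES (1, 2013)")
    conn.executemany("INSERT INTO GrandSlam VALUES (?, ?)",
                     [(1, "Australian Open"), (2, "French Open"),
                      (3, "Wimbledon"), (4, "US Open")])
    # (Player_Id, Time_Id, Slam_Id, Ace, Dbf): invented sample values
    conn.executemany("INSERT INTO Tennis_Fact VALUES (?, ?, ?, ?, ?)", [
        (1, 1, 1, 10, 2), (1, 1, 2, 7, 3), (1, 1, 3, 12, 1), (1, 1, 4, 9, 4),
        (2, 1, 1, 4, 5), (2, 1, 2, 6, 2), (2, 1, 3, 5, 3), (2, 1, 4, 8, 1),
    ])
    return conn


JOINS = """
    FROM Tennis_Fact f
    JOIN Player p    ON p.Id = f.Player_Id
    JOIN Time t      ON t.Id = f.Time_Id
    JOIN GrandSlam g ON g.Id = f.Slam_Id
"""


def roll_up(conn):
    """Aggregate each player's statistics over all Grand Slams in a year."""
    return conn.execute(
        "SELECT p.Name, t.Year, SUM(f.Ace), SUM(f.Dbf)" + JOINS +
        "GROUP BY p.Id, p.Name, t.Year ORDER BY p.Id, t.Year").fetchall()


def roll_down(conn):
    """Break the yearly totals back down to one row per Grand Slam."""
    return conn.execute(
        "SELECT p.Name, t.Year, g.Name, f.Ace, f.Dbf" + JOINS +
        "ORDER BY p.Id, t.Year, g.Id").fetchall()


def slice_by_slam(conn, slam_name):
    """Slice: fix one value of the Grand Slam dimension."""
    return conn.execute(
        "SELECT p.Name, f.Ace, f.Dbf" + JOINS +
        "WHERE g.Name = ? ORDER BY p.Id", (slam_name,)).fetchall()


def dice(conn):
    """Dice: restrict two dimensions at once (slam name and player Id)."""
    return conn.execute(
        "SELECT p.Name, g.Name, f.Ace, f.Dbf" + JOINS +
        "WHERE g.Name IN ('French Open', 'US Open') AND p.Id < 10 "
        "ORDER BY p.Id, g.Id").fetchall()


if __name__ == "__main__":
    mart = build_mart()
    for row in roll_up(mart):
        print(row)
```

Roll-up here is a GROUP BY that drops the Grand Slam dimension, and roll-down restores it. Slice and dice are WHERE filters on one dimension and on two dimensions respectively.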