SAS TENNIS STATISTICS

Team Members
• SRI LATHA BUYYANAPRAGADA
• AKSHAY BHINGE
• SREEKANTH REDDY PATHI REDDY

Guided By
Prof. Meiliu Lu

OVERVIEW
• Introduction
• Process
• Schedule
• Data Warehouse Implementation
• Data Mining Implementation
• Future Scope
• References

INTRODUCTION
• The objective is to create a data mart on tennis data and apply OLAP operations to it
• Try to predict information from the dataset using a data mining tool
• We are interested in sports, especially tennis, and so wanted to do further analysis on a tennis dataset
• Get more information about tennis players and their statistics
• Implement the OLAP operations learned in the course
• Implement various classification and clustering algorithms using a data mining tool such as Rapid Miner

PROCESS
Pre-processing
• Preprocessed the data and changed the values of some attributes
• Changed player names to make them consistent across all Grand Slams
• Removed unwanted attributes from the dataset
• Added the attributes required for the data mart implementation

SCHEDULE
• Week 11: Data Preprocessing
• Week 12: Data Warehouse Implementation
• Week 13: Data Mining Implementation
• Week 14: Presentation and Report Preparation

DATA WAREHOUSE

STAR SCHEMA
Player (dimension): Id (PK), Name, Gender
Time (dimension): Id (PK), Year
Grand Slam (dimension): Id (PK), Name
Tennis Fact_Table: Player_Id (PK), Time_Id (PK), Slam_Id (PK), Ace, Dbf, Wnr, Ufe, Bpc, Bpw, Npa, Npw, Tpw, NoOfSets, NoOfMatches

IMPLEMENTATION
• Created the dimension tables and the fact table using SQL Server.
• Implemented basic OLAP operations on the data.
• Implemented a web interface that shows the basic OLAP operations and other useful statistics pictorially.
• Technologies Used:
    • ASP.NET MVC 4, using the Visual Studio IDE
    • HTML, jQuery
    • Database using SQL Server

QUESTIONS ANSWERED
• Aggregate statistics of each player for the entire year: Roll-up
• Aggregate statistics of each player for each Grand Slam in a year: Roll-down (drill-down)
• Statistics of each player for a specific Grand Slam: Slice
• Statistics at the French Open and US Open for players whose Id is less than 10: Dice

DATA WAREHOUSE DEMO

QUIZ
Q) Based on our data warehouse demo, on which attribute did we apply the roll-up operation?
• Grand Slam Id
• Time Id
• Player Id
Answer: Player Id

DATA MINING

NEED OF DATA MINING
• Computers have become cheaper and more powerful.
• Automated data collection tools and mature database technologies lead to tremendous amounts of data stored in databases.
• Web data, music data, games data, etc.
• Bank and credit card transactions
We are drowning in data, but starving for knowledge. Solution: Data Mining.

DATA MINING
• Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) information or patterns from data in large databases.
• Data Mining Tasks:
    • Prediction tasks: use some variables to predict unknown or future values of other variables.
    • Description tasks: find human-interpretable patterns that describe the data.

IMPLEMENTATION
• Based on the errors and faults made, we can analyze when they occur and how they affect player performance.
• Classified players into Good, Average, and Below Average categories based on their performance statistics.
• From the classified data, we predict the players who have the best chance of winning next year's Grand Slams.
• Predicted the player who has the highest chance to win.

Algorithms Implemented Using Rapid Miner
• Decision Tree
• Naïve Bayes
• ID3
• K-means
• KNN
• We also tried FP-Growth, SVM, and a few others to see what results they produce on the data.

DATA MINING DEMO

QUIZ
Q) Which classification algorithm is used to classify players?
• K-NN
• Decision Tree
• Rule Induction
• Naïve Bayes
Answer: Decision Tree

FUTURE SCOPE
• Develop a mobile app to make the statistics more available and easier to access
• Create an API to fill the gap between data production and data utilization

REFERENCES
• http://archive.ics.uci.edu/ml/datasets/Tennis+Major+Tournament+Match+Statistics (Data Source)
• http://www.asp.net/mvc/overview/getting-started/introduction/gettingstarted
• https://www.youtube.com/watch?v=EyygHzSVZpM&list=PLLYiNNLBO1EvVz2WJLWfbp_JWgg5It1O6 (Rapid Miner Tutorial)

THANK YOU
QUESTIONS?

APPENDIX 1. SCREENSHOTS

DATA MINING SCREENSHOTS
[Screenshot slides; only the titles survive]
• Decision Tree
• Naïve Bayes: Simple Distribution
• Naïve Bayes: Chart Representation
• ID3: Tree
• ID3: Tree Description
• ID3: Radial
• ID3: Balloon
• ID3: FRLayout
• ID3: Circle
• Naïve Bayes: Performance Vector
• Naïve Bayes: Simple Distribution for Win Prediction
• Win Prediction: Histogram Representation
• Win Prediction: Bar Representation
• Win Prediction: Pie 3D Representation
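
SKETCH: OLAP QUESTIONS AS QUERIES
The four questions on the QUESTIONS ANSWERED slide can be illustrated with a small, self-contained sketch. This is not the team's SQL Server or ASP.NET code. It assumes an in-memory SQLite database with a reduced version of the star schema (only the Ace and Dbf measures), and every table name, player name, and number below is made up for illustration.

```python
import sqlite3


def build_mart():
    """Create a tiny in-memory star schema with hypothetical sample rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE player (id INTEGER PRIMARY KEY, name TEXT, gender TEXT);
        CREATE TABLE grand_slam (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE time_dim (id INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE fact (
            player_id INTEGER REFERENCES player(id),
            time_id   INTEGER REFERENCES time_dim(id),
            slam_id   INTEGER REFERENCES grand_slam(id),
            ace INTEGER, dbf INTEGER,
            PRIMARY KEY (player_id, time_id, slam_id)
        );
    """)
    # Hypothetical dimension rows (not taken from the real dataset).
    con.executemany("INSERT INTO player VALUES (?, ?, ?)",
                    [(1, "Player A", "M"), (2, "Player B", "F"),
                     (12, "Player C", "M")])
    con.executemany("INSERT INTO grand_slam VALUES (?, ?)",
                    [(1, "Australian Open"), (2, "French Open"),
                     (3, "Wimbledon"), (4, "US Open")])
    con.execute("INSERT INTO time_dim VALUES (1, 2013)")
    # Hypothetical fact rows: (player_id, time_id, slam_id, ace, dbf).
    con.executemany("INSERT INTO fact VALUES (?, ?, ?, ?, ?)", [
        (1, 1, 2, 10, 3), (1, 1, 4, 20, 5),
        (2, 1, 2, 4, 6), (2, 1, 3, 7, 2),
        (12, 1, 4, 15, 1),
    ])
    return con


JOINS = """
    FROM fact f
    JOIN player p     ON p.id = f.player_id
    JOIN time_dim t   ON t.id = f.time_id
    JOIN grand_slam g ON g.id = f.slam_id
"""

# Roll-up: aggregate each player's statistics over the whole year.
ROLL_UP = ("SELECT p.name, t.year, SUM(f.ace), SUM(f.dbf)" + JOINS +
           "GROUP BY p.id, t.year ORDER BY p.id")

# Roll-down (drill-down): break the yearly totals out per Grand Slam.
DRILL_DOWN = ("SELECT p.name, t.year, g.name, SUM(f.ace), SUM(f.dbf)" + JOINS +
              "GROUP BY p.id, t.year, g.id ORDER BY p.id, g.id")

# Slice: fix one value of one dimension (a single Grand Slam).
SLICE = ("SELECT p.name, f.ace, f.dbf" + JOINS +
         "WHERE g.name = 'French Open' ORDER BY p.id")

# Dice: restrict two dimensions at once (two slams, player Id below 10).
DICE = ("SELECT p.name, g.name, f.ace, f.dbf" + JOINS +
        "WHERE g.name IN ('French Open', 'US Open') AND p.id < 10 "
        "ORDER BY p.id, g.id")

if __name__ == "__main__":
    con = build_mart()
    for label, query in [("roll-up", ROLL_UP), ("drill-down", DRILL_DOWN),
                         ("slice", SLICE), ("dice", DICE)]:
        print(label, con.execute(query).fetchall())
```

Roll-up and drill-down differ only in the GROUP BY columns, while slice and dice differ only in how many dimensions the WHERE clause restricts.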
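
SKETCH: CLASSIFYING PLAYERS WITH K-NN
The data mining IMPLEMENTATION slide describes classifying players into Good, Average, and Below Average. The slides do not show the Rapid Miner processes or the features used, so the following is only a plain-Python illustration of k-nearest neighbours, one of the listed algorithms. The two features (aces and unforced errors per match), the training rows, and the function name `knn_predict` are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical training rows: ((aces, unforced_errors), category).
TRAIN = [
    ((12, 15), "Good"), ((10, 18), "Good"), ((11, 20), "Good"),
    ((6, 30), "Average"), ((5, 28), "Average"), ((7, 33), "Average"),
    ((2, 45), "Below Average"), ((1, 50), "Below Average"),
    ((3, 42), "Below Average"),
]


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    Distance is plain Euclidean distance on the raw feature values; a real
    workflow would normally scale the features first.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    for query in [(9, 19), (6, 31), (2, 47)]:
        print(query, "->", knn_predict(TRAIN, query))
```

With k=3, each query is labelled by a vote among its three closest training players, so the result depends heavily on which statistics are chosen as features and how they are scaled.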
