SAS TENNIS STATISTICS
Team Members
• SRI LATHA BUYYANAPRAGADA
• AKSHAY BHINGE
• SREEKANTH REDDY PATHI REDDY
Guided By
Prof. Meiliu Lu
OVERVIEW
• Introduction
• Process
• Schedule
• Data Warehouse Implementation
• Data Mining Implementation
• Future Scope
• References
INTRODUCTION
• The objective is to create a data mart on tennis data and apply OLAP operations to it
• Try to predict information from the dataset using a data mining tool
• We are interested in sports, especially tennis, and wanted to analyse a tennis dataset further
• Get more information about tennis players and their statistics
• Implement the OLAP operations learnt during the course
• Implement various classification and clustering algorithms using a data mining tool such as RapidMiner
PROCESS
Pre-processing
• Preprocessed the data and changed the values of some attributes
• Made player names consistent across all Grand Slams
• Removed unwanted attributes from the dataset
• Added the attributes required for the data mart implementation
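The pre-processing steps above can be sketched roughly as follows. This is only an illustration: the row layout, the alias table, the dropped attribute (`round`) and the added attribute (`year`) are all assumptions, since the slides do not say which attributes were changed, removed or added.

```python
import re

# Hypothetical raw rows: one player spelled three ways across Grand Slam files.
raw_rows = [
    {"player": "R.Federer", "slam": "AusOpen", "ace": 10, "round": 1},
    {"player": "Roger Federer", "slam": "Wimbledon", "ace": 12, "round": 2},
    {"player": "  roger  federer ", "slam": "USOpen", "ace": 8, "round": 1},
]

# Assumed lookup from abbreviated spellings to one canonical name.
ALIASES = {"r.federer": "Roger Federer"}

# Assumed set of attributes that are not needed in the data mart.
UNWANTED = {"round"}


def canonical_name(name):
    """Collapse whitespace, then apply the alias table or title-case the name."""
    cleaned = re.sub(r"\s+", " ", name).strip()
    return ALIASES.get(cleaned.lower(), cleaned.title())


def preprocess(row, year):
    """Drop unwanted attributes, normalise the name, add a year attribute."""
    out = {key: value for key, value in row.items() if key not in UNWANTED}
    out["player"] = canonical_name(row["player"])
    out["year"] = year  # assumed attribute needed for the Time dimension
    return out


cleaned = [preprocess(row, 2013) for row in raw_rows]
for row in cleaned:
    print(row)
```

Title-casing is a crude fallback and would mangle names with internal capitals, so a real cleanup would probably rely on a fuller alias table.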
SCHEDULE
• Week 11: Data Preprocessing
• Week 12: Data Warehouse Implementation
• Week 13: Data Mining Implementation
• Week 14: Presentation and Report Preparation
DATA WAREHOUSE
STAR SCHEMA
Player (dimension): Id (PK), Name, Gender
Time (dimension): Id (PK), Year
Grand Slam (dimension): Id (PK), Name
Tennis Fact_Table: Player_Id (PK), Time_Id (PK), Slam_Id (PK), Ace, Dbf, Wnr, Ufe, Bpc, Bpw, Npa, Npw, Tpw, NoOfSets, NoOfMatches
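A minimal sketch of the star schema above, using Python's built-in `sqlite3` module as a stand-in for SQL Server. The column types, the composite primary key on the fact table and the sample row are assumptions. The comments on the measure abbreviations follow the usual reading of the UCI dataset and are not confirmed by the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

conn.executescript("""
CREATE TABLE Player    (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL, Gender TEXT);
CREATE TABLE "Time"    (Id INTEGER PRIMARY KEY, Year INTEGER NOT NULL);
CREATE TABLE GrandSlam (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);

CREATE TABLE TennisFact (
    Player_Id   INTEGER NOT NULL REFERENCES Player(Id),
    Time_Id     INTEGER NOT NULL REFERENCES "Time"(Id),
    Slam_Id     INTEGER NOT NULL REFERENCES GrandSlam(Id),
    Ace INTEGER,  -- aces (assumed meaning)
    Dbf INTEGER,  -- double faults (assumed meaning)
    Wnr INTEGER,  -- winners (assumed meaning)
    Ufe INTEGER,  -- unforced errors (assumed meaning)
    Bpc INTEGER,  -- break points created (assumed meaning)
    Bpw INTEGER,  -- break points won (assumed meaning)
    Npa INTEGER,  -- net points attempted (assumed meaning)
    Npw INTEGER,  -- net points won (assumed meaning)
    Tpw INTEGER,  -- total points won (assumed meaning)
    NoOfSets    INTEGER,
    NoOfMatches INTEGER,
    PRIMARY KEY (Player_Id, Time_Id, Slam_Id)
);
""")

# Hypothetical sample data: one player, one year, one tournament.
conn.execute("INSERT INTO Player VALUES (1, 'Player A', 'M')")
conn.execute('INSERT INTO "Time" VALUES (1, 2013)')
conn.execute("INSERT INTO GrandSlam VALUES (1, 'Wimbledon')")
conn.execute(
    "INSERT INTO TennisFact (Player_Id, Time_Id, Slam_Id, Ace, Dbf, NoOfMatches) "
    "VALUES (1, 1, 1, 42, 7, 5)"
)
conn.commit()
```

Each fact row is identified by the combination of its three dimension keys, which is what lets the OLAP queries group or filter along any dimension.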
IMPLEMENTATION
• Created the dimension tables and the fact table using SQL Server.
• Implemented basic OLAP operations on the data.
• Implemented a web interface to show basic OLAP operations and other useful statistics graphically.
• Technologies used:
  • ASP.NET MVC 4 using the Visual Studio IDE
  • HTML, jQuery
  • Database using SQL Server
QUESTIONS ANSWERED
• Each player's aggregate statistics for the entire year: Roll Up
• Each player's aggregate statistics for each Grand Slam in a year: Drill Down
• Each player's statistics for a specific Grand Slam: Slice
• Statistics for the French Open and US Open of players whose Id is less than 10: Dice
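The four questions above map onto ordinary SQL aggregation and filtering. The sketch below uses `sqlite3` and a deliberately flattened fact table with a single measure (`ace`) to stay short. The table name, columns and rows are made up for illustration and are not the project's actual queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (player_id INTEGER, year INTEGER, slam TEXT, ace INTEGER)")
conn.executemany(
    "INSERT INTO fact VALUES (?, ?, ?, ?)",
    [
        (1, 2013, "Australian Open", 30),
        (1, 2013, "French Open", 20),
        (1, 2013, "US Open", 25),
        (2, 2013, "French Open", 10),
        (2, 2013, "Wimbledon", 15),
        (12, 2013, "US Open", 40),
    ],
)

# Roll up: aggregate each player's statistics over the whole year.
roll_up = conn.execute(
    "SELECT player_id, year, SUM(ace) FROM fact "
    "GROUP BY player_id, year ORDER BY player_id"
).fetchall()

# Drill down: break the yearly totals out per Grand Slam.
drill_down = conn.execute(
    "SELECT player_id, year, slam, SUM(ace) FROM fact "
    "GROUP BY player_id, year, slam ORDER BY player_id, slam"
).fetchall()

# Slice: fix one value of one dimension (a single Grand Slam).
slice_ = conn.execute(
    "SELECT player_id, year, slam, ace FROM fact WHERE slam = 'Wimbledon'"
).fetchall()

# Dice: restrict two dimensions at once (two slams, player Id below 10).
dice = conn.execute(
    "SELECT player_id, slam, ace FROM fact "
    "WHERE slam IN ('French Open', 'US Open') AND player_id < 10 "
    "ORDER BY player_id, slam"
).fetchall()

print(roll_up)
print(dice)
```

Roll up and drill down differ only in the `GROUP BY` columns, while slice and dice differ only in how many dimensions the `WHERE` clause constrains.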
DATA WAREHOUSE DEMO
QUIZ
Q) Based on our data warehouse demo, on which attribute did we apply the roll-up operation?
• Grand Slam Id
• Time Id
• Player Id
Answer: Player Id
DATA MINING
NEED OF DATA MINING
• Computers have become cheaper and more powerful.
• Automated data collection tools and mature database technologies have led to tremendous amounts of data being stored in databases.
  • Web data, music data, games data, etc.
  • Bank and credit card transactions
We are drowning in data but starving for knowledge.
Solution: data mining.
DATA MINING
• Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases.
• Data mining tasks:
  • Prediction tasks: use some variables to predict unknown or future values of other variables.
  • Description tasks: find human-interpretable patterns that describe the data.
IMPLEMENTATION
• Based on the errors and faults players make, we can analyse when they occur and how they affect player performance.
• Classified players into Good, Average and Below Average categories based on their performance statistics.
• From the classified data, predicted which players have the best chance of winning a Grand Slam next year.
• Predicted the player with the highest chance of winning.
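To make the classification step concrete, here is a hand-written, rule-based sketch. It is not the model RapidMiner learned. The score formula, the thresholds and the player statistics are all invented for illustration, because the slides do not give the actual rules.

```python
def performance_score(stats):
    """Hypothetical score: reward winners and aces, penalise errors and faults."""
    return stats["wnr"] + stats["ace"] - stats["ufe"] - stats["dbf"]


def classify(stats, good_at=20, average_at=0):
    """Map a score to one of the three categories (thresholds are assumptions)."""
    score = performance_score(stats)
    if score >= good_at:
        return "Good"
    if score >= average_at:
        return "Average"
    return "Below Average"


# Made-up per-player statistics using the fact table's measure names.
players = {
    "Player A": {"ace": 12, "dbf": 3, "wnr": 40, "ufe": 20},
    "Player B": {"ace": 5, "dbf": 4, "wnr": 25, "ufe": 22},
    "Player C": {"ace": 2, "dbf": 6, "wnr": 15, "ufe": 30},
}

labels = {name: classify(stats) for name, stats in players.items()}

# The "predicted winner" here is simply the player with the highest score.
favourite = max(players, key=lambda name: performance_score(players[name]))

print(labels)
print(favourite)
```

A learned decision tree would choose its own split attributes and thresholds from labelled data. The fixed rule here only shows the shape of the output: one category per player plus a single top-ranked player.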
Algorithms Implemented using RapidMiner
• Decision Tree
• Naïve Bayes
• ID3
• K-means
• KNN
• We also tried FP-Growth, SVM and a few other algorithms to see what results they give on this data.
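As an illustration of one of the listed algorithms, the following is a small k-nearest-neighbours classifier in plain Python. The project used RapidMiner's built-in operators, so this is only a sketch of the idea. The two features (winners, unforced errors) and the training points are made up.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (winners, unforced errors) -> category.
train = [
    ((40, 15), "Good"),
    ((38, 18), "Good"),
    ((25, 25), "Average"),
    ((27, 22), "Average"),
    ((12, 35), "Below Average"),
    ((15, 33), "Below Average"),
]

print(knn_predict(train, (36, 17)))
print(knn_predict(train, (14, 34)))
```

With k=3 each prediction is a vote among the three closest players by Euclidean distance. In practice the features would usually be scaled first so that no single statistic dominates the distance.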
DATA MINING DEMO
QUIZ
Q) Which classification algorithm is used to classify players?
• K-NN
• Decision Tree
• Rule Induction
• Naïve Bayes
Answer: Decision Tree
FUTURE SCOPE
• Develop a mobile app to make the statistics more readily available and accessible
• Create an API to bridge the gap between data production and data utilization
REFERENCES
• http://archive.ics.uci.edu/ml/datasets/Tennis+Major+Tournament+Match+Statistics (data source)
• http://www.asp.net/mvc/overview/getting-started/introduction/gettingstarted
• https://www.youtube.com/watch?v=EyygHzSVZpM&list=PLLYiNNLBO1EvVz2WJLWfbp_JWgg5It1O6 (RapidMiner tutorial)
THANK YOU
QUESTIONS?
APPENDIX
1. SCREENSHOTS
DATA MINING SCREEN SHOTS
• Decision Tree
• Naïve Bayes: Simple Distribution
• Naïve Bayes: Chart Representation
• ID3: Tree
• ID3: Tree Description
• ID3: Radial
• ID3: Balloon
• ID3: FRLayout
• ID3: Circle
• Naïve Bayes: Performance Vector
• Naïve Bayes: Simple Distribution for Win Prediction
• Win Prediction: Histogram Representation
• Win Prediction: Bar Representation
• Win Prediction: Pie 3D Representation