DATA ANALYSIS & INTERPRETATION
Baseball Report
Final Project
Eric Henson
7/31/2012
This report uses charts and graphs to show that baseball owners value home runs more than batting
average: players who hit more home runs earn a higher average salary.
Part 1
Variable     Type
Names        Qualitative – Nominal
Bats         Qualitative – Nominal
Throws       Qualitative – Nominal
Height       Quantitative – Continuous
Weight       Quantitative – Continuous
Salary       Quantitative – Continuous
Age          Quantitative – Discrete
PositionN    Qualitative – Nominal (numeric code)
Position     Qualitative – Nominal
TeamN        Qualitative – Nominal (numeric code)
Team         Qualitative – Nominal
League       Qualitative – Nominal
Seasons      Quantitative – Discrete
G            Quantitative – Discrete
AB           Quantitative – Discrete
R            Quantitative – Discrete
H            Quantitative – Discrete
2B           Quantitative – Discrete
3B           Quantitative – Discrete
HR           Quantitative – Discrete
RBI          Quantitative – Discrete
BB           Quantitative – Discrete
K            Quantitative – Discrete
SB           Quantitative – Discrete
CS           Quantitative – Discrete
AVG          Quantitative – Continuous
OBP          Quantitative – Continuous
SLG          Quantitative – Continuous
Henson Final Project
Part 2
A. [Figure: histogram of Salary]
B. [Figure: scatter plot of Salary vs. batting average (AVG), with linear trendline y = 8E+07x - 2E+07, R² = 0.1504]
C. [Figure: scatter plot of Salary vs. home runs (HR), with linear trendline y = 27943x + 1E+06, R² = 0.5192]
D. [Figure: bubble chart of Salary (logarithmic scale) vs. batting average (AVG)]
Part 3
A. The highest-paid player is Jason Giambi, with an annual salary of $23,428,571. His batting
average is .289. The mean batting average on the list is .27553, and Giambi's .289 places him
just above the 75th percentile of batting averages in the data.
B. Based on the charts, I believe owners value home runs more than batting average. Salary rises
much more consistently with home runs than with batting average: the HR trendline has R² = 0.5192,
while the batting average trendline has R² = 0.1504.
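The R² values compared in B come from linear trendlines. As a minimal sketch of how such a trendline and its R² are computed by ordinary least squares, the Python below may help; the function name `linear_fit` and the sample data are hypothetical, and this is an illustration of the standard formulas, not the calculation actually used to produce the report's charts.

```python
# Minimal ordinary least-squares sketch: fits y = slope * x + intercept
# and reports R^2. Data below are hypothetical placeholders.

def linear_fit(xs, ys):
    """Return (slope, intercept, r_squared) for a simple linear regression."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - SSE / SST: share of the variation in y explained by the line.
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - sse / sst


if __name__ == "__main__":
    # Hypothetical home-run totals and salaries for five players.
    hr = [50, 120, 200, 310, 450]
    salary = [1_500_000, 3_000_000, 6_500_000, 9_000_000, 14_000_000]
    slope, intercept, r2 = linear_fit(hr, salary)
    print(f"salary = {slope:,.0f} * HR + {intercept:,.0f}  (R^2 = {r2:.4f})")
```

Running the same function with AVG and then HR as `xs` gives two R² values that can be compared directly, which is the comparison B relies on.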
C. Sammy Sosa stands out. He has hit more than 600 home runs but earns only $500,000, while
players with similar home-run totals earn $5 million to $15 million a season. The mean salary for
the entire list is about $4.69 million. Univariate statistics on salary show that Sosa's salary is
just above the 25th percentile, which is very low given his production. His age and the number of
seasons he has played have probably affected his salary.
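One way to quantify "low given his production" is the residual from the HR trendline in chart C: actual salary minus the salary the line predicts. The sketch below is a hedged illustration that uses the rounded coefficients printed on the chart (y = 27943x + 1E+06), so its prediction is only approximate. It assumes 600 HR as a lower bound for Sosa's "more than 600"; the names `predicted_salary` and `residual` are hypothetical.

```python
# Sketch: compare an actual salary with the salary predicted by the
# chart C trendline. Coefficients are the rounded values shown on the chart.

SLOPE_PER_HR = 27_943   # dollars of salary per home run (from chart C)
INTERCEPT = 1_000_000   # intercept as displayed (1E+06, rounded)


def predicted_salary(home_runs: int) -> float:
    """Salary predicted by the chart C trendline."""
    return SLOPE_PER_HR * home_runs + INTERCEPT


def residual(actual_salary: float, home_runs: int) -> float:
    """Actual minus predicted salary; large negative values suggest underpayment."""
    return actual_salary - predicted_salary(home_runs)


if __name__ == "__main__":
    # Assumption: 600 HR stands in for Sosa's "more than 600" home runs.
    hr, salary = 600, 500_000
    print(f"predicted: ${predicted_salary(hr):,.0f}")  # predicted: $17,765,800
    print(f"residual:  ${residual(salary, hr):,.0f}")   # residual:  $-17,265,800
```

A residual of roughly $17 million below the line makes Sosa the kind of point the bubble chart in D also flags.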
D. The largest bubbles are concentrated in the upper right of the chart, which shows that the
highest-paid players earn substantially more than the rest of the list. The player whose salary is
lower than expected is Sammy Sosa: he earns much less than players with similar HR and BA.
E. The mean salary is $4,689,717.22 and the median salary is $3,500,000.00, according to my
univariate statistics.
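The mean and median in E are standard summary statistics. A minimal sketch with Python's standard `statistics` module is shown below; the salary list is hypothetical, and the report's own figures came from its univariate statistics output, not from this code.

```python
# Sketch: mean and median of a list of salaries.
# The salaries are hypothetical placeholders, not the report's data.
from statistics import mean, median

salaries = [500_000, 1_200_000, 3_500_000, 4_000_000, 14_000_000]

print(f"mean:   ${mean(salaries):,.2f}")    # mean:   $4,640,000.00
print(f"median: ${median(salaries):,.2f}")  # median: $3,500,000.00
```

As in the report, one large salary pulls the mean above the median.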
F. The standard deviation of the salaries is $4,808,744.05, according to my univariate statistics.
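The report does not say whether its standard deviation uses the sample (n - 1) or population (n) formula. Statistics tools typically report the sample version, so the sketch below, using hypothetical salaries, prints both to show the difference.

```python
# Sketch: sample vs. population standard deviation of hypothetical salaries.
from statistics import pstdev, stdev

salaries = [500_000, 1_200_000, 3_500_000, 4_000_000, 14_000_000]

# stdev divides the sum of squared deviations by n - 1; pstdev divides by n.
print(f"sample SD (n - 1): ${stdev(salaries):,.2f}")
print(f"population SD (n): ${pstdev(salaries):,.2f}")
```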
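G. A player earning the mean salary would be at the 64.2 percentile of the data. This is above the
50th percentile because the mean ($4,689,717.22) is larger than the median ($3,500,000.00).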
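Percentile rank has several definitions, and different tools may give slightly different answers for the same data. The sketch below uses one common definition (the percentage of values strictly below x); the function name `percentile_rank` and the data are hypothetical, so it illustrates the idea rather than reproducing the 64.2 figure.

```python
# Sketch: percentile rank of a value within a data set.
from bisect import bisect_left


def percentile_rank(values, x):
    """Percentage of values strictly below x (one common definition)."""
    ordered = sorted(values)
    below = bisect_left(ordered, x)  # count of values < x
    return 100 * below / len(ordered)


salaries = [500_000, 1_200_000, 3_500_000, 4_000_000, 14_000_000]
mean_salary = sum(salaries) / len(salaries)  # 4,640,000

print(f"{percentile_rank(salaries, mean_salary):.1f}")  # 80.0
```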
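H. The salary distribution is positively (right) skewed: most salaries cluster toward the left of
the histogram, with a long tail of high salaries to the right. The mean being well above the median
(part E) is consistent with this.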
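Skewness can also be measured numerically rather than read from the histogram. The sketch below computes the adjusted Fisher-Pearson sample skewness (the formula used by Excel's SKEW function); the function name `sample_skewness` and the data are hypothetical. A positive result indicates right skew, as described in H.

```python
# Sketch: adjusted Fisher-Pearson sample skewness.
from statistics import mean, stdev


def sample_skewness(values):
    """n / ((n-1)(n-2)) * sum(((x - mean) / s)^3), with s the sample SD."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    m = mean(values)
    s = stdev(values)
    total = sum(((x - m) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * total


salaries = [500_000, 1_200_000, 3_500_000, 4_000_000, 14_000_000]
print(f"skewness: {sample_skewness(salaries):.3f}")  # positive: right-skewed
```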
I. The mean batting average is .27553 and the standard deviation is .022529, according to the
univariate statistics for BA.
J. Thirty-one players are batting .300 or better. That is very close to the number of players
expected to finish at .300 or better.
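One way to get an "expected" count is to assume batting averages are approximately normal with the mean and standard deviation from part I. That normality assumption is ours, not stated in the report. The sketch below computes the expected share of players at .300 or better; multiplying it by the number of players on the list gives the expected count to compare with the 31 observed. The function name `normal_upper_tail` is hypothetical.

```python
# Sketch: expected share of hitters at .300 or better under a normal model.
from math import erf, sqrt


def normal_upper_tail(x, mu, sigma):
    """P(X >= x) for X ~ Normal(mu, sigma), via the error function."""
    z = (x - mu) / sigma
    return 0.5 * (1 - erf(z / sqrt(2)))


MEAN_AVG = 0.27553  # mean batting average (part I)
SD_AVG = 0.022529   # standard deviation of batting average (part I)

p = normal_upper_tail(0.300, MEAN_AVG, SD_AVG)
print(f"share expected at .300 or better: {p:.4f}")
```

Under this model roughly 14 percent of players would be expected to hit .300 or better.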
K. The dependent variable is Salary, and the independent variables are AVG and HR. The multiple R
value is the square root of R²; it is the correlation between the observed salaries and the salaries
predicted by the regression. R² measures the percentage of variation in the dependent variable that
is explained by the independent variables. The standard error of the estimate measures the typical
distance between the observed salaries and the regression's predictions; a smaller value means more
precise predictions. The F-ratio is the ratio of the mean square for the regression to the mean
square error of the residuals. The p-value is the probability of obtaining an F-ratio at least as
large as the observed one if none of the independent variables were actually related to salary.
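The regression quantities described in K can be computed from first principles. Below is a minimal NumPy sketch, assuming ordinary least squares with an intercept; the function `regression_summary` and the sample data are hypothetical, and the report's values came from its own regression output, not from this code. The p-value is left out to avoid an extra dependency: it is the upper-tail probability of the F distribution with k and n - k - 1 degrees of freedom, which `scipy.stats.f.sf(f_ratio, k, n - k - 1)` returns if SciPy is available.

```python
# Sketch: multiple regression summary statistics via least squares.
import numpy as np


def regression_summary(X, y):
    """OLS fit of y on the columns of X (rows = players); adds an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape  # k = number of independent variables
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    sse = float(np.sum((y - fitted) ** 2))    # residual sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))  # total sum of squares
    ssr = sst - sse                           # regression sum of squares
    r_squared = ssr / sst
    msr = ssr / k                             # mean square for the regression
    mse = sse / (n - k - 1)                   # mean square error of residuals
    return {
        "coef": coef,                         # intercept first, then one per variable
        "r_squared": r_squared,
        "multiple_r": float(np.sqrt(r_squared)),
        "std_error": float(np.sqrt(mse)),     # standard error of the estimate
        "f_ratio": msr / mse,
    }


if __name__ == "__main__":
    # Hypothetical (AVG, HR) pairs and salaries for six players.
    X = [[0.250, 40], [0.270, 120], [0.260, 210],
         [0.290, 300], [0.300, 150], [0.280, 420]]
    y = [900_000, 2_500_000, 5_000_000, 8_000_000, 4_000_000, 12_000_000]
    for name, value in regression_summary(X, y).items():
        print(name, value)
```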