DATA ANALYSIS & INTERPRETATION
Baseball Report Final Project
Eric Henson
7/31/2012

The report uses charts and graphs to show that baseball owners value home runs more than batting average, because the average salary is higher for players who hit more home runs.

Part 1

Variable     Type
Names        Qualitative - Nominal
Bats         Qualitative - Ordinal
Throws       Qualitative - Ordinal
Height       Quantitative - Discrete
Weight       Quantitative - Discrete
Salary       Quantitative - Continuous
Age          Quantitative - Discrete
PositionN    Quantitative - Discrete
Position     Qualitative - Ordinal
TeamN        Quantitative - Discrete
Team         Qualitative - Nominal
League       Qualitative - Nominal
Seasons      Quantitative - Discrete
G            Quantitative - Discrete
AB           Quantitative - Continuous
R            Quantitative - Continuous
H            Quantitative - Continuous
2B           Quantitative - Continuous
3B           Quantitative - Continuous
HR           Quantitative - Continuous
RBI          Quantitative - Continuous
BB           Quantitative - Continuous
K            Quantitative - Continuous
SB           Quantitative - Continuous
CS           Quantitative - Continuous
AVG          Quantitative - Discrete
OBP          Quantitative - Discrete
SLG          Quantitative - Discrete

Part 2

A. [Histogram of Salary. Horizontal axis: salary bins from 1,148,286 to 22,660,285. Vertical axis: 0.0 to 120.0.]

B. [Scatter plot with linear trendline y = 8E+07x - 2E+07, R² = 0.1504. Vertical axis: ($5,000,000) to $25,000,000. Horizontal axis: 0.015 to 0.365.]

C. [Scatter plot with linear trendline y = 27943x + 1E+06, R² = 0.5192. Vertical axis: $0 to $25,000,000. Horizontal axis: 0 to 1000.]

D. [Bubble chart. Labels: Salary ($300,000 to $30,000,000) and Average (0.150 to 0.350).]

Part 3

A. The player with the highest salary is Jason Giambi, with an annual salary of $23,428,571. His batting average is .289. The average
batting average on the list is .27553, which would put Giambi just above the 75th percentile mark.

B. Reviewing the charts, I believe that owners value home runs more than batting average. There are more players with higher salaries in the HR chart than in the batting average chart.

C. Sammy Sosa. He has over 600 HR but makes only $500k, while others with similar HR totals make $5-$15 million a season. The average salary on the entire list is $4.6 million. I ran univariate statistics on salary and found that Sosa is just above the 25th percentile, which is very low considering his production. His age and the number of seasons he has played have probably affected his salary.

D. The largest bubbles are concentrated in the upper right of the chart. This shows that the higher-paid players are getting paid significantly more money. The player whose salary is lower than expected is Sammy Sosa. His salary is much lower than that of players with similar HR and BA.

E. The mean salary is $4,689,717.22 and the median salary is $3,500,000.00. I found these by looking at my univariate statistics.

F. The standard deviation of the salaries is $4,808,744.053. I found this by looking at my univariate statistics.

G. If a player made the mean salary, he would be at the 64.2nd percentile of the data.

H. The salary distribution is positively skewed. The values are clustered toward the left of the histogram.

I. The mean batting average is .27553 and the standard deviation is .022529. I found this by looking at the BA univariate statistics.

J. Thirty-one players are averaging .300 or greater. That is very similar to the number of players expected to finish at .300 or greater.

K. The dependent variable is the salary and the independent variables are AVG and HR. The multiple R value is equal to the square root of the R² value.
The R² measures the percentage of variation in the dependent variable that can be explained by changes in the independent variables. The standard error shows the information needed to determine the precision of x̄ in estimating the value of µ. The F-ratio is the ratio of the mean square for the regression to the mean square error of the residuals. The p-value is the probability of observing a result at least as extreme as the fitted line if there were no real relationship between the variables.
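
To make the regression terms in Part 3 K concrete, here is a minimal sketch in Python of how R², multiple R, and the F-ratio relate to each other. It is not the calculation used for this report: it fits a simple one-predictor regression (salary on HR only) instead of the two-predictor model, and the function name `simple_regression_summary` and the five data points are made up for illustration.

```python
import math


def simple_regression_summary(x, y):
    """Fit y = intercept + slope * x by least squares and summarize the fit."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y need the same length and at least 3 points")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Least-squares slope and intercept.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Split the total variation in y into explained and unexplained parts.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum(
        (yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)
    )
    ss_regression = ss_total - ss_residual

    # R^2 is the share of variation in y explained by x.
    r_squared = max(0.0, ss_regression / ss_total)

    # Mean squares: one predictor gives 1 regression degree of freedom,
    # and n - 2 degrees of freedom remain for the residuals.
    ms_regression = ss_regression / 1
    ms_residual = ss_residual / (n - 2)

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "multiple_r": math.sqrt(r_squared),
        "f_ratio": ms_regression / ms_residual,
        "standard_error": math.sqrt(ms_residual),
    }


# Hypothetical data, not taken from the report:
# career home runs and salary in millions of dollars.
home_runs = [10, 20, 30, 40, 50]
salary_millions = [1.0, 2.5, 2.0, 4.5, 5.0]

summary = simple_regression_summary(home_runs, salary_millions)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The "standard_error" value here is the standard error of the regression estimate (the typical size of a residual), which is the figure a regression output table usually labels Standard Error.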