xxxxxxxx xxxxxxxx
Regression Analysis - Student Project
January 2013

TOTAL MEDALS WON AT THE LONDON 2012 SUMMER OLYMPICS

Objective:
The London 2012 Summer Olympics featured athletes of varying abilities from 204 countries around the globe. After 3 weeks of competition, all participating athletes returned to their bases, some with medals to show for their participation and others satisfied with the thrill of having been a part of the Games. Unquestionably, a highly captivating part of this global event is the medal count. So which factors are statistically significant predictors of the number of medals won? Numerous factors could be examined, but for this project I examined the significance of the following:

1. The gender of the participating athlete
2. The total number of athletes representing each country
3. The population of each country
4. The number of athletes per country in proportion to the population of that country
5. The gross domestic product (GDP) of each country

Data:
The data used in this analysis was obtained from the archives of the U.K. newspaper The Guardian (http://www.guardian.co.uk/sport/datablog/2012/jul/30/olympics-2012-alternative-medal-table#data). The data contains the total number of medals won by each country, along with each of the five variables listed above per country.

Analysis:
The approach I took in this analysis was to run a regression using Total Medals as the response variable and the five variables listed above as the explanatory variables. The classical regression model was employed:

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_5 X_5$$

where $Y$ is the total number of medals won and $X_1, \dots, X_5$ are the explanatory variables.

Once the regression for the full model was complete, I excluded certain variables and tested their significance using the F-test:

$$F = \frac{(RSS_{\text{reduced}} - RSS_{\text{full}}) / (df_{\text{full}} - df_{\text{reduced}})}{RSS_{\text{full}} / df_{\text{full}}}$$

Here $RSS$ is the residual sum of squares of a model, and $df$ is the value listed in the df column of each model comparison below (5 for the full model; 4 or 3 for the reduced models). The result of the F-test was then compared to a calculated critical value at the 5% significance level (95% confidence) to determine whether the stated null hypothesis could be rejected.

Results:

Full Model

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.950327626 |
| R Square | 0.903122598 |
| Adjusted R Square | 0.900676198 |
| Standard Error | 4.231944983 |
| Observations | 204 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 5 | 33057.45685 | 6611.49137 | 369.16406 | 2.80758E-98 |
| Residual | 198 | 3546.052951 | 17.9093583 | | |
| Total | 203 | 36603.5098 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% |
|---|---|---|---|---|---|
| Intercept | -1.092093 | 0.385325 | -2.834214 | 0.005070 | -1.851960 |
| Male Athletes | -0.030457 | 0.020896 | -1.457525 | 0.146556 | -0.071664 |
| Female Athletes | 0.224253 | 0.024654 | 9.095911 | 0.000000 | 0.175635 |
| GDP.2011 (Billions) | 0.003079 | 0.000366 | 8.417454 | 0.000000 | 0.002358 |
| Population.2010 (Millions) | 0.008482 | 0.002629 | 3.226761 | 0.001465 | 0.003298 |
| Athletes Per Population (100K) | 0.034970 | 0.063815 | 0.547980 | 0.584323 | -0.090875 |

The Adjusted R Square value of 0.9007 suggests that, after adjusting for degrees of freedom, the explanatory variables have good explanatory power: about 90% of the variation in total medals is explained by the regression. The P-value of 0.584 for "Athletes Per Population (100K)" means that, if the true coefficient were zero, a t statistic at least this far from zero would still be observed about 58.4% of the time, which is fairly high. The implication is that this particular variable may not be statistically significant. This will be examined, and possibly confirmed, later in my analysis.

Reduced Models:
The next step in this analysis was to exclude certain variables, one group at a time, to determine their significance.
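The comparison applied to each reduced model can be sketched in a few lines of Python. This is a minimal sketch of the F statistic as this report defines it, where df counts the explanatory variables in each model; the function name `report_f_statistic` is illustrative, and the RSS values are the ones reported in the tables of this report.

```python
def report_f_statistic(rss_reduced, rss_full, k_full, k_reduced):
    """F statistic as defined in this report.

    k_full and k_reduced are the numbers of explanatory variables
    (the "df" column of the model comparison tables), not residual
    degrees of freedom.
    """
    numerator = (rss_reduced - rss_full) / (k_full - k_reduced)
    denominator = rss_full / k_full
    return numerator / denominator


RSS_FULL = 3546.052951  # residual sum of squares of the full model

# (label, reduced-model RSS, explanatory variables kept), as reported
reduced_models = [
    ("Male Athletes excluded", 3584.099239, 4),
    ("All athletes excluded", 10841.6185, 3),
]

for label, rss_reduced, k_reduced in reduced_models:
    f_value = report_f_statistic(rss_reduced, RSS_FULL, 5, k_reduced)
    print(f"{label}: F = {f_value:.6f}")
```

Keeping the calculation in one function makes it easy to rerun for every reduced model and to check each F-Test entry against its RSS inputs.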
Exclude Male Athletes:
Null Hypothesis, H0: Total number of male athletes representing a country is not significant

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.949780596 |
| R Square | 0.902083181 |
| Adjusted R Square | 0.900115004 |
| Standard Error | 4.24388371 |
| Observations | 204 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 33019.41056 | 8254.85264 | 458.33432 | 3.5333E-99 |
| Residual | 199 | 3584.099239 | 18.0105489 | | |
| Total | 203 | 36603.5098 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% |
|---|---|---|---|---|---|
| Intercept | -1.25912 | 0.36893 | -3.41291 | 0.00078 | -1.98663 |
| Female Athletes | 0.19109 | 0.00952 | 20.07385 | 0.00000 | 0.17232 |
| GDP.2011 (Billions) | 0.00317 | 0.00036 | 8.76276 | 0.00000 | 0.00246 |
| Population.2010 (Millions) | 0.00860 | 0.00263 | 3.26353 | 0.00130 | 0.00340 |
| Athletes Per Population (100K) | 0.04091 | 0.06386 | 0.64060 | 0.52252 | -0.08503 |

| Model | RSS | df |
|---|---|---|
| Full Model | 3546.052951 | 5 |
| Reduced Model | 3584.099239 | 4 |

F-Test: 0.053645967
Critical Value: 12.70620473

Since the F-test produced a value that is less than the critical value, we cannot reject the null hypothesis.

Exclude Female Athletes:
Null Hypothesis, H0: Total number of female athletes representing a country is not significant

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.928785089 |
| R Square | 0.862641742 |
| Adjusted R Square | 0.859880772 |
| Standard Error | 5.026459812 |
| Observations | 204 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 31575.71545 | 7893.92886 | 312.44155 | 1.42966E-84 |
| Residual | 199 | 5027.794351 | 25.2652982 | | |
| Total | 203 | 36603.5098 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% |
|---|---|---|---|---|---|
| Intercept | -1.52789 | 0.45411 | -3.36454 | 0.00092 | -2.42338 |
| Male Athletes | 0.14496 | 0.00956 | 15.16941 | 0.00000 | 0.12612 |
| GDP.2011 (Billions) | 0.00447 | 0.00039 | 11.31013 | 0.00000 | 0.00369 |
| Population.2010 (Millions) | 0.00883 | 0.00312 | 2.83003 | 0.00513 | 0.00268 |
| Athletes Per Population (100K) | 0.05353 | 0.07576 | 0.70665 | 0.48061 | -0.09586 |

| Model | RSS | df |
|---|---|---|
| Full Model | 3546.052951 | 5 |
| Reduced Model | 5027.794351 | 4 |

F-Test: 2.089282676
Critical Value: 12.70620473

Again, based on the comparison of the F-value and the critical value, we cannot reject the null hypothesis.
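The fit statistics of the two reduced models above can be recomputed from their ANOVA sums of squares, which makes the difference between dropping male and dropping female athletes easy to see. The sketch below assumes the usual definitions, R Square = SS_regression / SS_total and Adjusted R Square = 1 - (1 - R Square)(n - 1)/(n - k - 1) with n observations and k explanatory variables; the function names are illustrative.

```python
def r_squared(ss_regression, ss_total):
    """Share of the total variation explained by the regression."""
    return ss_regression / ss_total


def adjusted_r_squared(r2, n_obs, n_predictors):
    """R Square penalised for the number of explanatory variables."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


SS_TOTAL = 36603.5098  # total sum of squares (same for every model)
N_OBS = 204            # number of countries

# model label -> (regression sum of squares, explanatory variables)
models = {
    "Full model": (33057.45685, 5),
    "Male Athletes excluded": (33019.41056, 4),
    "Female Athletes excluded": (31575.71545, 4),
}

for label, (ss_regression, k) in models.items():
    r2 = r_squared(ss_regression, SS_TOTAL)
    adj = adjusted_r_squared(r2, N_OBS, k)
    print(f"{label}: R^2 = {r2:.6f}, adjusted R^2 = {adj:.6f}")
```

Running it for all three models shows how much explained variation is lost by each exclusion.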
However, an observation worth noting is that the F-test produced a higher value in this instance than it did when male athletes were excluded. Could it be that, even though not statistically significant, a country may be able to slightly increase the total number of medals won at the Olympics by investing in female athletes and sending a larger contingent of female athletes?

Exclude All Athletes:
Null Hypothesis, H0: Total number of athletes representing a country is not significant

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.838933441 |
| R Square | 0.703809319 |
| Adjusted R Square | 0.699366459 |
| Standard Error | 7.362614516 |
| Observations | 204 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 25761.8913 | 8587.2971 | 158.41356 | 1.36725E-52 |
| Residual | 200 | 10841.6185 | 54.2080925 | | |
| Total | 203 | 36603.5098 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% |
|---|---|---|---|---|---|
| Intercept | 1.76542 | 0.58423 | 3.02177 | 0.00284 | 0.61337 |
| GDP.2011 (Billions) | 0.00818 | 0.00045 | 18.00235 | 0.00000 | 0.00728 |
| Population.2010 (Millions) | 0.00676 | 0.00457 | 1.47941 | 0.14060 | -0.00225 |
| Athletes Per Population (100K) | -0.04910 | 0.11052 | -0.44426 | 0.65734 | -0.26704 |

| Model | RSS | df |
|---|---|---|
| Full Model | 3546.052951 | 5 |
| Reduced Model | 10841.6185 | 3 |

F-Test: 5.143440926
Critical Value: 4.30265273

In this case, the F-test produces a value higher than the critical value. This implies the null hypothesis has to be rejected.
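For comparison, the textbook partial F-statistic divides the increase in RSS per excluded variable by the full model's residual mean square (RSS_full / 198 here) rather than by RSS_full / 5. For a single excluded variable it equals the square of that variable's t statistic in the full model. The sketch below computes this conventional version from figures reported above. It is a cross-check under that stated convention, not the calculation used in this report, and the function name `partial_f` is illustrative. Because the two scalings differ by a large factor, they need not lead to the same decision, and the conventional statistic should be judged against an F critical value with the matching degrees of freedom.

```python
def partial_f(rss_reduced, rss_full, n_excluded, resid_df_full):
    """Conventional partial F: RSS increase per excluded variable,
    divided by the residual mean square of the full model."""
    increase_per_variable = (rss_reduced - rss_full) / n_excluded
    residual_mean_square = rss_full / resid_df_full
    return increase_per_variable / residual_mean_square


RSS_FULL = 3546.052951
RESID_DF_FULL = 198  # residual degrees of freedom of the full model

# Single exclusions: (reduced RSS, full-model t statistic), as reported
single = {
    "Male Athletes": (3584.099239, -1.457525),
    "Female Athletes": (5027.794351, 9.095911),
}
for name, (rss_reduced, t_stat) in single.items():
    f_value = partial_f(rss_reduced, RSS_FULL, 1, RESID_DF_FULL)
    print(f"{name}: partial F = {f_value:.3f}, t^2 = {t_stat ** 2:.3f}")

# Joint exclusion of both athlete counts (two variables dropped)
f_joint = partial_f(10841.6185, RSS_FULL, 2, RESID_DF_FULL)
print(f"All athletes: partial F = {f_joint:.3f}")
```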
Exclude GDP:
Null Hypothesis, H0: The GDP of a participating country is not significant

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.931909542 |
| R Square | 0.868455394 |
| Adjusted R Square | 0.865811282 |
| Standard Error | 4.918938001 |
| Observations | 204 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 31788.51554 | 7947.12889 | 328.44871 | 1.94686E-86 |
| Residual | 199 | 4814.994261 | 24.1959511 | | |
| Total | 203 | 36603.5098 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% |
|---|---|---|---|---|---|
| Intercept | -1.48554 | 0.44457 | -3.34153 | 0.00100 | -2.36221 |
| Female Athletes | 0.31078 | 0.02605 | 11.93178 | 0.00000 | 0.25942 |
| Male Athletes | -0.06002 | 0.02394 | -2.50691 | 0.01298 | -0.10724 |
| Population.2010 (Millions) | 0.01676 | 0.00283 | 5.91438 | 0.00000 | 0.01117 |
| Athletes Per Population (100K) | 0.03904 | 0.07417 | 0.52638 | 0.59921 | -0.10722 |

| Model | RSS | df |
|---|---|---|
| Full Model | 3546.052951 | 5 |
| Reduced Model | 4814.994261 | 4 |

F-Test: 1.789230628
Critical Value: 12.70620473

Based on the result above, the null hypothesis cannot be rejected.

Exclude Population:
Null Hypothesis, H0: The population of a participating country is not significant

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.9476435 |
| R Square | 0.8980282 |
| Adjusted R Square | 0.8959785 |
| Standard Error | 4.3308667 |
| Observations | 204 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 32870.98489 | 8217.746 | 438.130 | 1.99391E-97 |
| Residual | 199 | 3732.524911 | 18.756 | | |
| Total | 203 | 36603.5098 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% |
|---|---|---|---|---|---|
| Intercept | -0.8980 | 0.3895 | -2.3055 | 0.0222 | -1.6660 |
| Female Athletes | 0.2254 | 0.0252 | 8.9357 | 0.0000 | 0.1757 |
| Male Athletes | -0.0325 | 0.0214 | -1.5211 | 0.1298 | -0.0747 |
| GDP.2011 (Billions) | 0.0035 | 0.0003 | 10.1408 | 0.0000 | 0.0028 |
| Athletes Per Population (100K) | 0.0212 | 0.0652 | 0.3249 | 0.7456 | -0.1073 |

| Model | RSS | df |
|---|---|---|
| Full Model | 3546.052951 | 5 |
| Reduced Model | 3732.524911 | 4 |

F-Test: 0.262928899
Critical Value: 12.70620473

The null hypothesis cannot be rejected in this case.

Exclude Athletes Per Population:
Null Hypothesis, H0: The number of athletes representing a country as a percentage of the country's population is not significant

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.950250322 |
| R Square | 0.902975675 |
| Adjusted R Square | 0.901025438 |
| Standard Error | 4.224498317 |
| Observations | 204 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 33052.07898 | 8263.01975 | 463.00801 | 1.4221E-99 |
| Residual | 199 | 3551.43082 | 17.846386 | | |
| Total | 203 | 36603.5098 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% |
|---|---|---|---|---|---|
| Intercept | -1.01104 | 0.35518 | -2.84656 | 0.00488 | -1.71144 |
| Female Athletes | 0.22469 | 0.02460 | 9.13417 | 0.00000 | 0.17618 |
| Male Athletes | -0.03119 | 0.02082 | -1.49822 | 0.13566 | -0.07224 |
| GDP.2011 (Billions) | 0.00308 | 0.00037 | 8.43670 | 0.00000 | 0.00236 |
| Population.2010 (Millions) | 0.00839 | 0.00262 | 3.20286 | 0.00158 | 0.00322 |

| Model | RSS | df |
|---|---|---|
| Full Model | 3546.052951 | 5 |
| Reduced Model | 3551.43082 | 4 |

F-Test: 0.007582894
Critical Value: 12.70620473

Again, the null hypothesis cannot be rejected in this case.

Conclusion:
My analysis confirms what many people might have suspected all along: the size of a country's contingent to the Olympic Games is statistically significant for the total number of medals won by that country at the Games. This is evident in the outcome of the London Olympic Games, as the countries with the 8 largest contingents finished in the top 8 on the medals table. Another result that may not be surprising is that countries with higher GDPs may perform better at the Games, even though some countries with significantly lower GDPs (such as Jamaica and Belarus) had a greater medal haul than other countries with higher GDPs (such as India and Nigeria, the latter finishing with no medals).
A finding that may not have been so apparent, however, is that, while statistically insignificant, the total number of medals won by a country at the Olympic Games may be slightly increased by sending a larger contingent of female athletes than male athletes. This may be due to the fact that the average number of female athletes per country currently trails the average number of male athletes per country.

Lastly, it is worth pointing out that the effects of the variables examined in this analysis are not necessarily independent of one another. Countries such as the United States, China, Japan, and Germany all finished with large medal counts. These countries also happen to have higher GDPs, and they sent a lot of athletes (male and female) to the Olympic Games. For each of these countries, there is an abundance of world-class athletes competing in numerous sports. In addition to the need to examine interactions between the variables included in this analysis, there is also the need to include several other variables that could potentially contribute to medal winnings. Therefore, for any analysis on this topic to be complete, several additional factors per country (socio-economic, cultural, political, and religious, along with their interactions) need to be thoroughly examined.