OPIM5103 – Term Paper Assignment
Sample Student

Table of Contents
Introduction
Literature Review
The Data
Relationship Analysis
Conclusion

Introduction
This paper analyzes the individual relationships between the dependent variable “Gross Domestic Product/Capita” (GDP/CAP) in 109 countries around the world and the following explanatory variables.
1. Population in thousands (POPULATN)
2. Percentage of people who read (LITERACY)
3. The birth rate per thousand people (BIRTH_RT)
Throughout the paper, I will try to identify the influence of any or all of the above-mentioned explanatory variables on the value of GDP/CAP for each country.

Literature Review
In my research for existing statistical analysis on the relationships between the above-mentioned dependent and explanatory variables, I came across the following material.
1. “Evolutionary Theories of Long-Run World Economic History: The Theory/History Interconnection Re-Examined”, http://www.helsinki.fi/iehc2006/papers3/Korotayev.pdf
This paper discusses the relationship between world GDP and population as well as literacy levels. It concludes that an increase in literacy levels leads to an increase in GDP and population, and that birth rate also plays an important role in this.

The Data
I have used the data from the “WORLD95.XLS” data file associated with the website for this course for the analysis in this paper. Here is some more information about the data being analyzed.

GDP/CAP: This data element represents the per capita gross domestic product for a country and is the dependent variable for the purpose of this paper. The GDP/CAP variable is measured in terms of an absolute numeric index value. The table below provides descriptive statistics for this data element and is followed by a frequency table and histogram to provide a sense of the frequencies and percentage distributions for the variable.
There is a big difference between the mean GDP/CAP value, its median and its mode. The mean (5859.98) is almost double the median (2995), which indicates a positive, or right, skewness in the GDP data. The Pearson Measure of Skewness, measured as (Mean - Median)/SD, is .4421, which is well above .1 and therefore tells us that the data is not symmetrical. The box and whisker diagram for GDP/CAP later in this section provides a good idea of the skewness of the data.

GDP/CAP Descriptive Statistics
Mean                  5859.981651
Standard Error        620.6557167
Median                2995
Mode                  1500
Standard Deviation    6479.835919
Sample Variance       41988273.54
Kurtosis              -0.028158742
Skewness              1.145650254
Range                 23352
Minimum               122
Maximum               23474
Sum                   638738
Count                 109

GDP/CAP Histogram
Interval                    Frequency   Percentage   Cumulative %   Midpts
0 - Less than 1600          42          38.53%       38.53%         -800
1600 - Less than 3200       17          15.60%       54.13%         2400
3200 - Less than 4800       7           6.42%        60.55%         4000
4800 - Less than 6400       5           4.59%        65.14%         5600
6400 - Less than 8000       12          11.01%       76.15%         7200
8000 - Less than 9600       2           1.83%        77.98%         8800
9600 - Less than 11200      0           0.00%        77.98%         10400
11200 - Less than 12800     1           0.92%        78.90%         12000
12800 - Less than 14400     4           3.67%        82.57%         13600
14400 - Less than 16000     4           3.67%        86.24%         15200
16000 - Less than 17600     6           5.50%        91.74%         16800
17600 - Less than 19200     5           4.59%        96.33%         18400
19200 - Less than 20800     2           1.83%        98.17%         20000
20800 - Less than 22400     1           0.92%        99.08%         21600
22400 - Less than 24000     1           0.92%        100.00%

[Figure: GDP/CAP Histogram, showing Frequency and Cumulative % by Bins]

The frequency table and histogram above clearly show that more than 38% of the countries fall in the lowest interval of GDP/CAP and about 65% of the countries fall below the mean GDP/CAP value of 5860. This tells us that the median value of 2995 may be a better measure of the center for the GDP/CAP data because there are too many outliers in either direction of the mean. The box and whisker plot below also shows that the outlier value of 23474 makes the right whisker longer than the left. This shows that the data is not symmetrical at all.

[Figure: GDP/CAP Box & Whisker Plot, GDP_CAP axis from 120 to 25120]

The standard deviation for GDP/CAP is 6479.835919 and the mean is approximately 5860.
Considering the fact that most data values for a distribution usually lie within 1 standard deviation of the mean, we can conclude that the GDP/CAP data is no exception to the rule because 79% of the data lies in this interval.

POPULATN: This variable provides a measure of the population of each of the 109 countries in the data file. The unit of measure is “thousands”. Mentioned below are the descriptive statistics for this variable followed by a normal probability plot for it.
The mean value (approximately 47724) is more than 4 times the median (10400). More than 77% of the values are below the mean and about 45% of the values are below the median. This tells us that the median may be a better measure of the center of this data. The box and whisker plot later in this section indicates a positive, right skewness in the POPULATN data. The Pearson Measure of Skewness is (47723.88 - 10400)/146726 = .254, which is greater than .1, indicating that the values for POPULATN are skewed.

POPULATN
Mean                  47723.88073
Standard Error        14053.83679
Median                10400
Mode                  2900
Standard Deviation    146726.3637
Sample Variance       21528625814
Kurtosis              46.65097372
Skewness              6.592335985
Range                 1204944
Minimum               256
Maximum               1205200
Sum                   5201903
Count                 109

The standard deviation for POPULATN is 146726 and more than 88% of the values for the POPULATN variable lie within 1 standard deviation of the mean value of 47724. The left whisker on the box & whisker plot is not visible at all and the right whisker is longer because of the high outlier at the maximum POPULATN value of 1205200. The distribution is not symmetrical at all.
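
The Pearson skewness scores quoted for GDP/CAP and POPULATN can be re-computed from the reported descriptive statistics. The short Python sketch below is only an illustration of that arithmetic; the function name pearson_skewness is my own label, and the .1 cut-off is the rule of thumb used in this paper rather than a universal standard.

```python
def pearson_skewness(mean, median, std_dev):
    """Measure used in this paper: (Mean - Median) / SD."""
    return (mean - median) / std_dev


# Values copied from the descriptive statistics tables in this paper.
gdp_skew = pearson_skewness(5859.981651, 2995, 6479.835919)
pop_skew = pearson_skewness(47723.88073, 10400, 146726.3637)

THRESHOLD = 0.1  # rule of thumb used in this paper (an assumption, not a standard)

for name, score in [("GDP/CAP", gdp_skew), ("POPULATN", pop_skew)]:
    verdict = "skewed" if abs(score) > THRESHOLD else "roughly symmetrical"
    print(f"{name}: {score:.4f} ({verdict})")
```

Both scores come out above .1, in line with the values of .4421 and .254 stated in the text.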

Frequencies (POPULATN)
Intervals                      Bins     Frequency   Percentage   Cumulative %   Midpts
0 Less Than 5000               4999     27          24.77%       24.77%         -2500
5000 Less Than 10000           9999     22          20.18%       44.95%         7500
10000 Less Than 15000          14999    14          12.84%       57.80%         12500
15000 Less Than 20000          19999    6           5.50%        63.30%         17500
20000 Less Than 25000          24999    7           6.42%        69.72%         22500
25000 Less Than 30000          29999    4           3.67%        73.39%         27500
30000 Less Than 35000          34999    1           0.92%        74.31%         32500
35000 Less Than 40000          39999    3           2.75%        77.06%         37500
40000 Less Than 45000          44999    1           0.92%        77.98%         42500
45000 Less Than 50000          49999    1           0.92%        78.90%         47500
50000 Less Than 55000          54999    1           0.92%        79.82%         52500
55000 Less Than 60000          59999    5           4.59%        84.40%         57500
60000 Less Than 65000          64999    2           1.83%        86.24%         62500
65000 Less Than 70000          69999    2           1.83%        88.07%         67500
70000 Less Than 1210000        1E+06    13          11.93%       100.00%

[Figure: POPULATN Histogram, showing Frequency and Cumulative % by Bins]

[Figure: POPULATN Box & Whisker Plot, POPULATN axis from 250 to 1200250]

LITERACY: This variable defines the percentage of people in a country who are able to read. The table below provides the descriptive statistics for LITERACY.

LITERACY
Mean                  76.89908257
Standard Error        2.395521396
Median                87
Mode                  99
Standard Deviation    25.00997762
Sample Variance       625.4989806
Kurtosis              0.514929291
Skewness              -1.151088787
Range                 100
Minimum               0
Maximum               100
Sum                   8382
Count                 109

The median value of this variable (87) is greater than its mean value (76.90). About 55% of the values fall under the median value and therefore it provides a better measure of the center for the literacy data than the mean value, which is slightly distorted due to outliers on the left side of it.
The box and whisker diagram later in this section indicates a negative, left skewness in the distribution. The Pearson Measure of Skewness for LITERACY can be calculated as (76.9 - 87)/25 = -.4, an absolute value of .4, which is further evidence that the data is skewed. The standard deviation for LITERACY is 25.01, so one standard deviation around the mean (76.90) spans roughly 51.9 to 101.9. The frequency distribution shows that about 15% of the values fall below 50, so roughly 76% to 85% of the values lie within 1 standard deviation of the mean.

Frequencies (LITERACY)
Intervals              Bins   Frequency   Percentage   Cumulative %   Midpts
0 - Less than 10       9      2           1.83%        1.83%          -5
10 - Less than 20      19     1           0.92%        2.75%          15
20 - Less than 30      29     5           4.59%        7.34%          25
30 - Less than 40      39     4           3.67%        11.01%         35
40 - Less than 50      49     4           3.67%        14.68%         45
50 - Less than 60      59     10          9.17%        23.85%         55
60 - Less than 70      69     7           6.42%        30.28%         65
70 - Less than 80      79     12          11.01%       41.28%         75
80 - Less than 90      89     15          13.76%       55.05%         85
90 - Less than 100     99     46          42.20%       97.25%         95
100 - Less than 110    109    3           2.75%        100.00%

[Figure: LITERACY Histogram, showing Frequency and Cumulative % by Bins]

[Figure: LITERACY Box & Whisker Plot, LITERACY axis from -10 to 110]

BIRTH_RT: This variable reflects the birth rate per thousand people in the population. Here are the descriptive statistics for this variable. The mean and median are approximately the same, which tells us that either value could be used as a measure of center for the variable. The box and whisker plot below shows a slight right, or positive, skewness in the data. The right whisker is longer than the left one, indicating that the data is not perfectly symmetrical.
The standard deviation is 12.36 and the frequency table below shows that more than 77% of the values for BIRTH_RT fall within 1 standard deviation of the mean (25.92).

BIRTH_RT
Mean                  25.92293578
Standard Error        1.183959241
Median                25
Mode                  13
Standard Deviation    12.36089737
Sample Variance       152.7917839
Kurtosis              1.146535118
Skewness              0.445576718
Range                 43
Minimum               10
Maximum               53
Sum                   2825.6
Count                 109

BIRTH_RT Histogram
Intervals             Bins     Frequency (BIRTH_RT)   Percentage   Cumulative %   Midpts
0 - Less than 8       7.99     0                      0.00%        0.00%          -4
8 - Less than 16      15.99    35                     32.11%       32.11%         12
16 - Less than 24     23.99    16                     14.68%       46.79%         20
24 - Less than 32     31.99    23                     21.10%       67.89%         28
32 - Less than 40     39.99    11                     10.09%       77.98%         36
40 - Less than 48     47.99    21                     19.27%       97.25%         44
48 - Less than 56     55.99    3                      2.75%        100.00%

[Figure: BIRTH_RT Histogram, showing Frequency and Cumulative % by Bins]

[Figure: BIRTH_RT Box & Whisker Plot, BIRTH_RT axis from 0 to 50]

Data Requirements for Multiple Regression Models: For multiple regression models it is important that the data in question meets the following requirements.
a. The data should not contain a serial (auto) correlation issue. This is to say that the error terms in the model using LITERACY, POPULATN and BIRTH_RT should be independent of each other. Since the data in my paper is not time series data, I will assume that serial correlation is not an issue with it.
b. The data should not have error terms with unequal variances, that is, it should not be heteroskedastic. It should be homoskedastic, meaning the errors should have constant variance. I used the prescribed LIMDEP test to check this, and it indicated that the data had some problems in this area. Based on my discussion with Professor Jantzen, I chose not to use the corrected results for this paper.
c.
The third data requirement for multiple regression states that the errors should be normally distributed. The normal probability plot for the error terms below shows that the errors are more or less normally distributed, and I therefore conclude that I can use the data to create a multiple regression model for GDP/CAP.

[Figure: Multiple Regression Error - Normal Probability Plot, Residuals against Z Value]

Relationship Analysis
Since I am trying to identify the influence of the POPULATN, LITERACY and BIRTH_RT variables on GDP/CAP, I will use the multiple regression method to predict how large or small the dependent variable will be, given differing values for the explanatory variables.
I will use the standard equation to represent “My Multiple Regression Model”. Substituting the dependent and explanatory variables into this equation, I get the following equation.
Yi(GDP/CAP) = b0 + b1(POPULATN) + b2(LITERACY) + b3(BIRTH_RT) + E
In other words, GDP/CAP in a country (the Y variable) can be expressed in terms of a constant (b0) plus a slope (b1) times POPULATN (X1 variable), plus a slope (b2) times LITERACY (X2 variable), plus a slope (b3) times BIRTH_RT (X3 variable), plus an error term (E).
The regression model was estimated using the sample data and the above-mentioned dependent and explanatory variables. In the sections that follow, I will concentrate on examining the overall fit of the regression model and measure how each coefficient in the equation impacts the value of GDP/CAP.

a) The Overall Significance of the Model
This step assesses whether all of the regression coefficients (except the constant) in the "true" model describing the underlying population are equal to zero. It tests the overall fit of the multiple regression model in question.
Yi(GDP/CAP) = b0 + b1(POPULATN) + b2(LITERACY) + b3(BIRTH_RT) + E
With the above equation as our model for this paper, we can use the following hypotheses for the overall fit test.
H0: b1 = b2 = b3 = 0 (no linear relationship between GDP/CAP and any of the explanatory variables)
HA: At least one bi (b1, b2 or b3) ≠ 0 (at least one explanatory variable affects GDP/CAP)
To evaluate these hypotheses we perform the F-test on the overall regression. The formula for calculating the F statistic is as follows, where k = number of explanatory variables and n = sample size.
F = (R Square/(1 - R Square)) / (k/(n - k - 1))
The F statistic value for the model = (.4395/(1 - .4395)) / (3/(109 - 3 - 1)) = 27.41
The critical F statistic, using a significance level of 0.05 with k = 3 degrees of freedom in the numerator and (n - k - 1) = 105 degrees of freedom in the denominator, is 2.6911 (using the F value calculator on the OPIM303 class website). The F statistic is therefore greater than the critical F and I reject the null. This tells me that there is sufficient evidence that the coefficients for the explanatory variables in the underlying population are not all zero. Our conclusion from this is that at least one of the explanatory variables among POPULATN, LITERACY and BIRTH_RT influences the value of the dependent variable GDP/CAP.

b) Test on a single coefficient
The t test on a single regression coefficient assesses whether a population regression coefficient is equal to, not equal to, greater than or equal to, or less than or equal to a particular number. T tests are conducted for each estimated regression coefficient and typically use a reference value of zero. We will test the coefficients for all of our explanatory variables, namely POPULATN, LITERACY and BIRTH_RT.
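
Before turning to the individual coefficients, the overall F-test from section a) can be re-computed from R Square, k and n. The Python sketch below is only an illustration; the function name f_statistic is my own, the critical value is simply the 2.6911 quoted in this paper rather than something the code derives, and R Square is the rounded .4395, so the result differs slightly from the reported 27.41.

```python
def f_statistic(r_squared, k, n):
    """Overall F statistic: (R^2 / (1 - R^2)) / (k / (n - k - 1))."""
    return (r_squared / (1 - r_squared)) / (k / (n - k - 1))


R_SQUARED = 0.4395   # rounded R Square reported in this paper
K = 3                # explanatory variables: POPULATN, LITERACY, BIRTH_RT
N = 109              # sample size
CRITICAL_F = 2.6911  # critical F at the 0.05 level with (3, 105) df, as quoted in the paper

f_value = f_statistic(R_SQUARED, K, N)
reject_null = f_value > CRITICAL_F

print(f"F = {f_value:.2f}, critical F = {CRITICAL_F}, reject H0: {reject_null}")
```

With the rounded R Square the statistic comes out near 27.4, far above the critical value, so the decision to reject the null does not depend on the rounding.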
1) To test hypotheses about POPULATN’s regression coefficient (b1), the following pairs of null and alternative hypotheses could be assessed with the t test.
(i) H0: b1 = 0    Ha: b1 <> 0
(ii) H0: b1 <= 0    Ha: b1 > 0
(iii) H0: b1 >= 0    Ha: b1 < 0
The t-statistic for POPULATN is stated as -1.604014027 in the multiple regression model output. We use the absolute value of 1.6040 to analyze our hypothesis. For hypothesis (i) above, we are testing whether POPULATN has some impact on GDP/CAP, either positive or negative. The two-tail critical t-value for (n - k - 1) = 105 degrees of freedom at the .05 significance level is 1.9828 (using the calculator).
The absolute value of the t-statistic (1.604) is less than the two-tail critical t-value of 1.9828 and therefore we cannot reject the null for this hypothesis. There is not sufficient evidence that b1, the coefficient for POPULATN, is not equal to 0. We can therefore conclude that we do not have enough evidence that POPULATN influences the value of GDP/CAP for a country.
2) To test whether LITERACY influences GDP/CAP, we could use the following hypotheses.
(i) H0: b2 = 0    Ha: b2 <> 0
(ii) H0: b2 <= 0    Ha: b2 > 0
(iii) H0: b2 >= 0    Ha: b2 < 0
Hypothesis (i) above tests whether LITERACY has any impact, positive or negative, on GDP/CAP.
H0: b2 = 0    HA: b2 <> 0
The absolute value of the t-statistic for LITERACY is stated as 0.683511186 in the results for the multiple regression model. The two-tail critical t-value (absolute) for (n - k - 1) = 105 degrees of freedom is 1.9828. This value is greater than the t-statistic value of 0.6835. Therefore we once again cannot reject the null for this hypothesis. There is not sufficient evidence that b2, the coefficient for LITERACY, is not equal to 0.
We can therefore conclude that we do not have enough evidence that LITERACY impacts the value of GDP/CAP for a country.
3) To assess whether the BIRTH_RT of a country influences its GDP/CAP, we can use the following hypotheses.
(i) H0: b3 = 0    Ha: b3 <> 0
(ii) H0: b3 <= 0    Ha: b3 > 0
(iii) H0: b3 >= 0    Ha: b3 < 0
In hypothesis (i) above, we assess whether BIRTH_RT for a country has any influence, positive or negative, on its GDP/CAP at all. The absolute value of the t-statistic for BIRTH_RT is stated as 6.09938 in the multiple regression model results. The two-tail critical t-value (absolute) for (n - k - 1) = 105 degrees of freedom is 1.9828. This value is less than the t-statistic value of 6.09938. Therefore we reject the null for this hypothesis. This allows us to conclude that there is sufficient evidence that the coefficient for BIRTH_RT is not equal to 0, and therefore that the BIRTH_RT of a country influences the value of the dependent variable GDP/CAP.
In the following paragraphs, we perform the upper and lower tail t-tests for BIRTH_RT to find out whether the coefficient is greater than or less than 0.
Hypothesis (ii) above tests whether the coefficient for BIRTH_RT is greater than 0, or in other words whether BIRTH_RT has a positive impact on GDP/CAP.
H0: b3 <= 0    HA: b3 > 0
For this hypothesis we need to run an upper tail t-test. Once again the t-value in absolute form is given as 6.09937. The absolute value of the upper-tail critical t is 1.659. In absolute terms the sample t-value exceeds the critical value, but the sample coefficient for BIRTH_RT is -376.928, a value less than 0, so the signed t-statistic (-6.09937) lies far below the upper-tail critical value. The sign of b3 does not agree with the alternative hypothesis in this case, which says that b3 (the coefficient for BIRTH_RT) is greater than 0. Therefore we cannot reject the null, which means that there is no evidence that BIRTH_RT has a positive impact on the GDP/CAP of a country.
In hypothesis (iii) above, we hypothesize the following.
H0: b3 >= 0    HA: b3 < 0
The t-statistic value from the regression run is |6.09937|. The one-tail critical t-value from the calculator is |1.6594|. The sample t-value is greater than the critical t-value in absolute terms. In addition, the model coefficient value for b3 is -376.928, a value that is less than 0. It therefore agrees with the alternative hypothesis that b3 is less than 0. We can therefore reject the null, which means that there is sufficient evidence that b3, the coefficient for BIRTH_RT, is less than 0. We can therefore conclude that BIRTH_RT negatively impacts GDP/CAP for a country.

c) Standardized Coefficients
To explain which variable among POPULATN, LITERACY and BIRTH_RT has the greatest influence on GDP/CAP, we need to calculate the standardized coefficients. The standardized coefficients show how many standard deviations GDP/CAP will change if POPULATN, LITERACY or BIRTH_RT changes by one standard deviation. Larger standardized coefficients (in absolute value) indicate more influence, smaller ones less. The formula for calculating the standardized coefficient is as follows.
Standardized coefficient = bi * (Sxi/Sy)
Where bi is the estimated regression coefficient, Sxi is the standard deviation of the explanatory variable, and Sy is the standard deviation of the dependent variable.
Standardized coefficient for POPULATN = -0.0052*(146726.364/6479.836) = -0.1177.
Standardized coefficient for LITERACY = -20.8782*(25.01/6479.836) = -0.0806.
Standardized coefficient for BIRTH_RT = -376.9277*(12.361/6479.836) = -0.7190.
The standardized coefficient values calculated and shown above imply the following.
A one SD increase in POPULATN leads to a 0.1177 SD decrease in GDP/CAP.
A one SD increase in LITERACY leads to a 0.0806 SD decrease in GDP/CAP.
A one SD increase in BIRTH_RT leads to a 0.7190 SD decrease in GDP/CAP.
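
The three standardized coefficients can be re-computed and ranked by absolute size in a few lines. The Python sketch below is only an illustration of that arithmetic; the function and variable names are my own, and the inputs are the coefficients and standard deviations quoted in this paper.

```python
def standardized_coefficient(b, s_x, s_y):
    """Standardized coefficient: b * (S_x / S_y)."""
    return b * s_x / s_y


S_Y = 6479.836  # standard deviation of GDP/CAP

# (estimated coefficient, standard deviation of the explanatory variable)
inputs = {
    "POPULATN": (-0.0052, 146726.364),
    "LITERACY": (-20.8782, 25.01),
    "BIRTH_RT": (-376.9277, 12.361),
}

betas = {
    name: standardized_coefficient(b, s_x, S_Y)
    for name, (b, s_x) in inputs.items()
}

# Rank by absolute value: a larger magnitude means more influence on GDP/CAP.
ranked = sorted(betas, key=lambda name: abs(betas[name]), reverse=True)

for name in ranked:
    print(f"{name}: {betas[name]:.4f}")
```

Ranking by absolute value places BIRTH_RT first, followed by POPULATN and then LITERACY, which matches the ordering implied by the values above.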

Conclusion
This paper tried to evaluate whether the Population, Literacy level and Birth Rate for a country influence its Gross Domestic Product Per Capita. After conducting a test for overall fit, we concluded that there was enough evidence that at least one of these explanatory variables had some impact on the gross domestic product per capita for the country. We then tested further to identify the influence that each individual explanatory variable (population, literacy level, birth rate) had on the dependent variable (gross domestic product per capita). This is what we concluded from those tests.
There wasn’t sufficient evidence for us to conclude that population had any impact on gross domestic product/capita.
There wasn’t sufficient evidence to conclude that literacy levels for a country had any influence on its gross domestic product/capita.
There was sufficient evidence that the birth rate of a country had a negative impact on its gross domestic product/capita.
R Square = Regression Sum of Squares/Total Sum of Squares = .440 approx. This says that approximately 44% of the variation in the GDP/CAP data for a country can be explained by variations in the values for its POPULATN, LITERACY and BIRTH_RT.
Adjusted R Square = .4234, or approximately .42. This says that 42% of the variation in GDP/CAP is explained by the variation in POPULATN, LITERACY and BIRTH_RT, taking into account the sample size of 109 and 3 as the number of independent variables.
The confidence intervals for the regression coefficients show how large the population coefficients are likely to be. Specifically, we're 95% confident that the "true" marginal effects on GDP/CAP of changes in POPULATN, LITERACY and BIRTH_RT lie between -0.0116 and 0.0012, -81.4441 and 39.6878, and -499.4613 and -254.3941 respectively.
Note that zero lies within the POPULATN and LITERACY intervals, indicating that the population regression coefficients for these variables could be zero (hence POPULATN and LITERACY may have no effect on GDP/CAP). This agrees with what we found in our individual slope evaluations for POPULATN and LITERACY.
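
The Adjusted R Square and the reading of the confidence intervals can be checked with a short calculation. The Python sketch below is only an illustration; the names are my own, R Square is the rounded .4395, and the lower bound for BIRTH_RT is taken as -499.4613 on the assumption that the interval is centered on the estimated coefficient of -376.9277.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


adj_r2 = adjusted_r_squared(0.4395, n=109, k=3)

# 95% confidence intervals (lower, upper) for each coefficient, as quoted above.
# The BIRTH_RT lower bound is assumed to be negative (see the lead-in).
intervals = {
    "POPULATN": (-0.0116, 0.0012),
    "LITERACY": (-81.4441, 39.6878),
    "BIRTH_RT": (-499.4613, -254.3941),
}

# If zero lies inside the interval, a zero coefficient cannot be ruled out.
contains_zero = {
    name: lower <= 0 <= upper for name, (lower, upper) in intervals.items()
}

print(f"Adjusted R Square = {adj_r2:.4f}")
for name, has_zero in contains_zero.items():
    print(f"{name}: interval contains zero = {has_zero}")
```

Only the BIRTH_RT interval excludes zero, which is consistent with the two-tail t-test results for the three coefficients.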