Survey
51-601-96 Statistics
Fall 1999
FINAL EXAM
Teachers: François Bellavance and Jean-Claude Lebrun

Problem 1. (12 points)

From October 18 to 25, 1999, a « La Presse SOM » survey was carried out among 505 Montrealers to learn their level of satisfaction with the administration of Mayor Pierre Bourque, as well as their opinion on the « one island, one city » project. For the question « The mayor is presently serving his second mandate. Would you say that you currently trust him more, as much as or less than during his first mandate? », we obtained the following results:

                              Mother tongue
Answer to the question     French    English (or other)
Trust more                     32            40
Trust as much                 156            50
Trust less                    118            59

Note: 50 people (18 francophones and 32 anglophones or others) did not know or did not answer the question and thus were not entered in this table.

Questions:

a. In Montreal, is there a significant difference in the level of confidence granted to the mayor between the francophones and the anglophones (or others)? Use the α = 5% level and verify, at least 3 TIMES, that there are no errors in the transcription of the data into EXCEL or MINITAB! (2 points)

b. If you observed a significant link, briefly describe this link. If you did not observe a significant link, briefly describe why. (4 points)

51-601-96 Final exam – January 2000

During the same survey, for the question « If a referendum were held on the island of Montreal, would you vote yes or no to the following question: do you want to replace the 29 municipalities of the island of Montreal by one city, i.e. one island, one city? », we obtained the following results:

                              Mother tongue
Answer to the question     French    English (or other)
Yes                           161            64
No                            145           102

Note: 33 people (18 francophones and 15 anglophones or others) did not know or did not answer the question and thus were not entered in this table.
We want to verify the hypothesis that, in Montreal, the proportion of francophones in favour of the « one island, one city » project is different from the proportion of anglophones (or others) in favour of it.

Questions:

c. Formulate the hypotheses for this problem. (1 point)

d. In the sample, what are the respective proportions of francophones and anglophones in favour of the « one island, one city » project? (1 point)

e. Obtain the « p-value » for the test of the hypotheses formulated in c) and give your conclusion at the α = 5% level. (4 points)

For problems 2 to 5, see the EXCEL data file « jan2000.xls ». Before using this file to answer the questions, be sure to save at least one copy of it on your hard disk under another name or in another directory.

Context and data file description:

A firm of « head hunters » offers its services to recruit the best managers, either from inside or from outside the company. In the business world, several people claim that managers hired from outside the company perform better than those recruited from within. A team of researchers asked 150 United States managers, chosen at random, to take part in a small study in order to verify this claim. The sample was obtained using a simple random draw with a 91% rate of participation. A comparison of some demographic characteristics between the participants and the non-participants did not reveal a significant difference between these two groups. In other words, we can say that the sample obtained is probably not biased. The file « jan2000.xls » contains the data collected for each of the 150 participants (source: Foster D.P. et al. Business Analysis Using Regression: A Casebook. Springer-Verlag, New York, 1998).
The detailed content of the file and the description of the measured variables are as follows:

Column   Variable name   Description
A        Id              Anonymous identification number of the participants (1 to 150)
B        Performance     Score of performance evaluated by the researchers' team
C        Salary          Manager's annual salary in thousands of U.S. dollars (note that a higher salary is an indicator of a higher level in the company, i.e. closer to top management)
D        Years           Manager's years of experience
E        Ext-Int         Variable indicating the manager's origin: 1 = External and 0 = Internal
F        Origin          Same variable as column E, but not numerically coded
G        Perf-Ext        Score of performance of the managers recruited from outside the company
H        Perf-Int        Score of performance of the managers recruited from within the company
I        Salary-Ext      Annual salary of the managers recruited from outside the company
J        Salary-Int      Annual salary of the managers recruited from within the company

Note that the data in columns G and H are the same as those in column B, but grouped by the managers' origin. Likewise, the data in columns I and J are the same as those in column C, but grouped by the managers' origin.

Problem 2. (10 points)

Questions:

a. Using EXCEL (or MINITAB), obtain the minimum, the maximum, the mean and the standard deviation of the salaries of the 150 managers in the sample. (4 points)

Minimum:
Maximum:
Mean:
Standard deviation:

b. Using EXCEL (or MINITAB), obtain the 95% confidence interval for the mean salary of United States managers and briefly give the interpretation of that interval. (4 points)

c. Starting from the confidence interval for the mean salary found in b), determine whether the « p-value » for testing the hypotheses H0: μ = 70 against H1: μ ≠ 70, where μ represents the true mean salary of United States managers in thousands of U.S. dollars, would be higher than, lower than or equal to 5%. Briefly justify your answer.
(2 points)

Problem 3. (15 points)

Although several people from the business world claim that managers recruited from outside the company perform better, we believe that in the United States the proportion of these managers is lower than 50%. Starting from our sample, we want to verify this last assertion: in the United States, the proportion of managers recruited from outside the company is lower than 50%.

Questions:

a. Formulate precisely the hypotheses H0 and H1 that we want to test in this problem. (1 point)

b. In the sample, what is the proportion of managers recruited from outside the company? (2 points)

c. Using EXCEL (or MINITAB), obtain the p-value corresponding to the hypotheses formulated in a) and give your conclusion at the α = 5% level. (3 points)

d. Using EXCEL (or MINITAB), obtain the 95% confidence interval for the proportion of United States managers recruited from outside the company and briefly give the interpretation of this interval. (4 points)

e. Would you have been able to verify the hypotheses formulated in a) if, instead of taking a simple random sample, the researchers' team had used a stratified sampling design with the managers recruited externally as the first stratum and the managers recruited within the company as the second stratum? Briefly justify your answer. (5 points)

Problem 4. (8 points)

Questions:

a. What are the means and the standard deviations of the score of performance for the groups of managers recruited internally and externally, respectively? (4 points)

External   mean of performance:        standard deviation:
Internal   mean of performance:        standard deviation:

b. We are now interested in verifying the hypothesis that, on average, the managers recruited from outside the company obtain higher scores of performance than the ones recruited from within the company.
Beforehand, we carried out a test on the variances in order to choose the appropriate statistical test for comparing the means. We obtained the following results for the test on the variances:

H0: equal variances
H1: unequal variances
p-value = 0.309

Using EXCEL (or MINITAB), find the p-value corresponding to the hypothesis on the means that we want to verify and briefly comment on the results at the α = 5% level. (4 points)

Problem 5. (15 points)

Before undertaking a multiple linear regression analysis, it is important to examine the scatterplots between all the variables as well as the correlation coefficients.

[Figure: scatterplots of Performance vs. Salary, Performance vs. Years, Performance vs. Origin, Salary vs. Years, Salary vs. Origin, and Years vs. Origin (Origin: External / Internal).]

Correlations (Pearson), with the p-value beneath each coefficient:

            Performance     Salary
Salary         0.684
               0.000
Years          0.068        -0.323
               0.410         0.000

In order to analyse and better understand this data set and the relations between the variables, it is also interesting to examine the scatterplots between the performance, the salary and the years of experience while identifying the two groups of managers on the plots. The Pearson correlation coefficients between these variables were also calculated separately for each of the two groups.
[Figure: scatterplots of Performance vs. Salary, Performance vs. Years, and Salary vs. Years, with external managers marked O and internal managers marked +.]

Managers recruited from outside the company (external)
Correlations (Pearson), with the p-value beneath each coefficient:

            Performance     Salary
Salary         0.736
               0.000
Years          0.150        -0.174
               0.245         0.175

Managers recruited within the company (internal)
Correlations (Pearson), with the p-value beneath each coefficient:

            Performance     Salary
Salary         0.642
               0.000
Years          0.276        -0.014
               0.009         0.899

Questions:

a. According to the graphs and the Pearson correlation coefficients, we note that for all the managers taken together there is a significant negative linear relation (r = -0.323) between the salary and the number of years of experience. How would you explain this relation which, at first sight, seems somewhat unexpected? (5 points)

Using SAS software, we have obtained a summary of the characteristics of all the possible linear regression models. The results are the following:

N = 150     Regression Models for Dependent Variable: Performance

Number in Model   R-square     Adjusted R-square   C(p)        Variables in Model
1                 0.46737553    0.46377672          30.84091   SALARY
1                 0.05674645    0.05037311         167.17715   EXT-INT
1                 0.00458214   -0.00214366         184.49664   YEARS
2                 0.56021507    0.55423160           2.01651   SALARY YEARS
2                 0.48849467    0.48153541          25.82897   SALARY EXT-INT
2                 0.11058043    0.09847948         151.30330   YEARS EXT-INT
3                 0.56026479    0.55122914           4.00000   SALARY YEARS EXT-INT

Questions:

b. Which one of the various multiple and simple linear regression models seems to be the best, and why? (4 points)

c.
Using EXCEL (or MINITAB), obtain the linear regression equation for the best model found in b) and briefly interpret the coefficients of this model as well as the coefficient of determination. (6 points)

Solutions:

Problem 1.

a) 2 by 3 cross-tabulation. Yes, there is a significant difference, since the p-value = 0.000007 < 0.05. So, we reject the hypothesis H0: there is no link between mother tongue (francophone vs. anglophone or other) and the level of confidence granted to the mayor by Montrealers. (Cramér's coefficient = 0.2288)

b) The proportion of Montrealers who have less confidence in the mayor is similar for francophones and anglophones (38.56% and 39.60% respectively). However, only 10.46% of francophones have more confidence in the mayor, compared with 26.85% of anglophones. On the other hand, the proportion of Montrealers who have the same level of confidence in the mayor is 50.98% for francophones and 33.56% for anglophones.

c) H0: p_francophones = p_anglophones vs H1: p_francophones ≠ p_anglophones.

d) p̂_francophones = 161/306 = 52.61%
   p̂_anglophones = 64/166 = 38.55%

e) p-value = 0.0035 < α = 0.05. Consequently, we reject the hypothesis H0. So, in Montreal, the proportion of francophones in favour of the « one island, one city » project is significantly different from the proportion of anglophones (or others) in favour of the project.

Problem 2.

a) Minimum = 48, Maximum = 103, Mean = 71.63 and Standard deviation = 10.704

b) 95% CI (69.906 ; 73.360). By saying that the true mean salary of United States managers is between $69,906 and $73,360, there is only a 5% chance of error.

c) The p-value for testing these hypotheses would be > 0.05 because the value 70 (i.e. $70,000) is included in the 95% confidence interval.

Problem 3.

a) H0: p_external ≥ 50% against H1: p_external < 50%.

b) p̂_external = 62/150 = 41.3%

c) p-value = 0.0169 < α = 0.05. Consequently, we reject the hypothesis H0.
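The p-values quoted for Problems 1 e) and 3 c) can be checked directly from the counts. The following is a minimal sketch, assuming that both were obtained with the usual large-sample z-tests for proportions (the exam only says EXCEL or MINITAB was used, so this choice of method is an assumption). The function names are illustrative and are not part of the exam material.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value


def one_proportion_z_test_lower(x, n, p0):
    """One-sided z-test of H0: p >= p0 against H1: p < p0."""
    p_hat = x / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se
    return z, STD_NORMAL.cdf(z)   # lower-tail probability


# Problem 1 e): 161 of 306 francophones vs 64 of 166 anglophones said yes
z_1e, p_1e = two_proportion_z_test(161, 306, 64, 166)

# Problem 3 c): 62 of the 150 sampled managers were recruited externally
z_3c, p_3c = one_proportion_z_test_lower(62, 150, 0.50)

print(f"1 e): z = {z_1e:.3f}, p-value = {p_1e:.4f}")
print(f"3 c): z = {z_3c:.3f}, p-value = {p_3c:.4f}")
```

Under these assumptions the two printed p-values should agree, to four decimals, with the 0.0035 and 0.0169 reported in the solutions.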
So, the proportion of managers recruited from outside the company is significantly lower than 50%.

d) 95% CI (33.45% ; 49.21%). By saying that the true proportion of managers recruited from outside the company is between 33.45% and 49.21%, there is only a 5% chance of error.

e) No. In the case of a stratified sampling design where the two strata are the managers recruited from within and from outside the company respectively, the researchers predetermine the number of managers to sample in each stratum and thus automatically determine the percentage of managers recruited from outside the company (and from within the company) that will be included in the total sample. The sample proportion would then reflect the design rather than the population.

Problem 4.

a) External   performance mean: 6.32   standard deviation: 1.342
   Internal   performance mean: 5.60   standard deviation: 1.518

b) H0: μ_external ≤ μ_internal vs H1: μ_external > μ_internal.
Test to compare two means with equal variances (because we do not reject the equality of variances, p-value = 0.309 > 0.05): p-value = 0.001665 < α = 0.05. Consequently, we reject the hypothesis H0. Thus, the mean score of performance of the managers hired from outside the company is significantly higher than the mean score of performance of the managers recruited from within the company.

Problem 5.

a) By analysing the scatterplots, we observe that the « externals » have on average a higher salary than the « internals » while having on average fewer years of experience. Also, when we look separately at the relation between the salaries and the years of experience for the « externals » and the « internals », the link is no longer significant (externals: r = -0.174, p-value = 0.175; internals: r = -0.014, p-value = 0.899).

b) The model with salary and years of experience. Compared with the other models, this model has the greatest value of adjusted R² (55.4%) and the smallest value of Cp (2.02).

c) Performance = -2.9206 + 0.1093 × salary + 0.1215 × years of experience.
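To make the meaning of the coefficients concrete, the fitted equation can be evaluated for a manager and then re-evaluated after changing one predictor at a time. This is only an illustrative sketch: the coefficients are copied from the equation above, while the example manager (salary of 70 thousand dollars, 10 years of experience) and the function name are hypothetical and not taken from the data file.

```python
# Coefficients copied from the fitted model in solution 5 c)
INTERCEPT = -2.9206
B_SALARY = 0.1093  # change in score per extra thousand U.S. dollars of salary
B_YEARS = 0.1215   # change in score per extra year of experience


def predicted_performance(salary, years):
    """Predicted performance score; salary is in thousands of U.S. dollars."""
    return INTERCEPT + B_SALARY * salary + B_YEARS * years


# Hypothetical manager, chosen for illustration only (not from jan2000.xls)
base = predicted_performance(salary=70, years=10)
more_salary = predicted_performance(salary=71, years=10)
more_years = predicted_performance(salary=70, years=11)

print(f"Predicted score (salary 70, 10 years): {base:.4f}")
print(f"Change for +1 thousand $ of salary:    {more_salary - base:.4f}")
print(f"Change for +1 year of experience:      {more_years - base:.4f}")
```

Each difference equals the corresponding coefficient because the other predictor is held fixed, which is how a coefficient in a multiple regression is read.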
R² = 56.02%; therefore, 56.02% of the observed variability in the managers' scores of performance is explained by the salaries and the years of experience. According to the model, when the salary is higher, the score of performance is higher. Also, when the number of years of experience is higher, the score of performance is higher.
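The model comparison in 5 b) can be retraced from the R-square column of the SAS summary alone. The sketch below assumes the standard definition of adjusted R², namely 1 - (1 - R²)(n - 1)/(n - k - 1) for k predictors and n observations; the R-square values and N = 150 are copied from the SAS output, and the function and variable names are illustrative.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-square for a model with k predictors and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


N = 150  # sample size reported in the SAS output

# (variables in model, R-square) copied from the SAS summary table
models = [
    (("SALARY",), 0.46737553),
    (("EXT-INT",), 0.05674645),
    (("YEARS",), 0.00458214),
    (("SALARY", "YEARS"), 0.56021507),
    (("SALARY", "EXT-INT"), 0.48849467),
    (("YEARS", "EXT-INT"), 0.11058043),
    (("SALARY", "YEARS", "EXT-INT"), 0.56026479),
]

# Recompute the adjusted R-square of every candidate model
adjusted = {
    variables: adjusted_r_squared(r2, N, len(variables))
    for variables, r2 in models
}

for variables, value in adjusted.items():
    print(f"{' + '.join(variables):<26} adjusted R-square = {value:.8f}")

# Model with the largest adjusted R-square
best = max(adjusted, key=adjusted.get)
print("Best by adjusted R-square:", " + ".join(best))
```

Adding EXT-INT to the two-variable model raises R² only in the fifth decimal, so the penalty for the extra predictor lowers the adjusted value, which is why the smaller model is preferred.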