51-601-96 Statistics
Fall 1999
FINAL EXAM
Teachers : François Bellavance and Jean-Claude Lebrun
Problem 1. (12 points)
From October 18 to 25, 1999, a « La Presse SOM » survey was carried out among 505 Montrealers to measure their level of satisfaction with the administration of Mayor Pierre Bourque and their opinion on the « one island, one city » project. For the question « The mayor is currently serving his second term. Would you say that you currently trust him more than, as much as or less than during his first term? », we obtained the following results:
                          Mother tongue
Answer to the question    French    English (or other)
Trust more                    32                    40
Trust as much                156                    50
Trust less                   118                    59
Note : 50 people (18 francophones and 32 anglophones or others) did not know or did not answer the
question and thus were not entered in this table.
Questions :
a. In Montreal, is there a significant difference between francophones and anglophones (or others) in the level of confidence in the mayor? Use the $\alpha = 5\%$ level and check, at least 3 TIMES, that there are no errors in your transcription of the data into EXCEL or MINITAB!!!! (2 points)
b. If you observed a significant association, briefly describe it. If you did not observe a significant association, briefly explain why. (4 points)
51-601-96
Final exam – January 2000
1
In the same survey, for the question « If a referendum were held on the island of Montreal, would you vote yes or no to the following question: do you want to replace the 29 municipalities of the island of Montreal with a single city, i.e. one island, one city? », we obtained the following results:
                          Mother tongue
Answer to the question    French    English (or other)
Yes                          161                    64
No                           145                   102
Note : 33 people (18 francophones and 15 anglophones or others) did not know or did not answer the
question and thus were not entered in this table.
We want to test the hypothesis that, in Montreal, the proportion of francophones in favour of the « one island, one city » project differs from the proportion of anglophones (or others) in favour of it.
Questions :
c. Formulate the hypotheses for this problem. (1 point)
d. In the sample, what are the respective proportions of francophones and anglophones in favour of the « one island, one city » project? (1 point)
e. Obtain the p-value for the test of the hypotheses formulated in c) and give your conclusion at the $\alpha = 5\%$ level. (4 points)
For problems 2 to 5, see the EXCEL data file « jan2000.xls ». Before using this file to answer the questions, make sure you save at least one copy of it on your hard disk under another name or in another directory.
Context and data file description :
A firm of « head hunters » offers its services to recruit the best managers from either inside or outside the company. In the business world, many people claim that managers hired from outside the company perform better than those recruited from within.
A team of researchers asked 150 United States managers, chosen at random, to take part in a small study in order to verify this claim. The sample was obtained by simple random sampling, with a 91% participation rate. A comparison of some demographic characteristics of participants and non-participants did not reveal a significant difference between the two groups. In other words, the sample obtained is probably not biased.
The file « jan2000.xls » contains the data collected for each of the 150 participants (source: Foster D.P. et al. Business Analysis Using Regression: A Casebook. Springer-Verlag, New York, 1998). The content of the file and the description of the measured variables are as follows:
Column  Variable name  Description
A       Id             Anonymous identification number of the participant (1 to 150)
B       Performance    Performance score evaluated by the research team
C       Salary         Manager's annual salary in thousands of U.S. dollars
                       (note that a higher salary indicates a higher level in the
                       company, i.e. closer to top management)
D       Years          Manager's years of experience
E       Ext-Int        Variable indicating the manager's origin: 1 = External, 0 = Internal
F       Origin         Same variable as column E, but not numerically coded
G       Perf-Ext       Performance score of the managers recruited from outside the company
H       Perf-Int       Performance score of the managers recruited from within the company
I       Salary-Ext     Annual salary of the managers recruited from outside the company
J       Salary-Int     Annual salary of the managers recruited from within the company
Note that the data in columns G and H are the same as the ones in column B, but grouped by the
managers’ origin. Also, the data in columns I and J are the same as the ones in column C, but are
grouped by the managers’ origin.
Problem 2. (10 points)
Questions :
a. Using EXCEL (or MINITAB), obtain the minimum, the maximum, the mean and the standard deviation of the salaries of the 150 managers in the sample. (4 points)
Minimum :
Maximum :
Mean :
Standard deviation :
b. Using EXCEL (or MINITAB), obtain the 95% confidence interval for the mean salary of United States managers and briefly interpret this interval. (4 points)
c. Using the confidence interval for the mean salary found in b), determine whether the p-value for testing $H_0: \mu = 70$ against $H_1: \mu \neq 70$, where $\mu$ is the true mean salary of United States managers in thousands of U.S. dollars, would be higher than, lower than or equal to 5%. Briefly justify your answer. (2 points)
Problem 3. (15 points)
Questions :
Although many people in the business world claim that managers recruited from outside the company perform better, we believe that in the United States the proportion of these managers is lower than 50%. Using our sample, we want to test this last assertion: in the United States, the proportion of managers recruited from outside the company is lower than 50%.
a. Formulate precisely the hypotheses $H_0$ and $H_1$ to be tested in this problem. (1 point)
b. In the sample, what is the proportion of managers recruited from outside the company? (2 points)
c. Using EXCEL (or MINITAB), obtain the p-value corresponding to the hypotheses formulated in a) and give your conclusion at the $\alpha = 5\%$ level. (3 points)
d. Using EXCEL (or MINITAB), obtain the 95% confidence interval for the proportion of the United
States managers recruited from outside the company and briefly give the interpretation of this
interval. (4 points)
e. Would you have been able to test the hypotheses formulated in a) if, instead of taking a simple random sample, the research team had used a stratified sampling design with the managers recruited externally as the first stratum and the managers recruited within the company as the second stratum? Briefly justify your answer. (5 points)
Problem 4. (8 points)
Questions :
a. What are the means and the standard deviations of the performance scores for the groups of managers recruited internally and externally, respectively? (4 points)
External managers, mean performance:
External managers, standard deviation:
Internal managers, mean performance:
Internal managers, standard deviation:
b. We now want to test the hypothesis that, on average, managers recruited from outside the company obtain higher performance scores than those recruited from within the company. We first carried out a test on the variances in order to choose the appropriate test for comparing the means, and obtained the following results:
$H_0$: equal variances
$H_1$: unequal variances
p-value = 0.309.
Using EXCEL (or MINITAB), find the p-value corresponding to the hypothesis on the means that we want to test, and briefly comment on the results at the $\alpha = 5\%$ level. (4 points)
Problem 5. (15 points)
Before undertaking a multiple linear regression analysis, it is important to examine the scatterplots of every pair of variables as well as the correlation coefficients.
[Figure: scatterplots of each pair of variables among Performance, Salary, Years and Origin (External/Internal).]
Correlations (Pearson) / P-Value
              Performance     Salary
Salary          0.684
                0.000
Years           0.068         -0.323
                0.410          0.000
To better analyse and understand this data set and the relationships between the variables, it is also useful to examine the scatterplots of performance, salary and years of experience with the two groups of managers identified separately. The Pearson correlation coefficients between these variables were also calculated separately for each group.
[Figure: scatterplots of Performance vs Salary, Performance vs Years and Salary vs Years, with external managers marked O and internal managers marked +.]
Managers recruited from outside the company (external)
Correlations (Pearson) / P-Value
              Performance     Salary
Salary          0.736
                0.000
Years           0.150         -0.174
                0.245          0.175

Managers recruited within the company (internal)
Correlations (Pearson) / P-Value
              Performance     Salary
Salary          0.642
                0.000
Years           0.276         -0.014
                0.009          0.899
Questions :
a. According to the graphs and the Pearson correlation coefficients, for all managers combined there is a significant negative linear relationship (r = -0.323) between salary and years of experience. How would you explain this relationship, which at first sight seems somewhat unexpected? (5 points)
Using the SAS software, we obtained a summary of the characteristics of all possible linear regression models. The results are the following:
N = 150          Regression Models for Dependent Variable: Performance

Number in    R-square      Adjusted      C(p)         Variables in Model
Model                      R-square
1            0.46737553    0.46377672     30.84091    SALARY
1            0.05674645    0.05037311    167.17715    EXT-INT
1            0.00458214    -.00214366    184.49664    YEARS
--------------------------------------------------------------------
2            0.56021507    0.55423160      2.01651    SALARY YEARS
2            0.48849467    0.48153541     25.82897    SALARY EXT-INT
2            0.11058043    0.09847948    151.30330    YEARS EXT-INT
--------------------------------------------------------------------
3            0.56026479    0.55122914      4.00000    SALARY YEARS EXT-INT
--------------------------------------------------------------------
Questions :
b. Which of the various simple and multiple linear regression models seems to be the best, and why? (4 points)
c. Using EXCEL (or MINITAB), obtain the regression equation for the best model found in b) and briefly interpret the coefficients of this model as well as the coefficient of determination $R^2$. (6 points)
Solutions :
Problem 1.
a) Chi-square test of independence on the 2 × 3 contingency table. Yes, there is a significant difference, since the p-value = 0.000007 < 0.05. We therefore reject the hypothesis $H_0$: there is no association between the level of confidence in the mayor and mother tongue (francophones versus anglophones or others). (Cramér's V = 0.2288)
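The chi-square test in a) can be reproduced from the counts in the first table. The sketch below is a minimal, standard-library Python version (the exam itself expects EXCEL or MINITAB), and the function name chi_square_independence is only illustrative. It relies on the fact that, with (3 - 1) × (2 - 1) = 2 degrees of freedom, the chi-square upper-tail probability is exactly exp(-x/2); it should give a p-value of about 0.000007 and a Cramér's V of about 0.23.

```python
import math

# Observed counts from Problem 1 (rows: answer; columns: French, English or other).
observed = [
    [32, 40],    # Trust more
    [156, 50],   # Trust as much
    [118, 59],   # Trust less
]

def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom and Cramér's V."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under H0
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    cramers_v = math.sqrt(stat / (n * (min(len(table), len(table[0])) - 1)))
    return stat, df, cramers_v

stat, df, v = chi_square_independence(observed)
# For df = 2 the chi-square survival function is exactly exp(-x / 2);
# other table sizes would need a general chi-square CDF.
assert df == 2
p_value = math.exp(-stat / 2)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.6f}, Cramér's V = {v:.4f}")
```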
b) The proportion of Montrealers who trust the mayor less is similar for francophones and anglophones (38.56% and 39.60% respectively). However, only 10.46% of francophones trust the mayor more, compared with 26.85% of anglophones. On the other hand, 50.98% of francophones trust the mayor as much as before, compared with 33.56% of anglophones.
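The percentages in b) are column percentages: each count is divided by the number of respondents in its language group (306 francophones and 149 anglophones or others). A minimal sketch of that arithmetic, with illustrative variable names:

```python
# Observed counts from Problem 1: answer -> (French, English or other).
counts = {
    "Trust more": (32, 40),
    "Trust as much": (156, 50),
    "Trust less": (118, 59),
}

# Column totals: number of francophones and anglophones (or others) who answered.
totals = [sum(pair[k] for pair in counts.values()) for k in range(2)]

# Percentage of each language group giving each answer (column percentages).
percentages = {
    answer: tuple(round(100 * c / t, 2) for c, t in zip(pair, totals))
    for answer, pair in counts.items()
}
for answer, (fr, en) in percentages.items():
    print(f"{answer:14s} French {fr:6.2f}%   English/other {en:6.2f}%")
```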
c) $H_0: p_{\text{francophones}} = p_{\text{anglophones}}$ vs $H_1: p_{\text{francophones}} \neq p_{\text{anglophones}}$.
d) $\hat{p}_{\text{francophones}} = \frac{161}{306} = 52.61\%$ and $\hat{p}_{\text{anglophones}} = \frac{64}{166} = 38.55\%$
e) p-value = 0.0035 < $\alpha = 0.05$. Consequently, we reject the hypothesis $H_0$. In Montreal, the proportion of francophones in favour of the « one island, one city » project is therefore significantly different from the proportion of anglophones (or others) in favour of the project.
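The solution does not say which procedure produced the p-value in e). The sketch below assumes a pooled two-proportion z test, which is equivalent to the chi-square test on the 2 × 2 table without continuity correction; under that assumption it gives z ≈ 2.92 and a two-sided p-value of about 0.0035. The function name two_proportion_z_test is hypothetical.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                      # estimate of p under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))          # equals 2 * P(Z > |z|)
    return p1, p2, z, p_value

# "Yes" answers: 161 of 306 francophones, 64 of 166 anglophones (or others).
p_fr, p_en, z, p_value = two_proportion_z_test(161, 306, 64, 166)
print(f"p_fr = {p_fr:.2%}, p_en = {p_en:.2%}, z = {z:.3f}, p-value = {p_value:.4f}")
```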
Problem 2.
a) Minimum = 48, Maximum = 103, Mean = 71.63 and standard deviation = 10.704
b) 95% CI: (69.906 ; 73.360). If we state that the true mean salary of United States managers lies between 69,906 and 73,360 U.S. dollars, there is only a 5% chance of error.
c) The p-value for testing these hypotheses would be > 0.05, because 70 (i.e. 70,000 U.S. dollars) lies inside the 95% confidence interval.
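The interval in b) and the reasoning in c) can be sketched from the summary statistics in a). The critical value $t_{0.975,\,149} \approx 1.976$ is hard-coded from a t table, an assumption made because the Python standard library has no Student t quantile function. Because the mean is rounded to 71.63, the bounds come out about 0.003 below the reported (69.906 ; 73.360).

```python
import math

# Summary statistics from Problem 2 a) (salaries in thousands of U.S. dollars).
n, mean, sd = 150, 71.63, 10.704
# Assumed critical value t_{0.975, 149}, read from a t table.
t_crit = 1.976

se = sd / math.sqrt(n)                       # standard error of the mean
lower, upper = mean - t_crit * se, mean + t_crit * se
print(f"95% CI: ({lower:.3f} ; {upper:.3f})")

# Problem 2 c): a two-sided test of H0: mu = 70 at the 5% level rejects H0
# exactly when 70 lies outside the 95% confidence interval.
mu0 = 70
reject_at_5_percent = not (lower <= mu0 <= upper)
print("p-value < 0.05" if reject_at_5_percent else "p-value > 0.05")
```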
Problem 3.
a) $H_0: p_{\text{external}} \geq 50\%$ against $H_1: p_{\text{external}} < 50\%$.
b) $\hat{p}_{\text{external}} = \frac{62}{150} = 41.3\%$
c) p-value = 0.0169 < $\alpha = 0.05$. Consequently, we reject the hypothesis $H_0$. The proportion of managers recruited from outside the company is therefore significantly lower than 50%.
d) 95% CI: (33.45% ; 49.21%). If we state that the true proportion of managers recruited from outside the company lies between 33.45% and 49.21%, there is only a 5% chance of error.
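The results in c) and d) can be sketched with the normal approximation. The choice below is an assumption: the one-sided z test uses the standard error under $H_0$, $\sqrt{0.5 \times 0.5 / 150}$, while the Wald interval uses $\hat{p}$. With those choices the sketch reproduces the reported p-value of 0.0169 and the interval (33.45% ; 49.21%). NormalDist requires Python 3.8 or later.

```python
import math
from statistics import NormalDist

x, n, p0 = 62, 150, 0.5          # external managers, sample size, value under H0
p_hat = x / n

# One-sided z test of H0: p >= 0.5 against H1: p < 0.5 (normal approximation).
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = NormalDist().cdf(z)    # lower-tail probability

# Wald 95% confidence interval: p_hat +/- z_{0.975} * sqrt(p_hat (1 - p_hat) / n).
z_crit = NormalDist().inv_cdf(0.975)
half_width = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half_width, p_hat + half_width)
print(f"p_hat = {p_hat:.1%}, z = {z:.3f}, p-value = {p_value:.4f}")
print(f"95% CI: ({ci[0]:.2%} ; {ci[1]:.2%})")
```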
e) No. With stratified sampling in which the two strata are the managers recruited from within and from outside the company, the researchers fix in advance the number of managers sampled from each stratum. The percentage of managers recruited from outside the company (and from within) in the total sample is therefore determined by the design rather than observed, so it cannot be used to test a hypothesis about the population proportion.
Problem 4.
a) External: performance mean = 6.32, standard deviation = 1.342
Internal: performance mean = 5.60, standard deviation = 1.518
b) $H_0: \mu_{\text{external}} \leq \mu_{\text{internal}}$ vs $H_1: \mu_{\text{external}} > \mu_{\text{internal}}$. Two-sample t test for comparing means assuming equal variances (since we do not reject the equality of variances, p-value = 0.309 > 0.05): p-value = 0.001665 < $\alpha = 0.05$. Consequently, we reject the hypothesis $H_0$. The mean performance score of managers hired from outside the company is therefore significantly higher than that of managers recruited from within the company.
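A minimal sketch of the pooled (equal-variance) t test in b) from summary statistics. It assumes the external group has 62 managers and the internal group 150 - 62 = 88 (counts from Problem 3 b), with standard deviations 1.342 and 1.518 respectively. The standard library has no Student t CDF, so the hypothetical helper t_upper_tail integrates the t density numerically with Simpson's rule. With these rounded inputs, t ≈ 3.0 and the one-sided p-value comes out near 0.0016 rather than exactly 0.001665.

```python
import math

def t_upper_tail(t, df, steps=20_000, span=60.0):
    """P(T > t) for Student's t, by Simpson's rule on [t, t + span].

    The probability beyond t + span is negligible for the df used here."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = span / steps  # steps must be even for Simpson's rule
    total = pdf(t) + pdf(t + span)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(t + i * h)
    return total * h / 3

def pooled_t_test(m1, s1, n1, m2, s2, n2):
    """One-sided two-sample t test (H1: mu1 > mu2) assuming equal variances."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df  # pooled variance
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df, t_upper_tail(t, df)

# External: 62 managers; internal: 88 managers.
t, df, p_value = pooled_t_test(6.32, 1.342, 62, 5.60, 1.518, 88)
print(f"t = {t:.3f}, df = {df}, one-sided p-value = {p_value:.4f}")
```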
Problem 5.
a) By analysing the scatterplots, we observe that the « externals » have, on average, a higher salary than the « internals » while having, on average, fewer years of experience. Moreover, when we look separately at the relationship between salary and years of experience for the « externals » and the « internals », the link is no longer significant (externals: r = -0.174, p-value = 0.175; internals: r = -0.014, p-value = 0.899). The negative relationship observed for all managers combined thus comes from pooling two groups that differ in both salary and experience.
b) The model with salary and years of experience. Compared with the other models, this model has the largest adjusted $R^2$ (55.4%) and the smallest value of C(p) (2.02).
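The choice in b) can be checked from the R-square column of the SAS output alone, using adjusted $R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)$ and Mallows' $C(p) = SSE_p / MSE_{\text{full}} - (n - 2p)$, with $k$ predictors and $p = k + 1$ parameters. Writing C(p) in terms of $R^2$ is valid because every model shares the same total sum of squares; this sketch reproduces SAS values such as 2.01651 and 30.84091. The function names are illustrative.

```python
# R-square values from the SAS summary (n = 150 managers).
n = 150
r2 = {
    ("SALARY",): 0.46737553,
    ("EXT-INT",): 0.05674645,
    ("YEARS",): 0.00458214,
    ("SALARY", "YEARS"): 0.56021507,
    ("SALARY", "EXT-INT"): 0.48849467,
    ("YEARS", "EXT-INT"): 0.11058043,
    ("SALARY", "YEARS", "EXT-INT"): 0.56026479,
}
full = max(r2, key=len)  # the model with all three predictors
k_full = len(full)

def adjusted_r2(r2_value, k):
    """Adjusted R-square for a model with k predictors plus an intercept."""
    return 1 - (1 - r2_value) * (n - 1) / (n - k - 1)

def mallows_cp(r2_value, k):
    """Mallows' C(p) with p = k + 1, using SSE = SST * (1 - R-square)."""
    sse_over_mse_full = (1 - r2_value) / (1 - r2[full]) * (n - k_full - 1)
    return sse_over_mse_full - (n - 2 * (k + 1))

for model, value in r2.items():
    k = len(model)
    print(f"{' '.join(model):22s} adj R2 = {adjusted_r2(value, k):.4f}  "
          f"C(p) = {mallows_cp(value, k):.3f}")

best_by_adj = max(r2, key=lambda m: adjusted_r2(r2[m], len(m)))
best_by_cp = min(r2, key=lambda m: mallows_cp(r2[m], len(m)))
print("best by adjusted R2:", best_by_adj, "| best by C(p):", best_by_cp)
```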
c) Performance = -2.9206 + 0.1093 × Salary + 0.1215 × Years of experience. $R^2$ = 56.02%; therefore, 56.02% of the observed variability in the managers' performance scores is explained by salary and years of experience. According to the model, for a fixed number of years of experience, each additional thousand dollars of salary is associated with an increase of 0.1093 in the predicted performance score; for a fixed salary, each additional year of experience is associated with an increase of 0.1215 in the predicted performance score.
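The interpretation in c) can be made concrete by evaluating the fitted equation. The function name predicted_performance and the inputs (salary of 70 thousand dollars, 10 years) are illustrative only. Increasing one predictor by one unit while holding the other fixed changes the prediction by exactly that predictor's coefficient.

```python
# Coefficients of the fitted model reported in Problem 5 c).
INTERCEPT, B_SALARY, B_YEARS = -2.9206, 0.1093, 0.1215

def predicted_performance(salary_k, years):
    """Predicted performance score; salary in thousands of U.S. dollars."""
    return INTERCEPT + B_SALARY * salary_k + B_YEARS * years

# Partial effects: change one predictor while holding the other fixed.
salary_effect = predicted_performance(71, 10) - predicted_performance(70, 10)
years_effect = predicted_performance(70, 11) - predicted_performance(70, 10)
print(f"prediction for salary 70, 10 years: {predicted_performance(70, 10):.3f}")
print(f"+1 thousand dollars of salary: {salary_effect:+.4f}; +1 year: {years_effect:+.4f}")
```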