MBA
Statistics 51-651-00
COURSE #2
Do we have winning conditions?
Decision making from statistical inference
Very often, a decision is made following a quantitative analysis of certain parameters.
• You are presented with two advertising concepts for launching a new product. You will choose the one that obtains the best effectiveness score in your target market.
• If the strength or the average durability of a new product is significantly greater than that of the best competing product, you will put this product on the market.
• If the « winning conditions » were present and more than 50% of people in Quebec would vote yes in a referendum on sovereignty, then Bernard Landry would decide to hold one.
In general, the parameters of interest are estimated from a sample, and our decision is made on the basis of a hypothesis test.
Example: we ask 1000 randomly chosen residents of the Province of Quebec who have the right to vote whether they would vote yes today in a Quebec referendum on sovereignty.
What would Bernard Landry do if:
• 432 voters answered yes (432/1000 = 43.2%)? He would most probably not hold a referendum.
• 517 voters answered yes (517/1000 = 51.7%)? Is 51.7% significantly larger than 50%?
• 612 voters answered yes (612/1000 = 61.2%)? 61.2% is probably significantly larger than 50%, so he would decide to hold a referendum on the sovereignty of Quebec.
Basic notions of hypothesis tests
To help us decide (especially in the second case above, 51.7% yes), we will try to quantify the term « significantly different », statistically speaking, by associating a probability of error with it.
In other words, based on the results obtained in the sample, we want to know the probability that the Premier would be making a mistake by deciding to hold a referendum on sovereignty.
Basic notions of hypothesis tests (cont’d)
If the probability of making a mistake is small (for example, lower than 5%), he will decide to hold a referendum on sovereignty soon.
If this probability is large (for example, higher than 5%), he will wait until the « winning conditions » are present before holding a referendum.
Basic notions of hypothesis tests (cont’d)
There are essentially two possibilities:
1. 50% or less of the voters would vote yes if a referendum took place today;
2. more than 50% of the voters would vote yes.
The first possibility is called the null hypothesis (denoted H0).
The second possibility is called the alternate hypothesis (denoted H1).
Notation:
Let « p » be the true proportion of voters who would vote yes in a referendum. We then have the following two possibilities:
H0: p ≤ 50% vs H1: p > 50%
Often, the alternate hypothesis is what we want to show « beyond any reasonable doubt! », i.e. we want the probability of being wrong when we make the decision H1 on the basis of the sample results to be small.
Choosing H1
The choice of H1 is determined by the question you need to answer.
H1 must be chosen so that you can answer yes (resp. no) to the question if H1 is accepted, and no (resp. yes) if H0 is accepted.
Typically there are three choices for H1: θ > θ0, θ < θ0 or θ ≠ θ0, where θ is the parameter of interest (e.g. p) and θ0 its reference value.
Choosing H1 (continued)
The question Bernard Landry is asking himself is: do I have a chance of winning?
• H1: p < ½ is not a good choice. If H0 is accepted, one can only conclude that p ≥ ½, which includes p = ½, so the answer to his question is neither yes nor no! The same is true for the choice H1: p ≠ ½.
• H1: p > ½ is the right choice. If H1 is accepted, the answer is yes, while if H0 is accepted, then p ≤ ½, so the answer is no.
Possible errors in decision making based on a sample:
• Type I error: rejecting H0 in favour of H1 (i.e. making the decision H1) when H0 is actually true.
If we reject H0 on the basis of the value obtained in our sample, the probability of a Type I error is the probability, computed assuming H0 is true, of observing that « value » or a value even « further away » from H0. In statistical jargon, this probability is called the « p-value ».
• Type II error: not rejecting H0 in favour of H1 when H1 is actually true.
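To make the definition concrete, here is a minimal Python sketch (Python 3.8+, for `math.comb`) that computes this probability for the vote example with 517 « yes » answers out of 1000. It assumes H0 holds at its boundary value p = 50% and uses the exact binomial distribution; this is a swapped-in technique for illustration, since the slides themselves rely on the normal approximation. The function name `exact_upper_p_value` is hypothetical.

```python
import math

def exact_upper_p_value(k, n, p0):
    """P[X >= k] for X ~ Binomial(n, p0).

    This is the probability, if H0 holds at its boundary p = p0, of seeing
    k or more 'yes' answers, i.e. the observed value or one even further
    from H0 in the direction of H1: p > p0.
    """
    return sum(
        math.comb(n, j) * p0**j * (1 - p0) ** (n - j)
        for j in range(k, n + 1)
    )

# 517 'yes' answers out of 1000 voters, H0: p <= 0.5 vs H1: p > 0.5
p_value = exact_upper_p_value(517, 1000, 0.5)
print(f"p-value = {p_value:.4f}")
```

The result is well above 5%, so deciding H1 on the basis of 517 « yes » answers would carry a substantial risk of a Type I error. For other values of p0 and much larger n, the powers p0**j can underflow; working with logarithms would then be more robust.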
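Is the defendant guilty or not guilty?
• Jury decision H1 (guilty) when the truth is H0 (not guilty): Type I error.
• Jury decision H0 (not guilty) when the truth is H1 (guilty): Type II error.
• Jury decision that matches the truth: correct decision.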
Control of Type I and Type II errors
Given the results obtained in the sample, we calculate the probability of a Type I error (the p-value).
If this probability is relatively small (for example, p-value < 5%), we reject H0 and make the decision H1. If not, we do not reject H0.
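P-value
The p-value measures how much confidence you should have in H0: the smaller it is, the stronger the evidence against H0.
• A small p-value means that the observed result would be unlikely if H0 were true, so you should be less confident in H0.
• How small should the p-value be before you reject H0 in favour of H1?
• It depends on you… (on the risk of a Type I error you are willing to take).
• Illustration: p-value.xls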
Real life analog
One of your friends has just lied to you. Is he still your friend?
• Then he lies again, and again, and again?
• When will you stop considering him/her a friend?
Control of Type I and Type II errors (continued)
For a Type I error probability fixed in advance (e.g. 5%), we control the Type II error through the choice of sample size, before undertaking the study.
We define the power of the hypothesis test as the quantity:
power = 1 − probability of a Type II error
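A minimal Python sketch of how the sample size controls the Type II error, for the right-tailed test of a proportion H0: p ≤ p0 vs H1: p > p0 at level α, using the large-sample normal approximation. The « true » proportion p1 = 52% below is purely hypothetical, and the function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0,1)

def power_right_tailed(p0, p1, n, alpha=0.05):
    """Power = 1 - P(Type II error) of the large-sample z test of
    H0: p <= p0 vs H1: p > p0 when the true proportion is p1."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    # H0 is rejected when the sample proportion exceeds this cutoff.
    cutoff = p0 + z_crit * math.sqrt(p0 * (1 - p0) / n)
    # If the true proportion is p1, the sample proportion is approx. N(p1, p1(1-p1)/n).
    se1 = math.sqrt(p1 * (1 - p1) / n)
    return 1 - STD_NORMAL.cdf((cutoff - p1) / se1)

def smallest_n(p0, p1, target_power=0.80, alpha=0.05):
    """Smallest sample size whose power reaches target_power (linear search)."""
    if p1 <= p0:
        raise ValueError("p1 must exceed p0 for a right-tailed test")
    n = 1
    while power_right_tailed(p0, p1, n, alpha) < target_power:
        n += 1
    return n

# Hypothetical scenario: true support is 52%, H0: p <= 50%, alpha = 5%
for n in (1000, 5000):
    print(n, round(power_right_tailed(0.50, 0.52, n), 3))
print("n needed for 80% power:", smallest_n(0.50, 0.52))
```

When p1 equals p0, the power reduces to α, and it grows with n whenever p1 > p0: this is why the sample size is chosen before the study.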
In the next few hours, we will see three basic statistical tests:
1. Test of a proportion.
2. Test of a mean.
3. Test of a difference between two means from the same sample (which reduces to test 2 applied to the differences).
1. Test of a proportion
Example: two years ago, a company put a new product on the market. The top management of the firm plans to increase advertising expenses if less than 70% of the population knows the product.
What are the possible hypotheses we want to examine?
Let « p » be the true proportion of individuals in the population who know the product and « p0 » the value which corresponds to our hypothesis or decision making (p0 = 70% in the previous example). We have to choose between:
• H0: p ≤ p0 vs H1: p > p0 (right-tailed test)
• H0: p ≥ p0 vs H1: p < p0 (left-tailed test)
• H0: p = p0 vs H1: p ≠ p0 (two-tailed test)

One must choose the hypothesis H1
so that the answer to the question is
yes or no.

In this case, the question is: should
we increase advertising expenses?
• H0: p ≤ 70% vs H1: p > 70%
If H1 is accepted, the answer is no. If H0 is accepted, the answer is « NYES! » (p could be exactly 70%, so the answer is neither yes nor no).
H1: p > 70% is not appropriate.
• H0: p = 70% vs H1: p ≠ 70%
If H0 is accepted, the answer is no. If H1 is accepted, the answer is « NYES! » (p could be above or below 70%).
H1: p ≠ 70% is not appropriate.
• H0: p ≥ 70% vs H1: p < 70%
If H0 is accepted, the answer is no.
If H1 is accepted, the answer is yes!
H1: p < 70% is the appropriate choice.
Procedure:
We take a sample of n individuals from the target population and calculate the proportion of individuals who know the product.
We will reject the null hypothesis H0 at the α level if we have sufficient evidence against it, i.e. enough evidence in favour of the alternate hypothesis H1, that is, if p-value < α.
The test statistic is given by:
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$
where $\hat{p}$ is the sample proportion. If the null hypothesis H0 is true and the sample size is large, the statistic z will approximately follow a normal distribution with mean 0 and variance 1 [denoted N(0,1)].
In order to make a decision, we calculate the p-value:
• Right-tailed test: p-value = Prob[N(0,1) > z]
• Left-tailed test: p-value = Prob[N(0,1) < z]
• Two-tailed test: p-value = 2 x Prob[N(0,1) > |z|]
The p-value is calculated with proportion-1t.xls
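The same computation can be sketched in Python instead of the spreadsheet. This is a minimal version under the large-sample normal approximation; the function names are illustrative and are not taken from proportion-1t.xls. The upper tail of N(0,1) is computed with `math.erfc`, which stays accurate for very large z. The calls at the end use the three sample results discussed in these slides (330 of 500, 517 of 1000 and 612 of 1000).

```python
import math

def normal_upper_tail(x):
    """Prob[N(0,1) > x]; erfc avoids cancellation in the far right tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def proportion_z_test(successes, n, p0, alternative):
    """Large-sample z test of a proportion.

    alternative: "greater"   -> H1: p > p0 (right-tailed)
                 "less"      -> H1: p < p0 (left-tailed)
                 "two-sided" -> H1: p != p0 (two-tailed)
    Returns (z, p_value).
    """
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    if alternative == "greater":
        p_value = normal_upper_tail(z)           # Prob[N(0,1) > z]
    elif alternative == "less":
        p_value = normal_upper_tail(-z)          # Prob[N(0,1) < z]
    elif alternative == "two-sided":
        p_value = 2 * normal_upper_tail(abs(z))  # 2 x Prob[N(0,1) > |z|]
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value

examples = [
    ("product awareness, H1: p < 70%", 330, 500, 0.70, "less"),
    ("51.7% yes, H1: p > 50%", 517, 1000, 0.50, "greater"),
    ("61.2% yes, H1: p > 50%", 612, 1000, 0.50, "greater"),
]
for label, k, n, p0, alt in examples:
    z, p = proportion_z_test(k, n, p0, alt)
    decision = "reject H0" if p < 0.05 else "do not reject H0"
    print(f"{label}: z = {z:.4f}, p-value = {p:.4g} -> {decision} at 5%")
```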
The company contacted 500 people from the target population by telephone.
• 330 individuals answered that they know the product (330/500 = 66%).
• H0: p ≥ 70% vs H1: p < 70%
$$z = \frac{0.66 - 0.70}{\sqrt{0.70(1 - 0.70)/500}} = -1.9518$$
• p-value = 0.0255
• We reject H0 (or accept H1) at the 5% level.
• Therefore we will make the decision to raise the advertising budget for this product.
Intentions to vote example:
We choose at random 1000 residents of Quebec who have the right to vote and ask them whether they would vote yes today in a referendum on sovereignty. In the sample, 517 voters answered that they would vote yes.
• H0: p ≤ 50% vs H1: p > 50%
$$z = \frac{0.517 - 0.5}{\sqrt{0.5(1 - 0.5)/1000}} = 1.0752$$
• p-value = 0.1411
• We will not reject H0 at the 5% level.
• Bernard Landry will not hold a referendum in the near future.
Intentions to vote example:
We choose at random 1000 residents of Quebec who have the right to vote and ask them whether they would vote yes today in a referendum on sovereignty. In the sample, 612 voters answered that they would vote yes.
• H0: p ≤ 50% vs H1: p > 50%
$$z = \frac{0.612 - 0.5}{\sqrt{0.5(1 - 0.5)/1000}} = 7.0835$$
• p-value ≈ 7.0E-13
• We will reject H0 at the 5% level.
• Bernard Landry will hold a referendum in the near future.
Exercise

Recall the last example in the
estimation section.

Can you now answer the question
satisfactorily?
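Remark: Test vs Confidence interval
Testing H0: p = p0 vs H1: p ≠ p0 at the α level is (for large samples, approximately, since the test uses p0 and the interval uses the sample proportion in the standard error) equivalent to constructing a (1 − α) confidence interval for p:
• H0 is rejected if and only if p0 is not in the interval.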
2. Test of one mean
Example: you are in charge of the department that manufactures and produces 170 g bags of chips (brand CCC). To check whether the filling process is maintained at 170 g on average, each day one of your employees takes a random sample of 100 bags and calculates the average weight of the sample. The filling process will be stopped if the average weight is significantly different from 170 g.
What are the possible hypotheses we want to examine?
Let « μ » be the true mean of a characteristic in the population. This mean is unknown, as is the variance σ². Let « μ0 » be the value of the mean which corresponds to our hypothesis or decision making (μ0 = 170 g in the previous example). We have to choose between:
• H0: μ ≤ μ0 vs H1: μ > μ0 (right-tailed test)
• H0: μ ≥ μ0 vs H1: μ < μ0 (left-tailed test)
• H0: μ = μ0 vs H1: μ ≠ μ0 (two-tailed test)
Procedure:
We take a sample of size n from the target population and calculate the mean $\bar{X}$ and the standard deviation s.
We will reject the null hypothesis H0 at the α level if we have sufficient evidence against it, i.e. enough evidence in favour of the alternate hypothesis H1, that is, if p-value < α.
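The test statistic is given by:
$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$
If the null hypothesis H0 is true, the t statistic will follow a Student distribution with n − 1 degrees of freedom [denoted t(n−1)] (exactly if the characteristic is normally distributed in the population, approximately if n is large).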
In order to make a decision, we calculate the p-value:
• Right-tailed test: p-value = Prob[t(n−1) > t]
• Left-tailed test: p-value = Prob[t(n−1) < t]
• Two-tailed test: p-value = 2 x Prob[t(n−1) > |t|]
(1 − α) confidence interval for μ:
$$\bar{X} \pm t_{(n-1);\,\alpha/2}\,\frac{s}{\sqrt{n}}$$
where $t_{(n-1);\,\alpha/2}$ is the value exceeded with probability α/2 by a t(n−1) variable.
The p-value is calculated using mean-1t.xls
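Without a spreadsheet, the p-value and the confidence interval can be computed from the summary statistics alone. The sketch below is a minimal standard-library Python version; because the standard library has no Student distribution, it evaluates the t(n−1) tail through the regularized incomplete beta function (continued-fraction method) and finds the critical value by bisection. All function names are hypothetical, and this is not how mean-1t.xls works internally; in practice a statistics library such as SciPy (`scipy.stats.t`) would be the usual choice.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction at step m.
        for aa in (m * (b - m) * x / ((a + m2 - 1.0) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def t_upper_tail(t, df):
    """Prob[t(df) > t] for Student's t distribution with df degrees of freedom."""
    half = 0.5 * _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return half if t >= 0 else 1.0 - half

def t_critical(upper_prob, df):
    """Value c > 0 with Prob[t(df) > c] = upper_prob (0 < upper_prob < 0.5)."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > upper_prob:  # widen until the root is bracketed
        hi *= 2.0
    for _ in range(100):                      # bisection
        mid = 0.5 * (lo + hi)
        if t_upper_tail(mid, df) > upper_prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def one_sample_t_test(x_bar, s, n, mu0, conf=0.95):
    """Two-tailed t test of H0: mu = mu0 from summary statistics, plus the CI for mu."""
    se = s / math.sqrt(n)
    t = (x_bar - mu0) / se
    df = n - 1
    p_value = 2.0 * t_upper_tail(abs(t), df)
    half_width = t_critical((1.0 - conf) / 2.0, df) * se
    return t, p_value, (x_bar - half_width, x_bar + half_width)

# The two bags-of-chips samples (n = 100, s = 0.27, mu0 = 170 g)
for x_bar in (169.9, 170.011):
    t, p, (lo, hi) = one_sample_t_test(x_bar, 0.27, 100, 170.0)
    print(f"mean = {x_bar}: t = {t:.4f}, p-value = {p:.4f}, "
          f"95% CI = [{lo:.3f} ; {hi:.3f}]")
```

The p-value and the interval always lead to the same two-tailed decision here, because both are built from the same t(n−1) distribution and the same standard error.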
Example:
The sample mean of the 100 bags of chips is 169.9 grams and the standard deviation is s = 0.27.
• H0: μ = 170 g vs H1: μ ≠ 170 g
• p-value = 0.0003
• We reject H0 with very little risk of being wrong!
• 95% confidence interval for μ: [169.846 ; 169.953]
• The interval does not contain the value 170
• We reject H0 at the 5% level
If the mean of the sample of 100 bags of chips is 170.011 grams and the standard deviation is s = 0.27:
• H0: μ = 170 g vs H1: μ ≠ 170 g
• p-value = 0.69
• We will not reject H0
• 95% confidence interval for μ: [169.957 ; 170.064]
• The interval contains the value 170, so we will not reject H0 at the 5% level
Case study
The average annual salary of a group of employees in a city is 45 000$. One of the main issues in the negotiations is that the union representative claims this particular group is paid much less than in other comparable cities.
One decides to verify that claim. If the union is right, the employer will increase salaries so that the average salary is no longer significantly lower than in the other cities. Both parties agree to take a risk of 5%.
Case study (continued)
To perform the comparison, 50 comparable cities were chosen at random; the mean of the 50 (average) annual salaries was 50 000$, and the standard deviation was 16 000$.
a) What is the conclusion?
b) The city proposes to increase the average annual salary to 46 500$. Is it honest?
Remark: Test vs Confidence interval
Testing H0: μ = μ0 vs H1: μ ≠ μ0 at the α level is equivalent to constructing a (1 − α) confidence interval for μ:
• H0 is rejected if and only if μ0 is not in the interval.
3. Test of a difference of two means from the same sample
Example: the human resources director of a company wants to suggest that management implement a special training program for the employees assigned to the assembly department. To evaluate the effectiveness of this 3-week program, we randomly chose 15 employees and observed the number of parts each assembled over a 3-week period. These 15 employees then took part in the training program and, once again, we observed the number of parts assembled over a period of the same length.
The results obtained (hr.xls) were as follows (difference = after − before):
Individual:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Before:      15  13   8   9   7  12  11  12  11   9  10  12  11   7  12
After:       17  16  10   9   9  13  14  15  14  11  14  11  13  10  13
Difference:   2   3   2   0   2   1   3   3   3   2   4  -1   2   3   1
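The differences and the summary statistics that Excel reports for this test can be reproduced with Python's standard `statistics` module. This minimal sketch computes the mean difference, its standard deviation and the t statistic for H0: mean difference = 0; the p-value would then come from a t(14) distribution, as in the spreadsheet. The variable names are illustrative.

```python
import math
from statistics import mean, stdev

before = [15, 13, 8, 9, 7, 12, 11, 12, 11, 9, 10, 12, 11, 7, 12]
after = [17, 16, 10, 9, 9, 13, 14, 15, 14, 11, 14, 11, 13, 10, 13]

# Paired design: work with the difference (after - before) for each employee.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
d_bar = mean(diffs)   # sample mean of the differences
s_d = stdev(diffs)    # sample standard deviation (n - 1 in the denominator)

# One-sample t statistic for H0: mean difference = 0, with n - 1 degrees of freedom
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))

print("differences:", diffs)
print(f"mean = {d_bar}, s = {s_d:.3f}, t = {t_stat:.3f}, df = {n - 1}")
# t is about 5.916 with 14 degrees of freedom
```

Treating the 15 differences as a single sample is what makes this a test of one mean: the employee-to-employee variation cancels out within each pair.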
The results of the statistical analysis using Excel were as follows:
This test is equivalent to a test of the mean difference between after and before:
T test for a mean (unknown sigma)
X-bar = 2; Mu0 = 0; n = 15; s = 1.309; t statistic = 5.916
2-tailed test: p-value = 0.0000; confidence level = 95.0%; CI = [1.3 ; 2.7]
p-value for H1: Mu > Mu0 = 0.0000
p-value for H1: Mu < Mu0 = 1.0000
Thus, the average productivity is significantly higher after the program. If the cost of the training program is lower than the gain in productivity, the program will be adopted.