Five types of statistical analysis
Descriptive
What are the characteristics of the respondents?
Inferential
What are the characteristics of the population?
Differences
Are two or more groups the same or different?
Associative
Are two or more variables related in a systematic way?
Predictive
Can we predict one variable if we know one or more
other variables?
General Procedure for
Hypothesis Test
1. Formulate H0 (null hypothesis) and H1
(alternative hypothesis)
2. Select appropriate test
3. Choose level of significance
4. Calculate the test statistic (SPSS)
5. Determine the probability associated with
the statistic.
• Determine the critical value of the test
statistic.
6 a) Compare with the level of significance, α
b) Determine if the test statistic falls in the
rejection region. (check tables)
7 Reject or do not reject H0
8 Draw a conclusion
1. Formulate H1 and H0
• The hypothesis the researcher wants to test is called
the alternative hypothesis H1.
• The opposite of the alternative hypothesis is the null
hypothesis H0 (the status quo)(no difference between
the sample and the population, or between samples).
• The objective is to DISPROVE the null hypothesis.
• The significance level is the critical probability used to
choose between the null hypothesis and the
alternative hypothesis
2. Select Appropriate Test
• The selection of a proper Test depends on:
– Scale of the data
• nominal
• interval
– the statistic you seek to compare
• Proportions (percentages)
• means
– the sampling distribution of such statistic
• Normal Distribution
• T Distribution
• χ² (Chi-square) Distribution
– Number of variables
• Univariate
• Bivariate
• Multivariate
– Type of question to be answered
Example
A tire manufacturer believes that men are more aware of their
brand. To find out, a survey is conducted of 100 customers, 65
of whom are men and 35 of whom are women.
The question they are asked is:
Are you aware of our brand: Yes or No. 50 of the men were
aware and 15 were not whereas 10 of the women were aware
and 25 were not.
Are these differences significant?
          Aware   Unaware   Total
Men       50      15        65
Women     10      25        35
1. Formulate H1 and H0
We want to know whether brand awareness is
associated with gender. What are the hypotheses?
H0: There is no difference in brand awareness based on gender
H1: There is a difference in brand awareness based on gender
2. Select Appropriate Test
χ² (Chi-Square)
• Used to discover whether 2 or more groups of one variable
(dependent variable) vary significantly from each other with
respect to some other variable (independent variable).
• Are the two variables of interest associated:
– Do men and women differ with respect to product usage
(heavy, medium, or light)
– Is the preference for a certain flavor (cherry or lemon) related
to the geographic region (north, south, east, west)?
H0: Two variables are independent (not associated)
H1: Two variables are not independent (associated)
• Must be nominal level, or, if interval or ratio must be divided into
categories
Awareness of Tire Manufacturer’s Brand (observed / estimated frequency)
            Men      Women    Total
Aware       50/39    10/21    60
Unaware     15/26    25/14    40
Total       65       35       100
Estimated cell frequency:
$E_{ij} = \frac{R_i C_j}{n}$
Ri = total observed frequency in the ith row
Cj = total observed frequency in the jth column
n = sample size
Eij = estimated cell frequency
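The estimated cell frequencies can be checked with a few lines of Python. This is only a sketch: the function name `expected_frequencies` and the row/column layout of the list are choices made here for illustration, not part of the slides.

```python
# Observed counts from the brand-awareness survey.
# Rows: Aware, Unaware. Columns: Men, Women.
observed = [[50, 10],
            [15, 25]]

def expected_frequencies(table):
    """Return E[i][j] = R_i * C_j / n for a table of observed counts."""
    row_totals = [sum(row) for row in table]          # R_i
    col_totals = [sum(col) for col in zip(*table)]    # C_j
    n = sum(row_totals)                               # sample size
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
for label, row in zip(["Aware", "Unaware"], expected):
    print(label, row)
```

The printed rows should match the second number in each cell of the table (39 and 21 for Aware, 26 and 14 for Unaware).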
3. Choose Level of Significance
• Whenever we draw inferences about a population, there is
a risk that an incorrect conclusion will be reached
• The real question is how strong the evidence in favor of the
alternative hypothesis must be to reject the null
hypothesis.
• The significance level states the probability of incorrectly
rejecting H0. This error is commonly known as a Type I
error. The value of α is called the significance level of
the test
• In the example a Type I error would be committed if we
said that
There is a difference between men and women with respect
to brand awareness when in fact there was no difference
• Significance Level selected is typically .05 or .01
• i.e. 5% or 1%
• In other words we are willing to accept the risk
that 5% (or 1%) of the time the results we get
indicate that there is a difference between men
and women with respect to brand awareness when
in fact there is no difference
• We commit a Type II error when we
incorrectly accept a null hypothesis when it
is false. The probability of committing a Type
II error is denoted by β.
• In our example we commit a Type II error
when we say that
there is NO difference between men and women
with respect to brand awareness (we accept the
null hypothesis) when in fact there is a difference
Type I and Type II Errors
               Null is true          Null is false
Accept null    Correct (no error)    Type II error
Reject null    Type I error          Correct (no error)
Which is worse?
• Both are serious, but traditionally Type I error has
been considered more serious, that’s why the
objective of hypothesis testing is to reject H0 only
when there is enough evidence against it.
• Therefore, we choose α to be as small as possible
without compromising β.
• Increasing the sample size for a given α will
decrease β (i.e. the probability of accepting the null hypothesis when it is
in fact false)
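To see how β shrinks as the sample grows, here is a minimal sketch for a simplified case that is not in the slides: a one-sided z-test of H0: μ = 0 with known standard deviation. The effect size of 0.5, the σ of 1.0 and the function name `type_ii_error` are all assumptions made for illustration.

```python
from statistics import NormalDist

def type_ii_error(n, alpha=0.05, effect=0.5, sigma=1.0):
    """beta for a one-sided z-test of H0: mu = 0 when the true mean is `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # critical value for the chosen alpha
    # Probability that the test statistic stays below the critical value
    # even though the null hypothesis is false.
    return std_normal.cdf(z_crit - effect * n ** 0.5 / sigma)

betas = {n: type_ii_error(n) for n in (10, 20, 40, 80)}
for n, beta in betas.items():
    print(n, round(beta, 3))
```

With α held at 0.05, each doubling of n gives a smaller β in this setup.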
4. Calculate the Test Statistic
Chi-Square Test: Differences Among Groups
$\chi^2 = \frac{(50 - 39)^2}{39} + \frac{(10 - 21)^2}{21} + \frac{(15 - 26)^2}{26} + \frac{(25 - 14)^2}{14}$
$\chi^2 = 3.102 + 5.762 + 4.654 + 8.643 = 22.161$
d.f. = (R - 1)(C - 1)
d.f. = (2 - 1)(2 - 1) = 1
Chi-square test results are unstable if the expected count in any cell is lower than 5
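The same arithmetic can be reproduced in Python as a quick check. The sketch below sums (observed - expected)² / expected over the four cells; the helper name `chi_square` is introduced here for illustration.

```python
# Cell order: aware men, aware women, unaware men, unaware women.
observed = [50, 10, 15, 25]
expected = [39, 21, 26, 14]

def chi_square(obs, exp):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

chi2 = chi_square(observed, expected)

rows, cols = 2, 2
df = (rows - 1) * (cols - 1)   # degrees of freedom for a 2 x 2 table

print(round(chi2, 3), df)
```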
Degrees of Freedom
The number of values in the final calculation of a statistic that
are free to vary.
For example, to calculate the standard deviation of a random
sample, we must first calculate the mean of that sample and
then compute the sum of the squared deviations from that mean.
While there will be n such squared deviations, only (n - 1) of them
are free to assume any value whatsoever.
This is because the final squared deviation from the mean must
include the one value of X such that the sum of all the Xs divided by
n will equal the obtained mean of the sample.
All of the other (n - 1) squared deviations from the mean can,
theoretically, have any values whatsoever.
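A small numeric illustration may help. The four sample values below are made up for the purpose; the point is that the deviations from the mean always sum to zero, so once n - 1 of them are known the last one is fixed.

```python
sample = [4, 7, 1, 8]                      # hypothetical data
n = len(sample)
mean = sum(sample) / n

deviations = [x - mean for x in sample]

# The last deviation is determined by the other n - 1 deviations.
last_from_others = -sum(deviations[:-1])

# Sample variance divides by the n - 1 free deviations, not by n.
sample_variance = sum(d ** 2 for d in deviations) / (n - 1)

print(deviations, last_from_others, sample_variance)
```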
5. Determine the Probability-value (Critical Value)
• The p-value is the probability of seeing a random
sample at least as extreme as the sample observed given
that the null hypothesis is true.
• Given the value of alpha, α, we use statistical theory to
determine the rejection region.
• If the sample falls into this region we reject the null
hypothesis; otherwise, we do not reject it
• Sample evidence that falls into the rejection region is
called statistically significant at the alpha level.
Significance from p-values -continued
• How small is a “small” p-value? This is largely a
matter of semantics but if the
– p-value is less than 0.01, it provides “convincing”
evidence that the alternative hypothesis is true;
– p-value is between 0.01 and 0.05, there is
“strong” evidence in favor of the alternative
hypothesis;
– p-value is between 0.05 and 0.10, it is in a “gray
area”;
– p-values greater than 0.10 are interpreted as
weak or no evidence in support of the alternative.
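The rule of thumb above can be written as a small lookup. This is a sketch only: the function name `describe_p_value` is invented here, and how to treat a p-value that lands exactly on 0.01, 0.05 or 0.10 is an assumption, since the slide does not say.

```python
def describe_p_value(p):
    """Map a p-value to the informal evidence labels used above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.01:
        return "convincing"
    if p < 0.05:        # assumption: exactly 0.01 counts as "strong"
        return "strong"
    if p <= 0.10:       # assumption: exactly 0.05 and 0.10 count as "gray area"
        return "gray area"
    return "weak or none"

for p in (0.004, 0.03, 0.07, 0.40):
    print(p, describe_p_value(p))
```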
5. Determine the Probability-value (Critical Value)
Chi-square Test for Independence
Under H0, the test statistic is approximately
distributed as the Chi-square distribution (χ²).
[Figure: chi-square distribution with the "Reject H0" region to the right of the critical value 3.84; the observed statistic 22.16 lies in the rejection region]
χ² with 1 d.f. at .05 critical value = 3.84
6 a) Compare with the level of significance, α
b) Determine if the test statistic falls in the rejection
region. (check tables)
22.16 is greater than 3.84 and falls in the rejection area
In fact it is significant at the .001 level, which means that, if
our variables were independent, the chance of picking a sample
this extreme would be less than 1/1000
7 Reject or do not reject H0
Since 22.16 is greater than 3.84 we reject the null hypothesis
8 Draw a conclusion
Men and women differ with respect to brand awareness,
specifically, men are more brand aware than women
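Steps 5 to 7 for this example can be sketched without statistical tables. With 1 degree of freedom a chi-square variable is the square of a standard normal variable, so its upper-tail probability can be written with the complementary error function. The helper name `chi2_sf_1df` is introduced here, and the shortcut applies only to 1 d.f.

```python
import math

def chi2_sf_1df(x):
    """P(chi-square with 1 d.f. > x), using chi2_1 = Z**2 for standard normal Z."""
    return math.erfc(math.sqrt(x / 2))

chi2_stat = 22.161        # test statistic from step 4
critical_value = 3.84     # tabled value for 1 d.f. at the .05 level

p_value = chi2_sf_1df(chi2_stat)
reject_h0 = chi2_stat > critical_value

print(p_value, reject_h0)
print(chi2_sf_1df(critical_value))   # close to .05, as the table implies
```

Comparing the statistic with the critical value and comparing the p-value with α lead to the same decision.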
Example 2:
• The manager of Pepperoni Pizza Restaurant has
recently begun experimenting with a new method
of baking its pepperoni pizzas.
• He believes that the new method produces a
better-tasting pizza, but he would like to base a
decision on whether to switch from the old method
to the new method on customer reactions.
• Therefore he performs an experiment.
The Experiment
• For 40 randomly selected customers who order a
pepperoni pizza for home delivery, he includes both
an old style and a free new style pizza in the order.
• All he asks is that these customers rate the difference
between pizzas on a -10 to +10 scale, where -10
means they strongly favor the old style, +10 means
they strongly favor the new style, and 0 means they
are indifferent between the two styles.
[Figure: rating scale running from -10 (favors old pizza) through 0 (indifferent) to +10 (favors new pizza)]
1. Formulate H1 and H0
One-Tailed Versus Two-Tailed Tests
• The form of the alternative hypothesis can be either a
one-tailed or two-tailed, depending on what you are
trying to prove.
• A one-tailed hypothesis is one where the only sample
results which can lead to rejection of the null hypothesis
are those in a particular direction, namely, those where
the sample mean rating is positive.
• A two-tailed test is one where results in either of two
directions can lead to rejection of the null hypothesis.
1. Formulate H1 and H0
One-Tailed Versus Two-Tailed Tests -- continued
• Once the hypotheses are set up, it is easy to detect
whether the test is one-tailed or two-tailed.
• One-tailed alternatives are phrased in terms of “>” or
“<” whereas two-tailed alternatives are phrased in
terms of “≠”
• The real question is whether to set up hypotheses for
a particular problem as one-tailed or two-tailed.
• There is no statistical answer to this question. It
depends entirely on what we are trying to prove.
1. Formulate H1 and H0
• As the manager you would like to observe a
difference between both pizzas
• If the new baking method is cheaper, you would
like the preference to be for it.
– Null hypothesis H0: μ = 0 (there is no difference
between the old style and the new
style pizzas) (the difference between
the mean of the sample and the mean
of the population is zero)
– Alternative hypothesis (μ = population mean):
H1: μ ≠ 0 (two-tailed test)
or
H1: μ > 0 (one-tailed test)
2. Select Appropriate Test
The one-sample t test is used to test whether the
mean of the data sample is equal to a hypothesized
value of the population from which the sample is
drawn.
What we want to test is whether consumers prefer the
new style pizza to the old style. We assume that there
is no difference (i.e. the mean of the population is
zero) and want to know whether our observed result is
significantly (i.e. statistically) different.
Type I Error
Rejecting the null hypothesis that the pizzas are
equal, when they really are perceived equal by the
customers of the entire population.
Type II error
Not rejecting the null hypothesis that the pizzas are
equal, when they are perceived to be different by the
customers of the entire population.
3. Choose Level of Significance
Significance Level selected is typically .05 or
.01
• i.e. 5% or 1%
The ratings of the 40 randomly selected customers
produce the following summary statistics.
From the summary statistics, we see that the sample
mean is 2.10 and the sample standard deviation is 4.717
Summary Statistics
The positive sample mean provides some
evidence in favor of the alternative
hypothesis, but given the rather large
standard deviation does it provide enough
evidence to reject H0?
4. Calculate the Test Statistic
$t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} \sim T(n-1)$
$t\text{-value} = \frac{2.10 - 0}{4.717 / \sqrt{40}} = 2.816$
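The t-value can be reproduced from the summary statistics alone. The sketch below uses only the numbers given above; the function name `one_sample_t` is an illustrative choice.

```python
import math

def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

t_value = one_sample_t(sample_mean=2.10, hypothesized_mean=0.0,
                       sample_sd=4.717, n=40)
df = 40 - 1   # degrees of freedom, n - 1

print(round(t_value, 3), df)
```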
5. Determine the Probability-value (Critical Value)
• We use the right tail because the alternative is
one-tailed of the “greater than” variety
• The probability beyond this value in the right
tail of the t distribution with n-1 = 39 degrees
of freedom is approximately 0.004
• The probability, 0.004, is the p-value for the
test. It indicates that these sample results
would be very unlikely if the null hypothesis
is true.
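The right-tail probability can be approximated without tables or a statistics package. The sketch below integrates the t density numerically with Simpson's rule; the function names and the number of integration steps are choices made here, and a statistical library would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_right_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the Simpson's-rule integral over [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

p_one_tailed = t_right_tail(2.816, 39)
p_two_tailed = 2 * p_one_tailed        # for a two-tailed alternative

print(round(p_one_tailed, 4), round(p_two_tailed, 4))
```

The one-tailed value should land near the 0.004 quoted above; the two-tailed value is simply twice that.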
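6. Compare with the level of significance, α (.05) and
determine if the test statistic falls in the rejection region
[Figure: t distribution with a "Do not reject H0" region of area 1 - α between the critical values -2.074 and 2.074 and "Reject H0" regions of area α/2 in each tail; the test statistic 2.816 lies in the right-hand rejection region]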
7. Reject or do not reject H0
Since the statistic falls in the rejection area we reject H0
and conclude that the perceived difference between the
pizzas is significantly different from zero.
8 Conclusion
• Should the manager switch to the new-style pizza on
the basis of these sample results?
• We would probably recommend “yes”. There is no
indication that the new-style pizza costs any more to
make than the old-style pizza, and the sample
evidence is fairly convincing that customers, on
average, will prefer the new-style pizza.
• Therefore, unless there are reasons for not switching
(for example, costs) then we recommend the switch.
Example 3
• Suppose you are the brand manager for Tylenol,
and a recent TV ad tells the consumers that Advil
is more effective (quicker) at treating headaches
than Tylenol.
• An independent random sample of 400 people with
a headache is given Advil, and 260 people report
they feel better within an hour.
• Another independent sample of 400 people is taken
and 252 people that took Tylenol reported feeling
better.
• Is the TV ad correct? Or, in other words, is there a
difference between the means of the two samples?
Hypothesis Test for Two
Independent Samples
• Test for mean difference:
– Null hypothesis H0: μ1 = μ2
– Alternative hypothesis H1: μ1 ≠ μ2
– Under H0, μ1 - μ2 = 0. So, the test concludes whether there is
a difference between the parameters or not.
[Figure: graphical comparison of means. Are the means equal?]
2. Select Appropriate Test
In this example we have two independent samples
Other examples
• populations of users and non-users of a brand differ in perceptions
of the brand
• high income consumers spend more on the product than low income
consumers
• The proportion of brand-loyal users in Segment 1 is more than the
proportion in Segment 2
• The proportion of households with Internet in Canada exceeds that
in USA
• Can be used for examining differences between means and
proportions
2. Select Appropriate Test
The two populations are sampled and the means and
variances computed based on the samples of sizes n1 and n2
If both populations are found to have the same variance
then a t-statistic is calculated.
The comparison of means of independent samples assumes
that the variances are equal.
If the variances are not known an F-test is conducted to
test the equality of the variances of the two populations.
[Figure: F distribution]
Unequal variances: The problem
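Tylenol vs Advil
• We would need to test if the difference is
zero or not.
H0: π_A - π_T = 0;
H1: π_A - π_T ≠ 0
(π_A, π_T = population proportions; p_A, p_T = sample proportions)
p_A = 260/400 = 0.65
p_T = 252/400 = 0.63
$z = \frac{(0.65 - 0.63) - 0}{\sqrt{(0.65)(0.35)/400 + (0.63)(0.37)/400}} \approx 0.59$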
For large samples the t-distribution approaches the normal
distribution and so the t-test and the z-test are equivalent.
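The z statistic for the two sample proportions can be evaluated directly from the counts. This sketch uses the unpooled standard error that appears in the expression above; the function name `two_proportion_z` is an illustrative choice. Evaluating the expression as written gives a z of roughly 0.59; any value of that size is well below the 1.64 critical value, so the decision does not hinge on the second decimal.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z for H0: no difference between two proportions (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / standard_error

z = two_proportion_z(260, 400, 252, 400)          # Advil vs Tylenol

alpha = 0.10
z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # two-tailed critical value
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 3), round(z_crit, 3), round(p_two_sided, 3))
```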
Differences Between Groups
when Comparing Means
• Ratio scaled dependent variables
• t-test
– When groups are small
– When population standard deviation is
unknown
• z-test
– When groups are large
Degrees of Freedom
• d.f. = n - k
• where:
– n = n1 + n2
– k = number of groups
The degrees of freedom is (n1 + n2 –2)
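Tylenol vs Advil
α = 0.10; critical value from N(0,1) = 1.64
[Figure: standard normal curve with a non-rejection region of area 1 - α between -1.64 and 1.64 and rejection regions of area α/2 in each tail; the observed z falls between 0 and 1.64]
Since the test statistic is less than the critical value of 1.64 we do not reject the null
hypothesis: there is no significant difference between Advil and Tylenol users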
Test for Means Difference on
Paired Samples
What is a paired sample?
–When two sets of observations relate to the same
respondents
• When you want to measure brand recall before and after
an ad campaign.
• Shoppers consider brand name to be more important than
price
• Households spend more money on pizza than on
hamburgers
• The proportion of a bank’s customers who have a
checking account exceeds the proportion who have a
savings account
– Since it is the same population that is being sampled
the observations are not independent.
• Since the two samples are not independent we
employ the differences as a random sample
$d_i = x_{1i} - x_{2i}, \quad i = 1, 2, \ldots, n$
• The appropriate test is a paired t-test
Example
Q1. When purchasing golf clubs rate the importance 1-5 of
price
Q2. When purchasing golf clubs rate the importance 1-5 of
brand
H0
There is no difference in importance between brand and
price
H1 One tailed
Price is more important than brand
H1 Two Tailed
There is a difference in importance between
brand and price
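A paired t-test on the golf club questions works on the per-respondent differences. The ratings below are invented purely for illustration (the slides give no data), and the variable names are hypothetical.

```python
import math
from statistics import mean, stdev

# Hypothetical 1-5 importance ratings from the same six respondents.
price = [4, 5, 3, 4, 5, 4]   # Q1: importance of price
brand = [3, 4, 3, 2, 4, 3]   # Q2: importance of brand

# d_i = x_1i - x_2i for each respondent.
differences = [p - b for p, b in zip(price, brand)]
n = len(differences)

# One-sample t-test on the differences against a mean difference of zero.
t_value = mean(differences) / (stdev(differences) / math.sqrt(n))
df = n - 1

print(differences, round(t_value, 3), df)
```

The resulting t-value would then be compared with a one-tailed or two-tailed critical value, depending on which H1 is chosen.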
What is an ANOVA?
• ANOVA stands for Analysis of
Variance
• Purpose:
– Extends the test for mean difference between
two independent samples to multiple samples.
– Employed to analyze the effects of
manipulations (independent variables) on a
random variable (dependent).
Definitions
• Dependent variable: the variable we are
trying to explain, also known as response
variable (Y).
• Independent variable: also known as
explanatory variables or Factors (X).
• Research normally involves determining
whether the independent variable has an
effect on the variability of the dependent
variable
What does ANOVA test?
• The null hypothesis states that the means of all
the independent samples are equal
H0: μ1 = μ2 = μ3 = ... = μn
• The alternative hypothesis specifies that not all the
means are equal
H1: at least one μi differs from the others
Comparing Antacids
The maker of Acid-off, an antacid stomach remedy wants to
know which type of ad results in the most positive brand
attitude among consumers.
• Non comparative ad:
– Acid-off provides fast relief
• Explicit Comparative ad:
– Acid-off provides faster relief than Tums
• Non explicit comparative ad
– Acid-off provides the fastest relief
Three groups of people are exposed to one type of ad and
asked to rate their attitude towards the ad.
[Figure: brand attitude means by type of ad (non-comparative, explicit comparative, non-explicit comparative)]
The dependent variable is called the response variable and in
this case it is brand attitude
The independent variables are called factors, in this case
type of ad
And the different levels of the factor are called treatments.
In this case the treatments are each of the three types of ads:
non-comparative, explicit comparative, non-explicit
comparative.
There will be two sources of variation.
Variation within the treatment (e.g. within the non-comparative ad)
Variation between the treatments (i.e. between the three
types of ads)
Decomposition of the Total Variation

                 Independent Variable X
                 Categories                      Total Sample
                 X1    X2    X3    ....   Xc
                 Y1    Y1    Y1    ....   Y1     Y1
                 Y2    Y2    Y2    ....   Y2     Y2
                 ....
                 Yn    Yn    Yn    ....   Yn     Yn
Category Mean    Y1    Y2    Y3    ....   Yc     Y (Grand Mean)

Within Category Variation: SSwithin (down each category column)
Between Category Variation: SSbetween (across the category means)
Total Variation: SSy (over the total sample)
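The decomposition can be checked numerically: the total sum of squares about the grand mean equals the between-category plus the within-category sums of squares. The attitude scores below are made up for illustration; only the three ad types come from the slides.

```python
# Hypothetical brand attitude ratings for the three types of ad.
groups = {
    "non-comparative":          [4, 5, 6],
    "explicit comparative":     [6, 7, 8],
    "non-explicit comparative": [5, 6, 7],
}

all_values = [y for ys in groups.values() for y in ys]
grand_mean = sum(all_values) / len(all_values)

def group_mean(ys):
    return sum(ys) / len(ys)

# Total variation: every observation against the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_values)

# Between-category variation: category means against the grand mean.
ss_between = sum(len(ys) * (group_mean(ys) - grand_mean) ** 2
                 for ys in groups.values())

# Within-category variation: observations against their own category mean.
ss_within = sum((y - group_mean(ys)) ** 2
                for ys in groups.values() for y in ys)

print(ss_total, ss_between, ss_within)
```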
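ANOVA Test
• The null hypothesis would be tested with
the F distribution
• Degrees of Freedom
– total: cn - 1, where c = number of groups, n = number of
observations in a group
[Figure: F distribution with (c - 1) and (N - c) degrees of freedom; the "Reject H0" region is the right tail beyond the critical value]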
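The F statistic is the ratio of the between-category mean square to the within-category mean square, with (c - 1) and (N - c) degrees of freedom. A minimal sketch follows, using invented ratings for three ad types; the function name `one_way_anova_f` is an assumption made here, not something from the slides.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    c = len(groups)                               # number of groups
    total_n = sum(len(g) for g in groups)         # N, all observations
    grand_mean = sum(sum(g) for g in groups) / total_n

    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)

    df_between, df_within = c - 1, total_n - c
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical brand attitude ratings, one list per type of ad.
ratings = [[4, 5, 6], [6, 7, 8], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(ratings)

print(f_stat, df_between, df_within)
```

H0 is rejected when the F statistic exceeds the tabled critical value for those two degrees of freedom at the chosen α.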
– One way ANOVA investigates:
– Main effects
• factor has an across-the-board effect
• e.g., type of ad
• Or age
• or involvement
– A TWO-WAY ANOVA investigates:
– INTERACTIONS
• effect of one factor depends on another factor
• e.g., larger advertising effects for those with no
experience
• importance of price depends on income level and
involvement with the product