Copyright © 2003 by The McGraw-Hill Companies, Inc. All rights reserved.
Chapter 16
Data Analysis: Testing for Significant Differences
The Value of Testing for Differences in Data
Basic statistical techniques bring “structure” to the raw data captured by the research team.
Every data set needs to be summarized to discern what the entire set of responses means.
The output from statistical analysis can be displayed graphically, adding a fresh perspective to a decision-maker’s understanding of an information problem or market opportunity.
SPSS Applications Database
Easy-to-use software like SPSS for Windows has changed the way statistics is taught and learned:
• Class participants no longer have to learn a system of elaborate code to conduct data analysis.
• Data is entered, items are chosen from pull-down menus, and options can be “clicked” to create graphs and perform simple or complex analyses.
• Generally, SPSS for Windows has improved the quality of life for:
  1. Research teams applying statistics.
  2. Teachers teaching statistics.
  3. Students learning statistics.
Bar Charts: An Example
[Figure: bar chart of Frequency (vertical axis, 0 to 100) by Importance rating (horizontal axis, 1 to 6)]
Line Charts: An Example
[Figure: line chart; vertical axis 0 to 100, horizontal axis 1 to 6]
Pie Charts: An Example
[Figure: pie chart of importance ratings with six slices: Very Important, Important, Somewhat Important, Somewhat Unimportant, Unimportant, Very Unimportant. Slice percentages as printed: 26.3%, 19.7%, 6.6%, 6.6%, 23.4%, 17.1%]
Statistical Analysis Techniques
• Descriptive Statistics: Used by researchers to summarize sample data.
• Univariate Statistics: Used when a researcher investigates one variable at a time.
• Bivariate Statistics: Used when a researcher investigates two variables at a time.
• Multivariate Statistics: In a broad sense, any simultaneous analysis of more than two variables.
Descriptive Statistics
• This type of statistics describes sample data and often leads to subsequent analyses. Typical questions:
  • What is the average income of the sample?
  • How old is the average employee in Company X?
  • How different are the employees’ ages in Company X?
  • How spread out are the income data drawn as a sample of the population?
Descriptive Statistics – cont’d
Three Groups
• Central Tendency of the Variable
  • Mean
  • Median
  • Mode
• Dispersion
  • Range
  • Variation (or Standard Deviation)
  • Coefficient of Variation
• Shape of the Distribution
  • Skewness
  • Kurtosis
Measures of Central Tendency
• Average: A single value that is typical or representative of a group of numbers.
• An average is frequently referred to as a measure of central tendency. The most commonly used measures of central tendency are the mean, median, and mode. A measure of central tendency describes the center of a distribution.
Measures of Central Tendency – cont’d
• Mean: The most commonly used measure of central tendency. The sum of the values in a data set divided by the number of values in the set.
  • The computation of the mean is based on all values of a set of data.
• Median: The value of the middle item when the numbers are arranged in order of magnitude.
  • A positional average
  • Unlike the mean, not defined algebraically
  • In some cases cannot be computed exactly, unlike the mean
  • Is centrally located
Measures of Central Tendency – cont’d
• Mode: The value that occurs most frequently in the set. When there are two or more modes in a set of data, the data are called bimodal or multimodal.
  • The value with the highest frequency in the set of values.
  • By definition, is not affected by extreme values.
  • Is easy to compute with a set of discrete data.
  • The value of the mode may be greatly affected by the method of designating the class intervals.
Frequency Distribution
• Raw Data: The collected data that have not yet been organized numerically.
• Frequency: The number of times a value is repeated.
• Frequency Distribution: A summary of the number of times each value of a variable occurs, often expressed as a percentage of all observations.
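The definitions above can be illustrated with a short Python sketch. The ratings list and the helper name `frequency_distribution` are made up for illustration and do not come from the chapter's data set.

```python
from collections import Counter


def frequency_distribution(values):
    """Return {value: (count, percent)} for each distinct value, sorted by value."""
    counts = Counter(values)          # frequency: how often each value is repeated
    total = len(values)
    return {v: (c, 100.0 * c / total) for v, c in sorted(counts.items())}


# Hypothetical raw data: importance ratings on a 1-6 scale (illustrative only).
ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]
table = frequency_distribution(ratings)

for value, (count, pct) in table.items():
    print(f"rating {value}: n = {count} ({pct:.1f}%)")
```

The same counts are what a bar chart of frequencies would plot.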
Measures of Shapes of Frequency Distributions
• In addition to the averages and dispersions, two other measures are used in describing the characteristics of a group of data. These measures are skewness and kurtosis.
  • Skewness: Indicates the direction of an asymmetrical distribution, leaning either toward higher values or toward lower values.
  • Kurtosis: Indicates the relative peakedness of the frequency distribution.
Figure 16.5 Skewness of a Distribution
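As a rough sketch of how these two shape measures can be computed, the Python below uses the simple moment-based (population) formulas. This is one common convention, assumed here for illustration; statistical packages such as SPSS typically report sample-adjusted versions, so their output will differ slightly. The `right_tailed` data are hypothetical.

```python
def shape_measures(values):
    """Moment-based skewness and excess kurtosis (population formulas)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5            # > 0: long tail toward higher values
    excess_kurtosis = m4 / m2 ** 2 - 3   # < 0: flatter than a normal curve
    return skewness, excess_kurtosis


symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
right_tailed = [1, 1, 1, 2, 2, 3, 10]    # hypothetical, one large value

skew_sym, kurt_sym = shape_measures(symmetric)
skew_right, kurt_right = shape_measures(right_tailed)
print(skew_sym, kurt_sym)
print(skew_right, kurt_right)
```

A symmetric data set gives a skewness of zero, while a data set with a few unusually large values gives a positive skewness.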
Measures of Central Tendency
I. The Mean is the arithmetic average of the sample. For example, 5.5 is the mean of the following sample: (1,2,3,4,5,6,7,8,9,10)
II. The Median is the middle value of a rank-ordered distribution. For example, 5.5 is the median of the following sample: (1,2,3,4,5,6,7,8,9,10)
III. The Mode is the most frequently mentioned response, or number, in a data set. There is no mode in the following sample, because every value occurs exactly once: (1,2,3,4,5,6,7,8,9,10)
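The three results above can be checked with Python's standard `statistics` module. The `modes` helper and the `bimodal` list are illustrative additions; the helper follows the slide's convention that a data set in which no value repeats has no mode.

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Return the most frequent value(s); an empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                      # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(mean(sample))      # arithmetic average
print(median(sample))    # average of the two middle values, 5 and 6
print(modes(sample))     # no repeated value, so no mode

bimodal = [1, 2, 2, 3, 4, 4, 5]        # hypothetical data with two modes
print(modes(bimodal))
```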
Measures of Dispersion
• The Range is the distance between the largest and smallest values in a data set. For example, 9 is the range for the following data set: (1,2,3,4,5,6,7,8,9,10)
• The Variance is the average squared deviation about the mean of a distribution of values. What is the variance for the data set listed above?
• The Standard Deviation is the square root of the variance; it indicates the typical distance of the distribution values from the mean. What is the standard deviation for the data set listed above?
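One way to answer the two questions above is sketched below with Python's `statistics` module. Both conventions are shown because the answer depends on the divisor: the "average squared deviation" reading divides by n (population form), whereas the sample form divides by n - 1 and is the one statistical packages usually report. The coefficient of variation is included as the ratio of the standard deviation to the mean.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

data_range = max(sample) - min(sample)   # largest minus smallest value

pop_var = pvariance(sample)     # sum of squared deviations / n
pop_sd = pstdev(sample)         # square root of pop_var
sample_var = variance(sample)   # sum of squared deviations / (n - 1)
sample_sd = stdev(sample)       # square root of sample_var

coeff_of_variation = sample_sd / mean(sample)

print(data_range, pop_var, sample_var)
print(round(pop_sd, 3), round(sample_sd, 3), round(coeff_of_variation, 3))
```

For this data set the sum of squared deviations is 82.5, so the population variance is 8.25 and the sample variance is about 9.17.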
What are Univariate Statistics?
• Univariate Statistics: Statistics used when a researcher investigates only one variable at a time. More specifically, this type of statistics is used when only one measurement of each element in the sample is taken, or multiple measurements of each element are taken but each variable is analyzed independently.
• Univariate statistical techniques can be further broken down according to whether the data are nominal, ordinal, interval, or ratio scaled.
An Overview of Hypothesis Testing
• A Statistical Hypothesis (or simply Hypothesis): An assumption or informed guess made about a population characteristic. It can be defined as an unproven statement or proposition about something under investigation by a researcher.
• Before acting on any hypothesis, marketing managers test it to determine the likelihood of its being true. A hypothesis can either be rejected or not rejected.
An Overview of Hypothesis Testing (cont.)
• Null hypothesis: The hypothesis to be tested for possible acceptance or rejection. A null hypothesis is usually denoted by the symbol H0.
• Alternative hypothesis: An assumption believed to be true if the null hypothesis is false. An alternative hypothesis is denoted by H1. In a given test, there is usually only one null hypothesis, but there may be several alternative hypotheses.
Terminology
• Degrees of Freedom: The number of values in a set that are free to vary once certain conditions (such as a fixed mean) have been imposed.
• Statistical Significance: Differences in findings that are unlikely to have been caused by chance or sampling error alone.
Steps for Testing Hypotheses
1. State the null and alternative hypotheses.
2. Select a suitable test statistic and its distribution.
3. Select the level of significance and critical values.
4. State the decision rule.
5. Collect relevant data and perform the calculations.
6. Make a decision.
Figure 16.6 A General Procedure for Hypothesis Testing
Step 1: Formulate H0 and H1
Step 2: Select Appropriate Test
Step 3: Choose Level of Significance, α
Step 4: Collect Data and Calculate Test Statistic
Step 5: a) Determine Probability Associated with Test Statistic (TSCAL), or
        b) Determine Critical Value of Test Statistic (TSCR)
Step 6: a) Compare Probability with Level of Significance, α, or
        b) Determine if TSCAL falls into (Non) Rejection Region
Step 7: Reject or Do Not Reject H0
Step 8: Draw Marketing Research Conclusion
Procedure for Testing Hypotheses
Step 1: State the Null and Alternative Hypotheses
The null hypothesis is typically stated as there being no difference between two given values; that is, the difference is zero.
Step 2: Select a Suitable Test Statistic and Its Distribution
To decide on the appropriate test statistic, the researchers should consider the shape and characteristics of the sampling distribution.
• The test statistic is calculated from the sample data; its sampling distribution is used to decide whether we may reject the null hypothesis.
• Popular tests for hypotheses are the t-test, the F-test, and the chi-square goodness-of-fit test.
Step 3: Select the Level of Significance and Critical Values
Two Types of Problems, or Errors, Can Result
• Type I Error: The researcher rejects a null hypothesis that actually is true.
• Type II Error: The researcher accepts a null hypothesis that actually is not true.
Step 3: Select the Level of Significance and Critical Values (cont.)
• Level of Significance Specifying Type I Error (α): The maximum probability of making a Type I error specified in a hypothesis test. The level of significance is usually specified before a test is made.
• A value of 5% (α = 0.05) or 1% (α = 0.01) is frequently used to set the level of significance, although other values may also be used.
Step 3: Select the Level of Significance and Critical Values (cont.)
Two-Tailed Tests and One-Tailed Tests: The level of significance may be represented by a portion of the area under the normal curve in two ways:
1. Hypothesis tests in which the level of significance is represented by both tails under the normal curve are called two-tailed tests or two-sided tests.
2. If the level of significance is represented by only one tail, the tests are called one-tailed tests or one-sided tests.
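The difference between the two kinds of test shows up directly in the critical value. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `z_critical` is an illustrative choice, not a term from the chapter.

```python
from statistics import NormalDist


def z_critical(alpha, two_tailed):
    """Critical z value for a given level of significance (standard normal curve)."""
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


one_tailed_05 = z_critical(0.05, two_tailed=False)
two_tailed_05 = z_critical(0.05, two_tailed=True)
two_tailed_01 = z_critical(0.01, two_tailed=True)

print(round(one_tailed_05, 3))
print(round(two_tailed_05, 3))
print(round(two_tailed_01, 3))
```

At the same level of significance, the two-tailed critical value is larger in absolute terms because only half of α sits in each tail.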
Step 4: State the Decision Rule
• The decision rule specifies the conditions under which the null hypothesis may be rejected, given the sample results. It is based on the level of significance and is stated prior to data collection.
• Reject the null hypothesis if the difference between the sample mean and the hypothesized population mean falls into the rejection region. Otherwise, do not reject the null hypothesis.
Step 5: Collect Relevant Data and Perform the Calculations
Collect the relevant information.
Perform the calculations.
Step 6: Make a Decision
Refer back to the decision rule (Step 4). We reject the null hypothesis when the computed value falls in the rejection region, and do not reject it when the computed value falls in the acceptance (non-rejection) region.
Figure 16.8 Probability of z With a One-Tailed Test
[Figure: standard normal curve; chosen confidence level = 95%, chosen level of significance α = .05, critical value z = 1.645]
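The six steps can be walked through in a few lines of Python. This is a minimal sketch of an upper-tailed z-test, assuming the population standard deviation is known; the numbers (hypothesized mean 5.0, sample mean 5.4, σ = 1.2, n = 36) are invented for illustration and are not from the chapter.

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Upper-tailed z-test of H0: mu <= mu0 against H1: mu > mu0."""
    # Steps 1-2: hypotheses as above; test statistic z, standard normal distribution.
    # Step 3: level of significance alpha and the matching critical value.
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Step 4: decision rule -> reject H0 if the calculated z exceeds z_crit.
    # Step 5: perform the calculations.
    z_calc = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 1 - NormalDist().cdf(z_calc)
    # Step 6: make the decision.
    reject_h0 = z_calc > z_crit
    return z_calc, z_crit, p_value, reject_h0


# Hypothetical inputs, for illustration only.
z_calc, z_crit, p_value, reject_h0 = one_tailed_z_test(
    sample_mean=5.4, mu0=5.0, sigma=1.2, n=36
)
print(round(z_calc, 3), round(z_crit, 3), round(p_value, 4), reject_h0)
```

Here the calculated z of about 2.0 lies beyond the critical value of 1.645, so the null hypothesis is rejected at α = .05; comparing the p-value with α leads to the same decision.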
Selected Hypothesis Tests – cont’d
• t-Test and t-Distribution: The t-distribution (a.k.a. Student’s t-distribution) is a bell-shaped, symmetric distribution that is used for testing small samples (n < 30).
• It can be used to test a hypothesis about a population mean when the population standard deviation (σ) is unknown and the sample size is considered small, usually fewer than 30. When a t-distribution is used to test a hypothesis, the test is called a t-test.
• The distribution of the values of t is not normal, but its use and shape are somewhat analogous to those of the standard normal distribution of z.
Developing a Hypothesis
The first “stage” in testing a hypothesis is to develop one to be tested!
• A hypothesis allows the research team to compare two groups of respondents to see if there are important differences between them.
• A hypothesis is developed before any data is collected by the research team.
• A hypothesis is developed as part of the overall research plan agreed to by the research team and the client.
Statistical Significance: Type I Error
A Type I error occurs when a research team rejects the null hypothesis when it is true. The probability of making this error is denoted alpha (α).
Statistical Significance: Type II Error
A Type II error occurs when a research team accepts the null hypothesis when the alternative hypothesis is true. The probability of making this error is denoted beta (β).
Figure 16.7 Type I Error (α) and Type II Error (β)
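A small simulation can make α and β concrete. The sketch below is an illustration under assumed settings (upper-tailed z-test, σ = 1, n = 25, a true mean of 0.5 under the alternative, 2000 simulated samples, fixed random seed); none of these values come from the chapter.

```python
import random
from math import sqrt
from statistics import NormalDist


def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=25, alpha=0.05,
                   trials=2000, seed=42):
    """Share of simulated samples in which H0: mu <= mu0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z_calc = (sum(sample) / n - mu0) / (sigma / sqrt(n))
        if z_calc > z_crit:
            rejections += 1
    return rejections / trials


# H0 is true (true mean equals mu0): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=0.0)

# H1 is true (true mean 0.5, an assumed effect): every non-rejection is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=0.5)

print(type_1_rate, type_2_rate)
```

The estimated Type I rate should land close to the chosen α of .05, while the Type II rate depends on the size of the true difference and on the sample size.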
Univariate Hypothesis Testing: An Example
One-Sample Statistics
                       N    Mean   Std. Deviation   Std. Error Mean
X2-Competitive Price   50   2.22   1.15             .16

One-Sample Test (Test Value = 5.5)
                                                                          95% Confidence Interval
                                                                          of the Difference
                       t         df   Sig. (2-tailed)   Mean Difference   Lower   Upper
X2-Competitive Price   -20.203   49   .000              -3.28             -3.61   -2.95
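The figures in the One-Sample Test table can be approximately reproduced from the summary statistics alone. The sketch below assumes the rounded values printed in the table, so the result differs slightly from the reported t of -20.203, which the software computes from unrounded data. The critical value 2.0096 (two-tailed, α = .05, df = 49) is taken from a t table, since the Python standard library has no t-distribution.

```python
from math import sqrt

# Rounded summary statistics from the One-Sample Statistics table.
n, sample_mean, sample_sd = 50, 2.22, 1.15
test_value = 5.5
t_crit = 2.0096   # assumed table value: two-tailed 5% critical t for df = 49

std_error = sample_sd / sqrt(n)              # standard error of the mean
mean_diff = sample_mean - test_value         # mean difference
t_stat = mean_diff / std_error               # one-sample t statistic
df = n - 1

# 95% confidence interval of the difference.
ci_lower = mean_diff - t_crit * std_error
ci_upper = mean_diff + t_crit * std_error

print(round(std_error, 2), round(t_stat, 2), df)
print(round(ci_lower, 2), round(ci_upper, 2))
```

Because the interval excludes zero and |t| far exceeds the critical value, the null hypothesis that the mean equals 5.5 is rejected, in line with the reported significance of .000.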
Bivariate Hypothesis Testing: An Example
Group Statistics: Satisfaction level
Gender   N    Mean   Std. Deviation   Std. Error Mean
Male     20   4.35   1.04             .23
Female   30   5.07   .78              .14
Bivariate Hypothesis Testing: An Example
Independent Samples Test: Satisfaction level

Levene’s Test for Equality of Variances: F = 3.415, Sig. = .071

t-test for Equality of Means
                                                                                                  95% Confidence Interval
                                                                                                  of the Mean
                              t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower   Upper
Equal variances assumed       -2.775   48       .008              -.72              .26                     -1.24   -.20
Equal variances not assumed   -2.624   33.048   .013              -.72              .27                     -1.27   -.16
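Both rows of the Independent Samples Test table can be approximated from the group statistics. The sketch below assumes the rounded means and standard deviations printed in the Group Statistics table, so its output is close to, but not identical with, the reported values. The function name `two_sample_t` is an illustrative choice.

```python
from math import sqrt


def two_sample_t(n1, mean1, sd1, n2, mean2, sd2):
    """t statistics for two independent samples, from summary statistics."""
    diff = mean1 - mean2
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2

    # Equal variances assumed: pool the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se_pooled = sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Equal variances not assumed: Welch standard error and degrees of freedom.
    se_welch = sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

    return {
        "mean_diff": diff,
        "t_pooled": diff / se_pooled,
        "df_pooled": n1 + n2 - 2,
        "t_welch": diff / se_welch,
        "df_welch": df_welch,
    }


# Rounded values from the Group Statistics table (Male, then Female).
result = two_sample_t(20, 4.35, 1.04, 30, 5.07, 0.78)
for name, value in result.items():
    print(name, round(value, 3))
```

Which row to read depends on Levene's test: with Sig. = .071 (above .05), equal variances are not rejected, so the "equal variances assumed" row applies.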
Analysis of Variance: ANOVA
In the “language-game” of marketing research, ANOVA stands for the “analysis of variance”. ANOVA is a statistical technique that tells a research team whether three or more means are statistically distinct from one another.
Analysis of Variance: n-Way ANOVA
In the language-game of marketing research, an n-Way ANOVA is another intricate statistical technique, which allows the research team to explore the effects of several independent variables simultaneously.
Analysis of Variance: MANOVA
In the language-game of marketing research, there’s also something called MANOVA (multivariate analysis of variance). MANOVA considers the mean differences for a group of dependent measures, examining several dependent variables across several independent variables.
Summary of Learning Objectives
• Understand the mean, median, and mode as measures of central tendency.
• Understand the range and standard deviation of a frequency distribution as measures of dispersion.
• Understand how to graph measures of central tendency.
• Understand the difference between independent and related samples.
• Explain hypothesis testing and assess potential error in its use.
• Understand univariate and bivariate statistical tests.
• Apply and interpret the results of the ANOVA and n-way ANOVA statistical methods.