Lecture Outlines for Applied Business Statistics
Rene Leo E. Ordonez, Southern Oregon University
Summer 2010
Note: The problems below are based on the text Doing Statistics for Business with
Excel, 2nd Edition, by Pelosi and Sandifer. The problems in each section are courtesy of
the same text.
                                                         Page Number
Reference to Excel and Minitab (Statistical functions)             3
Introduction                                                       8
Probability Distributions                                         10
Sampling Distributions and Confidence Intervals                   19
Hypothesis Testing: An Introduction                               39
Inferences: One Population (Hypothesis Testing)                   56
Comparing Two Populations                                         68
Improving and Managing Quality                                    83
Experimental Design and Analysis of Variance (ANOVA)              89
Analysis of Qualitative Data (Chi-square)                        102
Regression and Correlation                                       113
Sample Midterm Exam                                              140
Sample Final Exam                                                148
Statistical Procedure, with the EXCEL 2007 and MINITAB menu paths

Descriptive Statistics (mean, median, etc.)
    Excel 2007: Data > Data Analysis > Descriptive Statistics
    Minitab:    Stat > Basic Statistics > Display Descriptive Statistics

Confidence Interval Estimates
  Mean
    Excel 2007: Data > Data Analysis > Descriptive Statistics
    Minitab:    Stat > Basic Statistics > 1 Sample t (or 1 Sample z)
  Proportion
    Excel 2007: NONE
    Minitab:    Stat > Basic Statistics > 1 Proportion

One Population Hypothesis Test
  Mean
    Excel 2007: Data > Data Analysis > Descriptive Statistics
    Minitab:    Stat > Basic Statistics > 1 Sample t (or 1 Sample z)
  Proportion
    Excel 2007: NONE
    Minitab:    Stat > Basic Statistics > 1 Proportion

Two Populations Hypothesis Test
  Means of 2 Dependent Samples
    Excel 2007: Data > Data Analysis > t-Test: Paired Two Sample for Means
    Minitab:    Stat > Basic Statistics > Paired t
  Means of 2 Independent Samples (small samples, equal vars.)
    Excel 2007: Data > Data Analysis > t-Test: Two Sample Assuming Equal Variances
    Minitab:    Stat > Basic Statistics > 2 Sample t
  Means of 2 Independent Samples (small samples, unequal vars.)
    Excel 2007: Data > Data Analysis > t-Test: Two Sample Assuming Unequal Variances
    Minitab:    Stat > Basic Statistics > 2 Sample t
  Means of 2 Independent Samples (large samples)
    Excel 2007: Data > Data Analysis > z-Test: Two Sample for Means
    Minitab:    Stat > Basic Statistics > 2 Sample z
  Variances of 2 Populations
    Excel 2007: Data > Data Analysis > F-Test Two-Sample for Variances
    Minitab:    Stat > Basic Statistics > 2 Variances
  Proportion of 2 Populations
    Excel 2007: NONE
    Minitab:    Stat > Basic Statistics > 2 Proportions

Analysis of Variance
  One Factor (use for comparing 2 or more population means)
    Excel 2007: Data > Data Analysis > Anova: Single Factor
    Minitab:    Stat > ANOVA > Oneway Unstacked (or Stacked)
  Two Factor With Replication
    Excel 2007: Data > Data Analysis > Anova: Two-Factor With Replication
    Minitab:    Stat > ANOVA > Twoway
  Two Factor Without Replication
    Excel 2007: Data > Data Analysis > Anova: Two-Factor Without Replication
    Minitab:    Stat > ANOVA > Twoway
  Interaction Effect Plot
    Excel 2007: NONE
    Minitab:    Stat > ANOVA > Interactions Plot

Chi-square Analysis
  Goodness of Fit Test
    Excel 2007: NONE
    Minitab:    NONE
  Comparing Proportions of Two or More Groups
  Testing Independence of Two Nominal Variables
    Excel 2007: NONE
    Minitab:    Stat > Tables > Cross Tabulation (for raw data)
                Stat > Tables > Chisquare Test (for summarized data)

Comparing Variances of Two Populations
    Excel 2007: Data > Data Analysis > F-Test Two-Sample for Variances
    Minitab:    Stat > Basic Statistics > 2 Variances

Regression and Correlation Analysis
    Excel 2007: Data > Data Analysis > Regression
    Minitab:    Stat > Regression (Fitted Line Plot, Regression, Residual Plots)
STATISTICAL PROCESSING USING MINITAB
A free 30-day trial copy of the full commercial version of Minitab can be
downloaded from www.minitab.com
Basic Statistics: Confidence Interval Estimation and Hypothesis Testing
- Use for generating means, standard deviation, etc.
- Use for testing a population mean (n ≥ 30 or σ known)
- Use for testing a population mean (n < 30, σ unknown, and normal)
- Use for testing a population proportion (approximation to the binomial)
- Use for comparing means of two INDEPENDENT samples
- Use for comparing means of two DEPENDENT samples
- Use for comparing proportions of two populations
- Use for testing variances (or standard deviations) of two groups
One-way and Two-way ANOVA
- Use for One-way ANOVA (procedure for comparing means of two or more independent samples)
- Use for two-way ANOVA with replication
- Use for generating interactions plot for two-way ANOVA with replication
BA 282: Applied Business Statistics
Course Outline
Chisquare Tests
- Use for performing Chisquare test using RAW data (procedure for testing whether two
  qualitative variables are independent)
- Use for performing Chisquare test using TABULATED data
Regression and Correlation Analysis
- Use for generating output for regression and correlation analysis (for simple and
  multiple models)
Lecture Notes to Accompany
BA 282: Applied Business Statistics
Page 4 of 148
Ordonez, School of Business, SOU
STATISTICAL PROCESSING USING EXCEL
Data > Data Analysis
- Use for One-way ANOVA
- Use for two-way ANOVA with replication
- Use for generating means, standard deviation, etc.
- Use for testing variances (or standard deviations) of two groups
- Use for regression and correlation analysis
- Use for comparing means of two DEPENDENT samples
- Use for comparing means of two INDEPENDENT samples (n < 30 and equal variances)
- Use for comparing means of two INDEPENDENT samples (n < 30 and unequal variances)
- Use for comparing means of two INDEPENDENT samples (n ≥ 30)
Important Note:
If you don’t see the Data Analysis option under the Data tab, you have to add it in. Here
are the steps:
1. Click on the Microsoft Office Button (upper left corner), then select Excel Options
2. Click Add-Ins, then click Analysis ToolPak - VBA, then Go
3. Select Analysis ToolPak and Analysis ToolPak - VBA, then click OK
INTRODUCTION
1. What is statistics? A science that deals with the rules and
procedures that govern how to:
collect, summarize, describe, and interpret data
2. Why study statistics?
Decisions! Decisions! Decisions!
3. The Importance of understanding probability
Some 'real life' examples (it’s just a game!)
Monty Hall Dilemma
Suppose you're on a game show, and you're given the choice of three doors.
Behind one door is a car, behind the others, goats. You pick a door, say number
1, and the host, who knows what's behind the doors, opens another door, say
number 3, which has a goat. He says to you, "Do you want to pick door number
2?" Is it to your advantage to switch your choice of doors? (Craig F.
Whitaker, Columbia, MD)
Three Shell Game
Operator: Step right up, folks. See if you can guess which shell the pea is
under. Double your money if you win.
After playing the game a while, Mr. Mark decided he couldn't win more than
once out of three.
Operator: Don't leave, Mac. I'll give you a break. Pick any shell. I'll turn over
an empty one. Then the pea has to be under one of the other two, so your
chances of winning go way up.
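Both games above have the same structure, and a quick simulation is one way to check intuition about them. The sketch below is illustrative only: the function names `play_round` and `win_rate`, the seed, and the number of trials are assumptions, not part of the course material. It estimates the chance of winning by relative frequency, once for a player who always stays and once for a player who always switches; the two rates should settle near 1/3 and 2/3.

```python
import random


def play_round(switch: bool, rng: random.Random) -> bool:
    """Play one Monty Hall round and report whether the contestant wins the car."""
    doors = [1, 2, 3]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host knows where the car is and opens a goat door the contestant did not pick.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the single door that is neither the first pick nor the opened one.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch: bool, trials: int = 100_000, seed: int = 1) -> float:
    """Relative frequency of wins over many simulated rounds."""
    rng = random.Random(seed)
    wins = sum(play_round(switch, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print("stay:  ", round(win_rate(switch=False), 3))
    print("switch:", round(win_rate(switch=True), 3))
```

The same code describes the Three Shell Game if "door" is read as "shell" and "car" as "pea".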
4. Ways of assigning (determining) probabilities
- Subjective -- describes an individual's personal judgement
  about how likely a particular event is to occur. It is not based
  on any precise computation but is often a reasonable
  assessment by a knowledgeable person.
- Relative frequency -- another term for proportion; it is the
  value calculated by dividing the number of times an event
  occurs (x) by the total number of times an experiment is
  carried out (n): $p(x) = x / n$
- Objective (classical) -- probability based on the symmetry of
  games of chance or similar situations. For example:
  Coin tossing experiment → P(head)
  Die tossing experiment → P(“one”)
  Monty Hall Dilemma → P(win)
5. Important statistical terms and concepts
(KNOW THESE DEFINITIONS AND SYMBOLS!)
- Population vs. sample
  population -- any entire collection of people, animals,
  plants or things from which we may collect data. It is the
  entire group we are interested in, which we wish to
  describe or draw conclusions about
  sample -- a group of items selected from a population.
  Conclusions about the population are drawn by studying
  the sample.
- Parameter vs. statistic
  parameter -- a numeric characteristic of a population
  statistic -- a numeric characteristic of a sample. It is used
  to estimate an unknown population parameter
  Parameters are often assigned Greek letters (e.g. μ, σ, π),
  whereas statistics are assigned Roman letters (e.g. x̄, s, p).
- Common measures of central tendency
  Mean, median, mode
- Common measures of dispersion
  Range, variance, standard deviation
- Common symbols used in statistics

Important Symbols: Must Know
                                   POPULATION       SAMPLE
                                   (Parameters)     (Statistics)
Size                               N                n
One Population
  Mean                             μ                x̄
  Variance                         σ²               s²
  Standard deviation               σ                s
  Proportion                       π                p
Two Populations
  Comparing Means                  μ1 vs μ2         x̄1 vs x̄2
  Comparing Proportions            π1 vs π2         p1 vs p2
  Comparing Variances              σ1² vs σ2²       s1² vs s2²
  Comparing Standard Deviations    σ1 vs σ2         s1 vs s2
PROBABILITY DISTRIBUTIONS
1. Definitions
- probability -- likelihood or chance of an event occurring
- experiment -- any process or study which results in the
  collection of data, the outcome of which is unknown.
- random variable -- an outcome of an experiment. It need not
  be a number; for example, the outcome when a coin is tossed
  can be 'heads' or 'tails'. However, we often want to represent
  outcomes as numbers. Usually denoted by the letter “X”
  Example:
  - toss a coin 5 times (experiment),
    observe the number of heads (variable)
  - randomly select 20 students (sample),
    record each student’s GPA (variable)
2. Random variables
- discrete (185) -- usually involves counting (e.g. number of
  defectives, number of correct answers, etc.). If a random
  variable can take only a finite number of distinct values, then it
  must be discrete
  - in the coin tossing experiment above, the random variable is “number
    of heads”
    x = {0, 1, 2, 3, 4, 5}
- continuous (186) -- usually involves something that is
  measured. A continuous random variable is one which takes
  an infinite number of possible values. Examples include height,
  weight, the amount of sugar in an orange, the time required to
  run a mile
  - in the student sampling above, the random variable is GPA
    x = {0 to 4.0}
3. Common Discrete Probability Distributions
- Uniform
- Binomial (191)
  The trials must meet the following requirements:
  a) the total number of trials (n) is fixed in advance;
  b) there are just two outcomes of each trial: success and failure;
  c) the outcomes of all the trials are statistically independent;
  d) all the trials have the same probability of success
  Example: coin tossing
- Hypergeometric (200)
  a) each trial has just two outcomes: success and failure;
  b) the outcomes of all the trials are statistically dependent;
  c) the probability of success changes from trial to trial
- Poisson (203) -- typically, a Poisson random variable is a count
  of the number of events that occur in a certain time interval or
  spatial area. For example, the number of cars passing a fixed
  point in a 5 minute interval; the number of calls received by a
  switchboard during a given period of time
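To make the discrete distributions above concrete, here is a minimal sketch that evaluates the binomial and Poisson probability functions with the Python standard library. The helper names `binomial_pmf` and `poisson_pmf` are illustrative, and the Poisson rate of 3 cars per 5-minute interval is an assumed value chosen only for the example. The binomial part reuses the coin experiment from the notes (5 tosses of a fair coin, X = number of heads).

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for n independent trials, each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a count with an average of lam events per interval."""
    return lam**k * exp(-lam) / factorial(k)


# Coin tossing: n = 5 tosses, p = 0.5, x = 0, 1, ..., 5 heads.
coin = [binomial_pmf(k, 5, 0.5) for k in range(6)]
for heads, prob in enumerate(coin):
    print(heads, prob)

# Assumed rate: on average 3 cars pass the fixed point in a 5 minute interval.
print(poisson_pmf(0, 3.0))  # probability that no car passes in the interval
```

The six binomial probabilities are 1/32, 5/32, 10/32, 10/32, 5/32 and 1/32, which add up to 1 as any probability distribution must.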
4. Common Continuous Probability Distributions
- Uniform (219)
  [Figure: uniform density f(x), constant between A and B on the x axis]
- Exponential (not covered here; it will be introduced
  and covered in BA 380-Operations Management)
- Normal (223 to 228)
5. The Normal Distribution (223 to 228)
- Characteristics
  - bell-shaped
  - mean = median = mode
  - area underneath the curve equals 1
  - symmetric about the mean (left side
    is mirror-image of right side)
  - area left of mean = 0.50 = area right
    of mean
  - asymptotic (the tails approach the x axis but never touch it)
  [Figure: normal curve over the x axis]
6. How to find areas (probabilities) using the Normal
table (229 to 234)
A MUST UNDERSTAND CONCEPT!

Standard Normal Distribution
(each entry is the area to the left of z; the row gives z to one decimal place,
the column gives the second decimal place)

z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0    0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1    0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2    0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3    0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4    0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5    0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6    0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7    0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8    0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9    0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0    0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1    0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2    0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3    0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4    0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5    0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6    0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7    0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8    0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9    0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0    0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1    0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2    0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3    0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4    0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5    0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6    0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7    0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8    0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9    0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
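The table entries can also be reproduced in software. As a rough sketch, assuming Python's standard `statistics.NormalDist` class is used where these notes would otherwise use the printed table (the helper names `area_left`, `area_right` and `area_between` are illustrative), left-tail, right-tail and in-between areas follow from the cumulative distribution function:

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # the standard normal distribution


def area_left(z: float) -> float:
    """P(Z < z): the quantity tabulated in the standard normal table."""
    return Z.cdf(z)


def area_right(z: float) -> float:
    """P(Z > z): the total area is 1, so subtract the left area."""
    return 1 - Z.cdf(z)


def area_between(a: float, b: float) -> float:
    """P(a < Z < b): the difference of two left areas."""
    return Z.cdf(b) - Z.cdf(a)


print(round(area_left(1.96), 4))       # 0.975, row 1.9 and column 0.06 of the table
print(round(area_right(1.5), 4))       # 0.0668
print(round(area_left(-1.5), 4))       # 0.0668, by symmetry with the line above
print(round(area_between(-2, 2), 4))   # 0.9545
```

Because the table lists only non-negative z values, areas for negative z rely on symmetry: P(Z < -z) = 1 - P(Z < z).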
EXCEL FUNCTIONS FOR NORMAL DISTRIBUTION
IMPORTANT: LEARN HOW TO USE THESE FUNCTIONS!
Exercises in Using the Standard Normal Table
Use Excel’s =NORMSDIST(z) function or the Standard Normal
Distribution table to answer the problems below.
1) Draw the normal distribution, and shade and find the areas
(probabilities) of the following expressions:
e.g. P ( Z > 0 ) = ?
[Figure: standard normal curve with σ = 1, centered at 0 on the z axis]
a) P ( Z < 1.0 )
b) P ( Z > 1.0 )
c) P ( Z < -1.0 )
d) P ( -1.0 < Z < 1.0 )
e) P ( 1.0 < Z < 2.5 )
f) P ( Z > 2.65 )
2) Given probabilities and their respective probability expressions, draw the normal
distribution, shade the areas and find the corresponding z values.
Use Excel’s =NORMSINV(area) function or the Standard
Normal Distribution table to answer the problems below.
a) P ( Z < z ) = 0.95
b) P ( Z > z ) = 0.95
c) P ( Z < z ) = 0.25
d) P ( Z > z ) = 0.25
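Going from an area back to a z value is the inverse lookup. The sketch below assumes Python's `statistics.NormalDist.inv_cdf` as a stand-in for =NORMSINV(area); the helper names and the areas 0.975 and 0.10 are illustrative choices, deliberately different from the exercise values.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_for_left_area(area: float) -> float:
    """Find z such that P(Z < z) = area."""
    return Z.inv_cdf(area)


def z_for_right_area(area: float) -> float:
    """Find z such that P(Z > z) = area, using P(Z < z) = 1 - area."""
    return Z.inv_cdf(1 - area)


print(round(z_for_left_area(0.975), 2))   # 1.96
print(round(z_for_right_area(0.10), 2))   # 1.28
print(round(z_for_left_area(0.025), 2))   # -1.96, left areas below 0.5 give negative z
```

A right-tail statement is always converted to a left-tail one first, because both the printed table and the inverse function work with the area to the left of z.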
Learning it! Exercises
The amount of money spent by students for textbooks in a semester is a normally distributed
random variable with a mean of $235 and a standard deviation of $15.
(a) Sketch the normal distribution that describes the amount of money spent on textbooks in a
semester.
(b) What is the probability that a student spends between $220 and $250 in any semester?
(c) What percentage of students spend more than $270 on textbooks in any semester?
(d) What percentage of students spend less than $225 in a semester?
The actual amount of a certain brand of orange juice in a container marked half gallon is a
normally distributed random variable with a mean of 65 oz. and a standard deviation of 0.35
oz.
(a) What percentage of the containers contain more than 64.5 oz?
(b) What percentage of the containers contain between 64 and 66 oz?
(c) If federal law says that 98% of all the containers must be at or above the labeled weight, does this
brand of orange juice meet the requirement?
The size of a gift/specialty store in a regional super mall is a normally distributed random
variable with a mean of 8,500 sq ft and a standard deviation of 260 sq ft. What is the
probability that a randomly selected gift/specialty store in a regional super mall is:
a) more than 8,000 sq ft?
b) between 8,300 and 9,000 sq ft?
c) less than 9,500 sq ft?
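Exercises of this kind all follow one pattern: convert each x value to a z-score, z = (x - μ) / σ, then read the area from the standard normal distribution. The following sketch uses made-up numbers (a mean of 50 and a standard deviation of 4, not taken from any exercise above) and the hypothetical helper `z_score` to show that standardizing first gives the same probability as working with the original normal distribution directly.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


mu, sigma = 50.0, 4.0          # assumed values, for illustration only
X = NormalDist(mu, sigma)      # distribution of the original variable
Z = NormalDist()               # standard normal distribution

# P(46 < X < 54) computed directly on X ...
p_direct = X.cdf(54) - X.cdf(46)
# ... and again after standardizing both limits (z = -1 and z = +1).
p_via_z = Z.cdf(z_score(54, mu, sigma)) - Z.cdf(z_score(46, mu, sigma))

print(z_score(46, mu, sigma), z_score(54, mu, sigma))  # -1.0 1.0
print(round(p_direct, 4), round(p_via_z, 4))           # 0.6827 0.6827
```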
SAMPLING DISTRIBUTIONS AND
CONFIDENCE INTERVALS
1. The Distribution of the Sample Mean (x̄) and the Central Limit
Theorem (266 to 269)
Central Limit Theorem Definition (271):
When randomly sampling from a population, the distribution of the
sample mean (x̄) is:
- approximately normal regardless of the original population
  distribution, so long as the sample is large (at least 30; this
  sample size restriction is not required if the population is normal
  to begin with), with
- a mean equal to μ, and
- a standard deviation equal to σ/√n:
  $\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
[Figure: distribution of X in the population and the sampling distribution of x̄]
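A small simulation can make the theorem tangible. In the sketch below, everything specific is an assumption made for illustration: the population is exponential with mean 1 and standard deviation 1 (clearly not normal), the sample size is n = 36, and 5,000 samples are drawn. The average of the sample means should land near μ = 1, and their standard deviation near σ/√n = 1/6.

```python
import random
from math import sqrt
from statistics import mean, pstdev


def simulate_sample_means(n: int, reps: int, rng: random.Random) -> list:
    """Draw `reps` samples of size n from a skewed population; return each sample mean."""
    # Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
    return [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


rng = random.Random(42)
n = 36
sample_means = simulate_sample_means(n=n, reps=5000, rng=rng)

print("mean of sample means:", round(mean(sample_means), 3))     # close to mu = 1
print("sd of sample means:  ", round(pstdev(sample_means), 3))   # close to 1/6
print("sigma / sqrt(n):     ", round(1 / sqrt(n), 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly skewed.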
2. Confidence Intervals
- Use: For estimating unknown population parameters
- Definition of confidence interval (290)
  A probability that the interval contains the true population
  parameter
  e.g. $P(L \le \mu \le U) = 1 - \alpha$, where L and U are the lower and upper limits of the interval
- Components of confidence interval: point estimate and margin of
  error
  a. Point estimate (290)
  - A single number that is calculated from sample data
  - Is used to estimate a population parameter
  - e.g. sample mean is a point estimate for population
    mean, sample proportion is a point estimate of population
    proportion
                                   POPULATION       SAMPLE
                                   (Parameters)     (Point estimators,
                                                    a.k.a. statistics)
Size                               N                n
One Population
  Mean                             μ                x̄
  Variance                         σ²               s²
  Standard deviation               σ                s
  Proportion                       π                p
Two Populations
  Comparing Means                  μ1 vs μ2         x̄1 vs x̄2
  Comparing Proportions            π1 vs π2         p1 vs p2
  Comparing Variances              σ1² vs σ2²       s1² vs s2²
  Comparing Standard Deviations    σ1 vs σ2         s1 vs s2
  b. Margin of error (e)
  - when added to and subtracted from the point estimator,
    gives the upper and lower limits for the range of values where
    the population parameter could be found
  - is affected (or determined) by:
    - confidence level
    - sample size
    - population variability
- Interpreting the confidence interval (294 - 295)
  Say that you computed a 95% confidence interval estimate for
  the mean of a certain population as 3.2 to 3.5
  - correct interpretation: “We are 95% confident that the
    interval from 3.2 to 3.5 contains the true population mean”
  - incorrect interpretation: “There is a 95% chance that the
    population mean is in the interval from 3.2 to 3.5”
3. Computing a confidence interval for the population mean (μ)
- z-distribution for large samples or σ known (290-297)
  $C.I. = \bar{x} \pm Z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)$
- t-distribution for small samples and σ unknown (298-304)
  $C.I. = \bar{x} \pm t_{\alpha/2,\, n-1} \left( \frac{s}{\sqrt{n}} \right)$
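The two formulas differ only in the multiplier (Z or t) and in which standard deviation is used (σ or s). The sketch below computes both with made-up inputs; the function names are illustrative, and since Python's standard library has no t-distribution, the t critical value is passed in by hand as it would be read from a Student's t table (2.776 for a 95% interval with n - 1 = 4 degrees of freedom).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_interval(xbar: float, sigma: float, n: int, confidence: float = 0.95):
    """C.I. = xbar +/- Z_{alpha/2} * sigma / sqrt(n), for sigma known or a large sample."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    e = z * sigma / sqrt(n)          # margin of error
    return xbar - e, xbar + e


def t_interval(sample: list, t_crit: float):
    """C.I. = xbar +/- t_{alpha/2, n-1} * s / sqrt(n); t_crit is looked up in a t table."""
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)                # sample standard deviation (divides by n - 1)
    e = t_crit * s / sqrt(n)
    return xbar - e, xbar + e


# Assumed inputs: xbar = 50, sigma = 10, n = 100, 95% confidence.
print([round(v, 2) for v in z_interval(50, 10, 100)])               # [48.04, 51.96]

# Assumed small sample with n = 5, so df = 4 and t_{0.025, 4} = 2.776.
print([round(v, 3) for v in t_interval([4, 5, 6, 7, 8], 2.776)])    # [4.037, 7.963]
```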
4. Computing a confidence interval for qualitative data (the population
proportion (π)) (305-310)
$C.I. = p \pm Z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}$
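As a quick illustration of the proportion formula, the sketch below uses invented numbers (80 "yes" answers out of n = 200, so p = 0.40) and an illustrative function name. It also checks that np and n(1 - p) are both at least 5, the usual condition for using the normal approximation to the binomial.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes: int, n: int, confidence: float = 0.95):
    """C.I. = p +/- Z_{alpha/2} * sqrt(p(1 - p) / n), with p the sample proportion."""
    p = successes / n
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("normal approximation to the binomial is not appropriate")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    e = z * sqrt(p * (1 - p) / n)    # margin of error
    return p - e, p + e


low, high = proportion_interval(successes=80, n=200)   # assumed sample: p = 0.40
print(round(low, 4), round(high, 4))                   # 0.3321 0.4679
```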
5. Sample Size Calculations (311-313)
- For estimating a population mean
  $n = \frac{z^2 \sigma^2}{e^2}$
- For estimating a population proportion
  (Using the Normal distribution as an approximation to the
  Binomial distribution)
  $n = \frac{z^2 p(1-p)}{e^2}$
- Factors affecting sample size requirement
  (1) confidence level
  (2) variability of the population
  (3) acceptable level of margin of error
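Both sample size formulas can be sketched in a few lines. The inputs below are assumptions chosen for illustration (σ = 10 with e = 2 for the mean; p = 0.5 with e = 0.03 for the proportion), as are the function names. The result is always rounded up, because rounding down would leave the margin of error slightly larger than required.

```python
from math import ceil
from statistics import NormalDist


def z_value(confidence: float) -> float:
    """Two-sided critical value, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def n_for_mean(sigma: float, e: float, confidence: float = 0.95) -> int:
    """n = z^2 * sigma^2 / e^2, rounded up to a whole observation."""
    z = z_value(confidence)
    return ceil(z**2 * sigma**2 / e**2)


def n_for_proportion(p: float, e: float, confidence: float = 0.95) -> int:
    """n = z^2 * p(1 - p) / e^2, rounded up to a whole observation."""
    z = z_value(confidence)
    return ceil(z**2 * p * (1 - p) / e**2)


print(n_for_mean(sigma=10, e=2))            # 97
print(n_for_proportion(p=0.5, e=0.03))      # 1068
print(n_for_mean(sigma=10, e=1))            # 385: halving e roughly quadruples n
```

The last line shows the third factor at work: a smaller acceptable margin of error requires a much larger sample.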
CONFIDENCE INTERVAL ESTIMATION
A single POPULATION PARAMETER
(Mean and Proportion)

Population Mean (μ)
  Is population standard deviation (σ) known?
  - Yes: Use z-distribution (assume Normal if n < 30)
      $C.I. = \bar{x} \pm Z \left( \frac{\sigma}{\sqrt{n}} \right)$
      $C.I. = \bar{x} \pm Z \left( \frac{\sigma}{\sqrt{n}} \right) \sqrt{\frac{N-n}{N-1}}$
  - No: Is n ≥ 30?
    - Yes: Use z-distribution
    - No: Use t-distribution (assume Normal pop'n)
      $C.I. = \bar{x} \pm t \left( \frac{s}{\sqrt{n}} \right)$
      $C.I. = \bar{x} \pm t \left( \frac{s}{\sqrt{n}} \right) \sqrt{\frac{N-n}{N-1}}$

Population Proportion (π)
  np and n(1-p) ≥ 5?
  - Yes: Use Normal Distribution as Approximation to the Binomial Distribution
      $C.I. = p \pm Z \sqrt{\frac{p(1-p)}{n}}$
      $C.I. = p \pm Z \sqrt{\frac{p(1-p)}{n}} \sqrt{\frac{N-n}{N-1}}$

In each pair of formulas, the second includes the finite population correction factor
$\sqrt{(N-n)/(N-1)}$.
CONFIDENCE INTERVAL ESTIMATION
Confidence Interval for μ
a. σ Known
   $C.I. = \bar{x} \pm Z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)$
b. σ Unknown
   $C.I. = \bar{x} \pm t_{\alpha/2,\, n-1} \left( \frac{s}{\sqrt{n}} \right)$
Confidence Interval for π
   $C.I. = p \pm Z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}$
SAMPLE SIZE (n) DETERMINATION
For estimating μ
   $n = \frac{z^2 \sigma^2}{e^2}$
For estimating π
   $n = \frac{z^2 p(1-p)}{e^2}$
STUDENT’S t-DISTRIBUTION TABLE
Learning it! Exercises (by hand and using Excel)
7.6
A manufacturer of pain reliever claims that it takes an average of 12.75 minutes for a person to be
relieved of headache pain after taking its pain reliever. The time to relief is normally
distributed with a standard deviation of 0.5 minutes. A sample of 12 people is taken and the data are
shown here:
12.9   13.2   12.7   13.1   13.0   13.1
13.0   12.6   13.1   13.0   13.1   12.8
a. Find the sample mean
b. Find the standard error of the sample mean
c. If the manufacturer claims that the mean is 12.75 minutes, find the z-score of the sample
   mean
d. What do you think of the manufacturer’s claim based on the z-score? (translation: if the
   claim made by the manufacturer is correct, how likely is it to observe a sample mean at least as
   large as your answer to (a)?)
7.8
The U-Male-It Hardware chain has 10 different stores within a certain geographical region. The dollar
value of customer sales is normally distributed with an average sale of $35.25 and a standard
deviation of $2.50. You have recently been hired to manage one of these stores, and under your
leadership the average sales based on a sample of 100 customers is $36.50. You are very proud of
the increased average sale and point this out to senior management.
a. Find the z-score for $36.50
b. Based on this z-score, is your pride justified?
(translation – is the difference observed between the original average sale of $35.25 and the sample
average sales of $36.50 significant, or is the difference mainly attributable to chance?)
7.16
A national grocery chain is considering opening a store at a particular location. To be sure that
enough traffic goes by that location, the grocery chain took a sample of vehicles crossing the
intersection on 40 days. The results are shown in the table below:
Number of Cars Crossing Location per Day
1431   1540   1293   1340   1302   1700   1533   1402   1255   1840
1272   1467   1377   1642   1572   1220   1450   1139   1520   1477
1483   1227   1227   1515   1529   1684   1257   1242   1588   1782
1238   1350   1535   1491   1276   1367   1533   1513   1420   1375
a. Find a 95% confidence interval for the average number of cars that pass this location on a daily
basis. The standard deviation is assumed to be 165 cars.
b. The company has decided to open a store at this location only if there is a daily average of at
least 1400 cars passing this location. Based on your confidence interval, would you advise the
company to open a store at this location? Explain why or why not.
7.20
The police department is concerned about the ability of officers to identify drunk drivers on the road.
Before instituting a new training program they take a sample of 28 arrests and record the level of
alcohol in the blood at the time of the arrest. Assume that the level of alcohol in the blood is normally
distributed. The data are shown below:
 92   127   204   209    93   256   182
141   108   184   173   151   173   253
105   133   194   159   153   147   133
101   150   209   207   133   180   252
(a) Find a 90% confidence interval for the average alcohol level in the blood at the time of arrest
(b) Find a 95% confidence interval for the average alcohol level in the blood at the time of arrest
Excel Solution
Data > Data Analysis > Descriptive Statistics
Descriptive statistics
Margin of Error
Minitab Solution
Stat > Basic Statistics > 1-Sample t > Options
7.21
A large amusement park has recently added 5 new rides, including a large roller coaster called Mind
Eraser. Management is concerned about the waiting times on the new roller coaster. A random
sample of 10 people is selected and the time (in minutes) that each person waits to ride the Mind
Eraser is recorded and shown below:
43   80   48   61   74   66   54   72   58   68
(a) Find a 95% confidence interval for the average waiting time for the Mind Eraser, assuming that
the waiting time is normally distributed.
(b) The park management thinks that if customers have to wait more than 60 minutes for a ride, then
the park should increase the staff to reduce the waiting time. Based on your confidence interval,
does the park need to increase the staff? Explain why or why not.
7.25
I asked 100 imaginary friends (only to avoid the time and cost of data collection) the following
question: “Do you regularly watch MTV’s Beavis and Butthead?” Of the 100 friends, 35 of them
answered yes.
(a) Calculate a 95% confidence interval for the viewership of this show.
(b) MTV is considering canceling the show if less than one-third of the population regularly watches
the show. Based on this information, what will MTV do?
Minitab Solution
Stat > Basic Statistics > 1-Proportion
(No Excel procedure)
7.31
How many stores must be sampled for the woman who wants to buy a ranch to be 95% confident that
the error in estimating the average fat content per pound in steaks sold in the Portland, Maine area is
at most 0.05 oz? The standard deviation of fat content is known to be 0.30 oz.
7.32
How many months must be sampled for analysts to be 99% confident that the error in estimating the
average monthly price of peanut butter is at most $0.02? Assume the standard deviation is $0.035.
7.42
In an effort to improve the quality of the CD players that your company makes, you have started to
sample the component parts that you purchase from an outside supplier. You will accept the
shipment of parts only if there is less than 1% defectives in the shipment. Recognizing that you
cannot test the entire shipment (or population), you select a sample of 25 components to test. You
find 3 defectives in the sample.
(a) Find a 90 percent confidence interval for the proportion of components in the population that
are defective.
(b) Based on your confidence interval, should you accept the shipment? Why or why not?
7.44
A hotel is studying the proportions of rooms that are not ready when customers check in to the hotel.
(a) How many rooms must be in the sample for the hotel to be 95% confident that the margin of
error is at most 1%?
(c) How many rooms must be in the sample for the hotel to be 95% confident that the margin of error
is at most 3%?
Parametric Hypothesis Testing
- One Population
  - Testing a Mean (μ): z test, t test
  - Testing a Proportion (π): z-test
  - Testing a Variance (σ²): F test
- Comparing Two Populations
  - Comparing Two Means (μ1, μ2)
    - Dependent Samples: reverts back to One Population (t test)
    - Independent Samples: z-test, or t test
      - Equal variances t-test (pooled t-test)
      - Unequal variances t-test
  - Comparing Two Proportions (π1, π2): z-test
  - Comparing Two Variances (σ1², σ2²): F test
- Comparing Two or More Populations
  - Comparing Means of 2 or More Groups (μ1, μ2, μ3): ANOVA F test
  - Comparing Proportions of 2 or More Groups (π1, π2, π3): Chi-square test
HYPOTHESIS TESTING: AN INTRODUCTION
1. What is a hypothesis test? (327)
• A hypothesis is an idea, an assumption, or a theory about the behavior of one or more variables in one or more populations.
• A hypothesis test is a statistical procedure that involves formulating a hypothesis and using sample data (n) to decide on the validity of the hypothesis, i.e., whether the sample is consistent with the hypothesis (in which case you believe the hypothesis) or inconsistent with it (in which case you choose not to believe it, or to reject it).
Important! In statistical testing, regardless of the specific hypothesis that you are testing, the basic procedure is the same! Your understanding of the concepts introduced in this chapter is crucial for the remaining chapters!
2. Steps in performing a hypothesis test: (328-332)
Step 1: Set up the null and alternative hypotheses
Step 2: Identify the significance level (α) for determining the critical value
Step 3: Identify the appropriate distribution
Step 4: Collect the sample data (for determining the computed value)
Step 5: Compare the computed value to the critical value (or the p-value to the significance level)
Step 6: Make a statistical conclusion (reject the null or fail to reject the null)
Step 7: Make a managerial conclusion (usually a statistical test is conducted to assist in a decision-making process)
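The seven steps can be traced in a few lines of Python. This is only a sketch: it borrows the summary figures from Problem 8.28 later in these notes (n = 30, sample mean 93, σ = 3, hypothesized mean 90), the variable names are illustrative, and it uses Python's standard library rather than Excel or Minitab.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: H0: mu <= 90 versus H1: mu > 90 (right-tailed test)
mu_0 = 90.0

# Step 2: significance level
alpha = 0.05

# Step 3: sigma is given and the population is normal, so use the z distribution
z_critical = NormalDist().inv_cdf(1 - alpha)

# Step 4: sample data (summary figures from Problem 8.28) and the computed value
n, x_bar, sigma = 30, 93.0, 3.0
z_computed = (x_bar - mu_0) / (sigma / sqrt(n))

# Step 5: compare computed value to critical value (or p-value to alpha)
p_value = 1 - NormalDist().cdf(z_computed)

# Step 6: statistical conclusion
reject_h0 = z_computed > z_critical

# Step 7: managerial conclusion
print(f"z = {z_computed:.3f}, critical z = {z_critical:.3f}, p-value = {p_value:.2g}")
print("Evidence supports a mean above 90 days" if reject_h0
      else "Not enough evidence that the mean exceeds 90 days")
```

Comparing the computed value to the critical value and comparing the p-value to α always lead to the same decision; the sketch computes both only to show the two routes side by side.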
3. Null vs. Alternative Hypotheses and decision rule (329)
Important things to remember about H0 and H1
• H0: null hypothesis and H1: alternate hypothesis
• H0 and H1 are mutually exclusive and collectively exhaustive
• H0 is always presumed to be true
• H1 has the burden of proof
• A random sample (n) is used to "reject H0" or to "fail to reject H0". If we conclude "do not reject H0", this does not necessarily mean that the null hypothesis is true; it only suggests that there is not sufficient evidence against H0 in favor of H1. Rejecting the null hypothesis, then, suggests that the alternative hypothesis may be true.
• Equality is always part of H0 (e.g. "=", "≥", "≤"); "≠", "<" and ">" are always part of H1:
    H0: μ = value     H0: μ ≥ value     H0: μ ≤ value
    H1: μ ≠ value     H1: μ < value     H1: μ > value
Structure:
H0: null hypothesis
H1: alternate hypothesis (can also be written as HA)
Reject H0 if: {evaluative condition} is true
4. Setting up the null and alternative hypotheses (Is the test two-tailed (non-directional) or one-tailed (directional)?) and establishing the Rejection Region (333)
• Identify the parameter being tested (μ, π, σ²)
• Determine how many populations are included in the test
• Is "the claim" the null hypothesis or the alternate hypothesis?
    • In actual practice, the status quo is set up as H0
    • If the claim is "boastful", the claim is set up as H1 (we apply the Missouri rule: "show me"). Remember, H1 has the burden of proof
    • In problem solving, look for key words and convert them into symbols (see examples below)
Some Examples

Keywords | Inequality Symbol | Part of:
--- | --- | ---
Larger (or more) than | > | H1
Smaller (or less) | < | H1
No more than | ≤ | H0
At least | ≥ | H0
Has increased | > | H1
Is there difference? | ≠ | H1
Has not changed | = | H0
Has "improved", "is better than", "is more effective" | See note below | H1

Note:
• The direction of the test involving claims that use the words "has improved", "is better than", and the like will depend upon the variable being measured.
• For instance, if the variable involves time for a certain medication to take effect, the words "better", "improve", or "more effective" are translated as "<" (less than, i.e. faster relief).
• On the other hand, if the variable refers to a test score, then the words "better", "improve", or "more effective" are translated as ">" (greater than, i.e. higher test scores).
• The equality (≤, ≥, =) is always part of the null hypothesis.
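The keyword-to-symbol conversion can be sketched as a small lookup in Python. The dictionary below mirrors the keyword table; the function name, the ASCII symbols (<=, >=, !=) and the parameter labels are illustrative assumptions, and the "improved / better than" keywords are left out because their direction depends on the variable being measured.

```python
# Keyword -> (symbol used by the claim, hypothesis that carries the claim)
KEYWORDS = {
    "larger than": (">", "H1"),
    "smaller than": ("<", "H1"),
    "no more than": ("<=", "H0"),
    "at least": (">=", "H0"),
    "has increased": (">", "H1"),
    "is there a difference": ("!=", "H1"),
    "has not changed": ("=", "H0"),
}

# Each symbol and the symbol of the opposing hypothesis
COMPLEMENT = {">": "<=", "<": ">=", "<=": ">", ">=": "<", "!=": "=", "=": "!="}


def set_up_hypotheses(keyword, parameter, value):
    """Return H0 and H1 as strings; the equality always ends up in H0."""
    symbol, part_of = KEYWORDS[keyword]
    claim = f"{parameter} {symbol} {value}"
    other = f"{parameter} {COMPLEMENT[symbol]} {value}"
    if part_of == "H0":
        return {"H0": claim, "H1": other}
    return {"H0": other, "H1": claim}


# "at least 20% of the M&M's are blue": the claim contains the equality, so it is H0
print(set_up_hypotheses("at least", "pi", 0.20))
# "time on-line has increased from 1 hour": the claim carries the burden of proof, so it is H1
print(set_up_hypotheses("has increased", "mu", 1))
```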
5. Two types of error in hypothesis testing: Type 1 (α) vs. Type 2 (β) (330-333)
Statistical definitions:
Type 1 (α): the probability of rejecting a TRUE H0
Type 2 (β): the probability of failing to reject (or "accepting") a FALSE H0

Statistical Conclusion | True Condition: H0 TRUE | True Condition: H0 FALSE
--- | --- | ---
Reject H0 | Type 1 (α) | Correct
Fail to reject H0 | Correct | Type 2 (β)

More on Type 1 (α): in addition to its definition as "the probability of rejecting a TRUE H0", it is also:
• known as the significance level of a test (or simply, the significance level)
• usually set between 0.01 and 0.10 (which level is 'best'? see next subsection)
• used to generate the critical value for a test
• an area at the tail end of a distribution; this area is known as the reject H0 region (or the rejection region)
• The critical value marks the boundary between the reject H0 and fail to reject H0 regions
[Figure: distribution curve over the z axis, centered at 0]
Which should be avoided: a Type 1 or a Type 2 error?
• For a given sample size (n), there is a trade-off between Type 1 and Type 2 errors; that is, decreasing one will increase the other
• To decrease both types at the same time, a larger sample must be taken
• However, because of the cost, time, and practicality of sampling, we often need to choose between Type 1 and Type 2 errors
• Which should we decrease? It depends on the cost associated with each type of error
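The trade-off can be seen numerically. The sketch below computes β for a right-tailed z test at several values of α and two sample sizes. All numbers (H0: μ ≤ 100, a true mean of 103, σ = 10) are hypothetical and chosen only for illustration; the function name is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()


def beta_right_tailed(mu_0, mu_true, sigma, n, alpha):
    """P(fail to reject H0: mu <= mu_0) when the true mean is mu_true."""
    se = sigma / sqrt(n)
    # Reject H0 when the sample mean exceeds this cutoff
    x_critical = mu_0 + Z.inv_cdf(1 - alpha) * se
    # Beta is the chance the sample mean falls below the cutoff under the true mean
    return NormalDist(mu_true, se).cdf(x_critical)


for n in (25, 100):
    for alpha in (0.10, 0.05, 0.01):
        beta = beta_right_tailed(100, 103, 10, n, alpha)
        print(f"n = {n:3d}  alpha = {alpha:.2f}  beta = {beta:.3f}")
```

For a fixed n, β rises as α is lowered; moving from n = 25 to n = 100 lowers β at every α, which is the "larger sample" remedy described above.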
EXAMPLES:
In each of the examples below, the Type 1 and Type 2 errors are defined in non-statistical terms. Can you identify the 'cost' associated with each type of error? For instance, in criminal cases, the cost associated with a Type 1 error (that is, a jury convicting an innocent person) is the freedom, or worse yet, the life of the accused. Now compare this to the cost of a Type 2 error. As a society, which do we consider worse?

Justice system - criminal and civil cases
H0: Innocent
H1: Guilty

Statistical Conclusion | True Condition: Innocent | True Condition: Guilty
--- | --- | ---
Reject H0 (Guilty) | Type 1 (α): conclude that the accused is guilty when in fact innocent | Correct
Fail to reject H0 (not Guilty) | Correct | Type 2 (β): conclude that the accused is not guilty when in fact guilty
Business - quality control situations - process monitoring
H0: Process is in control
H1: Process is not in control

Statistical Conclusion | True Condition: Process OK | True Condition: Process Not OK
--- | --- | ---
Reject H0 (process not OK) | Type 1 (α): conclude that the process is not in control when in fact it is | Correct
Fail to reject H0 (process OK) | Correct | Type 2 (β): conclude that the process is OK when in fact it is not
Business - quality control situations - quality assurance
H0: Lot of shipment is good
H1: Lot of shipment is not good

Statistical Conclusion | True Condition: Good Lot | True Condition: Not Good Lot
--- | --- | ---
Reject H0 (shipment is not good) | Type 1 (α): conclude that the lot is not good when in fact it is (producer's risk) | Correct
Fail to reject H0 (shipment is good) | Correct | Type 2 (β): conclude that the shipment is good when in fact it is not (consumer's risk)
6. P-values (339)
• The probability value (p-value) of a statistical hypothesis test is the probability of getting a value of the test statistic as extreme as or more extreme than that observed by chance alone, if the null hypothesis H0 is true (see example below). Equivalently, it is the smallest significance level at which H0 would be rejected for the observed sample.
• When used as a decision rule in hypothesis testing, the p-value is compared to the significance level (α). If the p-value is smaller, the conclusion is to reject the null hypothesis (or, we say that the result "is significant").
• Here's a decision rule using the p-value; it applies to ALL forms of hypothesis tests! Remember this very important rule!
H0: null hypothesis
H1: alternate hypothesis
Reject H0 if: p-value < α
• Important interpretation! Small p-values suggest that the null hypothesis is unlikely to be true. The smaller the p-value, the more convincing is the rejection of the null hypothesis. It indicates the strength of the evidence for rejecting the null hypothesis H0, rather than simply concluding "reject H0" or "do not reject H0".
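A short Python sketch of how a p-value is obtained from a computed z statistic for each direction of test. The z value of 1.80 is hypothetical, and the function name and the "left" / "right" / "two" labels are illustrative choices, not notation from the text.

```python
from statistics import NormalDist


def z_p_value(z, tail):
    """p-value of a computed z statistic for a left-, right-, or two-tailed test."""
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z)
    if tail == "right":
        return 1 - cdf(z)
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")


alpha = 0.05
z = 1.80  # hypothetical computed value
for tail in ("right", "two"):
    p = z_p_value(z, tail)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{tail}-tailed: p-value = {p:.4f} -> {decision}")
```

The same computed value leads to rejection in the right-tailed test but not in the two-tailed test, because the two-tailed p-value counts both tails.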
P-value Example:
example:
Hypothesis Major Concepts, Hypothesis One PopulationDetermining Appropriate Test,
Hypothesis Testing Major Concepts_Pvalues
TESTING A POPULATION MEAN (μ)

H0: μ = value, H1: μ ≠ value. Reject H0 if: $|z| > z_{\alpha/2}$ (z test) or $|t| > t_{\alpha/2,\,n-1}$ (t test)
H0: μ ≥ value, H1: μ < value. Reject H0 if: $z < -z_{\alpha}$ (z test) or $t < -t_{\alpha,\,n-1}$ (t test)
H0: μ ≤ value, H1: μ > value. Reject H0 if: $z > z_{\alpha}$ (z test) or $t > t_{\alpha,\,n-1}$ (t test)

Test statistics:
$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$

TESTING A POPULATION PROPORTION (π)

H0: π = value, H1: π ≠ value. Reject H0 if: $|z| > z_{\alpha/2}$
H0: π ≥ value, H1: π < value. Reject H0 if: $z < -z_{\alpha}$
H0: π ≤ value, H1: π > value. Reject H0 if: $z > z_{\alpha}$

Test statistic:
$z = \dfrac{p - \pi}{\sqrt{\pi (1 - \pi) / n}}$

TESTING A POPULATION VARIANCE (σ²)

H0: σ² = value, H1: σ² ≠ value. Reject H0 if: $\chi^2 < \chi^2_{1-\alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$
H0: σ² ≥ value, H1: σ² < value. Reject H0 if: $\chi^2 < \chi^2_{1-\alpha}$
H0: σ² ≤ value, H1: σ² > value. Reject H0 if: $\chi^2 > \chi^2_{\alpha}$

Test statistic:
$\chi^2 = \dfrac{(n - 1) s^2}{\sigma^2}$
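The four one-population test statistics can be written as small Python functions. This is a sketch only: the function names are illustrative and the inputs in the usage lines are hypothetical numbers picked so the arithmetic is easy to check by hand.

```python
from math import sqrt
from statistics import mean, stdev


def z_stat_mean(x_bar, mu_0, sigma, n):
    """z statistic for a mean when sigma is known (or n is large)."""
    return (x_bar - mu_0) / (sigma / sqrt(n))


def t_stat_mean(sample, mu_0):
    """t statistic for a mean, using the sample standard deviation s."""
    n = len(sample)
    return (mean(sample) - mu_0) / (stdev(sample) / sqrt(n))


def z_stat_proportion(p, pi_0, n):
    """z statistic for a proportion; the standard error uses the hypothesized pi."""
    return (p - pi_0) / sqrt(pi_0 * (1 - pi_0) / n)


def chi2_stat_variance(s_squared, sigma_0_squared, n):
    """Chi-square statistic for a variance, with n - 1 degrees of freedom."""
    return (n - 1) * s_squared / sigma_0_squared


# Hypothetical inputs
print(z_stat_mean(52, 50, 8, 64))           # (52 - 50) / (8 / 8)
print(t_stat_mean([9, 11, 10, 12, 13], 10))  # mean 11, s^2 = 2.5
print(z_stat_proportion(0.56, 0.50, 100))    # 0.06 / 0.05
print(chi2_stat_variance(30, 25, 21))        # 20 * 30 / 25
```

Each computed value is then compared with the critical value from the z, t, or χ² table according to the rejection rules listed above.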
Learning it! Exercises
Setting up the Hypotheses and Determining Type I and II Levels
For items 8.7 to 8.27 below, do the following:
(a) State the Null and Alternative hypotheses
(b) State the consequence of a Type I error
(c) State the consequence of a Type II error
(d) Suggest a value for α, and justify your choice
8.7
Administrators at a small college are concerned that part-time evening students may not be
familiar with all the services of the College. They wish to offer an orientation program to
these students but recognize that most of the part-time students work during the day and are
generally very busy. The administrators do not want to prepare an elaborate presentation if
only a handful of part-time students will attend. Hence, they will conduct the orientation if
more than 25% of the part-time students are interested in attending.
8.8
A company CEO is thinking about setting up an on-site day-care program for its employees.
The CEO has stated that she will do so only if more than 80% of the employees favor such a
decision. Set up the null and alternative hypotheses.
8.9
In an attempt to improve quality many manufacturers are developing partnerships with their
suppliers. A local fast-food burger outfit has partnered with its supplier of potatoes. The
burger outfit buys potatoes in bags that weigh 20 lbs. It wishes to set up the null and
alternative hypotheses to test if the bags do weigh on average 20 lbs.
8.10
You are a connoisseur of chocolate chip cookies and you do not think that Nabisco’s claim that
every bag of Chips Ahoy cookies has 1000 chocolate morsels is correct. Set up the null and
alternative hypotheses to test this claim.
8.11
Antilock brake systems (ABS) have been hailed as a revolutionary safety feature. A study by
the National Traffic Safety Administration looked at fatal accidents. The claim is that cars
with ABS are in fewer fatal crashes than those without.
8.12
A college placement office wonders whether there is a difference between the average salary
of engineering graduates and business school graduates.
8.13
Your new television has a 1-year warranty. You are given the option to buy a 3-year warranty
and wonder if it is worth it. You wish to test the hypothesis that the average time before a
problem occurs is more than 3 years.
8.14
M&M/Mars claims that at least 20% of the M&M’s in each package are the new blue color
8.15
A computer center is arguing for more computers in the lab for students at a midsize college.
The computer center at a university claims that the average amount of time that students
spend on-line has increased from last year’s average of 1 hour per day.
8.16
It seems like you spend more money on groceries during the summer months when you eat
more ice cream and drink more fluids. You know that you spend an average of $25 per week
on groceries during the winter months. Set up the null and alternative hypotheses to decide if,
on average, you spend more than this amount per week during the summer months.
8.17
M&M Mars claims that at least 20% of the M&M’s in each package are the new blue color.
Set up the null and alternative hypotheses to test this claim.
8.18
The computer center at a university claims that the average amount of time that students
spend on-line has increased from last year’s average of 1 hour per day. Set up the null and
alternative hypotheses to test this claim.
(note: for the following problems use Excel’s Data Analysis Tools to generate the descriptive
statistics to minimize hand calculation)
Problem 8.1
The School Committee members of a midsize New England city agreed that a strict discipline code
had caused an increase in the number of student suspensions. The number of suspensions for a
sample of schools in this city for the periods September 1992 to February 1993 is shown below:
School | Number of Suspensions
--- | ---
Central | 245
MCDI | 1
Chestnut | 65
Duggan | 133
Kennedy | 97
Forest Park | 149
Puttnam | 1024
Kiley | 56
Central Academy | 254
Commerce | 114
Bridge | 7
The average number of suspensions for the previous year was 130.5 with a population standard
deviation of 158.2
(a) Set up the null and alternative hypotheses to test if the average number of suspensions has
changed
(b) Test your hypothesis using significance level of 0.05
(c) Find the p-value
(d) Display the data to see if it is reasonable to assume that the underlying population distribution is
normal.
(e) Based on the p-value, what can you conclude about the average number of suspensions.
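For readers without Minitab, parts (b) and (c) can be checked with a short Python sketch using only the standard library. It treats the test as two-tailed ("has changed") and uses the stated population standard deviation of 158.2, so a z test applies; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

suspensions = [245, 1, 65, 133, 97, 149, 1024, 56, 254, 114, 7]
mu_0, sigma, alpha = 130.5, 158.2, 0.05

n = len(suspensions)
x_bar = mean(suspensions)

# H0: mu = 130.5 versus H1: mu != 130.5
z = (x_bar - mu_0) / (sigma / sqrt(n))
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = p_value < alpha

print(f"n = {n}, sample mean = {x_bar:.1f}")
print(f"z = {z:.3f}, critical z = +/-{z_critical:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Note how strongly the single value 1024 pulls the sample mean upward, which is why part (d) asks for a display of the data before the normality assumption is accepted.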
Minitab Solution
Stat > Basic Statistics > 1-Sample Z
[Screenshot: Minitab 1-Sample Z dialog. Callouts: raw data; population standard deviation; hypothesized mean; direction of test (alternative hypothesis)]
Minitab Output
[Screenshot: Minitab output. Callouts: confidence interval for the true mean; computed statistic (compare to the critical statistic); p-value of the test (compare to the significance level)]
Problem 8.2
The Educational Testing Service (ETS) designs and administers the SAT exams. Recently the format
of the exam changed and the claim has been made that the new exam can be completed in an
average of 120 minutes. A sample of 50 new exam times yielded an average of 115 minutes. The
standard deviation is assumed to be 2 minutes.
(a) Set up the null and the alternative hypotheses to test if the average time to complete the
exam has changed from 120 minutes.
(b) Test your hypothesis using significance level of 0.05
(c) Find the p-value
(d) Based on the p-value, what can you conclude about the average time to complete the new
exam?
Problem 8.28
A major manufacturer of glue products thinks it has found a way to make glue adhere longer than the
current average of 90 days. The manufacturer wishes to see whether the glue products made this
way have an average time to failure greater than 90 days. A sample of 30 tubes of new glue yields an
average of 93 days before failing. The failure time is normally distributed with a standard deviation of
3 days.
(a) Set up the null and the alternative hypotheses to test whether average time to failure is
greater than 90 days.
(b) Test your hypothesis using significance level of 0.05
(c) Find the p-value
(d) Based on the p-value, what can you conclude about the average time to failure for the new
product?
INFERENCES: ONE POPULATION (HYPOTHESIS TESTING)
1. Testing the Mean (μ)
• z-distribution for large samples or σ known (334)
• t-distribution for small samples and σ unknown (341)
2. Testing the Population Variance (σ²)
• χ² distribution (Not covered)
3. Testing the Population Proportion (π)
• z-distribution (approximation to the Binomial distribution) (349)
4. Hypothesis Testing using Minitab and Excel

One Population
    Testing a Mean (μ): z test, t test
    Testing a Proportion (π): z-test
    Testing a Variance (σ²): χ² test
TESTING A POPULATION MEAN (μ)

H0: μ = value, H1: μ ≠ value. Reject H0 if: $|z| > z_{\alpha/2}$ (z test) or $|t| > t_{\alpha/2,\,n-1}$ (t test)
H0: μ ≥ value, H1: μ < value. Reject H0 if: $z < -z_{\alpha}$ (z test) or $t < -t_{\alpha,\,n-1}$ (t test)
H0: μ ≤ value, H1: μ > value. Reject H0 if: $z > z_{\alpha}$ (z test) or $t > t_{\alpha,\,n-1}$ (t test)

Test statistics:
$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$

TESTING A POPULATION PROPORTION (π)

H0: π = value, H1: π ≠ value. Reject H0 if: $|z| > z_{\alpha/2}$
H0: π ≥ value, H1: π < value. Reject H0 if: $z < -z_{\alpha}$
H0: π ≤ value, H1: π > value. Reject H0 if: $z > z_{\alpha}$

Test statistic:
$z = \dfrac{p - \pi}{\sqrt{\pi (1 - \pi) / n}}$

TESTING A POPULATION VARIANCE (σ²)

H0: σ² = value, H1: σ² ≠ value. Reject H0 if: $\chi^2 < \chi^2_{1-\alpha/2,\,n-1}$ or $\chi^2 > \chi^2_{\alpha/2,\,n-1}$
H0: σ² ≥ value, H1: σ² < value. Reject H0 if: $\chi^2 < \chi^2_{1-\alpha,\,n-1}$
H0: σ² ≤ value, H1: σ² > value. Reject H0 if: $\chi^2 > \chi^2_{\alpha,\,n-1}$

Test statistic:
$\chi^2 = \dfrac{(n - 1) s^2}{\sigma^2}$
Learning it! Exercises
Problem 9.1
The cost of common goods and services in 5 cities is shown in the table below (USA Today):

City | Aspirin (100) | Fast Food (hamburger, fries, soft drink) | Woman’s Haircut/Blow Dry | Toothpaste (6.4 oz)
--- | --- | --- | --- | ---
Los Angeles | $7.69 | $4.15 | $20.11 | $2.42
Tokyo | $35.93 | $7.62 | $76.24 | $4.24
London | $9.69 | $5.80 | $44.35 | $3.63
Sydney | $7.43 | $4.53 | $29.93 | $2.08
Mexico City | $1.16 | $3.63 | $17.94 | $1.08
a. You have just returned from a business trip and you lost your receipt for the aspirin you
purchased but would like to be reimbursed by your company (since you had taken the aspirin
after a stressful business meeting!). You guesstimate a cost of $10.00. Your boss claims that
the average cost of aspirin is less than $10.00. Using these data, can you “prove” your boss
wrong? Conduct the necessary hypothesis test. Assume that all costs are normally distributed.
b. Based on these data, is there enough evidence to support your submitting a cost of $10.00 for the
fast-food meal on your trip?
c. If you remove Tokyo from the data set, do your answers to parts (a) and (b) change? What does
this tell you about the effect of outliers on the hypothesis test of µ when you have a small
sample?
Problem 9.2
The marketing material for a New England Ski resort advertises that they can make snow whenever
the temperature is 32°F or below. To demonstrate how often this happens their material includes the
following line graph of the weekly average temperatures (See graph in text).
The data that generated the graph are shown below:
Week:              1   2   3   4   5   6   7   8   9  10  11  12  13  14
Temperature (°F): 18  19  24  35  33  14  22  20  23  33  27  23  30  35
Is there enough evidence for the ski resort to claim that the average weekly temperature is less than
32°F?
Assume that the average weekly temperature is normally distributed.
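A Python sketch of the small-sample t test for this problem. The problem does not state a significance level, so α = 0.05 is an assumption here, and the critical value 1.771 is the t-table entry for α = 0.05 with 13 degrees of freedom (the standard library has no t distribution, so it is entered by hand).

```python
from math import sqrt
from statistics import mean, stdev

temps = [18, 19, 24, 35, 33, 14, 22, 20, 23, 33, 27, 23, 30, 35]
mu_0 = 32
t_critical = 1.771  # t table, alpha = 0.05 (assumed), df = 13

n = len(temps)
x_bar = mean(temps)
s = stdev(temps)  # sample standard deviation (n - 1 in the denominator)

# H0: mu >= 32 versus H1: mu < 32 (left-tailed test)
t = (x_bar - mu_0) / (s / sqrt(n))
reject_h0 = t < -t_critical

print(f"n = {n}, sample mean = {x_bar:.2f}, s = {s:.3f}")
print(f"t = {t:.3f}, critical t = -{t_critical}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```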
Problem 9.4
If you like shopping for the best deal on long-distance phone services, then you’ll enjoy sorting
through offers from 10 different marketers vying to be your energy supplier. Residents of 16
communities will be the first in Massachusetts to wade into the coming nationwide experiment in
deregulation of the natural gas industry. The average consumer uses 1232 therms of natural gas, for
which the average cost has been $520.24. The table shows proposed costs to deliver 1232 therms of
gas from 10 competitors:
Company | Cost ($)
--- | ---
All Energy Marketing Co | 478.66
Broad Street/ Energy One | 450.24
Global Energy Services | 468.16
Green Mountain Energy Partners | 471.24
KBC Energy Services | 435.53
Louis Dreyfus Energy Services | 472.24
National Fuel Resources | 468.22
NorAm Energy | 442.20
WEPCO Gas | 443.52
Western Gas Resources | 457.81
Problem 9.5
Computer centers at universities and colleges are certainly aware of the
increased number of Web surfers. To begin to understand the demands that
will be made on the computer center resources, one school studied 25 children
in grades 7 to 12. The number of hours that these children spend on the
Internet in 1 week is shown here:
5.0  4.6  5.9  4.4  4.9
5.1  5.7  4.0  3.8  5.6
6.7  4.1  5.5  5.5  6.7
5.2  5.4  5.0  6.7  4.8
5.8  3.6  5.4  4.1  4.8
Is there enough evidence to indicate that children spend more than an average
of 5 hours per week Web surfing? Assume that the time spent Web surfing is
normally distributed.
9.6
A company that sells mail-order computer systems has been planning inventory and staffing based
on an assumption that the variance of their weekly sales is 180 ($1000²). The weekly sales are
normally distributed. The company selects 15 weeks at random from the past year and obtains the
data (in thousands of dollars) shown below:
Weekly Sales: 191 222 222 223 223 225 227 228 229 232 234 234 236 244 253
a) What is the sample variance for these data?
b) Set up the hypotheses to test whether the population variance is different from 180.
c) At the 0.05 level of significance, what can you conclude about the company’s assumption?
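A Python sketch of the χ² test for this problem. The two critical values are χ²-table entries for 14 degrees of freedom (5.629 for the lower 0.025 tail and 26.119 for the upper 0.025 tail), entered by hand because the standard library has no χ² distribution; the variable names are illustrative.

```python
from statistics import variance

weekly_sales = [191, 222, 222, 223, 223, 225, 227, 228,
                229, 232, 234, 234, 236, 244, 253]
sigma_0_squared = 180
chi2_lower, chi2_upper = 5.629, 26.119  # chi-square table, df = 14, alpha = 0.05 two-tailed

n = len(weekly_sales)
s_squared = variance(weekly_sales)  # sample variance (n - 1 in the denominator)

# H0: sigma^2 = 180 versus H1: sigma^2 != 180
chi2 = (n - 1) * s_squared / sigma_0_squared
reject_h0 = chi2 < chi2_lower or chi2 > chi2_upper

print(f"n = {n}, sample variance = {s_squared:.2f}")
print(f"chi-square = {chi2:.2f}, fail-to-reject range = ({chi2_lower}, {chi2_upper})")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```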
9.7
In manufacturing, the amount of material that is wasted or lost during a process is very important. In
preparing financial estimates, a company assumes that the percent material lost for its new process
has a variance of 10 (%²). After the new process has been running for a month and appears to be stable,
the cost analyst looks at the percent material lost and finds the following data:
Daily Loss: 10 12 12 13 14 14 18 19 19 20
a) What is the sample variance for these data?
b) Set up the hypotheses to test whether the actual variance is greater than the value the company
has been assuming. Assume that the daily loss is normally distributed.
c) At the 0.05 level of significance, what can you conclude?
9.10
Companies are increasingly concerned about employees playing video games at work. In addition to
reducing productivity, this habit slows down networks and uses valuable storage space. A recent
article stated that 80% of all employees play video games at work at least once a week. A large
company that employs many engineers wonders if its employees are as bad as the article claims. If
they are, the company will install software that detects and removes video games from the network.
The company surveys (anonymously) 100 employees and finds that 85 of the employees surveyed
have played video games at work in the past week.
a. Set up the null and alternative hypotheses to test whether the proportion of the company’s
employees that play video games is greater than the proportion stated in the article.
b. At the 0.05 level of significance, test the hypotheses.
c. What do you recommend that the company do?
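A Python sketch of the proportion test for this problem, using the normal approximation to the binomial. Variable names are illustrative; only the standard library is used.

```python
from math import sqrt
from statistics import NormalDist

n, x = 100, 85             # employees surveyed, number who played games
pi_0, alpha = 0.80, 0.05   # proportion stated in the article, significance level

# Normal approximation check: n*pi and n*(1 - pi) should both be at least 5
assert n * pi_0 >= 5 and n * (1 - pi_0) >= 5

# H0: pi <= 0.80 versus H1: pi > 0.80 (right-tailed test)
p = x / n
z = (p - pi_0) / sqrt(pi_0 * (1 - pi_0) / n)
z_critical = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z)
reject_h0 = z > z_critical

print(f"p = {p:.2f}, z = {z:.2f}, critical z = {z_critical:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```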
9.11
An alumni office is interested in serving their alumni better in order to encourage more donations to
the college. A survey of 200 alumni was conducted to determine whether half-day training sessions
offered on the campus were of interest. If more than 75% of the alumni were interested, the college
would start a program. The survey showed that 160 of the alumni surveyed were interested in such a
program.
a. Set up the null and alternative hypotheses to test whether the college should implement the
program.
b. At the 0.05 level of significance, test the hypotheses.
c. What do you recommend that the college do?
9.12
A company that makes computer keyboards has specifications that allow it to produce a product that
has a maximum of 3% defective. The company has been receiving more customer complaints than
usual. A sample of 50 keyboards has 2 defectives.
a. Set up the null and alternative hypotheses to test whether the proportion of defective keyboards has
exceeded the amount allowed by the specifications.
b. At the 0.05 level of significance, test the hypotheses.
c. What do you recommend that the company do?
COMPARING TWO POPULATIONS
1. Comparing Means of Two Populations (μ1 vs. μ2)
• Dependent vs. Independent Samples (365)
• Comparing Means using Two Independent Samples (375)
    - Large samples (z-distribution)
    - Small samples (t-distribution)
• Comparing Means using Two Dependent Samples (384)
    - t-distribution
2. Comparing Proportions of Two Populations (π1 vs. π2)
• Using the z-distribution as approximation to the Binomial (371)
3. Comparing Variances of Two Populations (σ1² vs. σ2²)
• Using the F-distribution (404)
COMPARING TWO POPULATIONS: HYPOTHESIS TESTING

Comparing Means of Two Populations (μ1 vs. μ2)
    Dependent Samples
        (1) One population t-test on the differences:
            $t = \dfrac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$
    Independent Samples: are n1 and n2 ≥ 30?
        Yes: (2) Use z test:
            $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
        No: are the population variances equal?
            Yes: (3) Use pooled t test:
                $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$, where $s_p^2 = \dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$
            No: (4) Use unpooled t test

Comparing Proportions of Two Populations (π1 vs. π2): are n1π1, n1(1-π1), n2π2, n2(1-π2) ≥ 5?
    Yes: (5) Use z-test:
        $z = \dfrac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p} (1 - \bar{p}) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$, where $\bar{p} = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$
    No: Use Binomial Distribution

Comparing Variances of Two Populations (σ1² vs. σ2²)
    (6) Use F test:
        $F = \dfrac{s_L^2}{s_S^2}$, where:
        $s_L^2$ = larger of the two sample variances
        $s_S^2$ = smaller of the two sample variances
        $v_1 = (n - 1)$, where n is the size of the sample that has the larger variance
        $v_2 = (n - 1)$, where n is the size of the sample that has the smaller variance
COMPARING TWO POPULATION MEANS (μ1 vs. μ2)

H0: μ1 = μ2, H1: μ1 ≠ μ2. Reject H0 if: $|z| > z_{\alpha/2}$ (z test) or $|t| > t_{\alpha/2,\,n_1+n_2-2}$ (t test)
H0: μ1 ≥ μ2, H1: μ1 < μ2. Reject H0 if: $z < -z_{\alpha}$ (z test) or $t < -t_{\alpha,\,n_1+n_2-2}$ (t test)
H0: μ1 ≤ μ2, H1: μ1 > μ2. Reject H0 if: $z > z_{\alpha}$ (z test) or $t > t_{\alpha,\,n_1+n_2-2}$ (t test)

Test statistics:
$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$, where $s_p^2 = \dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$

COMPARING TWO POPULATION PROPORTIONS (π1 vs. π2)

H0: π1 = π2, H1: π1 ≠ π2. Reject H0 if: $|z| > z_{\alpha/2}$
H0: π1 ≥ π2, H1: π1 < π2. Reject H0 if: $z < -z_{\alpha}$
H0: π1 ≤ π2, H1: π1 > π2. Reject H0 if: $z > z_{\alpha}$

Test statistic:
$z = \dfrac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p} (1 - \bar{p}) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$, where $\bar{p} = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$

COMPARING TWO POPULATION VARIANCES (σ1² vs. σ2²)

H0: σ1² = σ2², H1: σ1² ≠ σ2². Reject H0 if: $F > F_{(\alpha/2,\,v_1,\,v_2)}$

Test statistic:
$F = \dfrac{s_L^2}{s_S^2}$, where:
$s_L^2$ = larger of the two sample variances
$s_S^2$ = smaller of the two sample variances
$v_1 = (n - 1)$, where n is the size of the sample that had the larger variance
$v_2 = (n - 1)$, where n is the size of the sample that had the smaller variance
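The pooled t, two-proportion z, and F statistics can be sketched as Python functions. The function names are illustrative, the hypothesized differences (μ1 − μ2 and π1 − π2) are taken to be zero, and the numbers in the usage lines are hypothetical.

```python
from math import sqrt


def pooled_t(x1_bar, s1, n1, x2_bar, s2, n2):
    """Pooled t statistic and its degrees of freedom (equal variances assumed)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (x1_bar - x2_bar) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for two proportions, given success counts x1 and x2."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # same as (n1*p1 + n2*p2) / (n1 + n2)
    return (p1 - p2) / sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))


def f_ratio(var_a, n_a, var_b, n_b):
    """F statistic with the larger sample variance on top, plus (v1, v2)."""
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


# Hypothetical inputs
print(pooled_t(20, 4, 10, 17, 4, 10))
print(two_proportion_z(60, 100, 50, 100))
print(f_ratio(9, 12, 16, 10))
```

Putting the larger variance in the numerator keeps F at or above 1, so only the upper-tail critical value from the F table is needed.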
Learning it! Exercises
10.1
Many studies have been done comparing consumer behavior of men and women. One such ongoing study concerns take-out food. In particular, the study focuses on whether there is a difference
in the mean number of times per month that men and women buy take-out food for dinner. The most
recent results of the study are shown below:
Population | Sample Size | Sample Mean | Population Standard Deviation
--- | --- | --- | ---
Men | 34 | 25.6 | 4.2
Women | 28 | 21.2 | 3.7
Because the study has so much historical data, information is known about the population standard
deviations.
a. Set up the hypotheses to test whether there is a difference in the mean number of times per month
that a person buys take-out food for dinner for men and women.
b. Use the Z test with known population variances to set up and perform the test. Use a level of
significance of 0.05.
c. Find the p value for the test.
d. Do the data provide evidence that the mean number of times per month for men differs from that
for women?
e. Does the choice of α in this case affect the decision?
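A Python sketch of parts (b) and (c). Because the population standard deviations are given, they replace the sample standard deviations in the standard error; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n_men, mean_men, sigma_men = 34, 25.6, 4.2
n_women, mean_women, sigma_women = 28, 21.2, 3.7
alpha = 0.05

# H0: mu_men = mu_women versus H1: mu_men != mu_women (two-tailed test)
se = sqrt(sigma_men**2 / n_men + sigma_women**2 / n_women)
z = (mean_men - mean_women) / se
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = abs(z) > z_critical

print(f"z = {z:.3f}, critical z = +/-{z_critical:.3f}, p-value = {p_value:.2g}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```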
10.2
Professional employees who work for large corporations often contend that the mean salary paid by a
company differs by location in the United States. To test that claim, data were collected on financial
analysts working for a large corporation at locations in New England and in the upper Midwest.
Because there is an extensive history of salary data, the population standard deviations are available.
The study found the following results:
Population | Sample Size | Sample Mean | Population Standard Deviation
--- | --- | --- | ---
New England | 25 | 22.3 | 1.5
Upper Midwest | 20 | 18.5 | 2.2
a) Set up the appropriate hypotheses to test whether the company’s analysts in New England were
paid more, on the average, than those working in the upper Midwest.
b) Use the Z test with known population variances to set up and perform the test. Use a level of
significance of 0.05.
c) Find the p value for the test.
d) Do the data support the contention that the mean pay for analysts in New England is higher than
that of analysts in the upper Midwest?
10.11
Having learned about the paired t test you realize that you really should have used the test for the
data on software price comparison. The data are repeated below:
Top Ten Business Software Packages    Computability Price ($)    PC Connection Price ($)
Windows 95 Upgrade                              88                         95
Norton Anti-Virus                               59                         70
McAfee ViruScan                                 49                         60
First Aid 97 Deluxe                             54                         58
Clean Sweep III                                 37                         37
Norton Utilities                                68                         75
Netscape Navigator                              45                         40
MS Office Pro 97 Upgrade                       300                        310
First Aid 97                                    32                         35
Win Fax Pro                                     95                         95
a) Calculate the differences between the prices for each type of software package. Just looking at
the differences, do you think that one company charges more than the other? Why or why not?
b) Calculate the average difference and the standard deviation of the differences.
c) Set up the hypotheses to test whether the mean difference in price between the two companies is
zero.
d) Assuming that the data are normally distributed, at the 0.05 level of significance, is there a
difference in the mean price of software for the two companies?
e) Did these results differ from the last time you analyzed the data? Why do you think this
happened?
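Parts (a) through (d) can be checked with a short script. This is a sketch only: it assumes the first price in each pair belongs to Computability and the second to PC Connection, and the critical value 2.262 is the usual table entry for a two-tailed t test with 9 degrees of freedom at the 0.05 level.

```python
import math
import statistics

# Prices in dollars, in the order the packages are listed (column assignment assumed)
computability = [88, 59, 49, 54, 37, 68, 45, 300, 32, 95]
pc_connection = [95, 70, 60, 58, 37, 75, 40, 310, 35, 95]

# (a) Paired differences: Computability minus PC Connection
differences = [c - p for c, p in zip(computability, pc_connection)]

# (b) Mean and sample standard deviation of the differences
n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)

# (c)-(d) Paired t statistic for H0: mean difference = 0
t_stat = mean_diff / (sd_diff / math.sqrt(n))
T_CRIT = 2.262  # table value t(0.025, 9), quoted here as an assumption

print("differences:", differences)
print(f"mean = {mean_diff:.2f}, sd = {sd_diff:.3f}, t = {t_stat:.3f}")
print("Reject H0" if abs(t_stat) > T_CRIT else "Do not reject H0")
```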
Minitab Solution
Stat > Basic Statistics > Paired t
Excel Solution
Data > Data Analysis > t-Test: Paired Two Sample for Means
10.12
A hospital administrator is concerned about the length of time that the nursing staff washes their
hands. A recent study in health care showed that longer washing greatly reduces the spread of
germs. The hospital observed the amount of time that a sample of nine nurses in the Cardiac Care
Unit (CCU) washed their hands. The data were collected in such a way that the employees did not
know that they were being observed. The hospital then showed the nurses an educational video on
the negative effects of shortened time spent hand washing. After the video, the hospital again
watched and timed the group of nurses washing their hands. The data are shown below:
Observation    Unit    Time 1 (s)    Time 2 (s)
     1         CCU         3             16
     2         CCU         2              7
     3         CCU         0              5
     4         CCU         5              8
     5         CCU         2             15
     6         CCU         0             15
     7         CCU         2             20
     8         CCU         3             16
     9         CCU         0             18
a) Calculate the differences between the times for each nurse. Just looking at the differences, do
you think that, on the average, they washed their hands longer the second time? Why or why
not?
b) Calculate the average difference and the standard deviation of the differences.
c) Set up the hypotheses to test whether there was an increase in the average amount of time spent
washing hands.
d) Assuming that the data are normally distributed, at the 0.05 level of significance, what can you
conclude?
e) Can you conclude that the video caused the nurses to wash their hands longer? Why or why
not?
Problem 10.7
Women who smoke suffer an increased risk of dying of breast cancer, according to a
recently published study. In the study, out of 319,000 women who never
smoked there were 468 deaths from breast cancer, whereas out of 120,000 smokers,
there were 187 deaths.
(a) Calculate the sample proportion of women who died of breast cancer for
smokers and non-smokers.
(b) Set up the hypotheses to test whether the proportion of women who die
of breast cancer is higher for smokers than non-smokers.
(c) At the 0.05 significance level, can you conclude that smoking causes
breast cancer? If not, what can you conclude?
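A possible way to work parts (a) through (c) is the large-sample Z test for two proportions with a pooled estimate under H0. The sketch below assumes that approach; the function name is hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Return (p1, p2, z) for H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return p1, p2, (p1 - p2) / standard_error

# Group 1 = smokers, group 2 = women who never smoked (counts from the problem)
p_smokers, p_nonsmokers, z = two_proportion_z(187, 120_000, 468, 319_000)

# One-sided p value for H1: p_smokers > p_nonsmokers
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print(f"smokers: {p_smokers:.6f}, non-smokers: {p_nonsmokers:.6f}")
print(f"z = {z:.3f}, one-sided p value = {p_value:.3f}")
```

Whatever the outcome of the test, an observational comparison like this one cannot by itself establish that smoking causes breast cancer, which is the point of part (c).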
Problem 10.18
Selling personal computers is big business and consumers are becoming increasingly
aware of vendor reputation. A recent study of two vendors of desktop personal
computers reports on the units that need repair for Dell Computers and Gateway
2000. Of 1584 computers manufactured by Dell Computer 427 needed repair,
whereas for Gateway 2000, 825 of 2662 computers needed repair.
(a) Calculate the sample proportion of computers needing repair for each
company.
(b) Set up the hypotheses to test whether the proportion of computers needing
repairs is different for the two companies.
(c) At the 0.05 level of significance, what can you conclude?
Minitab Solution
(No Excel Solution)
10.20
Consider the problem in which the Board of Realtors for Greater Bridgeport, CT, was
looking at the average selling prices of homes. The data are given again below:
Population    Sample Size    Sample Mean    Sample Standard Deviation
1995              25          $151,116              $5,332
1996              25          $160,669              $6,468
a) Assuming that the populations are normally distributed, set up the hypotheses to
test whether the population variances are equal at the 0.10 level of significance.
b) Was the decision to test using the pooled variance justified?
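The test in part (a) is the F test for the ratio of two variances. The sketch below assumes the common convention of placing the larger sample variance in the numerator; the critical value 1.98 is the usual table entry for F(0.05; 24, 24), which is the upper cutoff of a two-tailed test at the 0.10 level, and is quoted here as an assumption rather than computed.

```python
# Sample standard deviations and sizes from the table in Problem 10.20
s_1995, n_1995 = 5332, 25
s_1996, n_1996 = 6468, 25

# Larger sample variance goes in the numerator so that F >= 1
f_stat = max(s_1995, s_1996) ** 2 / min(s_1995, s_1996) ** 2
df_numerator = n_1996 - 1    # 1996 has the larger standard deviation
df_denominator = n_1995 - 1

F_CRIT = 1.98  # assumed table value F(0.05; 24, 24)

print(f"F = {f_stat:.4f} with ({df_numerator}, {df_denominator}) df")
if f_stat > F_CRIT:
    print("Reject H0: the variances appear to differ")
else:
    print("Do not reject H0: pooling the variances looks reasonable")
```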
10.21
In your quest for the perfect golf clubs you made an assumption about the population
variances when you tested your hypotheses. The data you collected are given below:
Population    Sample Size    Sample Mean    Sample Standard Deviation
Brand X           15             255                  8.7
Brand Z           15             271                  9.1
a) Set up the appropriate hypotheses to test whether the variance of Brand Z clubs
is the same as the variance for Brand X.
b) Assuming that the populations are normally distributed, at the 0.10 level of
significance was your decision to pool the variances a good one?
c) In general, would a difference in variation between the clubs be a factor in your
purchase decision?
Procedures for Testing Independence
                                      Dependent Variable (y)
                                      Quantitative                           Qualitative
Independent       Quantitative        Regression and Correlation Analysis    Discriminant Analysis
Variable (x)      Qualitative         ANOVA (Oneway, Twoway)                 Chi-square
EXPERIMENTAL DESIGN AND ANOVA (ANALYSIS OF VARIANCE)
1. Definition of terms
   o Factor and response variable (428)
   o ANOVA and treatment (410)
2. Sources of Variance (411)
   o Treatment or Between Groups Variation (a.k.a. explained, factor, treatment)
   o Random or Within Groups Variation (a.k.a. unexplained, random, error)
3. One-way ANOVA (410)
   o Review of variables
     - quantitative vs. qualitative
     - dependent vs. independent
   o Using ANOVA as a procedure for comparing means of two or more groups
   o Using ANOVA as a procedure for determining whether a qualitative independent variable
     and a quantitative dependent variable are related
4. Two-Way ANOVA with Replication – a.k.a. Two-way ANOVA with Interaction (427)
   o Using ANOVA as a procedure for comparing means of two or more groups (Factor A and
     Factor B)
   o Using ANOVA as a procedure for determining whether a qualitative independent variable
     and a quantitative dependent variable are related (Factor A and Factor B)
   o Testing the presence of interaction between Factors A and B
ANALYSIS OF VARIANCE (ANOVA)
A. ONEWAY ANOVA
H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_t$
H1: The population means are not all the same
Reject H0 if: $F > F_{\alpha, v_1, v_2}$
Where: $v_1 = t - 1$
       $v_2 = N - t$
B. TWOWAY ANOVA (with replication)
1. Testing for Main Effects (Factor A)
H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_a$ (No level of factor A has an effect)
H1: The population means are not all the same (at least 1 level has an effect)
Reject H0 if: $MSA/MSE > F_{\alpha, v_1, v_2}$
Where: $v_1 = a - 1$
       $v_2 = ab(r - 1)$
2. Testing for Main Effects (Factor B)
H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_b$ (No level of factor B has an effect)
H1: The population means are not all the same (at least 1 level has an effect)
Reject H0 if: $MSB/MSE > F_{\alpha, v_1, v_2}$
Where: $v_1 = b - 1$
       $v_2 = ab(r - 1)$
3. Testing for INTERACTION EFFECTS (AB)
H0: There are NO interaction effects
H1: At least 1 combination of factor A and B levels has an effect
Reject H0 if: $MSAB/MSE > F_{\alpha, v_1, v_2}$
Where: $v_1 = (a - 1)(b - 1)$
       $v_2 = ab(r - 1)$
Learning it! Exercises
14.1
A diaper company is considering 3 different filler materials for their disposable diapers. Eight diapers
were tested with each of the 3 filler materials, and 24 toddlers were randomly given a diaper to wear.
As the child played, fluid was injected into the diaper every 10 minutes until the product failed
(leaked). The amount of fluid (in grams) at the time of failure was recorded for each diaper. The data
are shown below:
Material 1    Material 2    Material 3
   791           809           828
   789           818           814
   796           803           855
   802           781           844
   810           813           847
   790           808           848
   800           805           836
   790           811           873
(a) What is the response variable and what is the factor?
(b) How many levels of the factor are being studied?
(c) Is there any difference in the average amount of fluid the diaper can hold
using the three different filler materials? If so, which ones are different?
(d) What is your recommendation to the company and why?
MINITAB OUTPUT
Stat > ANOVA > One-way (or One-way (Unstacked))
Results for: Problem 14_1.MTW
One-way ANOVA: Grams versus Material
Analysis of Variance for Grams
Source      DF       SS      MS       F       P
Material     2     9808    4904   29.83   0.000
Error       21     3452     164
Total       23    13260

Level    N      Mean    StDev
1        8    796.00     7.50
2        8    806.00    11.12
3        8    843.00    17.70
Pooled StDev = 12.82
[Individual 95% CIs for mean, based on pooled StDev: character plot with axis ticks at 800, 820, 840]
EXCEL OUTPUT
Data > Data Analysis > Anova: Single Factor
Anova: Single Factor
SUMMARY
Groups    Count    Sum     Average    Variance
Mat1        8      6368    796        56.28571
Mat2        8      6448    806        123.7143
Mat3        8      6745    843.125    314.4107

ANOVA
Source of Variation    SS          df    MS          F           P-value           F crit
Between Groups         9864.083     2    4932.042    29.92679    0.000000711968    3.466795
Within Groups          3460.875    21    164.8036
Total                  13324.96    23
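The sums of squares and the F ratio in the Excel table can be reproduced directly from the diaper data of Problem 14.1. The sketch below uses only the Python standard library; the critical value 3.466795 is copied from the Excel output above rather than computed.

```python
import statistics

# Grams of fluid at failure, from Problem 14.1
groups = {
    "Material 1": [791, 789, 796, 802, 810, 790, 800, 790],
    "Material 2": [809, 818, 803, 781, 813, 808, 805, 811],
    "Material 3": [828, 814, 855, 844, 847, 848, 836, 873],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = statistics.mean(all_values)

# Between-groups (treatment) and within-groups (error) sums of squares
ss_between = sum(
    len(values) * (statistics.mean(values) - grand_mean) ** 2
    for values in groups.values()
)
ss_within = sum(
    (v - statistics.mean(values)) ** 2
    for values in groups.values()
    for v in values
)

df_between = len(groups) - 1               # t - 1
df_within = len(all_values) - len(groups)  # N - t
f_stat = (ss_between / df_between) / (ss_within / df_within)

F_CRIT = 3.466795  # from the Excel output, alpha = 0.05 with (2, 21) df
print(f"SS between = {ss_between:.3f}, SS within = {ss_within:.3f}")
print(f"F = {f_stat:.5f}; reject H0: {f_stat > F_CRIT}")
```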
14.3
Grading homework is a real problem. It takes an enormous amount of time and many students do
not do a very good job or copy answers from other students or the back of the book. A teacher of
elementary statistics decided to conduct a study to determine what effect grading homework had on
her students’ exam scores. She taught 3 sections of Elementary Statistics and randomly assigned
each class one of three conditions: (1) no homework given, (2) homework given, but not collected,
and (3) homework given, collected, and graded. After the first exam, she collected the data (exam
scores). They are shown in the Excel data file Homework.xls
(a) What is the response variable and what is the factor?
(b) How many levels of the factor are being studied?
(c) Is there any difference in the average exam scores among the three homework
conditions? If so, which ones are different?
(d) What is your recommendation to the teacher and why?
MINITAB OUTPUT:
Results for: Problem 14_3.MTW
One-way ANOVA: C2 versus C1
Analysis of Variance for C2
Source    DF        SS       MS      F       P
C1         2    1700.4    850.2   8.91   0.001
Error     45    4295.4     95.5
Total     47    5995.8

Level     N      Mean     StDev
1        16    74.500    11.051
2        16    70.313     9.016
3        16    84.500     9.107
Pooled StDev = 9.770
[Individual 95% CIs for mean, based on pooled StDev: character plot with axis ticks at 70.0, 77.0, 84.0]
14.16
The manufacturer of batteries is designing a battery to be used in a device that will be subjected to
extremes in temperature. The company has a choice of 3 materials to use in the manufacturing
process. An experiment is designed to study the life of the battery when it is made from materials A,
B, and C and is exposed to temperatures of 15, 70, and 125 degrees Fahrenheit. For each combination of
material and temperature, 4 batteries are tested. The lifetimes in hours of the batteries are shown
below:
                      Temperature
Material      15F      70F      125F
A             130       34        20
              155       40        70
               74       80        82
              180       75        58
B             150      126        25
              188      122        70
              159      106        58
              126      115        45
C             138      174        96
              110      120       104
              168      150        82
              160      139        83
(a) Calculate the average life for each of the material types.
(b) Calculate the average life for each of the 3 temperatures.
(c) Calculate the average life for each of the 9 treatment groups.
(d) Plot the 9 treatment means on a graph with temperature factor on the x axis,
and the life of the battery in hours on the y axis. Use different color for each
of the 3 materials and connect the averages for those of the same material.
What do you speculate about the interaction effect based on the graph?
(e) Confirm your suspicions by doing a two-way ANOVA and testing to see if
there is a significant interaction effect.
(f) What materials do you recommend to this manufacturer and why?
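Parts (a) through (c) only require averages, which can be sketched as below. The layout is an assumption read off the flattened table: four lifetimes per material and temperature combination, listed material by material within each temperature column.

```python
import statistics

# Battery life in hours, keyed by (material, temperature in degrees F)
life = {
    ("A", 15): [130, 155, 74, 180],
    ("A", 70): [34, 40, 80, 75],
    ("A", 125): [20, 70, 82, 58],
    ("B", 15): [150, 188, 159, 126],
    ("B", 70): [126, 122, 106, 115],
    ("B", 125): [25, 70, 58, 45],
    ("C", 15): [138, 110, 168, 160],
    ("C", 70): [174, 120, 150, 139],
    ("C", 125): [96, 104, 82, 83],
}

materials = ["A", "B", "C"]
temperatures = [15, 70, 125]

# (a) Average life per material, pooling all temperatures
material_means = {
    m: statistics.mean([x for t in temperatures for x in life[(m, t)]])
    for m in materials
}
# (b) Average life per temperature, pooling all materials
temp_means = {
    t: statistics.mean([x for m in materials for x in life[(m, t)]])
    for t in temperatures
}
# (c) Average life for each of the 9 treatment groups
cell_means = {key: statistics.mean(values) for key, values in life.items()}

for m in materials:
    row = "  ".join(f"{cell_means[(m, t)]:7.2f}" for t in temperatures)
    print(f"Material {m}: {row}   (overall {material_means[m]:.2f})")
print("Temperature means:", {t: round(v, 2) for t, v in temp_means.items()})
```

Plotting each row of cell means against temperature gives the interaction plot asked for in part (d); lines that are far from parallel suggest an interaction.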
MINITAB OUTPUT
Stat > ANOVA > Two-way
Interaction Plot
Stat > ANOVA > Interaction Plot
Excel Solution
Data > Data Analysis > Anova: Two-Factor With Replication
14.20
A manufacturer of adhesive products designed an experiment to compare a new adhesive product to
a competitor’s product. The adhesive product, or glue, is used by automobile manufacturers. The
response variable was the strength of the glue measured by tensile strength in pounds per square
inch (psi). The ability to adhere to oil-contaminated surfaces under different humidity conditions was
studied. There were 2 levels for factor A: no oil or oil. Oil contamination was applied by hand dipping
the samples in an oil solution and allowing them to air dry at room temperature for 2 hours. There
were two levels for factor B: 50% humidity and 90% humidity. Three samples were tested for each of
the combinations of factor A and factor B. The tensile values (psi) for the new product are shown
below:
                 Humidity
Surface      50%      90%
No Oil       175       95
             100      115
             175       85
Oil           43       95
              42      105
              44      116
(a) Does the product behave significantly differently if the surface is oil contaminated?
(b) Does the product behave significantly differently at different humidity levels?
(c) Is there any significant interaction effect present?
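The two-way ANOVA with replication for this problem can be sketched by hand as below. The assignment of readings to the 50% and 90% humidity columns is an assumption inferred from the flattened table; it was chosen because it is consistent with the marginal means in the Minitab output for this problem (124.2 and 74.2 for surface, 96.5 and 101.8 for humidity).

```python
import statistics

# Tensile strength (psi) keyed by (surface, humidity); 3 replicates per cell
psi = {
    ("No oil", "50%"): [175, 100, 175],
    ("No oil", "90%"): [95, 115, 85],
    ("Oil", "50%"): [43, 42, 44],
    ("Oil", "90%"): [95, 105, 116],
}
surfaces = ["No oil", "Oil"]   # factor A, a = 2 levels
humidities = ["50%", "90%"]    # factor B, b = 2 levels
a, b, r = len(surfaces), len(humidities), 3

def pooled_mean(keep):
    """Mean of all readings whose (surface, humidity) key satisfies keep()."""
    return statistics.mean([x for key, vals in psi.items() if keep(key) for x in vals])

grand = pooled_mean(lambda key: True)
ss_a = sum(b * r * (pooled_mean(lambda k, s=s: k[0] == s) - grand) ** 2 for s in surfaces)
ss_b = sum(a * r * (pooled_mean(lambda k, h=h: k[1] == h) - grand) ** 2 for h in humidities)
ss_cells = sum(r * (statistics.mean(vals) - grand) ** 2 for vals in psi.values())
ss_ab = ss_cells - ss_a - ss_b
ss_error = sum((x - statistics.mean(vals)) ** 2 for vals in psi.values() for x in vals)

mse = ss_error / (a * b * (r - 1))
f_a = (ss_a / (a - 1)) / mse
f_b = (ss_b / (b - 1)) / mse
f_ab = (ss_ab / ((a - 1) * (b - 1))) / mse

print(f"Surface:     SS = {ss_a:8.1f}  F = {f_a:.2f}")
print(f"Humidity:    SS = {ss_b:8.1f}  F = {f_b:.2f}")
print(f"Interaction: SS = {ss_ab:8.1f}  F = {f_ab:.2f}")
print(f"Error:       SS = {ss_error:8.1f}  MSE = {mse:.1f}")
```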
Two-way ANOVA: PSI versus Surface, Humidity
Analysis of Variance for PSI
Source         DF       SS      MS       F       P
Surface         1     7500    7500   13.52   0.006
Humidity        1       85      85    0.15   0.705
Interaction     1     9747    9747   17.56   0.003
Error           8     4439     555
Total          11    21772

Surface     Mean
No oil     124.2
Oil         74.2
[Individual 95% CI character plot for the Surface means, axis ticks at 75.0, 100.0, 125.0, 150.0]

Humidity    Mean
50%         96.5
90%        101.8
[Individual 95% CI character plot for the Humidity means, axis ticks at 84.0, 96.0, 108.0, 120.0]

[Interaction Plot - Data Means for PSI: mean PSI (y axis) versus Humidity (50%, 90%), one line per Surface (No oil, Oil)]
Analysis of Qualitative Data (Chi-square)
1. The Chi-square test Explained (640)
2. Four Uses of the Chi-square Distribution
   (a) Testing for goodness-of-fit (641)
       o is a test for comparing a theoretical distribution, such as a Normal,
         Poisson, etc., with the observed data from a sample
       o answers the question: “does the sample come from a specified
         distribution?”
   (b) Testing (comparing) proportions of two or more groups (651)
   (c) Testing whether two categorical (a.k.a. nominal, qualitative, classification) variables
       are independent (651)
   (d) Testing the variance of a population (covered in earlier chapter)
Chi-square Concepts and Solved Problems are Available
CHI-SQUARE ($\chi^2$) DISTRIBUTION
1. Goodness-of-fit Test
H0: The sample comes from a specified distribution
H1: The sample does not come from a specified distribution
Reject H0 if: $\chi^2 > \chi^2_{\alpha,\,(k-1)}$
$\chi^2 = \sum \frac{(O - E)^2}{E}$
k = number of categories
2. Test of Independence of 2 Categorical Variables
(Also used for comparing proportions of 2 or more groups)
H0: Variables 1 and 2 are independent
H1: Variables 1 and 2 are dependent
Reject H0 if: $\chi^2 > \chi^2_{\alpha,\,(r-1)(c-1)}$
$\chi^2 = \sum \frac{(O - E)^2}{E}$
r = number of rows
c = number of columns
3. Testing A Population Variance ($\sigma^2$)
Test statistic (all three cases): $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$, where $\sigma^2$ is the hypothesized value
Two-tailed:
H0: $\sigma^2 = \text{value}$
H1: $\sigma^2 \ne \text{value}$
Reject H0 if: $\chi^2 > \chi^2_{\alpha/2,\,n-1}$ or $\chi^2 < \chi^2_{1-\alpha/2,\,n-1}$
Left-tailed:
H0: $\sigma^2 \ge \text{value}$
H1: $\sigma^2 < \text{value}$
Reject H0 if: $\chi^2 < \chi^2_{1-\alpha,\,n-1}$
Right-tailed:
H0: $\sigma^2 \le \text{value}$
H1: $\sigma^2 > \text{value}$
Reject H0 if: $\chi^2 > \chi^2_{\alpha,\,n-1}$
Learning it! Exercises
15.1
The administration of a university has been using the following distribution to classify ages of their
students:
Age Group        Estimated % of Student Population
Less than 18                   2.7
18 – 19                       29.9
20 – 24                       53.4
Older than 24                 14.0
A recent student survey provided the following data on age of students:
Age Group        Frequency
Less than 18          6
18 – 19             118
20 – 24             102
Older than 24        26
Set up a table that compares the expected and observed frequencies for each group.
Based on the table, do you think that the data represent the established distribution?
Set up the hypothesis for the Chi-square goodness of fit test.
Perform the goodness of fit test at the 0.05 significance level.
Based on the chi-square test, is the estimated age distribution that the university uses correct?
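The expected frequencies and the goodness-of-fit statistic for this problem can be sketched as follows. The critical value 7.815 is the usual table entry for chi-square with 3 degrees of freedom at the 0.05 level and is quoted as an assumption rather than computed.

```python
age_groups = ["Less than 18", "18 - 19", "20 - 24", "Older than 24"]
estimated_pct = [2.7, 29.9, 53.4, 14.0]   # university's assumed distribution
observed = [6, 118, 102, 26]              # survey frequencies

n = sum(observed)
expected = [n * pct / 100 for pct in estimated_pct]

# Chi-square statistic: sum over categories of (O - E)^2 / E
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                    # k - 1
CHI_SQ_CRIT = 7.815                       # assumed table value, alpha = 0.05, df = 3

print(f"{'Age group':<15}{'Observed':>10}{'Expected':>10}")
for group, o, e in zip(age_groups, observed, expected):
    print(f"{group:<15}{o:>10}{e:>10.2f}")
print(f"chi-square = {chi_sq:.2f} with {df} df")
print("Reject H0" if chi_sq > CHI_SQ_CRIT else "Do not reject H0")
```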
15.2
As part of a survey on the use of Office Suites Software, the company doing the polling wanted to
know whether its population was uniformly distributed over the following age groups: under 25, 25
to 44, and 45 and up. The company looked at the data it had collected so far and found the following
distribution:
Age Group     Number of Respondents
Under 25               73
25 to 44               61
45 and up              66
Total                 200
Based on the data, do you think that the respondents are uniformly distributed over the age
categories?
Set up the hypothesis to test whether the data are uniformly distributed over the age categories.
Find the expected frequency distribution and perform the chi-square goodness of fit test.
At the 0.05 level of significance, would you say that the respondents were uniformly distributed over
the age groups?
15.6
In an experiment to study the attitude of voters concerning term limitations in Congress, voters in
Indiana, Ohio, and Kentucky were polled with the following results:
Opinion            Indiana    Kentucky    Ohio
Support               82        107        93
Do Not Support        97         66        74
(a) Set up the hypothesis to test whether the proportion of voters who support congressional
term limits is the same for all three states.
(b) Calculate the proportion of voters that support congressional term limits for each state
individually. Based on these values, do you think there is a difference in the proportions?
(c) Calculate the overall proportion of voters who support term limits for Congress.
(d) Calculate the expected frequencies for each cell and find the value of the chi-square test
statistic.
(e) At the 0.05 level of significance, is there a difference in the proportion of voters who support
congressional term limits among the three states?
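Parts (b) through (e) can be sketched in a few lines. The p value uses the closed form exp(-x/2), which is exact for a chi-square variable with 2 degrees of freedom only; that shortcut is used here to avoid a table lookup and would not apply to other table sizes.

```python
import math

# Poll results from Problem 15.6
support = {"Indiana": 82, "Kentucky": 107, "Ohio": 93}
oppose = {"Indiana": 97, "Kentucky": 66, "Ohio": 74}
states = list(support)

# (b) Proportion supporting term limits in each state
state_props = {s: support[s] / (support[s] + oppose[s]) for s in states}

# (c) Overall proportion of supporters
n_total = sum(support.values()) + sum(oppose.values())
overall = sum(support.values()) / n_total

# (d) Expected counts under H0 (same proportion everywhere) and the test statistic
chi_sq = 0.0
for s in states:
    n_state = support[s] + oppose[s]
    expected_support = n_state * overall
    expected_oppose = n_state * (1 - overall)
    chi_sq += (support[s] - expected_support) ** 2 / expected_support
    chi_sq += (oppose[s] - expected_oppose) ** 2 / expected_oppose

# (e) df = (rows - 1)(columns - 1) = (2 - 1)(3 - 1) = 2
df = (2 - 1) * (len(states) - 1)
p_value = math.exp(-chi_sq / 2)   # exact upper-tail area for df = 2 only

print({s: round(p, 3) for s, p in state_props.items()}, f"overall = {overall:.3f}")
print(f"chi-square = {chi_sq:.3f}, df = {df}, p value = {p_value:.4f}")
```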
Minitab
Stat > Table > Chi-square Test
(No Excel Solution)
15.7
In a survey about satisfaction with local phone service, those respondents who rated their current
service as excellent and those who rated Poor to Very Poor were asked to classify their current local
service provider. The results are given in the table below:
                                          Type of Company
Current Service     Long        Local       Power      Cable      Cellular
Source              Distance    Phone                  TV         Phone
Excellent             264         444        131        215         198
Poor – Very Poor     1394        1318        485        431         572
(a) Set up the hypothesis to test whether the proportion of people who rated their company as
excellent is the same for each type of company.
(b) Calculate the proportion of people who rate their current phone service as excellent.
(c) Calculate the expected frequencies for each cell and find the value of the chi-square test
statistic.
(d) If you wanted to perform the test at the 0.05 significance level, what would be the critical
value of the test?
(e) At the 0.05 level of significance, is there a difference in the proportion of people who rate their
local phone service as excellent among the different types of companies?
Chi-Square Test: Long Distance, Local Phone, Power, CableTV, Cellular Phone
Expected counts are printed below observed counts

         Long Dis    Local Ph      Power    CableTV    Cellular     Total
    1         264         444        131        215         198      1252
           380.74      404.63     141.46     148.35      176.82

    2        1394        1318        485        431         572      4200
          1277.26     1357.37     474.54     497.65      593.18

Total        1658        1762        616        646         770      5452

Chi-Sq = 35.796 +  3.831 +  0.773 + 29.947 +  2.536 +
         10.671 +  1.142 +  0.230 +  8.927 +  0.756 = 94.610
DF = 4, P-Value = 0.000
15.10
A report by the Department of Justice on rape victims reports on interviews with 3721 victims. The
attacks were classified by age of the victim and the relationship of the victim to the rapist. The results of
the study are given here:
                        Relationship of Rapist
Age of Victim    Family    Acquaintance or Friend    Stranger
Under 12           153              167                  13
12 to 17           230              746                 172
Over 17            269             1232                 739
(a) Set up the hypotheses to test whether age of victim and relationship of rapist
are independent.
(b) Calculate the expected frequencies for each cell.
(c) How many degrees of freedom will the chi-square test for independence
have? Using a level of significance of 0.01, what is the critical value for the
test?
(d) Calculate the value of the chi-square test statistic.
(e) Is the age of the victim independent of the relationship to the rapist?
MINITAB
Stat > Table > Chisquare Test
Chi-Square Test: C1, C2, C3
Expected counts are printed below observed counts

               C1         C2         C3     Total
    1         153        167         13       333
            58.35     191.96      82.69

    2         230        746        172      1148
           201.15     661.77     285.07

    3         269       1232        739      2240
           392.50    1291.27     556.24

Total         652       2145        924      3721

Chi-Sq = 153.539 +  3.246 + 58.734 +
           4.136 + 10.720 + 44.849 +
          38.857 +  2.720 + 60.050 = 376.852
DF = 4, P-Value = 0.000
15.11
A company that manufactures cardboard boxes for packaging cereals wants to determine whether
the type of defect that a particular box has is related to the shift on which it was produced. It compiles the
following data. In each case, if a box had multiple defects the most serious defect was recorded.
                  Type of Defect
Shift    Printing    Rips/Tears    Size
1           55           60         85
2           58           63         79
3           89           63         48
(a) Set up the appropriate hypotheses for the test.
(b) Calculate the expected frequencies for each cell and calculate the value of the chi-square test
statistic.
(c) How many degrees of freedom will the chi-square test for independence have?
(d) Using a level of significance of 0.01, are defect type and shift related?
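The expected frequencies and the test statistic for parts (b) through (d) follow from the row and column totals. Below is a small sketch with a hypothetical helper function; the p value formula exp(-x/2)(1 + x/2) is the exact upper-tail area of a chi-square variable with 4 degrees of freedom and is used here only because this table happens to have 4 degrees of freedom.

```python
import math

def chi_square_independence(table):
    """Return (expected counts, chi-square statistic, df) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi_sq, df

# Rows: shifts 1-3; columns: Printing, Rips/Tears, Size
defects = [
    [55, 60, 85],
    [58, 63, 79],
    [89, 63, 48],
]
expected, chi_sq, df = chi_square_independence(defects)
p_value = math.exp(-chi_sq / 2) * (1 + chi_sq / 2)   # valid for df = 4 only

for row in expected:
    print("  ".join(f"{e:6.2f}" for e in row))
print(f"chi-square = {chi_sq:.3f}, df = {df}, p value = {p_value:.5f}")
print("Defect type and shift appear related" if p_value < 0.01 else "No evidence of a relationship")
```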
Chi-Square Test: Printing, Rips/Tears, Size
Expected counts are printed below observed counts

         Printing    Rips/Tea       Size     Total
    1          55          60         85       200
            67.33       62.00      70.67

    2          58          63         79       200
            67.33       62.00      70.67

    3          89          63         48       200
            67.33       62.00      70.67

Total         202         186        212       600

Chi-Sq = 2.259 + 0.065 + 2.907 +
         1.294 + 0.016 + 0.983 +
         6.972 + 0.016 + 7.270 = 21.782
DF = 4, P-Value = 0.000
Simple Linear Regression and Correlation
(454-491)
Important Definition of Terms
Test of Independence
Variables
Quantitative (measured)
Qualitative (category, classification, nominal)
Scatter plot
Regression and correlation
Linear vs. Curvilinear models
Simple vs. Multiple Linear Models
Correlation coefficient
Coefficient of determination
Residual (error) term
Observed y vs. expected y
Important Symbols
Y
X
Sy/x
R2
R
a
b
Problems for Simple Linear Regression:
11.2 (p. 553)
Problems for Multiple Linear Regression:
Problem 12.1
Problem 12.5
Problem 12.9
Steps in Regression/Correlation Analysis
1. Identify the response (y) and candidate predictor variables (x’s)
2. Collect y,x set of data
3. Plot each x versus y
4. From the plots in #3, select the most promising x
5. Perform Regression and Correlation Analysis
   a. Select a model (linear or nonlinear) that fits the plot and then generate the regression
      equation using Excel or Minitab
   b. Test the resulting model for significance using the slope ($\beta$), correlation ($\rho$), or the
      ANOVA tests
      (If resulting model is NOT significant, go back to Step 1)
   c. Test the model for appropriateness using the analysis of residuals. This tests
      whether the assumptions on the residuals are met. These assumptions are:
      o Normal distribution
      o Homoscedastic
      o Independent
      (If selected model is not appropriate, go back to Step 5a, else proceed to Step 7)
6. If model generated in Step 5 is significant but not appropriate, choose a different model
   (perhaps use curvilinear model) and repeat Step 5 until an appropriate model is found.
7. Use model for estimating:
   (1) the response variable (y)
       Point Estimate – substitute the value of x into the regression equation
       Interval Estimates:
       1. Prediction Interval Estimate
       2. Confidence Interval Estimate
   (2) the actual slope ($\beta$) of the line
       CI = $b \pm t_{\alpha/2,\,n-2}\, s_b$
Definitions of Relevant Terms
Types of Variables:
y – response variable (a.k.a. dependent, predicted, explained)
x – independent variable (a.k.a. predictor, explanatory)
Regression – provides a ‘best-fit’ mathematical equation for the values of y,x variables
   o expresses the relationship of y and x in equation form
   o mathematical equation may be linear or curvilinear
   o linear:       $Y = a + bX$            (direct, linear)
                   $Y = a - bX$            (inverse, linear)
   o curvilinear:  $Y = a + bX + cX^2$     (quadratic)
                   $Y = e^{-X}$            (negative exponential)
                   $Y = 1/X$
Simple Linear Regression – regression model is linear with only ONE
predictor variable
$y = b_0 + b_1 x$
Multiple Linear Regression – regression model is linear with TWO OR
MORE predictor variables
$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_k x_k$
Correlation Analysis – measures the strength of the relationship between Y,X
   o coefficient of correlation (r) – number that measures both the direction and the
     strength of the linear relationship between y and x
     $-1 \le r \le 1$
   o coefficient of determination ($r^2$) – the percent of the variation in y that is explained
     by the regression model
     $0\% \le r^2 \le 100\%$
Simple Linear Model and Assumptions
Models
Actual Population Model:     $Y_i = \beta_0 + \beta_1 X + \varepsilon_i$
Estimated (sample) Model:    $\hat{y}_i = b_0 + b_1 x$
Assumptions on the residuals
1) Normally distributed
2) Homoscedastic (constant variance across all x values)
3) Statistically independent of each other
Testing the Model for Significance
a. Using the Slope ($\beta$) Test
H0: $\beta = 0$
H1: $\beta \ne 0$
Reject H0 if $|t| > t_{\alpha/2,\,n-k-1}$
$t = \frac{b}{s_b}$
b. Using the coefficient of correlation ($\rho$) Test
H0: $\rho = 0$
H1: $\rho \ne 0$
Reject H0 if $|t| > t_{\alpha/2,\,n-k-1}$
$t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$
c. Using the ANOVA F-Test
H0: the Model is not significant
H1: the Model is significant
Reject H0 if $F > F_{\alpha, v_1, v_2}$
$F = \frac{MS_{Regression}}{MS_{Error}}$
Using the Model for Estimation/Prediction
A. Estimating the actual slope ($\beta$) of the model
b: point estimate of the actual slope ($\beta$) of the model
Computing a confidence interval for the actual slope of the model:
C.I. for $\beta$ = $b \pm t_{\alpha/2,\,n-k-1}\,(s_b)$
B. Using the model to estimate the actual value of y for a given value of x
$\hat{y}$: point estimate of the actual value of y for a given value of x;
computed by substituting the value of x into the regression equation
Confidence Interval (C.I.): the interval that contains the
actual average value of the response variable ($\mu_{y|x}$)
for a specific value of x
Prediction Interval (P.I.): the interval that contains the
actual value of the response variable (Y) for a specific
value of x
SIMPLE LINEAR REGRESSION:
A Solved Example
EXAMPLE:
A manufacturer of small electric motors uses an automatic milling machine to produce the slots in the
shaft of the motors. A batch of shafts is run and then checked. All shafts in the batch that do not
meet required dimensional tolerances are discarded.
At the beginning of each new batch, the milling machine is readjusted since its cutter head wears
slightly during the production of the batch. The manufacturer is trying to pick an optimal batch size,
but in order to do this (s)he must know how the size of the batch affects the number of defective
shafts in the batch. Thirty (30) batches were inspected, and the number of defectives in each batch
was counted. The results are shown below:
Batchsize   Defects
100         5
125         10
125         6
125         7
150         6
150         7
175         17
175         15
200         24
200         21
200         22
225         26
225         29
225         25
250         34
250         37
250         41
250         34
275         49
300         53
300         54
325         69
350         82
350         81
350         84
375         92
375         96
375         97
400         109
400         112
INITIAL MODEL (LINEAR)
MINITAB SOLUTION
A. Plot Batchsize and Number of Defects
GRAPH > PLOT
Graph Variables:
X: Batchsiz
Y: Defects
[Figure: scatter plot of defects (y-axis) against batchsiz (x-axis)]
STAT > REGRESSION > FITTED LINE PLOT
Response (Y): Defects
Predictor (X): Batchsiz
Type of regression model: Linear
[Figure: Regression Plot of defects against batchsiz with the fitted line Y = -47.9007 + 0.367131X, R-Sq = 95.3 %]
B. Generate the Regression Equation
STAT > REGRESSION > REGRESSION
Response: Defects
Predictors (X): Batchsiz
Click on Results:
Select In addition, sequential sums of…
Click OK
Click on Storage
Select Fits
Select Residuals
Select Standardized Residuals
Click OK
Click OK
[Figure: Minitab regression output, annotated to mark the least squares regression equation and the coefficient of determination]
Generate the Residual Plots
STAT > REGRESSION > Residual Plots
Residuals: RESI1
Fits: FITS1
Click OK
Residual Model Diagnostics
[Figure: four-panel residual diagnostics for the linear model: Normal Plot of Residuals (Residual vs. Normal Score); I Chart of Residuals (Residual vs. Observation Number, X=0.000, 3.0SL=8.378, -3.0SL=-8.378); Histogram of Residuals (Frequency vs. Residual); Residuals vs. Fits (Residual vs. Fit)]
EXCEL SOLUTION
Data > Data Analysis > Regression
Input:
Input Y range: Defects
Input X range: Batchsiz
Labels: <select>
Output Range: <type in a cell address here>
Residuals:
Residuals: <do not select>
Standard Residuals: <do not select>
Residual Plots: <select>
Line Fit Plots: <select>
Normal Probability:
Normal Probability Plots: <select>
Analysis of Residual Plots
[Figure: Normal Probability Plot (defects vs. Sample Percentile)]
[Figure: batchsiz Residual Plot (Residuals vs. batchsiz)]
[Figure: batchsiz Line Fit Plot (defects and Predicted defects vs. batchsiz)]
REVISED MODEL (NONLINEAR - Quadratic)
Minitab
Delete all non-empty columns except Defects and Batchsiz
Compute C3 = Batchsiz * Batchsiz
Calc > Calculator
Store result in variable: C3
Expression: Batchsiz*Batchsiz
Click OK
Name C3 as "Batchsiz^2"
STAT > REGRESSION > REGRESSION
Response: Defects
Predictors (X): Batchsiz Batchsiz^2
Click on Results:
Select In addition, sequential sums of…
Click OK
Click on Storage
Select Fits
Select Residuals
Select Standardized Residuals
Click OK
Click OK
Regression Analysis
The regression equation is
defects = 6.90 - 0.120 batchsiz + 0.000950 batchsiz^2

Predictor     Coef          StDev         T        P
Constant      6.898         3.737         1.85     0.076
batchsiz      -0.12010      0.03148       -3.82    0.001
batchsiz^2    0.00094954    0.00006059    15.67    0.000

S = 2.423    R-Sq = 99.5%    R-Sq(adj) = 99.5%

Analysis of Variance
Source            DF    SS       MS       F          P
Regression        2     34186    17093    2911.35    0.000
Residual Error    27    159      6
Total             29    34345

Source        DF    Seq SS
batchsiz      1     32744
batchsiz^2    1     1442
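The quadratic fit can be reproduced outside Minitab by solving the least squares normal equations for the predictors batchsiz and batchsiz^2. The sketch below uses only the Python standard library. The helper `solve`, the centering and rescaling of batch size (done purely for numerical stability), and all variable names are assumptions made for illustration; this is not how Minitab computes the fit.

```python
batch = [100, 125, 125, 125, 150, 150, 175, 175, 200, 200,
         200, 225, 225, 225, 250, 250, 250, 250, 275, 300,
         300, 325, 350, 350, 350, 375, 375, 375, 400, 400]
defects = [5, 10, 6, 7, 6, 7, 17, 15, 24, 21,
           22, 26, 29, 25, 34, 37, 41, 34, 49, 53,
           54, 69, 82, 81, 84, 92, 96, 97, 109, 112]


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    size = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    solution = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(m[r][c] * solution[c] for c in range(r + 1, size))
        solution[r] = (m[r][size] - tail) / m[r][r]
    return solution


n = len(batch)
mean_x = sum(batch) / n
SCALE = 100.0  # assumed rescaling constant, only to keep the sums small

# Fit y = c0 + c1*z + c2*z^2 with z = (x - mean_x) / SCALE.
z = [(x - mean_x) / SCALE for x in batch]
rows = [[1.0, zi, zi * zi] for zi in z]
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * y for r, y in zip(rows, defects)) for i in range(3)]
c0, c1, c2 = solve(xtx, xty)

# Convert back to coefficients on the original batch-size scale.
b2 = c2 / SCALE ** 2
b1 = c1 / SCALE - 2 * c2 * mean_x / SCALE ** 2
b0 = c0 - c1 * mean_x / SCALE + c2 * mean_x ** 2 / SCALE ** 2

y_bar = sum(defects) / n
fitted = [b0 + b1 * x + b2 * x * x for x in batch]
sse = sum((y - f) ** 2 for y, f in zip(defects, fitted))
sst = sum((y - y_bar) ** 2 for y in defects)
r_sq = 1 - sse / sst

print(f"defects = {b0:.3f} + ({b1:.5f}) batchsiz + ({b2:.8f}) batchsiz^2")
print(f"R-Sq = {100 * r_sq:.1f}%")
```

The coefficients should agree with the Minitab output above up to rounding, and the R-Sq can be compared with the 95.3 % reported for the linear model.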
Residual Model Diagnostics
[Figure: four-panel residual diagnostics for the quadratic model: Normal Plot of Residuals (Residual vs. Normal Score); I Chart of Residuals (Residual vs. Observation Number, X=0.000, 3.0SL=8.287, -3.0SL=-8.287); Histogram of Residuals (Frequency vs. Residual); Residuals vs. Fits (Residual vs. Fit)]
EXCEL SOLUTION
Create Batchsiz^2 column (must be adjacent to Batchsiz), where Batchsiz^2 =
Batchsiz * Batchsiz
Data > Data Analysis > Regression
Input:
Input Y range: Defects
Input X range: highlight Batchsiz Batchsiz^2 range of data
Labels: <select>
Output Range: <type in a cell address here>
Residuals:
Residuals: <do not select>
Standard Residuals: <do not select>
Residual Plots: <select>
Line Fit Plots: <select>
Normal Probability:
Normal Probability Plots: <select>
GENERATING PREDICTION AND CONFIDENCE INTERVALS FOR Y (Minitab)
Stat > Regression > Regression > Options
[Figure: Minitab worksheet and Regression Options dialog, annotated "Values for batch and batch^2 to predict defect rates" and "Columns where the new values for the predictor variables can be found"]
[Figure: Minitab output, annotated "Predicted values for number of defects"]
(Note: Excel does not have this capability)
Problem 11.2
In trying to look at the effects of shopping center expansion, the Commerce Department decided to
look at the relationship between the number of shopping centers and the retail sales for different
states in the same region. It collected the data for the North Central states in the US and found the
following:
State           Number of Shopping Centers    Retail Sales ($ billion)
Illinois        2096                          41.8
Indiana         905                           21.4
Michigan        1018                          25.3
Minnesota       471                           13.9
Ohio            1704                          41.6
Iowa            308                           7.5
Missouri        887                           22.7
Wisconsin       625                           14.6
South Dakota    58                            1.3
North Dakota    87                            2.1
Nebraska        264                           5.7
Kansas          481                           11.6
(a) Create a scatter plot of the data.
(b) Find the regression equation relating retail sales and number of shopping centers.
(c) Plot the regression line on the same plot as the data. Do you think the line fits the data well?
Why or why not?
(d) Use the regression line to predict retail sales for each state.
(e) Calculate the residuals for each state. Which state has the largest residual? Which state has
the smallest? Do the residuals support your answer to part (d)?
(f) Find the standard error of the estimate.
[Figure: scatter plot of Retail Sales ($ billion) against Number of Shopping Centers]

SUMMARY OUTPUT
Regression Statistics
Multiple R           0.9866955
R Square             0.9735681
Adjusted R Square    0.9709249
Standard Error       2.3387601
Observations         12

ANOVA
              df    SS          MS          F          Significance F
Regression    1     2014.691    2014.691    368.330    0.000
Residual      10    54.698      5.470
Total         11    2069.389

                Coefficients    Standard Error    t Stat       P-value     Lower 95%    Upper 95%
Intercept       1.492612        1.071387          1.393158     0.193764    -0.894588    3.879812
X Variable 1    0.021517        0.001121          19.191926    0.000000    0.019019     0.024015
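Parts (d) to (f) can be checked with a few lines of code. The sketch below refits the line from the state data, then computes the predicted sales, the residuals, and the standard error of the estimate. The variable names are illustrative, and "largest" and "smallest" residual are interpreted here by absolute size, which is an assumption about what the problem intends.

```python
import math

states = ["Illinois", "Indiana", "Michigan", "Minnesota", "Ohio", "Iowa",
          "Missouri", "Wisconsin", "South Dakota", "North Dakota",
          "Nebraska", "Kansas"]
centers = [2096, 905, 1018, 471, 1704, 308, 887, 625, 58, 87, 264, 481]
sales = [41.8, 21.4, 25.3, 13.9, 41.6, 7.5, 22.7, 14.6, 1.3, 2.1, 5.7, 11.6]

n = len(states)
x_bar = sum(centers) / n
y_bar = sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in centers)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(centers, sales))
b = sxy / sxx
a = y_bar - b * x_bar

# (d) predicted retail sales and (e) residuals for each state
predicted = {s: a + b * x for s, x in zip(states, centers)}
residuals = {s: y - predicted[s] for s, y in zip(states, sales)}

largest = max(residuals, key=lambda s: abs(residuals[s]))
smallest = min(residuals, key=lambda s: abs(residuals[s]))

# (f) standard error of the estimate, with n - 2 degrees of freedom
sse = sum(e ** 2 for e in residuals.values())
s_e = math.sqrt(sse / (n - 2))

for s in states:
    print(f"{s:<13} predicted {predicted[s]:6.2f}  residual {residuals[s]:6.2f}")
print(f"largest |residual|: {largest}, smallest |residual|: {smallest}")
print(f"standard error of the estimate: {s_e:.4f}")
```

The slope, intercept, and standard error can be compared with the Coefficients and Standard Error entries of the Excel output above.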
Problem 11.3
As part of an international study on energy consumption, data were collected on the number of
cars in a country and the total travel in kilometers. The data for 12 of the countries are shown
here:
Country        Total Cars (million)    Travel (billion km)
US             142.35                  3140.29
Finland        1.82                    34.66
Denmark        1.66                    30.76
Britain        21.32                   352.76
Australia      8.53                    138.22
Sweden         3.32                    53.21
Netherlands    5.53                    83.69
France         23.27                   348.2
Norway         1.59                    23.54
Italy          26.12                   367.85
Germany        43.75                   608.52
Japan          40.25                   439.30
(a) Create a scatterplot of the data. Do you think that there is a relationship between the number
of kilometers traveled and the number of cars?
(b) Find the least-squares regression line for the data. Interpret the value of the slope.
(c) Does the intercept make sense for these data? Why or why not?
(d) Plot the regression line on the same plot with the data. Does the line make you feel confident
about predicting travel as a function of the number of cars?
(e) Use the regression line to predict the number of kilometers for Sweden and Japan. How well
do the predictions agree with the original data?
[Figure: scatter plot of distance traveled (in billion kilometers) against Total Cars (in million)]

SUMMARY OUTPUT
Regression Statistics
Multiple R           0.98503096
R Square             0.97028599
Adjusted R Square    0.96731458
Standard Error       156.136088
Observations         12

ANOVA
              df    SS             MS          F           Significance F
Regression    1     7960585.694    7960586     326.5415    0.0000
Residual      10    243784.7804    24378.48
Total         11    8204370.475

                Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept       -106.2068       55.1609           -1.9254    0.0831     -229.1129    16.6992
X Variable 1    21.5814         1.1943            18.0705    0.0000     18.9204      24.2425
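For part (e), the predictions follow from substituting the number of cars into the fitted line. The short sketch below uses the intercept and slope reported in the Excel output above; the function name `predict_travel` and the dictionary layout are hypothetical.

```python
INTERCEPT = -106.2068   # from the Excel output above
SLOPE = 21.5814         # billion km per million cars, from the Excel output


def predict_travel(cars_million):
    """Predicted travel (billion km) for a given number of cars (million)."""
    return INTERCEPT + SLOPE * cars_million


# (cars in million, observed travel in billion km) from the data table
observed = {"Sweden": (3.32, 53.21), "Japan": (40.25, 439.30)}

predictions = {c: predict_travel(cars) for c, (cars, _) in observed.items()}
errors = {c: observed[c][1] - predictions[c] for c in observed}

for country in observed:
    print(f"{country}: predicted {predictions[country]:.2f}, "
          f"observed {observed[country][1]:.2f}, "
          f"error {errors[country]:.2f}")
```

The prediction for Sweden comes out negative, which is impossible for a distance, and the prediction for Japan is far above the observed value. One plausible reading is that the single very large US observation dominates the fit, which is worth keeping in mind when answering parts (c) and (d).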
Problem 11.23
How much does advertising impact market penetration? To assess the impact of advertising in the
tobacco industry, a study looked at the amount of money spent on advertising a particular brand of
cigarettes and brand preference among adolescents and adults. The data are shown here:
                                              Brand Preferences
Brand              Advertising ($ million)    Adolescent (%)    Adult (%)
Marlboro           75                         60                23.5
Camel              43                         13.3              6.7
Newport            35                         12.7              4.8
Kool               21                         1.2               3.9
Winston            17                         1.2               3.9
Benson & Hedges    4                          1                 3.0
Salem              3                          0.3               2.5
(a) Look at the data for brand preference for adolescents and amount spent on
advertising. Which variable is the dependent variable? Which is the
independent variable?
(b) Create a scatter plot of advertising and adolescent brand preference. Do you
think that there is a linear relationship between the two variables? Why or
why not?
(c) Now create another scatter plot using adult brand preference instead. How
does this plot compare to the one for adolescent brand preference? From the
plots, do you think that adolescent or adult brand preference is more strongly
related to advertising expenditures? Why?
(d) Find the least squares line for adolescent brand and advertising expenditures
(e) Interpret the meaning of the slope and intercept of the model. Do they make
sense?
(f) Use the model to predict adolescent brand preference for each brand studied.
How well do the predicted values agree with the actual data?
(g) Using a 0.05 significance level, is the model significant?
[Figure: Adolescent Market, scatter plot of Brand Preference (%) against Advertising ($ million)]
[Figure: Adult Market, scatter plot of Brand Preference (%) against Advertising ($ million)]
ADOLESCENT MARKET
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.923547
R Square             0.852939
Adjusted R Square    0.823527
Standard Error       9.063086
Observations         7

ANOVA
              df    SS          MS          F           Significance F
Regression    1     2382.011    2382.011    28.99957    0.002978
Residual      5     410.6976    82.13953
Total         6     2792.709

                    Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%
Intercept           -9.42472        5.365513          -1.75654    0.139344    -23.2172     4.367747
Advertising ($m)    0.786227        0.146             5.385125    0.002978    0.410923     1.161531

ADULT MARKET
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.901096
R Square             0.811974
Adjusted R Square    0.774369
Standard Error       3.536488
Observations         7

ANOVA
              df    SS          MS          F           Significance F
Regression    1     270.0463    270.0463    21.59205    0.005599
Residual      5     62.53373    12.50675
Total         6     332.58

                    Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%
Intercept           -0.58794        2.093665          -0.28082    0.790098    -5.96987     4.793986
Advertising ($m)    0.264725        0.05697           4.646724    0.005599    0.118279     0.411172
Multiple Linear Regression
(560 to 595)
Problem 12.1
A group of legislators wanted to look at factors that affect the number of traffic fatalities. They
collected some data for 1994 from the NTSB on the number of fatalities for 50 states and the District
of Columbia, the number of licensed drivers, the number of registered vehicles, and the number of
vehicle miles traveled. A portion of the data is shown on page 584. Full dataset is in traffat.xls
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.982548538
R Square             0.96540163
Adjusted R Square    0.963193224
Standard Error       154.5407481
Observations         51

ANOVA
              df    SS             MS          F           Significance F
Regression    3     31321046.9     10440349    437.1485    2.54274E-34
Residual      47    1122493.613    23882.84
Total         50    32443540.51

                           Coefficients    Standard Error    t Stat      P-value     Lower 95%       Upper 95%
Intercept                  51.7481659      30.43306219       1.700393    0.095666    -9.475200509    112.9715
Licensed Drivers           0.06294764      0.048829545       1.28913     0.203662    -0.035284642    0.16118
Registered Vehicles        -0.211896991    0.055989427       -3.78459    0.000436    -0.324533083    -0.09926
Vehicle Miles Travelled    0.029349954     0.003525079       8.326041    8.34E-11    0.022258416     0.036441
(a) How many independent variables are there in the model proposed? What are they?
(b) Use the computer output to write down the regression model.
(c) Interpret the coefficients of the model.
(d) Use the model to predict the number of traffic fatalities for the states shown in the data table.
(e) Compare the predicted values from the model to the actual values. Based on the plot, does
the model do a good job of predicting the number of traffic fatalities?
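For parts (b) and (d), the fitted model is the intercept plus each coefficient times its variable. A minimal sketch follows, using the coefficients from the Excel output above. The function name and the input values are hypothetical: the state data table is not reproduced in these notes, and the inputs must be expressed in the same units as the columns of traffat.xls.

```python
# Coefficients copied from the Excel output above.
INTERCEPT = 51.7481659
B_DRIVERS = 0.06294764
B_VEHICLES = -0.211896991
B_MILES = 0.029349954


def predict_fatalities(licensed_drivers, registered_vehicles, vehicle_miles):
    """Predicted traffic fatalities from the three-variable model."""
    return (INTERCEPT
            + B_DRIVERS * licensed_drivers
            + B_VEHICLES * registered_vehicles
            + B_MILES * vehicle_miles)


# Hypothetical input values, chosen only to show the arithmetic.
example = predict_fatalities(1000, 1000, 10000)
print(f"predicted fatalities: {example:.1f}")
```

Replacing the hypothetical inputs with each state's row from the data file gives the predicted values asked for in part (d).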
Problem 12.9
In the problem about number of traffic fatalities the model was rerun, dropping the data on number of
licensed drivers that had the lowest t statistic. The output is shown below:
Regression Analysis: Traffic Fata versus Registered V, Vehicle Mile
The regression equation is
Traffic Fatalities = 46.0 - 0.163 Registered Vehicles
+ 0.0300 Vehicle Miles Travelled
Predictor    Coef        SE Coef     T        P
Constant     46.04       30.32       1.52     0.135
Register     -0.16280    0.04132     -3.94    0.000
Vehicle      0.029996    0.003513    8.54     0.000

S = 155.6    R-Sq = 96.4%    R-Sq(adj) = 96.3%

Analysis of Variance
Source            DF    SS          MS          F         P
Regression        2     31281357    15640679    645.98    0.000
Residual Error    48    1162183     24212
Total             50    32443541
(a) Write down the equation of the new two-variable model.
(b) Compare the new model to the model with three variables. How much does the model
change when number of licensed drivers is dropped?
(c) Compare the value of R^2 for both models. What does this make you think about the decision
to drop number of licensed drivers from the model?
(d) Would you consider a two-variable model a good model? Why or why not?
(e) Based on the value of the R^2 would you be satisfied with this model or would you want to
consider other variables?
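The comparison in part (c) can be made concrete by recomputing R^2 and adjusted R^2 for both models from their ANOVA tables. The sketch below uses the sums of squares printed in the two outputs. The adjusted R^2 formula is the standard textbook one, which is an assumption since the notes do not state it, and the function names are illustrative.

```python
def r_squared(ss_regression, ss_total):
    """Coefficient of determination: SS Regression / SS Total."""
    return ss_regression / ss_total


def adjusted_r_squared(r_sq, n, k):
    """Adjusted R^2 for n observations and k predictors (standard formula)."""
    return 1 - (1 - r_sq) * (n - 1) / (n - k - 1)


N = 51  # 50 states plus the District of Columbia

# Three-variable model (Excel output of Problem 12.1)
r_sq_three = r_squared(31321046.9, 32443540.51)
adj_three = adjusted_r_squared(r_sq_three, N, 3)

# Two-variable model (Minitab output of Problem 12.9)
r_sq_two = r_squared(31281357, 32443541)
adj_two = adjusted_r_squared(r_sq_two, N, 2)

print(f"three variables: R^2 = {r_sq_three:.4f}, adjusted = {adj_three:.4f}")
print(f"two variables:   R^2 = {r_sq_two:.4f}, adjusted = {adj_two:.4f}")
print(f"drop in R^2 after removing licensed drivers: "
      f"{r_sq_three - r_sq_two:.4f}")
```

Both versions of R^2 change only in the third decimal place, which is the numerical basis for judging whether dropping licensed drivers cost the model much explanatory power.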
BA 282: APPLIED BUSINESS STATISTICS
Fall 1999
Midterm Exam
Name____________________
Part 1: Multiple Choice
1. Using the Standard Normal distribution, the area between –1.5 and –2.4 is:
a. 0.9250
b. 0.0586
c. -0.0568
d. -0.9250
e. None of the above
For questions 2 to 7, select the most appropriate pair of hypotheses for each statement.
2.
a.
b.
c.
d.
e.
The average age of SOU students is more than 21.5 years.
H0:   21.5 H1:  > 21.5
H0:   21.5 H1:  < 21.5
H0:  = 21.5 H1:   21.5
H0:   21.5 H1:  > 21.5
H0:   21.5 H1:  < 21.5
3.
a.
b.
c.
d.
e.
A new medication for headache is touted to relieve pain in less than 5 minutes.
H0:   5
H1:  > 5
H0:   5
H1:  < 5
H0:  = 5
H1:   5
H0:   5
H1:  > 5
H0:   5
H1:  < 5
4.
A CPA review program is advertised as “guaranteed to improve your CPA test scores.” Fifty graduating accounting
students from a business school were randomly assigned to two groups – group in which students didn’t participate (D) in
the review program and the other group participated (P) in the review program.
a.
H0: P 
D
H1: P > D
b.
H0: P 
D
H1: P < D
c.
H0: P =
D
H1: P  D
d.
H0: P 
D
H1: P > D
e.
H0: P 
D
H1: P < D
5.
A councilperson claims that there is no difference in the level of support to Measure 51 among Republican (R) and
Democratic (D) voters in the Rogue Valley.
a.
H0: R 
D
H1: R > D
b.
H0: R 
D
H1: R < D
c.
H0: R =
D
H1: R D
d.
H0: R 
D
H1: R < D
e.
H0: R =
D
H1: R  D
6.
a.
b.
c.
d.
A filling machine is supposed to fill an average of 12 ounces when operating properly.
H0:   12
H1:  > 12
H0:  12
H1:  < 12
H0:  = 12
H1:   12
H0:   12
H1:  > 12
e.
H0:   12
H1:  < 12
7.
A right-tailed test of a population mean results in a p-value that is practically zero. This means that the sample
represents:
a.
A weak evidence supporting the null hypothesis
b.
A weak evidence supporting the alternative hypothesis
c.
A strong evidence supporting the null hypothesis
d.
A strong evidence supporting the alternative hypothesis
8. Of the following variations, which does not belong?
a. Common cause variation
b. Special cause variation
c. Explained variation
d. Nonrandom variation
9.
A confidence interval estimate has two components – a point estimate and a margin of error. Which of these two
components is affected by the confidence level?
a.
Point estimate
b.
Margin of Error
c.
None of the above
10. A sample statistic (e.g. sample mean, sample proportion, or sample variance) is a random variable. Which type of
random variable is a sample statistic?
a.
Continuous
b.
Discrete
11. "The distribution of the sample means of any type of distribution will approximate the normal distribution, as the
sample size increases." This sounds like the definition of the
a.
Standard Normal Distribution
b.
Z-distribution
c.
Central Limit Theorem
d.
Binomial distribution approximated by the Normal distribution
e.
None of the above
12.
a.
b.
c.
d.
e.
Which of the following does not belong?
s
p
x-bar

All of the above belong to the same group
13. Which of the following is NOT true of a sample mean?
a. It is a point estimate
b. It is a statistic
c. It is a continuous random variable
d. All of the above (a-c) are true of a sample mean
e. None of the above (a-c) are true of a sample mean
14. The two components of a confidence interval estimate of a population parameter are:
a. Confidence Level and Margin of Error
b. Sample Size and Statistic
c. Point Estimate and Margin of Error
d. Sample Mean and Sample Proportion
e. Margin of Error and Sample Size
15. The conditions for using the Normal distribution as an approximation to the binomial distribution are that np and n(1 - p) both be at least 5.
a.
True
b.
False
16. Which of the following will be the benefit derived from using a larger sample size in estimating an unknown
population parameter?
a.
A larger margin of error
b.
A smaller margin of error
c.
A lower confidence level
d.
A higher confidence level
e.
(b) and/or (d)
f.
(a) and/or (c)
17. Using the Standard Normal distribution table, find the area below the z-score of –2.50.
a. -0.4938
b. 0.4938
c. 0.9938
d. 0.0062
e. None of the above
18.
a.
b.
c.
d.
e.
Which of the following pairs of hypotheses is NOT correct?
H0:   3.5
H1:  > 3.5
H0:   3.5
H1:  < 3.5
H0: p < 0.035 H1: p > 0.035
All of the above are correct
None of the above are correct
19. For a one-tailed test of a population mean the significance level has been set at 0.01. Assume that the population
standard deviation is not known, sample size is 10, and the population is normally distributed. What distribution is
appropriate for performing the hypothesis test?
a.
z-test
b.
t-test
c.
F-test
d.
Binomial
e.
None of the above
20. In testing the mean of a population, which of the following is a necessary condition for using a t distribution?
a. n is small
b. σ is not known
c. The population is infinite
d. All of these
e. (a) and (b) but not (c)
21. Assume that you took a sample and calculated the sample mean as 100. You then calculated the lower and upper
limits of a 90 percent confidence interval for μ to be 90 and 110, respectively. What is the margin of error of the estimate?
a. 0.10
b. 90 percent
c. 20
d. 10
e. 100
22. A single value used to estimate an unknown population parameter is known as a(n)
a. Point estimate
b. Interval estimate
c. Statistic
d. Parameter
e. (a) and (c)
23. In hypothesis testing, we conclude to reject or fail to reject the null hypothesis by comparing the computed statistic
to a critical statistic. Another form of decision rule is by comparing the p-value to a significance level. Which of the
following is a correct decision rule?
a. Reject H0 if z > p-value
b. Reject H0 if z > α
c. Reject H0 if p-value > α
d. Reject H0 if p-value < α
e. All of the above are correct forms of decision rule
24. Which of the following variations cannot be removed but only can be reduced by redesigning or improving the
process?
a.
common cause variation
b.
special cause variation
25. If n = 45 and α = 0.05, then the critical value of z for testing the hypotheses
H0: μ ≤ 3.5 and H1: μ > 3.5 is
a.
0.0199
b.
1.96
c.
-1.96
d.
-1.645
e.
1.645
26. When a null hypothesis is rejected, it is possible that:
a. A correct conclusion has been made
b. A Type II error has been made
c. A Type I error has been made
d. (a) or (b) is correct
e. (a) or (c) is correct
27. Which of the following actions will reduce the Type I and II errors simultaneously?
a. Decreasing the significance level of a test
b. Increasing the confidence level of a test
c. Decreasing Beta error
d. Increasing the sample size
e. Decreasing the sample size
28. One concludes whether to “reject” or “fail to reject” the null hypothesis based on a decision rule. The decision rule is
nothing more than a comparison of a calculated value and a critical value. Which of the two is based on the significance
level of a test?
a.
Calculated value
b.
Critical value
29. Which of the following is NOT true of a critical value?
a. It marks the boundary between the “reject H0” and the “fail to reject H0” regions
b. It is based on the significance level of a test
c. It is determined from the statistic derived from a sample
d. All of the above are true
e. None of the above are true
30. If one were to perform a hypothesis test using the following hypotheses:
H0: shipment is GOOD and H1: shipment is BAD Which of the two types of errors is called the Producer’s risk?
a.
Type I (alpha)
b.
Type II (beta)
c.
Both (a) and (b)
d.
Neither (a) nor (b)
31. When the sample size as a proportion of the population size (n/N) gets larger, the value of the finite correction
multiplier approaches which value?
a.
0
b.
1
c.
None of the above
32. In statistical process control, charts are used as tools to monitor processes. All processes exhibit variability. When
NOT in control, the ___________ variability is said to be present:
a.
common cause variability
b.
special cause variability
c.
none of the above
33. A 5-week diet program is claimed to be effective in reducing the weights of participants in the program. Skeptical about
the claim, you randomly select 10 applicants and weigh each one before and after the 5-week period. This problem
involves:
a. Comparison of two population proportions
b. Comparison of means of two independent samples
c. Comparison of means of two dependent samples
d. Comparison of variations of two independent samples
34. Which of the following sampling distributions would be used in comparing means of two populations, with n1 = 32, n2
= 40?
a.
Z test
b.
pooled t test (or equal variances t test)
c.
unpooled t test (or unequal variances t test)
d.
Binomial
Use for questions 35-37
You work for a market research agency and you were asked to estimate the proportion of people with personal
computers who are using Windows 97 as an operating system. How many people will you need to survey for
your estimate to be within 2 percentage points of the actual value and be 90 percent confident with this
estimate?
35. The problem described above involves:
a. Testing a hypothesis about a population mean
b. Computing a confidence interval estimate of a population average
c. Computing a sample size to estimate a true population proportion
d. Estimating a confidence interval estimate of a true population mean
e. None of the above
36. How much is the stated margin of error?
a. 90 percent
b. 10 percent
c. 0.10
d. 2 percentage points
e. 1.645
37. Give the z-value that will be used for computing the 90 percent confidence interval estimate of the true population
parameter.
a.
1.96
b.
1.32
c.
0.10
d.
2 percentage points
e.
1.645
Use for questions 38- 43
C. Garr Smoke claims that no more than 5 percent of the 40-60 male group smoke cigars. Of
2500 males of this age group you recently sampled, 200 said they smoke cigars. At 0.05
significance level, do the data refute C. Garr Smoke’s belief?
38. The sample statistic in this problem is:
a. Population proportion
b. Population mean
c. Sample proportion
d. Sample mean
e. None of the above
39. In this problem the statement “no more than 5 percent” is:
a. The hypothesized population proportion
b. The hypothesized population mean
c. The sample proportion
d. The sample mean
e. None of the above
40.
a.
b.
c.
d.
e.
State the null and alternative hypotheses of this problem
H0:   5
H1:  > 5
H0:   5
H1:  < 5
H0:  = 5
H1:   5
H0:   0.05 H1:  > 0.05
H0:   0.05 H1:  < 0.05
41. Identify the critical value for the test (one tail).
a. 0.0199
b. 1.96
c. -1.96
d. -1.645
e. 1.645
42. If the computed value for the test is 6.88, the p-value is
a. almost 1
b. almost 0
c. close to the significance level
43. If the resulting p-value of the test is less than the significance level, then you would conclude to
a. Reject the null hypothesis
b. Fail to reject the null hypothesis
Use for questions 44- 47
A torque wrench used in the final assembly of cylinder heads requires a process average
of 135 lbs-ft. The process is known to have a standard deviation of 5.0 lbs-ft. For a
simple random sample of 45 sample nuts that the machine has recently tightened, the
sample average is 137 lbs-ft. Using a 0.05 significance level, determine whether the
machine is operating at the desired level.
44. The appropriate hypotheses for the problem are:
a. H0: μ ≤ 135 H1: μ > 135
b. H0: μ ≥ 135 H1: μ < 135
c. H0: μ = 135 H1: μ ≠ 135
d. None of the above are correct
45. The appropriate distribution for the test above is
a. t-distribution
b. z-distribution
c. F-distribution
46. The computed value is
a. -0.40
b. 0.40
c. -2.68
d. 2.68
e. None of the above
47. Assuming that the critical value for this problem is 1.645, and another sample produced a computed value of 1.55.
For this sample your statistical conclusion is to:
a.
Reject the null hypothesis and conclude that the process is operating at the desired level
b.
Accept the null hypothesis and conclude that the process is operating at the desired level
c.
Reject the null hypothesis and conclude that the process is not operating at the desired level
d.
Accept the null hypothesis and conclude that the process is not operating at the desired level
Use for questions 48- 50
A pharmaceutical company is testing two new compounds intended to reduce blood-pressure
levels. The compounds are administered to two different sets of lab animals. In Group
1, 71 of 100 animals tested respond to drug 1 with lower blood-pressure levels. In Group
2, 58 of 90 animals tested respond to drug 2 with lower blood-pressure levels. The
company wants to test at the .05 level whether drug 1 is more effective in reducing blood
pressure levels than drug 2.
48. The problem involves which of the following procedures?
a. Comparison of two population proportions
b. Comparison of means of two populations using dependent samples
c. Comparison of means of two populations using independent samples
d. Comparison of variances of two populations
49. Using the following subscripts for the two groups: 1- drug 1; 2- drug 2, which of the following is the most appropriate
pair of hypotheses?
a.
H0: 1 
2
H1: 1 > 2
b.
H0: 1 
2
H1: 1 < 2
c.
H0: 1 
2
H1: 1 > 2
d.H0: 1 
2
H1: 1 < 2
e.H0: 1 =
2
H1: 1  2
50. (Bonus) On which day is Thanksgiving celebrated?
a. Monday
b. Tuesday
c. Wednesday
d. Thursday
e. Friday
BA 282: Applied Business Statistics
Final Exam -- Spring 1999
Name ______________________
1. Find the value of $\chi^2_{.05,\,20}$
2. Find the value of $F_{.05,\,2,\,10}$
3. In ANOVA, we will tend to not reject the null hypothesis of equal population means whenever the calculated F is:
a. small
b. large
c. equal to the critical F
d. none of the above is correct
4. In a two-way ANOVA, in the $x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$ model, the term $(\alpha\beta)_{ij}$ represents
a. random error in the sampling process
b. the effect that is due to factor A
c. the effect that is due to factor B
d. the interaction effect between level i of factor A and level j of factor B
e. the level of significance at which the null hypothesis is rejected
5. Which of the following is a typical source of internal secondary data for business research?
a. accounting or financial reports
b. sales information
c. production data
d. all of the above
For questions 6 to 10, refer to the plot on the right.
6.
The equation for the line going through the points
would take the form of:
a. Y = a + b+ c
b. Y = a - bX
c. Y = x -1
d. Y = a + bX2
e. None of these is correct
7. In this particular problem, the researcher is trying to
predict:
a. Quantity demanded based on price
b. Price based on quantity demanded
c. Both price and quantity demanded
d. None of these is correct
8. If computed, the sign of b in the equation would be:
a. Either positive or negative
b. Positive
c. Negative
d. None of the above
9. The correlation coefficient of the problem, if computed, could be:
a. 1.00
b. 0
c. -1.0
d. None of the above
10. Which of the following won't be true for the regression resulting from the data in the plot above?
a. r² = 100%
b. r = 1
c. sy.x = 0
d. ANOVA p-value = 0
e. All of the above are true
11. In multiple regression analysis, multicollinearity means:
a. High correlation between the dependent variable and the independent variables
b. A high correlation between the response variable and some independent variables
c. A condition where 2 or more of the independent variables are highly correlated with each other
d. None of the above
Use the following regression output to answer questions 12 to 15.
The regression equation is
sales = 46.5 + 52.6 ad

Predictor    Coef      Stdev     t-ratio   p
Constant     46.486    9.885     4.70      0.000
ad           52.57     10.26     5.12      0.000

s = 6.837    R-sq = 76.6%    R-sq(adj) = 73.7%

Analysis of Variance
SOURCE        DF    SS        MS        F        p
Regression    1     1226.9    1226.9    26.25    0.000
Error         8     374.0     46.7
Total         9     1600.9
12. Identify the coefficient of determination __________
13. Write the standard deviation of the slope (sb )____________
14. Write the standard error of the estimate (sy.x)_________
15. Identify the slope ______________
16. List ONE of the 3 assumptions that underlie the simple regression model ___________
17. Suppose you wished to investigate the effect of consumption of alcohol (Y/N) and a common over-the-counter cold medicine
(Y/N) on a person's reaction time. The appropriate statistical procedure for this experiment is:
a. Chi-square test of independence
b. Analysis of Variance (Two factor)
c. Analysis of Variance (One factor)
d. Regression analysis
e. Discriminant analysis
18. Which of the following is not a linear function?
a. Y = a + bX
b. Distance = Rate of speed × Travel time
c. Total Profit = profit per unit × Number units sold
d. Total Cost = Fixed Cost + Unit Variable Cost × Quantity Produced
e. All of the above are linear functions
19. There are two main uses of a multiple linear model: 1) for slope analysis, or 2) for estimating the value of Y given a value of X's.
For which of the two uses is multicollinearity not a problem? _____________
20. The _____ interval is the interval that contains the actual average value of the response variable for a specific value of X
a. Confidence
b. Prediction
21. The _____ interval is the interval that contains the actual value of the response variable for a specific value of X
a. Confidence
b. Prediction
22. Residual is also known as:
a. Error
b. Actual Y − Estimated Y
c. Observed Y − Fitted Y
d. None of the above
e. All of the above
PROBLEM 1:
The manager of Ryerson Coil Pickling wishes to know how the level of the pickling operation (measured in tons) affects the monthly
overtime expense of the plant. He collects data for the last 17 months on actual tonnage processed and overtime cost. He then performs
a regression analysis on the data. Using the attached output, answer questions 23 to 37.
Partial Data:
Month    Production (Tons)    Overtime Expense
1        29,668               $11,000
2        23,577               $8,000
3        27,117               $9,000
...      ...                  ...
17       19,365               $7,000
23. What is the response variable in this problem? __________
24. What is(are) the independent variable(s)? ______________
25. Give the linear regression equation generated by the 17 observations. __________
26. Using the regression equation, estimate the plant’s overtime expense for a month where 30,000 tons of steel is planned to be
processed. __________
27. The correlation between the response variable and the predictor variables could be best described as:
a. Perfectly positively linear
b. Perfectly negatively linear
c. Positively correlated
d. Negatively correlated
28. For the planned production described in #26, give the 95% interval estimate for the plant’s actual overtime expense.
____________________
29. How much of the variation in overtime expense is explained by production level? _____________
30. For every ton increase in processing level, plant overtime expense is expected to:
a. increase by $0.587
b. increase by $5.87
c. decrease by $0.587
d. decrease by $5.87
e. None of the above
31. If for the year 2000 the plant plans to process 30,000 tons of coils each month, give the 95% interval estimate of the actual
average monthly overtime expense. ____________
32. What is the coefficient of correlation of this model?_____________
33. Using the slope test set up the null and alternative hypotheses to determine whether the model you identified in #25 is significant.
34. Write the decision rule for the hypotheses in #33 ______________
35. Identify the computed and critical values corresponding to the decision rule in #34. ________, ________
36. Based on #35, is the model significant? __________
37. Compute the 95% confidence interval for the actual change in overtime expense for every ton increase in production level.
_____________
PROBLEM 2:
In a recent survey of winter 1999 BA 282 students, 41 students from the Medford section and the Ashland sections responded. One of
the objectives of the survey was to investigate what factors could possibly affect students’ success in the course's midterm exam. A
regression analysis was performed in which 4 explanatory variables were included in the model. The variables were the following:
GPA – student’s overall GPA to date
243GRADE – student’s final grade in the prerequisite course, MA 243
243WHEN – the number of terms ago the student took MA 243
WHERE – where the student is currently taking BA 282 (coded 0 for the Ashland section, 1 for Medford)
Use the attached regression output in answering questions 38 to 49
38. What is the dependent variable in this problem? __________
39. What are the predictor variables? ______________
40. Give the linear regression equation generated by the 41 observations. __________
41. Give an estimated midterm grade for a student in the Medford section who earned a 3.5 grade in MA 243 one term ago, and currently
holds a 3.25 overall GPA. __________
42. How much of the variation in BA 282 midterm exam can be explained by the regression equation? _____________
43. For every unit increase in MA 243 grade, BA 282 midterm grade is estimated to
a. increase by .330 units
b. decrease by .330 units
c. increase by .781 units
d. decrease by .781 units
e. None of the above
44. Compute the 80% confidence interval for the actual slope of the variable MA 243. ___________
45. What is the multiple coefficient of determination of this model? __________
46. Using the ANOVA test set up the null and alternative hypotheses to determine whether the model is significant.
47. Write the decision rule for the hypotheses in #46 ______________
48. Identify the computed and critical values (use a 0.05 significance level) corresponding to the decision rule in #47 ________,
________
49. Based on #48, is the model significant? __________
PROBLEM 3:
Given a significance level of 0.05, is there a significant difference in the average midterm grades of the students in the 3 sections of
BA 282? (output attached)
50. The most appropriate testing procedure for the problem stated above is:
a. Chi-square test of independence
b. Regression and correlation
c. Discriminant analysis
d. One-way ANOVA
e. Two-way ANOVA
51. Write the appropriate hypotheses and decision rule for comparing the average test scores of the three sections.
52. Give the computed and critical F values for carrying out the test in #51 ____________, ____________
53. At 0.05 significance level, which of the three sections has the largest average test scores (note: your answer here should be
consistent with your answer to #51 and 52)? _______
PROBLEM 4:
A test was conducted to determine if grade in MA 243, or when MA 243 was taken, has any effect on the midterm grades in BA 282.
Also of interest in the survey was to determine whether grade in MA 243 and the time it was taken have some interaction effect on the
BA 282 midterm grades. But before the ANOVA test was conducted, the raw data were recoded: MA 243 grades were re-classified
into two groups (A's and Non A's). The term it was taken was also reclassified into two groups (one term ago and two or more terms
ago). Grades in BA 282 midterm (not changed) are on a 4 to 0 scale, representing A to F letter grades. Also, since two-way ANOVA
with replication requires that each cell contain equal samples, six students from each combination of MA 243 grade and term group were
randomly selected to fill the cells. The resulting crosstabulation of the observations is shown below:
                When MA 243 was Taken
MA 243 Grade    One Term Ago    Two or More Ago
A               3,4,3,2,4,3     2,3,3,2,4,3
Non A's         3,0,2,1,2,3     1,1,1,2,1,1
Use 0.05 significance level for the following tests.
54. Write the appropriate hypotheses for testing whether MA 243 grade has an effect on BA 282 midterm exam. ________________
55. Does MA 243 grade have a significant effect on BA 282 midterm grades? If yes, which grade category performs better?
_____________
56. Write the appropriate hypotheses for testing whether when MA 243 was taken has an effect on BA 282 midterm exam.
________________
57. Does the time when MA 243 was taken have a significant effect on BA 282 midterm grade? If yes, which time has higher average
midterm grades? _____________
58. Is there a significant interaction between MA 243 grade and the time when it was taken on BA 282 midterm scores?
59 and 60 BONUS
PROBLEM 1:
MTB > Regress 'P_overt' 1 'Ton_prod';
SUBC> Predict 30000.

Regression Analysis

The regression equation is
P_overt = - 6776 + 0.587 Ton_prod

Predictor    Coef      StDev     T        P
Constant     -6776     4178      -1.62    0.126
Ton_prod     0.5868    0.1760    3.33     0.005

S = 2566    R-Sq = 42.6%    R-Sq(adj) = 38.7%

Analysis of Variance
Source            DF    SS           MS          F        P
Regression        1     73217721     73217721    11.12    0.005
Residual Error    15    98782279     6585485
Total             16    172000000

Predicted Values
Fit      StDev Fit    95.0% CI          95.0% PI
10828    1306         (8044, 13611)     (4691, 16965)
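As a rough check on how the intervals in this output are built, the sketch below recomputes the fit, the 95% CI, the 95% PI, and a 95% interval for the slope from the printed values. The t critical value of 2.131 (15 degrees of freedom, two-tailed 5%) is an assumption taken from a standard t table, not from the output, and the variable names are illustrative only. Because the inputs are rounded, the results agree with Minitab to within a unit or two.

```python
import math

# Values copied from the Minitab output for Problem 1.
B0, B1 = -6776.0, 0.5868   # intercept and slope
S = 2566.0                 # standard error of the estimate
SE_SLOPE = 0.1760          # StDev of the slope coefficient
SE_FIT = 1306.0            # StDev Fit at Ton_prod = 30000
# Assumption: two-tailed 5% t critical value for 15 df, read from a t table.
T_CRIT = 2.131


def interval(center, half_width):
    """Return (lower, upper) for center +/- half_width."""
    return center - half_width, center + half_width


fit = B0 + B1 * 30000
# Confidence interval: average overtime expense at 30,000 tons.
ci = interval(fit, T_CRIT * SE_FIT)
# Prediction interval: one month's overtime expense at 30,000 tons.
pi = interval(fit, T_CRIT * math.sqrt(S**2 + SE_FIT**2))
# Interval for the change in overtime expense per additional ton.
slope_ci = interval(B1, T_CRIT * SE_SLOPE)

print(f"fit      = {fit:.0f}")
print(f"95% CI   = ({ci[0]:.0f}, {ci[1]:.0f})")
print(f"95% PI   = ({pi[0]:.0f}, {pi[1]:.0f})")
print(f"slope CI = ({slope_ci[0]:.4f}, {slope_ci[1]:.4f})")
```

The prediction interval is wider than the confidence interval because it adds the variance of an individual observation (S squared) to the variance of the fitted mean.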
PROBLEM 2:
MTB > Regress 'MIDTERM' 4 'GPA' '243GRADE' '243WHEN' 'WHERE';
SUBC> Constant;
SUBC> Brief 2.

The regression equation is
MIDTERM = - 0.92 + 0.781 GPA + 0.330 243GRADE - 0.242 243WHEN + 0.601 WHERE

41 cases used 5 cases contain missing values

Predictor    Coef       StDev     T        P
Constant     -0.923     1.020     -0.90    0.371
GPA          0.7813     0.3569    2.19     0.035
243GRADE     0.3302     0.1977    1.67     0.104
243WHEN      -0.2421    0.1339    -1.81    0.079
WHERE        0.6011     0.3957    1.52     0.137

S = 0.9111    R-Sq = 40.7%    R-Sq(adj) = 34.1%

Analysis of Variance
Source            DF    SS         MS        F       P
Regression        4     20.5079    5.1270    6.18    0.001
Residual Error    36    29.8823    0.8301
Total             40    50.3902

PROBLEM 3:
Analysis of Variance for MIDTERM
Source     DF    SS       MS      F       P
SECTION    2     4.48     2.24    1.91    0.160
Error      42    49.17    1.17
Total      44    53.64

Level    N     Mean     StDev
MW       12    1.750    1.055
TR-A     21    2.000    1.225
TR-M     12    2.583    0.793

Pooled StDev = 1.082

[Character plot: individual 95% CIs for each section mean, based on pooled StDev; axis ticks at 1.20, 1.80, 2.40, 3.00]
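The Problem 3 table can be rebuilt from the per-section summary statistics alone, which is a useful sanity check on the output. The sketch below is a minimal illustration, assuming the printed N, Mean, and StDev values are the only inputs; the function and variable names are hypothetical, and small differences from Minitab come from rounding in the printed values. The last lines apply the same F = MS / MS(error) ratio to the Problem 2 regression ANOVA.

```python
import math

# (n, mean, standard deviation) per section, copied from the Problem 3 output.
GROUPS = {
    "MW": (12, 1.750, 1.055),
    "TR-A": (21, 2.000, 1.225),
    "TR-M": (12, 2.583, 0.793),
}


def one_way_anova(groups):
    """Rebuild a one-way ANOVA table from group sizes, means, and SDs."""
    n_total = sum(n for n, _, _ in groups.values())
    grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total
    # Between-group SS: weighted squared distance of each mean from the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
    # Within-group SS: each group's variance times its degrees of freedom.
    ss_within = sum((n - 1) * s ** 2 for n, _, s in groups.values())
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f": (ss_between / df_between) / ms_within,
        "pooled_sd": math.sqrt(ms_within),
    }


result = one_way_anova(GROUPS)
print(f"F = {result['f']:.2f}, pooled StDev = {result['pooled_sd']:.3f}")

# Same ratio for the Problem 2 regression ANOVA: MS(Regression) / MS(Error).
f_problem2 = 5.1270 / 0.8301
print(f"Problem 2 F = {f_problem2:.2f}")
```

With the printed p-value of 0.160 above the 0.05 significance level, the computed F for Problem 3 does not lead to rejecting equal section means.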
PROBLEM 4:
Two-way Analysis of Variance

Analysis of Variance for MIDTERM
Source         DF    SS        MS        F        P
ma243          1     13.500    13.500    20.25    0.000
ma243whe       1     1.500     1.500     2.25     0.149
Interaction    1     0.167     0.167     0.25     0.623
Error          20    13.333    0.667
Total          23    28.500

ma243       Mean
A           3.00
B,C,D       1.50
[Character plot: individual 95% CI for each ma243 mean; axis ticks at 1.20, 1.80, 2.40, 3.00]

ma243whe    Mean
One Term    2.50
Two and     2.00
[Character plot: individual 95% CI for each ma243whe mean; axis ticks at 1.60, 2.00, 2.40, 2.80]
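The two-way table above can be reproduced directly from the 24 recoded grades listed in the Problem 4 cross-tabulation. The following is a minimal sketch of a balanced two-way ANOVA with replication in plain Python; the function name, dictionary keys, and level labels are illustrative choices, not Minitab's, and the code assumes every cell holds the same number of observations.

```python
# Recoded BA 282 midterm grades from the Problem 4 cross-tabulation.
# Keys are (MA 243 grade group, when MA 243 was taken).
CELLS = {
    ("A", "one term"): [3, 4, 3, 2, 4, 3],
    ("A", "two or more"): [2, 3, 3, 2, 4, 3],
    ("non-A", "one term"): [3, 0, 2, 1, 2, 3],
    ("non-A", "two or more"): [1, 1, 1, 2, 1, 1],
}


def mean(values):
    return sum(values) / len(values)


def two_way_anova(cells):
    """Balanced two-way ANOVA with replication: SS, df, and F per source."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    r = len(next(iter(cells.values())))  # replicates per cell (assumed equal)
    grand = mean([x for obs in cells.values() for x in obs])

    a_means = [mean([x for b in b_levels for x in cells[(a, b)]]) for a in a_levels]
    b_means = [mean([x for a in a_levels for x in cells[(a, b)]]) for b in b_levels]

    ss_a = r * len(b_levels) * sum((m - grand) ** 2 for m in a_means)
    ss_b = r * len(a_levels) * sum((m - grand) ** 2 for m in b_means)
    ss_cells = r * sum((mean(obs) - grand) ** 2 for obs in cells.values())
    ss_ab = ss_cells - ss_a - ss_b  # interaction is what the main effects leave over
    ss_error = sum((x - mean(obs)) ** 2 for obs in cells.values() for x in obs)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_error = len(a_levels) * len(b_levels) * (r - 1)
    ms_error = ss_error / df_error
    table = {
        "ma243": {"ss": ss_a, "df": df_a},
        "ma243whe": {"ss": ss_b, "df": df_b},
        "interaction": {"ss": ss_ab, "df": df_a * df_b},
    }
    for row in table.values():
        row["f"] = (row["ss"] / row["df"]) / ms_error
    table["error"] = {"ss": ss_error, "df": df_error}
    return table


table = two_way_anova(CELLS)
for source, row in table.items():
    f_text = f"{row['f']:.2f}" if "f" in row else ""
    print(f"{source:12s} df={row['df']:2d}  SS={row['ss']:.3f}  F={f_text}")
```

The sums of squares add up to the total of 28.500 shown in the output, which is one quick way to confirm that the cross-tabulation was read correctly.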
[Interaction Plot - Means for MIDTERM. Vertical axis: Average BA 282 Midterm Grades (ticks at 1.2, 2.2, 3.2). Horizontal axis: When MA 243 Was Taken (One Term Ago, Two and More Ago). One line per MA 243 Grade group (A; B,C,D).]