Download Lecture notes - The University of Tennessee at Chattanooga

Lecture Notes for
Applied Business Statistics
A Training Program for BCBS
Professor Ahmadi, Ph.D.
Chapter 1
Glossary of Terms:

Statistics

Data

Data Set

Elements

Variable

Observations

Sample and Population

Descriptive Statistics

Statistical Inference

Qualitative and Quantitative Data
Scales of Measurement:

Nominal Scale

Ordinal Scale

Interval Scale

Ratio Scale
Chapter 2
Summarizing Quantitative Data
Problem 1.
Daily earnings of a sample of twelve individuals are shown below:
100, 126, 138, 142, 148, 150, 168, 182, 191, 193, 195, 199
Summarize the above data by constructing:
a. a frequency distribution
b. a cumulative frequency distribution
c. a relative frequency distribution
d. a cumulative relative frequency distribution
e. a histogram
f. an ogive

Class        Frequency    Cumulative    Relative     Cumulative
                          Frequency     Frequency    Relative Frequency
100 - 119
120 - 139
140 - 159
160 - 179
180 - 199
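The table above can be filled in programmatically. The following is a minimal Python sketch, assuming the class limits shown in the table are inclusive; the variable names are illustrative and not part of the original notes.

```python
# Daily earnings from Problem 1.
earnings = [100, 126, 138, 142, 148, 150, 168, 182, 191, 193, 195, 199]

# Class limits taken from the table; assumed inclusive on both ends.
classes = [(100, 119), (120, 139), (140, 159), (160, 179), (180, 199)]

n = len(earnings)
freq = [sum(lo <= x <= hi for x in earnings) for lo, hi in classes]
rel_freq = [f / n for f in freq]

# Running totals give the cumulative columns.
cum_freq, cum_rel = [], []
running = 0
for f in freq:
    running += f
    cum_freq.append(running)
    cum_rel.append(running / n)

for (lo, hi), f, cf, rf, crf in zip(classes, freq, cum_freq, rel_freq, cum_rel):
    print(f"{lo}-{hi}: f={f}, cum f={cf}, rel f={rf:.3f}, cum rel f={crf:.3f}")
```

The histogram plots the frequency column against the classes, and the ogive plots the cumulative column.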
DOT PLOT
Problem 2. In a recent campaign, many airlines reduced their summer fares in order to gain a
larger share of the market. The following data represent the prices of round-trip tickets from
Atlanta to Boston for a sample of nine airlines:
120, 140, 140, 160, 160, 160, 160, 180, 180
Construct a dot plot for the above data.
STEM-AND-LEAF DISPLAY
Problem 3. The test scores of 14 individuals on their first statistics examination are shown
below:
95, 75, 87, 63, 52, 92, 43, 81, 77, 83, 84, 91, 78, 88
a. Construct a stem-and-leaf display for these data.
Professor Ahmadi’s Lecture Notes
Page 3
b. What does the above stem-and-leaf show?
CROSSTABULATION
Problem 4. The following is a crosstabulation of starting salaries (in $1,000's) of a sample of
business school graduates by their gender.
                     Starting Salary
Gender     Less than 30    30 up to 35    35 and more    Total
Female          12              84             24         120
Male            20              48             12          80
Total           32             132             36         200
a. What general comments can be made about the distribution of starting salaries and the gender
of the individuals in the sample?
b. Compute row percentages and comment on the relationship between starting salaries and
gender.
SCATTER DIAGRAM
Problem 5. The average grades of 8 students in Professor Ahmadi’s statistics class and the
number of absences they had during the semester are shown below:
Student    Number of Absences (x)    Average Grade (y)
   1                 1                      94
   2                 2                      78
   3                 2                      70
   4                 1                      88
   5                 3                      68
   6                 4                      40
   7                 8                      30
   8                 3                      60
Develop a scatter diagram for the relationship between the number of absences (x) and their
average grade (y).
Chapter 3 Formulas
Ungrouped Data

Mean
Sample: $\bar{X} = \frac{\sum X_i}{n}$, where n = sample size
Population: $\mu = \frac{\sum X_i}{N}$, where N = size of population

Interquartile Range
IQR = Q3 - Q1 (same for sample and population)
where: Q3 = third quartile (i.e., 75th percentile)
Q1 = first quartile (i.e., 25th percentile)

Variance
Sample: $S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$ or: $S^2 = \frac{\sum X_i^2 - n\bar{X}^2}{n - 1}$
Population: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$ or: $\sigma^2 = \frac{\sum X_i^2 - N\mu^2}{N}$

Standard Deviation
Sample: $S = \sqrt{S^2}$
Population: $\sigma = \sqrt{\sigma^2}$

Coefficient of Variation (C.V.)
Sample: $C.V. = \left(\frac{S}{\bar{X}}\right) \times 100$
Population: $C.V. = \left(\frac{\sigma}{\mu}\right) \times 100$

Covariance
Sample: $S_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
Population: $\sigma_{XY} = \frac{\sum (X_i - \mu_X)(Y_i - \mu_Y)}{N}$

Pearson Product Moment Correlation Coefficient
Sample: $r_{XY} = \frac{S_{XY}}{S_X S_Y}$
where
$r_{XY}$ = Sample correlation coefficient
$S_{XY}$ = Sample covariance
$S_X$ = Sample standard deviation of X
$S_Y$ = Sample standard deviation of Y
Population: $\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$
where
$\rho_{XY}$ = Population correlation coefficient
$\sigma_{XY}$ = Population covariance
$\sigma_X$ = Population standard deviation of X
$\sigma_Y$ = Population standard deviation of Y

Weighted Mean
Sample: $\bar{X} = \frac{\sum w_i X_i}{\sum w_i}$
Population: $\mu = \frac{\sum w_i X_i}{\sum w_i}$
where
Xi = data value i
wi = weight for data value i

Grouped Data

Mean
Sample: $\bar{X} = \frac{\sum f_i M_i}{n}$
Population: $\mu = \frac{\sum f_i M_i}{N}$
where
fi = frequency of class i
Mi = midpoint of class i

Variance
Sample: $S^2 = \frac{\sum f_i (M_i - \bar{X})^2}{n - 1}$ or: $S^2 = \frac{\sum f_i M_i^2 - n\bar{X}^2}{n - 1}$
Population: $\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$ or: $\sigma^2 = \frac{\sum f_i M_i^2 - N\mu^2}{N}$
Chapter 3
Measures of Location & Dispersion (Ungrouped Data)
Problem 1. Hourly earnings (in dollars) of a sample of eight employees of Ahmadi, Inc. are
shown below:
Individual    Earning (X)
    1             12
    2             15
    3             15
    4             17
    5             18
    6             19
    7             22
    8             26
I. Measures of location
a. Compute the mean and explain and show its properties.
b. Determine the median and explain its properties.
c. Determine the 70th percentile.
d. Determine the 25th percentile.
e. Find the mode.
II. Compute the following measures of dispersion for the above data:
a. Range
b. Interquartile range
c. Variance & the Standard deviation
d. Coefficient of variation
e. A sample of Chatt, Inc. employees had a mean of $21 and a standard deviation of $5. Which
company shows a more dispersed data distribution?
f. Use “Descriptive Statistics” in Excel and determine all the statistical measures.
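As an alternative to Excel, most of the measures in this problem can be checked with Python's standard `statistics` module. This is a sketch under the sample (n - 1) formulas from the Chapter 3 formula sheet; percentiles and quartiles are left out because their value depends on the interpolation convention used.

```python
import statistics

# Hourly earnings of the eight Ahmadi, Inc. employees.
earnings = [12, 15, 15, 17, 18, 19, 22, 26]

mean = statistics.mean(earnings)
median = statistics.median(earnings)
mode = statistics.mode(earnings)

data_range = max(earnings) - min(earnings)
variance = statistics.variance(earnings)   # sample variance, divides by n - 1
std_dev = statistics.stdev(earnings)       # square root of the sample variance
cv = std_dev / mean * 100                  # coefficient of variation, in percent

# Part e: Chatt, Inc. has a mean of $21 and a standard deviation of $5.
cv_chatt = 5 / 21 * 100

print(f"mean={mean}, median={median}, mode={mode}, range={data_range}")
print(f"variance={variance:.4f}, std dev={std_dev:.4f}")
print(f"C.V. Ahmadi={cv:.2f}%, C.V. Chatt={cv_chatt:.2f}%")
```

The company with the larger coefficient of variation has the more dispersed distribution relative to its mean.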
Chapter 3
Five-Number Summary
Problem 2.
The weights of 12 individuals who enrolled in a fitness program are shown below:
Individual    Weight (Pounds)
    1              100
    2              105
    3              110
    4              130
    5              135
    6              138
    7              142
    8              145
    9              150
   10              170
   11              240
   12              300
a. Provide a five-number summary for the data.
b. Show the box plot for the weight data.
Chapter 3
Covariance & Coefficient of Correlation
Problem 3. The average grades of a sample of 8 students in Professor Ahmadi’s statistics class
and the number of absences they had during the semester are shown below.
Student    Number of Absences (Xi)    Average Grade (Yi)
   1                  1                       94
   2                  2                       78
   3                  2                       70
   4                  1                       88
   5                  3                       68
   6                  4                       40
   7                  8                       30
   8                  3                       60
TOTAL                24                      528
a. Compute the sample covariance and interpret its meaning.
b. Compute the sample coefficient of correlation and interpret its meaning.
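The sample covariance and correlation formulas from the Chapter 3 formula sheet can be applied directly to this table. The sketch below uses only the standard library; variable names are illustrative.

```python
from math import sqrt

absences = [1, 2, 2, 1, 3, 4, 8, 3]          # X
grades = [94, 78, 70, 88, 68, 40, 30, 60]    # Y
n = len(absences)

x_bar = sum(absences) / n    # 24 / 8
y_bar = sum(grades) / n      # 528 / 8

# Sample covariance: sum of cross-deviations divided by n - 1.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(absences, grades)) / (n - 1)

# Sample standard deviations of X and Y.
s_x = sqrt(sum((x - x_bar) ** 2 for x in absences) / (n - 1))
s_y = sqrt(sum((y - y_bar) ** 2 for y in grades) / (n - 1))

# Pearson correlation coefficient.
r_xy = s_xy / (s_x * s_y)

print(f"S_xy = {s_xy:.2f}, r_xy = {r_xy:.4f}")
```

A negative covariance indicates that grades tend to fall as absences rise; the correlation coefficient rescales that to the range -1 to 1 so its strength can be judged.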
Chapter 3
Weighted Mean
Problem 4. The M&A Oil Company has purchased barrels of oil from several suppliers. The
purchase price per barrel and the number of barrels purchased are shown below.
Supplier    Price Per Barrel ($)    Number of Barrels
   A                55                     4,000
   B                49                     3,000
   C                48                     9,000
   D                50                    20,000
Compute the weighted average price per barrel.
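A short sketch of the weighted mean formula, with the number of barrels as the weights:

```python
prices = [55, 49, 48, 50]              # price per barrel, suppliers A-D
barrels = [4000, 3000, 9000, 20000]    # weights

# Weighted mean = sum(w_i * X_i) / sum(w_i)
weighted_mean = sum(w * x for w, x in zip(barrels, prices)) / sum(barrels)

print(f"Weighted average price per barrel: ${weighted_mean:.2f}")
```

The simple average of the four prices would be 50.50, which overstates the cost because supplier A's high price applies to relatively few barrels.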
Chapter 3
Measures of Location & Dispersion (Grouped Data)
Problem 5. The yearly income distribution for a sample of 30 Ahmadi, Inc. employees is
shown below.
Yearly Income      Frequency
(In $10,000)          fi
   4 - 6               2
   7 - 9               6
  10 - 12              7
  13 - 15             10
  16 - 18              5
  Totals             n = 30
a. Compute the mean yearly income.
b. Compute the variance and the standard deviation of the sample.
c. A sample of Chatt, Inc. employees had a mean income of $132,000 with a standard deviation
of $36,000. Which company shows a more dispersed income distribution?
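The grouped-data formulas can be sketched as follows. The code assumes each class is represented by its midpoint, as in the formula sheet, and keeps incomes in units of $10,000.

```python
from math import sqrt

classes = [(4, 6), (7, 9), (10, 12), (13, 15), (16, 18)]   # in $10,000
freqs = [2, 6, 7, 10, 5]

midpoints = [(lo + hi) / 2 for lo, hi in classes]
n = sum(freqs)

# Grouped mean: sum(f_i * M_i) / n
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Grouped sample variance: sum(f_i * (M_i - mean)^2) / (n - 1)
variance = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)) / (n - 1)
std_dev = sqrt(variance)

# Part c: compare relative dispersion using the coefficient of variation.
cv_ahmadi = std_dev / mean * 100
cv_chatt = 36000 / 132000 * 100

print(f"mean = {mean} (x $10,000), variance = {variance:.4f}, std dev = {std_dev:.4f}")
print(f"C.V. Ahmadi = {cv_ahmadi:.2f}%, C.V. Chatt = {cv_chatt:.2f}%")
```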
Chapter 4 Formulas
Counting Rule for Multiple-step Experiments:
Total number of outcomes = $(n_1)(n_2) \cdots (n_k)$
The number of Combinations of N objects taken n at a time:
$\binom{N}{n} = \frac{N!}{n!\,(N - n)!}$
Sum of the probability of Event A and its Complement:
$P(A) + P(A^c) = 1.0$
Addition Law (the probability of the union of two events):
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Multiplication Law (the probability of the intersection of two events):
$P(A \cap B) = P(A)\,P(B|A)$
or
$P(A \cap B) = P(B)\,P(A|B)$
Two Events A and B are Independent if:
$P(A|B) = P(A)$
or
$P(B|A) = P(B)$
Multiplication Law for Independent Events:
$P(A \cap B) = P(A)\,P(B)$
Conditional Probability:
$P(A|B) = \frac{P(A \cap B)}{P(B)}$
or
$P(B|A) = \frac{P(A \cap B)}{P(A)}$
Bayes' Theorem in General:
$P(A_i|B) = \frac{P(A_i)\,P(B|A_i)}{P(A_1)\,P(B|A_1) + P(A_2)\,P(B|A_2) + \dots + P(A_n)\,P(B|A_n)}$
Summary of Bayes' Theorem Calculations:
Event    Prior            Conditional      Joint            Posterior
         Probabilities    Probabilities    Probabilities    Probabilities
         P(Ai)            P(B|Ai)          P(Ai ∩ B)        P(Ai|B)
Chapter 4 - Basic Probability Concepts
Problem 1. Assume you have applied to two different universities (let's refer to them as
universities A and B) for your graduate work. In the past, 25% of students (with similar
credentials as yours) who applied to university A were accepted; while university B had accepted
35% of the applicants (Assume events are independent of each other).
a. What is the probability that you will be accepted in both universities?
b. What is the probability that you will be accepted to at least one graduate program?
c. What is the probability that one and only one of the universities will accept you?
d. What is the probability that neither university will accept you?
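Because the two acceptance events are assumed independent, Problem 1 reduces to the multiplication law for independent events, the addition law, and the complement rule. A minimal sketch:

```python
p_a = 0.25   # P(accepted by university A)
p_b = 0.35   # P(accepted by university B)

# a. Both: multiplication law for independent events.
p_both = p_a * p_b

# b. At least one: addition law.
p_at_least_one = p_a + p_b - p_both

# c. Exactly one: A only, or B only.
p_exactly_one = p_a * (1 - p_b) + (1 - p_a) * p_b

# d. Neither: complement of "at least one".
p_neither = 1 - p_at_least_one

print(p_both, p_at_least_one, p_exactly_one, p_neither)
```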
Problem 2. An individual has applied to two different insurance companies for health
insurance coverage. The probability that company A will approve her application is 0.63, and the
probability that company B will approve her application is 0.55. The probability that both
companies will approve her application is 0.3465.
a. What is the probability that company A will approve her application, given that company
B has approved her application?
b. Are the approval outcomes independent events? Explain; and using the probability
concepts, substantiate your answer.
c. Are the approval outcomes mutually exclusive? Explain; and using the probability
concepts, substantiate your answer.
d. What is the probability that her application will be approved by at least one of the
companies?
Chapter 4 - Conditional Probability
Problem 3. A research study investigating the relationship between smoking and heart disease
in a sample of 500 individuals provided the following data:
                               Smoker    Nonsmoker    Total
Record of Heart Disease          50         40          90
No Record of Heart Disease      100        310         410
Total                           150        350         500
a. Show the joint probability table.
b. What is the probability that an individual is a smoker and has a record of heart disease?
c. Compute and interpret the marginal probabilities.
d. Given that an individual is a smoker, what is the probability that this individual has heart
disease?
e. Given that an individual is a nonsmoker, what is the probability that this individual has heart
disease?
f. Does the research show that heart disease and smoking are independent events? Use
probabilities to justify your answer.
g. What conclusion would you draw about the relationship between smoking and heart disease?
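The joint, marginal, and conditional probabilities follow from dividing the cell counts by the appropriate totals. The sketch below uses hypothetical dictionary keys to label the cells.

```python
# Cell counts from the table: (smoking status, heart disease status).
counts = {
    ("smoker", "disease"): 50,
    ("smoker", "no disease"): 100,
    ("nonsmoker", "disease"): 40,
    ("nonsmoker", "no disease"): 310,
}
total = sum(counts.values())

# a. Joint probability table: each cell divided by the grand total.
joint = {cell: c / total for cell, c in counts.items()}

# c. Marginal probabilities.
p_smoker = joint[("smoker", "disease")] + joint[("smoker", "no disease")]
p_nonsmoker = 1 - p_smoker
p_disease = joint[("smoker", "disease")] + joint[("nonsmoker", "disease")]

# d, e. Conditional probabilities: P(disease | group) = joint / marginal.
p_disease_given_smoker = joint[("smoker", "disease")] / p_smoker
p_disease_given_nonsmoker = joint[("nonsmoker", "disease")] / p_nonsmoker

# f. Independence would require P(disease | smoker) == P(disease).
independent = abs(p_disease_given_smoker - p_disease) < 1e-9

print(f"P(smoker and disease) = {joint[('smoker', 'disease')]:.2f}")
print(f"P(disease | smoker) = {p_disease_given_smoker:.4f}")
print(f"P(disease | nonsmoker) = {p_disease_given_nonsmoker:.4f}")
print(f"P(disease) = {p_disease:.2f}, independent: {independent}")
```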
Chapter 4
BAYES' THEOREM
Problem 4. When Ahmadi, Inc. sets up their drill press machine, 70% of the time it is set up
correctly. It is known that if the machine is set up correctly it produces 90% acceptable parts.
On the other hand, when the machine is set up incorrectly, it produces 20% acceptable parts.
One item from the production is selected and is observed to be acceptable.
a. What is the probability that the machine is set up correctly? That is, we are interested in
computing:
P(Correct set up | Acceptable part).
Let the following symbols represent the various events:
E1 = Correct set up
E2 = Incorrect set up
G = Good part (i.e., Acceptable part)
With the above notations we want to determine P(E1 | G).
b. Compute all the posterior probabilities.
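The tabular Bayes calculation (prior, conditional, joint, posterior) can be sketched in a few lines. The list order below is E1 (correct set up) then E2 (incorrect set up).

```python
priors = [0.70, 0.30]          # P(E1), P(E2)
conditionals = [0.90, 0.20]    # P(G | E1), P(G | E2)

# Joint probabilities P(Ei and G) = P(Ei) * P(G | Ei).
joints = [p * c for p, c in zip(priors, conditionals)]

# P(G) is the sum of the joint probabilities.
p_good = sum(joints)

# Posterior probabilities P(Ei | G) = P(Ei and G) / P(G).
posteriors = [j / p_good for j in joints]

for name, post in zip(["E1 (correct)", "E2 (incorrect)"], posteriors):
    print(f"P({name} | G) = {post:.4f}")
```

Observing an acceptable part raises the probability of a correct set up above its prior of 0.70.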
Chapter 5 Formulas
Required Conditions for a Discrete Probability Function
$f(x) \ge 0$
$\sum f(x) = 1$
Discrete Uniform Probability Function
$f(x) = 1/n$
where
n = the number of values the random variable may assume
Expected Value of a Discrete Random Variable
$E(x) = \mu = \sum x\,f(x)$
Variance of a Discrete Random Variable
$Var(x) = \sigma^2 = \sum (x - \mu)^2 f(x)$
Number of Experimental Outcomes Providing Exactly x Successes in n Trials
$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$
where
$n! = n(n - 1)(n - 2) \cdots (2)(1)$
(Remember: 0! = 1)
Binomial Probability Function
$f(x) = \frac{n!}{x!\,(n - x)!}\, p^x (1 - p)^{n - x}$
where x = 0, 1, 2, ..., n
The Mean of a Binomial Distribution
$\mu = np$
The Variance of a Binomial Distribution
$\sigma^2 = np(1 - p)$
Chapter 5
Discrete Probability Distributions
Problem 1. The manager of the university bookstore has kept records of the number of
diskettes sold per day. She provided the following information regarding diskettes sales for a
period of 60 days:
Number of          Number
Diskettes Sold     of Days
      0               6
      1               9
      2              12
      3              18
      4              12
      5               3
a. Identify the random variable
b. Is the random variable discrete or continuous?
c. Develop a probability distribution for the above data.
d. Is the above a proper probability distribution?
e. Develop a cumulative probability distribution.
f. Determine the expected number of daily sales of diskettes.
g. Determine the variance and the standard deviation.
h. If each diskette yields a net profit of 50 cents, what are the expected yearly profits from the
sales of diskettes?
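The probability distribution, expected value, and variance can be derived from the day counts as sketched below. For part h, the number of selling days per year is not given in the problem; 365 is used here purely as an assumption.

```python
from math import sqrt

sold = [0, 1, 2, 3, 4, 5]
days = [6, 9, 12, 18, 12, 3]
total_days = sum(days)

# c. Relative frequencies serve as the probabilities f(x).
f = [d / total_days for d in days]

# e. Cumulative distribution.
cumulative = []
running = 0.0
for p in f:
    running += p
    cumulative.append(running)

# f, g. E(x) = sum(x f(x)); Var(x) = sum((x - mu)^2 f(x)).
expected = sum(x * p for x, p in zip(sold, f))
variance = sum((x - expected) ** 2 * p for x, p in zip(sold, f))
std_dev = sqrt(variance)

# h. ASSUMPTION: 365 selling days per year (not stated in the problem).
DAYS_PER_YEAR = 365
yearly_profit = expected * 0.50 * DAYS_PER_YEAR

print(f"E(x) = {expected:.2f}, Var(x) = {variance:.2f}, sd = {std_dev:.4f}")
print(f"Expected yearly profit (365 days assumed) = ${yearly_profit:.2f}")
```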
Chapter 5
Introduction to Binomial Distribution
Problem 2. A production process has been producing 10% defective items. A random sample
of four items is selected from the production process.
a. What is the probability that the first 3 selected items are non-defective and the last item is
defective?
b. If a sample of 4 items is selected, how many outcomes contain exactly 3 non-defective items?
c. What is the probability that a random sample of 4 contains exactly 3 non-defective items?
d. Determine the probability distribution for the number of non-defective items in a sample of
four.
e. Determine the expected number (mean) of non-defectives in a sample of four.
f. Find the standard deviation for the number of non-defectives.
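Treating a non-defective item as a "success" with p = 0.90, the parts of this problem map onto the binomial formulas from the Chapter 5 formula sheet. A sketch using `math.comb`:

```python
from math import comb, sqrt

n = 4
p = 0.90   # probability an item is non-defective (1 - 0.10)

# a. One specific sequence: three non-defective items, then one defective.
p_sequence = p ** 3 * (1 - p)

# b. Number of outcomes with exactly 3 non-defective items.
outcomes = comb(n, 3)

def binomial_pmf(x: int) -> float:
    """Binomial probability of exactly x non-defective items in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# c. Exactly 3 non-defective items, in any order.
p_three = binomial_pmf(3)

# d. Full distribution for x = 0, ..., 4.
distribution = {x: binomial_pmf(x) for x in range(n + 1)}

# e, f. Mean and standard deviation.
mean = n * p
std_dev = sqrt(n * p * (1 - p))

print(distribution)
print(f"mean = {mean}, sd = {std_dev:.2f}")
```

Note that the answer to part c equals the answer to part a multiplied by the count from part b.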
Chapter 5
POISSON PROBABILITY DISTRIBUTION
Problem 3. During the registration period, students consult their advisor for course selection.
A particular advisor noted that during each half hour an average of eight students came to see
him for advising.
a. What is the probability that during a half hour period exactly four students will consult him?
b. What is the probability that during a half hour period less than three students will consult
him?
c. What is the probability that during an hour period ten students will consult him?
d. What is the probability that during an hour and fifteen minute period thirty students will
consult him?
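The Chapter 5 formula sheet does not list the Poisson probability function, which is $f(x) = \frac{\mu^x e^{-\mu}}{x!}$, where µ is the expected number of occurrences in the interval. The sketch below assumes the arrival rate scales proportionally with the length of the interval (8 per half hour, so 16 per hour and 20 per 75 minutes).

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """Probability of exactly x occurrences when mu are expected."""
    return mu ** x * exp(-mu) / factorial(x)

RATE_PER_HALF_HOUR = 8

# a. Exactly four students in a half hour.
p_a = poisson_pmf(4, RATE_PER_HALF_HOUR)

# b. Fewer than three students in a half hour: x = 0, 1, 2.
p_b = sum(poisson_pmf(x, RATE_PER_HALF_HOUR) for x in range(3))

# c. Ten students in one hour (two half-hour periods, mu = 16).
p_c = poisson_pmf(10, RATE_PER_HALF_HOUR * 2)

# d. Thirty students in 75 minutes (2.5 half-hour periods, mu = 20).
p_d = poisson_pmf(30, RATE_PER_HALF_HOUR * 2.5)

print(f"a = {p_a:.4f}, b = {p_b:.4f}, c = {p_c:.4f}, d = {p_d:.4f}")
```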
Chapter 6 Formulas
Uniform Probability Density Function for a Random Variable x:
$f(x) = \begin{cases} \frac{1}{b - a} & \text{for } a \le x \le b \\ 0 & \text{elsewhere} \end{cases}$
Mean and Variance of a Uniform Continuous Probability Distribution:
$\mu = \frac{a + b}{2}$
$\sigma^2 = \frac{(b - a)^2}{12}$
The Z Transformation Formula:
$z = \frac{x - \mu}{\sigma}$
Solving for x using the Z transformation formula:
$x = \mu + Z\sigma$
Chapter 6 - Continuous Probability Distributions
I. - The Uniform Distribution
Problem 1. The driving time for an individual from her home to her work is uniformly
distributed between 300 to 480 seconds.
a. Give a mathematical expression for the probability density function.
b. Compute the probability that the driving time will be less than or equal to 435 seconds.
c. Determine the probability that the driving time will be exactly 400 seconds.
d. Determine the expected driving time.
e. Determine the standard deviation of the driving time.
Chapter 6
II. - The Normal Distribution
Problem 2. Given that Z is the standard normal random variable, give the probabilities
associated with the following:
a. P(Z < -2.09) = ?
b. P(Z > -0.95) = ?
c. P(-2.55 < Z < -2.33) = ?
Problem 3. Z is a standard normal variable. Find the value of Z in the following:
a. The area between -Z and zero is 0.4929. Z = ?
b. The area to the right of Z is 0.0192. Z = ?
c. The area between -Z and Z is 0.668. Z = ?
Problem 4. The weight of certain items produced is normally distributed with a mean weight
of 60 ounces and a standard deviation of 8 ounces.
a. What percentage of the items will weigh between 50.4 and 72 ounces?
b. What percentage of the items will weigh between 42 and 52 ounces?
c. What percentage of the items will weigh at least 74.4 ounces?
d. What are the minimum and the maximum weights of the middle 60% of the items?
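Instead of reading a Z table, the normal probabilities in this problem can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch; table-based answers may differ slightly in the last digit because of rounding Z to two decimals.

```python
from statistics import NormalDist

weights = NormalDist(mu=60, sigma=8)

# a. P(50.4 <= x <= 72)
p_a = weights.cdf(72) - weights.cdf(50.4)

# b. P(42 <= x <= 52)
p_b = weights.cdf(52) - weights.cdf(42)

# c. P(x >= 74.4)
p_c = 1 - weights.cdf(74.4)

# d. Middle 60%: 20% in each tail, so use the 20th and 80th percentiles.
low = weights.inv_cdf(0.20)
high = weights.inv_cdf(0.80)

print(f"a = {p_a:.4f}, b = {p_b:.4f}, c = {p_c:.4f}")
print(f"d: from {low:.2f} to {high:.2f} ounces")
```

`inv_cdf` performs the same step as solving x = µ + Zσ after looking up Z for a given area.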
Problem 5. Sun Love grapefruit growers have determined that the diameter of their
grapefruits is normally distributed with a mean of 4.5 inches and a standard deviation of 0.3
inches. (You can find the step-by-step solution to this problem in my workbook.)
a. What is the probability that a randomly selected grapefruit will have a diameter of at
least 4.14 inches?
b. What percentage of grapefruits have a diameter between 4.8 and 5.04 inches?
c. Sun Love packs their largest grapefruits in a special package called "Super Pack." If
5% of all their grapefruits are packed in "Super Packs," what is the smallest diameter of
the grapefruits that are in the "Super Packs?"
d. In this year's harvest, there were 111,500 grapefruits that had a diameter over 5.01
inches. How many grapefruits has Sun Love harvested this year?
Problem 6. In grading eggs, 30% are marked small, 45% are marked medium, 15% are
marked large, and the rest are marked extra-large. If the average weight of the eggs is normally
distributed with a mean of 3.2 ounces and a standard deviation of 0.6 ounces:
a. What are the smallest and the largest weights of the medium size eggs?
b. What is the weight of the smallest egg that will be in the extra-large category?
Chapter 7 Formulas
SAMPLING AND SAMPLING DISTRIBUTIONS
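The number of different simple random samples of size n that can be selected from a finite
population of size N:
$\frac{N!}{n!\,(N - n)!}$

Expected Value of $\bar{x}$
Finite population: $E(\bar{x}) = \mu$
Infinite population: $E(\bar{x}) = \mu$
where: $E(\bar{x})$ = the expected value of the random variable $\bar{x}$
µ = the population mean

Standard Deviation of the Distribution of $\bar{x}$ Values
(Standard Error of the Mean)
Finite population: $\sigma_{\bar{x}} = \sqrt{\frac{N - n}{N - 1}} \left(\frac{\sigma}{\sqrt{n}}\right)$
Infinite population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

Z Score
$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$
where: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

Expected Value of $\bar{p}$
Finite population: $E(\bar{p}) = p$
Infinite population: $E(\bar{p}) = p$
where: $E(\bar{p})$ = the expected value of the random variable $\bar{p}$
p = the population proportion

Standard Deviation of the Distribution of $\bar{p}$ Values
(Standard Error of the Proportion)
Finite population: $\sigma_{\bar{p}} = \sqrt{\frac{N - n}{N - 1}} \sqrt{\frac{p(1 - p)}{n}}$
Infinite population: $\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}}$

Z Score
$Z = \frac{\bar{p} - p}{\sigma_{\bar{p}}}$
where: $\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}}$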
Chapter 7
SAMPLING AND SAMPLING DISTRIBUTIONS
Problem 1. Consider a population of four weights identical in appearance but weighing 2, 4, 6,
and 8 grams. The mean (µ) and the standard deviation (σ) of the population can be computed
to be 5 and 2.236 grams respectively.

X      (X - µ)²
2
4
6
8

Samples of size two (with replacement) are drawn from this population. Show the Sampling
Distribution of $\bar{X}$.
The list of all possible samples and the sample means can be shown as:
Possible Samples    Sample Means
    2 & 2                2
    2 & 4                3
    2 & 6                4
    2 & 8                5
    4 & 2                3
    4 & 4                4
    4 & 6                5
    4 & 8                6
    6 & 2                4
    6 & 4                5
    6 & 6                6
    6 & 8                7
    8 & 2                5
    8 & 4                6
    8 & 6                7
    8 & 8                8
The frequency of each mean can be shown as follows:
Possible Sample Means    Frequency
         2                   1
         3                   2
         4                   3
         5                   4
         6                   3
         7                   2
         8                   1
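The enumeration above can be reproduced by listing every ordered sample of size two drawn with replacement. The sketch below also checks two properties of the sampling distribution: its mean equals µ, and its standard deviation equals σ/√n.

```python
from collections import Counter
from itertools import product
from math import sqrt

population = [2, 4, 6, 8]
sample_size = 2

# All ordered samples of size two, drawn with replacement (4 x 4 = 16).
samples = list(product(population, repeat=sample_size))
means = [sum(s) / sample_size for s in samples]

# Frequency of each possible sample mean.
freq = Counter(means)

# Mean and standard deviation of the sampling distribution.
mean_of_means = sum(means) / len(means)
sd_of_means = sqrt(sum((m - mean_of_means) ** 2 for m in means) / len(means))

for value in sorted(freq):
    print(f"mean {value:.0f}: frequency {freq[value]}")
print(f"E(x-bar) = {mean_of_means}, standard error = {sd_of_means:.4f}")
```

The standard error printed here matches 2.236 / √2, the infinite-population formula, because sampling with replacement keeps the draws independent.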
Chapter 7
SAMPLING DISTRIBUTION OF X
Problem 2. The average yearly starting salary (µ) of MBAs is $60,000 with a standard
deviation (σ) of $16,000. A random sample of 64 MBAs is selected.
a. Show the sampling distribution of the sample means.
b. What is the probability that the sample mean will be greater than $56,000?
SAMPLING DISTRIBUTION OF P
Problem 3. Twenty percent of the students at UTC are business majors. A random sample of
100 students is selected.
a. Show the sampling distribution of the sample proportions
b. What is the probability that the sample proportion (the proportion of business majors) is
between 0.1 and 0.3?
c. What is the probability that the sample proportion (the proportion of business majors) is more
than 0.25?
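Problems 2 and 3 both come down to computing a standard error and then a normal probability. The sketch below uses `statistics.NormalDist` and assumes the samples are large enough for the normal approximation to apply, as the problems imply.

```python
from math import sqrt
from statistics import NormalDist

# Problem 2: sampling distribution of the sample mean.
mu, sigma, n = 60000, 16000, 64
se_mean = sigma / sqrt(n)                       # standard error of the mean
x_bar_dist = NormalDist(mu=mu, sigma=se_mean)
p_mean_above_56k = 1 - x_bar_dist.cdf(56000)

# Problem 3: sampling distribution of the sample proportion.
p, n_students = 0.20, 100
se_prop = sqrt(p * (1 - p) / n_students)        # standard error of the proportion
p_bar_dist = NormalDist(mu=p, sigma=se_prop)
p_between = p_bar_dist.cdf(0.30) - p_bar_dist.cdf(0.10)
p_above_25 = 1 - p_bar_dist.cdf(0.25)

print(f"SE of mean = {se_mean}, P(x-bar > 56,000) = {p_mean_above_56k:.4f}")
print(f"SE of proportion = {se_prop:.2f}")
print(f"P(0.1 < p-bar < 0.3) = {p_between:.4f}, P(p-bar > 0.25) = {p_above_25:.4f}")
```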
Chapter 8 Formulas
I. Interval Estimation of a Population Mean (µ)
A. When the standard deviation of the population σ is known,
$\mu = \bar{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
where the standard error of the mean is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
and the margin of error = $Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
B. When the standard deviation σ is unknown,
$\mu = \bar{x} \pm t_{\alpha/2} \frac{S}{\sqrt{n}}$
where the standard error of the mean is $S_{\bar{x}} = \frac{S}{\sqrt{n}}$
and the margin of error = $t_{\alpha/2} \frac{S}{\sqrt{n}}$
Sample Size for an Interval Estimate of a Population Mean
$n = \frac{Z_{\alpha/2}^2\, \sigma^2}{E^2}$
where E = the desired margin of error
II. Interval Estimation of a Population Proportion (P)
$p = \bar{p} \pm Z_{\alpha/2} \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$
where the standard error of proportion is $S_{\bar{p}} = \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$
and the margin of error = $Z_{\alpha/2} \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$
Sample Size for an Interval Estimate of a Population Proportion
$n = \frac{Z_{\alpha/2}^2\, p^*(1 - p^*)}{E^2}$
If the value of p* is not known and a good estimate of p* is not available, use p* = 0.50.
Chapter 8 – Interval Estimation
I. Interval Estimation of a Population Mean (µ)
A. The standard deviation of the population (σ) is known:
Problem 1. In order to estimate the average electric usage per month, a sample of 169 houses
was selected; and their electric usage was determined.
a. Assume a population standard deviation of 260-kilowatt hours. Determine the standard
error of the mean.
b. With a 0.90 probability, what can be said about the size of the sampling error?
c. If the sample mean is 1834 KWH, what is the 90% confidence interval estimate of the
population mean?
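With σ known, the interval is x̄ ± Z(α/2)·σ/√n. The sketch below obtains the Z value from `statistics.NormalDist` rather than a table, so the margin of error may differ slightly from a table-based answer that rounds Z to 1.645.

```python
from math import sqrt
from statistics import NormalDist

n = 169
sigma = 260        # population standard deviation, in KWH
x_bar = 1834       # sample mean, in KWH
confidence = 0.90

# a. Standard error of the mean.
se = sigma / sqrt(n)

# b. Margin of error: Z for alpha/2 = 0.05 in each tail.
alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)
margin = z * se

# c. 90% confidence interval.
lower, upper = x_bar - margin, x_bar + margin

print(f"SE = {se}, Z = {z:.3f}, margin of error = {margin:.2f}")
print(f"90% CI: {lower:.2f} to {upper:.2f} KWH")
```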
B. The standard deviation of the population (σ) is unknown:
Problem 2. Chattanooga Paper Company makes various types of paper products. One of their
products is a 30 mils thick paper. In order to ensure that the thickness of the paper meets the 30
mils specification, random cuts of paper are selected and the thickness of each cut is measured.
A sample of 256 cuts had a mean thickness of 30.3 mils with a standard deviation of 4 mils.
a. Develop a 95% confidence interval for the thickness of the paper.
b. The company considers the production in control if the thickness does not deviate from
the desired 30 mils by more than ± 3%. Is the production in control? Explain.
Problem 3. The cost of a roll of camera film (35 mm, 24 exposure) in a sample of 12 cities
worldwide is shown below.
City               Cost (in dollars)
Rio de Janeiro          12.14
Stockholm                7.47
Tokyo                    6.56
Moscow                   5.69
Paris                    5.62
London                   5.41
New York                 4.33
Mexico City              4.00
Sydney                   3.62
Honolulu                 3.43
Cairo                    3.40
Hong Kong                2.73
a. Using Excel, compute the basic descriptive statistics (the mean, the median, the mode, the
standard deviation, and the standard error of the mean) for the cost of film.
b. Determine a 95% confidence interval for the population mean.
II. Interval Estimation of a Population Proportion (P)
Problem 4. Many people who bought Xbox gaming systems, have complained about having
received defective systems. In a sample of 1200 units sold, 18 units were defective.
a. Determine a 95% confidence interval for the percentage of defective systems.
b. If 1.5 million Xboxes were sold, determine an interval for the number of defectives.
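The proportion interval from the Chapter 8 formula sheet can be sketched as follows. Part b simply scales the interval endpoints by the number of units sold, which assumes the sample is representative of all units.

```python
from math import sqrt
from statistics import NormalDist

n = 1200
defective = 18
confidence = 0.95

p_bar = defective / n                        # sample proportion
se = sqrt(p_bar * (1 - p_bar) / n)           # standard error of the proportion
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * se

# a. 95% confidence interval for the proportion of defective systems.
lower, upper = p_bar - margin, p_bar + margin

# b. Interval for the number of defectives among 1.5 million units.
units_sold = 1_500_000
lower_units, upper_units = lower * units_sold, upper * units_sold

print(f"p-bar = {p_bar}, margin = {margin:.5f}")
print(f"95% CI: {lower:.4%} to {upper:.4%}")
print(f"Defectives: about {lower_units:,.0f} to {upper_units:,.0f}")
```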
Chapter 9 Formulas
I. Hypothesis Tests about a Population Mean (µ)
A. The standard deviation of the population (σ) is known:
Test Statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$
Decision Rule for P-Value Approach: In All Cases Reject Ho if P-Value ≤ α
Decision Rule for Critical Value Approach:

Lower One-Tailed Test      Upper One-Tailed Test      Two-Tailed Test
of the Form                of the Form                of the Form
Ho: µ ≥ µo                 Ho: µ ≤ µo                 Ho: µ = µo
Ha: µ < µo                 Ha: µ > µo                 Ha: µ ≠ µo
Reject Ho if:              Reject Ho if:              Reject Ho if:
Z ≤ -Zα                    Z ≥ Zα                     Z ≤ -Zα/2 or Z ≥ Zα/2

B. The standard deviation of the population (σ) is unknown:
Test Statistic: $t = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$
The decision rules are the same as those shown in Part A (above) with the t
statistic substituted for the Z statistic.
Chapter 9 Formulas (Continued)
II. Hypothesis Tests about a Population Proportion (P)
Test Statistic: $Z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}}$
where $\sigma_{\bar{p}} = \sqrt{\frac{p_0(1 - p_0)}{n}}$
thus Z will have the form:
$Z = \frac{\bar{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$
Decision Rule for P-Value Approach: In All Cases Reject Ho if P-Value ≤ α
Decision Rule for Critical Value Approach:

Lower One-Tailed Test      Upper One-Tailed Test      Two-Tailed Test
of the Form                of the Form                of the Form
Ho: p ≥ po                 Ho: p ≤ po                 Ho: p = po
Ha: p < po                 Ha: p > po                 Ha: p ≠ po
Reject Ho if:              Reject Ho if:              Reject Ho if:
Z ≤ -Zα                    Z ≥ Zα                     Z ≤ -Zα/2 or Z ≥ Zα/2
Chapter 9
HYPOTHESIS TESTING PROCEDURE
Assume we are interested in testing whether or not the mean of the population is 70. Then the
null and the alternative hypotheses can be written as:
Ho: µ = 70
Ha: µ ≠ 70
Possible hypothesis-testing errors will be:

                          SITUATION IN THE POPULATION
DECISION                  Ho is true (µ = 70)     Ho is false (µ ≠ 70)
Do not reject Ho          Correct Decision        Type II Error
(Conclude µ = 70)
Reject Ho                 Type I Error            Correct Decision
(Conclude µ ≠ 70)

Steps of Hypothesis Testing
Step 1: Develop the null and the alternative hypotheses.
Step 2: Specify the level of significance.
Step 3: Compute the test statistic (t or Z) from the sample data.
Rejection Rule: p-Value Approach
Step 4: Compute the p-value by using the test statistic (t or Z) from step 3.
Step 5: Reject Ho if p-value ≤ α.
Rejection Rule: Critical Value Approach
Step 4: Determine the critical value(s) of t or Z at the specified level of significance α and
set up the rejection rule.
Step 5: Compare the test statistic from step 3 to the critical value(s) from step 4. If
the test statistic is beyond the critical value(s), reject the null hypothesis.
Chapter 9
I. Hypothesis Tests about a Population Mean (µ)
A. THE STANDARD DEVIATION OF THE POPULATION σ IS KNOWN:
Problem 1. The Chamber of Commerce of a Florida gulf coast community advertises area commercial
property available at a mean cost of under $40,000 per acre. A sample of 49 properties provided a
sample mean of $38,000 per acre. Assume the standard deviation of the population is known to be
$7000.
a. At 95% confidence, test the validity of their advertisement.
Ho:
Ha:
Conclusion:
b. Compute the p-value and interpret its meaning.
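One way to work through this problem with the five-step procedure is sketched below. It assumes the advertisement's claim ("under $40,000") is placed in the alternative hypothesis, giving a lower one-tailed test: Ho: µ ≥ 40,000 versus Ha: µ < 40,000. That framing is an assumption of this sketch, not stated in the problem.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 40000       # hypothesized mean cost per acre
x_bar = 38000      # sample mean
sigma = 7000       # known population standard deviation
n = 49
alpha = 0.05       # 95% confidence

# Step 3: test statistic Z = (x-bar - mu_0) / (sigma / sqrt(n)).
z = (x_bar - mu_0) / (sigma / sqrt(n))

# p-value approach: lower-tail area to the left of Z.
p_value = NormalDist().cdf(z)

# Critical value approach: reject Ho if Z <= -Z_alpha.
z_critical = NormalDist().inv_cdf(alpha)

reject_by_p_value = p_value <= alpha
reject_by_critical_value = z <= z_critical

print(f"Z = {z:.2f}, p-value = {p_value:.4f}, critical Z = {z_critical:.3f}")
print(f"Reject Ho: {reject_by_p_value}")
```

Both approaches always lead to the same decision; the p-value additionally reports how strong the evidence against Ho is.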
B. THE STANDARD DEVIATION OF THE POPULATION σ IS UNKNOWN:
Problem 2. A soft drink filling machine, when in perfect adjustment, fills the bottles with 12 ounces of
soft drink. A random sample of 64 bottles is selected, and the contents are measured. The sample
yielded a mean content of 11.88 ounces with a standard deviation of 0.8 ounces.
a. With a 0.05 level of significance (i.e., 95% confidence), test to see if the machine is in perfect
adjustment.
Ho:
Ha:
Conclusion:
b. Compute the p-value and interpret its meaning.
Problem 3. Chattanooga public transportation operates a fleet of electric powered shuttle buses for
downtown services. Daily mean maintenance costs have been $76 per bus. A recent random sample of
25 buses shows a sample mean maintenance cost of $83.50 per day with a sample standard deviation of
$30. Management would like to determine whether or not there has been a significant increase in the
mean daily maintenance cost.
a. At 95% confidence, test to determine whether or not the mean cost has increased.
Ho:
Ha:
Conclusion:
b. Compute the p-value and interpret its meaning.
II. Hypothesis Tests about a Population Proportion (P)
Problem 4. A supplier claims that more than 80% of the parts it supplies meet the product
specifications. In a sample of 800 parts received, 664 met the specifications.
a. At 93.7% confidence, test the supplier's claim.
Ho:
Ha:
Conclusion:
b. Compute the p-value and interpret its meaning.
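A sketch of the proportion test, assuming the supplier's claim ("more than 80%") is placed in the alternative hypothesis: Ho: p ≤ 0.80 versus Ha: p > 0.80. A confidence level of 93.7% corresponds to α = 0.063.

```python
from math import sqrt
from statistics import NormalDist

p_0 = 0.80                 # hypothesized proportion
n = 800
met_spec = 664
alpha = 1 - 0.937          # level of significance

p_bar = met_spec / n       # sample proportion

# Test statistic uses p_0 (not p-bar) in the standard error.
se = sqrt(p_0 * (1 - p_0) / n)
z = (p_bar - p_0) / se

# Upper one-tailed test: p-value is the area to the right of Z.
p_value = 1 - NormalDist().cdf(z)
reject = p_value <= alpha

print(f"p-bar = {p_bar}, Z = {z:.4f}, p-value = {p_value:.4f}")
print(f"Reject Ho at alpha = {alpha:.3f}: {reject}")
```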
Chapter 9 final examples: Your turn
1. For each of the following, read the t statistic from the table and write its value in the space
provided.
a. A two-tailed test, a sample of 31 at 80% confidence t =
b. A one-tailed test (upper tail), a sample size of 22 at 99% confidence t =
c. A one-tailed test (lower tail), a sample size of 16 at 95% confidence t =
2. For each of the following, read the Z statistic from the table and write its value in
the space provided.
a. A two-tailed test at 85.3% confidence Z =
b. A one-tailed test (lower tail) at 87.7% confidence Z =
c. A one-tailed test (upper tail) at 97.61% confidence Z =
3. The average dinner bill for one person in Chattanooga has been $24. It is believed there has been a
significant increase in the average dinner prices. A sample of 36 dinner bills showed a mean of $27
with a standard deviation of $9.
a. At 95% confidence test to determine if there has been a significant increase in the average dinner
prices.
Ho:
Ha:
Conclusion:
b. Determine the p-value for the above and use it for the test
4. The ACT scores of a random sample of 6 UTC students are given below.
Student    ACT Score
   1          28
   2          22
   3          18
   4          23
   5          29
   6          24
At 95% confidence test to see if the average ACT scores of UTC students is significantly
different from 27.
CHAPTER 10 FORMULAS
I. Inferences About the Difference Between Two Population Means: σ1 and σ2 Known
Point Estimator of the Difference Between the Means of Two Populations: $\bar{x}_1 - \bar{x}_2$
Standard Error of $\bar{x}_1 - \bar{x}_2$
(the Standard Deviation of the sampling distribution of $\bar{x}_1 - \bar{x}_2$)
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
A. Interval Estimate of the Difference Between the Means of Two Populations
$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Margin of Error = $Z_{\alpha/2}\, \sigma_{\bar{x}_1 - \bar{x}_2} = Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
B. Hypothesis Testing (Means), Independent Samples
Test Statistic $Z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
Do is the hypothesized difference between µ1 and µ2. In most situations, Do = 0.
Decision Rules for P-Value Approach:
When Using the P-Value Approach, In All Cases Reject Ho if P-Value ≤ α
Decision Rules for Critical Value Approach:

Lower one-tailed test      Upper one-tailed test      Two-tailed test
of the form                of the form                of the form
Ho: µ1 - µ2 ≥ Do           Ho: µ1 - µ2 ≤ Do           Ho: µ1 - µ2 = Do
Ha: µ1 - µ2 < Do           Ha: µ1 - µ2 > Do           Ha: µ1 - µ2 ≠ Do
Reject Ho if:              Reject Ho if:              Reject Ho if:
Z ≤ -Zα                    Z ≥ Zα                     Z ≤ -Zα/2 or Z ≥ Zα/2
Chapter 10 Formulas (Continued)
II. Inferences about the Difference Between Two Population Means: σ1 and σ2 Unknown
A. Interval Estimate of the Difference Between the Means of Two Populations
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$
B. Hypothesis Testing (Means), Independent Samples
Test Statistic $t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
The degrees of freedom for t are given by
$df = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{S_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{S_2^2}{n_2}\right)^2}$
When computing the degrees of freedom, round to the lower integer value.
Decision rules are the same as those given above for the σ1 and σ2 Known cases;
simply substitute t for Z.
III. Inferences About the Difference Between Two Population Means: Matched Samples
A. Interval Estimate
$\bar{d} \pm t_{\alpha/2} \frac{S_d}{\sqrt{n}}$
B. Hypothesis Test
Test statistic $t = \frac{\bar{d} - \mu_d}{S_d / \sqrt{n}}$
where $S_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$
Chapter 10 Formulas (Continued)
IV. Analysis of Variance: Testing for the Equality of k Population Means
Hypotheses to be tested:
Ho: µ1 = µ2 = . . . = µk
Ha: Not all the population means are equal
where
µj = the mean of the jth population
k = the number of populations or treatments
nT = Total Number of Observations
The General Form of the ANOVA Table - Completely Randomized Design:

Source of             Sum of     Degrees of    Mean       Test Statistic
Variation             Squares    Freedom       Squares    F
Between Treatments    SSTR       k - 1         MSTR       F = MSTR / MSE
Within Treatments     SSE        nT - k        MSE
Total                 SST        nT - 1

Decision rules:
When using the p-value approach, reject Ho if the p-value ≤ α
When using the critical value approach, reject Ho if F = MSTR / MSE > Fα
CHAPTER FORMULAS
(Continued)
Sample Mean for Treatment j
$$\bar{x}_j = \frac{\sum_{i=1}^{n_j} x_{ij}}{n_j}$$
Sample Variance for Treatment j
$$S_j^2 = \frac{\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2}{n_j - 1}$$
where
$x_{ij}$ = the value of observation i for treatment j
$n_j$ = the number of observations for treatment j
The Overall Sample Mean (Grand Mean)
$$\bar{\bar{x}} = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}}{n_T}$$
where:
$$n_T = n_1 + n_2 + \dots + n_k$$
Mean Square due to Treatments (Between Treatments)
$$\text{MSTR} = \frac{\text{SSTR}}{k - 1}$$
where:
$$\text{SSTR} = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$$
Therefore:
$$\text{MSTR} = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1}$$
Mean Square due to Error (Within Treatments)
$$\text{MSE} = \frac{\text{SSE}}{n_T - k}$$
where:
$$\text{SSE} = \sum_{j=1}^{k} (n_j - 1) S_j^2$$
Therefore:
$$\text{MSE} = \frac{\sum_{j=1}^{k} (n_j - 1) S_j^2}{n_T - k}$$
SSE also can be computed as:
$$\text{SSE} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$$
Total Sum of Squares
$$\text{SST} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2$$
or:
SST = SSTR + SSE
General Form of an Interval Estimate for a Population Mean
$$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$$
I. Inferences About the Difference Between Two Population Means: σ1 and σ2 Known
A. Interval Estimate of the Difference Between the Means of Two Populations
Problem 1. In order to estimate the difference between the ages (in months) of computer
consulting firms in the East and the West of the United States, the following information was
gathered:

                                              East    West
Sample size                                   40      45
Sample mean (months)                          70      75
Population standard deviation σ (months)      5       7

Develop an interval estimate for the difference between the average age of the firms in the East
and the West. Let α = 0.03.
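The interval formula for the σ-known case can be checked numerically. The following is a minimal sketch, not the notes' own worked solution; the function name `z_interval_two_means` is an illustrative choice, and the critical value is taken from Python's standard-library normal distribution rather than a printed Z table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_two_means(x1, x2, sigma1, sigma2, n1, n2, alpha):
    """Interval estimate of mu1 - mu2 when both population sigmas are known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)          # Z_{alpha/2}
    std_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = x1 - x2                                   # point estimate
    return diff - z * std_error, diff + z * std_error


# Problem 1 inputs: East is population 1, West is population 2.
lower, upper = z_interval_two_means(70, 75, 5, 7, 40, 45, alpha=0.03)
print(f"{lower:.2f} to {upper:.2f} months")
```

Because the whole interval lies below zero, the sketch suggests that firms in the East are younger on average, under the stated assumption that both population standard deviations are known.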
B. Hypothesis Testing (Means), Independent Samples
Problem 2. Independent random samples taken at two local malls provided the following
information regarding purchases of the patrons at the two malls:

                                    Hamilton Place    Northgate
Sample size                         80                75
Average purchase                    $43               $40
Population standard deviation σ     $8                $6

a. Use the critical value approach and, at 95% confidence, test to determine whether or not there
is a significant difference between the average purchases of the patrons at the two malls.
b. Compute the p-value and interpret its meaning. Use it to answer the question in part "a".
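Both approaches in Problem 2 rest on the same Z statistic. The sketch below is an illustration under the assumptions that the test is two-tailed with D0 = 0 and α = 0.05; the helper name `z_test_two_means` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_means(x1, x2, sigma1, sigma2, n1, n2, d0=0.0):
    """Z statistic and two-tailed p-value for Ho: mu1 - mu2 = d0 (sigmas known)."""
    z = ((x1 - x2) - d0) / sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


# Hamilton Place is sample 1, Northgate is sample 2.
z, p_value = z_test_two_means(43, 40, 8, 6, 80, 75)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
reject_by_critical_value = abs(z) >= z_crit
reject_by_p_value = p_value <= alpha

print(f"Z = {z:.3f}, p-value = {p_value:.4f}")
print("Reject Ho:", reject_by_critical_value, reject_by_p_value)
```

The two decision rules always agree; computing both is only a consistency check.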
II. Inferences About the Difference Between Two Population Means: σ1 and σ2 Unknown
A. Interval Estimate of the Difference Between the Means of Two Populations
Problem 3. In order to estimate the difference between the average daily sales of two branches
of a department store, the following data have been gathered.

                                          Downtown Store    North Mall Store
Sample size                               n1 = 23 days      n2 = 26 days
Sample mean (in $1,000)                   x̄1 = 37           x̄2 = 34
Sample standard deviation (in $1,000)     S1 = 4            S2 = 5

Develop a 95% confidence interval for the difference between the two population means.
B. Hypothesis Testing (Means), Independent Samples
Problem 4. Refer to Problem 3 (above) and, at 95% confidence, test to determine if the average
daily sales of the Downtown Store (μ1) are significantly more than the average sales of the North
Mall Store (μ2). Use both the critical value approach and the p-value approach.
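The degrees-of-freedom formula for the σ-unknown case is tedious by hand, so a short sketch may help. It computes only the t statistic and the rounded-down degrees of freedom for the Problem 3 data; the critical value and p-value still have to come from a t table or Excel, because Python's standard library has no t distribution. The function name `unequal_variance_t` is an assumption made for illustration.

```python
from math import floor, sqrt


def unequal_variance_t(x1, x2, s1, s2, n1, n2, d0=0.0):
    """t statistic and degrees of freedom when sigma1 and sigma2 are unknown."""
    v1 = s1**2 / n1                      # S1^2 / n1
    v2 = s2**2 / n2                      # S2^2 / n2
    t = ((x1 - x2) - d0) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, floor(df)                  # round down, as the formula sheet requires


# Downtown Store is sample 1, North Mall Store is sample 2.
t_stat, df = unequal_variance_t(37, 34, 4, 5, 23, 26)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```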
III. Inferences About the Difference Between Two Population Means: Matched Samples
A. Interval Estimate
Problem 5. The daily production rates of a sample of workers in a factory before and after a
training program are shown below:

Worker    Before    After
1         6         10
2         10        13
3         10        9
4         8         11
5         7         9
6         11        12

Provide a 95% confidence interval for the difference between the mean production rates before
and after the training program.
B. Hypothesis Test
Problem 6. Refer to Problem 5 (above) and, at 95% confidence, test to see if the training
program was effective. That is, did the training program actually increase the production rates?
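For matched samples, everything is computed from the column of differences. The sketch below assumes the differences are defined as After minus Before and that the hypothesized mean difference is zero; both are choices made here for illustration, not statements from the notes.

```python
from math import sqrt
from statistics import mean, stdev

before = [6, 10, 10, 8, 7, 11]
after = [10, 13, 9, 11, 9, 12]

# Differences d_i = After - Before (assumed direction).
d = [a - b for a, b in zip(after, before)]
n = len(d)

d_bar = mean(d)               # mean difference
s_d = stdev(d)                # sample standard deviation, divisor n - 1
std_error = s_d / sqrt(n)
t_stat = (d_bar - 0) / std_error   # hypothesized mu_d = 0

print(f"d-bar = {d_bar}, s_d = {s_d:.4f}, t = {t_stat:.4f}, df = {n - 1}")
```

The same `d_bar` and `std_error` feed the interval estimate in Problem 5 once the table value of t with n - 1 = 5 degrees of freedom is looked up.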
IV. Analysis of Variance: Testing for the Equality of k Population Means
Completely Randomized Design
Problem 7. Ahmadi, Inc. uses three types of advertising (radio, newspaper, and television) in
three different geographical areas. The company is interested in determining whether there is a
significant difference in the effectiveness among the three different methods of advertising.
Sales (in $ millions) over a six-day period for the three geographical areas are shown below:

Area 1 (Radio)    Area 2 (Paper)    Area 3 (T.V.)
48                48                44
40                46                52
36                42                54
50                50                52
51                48                50
45                48                60

At 95% confidence, test to determine whether there is a significant difference in the effectiveness
among the three different methods of advertising.
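The ANOVA quantities for Problem 7 follow directly from the SSTR, SSE, MSTR, and MSE formulas. Here is a minimal sketch working from the raw observations; `one_way_anova` is a hypothetical helper name, and the critical F value must still be read from an F table or Excel.

```python
from statistics import mean


def one_way_anova(groups):
    """ANOVA quantities for a completely randomized design."""
    k = len(groups)
    n_t = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_t
    # Between-treatments sum of squares: sum of n_j * (x_bar_j - grand mean)^2
    sstr = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-treatments sum of squares: squared deviations from each group mean
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    mstr = sstr / (k - 1)
    mse = sse / (n_t - k)
    return {"SSTR": sstr, "SSE": sse, "SST": sstr + sse,
            "MSTR": mstr, "MSE": mse, "F": mstr / mse,
            "df": (k - 1, n_t - k)}


radio = [48, 40, 36, 50, 51, 45]
paper = [48, 46, 42, 50, 48, 48]
tv = [44, 52, 54, 52, 50, 60]

result = one_way_anova([radio, paper, tv])
print(result)
```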
Problem 8. Three universities in your state have decided to administer the same
comprehensive examination to the recipients of MBA degrees from the three institutions. From
each institution, a random sample of MBA recipients has been selected and given the test. The
following table shows the scores of the students from each university.

                            Northern      Central       Southern
                            University    University    University
                            56            62            94
                            85            97            72
                            65            91            93
                            86            82            78
                            93                          54
                                                        77
Sample Mean (x̄j)            77.0          83.0          78.0
Sample Variance (Sj²)       246.5         234.0         218.8

At α = 0.01, test to see if there is any significant difference in the average scores of the students
from the three universities. Note that the sample sizes are not equal.
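When sample means and variances are already tabulated, as in Problem 8, the ANOVA can be built from the summary statistics alone using SSE = Σ(nj - 1)Sj². The sketch below assumes sample sizes of 5, 4, and 6 (read off the table) and uses a hypothetical helper name.

```python
def anova_from_summary(sizes, means, variances):
    """One-way ANOVA from per-treatment n_j, x_bar_j, and S_j^2."""
    k = len(sizes)
    n_t = sum(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_t
    sstr = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    sse = sum((n - 1) * v for n, v in zip(sizes, variances))
    mstr = sstr / (k - 1)
    mse = sse / (n_t - k)
    return {"grand_mean": grand_mean, "SSTR": sstr, "SSE": sse,
            "MSTR": mstr, "MSE": mse, "F": mstr / mse,
            "df": (k - 1, n_t - k)}


# Northern, Central, Southern (unequal sample sizes).
summary = anova_from_summary(
    sizes=[5, 4, 6],
    means=[77.0, 83.0, 78.0],
    variances=[246.5, 234.0, 218.8],
)
print(summary)
```

An F value well below 1 means the variation between the university means is small relative to the variation within each sample.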
Problem 9. Part of an ANOVA table involving 8 groups for a study is shown below.

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F
Between Treatments     126               ?                     ?              ?
Within Treatments      240               ?                     ?
Total                  ?                 67

a. Complete all the missing values in the above table and fill in the blanks.
b. Use α = 0.05 to determine if there is any significant difference among the means
of the eight groups.
Problem 10. In a completely randomized experimental design, 11 experimental units were used
for each of the 4 treatments. Part of the ANOVA table is shown below.

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F
Between Treatments     1500              ?                     ?              ?
Within Treatments      ?                 ?                     ?
Total                  5500

a. Fill in the blanks in the above ANOVA table.
b. Use α = 0.05 to determine if there is any significant difference among the means
of the four groups.
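Partially filled ANOVA tables like those in Problems 9 and 10 can be completed from the identities SST = SSTR + SSE, df = (k - 1, nT - k), and F = MSTR/MSE. The sketch below is one way to organize that arithmetic; `complete_anova` is a hypothetical helper. It assumes nT = 68 for Problem 9 (total degrees of freedom 67) and nT = 4 × 11 = 44 for Problem 10.

```python
def complete_anova(k, n_t, sstr=None, sse=None, sst=None):
    """Fill in an ANOVA table given any two of SSTR, SSE, SST."""
    if sst is None:
        sst = sstr + sse
    elif sse is None:
        sse = sst - sstr
    elif sstr is None:
        sstr = sst - sse
    df_between, df_within = k - 1, n_t - k
    mstr = sstr / df_between
    mse = sse / df_within
    return {"SSTR": sstr, "SSE": sse, "SST": sst,
            "df_between": df_between, "df_within": df_within,
            "df_total": n_t - 1,
            "MSTR": mstr, "MSE": mse, "F": mstr / mse}


problem_9 = complete_anova(k=8, n_t=68, sstr=126, sse=240)
problem_10 = complete_anova(k=4, n_t=44, sstr=1500, sst=5500)

print(problem_9)
print(problem_10)
```

Each computed F is then compared with the table value of F at α = 0.05 for the corresponding numerator and denominator degrees of freedom.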
CHAPTER 11 FORMULAS
A. Interval Estimation of the Difference Between the Proportions of Two Populations
$$(\bar{p}_1 - \bar{p}_2) \pm Z_{\alpha/2} \, S_{\bar{p}_1 - \bar{p}_2}$$
where
$$S_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{\bar{p}_1 (1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2 (1 - \bar{p}_2)}{n_2}}$$
B. Hypothesis Test about the Difference Between the Proportions of Two Populations
The test statistic "Z" is:
$$Z = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{S_{\bar{p}_1 - \bar{p}_2}}$$
where
$$S_{\bar{p}_1 - \bar{p}_2} = \sqrt{\bar{p} (1 - \bar{p}) \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Assuming $p_1 = p_2$, the pooled proportion is computed as
$$\bar{p} = \frac{n_1 \bar{p}_1 + n_2 \bar{p}_2}{n_1 + n_2} = \frac{X_1 + X_2}{n_1 + n_2}$$
C. Goodness of Fit Test
The test statistic "$\chi^2$" is:
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
D. Test of Independence
The test statistic "$\chi^2$" is:
$$\chi^2 = \sum_{i} \sum_{j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$
Chapter 11
A. Interval Estimation of the Difference Between the Proportions of Two Populations
Problem 1. In a sample of 400 Democrats, 60 said that they support the president's new tax
proposal, while in a sample of 500 Republicans, only 80 said they support it. Determine a 90%
confidence interval estimate for the difference between the proportions of individuals in
the two parties who support the proposal.
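The interval for a difference in proportions uses the unpooled standard error. A minimal sketch for the Problem 1 counts follows; treating Democrats as sample 1 and Republicans as sample 2 is an assumption made here, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_diff_interval(x1, n1, x2, n2, confidence):
    """Interval estimate of p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)            # Z_{alpha/2}
    std_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * std_error, diff + z * std_error


# Democrats are sample 1, Republicans are sample 2.
lower, upper = proportion_diff_interval(60, 400, 80, 500, confidence=0.90)
print(f"{lower:.4f} to {upper:.4f}")
```

An interval that contains zero gives no evidence, at this confidence level, that the two proportions differ.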
B. Hypothesis Test about the Difference Between the Proportions of Two Populations
Problem 2. In a sample of 600 Republicans, 480 were in favor of the President's foreign
policies, while in a sample of 900 Democrats, 675 were in favor of his policies.
a. At 95% confidence, test to see if there is a significant difference in the proportions of the
Democrats and the Republicans who are in favor of the President's foreign policies.
b. Compute the p-value and use it to determine if the percentage of Republicans who
favored the President's foreign policies is significantly more than the percentage of
Democrats.
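For the hypothesis test, the standard error is built from the pooled proportion. The sketch below, with a hypothetical function name, takes Republicans as sample 1 and Democrats as sample 2 and reports both the two-tailed p-value (part a) and the upper-tail p-value (part b).

```python
from math import sqrt
from statistics import NormalDist


def pooled_proportion_z(x1, n1, x2, n2):
    """Z statistic for Ho: p1 - p2 = 0 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error, pooled


# Republicans are sample 1, Democrats are sample 2.
z, pooled = pooled_proportion_z(480, 600, 675, 900)

p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))   # part a: "different"
p_upper_tail = 1 - NormalDist().cdf(z)              # part b: "more than"

print(f"pooled p = {pooled:.2f}, Z = {z:.3f}")
print(f"two-tailed p-value = {p_two_tailed:.4f}, upper-tail p-value = {p_upper_tail:.4f}")
```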
C. Goodness of Fit Test
Problem 3. The AMA Journal reported the following frequencies of deaths due to cardiac
arrest for each day of the week.

Cardiac Death by Day of the Week
Day          f
Monday       40
Tuesday      17
Wednesday    16
Thursday     29
Friday       15
Saturday     20
Sunday       17

At 95% confidence, determine whether the number of deaths is uniform over the week.
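Under the hypothesis of a uniform distribution, every day has the same expected frequency, so the goodness-of-fit statistic reduces to a short sum. This is a sketch of that arithmetic only; the critical chi-square value for k - 1 degrees of freedom still comes from a table or Excel.

```python
observed = {
    "Monday": 40, "Tuesday": 17, "Wednesday": 16, "Thursday": 29,
    "Friday": 15, "Saturday": 20, "Sunday": 17,
}

total = sum(observed.values())
k = len(observed)
expected = total / k          # same expected count e_i for each day under Ho

# Chi-square statistic: sum of (f_i - e_i)^2 / e_i
chi_square = sum((f - expected) ** 2 / expected for f in observed.values())
df = k - 1

print(f"expected per day = {expected}, chi-square = {chi_square:.4f}, df = {df}")
```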
D. Test of Independence - Contingency Tables
Problem 4. Dr. Ahmadi's diet pills are supposed to cause significant weight loss. The
following table shows the results of a recent study where some individuals took the diet pills and
some did not.

                 No weight loss    Weight loss    Total
Diet pills       80                100            180
No diet pills    20                100            120
Total            100               200            300

With 95% confidence, test to see if losing weight is dependent on taking the diet pills.
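In a test of independence, each expected frequency is (row total × column total) / grand total. The following sketch applies that rule to the Problem 4 table; the function name is an illustrative assumption.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, f_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n   # expected frequency
            chi_square += (f_ij - e_ij) ** 2 / e_ij
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, df


# Rows: diet pills, no diet pills. Columns: no weight loss, weight loss.
chi_square, df = chi_square_independence([[80, 100], [20, 100]])
print(f"chi-square = {chi_square:.4f}, df = {df}")
```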
CHAPTER 12 FORMULAS
Simple Linear Regression Model
$$y = \beta_0 + \beta_1 x + \varepsilon$$
Simple Linear Regression Equation
$$E(y) = \beta_0 + \beta_1 x$$
Least Squares Criterion
$$\min \sum (y_i - \hat{y}_i)^2$$
Estimated Simple Linear Regression Equation
$$\hat{y} = b_0 + b_1 x$$
where
$\hat{y}$ = the estimated value of the dependent variable
$b_1$ = the slope of the line
$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$
and
$b_0$ = the y-intercept
$$b_0 = \bar{y} - b_1 \bar{x}$$
Sum of Squares Due to Regression
$$\text{SSR} = \frac{\left[\sum (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum (x_i - \bar{x})^2}$$
Total Sum of Squares
$$\text{SST} = \sum (y_i - \bar{y})^2$$
Sum of Squares Due to Error
$$\text{SSE} = \sum (y_i - \hat{y}_i)^2$$
Also:
SST = SSR + SSE
Coefficient of Determination
$$r^2 = \frac{\text{SSR}}{\text{SST}}$$
Also
$$r^2 = 1 - \frac{\text{SSE}}{\text{SST}}$$
Sample Correlation Coefficient
$$r = (\text{the sign of } b_1) \sqrt{\text{Coefficient of Determination}} = \pm \sqrt{r^2}$$
where
$b_1$ = the slope of the regression equation
Mean Square Error (Estimate of $\sigma^2$)
$$s^2 = \text{MSE} = \frac{\text{SSE}}{n - 2}$$
Standard Error of the Estimate
$$s = \sqrt{\text{MSE}}$$
t Test for Significance of the Slope of the Regression Equation
Ho: $\beta_1 = 0$
Ha: $\beta_1 \ne 0$
t statistic:
$$t = \frac{b_1}{s_{b_1}}$$
where $s_{b_1}$ (the estimated standard deviation of $b_1$) is:
$$s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$$
Reject Ho if $t \le -t_{\alpha/2}$ or $t \ge t_{\alpha/2}$ (degrees of freedom = n - p - 1)
F Test for Significance of the Linear Regression Model (ANOVA)
Ho: $\beta_1 = 0$ (i.e., the regression model is NOT significant)
Ha: $\beta_1 \ne 0$ (the regression model IS significant)
ANOVA Table

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    Test Statistic F
Regression             SSR               p                     MSR            MSR/MSE
Error (Residual)       SSE               n - p - 1             MSE
Total                  SST               n - 1

where:
p = the number of independent variables
n = the sample size
Reject Ho if the test statistic F > critical F
Confidence Interval Estimate for the Mean Value of y, that is $E(y_p)$
$$\hat{y}_p \pm t_{\alpha/2} \, s_{\hat{y}_p}$$
Estimated Standard Deviation of $\hat{y}_p$
$$s_{\hat{y}_p} = s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
Remember:
$$s = \sqrt{\text{MSE}}$$
Chapter 12
Simple (Bivariate) Linear Regression and Correlation
Problem 1. Ahmadi, Inc. is a microcomputer producer. The following data represent Ahmadi's
yearly sales volume and their advertising expenditure over a period of 8 years.

Year    Sales (Y)          Advertising (X)
        (In $1,000,000)    (In $10,000)
1996    15                 32
1997    16                 33
1998    18                 35
1999    17                 34
2000    16                 36
2001    19                 37
2002    19                 39
2003    24                 42

a. Develop a scatter diagram of sales versus advertising.
b. Use the method of least squares to compute an estimated regression line between sales and
advertising.
c. If the company's advertising expenditure is $400,000, what are the predicted sales? Give the
answer in dollars.
d. What does the slope of the estimated regression line indicate?
e. Compute the coefficient of determination and fully interpret its meaning.
f. Use the F test to determine whether or not the regression model is significant.
Let α = 0.05.
g. Use the t test to determine whether the slope of the regression model is significant.
Let α = 0.05.
h. Explain the basic assumptions about the error term in regression.
i. Develop a 95% confidence interval for predicting the average sales for the years when
$400,000 was spent on advertising.
j. Use Excel and solve the above problems.
k. Using Excel, determine the regression equation between sales and time (where 1996 = 1).
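The least squares computations in parts b through g can be reproduced from the Chapter 12 formulas. The notes call for Excel; the Python sketch below is offered only as a cross-check, and `simple_regression` is a hypothetical helper name. Advertising of $400,000 corresponds to x = 40 because X is measured in units of $10,000.

```python
from math import sqrt

sales = [15, 16, 18, 17, 16, 19, 19, 24]            # Y, in $1,000,000
advertising = [32, 33, 35, 34, 36, 37, 39, 42]      # X, in $10,000


def simple_regression(x, y):
    """Least squares fit plus the quantities used in the F and t tests."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sxy**2 / sxx
    sse = sst - ssr
    mse = sse / (n - 2)
    return {"b0": b0, "b1": b1, "r2": ssr / sst,
            "F": ssr / mse,                 # MSR / MSE with p = 1
            "t": b1 / sqrt(mse / sxx)}      # b1 / s_b1


fit = simple_regression(advertising, sales)
predicted = fit["b0"] + fit["b1"] * 40          # in $1,000,000
predicted_dollars = predicted * 1_000_000

print(fit)
print(f"Predicted sales: ${predicted_dollars:,.0f}")
```

In simple regression the F statistic equals the square of the t statistic, which is a quick way to check the two tests against each other.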
CHAPTER 13 FORMULAS
Multiple Regression Model
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$$
Multiple Regression Equation
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$$
Estimated Multiple Regression Equation
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$$
Least Squares Criterion
$$\min \sum (y_i - \hat{y}_i)^2$$
Relationship among SST, SSR, and SSE
SST = SSR + SSE
Multiple Coefficient of Determination
$$R^2 = \frac{\text{SSR}}{\text{SST}}$$
Also
$$R^2 = 1 - \frac{\text{SSE}}{\text{SST}}$$
Adjusted Multiple Coefficient of Determination
$$R_a^2 = 1 - (1 - R^2) \left(\frac{n - 1}{n - p - 1}\right)$$
Excel's ANOVA Table

ANOVA
              df           SS     MS                       F              Significance F
Regression    p            SSR    MSR = SSR/p              F = MSR/MSE
Residual      n - p - 1    SSE    MSE = SSE/(n - p - 1)
Total         n - 1        SST
F Test for Overall Significance in Multiple Regression
Ho: $\beta_1 = \beta_2 = \dots = \beta_p = 0$ (the model is not significant)
Ha: One or more of the coefficients is not equal to zero (the model is significant)
Test Statistic
$$F = \frac{\text{MSR}}{\text{MSE}}$$
(See Excel's ANOVA table)
When using the p-value approach, reject Ho if the p-value ≤ α.
When using the critical value approach, reject Ho if the test statistic $F \ge F_{\alpha}$,
where $F_{\alpha}$ is based on an F distribution with p numerator degrees of freedom and (n - p - 1)
denominator degrees of freedom.
t Test for Individual Significance in Multiple Regression
Ho: $\beta_i = 0$
Ha: $\beta_i \ne 0$
for any parameter $\beta_i$
Test Statistic
$$t = \frac{b_i}{s_{b_i}}$$
When using the p-value approach, reject Ho if the p-value ≤ α.
When using the critical value approach, reject Ho if the test statistic $t \ge t_{\alpha/2}$ or if $t \le -t_{\alpha/2}$,
where $t_{\alpha/2}$ is based on a t distribution with (n - p - 1) degrees of freedom.
Chapter 13
Multiple Regression and Correlation
Problem 1. Ahmadi, Inc. is a microcomputer producer. The following data represent Ahmadi's
yearly sales volume, their advertising expenditure, and the number of individuals in the sales
force over a period of 15 years:

        (Y)             X1             X2             X3
Year    Sales           Advertising    Sales Force    Time
        ($1,000,000)    ($10,000)      (100)
1989    15              32             10             1
1990    16              33             12             2
1991    18              35             11             3
1992    17              34             14             4
1993    16              36             16             5
1994    19              37             18             6
1995    19              39             17             7
1996    24              42             20             8
1997    25              44             25             9
1998    27              40             22             10
1999    30              45             27             11
2000    33              50             28             12
2001    38              49             30             13
2002    40              50             30             14
2003    45              55             35             15

a. Using Excel, enter the above data in a file and save the file. Print the file as well as the
results of all of the following parts.
b. Run the correlation analysis relating sales (Y) and all of the independent variables. (Do
not include the column of Year.) Explain the results. Discuss the concept of
multicollinearity.
c. Run the regression analysis relating sales (Y) and advertising (X1). Explain the results.
d. Run a regression analysis relating sales (Y) and two independent variables X1 and X2.
Explain the results.
e. Run a regression analysis relating sales (Y) and two independent variables X1 and X3.
Explain the results.
f. Using the model developed in part "e", predict sales for 2004 assuming we are planning to
advertise $700,000.
g. Run a regression analysis relating sales (Y) and Time (X3). Explain the results.
h. Using the model developed in part "g", predict sales for 2008.
i. Run a regression analysis relating sales (Y) and three independent variables X1, X2, and
X3. Explain the results.
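Parts b, g, and h can be cross-checked outside Excel. The sketch below computes the correlation of sales with each independent variable and fits the sales-versus-time trend line; the helper names `pearson_r` and `fit_line` are hypothetical, and 2008 is taken as period 20 because 1989 is period 1. The multiple regressions in the other parts need matrix computations and are left to Excel as the problem directs.

```python
from math import sqrt

sales = [15, 16, 18, 17, 16, 19, 19, 24, 25, 27, 30, 33, 38, 40, 45]
advertising = [32, 33, 35, 34, 36, 37, 39, 42, 44, 40, 45, 50, 49, 50, 55]
sales_force = [10, 12, 11, 14, 16, 18, 17, 20, 25, 22, 27, 28, 30, 30, 35]
time = list(range(1, 16))


def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least squares intercept b0 and slope b1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    return y_bar - b1 * x_bar, b1


# Part b: correlation of sales with each independent variable.
correlations = {
    "advertising": pearson_r(advertising, sales),
    "sales_force": pearson_r(sales_force, sales),
    "time": pearson_r(time, sales),
}

# Parts g and h: trend line and forecast for 2008 (period 20).
b0, b1 = fit_line(time, sales)
forecast_2008 = b0 + b1 * 20

print(correlations)
print(f"sales-hat = {b0:.4f} + {b1:.4f} * time; 2008 forecast = {forecast_2008:.2f}")
```

High correlations among the independent variables themselves, which can be computed with the same `pearson_r` helper, are the signal of multicollinearity that part b asks about.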
Chapter 13
Multiple Regression & Correlation With Dummy Variables
Problem 2. Ahmadi, Inc. is a microcomputer producer. The following data represent Ahmadi's
yearly sales volume, their advertising expenditure, and whether in a given year they used all
Television advertising (X2 = 0) or used Multimedia advertising (X2 = 1).

        (Y)             X1             X2
Year    Sales           Advertising    Dummy Variable
        ($1,000,000)    ($10,000)      (0,1)
1989    15              32             0
1990    16              33             1
1991    18              35             1
1992    17              34             1
1993    16              36             0
1994    19              37             1
1995    19              39             0
1996    24              42             0
1997    25              44             1
1998    27              40             0
1999    30              45             1
2000    33              50             1
2001    38              49             0
2002    40              50             0
2003    45              55             1

The Regression procedure of Excel was used on the above data, and parts of the results are shown
below.
a. Fill in all the blanks in the output below.
b. Write the estimated regression equation.
c. Using the results shown below, predict sales for the year 2004 assuming we are
planning to use $700,000 for television advertising only.
d. Using the results shown below, predict sales for the year 2004 assuming we are
planning to use $700,000 for multimedia advertising.
SUMMARY OUTPUT
Multiple R           ___________?
R Square             ___________?
Adjusted R Square    ___________?
Standard Error       2.715
Observations         ___________?

ANOVA
              df              SS              MS              F               Significance F
Regression    ___________?    1243.274        ___________?    ___________?    8.59E-08
Residual      ___________?    ___________?    ___________?
Total         ___________?    ___________?

               Coefficients    Standard Error    t Stat          P-value
Intercept      -28.462401      4.285592715       ___________?    ___________?
Advertising    1.31332227      0.10113336        ___________?    ___________?
Dummy          -0.8296375      1.406029116       ___________?    ___________?
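Most blanks in the Excel output follow from the Chapter 13 formulas and the few numbers that are given. The sketch below is one way to organize that arithmetic and is not the notes' own solution. It assumes n = 15 observations and p = 2 independent variables, and it recovers MSE from the rounded standard error 2.715, so the derived values are approximate. The p-values are not computed because the standard library has no t distribution.

```python
n, p = 15, 2                       # observations and independent variables (assumed)
ssr = 1243.274                     # regression sum of squares from the output
std_error = 2.715                  # standard error of the estimate (rounded)

coefficients = {"Intercept": -28.462401, "Advertising": 1.31332227, "Dummy": -0.8296375}
standard_errors = {"Intercept": 4.285592715, "Advertising": 0.10113336, "Dummy": 1.406029116}

# ANOVA blanks
df_regression, df_residual, df_total = p, n - p - 1, n - 1
mse = std_error**2                 # s = sqrt(MSE)
sse = mse * df_residual
sst = ssr + sse
msr = ssr / df_regression
f_stat = msr / mse

# Regression statistics
r_square = ssr / sst
multiple_r = r_square**0.5
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - p - 1)

# t Stat column: coefficient divided by its standard error
t_stats = {name: coefficients[name] / standard_errors[name] for name in coefficients}


def predict(advertising, dummy):
    """Estimated sales in $1,000,000; advertising is in units of $10,000."""
    return (coefficients["Intercept"]
            + coefficients["Advertising"] * advertising
            + coefficients["Dummy"] * dummy)


tv_only = predict(70, 0)           # $700,000, television only
multimedia = predict(70, 1)        # $700,000, multimedia

print(df_regression, df_residual, df_total)
print(round(sse, 3), round(sst, 3), round(msr, 3), round(mse, 3), round(f_stat, 2))
print(round(multiple_r, 4), round(r_square, 4), round(adjusted_r_square, 4))
print({k: round(v, 3) for k, v in t_stats.items()})
print(round(tv_only, 4), round(multimedia, 4))
```

The small t statistic for the dummy variable, relative to the other two, is the number to examine when judging whether the type of advertising matters in this model.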
Your Turn - One Final Example
Significance of variables and other issues
Problem 3. Ahmadi, Inc. produces several models of computer printers. Data on a few
variables for one of the company's printers are presented below.

Sales (Y)          Advertising (X1)    Price (X2)    Competitor's Price (X3)    Time (X4)     Rating (X5)
(In $1,000,000)    (In $1,000)         (In $100)     (In $100)                  (In Years)    (0 to 10)
1578               588                 21            20                         1             4
1741               600                 20            22                         2             2
2295               600                 17            19                         3             4
2134               780                 21            21                         4             8
2035               750                 21            21                         5             6
2408               820                 19            21                         6             8
2337               810                 20            20                         7             8
2468               840                 25            22                         8             6
2533               700                 25            24                         9             8
2800               970                 16            18                         10            8
2729               920                 15            21                         11            6
2799               950                 24            23                         12            6
3264               980                 17            23                         13            6
3367               1167                19            17                         14            4
3289               800                 12            18                         15            6
3453               1255                17            16                         16            6
5031               1706                17            25                         17            8
6125               1890                12            26                         18            8
6519               1996                17            28                         19            8
4586               1700                15            18                         20            10
4876               1706                21            24                         21            4
4675               1888                14            23                         22            6
3473               1300                19            24                         23            10
3669               1500                18            21                         24            8
4167               1400                24            23                         25            4

a. Enter the above data into an Excel file and save the file. Print the file and the results of
all of the following parts.
b. Run a correlation analysis (among all variables) and print the results. Fully discuss the
meaning of the correlation coefficients. Be sure to discuss the concept of
multicollinearity.
c. Run a regression analysis relating sales (Y) and ALL the independent variables. Fully
explain the results.
d. Drop the variable(s) that at 95% confidence were not significant in part "c" and run a new
regression analysis. Fully explain your results.