I. Nyirenda, Probability Notes 2021-2022.

INTRODUCTION AND DESCRIPTIVE STATISTICS.
THE SCIENCE OF STATISTICS.
STATISTICS: Is the science of data. This involves collecting, classifying, summarizing,
organizing, analyzing and interpreting numerical information.
TYPES OF STATISTICAL APPLICATIONS.

Descriptive Statistics

Inferential Statistics
DESCRIPTIVE STATISTICS:
Utilizes numerical and graphical methods to look for patterns in a data set, to summarize
the information revealed in the data set and to present that information in a convenient
form.

Collect Data: Ex. Survey

Present Data: Ex. Tables and Graphs

Characterize Data: Ex. Sample Mean = $\frac{\sum X_i}{n}$
INFERENTIAL STATISTICS:
Utilizes sample data to make estimates, decisions, predictions or other generalizations
about a larger set of data.

Estimation: Ex. Estimate the population mean weight using the sample mean
weight

Hypothesis Testing: Ex. Test the claim that the population mean weight is 120
pounds

Drawing conclusions and/or making decisions concerning a population
based on sample results.
FUNDAMENTAL ELEMENTS OF STATISTICS.

A POPULATION: Is a set of units in which we are interested. Typically, there are
too many experimental units in a population to consider every one. If we can
examine every single one, we conduct a CENSUS.

A SAMPLE: Is a subset of the POPULATION.

AN EXPERIMENTAL UNIT: Is an object about which we collect data.
Ex. Person, Place, Thing, Event...
L1-Statistics
Lecture Notes by Dr. Mahabi
Compiled by Ibrahim Nyirenda, 2021-1

A VARIABLE: Is a characteristic or property of an individual unit. Not surprisingly, the
values of these characteristics vary from unit to unit.

A MEASURE OF RELIABILITY: Is a statement about the degree of uncertainty
associated with a statistical inference. Ex. Based on our analysis, we think 56%
of soda drinkers prefer Pepsi to Coke, ± 5%.
DESCRIPTIVE STATISTICS:
1. The population or sample of interest
2. One or more variables to be investigated
3. Tables, graphs or numerical summary tools
4. Identification of patterns in the data
INFERENTIAL STATISTICS:
1. Population of interest
2. One or more variables to be investigated
3. The sample of population units
4. The inference about the population based on the sample data
5. A measure of reliability of the inference
TYPES OF DATA.

Quantitative Data

Categorical (Qualitative) Data
QUANTITATIVE DATA:
Are measurements that are recorded on a naturally occurring numerical scale. Ex. Age,
GPA, Salary, Cost of books this semester...
CATEGORICAL (QUALITATIVE) DATA:
Are measurements that cannot be recorded on a natural numerical scale, but are
recorded in categories. Ex. Living on/off campus, Major, Gender...
METHODS FOR DESCRIBING SETS OF DATA.
QUANTITATIVE DATA PRESENTATION:
1. ORDERED ARRAY:

Organizes data to focus on major features

Data placed in rank order from Smallest to Largest
Example:

Data in Raw Form (as Collected) are: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38...

Data in Ordered Array are: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41...
2. STEM - AND - LEAF DISPLAY:

Shows the number of observations that share a common value (the stem) and
the precise value of each observation (the leaf)
Example:

Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41...
STEM    LEAVES    FROM
2       144677    21, 24, 24, 26, 27, 27
3       028       30, 32, 38
4       1         41
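The display above can be reproduced with a few lines of Python. This is only a sketch: the function name `stem_and_leaf` is made up for illustration, and it assumes non-negative whole numbers whose stem is the tens digit and whose leaf is the units digit.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group values by tens digit (stem) and collect units digits (leaves)."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 24 -> stem 2, leaf 4
        groups[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in sorted(groups.items())}


# Raw data from the ordered-array example in these notes
data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```

Sorting the values first keeps the leaves in ascending order within each stem.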
3. FREQUENCY DISTRIBUTION TABLE:

Determine Range

Select Number of Classes, usually between 5 and 15 inclusive

Compute Class Intervals (Width)

Determine Class Boundaries (Limits)

Compute Class Midpoints

Count Observations and assign to Classes
Example-1:
Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
CLASS          FREQUENCY
15 but < 25    3
25 but < 35    5
35 but < 45    2
Example-2:
Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
CLASS          MIDPOINT    FREQUENCY
15 but < 25    20          3
25 but < 35    30          5
35 but < 45    40          2
Where:
15 is a Lower Boundary, also called a Lower Limit
25 is an Upper Boundary, also called an Upper Limit
15 but < 25 is a Class Interval; its Width is 25 - 15 = 10
Midpoint = (Lower Boundary + Upper Boundary) / 2, e.g. (15 + 25) / 2 = 20
3.1 RELATIVE FREQUENCY DISTRIBUTION TABLE:
Class relative frequency = (Class frequency) / n
Where: n is the sample size
CLASS          RF (Prop.)
15 but < 25    .3
25 but < 35    .5
35 but < 45    .2
3.2 RELATIVE FREQUENCY PERCENTAGE DISTRIBUTION TABLE:
Class percentage = (Class relative frequency) x 100
CLASS          RF%
15 but < 25    30.0
25 but < 35    50.0
35 but < 45    20.0
3.3 CUMULATIVE RELATIVE FREQUENCY PERCENTAGE DISTRIBUTION
TABLE:
CLASS          CRF%     HOW OBTAINED
15 but < 25    30.0     Initial value remains
25 but < 35    80.0     30% + 50%
35 but < 45    100.0    80% + 20%
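The frequency, relative frequency, percentage and cumulative percentage columns above can all be produced in one pass. The sketch below assumes equal-width classes of the form "lower but < upper"; the function name `frequency_table` and the dictionary keys are illustrative choices, not part of the notes.

```python
def frequency_table(values, lower, width, n_classes):
    """Build one row per class: midpoint, frequency, RF, RF% and CRF%."""
    n = len(values)
    rows = []
    cumulative_count = 0
    for j in range(n_classes):
        lo = lower + j * width
        hi = lo + width
        freq = sum(1 for v in values if lo <= v < hi)  # "lo but < hi"
        cumulative_count += freq
        rows.append({
            "class": f"{lo} but < {hi}",
            "midpoint": (lo + hi) / 2,
            "freq": freq,
            "rf": freq / n,
            "rf_pct": 100 * freq / n,
            "crf_pct": 100 * cumulative_count / n,
        })
    return rows


raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
table = frequency_table(raw, lower=15, width=10, n_classes=3)
for row in table:
    print(row["class"], row["midpoint"], row["freq"], row["rf_pct"], row["crf_pct"])
```

Accumulating counts rather than proportions avoids small floating-point drift in the cumulative column.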
3.4 HISTOGRAM:
Is a graph of the frequency or relative frequency of a variable.

Class intervals make up the horizontal axis (Lower Boundaries)

The frequencies or relative frequencies are displayed on the vertical axis
Example:
CLASS          FREQUENCY
15 but < 25    3
25 but < 35    5
35 but < 45    2
3.5 POLYGON:
CLASS          MIDPOINT    FREQUENCY
15 but < 25    20          3
25 but < 35    30          5
35 but < 45    40          2
Where:

Midpoints make up the horizontal axis

The frequencies or relative frequencies are displayed on the vertical axis
3.6 OGIVE:

Class intervals make up the horizontal axis (Lower Boundaries)

The cumulative relative frequencies % are displayed on the vertical axis
CATEGORICAL (QUALITATIVE) DATA PRESENTATION:
1. SUMMARY TABLE:

Lists categories and number of elements in category

Number of elements in category obtained by tallying responses in category

The summary may show frequencies (counts), percentages, or both
Example:
MAJOR         COUNTS
Accounts      130
Economics     20
Management    50
Total         200
1.1 PIE CHART:

Shows breakdown of total quantity into categories

Useful for showing relative differences

Angle Size = (360°)(Percentage)
Where: Percentage = Counts / Total, e.g. 130/200 = 65% for Accounts
Therefore: 360° x 10% = 36° for Economics
360° x 25% = 90° for Management and
360° x 65% = 234° for Accounts
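The angle rule can be checked directly from the summary table of majors. This short sketch uses only the counts given in the notes; the variable names are illustrative.

```python
# Counts from the summary table of majors
counts = {"Accounts": 130, "Economics": 20, "Management": 50}
total = sum(counts.values())

# Angle Size = 360 degrees x (category share of the total)
angles = {major: 360 * count / total for major, count in counts.items()}

for major, angle in angles.items():
    print(f"{major}: {100 * counts[major] / total:.0f}% -> {angle:.0f} degrees")
```

The three angles add up to 360°, which is a quick sanity check on the percentages.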
1.2 BAR CHART:

Frequencies, also % are displayed on the horizontal axis (Bars' length)

Majors are displayed on the vertical axis

Horizontal Bars for categorical variables

The distance between Bars as you plot the chart should be 1/2 to 1 Bar width

The chart shall have equal Bar widths

Horizontal Bar Chart
Example: Plot a Bar Chart using the table hereunder
MAJOR         COUNTS
Accounts      130
Economics     20
Management    50
Total         200
Solution: [Figure: horizontal bar chart of the counts for each major]
1.3 PARETO DIAGRAM:

% are displayed on the vertical axis (Bars' length)

Majors are displayed on the horizontal axis

The chart shall have equal Bar widths

Bars to be in descending order (Largest to Smallest)

Vertical Bar Chart
Example:
NOTE: The diagram does not correspond to the previous Data
NUMERICAL DESCRIPTIVE MEASURES.
SUMMARY MEASURES:
CENTRAL TENDENCY    VARIATION
Arithmetic Mean     Range (Interquartile)
Geometric Mean      Variance
Median              Coefficient of Variation
Mode                Standard Deviation
1. MEASURES OF CENTRAL TENDENCY:
There are various ways to describe the central, most common or middle value in a
distribution or set of data such as:

The arithmetic mean

The geometric mean

The median

The mode
Numerical measures of Central Tendency summarize data sets numerically. They help
answer questions such as:

Are there certain values that seem more typical for the data?

How typical are they?
Therefore: Central Tendency is the value or values around which the data tend to
cluster while Variability shows how strongly data are clustered around those values.
1.1 THE MEAN:
Of a set of quantitative data is the sum of the observed values divided by the number of
values (sample size).
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Where by:
The sample mean is typically denoted by x-bar, but the population mean is denoted by
the Greek symbol μ.
n: Sample size
N: Population size
Example:
If x1 = 1; x2 = 2; x3 = 3 and x4 = 4, find the mean.
Solution:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ = (1 + 2 + 3 + 4)/4 = 10/4 = 2.5
1.2 THE MEDIAN (M):
Of a set of quantitative data is the value which is located in the middle of the data,
arranged from lowest to highest values (or vice versa), with 50% of the observations
above and 50% below.
In order to find Median (M):

Arrange the n measurements from smallest to largest

If n is odd, M is the middle number

If n is even, M is the average of the middle two numbers
1.3 THE MODE:
Is the most frequently observed value.
For grouped data, the modal class is the class with the highest relative frequency, and its midpoint is taken as the mode.
1.4 THE GEOMETRIC MEAN:
Equals the nth root of the product of all observations or values.
For a set of values: x1, x2, x3, ........., xn
Geometric mean equals:
$GM = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n}$
Example:
Jim has 20 problems to do for homework. Some are harder than others and take more
time to solve. We take a random sample of 9 problems. Find the mean (arithmetic and
geometric), median and mode for the number of minutes Jim spends on his homework.
Problem #    Time spent (Minutes)
01           12
02           4
03           3
04           8
05           7
06           5
07           4
08           9
09           11
Data:
Sample size (n) = 9
Problems 1 through 9 = x1, x2, x3 … x9, respectively
Solution-1, arithmetic mean (AM):
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ = (12 + 4 + 3 + 8 + 7 + 5 + 4 + 9 + 11)/9 = 63/9 = 7 minutes.
Solution-2, geometric mean (GM):
GM = (12 x 4 x 3 x 8 x 7 x 5 x 4 x 9 x 11)^(1/9) = (15,966,720)^(1/9) ≈ 6.31 minutes.
Solution-3, median (M):
Arrange the n measurements from smallest to largest as follows: 3, 4, 4, 5, 7, 8, 9, 11, 12
Position of the median = (n+1)/2 = (9+1)/2 = 5
The 5th ordered observation is 7, so the Median is 7.
Solution-4, mode:
Arrange the n measurements from smallest to largest as follows: 3, 4,4,5,7,8,9,11,12
Only the value 4 occurs >1 time. Then, the Mode is 4.
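The four solutions can be verified with Python's standard `statistics` module. This is a cross-check rather than part of the original notes; `statistics.geometric_mean` requires Python 3.8 or later.

```python
import statistics

# Minutes Jim spent on the 9 sampled problems
times = [12, 4, 3, 8, 7, 5, 4, 9, 11]

arithmetic_mean = statistics.mean(times)
geometric_mean = statistics.geometric_mean(times)
median = statistics.median(times)
mode = statistics.mode(times)

print(f"AM = {arithmetic_mean}, GM = {geometric_mean:.2f}, "
      f"Median = {median}, Mode = {mode}")
```

The geometric mean comes out below the arithmetic mean, which is always the case for positive values that are not all equal.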
1.5 APPROXIMATING THE MEAN FROM A FREQUENCY DISTRIBUTION:
Used when the only source of data is a frequency distribution.
$\bar{x} \approx \frac{\sum_{j=1}^{c} m_j f_j}{n}$
Where: n = sample size
c = number of classes in the frequency distribution
mj = midpoint of the jth class
fj = frequency of the jth class
Example: Approximate the mean using the table hereunder.
CLASS          MIDPOINT    FREQUENCY
10 but < 20    15          3
20 but < 30    25          6
30 but < 40    35          5
40 but < 50    45          4
50 but < 60    55          2
Total                      20
Solution:
= ((15x3) + (25x6) + (35x5) + (45x4) + (55x2))/20
= (45 + 150 + 175 + 180 + 110)/20
= 660/20
= 33
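The same approximation takes a few lines of Python: multiply each class midpoint by its frequency, add the products, and divide by the sample size. The variable names are illustrative.

```python
# Midpoints and frequencies from the table above
midpoints = [15, 25, 35, 45, 55]
frequencies = [3, 6, 5, 4, 2]

n = sum(frequencies)  # sample size
weighted_sum = sum(m * f for m, f in zip(midpoints, frequencies))
approx_mean = weighted_sum / n

print(f"n = {n}, sum of m*f = {weighted_sum}, approximate mean = {approx_mean}")
```

The result is only an approximation because every observation in a class is treated as if it sat at the class midpoint.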
CONCLUSION.

If you have a perfectly symmetric data set:
  - Then, Mean = Median = Mode

If you have extremely high values in the data set:
  - Then, Mean > Median > Mode (Rightward skewness)

If you have extremely low values in the data set:
  - Then, Mean < Median < Mode (Leftward skewness)
A data set is skewed if one tail of the distribution has more extreme observations than
the other tail. The mean, median and mode give us an idea of the central tendency, or
where the “middle” of the data is. Variability gives us an idea of how spread out the data
are around that middle.
2. MEASURES OF VARIATION:
NUMERICAL MEASURES OF VARIABILITY:
2.1 RANGE.
The range is equal to the largest measurement minus the smallest measurement.

Easy to compute, but not very informative

Considers only two observations (the smallest and largest)
2.1.1 QUARTILES:
Quartiles Split Ordered Data into 4 equal portions.

Q1 and Q3 are measures of Non-Central Location, Q2 = the Median.

Each Quartile has position and value

With the data in an ordered array, the position of Qi is:
Position of Qi = i(n + 1)/4
Where:
n = sample size
Qi is the value associated with that position in the ordered array
Example 1: Given the following data in Ordered Array, find Q1
Data in Ordered Array: 11 12 13 16 16 17 18 21 22
Solution:
Position of Qi = i(n + 1)/4; Therefore Position of Q1 = 1(9 + 1)/4 = 2.5
Q1 = (12 + 13)/2 = 12.5
Example 2:
Given data in Ordered Array: 3 4 4 5 7 8 9 11 12
Find the 1st and 3rd Quartiles in the ordered observations above.
Solution:
Position of Q1 = 1(9+1)/4 = 2.5
The 2.5th observation = (4+4)/2 = 4
Position of Q3 = 3(9+1)/4 = 7.5 (three times the position of Q1)
The 7.5th observation = (9+11)/2 = 10
2.1.2 INTERQUARTILE RANGE (IQR):
Is the difference between Q3 and Q1: IQR = Q3 - Q1
Covers the middle 50% of the values, and is also known as the Midspread. Resistant to
extreme values.
Example 1:
Given the following data in Ordered Array: 11 12 13 16 16 17 17 18 21
Find the Interquartile range (IQR).
Solution:
Position of Q1 = 1(9+1)/4 = 2.5
The 2.5th observation = (12+13)/2 = 12.5
Position of Q3 = 3(9+1)/4 = 7.5
The 7.5th observation = (17+18)/2 = 17.5
Therefore: The IQR = 17.5 - 12.5 = 5
Example 2:
Given the following data in Ordered Array: 3 4 4 5 7 8 9 11 12
Find the Range and the Interquartile Range in the above distribution.
Solution:
Range = Largest – Smallest = 12 – 3 = 9
Quartiles are as follows:
Position of Q1 = 1(9+1)/4 = 2.5
The 2.5th observation = (4+4)/2 = 4
Position of Q3 = 3(9+1)/4 = 7.5 (three times the position of Q1)
The 7.5th observation = (9+11)/2 = 10
Therefore: The IQR = 10 - 4 = 6
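The position rule i(n + 1)/4 used in these examples can be sketched in Python. The helper name `quartile` is hypothetical. For a position ending in .5 it averages the two neighbouring observations, as the notes do; for other fractional positions it interpolates linearly, which is an assumption the notes do not cover. The sketch also assumes the sample has at least 3 values so that every position falls inside the array.

```python
import math


def quartile(ordered, i):
    """Return Q_i (i = 1, 2, 3) using position i(n + 1)/4 in an ordered array."""
    n = len(ordered)
    position = i * (n + 1) / 4          # 1-based position
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    # Interpolate between the two neighbours (a plain average at .5)
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


data = [3, 4, 4, 5, 7, 8, 9, 11, 12]
q1, q3 = quartile(data, 1), quartile(data, 3)
data_range = max(data) - min(data)
iqr = q3 - q1
print(f"Range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

Other software may use slightly different quartile conventions, so results can differ from this rule on some data sets.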
2.2 VARIANCE.
2.2.1 SAMPLE VARIANCE (S2):
For a sample of n measurements is equal to the sum of the squared distances from the
mean, divided by (n – 1).
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Where:
s2 = sample variance
xi = the ith measurement, so (xi - x̄) is its distance from the mean
n = sample size
x̄ = sample mean
2.2.2 SAMPLE STANDARD DEVIATION (S):
For a sample of n measurements is equal to the square root of the sample variance.
$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Example:
Say a small data set consists of the measurements 1, 2 and 3
Find sample variance and sample standard deviation
Solution:
Compute sample mean first:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ = (1 + 2 + 3)/3 = 6/3 = 2
Then compute sample variance and sample standard deviation
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ = [(3 - 2)^2 + (2 - 2)^2 + (1 - 2)^2] / (3 - 1)
$s^2$ = (1 + 0 + 1) / 2 = 2/2 = 1
$s = \sqrt{s^2} = \sqrt{1} = 1$
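The worked example can be reproduced by coding the definition directly. The function name `sample_variance` is illustrative; it divides by (n - 1), as in the definition of the sample variance in these notes.

```python
import math


def sample_variance(values):
    """Sum of squared distances from the mean, divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


measurements = [1, 2, 3]
s2 = sample_variance(measurements)
s = math.sqrt(s2)
print(f"s^2 = {s2}, s = {s}")
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same (n - 1) divisor and can serve as a cross-check.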
NOTE:
Greek letters are used for populations and Roman letters for samples
s2 = sample variance
s = sample standard deviation
σ2 = population variance
σ = population standard deviation
2.2.3 COMPARING STANDARD DEVIATIONS:
Greater S or σ = more dispersion of data
2.2.4 INTERPRETING THE STANDARD DEVIATION:

Chebyshev’s Rule

The Empirical Rule
Both tell us something about where the data will be relative to the mean.
2.2.4.1 CHEBYSHEV'S RULE:

Valid for any data set

For any number k > 1, at least (1 - 1/k^2) x 100% of the observations will lie within k
standard deviations of the mean
k    k^2    1/k^2    1 - 1/k^2
2    4      .25      75%
3    9      .11      89%
4    16     .0625    93.75%
THE BIENAYME-CHEBYSHEV RULE:

At least (≥) 75% of the observations must be contained within distances of 2
SD around the mean.

At least (≥) 88.89% of the observations must be contained within distances of
3 SD around the mean.

At least (≥) 93.75% of the observations must be contained within distances of
4 SD around the mean.
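The three percentages quoted above follow from the bound 1 - 1/k^2. A minimal sketch, with the hypothetical helper name `chebyshev_lower_bound`:

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule requires k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3, 4):
    print(f"k = {k}: at least {100 * chebyshev_lower_bound(k):.2f}%")
```

Because the rule holds for any data set, these are guaranteed minimums; real data usually have a larger share within k standard deviations.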
2.2.4.2 THE EMPIRICAL RULE:

Useful for mound-shaped, symmetrical distributions

If not perfectly mounded and symmetrical, the values are approximations
For a perfectly symmetrical and mound-shaped distribution:

~68% will be within the range: (x̄ - s, x̄ + s)

~95% will be within the range: (x̄ - 2s, x̄ + 2s)

~99.7% will be within the range: (x̄ - 3s, x̄ + 3s)
Example:
Hummingbirds beat their wings in flight an average of 55 times per second. Assume the
standard deviation is 10, and that the distribution is symmetrical and mounded.
Approximate what percentage of hummingbirds beat their wings:

Between 45 and 65 times per second?

Between 55 and 65 times per second?

Less than 45 times per second?
Data:
Sample Mean (x̄) = 55
Standard Deviation (S) = 10
Recall:

~68% will be within the range: (x̄ - s, x̄ + s)

~95% will be within the range: (x̄ - 2s, x̄ + 2s)

~99.7% will be within the range: (x̄ - 3s, x̄ + 3s)
Solution 1:
Approximate what percentage of hummingbirds beat their wings:
Between 45 and 65 times per second?

Since 45 and 65 are exactly one standard deviation below and above the
mean, the empirical rule says that about 68% of the hummingbirds will be in
this range.
Solution 2:
Approximate what percentage of hummingbirds beat their wings:
Between 55 and 65 times per second?

This range of numbers is from the mean to one standard deviation above it, or
one-half of the range in the previous question. Therefore, about one-half of
68% or 34% of the hummingbirds will be in this range.
Solution 3:
Approximate what percentage of hummingbirds beat their wings:
Less than 45 times per second?

Half of the entire data set lies above the mean, and ~34% lie between 45 and
55 (between one standard deviation below the mean and the mean).
Therefore, ~84% = (~34% + 50%) are above 45, which means ~16% are
below 45.
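The three hummingbird answers can be compared with an exact normal model. This is a sketch under a stronger assumption than the notes make: it treats "symmetrical and mounded" as exactly normal with mean 55 and standard deviation 10, using `statistics.NormalDist` from the standard library (Python 3.8+).

```python
from statistics import NormalDist

# Assumption: wing beats per second follow a normal distribution
beats = NormalDist(mu=55, sigma=10)

between_45_and_65 = beats.cdf(65) - beats.cdf(45)
between_55_and_65 = beats.cdf(65) - beats.cdf(55)
below_45 = beats.cdf(45)

print(f"45 to 65: {between_45_and_65:.1%}")
print(f"55 to 65: {between_55_and_65:.1%}")
print(f"below 45: {below_45:.1%}")
```

The exact values are close to the empirical rule's rounded figures of about 68%, 34% and 16%.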
Exercise:
A manufacturer of automobile batteries claims that the average length of life of its grade
A battery is 60 months. However, the guarantee on this brand is for just 36 months.
Suppose the standard deviation of the life length is known to be 10 months and the
frequency distribution of the life-length data is known to be mound shaped.

Approximate what percentage of the manufacturer’s grade A batteries will last
more than 50 months, assuming that the manufacturer’s claim is true?

Approximate what percentage of the manufacturer’s batteries will last less
than 40 months, assuming that the manufacturer’s claim is true?

Suppose your battery lasts 37 months. What could you infer about the
manufacturer’s claim?
Data:
Sample Mean (x̄) = 60
Standard Deviation (S) = 10
Solution 1:
Approximate what percentage of the manufacturer’s grade A batteries will last more than
50 months, assuming that the manufacturer’s claim is true?

Half of the entire data set lies above the mean, and ~34% lie between 50 and
60 (between one standard deviation below the mean and the mean).
Therefore, ~84% = (~34% + 50%) are above 50 months, which means ~84%
of the manufacturer’s grade A batteries will last more than 50 months.
Solution 2:
Approximate what percentage of the manufacturer’s batteries will last less than 40
months, assuming that the manufacturer’s claim is true?

40 months is two standard deviations below the mean. About 50% of the batteries last
longer than the mean, and about 47.5% (34% + 13.5%) fall between two standard
deviations below the mean and the mean.

Therefore, 100% - (50% + 47.5%) = 2.5%

Conclusion: ~2.5% of the manufacturer’s batteries will last less than 40
months
Solution 3:
Suppose your battery lasts 37 months. What could you infer about the manufacturer’s
claim?

37 lies between the second and third standard deviations below the mean (i.e. between
40 and 30). If the claim is true, the chance that a battery lasts at most 37 months is
therefore no more than the ~2.5% obtained from the equation 100% - (50% + 47.5%).

Since the manufacturer claimed that the average length of life of its grade A
battery is 60 months, a 37-month life is a rare event under that claim. Either an
event with a slim chance has occurred, or the manufacturer’s claim is not being achieved.
2.3 COEFFICIENT OF VARIATION.

Measure of Relative Variation

Shows variation relative to the Mean

Used to compare Two or More sets of data measured in different units
$CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\%$
Where: S = Sample Standard Deviation and X̄ = Sample Mean
2.3.1 COMPARING COEFFICIENT OF VARIATION:
Stock A:
Average price last year = $50
Standard deviation = $5
CV_A = (S / X̄) x 100% = ($5 / $50) x 100% = 10%
Stock B:
Average price last year = $100
Standard deviation = $5
CV_B = (S / X̄) x 100% = ($5 / $100) x 100% = 5%
Conclusion:
Both stocks have the same standard deviation, but stock B is less variable relative to its
price.
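The stock comparison is a one-line calculation per stock. A small sketch, with an illustrative function name:

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(std_dev=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)   # Stock B
print(f"CV_A = {cv_a}%, CV_B = {cv_b}%")
```

Because the units cancel, the same function can compare data sets measured in different units.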
2.3.2 Z-SCORE:
NUMERICAL MEASURES OF RELATIVE STANDING.
The z-score tells us how many standard deviations above or below the mean a
particular measurement is.

Sample z-score: $z = \frac{x - \bar{x}}{s}$

Population z-score: $z = \frac{x - \mu}{\sigma}$
Example 1:
Hummingbirds beat their wings in flight an average of 55 times per second. Assume the
standard deviation is 10, and that the distribution is symmetrical and mounded. An
individual hummingbird is measured with 75 beats per second. What is this bird’s z-score?
Data:
Sample Mean (x̄) = 55
Standard Deviation (S) = 10
Measurement/Value (X) = 75
Z = ?
Solution:
$z = \frac{75 - 55}{10} = 2.0$
Therefore, the value 75 is 2.0 standard deviations above the Mean.
Example 2:
If the mean is 14.0 and the standard deviation is 3.0, what is the Z-score for the value
18.5?
Data:
Sample Mean (x̄) = 14.0
Standard Deviation (S) = 3.0
Measurement/Value (X) = 18.5 ; Z = ?
Solution:
$Z = \frac{X - \bar{X}}{S} = \frac{18.5 - 14.0}{3.0} = 1.5$
Therefore, the value 18.5 is 1.5 standard deviations above the Mean.
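Both z-score examples reduce to the same small function. The name `z_score` is illustrative.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev


z_hummingbird = z_score(75, mean=55, std_dev=10)      # Example 1
z_example_2 = z_score(18.5, mean=14.0, std_dev=3.0)   # Example 2
print(z_hummingbird, z_example_2)
```

A value below the mean gives a negative result, for instance `z_score(45, 55, 10)` is -1.0.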
NOTE:
1. A negative Z-score would mean that a value is less than the Mean.
2. Z-scores are related to the empirical rule as follows:
For a perfectly symmetrical and mound-shaped distribution, then

~68 % will have Z-scores between -1 and 1

~95 % will have Z-scores between -2 and 2

~99.7% will have Z-scores between -3 and 3
INTERPRETATION OF POINT #2 ABOVE:

Since ~95% of all the measurements will be within 2 standard deviations of
the Mean, only ~5% will be more than 2 standard deviations from the Mean.

About half of this 5% will be far below the mean, leaving only about 2.5% of
the measurements at least 2 standard deviations above the mean.
2.3.3 METHODS FOR DETERMINING OUTLIERS:
An outlier is a measurement that is unusually large or small relative to the other values.
There are three possible causes for the outlier to happen:

Observation, recording or data entry error

Item is from a different population

A rare, chance event
The outlier can be identified using the Box Plot (“Box-and-Whisker”).
2.3.3.1 THE BOX PLOT (“Box-and-Whisker”):
The box plot is a graph representing information about certain percentiles for a data set
and can be used to identify outliers.

5 number summary: Median, Q1, Q3, X smallest, X largest

Box Plot: Graphical display of data using 5-number summary
2.3.3.2 DISTRIBUTION SHAPES AND BOX PLOT:
[Figure: box plots for Left-Skewed, Symmetric and Right-Skewed distributions]
2.3.3.3 OUTLIERS AND Z-SCORES:
The chance that a z-score is between -3 and +3 is over 99%.
Therefore, any measurement with |z| > 3 is considered an outlier.
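The |z| > 3 rule can be applied to a whole data set by computing each value's sample z-score. In this sketch the function name `z_score_outliers` and the `readings` data are made up for illustration; they do not come from the notes.

```python
import statistics


def z_score_outliers(values, threshold=3.0):
    """Return the values whose sample z-score exceeds the threshold in absolute value."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return [x for x in values if abs((x - mean) / s) > threshold]


# Hypothetical data: nineteen readings of 10 and one reading of 100
readings = [10] * 19 + [100]
outliers = z_score_outliers(readings)
print(outliers)
```

One caveat: an extreme value inflates the standard deviation it is measured against, so in a sample of 10 or fewer values no observation can reach |z| > 3 with this formula.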
2.3.3.4 CORRELATION COEFFICIENT (r):

It has no unit (Unit Free)

Measures the strength of the linear relationship between 2 quantitative
variables

Ranges between –1 and 1 where by:
 The Closer to –1, the stronger the negative linear relationship becomes
 The Closer to 1, the stronger the positive linear relationship becomes
 The Closer to 0, the weaker any linear relationship becomes
Example:
Scatter plots of data with various Correlation Coefficients (r). Scattergram or scatter
plot shows the relationship between two quantitative variables.
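The notes describe r but do not give its formula. The sketch below uses the standard Pearson product-moment formula, which is an addition here rather than something taken from the notes; the function name and the small data sets are made up to show the two extreme cases.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: a perfect positive and a perfect negative linear relationship
x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])
r_negative = pearson_r(x, [10, 8, 6, 4, 2])
print(r_positive, r_negative)
```

The result has no unit because the units of x and y cancel in the ratio.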
2.3.3.5 DISTORTING THE TRUTH WITH DECEPTIVE STATISTICS:
DISTORTIONS:

Stretching the axis (and the truth)

Is average relevant?; Mean, median or mode?

Is average relevant?; What about the spread?
BASIC PROBABILITY.
WHY PROBABILITY.
The following situations provide examples of the role of uncertainty in our lives and in a
business context:

Investment counselors cannot be sure which of two stocks will deliver the better
growth over the coming year.

Engineers try to reduce the likelihood that a machine will break down.

Marketers may be uncertain as to the effectiveness of an ad campaign or the
eventual success of a new product.

Product manufacturers and system designers need to have testing methods that
will assess various aspects of reliability.

Long lifetimes make testing time-consuming, so we need “accelerated” testing methods.

Inventory management.
BASIC CONCEPTS.
RANDOM EXPERIMENT: Is a process leading to at least two possible outcomes with
uncertainty as to which will occur.
Example:

A coin is thrown

A consumer is asked which of two products he or she prefers
SAMPLE SPACE: Is a collection of all possible outcomes.
Example:
Examine three fuses in sequence and note the result of each examination (N = not
defective, D = defective).
An outcome for the entire experiment is any sequence of Ns and Ds of length 3.
Sample space S = {NNN, NND, NDN, NDD, DNN, DND, DDN, DDD}
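The fuse sample space can be enumerated mechanically with `itertools.product`. The variable names are illustrative, and the event shown (exactly one D) is just one example of a subset of the sample space.

```python
from itertools import product

# Every sequence of N and D of length 3
sample_space = ["".join(outcome) for outcome in product("ND", repeat=3)]
print(sample_space)

# An event is a subset of the sample space, e.g. sequences with exactly one D
exactly_one_d = [outcome for outcome in sample_space if outcome.count("D") == 1]
print(exactly_one_d)
```

With 2 possible results on each of 3 examinations, the sample space has 2 x 2 x 2 = 8 outcomes.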
AN EVENT: Is any collection (subset) of outcomes contained in the sample space S.

An event is said to be simple if it consists of exactly one outcome, and

compound if it consists of more than one outcome.
JOINT EVENT: Is when 2 events occur simultaneously.
Example:
Male and Age over 20.
L2-Probability Theory
Lecture Notes by Dr. Mahabi
Compiled by Ibrahim Nyirenda, 2022-1
UNIONS AND INTERSECTION:
Intersection: A and B (A ∩ B)
Union: A or B (A ∪ B)
EVENT PROPERTIES:

Mutually Exclusive: Two outcomes that cannot occur at the same time.
Example: Flip a coin, resulting in head or tail.

Collectively Exhaustive: One outcome in sample space must occur.
Example: Male or Female.
SPECIAL EVENTS:

Null Event: An event that cannot occur (it contains no outcomes).
Example: Club & Diamond on 1 Card Draw.

Complement of Event:
For Event A, all outcomes not in A: A' or Ā.
WHAT IS PROBABILITY?

Focuses on a systematic study of randomness and uncertainty.

Provides methods for quantifying the chances, or likelihoods associated with the
various outcomes
Probability is a numerical measure of the likelihood that an event will occur. It lies between
0 and 1, and the probabilities of all outcomes in the sample space sum to 1.
0: Impossible
1: Certain
CONCEPTS OF PROBABILITY.

A priori (classical) probability: The probability of success is based on prior
knowledge of the process involved.
Example: The chance of picking a black card from a deck of cards.

Empirical: The outcomes are based on observed data, not on prior knowledge of
a process.
Example: The chance that an individual selected at random from an employee survey is
satisfied with his or her job.

Classical probability: Based on formal reasoning.

Subjective probability: The chance of occurrence assigned to an event by a
particular individual, based on his/her experience, personal opinion and analysis
of a particular situation.
Example: The chance that a newly designed style of mobile phone will be successful in
the market.
COMPUTING PROBABILITIES.
NOTE: This assumes each of the outcomes in the sample space is equally likely to occur.
P(E) = X / T
Where:
P(E): Probability of an Event E.
X: Number of outcomes in event E.
T: Total number of possible outcomes in the sample space.
PRESENTING PROBABILITY AND SAMPLE SPACE.

Listing

Venn Diagram

Tree Diagram

Contingency Table
LISTING:
S = {Head, Tail}
VENN DIAGRAM:

Let A = aces

Let B = red cards
TREE DIAGRAM:
CONTINGENCY TABLE:
         Ace    Not Ace    Total
Black    2      24         26
Red      2      24         26
Total    4      48         52
JOINT PROBABILITY USING CONTINGENCY TABLE:
Event    B1            B2            Total
A1       P(A1 ∩ B1)    P(A1 ∩ B2)    P(A1)
A2       P(A2 ∩ B1)    P(A2 ∩ B2)    P(A2)
Total    P(B1)         P(B2)         1
Where by:

P(A1 ∩ B1); P(A2 ∩ B1); P(A1 ∩ B2) and P(A2 ∩ B2) are Joint Probability.

P(A1); P(A2); P(B1) and P(B2) are Marginal/ Simple Probability.
COMPOUND PROBABILITY.
ADDITION RULE:
Used to Get Compound Probabilities for Union of Events.

P(A or B) = P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
For Mutually Exclusive Events:

P(A or B) = P(A ∪ B) = P(A) + P(B).
For Probability of Complement:

P(A) + P(Ā) = 1. So, P(Ā) = 1 - P(A).
Example:
A hamburger chain found that 75% of all customers use mustard, 80% use ketchup, 65%
use both. What is the probability that a particular customer will use at least one of these?
Given:

Let A = Customers use mustard; P(A) = .75

Let B = Customers use ketchup; P(B) = .80

P(A ∩ B) = .65 (Both)

P(A ∪ B) = ? (At least one of these = A or B)
Solution:

P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = .75 + .80 - .65 = .90
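The hamburger example is a direct application of the addition rule for a union of two events. A minimal sketch, with an illustrative function name:

```python
def prob_union(p_a, p_b, p_a_and_b):
    """P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


p_mustard, p_ketchup, p_both = 0.75, 0.80, 0.65
p_at_least_one = prob_union(p_mustard, p_ketchup, p_both)
p_neither = 1 - p_at_least_one  # complement rule

print(f"P(at least one) = {p_at_least_one:.2f}, P(neither) = {p_neither:.2f}")
```

Subtracting the joint probability removes the customers who would otherwise be counted twice.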
MULTIPLICATION RULE:
Used to Get Joint Probabilities for Intersection of Events (Joint Events).

P(A and B) = P(A ∩ B).

P(A ∩ B) = P(A)*P(B|A) = P(B)*P(A|B).
For Independent Events:

P(A and B) = P(A ∩ B) = P(A)*P(B).
COMPUTING CONDITIONAL PROBABILITIES.
A conditional probability is the probability of one event, given that another event has
occurred:

P(A | B) = P(A and B) / P(B): The conditional probability of A given that B has occurred.

P(B | A) = P(A and B) / P(A): The conditional probability of B given that A has occurred.
Where by:

P(A and B) = Joint Probability of A and B

P(A) = Marginal Probability of A

P(B) = Marginal Probability of B
Example:
Of the cars on a used car lot, 70% have air conditioning (AC) and 40% have a CD player
(CD). 20% of the cars have both. What is the probability that a car has a CD player,
given that it has AC ?
Given:

Let A = Cars with AC; P(A) = .7

Let B = Cars with CD; P(B) = .4

P(A ∩ B) = .2

P(B | A) = P(CD | AC) = ?
Solution:

Recall: P(B | A) = P(A and B) / P(A)

P(CD | AC) = P(CD and AC) / P(AC) = 0.2 / 0.7 = 0.2857
By using Contingency Table:
Event    CD    No CD    Total
AC       .2    .5       .7
No AC    .2    .1       .3
Total    .4    .6       1.0

Recall: P(B | A) = P(A and B) / P(A)

P(CD | AC) = P(CD and AC) / P(AC) = 0.2 / 0.7 = 0.2857
Conclusion: The probability that a car has a CD player, given that it has AC = .2857 =
28.6%.
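The used-car example can be checked in a few lines, together with the remaining cells of the contingency table. The variable names are illustrative.

```python
# Marginal and joint probabilities given in the example
p_ac = 0.7
p_cd = 0.4
p_ac_and_cd = 0.2

# Conditional probability: P(CD | AC) = P(CD and AC) / P(AC)
p_cd_given_ac = p_ac_and_cd / p_ac

# Remaining cells of the contingency table follow by subtraction
p_ac_no_cd = p_ac - p_ac_and_cd
p_no_ac_cd = p_cd - p_ac_and_cd
p_no_ac_no_cd = 1 - p_ac - p_no_ac_cd

print(f"P(CD | AC) = {p_cd_given_ac:.4f}")
```

Since P(CD | AC) ≈ 0.286 differs from P(CD) = 0.4, having AC and having a CD player are not independent events in this example.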
BAYES' THEOREM.

Permits Revising Old Probabilities based on New Information.
  - Prior Probability.
  - New information.

Application of Conditional Probability.

Mutually Exclusive and Exhaustive Events.
Therefore, the computation of a Posterior Probability P(Ai | B) from given Prior
Probabilities P(Ai) and Conditional Probabilities P(B | Ai) is as follows:
Recall: P(A | B) = P(A and B) / P(B) and P(B | A) = P(A and B) / P(A)
P(A ∩ B) = P(A)*P(B|A) = P(B)*P(A|B) ..........Eqn 1.
P(Ai | B) = P(Ai)*P(B | Ai) / P(B) ...........Eqn 2. This is Bayes' Theorem.
Generalized form of Bayes' Theorem "Revised Probability"

Given k Mutually Exclusive and Exhaustive Events A1, A2, …, Ak, and an observed
event B, then:
P(B) = P(A1)*P(B|A1) + P(A2)*P(B|A2) + … + P(Ak)*P(B|Ak) ...........Eqn 3.
$P(B) = \sum_{i=1}^{k} P(A_i) \cdot P(B \mid A_i)$
[Figures: Bayes' Theorem reference diagrams (Conditional Probability; Sample space and Interaction of Events)]
Example 1:
A drilling company has estimated a 40% chance of striking oil for their new well. A
detailed test has been scheduled for more information. Historically, 60% of successful
wells have had detailed tests, and 20% of unsuccessful wells have had detailed tests.
Given that this well has been scheduled for a detailed test, what is the probability that the
well will be successful?
Given:
Prior Probabilities are:

Let S = successful well: P(S) = .4

Let U = unsuccessful well: P(U) = .6
Conditional Probabilities are:

Let D = Detailed Test Event

P(D | S) = .6

P(D | U) = .2

P(S | D) = ?
Solution:
$P(S \mid D) = \frac{P(D \mid S)P(S)}{P(D \mid S)P(S) + P(D \mid U)P(U)} = \frac{(0.6)(0.4)}{(0.6)(0.4) + (0.2)(0.6)} = \frac{0.24}{0.24 + 0.12} = 0.667$
Conclusion:
Given the detailed test, the revised probability of a successful well has risen to 0.667
from the original estimate of 0.4
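The revision of prior probabilities can be checked numerically. The following is a minimal sketch in Python; the function name `posterior` and its argument names are illustrative assumptions, not part of the notes. It forms the joint probabilities, sums them to get the probability of the observed event, and divides.

```python
def posterior(priors, likelihoods):
    """Revised probabilities P(A_i | B) from priors P(A_i) and likelihoods P(B | A_i)."""
    # Joint probabilities P(A_i and B) = P(A_i) * P(B | A_i)
    joints = [p * like for p, like in zip(priors, likelihoods)]
    # Total probability of the observed event, P(B)
    total = sum(joints)
    return [j / total for j in joints]


# Drilling example: S (successful), U (unsuccessful); D = detailed test scheduled
revised = posterior(priors=[0.4, 0.6], likelihoods=[0.6, 0.2])
print([round(r, 3) for r in revised])  # revised P(S | D) and P(U | D)
```

The same function applies to any set of mutually exclusive and exhaustive events, provided the priors sum to 1.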
Using tabular form:
Event              Prior Prob.   Conditional Prob.   Joint Prob.       Revised Prob.
S (successful)     .4            .6                  0.4x0.6 = 0.24    0.24/0.36 = 0.667
U (unsuccessful)   .6            .2                  0.6x0.2 = 0.12    0.12/0.36 = 0.333
Σ                  1.0                               0.36              1.0
Example 2:
Fifty percent of borrowers repaid their loans. Out of those who repaid, 40% had a
college degree. Ten percent of those who defaulted had a college degree. What is the
probability that a randomly selected borrower who has a college degree will repay the
loan?
Given:

Let B1 = Repay; B2 = Default, A = College degree

P(B1) = .5; P(A|B1) = .4, P(A|B2) = .1

P(B1|A) = 
Solution:
$$P(B_1 \mid A) = \frac{P(A \mid B_1)P(B_1)}{P(A \mid B_1)P(B_1) + P(A \mid B_2)P(B_2)} = \frac{(.4)(.5)}{(.4)(.5) + (.1)(.5)} = \frac{.2}{.25} = .8$$
Using tabular form:
Event          Prior Prob.   Conditional Prob.   Joint Prob.       Revised Prob.
Bi             P(Bi)         P(A | Bi)           P(Bi and A)       P(Bi | A)
B1 (Repay)     .5            .4                  0.5x0.4 = 0.20    0.20/0.25 = 0.8
B2 (Default)   .5            .1                  0.5x0.1 = 0.05    0.05/0.25 = 0.2
Σ              1.0                               0.25              1.0
PERMUTATION AND COMBINATION.
PERMUTATION:
Counting Rule 1:

If any one of n different mutually exclusive and collectively exhaustive events can
occur on each of r trials, the number of possible outcomes is equal to:
$$n \cdot n \cdots n = n^r$$
Counting Rule 2:

The number of ways that all n objects can be arranged in order is:
$$n(n-1)(n-2)\cdots(2)(1) = n!$$
where n! is called n factorial and 0! is defined as 1.
Example:

There are 20 candidates for three different mechanical engineer positions, E1, E2,
and E3. How many different ways could you fill the positions?
Solution:
• The three positions are different, so order matters and only three factors are needed:
20 x 19 x 18 = 6840.
Counting Rule 3 "Permutation":

The number of ways of arranging r objects selected from n objects in order is:
$${}^{n}P_{r} = \frac{n!}{(n-r)!}$$
COMBINATION:

The number of ways of selecting r objects from n objects, irrespective of the order,
is equal to:
$${}^{n}C_{r} = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$
Example:

Five sales Engineers will be hired from a group of 100 applicants. In how many
ways (Combinations) can groups of 5 sales Engineers be selected?
Given:

n = 100; r = 5
Solution:

$${}^{n}C_{r} = \binom{100}{5} = \frac{100!}{5!\,(100-5)!} = 75{,}287{,}520$$
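The counting rules can be verified with Python's standard library. This is a small sketch assuming Python 3.8 or later, where `math.perm` and `math.comb` are available; the variable names are illustrative.

```python
import math

# Counting Rule 1: n outcomes on each of r trials -> n**r (e.g. 2 outcomes, 3 flips)
outcomes = 2 ** 3
# Counting Rule 2: orderings of all n objects -> n!
orderings = math.factorial(3)
# Counting Rule 3 (permutation): ordered selections of r from n -> n! / (n - r)!
positions = math.perm(20, 3)
# Combination: unordered selections of r from n -> n! / (r! (n - r)!)
groups = math.comb(100, 5)

print(outcomes, orderings, positions, groups)
```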
RANDOM VARIABLE.
A random variable is a variable that assumes numerical values associated with the
random outcome of an experiment, where one (and only one) numerical value is
assigned to each sample point.
TYPES OF RANDOM VARIABLE.

Discrete random variable

Continuous random variable
A discrete random variable: Can assume a countable number of values (obtained by
counting). It can take on only certain values along an interval, with gaps between the
possible values. Ex: Number of steps to the top of a Tower.
Example 1: Count the number of Tails when two coins are tossed and give the Probability
Distribution table.
Given:
• If H = Head and T = Tail.
• S = {HH, HT, TH, TT}
Solution:
Probability Distribution
Event: Toss two coins    Values (number of Tails):    Probability:
HH                       0                            1/4 = .25
HT; TH                   1                            2/4 = .50
TT                       2                            1/4 = .25
Example 2: Six batches of components are ready to be shipped by a supplier. The
number of defective components in each batch is as follows:
Batch              #1   #2   #3   #4   #5   #6
# of Defectives    0    2    0    1    2    0
Solution:

P(0) = P(Batch 1, 3 and 6) = 3/6 = 0.500

P(1) = P(Batch 4) = 1/6 = 0.167

P(2) = P(Batch 2 and 5) = 2/6 = 0.333
A continuous random variable: Can assume any value along a given interval of a
number line.
Example:

The time a tourist stays at the top once s/he gets there

Exact temperature outside
PROBABILITY DISTRIBUTIONS FOR DISCRETE RANDOM VARIABLES.
The Probability Distribution (Probability Mass Function) of a discrete random
variable is a graph, table or formula that specifies the probability associated with each
possible outcome the random variable can assume i.e. [Xj , p(Xj) ] pairs.
Where:
Xj = Value of random variable
P(Xj) = Probability associated with value
Requirements: p(x) ≥ 0 for all values of x, and Σ p(x) = 1
Example:
Say a random variable x follows this pattern:
$$p(x) = (0.3)(0.7)^{x-1} \quad \text{for } x = 1, 2, 3, \ldots$$
X     P(x)      X     P(x)
1     .30       6     .05
2     .21       7     .04
3     .15       8     .02
4     .10       9     .02
5     .07       10    .01
EXPECTED VALUES OF DISCRETE RANDOM VARIABLES.
The mean, or expected value, of a discrete random variable is:
$$\mu = E(x) = \sum x\,p(x).$$
The variance of a discrete random variable x is:
$$\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2\,p(x).$$
The standard deviation of a discrete random variable x is:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\sum (x - \mu)^2\,p(x)}.$$
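As an illustration of these three formulas, the sketch below computes the mean, variance and standard deviation for the number of tails in two fair coin tosses. The helper name `mean_var_sd` and the dictionary layout (value mapped to probability) are assumptions made for this example.

```python
import math


def mean_var_sd(dist):
    """dist maps each value x to its probability p(x)."""
    mu = sum(x * p for x, p in dist.items())               # E(x)
    var = sum((x - mu) ** 2 * p for x, p in dist.items())  # E[(x - mu)^2]
    return mu, var, math.sqrt(var)


# Number of tails when two fair coins are tossed
tails = {0: 0.25, 1: 0.50, 2: 0.25}
mu, var, sd = mean_var_sd(tails)
print(mu, var, round(sd, 4))
```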
IMPORTANT DISCRETE PROBABILITY DISTRIBUTIONS.
THE BINOMIAL DISTRIBUTION:
Properties of a Binomial Random Variable.

n: Identical trials. Example: Flip a coin 3 times

Two outcomes: Success or Failure. Example: Heads and Tails

P(S) = p and P(F) = q = 1 – p. Example: P(H) = .5; P(T) = 1 - .5 = .5

Trials are independent. Example: A head on flip i doesn’t change P(H) of flip i + 1

x is the number of Successes in n trials
The binomial probability of x successes in n trials is:
$$P(x) = \binom{n}{x}\,p^x q^{n-x}$$ ..........Eqn 1
Where:
• $\binom{n}{x}$: The number of ways of getting the desired results
• $p^x$: The probability of getting the required number of successes
• $q^{n-x}$: The probability of getting the required number of failures
$$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$$ ..........Eqn 2
Inserting Eqn 2 into Eqn 1:
$$p(x) = \frac{n!}{x!\,(n-x)!}\,p^x (1-p)^{n-x}$$
Example:

What is the probability of one success in five observations if the probability of
success is .1?
Given: x = 1, n = 5, and p = 0.1
Solution:
$$P(x = 1) = \frac{n!}{x!\,(n-x)!}\,p^x (1-p)^{n-x} = \frac{5!}{1!\,(5-1)!}\,(0.1)^1 (1-0.1)^{5-1} = (5)(0.1)(0.9)^4 = 0.32805$$
NOTE:
A Binomial Random Variable also has:
• Mean: $\mu = np$
• Variance: $\sigma^2 = npq$
• Standard Deviation: $\sigma = \sqrt{npq}$
POSSIBLE BINOMIAL DISTRIBUTION SETTINGS.

A manufacturing plant labels items as either defective or acceptable

A firm bidding for contracts will either get a contract or not

A marketing research firm receives survey responses of “yes I will buy” or “no I
will not”

New job applicants either accept the offer or reject it
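For any of these settings, the binomial probability and its mean, variance and standard deviation can be computed directly. The sketch below reuses the numbers of the worked example (n = 5, p = 0.1, x = 1); the function name `binomial_pmf` is an illustrative assumption.

```python
import math


def binomial_pmf(x, n, p):
    """P(x successes in n independent trials, each with success probability p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 5, 0.1
prob_one = binomial_pmf(1, n, p)   # probability of exactly one success
mean = n * p                       # mu = np
variance = n * p * (1 - p)         # sigma^2 = npq
std_dev = math.sqrt(variance)      # sigma = sqrt(npq)
print(round(prob_one, 5), mean, round(variance, 2), round(std_dev, 4))
```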
THE HYPERGEOMETRIC DISTRIBUTION.
• Recall: In the Binomial situation, each trial was independent, e.g. drawing cards
from a deck and replacing the drawn card each time.
• Now: If the card is not replaced, each trial depends on the previous trial(s). The
Hypergeometric distribution can be used in this case.
• Randomly draw n elements from a set of N elements, without replacement.
Assume there are r successes and N-r failures in the N elements.
• Therefore: The Hypergeometric random variable is the number of successes, x,
drawn from the r available in the n selections.
$$P(x) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}$$
Where by:

N = Total number of elements (Population size)

r = Number of successes in the N elements (Successes in the population)

n = Number of elements drawn (Sample size)

x = Number of successes in the n elements (successes in the sample)
NOTE:
The Hypergeometric distribution also has:
• Mean: $\mu = \dfrac{nr}{N}$
• Variance: $\sigma^2 = \dfrac{r(N-r)\,n(N-n)}{N^2(N-1)}$
Example:
Three Light bulbs were selected from ten. Of the ten, four were defective. What is the
probability that two of the three selected are defective?
Given:
N = 10; n = 3; r = 4 and x = 2.
Solution:
$$P(x) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}; \quad P(2) = \frac{\binom{4}{2}\binom{10-4}{3-2}}{\binom{10}{3}} = \frac{(6)(6)}{120} = .30$$
THE POISSON DISTRIBUTION.
Evaluates the probability of a number (usually small) of occurrences out of many
opportunities in a …

Period of time

Area

Volume

Weight

Distance

Other units of measurement
$$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$$
Where:
• λ = Mean number of occurrences in the given unit of time, area, volume, etc.
• e = 2.71828….
• µ = λ
• σ² = λ
• x = Number of successes per unit
Example:
Suppose the number x of cracks per concrete specimen for a particular type of cement
mix has approximately a Poisson probability distribution. Furthermore, assume that the
average number of cracks per specimen is 2.5. Find the probability that a randomly
selected concrete specimen has exactly five cracks.
Given:
λ = 2.5; e = 2.71828…. and x = 5
Solution:
$$P(x = 5) = \frac{\lambda^x e^{-\lambda}}{x!} = \frac{(2.5)^5 e^{-2.5}}{5!} = 0.067$$
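The same Poisson probability can be checked in Python. This is a minimal sketch; the function name `poisson_pmf` and the parameter name `lam` (for λ) are illustrative assumptions.

```python
import math


def poisson_pmf(x, lam):
    """P(x occurrences) when the mean number of occurrences per unit is lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


# Concrete specimen example: mean of 2.5 cracks, exactly five cracks observed
p_five = poisson_pmf(5, 2.5)
print(round(p_five, 3))
```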
COMPARISON.
The Poisson probability distribution is related to and can be used to approximate a
binomial probability distribution when n is large and λ = np is small.
Exercise:
An acceptance sampling plan selects 5 items from a population of 500 items, 16 of which
are unacceptable. The lot is accepted if at most 2 of the sampled items are
unacceptable. Compare the exact (hypergeometric) probability with both binomial and
Poisson approximations.
Recall:

N = Total number of elements (Population size)

r = Number of successes in the N elements (Successes in the population)

n = Number of elements drawn (Sample size)

x = Number of successes in the n elements (successes in the sample)
Given: N = 500; n = 5; r = 16 and x = 2.
Solution:
Hypergeometric (exact):
$$P(x) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}; \quad P(2) = \frac{\binom{16}{2}\binom{500-16}{5-2}}{\binom{500}{5}} = \frac{\binom{16}{2}\binom{484}{3}}{\binom{500}{5}} = \frac{(120)(18{,}779{,}684)}{255{,}244{,}687{,}600} \approx 0.0088$$
Where:
$$\binom{r}{x} = \frac{r!}{x!\,(r-x)!}$$
Comparison:
Poisson: λ = np = 5(16/500) = 0.16; e = 2.71828…. and x = 2
$$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}; \quad P(2) = \frac{(0.16)^2 e^{-0.16}}{2 \cdot 1} \approx 0.0109$$
Binomial: x = 2; n = 5; p = r/N = 16/500 = .032 and q = 1 - 0.032 = .968
$$P(x) = \binom{n}{x}\,p^x q^{n-x} = \frac{n!}{x!\,(n-x)!}\,p^x (1-p)^{n-x}; \quad P(2) = \frac{5 \cdot 4 \cdot 3!}{2 \cdot 3!}\,(0.032)^2 (0.968)^3 \approx 0.0093$$
Distribution Type:     Probability P(2):
Hypergeometric         0.0088
Poisson                0.0109
Binomial               0.0093
Conclusion: For x = 2, the binomial (0.0093) and Poisson (0.0109) approximations are both
close to the exact hypergeometric probability (0.0088), because the sample is small
relative to the population (n/N = 0.01) and p is small.
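The three distributions can be compared side by side in Python. The sketch below assumes p = r/N for the binomial and λ = np for the Poisson approximation; the function names are illustrative. Because the lot is accepted when at most 2 sampled items are unacceptable, it prints both P(2) and the cumulative probability P(x ≤ 2).

```python
import math

N, r, n = 500, 16, 5   # population size, unacceptable items, sample size
p = r / N              # binomial success probability (assumed p = r/N)
lam = n * p            # Poisson mean (assumed lambda = n*p)


def hypergeometric(x):
    return math.comb(r, x) * math.comb(N - r, n - x) / math.comb(N, n)


def binomial(x):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson(x):
    return lam ** x * math.exp(-lam) / math.factorial(x)


for pmf in (hypergeometric, binomial, poisson):
    exactly_two = pmf(2)
    at_most_two = sum(pmf(k) for k in range(3))
    print(f"{pmf.__name__:15s} P(2) = {exactly_two:.4f}  P(x <= 2) = {at_most_two:.4f}")
```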
CONFIDENCE INTERVAL ESTIMATION.
Statistical inference consists of those methods used to make decisions or to draw
conclusions about a population. These methods utilize the information contained in a
sample from the population in drawing conclusions.
Divided into two major areas:

Parameter Estimation

Hypothesis Testing
CONFIDENCE INTERVALS.
Confidence Intervals for the Population Mean, μ:
• when Population Standard Deviation σ is Known
• when Population Standard Deviation σ is Unknown
Confidence Intervals for the Population Proportion, π.
Confidence Intervals for the Population Standard Deviation, σ.
They are also used in determining the Required Sample Size.
Point and interval estimates:

A Point Estimate is a single number.

A Confidence Interval provides additional information about variability.
Point estimates:
We can estimate a            With a Sample Statistic
Population Parameter         (Point Estimate)
Mean                  μ      X̄
Proportion            π      p
Standard Deviation    σ      s
How much uncertainty is associated with a point estimate of a population parameter? An
interval estimate provides more information about a population characteristic than a
point estimate does. Such interval estimates are called Confidence Intervals.
L3-Confidence Intervals
Lecture Notes by Dr. Mahabi
Compiled by Ibrahim Nyirenda, 2022-1
CONFIDENCE INTERVAL ESTIMATE.
An interval gives a range of values:

Takes into consideration variation in sample statistics from sample to sample;

Based on observations from 1 sample;

Gives information about closeness to unknown population parameters;

Stated in terms of Level of Confidence;

Can never be 100% confident.
Estimation process:
General formula:
The general formula for all confidence intervals is:
Point Estimate ± (Critical Value)(Standard Error)
For example, for the mean with σ known:
$$\bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$$
CONFIDENCE LEVEL (1 - α).
Confidence Level: Confidence for which the interval will contain the unknown population
parameter. A percentage (less than 100%).
• Suppose Confidence Level = 95%. Also written (1 - α) = 0.95, where α is a
threshold that you use to categorize a result as either explainable by chance
alone or not explainable by chance alone.

A relative frequency interpretation: In the long run, 95% of all the confidence
intervals that can be constructed will contain the unknown true parameter.

A specific interval either will contain or will not contain the true parameter. No
probability involved in a specific interval.
CONFIDENCE INTERVAL FOR μ (σ Known).
Assumptions:

Population standard deviation σ is known;

Population is normally distributed;

If Population is not normal, use large sample.
Confidence Interval Estimate:
$$\bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$$
Where:
• X̄ is the point estimate
• Z is the normal distribution critical value for a probability of α/2 in each tail
• σ/√n is the standard error
Common Levels of Confidence:

Commonly used confidence levels are 90%, 95%, and 99%
Confidence Level    Confidence Coefficient (1 - α)    Z value
80%                 0.80                              1.28
90%                 0.90                              1.645
95%                 0.95                              1.96
98%                 0.98                              2.33
99%                 0.99                              2.58
99.8%               0.998                             3.08
99.9%               0.999                             3.27
Finding the Critical Value (Z):
Consider a 95% Confidence Interval; Z = ±1.96
Intervals and Level of Confidence:
[Figure: Sampling Distribution of the Mean]
Intervals extend from
$$\bar{X} - Z\frac{\sigma}{\sqrt{n}} \quad \text{to} \quad \bar{X} + Z\frac{\sigma}{\sqrt{n}}$$
(1-α)x100% of intervals constructed contain μ; (α)x100% do not.
Example:
A sample of 11 circuits from a large normal population has a mean resistance of 2.20
ohms. We know from past testing that the population standard deviation is 0.35 ohms.
Determine a 95% confidence interval for the true mean resistance of the population.
Given:
X̄ = 2.20; σ = 0.35; Z = 1.96; n = 11; μ = ?
Solution:
$$\bar{X} \pm Z\frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\,(0.35/\sqrt{11}) = 2.20 \pm 0.2068$$
1.9932 ≤ μ ≤ 2.4068
Interpretation:

We are 95% confident that the true mean resistance is between 1.9932 and
2.4068 ohms. Although the true mean may or may not be in this interval, 95% of
intervals formed in this manner will contain the true mean.
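The interval for the circuit example can be reproduced in Python. This sketch assumes Python 3.8 or later, where `statistics.NormalDist` provides the standard normal inverse CDF for the critical value; the function name `z_interval` is illustrative.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, alpha/2 in each tail
    margin = z * sigma / math.sqrt(n)         # Z times the standard error
    return xbar - margin, xbar + margin


low, high = z_interval(xbar=2.20, sigma=0.35, n=11)
print(round(low, 4), round(high, 4))
```

The critical value comes out slightly below the tabulated 1.96, so the limits agree with the hand calculation to about three decimal places.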
CONFIDENCE INTERVAL FOR μ (σ Unknown).

If the population standard deviation σ is unknown, we can substitute the sample
standard deviation S.

This introduces extra uncertainty, since S is variable from sample to sample.

So we use the t distribution instead of the normal distribution.
Assumptions:

Population standard deviation is unknown

Population is normally distributed

If population is not normal, use large sample

Use Student’s t Distribution
Confidence Interval Estimate:
$$\bar{X} \pm t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}$$
Where:
• X̄ is the point estimate
• t is the critical value of the t distribution with n - 1 degrees of freedom and an area
of α/2 in each tail.
• S/√n is the standard error
Student’s t Distribution:

The t is a family of distributions

The t value depends on degrees of freedom (df)

Degrees of freedom is a number of observations that are free to vary after
sample mean has been calculated.
d.f. = n - 1
Degrees of Freedom (df):
Idea: Number of observations that are free to vary after sample mean has been
calculated.
Example:

Suppose the mean of 3 numbers is 8.0

Let X1 = 7; Let X2 = 8; What is X3?
Solution:

If the mean of these three values is 8.0

Then X3 must be 9; i.e. X3 is not free to vary.

Here, n = 3, so Degrees of Freedom: n – 1 = 3 – 1 = 2

2 values can be any numbers but the third is not free to vary for a given mean.
NOTE: t → Z as n increases.
Fig: Student’s t Distribution.
Example:
• Let: n = 3; df = n - 1 = 2; α = 0.10; α/2 = 0.05
Upper Tail Area:
df     .25      .10      .05
1      1.000    3.078    6.314
2      0.817    1.886    2.920
3      0.765    1.638    2.353
For df = 2 and an upper tail area of 0.05, t = 2.920.
NOTE: The body of the table contains t values, not probabilities.
t Distribution values:
With comparison to the Z value.
Confidence Level    t (10 d.f.)    t (20 d.f.)    t (30 d.f.)    Z
0.80                1.372          1.325          1.310          1.28
0.90                1.812          1.725          1.697          1.645
0.95                2.228          2.086          2.042          1.96
0.99                3.169          2.845          2.750          2.58
NOTE: t → Z as n increases.
Example:
A random sample of n = 25 has X̄ = 50 and S = 8. Form a 95% confidence interval for μ.
Given:
• n = 25; X̄ = 50; S = 8; df = n - 1 = 25 - 1 = 24
• 95% confidence interval means: (1 - α) = 0.95, α = 1 - 0.95 = 0.05; α/2 = 0.025
• $t_{\alpha/2,\,n-1} = t_{0.025,\,24} = 2.0639$
Solution:
$$\bar{X} \pm t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} = 50 \pm (2.0639)\frac{8}{\sqrt{25}}$$
• 46.698 ≤ μ ≤ 53.302
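A short Python sketch of the same calculation follows. The standard library has no t-distribution quantile function, so the critical value is passed in as read from a t table; the function name `t_interval` and its parameters are illustrative assumptions.

```python
import math


def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is the tabulated t value for n - 1 degrees of freedom and alpha/2 per tail.
    """
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin


low, high = t_interval(xbar=50, s=8, n=25, t_crit=2.0639)
print(round(low, 3), round(high, 3))
```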
CONFIDENCE INTERVALS FOR THE POPULATION PROPORTION π.
An interval estimate for the population proportion ( π ) can be calculated by adding an
allowance for uncertainty to the sample proportion ( p ).
Assumptions:

Two categorical outcomes

Population follows binomial distribution

Normal approximation can be used if n·p > 5 and n·(1 - p) > 5
The distribution of the sample proportion is approximately normal if the sample size is
large, with standard deviation:
$$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$$
We will estimate this with sample data:
$$\sqrt{\frac{p(1-p)}{n}}$$
Confidence Interval Endpoints:
Upper and lower confidence limits for the population proportion are calculated with the
formula:
$$p \pm Z\sqrt{\frac{p(1-p)}{n}}$$
Where by:

Z is the standard normal value for the level of confidence desired

p is the sample proportion

n is the sample size
Example:
A random sample of 100 people shows that 25 are left-handed. Form a 95% confidence
interval for the true proportion of left-handers.
Given: n = 100; p = 25/100 = 0.25; Z = 1.96
Solution:
$$p \pm Z\sqrt{p(1-p)/n} = 0.25 \pm 1.96\sqrt{0.25(0.75)/100} = 0.25 \pm 1.96\,(0.0433)$$
• 0.1651 ≤ π ≤ 0.3349
Interpretation:

We are 95% confident that the true percentage of left-handers in the population is
between 16.51% and 33.49%.

Although the interval from 0.1651 to 0.3349 may or may not contain the true
proportion, 95% of intervals formed from samples of size 100 in this manner will
contain the true proportion.
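The left-handers interval can be reproduced as follows. This is a sketch assuming Python 3.8 or later for `statistics.NormalDist`; the function name `proportion_interval` is illustrative.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p * (1 - p) / n)   # Z times the estimated standard error
    return p - margin, p + margin


low, high = proportion_interval(successes=25, n=100)
print(round(low, 4), round(high, 4))
```

The normal approximation is only appropriate when n·p and n·(1 - p) both exceed 5, as stated in the assumptions.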
CONFIDENCE INTERVALS FOR VARIANCES AND STANDARD DEVIATIONS.

Use chi-square distribution Table.

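Confidence Intervals for Variances:
$$\frac{(n-1)s^2}{\chi^2_{\text{right}}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{\text{left}}}$$
Standard Deviations:
$$\sqrt{\frac{(n-1)s^2}{\chi^2_{\text{right}}}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_{\text{left}}}}$$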
Example:
Find the 95% confidence interval for the variance and standard deviation of the nicotine
content of cigarettes manufactured if a sample of 20 cigarettes has a standard deviation
of 1.6 milligrams.
Given:
95% confidence interval: α = 0.05, α/2 = 0.025; n = 20; S = 1.6
Find critical values for 0.025 and (1-0.025) = 0.975 with 19 degrees of freedom (d.f.)
So: 0.025 → 32.852 and 0.975 → 8.907 from chi-square distribution Table.
Solution:
$$\frac{(n-1)s^2}{\chi^2_{\text{right}}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{\text{left}}}: \quad \frac{(20-1)(1.6)^2}{32.852} < \sigma^2 < \frac{(20-1)(1.6)^2}{8.907}, \quad \text{i.e. } 1.5 < \sigma^2 < 5.5$$
Taking square roots gives the interval for the standard deviation: 1.2 < σ < 2.3
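The nicotine example can be checked with the sketch below. The chi-square critical values are taken from the table, since the Python standard library has no chi-square quantile function; the function name `variance_interval` and its parameter names are illustrative assumptions.

```python
import math


def variance_interval(s, n, chi2_right, chi2_left):
    """Confidence interval for the variance, given tabulated chi-square critical values."""
    ss = (n - 1) * s ** 2            # (n - 1) s^2
    return ss / chi2_right, ss / chi2_left


var_low, var_high = variance_interval(s=1.6, n=20, chi2_right=32.852, chi2_left=8.907)
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)
print(round(var_low, 2), round(var_high, 2), round(sd_low, 2), round(sd_high, 2))
```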
One-Sided Confidence Bounds:

Substitute Zα/2 or tα/2 with Zα or tα
Confidence Interval for a Difference in Means: General Distribution:
$$(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}; \qquad n_1 = n_2 = \frac{Z_{\alpha/2}^2\,(\sigma_1^2 + \sigma_2^2)}{e^2}$$
Where:
• Both samples are taken at random from the respective populations of interest.
• Samples are taken independently of each other.
• Both sample sizes are large enough to get an approximately normal distribution
for the difference in sample means.
Example:
A farm-equipment manufacturer wants to compare the average daily downtime for two
sheet-metal stamping machines located in factories at two different locations.
Investigation of company records for 100 randomly selected days on each of the two
machines gave the following results:
Machine    Sample size    Mean    Variance
1          100            12      6
2          100            9       4
Construct a 90% confidence interval estimate for the difference in mean daily downtimes
for sheet-metal stamping machines located at two locations.
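Given:
• X̄1 = 12; X̄2 = 9; σ1² = 6; σ2² = 4; n1 = n2 = 100
• 90% confidence interval: α = 0.10, α/2 = 0.05, So: Zα/2 = Z0.05 = 1.645
$$(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (12 - 9) \pm Z_{0.05}\sqrt{\frac{6}{100} + \frac{4}{100}} = 3 \pm 0.52$$
The interval runs from 2.48 to 3.52.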
Interpretation:
• We are about 90% confident that the difference in mean daily downtimes between the
sheet-metal stamping machines at the two locations is between 2.48 and 3.52 min.
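The downtime comparison can be reproduced with a short sketch. It assumes Python 3.8 or later for `statistics.NormalDist`; the function name `diff_means_interval` is illustrative, and the variances are treated as known, as in the example.

```python
import math
from statistics import NormalDist


def diff_means_interval(x1, x2, var1, var2, n1, n2, confidence=0.90):
    """Large-sample confidence interval for the difference of two means."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(var1 / n1 + var2 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin


low, high = diff_means_interval(12, 9, 6, 4, 100, 100)
print(round(low, 2), round(high, 2))
```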
DETERMINING SAMPLE SIZE.

For the Mean

For the Proportion
Sampling Error:
• The required sample size can be found to reach a desired margin of error (e) with
a specified level of confidence (1 - α).
The margin of error is also called sampling error:
• The amount of imprecision in the estimate of the population parameter
• The amount added and subtracted to the point estimate to form the confidence
interval.
Determining Sample Size for the MEAN:
$$\bar{X} \pm Z\frac{\sigma}{\sqrt{n}}, \quad \text{where } e = Z\frac{\sigma}{\sqrt{n}} \text{ (Sampling / Margin Error); therefore } n = \frac{Z^2\sigma^2}{e^2}$$
NOTE: To determine the required sample size for the mean, one must know:

The desired level of confidence (1 - α), which determines the critical Z value

The acceptable sampling error, e

The standard deviation, σ
Example:
If σ = 45, what sample size is needed to estimate the mean within ± 5 with 90%
confidence?
Given: σ = 45; e = 5
• 90% confidence interval: α = 0.10, α/2 = 0.05, So: Zα/2 = Z0.05 = 1.645
Solution:
$$n = \frac{Z^2\sigma^2}{e^2} = \frac{(1.645)^2(45)^2}{5^2} = 219.19$$
So the required sample size is n = 220 (Always round up).
If σ is unknown:
An unknown σ must be estimated before the required sample size formula can be used:

Use a value for σ that is expected to be at least as large as the true, σ

Select a pilot sample and estimate σ with the sample standard deviation, S
Determining Sample Size for the PROPORTION:
$$e = Z\sqrt{\frac{\pi(1-\pi)}{n}}, \quad \text{therefore} \quad n = \frac{Z^2\,\pi(1-\pi)}{e^2}$$
NOTE: To determine the required sample size for the proportion, one must know:

The desired level of confidence (1 - α), which determines the critical Z value

The acceptable sampling error, e

The true proportion of “successes”, π

π can be estimated with a pilot sample, if necessary (or conservatively use π =
0.5)
Example:
How large a sample would be necessary to estimate the true proportion defective in a
large population within ±3%, with 95% confidence? (Assume a pilot sample yields p =
0.12).
Given:

For 95% confidence, Z = 1.96

e = 0.03

p = 0.12, so use this to estimate π
Solution:
$$n = \frac{Z^2\,\pi(1-\pi)}{e^2} = \frac{(1.96)^2(0.12)(1-0.12)}{(0.03)^2} = 450.74$$
• So use n = 451 (Always round up).
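Both sample size formulas, including the rule of always rounding up, can be sketched in Python. The function names `n_for_mean` and `n_for_proportion` are illustrative assumptions; the critical Z value is passed in as read from the table.

```python
import math


def n_for_mean(z, sigma, e):
    """Smallest n with margin of error at most e when estimating a mean."""
    return math.ceil(z ** 2 * sigma ** 2 / e ** 2)


def n_for_proportion(z, pi, e):
    """Smallest n with margin of error at most e when estimating a proportion."""
    return math.ceil(z ** 2 * pi * (1 - pi) / e ** 2)


print(n_for_mean(z=1.645, sigma=45, e=5))           # mean example, 90% confidence
print(n_for_proportion(z=1.96, pi=0.12, e=0.03))    # proportion example, 95% confidence
```

Using π = 0.5 in `n_for_proportion` gives the conservative (largest) sample size when no pilot estimate is available.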
Ethical issues:

A confidence interval estimate (reflecting sampling error) should always be
included when reporting a point estimate;

The level of confidence should always be reported;

The sample size should be reported;

An interpretation of the confidence interval estimate should also be provided.
Exercise:
A laboratory scale is known to have a standard deviation of σ = 0.001 gram in repeated
weighing. Scale readings in repeated weighing are Normally distributed, with mean equal
to the true weight of the specimen. Three weighings of a specimen gave 3.412, 3.414 and
3.415. Give a 95% confidence interval for the true weight of the specimen.
• What are the estimate and the margin of error in this interval?
• How many weighings must be averaged to get a margin of error of 0.0005?
Given:
• σ = 0.001; n = 3; X̄ = (3.412 + 3.414 + 3.415)/3 = 3.4137
• For 95% confidence, Z = 1.96
Solution:
Recall:
$$\bar{X} \pm Z\frac{\sigma}{\sqrt{n}}, \quad \text{where } e = Z\frac{\sigma}{\sqrt{n}} \text{ (Sampling / Margin Error); therefore } n = \frac{Z^2\sigma^2}{e^2}$$
Part I: What are the estimate and the margin of error in this interval?
$$\bar{X} \pm Z\frac{\sigma}{\sqrt{n}} = 3.4137 \pm 1.96\,\frac{0.001}{\sqrt{3}} = 3.4137 \pm 0.0011$$
The estimate is X̄ = 3.4137 and the interval runs from 3.4125 to 3.4148.
$$e = Z\frac{\sigma}{\sqrt{n}} = 1.96\,\frac{0.001}{\sqrt{3}} = 1.13 \times 10^{-3} \approx 0.0011 \text{ (Margin of error)}$$
Part II: How many weighings must be averaged to get a margin of error of 0.0005?
• σ = 0.001; Z = 1.96; e = 0.0005; n = ?
$$n = \frac{Z^2\sigma^2}{e^2} = \frac{(1.96)^2(0.001)^2}{(0.0005)^2} = 15.37$$
So the required sample size is n = 16.
For more examples, kindly visit:

http://www.ce.memphis.edu/3103/pdfs/Confidence%20Intervals_full.pdf

https://www.che.utah.edu/~tony/OTM/CI-CL/
FUNDAMENTALS OF HYPOTHESIS TESTING: ONE-SAMPLE TESTS.
Objectives:

Structure engineering decision-making problems as hypothesis tests;

Test hypotheses on the mean of a normal distribution using either a Z-test or a t-test procedure;

Test hypotheses on the variance or standard deviation of a normal distribution

Test hypotheses on a population proportion;

Use the P-value approach for making decisions in hypotheses tests;

Compute power, type II error probability, and make sample size selection
decisions for tests on means, variances, and proportions;

Explain and use the relationship between confidence intervals and hypothesis
tests.
What is a Hypothesis?
A statistical hypothesis is a claim (assumption) about a population parameter. It is a
statement about the nature of a population. It is often stated in terms of a population
parameter i.e. Population mean and Population proportion.

Population mean example: Burning rate of a solid propellant used to power
aircrew escape systems is μ = 50 cm/sec

Population proportion example: The proportion of adults in this city with cell
phones is π = 0.68
The Null Hypothesis, H0:

States the claim or assertion to be tested;

Is always about a population parameter eg. H0: μ = 50 and not about a sample
statistic.
• Begin with the assumption that the null hypothesis is true
• Similar to the notion of innocent until proven guilty
• Refers to the status quo
• Always contains “=” , “≤” or “≥” sign
• May or may not be rejected
Determined by:
• Past experience or knowledge of the process or previous tests or experiments:
changes
• Theory or model regarding the process: verification
• External considerations such as design or engineering specifications: conformance
L4a- Hypothesis Testing-One Sample
Lecture Notes by Dr. Mahabi
Compiled by Ibrahim Nyirenda, 2022-1
The Alternative Hypothesis, H1:

Is the opposite of the null hypothesis e.g. burning rate of a solid propellant
used to power aircrew escape systems is not equal to 50 ( H1: μ ≠ 50 )

Challenges the status quo

Never contains the “=” , “≤” or “≥” sign

May or may not be proven

Is generally the hypothesis that the researcher is trying to prove
Hypothesis Testing Process:
1: Claim: The population mean age is 50. (Null Hypothesis: H0: μ = 50)
2: Population
3: Now select a random sample
4: Suppose sample mean age is 20: X̄ = 20. Is X̄ = 20 likely if μ = 50?
• If not likely, REJECT the Null Hypothesis.
Reason for Rejecting H0:
Level of Significance, α:
Defines the unlikely values of the sample statistic if the null hypothesis is true.

Defines rejection region of the sampling distribution.

Is designated by α (level of significance);

Typical values are 0.01, 0.05, or 0.10;

Is selected by the researcher at the beginning;

Provides the critical value(s) of the test.
Level of Significance and the Rejection Region:
Errors in Making Decisions:
Type I Error

Rejecting a true null hypothesis H0; this is considered a serious type of error.
• The probability of Type I Error is α

Called level of significance of the test

Set by the researcher in advance
Type II Error

Fail to reject a false null hypothesis H0

The probability of Type II Error is β
Outcomes and Probabilities: Possible Hypothesis Test Outcomes:
[Table: possible hypothesis test outcomes; in the original, outcomes are shown in blue and
probabilities in red]
Type I & II Error Relationship:
Type I and Type II errors cannot happen at the same time:

Type I error can only occur if H0 is true

Type II error can only occur if H0 is false

If Type I error probability ( α ) ↑ , then Type II error probability ( β ) ↓
Factors Affecting Type II Error:
All else equal: β ↑ when the difference between hypothesized parameter and its true
value ↓

β ↑ when α ↓

β ↑ when σ ↑

β ↑ when n ↓
Hypothesis Tests for the Mean:
Z Test of Hypothesis for the Mean (σ Known):
Convert sample statistic ( X̄ ) to a Z test statistic:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
Hypothesis Testing Approaches:
There are three basic approaches to conducting a hypothesis test:
1. Using a predetermined level of significance, establish critical value(s), then see
whether the calculated test statistic falls into a rejection region for the test.
(critical value).
2. Determine the exact level of significance associated with the calculated value of
the test statistic. In this case, we’re identifying the most extreme critical value that
the test statistic would be capable of exceeding. (p value).
3. Confidence Intervals.
Critical Value Approach to Testing:
For a two-tail test for the mean, σ known:

Convert sample statistic ( X ) to test statistic (Z statistic );

Determine the critical Z values for a specified level of significance α from a table
or computer;

Decision Rule: If the test statistic falls in the rejection region, Reject H0 ;
otherwise do not Reject H0
Two-Tail Tests:
There are two cut off values (critical values), defining the regions of rejection:
6 Steps in Hypothesis Testing:
1. State the null hypothesis, H0 and the alternative hypothesis, H1;
2. Choose the level of significance, α, and the sample size, n;
3. Determine the appropriate test statistic and sampling distribution;
4. Determine the critical values that divide the rejection and non rejection regions;
5. Collect data and compute the value of the test statistic;
6. Make the statistical decision and state the managerial conclusion. If the test
statistic falls into the non rejection region, do not reject the null hypothesis H0. If
the test statistic falls into the rejection region, reject the null hypothesis H0.
Express the managerial conclusion in the context of the problem.
Example:
Suppose that we are interested in the mean burning rate of a solid propellant used to
power aircrew escape systems. We are interested in deciding whether or not the mean
burning rate is 50 centimeters per second. Suppose that a sample of n = 10 specimens
is tested and that a sample mean burning rate of 48.5 centimeters per second is observed.
Previous experience shows that the standard deviation of burning rate is 2.5 centimeters
per second.
Solution:
1. State the appropriate null and alternative hypotheses;
H0: μ = 50; H1: μ ≠ 50 (This is a two-tail test)
2. Specify the desired level of significance and the sample size;
Suppose that α = 0.05
3. Determine the appropriate technique;
σ is known so this is a Z test.
4. Determine the critical values;
For α = 0.05 the critical Z values are ±1.96
5. Collect the data and compute the test statistic;
Suppose the sample results are:
• n = 10, X̄ = 48.5, σ = 2.5 (is assumed known), μ = 50
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{48.5 - 50}{2.5/\sqrt{10}} = \frac{-1.5}{0.79} = -1.9$$
6. Is the test statistic in the rejection region?
Condition: Reject H0 if Z < -1.96 or Z > 1.96; otherwise do not reject H0
Reach a decision and interpret the result:
Since Z = -1.9 > -1.96, the test statistic is not in the rejection region, so we do not
reject the null hypothesis. There is insufficient evidence that the mean burning rate
differs from 50 centimeters per second.
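The six steps above can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name `z_test_two_tailed` is hypothetical and not part of the notes.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_tailed(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-tailed Z test for a mean when sigma is known (illustrative helper)."""
    z = (x_bar - mu0) / (sigma / sqrt(n))          # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # upper critical value
    reject = abs(z) > z_crit                       # rejection region: |Z| > z_crit
    return z, z_crit, reject


# Burning-rate example from the notes
z, z_crit, reject = z_test_two_tailed(x_bar=48.5, mu0=50, sigma=2.5, n=10)
print(f"Z = {z:.2f}, critical values = ±{z_crit:.2f}, reject H0: {reject}")
```

The decision agrees with the hand calculation: the statistic lies between the two critical values, so H0 is not rejected.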
p-Value Approach to Testing:
p-value: Probability of obtaining a test statistic at least as extreme (≤ or ≥) as the observed
sample value, given H0 is true. Also called the observed level of significance: the smallest value
of α for which H0 can be rejected.

Convert the sample statistic (X̄) to a test statistic (Z statistic);

Obtain the p-value from a table or computer;

Compare the p-value with α.

Decision rule: If p-value < α, reject H0; if p-value ≥ α, do not reject H0.
Example:
How likely is it to see a sample mean of 48.5 (or something further from the mean, in
either direction) if the true mean is μ = 50?
Solution:

Convert the sample statistic (X̄) to a test statistic (Z statistic)
Recall: \( Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{48.5 - 50}{2.5/\sqrt{10}} = \frac{-1.5}{0.79} = -1.9 \)
Therefore: X̄ = 48.5 is translated to a Z score of Z = -1.9

Obtain the p-value from a table or computer
P(Z < -1.9) = 0.0287; P(Z > 1.9) = 0.0287, so p-value = 0.0287 + 0.0287 = 0.0574

Compare the p-value with α.
If p-value < α, reject H0; if p-value ≥ α, do not reject H0.
Here p-value = 0.0574 > α = 0.05, so we do not reject H0.
NOTE: Each tail contributes p-value/2 = 0.0287, which exceeds α/2 = 0.025.
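The p-value calculation can be reproduced as a short sketch. The helper name `two_tailed_p_value` is an assumption for illustration. Because the code uses the unrounded Z (about -1.897) rather than the table entry for -1.9, it returns roughly 0.058 instead of 0.0574; the decision is unchanged.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_p_value(x_bar, mu0, sigma, n):
    """Two-tailed p-value for a Z test with known sigma (illustrative helper)."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    # Area in both tails beyond |z| under the standard normal curve
    return 2 * NormalDist().cdf(-abs(z))


alpha = 0.05
p_value = two_tailed_p_value(x_bar=48.5, mu0=50, sigma=2.5, n=10)
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"p-value = {p_value:.4f} -> {decision}")
```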
Connection to Confidence Intervals:

For X̄ = 48.5, σ = 2.5 and n = 10, the 95% confidence interval is:

\( 48.5 - (1.96)\frac{2.5}{\sqrt{10}} \) to \( 48.5 + (1.96)\frac{2.5}{\sqrt{10}} \)

46.9505 ≤ μ ≤ 50.0495
Since this interval contains the hypothesized mean (μ = 50), we do not reject the null
hypothesis at α = 0.05.
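As a quick check of the interval, the following sketch computes the same bounds. The function name `z_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when sigma is known (illustrative helper)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


low, high = z_confidence_interval(x_bar=48.5, sigma=2.5, n=10)
contains_mu0 = low <= 50 <= high   # True means H0: mu = 50 is not rejected
print(f"{low:.4f} <= mu <= {high:.4f}; contains 50: {contains_mu0}")
```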
One-Tail Tests:

In many cases, the alternative hypothesis focuses on a particular direction.

There is only one critical value, since the rejection area is in only one tail.
Upper-Tail Tests:

There is only one critical value, since the rejection area is in only one tail.
Example:
Upper-Tail Z Test for Mean (σ Known):
A phone industry manager thinks that customer monthly cell phone bills have increased,
and now average over $52 per month. The company wishes to test this claim. (Assume
σ = 10 is known).
Form hypothesis test:

H0: μ ≤ 52 the average is not over $52 per month;

H1: μ > 52 the average is greater than $52 per month (i.e. sufficient evidence
exists to support the manager’s claim).
Solution:
Suppose that α = 0.10 is chosen for this test.
Find the rejection region:
For an upper-tail test with α = 0.10, the critical value is Z = 1.28; reject H0 if Z > 1.28.
Test Statistic (Z):
Obtain a sample and compute the test statistic.

Suppose a sample is taken with the following results: n = 64, X̄ = 53.1, μ = 52
and σ = 10 (assumed known).

Then the test statistic is:
\( Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{53.1 - 52}{10/\sqrt{64}} = 0.88 \)
Decision: Reach a decision and interpret the result:
Do not reject H0 since Z = 0.88 ≤ 1.28, i.e. there is not sufficient evidence that the mean
bill is over $52.
p-Value Solution:
Calculate the p-value and compare to α (assuming that μ = 52.0).
\( P(\bar{X} \ge 53.1) = P\left(Z \ge \frac{53.1 - 52.0}{10/\sqrt{64}}\right) = P(Z \ge 0.88) = 1 - 0.8106 = 0.1894 \)
Do not reject H0 since p-value = 0.1894 > α = 0.10
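The upper-tail test can be sketched as follows. The function name `z_test_upper_tail` is hypothetical, and α = 0.10 is used because it is the level that corresponds to the critical value 1.28 in the decision above.

```python
from math import sqrt
from statistics import NormalDist


def z_test_upper_tail(x_bar, mu0, sigma, n, alpha):
    """Upper-tail Z test for a mean with known sigma (illustrative helper)."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha)   # single critical value
    p_value = 1 - NormalDist().cdf(z)          # area to the right of z
    return z, z_crit, p_value, z > z_crit


z, z_crit, p_value, reject = z_test_upper_tail(53.1, 52, 10, 64, alpha=0.10)
print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, "
      f"p-value = {p_value:.4f}, reject H0: {reject}")
```

Both approaches lead to the same decision: Z is below the critical value, and the p-value is above α.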
t Test of Hypothesis for the Mean (σ Unknown):
Convert the sample statistic (X̄) to a t test statistic.
Example: Two-Tail Test (σ Unknown):
The average cost of a hotel room in New York is said to be $168 per night. A random
sample of 25 hotels resulted in X̄ = $172.50 and S = $15.40. Test at the α = 0.05 level.
Assume the population distribution is normal.
Form hypothesis test:

H0: μ = 168 (Null Hypothesis)

H1: μ ≠ 168 (Alternative Hypothesis)

Given: n = 25, α = 0.05, X̄ = 172.50, S = 15.40, σ is unknown (use a t statistic).
Solution:
Recall: t statistic
\( t_{n-1} = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{172.50 - 168}{15.40/\sqrt{25}} = 1.46 \)
Critical Value: t24 = ± 2.0639
Do not reject H0 since |t| = 1.46 < 2.0639: there is not sufficient evidence that the true
mean cost is different from $168.
Connection to Confidence Intervals:

For X̄ = 172.5, S = 15.40 and n = 25, the 95% confidence interval is:

\( 172.5 - (2.0639)\frac{15.4}{\sqrt{25}} \) to \( 172.5 + (2.0639)\frac{15.4}{\sqrt{25}} \)

166.14 ≤ μ ≤ 178.86
Since this interval contains the hypothesized mean (168), we do not reject the null
hypothesis at α = 0.05
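A minimal sketch of the t test and its matching confidence interval is shown below. The helper `one_sample_t` is hypothetical; the critical value 2.0639 is taken from the notes (t table, 24 d.f.) rather than computed, since the Python standard library has no t distribution.

```python
from math import sqrt


def one_sample_t(x_bar, mu0, s, n, t_crit):
    """One-sample t statistic, two-tailed decision, and confidence interval."""
    se = s / sqrt(n)                       # estimated standard error
    t = (x_bar - mu0) / se
    ci = (x_bar - t_crit * se, x_bar + t_crit * se)
    return t, abs(t) > t_crit, ci


T_CRIT_24 = 2.0639   # two-tailed, alpha = 0.05, 24 d.f. (value quoted in the notes)
t, reject, (low, high) = one_sample_t(172.50, 168, 15.40, 25, T_CRIT_24)
print(f"t = {t:.2f}, reject H0: {reject}, CI = ({low:.2f}, {high:.2f})")
```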
Hypothesis Tests for Proportions:

Involves categorical variables
Two possible outcomes:

“Success” (possesses a certain characteristic)

“Failure” (does not possess that characteristic)
Fraction or proportion of the population in the “success” category is denoted by π.
Proportions:
Sample proportion in the success category is denoted by p.

\( p = \frac{X}{n} = \frac{\text{number of successes in sample}}{\text{sample size}} \)
When both nπ and n(1-π) are at least 5, the distribution of p can be approximated by a
normal distribution with mean and standard deviation:

\( \mu_p = \pi \)

\( \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} \)
Hypothesis Tests for Proportions:
Example: Z Test for Proportion:
A marketing company claims that it receives 8% responses from its mailing. To test this
claim, a random sample of 500 were surveyed with 25 responses. Test at the α = 0.05
significance level.
Check:

n π = (500)(.08) = 40

n(1-π) = (500)(.92) = 460
Solution:

H0: π = 0.08

H1: π ≠ 0.08

α = 0.05, n = 500, p = 25/500 = 0.05

Critical Values: ± 1.96

Test Statistic:
\( Z = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}} = \frac{.05 - .08}{\sqrt{\frac{.08(1 - .08)}{500}}} = -2.47 \)

Decision: Reject H0 at α = 0.05, since Z = -2.47 < -1.96

Conclusion:
There is sufficient evidence to reject the company’s claim of 8% response rate.
p-Value Solution:
Calculate the p-value and compare to α (for a two-tail test the p-value is always two-tail)
P(Z ≤ -2.47) + P(Z ≥ 2.47) = 2(0.0068) = 0.0136
Therefore, p-value = 0.0136
Conclusion: Reject H0 since p-value = 0.0136 < α = 0.05
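The proportion test can be sketched as below. The function name `proportion_z_test` is an assumption for illustration. The unrounded statistic gives a p-value of about 0.0134, slightly below the table-based 0.0136.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, pi0):
    """Two-tailed Z test for one proportion (illustrative helper)."""
    p = successes / n
    se = sqrt(pi0 * (1 - pi0) / n)     # standard error under H0
    z = (p - pi0) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return p, z, p_value


p, z, p_value = proportion_z_test(successes=25, n=500, pi0=0.08)
print(f"p = {p:.2f}, Z = {z:.2f}, p-value = {p_value:.4f}")
```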
Hypothesis Tests for Variance:

Use chi-square distribution.

Test statistic for a variance:
\( \chi^2 = \frac{(n - 1)s^2}{\sigma^2} \)
Example:
Suppose a regulatory agency specifies that the standard deviation of the amount of fill in
16-ounce cans should be less than 0.1 ounce. To determine whether the process is
meeting this specification, the supervisor randomly selects 10 cans and weighs the
contents of each. The descriptive analysis showed that the cans have a mean of 16.026
and a standard deviation of 0.0412. Is there sufficient evidence to conclude that the true
standard deviation σ of the fill measurements of 16-ounce cans is less than 0.1 ounce?
Solution:

Testing the hypothesis: H0: σ² ≥ 0.01 H1: σ² < 0.01

The calculated value will be:
\( \chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{(10 - 1)(0.0412)^2}{0.01} = 1.53 \)

From the chi-square table with 9 df, the critical value (lower tail, α = 0.05) is 3.325
Conclusion: Since the value of the test statistic (1.53 < 3.325) falls within the rejection
region, we reject the null hypothesis, and the supervisor can conclude that the variance of
the population of all amounts of fill is less than .01
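The chi-square statistic for this example can be checked with a short sketch. The function name is hypothetical, and the critical value 3.325 is the table value quoted in the notes, not something computed here.

```python
def chi_square_variance_stat(n, s, sigma0_sq):
    """Chi-square statistic (n - 1) s^2 / sigma0^2 for a test on one variance."""
    return (n - 1) * s ** 2 / sigma0_sq


CHI2_CRIT = 3.325   # lower-tail critical value with 9 d.f., from the notes
chi2 = chi_square_variance_stat(n=10, s=0.0412, sigma0_sq=0.01)
reject = chi2 < CHI2_CRIT   # lower-tail test: small values lead to rejection
print(f"chi-square = {chi2:.2f}, reject H0: {reject}")
```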
Calculating Type II Error Probabilities β:

Type I error, rejecting a true hypothesis

α=Probability of rejecting H0 when H0 is true

α = P(reject H0 |H0 true)

α = The level of significance of a test

Type II error, failing to reject a false hypothesis

β = Probability of failing to reject H0 when H0 is false

β = P(fail to reject H0 | H0 false)

1-β = Probability of rejecting H0 when H0 is false

1-β = The power of the test (the probability that the test will respond correctly by
rejecting a false null hypothesis)
Calculating Type II Error Probabilities β:
To calculate P(Type II), or β …
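1. Calculate the value(s) of X̄ that divide the “do not reject” region from the “reject”
region(s). (Use σ in place of s when the population standard deviation is known.)
Upper-tailed test:
\( \bar{x}_0 = \mu_0 + z_\alpha \sigma_{\bar{x}} = \mu_0 + z_\alpha \left(\frac{s}{\sqrt{n}}\right) \)
Lower-tailed test:
\( \bar{x}_0 = \mu_0 - z_\alpha \sigma_{\bar{x}} = \mu_0 - z_\alpha \left(\frac{s}{\sqrt{n}}\right) \)
Two-tailed test:
\( \bar{x}_{0L} = \mu_0 - z_{\alpha/2} \sigma_{\bar{x}} = \mu_0 - z_{\alpha/2} \left(\frac{s}{\sqrt{n}}\right) \)
\( \bar{x}_{0U} = \mu_0 + z_{\alpha/2} \sigma_{\bar{x}} = \mu_0 + z_{\alpha/2} \left(\frac{s}{\sqrt{n}}\right) \)
2. Calculate the z-value of \( \bar{x}_0 \) assuming the alternative hypothesis mean is the true
mean µ:
\( z = \frac{\bar{x}_0 - \mu}{\sigma/\sqrt{n}} \)
The probability of falling in the “do not reject” region, measured from this z-value, is β.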
Example 1:
Oxford Cereals Company specifications require a mean weight of 368 grams per box.
The filling process is subject to periodic inspection from a representative of the consumer
affairs office. The representative’s job is to detect the possible “short weighting” of boxes,
which means that cereal boxes having less than the specified 368 grams are sold. Thus,
the representative is interested in determining whether there is evidence that the cereal
boxes have a mean weight that is less than 368. Suppose that a sample of 25 cereal
boxes is selected at random, and the population standard deviation is 15 grams. Find
the probability of making a Type II error and the power of the test if the actual population
mean is 360 grams.
Solution:
H0: µ ≥ 368 (filling process is working properly)
Ha: µ < 368 (filling process is not working properly)

With z = 1.645 (α = 0.05), the boundary of the rejection region is:
\( \bar{x}_0 = \mu_0 - z_\alpha \left(\frac{\sigma}{\sqrt{n}}\right) = 368 - (1.645)\left(\frac{15}{\sqrt{25}}\right) = 363.065 \)

\( Z = \frac{\bar{x}_0 - \mu}{\sigma/\sqrt{n}} = \frac{363.065 - 360}{15/\sqrt{25}} = 1.02 \)

β = 1 - P(Z ≤ 1.02) = 1 - 0.8461 = 0.1539

Power of the test = 1 - β = 0.8461
If you rejected a true null hypothesis, you would make a Type I error and conclude that the
population mean fill was less than 368 when it was actually greater than or equal to 368.
This error would result in adjusting the filling process even though the process was working
properly. If you did not reject a false null hypothesis, you would make a Type II error and
conclude that the population mean fill was greater than or equal to 368 when it actually was
less than 368. Here, you would allow the process to continue without adjustment even
though the process was not working properly.
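The two-step recipe for β can be sketched for the lower-tailed cereal example. The function name `beta_lower_tail` is hypothetical. The code uses unrounded z-values, so it gives β of about 0.153 rather than the table-based 0.1539.

```python
from math import sqrt
from statistics import NormalDist


def beta_lower_tail(mu0, mu_true, sigma, n, alpha):
    """Type II error probability for a lower-tailed Z test (illustrative helper)."""
    se = sigma / sqrt(n)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    x_crit = mu0 - z_alpha * se                 # step 1: boundary of the reject region
    z = (x_crit - mu_true) / se                 # step 2: z-value under the true mean
    beta = 1 - NormalDist().cdf(z)              # P(sample mean lands in "do not reject")
    return x_crit, beta


x_crit, beta = beta_lower_tail(mu0=368, mu_true=360, sigma=15, n=25, alpha=0.05)
power = 1 - beta
print(f"x_crit = {x_crit:.3f}, beta = {beta:.4f}, power = {power:.4f}")
```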
Example 2:
A textile fiber manufacturer is investigating a new drapery yarn, which the company
claims has a mean thread elongation of 12 kilograms with a standard deviation of 0.5
kilograms. The company wishes to test the hypothesis H0: μ ≥ 12 against Ha: μ < 12 using a
random sample of 16 specimens. Find β for the case where the true mean elongation is
11.25 kilograms, if the critical region is defined as x̄ < 11.5 kilograms.
Solution:
\( Z = \frac{\bar{x}_0 - \mu}{\sigma/\sqrt{n}} = \frac{11.5 - 11.25}{0.5/\sqrt{16}} = 2 \)

β = 1 - P(Z ≤ 2) = 1 - 0.9772 = 0.0228
Type II Error:

In many practical problems, a specific value for the alternative will not be known,
and consequently β cannot be calculated. Choose an appropriate significance level
α and a test statistic that will make β as small as possible.

Set up our hypothesis so that if the test statistic falls into the rejection region, we
reject H0 ,knowing that the risk of a type I error is fixed.

At α: If we do not reject we state that the evidence is insufficient to reject H0 . We
do not affirmatively accept H0.
TWO-SAMPLE TESTS.
Example:
DIFFERENCE BETWEEN TWO MEANS:

Goal: Test a hypothesis or form a confidence interval for the difference between
two population means, μ1 – μ2.
The point estimate for the difference is: X̄1 - X̄2.
Different data sources: Unrelated and Independent.

Sample selected from one population has no effect on the sample selected from the
other population.

Use the difference between 2 sample means

Use a Z test, a pooled-variance t test or a separate-variance t test
L4b- Hypothesis Testing-Two Samples
Lecture Notes by Dr. Mahabi
Compiled by Ibrahim Nyirenda, 2022-1
σ1 and σ2 Known:
Assumptions:

Samples are randomly and independently drawn

Population distributions are normal or both sample
sizes are ≥ 30

Population standard deviations are known
When σ1 and σ2 are known and both populations are normal
or both sample sizes are at least 30, the test statistic is a Z value.
The standard error of X̄1 - X̄2 is:
\( \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \)
and the test statistic for μ1 – μ2 is:
\( Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \)
Hypothesis Tests for Two Population Means:
Two Population Means, Independent Samples.
Hypothesis tests for μ1 – μ2:

Two Population Means, Independent Samples.
Confidence Interval, σ1 and σ2 Known:

The confidence interval for μ1 – μ2 is:
\( (\bar{X}_1 - \bar{X}_2) \pm Z \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \)
σ1 and σ2 Unknown, Assumed Equal:
Assumptions:

Samples are randomly and independently drawn

Population distributions are normal or both sample
sizes are at least 30

Population variances are unknown but assumed
equal
Forming interval estimates:

The population variances are assumed equal, so use the two sample variances
and pool them to estimate the common σ².

The test statistic is a t value with (n1 + n2 – 2) degrees of freedom.
The pooled variance is:
\( S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)} \)

The test statistic for μ1 – μ2 is:
\( t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \)
Where: t has (n1 + n2 – 2) d.f.
Confidence Interval, σ1 and σ2 Unknown:


The confidence interval for μ1 – μ2 is:
\( (\bar{X}_1 - \bar{X}_2) \pm t_{n_1 + n_2 - 2} \sqrt{S_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \)
Where:
\( S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)} \)
σ1 and σ2 Unknown, Not Assumed Equal:
Assumptions:

Samples are randomly and independently drawn

Population distributions are normal or both sample
sizes are at least 30

Population variances are unknown but cannot be
assumed to be equal
Forming the test statistic:

The population variances are not assumed equal, so include the two sample
variances in the computation of the t-test statistic.

The test statistic is a t value with ν degrees of freedom, where ν is the integer
portion of:
\( \nu = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{\left(S_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(S_2^2/n_2\right)^2}{n_2 - 1}} \)

The test statistic for μ1 – μ2 is:
\( t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \)
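The separate-variance statistic and its degrees of freedom can be sketched as follows. The notes give no worked example here, so the function name `separate_variance_t` and the summary statistics in the usage lines are hypothetical values chosen only for illustration.

```python
from math import sqrt


def separate_variance_t(x1, s1, n1, x2, s2, n2, hypothesized_diff=0.0):
    """Separate-variance t statistic and its (unrounded) degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2          # S1^2/n1 and S2^2/n2
    t = ((x1 - x2) - hypothesized_diff) / sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, nu


# Hypothetical summary statistics (not from the notes)
t, nu = separate_variance_t(x1=20.0, s1=3.0, n1=12, x2=18.0, s2=5.0, n2=15)
print(f"t = {t:.3f}, nu = {nu:.2f} (the notes use the integer portion of nu)")
```

When the two sample variances and sample sizes are equal, ν reduces to n1 + n2 - 2, the same degrees of freedom as the pooled-variance test.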
RELATED POPULATIONS:
Tests Means of 2 Related Populations:
 Paired or matched samples
 Repeated measures (before/after)
 Use difference between paired values: Di = X1i - X2i, Di is called the ith paired
difference.
Eliminates Variation Among Subjects

Assumptions:
 Both Populations Are Normally Distributed;
 Or if not Normal, use large samples.
Mean Difference, σD Known:

The point estimate for the population mean paired difference is:
\( \bar{D} = \frac{\sum_{i=1}^{n} D_i}{n} \)
Suppose the population standard deviation of the difference scores, σD, is known,
where n is the number of pairs in the paired sample.

Then, the test statistic for the mean difference is a Z value:
\( Z = \frac{\bar{D} - \mu_D}{\sigma_D/\sqrt{n}} \)
Whereby:

μD = hypothesized mean difference

σD = population standard deviation of differences

n = the sample size (number of pairs)
Confidence Interval, σD Known:
The confidence interval for μD is:
\( \bar{D} \pm Z \frac{\sigma_D}{\sqrt{n}} \)
Whereby:
n = The sample size (number of pairs in the paired sample)
σD = Population standard deviation of differences
Z = Critical value of the standard normal distribution for the desired confidence level
Mean Difference, σD Unknown:
If σD is unknown, we can estimate the unknown population standard deviation with a
sample standard deviation, SD.

The sample standard deviation is:
\( S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}} \)
Use a paired t test; the test statistic for D̄ is now a t statistic, with (n-1) d.f.:
\( t = \frac{\bar{D} - \mu_D}{S_D/\sqrt{n}} \)
Confidence Interval, σD Unknown:

The confidence interval for μD is:
\( \bar{D} \pm t_{n-1} \frac{S_D}{\sqrt{n}} \)
Hypothesis Testing for Mean Difference, σD Unknown:

Paired Samples:
Example: Paired t Test.
Suppose we are interested in learning about the effect of a newly developed gasoline
detergent additive on automobile mileage. To gather information, seven cars have been
assembled, and their gasoline mileages (in units of miles per gallon) have been
determined. For each car this determination is made both when gasoline without the
additive is used and when gasoline with the additive is used. The data can be
represented as follows:
Mileage:
Car    Without Additive    With Additive    Di
1      24.2                23.5             0.7
2      30.4                29.6             0.8
3      32.7                32.3             0.4
4      19.8                17.6             2.2
5      25                  25.3             -0.3
6      24.9                25.4             -0.5
7      22.2                20.6             1.6
Solution:
\( \bar{D} = \frac{\sum_{i=1}^{n} D_i}{n} = 0.7 \); \( S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}} = 0.966 \)
Test, at the 5 percent level of significance, the null hypothesis that the additive does not
change the mean number of miles obtained per gallon of gasoline (α = .05; α/2 = .025;
D̄ = .7; SD = .966 and d.f. = n-1 = 6).

Therefore, the critical values = ± 2.447
Form Hypothesis Test:

H0: μD = 0

H1: μD ≠ 0

Test Statistic:
\( t = \frac{\bar{D} - \mu_D}{S_D/\sqrt{n}} = \frac{0.7 - 0}{0.966/\sqrt{7}} = 1.917 \)

Decision: Do not reject H0 (the t statistic is not in the rejection region).

Conclusion: There is not a significant change in the mileage.
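The paired t test on the mileage data can be reproduced with the standard library. This is a sketch; the variable names are illustrative, and the critical value 2.447 is the table value quoted in the notes.

```python
from math import sqrt
from statistics import mean, stdev

# Mileage data from the example (miles per gallon)
without_additive = [24.2, 30.4, 32.7, 19.8, 25.0, 24.9, 22.2]
with_additive = [23.5, 29.6, 32.3, 17.6, 25.3, 25.4, 20.6]

# Paired differences D_i = X_1i - X_2i
diffs = [a - b for a, b in zip(without_additive, with_additive)]
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)                       # sample standard deviation (n - 1 divisor)
t = (d_bar - 0) / (s_d / sqrt(n))        # hypothesized mean difference is 0

T_CRIT_6 = 2.447                         # two-tailed, alpha = 0.05, 6 d.f. (from the notes)
reject = abs(t) > T_CRIT_6
print(f"D-bar = {d_bar:.3f}, S_D = {s_d:.3f}, t = {t:.3f}, reject H0: {reject}")
```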
TWO POPULATION PROPORTIONS:
Goal: test a hypothesis or form a confidence interval for the difference between two
population proportions, π1 – π2
Assumptions:

n1π1 ≥ 5, n1(1 - π1) ≥ 5

n2π2 ≥ 5, n2(1 - π2) ≥ 5

The point estimate for the difference is: p1 - p2
Since we begin by assuming the null hypothesis is true, we assume π1 = π2 and pool the
two sample estimates.

The pooled estimate for the overall proportion is:
\( \bar{p} = \frac{X_1 + X_2}{n_1 + n_2} \)

Whereby: X1 and X2 are the numbers from sample 1 and 2 with the characteristic
of interest.

The test statistic for p1 – p2 is a Z statistic:
\( Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \)

Whereby:
\( \bar{p} = \frac{X_1 + X_2}{n_1 + n_2} \), \( p_1 = \frac{X_1}{n_1} \), \( p_2 = \frac{X_2}{n_2} \)
Confidence Interval for Two Population Proportions:
The confidence interval for π1 – π2 is:
\( (p_1 - p_2) \pm Z \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} \)
Hypothesis Tests for Two Population Proportions:

Population proportions:
Example: Two population Proportions:
Is there a significant difference between the proportion of men and the proportion of
women who will vote Yes on Proposition A?

In a random sample, 36 of 72 men and 31 of 50 women indicated they would vote
Yes. Test at the .05 level of significance.
Solution:
The hypothesis test is:

H0: π1 – π2 = 0 (The two proportions are equal)

H1: π1 – π2 ≠ 0 (There is a significant difference between proportions)
The sample proportions are:

Men: p1 = 36/72 = .50

Women: p2 = 31/50 = .62
The pooled estimate for the overall proportion is:
\( \bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{36 + 31}{72 + 50} = \frac{67}{122} = .549 \)

For α = .05, Critical Values = ±1.96
The test statistic for π1 – π2 is:
\( z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.50 - .62) - 0}{\sqrt{.549(1 - .549)\left(\frac{1}{72} + \frac{1}{50}\right)}} = -1.31 \)

Decision: Do not reject H0

Conclusion: There is not significant evidence of a difference between men and women in
the proportion who will vote Yes.
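The pooled two-proportion test can be sketched as below. The function name `two_proportion_z` is hypothetical, and the two-tailed p-value it returns is an addition that the notes do not report.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled Z test for H0: pi1 - pi2 = 0 (illustrative helper)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                       # pooled estimate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * NormalDist().cdf(-abs(z))              # two-tailed p-value
    return p_pool, z, p_value


p_pool, z, p_value = two_proportion_z(x1=36, n1=72, x2=31, n2=50)
reject = abs(z) > 1.96
print(f"pooled p = {p_pool:.3f}, z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {reject}")
```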
HYPOTHESIS TESTS FOR VARIANCES:
The F test statistic is:
\( F = \frac{S_1^2}{S_2^2} \)
S1² = Variance of Sample 1
S2² = Variance of Sample 2
n1 - 1 = numerator degrees of freedom
n2 - 1 = denominator degrees of freedom
The F Distribution:

The F critical value is found from the F table

There are two appropriate degrees of freedom: numerator and denominator

\( F = \frac{S_1^2}{S_2^2} \) where df1 = n1 – 1 ; df2 = n2 – 1
In the F table:

Numerator degrees of freedom determine the column

Denominator degrees of freedom determine the row
Finding the Rejection Region:
To find the critical F values:

Find FU from the F table for (n1 – 1) numerator and (n2 – 1) denominator degrees
of freedom.
Find FL using the formula:

\( F_L = \frac{1}{F_U^*} \)
Where FU* is from the F table with n2 – 1 numerator and n1 – 1 denominator
degrees of freedom (i.e. switch the d.f. from FU)
Example: F Test:
You are a financial analyst for a brokerage firm. You want to compare dividend yields
between stocks listed on the NYSE & NASDAQ. You collect the following data:
           NYSE    NASDAQ
Number     21      25
Mean       3.27    2.53
Std dev    1.30    1.16
Is there a difference in the variances between the NYSE & NASDAQ at the α = 0.05
level?
Solution:
Form the hypothesis test:

H0: σ1² – σ2² = 0
(There is no difference between variances)

H1: σ1² – σ2² ≠ 0
(There is a difference between variances)
Find the F critical values for α = 0.05:
The test statistic is: \( F = \frac{S_1^2}{S_2^2} = \frac{1.30^2}{1.16^2} = 1.256 \)

Decision: F = 1.256 is not in the rejection region, so we do not reject H0

Conclusion: There is not sufficient evidence of a difference in variances
at α = .05
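The F statistic and the two-tailed rejection rule can be sketched as follows. The notes do not state the critical values for this example, so the two table look-ups below (upper 2.5% points of F with 20 and 24, and with 24 and 20, degrees of freedom) are assumptions taken from a standard F table; check them against your own table before relying on them.

```python
def f_statistic(s1, s2):
    """Ratio of sample variances, S1^2 / S2^2."""
    return s1 ** 2 / s2 ** 2


# Assumed table look-ups (not given in the notes):
F_UPPER = 2.33           # F table, alpha/2 = 0.025, df1 = 20, df2 = 24
F_UPPER_SWAPPED = 2.41   # F table, alpha/2 = 0.025, df1 = 24, df2 = 20
F_LOWER = 1 / F_UPPER_SWAPPED   # F_L = 1 / F_U* as described in the notes

f = f_statistic(s1=1.30, s2=1.16)   # NYSE vs NASDAQ standard deviations
reject = f > F_UPPER or f < F_LOWER
print(f"F = {f:.3f}, F_L = {F_LOWER:.3f}, F_U = {F_UPPER}, reject H0: {reject}")
```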
ANALYSIS OF VARIANCE.
Terminologies:

Block: Group of homogeneous experimental units.

Design (layout): Complete specifications of experimental test runs, including
blocking, randomization, repeat tests, replication, and the assignment of
factor-level combinations to experimental units.

Effect: Change in the average response between two factor-level combinations
or between two experimental conditions.

Factor: A controllable experimental variable that is thought to influence the
response.

Level: Specific value of a factor.

Repeat tests: Two or more observations that have the same levels for all the
factors.

Replication: Repetition of an entire experiment or a portion of an experiment
under two or more sets of conditions.

Response: Outcome or result of an experiment or observation.
Example:
For example, suppose that we are interested in comparing the yields per plot of different
varieties of corn. Then the yield per plot is the response, the variety of corn is the factor,
the different varieties of corn are the levels of this factor, and the plots are the
experimental units.
General ANOVA Setting:
Investigator controls one or more independent variables:

Called factors (or treatment variables): characteristics which differentiates
treatments/populations from one another; variables whose effect is of interest to
researcher;

Each factor contains two or more levels (or groups or categories/ classifications);
values of the factors utilized in the experiment.
L5-Analysis of Variance
Lecture Notes by Dr. Mahabi
Compiled by Ibrahim Nyirenda, 2022-1
Observe effects on the dependent variable:

Response to levels of independent variable.
Experimental design:

The plan used to collect the data.
Completely Randomized Design:
1. Experimental units (subjects) are assigned randomly to treatments:

Subjects are assumed homogeneous
2. Only one factor or independent variable:

With two or more treatment levels
3. Analyzed by one-way analysis of variance (ANOVA).
One-Way Analysis of Variance:
Examples:

Effects of five (levels) different brands (factors) of gasoline on automobile engine
operating efficiency.

Effects of the presence of four (levels) different sugar solutions (factors) on
bacterial growth.
Assumptions:

Populations are normally distributed

Populations have equal variances

Samples are randomly and independently drawn
Hypotheses of One-Way ANOVA:
H0: μ1 = μ2 = μ3 = … = μc

All population means are equal;

i.e. no treatment effect (no variation in means among groups).
H1: Not all of the population means are the same.

At least one population mean is different i.e. there is a treatment effect;

Does not mean that all population means are different (some pairs may be the
same).
One-Factor ANOVA:
H0: μ1 = μ2 = μ3 = … = μc
H1: Not all μj are the same
All means are the same: the null hypothesis is true (no treatment effect).
At least one mean is different: the null hypothesis is NOT true (treatment effect is present).
Partitioning the Variation:
Total variation can be split into two parts:

SST = SSA + SSW
Whereby:

SST = Total Sum of Squares (Total variation)

SSA = Sum of Squares Among Groups (Among-group variation)

SSW = Sum of Squares Within Groups (Within-group variation)
Total Variation = The aggregate dispersion of the individual data values across the
various factor levels (SST).
Among-Group Variation = Dispersion between the factor sample means (SSA).
Within-Group Variation = Dispersion that exists among the data values within a particular
factor level (SSW).
Partition of Total Variation:
Total Sum of Squares:

SST = SSA + SSW
\( SST = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{\bar{X}})^2 \)
Whereby:

SST = Total sum of squares

c = number of groups (levels)

nj = number of observations in group j

Xij = ith observation from group j

\( \bar{\bar{X}} \) = grand mean (mean of all data values)
Total Variation:

\( SST = (X_{11} - \bar{\bar{X}})^2 + (X_{12} - \bar{\bar{X}})^2 + \dots + (X_{n_c c} - \bar{\bar{X}})^2 \)
Among-Group Variation:
\( SSA = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2 = n_1(\bar{X}_1 - \bar{\bar{X}})^2 + n_2(\bar{X}_2 - \bar{\bar{X}})^2 + \dots + n_c(\bar{X}_c - \bar{\bar{X}})^2 \)
Whereby:

SSA = Sum of squares among groups

c = number of groups

nj = sample size from group j

\( \bar{X}_j \) = sample mean from group j

\( \bar{\bar{X}} \) = grand mean (mean of all data values)
Mean Square Among:

Mean Square Among = SSA/Degrees of freedom.

\( MSA = \frac{SSA}{c - 1} \)
Within-Group Variation:
\( SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2 = (X_{11} - \bar{X}_1)^2 + (X_{12} - \bar{X}_2)^2 + \dots + (X_{n_c c} - \bar{X}_c)^2 \)
Whereby:

SSW = Sum of squares within groups

c = number of groups

nj = sample size from group j

\( \bar{X}_j \) = sample mean from group j

Xij = ith observation in group j
Mean Square Within:

Mean Square Within = SSW/Degrees of freedom.

\( MSW = \frac{SSW}{n - c} \)
Obtaining the Mean Squares:

\( MSA = \frac{SSA}{c - 1} \); \( MSW = \frac{SSW}{n - c} \); \( MST = \frac{SST}{n - 1} \)
One-Way ANOVA Table:
Whereby:
c = number of groups
n = sum of the sample sizes from all groups
df = degrees of freedom
ONE-WAY ANOVA F test STATISTIC.

H0: μ1 = μ2 = … = μc

H1: At least two population means are different

Test statistic:
\( F = \frac{MSA}{MSW} \)
Whereby:
MSA is mean squares among groups
MSW is mean squares within groups

Degrees of freedom:
df1 = c – 1 (c = number of groups)
df2 = n – c (n = sum of sample sizes from all populations)
Interpreting One-Way ANOVA F Statistic:
The F statistic is the ratio of the among estimate of variance and the within estimate of
variance.

The ratio must always be positive

df1 = c -1 will typically be small

df2 = n - c will typically be large
Decision Rule:

Reject H0 if F > FU, otherwise do not reject H0
Example: One-Way ANOVA F Test:
An experiment was conducted to compare the wearing qualities of three types of paint
when subjected to the abrasive action of a slowly rotating cloth-surfaced wheel. Ten
paint specimens were tested for each paint type, and the number of hours until visible
abrasion was apparent was recorded for each specimen. At the 0.05 significance level, is
there sufficient evidence to indicate a difference in the mean time until abrasion is visibly
evident for the three paint types?
Given: n1 = 10; n2 = 10; n3 = 10; n = 30; c = 3; α = 0.05
Solution:
Paint 1: 2296/10 = 229.6; Therefore, x̄1 = 229.6
Paint 2: 3099/10 = 309.9; Therefore, x̄2 = 309.9
Paint 3: 4278/10 = 427.8; Therefore, x̄3 = 427.8
Grand mean: (229.6 + 309.9 + 427.8)/3 = 322.4; Therefore, \( \bar{\bar{x}} = 322.4 \)
1. Obtain Variation due to Factor (SSA):
\( SSA = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2 = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \dots + n_c(\bar{x}_c - \bar{\bar{x}})^2 \)
Where: d.f. = c - 1.
SSA = 10 (229.6 – 322.4)² + 10 (309.9 – 322.4)² + 10 (427.8 – 322.4)² = 198772.5
2. Obtain Mean Squares Among (MSA):

Mean Square Among = SSA/Degrees of freedom.

\( MSA = \frac{SSA}{c - 1} \) = 198772.5 / (3 - 1) = 99386.2
3. Obtain Variation due to Random Sampling (SSW):
\( SSW = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2 = (X_{11} - \bar{X}_1)^2 + (X_{12} - \bar{X}_2)^2 + \dots + (X_{n_c c} - \bar{X}_c)^2 \)
Where: d.f. = n - c
SSW = (148 – 229.6)² + (76 – 229.6)² +…+ (465 – 427.8)² = 770670.9
4. Obtain Mean Squares Within (MSW):

Mean Square Within = SSW/Degrees of freedom.

\( MSW = \frac{SSW}{n - c} \) = 770670.9 / (30 - 3) = 28543.4
5. Obtain F ratio or Test Statistic:

\( F = \frac{MSA}{MSW} = \frac{99386.2}{28543.4} = 3.48 \)
Form Hypothesis:

H0: μ1 = μ2 = … = μc

H1: μj not all equal
Given: α = 0.05; df1 = 2; df2 = 27
FU = 3.35 from T-16 (F-Critical Values).
Decision:

Reject H0 at α = 0.05, since F = 3.48 > FU = 3.35
Conclusion:

There is evidence that at least one μj differs from the rest.
ANOVA Table: Single Factor:
ANOVA
Source of Variation:    SS:         df:    MS:         F:      F crit.:
Between Groups:         198772.5    2      99386.23    3.48    3.35
Within Groups:          770670.9    27     28543.4
Total:                  969443.4    29
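The ANOVA table can be rebuilt from the summary figures in the example. The raw paint observations are not listed in the notes, so this sketch takes the group means, group sizes, and the stated SSW as inputs; the function name `one_way_anova_from_summary` is hypothetical.

```python
def one_way_anova_from_summary(means, sizes, ssw):
    """One-way ANOVA quantities from group means, group sizes, and a known SSW."""
    n, c = sum(sizes), len(means)
    grand_mean = sum(m * k for m, k in zip(means, sizes)) / n
    ssa = sum(k * (m - grand_mean) ** 2 for m, k in zip(means, sizes))
    msa = ssa / (c - 1)        # mean square among, d.f. = c - 1
    msw = ssw / (n - c)        # mean square within, d.f. = n - c
    return ssa, msa, msw, msa / msw


F_CRIT = 3.35   # F table value for df1 = 2, df2 = 27, alpha = 0.05 (from the notes)
ssa, msa, msw, f = one_way_anova_from_summary(
    means=[229.6, 309.9, 427.8], sizes=[10, 10, 10], ssw=770670.9
)
print(f"SSA = {ssa:.1f}, MSA = {msa:.1f}, MSW = {msw:.1f}, F = {f:.2f}, "
      f"reject H0: {f > F_CRIT}")
```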
Alternative method- one way ANOVA:

\( SST = (X_{11} - \bar{\bar{X}})^2 + (X_{12} - \bar{\bar{X}})^2 + \dots + (X_{n_c c} - \bar{\bar{X}})^2 \) or
\( SST = \sum_{j=1}^{c} \sum_{i=1}^{n_j} X_{ij}^2 - \frac{1}{n} T^2 \)
Whereby:

SST = Total Sum of Squares

\( \frac{1}{n} T^2 \) = Correction Factor

\( T^2 \) = The square of the grand total

\( SSA = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2 \) or
\( SSA = \sum_{j=1}^{c} \frac{1}{n_j} \left( \sum_{i=1}^{n_j} X_{ij} \right)^2 - \frac{1}{n} T^2 \)
Whereby:

SSA = Sum of squares among groups

c = number of groups

nj = sample size from group j

\( \bar{X}_j \) = sample mean from group j

\( \bar{\bar{X}} \) = grand mean (mean of all data values)
Example: F Test.
Suppose in an industrial experiment that an engineer is interested in how the mean
absorption of moisture in concrete varies among 5 different concrete aggregates. The
samples are exposed to moisture for 48 hours. It is decided that 6 samples are to be
tested for each aggregate, requiring a total of 30 samples to be tested. The data are
recorded as follows:
Test the appropriate hypothesis at the 0.05 level of significance for the given data.
Given: c = 5; nj = 6; n = 30; α = 0.05
Solution:

H0: μ1 = μ2 = μ3 = μ4 = μ5

Ha: at least two of the μj are unequal
\( \sum_{j=1}^{c} \sum_{i=1}^{n_j} X_{ij}^2 = 9677954 \); \( T = \sum_{j=1}^{c} \sum_{i=1}^{n_j} X_{ij} = 16854 \)
\( \frac{1}{n} T^2 \) = (16854)² / 30 = 9468577.2
\( SST = \sum_{j=1}^{c} \sum_{i=1}^{n_j} X_{ij}^2 - \frac{1}{n} T^2 \) = 9677954 - 9468577.2 = 209376.8
\( SSA = \sum_{j=1}^{c} \frac{1}{n_j} \left( \sum_{i=1}^{n_j} X_{ij} \right)^2 - \frac{1}{n} T^2 \)
= (1/6)(3320)² + (1/6)(3416)² + (1/6)(3663)² + (1/6)(2791)² + (1/6)(3664)² - 9468577.2

Therefore: SSA = 9553933.7 - 9468577.2 = 85356.467

SSW = SST - SSA = 209376.8 - 85356.467 = 124020.33

MSA = SSA/(c - 1) = 85356.467/4 = 21339.12

MSW = SSW/(n - c) = 124020.33/25 = 4960.813

F = MSA/MSW = 4.3
ANOVA Table: Single Factor:
ANOVA
Source of Variation:    SS:          df:    MS:         F:     F crit.:
Between Groups:         85356.467    4      21339.12    4.3    2.76
Within Groups:          124020.33    25     4960.813
Total:                  209376.8     29
Decision: Reject H0, since F = 4.3 > F crit. = 2.76
Conclusion: The aggregates do not have the same mean absorption.
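The shortcut (correction-factor) method can be sketched from the group totals used in the calculation. The function name `anova_from_totals` is hypothetical; the group totals, the sum of squared observations, and the critical value 2.76 are the figures quoted in the example.

```python
def anova_from_totals(group_totals, group_sizes, sum_of_squares):
    """One-way ANOVA via the correction factor T^2 / n (illustrative helper)."""
    n, c = sum(group_sizes), len(group_totals)
    correction = sum(group_totals) ** 2 / n              # T^2 / n
    sst = sum_of_squares - correction
    ssa = sum(t ** 2 / k for t, k in zip(group_totals, group_sizes)) - correction
    ssw = sst - ssa
    msa, msw = ssa / (c - 1), ssw / (n - c)
    return {"SST": sst, "SSA": ssa, "SSW": ssw, "MSA": msa, "MSW": msw, "F": msa / msw}


result = anova_from_totals(
    group_totals=[3320, 3416, 3663, 2791, 3664],   # aggregate totals from the example
    group_sizes=[6, 6, 6, 6, 6],
    sum_of_squares=9677954,                        # sum of all squared observations
)
F_CRIT = 2.76   # F table value for df1 = 4, df2 = 25, alpha = 0.05 (from the notes)
for name, value in result.items():
    print(f"{name} = {value:.3f}")
print("reject H0:", result["F"] > F_CRIT)
```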
THE TUKEY-KRAMER PROCEDURE.
Tells which population means are significantly different:

Example: μ1 = μ2 ≠ μ3

Done after rejection of equal means in ANOVA
Allows pair-wise comparisons:

Compare absolute mean differences with critical range
\[ \text{Critical Range} = Q_U \sqrt{\frac{MSW}{2}\left(\frac{1}{n_j} + \frac{1}{n_{j'}}\right)} \]
Whereby:

QU = Value from Studentized Range Distribution with c and (n - c) degrees of
freedom for the desired level of α

MSW = Mean Square Within

nj and nj’ = Sample sizes from groups j and j’
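The Critical Range will be compared with each absolute mean difference:
\( |\bar{x}_{.1} - \bar{x}_{.2}| \), \( |\bar{x}_{.1} - \bar{x}_{.3}| \), \( |\bar{x}_{.2} - \bar{x}_{.3}| \), etc...

Is \( |\bar{x}_{.j} - \bar{x}_{.j'}| > \) Critical Range?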
If the absolute mean difference is greater than the critical range then there is a significant
difference between that pair of means at the chosen level of significance.
Example: The Tukey-Kramer Procedure.
1. Compute Absolute Mean Differences:
Solution:
Paint 1: 2296/10 = 229.6; Therefore, \( \bar{x}_1 = 229.6 \)
Paint 2: 3099/10 = 309.9; Therefore, \( \bar{x}_2 = 309.9 \)
Paint 3: 4278/10 = 427.8; Therefore, \( \bar{x}_3 = 427.8 \)
\( |\bar{x}_1 - \bar{x}_2| = |229.6 - 309.9| = 80.3 \)
\( |\bar{x}_1 - \bar{x}_3| = |229.6 - 427.8| = 198.2 \)
\( |\bar{x}_2 - \bar{x}_3| = |309.9 - 427.8| = 117.9 \)
2. Find the QU value from the table given: c = 3; n = 30; α = 0.05
Solution:

n - c = 30 - 3 = 27; Therefore, QU ≈ 3.53
3. Compute Critical Range:

\[ \text{Critical Range} = Q_U \sqrt{\frac{MSW}{2}\left(\frac{1}{n_j} + \frac{1}{n_{j'}}\right)} = 3.53 \sqrt{\frac{28543.4}{2}\left(\frac{1}{10} + \frac{1}{10}\right)} = 188.6 \]
4. Compare:
Mean Absolute Difference:                  Critical range:
\( |\bar{x}_1 - \bar{x}_2| = 80.3 \)       188.6
\( |\bar{x}_1 - \bar{x}_3| = 198.2 \)      188.6
\( |\bar{x}_2 - \bar{x}_3| = 117.9 \)      188.6
5. Decision:
Since one of the absolute mean differences (198.2, between paints 1 and 3) is greater than
the critical range (188.6), there is a significant difference between that pair of means at
the 5% level of significance.
6. Conclusion:
Thus, with 95% confidence we can conclude that the mean distance for paint 3 is greater
than that for paint 1.
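The six steps above can be sketched in Python as follows. The sample means, MSW, sample sizes and the table value QU = 3.53 are taken from the example; the function name tukey_kramer_critical_range is a hypothetical helper introduced only for illustration.

```python
import math
from itertools import combinations


def tukey_kramer_critical_range(q_u, msw, n_j, n_j_prime):
    """Critical range = Q_U * sqrt((MSW / 2) * (1/n_j + 1/n_j'))."""
    return q_u * math.sqrt((msw / 2.0) * (1.0 / n_j + 1.0 / n_j_prime))


# Values from the worked example; Q_U is read from a Studentized range table.
means = {"Paint 1": 229.6, "Paint 2": 309.9, "Paint 3": 427.8}
critical_range = tukey_kramer_critical_range(q_u=3.53, msw=28543.4, n_j=10, n_j_prime=10)

# Compare every pair of absolute mean differences with the critical range.
comparisons = {}
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    comparisons[(a, b)] = (diff, diff > critical_range)
    print(f"{a} vs {b}: |diff| = {diff:.1f}, significant = {diff > critical_range}")

print(f"Critical range = {critical_range:.1f}")
```

Because all three groups have the same size here, a single critical range serves every pair; with unequal sizes the function would be called once per pair.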
THE RANDOMIZED BLOCK DESIGN.
Like One-Way ANOVA, we test for equal population means (for different factor levels)...
...but we want to control for possible variation from a second factor (with two or more
levels). Levels of the secondary factor are called blocks.
The randomized block design consists of a two-step procedure:

Matched sets of experimental units, called blocks, are formed. Each block consists of
p experimental units (where p is the number of treatments). The blocks should
consist of experimental units that are as similar as possible.

One experimental unit from each block is randomly assigned to each treatment,
resulting in a total of n = bp responses (where b is the number of blocks).
Examples:
Testing tensile strength of wires produced using different machines, testing different
methods of production using various operators, testing different brands of tires for
different passenger cars, testing different teaching methods, or testing a certain number
of drugs on a group of animals. In these examples, the different blocks consist of
machines, operators, cars, students, and animals, respectively.
Partitioning the Variation:
Total variation can now be split into three parts:
SST = SSA + SSBL + SSE
Whereby:

SST = Total variation

SSA = Among-Group variation

SSBL = Among-Block variation

SSE = Random variation
Sum of Squares for Blocking:
SST = SSA + SSBL + SSE
Whereby:
\[ SSBL = c \sum_{i=1}^{r} (\bar{X}_{i.} - \bar{X})^2 \]

c = Number of groups

r = Number of blocks

\(\bar{X}_{i.}\) = Mean of all values in block i

\(\bar{X}\) = Grand mean (mean of all data values)
Partitioning the Variation:
Total variation can now be split into three parts:

SST = SSA + SSBL + SSE
Whereby:

SST and SSA are computed as they were in One-Way ANOVA.
\( SSBL = c \sum_{i=1}^{r} (\bar{X}_{i.} - \bar{X})^2 \)

SSE = SST – (SSA + SSBL)
Mean Squares:
MSBL = Mean Square Blocking: \( MSBL = \frac{SSBL}{r - 1} \)

MSA = Mean Square among Groups: \( MSA = \frac{SSA}{c - 1} \)

MSE = Mean Square Error: \( MSE = \frac{SSE}{(r - 1)(c - 1)} \)
Randomized Block ANOVA Table:
Source of Variance    SS      df            MS      F-ratio
Among Treatments      SSA     c-1           MSA     MSA / MSE
Among Blocks          SSBL    r-1           MSBL    MSBL / MSE
Error                 SSE     (r–1)(c-1)    MSE
Total                 SST     rc - 1
Whereby:
c = Number of populations
rc = Sum of the sample sizes from all populations
r = Number of blocks
df = Degrees of freedom
Blocking Test:

\( H_0: \mu_{1.} = \mu_{2.} = \mu_{3.} = \dots \)

H1 : Not all block means are equal
Blocking test:

df1 = r – 1

df2 = (r – 1)(c – 1)

F = MSBL / MSE; Reject H0 if F > FU
Main Factor Test:

\( H_0: \mu_{.1} = \mu_{.2} = \mu_{.3} = \dots \)

H1 : Not all population means are equal
Main Factor test:

df1 = c – 1

df2 = (r – 1)(c – 1)

F = MSA / MSE; Reject H0 if F > FU
Multiple comparison of means:
Tukey-Kramer Procedure:
Equal sample size
Bonferroni:
Does not require equal sample size
Scheffé:
Compares all possible linear combinations of means
Apply your knowledge:
A consumer testing organization wished to compare the annual power consumption for
five different brands of dehumidifier. Because power consumption depends on the
prevailing humidity level, it was decided to monitor each brand at four different levels
ranging from moderate to heavy humidity (thus blocking on humidity level). Within each
level brands were randomly assigned to the five selected locations. The resulting amount
of power consumption (annual kwh) are:
Treatments (Brands)    Blocks (Humidity Level)
                       1      2      3      4
1                      685    792    838    875
2                      722    806    893    953
3                      733    802    880    941
4                      811    888    952    1005
5                      828    920    978    1023
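One way to set up the computations for this exercise is sketched below in Python, applying the partition SST = SSA + SSBL + SSE to the table above. The function name randomized_block_anova is a hypothetical helper; the critical values FU still have to be read from an F table with the degrees of freedom given earlier.

```python
# Rows = treatments (brands 1-5), columns = blocks (humidity levels 1-4).
power_kwh = [
    [685, 792, 838, 875],
    [722, 806, 893, 953],
    [733, 802, 880, 941],
    [811, 888, 952, 1005],
    [828, 920, 978, 1023],
]


def randomized_block_anova(table):
    """Partition SST into SSA (treatments), SSBL (blocks) and SSE (error)."""
    c = len(table)        # number of treatments (groups)
    r = len(table[0])     # number of blocks
    n = r * c
    grand_mean = sum(sum(row) for row in table) / n
    treatment_means = [sum(row) / r for row in table]
    block_means = [sum(table[j][i] for j in range(c)) / c for i in range(r)]

    sst = sum((x - grand_mean) ** 2 for row in table for x in row)
    ssa = r * sum((m - grand_mean) ** 2 for m in treatment_means)
    ssbl = c * sum((m - grand_mean) ** 2 for m in block_means)
    sse = sst - (ssa + ssbl)

    msa = ssa / (c - 1)
    msbl = ssbl / (r - 1)
    mse = sse / ((r - 1) * (c - 1))
    return {
        "SST": sst, "SSA": ssa, "SSBL": ssbl, "SSE": sse,
        "MSA": msa, "MSBL": msbl, "MSE": mse,
        "F_treatments": msa / mse, "F_blocks": msbl / mse,
    }


anova = randomized_block_anova(power_kwh)
for name, value in anova.items():
    print(f"{name}: {value:.2f}")
```

The main factor test compares F_treatments with FU on (c – 1) and (r – 1)(c – 1) degrees of freedom, and the blocking test compares F_blocks with FU on (r – 1) and (r – 1)(c – 1) degrees of freedom.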
EXPERIMENTAL DESIGN.
THE EXPERIMENTAL DESIGN PROCESS:
Design of Experiments (DOE) defined:
A theory concerning the minimum number of experiments necessary to develop an
empirical model of a research question and a methodology for setting up the necessary
experiments.
Design of Experiment Constraints:

Time and Money.
Why conduct experiment:

To determine the principal causes of variation in a measured response;

To find the conditions that give rise to a maximum or minimum response;

To compare the responses achieved at different settings of controllable variables;

To obtain a mathematical model in order to predict future responses.
Benefits of experimental design:

Design a proper set of experiments for measurement or simulation;

Develop a model that best describes the data obtained and check model
adequacy;

Estimate the contribution of each alternative to the performance;

Isolate the measurement errors;

Estimate confidence intervals for model parameters;

Check if the alternatives are significantly different.
L6-Experimental Design Process
Lecture Notes by Dr. Mahabi
Compiled by Ibrahim Nyirenda, 2022-1
Common mistakes in experimentation:

The variation due to experimental error is ignored;

Important parameters are not controlled;

Effects of different factors are not isolated;

Simple one-factor-at-a-time designs are used;

Interactions are ignored: An interaction is the failure of one factor to produce the
same effect on the response at different levels of another factor;

Too many experiments are conducted.
EXPERIMENTAL DESIGN BASICS:
Two kinds of data gathering methodologies:
Observation:

Can’t prove cause & effect but can establish associations.
Experimental:

Can prove cause & effect;

Variables of interest: Factors vs. Treatments.
Independent variable:

Treatment: Manipulations of variables of interest;

Treatment vs. Control group.
Dependent variable is what you are measuring:
Example:
Optimize the various operating parameters for enhancing the performance and heat
transfer characteristics of solar parabolic trough collectors (PTC). The independent
variables are chosen as follows:
Parameters (Factors):        Values (Levels / Treatments):
Diameter of receiver (m)     0.03           0.026             0.021
Mass flow rate (kg/s)        0.001756       0.001578          0.001311
Material of receiver         Copper (Cu)    Aluminium (Al)    Galvanized steel (GI)
Response Variable: Outcome

Example: Performance, Throughput...
Factors: Variables that affect the response variable.

Example: Diameter of receiver, Mass flow rate, Material of receiver. They are
also called Predictor variables or Predictors.
Levels: The values that a factor can assume. Also called Treatments.

Example: Mass flow rate has three levels: 0.001756; 0.001578 and 0.001311
Primary Factors: The factors whose effects need to be quantified.
Confounds:
Randomization Concerns:
Randomization prevents experimental bias.

Assignment by experimenter: Counterbalancing.
Statistical assumptions.

A requirement for statistical tests of significance.
Design of Experiment Terminologies:
Replications:

Independent observations of a single treatment.
Repeated measures:

Each subject is measured at two or more points with respect to time.
Variance:

The measuring stick that compares different treatments.
Internal validity:

The extent to which an experiment accomplishes its goal(s).
Reproducibility:

Given the appropriate information, the ability of others to replicate the
experiment.
External validity:

How representative of the target population is the sample?
- Can the results be generalized?
- Generalizations from field experiments are easier to justify than those from lab
experiments, because lab conditions are artificial.
Medical Trials:

Placebo

Double Blind
BASIC PRINCIPLES OF EXPERIMENTAL DESIGNS:

The Principle of Replication;

The Principle of Randomization;

The Principle of Local Control.
1. Principle of Replication:

The experiment should be repeated more than once;

Each treatment is applied in many experimental units instead of one;

This increases the statistical accuracy of the experiments;

The result so obtained will be more reliable.
Conceptually replication does not present any difficulty but computationally it does.

Example: If an experiment requiring a two-way analysis of variance is replicated,
it will then require a three-way analysis of variance since replication itself may be
a source of variation in the data.
However, it should be remembered that replication is introduced in order to increase the
precision of a study.
2. Principle of Randomization:

Provides protection when you conduct an experiment against the effect of
extraneous factors by randomization;

This principle indicates that you should design or plan the experiment in such a
way that the variations caused by extraneous factors can all be combined under
the general heading of “chance.”;

The application of the principle of randomization gives a better estimate of the
experimental error.
3. Principle of Local Control (Blocking):

Under it the extraneous factor (the known source of variability) is made to vary
deliberately over as wide a range as necessary;

This needs to be done in such a way that the variability it causes can be
measured and hence eliminated from the experimental error;

This means that you should plan the experiment in a manner that you can
perform a two-way analysis of variance in which the total variability of the data is
divided into three components attributed to treatments (varieties of rice), the
extraneous factor (soil fertility) and experimental error;

Blocking is a method of eliminating the effects of unrelated variation due to noise
factors and thereby improving the efficiency of experimental design;

The main objective is to eliminate unwanted sources of variability such as batch
to batch, day-to-day, shift to shift, etc. and arrange similar experimental runs into
blocks (or groups). Generally, a block is a set of relatively homogeneous
experimental conditions.

The blocks can be batches of raw materials, different operators, different
vendors, etc.;

Observations collected under the same experimental conditions (i.e. same day,
same shift, etc.) are said to be in the same block;

Variability between blocks must be eliminated from the experimental error, which
leads to an increase in the precision of the experiment.
FACTORIAL DESIGNS:
Full factorial design:

Two or more independent variables are manipulated in a single experiment.
They are referred to as factors.

Levels: These are various ways the independent variable is changed.

The major purpose of the research is to explore their effect jointly.

Factorial designs produce efficient experiments: each observation supplies
information about all of the factors (all possible combinations).
\(2^2\) Factorial Design:
Two factors, each at two levels (the \(2^k\) design with k = 2):
Example: Workstation Design.

Factor 1: Memory size

Factor 2: Cache size

Dependent variable: Performance.
Cache size:    Memory size:
               4M byte    4M byte
1K             15         45
2K             25         75
Combination and Interaction in a \(2^2\) Experiment:
[Figure: Interaction in a \(2^2\) experiment.]
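As a hedged illustration of how the main effects and the interaction can be read off the four responses in the workstation table, the sketch below uses the common sign-table convention: each factor is coded -1 at its first level and +1 at its second level, and the response is modelled as y = q0 + qA xA + qB xB + qAB xA xB. This coding and the symbol names are assumptions introduced here; they are not stated in the notes.

```python
# Performance by (memory level, cache level); -1 = first level, +1 = second level.
# Values are the four cells of the workstation table: 15, 45 (cache 1K) and 25, 75 (cache 2K).
responses = {
    (-1, -1): 15,
    (+1, -1): 45,
    (-1, +1): 25,
    (+1, +1): 75,
}


def effects_2x2(y):
    """Return (mean, memory effect, cache effect, interaction) for a 2^2 design."""
    n = len(y)
    q0 = sum(y.values()) / n                               # overall mean
    q_a = sum(a * v for (a, b), v in y.items()) / n        # effect of factor A (memory)
    q_b = sum(b * v for (a, b), v in y.items()) / n        # effect of factor B (cache)
    q_ab = sum(a * b * v for (a, b), v in y.items()) / n   # A x B interaction
    return q0, q_a, q_b, q_ab


q0, q_memory, q_cache, q_interaction = effects_2x2(responses)
print(f"mean = {q0}, memory = {q_memory}, cache = {q_cache}, interaction = {q_interaction}")
```

A non-zero interaction term means the effect of memory size on performance is not the same at both cache sizes, which is the definition of interaction given under the common mistakes above.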
\(2^k\) Factorial Design:
k factors, each at two levels:

Example: \(2^3\) design: In designing a personal workstation, the three factors
that need to be studied are: Cache size, Memory size and Number of processors.
Factors:            Level 1    Level 2
Cache size          1K         2K
Memory size         154Mb      458Mb
No. of processor    1          2
Combination and Interaction in a \(2^3\) Experiment:
[Figure: Interaction in a \(2^3\) experiment.]
SIMPLE LINEAR REGRESSION.
Correlation vs. Regression:
A scatter diagram can be used to show the relationship between two variables.
Correlation analysis is used to measure strength of the association (linear relationship)
between two variables.

Correlation is only concerned with strength of the relationship;

No causal effect is implied with correlation.
INTRODUCTION TO REGRESSION ANALYSIS:
Regression analysis is used to:

Predict the value of a dependent variable based on the value of at least one
independent variable;

Explain the impact of changes in an independent variable on the dependent
variable.
Dependent variable:

The variable we wish to predict or explain.
Independent variable:

The variable used to explain the dependent variable.
SIMPLE LINEAR REGRESSION MODEL:

Only one independent variable, X;

Relationship between X and Y is described by a linear function;

Changes in Y are assumed to be caused by changes in X.
Types of Relationships:
L7-Linear Regression
Lecture Notes by Dr. Mahabi
Compiled by Ibrahim Nyirenda, 2022-1
Simple Linear Regression Model: \( Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \)
Simple Linear Regression Equation (Prediction Line): \( \hat{Y}_i = b_0 + b_1 X_i \)

The simple linear regression equation provides an estimate of the population
regression line. The individual random error terms ei have a mean of zero.
Least Squares Method:

b0 and b1 are obtained by finding the values of b0 and b1 that minimize the
sum of the squared differences between \(Y\) and \(\hat{Y}\):

\[ \min \sum (Y_i - \hat{Y}_i)^2 = \min \sum (Y_i - (b_0 + b_1 X_i))^2 \]
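Least Squares Method:
Model: \( Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \)
Estimates: \( \hat{Y}_i = b_0 + b_1 X_i \)
Deviation: \( e_i = Y_i - \hat{Y}_i \)
SSE: \( \sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2 \)
Formulas for the Least Squares Estimates:
\[ b_1 = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}, \qquad b_0 = \bar{y} - b_1 \bar{x} \]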
Interpretation of the Slope and the Intercept:

b0 is the estimated average value of Y when the value of X is zero;

b1 is the estimated change in the average value of Y as a result of a one-unit
change in X.
Example: Simple Linear Regression.
A real estate agent wishes to examine the relationship between the selling price of a
home and its size (measured in square feet). A random sample of 10 houses is
selected:

Dependent variable (Y) = House price in $1000s

Independent variable (X) = Square feet
Sample Data for House Price Model:
S/N    House Price in $1000s (Y)    Square Feet (X)    (Xi Yi)    (Xi^2)
01     245                          1400               343000     1960000
02     312                          1600               499200     2560000
03     279                          1700               474300     2890000
04     308                          1875               577500     3515625
05     199                          1100               218900     1210000
06     219                          1550               339450     2402500
07     405                          2350               951750     5522500
08     324                          2450               793800     6002500
09     319                          1425               454575     2030625
10     255                          1700               433500     2890000
T      2865                         17150              5085975    30983750
Graphical Presentation:
House price model: Scatter Plot.
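Least Squares Method:

Slope:
\[ b_1 = \hat{\beta}_1 = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \frac{5085975 - \frac{2865 \times 17150}{10}}{30983750 - \frac{17150 \times 17150}{10}} = 0.10977 \]

Y intercept:
\[ b_0 = \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{\sum y_i}{n} - \hat{\beta}_1 \frac{\sum x_i}{n} = \frac{2865}{10} - 0.109768 \times \frac{17150}{10} \approx 98.248 \]
(98.24833 when the unrounded slope is used.)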
Graphical Presentation:

House price model: Scatter Plot and Regression Line.
Interpretation of the Intercept - b0:

House price = 98.24833 + 0.10977 (square feet)
Whereby: b0 is the estimated average value of Y when the value of X is zero (if X = 0 is
in the range of observed X values).

Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for
houses within the range of sizes observed, $98,248.33 is the portion of the house
price not explained by square feet.
Interpretation of the Slope Coefficient - b1:

House price = 98.24833 + 0.10977 (square feet)
Whereby: b1 measures the estimated change in the average value of Y as a result of a
one-unit change in X.

Here, b1 = 0.10977 tells us that the average value of a house increases by
0.10977 ($1000) = $109.77 for each additional one square foot of size.
Predictions using Regression Analysis:

Predict the price for a house with 2000 square feet:
House price = 98.25 + 0.1098 (sq.ft.)
= 98.25 + 0.1098 (2000)
= 317.85
The predicted price for a house with 2000 square feet is 317.85($1,000s) = $317,850.
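The least squares computation and the prediction can be reproduced with a short Python sketch using the ten houses from the sample data table. The function name least_squares_fit is a hypothetical helper; note that the code keeps the coefficients unrounded, so its prediction differs slightly from the hand calculation above, which uses 98.25 and 0.1098.

```python
# Sample data for the house price model (Y in $1000s, X in square feet).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def least_squares_fit(x, y):
    """Return (b0, b1) using the computational formulas for the slope and intercept."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares_fit(sqft, price)
predicted_2000 = b0 + b1 * 2000  # stays inside the observed range of X (1100 to 2450)

print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")
print(f"Predicted price for 2000 sq.ft. = {predicted_2000:.2f} ($1000s)")
```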
Interpolation vs. Extrapolation:

When using a regression model for prediction, only predict within the relevant
range of data.
Measures of Variation:
Total variation is made up of two parts:
SST = SSR + SSE
(Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares)
\[ SST = \sum (Y_i - \bar{Y})^2 \]
\[ SSR = \sum (\hat{Y}_i - \bar{Y})^2 \]
\[ SSE = \sum (Y_i - \hat{Y}_i)^2 \]
Whereby:
\(\bar{Y}\): Average value of the dependent variable
\(Y_i\): Observed values of the dependent variable
\(\hat{Y}_i\): Predicted value of Y for the given Xi value
SST = Total sum of squares:

Measures the variation of the Yi values around their mean Y
SSR = Regression sum of squares:

Explained variation attributable to the relationship between X and Y
SSE = Error sum of squares:

Variation attributable to factors other than the relationship between X and Y
Coefficient of Determination - r2:

The coefficient of determination is the portion of the total variation in the
dependent variable that is explained by variation in the independent variable. The
coefficient of determination is also called r-squared and is denoted as r2.

Therefore: \( r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}} \)

Whereby: \( 0 \le r^2 \le 1 \).
Examples of Approximate r2 Values:
r2 = 1:

Perfect linear relationship between
X and Y

100% of the variation in Y is
explained by variation in X
0 < r2 < 1:

Weaker linear relationships between X and Y

Some but not all of the variation in
Y is explained by variation in X
r2 = 0:

No linear relationship between X and Y

The value of Y does not depend on X. (None of the variation in Y is explained by
variation in X)
Standard Error of Estimate:
The standard deviation of the variation of observations around the regression line is
estimated by:
\[ S_{YX} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}} \]
Whereby:
SSE: Error sum of squares.
n: Sample size.
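For the house price data, the measures of variation, r-squared and the standard error of the estimate can be computed as in the sketch below. It refits the line from the sample data table so that it is self-contained; the function name variation_measures is a hypothetical helper.

```python
import math

# Sample data for the house price model (Y in $1000s, X in square feet).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def variation_measures(x, y):
    """Return SST, SSR, SSE, r^2 and S_YX for a simple linear regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    sst = sum((yi - y_bar) ** 2 for yi in y)                  # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)             # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))    # unexplained variation
    r_squared = ssr / sst
    s_yx = math.sqrt(sse / (n - 2))                           # standard error of estimate
    return {"SST": sst, "SSR": ssr, "SSE": sse, "r2": r_squared, "S_YX": s_yx}


measures = variation_measures(sqft, price)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```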
Comparing Standard Errors:
SYX is a measure of the variation of observed Y values from the regression line.
The magnitude of SYX should always be judged relative to the size of the Y values in the
sample data, e.g. SYX = $41.33K is moderately small relative to house prices in the $200K - $300K range.
Assumptions of Regression:
Use the acronym LINE:

Linearity:
The underlying relationship between X and Y is linear.

Independence of Errors:
Error values are statistically independent.

Normality of Error:
Error values (ε) are normally distributed for any given value of X.

Equal Variance (Homoscedasticity):
The probability distribution of the errors has constant variance.
Residual Analysis:

The residual for observation i, ei, is the difference between its observed and
predicted value.
ei  Yi  Yˆi

Check the assumptions of regression by examining the residuals:
- Examine for linearity assumption
- Evaluate independence assumption
- Evaluate normal distribution assumption
- Examine for constant variance for all levels of X (Homoscedasticity)

Graphical Analysis of Residuals:

Can plot residuals vs. X
Residual Analysis for Linearity:
Are the data points roughly linear, or are they curved or skewed in some way?
Residual Analysis for Linearity.
Residual Analysis for Independence:
Is there any pattern in the residuals? If yes, the errors are correlated (not independent).
Residual Analysis for Normality:
A normal probability plot of the residuals can be used to check for normality:
Do the residual points fall more or less on a straight line in the normal probability plot?
Residual Analysis for Equal Variance:
Is there any pattern in the spread of the residuals (high/low)? If yes, there is heteroscedasticity.
Are the residuals distributed evenly and consistently around the x-axis? If yes, there is homoscedasticity.
Residual Output:
Does not appear to violate any regression assumptions.
Inferences About the Slope:
The standard error of the regression slope coefficient (b1) is estimated by:
\[ S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}} \]
Whereby:
\(S_{b_1}\): Estimate of the standard error of the least squares slope
\( S_{YX} = \sqrt{\frac{SSE}{n - 2}} \): Standard error of the estimate
Comparing Standard Errors of the Slope:
\(S_{b_1}\): Is a measure of the variation in the slope of regression lines from different possible
samples.
Inference about the Slope - t Test:
t test for a population slope:

Is there a linear relationship between X and Y?
Null and alternative hypotheses:

H0: β1 = 0
(no linear relationship)

H1: β1 ≠ 0
(linear relationship does exist)
Test statistic:

\( t = \frac{b_1 - \beta_1}{S_{b_1}} \); d.f. = n - 2
Whereby:
b1: Regression slope coefficient
β1: Hypothesized slope
Sb1: Standard error of the slope
Inferences about the Slope - t Test example:
H0: β1 = 0
H1: β1 ≠ 0
\[ t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938 \]

Test Statistic: t = 3.329
Decision: Reject H0
Conclusion: There is sufficient evidence that square footage affects house price.

From output: P-value = 0.01039
Decision: P-value < α so Reject H0
Conclusion: There is sufficient evidence that square footage affects house price.
F Test for Significance:

F Test statistic: \( F = \frac{MSR}{MSE} \); \( MSR = \frac{SSR}{k} \) and \( MSE = \frac{SSE}{n - k - 1} \)
Whereby:

F follows an F distribution with k numerator and (n – k - 1) denominator
degrees of freedom.
(k = the number of independent variables in the regression model).
Confidence Interval Estimate for the Slope:
\( b_1 \pm t_{n-2} S_{b_1} \); d.f. = n - 2
At 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858).
Since the units of the house price variable are $1000s, we are 95% confident that the
average impact on sales price is between $33.70 and $185.80 per square foot of house
size.
This 95% confidence interval does not include 0.
Conclusion: There is a significant relationship between house price and square feet at
the .05 level of significance.
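The t test and the confidence interval for the slope can be reproduced with the sketch below. The critical value 2.306 (two-tailed, α = 0.05, 8 degrees of freedom) is hard-coded from a standard t table and is an assumption of this sketch, since the notes quote only the resulting interval; the function name slope_inference is a hypothetical helper.

```python
import math

# Sample data for the house price model (Y in $1000s, X in square feet).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

T_CRITICAL_8_DF = 2.306  # assumed table value: t for alpha/2 = 0.025 with n - 2 = 8 d.f.


def slope_inference(x, y, t_critical, beta1_null=0.0):
    """Return the slope, its standard error, the t statistic and the confidence interval."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))      # standard error of the estimate
    s_b1 = s_yx / math.sqrt(ssx)         # standard error of the slope
    t_stat = (b1 - beta1_null) / s_b1
    half_width = t_critical * s_b1
    return b1, s_b1, t_stat, (b1 - half_width, b1 + half_width)


b1, s_b1, t_stat, (ci_low, ci_high) = slope_inference(sqft, price, T_CRITICAL_8_DF)
reject_h0 = abs(t_stat) > T_CRITICAL_8_DF

print(f"b1 = {b1:.5f}, S_b1 = {s_b1:.5f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
print(f"95% CI for the slope: ({ci_low:.4f}, {ci_high:.4f})")
```

Rejecting H0 with the t test and finding that the interval excludes 0 are two views of the same decision at the .05 level.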
t Test for a Correlation Coefficient:

Hypotheses:
H0: ρ = 0 (no correlation between X and Y)
HA: ρ ≠ 0 (correlation exists)

Test statistic:
\[ t = \frac{r - \rho}{\sqrt{\frac{1 - r^2}{n - 2}}} \]
with (n – 2) degrees of freedom.
Whereby:
\( r = +\sqrt{r^2} \) if \( b_1 > 0 \)
\( r = -\sqrt{r^2} \) if \( b_1 < 0 \)
Example - House Prices:

Is there evidence of a linear relationship between square feet and house price at
the .05 level of significance?
H0: ρ = 0 (No correlation)
H1: ρ ≠ 0 (correlation exists)
α = .05, df = 10 - 2 = 8
Solution:
\[ t = \frac{r - \rho}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{.762 - 0}{\sqrt{\frac{1 - .762^2}{10 - 2}}} = 3.329 \]
Estimating Mean Values and Predicting Individual Values:
Goal: Form intervals around Y to express uncertainty about the value of Y for a given Xi
Confidence Interval for the Average Y, Given X:
Confidence interval estimate for the mean value of Y given a particular Xi
Confidence interval for \( \mu_{Y|X=X_i} \):
\[ \hat{Y} \pm t_{n-2} S_{YX} \sqrt{h_i} \]
\[ h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX} = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2} \]
Whereby: through the term \( (X_i - \bar{X})^2 \), the size of the interval varies according to the distance of \(X_i\) from the mean, \(\bar{X}\).
Prediction Interval for an Individual Y, Given X:
Prediction interval estimate for an individual value of Y given a particular Xi.
Example - Estimation of Mean Values:
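Confidence Interval Estimate for \( \mu_{Y|X=X_i} \): Find the 95% confidence interval for the mean
price of 2,000 square-foot houses.
Predicted Price Ŷi = 317.85 ($1,000s).
\[ \hat{Y} \pm t_{n-2} S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}} = 317.85 \pm 37.12 \]
The confidence interval endpoints are 280.66 and 354.90, or from $280,660 to $354,900.
(These endpoints correspond to the unrounded prediction 317.78; with the rounded value 317.85 they would be 280.73 and 354.97.)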
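A minimal Python sketch of this interval follows. As before, the critical value 2.306 (8 degrees of freedom, 95% confidence) is an assumed t-table value that the notes do not state explicitly, and mean_response_interval is a hypothetical helper name. The code works with unrounded coefficients throughout.

```python
import math

# Sample data for the house price model (Y in $1000s, X in square feet).
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

T_CRITICAL_8_DF = 2.306  # assumed table value: t for alpha/2 = 0.025 with n - 2 = 8 d.f.


def mean_response_interval(x, y, x_new, t_critical):
    """Confidence interval for the mean of Y at X = x_new: Y_hat +/- t * S_YX * sqrt(h)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))
    h = 1.0 / n + (x_new - x_bar) ** 2 / ssx   # grows as x_new moves away from the mean
    y_hat = b0 + b1 * x_new
    half_width = t_critical * s_yx * math.sqrt(h)
    return y_hat, half_width, (y_hat - half_width, y_hat + half_width)


y_hat, half_width, (lower, upper) = mean_response_interval(sqft, price, 2000, T_CRITICAL_8_DF)
print(f"Y_hat = {y_hat:.2f}, half-width = {half_width:.2f}")
print(f"95% CI for the mean price: ({lower:.2f}, {upper:.2f}) in $1000s")
```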
Pitfalls of Regression Analysis:

Lacking an awareness of the assumptions underlying least-squares regression

Not knowing how to evaluate the assumptions

Not knowing the alternatives to least-squares regression if a particular
assumption is violated

Using a regression model without knowledge of the subject matter

Extrapolating outside the relevant range
Strategies for Avoiding the Pitfalls of Regression:

Start with a scatter diagram of X vs. Y to observe possible relationship

Perform residual analysis to check the assumptions:
- Plot the residuals vs. X to check for violations of assumptions such as
Homoscedasticity
- Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal
probability plot of the residuals to uncover possible non-normality

If there is violation of any assumption, use alternative methods or models

If there is no evidence of assumption violation, then test for the significance of the
regression coefficients and construct confidence intervals and prediction intervals

Avoid making predictions or forecasts outside the relevant range