MODULE IV
LINEAR REGRESSION, CORRELATION, AND HYPOTHESIS TESTING
4.1 SIMPLE LINEAR REGRESSION
The simplest and most widely used form of regression involves a linear
relationship between two variables. Linear regression is an approach for modelling the
relationship between a scalar dependent variable y and one or more explanatory
variables denoted X. The case of one explanatory variable is called simple linear regression.
The objective in linear regression is to obtain the equation of a straight line that minimizes the
sum of the squared deviations between the data points and the line defined by the coefficients. This
least-squares line has the equation:
Yt = a + bX
Where:
Yt = Predicted or dependent variable
X = Predictor or independent variable
b = Slope of the line
a = Value of Yt when X = 0
(Note that the predicted variable is plotted on the Y-axis and the predictor
variable on the X-axis.)
The coefficients a and b of the line are computed using these two
equations:

$$b = \frac{n(\Sigma XY) - (\Sigma X)(\Sigma Y)}{n(\Sigma X^2) - (\Sigma X)^2}$$

$$a = \frac{\Sigma Y - b\,\Sigma X}{n} \qquad \text{or} \qquad a = \bar{Y} - b\bar{X}$$
ISHRM 2014
Learning Material
Statistics
MT 312/MT 322/TM Module IV
Illustrative example:
Milcas Hamburger has a chain of 10 stores throughout Cavite. Sales and profit
figures for the stores are given in the following table. Obtain a regression line
for the data, and predict the profit for a store with sales of ₱30 million.

Sales, X (millions)   Profits, Y (millions)
15                    8
17                    9
21                    13
18                    10
19                    11
22                    14
16                    8.5
17                    10
25                    15
20                    13
Solution:

Sales, X   Profits, Y   XY      X²
15         8            120     225
17         9            153     289
21         13           273     441
18         10           180     324
19         11           209     361
22         14           308     484
16         8.5          136     256
17         10           170     289
25         15           375     625
20         13           260     400
ΣX = 190   ΣY = 111.5   ΣXY = 2,184   ΣX² = 3,694
$$b = \frac{n(\Sigma XY) - (\Sigma X)(\Sigma Y)}{n(\Sigma X^2) - (\Sigma X)^2} = \frac{10(2{,}184) - (190)(111.5)}{10(3{,}694) - (190)^2} = \frac{21{,}840 - 21{,}185}{36{,}940 - 36{,}100} = \frac{655}{840}$$
b = 0.78
$$a = \bar{Y} - b\bar{X}$$
Where:
$$\bar{Y} = \frac{\Sigma Y}{n} = \frac{111.5}{10} = 11.15, \qquad \bar{X} = \frac{\Sigma X}{n} = \frac{190}{10} = 19$$
a = 11.15 − 0.78(19)
a = −3.67
When X = ₱30 million:
Yt = a + bX
Y30 = −3.67 + 0.78(30)
Y30 = 19.73 million
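The hand computation above can be cross-checked with a short Python sketch. The function name `least_squares_line` is our own (it is not from any library); it simply applies the two coefficient formulas of Section 4.1 to the table values. Because it keeps b unrounded (b ≈ 0.7798, a ≈ −3.665), its intermediate values differ slightly from the hand solution, but they agree to two decimals.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line Yt = a + bX,
    using the summation formulas of Section 4.1."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Milcas Hamburger data from the illustrative example (millions of pesos)
sales = [15, 17, 21, 18, 19, 22, 16, 17, 25, 20]
profits = [8, 9, 13, 10, 11, 14, 8.5, 10, 15, 13]

a, b = least_squares_line(sales, profits)
predicted_profit = a + b * 30  # predicted profit for sales of 30 million
print(f"a = {a:.2f}, b = {b:.2f}, Y30 = {predicted_profit:.2f}")
# prints: a = -3.67, b = 0.78, Y30 = 19.73
```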
4.2 LINEAR CORRELATION
Correlation is a statistical tool used to measure the linear relationship between two
random variables X and Y.
There are three degrees of relationship or correlation between two variables:
1. Perfect correlation (positive and negative)
2. Some degree of correlation (positive and negative)
3. No correlation
The following ranges of values are used for the qualitative interpretation of the degree of
linear relationship:

Value of r         Interpretation
±1.00              Perfect positive (negative) correlation
±0.91 to ±0.99     Very high positive (negative) correlation
±0.71 to ±0.90     High positive (negative) correlation
±0.51 to ±0.70     Moderate positive (negative) correlation
±0.31 to ±0.50     Low positive (negative) correlation
±0.01 to ±0.30     Negligible positive (negative) correlation
0.00               No correlation
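To make the interpretation table mechanical, its ranges can be encoded as a small lookup. This is a sketch only: `interpret_r` is a hypothetical helper, and it assumes r is first rounded to two decimals so that every value falls into exactly one row of the table.

```python
def interpret_r(r):
    """Return the qualitative label for r according to the interpretation table.
    r is rounded to two decimals first so each value falls into one row."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = round(abs(r), 2)
    if size == 0:
        return "No correlation"
    direction = "positive" if r > 0 else "negative"
    if size == 1:
        strength = "Perfect"
    elif size >= 0.91:
        strength = "Very high"
    elif size >= 0.71:
        strength = "High"
    elif size >= 0.51:
        strength = "Moderate"
    elif size >= 0.31:
        strength = "Low"
    else:
        strength = "Negligible"
    return f"{strength} {direction} correlation"


print(interpret_r(0.93))  # Very high positive correlation
```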
A scatterpoint diagram is used to illustrate the relationship between two variables.
The following are scatterpoint diagrams for different types of correlation.
[Figure: scatterpoint diagrams for perfect positive correlation, some degree of positive correlation, perfect negative correlation, some degree of negative correlation, and no correlation]
4.3 THE PEARSON PRODUCT-MOMENT CORRELATION COEFFICIENT
The Pearson Product-Moment Correlation Coefficient, denoted by r, is the most
widely used formula for correlation.
Formula:
$$r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}$$
where:
X = the observed data for the independent variable
Y = the observed data for the dependent variable
N = sample size
r = degree of relationship between X and Y
Illustrative example:
A sample of 20 students was selected and their heights (in centimeters) and
weights (in kilograms) were measured as shown below. Find out whether a relationship
exists between them.
Student   X     Y    XY       X²       Y²
1         170   72   12,240   28,900   5,184
2         172   70   12,040   29,584   4,900
3         158   60   9,480    24,964   3,600
4         165   73   12,045   27,225   5,329
5         180   85   15,300   32,400   7,225
6         195   98   19,110   38,025   9,604
7         183   78   14,274   33,489   6,084
8         175   76   13,300   30,625   5,776
9         182   82   14,924   33,124   6,724
10        190   90   17,100   36,100   8,100
11        165   75   12,375   27,225   5,625
12        184   80   14,720   33,856   6,400
13        194   90   17,460   37,636   8,100
14        176   78   13,728   30,976   6,084
15        182   84   15,288   33,124   7,056
16        178   80   14,240   31,684   6,400
17        180   82   14,760   32,400   6,724
18        168   76   12,768   28,224   5,776
19        157   64   10,048   24,649   4,096
20        185   85   15,725   34,225   7,225
ΣX = 3,539   ΣY = 1,578   ΣXY = 280,925   ΣX² = 628,435   ΣY² = 126,012
$$r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}$$
$$= \frac{20(280{,}925) - (3{,}539)(1{,}578)}{\sqrt{[20(628{,}435) - (3{,}539)^2][20(126{,}012) - (1{,}578)^2]}}$$
$$= \frac{5{,}618{,}500 - 5{,}584{,}542}{\sqrt{(12{,}568{,}700 - 12{,}524{,}521)(2{,}520{,}240 - 2{,}490{,}084)}}$$
$$= \frac{33{,}958}{\sqrt{(44{,}179)(30{,}156)}} = \frac{33{,}958}{\sqrt{1{,}332{,}261{,}924}} = \frac{33{,}958}{36{,}500.16}$$
r = 0.93
From the table of qualitative interpretation, r falls in the range
0.91 to 0.99. This shows that the heights and weights of the 20 students have a
very high positive correlation.
As seatwork, students are advised to construct the scatterpoint diagram of
the above data.
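The raw-score formula can be checked with a few lines of Python. `pearson_r` is a hypothetical helper written for this illustration; it uses only the sums that appear in the table above. (Python 3.10 and later also provide `statistics.correlation`, which should give the same value.)

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment r using the raw-score (summation) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


heights = [170, 172, 158, 165, 180, 195, 183, 175, 182, 190,
           165, 184, 194, 176, 182, 178, 180, 168, 157, 185]
weights = [72, 70, 60, 73, 85, 98, 78, 76, 82, 90,
           75, 80, 90, 78, 84, 80, 82, 76, 64, 85]

r = pearson_r(heights, weights)
print(round(r, 2))  # 0.93
```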
4.4 THE SPEARMAN’S COEFFICIENT OF CORRELATION
Calculating the Coefficient of Correlation using Spearman’s Coefficient
of Correlation R
Formula:
$$R = 1 - \frac{6(\Sigma G)}{N^2 - 1}$$
Where:
R = coefficient of correlation by Spearman’s formula
1 = constant
6 = constant
ΣG = the sum of column G
N = number of pairs of scores or measures
Illustrative example:
Twelve randomly selected BSHRM students were given tests in Accounting (X) and
Economics (Y). Use Spearman’s Coefficient of Correlation to determine the extent
of the relationship between these subjects. The scores obtained are shown in the table below.
Student   Accounting, X   Economics, Y   RX    RY    G
1         21              27             8     6     2
2         22              28             6.5   3.5   3
3         28              27             2.5   6     -
4         27              10             4.5   12    -
5         48              30             1     1.5   -
6         22              21             6.5   9.5   -
7         27              27             4.5   6     -
8         6               21             12    9.5   2.5
9         11              21             11    9.5   1.5
10        12              28             10    3.5   6.5
11        16              30             9     1.5   7.5
12        28              21             2.5   9.5   -
                                                     ΣG = 23
(A dash in column G marks a negative difference, which is not counted.)
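Step 1. Write the scores of the 12 students under the Accounting (X) and Economics
(Y) columns.
Step 2. Rank the scores in column X, with the highest score as rank 1 and the
lowest score as rank 12. Tied scores receive the average of the ranks they occupy
(for example, the two scores of 28 share ranks 2 and 3, so each gets 2.5). Write the ranks under column RX.
Step 3. Rank the scores in column Y in the same way, with the highest score as rank 1 and the
lowest as rank 12 (the two scores of 30 share ranks 1 and 2, so each gets 1.5). Write the ranks under column RY.
Step 4. Subtract the RY values from the RX values.
Step 5. Write the difference under column G (Gain). Consider only the positive
differences.
Step 6. Get ΣG and solve for R.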
$$R = 1 - \frac{6(\Sigma G)}{N^2 - 1} = 1 - \frac{6(23)}{12^2 - 1} = 1 - \frac{138}{144 - 1} = 1 - \frac{138}{143}$$
R = 1 − 0.965
R = 0.035
Step 7. Write your observation.
Since the value of R = 0.035 is very low, it indicates an almost negligible
relationship between the two sets of data.
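The seven steps can be sketched in Python as follows; the helper names are our own. `average_ranks` gives tied scores the average of the positions they occupy, which is how the ranks 2.5, 6.5, 1.5 and 9.5 in the table arise. Note that this formula, which uses only the positive rank gains G, appears to be the variant sometimes called Spearman's footrule; statistical software such as `scipy.stats.spearmanr` instead reports Spearman's rank-order coefficient, so its result for the same data will generally differ.

```python
def average_ranks(scores):
    """Rank with the highest score as 1; tied scores share the average of their positions."""
    ordered = sorted(scores, reverse=True)
    ranks = []
    for s in scores:
        first = ordered.index(s) + 1           # first position this score occupies
        count = ordered.count(s)               # number of tied scores
        ranks.append(first + (count - 1) / 2)  # average of positions first .. first+count-1
    return ranks


def spearman_gain_r(xs, ys):
    """Module formula R = 1 - 6*sum(G) / (N^2 - 1), where G keeps only positive RX - RY."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_g = sum(d for d in (a - b for a, b in zip(rx, ry)) if d > 0)
    n = len(xs)
    return 1 - 6 * sum_g / (n ** 2 - 1), sum_g


accounting = [21, 22, 28, 27, 48, 22, 27, 6, 11, 12, 16, 28]
economics = [27, 28, 27, 10, 30, 21, 27, 21, 21, 28, 30, 21]

R, sum_g = spearman_gain_r(accounting, economics)
print(sum_g, round(R, 3))  # 23.0 0.035
```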
4.5 HYPOTHESIS TESTING
A hypothesis is simply a statement that something is true. It is a tentative
explanation, claim, or assertion about people, objects, or events.
Examples are:
1. There is no significant relationship between the mathematics attitude
and competency levels of BSHRM students.
2. The percentage of shoppers who buy their favourite toothpaste
regardless of price is not 25%.
3. The mean monthly allowance of randomly selected students of ISHRM
is at least ₱5,000.
4. 90% of the government employees filed their income tax returns on
time.
5. The proportion of consumers who purchased Champion powder soap
before the television advertising campaign and the proportion who
purchased it after the campaign are not equal.
Hypothesis testing is a decision-making procedure, based on sample evidence
and probability theory, used to determine whether or not a hypothesis should be
rejected.
4.6 TYPES OF HYPOTHESIS
1. Null Hypothesis, denoted by Ho, is the hypothesis of “no difference”. It is
expressed as:
Ho: µ = k
Ho: µ ≤ k
Ho: µ ≥ k
2. Alternative Hypothesis, denoted by H1, is the hypothesis to be accepted in
case Ho is rejected (not true). It is expressed in three ways:
H1: µ ≠ k
H1: µ > k
H1: µ < k
4.7 TYPES OF ERRORS
In the course of hypothesis testing, there is always the possibility of committing
an error in deciding whether or not to reject a hypothesis.
A Type I error occurs if one rejects the null hypothesis when in fact the null
hypothesis is true. Its probability is denoted by alpha (α). In hypothesis testing, the area
under the normal curve that makes up the critical region is called the alpha region.
A Type II error occurs if one does not reject the null hypothesis when in fact it is
false. Its probability is denoted by beta (β). The area under the normal curve that shows
the acceptance region is called the beta region.
4.8 LEVEL OF SIGNIFICANCE
Level of significance, or significance level, refers to a criterion of judgment
upon which a decision is made regarding the value stated in a null hypothesis.
The criterion is based on the probability of obtaining a statistic measured in a
sample if the value stated in the null hypothesis were true.
The significance level (α) of a hypothesis test is defined as the probability of
committing a Type I error, that is, rejecting the null hypothesis when it is true.
The choice of level of significance depends on the researcher. The commonly
used significance levels are .01 (1%), .05 (5%), and .10 (10%). The table of
Critical Values for the Student’s t Distribution is used for reference.
Using the 0.05 level of significance in testing a hypothesis means that the
researcher accepts a 5% probability of rejecting the null hypothesis when it is
actually true; equivalently, if Ho is true, there is a 95% probability of correctly not rejecting it.
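The statement that α is the probability of a Type I error can be checked by simulation. The sketch below is only an illustration with made-up settings (a normal population with mean 50 and standard deviation 10, samples of 25, a two-tailed z-test with critical value 1.96 for α = 0.05). Because every sample is drawn from a population where Ho is actually true, each rejection is a Type I error, and the long-run rejection rate should come out close to 0.05.

```python
import random
import statistics


def simulated_type_i_rate(mu0=50.0, sigma=10.0, n=25, z_crit=1.96,
                          trials=20_000, seed=1):
    """Estimate how often a two-tailed z-test rejects a TRUE null hypothesis.
    All parameter defaults are illustrative assumptions, not from the module."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (statistics.fmean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:    # falls in either rejection region
            rejections += 1
    return rejections / trials


print(simulated_type_i_rate())  # a value close to 0.05
```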
4.9 TYPES OF TEST
A one-tailed test rejects the null hypothesis when the test value
(statistic) falls in the critical region on one side of the mean. It may be either a right-tailed
test or a left-tailed test, depending on the direction of the inequality in the alternative
hypothesis. It is also called a directional (hypothesis) test.
A two-tailed test is used when the alternative hypothesis contains a “not equal” (≠)
sign. It is a non-directional (hypothesis) test.
Relationship between Signs in Hypothesis and the Tails of the Test

                   Two-tailed test   Left-tailed test           Right-tailed test
Signs in the Ho    Ho: µ = k         Ho: µ = k or Ho: µ ≥ k     Ho: µ = k or Ho: µ ≤ k
Signs in the H1    H1: µ ≠ k         H1: µ < k                  H1: µ > k
Rejection region   In both tails     In the left tail           In the right tail

where k represents a specified number
Common Phrases in Hypothesis Testing
=   Is equal to; Is the same as; Is exactly the same as
≠   Is not equal to; Is not the same as; Is different from
>   Is increased; Is greater than; Is higher than
≥   Is at least; Is not less than; Is greater than or equal to
<   Is decreased; Is less than; Is lower than
≤   Is at most; Is not more than; Is less than or equal to
4.10 CRITICAL REGION OR REJECTION REGION
The critical region is the set of values of the test statistic for which the null
hypothesis is rejected in hypothesis testing. That is, if the computed value of the test
statistic falls within the rejection region, reject the null hypothesis and accept the
alternative hypothesis.
Critical values for one- and two-tailed tests at commonly used levels of significance

Level of Significance   One-tailed test   Two-tailed test
0.05                    ±1.645            ±1.96
0.01                    ±2.33             ±2.58
0.001                   ±3.09             ±3.29
There are two rejection regions, indicated by the shaded portions at both ends
of the curve. The unshaded portion at the center is the acceptance region. The two
lines that separate the rejection and acceptance regions mark the critical values,
which are expressed as standard z values. If the level of significance is α = 0.05 in
a two-tailed test, each rejection region has an area of 0.05/2 = 0.025, so the critical
values are z = ±1.96.
In a right-tailed (one-tailed) test, the rejection region is on the right side of the curve.
In a left-tailed test, where the critical z value is negative, the rejection region is
on the left side of the curve.
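The critical values in the table are quantiles of the standard normal distribution, so they can be recomputed rather than looked up. This sketch uses `statistics.NormalDist` from Python's standard library; the function `critical_z` and its `tails` labels are our own. For a two-tailed test it splits α between the two tails, exactly as described above.

```python
from statistics import NormalDist


def critical_z(alpha, tails):
    """Critical z value(s) at significance level alpha.
    tails is "two", "left", or "right" (labels chosen for this sketch)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tails == "two":
        c = std_normal.inv_cdf(1 - alpha / 2)  # alpha is split between both tails
        return (-c, c)
    if tails == "right":
        return std_normal.inv_cdf(1 - alpha)
    if tails == "left":
        return std_normal.inv_cdf(alpha)       # a negative value
    raise ValueError('tails must be "two", "left", or "right"')


for alpha in (0.05, 0.01, 0.001):
    one = critical_z(alpha, "right")
    _, two = critical_z(alpha, "two")
    print(f"alpha = {alpha}: one-tailed ±{one:.3f}, two-tailed ±{two:.3f}")
```

The printed values are 1.645 and 1.960, 2.326 and 2.576, and 3.090 and 3.291; the table's 2.33 and 2.58 are the two-decimal roundings.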
4.11 THE Z TEST
The z-test is used to determine whether the obtained sample mean is different from the
expected population mean.
(1) If the population standard deviation (σ) is given, the formula is:
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}, \qquad df = \infty$$
where:
x̄ = the sample mean
µ = the population mean
σ = the population standard deviation
n = the number of cases in the sample
(2) If the population standard deviation is unknown, the sample standard deviation s is used in its place:
$$Z = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \qquad df = n - 1$$
(The degrees of freedom (df) are the number of values that are free to vary.)
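Both formulas share the same shape, so one small function covers them: pass σ when it is known and s otherwise. `one_sample_z` is a hypothetical helper for illustration only; the three calls use the given values of Examples A, B and D in Section 4.13.

```python
import math


def one_sample_z(sample_mean, hypothesized_mean, sd, n):
    """Z = (x̄ - µ) / (sd / √n). Pass σ if known (formula 1), otherwise s (formula 2)."""
    standard_error = sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


print(round(one_sample_z(97, 99, 2, 500), 2))               # Example A: -22.36
print(round(one_sample_z(20, 19, 4, 60), 2))                # Example B: 1.94
print(round(one_sample_z(19_375, 21_750, 6_000, 75), 2))    # Example D: -3.43
```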
4.12 THE ONE-SAMPLE z-TEST
The one-sample z-test is a statistical test for the mean of a population.
Assumptions in the one-sample z-test:
1) The samples (subjects) are randomly selected.
2) The population distribution is normal.
3) The population standard deviation should be known.
4) The cases in the sample should be independent.
5) The sample size should be greater than or equal to (≥) 30.
4.13 APPLICATION
Illustrative examples:
(A) It is known that the percentile mean of senior students of Manila Science in
the NCEE is 99 with a standard deviation of 2. In the last NCEE, the actual
mean of 500 students of the school was 97 with a standard deviation of 2.3.
Was there a significant difference between the actual mean and the
hypothesized mean? Use α = .05 and follow the steps in hypothesis testing.
Solution:
1) Identify the given values.
µ = 99
σ = 2
x̄ = 97
s = 2.3
n = 500
2) State the hypothesis and identify the claim.
Ho: µ = 99 (claim)
H1: µ ≠ 99 (The “not equal” sign indicates that the test is two-tailed.)
3) Identify the level of significance.
Test the claim at α = .05. In a two-tailed test, α is split equally between the two
tails (0.05/2 = 0.025 in each tail), so the critical values from the table are Zα/2 = ±1.96.
4) Solve for the z value.
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{97 - 99}{2/\sqrt{500}} = \frac{-2}{2/22.361} = \frac{-2}{0.08944} = -22.36$$
5) Draw the standard normal distribution curve and locate the critical
region, rejection region, and acceptance region.
The critical values are Zα/2 = ±1.96, and the computed Z value is −22.36.
6) Make a decision based on the solution in step 5.
The computed Z value of −22.36 lies beyond the critical value of −1.96
(in absolute value, 22.36 > 1.96).
7) State the conclusion.
Since Z = −22.36 < −Zα/2 = −1.96, the Z value falls within the rejection region.
Therefore, there is a significant difference between the actual mean and the
hypothesized mean. Reject Ho and accept H1.
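Steps 5 to 7 reduce to comparing the computed z with the critical value(s) for the chosen tail. The sketch below (the helper name and its `tails` labels are our own) obtains the critical values from `statistics.NormalDist` instead of a printed table and returns True when z falls in the rejection region.

```python
from statistics import NormalDist


def z_test_decision(z, alpha, tails):
    """Return True if the computed z falls in the rejection region (reject Ho)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tails == "two":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if tails == "right":
        return z > std_normal.inv_cdf(1 - alpha)
    if tails == "left":
        return z < std_normal.inv_cdf(alpha)
    raise ValueError('tails must be "two", "left", or "right"')


reject = z_test_decision(-22.36, 0.05, "two")           # Example A
print("Reject Ho" if reject else "Do not reject Ho")    # Reject Ho
```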
B) A researcher claims that the monthly consumption of coffee per person is
more than 19 cups. In a sample of 60 randomly selected people, the mean
monthly consumption was 20 cups with a standard deviation of 4 cups. Test the
claim at α = 0.01.
Solution:
1) Identify the given values.
µ = 19
x̄ = 20
s = 4
n = 60
2) State the hypothesis and identify the claim.
Ho: µ = 19 cups
H1: µ > 19 (claim)
3) Identify the level of significance.
Test the claim at the α = 0.01 level of significance. Because H1 uses “>”, the
test is right-tailed and the critical value is Zα = 2.33.
4) Solve for the one-sample z value.
$$Z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{20 - 19}{4/\sqrt{60}} = \frac{1}{4/7.746} = \frac{1}{0.5164} = 1.94$$
5) Draw the standard normal distribution curve and locate the critical
region, rejection region, and acceptance region.
The computed Z = 1.94, and the critical value Zα at α = 0.01 is 2.33.
6) Make a decision based on the solution in step 5.
The computed Z value of 1.94 is less than the critical value Zα = 2.33.
7) State the conclusion.
Since Z = 1.94 < Zα = 2.33 (α = 0.01), the Z value does not fall in the rejection
region. Therefore, Ho is not rejected: the data do not provide enough evidence to
support the claim that monthly coffee consumption exceeds 19 cups per person.
C. Z-test without standard deviation
Some experts claim that the probability of each person being left-handed is 0.25.
Out of 30 randomly sampled people, 10 are observed to be left-handed.
Using α = 0.05, is there sufficient evidence to conclude that the population
proportion is different from 0.25?
Given: µ = 0.25 (the hypothesized population proportion), n = 30, x = 10
Step 1. Set up the hypotheses. Since the research hypothesis is to check whether the
proportion is different from 0.25, we set it up as a two-tailed test:
Ho: µ = 0.25
Ha: µ ≠ 0.25
Step 2. Decide on the significance level, α. According to the question, α = 0.05.
Step 3. Compute the value of the test statistic:
$$Z = \frac{\frac{x}{n} - \mu}{\sqrt{\frac{\mu(1 - \mu)}{n}}} = \frac{\frac{10}{30} - 0.25}{\sqrt{\frac{0.25(1 - 0.25)}{30}}} = \frac{0.08333}{0.07906} = 1.054$$
Step 4. Find the appropriate critical values for the test using the z-table, and write
down clearly the rejection region for the problem. We can use Table 2 to find the value
of Z0.025, since the row for df = ∞ gives the z-values.
From the table, Z0.025 is found to be 1.96, and thus the critical values are ±1.96.
The rejection region for the two-tailed test is given by:
z > 1.96 or z < −1.96
The computed z value of 1.054 lies between −1.96 and 1.96.
Step 5. Check whether the value of the test statistic falls in the rejection region. If
it does, reject Ho (and conclude Ha). If it does not fall in the rejection
region, do not reject Ho.
Since the observed z-value of 1.054 does not fall within the rejection region, we
do not reject Ho.
Step 6. State the conclusion in words.
Based on the observed data, there is not enough evidence to conclude that
the population proportion of left-handed people is different from 0.25.
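The test statistic in Step 3 is the one-proportion z statistic. In this sketch the hypothesized proportion, written µ in the example, is called `p0`; the function name is our own.

```python
import math


def one_proportion_z(successes, n, p0):
    """z = (x/n - p0) / sqrt(p0 (1 - p0) / n)."""
    p_hat = successes / n                          # sample proportion x/n
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # uses the hypothesized p0
    return (p_hat - p0) / standard_error


z = one_proportion_z(10, 30, 0.25)
print(round(z, 3))     # 1.054
print(abs(z) > 1.96)   # False, so Ho is not rejected
```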
D. The treasurer of a certain university claims that the mean monthly salary of
its college professors is ₱21,750 with a standard deviation of ₱6,000. A
researcher takes a random sample of 75 college professors and finds a
mean monthly salary of ₱19,375. Do the 75 professors have lower
salaries than the rest? Test the claim at the α = 0.05 level of significance.
Solution:
1) Hypotheses
Ho: µ = ₱21,750
H1: µ < ₱21,750
2) Level of significance
α = 0.05
3) A one-tailed (left-tailed) test is used because H1 is directional.
4) The critical value of z at the 0.05 level of significance is −1.645.
5) Compute the value of z:
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{19{,}375 - 21{,}750}{6{,}000/\sqrt{75}} = \frac{-2{,}375}{6{,}000/8.660} = \frac{-2{,}375}{692.8} = -3.43$$
6) Decision:
The computed value z = −3.43 is smaller than the critical value −1.645, so it
lies within the rejection region. Reject Ho and accept H1.
7) Conclusion:
Therefore, the college professors of the university have lower salaries than
what was claimed by the treasurer.
WORK PROJECT / LEARNING ACTIVITY NO. 1, MODULE IV
Name: ______________________________ Yr/Sec: ____________ Date: _________________
A. IDENTIFICATION.
_________________________ 1. It is the simplest and the most widely used form of
linear regression.
_________________________ 2. It is an approach for modelling the relationship between a
scalar dependent variable y and one or more explanatory
variables denoted X.
_________________________ 3. It is a tool used to measure the linear relationship of two
variables X and Y.
_________________________ 4. This diagram is used to illustrate the relationship between
two variables.
_________________________ 5. It is the most widely used formula for correlation.
_________________________ 6. This test statistic is used to determine if the obtained
sample mean is different from the expected population
mean.
_________________________ 7. This refers to a set of values of the test statistic for
which the null hypothesis is rejected in hypothesis
testing.
_________________________ 8. In this test, the null hypothesis is rejected when the
test value (statistic) is in the critical region on one side of
the mean.
_________________________ 9. In this test, the alternative hypothesis has a
“not equal” sign.
_________________________ 10. This refers to a criterion of judgment upon which a
decision is made regarding the value stated in a null
hypothesis.
WORK PROJECT / LEARNING ACTIVITY NO. 2, MODULE IV
Name: ______________________________ Yr/Sec: ____________ Date: _________________
B. MULTIPLE CHOICE.
__________ 1. The α error is defined as the probability of committing a
a. Type I error
b. Both a and c
c. Type II error
d. None
__________ 2. Which of the following is referred to as the hypothesis of no difference?
a. Ho: µ < x
b. Ho: µ ≠ x
c. Ho: µ > x
d. Ho: µ = x
__________ 3. If the chosen level of significance is α = 0.05, the researcher is ready to
commit an error in his work of about
a. 90%
b. 5%
c. 95%
d. 1.0%
__________ 4. Likewise, at α = 0.05, the researcher is sure that his work is correct by
a. 90%
b. 5%
c. 95%
d. 1.0%
__________ 5. Which of the following hypotheses is a one-tailed test to the left?
a. Ho: µ < x
b. Ho: µ ≠ x
c. Ho: µ > x
d. Ho: µ = x
__________ 6. If the researcher did not reject the null hypothesis when in fact it is false,
then he has committed a
a. Type II error
b. Both a and c
c. Type I error
d. None
__________ 7. The following are all examples of alternative hypotheses, except
a. Ho: µ < x
b. Ho: µ ≠ x
c. Ho: µ > x
d. Ho: µ = x
__________ 8. When the researcher did not reject the null hypothesis when in fact it was
false, he has committed a
a. Type II error
b. Both a and c
c. Type I error
d. None
__________ 9. A Type II error is also termed
a. α error
b. Both a and c
c. β error
d. None
__________ 10. Which of the following tests is non-directional?
a. Two-tailed test
b. One-tailed test to the left
c. One-tailed test to the right
d. Both b and c
WORK PROJECT / LEARNING ACTIVITY NO. 3, MODULE IV
Name: ___________________________________ Yr/Sec: ____________ Date: _________________
C. Write the degree of linear relationship indicated by each of the following values of r,
using the table of qualitative interpretation of the degree of linear relationship.
1) −0.94 =
2) +0.02 =
3) +0.57 =
4) −1.00 =
5) −0.34 =
6) +0.09 =
7) −0.83 =
8) +1.00 =
9) −0.03 =
10) +0.46 =
WORK PROJECT / LEARNING ACTIVITY NO. 4, MODULE IV
Name: ___________________________________ Yr/Sec: ____________ Date: _________________
D. PROBLEM SOLVING.
Simple Linear Regression
The manager of a seafood restaurant was asked to establish a pricing
policy on lobster dinners. Experimenting with prices produced the following data:

Price, X (₱)   Average no. sold/day, Y   XY   X²
600            250
650            220
630            200
580            230
620            180
610            210
590            240

Obtain a regression line for the data and predict sales when the price is ₱595.
WORK PROJECT / LEARNING ACTIVITY NO. 5, MODULE IV
Name: ______________________________ Yr/Sec: ____________ Date: _________________
E. PROBLEM SOLVING
Coefficient of Correlation using Pearson’s Product – Moment Method
Twenty students, randomly selected from the BSTM course, were given tests in
College Algebra and Statistics, and the results are shown in the table below.

Student   Algebra (X)   Statistics (Y)   XY   X²   Y²
1         80            86
2         73            83
3         78            71
4         65            82
5         83            71
6         71            68
7         67            50
8         68            68
9         46            67
10        60            64
11        68            55
12        81            64
13        91            71
14        82            68
15        91            59
16        87            61
17        50            94
18        52            87
19        75            71
20        77            66

Is there a correlation between the two subjects? Find out using r, and make a
scatterpoint diagram.
WORK PROJECT / LEARNING ACTIVITY NO. 6, MODULE IV
Name: ______________________________ Yr/Sec: ____________ Date: _________________
F. PROBLEM SOLVING.
Coefficient of Correlation using Spearman’s formula
Eighteen TM students were randomly selected and given tests in Statistics
and Economics. The results are shown in the table below.

Student   Statistics (X)   Economics (Y)
1         83               71
2         67               68
3         46               60
4         68               81
5         91               82
6         91               59
7         65               74
8         81               92
9         94               89
10        68               71
11        75               81
12        66               73
13        84               88
14        55               57
15        88               94
16        69               83
17        82               79
18        96               87
a) Solve for the coefficient of correlation using Spearman’s formula.
b) Explain the result.
WORK PROJECT / LEARNING ACTIVITY NO. 7, MODULE IV
Name: ______________________________ Yr/Sec: ____________ Date: _________________
G. PROBLEM SOLVING.
HYPOTHESIS TESTING
1. A soap manufacturer claims that the average weight of detergent soap per
pack is 200 grams. A researcher sampled 20 packs of this soap and got an
average weight of 198.7 grams with a standard deviation of 5 grams. Is the
manufacturer's claim valid? Test at the .05 level of significance.
2. Powdered milk is packed in 1-kg bags. An inspector from the Department of
Trade and Industry (DTI) suspects the bags may not contain 1 kg. A sample of
40 bags produces a mean of 0.96 kg and a standard deviation of 0.12 kg. Is
there enough evidence to conclude that the bags do not contain 1 kg as stated?
Use α = 0.05. Also, find the 95% confidence interval of the true mean.
Table 1. Standard Normal Curve Areas
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
z Area
3.50 0.99976737
4.00 0.99996833
4.50 0.99999660
5.00 0.99999971
Table 1. Standard Normal Curve Areas (negative z)
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
-3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
-3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003
-3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
-3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
-2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
-2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
-2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
-2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
-2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
-2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
-2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
-2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
-2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
-2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
-1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
-1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
-1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
-1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
-1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
-1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
-1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
-1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
-1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
-1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
-0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
-0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
-0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
-0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
-0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
-0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
-0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
-0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
-0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
-0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
z Area
-3.50 0.00023263
-4.00 0.00003167
-4.50 0.00000340
-5.00 0.00000029
Source: Computed by M. Longnecker using Splus
Table 2. Critical Values for the Student’s t Distribution
Columns (one-tailed α): 0.1 0.05 0.025 0.01 0.005
Columns (two-tailed α): 0.2 0.1 0.05 0.02 0.01
df
1 3.078 6.314 12.706 31.821 63.656
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
21 1.323 1.721 2.080 2.518 2.831
22 1.321 1.717 2.074 2.508 2.819
23 1.319 1.714 2.069 2.500 2.807
24 1.318 1.711 2.064 2.492 2.797
25 1.316 1.708 2.060 2.485 2.787
26 1.315 1.706 2.056 2.479 2.779
27 1.314 1.703 2.052 2.473 2.771
28 1.313 1.701 2.048 2.467 2.763
29 1.311 1.699 2.045 2.462 2.756
30 1.310 1.697 2.042 2.457 2.750
60 1.296 1.671 2.000 2.390 2.660
120 1.289 1.658 1.980 2.358 2.617
∞ 1.282 1.645 1.960 2.326 2.576
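For an integer number of degrees of freedom the entries of Table 2 can be regenerated directly. The minimal sketch below uses only Python's standard library: it evaluates the central probability P(-t < T < t) with the classical closed-form series for integer df, then bisects for the critical value. The function names are illustrative, not from the module, and the df = ∞ row (which equals the standard normal critical values) is not covered.

```python
import math


def t_central_prob(t: float, df: int) -> float:
    """P(-t < T < t) for Student's t with a positive integer df.

    Closed-form finite series for integer df, with theta = atan(t / sqrt(df)).
    """
    theta = math.atan(t / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        # Even df: sin(theta) * [1 + (1/2)c^2 + (1*3)/(2*4)c^4 + ...], up to c^(df-2)
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (2 * k - 1) / (2 * k) * c * c
            total += term
        return s * total
    if df == 1:
        return 2.0 * theta / math.pi  # df = 1 is the Cauchy distribution
    # Odd df >= 3: (2/pi) * [theta + sin(theta) * (c + (2/3)c^3 + ...)], up to c^(df-2)
    term = total = c
    for k in range(1, (df - 1) // 2):
        term *= (2 * k) / (2 * k + 1) * c * c
        total += term
    return 2.0 / math.pi * (theta + s * total)


def t_critical(alpha_two_tailed: float, df: int) -> float:
    """Value t with P(|T| > t) = alpha_two_tailed, found by bisection on [0, 1000]."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if 1.0 - t_central_prob(mid, df) > alpha_two_tailed:
            lo = mid  # tail area still too large, so t must be bigger
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    two_tailed = (0.2, 0.1, 0.05, 0.02, 0.01)
    for df in (1, 2, 10, 30, 120):
        print(df, [round(t_critical(a, df), 3) for a in two_tailed])
```

For a one-tailed α, pass 2α as `alpha_two_tailed`. An occasional last-digit difference from the printed table (for example df = 1, two-tailed α = 0.01, which computes to 63.657) comes from rounding in the printed values.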
RANDOM NUMBERS
A simple random sample of size n has two defining properties: every individual in the population is equally likely to be chosen for the sample, and every set of n individuals is equally likely to be chosen.
A simple random sample guards against bias and allows us to apply results from
probability, such as the central limit theorem, to our sample.
Random samples are so important that we need a dependable process for obtaining one. Computers can generate random numbers, but these are actually pseudorandom: they are produced by a deterministic algorithm from a starting value (a seed). Good tables of random digits, by contrast, are the result of random physical processes. The following detailed example shows how to construct a simple random sample using a table of random digits.
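For comparison, here is a minimal sketch of drawing a simple random sample by computer with Python's standard `random` module, whose generator is pseudorandom in exactly this sense: the same seed always yields the same sample. The function name and the population and sample sizes (100 and 10) are hypothetical choices for illustration.

```python
import random
from typing import List, Optional


def simple_random_sample(population_size: int, sample_size: int,
                         seed: Optional[int] = None) -> List[int]:
    """Draw labels 1..population_size without replacement.

    random.sample gives every subset of size sample_size the same chance of
    being selected, which is the defining property of a simple random sample.
    Fixing the seed makes the pseudorandom draw reproducible.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


if __name__ == "__main__":
    labels = simple_random_sample(population_size=100, sample_size=10, seed=2014)
    print(" ".join(f"{label:02d}" for label in labels))
```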
Statement of Problem
Suppose that we have a population of 90 college students and want to form a simple random sample of size eleven to survey about campus issues. We begin by assigning a number to each student. Since there are 90 students and 90 is a two-digit number, every individual in the population is assigned a two-digit label: 01, 02, 03, ..., 88, 89, 90.
Use of the Table
We will use a table of random numbers to determine which of the 90 students should be chosen for our sample. We start blindly at any place in the table and write the random digits in groups of two. Beginning at the fifth digit of the first line we have:
23 44 92 72 75 19 82 88 29 39 81 82 88
Reading from left to right, we keep each number that falls in the range 01 to 90 and has not already been chosen, stopping once we have eleven. From this line the numbers kept are:
23 44 72 75 19 82 88 29 39 81
At this point there are a few things to note about this example of selecting a simple random sample. The number 92 was omitted because it is greater than the total number of students in our population. The final two numbers in the list, 82 and 88, were also omitted because both had already been included in our sample. This leaves only ten individuals in our sample, so to obtain an eleventh we continue to the next row of the table. This line begins:
29 39 81 82 86 04
The numbers 29, 39, 81 and 82 have already been included in our sample, so the first two-digit number that fits in our range and does not repeat an earlier selection is 86.
Conclusion of the Problem
The final step is to contact students who have been identified with the following
numbers:
23, 44, 72, 75, 19, 82, 88, 29, 39, 81, 86
A well-constructed survey can then be administered to this group of students and the results tabulated.
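The table-reading procedure used in this example can be sketched in Python. This is a minimal illustration, not part of the module: `select_from_digits` and its parameters are hypothetical names. It reads the digits in fixed-width groups and keeps each label that lies between 1 and `population_size` and has not been chosen yet.

```python
from typing import List


def select_from_digits(digits: str, population_size: int, sample_size: int,
                       width: int = 2) -> List[int]:
    """Read a digit string in fixed-width groups and keep valid, unseen labels.

    Groups outside 1..population_size, and labels already chosen, are skipped.
    Stops as soon as sample_size labels are found; returns fewer if the digits
    run out, in which case more rows of the table are needed.
    """
    clean = "".join(ch for ch in digits if ch.isdigit())  # drop spaces and newlines
    chosen: List[int] = []
    for start in range(0, len(clean) - width + 1, width):
        label = int(clean[start:start + width])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen


if __name__ == "__main__":
    rows = "23 44 92 72 75 19 82 88 29 39 81 82 88 29 39 81 82 86 04"
    print(select_from_digits(rows, population_size=90, sample_size=11))
```

With the two rows of digits from the example and the students assumed to be labelled 01 to 90, it returns 23, 44, 72, 75, 19, 82, 88, 29, 39, 81, 86.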
Table 3. Random Numbers
39634 62349 74088 65564 16379 19713 39153 69459 17986 14595
35050 40469 27478 44526 67331 93365 54526 22356 24537 93208
30734 71571 83722 79712 25775 65178 07763 82928 31131 30196
64628 89126 91254 24090 25752 03091 39411 73146 06089 15630
42831 95113 43511 42082 15140 34733 68076 18292 69486 80468
80583 70361 41047 26792 78466 03395 17635 09697 82447 31405
00209 90404 99457 72570 42194 49043 24330 14939 09865 45906
05409 20830 01911 60767 55248 79253 12317 84120 77772 50103
95836 22530 91785 80210 34361 52228 33869 94332 83868 61672
65358 70469 87149 89509 72176 18103 55169 79954 72002 20582
72249 04037 36192 40221 14918 53437 60571 40995 55006 10694
41692 40581 93050 48734 34652 41577 04631 49184 39295 81776
61885 50796 96822 82002 07973 52925 75467 86013 98072 91942
48917 48129 48624 48248 91465 54898 61220 18721 67387 66575
88378 84299 12193 03785 49314 39761 99132 28775 45276 91816
77800 25734 09801 92087 02955 12872 89848 48579 06028 13827
24028 03405 01178 06316 81916 40170 53665 87202 88638 47121
86558 84750 43994 01760 96205 27937 45416 71964 52261 30781
78545 49201 05329 14182 10971 90472 44682 39304 19819 55799
14969 64623 82780 35686 30941 14622 04126 25498 95452 63937
58697 31973 06303 94202 62287 56164 79157 98375 24558 99241
38449 46438 91579 01907 72146 05764 22400 94490 49833 09258
62134 87244 73348 80114 78490 64735 31010 66975 28652 36166
72749 13347 65030 26128 49067 27904 49953 74674 94617 13317
81638 36566 42709 33717 59943 12027 46547 61303 46699 76243
46574 79670 10342 89543 75030 23428 29541 32501 89422 87474
11873 57196 32209 67663 07990 12288 59245 83638 23642 61715
13862 72778 09949 23096 01791 19472 14634 31690 36602 62943
08312 27886 82321 28666 72998 22514 51054 22940 31842 54245
11071 44430 94664 91294 35163 05494 32882 23904 41340 61185
82509 11842 86963 50307 07510 32545 90717 46856 86079 13769
07426 67341 80314 58910 93948 85738 69444 09370 58194 28207
57696 25592 91221 95386 15857 84645 89659 80535 93233 82798
08074 89810 48521 90740 02687 83117 74920 25954 99629 78978
20128 53721 01518 40699 20849 04710 38989 91322 56057 58573
00190 27157 83208 79446 92987 61357 38752 55424 94518 45205
23798 55425 32454 34611 39605 39981 74691 40836 30812 38563
85306 57995 68222 39055 43890 36956 84861 63624 04961 55439
99719 36036 74274 53901 34643 06157 89500 57514 93977 42403
95970 81452 48873 00784 58347 40269 11880 43395 28249 38743
56651 91460 92462 98566 72062 18556 55052 47614 80044 60015
71499 80220 35750 67337 47556 55272 55249 79100 34014 17037
66660 78443 47545 70736 65419 77489 70831 73237 14970 23129
35483 84563 79956 88618 54619 24853 59783 47537 88822 47227
09262 25041 57862 19203 86103 02800 23198 70639 43757 52064
Important Statistics Formulas
Parameters
Population mean = $\mu = \frac{\sum X_i}{N}$
Population standard deviation = $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$
Population variance = $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
Variance of population proportion = $\sigma_P^2 = \frac{PQ}{n}$
Standardized score = $Z = \frac{X - \mu}{\sigma}$
Population correlation coefficient = $\rho = \frac{1}{N} \sum \left[ \frac{X_i - \mu_X}{\sigma_X} \cdot \frac{Y_i - \mu_Y}{\sigma_Y} \right]$
Statistics
Unless otherwise noted, these formulas assume simple random sampling.
Sample mean = $\bar{x} = \frac{\sum x_i}{n}$
Sample standard deviation = $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Sample variance = $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Variance of sample proportion = $s_p^2 = \frac{pq}{n - 1}$
Pooled sample proportion = $p = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2}$
Pooled sample standard deviation = $s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$
Sample correlation coefficient = $r = \frac{1}{n - 1} \sum \left[ \frac{x_i - \bar{x}}{s_x} \cdot \frac{y_i - \bar{y}}{s_y} \right]$
Correlation
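Pearson product-moment correlation = $r = \frac{\sum xy}{\sqrt{(\sum x^2)(\sum y^2)}}$, where $x = x_i - \bar{x}$ and $y = y_i - \bar{y}$ are deviations from the means
Linear correlation (sample data) = $r = \frac{1}{n - 1} \sum \left[ \frac{x_i - \bar{x}}{s_x} \cdot \frac{y_i - \bar{y}}{s_y} \right]$
Linear correlation (population data) = $\rho = \frac{1}{N} \sum \left[ \frac{X_i - \mu_X}{\sigma_X} \cdot \frac{Y_i - \mu_Y}{\sigma_Y} \right]$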
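The deviation form and the standardized (z-score) form of the sample correlation give the same r. A minimal sketch checking this with the Milcas Hamburger sales and profit figures from section 4.1; the function names are illustrative only.

```python
import math
from statistics import mean, stdev

sales = [15, 17, 21, 18, 19, 22, 16, 17, 25, 20]     # X, millions
profits = [8, 9, 13, 10, 11, 14, 8.5, 10, 15, 13]    # Y, millions


def pearson_deviation_form(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x and y as deviations from the means."""
    mx, my = mean(xs), mean(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def pearson_standardized_form(xs, ys):
    """r = [1 / (n - 1)] * sum of products of z-scores, using sample standard deviations."""
    n = len(xs)
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)


if __name__ == "__main__":
    print(round(pearson_deviation_form(sales, profits), 4))
    print(round(pearson_standardized_form(sales, profits), 4))
```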
Simple Linear Regression
Simple linear regression line: $\hat{y} = b_0 + b_1 x$
Regression coefficient (slope) = $b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
Regression intercept = $b_0 = \bar{y} - b_1 \bar{x}$
Regression coefficient (slope) = $b_1 = r \cdot \frac{s_y}{s_x}$
Standard error of regression slope = $s_{b_1} = \frac{\sqrt{\sum (y_i - \hat{y}_i)^2 / (n - 2)}}{\sqrt{\sum (x_i - \bar{x})^2}}$
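A minimal sketch applying these regression formulas to the Milcas Hamburger data from section 4.1. It computes the slope both directly and as r · (s_y / s_x), the intercept, the standard error of the slope, and the predicted profit for sales of 30 million. The function names are illustrative only.

```python
import math
from statistics import mean, stdev

sales = [15, 17, 21, 18, 19, 22, 16, 17, 25, 20]     # X, millions
profits = [8, 9, 13, 10, 11, 14, 8.5, 10, 15, 13]    # Y, millions


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def slope_via_r(xs, ys):
    """Alternative slope b1 = r * (s_y / s_x)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    return r * stdev(ys) / stdev(xs)


def slope_standard_error(xs, ys, b0, b1):
    """s_b1 = sqrt[sum of squared residuals / (n - 2)] / sqrt[sum (x - x-bar)^2]."""
    mx = mean(xs)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return math.sqrt(sse / (len(xs) - 2)) / math.sqrt(sxx)


b0, b1 = fit_line(sales, profits)

if __name__ == "__main__":
    print(f"y-hat = {b0:.4f} + {b1:.4f} x")
    print(f"slope via r: {slope_via_r(sales, profits):.4f}")
    print(f"s_b1 = {slope_standard_error(sales, profits, b0, b1):.4f}")
    print(f"predicted profit at sales of 30: {b0 + b1 * 30:.2f}")
```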
Counting
n factorial: n! = n * (n-1) * (n - 2) * . . . * 3 * 2 * 1. By convention, 0! = 1.
Permutations of n things, taken r at a time: nPr = n! / (n - r)!
Combinations of n things, taken r at a time: nCr = n! / r!(n - r)! = nPr / r!
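A minimal sketch of the counting formulas, built from factorials and cross-checked against `math.perm` and `math.comb` from Python's standard library (Python 3.8 or later). The helper names are illustrative only.

```python
import math


def n_permutations(n: int, r: int) -> int:
    """nPr = n! / (n - r)!"""
    return math.factorial(n) // math.factorial(n - r)


def n_combinations(n: int, r: int) -> int:
    """nCr = n! / [r! (n - r)!] = nPr / r!"""
    return n_permutations(n, r) // math.factorial(r)


if __name__ == "__main__":
    for n, r in [(5, 2), (10, 3), (52, 5)]:
        assert n_permutations(n, r) == math.perm(n, r)
        assert n_combinations(n, r) == math.comb(n, r)
    print(n_permutations(10, 3), n_combinations(10, 3))
```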
Probability
Rule of addition: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Rule of multiplication: P(A ∩ B) = P(A) P(B|A)
Rule of subtraction: P(A') = 1 - P(A)
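The three probability rules can be checked by direct counting. The sketch below assumes a standard 52-card deck, with A = "the card is a heart" and B = "the card is a face card (J, Q, K)"; these events are our own illustration. Exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely cards


def prob(event):
    """Classical probability: favourable cards / all cards."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def is_heart(card):
    return card[1] == "hearts"


def is_face(card):
    return card[0] in {"J", "Q", "K"}


p_a, p_b = prob(is_heart), prob(is_face)
p_ab = prob(lambda card: is_heart(card) and is_face(card))

# Rule of addition: P(A or B) = P(A) + P(B) - P(A and B)
p_union = prob(lambda card: is_heart(card) or is_face(card))
assert p_union == p_a + p_b - p_ab

# Rule of multiplication: P(A and B) = P(A) P(B|A), with P(B|A) counted among hearts only
hearts = [card for card in DECK if is_heart(card)]
p_b_given_a = Fraction(sum(1 for card in hearts if is_face(card)), len(hearts))
assert p_ab == p_a * p_b_given_a

# Rule of subtraction: P(not A) = 1 - P(A)
assert prob(lambda card: not is_heart(card)) == 1 - p_a

if __name__ == "__main__":
    print(p_union)
```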
Random Variables
In the following formulas, X and Y are random variables, and a and b are constants.
Expected value of X = $E(X) = \mu_X = \sum x_i P(x_i)$
Variance of X = $\mathrm{Var}(X) = \sigma^2 = \sum [x_i - E(X)]^2 P(x_i) = \sum (x_i - \mu_X)^2 P(x_i)$
Normal random variable = z-score = $z = \frac{X - \mu}{\sigma}$
Chi-square statistic = $\chi^2 = \frac{(n - 1) s^2}{\sigma^2}$
f statistic = $f = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}$
Expected value of sum of random variables = E(X + Y) = E(X) + E(Y)
Expected value of difference between random variables = E(X - Y) = E(X) - E(Y)
Variance of the sum of independent random variables = Var(X + Y) = Var(X) + Var(Y)
Variance of the difference between independent random variables = Var(X - Y) = Var(X) + Var(Y)
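A minimal sketch checking the expected-value and variance rules exactly, assuming X and Y are the outcomes of two independent fair dice (a hypothetical example). It also checks E(aX + b) = aE(X) + b and Var(aX + b) = a² Var(X) for a = 2, b = 3.

```python
from fractions import Fraction
from itertools import product

DIE = {face: Fraction(1, 6) for face in range(1, 7)}  # value -> probability


def expected_value(dist):
    """E(X) = sum of x * P(x)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = sum of (x - mu)^2 * P(x)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def combine(dist_x, dist_y, op):
    """Distribution of op(X, Y) for independent X and Y."""
    out = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        value = op(x, y)
        out[value] = out.get(value, 0) + px * py
    return out


def transform(dist, a, b):
    """Distribution of aX + b (a must be nonzero)."""
    return {a * x + b: p for x, p in dist.items()}


total = combine(DIE, DIE, lambda x, y: x + y)
difference = combine(DIE, DIE, lambda x, y: x - y)

if __name__ == "__main__":
    print(expected_value(DIE), variance(DIE))            # single die
    print(variance(total), variance(difference))         # both equal Var(X) + Var(Y)
    print(expected_value(transform(DIE, 2, 3)), variance(transform(DIE, 2, 3)))
```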
Sampling Distributions
Mean of sampling distribution of the mean = $\mu_{\bar{x}} = \mu$
Mean of sampling distribution of the proportion = $\mu_p = P$
Standard deviation of proportion = $\sigma_p = \sqrt{\frac{P(1 - P)}{n}} = \sqrt{\frac{PQ}{n}}$
Standard deviation of the mean = $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Standard deviation of difference of sample means = $\sigma_d = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Standard deviation of difference of sample proportions = $\sigma_d = \sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}}$
Standard Error
Standard error of proportion = $SE_p = s_p = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{pq}{n}}$
Standard error of difference for proportions (pooled) = $SE_p = s_p = \sqrt{p(1 - p)\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}$, where p is the pooled sample proportion
Standard error of the mean = $SE_{\bar{x}} = s_{\bar{x}} = \frac{s}{\sqrt{n}}$
Standard error of difference of sample means = $SE_d = s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Standard error of difference of paired sample means = $SE_d = s_d = \frac{\sqrt{\sum (d_i - \bar{d})^2 / (n - 1)}}{\sqrt{n}}$
Pooled sample standard error = $SE = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$ is the pooled sample standard deviation
Standard error of difference of sample proportions = $s_d = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$
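A minimal sketch of several of the standard-error formulas as plain functions. The function names and the numbers in the usage lines are hypothetical, chosen only so the results are easy to check by hand.

```python
import math


def se_mean(s: float, n: int) -> float:
    """Standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)


def se_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt[p(1 - p) / n]."""
    return math.sqrt(p * (1 - p) / n)


def se_diff_means(s1: float, n1: int, s2: float, n2: int) -> float:
    """Standard error of the difference of two independent sample means."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def se_diff_proportions(p1: float, n1: int, p2: float, n2: int) -> float:
    """Unpooled standard error of the difference of two sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def pooled_sd(s1: float, n1: int, s2: float, n2: int) -> float:
    """Pooled sample standard deviation of two samples."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


if __name__ == "__main__":
    print(se_mean(10, 25))                  # s = 10, n = 25
    print(se_proportion(0.5, 100))          # p = 0.5, n = 100
    print(se_diff_means(3, 9, 4, 16))
    print(se_diff_proportions(0.4, 50, 0.3, 60))
```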
Discrete Probability Distributions
Binomial formula: $P(X = x) = b(x; n, P) = {}_nC_x \, P^x (1 - P)^{n - x} = {}_nC_x \, P^x Q^{n - x}$
Mean of binomial distribution = $\mu_X = nP$
Variance of binomial distribution = $\sigma_X^2 = nP(1 - P)$
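Negative binomial formula: $P(X = x) = b^*(x; r, P) = {}_{x-1}C_{r-1} \, P^r (1 - P)^{x - r}$, where x is the number of trials needed to obtain r successes
Mean of negative binomial distribution = $\mu_X = \frac{r}{P}$
Variance of negative binomial distribution = $\sigma_X^2 = \frac{rQ}{P^2}$
Geometric formula: $P(X = x) = g(x; P) = P Q^{x - 1}$, where x is the trial on which the first success occurs
Mean of geometric distribution = $\mu_X = \frac{1}{P}$
Variance of geometric distribution = $\sigma_X^2 = \frac{Q}{P^2}$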
Hypergeometric formula: $P(X = x) = h(x; N, n, k) = \frac{{}_kC_x \; {}_{N-k}C_{n-x}}{{}_NC_n}$
Mean of hypergeometric distribution = $\mu_X = \frac{nk}{N}$
Variance of hypergeometric distribution = $\sigma_X^2 = \frac{nk(N - k)(N - n)}{N^2 (N - 1)}$
Poisson formula: $P(x; \mu) = \frac{e^{-\mu} \mu^x}{x!}$
Mean of Poisson distribution = $\mu_X = \mu$
Variance of Poisson distribution = $\sigma_X^2 = \mu$
Multinomial formula: $P = \frac{n!}{n_1! \, n_2! \cdots n_k!} \, p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$
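A minimal sketch that evaluates the binomial, hypergeometric and Poisson probability formulas and confirms the stated means and variances by summing over the distribution. The parameter values (n = 10, P = 0.3; N = 20, n = 5, k = 8; μ = 4) are hypothetical, and the Poisson sum is truncated at x = 59, where the remaining tail is negligible.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def hypergeometric_pmf(x: int, N: int, n: int, k: int) -> float:
    return math.comb(k, x) * math.comb(N - k, n - x) / math.comb(N, n)


def poisson_pmf(x: int, mu: float) -> float:
    return math.exp(-mu) * mu ** x / math.factorial(x)


def moments(pmf):
    """Mean and variance from a dict {x: P(X = x)}."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, var


binom = {x: binomial_pmf(x, 10, 0.3) for x in range(11)}
hyper = {x: hypergeometric_pmf(x, 20, 5, 8) for x in range(6)}
pois = {x: poisson_pmf(x, 4.0) for x in range(60)}

if __name__ == "__main__":
    print("binomial", moments(binom))       # compare n*P and n*P*(1 - P)
    print("hypergeometric", moments(hyper))  # compare n*k/N and the variance formula
    print("Poisson", moments(pois))          # compare mu and mu
```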
Linear Transformations
For the following formulas, assume that Y is a linear transformation of the random variable X, defined by
the equation: Y = aX + b.
Mean of a linear transformation = $E(Y) = aE(X) + b$
Variance of a linear transformation = $\mathrm{Var}(Y) = a^2 \, \mathrm{Var}(X)$
Standardized score = $z = \frac{x - \mu_x}{\sigma_x}$
t-score = $t = \frac{\bar{x} - \mu_x}{s / \sqrt{n}}$
Estimation
Confidence interval: Sample statistic ± Critical value * Standard error of statistic
Margin of error = (Critical value) * (Standard deviation of statistic)
Margin of error = (Critical value) * (Standard error of statistic)
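A minimal sketch of "sample statistic ± critical value × standard error" for a mean. It assumes a large sample, so the critical value comes from the standard normal distribution via `statistics.NormalDist().inv_cdf`; for a small sample the critical value would instead be read from Table 2 with n - 1 degrees of freedom. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(sample_mean: float, sd: float, n: int,
                             confidence: float = 0.95):
    """Large-sample interval: statistic ± z critical value * standard error."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sd / math.sqrt(n)  # margin of error
    return sample_mean - margin, sample_mean + margin


if __name__ == "__main__":
    low, high = mean_confidence_interval(sample_mean=50, sd=10, n=100)
    print(f"95% CI: ({low:.2f}, {high:.2f})")
```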
Hypothesis Testing
Standardized test statistic = (Statistic - Parameter) / (Standard deviation of statistic)
One-sample z-test for proportions: z-score = $z = \frac{p - P_0}{\sqrt{P_0 (1 - P_0) / n}}$, where $P_0$ is the hypothesized population proportion
Two-sample z-test for proportions: z-score = $z = \frac{(p_1 - p_2) - d}{SE}$
One-sample t-test for means: t-score = $t = \frac{\bar{x} - \mu}{SE}$
Two-sample t-test for means: t-score = $t = \frac{(\bar{x}_1 - \bar{x}_2) - d}{SE}$
Matched-sample t-test for means: t-score = $t = \frac{(\bar{x}_1 - \bar{x}_2) - D}{SE} = \frac{\bar{d} - D}{SE}$
Chi-square test statistic = $\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
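A minimal sketch computing two of these test statistics: the one-sample t-score with SE = s / √n, and the chi-square goodness-of-fit statistic. The sample values and die counts are hypothetical illustration data.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t = (x-bar - mu0) / SE, with SE = s / sqrt(n)."""
    se = stdev(sample) / math.sqrt(len(sample))
    return (mean(sample) - mu0) / se


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    t = one_sample_t([5.1, 4.9, 5.3, 5.0, 5.2], mu0=5.0)
    print(f"t = {t:.3f} with df = 4")
    x2 = chi_square_statistic([8, 12, 10, 10, 9, 11], [10] * 6)
    print(f"chi-square = {x2:.2f} with df = 5")
```

For the t-test, |t| ≈ 1.414 is below the two-tailed 5% critical value 2.776 for df = 4 in Table 2, so the null hypothesis would not be rejected at that level.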
Degrees of Freedom
The correct formula for degrees of freedom (DF) depends on the situation (the nature of the test statistic,
the number of samples, underlying assumptions, etc.).
One-sample t-test: DF = n - 1
Two-sample t-test: $DF = \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$
Two-sample t-test, pooled standard error: DF = n1 + n2 - 2
Simple linear regression, test slope: DF = n - 2
Chi-square goodness of fit test: DF = k - 1
Chi-square test for homogeneity: DF = (r - 1) * (c - 1)
Chi-square test for independence: DF = (r - 1) * (c - 1)
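The two-sample degrees-of-freedom formula (without pooling) usually gives a non-integer value. A minimal sketch, with a hypothetical function name; when the two samples have equal sizes and equal standard deviations it reduces to the pooled value n1 + n2 - 2.

```python
def two_sample_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """DF = (s1^2/n1 + s2^2/n2)^2 / [ (s1^2/n1)^2/(n1 - 1) + (s2^2/n2)^2/(n2 - 1) ]."""
    a, b = s1 ** 2 / n1, s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


if __name__ == "__main__":
    print(two_sample_df(1, 10, 1, 10))   # equal sizes and spreads
    print(two_sample_df(2, 5, 1, 20))    # unequal sizes and spreads
```

When reading Table 2 with a non-integer DF, a common conservative choice is to round down.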
Sample Size
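Below, the first two formulas give the smallest sample sizes required to achieve a fixed margin of error under simple random sampling. The third formula allocates the sample to strata in proportion to stratum size (proportionate design). The fourth formula, Neyman allocation, uses stratified sampling to minimize variance for a fixed sample size, and the last formula, optimum allocation, uses stratified sampling to minimize variance for a fixed budget. In these formulas ME is the margin of error, z is the critical value for the chosen confidence level, and $N_h$, $\sigma_h$ and $c_h$ are the size, standard deviation and per-unit sampling cost of stratum h.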
Mean (simple random sampling): $n = \frac{z^2 \sigma^2 \left[ N / (N - 1) \right]}{ME^2 + z^2 \sigma^2 / (N - 1)}$
Proportion (simple random sampling): $n = \frac{z^2 pq + ME^2}{ME^2 + z^2 pq / N}$
Proportionate stratified sampling: $n_h = \frac{N_h}{N} \, n$
Neyman allocation (stratified sampling): $n_h = n \, \frac{N_h \sigma_h}{\sum N_i \sigma_i}$
Optimum allocation (stratified sampling): $n_h = n \, \frac{N_h \sigma_h / \sqrt{c_h}}{\sum N_i \sigma_i / \sqrt{c_i}}$
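A minimal sketch of two of the sample-size formulas: the simple-random-sampling size for estimating a mean (rounded up, since a fractional sample is not possible) and Neyman allocation across strata. The function names and all numeric inputs are hypothetical.

```python
import math


def sample_size_mean(z: float, sigma: float, margin: float, N: int) -> int:
    """n = {z^2 sigma^2 [N/(N-1)]} / {ME^2 + z^2 sigma^2/(N-1)}, rounded up."""
    numerator = z ** 2 * sigma ** 2 * (N / (N - 1))
    denominator = margin ** 2 + z ** 2 * sigma ** 2 / (N - 1)
    return math.ceil(numerator / denominator)


def neyman_allocation(n: int, sizes, sds):
    """n_h = n * (N_h * sigma_h) / sum(N_i * sigma_i); results are usually rounded."""
    weights = [Nh * sh for Nh, sh in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


if __name__ == "__main__":
    print(sample_size_mean(z=1.96, sigma=10, margin=2, N=1000))
    print(neyman_allocation(100, sizes=[500, 300, 200], sds=[10, 20, 5]))
```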