5/3/2017
Parametric Statistical Inference
Instructor: Ron S. Kenett
Email: [email protected]
Course Website: www.kpa.co.il/biostat
Course textbook: MODERN INDUSTRIAL STATISTICS,
Kenett and Zacks, Duxbury Press, 1998
(c) 2001, Ron S. Kenett, Ph.D.
Course Syllabus
• Understanding Variability
• Variability in Several Dimensions
• Basic Models of Probability
• Sampling for Estimation of Population Quantities
• Parametric Statistical Inference
• Computer Intensive Techniques
• Multiple Linear Regression
• Statistical Process Control
• Design of Experiments
Definitions

Null Hypothesis
H0: Put here what is typical of the population, a term that characterizes “business as usual,” where nothing out of the ordinary occurs.

Alternative Hypothesis
H1: Put here what is the challenge, the view of some characteristic of the population that, if it were true, would trigger some new action, some change in procedures that had previously defined “business as usual.”
The Logic of Hypothesis Testing

Step 1. A claim is made.
A new claim is asserted that challenges existing thoughts about a population characteristic.
Suggestion: Form the alternative hypothesis first, since it embodies the challenge.
The Logic of Hypothesis Testing

Step 2. How much error are you willing to accept?
Select the maximum acceptable error, α. The decision maker must choose how much error he or she is willing to accept in making an inference about the population. The significance level of the test is the maximum probability that the null hypothesis will be rejected incorrectly, a Type I error.
The Logic of Hypothesis Testing

Step 3. If the null hypothesis were true, what would you expect to see?
Assume the null hypothesis is true. This is a very powerful statement: the test is always referenced to the null hypothesis.
Form the rejection region, the areas in which the decision maker is willing to reject the presumption of the null hypothesis.
The Logic of Hypothesis Testing

Step 4. What did you actually see?
Compute the sample statistic. The sample provides a set of data that serves as a window to the population. The decision maker computes the sample statistic and calculates how far it differs from the presumed distribution established by the null hypothesis.
The Logic of Hypothesis Testing

Step 5. Make the decision.
The decision is a conclusion supported by evidence. The decision maker will:
• reject the null hypothesis if the sample evidence is so strong, the sample statistic so unlikely, that the decision maker is convinced H1 must be true;
• fail to reject the null hypothesis if the sample statistic falls in the nonrejection region. In this case, the decision maker is not concluding the null hypothesis is true, only that there is insufficient evidence to dispute it based on this sample.
The Logic of Hypothesis Testing

Step 6. What are the implications of the decision for future actions?
State what the decision means in terms of the research program. The decision maker must draw out the implications of the decision. Is there some action triggered, some change implied? What recommendations might be extended for future attempts to test similar hypotheses?
Two Types of Errors

Type I Error:
• Saying you reject H0 when it really is true.
• Rejecting a true H0.

Type II Error:
• Saying you do not reject H0 when it really is false.
• Failing to reject a false H0.
What are acceptable error levels?

Decision makers frequently use a 5% significance level.
• Use α = 0.05.
• An α-error means that we will decide to adjust the machine when it does not need adjustment.
• In the case of the robot welder, this means that if the machine is running properly, there is only a 0.05 probability of making the mistake of concluding that the robot requires adjustment when it really does not.
Three Types of Tests

• Nondirectional, two-tailed test: H1: population parameter ≠ value
• Directional, right-tailed test: H1: population parameter > value
• Directional, left-tailed test: H1: population parameter < value

Always put hypotheses in terms of population parameters and have
H0: population parameter = value.
Two-tailed test
H0: population parameter = value
H1: population parameter ≠ value
[Figure: sampling distribution with rejection regions of area α/2 in each tail and a “do not reject H0” region of area 1 − α between them.]

Right-tailed test
H0: population parameter = value
H1: population parameter > value
[Figure: rejection region of area α in the right tail; “do not reject H0” region of area 1 − α to its left.]

Left-tailed test
H0: population parameter = value
H1: population parameter < value
[Figure: rejection region of area α in the left tail; “do not reject H0” region of area 1 − α to its right.]
Decision vs. truth:

                    H0 true         H1 true
Do not reject H0    OK              Type II Error
Reject H0           Type I Error    OK
What Test to Apply?

Ask the following questions:
• Are the data the result of a measurement (a continuous variable) or a count (a discrete variable)?
• Is σ known?
• What shape is the distribution of the population parameter?
• What is the sample size?
Test of µ, σ Known, Population Normally Distributed

Test Statistic:

    z = (x̄ − µ0) / (σ/√n)

where
• x̄ is the sample statistic.
• µ0 is the value identified in the null hypothesis.
• σ is known.
• n is the sample size.
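As a quick check of the formula above, the statistic and its two-tailed p-value can be computed with Python's standard library. The numbers (x̄ = 51.3, µ0 = 50, σ = 4, n = 36) are hypothetical, chosen only for illustration:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical example values (not from the course): test H0: mu = 50
# against H1: mu != 50, with known sigma = 4 and a sample of n = 36
# whose mean is 51.3.
x_bar, mu0, sigma, n = 51.3, 50.0, 4.0, 36

z = (x_bar - mu0) / (sigma / sqrt(n))          # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value

print(round(z, 3), round(p_value, 4))
```

At α = 0.05 this p-value (just above 0.05) would fall short of rejection, which shows how close the decision can be.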
Test of µ, σ Known, Population Not Normally Distributed

If n > 30, Test Statistic:

    z = (x̄ − µ0) / (σ/√n)

If n < 30, use a distribution-free test.
Test of µ, σ Unknown, Population Normally Distributed

Test Statistic:

    t = (x̄ − µ0) / (s/√n)

where
• x̄ is the sample statistic.
• µ0 is the value identified in the null hypothesis.
• σ is unknown; s is the sample standard deviation.
• n is the sample size.
• Degrees of freedom on t are n − 1.
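The t statistic can be sketched the same way. The standard library has no t distribution, so the sketch stops at the statistic and its degrees of freedom, which would then be compared against a t table; the sample values are made up for illustration:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample (not from the course); test H0: mu = 10.
sample = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.3, 9.7]
mu0 = 10.0

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)                    # sample standard deviation (n - 1 divisor)
t = (x_bar - mu0) / (s / sqrt(n))    # compare with t critical value, df = n - 1

print(n - 1, round(t, 3))
```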
Test of µ, σ Unknown, Population Not Normally Distributed

If n > 30, Test Statistic:

    t = (x̄ − µ0) / (s/√n)

If n < 30, use a distribution-free test.
Test of p, Sample Sufficiently Large

If both np > 5 and n(1 − p) > 5, Test Statistic:

    z = (p̂ − p0) / √(p0(1 − p0)/n)

where
• p̂ = sample proportion.
• p0 is the value identified in the null hypothesis.
• n is the sample size.
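A minimal sketch of this large-sample proportion test, with hypothetical counts (18 defectives in 120 items, testing H0: p = 0.10 against a right-tailed alternative):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical example (not from the course): H0: p = 0.10 vs H1: p > 0.10,
# with 18 "successes" observed in n = 120 trials.
p0, n, x = 0.10, 120, 18
p_hat = x / n

# Large-sample condition from the slide: n*p0 > 5 and n*(1 - p0) > 5.
assert n * p0 > 5 and n * (1 - p0) > 5

z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)     # right-tailed p-value

print(round(z, 3), round(p_value, 4))
```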
Test of p, Sample Not Sufficiently Large

If either np < 5 or n(1 − p) < 5, convert the proportion to the underlying binomial distribution.
Note there is no t-test on a population proportion.
Observed Significance Levels

A p-value is:
• the exact level of significance of the test statistic.
• the smallest value α can be and still allow us to reject the null hypothesis.
• the amount of area left in the tail beyond the test statistic for a one-tailed hypothesis test, or twice the amount of area left in the tail beyond the test statistic for a two-tailed test.
• the probability of getting a test statistic from another sample that is at least as far from the hypothesized mean as this sample statistic is.
Several Samples

Independent Samples: testing a company’s claim that its peanut butter contains less fat than that produced by a competitor.

Dependent Samples: testing the relative fuel efficiency of 10 trucks that run the same route twice, once with the current air filter installed and once with the new filter.
Test of (µ1 − µ2), σ1 = σ2, Populations Normal

Test Statistic:

    t = [(x̄1 − x̄2) − (µ1 − µ2)] / √[sp²(1/n1 + 1/n2)]

where

    sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

and degrees of freedom on t = n1 + n2 − 2.
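The pooled-variance computation can be sketched directly from these two formulas, using hypothetical samples (not from the course):

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical samples (not from the course); pooled-variance two-sample
# t-test of H0: mu1 = mu2, assuming normal populations with sigma1 = sigma2.
x1 = [25.1, 24.8, 26.0, 25.5, 24.9, 25.7]
x2 = [24.0, 24.5, 23.8, 24.9, 24.2, 24.6]

n1, n2 = len(x1), len(x2)

# Pooled variance: weighted average of the two sample variances.
sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)

t = (mean(x1) - mean(x2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

print(df, round(t, 3))
```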
Example: Comparing Two Populations

H0: µ1 = µ2
H1: µ1 ≠ µ2

Hypothesis: the mean of population 1 is equal to the mean of population 2.
Assumptions: (1) both distributions are normal; (2) σ1 = σ2.
Test Statistic:

    t = (X̄1 − X̄2) / √[(1/n1 + 1/n2) · ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)]

t distribution with df = n1 + n2 − 2.
Example: Comparing Two Populations

[Figure: t densities t(x; ν) for ν = 5 and ν = 50 over t ∈ (−5, 5), with rejection regions shaded in both tails.]

    t = (X̄1 − X̄2) / √[(1/n1 + 1/n2) · ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)]

t distribution with df = n1 + n2 − 2.
Test of (µ1 − µ2), σ1 ≠ σ2, Populations Normal, Large n

Test Statistic:

    z = [(x̄1 − x̄2) − (µ1 − µ2)0] / √(s1²/n1 + s2²/n2)

with s1² and s2² as estimates for σ1² and σ2².
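With hypothetical summary statistics (not from the slides), the large-sample z statistic for unequal variances works out as:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary statistics (not from the course): large-sample z-test
# of H0: mu1 = mu2 when sigma1 != sigma2, with s1^2 and s2^2 as estimates
# of the population variances.
x1_bar, s1_sq, n1 = 52.3, 16.0, 40
x2_bar, s2_sq, n2 = 50.1, 25.0, 45

z = (x1_bar - x2_bar) / sqrt(s1_sq / n1 + s2_sq / n2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value

print(round(z, 3), round(p_value, 4))
```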
Test of Dependent Samples, (µ1 − µ2) = µd

Test Statistic:

    t = d̄ / (sd/√n)

where
• d = (x1 − x2)
• d̄ = Σd/n, the average difference
• n = the number of pairs of observations
• sd = the standard deviation of d
• df = n − 1
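The paired test reduces to a one-sample t-test on the differences, as a sketch with hypothetical paired data (in the spirit of the truck/air-filter example) shows:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired data (not from the course): fuel consumption for the
# same trucks with the current filter and with a new filter.
current = [12.1, 11.8, 12.5, 12.0, 11.9]
new     = [11.7, 11.6, 12.2, 11.8, 11.5]

d = [c - n_ for c, n_ in zip(current, new)]   # pairwise differences
n = len(d)
t = mean(d) / (stdev(d) / sqrt(n))            # compare with t table, df = n - 1

print(n - 1, round(t, 3))
```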
Test of (p1 − p2), where n1p1 > 5, n1(1 − p1) > 5, n2p2 > 5, and n2(1 − p2) > 5

Test Statistic:

    z = (p̂1 − p̂2) / √[p̄(1 − p̄)(1/n1 + 1/n2)]

where
• p̂1 = observed proportion, sample 1
• p̂2 = observed proportion, sample 2
• n1 = sample size, sample 1
• n2 = sample size, sample 2
• p̄ = (n1p̂1 + n2p̂2) / (n1 + n2), the pooled proportion
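A sketch of the pooled two-proportion z-test, with hypothetical defect counts (not from the course):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical counts (not from the course): x1 defectives in n1 items
# versus x2 defectives in n2 items; H0: p1 = p2, two-tailed.
x1, n1 = 30, 200
x2, n2 = 18, 180

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)     # pooled proportion under H0

z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 3), round(p_value, 4))
```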
Test of Equal Variances

• The pooled-variances t-test assumes the two population variances are equal.
• The F-test can be used to test that assumption.
• The F-distribution is the sampling distribution of s1²/s2² that would result if two samples were repeatedly drawn from a single normally distributed population.
Test of σ1² = σ2²

If σ1² = σ2², then σ1²/σ2² = 1, so the hypotheses can be worded either way.

Test Statistic: F = s1²/s2² or s2²/s1², whichever is larger.

The critical value of F will be F(α/2; ν1, ν2), where
• α = the specified level of significance,
• ν1 = n − 1, where n is the size of the sample with the larger variance,
• ν2 = n − 1, where n is the size of the sample with the smaller variance.
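Computing the F statistic itself needs only the two sample variances; the comparison against F(α/2; ν1, ν2) would then come from an F table. The samples below are hypothetical:

```python
from statistics import variance

# Hypothetical samples (not from the course): F-test of equal variances,
# with the larger sample variance in the numerator, as on the slide.
x1 = [5.1, 4.8, 5.6, 5.0, 4.7, 5.4, 5.2]
x2 = [5.0, 5.1, 4.9, 5.0, 5.2, 4.9, 5.1]

s1_sq, s2_sq = variance(x1), variance(x2)
F = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)
# Compare F with the critical value F(alpha/2; n_larger - 1, n_smaller - 1).

print(round(F, 2))
```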
Confidence Interval for (µ1 − µ2)

The 100(1 − α)% confidence interval for the difference in two means:

• Equal variances, populations normal:

    (x̄1 − x̄2) ± t(α/2) · √[sp²(1/n1 + 1/n2)]

• Unequal variances, large samples:

    (x̄1 − x̄2) ± z(α/2) · √(s1²/n1 + s2²/n2)
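For the large-sample (unequal variances) branch, the interval follows directly from the z quantile; the summary statistics below are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary statistics (not from the course): a large-sample
# 95% confidence interval for mu1 - mu2 with unequal variances.
x1_bar, s1_sq, n1 = 102.0, 25.0, 60
x2_bar, s2_sq, n2 = 98.0, 36.0, 50

z = NormalDist().inv_cdf(0.975)            # z_{alpha/2} for alpha = 0.05
half = z * sqrt(s1_sq / n1 + s2_sq / n2)   # half-width of the interval
lo, hi = (x1_bar - x2_bar) - half, (x1_bar - x2_bar) + half

print(round(lo, 2), round(hi, 2))
```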
Confidence Interval for (p1 − p2)

The 100(1 − α)% confidence interval for the difference in two proportions:

    (p̂1 − p̂2) ± z(α/2) · √[p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2]

when sample sizes are sufficiently large.
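The same pattern gives the two-proportion interval; note that, unlike the two-proportion test, no pooled proportion is used here. The proportions below are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical proportions (not from the course): 95% CI for p1 - p2.
p1, n1 = 0.30, 150
p2, n2 = 0.22, 130

z = NormalDist().inv_cdf(0.975)   # z_{alpha/2} for alpha = 0.05
half = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
lo, hi = (p1 - p2) - half, (p1 - p2) + half

print(round(lo, 3), round(hi, 3))
```

Since this interval covers 0, the hypothetical data would not show a significant difference at α = 0.05.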
Summary

Hypothesis: the mean of population 1 is equal to the mean of population 2.
Assumptions: (1) both distributions are normal; (2) σ1 = σ2.
Test statistic: t = (X̄1 − X̄2) / √[(1/n1 + 1/n2) · ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)];
t distribution with df = n1 + n2 − 2.

Hypothesis: the standard deviation of population 1 is equal to the standard deviation of population 2.
Assumption: both distributions are normal.
Test statistic: F = s1²/s2²; F distribution with df1 = n1 − 1 and df2 = n2 − 1.

Hypothesis: the proportion of errors in population 1 is equal to the proportion of errors in population 2.
Assumption: n1p1 and n2p2 > 5 (approximation by the normal distribution).
Test statistic: Z = (X1/n1 − X2/n2) / √[p̄(1 − p̄)(1/n1 + 1/n2)], with pooled p̄ = (X1 + X2)/(n1 + n2); Z has a standard normal distribution.