PARAMETRIC STATISTICAL INFERENCE
INFERENCE:
• Methodologies that allow us to draw conclusions about population parameters from sample statistics
TYPES OF INFERENCE:
1. Estimation
2. Hypothesis testing
• Methods based on statistical relationships between samples and populations
• POINT ESTIMATION: estimation of a parameter from a sample statistic
  – For the mean, standard deviation, etc.
• INTERVAL ESTIMATION: using a sample to identify an interval within which the population parameter is thought to lie, with a certain probability
ESTIMATION OF POPULATION MEAN
• The sample mean is only an estimate of the population mean
  – The parameter value is not known
• Due to sampling variability, no two samples will produce exactly the same outcome, or sample mean
• Can we estimate how the sample mean would vary if we took many large samples from the same population?
Remember:
• Sample means from large samples have an approximately normal distribution
• The mean of the sampling distribution is the same as the unknown parameter μ
• The standard deviation of x̄ for an SRS of size n is ?
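The behaviour of the sample mean across repeated samples can be explored with a small simulation. This is only a sketch under stated assumptions: the exponential population (mean 1, standard deviation 1), the sample size of 50, the 2000 repetitions and the function name `simulate_sample_means` are all illustrative choices, not part of the slides. It compares the spread of the simulated sample means with σ/√n.

```python
import random
import statistics

def simulate_sample_means(n, n_samples, seed=0):
    """Draw n_samples simple random samples of size n from an exponential
    population (mean 1, sd 1; an assumed, non-normal population) and
    return the list of sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

n = 50
means = simulate_sample_means(n, n_samples=2000)

# Centre and spread of the simulated sampling distribution
print("mean of sample means:", round(statistics.fmean(means), 3))
print("sd of sample means:  ", round(statistics.stdev(means), 3))
# Population sd (1.0) divided by sqrt(n), for comparison
print("sigma / sqrt(n):     ", round(1.0 / n ** 0.5, 3))
```

Even though the assumed population is skewed, the sample means cluster around the population mean, and their spread shrinks as n grows.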
PARAMETRIC STATISTICAL INFERENCE:
ESTIMATION
• Example: A random sample of 350 male college students was asked for the number of units they were taking. The mean was 12.3 units, with a standard deviation of 2.50 units.
• What can we say about the mean number of units of all male students at the university? How will the estimated value of the parameter vary from one sample to another with a certain confidence, like 95%?
Assume that  = ?. s = ?
Statistical confidence
Remember: The 68-95-99.7 rule
• In 95% of all samples, the sample mean x̄ will lie within 2 standard deviations of x̄ of the population mean μ.
• The standard deviation of x̄ is σ/√n, not σ itself. Taking σ ≈ s = 2.50 and n = 350 gives 2.50/√350 ≈ 0.134, so 2 standard deviations of x̄ is about 0.27 units.
• In 95% of samples, μ will lie within 0.27 units of the observed sample mean:
$$ \bar{x} - 0.27 \le \mu \le \bar{x} + 0.27 $$
• Thus, for our sample (x̄ = 12.3), the interval runs from 12.03 to 12.57; intervals built this way contain the parameter in 95% of samples
Rephrasing:
1. We are 95% confident that the interval 12.03 to 12.57 contains μ
• We have just assigned statistical confidence to our estimation of the parameter
• We call this estimated interval a CONFIDENCE INTERVAL for the mean value
• But there is still some chance that the true parameter value will not lie in the identified interval
• e.g. The SRS chosen was one of the few samples for which x̄ is not within 0.27 units of the true mean. 5% of samples will give these incorrect results
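The arithmetic for the student-units example can be sketched as below. Two assumptions are made here: the unknown σ is approximated by the sample standard deviation s = 2.50, and "2 standard deviations" from the 68-95-99.7 rule is used rather than the more exact 1.96. Note that the spread that matters is that of x̄, which is s/√n, not the spread s of individual students.

```python
import math

n = 350        # sample size from the example
x_bar = 12.3   # observed sample mean (units)
s = 2.50       # sample sd, used as a stand-in for sigma (assumption)

# Standard deviation of the sample mean (standard error)
standard_error = s / math.sqrt(n)

# 68-95-99.7 rule: about 95% of sample means fall within 2 standard errors
margin = 2 * standard_error
low, high = x_bar - margin, x_bar + margin

print(f"standard error = {standard_error:.3f}")            # 0.134
print(f"approx. 95% interval: {low:.2f} to {high:.2f}")    # 12.03 to 12.57
```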

CONFIDENCE INTERVAL – formal definition
• A level C confidence interval for a parameter is defined as
  estimate ± margin of error
  and gives the interval that will capture the true parameter value in repeated samples with probability C
• Confidence levels usually vary between 90% and 99.9%
BUILDING CONFIDENCE INTERVALS
If we know the parameters μ and σ, we can standardize the sample mean. The result is the ONE-SAMPLE Z STATISTIC
$$ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$
The z statistic tells us how far the observed x̄ is from μ, in units of standard deviations of x̄. Because x̄ has a normal distribution, z has the standard normal distribution N(0,1).

Constructing confidence intervals
When we construct a 95% confidence interval,
we are looking for two values for which there
is a 95% chance that the population mean is
between them. So,
P(Low <  < High) = 0.95
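Thus,
$$
\begin{aligned}
0.95 &= P(-1.96 < z < 1.96) \\
     &= P\left(-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) \\
     &= P\left(-1.96\,\frac{\sigma}{\sqrt{n}} \le \bar{x} - \mu \le 1.96\,\frac{\sigma}{\sqrt{n}}\right) \\
     &= P\left(-\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} \le -\mu \le -\bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) \\
     &= P\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)
\end{aligned}
$$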
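The probability statement above is about repeated sampling, and that can be checked with a rough simulation. In this sketch the population values (μ = 12.0, σ = 2.5), the sample size, the number of repetitions and the function name `coverage` are all hypothetical choices made for illustration. It counts how often the interval x̄ ± 1.96·σ/√n contains the true μ.

```python
import math
import random

def coverage(mu, sigma, n, n_samples, z=1.96, seed=1):
    """Fraction of simulated samples whose interval
    x_bar +/- z * sigma / sqrt(n) contains the true mean mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(n_samples):
        # Draw one SRS of size n from a normal population and average it
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / n_samples

frac = coverage(mu=12.0, sigma=2.5, n=25, n_samples=5000)
print(f"fraction of intervals containing mu: {frac:.3f}")
```

The printed fraction should come out close to 0.95. The remaining samples, roughly 5%, are the ones whose interval misses μ.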

Draw an SRS of size n from a population having unknown mean μ and known standard deviation σ. A level C confidence interval for μ is
$$ \bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} $$
that is,
$$ \bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} $$
This interval is exact when the population distribution is normal and is approximately correct for large n in other cases.
$$ \alpha = 1 - C $$
where α represents the probability that the interval will not capture the true parameter value in repeated samples, and C is the confidence level.
Confidence intervals and confidence levels on the standardized normal curve N(0,1)
Figure 6.5 and figure 6.6
z* = z_{α/2}
C = chosen confidence level – probability that the interval will capture the parameter
(1−C)/2 = α/2 = probability that the parameter lies above the upper confidence limit (and, equally, below the lower confidence limit)
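The critical value z* = z_{α/2} for a chosen confidence level can be looked up in a table or computed. A minimal sketch using Python's standard-library `statistics.NormalDist` follows; the helper name `z_critical` is an illustrative choice.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z* such that the central area under N(0,1)
    between -z* and +z* equals the confidence level C."""
    alpha = 1.0 - confidence
    # Leave alpha/2 in the upper tail
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for c in (0.90, 0.95, 0.99, 0.999):
    print(f"C = {c}: z* = {z_critical(c):.3f}")
```

For C = 0.95 this gives the familiar 1.960, and for C = 0.99 it gives 2.576.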

$$ \bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} $$
Example:
• A manufacturer of pharmaceutical products analyzes a specimen from each batch of a product to verify the concentration of the active ingredient. The chemical analysis is not perfectly precise. Repeated measurements on the same specimen give slightly different results. The results of repeated measurements follow a normal distribution. The analysis procedure has no bias, so the mean of the population of all measurements is the true concentration in the specimen. The standard deviation of this distribution is known to be 0.0068 g/l. Three analyses of one specimen give the following concentrations (g/l):
  0.8403   0.8363   0.8447
• Calculate the 99% confidence interval for the true concentration.
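One way to carry out this calculation is sketched below, using the three measurements and the known σ from the example. The variable names are illustrative, and z* is computed with the standard library rather than read from a table.

```python
import math
from statistics import NormalDist, fmean

measurements = [0.8403, 0.8363, 0.8447]  # g/l, from the example
sigma = 0.0068                           # known sd of the analysis (g/l)
confidence = 0.99

x_bar = fmean(measurements)
# Critical value z_{alpha/2} for the chosen confidence level
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_star * sigma / math.sqrt(len(measurements))
low, high = x_bar - margin, x_bar + margin

print(f"x_bar = {x_bar:.4f}, margin of error = {margin:.4f}")
print(f"99% CI: {low:.4f} to {high:.4f} g/l")
```

This gives an interval of roughly 0.8303 to 0.8505 g/l.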

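INTERVAL ESTIMATION OF μ WITH σ UNKNOWN
$$ \bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} $$
• σ replaced with estimate s – introduces more uncertainty
• STUDENT'S T-DISTRIBUTION, not the standard normal curve
$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} $$
$$ \bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} $$
$$ \bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} $$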
Intervals derived from the t-distribution are wider than those found with the z-distribution
For large samples (n ≥ 30), it makes little practical difference which distribution we use to estimate the confidence interval
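To see how much wider a t-based interval can be for a very small sample, the sketch below reuses the three concentration measurements from the pharmaceutical example, but pretends σ is unknown and estimates it by s. That reuse is an assumption made purely for illustration. The t critical value for 99% confidence with n − 1 = 2 degrees of freedom (9.925) is hard-coded from a standard t table, because Python's standard library has no t-distribution quantile function.

```python
import math
from statistics import NormalDist, fmean, stdev

measurements = [0.8403, 0.8363, 0.8447]  # g/l, reused for illustration
n = len(measurements)
x_bar = fmean(measurements)
s = stdev(measurements)                  # sample sd (n - 1 in the denominator)

# 99% two-sided critical values
T_CRIT_99_DF2 = 9.925                    # from a t table, df = n - 1 = 2
z_crit = NormalDist().inv_cdf(0.995)     # about 2.576

t_margin = T_CRIT_99_DF2 * s / math.sqrt(n)
z_margin = z_crit * s / math.sqrt(n)

print(f"s = {s:.4f}")
print(f"t-based 99% CI: {x_bar - t_margin:.4f} to {x_bar + t_margin:.4f}")
print(f"z-based margin with the same s: {z_margin:.4f} (t-based: {t_margin:.4f})")
```

With only two degrees of freedom, the t margin is several times the z margin. The gap narrows quickly as n grows.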
HOW CONFIDENCE INTERVALS BEHAVE

• Ideal situation – high confidence and small margin of error
$$ \text{Margin of error } (E) = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} $$
• The smaller the margin of error, the more precise our estimation of μ

Properties of error
1. Error increases with smaller sample size
   For any confidence level, large samples reduce the margin of error
2. Error increases with larger standard deviation
   As variation among the individuals in the population increases, so does the error of our estimate
3. Error increases with larger z values
   Tradeoff between confidence level and margin of error
Interval width (error) increases with increased confidence level
Higher confidence levels have higher z values
Figure 8-10 and 8-11
Error is high in small samples
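The three properties can be checked numerically with a small sketch built on the margin-of-error formula E = z_{α/2}·σ/√n. The baseline values (σ = 0.0068, n = 3, C = 0.95) are borrowed from the pharmaceutical example as an illustrative starting point, and `margin_of_error` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence):
    """E = z_{alpha/2} * sigma / sqrt(n) for a z-based interval."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sigma / math.sqrt(n)

base = margin_of_error(sigma=0.0068, n=3, confidence=0.95)
larger_n = margin_of_error(sigma=0.0068, n=12, confidence=0.95)
larger_sd = margin_of_error(sigma=0.0136, n=3, confidence=0.95)
higher_c = margin_of_error(sigma=0.0068, n=3, confidence=0.99)

print(f"baseline (n=3, C=95%):   {base:.4f}")
print(f"n = 12 (four times n):   {larger_n:.4f}")   # half the baseline
print(f"sigma doubled:           {larger_sd:.4f}")  # twice the baseline
print(f"C = 99%:                 {higher_c:.4f}")   # wider than baseline
```

Because n enters through a square root, halving the margin of error takes four times as many observations.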
Example:
Calculate the 99% confidence interval for a sample size of 1. x̄ = 0.8404, σ = 0.0068
The 99% confidence interval for n = 3 was 0.8303 to 0.8505 g/l
How do these compare in relation to the mean? Which one has the larger margin of error?
CHOOSING SAMPLE SIZE

• Sometimes we wish to estimate our mean within a certain margin of error.
• Sometimes we wish to determine the sample size needed to achieve a given margin of error
• Here is how…
Remember:
$$ \text{Margin of error } (E) = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} $$
To obtain a desired value of E, for a given confidence level, you need to figure out n. From the above,
$$ n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 $$
• For a given σ and confidence level, it is the sample size that determines the margin of error
• Required sample size depends on the desired level of confidence
CHOOSING SAMPLE SIZE
Example:
Management asks the pharmaceutical laboratory to produce results accurate to within 0.005 g/l with 95% confidence. How many measurements must be averaged to comply with this request?
m = 0.005 g/l (the desired margin of error)
For 95% confidence level, z = ?
σ = 0.0068 g/l
For 95% confidence level, z = 1.960.
σ = 0.0068 g/l
$$ n = \left(\frac{z\,\sigma}{m}\right)^2 = \left(\frac{1.96 \times 0.0068}{0.005}\right)^2 \approx 7.1 $$
Is n = 7 or n = 8?
Choose the one that will give a smaller margin of error.
How should we always round to meet the stated requirement?
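The sample-size calculation can be sketched as follows. The function name `required_sample_size` is illustrative. The result is rounded up because a fractional number of measurements is not possible and rounding down would leave the margin of error slightly larger than requested.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence):
    """Return (raw n, n rounded up) so that
    z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    raw = (z_star * sigma / margin) ** 2
    return raw, math.ceil(raw)

raw, n = required_sample_size(sigma=0.0068, margin=0.005, confidence=0.95)
print(f"raw n = {raw:.1f}, rounded up to n = {n}")

# Check the achieved margins of error for n = 7 and n = 8
z_star = NormalDist().inv_cdf(0.975)
for k in (7, 8):
    print(f"n = {k}: margin = {z_star * 0.0068 / math.sqrt(k):.5f} g/l")
```

With n = 7 the margin is just above 0.005 g/l, while n = 8 brings it below the target.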
SUMMARY
All formulas for inference are only correct under certain conditions
o Most inference methods have several assumptions attached to them that must be met if the outcomes produced by them are to be reliable.
The confidence interval formula has the following assumptions:
1. The data must come from a simple random sample.
   – different methods exist for stratified and multistage samples
   – undercoverage and non-response can add error
2. x̄ must be a normally distributed random variable
3. There must be no outliers. Is the formula sensitive to outliers?
4. If the sample size is small (<15) and/or σ is not known but the distribution of x is still normal, the t-distribution must be used to compute the interval
5. When σ is known, use the z-distribution.
For large sample sizes we can assume that σ = s and use either the z or t distribution