Mean Analysis
Introduction


If we use the sample mean (the mean of the
sample) to approximate the population mean
(the mean of the population), some error is
introduced.
Two questions:
- How good is the approximation?
- If the error is too large, what should we do
to reduce it?
Confidence interval



A confidence interval gives us the interval in
which the population mean most likely falls.
The confidence level tells us (intuitively) how
likely it is that the population mean lies in the
confidence interval.
The narrower the confidence interval and the
higher the confidence level, the better the
approximation.
Calculation of the confidence
interval
The equation for the confidence interval is:
$\bar{y} \pm z \frac{s}{\sqrt{n}}$
The half-width of the C.I.: $z \frac{s}{\sqrt{n}}$
The point estimate is: $\bar{y}$
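The formula above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `mean_confidence_interval` and the data are hypothetical, and it assumes s is the sample standard deviation computed with the usual n - 1 divisor.

```python
import math
import statistics


def mean_confidence_interval(sample, z):
    """Return (low, high) for y_bar +/- z * s / sqrt(n)."""
    n = len(sample)
    y_bar = statistics.mean(sample)   # point estimate
    s = statistics.stdev(sample)      # sample standard deviation (assumed n - 1 divisor)
    half_width = z * s / math.sqrt(n)
    return y_bar - half_width, y_bar + half_width


# Hypothetical data, for illustration only (n = 40)
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13] * 4
low, high = mean_confidence_interval(sample, z=2)
print(f"95% C.I.: ({low:.2f}, {high:.2f})")
```

The interval is always centered on the point estimate, so its midpoint is the sample mean.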
Some z values
Confidence level    z
90%                 1.65
95%                 2
99%                 2.6
99.7%               3
How to find z value
Given a confidence level, say 80%, we look up
the Area value in the normal table that is
closest to 80% (assuming the table lists the area
under the normal curve between -z and +z) and
read off the corresponding z value. That z is the
value used in the confidence interval formula.
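The table lookup can also be done numerically. The sketch below is an illustration under one assumption: the confidence level is the area under the standard normal curve between -z and +z, so z is the quantile at (1 + level) / 2. The helper name `z_for_confidence` is hypothetical; `NormalDist` is from Python's standard library.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """z such that the area between -z and +z under the standard normal curve equals level."""
    return NormalDist().inv_cdf((1 + level) / 2)


for level in (0.80, 0.90, 0.95, 0.99, 0.997):
    print(f"{level:.1%}: z = {z_for_confidence(level):.2f}")
```

The exact values differ slightly from the rounded ones commonly quoted: for 95% the exact z is about 1.96, which is usually rounded to 2.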
Properties



The larger the sample size n, the narrower
the confidence interval.
The higher the confidence level, the larger z
and thus the wider the confidence interval.
The population size has nothing to do with
the confidence interval, as long as it is large
enough.
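The first two properties follow directly from the half-width z * s / sqrt(n). A small numeric check, with a hypothetical standard deviation s = 10 chosen only for illustration:

```python
import math


def half_width(z, s, n):
    """Half-width of the confidence interval: z * s / sqrt(n)."""
    return z * s / math.sqrt(n)


s = 10  # hypothetical sample standard deviation

# Larger n gives a narrower interval (z fixed at 2)
for n in (25, 100, 400):
    print(f"n = {n:3d}: half-width = {half_width(2, s, n):.2f}")

# Higher confidence level gives a larger z and a wider interval (n fixed at 100)
for z in (1.65, 2, 2.6):
    print(f"z = {z}: half-width = {half_width(z, s, 100):.2f}")
```

Note that quadrupling n only halves the width, because n enters through its square root.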
Distribution of the sample mean



If the sample size is large enough, say larger
than 30, the histogram of the sample means
is very close to a Normal distribution.
The mean of the sample means equals the
population mean.
The standard deviation of the sample means
equals the population standard deviation
divided by the square root of n.
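These statements can be checked with a small simulation. The sketch below is illustrative only: the population is hypothetical (exponentially distributed, so clearly not normal), and the sample size, number of repetitions, and random seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical, deliberately non-normal population
population = [random.expovariate(1.0) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 50               # sample size, larger than 30
num_samples = 2_000  # number of repeated samples
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(num_samples)
]

print("population mean:      ", round(pop_mean, 3))
print("mean of sample means: ", round(statistics.fmean(sample_means), 3))
print("population SD/sqrt(n):", round(pop_sd / math.sqrt(n), 3))
print("SD of sample means:   ", round(statistics.stdev(sample_means), 3))
```

The two pairs of printed numbers should agree closely, though not exactly, since each run uses only a finite number of samples.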
Conditions for Confidence
Interval



The sampling method should be unbiased.
The sample size n should be large enough,
say larger than 30.
The population size N should be much larger
than the sample size n.
Determine the correct
sample size n
1. Take a random sample of size n.
2. Calculate the confidence interval based on your
sample.
3. Check if the confidence interval is narrow enough. If
it is too wide, decide on a target width.
4. Assuming the sample standard deviation will
remain the same, use the equation for confidence
interval to estimate the required sample size.
5. Take a new sample and calculate the confidence
interval.
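Step 4 amounts to solving the half-width formula h = z * s / sqrt(n) for n, which gives n = (z * s / h)^2. A minimal sketch, with a hypothetical function name and hypothetical pilot-sample numbers:

```python
import math


def required_sample_size(s, z, target_half_width):
    """Smallest n with z * s / sqrt(n) <= target_half_width, assuming s stays the same."""
    return math.ceil((z * s / target_half_width) ** 2)


# Hypothetical pilot sample: s = 12, 95% confidence (z = 2), target half-width of 2
n_needed = required_sample_size(s=12, z=2, target_half_width=2)
print(n_needed)  # (2 * 12 / 2) ** 2 = 144
```

Because the new sample will have its own standard deviation, the resulting interval may still come out slightly wider or narrower than the target, which is why step 5 recalculates it.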
When n is small


When n is small, say, n=5, a formula for the
confidence interval can be derived only when
the population distribution is normal.
The formula for the confidence interval is very
similar to the one we have used. The
difference is that we have to use the so-called
Student t distribution instead of the
normal distribution.
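The slides do not write this formula out. In its standard textbook form, stated here as a supplement and assuming a normally distributed population, z is replaced by a value t taken from the Student t distribution with n - 1 degrees of freedom:

```latex
\bar{y} \pm t_{n-1} \, \frac{s}{\sqrt{n}}
```

For the same confidence level, t is larger than z, so the interval is wider; as n grows, t approaches z and the two formulas agree.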
The Confidence interval for
population proportion
The equation is:
$p \pm z \sqrt{\frac{p(1-p)}{n}}$
– n is the sample size,
– z is determined by the confidence level, e.g., if the
confidence level is 95%, z is equal to 2, and
– p is the proportion of yes answers in the sample.
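As an illustration of the proportion formula, here is a short sketch. The function name and the survey counts are hypothetical.

```python
import math


def proportion_confidence_interval(yes_count, n, z):
    """Return (low, high) for p +/- z * sqrt(p * (1 - p) / n)."""
    p = yes_count / n  # proportion of yes answers in the sample
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Hypothetical survey: 240 yes answers out of 400, 95% confidence (z = 2)
low, high = proportion_confidence_interval(240, 400, z=2)
print(f"({low:.3f}, {high:.3f})")  # prints (0.551, 0.649)
```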
Conditions for Confidence
Interval



The sampling method should be unbiased.
The sample size n should be large enough. In
particular, both np and n(1-p) should be larger
than 5.
The population size N should be much larger
than the sample size n.
Comparing the means of two
populations
The formula is:
$(\bar{y}_2 - \bar{y}_1) \pm z \sqrt{\frac{s_2^2}{n_2} + \frac{s_1^2}{n_1}}$
– $\bar{y}_2$ and $\bar{y}_1$ are the sample means;
– $s_1$ and $s_2$ are the standard deviations of the two samples;
– $n_1$ and $n_2$ are the sample sizes of the two samples; and
– z is selected to provide the desired confidence level.
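A minimal sketch of the two-sample formula follows. The function name and both data sets are hypothetical, and the sample standard deviations are assumed to use the n - 1 divisor.

```python
import math
import statistics


def mean_difference_interval(sample1, sample2, z):
    """Return (low, high) for (y2 - y1) +/- z * sqrt(s2^2/n2 + s1^2/n1)."""
    y1, y2 = statistics.mean(sample1), statistics.mean(sample2)
    s1, s2 = statistics.stdev(sample1), statistics.stdev(sample2)
    n1, n2 = len(sample1), len(sample2)
    half_width = z * math.sqrt(s2**2 / n2 + s1**2 / n1)
    diff = y2 - y1
    return diff - half_width, diff + half_width


# Hypothetical samples, for illustration only (n1 = n2 = 36)
sample1 = [10, 12, 11, 13, 9, 11] * 6
sample2 = [14, 15, 13, 16, 12, 14] * 6
low, high = mean_difference_interval(sample1, sample2, z=2)
print(f"95% C.I. for the difference: ({low:.2f}, {high:.2f})")
```

If the whole interval lies above zero, as it does for this made-up data, the samples suggest that the second population mean is larger than the first.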
Conditions for Confidence
Interval



The two samples should be independently
and randomly selected;
The sample sizes should be large enough –
at least 30;
The population sizes should be much larger
than the corresponding sample sizes.
Comparing the proportions of two
populations
The formula is the following:
$(p_2 - p_1) \pm z \sqrt{\frac{p_2(1-p_2)}{n_2} + \frac{p_1(1-p_1)}{n_1}}$
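The same pattern as for a single proportion applies here. In the sketch below the function name and the counts are hypothetical, and p1, p2, n1, n2 are taken to be the two sample proportions and sample sizes, by analogy with the earlier formulas.

```python
import math


def proportion_difference_interval(yes1, n1, yes2, n2, z):
    """Return (low, high) for (p2 - p1) +/- z * sqrt(p2(1-p2)/n2 + p1(1-p1)/n1)."""
    p1, p2 = yes1 / n1, yes2 / n2
    half_width = z * math.sqrt(p2 * (1 - p2) / n2 + p1 * (1 - p1) / n1)
    diff = p2 - p1
    return diff - half_width, diff + half_width


# Hypothetical surveys: 120 of 300 yes answers versus 200 of 400 yes answers
low, high = proportion_difference_interval(yes1=120, n1=300, yes2=200, n2=400, z=2)
print(f"95% C.I. for p2 - p1: ({low:.3f}, {high:.3f})")
```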