Estimation in Sampling!?
Chapter 7 – Statistical Problem Solving in Geography
Goals
• Basic Concepts in Estimation
• Point Estimation and Interval Estimation
• Sampling Distribution of a Statistic
• Central Limit Theorem
• Confidence Intervals and Estimation
• Standard Normal and Z-Scores
• General Procedure for Constructing a Confidence Interval
• Geographic Examples of Confidence Intervals
• Sample Size Selection
• Mean, Total and Proportion in Sample Size Selection
Points in Estimation
• Estimation: The goal of sampling is to estimate and draw inferences about
population characteristics.
• Point Estimation
• A statistic is calculated from a sample to estimate the corresponding
population parameter.
• In probability sampling the “best” point estimate of a population parameter is
the corresponding sample statistic.
• For $\mu$ it is $\bar{x}$; for $\sigma$ it is $s$ (the sample standard deviation)
• Calculating Point Estimates:
• See table 7.1, Page 98 – McGrew and Monroe
Intervals in Estimation
• Interval Estimation
• Due to sampling uncertainty it is unlikely that a sample statistic
will exactly equal the population parameter.
• Used to gauge how far a sample statistic is likely to be from the
population parameter.
• Interval estimation uses a confidence interval to establish the
likelihood that a sample statistic is within a given interval or range of
the population parameter.
• Confidence Interval: Represents the level of precision associated with
the population estimate. Width is determined by 1) sample size; 2)
amount of variability in the population; and 3) the probability level
or level of confidence selected for the problem.
Sampling Distribution of a Statistic
• A single sample of size n will lead to a distribution curve which
could be any of the curves that we have discussed.
• Examples are Poisson, Uniform, Normal, etc.
• This single sample will produce a sample mean and standard
deviation.
• Sampling Distribution of Sample Means: If you take multiple,
similar-sized independent samples from a population the set
of sample means can be graphed.
• The red curve is the Sampling Distribution of Sample Means
• The black curve represents the frequency
distribution of values within the population
Central Limit Theorem
• Given the effect of randomness in drawing samples, some
sample means will fall above the population mean and some
below.
• Provided they are independent samples, the mean of all of
the sample means will be the population mean.
• The distribution of sample means will also be approximately normal and
centered on the population mean, regardless of the
distribution of the population, provided that the sample size is
larger than 30.
• When the sample size (n) is large, the sample means will be
closer to the population mean.
Central Limit Theorem
• One final component of the Central Limit Theorem is:
• Standard Error of the Mean: According to this theorem the
standard deviation of the sampling distribution can be determined
by $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$; thus standard error is a basic
measure of sampling error.
• http://www.youtube.com/watch?v=BvB1QqwurK0
• Sampling Error: The larger the sample size, the smaller the
amount of sampling error. Thus, the larger the sample the closer
the sample mean is to the population mean. In addition, the
larger the standard deviation of the population, the larger the
amount of sampling error due to the larger variability in the
population.
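The behaviour described on the Central Limit Theorem slides can be checked with a small simulation. The sketch below is only an illustration: the exponential population, the sample size of 50 and the number of draws are all assumed values, not figures from the chapter.

```python
import random
import statistics

# Hypothetical, strongly skewed population (exponential values); not from the slides.
rng = random.Random(42)
population = [rng.expovariate(1.0) for _ in range(100_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 50          # size of each sample (assumed)
draws = 5_000   # number of independent samples (assumed)

# Draw many samples (with replacement) and record each sample mean.
sample_means = [
    statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)
]

mean_of_means = statistics.fmean(sample_means)
observed_se = statistics.pstdev(sample_means)   # spread of the sample means
theoretical_se = sigma / n ** 0.5               # sigma / sqrt(n)

print(f"population mean       = {mu:.3f}")
print(f"mean of sample means  = {mean_of_means:.3f}")
print(f"observed std. error   = {observed_se:.3f}")
print(f"sigma / sqrt(n)       = {theoretical_se:.3f}")
```

With these settings the mean of the sample means should sit very close to the population mean, and the spread of the sample means should be close to $\sigma/\sqrt{n}$, even though the population itself is far from normal.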
Central Limit Theorem
• The Central Limit Theorem is completely true only for infinitely large
populations.
• Within a finite population a correction process may be incorporated.
• Finite Population Correction: Applied to the estimation process
when the sampling fraction $\frac{n}{N}$ is large. Include the fpc in the
population estimate equations only when the ratio of sample size to
population exceeds 5% ($\frac{n}{N} > .05$).
• If it is determined that you should include the fpc, then the equation
for finding fpc, $\sqrt{\frac{N-n}{N-1}}$, should be included in the standard error
equation as:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}(\text{fpc}) = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$
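A minimal sketch of the 5% rule and the finite population correction follows. The function names and the example numbers are hypothetical, chosen only to illustrate the formula above.

```python
import math

def fpc(N, n):
    """Finite population correction: sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def standard_error(sigma, n, N=None, threshold=0.05):
    """Standard error of the mean, sigma / sqrt(n).

    The fpc is applied only when the population size N is given and the
    sampling fraction n / N exceeds the threshold (5% on the slide).
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N > threshold:
        se *= fpc(N, n)
    return se

# Illustrative values (assumed): sigma = 10, n = 100
print(standard_error(10, 100))            # no population size given, no correction
print(standard_error(10, 100, N=10_000))  # n/N = 1%, correction not applied
print(standard_error(10, 100, N=1_000))   # n/N = 10%, correction applied
```

The correction always shrinks the standard error, because sampling a large share of a finite population leaves less of it unobserved.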
Confidence Intervals and Estimation
• A confidence interval is placed to demonstrate the likelihood that a
sample mean is within an interval range of the population mean.
• A confidence interval is determined: $\bar{X} \pm Z\sigma_{\bar{x}}$
• Z = z-score from the standard normal table
• $\bar{X}$ = sample mean
• $\sigma_{\bar{x}}$ = standard error of mean
• A 90% confidence interval thus gives 90% certainty that the
population mean lies within the confidence interval defined.
• The shaded area in the figure represents the 90% confidence
interval. Notice that there is a .05 area in each tail, beyond the upper
and lower limits, where the true mean could still fall.
Z-Scores and Confidence Intervals
• In order to establish a confidence interval we must determine
a z-score.
• This can be done by looking at a table to see z-scores of
common confidence levels!
• More information on z-scores can be found at this website.
Using Interval Estimates
• Confidence Level: Probability that the interval surrounding a
sample mean encompasses the true population mean.
Defined as $1 - \alpha$.
• Significance Level: Probability that the interval surrounding a
sample mean fails to encompass the true population mean.
The significance level is denoted by $\alpha$ and equals the total sampling
error. Since error goes in both directions, the probability of it
falling into either tail is $\frac{\alpha}{2}$.
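Instead of reading a table, the z-score for a confidence level can be computed from the standard normal distribution by leaving $\alpha/2$ in each tail. The sketch below uses Python's standard library; the helper name `z_for_confidence` is hypothetical.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-tailed z-score for a confidence level such as 0.95."""
    alpha = 1 - level                            # significance level
    return NormalDist().inv_cdf(1 - alpha / 2)   # leave alpha/2 in the upper tail

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_for_confidence(level):.3f}")
# 90% confidence -> z = 1.645
# 95% confidence -> z = 1.960
# 99% confidence -> z = 2.576
```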
Constructing a Confidence Interval
• Establish the sample mean, population standard deviation, sample
size and the z-value for the desired confidence level.
• Plug the numbers into the confidence interval equation
$\bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$
• This gives the sample mean ± the margin of error, i.e. the z-score
multiplied by the standard error.
• Ensure that finite population correction (slide 8) is not
needed.
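The procedure above can be sketched in a few lines. The input values (mean 50, $\sigma$ = 10, n = 100) are made up for illustration, and the sketch assumes the finite population correction is not needed.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, level=0.95):
    """Confidence interval for the mean with known population sigma."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # two-tailed z-score
    margin = z * sigma / math.sqrt(n)               # Z times the standard error
    return sample_mean - margin, sample_mean + margin

# Illustrative inputs (assumed, not from the chapter)
low, high = z_confidence_interval(sample_mean=50.0, sigma=10.0, n=100, level=0.95)
print(f"95% CI: {low:.2f} to {high:.2f}")
# 95% CI: 48.04 to 51.96
```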
What Level of Confidence?
• .99, .95 and .90 are the most commonly used confidence
levels for estimating the mean.
• Higher confidence results in wider intervals and thus less
precise estimates but lower sampling error (𝛼).
• Lower confidence results in smaller intervals but higher
sampling error (𝛼).
• Balance acceptable level of error with needed level of
precision.
The Real World: Unknown Population Standard Deviation
• Rarely do we know the parameters of a population, hence our
attempts to estimate them!
• $\sigma$ is generally unknown, so how do we estimate standard error?
• Using the sample variance, which is the sample standard deviation
squared ($s^2$), is an acceptable approach.
• Standard Error Revisited: The standard deviation of the sampling
distribution of sample means.
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}(\text{fpc}) = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$
So, we put in the sample variance and take the root of the variance to
get the standard error
$\sigma_{\bar{x}} = \sqrt{\frac{s^2}{n}\cdot\frac{N-n}{N}}$
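As a sketch of how the estimated standard error could be computed from raw sample values, the function below follows the formula with $s^2$. The sample data and the population size of 40 are hypothetical.

```python
import math
import statistics

def estimated_standard_error(sample, N=None):
    """Standard error of the mean estimated from the sample variance s^2.

    If the population size N is supplied, the (N - n) / N correction from
    the slide is applied; otherwise the plain sqrt(s^2 / n) is returned.
    """
    n = len(sample)
    s2 = statistics.variance(sample)   # sample variance (divides by n - 1)
    var_of_mean = s2 / n
    if N is not None:
        var_of_mean *= (N - n) / N
    return math.sqrt(var_of_mean)

# Hypothetical sample of 8 observations
sample = [12, 15, 11, 14, 13, 16, 12, 15]
print(f"without correction: {estimated_standard_error(sample):.3f}")
print(f"with N = 40:        {estimated_standard_error(sample, N=40):.3f}")
```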
What if Sample Size is Small?
• Z is valid only if the sample size is greater than 30, so our
confidence interval equation must be altered if we have a
smaller sample.
• Instead we use a t-distribution, which approaches the standard
normal distribution as the sample size increases and is already close
to it by about n = 30.
• In this instance the confidence interval formula is
$\bar{X} \pm t\sigma_{\bar{x}}$
• We can use the t-table to determine the value of t
The T-Table
• The t-table is dependent on two values:
• The Significance Level (𝛼) of which the common levels are .10,
.05, .01 as determined earlier by the common confidence levels.
• Degrees of freedom which is determined by taking the sample
size and subtracting one:
• df = n-1
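A small-sample interval can be sketched as follows. The dictionary holds only a short excerpt of the standard two-tailed t-table for $\alpha$ = .05, and the sample values are hypothetical; a full table or a statistics library would be needed for other degrees of freedom or significance levels.

```python
import math
import statistics

# Excerpt of the standard two-tailed t-table for alpha = .05 (95% confidence),
# keyed by degrees of freedom. Only a few rows are included here.
T_TABLE_05 = {5: 2.571, 9: 2.262, 10: 2.228, 14: 2.145, 19: 2.093, 24: 2.064, 29: 2.045}

def t_confidence_interval_95(sample):
    """95% confidence interval for the mean of a small sample."""
    n = len(sample)
    df = n - 1                                     # degrees of freedom
    t = T_TABLE_05[df]                             # KeyError if df is not in the excerpt
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # s / sqrt(n)
    return mean - t * se, mean + t * se

# Hypothetical sample of 10 measurements (df = 9)
sample = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2, 4.3, 3.7, 4.0, 4.1]
low, high = t_confidence_interval_95(sample)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

Because t for 9 degrees of freedom (2.262) is larger than the z-value of 1.96, the interval is wider than a z-based interval on the same data.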
But! To Calculate a Confidence Interval….
• Equation used depends on two factors
• The equation used for a confidence interval depends on the
sample type (random, systematic, stratified, etc.)
• Different population parameters require different confidence
interval equations.
Random or Systematic Samples
• Random or Systematic Sample – Estimating Population Mean
• Use the t equation for samples smaller than 30 and the z equation for
larger samples.
• $\mu = \bar{x} \pm (z \text{ or } t)\sqrt{\frac{s^2}{n}\cdot\frac{N-n}{N}}$
• Use sample variance as it is rare that we know the population $\sigma$
• Random or Systematic Sample – Population Total
• Best estimate of population total ($t$) is the sample total (T) which
is $T = N\bar{X}$
• Once we know T (which is not a t-score) we plug it into the
equation.
• $T \pm Z\sqrt{N^2\cdot\frac{s^2}{n}\cdot\frac{N-n}{N}}$
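The population total estimate and its interval might be computed as in the sketch below. The sample values, the population size of 40 and the function name are assumptions for illustration. The z-score is used only to mirror the equation above; for a sample this small the chapter's own rule would call for t.

```python
import math
import statistics
from statistics import NormalDist

def total_confidence_interval(sample, N, level=0.95):
    """Estimate of the population total T = N * mean, with its confidence interval."""
    n = len(sample)
    T = N * statistics.fmean(sample)
    s2 = statistics.variance(sample)
    se_total = math.sqrt(N**2 * (s2 / n) * ((N - n) / N))   # includes the correction
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return T, T - z * se_total, T + z * se_total

# Hypothetical sample of 8 units drawn from a population of N = 40 units
sample = [12, 15, 11, 14, 13, 16, 12, 15]
T, low, high = total_confidence_interval(sample, N=40)
print(f"estimated total = {T:.0f}, 95% CI: {low:.1f} to {high:.1f}")
```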
Random or Systematic Samples Continue….
• Random or Systematic Sample – Estimate of Population
Proportion
• The best estimate of the population proportion ($p$) is the sample
proportion (P)
• The sample proportion is the number of individuals in the sample
having the specified characteristic (x) divided by the total sample
size (n), which is:
• $P = \frac{x}{n}$
• The confidence interval around this population estimate of the
proportion is:
• $P \pm Z\sqrt{\frac{p(1-p)}{n-1}\cdot\frac{N-n}{N}}$
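A sketch of the proportion interval follows. It assumes the sample proportion is substituted for the unknown population proportion inside the square root, and the counts (60 of 150 sampled from 2000) are invented for illustration.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(x, n, N, level=0.95):
    """Confidence interval for a population proportion.

    x: sample units with the characteristic, n: sample size, N: population size.
    The sample proportion stands in for the unknown population proportion.
    """
    P = x / n
    se = math.sqrt((P * (1 - P) / (n - 1)) * ((N - n) / N))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return P, P - z * se, P + z * se

# Hypothetical survey: 60 of 150 sampled households, population of 2000
P, low, high = proportion_confidence_interval(x=60, n=150, N=2000)
print(f"P = {P:.2f}, 95% CI: {low:.3f} to {high:.3f}")
```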
Stratified Samples
• A stratified sample is a little more complicated…
• Stratified Sample – Estimate of Population Mean
• You will be using different groups called strata.
• These will be denoted by 1, 2, 3, etc.
• Thus you will have $N_1, N_2, N_3$ and $\bar{x}_1, \bar{x}_2, \bar{x}_3$, etc. for the
stratum sizes and stratum sample means.
• The best estimate of the population mean is the stratified sample
mean.
• $\mu = \bar{x} = \frac{1}{N}\sum_{i=1}^{m} N_i\bar{x}_i$
• m in this equation represents the number of strata
• Subscript i identifies the stratum that each quantity belongs to.
• $N$ is the total population across all strata and $N_i$ is the population
of stratum i
• The confidence interval around the mean is
• $\bar{X} \pm Z\sqrt{\frac{1}{N^2}\sum_{i=1}^{m} N_i^2\cdot\frac{s_i^2}{n_i}\cdot\frac{N_i-n_i}{N_i}}$
• Note the finite population correction which may or may not be needed.
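To make the stratified calculation concrete, here is a sketch that weights each stratum mean by its population size and sums the per-stratum variance terms. The `Stratum` record and the two example strata are hypothetical.

```python
import math
from statistics import NormalDist
from typing import NamedTuple

class Stratum(NamedTuple):
    N: int        # population size of the stratum
    n: int        # sample size taken from the stratum
    mean: float   # sample mean within the stratum
    s2: float     # sample variance within the stratum

def stratified_mean_ci(strata, level=0.95):
    """Stratified estimate of the population mean and its confidence interval."""
    N = sum(st.N for st in strata)                      # total population
    mean = sum(st.N * st.mean for st in strata) / N     # weighted by stratum size
    var = sum(
        st.N**2 * (st.s2 / st.n) * ((st.N - st.n) / st.N) for st in strata
    ) / N**2
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * math.sqrt(var)
    return mean, mean - margin, mean + margin

# Two hypothetical strata
strata = [
    Stratum(N=600, n=30, mean=20.0, s2=16.0),
    Stratum(N=400, n=20, mean=35.0, s2=25.0),
]
mean, low, high = stratified_mean_ci(strata)
print(f"stratified mean = {mean:.1f}, 95% CI: {low:.2f} to {high:.2f}")
```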
Stratified Samples
• Stratified Sample – Estimate of Population Total
• Best estimate of population total ($t$) is the sample total (T), which
in stratified samples is
• $T = \sum_{i=1}^{m} N_i\bar{x}_i$
• We sum the strata
• Once we know T (which, again, is not to be confused with a
t-score) we plug it into the equation to obtain the confidence
interval
• $T \pm Z\sqrt{\sum_{i=1}^{m} N_i^2\cdot\frac{s_i^2}{n_i}\cdot\frac{N_i-n_i}{N_i}}$
• Note that the equation has the finite population correction which may not
be needed.
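As a quick consistency check (a sketch, writing $N = \sum_i N_i$ for the total population), the stratified total and its standard error are simply N times the stratified mean and the standard error of that mean:

```latex
\begin{aligned}
T &= \sum_{i=1}^{m} N_i \bar{x}_i = N\bar{x}, \\
\sqrt{\sum_{i=1}^{m} N_i^2 \cdot \frac{s_i^2}{n_i} \cdot \frac{N_i - n_i}{N_i}}
  &= N \sqrt{\frac{1}{N^2} \sum_{i=1}^{m} N_i^2 \cdot \frac{s_i^2}{n_i} \cdot \frac{N_i - n_i}{N_i}}
\end{aligned}
```

So an interval for the total can be obtained by multiplying both limits of the interval for the mean by N.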
Stratified Samples
• Stratified Sample – Estimate of Population Proportion
• The best estimate of the population proportion ($p$) is the sample
proportion (P), which in stratified samples is:
• $P = \frac{1}{N}\sum_{i=1}^{m} N_i P_i$
• Once again summing the strata
• Then the confidence interval can be obtained
• $P \pm Z\sqrt{\frac{1}{N^2}\sum_{i=1}^{m} N_i^2\cdot\frac{s_i^2}{n_i}\cdot\frac{N_i-n_i}{N_i}}$
• Note that the equation has the finite population correction which
may not be needed.
Sample Size Selection
• Sample Size Selection – Using the Mean
• For practicality, sometimes we would prefer to predetermine our confidence
interval and then calculate the sample size needed.
• Recall that the confidence interval of the mean is $\bar{x} \pm Z\sigma_{\bar{x}}$
• Let us designate E as the error that we are willing to tolerate.
• $E = Z\sigma_{\bar{x}} = Z\sqrt{\frac{\sigma^2}{n}}$
• We then decide what error we can have around the population mean.
• For example .10, .05, .01, etc.
• Algebraically we can then obtain
• $n = \left(\frac{Zs}{E}\right)^2$
• Since in most instances we will not know $\sigma$, we substitute the sample
standard deviation $s$. But how do we find this!?
• $s$ can be found by taking a preliminary sample of more than 30 observations and
calculating its standard deviation; we can then continue the random sampling until
the resulting n is reached.
• When a pre-sample is followed by a continued sample in this way, it is called a
two-stage sampling design.
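The sample size formula for the mean can be sketched as below. The values s = 12 and E = 2 are assumed, as if s had come from a preliminary sample; the result is rounded up because a sample size must be a whole number.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(s, E, level=0.95):
    """Minimum n so that the margin of error of the mean is at most E.

    s: standard deviation from a preliminary sample, E: tolerable error
    (in the same units as the variable being measured).
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil((z * s / E) ** 2)   # round up to a whole observation

# Assumed values: s = 12 from a pre-sample, tolerable error E = 2
print(sample_size_for_mean(s=12, E=2))
# 139
```

Halving the tolerable error roughly quadruples the required sample size, since E enters the formula squared.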
Sample Size Selection
• Sample Size Selection – Total
• The minimum sample needed to make an interval estimate of a
population total within a tolerance level E can also be
determined.
• $E = Z\sigma_t = Z\sqrt{N^2\left(\frac{\sigma^2}{n}\right)}$
• We can then isolate n through algebra
• $n = \left(\frac{NZ\sigma}{E}\right)^2$ or $n = \left(\frac{NZs}{E}\right)^2$
• Recall that s is used when we do not know population $\sigma$
• It is best to run a pretest or small sample to obtain $s^2$
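The same idea applies to a population total, where the tolerable error E is expressed in units of the total. The numbers below (N = 500, s = 12, E = 2000) are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_total(N, s, E, level=0.95):
    """Minimum n so that the margin of error of the estimated total is at most E."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil((N * z * s / E) ** 2)

# Assumed values: 500 units in the population, s = 12 from a pretest,
# and a tolerable error of 2000 on the total
print(sample_size_for_total(N=500, s=12, E=2000))
# 35
```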
Sample Size Selection
• Sample Size Selection – Proportion
• To estimate a population proportion within a certain allowable level
of Error (E), the minimum sample size can also be calculated in
advance of full sampling.
• $E = Z\sigma_p = Z\sqrt{\frac{p(1-p)}{n-1}}$
• n is isolated algebraically (treating $n - 1$ as approximately $n$) so that:
• $n = \frac{Z^2 p(1-p)}{E^2}$
• The population proportion (p) is used, or the sample proportion if p is unknown.
These symbols look very similar.
• Working with a proportion allows us to estimate the sample size without first
taking a pretest or preliminary sample.
• This is related to p(1 – p) and the range of values that it can take.
• The largest value of p(1 – p) is .25, as the values peak at p = .5
• Thus, we can use the value of p(1 – p) = .25 as a worst-case scenario and use it
with any data.
• We are, however, still able to do a pretest if needed; it may yield a value of
p(1 – p) below .25 and therefore a smaller required sample size.
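The worst-case reasoning can be illustrated with a short sketch. The default of p = .5 reflects the worst case described above; the alternative value p = .2 is a hypothetical pretest result.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(E, p=0.5, level=0.95):
    """Minimum n to estimate a proportion within error E.

    p defaults to 0.5, the worst case, where p(1 - p) reaches its maximum of .25.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil(z**2 * p * (1 - p) / E**2)

print(sample_size_for_proportion(E=0.05))          # worst case, no pretest
# 385
print(sample_size_for_proportion(E=0.05, p=0.2))   # hypothetical pretest gave p = .2
# 246
```

Any value of p other than .5 gives a smaller p(1 – p), so the worst-case sample size is always sufficient.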
Chapter VII Ending