SW 981
Sampling and Randomness
Random selection (sampling) vs. random assignment (to groups)
Random assignment serves equalization (making the groups comparable); random selection serves representativeness and the statistical function (inference from sample to population).
Larger samples are more representative in the sense of yielding a more precise (point) estimate.
Kinds of Samples:
Probability Samples - use some form of random sampling in one or more of their stages (each scheme is sketched in code below):
Random Sampling - each member of the population has an equal chance of being selected.
Stratified Sampling (Blocking) - divide the population into strata, then sample randomly within each stratum.
Cluster Sampling - successive random sampling of units, or sets and subsets.
Systematic Sampling (Interval Sampling) - every kth unit is selected.
Nonprobability Samples - all fail to use a random process, making statistical inference improper.
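The following sketch illustrates the four probability sampling schemes in Python. The population, the strata, and the clusters are hypothetical, chosen only so the code is self-contained:

```python
# Hedged sketches of the four probability sampling schemes. The population,
# strata, and clusters are hypothetical, picked only so the code runs.
import random

population = list(range(1000))   # hypothetical population of unit IDs
n = 50                           # desired sample size

# Random sampling: every unit has an equal chance of being selected.
simple = random.sample(population, n)

# Stratified sampling (blocking): divide the population into strata,
# then sample randomly within each stratum.
strata = {"A": population[:400], "B": population[400:]}
stratified = [u for units in strata.values()
              for u in random.sample(units, n // len(strata))]

# Cluster sampling: randomly select whole clusters (sets), then units
# within them -- here we simply keep every unit in each chosen cluster.
clusters = [population[i:i + 100] for i in range(0, len(population), 100)]
cluster_sample = [u for c in random.sample(clusters, 2) for u in c]

# Systematic (interval) sampling: every kth unit after a random start.
k = len(population) // n
systematic = population[random.randrange(k)::k]
```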
Types of Distributions
1. Sample distribution - the frequency distribution summarizing a given set of data, based on a randomly selected subset of a population.
2. Population distribution - the theoretical distribution describing the relative frequency associated with each of the values of a numerical variable, into which an entire set of possible observations may be mapped.
3. Sampling distribution - the theoretical probability distribution relating the various values of some sample statistic to their probabilities of occurrence over all possible samples of size N, given a specific population distribution and some probability structure underlying the selection of samples.
We use (1) to estimate (2) based on what we know about (3).
The sampling distribution is generally not the same as the distribution of the random variable for the population.
The sampling distribution of the mean approaches a normal distribution as N increases, regardless of the underlying distribution of the random variable for the population (Central Limit Theorem).
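A quick simulation makes the theorem concrete. The exponential population below is a hypothetical choice, picked because it is strongly skewed:

```python
# Central Limit Theorem demonstration: means of samples from a skewed
# (exponential, mean 1.0) population still become approximately normal.
import random
import statistics

def sample_mean(n):
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

for n in (1, 5, 30, 100):
    means = [sample_mean(n) for _ in range(2000)]
    # The means center on the population mean (1.0), and the spread of the
    # sampling distribution shrinks roughly as 1/sqrt(n) as n grows.
    print(n, round(statistics.mean(means), 3), round(statistics.stdev(means), 3))
```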
The theory of sampling distributions permits one to judge the probability that a given value of
some statistic arose by chance from some particular population distribution.
Population (sample space) - parameters (Greek letters)
Sample - statistics
We estimate parameters using statistics.
Desirable Properties of Estimators:
Maximum likelihood estimate: The principle of maximum likelihood says to choose, as our estimate of the population parameter, the value that maximizes the probability of observing the obtained sample.
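A minimal numeric illustration of the principle, assuming hypothetical coin-flip data (7 heads in 10 flips); the likelihood is maximized at the sample proportion:

```python
# Likelihood of the observed sequence of heads and tails, as a function of p.
heads, n = 7, 10

def likelihood(p):
    return p ** heads * (1 - p) ** (n - heads)

# Scan candidate values of p and keep the one with the highest likelihood.
best = max((i / 100 for i in range(1, 100)), key=likelihood)
print(best)  # 0.7 -- the maximum likelihood estimate of p
```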
Unbiased estimate: E(M) = µ. The expected value of the sample mean is equal to the population mean, i.e., the sample mean is an unbiased estimate of the population mean.
Note: E(V) ≠ σ²
E(V) = σ²(N-1)/N
For an unbiased estimate use s² = V N/(N-1).
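A small simulation shows the bias; the normal population with σ² = 4 and samples of size N = 5 are hypothetical choices:

```python
# V (divide by N) comes out near sigma^2 (N-1)/N = 3.2, while the corrected
# s^2 = V N/(N-1) comes out near the true sigma^2 = 4.
import random

random.seed(1)
N, trials = 5, 20000
V_sum = s2_sum = 0.0
for _ in range(trials):
    x = [random.gauss(0, 2) for _ in range(N)]   # sigma = 2, sigma^2 = 4
    m = sum(x) / N
    V = sum((xi - m) ** 2 for xi in x) / N       # biased sample variance
    V_sum += V
    s2_sum += V * N / (N - 1)                    # unbiased estimate s^2
print(V_sum / trials, s2_sum / trials)           # roughly 3.2 and 4.0
```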
Consistent estimate: prob(|G - θ| ≤ ε) → 1, as N → ∞
Relative Efficiency: σ²H ÷ σ²G = efficiency of H relative to G
The more efficient estimator has the smaller sampling variance.
Sufficiency: If G is a sufficient statistic, our estimate of θ cannot be improved by considering any other aspect of the data not already included in G itself.
In inferential statistics, our main interest is in the sampling distribution of the mean. The mean
of the sampling distribution of means is the same as the population mean.
However, the variance of the mean is σ²M = σ²/N.
Think of the two extremes: with N = 1, the sampling distribution of the mean is just the population distribution (σ²M = σ²); with N = the entire population, there is no sampling error and the sample mean equals µ.
Estimation of the Standard Error of the Mean:
²M = ²/N
However, we don't know ².
Instead we use an estimate of ²: s² = (N/N-1) * V, where V = sample variance
Substituting s² for ² yields: ²M = V/(N-1)
M =
V / (N -1) = Standard error
 standard deviation of the sampling distribution of the mean
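A minimal sketch of this estimate on a single hypothetical sample:

```python
# Standard error of the mean via est. sigma^2_M = V/(N-1), which equals s^2/N.
import math

x = [12.1, 9.8, 11.4, 10.9, 13.2, 10.0, 11.7]   # hypothetical data
N = len(x)
m = sum(x) / N
V = sum((xi - m) ** 2 for xi in x) / N          # sample variance (divide by N)
se = math.sqrt(V / (N - 1))                     # standard error of the mean
print(m, se)
```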
Sample Size Calculations
The need to distinguish between statistical and substantive significance is eliminated if one pays attention to the issue in the design of the research.
Power analysis requires that we specify what difference we want to be able to detect (i.e., what is substantively important).
The power of a test of a mean always depends on four things (each is illustrated in the sketch below):
1. The particular alternative hypothesis - the larger the departure of H0 from the true situation, H1, the more powerful the test of H0, other things being equal.
2. The value of alpha chosen by the researcher - smaller alpha, less power.
3. The size of the sample - larger N leads to more power.
4. The variability of the population under study - less variability, more power.
[For graphic demonstration of above see Hays, Figure 7.9.1.]
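As a rough complement to Hays's figure, the sketch below assumes a one-sided, one-sample z test with known σ (a simplification of the t test); d, alpha, and N are hypothetical inputs:

```python
# d is the standardized departure of the true mean from H0: d = (mu1 - mu0)/sigma.
from statistics import NormalDist

def power(d, alpha, N):
    z_crit = NormalDist().inv_cdf(1 - alpha)     # rejection cutoff under H0
    # Under H1 the test statistic is centered at d * sqrt(N).
    return 1 - NormalDist().cdf(z_crit - d * N ** 0.5)

print(power(d=0.5, alpha=0.05, N=25))    # baseline
print(power(d=0.8, alpha=0.05, N=25))    # 1. larger departure from H0 -> more power
print(power(d=0.5, alpha=0.01, N=25))    # 2. smaller alpha -> less power
print(power(d=0.5, alpha=0.05, N=100))   # 3. larger N -> more power
# 4. population variability enters through d, since d = (mu1 - mu0) / sigma.
```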
Need some estimate of variance - this can be a stumbling block, and frequently requires a separate pilot study.
Example: Single Sample t-test
H0: Drinking coffee is safe.
H1: Drinking coffee is detrimental.
Calculation steps (a stand-in sketch follows the list):
1. Calculate δ (Glass's effect size) from the appropriate formula in the Summary Table. This is the effect (mean difference, correlation, etc.) that you care about and wish to detect if it exists.
2. Calculate the critical effect size, again using the formula in the Summary Table.
3. Set alpha and beta based on your willingness to be wrong in either direction. (Sample size constraints may influence the choice of beta.)
4. Obtain ν (nu) from the Master Table.
5. Calculate n (sample size) from the formula back in the Summary Table.
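The Summary Table and Master Table are course handouts and are not reproduced here. As a hedged stand-in, the sketch below uses the common normal-approximation formula n ≈ ((z(1-alpha) + z(1-beta)) / δ)² for a one-sided, one-sample test; δ, alpha, and beta are hypothetical inputs:

```python
# Normal-approximation sample size for detecting a standardized effect delta
# with the chosen alpha (Type I error) and beta (Type II error) rates.
import math
from statistics import NormalDist

def sample_size(delta, alpha, beta):
    z = NormalDist().inv_cdf
    return math.ceil(((z(1 - alpha) + z(1 - beta)) / delta) ** 2)

print(sample_size(delta=0.5, alpha=0.05, beta=0.20))  # 25 under this approximation
```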