Sampling and Sampling Distributions
The Aim
By the end of this lecture, students
will understand sampling and sampling
distributions.
The Goals
• Explain why we sample.
• List the factors affecting sample size.
• Explain the types of sampling.
• Write the SEM and SEP formulas.
• Explain where SD and SEM are used.
• In statistical analysis, we generally want to collect information about an entire population and draw conclusions about it.
• However, obtaining data from the entire population is often not possible in terms of both time and cost.
• Therefore we collect data from a sample that represents the population, and use these data to make inferences about the population.
• When we take a sample from a population, we can expect that the sample will not represent the population entirely.
• By examining only a part of the population we make a sampling error.
• In this lecture we will learn how to calculate this error using theoretical distributions.
Factors affecting the sample size
• Data type
– Categorical: percentage or proportion
– Numerical: mean
• Spread (variability)
• Significance level (α)
• Power of the test (1 − β)
• Effect size (Δ)
– The smallest change we want to detect correctly at the end of the hypothesis test; in other words, the difference between the values specified in the null hypothesis and the alternative hypothesis.
• Population size (N)
Sample size to estimate a population proportion
When the population size N is unknown:
n = z² p(1 − p) / d²
When the population size N is known:
n = N z² p(1 − p) / [ (N − 1) d² + z² p(1 − p) ]
n: number of individuals to be sampled
p: incidence of the event under study
z: standard Normal table value for the chosen error level (1.96 for an error level of 0.05)
d: the desired ± deviation from the event incidence
Example:
Suppose a previous study found a malnutrition rate of p = 0.15.
A researcher wants to estimate this value to within d = ±0.05 (that is, between 0.10 and 0.20), with an error level of 0.05, in other words with 95% confidence.
How many people should be included in this study?
n = (0.15 × 0.85) × 1.96² / 0.05² ≈ 196
Result:
To estimate an event whose rate in the population is 0.15 to within the limits 0.10 to 0.20 with 95% confidence, at least 196 individuals must be studied.
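The proportion sample-size calculation can be reproduced in a few lines of Python. This is a minimal sketch: the function name `sample_size_proportion`, the rounding up to a whole person, and the population size N = 1000 in the second call are assumptions made for illustration, not part of the lecture.

```python
import math

def sample_size_proportion(p, d, z=1.96, N=None):
    """Sample size needed to estimate a population proportion.

    p: expected incidence, d: desired +/- deviation,
    z: Normal table value, N: population size (None if unknown).
    """
    if N is None:
        n = z**2 * p * (1 - p) / d**2
    else:
        n = N * z**2 * p * (1 - p) / ((N - 1) * d**2 + z**2 * p * (1 - p))
    # Round up, since a fraction of a person cannot be sampled
    # (an assumption of this sketch).
    return math.ceil(n)

# Lecture example: p = 0.15, d = 0.05, N unknown.
print(sample_size_proportion(p=0.15, d=0.05))
# Same inputs with a hypothetical known population of N = 1000.
print(sample_size_proportion(p=0.15, d=0.05, N=1000))
```

With a known, finite population the required sample is smaller than in the N-unknown case, because the second formula corrects for the share of the population already covered.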
Sample size to estimate a population mean
When the population size N is unknown:
n = z² σ² / d²
When the population size N is known:
n = N z² σ² / [ d² (N − 1) + z² σ² ]
σ: population standard deviation
d: the desired ± deviation from the mean
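The two formulas for the mean, n = z²σ²/d² (N unknown) and n = Nz²σ²/[d²(N − 1) + z²σ²] (N known), can be sketched the same way. The function name `sample_size_mean` and the input values (σ = 25, d = 5, N = 500) are hypothetical and chosen only to show the arithmetic.

```python
import math

def sample_size_mean(sigma, d, z=1.96, N=None):
    """Sample size needed to estimate a population mean.

    sigma: population standard deviation, d: desired +/- deviation,
    z: Normal table value, N: population size (None if unknown).
    """
    if N is None:
        n = z**2 * sigma**2 / d**2
    else:
        n = N * z**2 * sigma**2 / (d**2 * (N - 1) + z**2 * sigma**2)
    return math.ceil(n)  # round up to a whole individual (assumption)

# Hypothetical inputs: sigma = 25, d = 5.
print(sample_size_mean(sigma=25, d=5))          # N unknown
print(sample_size_mean(sigma=25, d=5, N=500))   # hypothetical N = 500
```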
Appropriate sampling method
Randomness in sampling
• Every sampling subject must be given an equal chance of being selected.
• If the chances are not equal, the results will be biased, because the sampling errors will not be random.
• To achieve randomness, the conditions of randomness must be met.
Sampling Methods
• Probability sampling
– Simple random sampling
– Stratified sampling
– Cluster sampling
• Non-probability sampling
– Quota sampling
– Snowball sampling
Probability sampling methods
• In probability sampling methods, every sampling unit must be given an equal chance of being selected.
• Giving the sampling units an equal chance preserves the variability of the population in the sample, which increases the ability of the sample to represent the population.
• To give each unit an equal chance of selection, units are selected at random from the population.
• To ensure randomness, a table of random numbers or random-number-generating software is used.
Simple random sampling
• A simple random sample is a subset of individuals (a sample) chosen from a larger set (a population). Each individual is chosen randomly and entirely by chance, such that each individual has the same probability of being chosen at any stage during the sampling process.
• In this method, after the appropriate sample size has been determined, the sample units are selected by simple random sampling. Population parameters are then estimated by calculating sample statistics.
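Simple random sampling with software can be sketched using Python's standard `random` module. The sampling frame of 500 ID numbers, the sample size of 20 and the seed are hypothetical values chosen for illustration.

```python
import random

# Hypothetical sampling frame: individuals numbered 1..500.
population_ids = list(range(1, 501))

rng = random.Random(42)  # fixed seed so the draw can be repeated
# Draw 20 individuals without replacement; every subset of size 20
# is equally likely to be chosen.
sample_ids = rng.sample(population_ids, k=20)

print(sorted(sample_ids))
```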
Stratified sampling
• Stratification is the process of dividing members of the population into homogeneous subgroups before sampling. The strata should be mutually exclusive: every element in the population must be assigned to only one stratum. The strata should also be collectively exhaustive: no population element can be excluded. Then simple random sampling or systematic sampling is applied within each stratum.
• To get the best results from stratified sampling:
– Strata must be homogeneous within themselves
– Strata must be heterogeneous between themselves
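A possible sketch of stratified sampling in Python follows. It assumes proportional allocation (the same sampling fraction in every stratum), which the lecture does not specify; the helper `stratified_sample` and the population of 60 women and 40 men are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=0):
    """Simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        # Each unit is assigned to exactly one stratum (mutually exclusive),
        # and no unit is left out (collectively exhaustive).
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: (id, sex) pairs, 60 women and 40 men.
people = [(i, "F") for i in range(60)] + [(i, "M") for i in range(60, 100)]
chosen = stratified_sample(people, stratum_of=lambda person: person[1], fraction=0.1)
print(chosen)
```

Because each stratum is sampled separately, the sample keeps the same 60/40 split as the population.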
Cluster sampling
• This method is used when the subjects in the population cannot be listed, so individual subjects cannot be reached directly.
• Cluster sampling is a sampling technique used when "natural" but relatively heterogeneous groupings are evident in a statistical population.
• In this technique, the total population is divided into these groups (or clusters) and a simple random sample of the groups is selected. The elements in each cluster are then sampled.
• In this method, the sample is formed by selecting clusters instead of selecting individual subjects.
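Cluster sampling can be sketched as follows. The example assumes a one-stage design in which every element of a selected cluster is included; the villages and household IDs are hypothetical.

```python
import random

# Hypothetical clusters: 10 villages, each with 30 household IDs.
villages = {
    f"village_{v}": [f"v{v}_h{h}" for h in range(30)]
    for v in range(10)
}

rng = random.Random(1)
# Simple random sample of clusters, not of individual households.
selected = rng.sample(sorted(villages), k=3)

# One-stage design (assumed here): take every household in each selected village.
households = [h for name in selected for h in villages[name]]

print(selected, len(households))
```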
Sample variations
• If we were to take repeated samples of the same size from a
population, it is unlikely that the estimates of the population
parameter would be exactly the same in each sample.
• However, our estimates should all be close to the true value of the
parameter in the population, and the estimates themselves should
be similar to each other.
• By quantifying the variability of these estimates, we obtain
information on the precision of our estimate and can thereby
assess the sampling error.
• In reality, we usually only take one sample from the population.
However, we still make use of our knowledge of the
theoretical distribution of sample estimates to draw inferences
about the population parameter.
Sampling distribution of the mean
• We want to estimate the population mean.
• Suppose we are interested in estimating the population mean; we could take many repeated samples of size n from the population, and estimate the mean in each sample.
• A histogram of the estimates of these means would show their distribution.
• This is the sampling distribution of the mean.
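The idea of repeated sampling can be tried out by simulation. The sketch below is an illustration under assumed values: a hypothetical skewed population of 10,000 exponential values, samples of size 50, and 2,000 repetitions. It collects the sample means whose histogram would form the sampling distribution of the mean.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical skewed population of 10,000 values (exponential, mean about 10).
population = [rng.expovariate(1 / 10) for _ in range(10_000)]

def sample_means(population, n, repeats, rng):
    """Mean of each of `repeats` simple random samples of size n."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]

means = sample_means(population, n=50, repeats=2000, rng=rng)

print("population mean:     ", round(statistics.mean(population), 2))
print("mean of sample means:", round(statistics.mean(means), 2))
print("SD of sample means:  ", round(statistics.stdev(means), 2))
```

The sample means cluster around the population mean, and their spread is much smaller than the spread of the individual values.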
Figure: Changes in the distribution of the number of various samples
from the same population.
• If the sample size is reasonably large, the estimates of the mean
follow a Normal distribution, whatever the distribution of the
original data in the population (Central Limit Theorem).
• If the sample size is small, the estimates of the mean follow a
Normal distribution provided the data in the population follow a
Normal distribution.
• The mean of the estimates is an unbiased estimate of the true
mean in the population, i.e. the mean of the estimates equals
the true population mean.
• The variability of the distribution is measured by the standard
deviation of the estimates; this is known as the standard
error of the mean (SEM). If we know the population standard
deviation (σ ), then the standard error of the mean is given by
SEM = σ / √n
• When we only have one sample, as is customary, our best
estimate of the population mean is the sample mean, and
because we rarely know the standard deviation in the
population, we estimate the standard error of the mean by
SEM = s / √n
• Where s is the standard deviation of the observations in
the sample.
• The SEM provides a measure of the precision of our
estimate.
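As a small illustration of SEM = s / √n, the following sketch computes the sample mean, the sample standard deviation and the SEM for a single sample. The ten measurements are made up for the example.

```python
import math
import statistics

# Hypothetical sample of 10 measurements (made up for illustration).
sample = [82, 91, 77, 88, 95, 79, 84, 90, 86, 78]

mean = statistics.mean(sample)
s = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)
sem = s / math.sqrt(len(sample))    # SEM = s / sqrt(n)

print(f"mean = {mean:.1f}, s = {s:.2f}, SEM = {sem:.2f}")
```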
Interpreting standard errors
• A large standard error indicates that the estimate
is imprecise.
• A small standard error indicates that the estimate
is precise.
• The standard error is reduced, i.e. we obtain a
more precise estimate, if:
-the size of the sample is increased.
-the data are less variable.
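Both effects follow directly from SEM = s / √n and can be tabulated. The standard deviations and sample sizes below are hypothetical values chosen so that the pattern is easy to see.

```python
import math

def sem(s, n):
    """Standard error of the mean from a standard deviation s and sample size n."""
    return s / math.sqrt(n)

for s in (10, 20):              # hypothetical standard deviations
    for n in (25, 100, 400):    # hypothetical sample sizes
        print(f"s = {s:>2}, n = {n:>3}: SEM = {sem(s, n):.2f}")
```

Quadrupling the sample size halves the SEM, while halving the standard deviation halves it directly.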
Standard deviation or standard error?
• Although these two parameters seem to be similar, they are used for different purposes.
• The standard deviation describes the variation in the data values and should be quoted if you wish to illustrate variability in the data.
• In contrast, the standard error describes the precision of the sample mean, and should be quoted if you are interested in the mean of a set of data values.
Sampling distribution of proportion
• We may be interested in the proportion of individuals in a population who possess some characteristic. Having taken a sample of size n from the population, our best estimate, p, of the population proportion, π, is given by:
p = r / n
π: population proportion
p: sample proportion (the estimate of π)
n: sample size
r: number of individuals in the sample with the characteristic
• If we were to take repeated samples of size n from our population and plot the estimates of the proportion as a histogram, the resulting sampling distribution of the proportion would approximate a Normal distribution with mean value π. The standard deviation of this distribution of estimated proportions is the standard error of the proportion (SEP).
• When we take only a single sample, it is estimated by:
SEP = √( p(1 − p) / n )
• This provides a measure of the precision of our estimate of π; a small standard error indicates a precise estimate.
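A short sketch of the single-sample calculation, using the usual estimate SEP = √(p(1 − p)/n) with p = r/n. The function name `sep` and the counts (30 of 200 individuals with the characteristic) are hypothetical.

```python
import math

def sep(r, n):
    """Return the sample proportion p = r/n and its standard error."""
    p = r / n
    return p, math.sqrt(p * (1 - p) / n)

# Hypothetical sample: 30 of 200 individuals have the characteristic.
p, se = sep(r=30, n=200)
print(f"p = {p:.2f}, SEP = {se:.4f}")
```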
Examples
• In a study, biochemical analysis of blood samples taken from 250 people found a mean fasting blood glucose of 85.7 mg/dl with a standard deviation of 25.4 mg/dl. In the same study, diabetes was detected in 15% of the participants. Of those surveyed, 20% rated their knowledge of diabetes as "good", while 15% stated that they had "no knowledge at all".
1. Discuss the data types mentioned in the paragraph.
2. Calculate and interpret the SEM of the fasting blood glucose.
3. Calculate and interpret the SEP of those with diabetes.
4. Should we report the SD or the SEM together with the mean blood glucose? Why?
• Answers
Data types
1. Mean fasting blood glucose = numerical
2. Number of people with diabetes = nominal
3. Respondents' knowledge about diabetes = ordinal
• Since only the mean fasting blood glucose is reported and no comparison between groups is made in the sample, the SEM should be reported in this example.
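The answers above do not show the calculations for questions 2 and 3. A possible working, using only the figures given in the exercise (n = 250, standard deviation 25.4 mg/dl, diabetes proportion 0.15) and the formulas SEM = s / √n and SEP = √(p(1 − p)/n), is sketched below; the variable names are chosen for illustration.

```python
import math

n = 250     # number of people in the study
sd = 25.4   # standard deviation of fasting blood glucose, mg/dl
p = 0.15    # proportion of participants with diabetes

sem = sd / math.sqrt(n)             # standard error of the mean
sep = math.sqrt(p * (1 - p) / n)    # standard error of the proportion

print(f"SEM = {sem:.2f} mg/dl")
print(f"SEP = {sep:.4f}")
```

Both standard errors are small relative to the estimates themselves (85.7 mg/dl and 0.15), which suggests that the mean and the proportion are estimated fairly precisely from a sample of this size.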