Data Distributions

Making inferences from collected data
involves two possible tasks:
1) Estimation:
• Use sample data to infer a population parameter
  e.g., lifetime risk of being a victim of a violent crime according to NCVS data
2) Hypothesis Testing:
• Use data to make a decision about the correctness of some hypothesis or prediction
  e.g., whether civil orders of protection really lower recurrent violence against spouses
Both tasks rely on using Samples to
make statements about populations:
1) Sample:
• A limited number of cases selected to represent the larger population of data points
2) Key Terms/Ideas in Sampling:
• Representativeness: degree to which sample is an exact replica in miniature of the population
• Sampling Error: degree to which sample statistic deviates from population value
• Sampling Method: procedure used to draw cases from the population of data points
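As a rough illustration of sampling error, the sketch below draws repeated samples from a made-up population and measures how far each sample mean lands from the population mean. The population (mean 50, σ 10), the sample sizes, and the helper name `sampling_error` are all assumptions chosen for the example, not values from these notes.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population of 10,000 data points (assumed mean 50, sd 10).
population = [rng.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.fmean(population)

def sampling_error(n):
    """Sample mean minus population mean for one simple random sample of size n."""
    sample = rng.sample(population, n)
    return statistics.fmean(sample) - population_mean

# Average absolute sampling error over 500 repeated samples of each size.
errors_small = [abs(sampling_error(20)) for _ in range(500)]
errors_large = [abs(sampling_error(500)) for _ in range(500)]

print("n=20 :", round(statistics.fmean(errors_small), 3))
print("n=500:", round(statistics.fmean(errors_large), 3))
```

The larger samples should show a noticeably smaller average error, which is the pattern the sampling distribution formalizes later in these notes.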
Two main types of sampling methods:
1) Probability Sampling
• Selection where each data point has a known probability of being selected into the sample
• Simple Random sample: every data point has an equal likelihood of being selected
• Other types of probability samples?
─ Systematic
─ Stratified
─ Weighted
─ Cluster
• Doesn’t guarantee representativeness each time
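Three of the probability designs listed above can be sketched in a few lines. This is only a minimal illustration under assumed inputs: the population of case IDs, the "urban"/"rural" strata, and the function names are hypothetical, and the stratified version shown is the proportionate variant (same sampling fraction in every stratum).

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n cases without replacement; every case is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Pick a random start, then take every k-th case (k = N // n)."""
    rng = random.Random(seed)
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction at random from each stratum (proportionate)."""
    rng = random.Random(seed)
    sample = []
    for cases in strata.values():
        size = max(1, round(len(cases) * fraction))
        sample.extend(rng.sample(cases, size))
    return sample

population = list(range(1, 1001))  # hypothetical case IDs 1..1000
srs = simple_random_sample(population, 50, seed=1)
sys_sample = systematic_sample(population, 50, seed=1)
strata = {"urban": population[:700], "rural": population[700:]}  # assumed strata
strat = stratified_sample(strata, 0.05, seed=1)
```

In each case the selection probability of every data point is known in advance, which is what separates these designs from the non-probability methods.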
Two main types of sampling methods:
2) Non-probability Sampling:
• Selection procedure in which probability of
selection is unknown
• Specific types of Non-probability samples?
─ Accidental
─ Convenience
─ Purposive
─ Snowball
─ Volunteer
• No guarantee of representativeness
Why use one sampling method versus
another?
1) Maximize representativeness of data
2) Minimize sampling error and bias in
data
3) Valid use of inferential statistics with
data (most of which assume simple
random sampling)
Making inferences from sample
statistics involves 3 distributions:
1) Sample distribution: observed in cases
from which data were collected
2) Population distribution: unobserved in
population from which cases drawn
3) Sampling distribution: unobserved but
calculable distribution of statistics for
samples of same size/type as ours
(drawn from the same population)
   The sampling distribution is the key to making inferences
“Sampling Distribution”: what is it?
1) A hypothetical population of samples
(and sample statistics) drawn from
the same population
a) Has a describable theoretical distribution
(based on repeatedly drawing a sample an
infinite number of times)
b) Has certain parameters determined by the
population from which the sample is drawn
and the size of the sample (denoted as n)
e.g.: If we draw a sample of 25 cases and
compute the sample mean
a) The sample mean has a theoretical sampling
distribution whose characteristics are exactly
determined by the distribution of the
population (μ & σ) and by the sample size
(n=25)
b) The mean of the sampling distribution = the
mean of the population
c) In this case: the σ of the sampling
distribution = σ/5 (i.e., one-fifth the σ of the
population)
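The n = 25 example can be checked by simulation. The sketch below assumes a normal population with μ = 100 and σ = 15 (values invented for illustration) and approximates the "infinite number of times" with 20,000 repeated samples.

```python
import random
import statistics

rng = random.Random(42)
mu, sigma, n = 100.0, 15.0, 25   # hypothetical population parameters
draws = 20_000                    # repeated samples standing in for "infinite"

# Each entry is the mean of one sample of n = 25 cases.
sample_means = [
    statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(draws)
]

mean_of_means = statistics.fmean(sample_means)   # should sit close to mu
sd_of_means = statistics.pstdev(sample_means)    # should sit close to sigma / 5
theoretical_se = sigma / n ** 0.5                # sigma / sqrt(25)

print(round(mean_of_means, 2), round(sd_of_means, 2), theoretical_se)
```

The simulated standard deviation of the sample means should land near σ/5 = 3, and their average near μ, matching points (b) and (c).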
Important features of Sampling
distributions:
a) If the variable is normally distributed in the
population, then the sampling distribution of
sample means will also be normal
b) The mean of the sampling distribution = the
mean of the population
c) The σ of the sampling distribution = σ/√n
d) Use this information to compute the likelihood
of any sample mean being drawn from the
population (using the standard normal [z]
table)
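Point (d) amounts to converting a sample mean to a z score with the standard error and reading the tail area. A minimal sketch, using Python's `statistics.NormalDist` in place of a printed z table; the numbers (μ = 100, σ = 15, n = 25, observed mean 106) and the function name are assumptions for illustration.

```python
from statistics import NormalDist

def prob_mean_at_least(x_bar, mu, sigma, n):
    """Return (z, P(sample mean >= x_bar)) under a normal sampling distribution."""
    se = sigma / n ** 0.5          # sigma of the sampling distribution
    z = (x_bar - mu) / se          # standardize the observed sample mean
    return z, 1.0 - NormalDist().cdf(z)

z, p = prob_mean_at_least(x_bar=106.0, mu=100.0, sigma=15.0, n=25)
print(z, round(p, 4))
```

Here the standard error is 15/5 = 3, so the sample mean of 106 sits at z = 2.0, with an upper-tail probability of roughly 0.023.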
Additional Important features of Sampling
distributions:
a) The σ of the sampling distribution will always
be smaller than the σ of the population (for any n > 1)
b) The mean of the sampling distribution will
always be the population mean
c) The sampling distribution will become more
Normal as the sample size gets larger, no
matter the distribution of the population
(provided its variance is finite)!
[this is called the Central Limit Theorem]
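One way to see the Central Limit Theorem at work is to start from a clearly non-normal population and watch the skewness of the sample means shrink as n grows. The sketch below assumes an exponential population (rate 1), which is strongly right-skewed; the sample sizes, the number of draws, and the helper names are illustrative choices.

```python
import random
import statistics

def skewness(values):
    """Population skewness: mean of cubed standardized deviations."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

def sample_mean_skewness(n, draws=5000, seed=0):
    """Skewness of the means of `draws` samples of size n from an exponential population."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(draws)
    ]
    return skewness(means)

# A normal distribution has skewness 0; values should drift toward 0 as n grows.
skew_by_n = {n: sample_mean_skewness(n) for n in (1, 5, 30, 100)}
for n, s in skew_by_n.items():
    print(n, round(s, 2))
```

Skewness is only one symptom of non-normality, so this is a partial check rather than a proof, but it shows the direction of the effect.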
Using Sample statistics to make inferences
about population parameters:
a) The best estimate of the population mean is
the sample mean
b) The sample estimate of σ is slightly too low; it
needs to be adjusted to be an accurate estimate
c) Thus there are two different formulas for the
sample variance/standard deviation:
(descriptive)  $\frac{\sum_i (x_i - \bar{x})^2}{N}$
(inferential)  $\frac{\sum_i (x_i - \bar{x})^2}{N - 1}$
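The two formulas differ only in the divisor. A small sketch with a made-up data set computes both by hand and compares them with the standard library's `statistics.pvariance` (divide by N) and `statistics.variance` (divide by N − 1); the data values are hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations

descriptive_var = ss / n          # describes these N cases only
inferential_var = ss / (n - 1)    # adjusted estimate of the population variance

print(descriptive_var, statistics.pvariance(data))
print(inferential_var, statistics.variance(data))
```

The inferential value is always the larger of the two, which is the upward adjustment described in point (b); taking the square root of either gives the corresponding standard deviation.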
Basic Steps in Estimating Population
Parameters:
1) Select valid estimator (unbiased, consistent,
and efficient)
2) Select valid data sample
• Corresponds to population of interest
• Random sample
• Complete (no censoring or omissions)
• Variables measured with least possible error
3) Compute value of statistical estimate
4) Compute confidence interval (i.e., plausible
margin of sampling error)
Two Approaches to estimation:
1) Point Estimation:
• Use sample data to infer exact value of population parameter
• Highly likely to be wrong or off-mark to some degree
• e.g., infer that 30% of adults will be victims of violent crime in their lifetimes (could actually be 35% or 25%)
2) Interval Estimation:
• Instead use sample data to compute a range of values (“confidence intervals”) within which the actual parameter is located (with some calculated margin of certainty or confidence)
• Yields more approximate but more plausible (or confident) estimates.
Confidence Interval Estimation:
1) Compute the sample mean
2) Compute the sample standard error
─ From the population (σ)
─ From the sample (s or $\hat{\sigma}$)
4) Compute the confidence interval
$\bar{X} \pm z_{ce}\,\sigma_{\bar{X}}$
or
$\bar{X} \pm t\,\hat{\sigma}_{\bar{X}}$
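The steps above can be sketched for the z version of the interval. This is a minimal example under stated assumptions: the sample values, the known σ = 15, and the function name `z_confidence_interval` are invented for illustration. Python's standard library has no t quantile function, so the t version is not shown; with a small sample and an estimated standard error, the z critical value would be replaced by a t critical value with n − 1 degrees of freedom (available, for instance, as `scipy.stats.t.ppf`).

```python
from statistics import NormalDist, fmean, stdev

def z_confidence_interval(sample, confidence=0.95, sigma=None):
    """Return (low, high) for the population mean using a z critical value.

    If sigma is given, the standard error comes from the population;
    otherwise it is estimated from the sample (N - 1 divisor).
    """
    n = len(sample)
    x_bar = fmean(sample)                              # step 1: sample mean
    sd = sigma if sigma is not None else stdev(sample)
    se = sd / n ** 0.5                                 # step 2: standard error
    z = NormalDist().inv_cdf(0.5 + confidence / 2)     # two-sided critical value
    return x_bar - z * se, x_bar + z * se              # final step: interval

sample = list(range(88, 113))   # hypothetical 25 cases with mean 100
low, high = z_confidence_interval(sample, confidence=0.95, sigma=15.0)
print(round(low, 2), round(high, 2))
```

With σ = 15 and n = 25 the standard error is 3, so the 95% interval is about 100 ± 5.88.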