Introduction to Inferential Statistics
Introduction


Researchers usually have a population that
is too large to test, so they have to draw a sample
from the population
Researchers collect a random sample from the
population and generalize from the known
characteristics of the sample to the unknown
characteristics of the population
Random Sampling

To use inferential statistical techniques, it is required
that samples be randomly drawn from the population
of interest

Non-random samples can be used for exploratory research
This means you explore a topic to see which
variables are important to it
But the conclusions cannot be generalized to the
population
Random sampling requires a precise process of
selection
Remember that randomness is not
representativeness

The fact that a sample is random does not guarantee that it is
an exact representation of the population
Random Sampling


The probability is high that a randomly
selected sample will be representative, but
some random samples will miss
The good thing about inferential statistics is
that they allow you to state the probability of
drawing an unrepresentative sample very precisely
Ways in Which Random Samples are Gathered
Simple Random Sample




For a simple random sample, each case and each
combination of cases in the population must have an
equal probability of being chosen for the sample
This kind of sample is used when you have a
complete list of all cases in the population
Researchers traditionally use tables of random numbers
(or a random number generator) to select cases
Done by hand, this would be extremely time consuming for a large
sample, so another technique is used
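As a rough illustration of the equal-probability requirement, the sketch below uses Python's standard `random.sample` in place of a table of random numbers. The sampling frame of 1,000 student ID numbers and the function name `simple_random_sample` are assumptions made for this example and are not part of the original slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n cases without replacement.

    random.sample gives every case, and every combination of n cases,
    the same chance of being chosen.
    """
    rng = random.Random(seed)  # seeded only so the example is repeatable
    return rng.sample(population, n)

# Hypothetical sampling frame: a complete list of 1,000 student IDs.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 50, seed=42)
print(len(sample), "cases drawn,", len(set(sample)), "of them distinct")
```

Note that this only works when a complete list of the population is available, which is the same condition the slide states.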
Systematic Sampling



Useful for large populations
Only the first case is randomly selected
After that, every kth case is selected
You choose the first case from the table of random
numbers, then choose every kth case (for example, every 10th)
until you reach a sample of the size you want
Divide your population size by the sample size to find
k, the distance to the next case for your sample
Strictly speaking, this is not a simple random sample,
since not every combination of cases can be chosen
You need to make sure the list of the population is in random
order, with no pattern that repeats every k cases
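The systematic procedure can be sketched in a few lines. This is a minimal version that assumes the population size divides evenly by the sample size; the function name `systematic_sample` and the list of 1,000 cases are hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start, then take every kth case from the list."""
    rng = random.Random(seed)
    k = len(population) // n   # sampling interval: population size / sample size
    start = rng.randrange(k)   # only this first case is chosen at random
    return [population[start + i * k] for i in range(n)]

# Hypothetical population list of 1,000 cases; a sample of 100 gives k = 10.
population = list(range(1000))
sample = systematic_sample(population, 100, seed=5)
print(sample[:5])
```

Because only the starting point is random, the list order matters: if the list has a cycle of length k, every selected case falls at the same point in that cycle.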
Stratified Sample

Proportional Stratified Sample


This is used if you want to guarantee a
representation of certain categories of cases
(the same percentage as in the population)
For example, if you want to compare chemistry majors to
criminology majors:
You first ask students their major
Then you put them all in the same list sorted by
major, then begin your random sample from the list
All majors will be included if your sample is large
enough, and each will be included in the
proportion it has in the original population
If the population has more criminology majors, so will the sample
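The sort-then-sample procedure described above can be sketched as follows. The enrollment counts (600 criminology, 300 chemistry, 100 zoology) are invented for illustration, and `proportional_stratified_sample` is a hypothetical helper that sorts the list by major and then takes every kth case from a random start.

```python
import random
from collections import Counter

def proportional_stratified_sample(cases, n, key, seed=None):
    """Sort cases by stratum, then take every kth case from a random start."""
    rng = random.Random(seed)
    ordered = sorted(cases, key=key)   # all cases in one list, grouped by major
    k = len(ordered) // n              # sampling interval
    start = rng.randrange(k)
    return [ordered[start + i * k] for i in range(n)]

# Hypothetical population of 1,000 students: (major, student number).
students = (
    [("criminology", i) for i in range(600)]
    + [("chemistry", i) for i in range(300)]
    + [("zoology", i) for i in range(100)]
)

sample = proportional_stratified_sample(students, 100, key=lambda s: s[0], seed=1)
counts = Counter(major for major, _ in sample)
print(counts)
```

With these made-up counts the sample contains 60 criminology, 30 chemistry, and 10 zoology majors, the same 60/30/10 percent split as the population. With stratum sizes that are not multiples of k, the counts can be off by one.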
Stratified Sample

Disproportionate Stratified Sampling

Used if you need exactly the same number of students from
each major
You separate the students by major
Then you have to change your sampling fraction to account for
differences in the number of cases
If you need a sample of 50, and there are 100 zoology majors,
you would choose every other one to get exactly 50 cases
The problem with this is that you cannot generalize directly to
the population, since the sample deliberately does not match
the population's proportions
The biggest problem in sampling is that there is no
complete list of most populations
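For the disproportionate stratified design described above, a minimal sketch is shown below. It draws a simple random sample inside each major rather than taking every other case as in the slide's example, and the stratum sizes and the name `equal_size_stratified_sample` are assumptions for illustration.

```python
import random

def equal_size_stratified_sample(strata, per_stratum, seed=None):
    """Draw the same number of cases from every stratum.

    Returns the sampled cases and the sampling fraction used in each
    stratum, which differs whenever the strata differ in size.
    """
    rng = random.Random(seed)
    sample, fractions = {}, {}
    for name, cases in strata.items():
        sample[name] = rng.sample(cases, per_stratum)
        fractions[name] = per_stratum / len(cases)
    return sample, fractions

# Hypothetical strata: student numbers grouped by major.
strata = {
    "zoology": list(range(100)),
    "chemistry": list(range(300)),
    "criminology": list(range(600)),
}
sample, fractions = equal_size_stratified_sample(strata, 50, seed=7)
for name, fraction in fractions.items():
    print(name, len(sample[name]), "cases, sampling fraction", round(fraction, 3))
```

The zoology fraction is 0.5 (one in two, as in the slide), while the larger majors are sampled at much lower rates. That unequal chance of selection is why results from such a sample cannot be generalized directly to the population.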
Cluster Sample




Used when there is no list of the members of
the population
It involves random selection of geographical
units (states, neighborhoods, or blocks)
Every case within the final selected
geographical units is then tested
Not a simple random sample, so not as trustworthy,
but cheaper to do
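A one-stage version of cluster sampling can be sketched like this. The 20 city blocks with 5 households each are invented data, and `cluster_sample` is a hypothetical helper; real designs often select in several stages (states, then neighborhoods, then blocks).

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every case inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random selection of units
    cases = [case for name in chosen for case in clusters[name]]
    return chosen, cases

# Hypothetical frame: only the list of blocks is needed, not a list of people.
blocks = {
    f"block_{b}": [f"block_{b}_household_{h}" for h in range(5)]
    for b in range(20)
}
chosen, cases = cluster_sample(blocks, 4, seed=3)
print(chosen)
print(len(cases), "households selected")
```

Only the blocks are selected at random, so households on the same block always enter or leave the sample together.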
The Sampling Distribution
Introduction


Researchers have a great deal of information about
the sample distribution, but they know little or nothing about
the population distribution
It is the population that is of interest
We are not interested in what the 2,000 people in a sample think
for their own sake; we want to know what the 100,000,000 or so
adults in the U.S. think
What you would want to know about the distribution
of the population:
The shape of the distribution
Some measure of central tendency
Some measure of dispersion
The Normal Curve


You need to know the properties of the normal
curve, which are based on the laws of probability, to
find out information about the population from the
sample
To do this, you use a device known as the sampling
distribution


It bridges the gap between the sample and the population
The sampling distribution is the central concept in
inferential statistics, so it is essential to understand it
Three Distributions

The sample distribution



This is empirical (observed) and known
It is collected by researchers and used to learn about the
population
The population distribution


It exists in reality, so is empirical, but it is unknown to the
researcher
The sole purpose of inferential statistics is to make
inferences (meaning draw conclusions) about the
population distribution
The Sampling Distribution

This is nonempirical (theoretical)
It is theoretical because you draw only one sample, while the
sampling distribution is based on an infinite number of
samples taken from that population
Laws of probability tell us much about this
distribution
Theoretically, suppose you drew an infinite number of
samples from a population
Then you computed only the mean of each sample
And you put the means on a graph to form a frequency
polygon
That distribution of sample means would be the sampling distribution
The Sampling Distribution

We know that each sample mean will be
slightly different


Since each sample is not an exact representation
of the population
We know that most of the sample means will
cluster around the true population value
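The idea of repeated sampling can be approximated by simulation, using a large but finite number of samples in place of an infinite one. The sketch below builds a hypothetical population of 20,000 scores (mean near 50, standard deviation near 10), draws 2,000 samples of size 100, and checks how closely the sample means cluster around the population mean. All names and numbers are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(0)  # seeded so the simulation is repeatable

# Hypothetical population: 20,000 roughly normal scores.
population = [rng.gauss(50, 10) for _ in range(20_000)]
mu = statistics.fmean(population)

def sample_means(population, n, repetitions, rng):
    """Draw `repetitions` random samples of size n and return their means."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repetitions)]

means = sample_means(population, n=100, repetitions=2_000, rng=rng)

# Share of sample means that land within 2 points of the population mean.
within_two = sum(abs(m - mu) <= 2 for m in means) / len(means)

print("population mean:", round(mu, 2))
print("mean of sample means:", round(statistics.fmean(means), 2))
print("share of sample means within 2 points:", round(within_two, 3))
```

Each sample mean differs a little from the others, but nearly all of them fall close to the population value, which is the pattern the slide describes.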
Two Theorems About the Sampling Distribution

If repeated random samples of size N are
drawn from a normal population with mean
µ and standard deviation σ, then the
sampling distribution of sample means will
be normal with a mean µ and a standard
deviation of σ/√N

So the mean of the sampling distribution will
be the same as the mean of the population
Theorems

Since the samples are random, the means
should miss an equal number of times on
either side of the population value
This makes the distribution symmetrical
It is a normal curve with a bell shape
So, we know about the shape of the sampling
distribution
Dispersion of the Sampling Distribution


We can also tell something about the
dispersion (specifically the standard
deviation) of the sampling distribution
The standard deviation of the sampling distribution
(often called the standard error of the mean)
is given by the formula σ/√N

This is the standard deviation of the population
divided by the square root of N
Dispersion of the Sampling Distribution



What this tells you is that in comparing a sampling
distribution with a population distribution, there will
always be more variance in the population
distribution (for any sample size N larger than 1)
As the sample size gets larger, the variance of the
sampling distribution will get smaller (N = the
number of cases in the sample)
The above theorem applies to populations that are
normally distributed on a particular variable
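The σ/√N formula can be checked numerically. The sketch below draws repeated samples from a hypothetical normal population (µ = 100, σ = 15, values chosen only for illustration) and compares the observed standard deviation of the sample means with σ/√N for three sample sizes.

```python
import math
import random
import statistics

MU, SIGMA = 100, 15      # hypothetical population parameters
rng = random.Random(1)   # seeded so the simulation is repeatable

def sd_of_sample_means(n, repetitions, rng):
    """Standard deviation of the means of `repetitions` samples of size n."""
    means = [
        statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
        for _ in range(repetitions)
    ]
    return statistics.stdev(means)

observed = {n: sd_of_sample_means(n, 2_000, rng) for n in (4, 25, 100)}

for n, sd in observed.items():
    print(f"N={n}: observed {sd:.2f}, sigma/sqrt(N) = {SIGMA / math.sqrt(n):.2f}")
```

The observed values should track σ/√N closely and shrink as N grows, always staying below the population standard deviation of 15.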
Central Limit Theorem


This second theorem is needed if the
population distribution is not normal
If repeated random samples of size N are
drawn from any population, with mean µ and
standard deviation σ, then as N becomes
large, the sampling distribution of sample
means will approach normality, with mean µ
and standard deviation σ/√N
Large Samples

What, exactly, is meant by large?

A good rule of thumb is that if N is 100 or more,
you can rely on the Central Limit Theorem and
assume that the sampling distribution is approximately
normal in shape
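To see the Central Limit Theorem at work, the sketch below samples from a strongly skewed population: an exponential distribution with mean 1 and standard deviation 1, chosen here purely as an example. It compares the skewness of individual draws with the skewness of the means of samples of size N = 100. The `skewness` helper is written out by hand and is an assumption of this example, not a standard library function.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score: near 0 for a symmetrical distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])

rng = random.Random(2024)  # seeded so the simulation is repeatable
N = 100                    # sample size from the rule of thumb
REPS = 3_000               # number of repeated samples

# Individual cases from the skewed population (mean 1, sd 1).
raw_draws = [rng.expovariate(1.0) for _ in range(20_000)]

# Means of repeated random samples of size N from the same population.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(N)])
    for _ in range(REPS)
]

standard_error = 1.0 / N ** 0.5   # sigma / sqrt(N) with sigma = 1
share_within = sum(
    abs(m - 1.0) <= 1.96 * standard_error for m in sample_means
) / REPS

print("skewness of raw draws:", round(skewness(raw_draws), 2))
print("skewness of sample means:", round(skewness(sample_means), 2))
print("share of means within 1.96 standard errors:", round(share_within, 3))
```

The raw draws are heavily skewed, while the sample means are close to symmetrical and roughly 95 percent of them fall within 1.96 standard errors of µ, as a normal sampling distribution would predict.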