Sampling/probability/inferential statistics

The Normal Curve




Theoretical
Symmetrical
Known areas for each standard deviation or Z-score

FOR EACH SIDE:

34.13% of scores in the distribution are between the mean and 1 s from the mean
13.59% of scores are between 1 and 2 s’s from the mean
2.28% of scores are more than 2 s’s from the mean
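
The areas listed above can be checked numerically. The following is a minimal sketch, not part of the original slides; the helper name `phi` is an assumption introduced here, built on the standard-library error function.

```python
import math

def phi(z):
    """Standard normal cumulative probability P(Z <= z) (hypothetical helper)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Areas on ONE side of the mean, as on the slide.
mean_to_1 = phi(1) - phi(0)   # between the mean and 1 s
one_to_2 = phi(2) - phi(1)    # between 1 and 2 s's
beyond_2 = 1.0 - phi(2)       # more than 2 s's from the mean

for label, area in [("mean to 1 s", mean_to_1),
                    ("1 to 2 s's", one_to_2),
                    ("beyond 2 s's", beyond_2)]:
    print(f"{label}: {area:.4f}")
```

The three areas sum to .5, which is the half of the curve on one side of the mean.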
Z SCORE FORMULA

Z = (Xi – X̄) / s, where X̄ is the mean and s is the standard deviation

Xi = 120; X̄ = 100; s = 10
Z = (120 – 100) / 10 = +2.00

Xi = 80; s = 10
Z = (80 – 100) / 10 = -2.00

Xi = 112; s = 10
Z = (112 – 100) / 10 = +1.20
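
The three worked examples above can be reproduced with a few lines of Python. This is only an illustrative sketch; the function name `z_score` is an assumption, not something from the slides.

```python
def z_score(x, mean, sd):
    """Distance of a score from the mean, in standard deviation units."""
    return (x - mean) / sd

# Same values as the slide: mean = 100, s = 10.
for x in (120, 80, 112):
    print(f"Xi = {x}: Z = {z_score(x, mean=100, sd=10):+.2f}")
```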



The point is to convert your particular metric (e.g.,
height, IQ scores) into the metric of the normal curve
(Z-scores). If all of your values are converted to Z-scores,
the distribution will have a mean of zero and a
standard deviation of one.
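
A quick way to see this is to standardize a small set of values and check the result. The sketch below uses made-up scores (the list `scores` is hypothetical, chosen only for illustration) and the sample standard deviation from the standard library.

```python
import statistics

# Hypothetical IQ-style scores, invented for this illustration.
scores = [85, 92, 100, 104, 110, 118, 126]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation (s)

z_scores = [(x - mean) / sd for x in scores]

print("mean of Z-scores:", round(statistics.mean(z_scores), 10))
print("s of Z-scores:   ", round(statistics.stdev(z_scores), 10))
```

Whatever the original metric, the converted values come out with a mean of zero and a standard deviation of one (up to floating-point rounding).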
4 More Sample Problems

For a sample of 150 U.S. cities, the mean
poverty rate (per 100) is 12.5 with a standard
deviation of 4.0. The distribution is
approximately normal.

Based on the above information:

1. What percent of cities had a poverty rate of more than 8.5 per 100?
2. What percent of cities had a rate between 13.0 and 16.5?
3. What percent of cities had a rate between 10.5 and 14.3?
4. What percent of cities had a rate between 8.5 and 10.5?
First Two Answers

What percent of cities had a poverty rate of more than 8.5 per 100?

Z = (8.5 – 12.5) / 4 = -1.0
.3413 + .5 = .8413 = 84.13%

What percent of cities had a rate between 13.0 and 16.5?

Z = (13.0 – 12.5) / 4 = .125
Z = (16.5 – 12.5) / 4 = 1.0
.3413 – .0478 = .2935 = 29.35% (.0478 is the table area for Z = .12)
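
As a cross-check on the table lookups, the same two answers can be computed directly from the normal curve. This is a sketch under the slide's assumptions (mean 12.5, s 4.0, approximately normal); the helper `phi` is a name introduced here. Because a printed Z table only lists Z to two decimals, the second answer differs slightly from the table-based 29.35%.

```python
import math

def phi(z):
    """Standard normal cumulative probability P(Z <= z) (hypothetical helper)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

MEAN, SD = 12.5, 4.0  # values given on the slide

def z(x):
    return (x - MEAN) / SD

# 1. Rate of more than 8.5 per 100
p_more_than_8_5 = 1.0 - phi(z(8.5))

# 2. Rate between 13.0 and 16.5, using the unrounded Z = .125
p_13_to_16_5 = phi(z(16.5)) - phi(z(13.0))

# Same question with Z rounded down to .12, as a printed table forces
p_13_to_16_5_table = phi(1.0) - phi(0.12)

print(f"more than 8.5:          {p_more_than_8_5:.4f}")
print(f"13.0 to 16.5 (exact Z): {p_13_to_16_5:.4f}")
print(f"13.0 to 16.5 (Z = .12): {p_13_to_16_5_table:.4f}")
```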
The Rest of the Answers

What percent of cities had a rate between 10.5 and 14.3?

Z = (10.5 – 12.5) / 4 = -0.5
Z = (14.3 – 12.5) / 4 = .45
.1915 + .1736 = .3651 = 36.51%

What percent of cities had a rate between 8.5 and 10.5?

Z = (10.5 – 12.5) / 4 = -0.5
Z = (8.5 – 12.5) / 4 = -1.0
Column C: .3085 – .1587 = .1498 = 14.98% …OR…
Column B: .3413 – .1915 = .1498 = 14.98%
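
The last two answers can be verified the same way. Again a sketch, assuming the slide's mean of 12.5 and s of 4.0; `phi` and `area_between` are hypothetical helper names, and the results may differ from the table answers in the fourth decimal because the table values are rounded.

```python
import math

def phi(z):
    """Standard normal cumulative probability P(Z <= z) (hypothetical helper)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_between(low, high, mean=12.5, sd=4.0):
    """Proportion of a normal distribution falling between two raw scores."""
    return phi((high - mean) / sd) - phi((low - mean) / sd)

# 3. Scores on opposite sides of the mean: the two areas add.
p_10_5_to_14_3 = area_between(10.5, 14.3)

# 4. Scores on the same side of the mean: the smaller area is subtracted.
p_8_5_to_10_5 = area_between(8.5, 10.5)

print(f"10.5 to 14.3: {p_10_5_to_14_3:.4f}")
print(f"8.5 to 10.5:  {p_8_5_to_10_5:.4f}")
```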
Probability & the Normal Distribution
THE NORMAL DISTRIBUTION as a
PROBABILITY Distribution


We can use the normal curve to estimate the probability of randomly selecting a case between 2 scores

Probability distribution: theoretical distribution of all events in a population of events, with the relative frequency of each event
[Figure: Normal Curve, Mean = .5, SD = .7]
PROBABILITY & THE NORMAL DISTRIBUTION

The probability of a particular outcome is the proportion of times that outcome would occur in a long run of repeated observations.
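
The "long run of repeated observations" idea can be illustrated with a small simulation. This is a sketch only: it borrows the figure's mean of .5 and SD of .7, while the seed, the number of draws, and the choice of outcome (a draw landing within 1 SD of the mean) are assumptions made for the example.

```python
import random

rng = random.Random(1)   # fixed seed so the run is repeatable
mean, sd = 0.5, 0.7      # values from the figure caption
n_draws = 100_000        # a "long run" of repeated observations

# Count how often a random draw lands within 1 SD of the mean.
hits = sum(1 for _ in range(n_draws)
           if abs(rng.gauss(mean, sd) - mean) <= sd)
proportion = hits / n_draws

print(f"proportion within 1 SD of the mean: {proportion:.3f}")
```

With this many draws the proportion should settle close to the theoretical 68.26% (twice the 34.13% on each side of the mean).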
PROBABILITY & THE NORMAL DISTRIBUTION

p [next male being 66-74” tall] = (# that tall) / (# who approach) = 68 / 100 = 0.68
Probability & the Normal Distribution

Another example:

Suppose the mean score on a test is 80, with a
standard deviation of 7. If we randomly sample one
score from the population, what is the probability that
it will be as high or higher than 89?



Z for 89 = (89 – 80) / 7 = 9/7 or 1.29
Area in tail for Z of 1.29 = 0.0985
P(X ≥ 89) = .0985 or 9.85%
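
The tail area can also be computed without a table. A minimal sketch, assuming the test scores are normally distributed as the example implies; `phi` is a helper name introduced here. It shows both the table-style answer (Z rounded to 1.29) and the answer with Z left unrounded.

```python
import math

def phi(z):
    """Standard normal cumulative probability P(Z <= z) (hypothetical helper)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean, sd, score = 80, 7, 89

z_exact = (score - mean) / sd     # 9/7, before rounding
z_rounded = round(z_exact, 2)     # 1.29, as looked up in a table

tail_rounded = 1.0 - phi(z_rounded)
tail_exact = 1.0 - phi(z_exact)

print(f"Z = {z_rounded}: tail area = {tail_rounded:.4f}")
print(f"Z = {z_exact:.4f}: tail area = {tail_exact:.4f}")
```

The two tail areas differ only in the third decimal, which is the cost of rounding Z to two places for the table.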
Probability & the Normal Distribution

Bottom line:

Normal distribution can also be thought of as
probability distribution

Probabilities always range from 0 to 1
Probability

What is the probability of picking a red marble out
of a bowl with 2 red and 8 green?
There are 2 outcomes that are red
There are 10 possible outcomes
p(red) = 2 divided by 10
p(red) = .20
Frequencies and Probability

The probability of picking a color relates to the
frequency of each color in the bowl


8 green marbles, 2 red marbles, 10 total
p(Green) = .8 p(Red) = .2
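
The link between frequencies and probabilities in the marble example can be written out as a short sketch. The variable names are assumptions made for illustration.

```python
from collections import Counter

# The bowl from the slide: 2 red and 8 green marbles.
bowl = ["red"] * 2 + ["green"] * 8

counts = Counter(bowl)   # frequency of each color
total = len(bowl)        # number of possible outcomes

# Probability of each color = its frequency / total outcomes.
probs = {color: n / total for color, n in counts.items()}

for color, p in probs.items():
    print(f"p({color}) = {counts[color]} / {total} = {p:.2f}")
```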
Frequencies & Probability

What is the probability of randomly selecting an
individual who is extremely liberal from this sample?
p(extremely liberal) = 32 / 1,319 = .024 (or 2.4%)

THINK OF SELF AS LIBERAL OR CONSERVATIVE

                                  Frequency  Percent  Valid Percent  Cumulative Percent
Valid    1 EXTREMELY LIBERAL             32      2.3            2.4                 2.4
         2 LIBERAL                      171     12.3           13.0                15.4
         3 SLIGHTLY LIBERAL             186     13.4           14.1                29.5
         4 MODERATE                     486     35.0           36.8                66.3
         5 SLGHTLY CONSERVATIVE         205     14.8           15.5                81.9
         6 CONSERVATIVE                 198     14.3           15.0                96.9
         7 EXTRMLY CONSERVATIVE          41      3.0            3.1               100.0
         Total                         1319     95.1          100.0
Missing  8 DK                            62      4.5
         9 NA                             6       .4
         Total                           68      4.9
Total                                  1387    100.0
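
The Percent, Valid Percent, and Cumulative Percent columns all follow from the raw frequencies. The sketch below recomputes them from the counts in the table; the variable names are assumptions, and the category labels are copied as they appear in the output.

```python
valid = {
    "1 EXTREMELY LIBERAL": 32,
    "2 LIBERAL": 171,
    "3 SLIGHTLY LIBERAL": 186,
    "4 MODERATE": 486,
    "5 SLGHTLY CONSERVATIVE": 205,
    "6 CONSERVATIVE": 198,
    "7 EXTRMLY CONSERVATIVE": 41,
}
missing = {"8 DK": 62, "9 NA": 6}

n_valid = sum(valid.values())                # 1319 valid cases
n_total = n_valid + sum(missing.values())    # 1387 cases overall

rows = []
running = 0
for label, count in valid.items():
    running += count
    rows.append((label,
                 count,
                 100 * count / n_total,      # Percent (of all cases)
                 100 * count / n_valid,      # Valid Percent
                 100 * running / n_valid))   # Cumulative Percent

for label, count, pct, valid_pct, cum_pct in rows:
    print(f"{label:<24}{count:>6}{pct:>8.1f}{valid_pct:>8.1f}{cum_pct:>8.1f}")

# Probability of randomly selecting an "extremely liberal" respondent
# from the valid cases.
p_extremely_liberal = valid["1 EXTREMELY LIBERAL"] / n_valid
print(f"p(extremely liberal) = {p_extremely_liberal:.3f}")
```

The probability on the slide uses the valid cases (1,319) as the denominator, which is why it matches the Valid Percent column rather than the Percent column.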
Inferential Statistics (intro)

Inferential statistics are used to generalize from a
sample to a population



We seek knowledge about a whole class of similar
individuals, objects or events (called a POPULATION)
We observe some of these (called a SAMPLE)
We extend (generalize) our findings to the entire class
WHY SAMPLE?

It’s often not possible to collect information on all individuals you
wish to study
Even if possible, it might not be feasible (e.g., because of
time, money, size of group)
WHY USE PROBABILITY SAMPLING?

Representative sample: one that, in the aggregate, closely approximates the
population from which it is drawn

PROBABILITY SAMPLING

Samples selected in accord with probability theory, typically
involving some random selection mechanism
If everyone in the population has an equal chance of being
selected, it is likely that those who are selected will be
representative of the whole group
EPSEM – Equal Probability of SElection Method
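
One common way to give every member an equal chance of selection is a simple random sample. The sketch below is an illustration only: the population of 1,000 numbered IDs, the sample size of 100, and the seed are all hypothetical.

```python
import random

rng = random.Random(2024)  # fixed seed so the draw is repeatable

# Hypothetical sampling frame: 1,000 people identified by ID number.
population = list(range(1, 1001))
sample_size = 100

# Simple random sample without replacement: every ID has the same
# chance of being selected.
sample = rng.sample(population, sample_size)

# Under this design each person's chance of selection is n / N.
inclusion_probability = sample_size / len(population)

print("first ten selected IDs:", sorted(sample)[:10])
print("chance of selection for each person:", inclusion_probability)
```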