Statistical Reasoning
Descriptive statistics are used to organize and summarize data in a meaningful way.
Frequency distributions – Where are the majority of the scores?
• Used to organize raw scores, or data, so that the information makes sense at a glance.
• They arrange scores in order of magnitude and show the number of times each score occurs.
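A frequency distribution like this can be built in a few lines of Python. The sketch below is only an illustration: the `scores` list is made-up data, not taken from the slides.

```python
from collections import Counter

# Hypothetical raw quiz scores (illustrative only).
scores = [7, 9, 8, 9, 10, 8, 9, 7, 9, 10]

# Count how many times each score occurs.
frequency = Counter(scores)

# Arrange the scores in order of magnitude, highest first.
for score in sorted(frequency, reverse=True):
    print(score, frequency[score])
```

Reading down the printed list shows at a glance where most of the scores fall.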
[Figure: grade frequency distributions for three assessments, each listing letter grades (F to A+), grade points (1 to 13), the number of students at each grade, and the percentage of students in each grade band.]
Multiple Choice: Mean=34.3, SD=4.2
Composite: Mean=10.2, SD=2.0
Essay: Mean=9.3, SD=2.3
Histograms & Frequency Polygons
• These are 2 ways of showing your frequency distribution data.
1. Histogram – graphically represents a frequency distribution using vertical bars that touch.
• When you have a continuous scale (for example, test scores that run from 0 to 100), the bars touch, because every score must fall into a class and there can't be any "gaps."
• This is different from a bar graph, which is used when you have non-continuous classes (for example, which candidate do you support, Obama or McCain? You'd have a bar for each, with a gap in between, because you can't fall between two candidates; you have to pick one).
2. Frequency Polygon – graphically represents a frequency distribution by marking each score category along a graph's horizontal axis and connecting the points with straight lines (a line graph).
[Figures: a histogram, which uses bars to show the data, and a frequency polygon, which uses a line graph to show the data.]
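To make the idea of classes without gaps concrete, here is a minimal text-only sketch. The scores and the class width of 10 are assumptions chosen for illustration; they do not come from the slides.

```python
from collections import Counter

# Hypothetical test scores on a continuous 0-100 scale (illustrative only).
scores = [55, 62, 68, 71, 74, 75, 78, 81, 83, 85, 88, 92]
CLASS_WIDTH = 10  # assumed class width

# Assign each score to a class such as 70-79; every score lands in exactly one class.
classes = Counter((s // CLASS_WIDTH) * CLASS_WIDTH for s in scores)

# Print every class from lowest to highest, including empty ones, so no gaps appear.
for low in range(min(classes), max(classes) + CLASS_WIDTH, CLASS_WIDTH):
    print(f"{low}-{low + CLASS_WIDTH - 1}: {'#' * classes[low]}")
```

Each row of `#` marks plays the role of one histogram bar; joining the tops of the bars would give the frequency polygon.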
Standard Normal Distribution Curve
Characteristics of the normal curve
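• Bell-shaped curve where the mean, median and mode are all the same and fall exactly in the middle.
[Figure: normal curve showing the percentage of scores in each band, by number of standard deviations (SD) above (+) or below (-) the mean.]
Below -3 SD: .13%
-3 to -2 SD: 2.15%
-2 to -1 SD: 13.6%
-1 SD to the mean: 34.1%
Mean to +1 SD: 34.1%
+1 to +2 SD: 13.6%
+2 to +3 SD: 2.15%
Above +3 SD: .13%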
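The percentages on the normal curve can be checked with Python's standard library. This is a sketch using `statistics.NormalDist` (available in Python 3.8 and later); the computed values agree with the figures on the slide to within rounding.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

# Share of scores in each band, measured in standard deviations above the mean.
band_0_1 = z.cdf(1) - z.cdf(0)   # about 34.1%
band_1_2 = z.cdf(2) - z.cdf(1)   # about 13.6%
band_2_3 = z.cdf(3) - z.cdf(2)   # about 2.1%
beyond_3 = 1 - z.cdf(3)          # about 0.13%

for name, share in [("0 to 1 SD", band_0_1), ("1 to 2 SD", band_1_2),
                    ("2 to 3 SD", band_2_3), ("beyond 3 SD", beyond_3)]:
    print(f"{name}: {share:.2%}")
```

Because the curve is symmetric, the bands below the mean have the same shares.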
Skewed Curves
Skewed Distribution – when more scores pile up on one side of the distribution than the other.
Positively skewed means more people have low scores; the tail points toward the high scores.
Negatively skewed means more people have high scores; the tail points toward the low scores.
• "Positive" and "negative" refer to the direction of the curve's "tail"; they do not mean "good" or "bad."
Measures of Central Tendency
• A measure of central tendency is a single number that gives us information about the "center" of a frequency distribution. There are 3 types.
Example scores: 4, 4, 3, 4, 5
1. Mode = most common score = 4
(Reports what there is the most of. Used with categories that can't be averaged; you can't average men & women.)
2. Mean = arithmetic average = 20/5 = 4
(Has the most statistical value but is susceptible to the effects of extreme scores.)
3. Median = middle score = 4
(Half the scores are higher, half are lower. Used when there are extreme scores.)
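The three measures can be checked for the example scores with Python's standard `statistics` module. This is just a quick sketch using the five scores given above.

```python
from statistics import mean, median, mode

scores = [4, 4, 3, 4, 5]  # the example scores from the slide

print(mode(scores))    # most common score
print(mean(scores))    # arithmetic average: 20 / 5
print(median(scores))  # middle score once sorted: 3, 4, 4, 4, 5
```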
Central Tendency
An extremely high or low price/score can skew the mean. Sometimes the median is better at showing you the central tendency.
1968 TOPPS Baseball Cards
Nolan Ryan          $1500
Elston Howard       $8
Billy Williams      $5
Luis Aparicio       $5
Harmon Killebrew    $3.50
Orlando Cepeda      $3.50
Maury Wills         $3
Jim Bunning         $3
Tony Conigliaro     $3
Tony Oliva          $3
Lou Piniella        $2.50
Mickey Lolich       $2.25
Jim Bouton          $2
Rocky Colavito      $2
Boog Powell         $2
Luis Tiant          $2
Tim McCarver        $1.75
Tug McGraw          $1.75
Joe Torre           $1.50
Rusty Staub         $1.25
Curt Flood          $1
With Ryan:
Median=$2.50
Mean=$74.14
Without Ryan:
Median=$2.38
Mean=$2.85
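The effect of the one extreme price can be reproduced with a short sketch. It assumes the 21 prices pair with the 21 players in the order listed (Nolan Ryan first); that pairing is consistent with the medians and means reported on the slide.

```python
from statistics import mean, median

# Card prices in dollars, in the order listed above (Nolan Ryan first).
prices = [1500, 8, 5, 5, 3.50, 3.50, 3, 3, 3, 3, 2.50,
          2.25, 2, 2, 2, 2, 1.75, 1.75, 1.50, 1.25, 1]
without_ryan = prices[1:]

print(f"With Ryan:    median={median(prices)}, mean={mean(prices):.2f}")
print(f"Without Ryan: median={median(without_ryan)}, mean={mean(without_ryan):.2f}")
```

The median without Ryan comes out as 2.375, which the slide rounds to $2.38. Dropping one card moves the mean from about $74 to under $3, while the median barely changes.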
Does the mean accurately portray the central
tendency of incomes?
NO!
What measure of central tendency would more accurately
show income distribution?
Median – the majority of the incomes surround that number.
Measures of Variability
• A measure of variability gives us a single number that tells us how spread out the scores are in a frequency distribution. (See example of why this is important.)
• Range – the difference between the highest and lowest scores.
– Take the highest score and subtract the lowest score from it. (Can be skewed by an extreme score.)
• Standard Deviation – How spread out is your data?
– The larger this number is, the more spread out the scores are from the mean.
– The smaller this number is, the more closely the scores cluster around the mean.
Calculating Standard Deviation
How spread out (consistent) is your data?
1. Calculate the mean.
2. Take each score and subtract the mean from it.
3. Square each of the new scores to make them positive.
4. Find the mean (average) of the squared scores. This is the variance.
5. Take the square root of the variance to get back to your original unit of measurement.
6. The smaller the number, the more closely packed the data. The larger the number, the more spread out it is.
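The steps above can be sketched as a small function. The name `population_sd` is hypothetical, and the demonstration reuses the scores 4, 4, 3, 4, 5 from the central tendency example. Note that these steps divide by the number of scores (the population formula); some calculators and software divide by one less than that.

```python
import math

def population_sd(scores):
    """Standard deviation following the numbered steps (divides by N)."""
    mean = sum(scores) / len(scores)            # step 1: calculate the mean
    deviations = [x - mean for x in scores]     # step 2: subtract the mean from each score
    squared = [d ** 2 for d in deviations]      # step 3: square to make them positive
    variance = sum(squared) / len(squared)      # step 4: average the squared scores
    return math.sqrt(variance)                  # step 5: square root of the variance

print(population_sd([4, 4, 3, 4, 5]))
```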
Standard Deviation
Punt    Deviation (distance from mean)    Deviation squared (number multiplied by itself)
36      36 – 40 = -4                      16
38      38 – 40 = -2                      4
41      41 – 40 = +1                      1
45      45 – 40 = +5                      25
Mean: 160/4 = 40 yds
Squared deviations added together: 46
Variance: 46/4 = 11.5
Standard Deviation: √variance = √11.5 = 3.4 yds
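The punt example can be checked against Python's `statistics` module. As a sketch: `pvariance` and `pstdev` divide by N, matching the slide, while `stdev` is shown only for comparison because it divides by N - 1.

```python
from statistics import pstdev, pvariance, stdev

punts = [36, 38, 41, 45]  # punt distances in yards, from the example above

variance = pvariance(punts)   # divides by N, as the slide does
sd = pstdev(punts)            # square root of that variance

print(variance)        # 11.5
print(round(sd, 1))    # 3.4

# For comparison only: the sample formula divides by N - 1 and gives a larger value.
sample_sd = stdev(punts)
print(round(sample_sd, 1))
```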
[Figure, repeated: grade frequency distributions for the three assessments.]
Multiple Choice: Mean=34.3, SD=4.2
Composite: Mean=10.2, SD=2.0
Essay: Mean=9.3, SD=2.3
Are these scores consistent?
Is there a skew?
Z-Scores
A number expressed in standard deviation units that shows an individual score's deviation from the mean.
Basically, it shows how you did compared to everyone else.
+ Z-score means you are above the mean,
– Z-score means you are below the mean.
Z-Score = (your score – the average score) ÷ standard deviation
Which class did you perform better in compared to your classmates?
Test       Total    Your score    Average score    S.D.
Biology    200      168           160              4
Psych.     100      44            38               2
Z score in Biology: 168 – 160 = 8, 8 / 4 = +2
Z score in Psych: 44 – 38 = 6, 6 / 2 = +3
You performed better in Psych compared to your classmates.
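The same comparison as a short sketch; the function name `z_score` is just a label chosen here, and the numbers are the ones from the table.

```python
def z_score(score, mean, sd):
    """How many standard deviations a score sits above (+) or below (-) the mean."""
    return (score - mean) / sd

biology = z_score(168, 160, 4)
psych = z_score(44, 38, 2)

print(biology, psych)  # 2.0 3.0
```

Although the raw Biology score is higher, the Psych score is further above its class average once the spread of each class is taken into account.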
Correlation
Correlation – shows the relationship between two variables.
• The closer the correlation is to +1 or -1, the stronger the relationship between the two variables.
• This enables us to predict. However, correlation does not mean causation.
Positive Correlation
• As the value of one variable increases (or
decreases) so does the value of the other
variable.
• When A goes UP B goes UP or
• When A goes Down, B goes Down
• A perfect positive correlation is +1.0.
• The closer the correlation is to +1.0, the
stronger the relationship.
Negative Correlation
• As the value of one variable increases, the
value of the other variable decreases.
• When A goes UP B goes Down or
• When A goes Down, B goes Up
• A perfect negative correlation is -1.0.
• The closer the correlation is to -1.0, the
stronger the relationship.
Zero Correlation
• There is no straight-line relationship between the two variables; the correlation is 0.
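The three cases can be illustrated with a small sketch of the Pearson correlation coefficient, which is the usual measure behind the +1.0 to -1.0 scale (the slides do not name it, so treat that as an assumption). The function name `pearson_r` and all of the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

a = [1, 2, 3, 4, 5]  # hypothetical values of variable A

print(pearson_r(a, [2, 4, 6, 8, 10]))  # B goes up whenever A goes up
print(pearson_r(a, [10, 8, 6, 4, 2]))  # B goes down whenever A goes up
print(pearson_r(a, [4, 1, 0, 1, 4]))   # B shows no straight-line trend with A
```

In the last case B still depends on A in a curved way, which is why a correlation near zero only rules out a straight-line relationship.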
Let’s Review
Inferential Statistics
• Techniques that allow a researcher to determine whether a study's outcome is more than just a chance event.
• Usually you would use inferential statistics to try to predict things about a population based on a sample.
• For example, we surveyed 50 staff members in the district about their level of education and are trying to use that to predict the average level of education for all staff in the district.
Statistical Significance
p value = the likelihood of getting a result like this by chance alone. It answers the question: are the results statistically significant? If the answer is yes, the results are more likely to generalize to a larger population.
• A large p value is bad for a researcher. They want this number to be as small as possible, to show that any change in their experiment was caused by the independent variable and not by chance.
• Results are considered statistically significant if the probability of obtaining them by chance alone is no more than .05 (5%).
p ≤ .05
• Loosely, this means results like these would be expected by chance alone no more than 5% of the time.
• Replicating the experiment helps show whether the result holds up.
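One way to see what "by chance alone" means is an exact permutation test, sketched below. This is a swapped-in illustration, not a method the slides describe: the scores are hypothetical, and the test is one-sided (it counts only splits where the treatment group does at least as well as observed).

```python
from itertools import combinations

# Hypothetical scores (illustrative only).
treatment = [85, 88, 90, 92, 95]
control = [70, 72, 75, 78, 80]

def mean_difference(group_a, group_b):
    return sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

observed = mean_difference(treatment, control)
pooled = treatment + control

# If only chance were at work, any 5 of the 10 scores could have ended up
# in the treatment group. Try every such split and count how many give a
# difference at least as large as the one actually observed.
as_extreme = 0
total = 0
for idx in combinations(range(len(pooled)), len(treatment)):
    group_a = [pooled[i] for i in idx]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    total += 1
    if mean_difference(group_a, group_b) >= observed:
        as_extreme += 1

p_value = as_extreme / total
print(total, as_extreme, p_value <= 0.05)
```

Here only 1 of the 252 possible splits is as extreme as the observed one, so the p value is well under .05.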
Does the sample represent the population?
a. Non-biased sample – good
b. Low variability – good
c. Larger samples – good
• Population – a complete set of something.
• Sample – a subset of a population.
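Why larger samples are better can be shown with a small simulation. Everything here is assumed for illustration: the population of 2,000 staff members, their years of education, the sample sizes of 5 and 50, and the helper name `spread_of_sample_means`.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: years of education for 2,000 staff members.
population = [random.choice([12, 14, 16, 18, 20]) for _ in range(2000)]

def spread_of_sample_means(sample_size, trials=500):
    """Draw many random samples and return the standard deviation of their means."""
    means = [mean(random.sample(population, sample_size)) for _ in range(trials)]
    return pstdev(means)

small = spread_of_sample_means(5)    # means of 5-person samples
large = spread_of_sample_means(50)   # means of 50-person samples

print(f"Population mean: {mean(population):.2f}")
print(f"Spread of sample means, n=5:  {small:.2f}")
print(f"Spread of sample means, n=50: {large:.2f}")
```

The means of the larger samples stay much closer to the population mean, so a single large random sample is a more trustworthy guide to the population than a single small one.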