Introduction to Statistics
Measures of Central Tendency
and Dispersion
• The phrase “descriptive statistics” is used
generically to refer to the measures of central
tendency and dispersion used in inferential
statistics.
• These statistics describe or summarize the
qualities of data.
• Another name is “summary statistics”, which
are univariate:
– Mean, Median, Mode, Range, Standard Deviation,
Variance, Min, Max, etc.
Measures of Central Tendency
• These measures describe the typical or
central value of a set of scores or values in
the data.
– Mean
– Median
– Mode
What do you “Mean”?
The “mean” of some data is the average
score or value, such as the average
age of an MPA student or average
weight of professors that like to eat
donuts.
Inferential mean of a sample: $\bar{X} = \frac{\sum X}{n}$
Mean of a population: $\mu = \frac{\sum X}{N}$
Problem of being “mean”
• The main problem associated with the
mean value of some data is that it is
sensitive to outliers.
• Example, the average weight of political
science professors might be affected if
there was one in the department that
weighed 600 pounds.
Donut-Eating Professors
Professor        Weight   Weight (with outliers)
Schmuggles       165      165
Bopsey           213      213
Pallitto         189      410
Homer            187      610
Schnickerson     165      165
Levin            148      148
Honkey-Doorey    251      251
Zingers          308      308
Boehmer          151      151
Queenie          132      132
Googles-Boop     199      199
Calzone          227      227
Mean             194.6    248.3
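The effect of the two outliers on the mean can be checked with a short Python sketch. The weights are copied from the table; the variable names are illustrative only.

```python
from statistics import mean

# Weights in pounds, first column of the table.
weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

# Second column: Pallitto raised to 410 and Homer to 610, all else unchanged.
weights_with_outliers = [165, 213, 410, 610, 165, 148, 251, 308, 151, 132, 199, 227]

mean_original = mean(weights)                     # 2335 / 12, about 194.6
mean_with_outliers = mean(weights_with_outliers)  # 2979 / 12 = 248.25

print(mean_original)
print(mean_with_outliers)
```

Changing only two of the twelve values moves the mean by more than 50 pounds, which is the sensitivity to outliers described above.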
The Median (not the cement in the middle
of the road)
• Because the mean average can be
sensitive to extreme values, the median
is sometimes useful and more
accurate.
• The median is simply the middle value
among some scores of a variable. (no
standard formula for its computation)
What is the Median?
Professor        Weight
Schmuggles       165
Bopsey           213
Pallitto         189
Homer            187
Schnickerson     165
Levin            148
Honkey-Doorey    251
Zingers          308
Boehmer          151
Queenie          132
Googles-Boop     199
Calzone          227
Mean             194.6

Rank order and choose middle value:
132, 148, 151, 165, 165, 187, 189, 199, 213, 227, 251, 308

If even, then average between the two in the middle.
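As a rough sketch, the rank-and-pick procedure can be written out in Python and compared against the standard library's `median`. With twelve professors the count is even, so the two middle values are averaged.

```python
from statistics import median

weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

# Step 1: rank order the values.
ranked = sorted(weights)

# Step 2: with an even count, average the two middle values.
n = len(ranked)
middle_pair = (ranked[n // 2 - 1], ranked[n // 2])
median_by_hand = sum(middle_pair) / 2

print(middle_pair)      # (187, 189)
print(median_by_hand)   # 188.0
print(median(weights))  # 188.0, same result from the standard library
```

The median (188) sits below the mean (194.6) because the heaviest weights pull the mean upward.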
Percentiles
• If we know the median, then we can go up
or down and rank the data as being above
or below certain thresholds.
• You may be familiar with standardized
tests. If you scored at the 90th percentile, your score was
higher than 90% of the rest of the sample.
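A minimal sketch of that idea in Python, assuming the simple definition "percent of values strictly below the score". The helper `percentile_rank` is hypothetical, and statistical packages use several slightly different percentile definitions.

```python
def percentile_rank(score, sample):
    """Percent of values in `sample` that lie strictly below `score`."""
    below = sum(1 for value in sample if value < score)
    return 100 * below / len(sample)

# Professor weights from the earlier slides.
weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

# Ten of the twelve weights are below 251, so this is about 83.3.
print(percentile_rank(251, weights))

# Nobody weighs less than the lightest professor.
print(percentile_rank(132, weights))  # 0.0
```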
The Mode (hold the pie and the ala)
(What does ‘ala’ taste like anyway??)
• The most frequent response or value
for a variable.
• Multiple modes are possible: bimodal
or multimodal.
Figuring the Mode
Professor        Weight
Schmuggles       165
Bopsey           213
Pallitto         189
Homer            187
Schnickerson     165
Levin            148
Honkey-Doorey    251
Zingers          308
Boehmer          151
Queenie          132
Googles-Boop     199
Calzone          227
What is the mode?
Answer: 165
Important descriptive
information that may help
inform your research and
diagnose problems like lack
of variability.
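One way to find the mode is to count how often each value occurs. The sketch below uses Python's `collections.Counter` and, as a cross-check, `statistics.multimode` (available from Python 3.8), which returns every mode when the data are bimodal or multimodal.

```python
from collections import Counter
from statistics import multimode

weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

# Count occurrences of each weight and take the most frequent one.
counts = Counter(weights)
most_common_value, frequency = counts.most_common(1)[0]

# multimode returns a list, so ties (several modes) are not lost.
modes = multimode(weights)

print(most_common_value, frequency)  # 165 2
print(modes)                         # [165]
```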
Measures of Dispersion
(not something
you cast…)
• Measures of dispersion tell us about
variability in the data. Also univariate.
• Basic question: how much do values differ
for a variable from the min to max, and
distance among scores in between. We
use:
– Range
– Standard Deviation
– Variance (standard deviation squared)
• To glean information from data, i.e. to
make an inference, we need to see
variability in our variables.
• Measures of dispersion give us
information about how much our
variables vary from the mean, because if
they don’t vary, it is difficult to infer
anything from the data. Dispersion is
also known as the spread or range of
variability.
The Range (no Buffalo roaming!!)
• r=h–l
– Where h is high and l is low
• In other words, the range gives us the
distance between the minimum and maximum
values of a variable.
• Understanding this statistic is important in
understanding your data, especially for
management and diagnostic purposes.
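Applied to the professor weights from the earlier slides, the range formula r = h – l works out as in this small Python sketch (variable names are illustrative):

```python
weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

high = max(weights)  # h: heaviest professor, 308
low = min(weights)   # l: lightest professor, 132
r = high - low       # range

print(r)  # 176
```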
The Normal Curve
• Bell-shaped distribution or curve
• Perfectly symmetrical about the mean.
Mean = median = mode
• Tails are asymptotic: closer and closer to
horizontal axis but never reach it.
Sample Distribution
• What does Andre do
to the sample
distribution?
• What is the probability
of finding someone
like Andre in the
population?
• Are you ready for
more inferential
statistics?
Normal curves and probability
Dr. Boehmer would be here
Andre would be here
The Standard Deviation
• A standardized measure of distance from
the mean.
• In other words, it allows you to know how far
some cases are located from the mean.
How extreme are your data?
• In a normal distribution, about 68% of cases
fall within one standard deviation of the mean
and about 95% within two deviations.
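For a normal curve, the share of cases within a given number of standard deviations of the mean can be computed directly. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+); `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(round(share_within(1), 3))  # 0.683, roughly 68%
print(round(share_within(2), 3))  # 0.954, roughly 95%
```

These shares hold only for normally distributed data; skewed data can depart from them noticeably.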
Formula for Standard Deviation
$S = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$
$\sqrt{\ }$ = square root
$\sum$ = sum (sigma)
$X$ = score for each point in data
$\bar{X}$ = mean of scores for the variable
$n$ = sample size (number of
observations or cases)
Professor        X       X - mean   (X - mean) squared
Schmuggles       165     -29.6      875.2
Bopsey           213     18.4       339.2
Pallitto         189     -5.6       31.2
Homer            187     -7.6       57.5
Schnickerson     165     -29.6      875.2
Levin            148     -46.6      2170.0
Honkey-Doorey    251     56.4       3182.8
Zingers          308     113.4      12863.3
Boehmer          151     -43.6      1899.5
Queenie          132     -62.6      3916.7
Googles-Boop     199     4.4        19.5
Calzone          227     32.4       1050.8
Mean             194.6
Variance                            2480.1
Std. deviation                      49.8
We can see that the Standard Deviation equals 49.8
pounds. The weight of Zingers is still likely skewing this
calculation (indirectly through the mean).
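The table's arithmetic can be reproduced step by step in Python. This is a sketch of the sample formula with n - 1 in the denominator, cross-checked against `statistics.stdev`, which uses the same denominator.

```python
from math import sqrt
from statistics import stdev

weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

n = len(weights)
x_bar = sum(weights) / n                              # mean, about 194.6

# Squared distance of each score from the mean (third column of the table).
squared_deviations = [(x - x_bar) ** 2 for x in weights]

variance = sum(squared_deviations) / (n - 1)          # about 2480.1
s = sqrt(variance)                                    # about 49.8

print(round(variance, 1), round(s, 1))
print(round(stdev(weights), 1))  # same value from the standard library
```

Zingers alone contributes about 12863 of the roughly 27281 total squared deviation, which is why one heavy case has such a large effect on the result.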
Std. Deviation practice
• What is the value of Democracy one std.
deviation above and below the mean?
Descriptive Statistics
                      N     Minimum   Maximum   Mean     Std. Deviation
Democ                 319   -10.00    10.00     3.4859   6.71282
Valid N (listwise)    319
The answer is 10.19872 and -3.22692
What percentage of all the cases fall between -3.2 and 10.2?
Roughly 68%
Std. Deviation practice
What is the value of Urban population one std. deviation
above and below the mean?
Descriptive Statistics
                      N     Minimum   Maximum   Mean      Std. Deviation
Urbanpop              139   19.77     97.12     66.1166   17.74849
Valid N (listwise)    139
The answer is 83.86509 and 48.36811
What percentage of all the cases fall within 83.86 and 48.36?
Roughly 68%
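Both practice answers follow the same recipe: subtract the standard deviation from the mean for the lower bound and add it for the upper bound. A minimal Python sketch, with `one_sd_interval` as a hypothetical helper and the means and standard deviations copied from the two Descriptive Statistics tables:

```python
def one_sd_interval(mean, sd):
    """Return (lower, upper): one standard deviation below and above the mean."""
    return (mean - sd, mean + sd)

# Democracy: mean 3.4859, std. deviation 6.71282
democ_low, democ_high = one_sd_interval(3.4859, 6.71282)

# Urban population: mean 66.1166, std. deviation 17.74849
urban_low, urban_high = one_sd_interval(66.1166, 17.74849)

print(round(democ_low, 5), round(democ_high, 5))  # about -3.22692 and 10.19872
print(round(urban_low, 5), round(urban_high, 5))  # about 48.36811 and 83.86509
```

The "roughly 68%" figure assumes the variable is approximately normally distributed.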
Organizing and Graphing
Data
Goal of Graphing?
1. Presentation of Descriptive Statistics
2. Presentation of Evidence
3. Some people understand subject
matter better with visual aids
4. Provide a sense of the underlying
data generating process (scatterplots)
What is the Distribution?
• Gives us a picture of
the variability and
central tendency.
• Can also show the
amount of skewness
and Kurtosis.
Graphing Data: Types
Creating Frequencies
• We create frequencies by sorting data
by value or category and then
summing the cases that fall into those
values.
• How often do certain scores occur?
This is a basic descriptive data
question.
Ranking of Donut-eating Profs.
(most to least)
Professor        Weight
Zingers          308
Honkey-Doorey    251
Calzone          227
Bopsey           213
Googles-Boop     199
Pallitto         189
Homer            187
Schnickerson     165
Schmuggles       165
Boehmer          151
Levin            148
Queenie          132
Here we have placed the professors into
weight classes and depicted them with a histogram in
columns.
[Figure: column histogram, "Weight Class Intervals of Donut-Munching Professors"; x-axis: weight class (130-150, 151-185, 186-210, 211-240, 241-270, 271-310, 311+); y-axis: Number (0 to 3.5)]
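The frequencies behind such a histogram can be tallied with a short Python sketch. The class boundaries are taken from the chart's axis labels; `class_label` is a hypothetical helper, and the text bars are only a stand-in for a real chart.

```python
from collections import Counter

weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

# Closed weight classes from the chart, plus an open-ended top class.
weight_classes = [(130, 150), (151, 185), (186, 210), (211, 240), (241, 270), (271, 310)]
labels = [f"{low}-{high}" for low, high in weight_classes] + ["311+"]

def class_label(weight):
    """Return the label of the weight class that contains `weight`."""
    for low, high in weight_classes:
        if low <= weight <= high:
            return f"{low}-{high}"
    if weight >= 311:
        return "311+"
    raise ValueError(f"{weight} is below the lowest class")

# Sort each case into its class and sum the cases per class.
frequencies = Counter(class_label(w) for w in weights)

for label in labels:
    print(f"{label:>8} {'#' * frequencies[label]} ({frequencies[label]})")
```

With these data the counts are 2, 3, 3, 2, 1, 1 and 0 from the lightest class to the heaviest.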
Here is another histogram, depicted
as a bar graph.
[Figure: horizontal bar graph, "Weight Class Intervals of Donut-Munching Professors"; y-axis: weight class (130-150 through 311+); x-axis: Number (0 to 3.5)]
Pie Charts:
[Figure: pie chart, "Proportions of Donut-Eating Professors by Weight Class"; slices: 130-150, 151-185, 186-210, 211-240, 241-270, 271-310, 311+]
Actually, why not use a donut
graph. Duh!
[Figure: donut chart, "Proportions of Donut-Eating Professors by Weight Class"; slices: 130-150, 151-185, 186-210, 211-240, 241-270, 271-310, 311+]
See Excel for other options!!!!
Line Graphs: A Time Series
[Figure: line graph of approval by month, 1981 to 2001; x-axis: Month; y-axis: Approval (0 to 100); series: Approval, Economic approval]
Scatter Plot (Two variable)
[Figure: scatter plot, "Presidential Approval and Unemployment"; x-axis: Unemployment (0 to 12); y-axis: Approval (0 to 100); series: Approve]