The Scientific Study of Politics (POL 51)
Professor B. Jones
University of California, Davis
Fun With Numbers
► Some Univariate Statistics
► Learning to Describe Data
Useful to Visualize Data
[Figure: histogram of Variable Y; x-axis: Variable Y (0 to 4000); y-axis: Frequency (0 to 40)]
Main Features
► Exhibits “Right Skew”
► Some “Outlying” Data Points?
► Question: Are the outlying data points also
“influential” data points (on measures of
central tendency)?
► Let’s check…
The Mean
► Formally, the mean is given by:
$$\bar{Y} = \frac{Y_1 + Y_2 + \cdots + Y_N}{N}$$
► Or more compactly:
$$\bar{Y} = \frac{\sum_{i=1}^{N} Y_i}{N}$$
Our Data
► Mean of Y is 260.67
► Mechanically…
  - (263 + 73 + … + 88)/67 = 260.67
► Problems with the mean?
► No indication of dispersion or variability.
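As a minimal sketch of the formula, the snippet below computes a mean by hand and with Python's standard library. The slides do not list all 67 observations, so the five values used here are an assumption for illustration (they are the small example list the slides use later for the median), not the actual dataset.

```python
from statistics import mean

# Illustrative values only; NOT the 67-observation dataset from the slides.
y = [32, 5, 23, 99, 54]

# Mean "by hand": add up the observations and divide by N.
n = len(y)
y_bar = sum(y) / n

print(y_bar)    # 42.6
print(mean(y))  # same value from the standard library
```

Note that the single number 42.6 says nothing about how spread out the five values are, which is the problem the next slides take up.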
Variance
► The variance is a statistic that describes (squared) deviations around the mean:
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N-1}$$
► Why “N-1”?
► Interpretation: “Average squared deviations from the mean.”
Our Data
► Variance = 202,431.8
► Mechanically:
  - [(263-260.67)^2 + (73-260.67)^2 + … + (88-260.67)^2]/66
► Interpretation:
  - “The average squared deviation around the mean of Y is 202,431.8.”
► Rrrrright. (Who thinks in terms of squared deviations??)
► Answer: no one.
► That’s why we have a standard deviation.
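The mechanics can be sketched in a few lines of Python. The five values are an illustrative assumption (the full 67-observation dataset is not listed in the slides). The sketch also shows what changes if one divides by N instead of N-1; dividing by N-1 is the usual choice when the data are a sample, because it gives an unbiased estimate of the population variance.

```python
from statistics import mean, variance

# Illustrative values only; NOT the dataset from the slides.
y = [32, 5, 23, 99, 54]
n = len(y)
y_bar = mean(y)

# Squared deviation of each observation from the mean.
squared_devs = [(yi - y_bar) ** 2 for yi in y]

sample_var = sum(squared_devs) / (n - 1)  # the formula on the slide (N-1)
divide_by_n = sum(squared_devs) / n       # for comparison only (N)

print(round(sample_var, 2))   # 1305.3
print(round(divide_by_n, 2))  # 1044.24
print(abs(sample_var - variance(y)) < 1e-9)  # True: statistics.variance uses N-1
```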
Standard Deviation
► Take the square root of the variance and you get the standard deviation:
$$\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N-1}}$$
► Why we like this:
  - Metric is now in original units of Y.
► Interpretation
  - S.D. gives “average deviation” around the mean.
  - It’s a measure of dispersion that is in a metric that makes sense to us.
Our Data
► The standard deviation is: 449.92
► Mechanically: {[(263-260.67)^2 + (73-260.67)^2 + … + (88-260.67)^2]/66}^(1/2)
► Interpretation: “The average deviation around the mean of 260.67 is 449.92.”
► Now, suppose Y=Votes…
► “The average number of votes is about 261 and the average deviation around this number is about 450 votes.”
► The dispersion is very large.
► (Imagine the opposite case: mean test score is 85 percent; average deviation is 5 percent.)
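A short Python sketch of the same step, taking the square root of the (N-1) variance to get back to the original units. The values are an illustrative assumption, not the vote data from the slides.

```python
import math
from statistics import mean, stdev, variance

# Illustrative values only; NOT the dataset from the slides.
y = [32, 5, 23, 99, 54]

# Standard deviation = square root of the sample variance.
sd_by_hand = math.sqrt(variance(y))

print(mean(y))               # 42.6
print(round(sd_by_hand, 2))  # 36.13
print(abs(sd_by_hand - stdev(y)) < 1e-9)  # True: matches statistics.stdev
```

Here the standard deviation (about 36) is nearly as large as the mean (42.6), so, as with the vote data, the dispersion is large relative to the center.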
Revisiting our Data
[Figure: histogram of Variable Y; x-axis: Variable Y (0 to 4000); y-axis: Frequency (0 to 40)]
Skewness and The Mean
► Data often exhibit skew.
► This is often true with political variables.
► We have a measure of central tendency and deviation about this measure (mean, s.d.)
► However, are there other indicators of central tendency?
► How about the median?
Median
► “50th” Percentile: Location at which 50 percent of the cases lie above; 50 percent lie below.
► Since it’s a locational measure, you need to “locate it.”
► Example Data: 32, 5, 23, 99, 54
► As is, not informative.
Median
► Rank it: 5, 23, 32, 54, 99
► Median Location=(N+1)/2 (when N is odd)
► =6/2=3
► Location of the median is data point 3
► This is 32.
► Hence, M=32, not 3!!
► Interpretation: “50 percent of the data lie above 32; 50 percent of the data lie below 32.”
► What would the mean be?
► (42.6…data are __________ skewed)
Median
► When N is even: -67, 5, 23, 32, 54, 99
► M is usually taken to be the average of the two middle scores:
  - (N+1)/2=7/2=3.5
  - The median location is 3.5, which falls between the 3rd and 4th ranked values, 23 and 32
  - M=(23+32)/2=27.5
► All pretty straightforward stuff.
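The locate-then-read-off procedure for the odd and even cases can be sketched in Python as follows. The function name `median_by_location` is made up for this illustration; the two data lists are the ones used on the slides.

```python
from statistics import median

def median_by_location(values):
    """Rank the data, find the median location (N+1)/2, and read off the value."""
    ranked = sorted(values)
    n = len(ranked)
    location = (n + 1) / 2  # 1-based position in the ranked data
    if n % 2 == 1:
        # Odd N: the location is a whole number, so take that data point.
        return ranked[int(location) - 1]
    # Even N: the location ends in .5, so average the two middle data points.
    lower = ranked[n // 2 - 1]
    upper = ranked[n // 2]
    return (lower + upper) / 2

odd_data = [32, 5, 23, 99, 54]
even_data = [-67, 5, 23, 32, 54, 99]

print(median_by_location(odd_data))   # 32
print(median_by_location(even_data))  # 27.5
print(median(odd_data), median(even_data))  # 32 27.5 (standard library agrees)
```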
Median Voter Theorem (a sidetrip)
► One of the most fundamental results in social sciences is Duncan Black’s Median Voter Theorem (1948)
► Theorem predicts convergence to median position.
► Why do parties tend to drift toward the center?
► Why do firms locate in close proximity to one another?
► The theorem: “given single-peaked preferences, majority voting, an odd number of decision makers, and a unidimensional issue space, the position taken by the median voter has an empty winset.”
► That is, under these general conditions, all we need to know is the preference of the median chooser to determine what the outcome will be. No position can beat the median.
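To see what an “empty winset” means, here is a small Python check under simplifying assumptions that are not in the slides: five hypothetical voters with made-up ideal points on a line, each preferring whichever position is closer to his or her ideal point (one simple kind of single-peaked preference), and indifferent voters not counted for the challenger. It is an illustration on one example, not a proof of the theorem.

```python
from statistics import median

# Hypothetical ideal points on a one-dimensional issue space (odd number of voters).
ideal_points = [1, 3, 4, 7, 9]
median_position = median(ideal_points)

def votes_for_challenger(challenger, incumbent, voters):
    """Count voters strictly closer to the challenger than to the incumbent."""
    return sum(abs(v - challenger) < abs(v - incumbent) for v in voters)

majority = len(ideal_points) // 2 + 1  # 3 of 5 voters

# Pit every alternative on a grid from 0 to 10 (steps of 0.5) against the median.
alternatives = [x / 2 for x in range(0, 21) if x / 2 != median_position]
winset = [a for a in alternatives
          if votes_for_challenger(a, median_position, ideal_points) >= majority]

print(median_position)  # 4
print(winset)           # []  (no alternative wins a majority against the median)
```

Any challenger to the right of 4 loses the voters at 1, 3, and 4; any challenger to the left loses the voters at 4, 7, and 9. Either way the median keeps a majority.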
Dispersion around the Median
► The mean has its standard deviation…
► What about the median?
  - No such thing as “standard deviation,” per se, around the median.
  - But, there is the IQR
► Interquartile Range
  - The median is the 50th percentile.
  - Suppose we compute the 25th and the 75th percentiles and then take the difference.
  - 25th Percentile is the “median” of the lower half of the data; the 75th Percentile is the “median” of the upper half.
IQR and the 5 Number Summary
► Data: -67, 5, 23, 32, 54, 99
► 25th Percentile=5
► 75th Percentile=54
► IQR is difference between 75th and 25th percentiles: 54-5=49
► Hence, M=27.5; IQR=49
► “Five Number Summary”: Min, 25th, 50th, 75th Percentiles, Max:
► -67, 5, 27.5, 54, 99
Finding Percentiles
► General Formula: $L = \frac{p}{100} n$
► p is desired percentile
► n is sample size
► If L is a whole number:
  - The value of the pth percentile is between the Lth value and the next value. Find the mean of those values
► If L is not a whole number:
  - Round L up. The value of the pth percentile is the Lth value
Example
► -67, 5, 23, 32, 54, 99
► 25th Percentile: L=(25*6)/100=1.5
  - Round to 2. The 25th Percentile is 5.
► 75th Percentile: L=(75*6)/100=4.5
  - Round to 5. The 75th Percentile is 54.
► 50th Percentile: L=(50*6)/100=3
  - Take average of locations 3 and 4
  - This is (23+32)/2=27.5.
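The percentile rule from the slides can be sketched as a small Python function. The name `percentile` and the restriction to 0 < p < 100 are choices made for this illustration. Statistical libraries often use other conventions (for example, interpolating between data points), so their answers may differ from this rule.

```python
def percentile(values, p):
    """pth percentile using the slide rule L = p*n/100 (for 0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ranked = sorted(values)
    n = len(ranked)
    whole, remainder = divmod(p * n, 100)
    whole = int(whole)
    if remainder == 0:
        # L is a whole number: average the Lth value and the next one.
        return (ranked[whole - 1] + ranked[whole]) / 2
    # L is not whole: round up and take that value (1-based position whole + 1).
    return ranked[whole]

data = [-67, 5, 23, 32, 54, 99]

q1 = percentile(data, 25)
med = percentile(data, 50)
q3 = percentile(data, 75)
iqr = q3 - q1
five_number = (min(data), q1, med, q3, max(data))

print(q1, med, q3)   # 5 27.5 54
print(iqr)           # 49
print(five_number)   # (-67, 5, 27.5, 54, 99)
```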
Our Data
► Median=120 Votes (location L=[50*67]/100=33.5, rounded up to the 34th ranked value)
► 25th Percentile=46 Votes
► 75th Percentile=289 Votes
► IQR: 243 Votes
► 5 number summary:
  - Min=9, 25th P=46, Median=120, 75th P=289, Max=3407
► (massive dispersion!)
► Mean was 260.67. Median=120.
► The Mean is much closer to the 75th percentile.
► That’s SKEW in action.
Revisiting our Data: Odd Ball Cases
[Figure: histogram of Variable Y; x-axis: Variable Y (0 to 4000); y-axis: Frequency (0 to 40)]
“Influential Observations”
► Two data points:
  - Y=(1013, 3407)
► Suppose we omit them (not recommended in applied research)
► Mean plummets to 200.69 (drop of 60 votes)
► s.d. is cut by more than half: 203.92
► Med=114 (note, it hardly changed)
► Let’s look at a scatterplot
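The slides do not list the 67 values, so the figures above cannot be reproduced here. As a hedged illustration of the same mechanism, the Python sketch below takes the small example list from the median slides and appends one extreme value (3407, the largest Y reported). This combined list is an assumption made for illustration, not the real data.

```python
from statistics import mean, median

# Small example list from the median slides.
base = [5, 23, 32, 54, 99]
# Same list plus one extreme value; an assumption for illustration only.
with_outlier = base + [3407]

print(mean(base), median(base))                 # 42.6 32
print(round(mean(with_outlier), 2), median(with_outlier))  # 603.33 43.0
```

One extreme point moves the mean from 42.6 to about 603, while the median moves only from 32 to 43. That is what it means for an outlier to be “influential” on the mean but not on the median.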
Useful to Visualize Data
[Figure: scatterplot of Y against X; x-axis: X (0 to 300,000); y-axis: Y (0 to 4000)]
Main Features?
► Y and X are positively related.
► There are clearly visible “outliers.”
► With respect to Y, which “outlier” worries
you most?
► Influence!
[Figure: the same scatterplot of Y against X, shown again; x-axis: X (0 to 300,000); y-axis: Y (0 to 4000)]
Simple Description
► You can learn a lot from just these simple indicators.
► Suppose that our Y was a real variable?
Palm Beach County, FL: 2000 Election
Descriptive Statistics Help to Clarify Some Issues.
► Palm Beach County
  - Largely a Jewish community
  - Heavily Democratic
  - Yet an overwhelming number of Buchanan Votes
► The Ballot created massive confusion.
► Margin of Victory in Florida: 537 votes.
► Number of Buchanan Votes in PBC: 3407
[Figure: scatterplot “Buchanan by Bush Vote in Florida”; x-axis: Vote for Bush (0 to 300,000); y-axis ticks from 0 to 4000; points labeled by county name, including PALM BEACH]
[Figure: scatterplot “Buchanan by Gore Vote”; x-axis: Vote for Gore (0 to 400,000); y-axis: Vote for Buchanan (0 to 4000); points labeled by county name, including PALM BEACH]