Research article.
Submitted to Open Differential Psychology DATE, YEAR
Published in Open Differential Psychology DATE, YEAR
The sex ratio of comedians over time
Emil O. W. Kirkegaard (see footnote 1)
Bo Tranberg
Abstract
Data obtained from Wikipedia’s list of comedians show that most comedians are/were
male (about 84%), but that the male proportion has been falling in recent decades and is
about 72% among comedians born in the last 20 years.
Keywords: humor production, humor ability, entertainment, sex differences, comedy
1 Introduction
The late Christopher Hitchens provoked a minor controversy in 2007 when he claimed
that “women aren’t funny”[1, 2, 3]. Hitchens was not making the claim that no women
are funny, but merely the claim that “men, taken on average and as a whole, [are] funnier
than women”. Hitchens argues his case using evolutionary psychological reasoning. He
claims that women find humor ability more attractive in a mate than do men. From this
one might hypothesize that men are better at humor production. This might be because
of better innate humor production ability, innate dispositions towards training humor
production, or that men notice that women like funnier men and hence practice it in
order to impress mates. Whichever is the case, men should be higher in humor production
ability. Hitchens did not cite any actual statistics to back up his claim, so on the basis of
his article it could best be seen as plausible but speculative.
Are there any data on the matter? Yes, at least one study has examined humor
production abilities, and it found that men scored higher (N=400, college students)[4].
If one assumes a normal distribution of humor production ability where the sexes have
different averages, then one will expect relatively more men the farther out in the right
tail one looks, see Figure 1. Hence, another way to study the humor production ability
× sex link is to focus on high performers, i.e. comedians (see also [5] for a study of
comedians). The goal of the present study was to examine the sex ratio of comedians.
Footnote 1: Corresponding author: [email protected]
Figure 1: Hypothetical model. Overlapping normal distributions of humor production
ability. Men are in blue, women in red. The vertical line shows the cutoff for a high
scoring subgroup.
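The tail argument behind Figure 1 can be sketched numerically. The snippet below is a minimal illustration, assuming two normal distributions with equal standard deviations and equal group sizes; the 0.3 SD gap and the function names are hypothetical and are not estimates from this study.

```python
import math

def upper_tail(cutoff, mean, sd=1.0):
    """P(X > cutoff) for X ~ Normal(mean, sd), via the complementary error function."""
    return 0.5 * math.erfc((cutoff - mean) / (sd * math.sqrt(2)))

def male_share_above(cutoff, male_mean, female_mean, sd=1.0):
    """Share of men among people above the cutoff, assuming equal group sizes."""
    men = upper_tail(cutoff, male_mean, sd)
    women = upper_tail(cutoff, female_mean, sd)
    return men / (men + women)

# Hypothetical gap of 0.3 SD between the group means (illustrative only).
for cutoff in (0.0, 1.0, 2.0, 3.0):
    share = male_share_above(cutoff, male_mean=0.3, female_mean=0.0)
    print(f"cutoff {cutoff:+.1f} SD -> male share {share:.2f}")
```

Under these assumptions the male share rises as the cutoff moves to the right, which is the pattern the hypothetical model predicts for a high-scoring subgroup.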
2 General method
The Internet is full of useful information, but not all of it is found in an easily available
form. The process of data mining is to gather and combine data from sources so as to
make it available for further use[6].
The English Wikipedia is a very comprehensive encyclopedia with about 5 million
articles as well as a large number of supporting pages, totaling about 33 million. It features
comprehensive lists of various types of people including comedians[7] which is the page we
mined for this study[8].
The list of comedians contains the names of, and links to, all the individual articles about
the comedians. We wrote a Python script (see footnote 2) to extract all the names of
individual comedians (avoiding comedy groups and comedy writers). The next problem was
how to determine the sex of the person in the article, since Wikipedia does not consistently
list people’s sex. English has gendered pronouns (he, his, him are male pronouns; she, her,
hers are female), which made it possible to use these as a proxy for gender. The method is
simple: count all the pronouns and group them by sex, then assign the sex based on the
largest group. Initial testing revealed that this was a reliable method to ascertain the sex of
the person in the article automatically. For instance, the article about Alex Agnew has 21
pronouns, all of which are male, so he is assigned as male by this method.
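The pronoun-counting rule can be sketched as follows. This is a minimal illustration and not the authors' script: the function names are hypothetical, the pronoun sets are the six listed above, and returning no label on a tie is an assumption.

```python
import re

# Pronoun sets as listed in the text.
MALE_PRONOUNS = {"he", "his", "him"}
FEMALE_PRONOUNS = {"she", "her", "hers"}

def count_pronouns(text):
    """Return (male_count, female_count) using whole-word, case-insensitive matches."""
    words = re.findall(r"[a-z]+", text.lower())
    male = sum(1 for w in words if w in MALE_PRONOUNS)
    female = sum(1 for w in words if w in FEMALE_PRONOUNS)
    return male, female

def guess_sex(text):
    """Assign the sex with the larger pronoun count; None on a tie or when no pronouns occur."""
    male, female = count_pronouns(text)
    if male > female:
        return "male"
    if female > male:
        return "female"
    return None

# Hypothetical article text, used only to exercise the functions.
sample = "He began his career early. Critics praised him, and she agreed."
print(count_pronouns(sample), guess_sex(sample))
```

Splitting on letters only means that words such as "there" or "here" are not miscounted as "her" or "he".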
Furthermore, we added code that captures the year the person was born. After testing
that the code worked on a small test sample, we fetched the data for the entire list of
comedians.
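The text does not say how the birth year was captured. One possible approach, shown here purely as an assumption, is to read the "YYYY births" category that Wikipedia articles about people usually carry in their wikitext; the function name and the sample string are hypothetical.

```python
import re

# Assumption: the article wikitext contains a category such as
# [[Category:1975 births]] or [[Category:1975 births|Surname, Name]].
BIRTH_CATEGORY = re.compile(r"\[\[Category:(\d{4}) births(?:\|[^\]]*)?\]\]")

def birth_year(wikitext):
    """Return the birth year as an int, or None if no birth category is found."""
    match = BIRTH_CATEGORY.search(wikitext)
    return int(match.group(1)) if match else None

# Hypothetical wikitext fragment.
print(birth_year("Some article text. [[Category:1975 births|Doe, John]]"))
```

Articles without such a category would yield None, which fits the fact that birth year data are missing for part of the sample.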
Footnote 2: Python is a free, very versatile and easily learnable language which can be used for
many sorts of data mining, data processing, and much more. Official homepage:
https://www.python.org/. Introductory tutorial: http://www.codecademy.com/tracks/python
3 Results, further analysis and discussion
The basic results are shown in Table 1. Men are heavily overrepresented as Hitchens
claimed.
Statistic                       Value
Total                           1106
Male                            923
Percent male                    83.45%
Female                          182
Percent female                  16.46%
Missing sex                     1
Total with birth year data      1033
Percent with birth year data    93.40%
Table 1: Descriptive statistics. Average across all years.
The oldest comedian in the sample was born in 1723 and the youngest in 1993. Perhaps
things have changed and men are no longer overrepresented in the later years. We wrote
further code to calculate the male proportion as a moving average (by comedians’ birth
years). The next question was how large an interval to base the moving average on.
Setting the interval size to 10 reduces the sample size of some years to <20, which
introduces a large amount of sampling error (erratic movement in the figures). Conversely,
setting the interval to e.g. 30 risks hiding real but small peaks and dips that may occur
within a 30 year period. We experimented with different intervals and found 20 to be large
enough to produce yearly samples of N>20 but small enough to show smaller peaks and dips.
The 20 year moving average is shown in Figure 2. The Appendix contains the moving
averages for 5, 10 and 30 year intervals (US only, see below). The curious reader can use
the dataset and code to calculate any other moving average they desire.
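The moving-average calculation can be sketched as below. This is a minimal illustration under stated assumptions: each record is a hypothetical (birth_year, sex) pair, a window ending in year Y covers birth years Y - interval through Y inclusive, and windows with fewer than min_n comedians are dropped. The authors' actual code may define the window differently.

```python
def moving_male_share(records, interval=20, min_n=20):
    """Male proportion per window of birth years.

    records: iterable of (birth_year, sex) with sex in {"male", "female"}.
    Returns {end_year: (male_share, n)} for windows [end_year - interval, end_year]
    that contain at least min_n comedians.
    """
    usable = [(y, s) for y, s in records if y is not None and s in ("male", "female")]
    if not usable:
        return {}
    first = min(y for y, _ in usable)
    last = max(y for y, _ in usable)
    shares = {}
    for end in range(first + interval, last + 1):
        in_window = [s for y, s in usable if end - interval <= y <= end]
        n = len(in_window)
        if n >= min_n:
            shares[end] = (in_window.count("male") / n, n)
    return shares

# Synthetic records (not the study's data): one man and one woman per birth year.
toy = [(year, sex) for year in range(1950, 1971) for sex in ("male", "female")]
print(moving_male_share(toy, interval=20, min_n=20))
```

Changing interval to 5, 10 or 30 reproduces the trade-off described above: shorter windows react faster but more of them fall below the minimum sample size.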
Figure 2: Male proportion of comedians over time.
The results show clearly that the male proportion has been falling since the start of
the 1900s, but that comedians born in the last 20 years are still about 71% male. There
are interesting patterns in the data. Clearly something caused the upward bulge
around 1975, after which the downward trend resumed.
To properly test any such hypotheses, one must either have the variables of interest for
most of the countries in the sample, weighted accordingly, or limit the sample to some large
country. Since most of the sample was presumably from the US, we wrote additional
code to check whether the person was from the US (see footnote 3). Results showed that at
least 65.6% (725) of the sample was described as being American. The 20 year moving
average of American comedians is shown in Figure 3.
Figure 3: Male proportion of US comedians over time.
As can be seen, the decrease in the male proportion of comedians has been stronger in
the US sub-sample. The datapoint for the US sub-sample covering the years 1979-1999
reaches a low point of .63, while the corresponding datapoint in the full sample only reaches
.7. However, the sample sizes are too small for this to be of much significance. Total N for
1979-1999 is 70 whereof 51 are from the US, leaving 19 in the non-US sub-sample (Fisher’s
exact test yields a two-tailed p of .040).
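For readers who want to rerun this kind of comparison, a two-tailed Fisher's exact test for a 2x2 table can be written with the standard library alone. The sketch below assumes the common definition of the two-tailed p value (summing the probabilities of all tables with the same margins that are no more likely than the observed one). The cell counts behind the reported p value are not given in this section, so the example call uses toy counts.

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small tolerance so that ties with the observed probability are included.
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= p_observed * (1 + 1e-9))

# Toy table (not the study's counts): rows are groups, columns are male/female.
print(round(fisher_exact_two_tailed(3, 1, 1, 3), 3))  # 0.486
```

To apply it here, the rows would be the US and non-US sub-samples and the columns the male and female counts for the 1979-1999 window.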
What explains these decreases in the male proportion of comedians? Surely, it has
something to do with the general change of sex roles in society. This variable is, however,
not very concrete and thus does not permit a correlational analysis. Instead one must look
for variables with available data. Possibilities include: birth rates, abortion rates, the size
of the male-female wage gap, and the average education level of women.
Before considering potential explanatory variables we note that it is easy to produce
variables that yield high correlations. This is because there is a strong general negative
tendency in the data and any other dataset with such a tendency will produce a high
correlation. For instance, merely correlating the male proportion over time with
incrementally increasing numbers [1, 2, 3, ..., n] produces a correlation of .92. For this
reason, one must take care to use only predictors that have some theoretical grounding. At the same
time, predictors must be found that have a general tendency upwards/downwards but
which have a bulge at some point that fits with the increase in the male proportion in the
1960-70s (using 20 year moving average).
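The warning about trend-driven correlations can be illustrated with synthetic data. The sketch below assumes a made-up, steadily declining series with small random noise (not the comedian data) and correlates it with the counting numbers 1 to n; the helper name pearson_r is hypothetical.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)
n = 60
# Synthetic series: a decline from 0.95 to 0.70 plus small noise (illustrative only).
series = [0.95 - 0.25 * i / (n - 1) + random.gauss(0, 0.02) for i in range(n)]
index = list(range(1, n + 1))

r = pearson_r(series, index)
print(f"correlation with [1, 2, ..., n]: {r:.2f}")
```

The sign is negative here because the synthetic series declines while the index rises. The point is the size of the correlation: any variable with a steady trend will correlate strongly with another trending variable, whether or not the two are related.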
Footnote 3: The specific method is this: Wikipedia articles are assigned to many categories, some
of which have to do with belonging to a certain country. We used this fact to find the comedians
that were in a category beginning with “Category:American”. For example, Bud Abbott is listed in
the category “Category:American male film actors”, which is detected by our method. This method
probably misses some Americans who are not assigned to a category of Americans.
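The category check described in this footnote amounts to a prefix test. A minimal sketch, assuming the category names of an article are already available as a list of strings (the function name and the second sample category are hypothetical):

```python
def is_american(categories):
    """True if any category name begins with 'Category:American'."""
    return any(name.startswith("Category:American") for name in categories)

# The first category is the example from the text; the second is made up.
print(is_american(["Category:American male film actors", "Category:Example category"]))
print(is_american(["Category:Example category"]))
```

As the footnote notes, this is a lower bound: a person with no category starting with that prefix is counted as non-American.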
3.1 Abortion rate
Conceivably, abortion rates are a proxy for women’s liberties in the US. The US legalized
abortion in 1973[9], and data are available from that year onward; they are shown in
Figure 4.
Figure 4: The abortion rate in the US over time. Based on data from [10, 11]
While there is an increase in the abortion rate in the years 1973 to 1978 which fits
somewhat with the comedian data, the fit is not very good. The correlation is .51 (N=28).
However, these data are clearly too limited since there are no data before the change in
comedian composition. A better test might be possible if data could be obtained that
included estimates of illegal abortions before 1973.
3.2 Total fertility rate
Total fertility rate (TFR, births per woman) is a crude measure of how much time women
have to pursue interests other than taking care of children. Fertility data are available for
the US for the period 1917 to 2012 and are shown in Figure 5 (see footnote 4).
Footnote 4: When combining the sources, we noticed that they disagree slightly for the 4
overlapping years 1970-1973. We consulted the Centers for Disease Control, who told us to go
with the newer source in case of discrepancies.
Figure 5: The total fertility rate in the US over time. Based on data from the Centers for
Disease Control.[12, Table 1A] [13, Table 4]
There is clearly an upward bulge in the TFR around 1945-1965 known as the Baby
Boom[14]. However, the years do not fit, as the change in the male proportion of comedians
came much later. One solution to this is to time-lag the correlation so as to produce the
best fit. For the 20 year interval data, the time-lag to produce the highest correlation is
22 years, r=.77. The relationship is shown in Figure 6.
Figure 6: Time series showing male proportion of US comedians over time (moving average
interval 20) and the total fertility in the US lagged by 22 years.
We are not sure how much weight to put on this finding. Lagging a correlation like
this reduces the sample size. The N for the years in Figure 6 is reduced to 62
(fertility data 1917-1978 and comedian data 1939-2000).
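The lag search and its effect on N can be sketched as follows. This is a minimal illustration on synthetic series, assuming both variables are stored as hypothetical year-to-value dictionaries and that the predictor in year Y is paired with the outcome in year Y + lag; it is not the authors' code.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlations(predictor, outcome, max_lag=30, min_n=10):
    """Return {lag: (r, n)}, pairing predictor[year] with outcome[year + lag]."""
    results = {}
    for lag in range(max_lag + 1):
        years = sorted(y for y in predictor if y + lag in outcome)
        if len(years) < min_n:
            continue
        xs = [predictor[y] for y in years]
        ys = [outcome[y + lag] for y in years]
        results[lag] = (pearson_r(xs, ys), len(years))
    return results

def bump(year):
    """Synthetic bell-shaped bulge centred on 1930 (illustrative only)."""
    return math.exp(-((year - 1930) / 8) ** 2)

# The synthetic outcome is the predictor delayed by 5 years.
predictor = {y: bump(y) for y in range(1900, 1961)}
outcome = {y: bump(y - 5) for y in range(1900, 1961)}

results = lagged_correlations(predictor, outcome)
best = max(results, key=lambda lag: results[lag][0])
print(best, results[best], results[0])
```

Each added year of lag removes one usable pair of years, which is why the overlapping N shrinks as the lag grows, as with the 62 years that remain at a lag of 22 above. Picking the lag with the highest correlation also inflates the fit, so the resulting r should be read with caution.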
Does the finding work with the other moving averages? To some degree. The
corresponding analysis for interval 10 is shown in Figure 7. The best fit is when the lag is
20 years, not 22 as before. N=56 and r=.50.
Figure 7: Time series showing male proportion of US comedians over time (moving average
interval 10) and the total fertility in the US lagged by 20 years.
If one uses the 5 year interval data, the best fitting delay is 13 years and r=.56 (N=48),
as shown in Figure 8.
Figure 8: Time series showing male proportion of US comedians over time (moving average
interval 5) and the total fertility in the US lagged by 13 years.
References
[1] Christopher Hitchens. Why women aren’t funny. Vanity Fair, January 2007.
[2] Christopher Hitchens. Why women still don’t get it. Vanity Fair, March 2008.
[3] Alessandra Stanley. Who says women aren’t funny? Vanity Fair, April 2008.
[4] Gil Greengross and Geoffrey Miller. Humor ability reveals intelligence, predicts
mating success, and is higher in males. Intelligence, 39(4):188-192, 2011.
[5] Gil Greengross, Rod A Martin, and Geoffrey Miller. Personality traits, intelligence,
humor styles, and humor production ability of professional stand-up comedians
compared to college students. Psychology of Aesthetics, Creativity, and the Arts,
6(1):74, 2012.
[6] Wikipedia. Data mining - Wikipedia, the free encyclopedia, 2014. URL
http://en.wikipedia.org/w/index.php?title=Data_mining&oldid=607689170.
[Online; accessed 13-May-2014].
[7] Wikipedia. Statistics. URL https://en.wikipedia.org/wiki/Wikipedia:Statistics.
[8] Wikipedia. List of comedians - Wikipedia, the free encyclopedia, 2014. URL
http://en.wikipedia.org/w/index.php?title=List_of_comedians&oldid=604966226.
[Online; accessed 21-April-2014].
[9] Wikipedia. Roe v. Wade - Wikipedia, the free encyclopedia, 2014. URL
http://en.wikipedia.org/w/index.php?title=Roe_v._Wade&oldid=606243847.
[Online; accessed 13-May-2014].
[10] Rachel K Jones and Kathryn Kooistra. Abortion incidence and access to services in
the United States, 2008. Perspectives on Sexual and Reproductive Health, 43(1):41-50,
2011.
[11] Rachel K Jones and Jenna Jerman. Abortion incidence and service availability in the
United States, 2011. Perspectives on Sexual and Reproductive Health, 46(1):3-14, 2014.
[12] Robert L Heuser, National Center for Health Statistics (US), et al. Fertility tables for
birth cohorts by color: United States, 1917-73. US Department of Health, Education,
and Welfare, Public Health Service, Health Resources Administration, National
Center for Health Statistics, 1976. URL
http://www.cdc.gov/nchs/data/misc/fertiltbacc.pdf.
[13] Joyce A Martin, Brady E Hamilton, Stephanie J Ventura, Michelle JK Osterman,
and TJ Matthews. Births: final data for 2011. National Vital Statistics Report,
62(1), 2013. URL http://www.cdc.gov/nchs/data/nvsr/nvsr62/nvsr62_09.pdf.
[14] Wikipedia. Post–World War II baby boom - Wikipedia, the free encyclopedia, 2014.
URL http://en.wikipedia.org/w/index.php?title=Post%E2%80%93World_War_II_baby_boom&oldid=605675368.
[Online; accessed 19-May-2014].
Appendix
The lines only show the years for which N>=20.
Figure 9: Male proportion of US comedians over time.
Figure 10: Male proportion of US comedians over time.
Figure 11: Male proportion of US comedians over time.