Research article. Submitted to Open Differential Psychology DATE, YEAR. Published in Open Differential Psychology DATE, YEAR.

The sex ratio of comedians over time

Emil O. W. Kirkegaard¹, Bo Tranberg

Abstract
Data obtained from Wikipedia’s list of comedians show that most comedians are/were male (about 84%), but that this proportion has been falling in recent decades and is now about 72% for the last 20 years.

Keywords: humor production, humor ability, entertainment, sex differences, comedy

1 Introduction
The late Christopher Hitchens provoked a minor controversy in 2007 when he claimed that “women aren’t funny”[1, 2, 3]. Hitchens was not making the claim that no women are funny, but merely the claim that “men, taken on average and as a whole, [are] funnier than women”. Hitchens argues his case using evolutionary psychological reasoning. He claims that women find humor ability more attractive in a mate than men do. From this one might hypothesize that men are better at humor production. This might be because of better innate humor production ability, innate dispositions towards training humor production, or because men notice that women like funnier men and hence practice humor in order to impress mates. Whichever is the case, men should be higher in humor production ability.

Hitchens did not cite any actual statistics to back up his claim, so on the basis of his article the claim is best seen as plausible but speculative. Are there any data on the matter? Yes, at least one study has examined humor production ability, and it did find that men scored higher (N=400, college students)[4].

If one assumes a normal distribution of humor production ability where the sexes have different averages, then one will expect relatively more men the farther out in the right tail one looks, see Figure 1. Hence, another way to study the humor production ability × sex link is to focus on high performers, i.e. comedians (see also [5] for a study of comedians).
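The tail argument can be illustrated numerically. The following is a minimal sketch in Python 3, not part of the study's own code. The mean difference of 0.5 standard deviations, the equal variances and the equal group sizes are all hypothetical values chosen for illustration only.

```python
from statistics import NormalDist


def male_share_above(cutoff, d=0.5, sd=1.0):
    """Share of men among people scoring above `cutoff`.

    Assumptions (hypothetical, for illustration only): both sexes are
    normally distributed with the same standard deviation `sd`, the male
    mean is `d` units above the female mean (set to 0), and the two
    groups are equally large.
    """
    women = NormalDist(mu=0.0, sigma=sd)
    men = NormalDist(mu=d, sigma=sd)
    men_tail = 1.0 - men.cdf(cutoff)      # fraction of men above the cutoff
    women_tail = 1.0 - women.cdf(cutoff)  # fraction of women above the cutoff
    return men_tail / (men_tail + women_tail)


if __name__ == "__main__":
    # The male share rises as the cutoff moves further into the right tail.
    for cutoff in (0.0, 1.0, 2.0, 3.0):
        share = male_share_above(cutoff)
        print(f"cutoff={cutoff:.1f} SD  male share={share:.2f}")
```

Under these assumptions a modest difference in averages yields a steadily increasing male share as the cutoff rises, which is the pattern sketched in Figure 1.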
The goal of the present study was to examine the sex ratio of comedians.

¹ Corresponding author: [email protected]

Figure 1: Hypothetical model. Overlapping normal distributions of humor production ability. Men are in blue, women in red. The vertical line shows the cutoff for a high scoring subgroup.

2 General method
The Internet is full of useful information, but not all of it is found in an easily available form. The process of data mining is to gather and combine data from sources so as to make them available for further use[6]. The English Wikipedia is a very comprehensive encyclopedia with about 5 million articles as well as a large number of supporting pages, totaling about 33 million. It features comprehensive lists of various types of people, including comedians[7], which is the page we mined for this study[8].

The list of comedians contains the names of, and links to, all the individual articles about the comedians. We wrote a Python² script to extract all the names of individual comedians (avoiding comedy groups and comedy writers).

The next problem was how to determine the sex of the person in each article, since Wikipedia does not consistently list people’s sex. English has gendered pronouns (he, his, him are male pronouns; she, her, hers are female), which made it possible to use these as a proxy for sex. The method is simple: count all the pronouns and group them by sex, then assign the sex based on the largest group. Initial testing revealed that this was a reliable method to ascertain the sex of the person in the article automatically. For instance, the article about Alex Agnew has 21 pronouns, all of which are male, so he is assigned as male by this method.

Furthermore, we added code that captures the year the person was born. After testing that the code worked on a small test sample, we fetched the data for the entire list of comedians.
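The pronoun-counting rule described above can be sketched as follows. This is an illustrative reconstruction in Python 3, not the authors' original script. The function name `classify_sex` and the choice to return `None` on a tie are assumptions made for this example.

```python
import re
from collections import Counter

# Pronoun lists as given in the text.
MALE_PRONOUNS = {"he", "his", "him"}
FEMALE_PRONOUNS = {"she", "her", "hers"}


def classify_sex(article_text):
    """Assign a sex from the majority of gendered pronouns in the text.

    Returns "male", "female", or None when the counts are tied
    (including the case where no gendered pronouns occur). Returning
    None on a tie is an assumption of this sketch.
    """
    words = re.findall(r"[a-z]+", article_text.lower())
    counts = Counter()
    for word in words:
        if word in MALE_PRONOUNS:
            counts["male"] += 1
        elif word in FEMALE_PRONOUNS:
            counts["female"] += 1
    if counts["male"] == counts["female"]:
        return None
    return "male" if counts["male"] > counts["female"] else "female"


if __name__ == "__main__":
    sample = "He began his career in 1990. Critics praised him."
    print(classify_sex(sample))
```

The sketch takes plain article text as input. Fetching the article and extracting the birth year are left out because the text does not describe how those steps were implemented.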
² Python is a free, very versatile and easily learnable language which can be used for many sorts of data mining, data processing, and much more. Official homepage: https://www.python.org/. Introductory tutorial: http://www.codecademy.com/tracks/python

3 Results, further analysis and discussion
The basic results are shown in Table 1. Men are heavily overrepresented, as Hitchens claimed.

Total                           1106
Male                             923
Percent male                  83.45%
Female                           182
Percent female                16.46%
Missing sex                        1
Total with birth year data      1033
Percent with birth year data  93.40%

Table 1: Descriptive statistics. Average across all years.

The oldest comedian in the sample was born in 1723 and the youngest in 1993. Perhaps things have changed and men are no longer overrepresented in the later years. We wrote further code to calculate the male proportion as a moving average (by comedians’ birth years). The next question was how large an interval to base the moving average on. Setting the interval size to 10 reduces the sample size of some years to <20, which introduces a large amount of sampling error (erratic movement in the figures). Conversely, setting the interval to e.g. 30 risks hiding real but small peaks and dips that may occur over a 30 year period. We experimented with different intervals and found 20 to be large enough to produce yearly samples of N>20 but small enough to show smaller peaks and dips. The 20 year moving average is shown in Figure 2. The Appendix contains the moving averages for 5, 10 and 30 year intervals (US only, see below). The curious reader can use the dataset and code to calculate any other moving average desired.

Figure 2: Male proportion of comedians over time.

The results show clearly that the male proportion has been falling since the start of the 1900s, but that comedians born in the last 20 years are still about 71% male. There are interesting patterns in the data.
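The moving-average calculation can be sketched as below. This is a minimal Python 3 illustration, not the authors' code. It assumes a trailing window (each data point covers the `window` birth years ending at the given year) and drops years whose window holds fewer than `min_n` comedians. The text does not specify the window alignment, so that choice is an assumption.

```python
def moving_male_proportion(records, window=20, min_n=20):
    """Male proportion by birth year over a trailing window.

    records: iterable of (birth_year, sex) pairs, sex being "male" or
             "female".
    window:  number of birth years covered by each data point
             (trailing alignment is an assumption of this sketch).
    min_n:   data points based on fewer comedians than this are dropped.

    Returns a dict mapping the last year of each window to the male
    proportion among comedians born within that window.
    """
    records = list(records)
    if not records:
        return {}
    years = [year for year, _ in records]
    result = {}
    for end in range(min(years), max(years) + 1):
        start = end - window + 1
        in_window = [sex for year, sex in records if start <= year <= end]
        if len(in_window) >= min_n:
            n_male = sum(1 for sex in in_window if sex == "male")
            result[end] = n_male / len(in_window)
    return result


if __name__ == "__main__":
    # Tiny made-up sample, only to show the call; not data from the study.
    toy = [(1950, "male")] * 3 + [(1951, "female")]
    print(moving_male_proportion(toy, window=2, min_n=1))
```

Varying `window` (e.g. 5, 10, 30) shows the trade-off discussed above between sampling noise and smoothing away short-lived changes.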
Clearly something happened that caused the upward bulge around 1975, but it then stopped, so that the downward trend resumed. To properly test any hypotheses about such patterns, one must either have the variables of interest for most of the countries in the sample, weighted accordingly, or limit the sample to some large country. Since presumably most of the sample was from the US, we wrote additional code to check whether the person was from the US³. Results showed that at least 65.6% (725) of the sample was described as being American. The 20 year moving average of American comedians is shown in Figure 3.

Figure 3: Male proportion of US comedians over time.

As can be seen, the decrease in the male proportion of comedians has been stronger in the US sub-sample. The datapoint for the US sub-sample covering the years 1979-1999 reaches a low point of .63, while the corresponding datapoint in the full sample only reaches .7. However, the sample sizes are too small for this to be of much significance. Total N for 1979-1999 is 70, of which 51 are from the US, leaving 19 in the non-US sub-sample (Fisher’s exact test yields a two-tailed p of .040).

What explains these decreases in the male proportion of comedians? Surely, it has something to do with the general change of sex roles in society. This variable is, however, not very concrete and thus does not permit a correlational analysis. Instead one must look for variables with available data. Possibilities include: birth rates, abortion rates, the size of the male-female wage gap, and the average education level of women.

Before considering potential explanatory variables we note that it is easy to produce variables that yield high correlations. This is because there is a strong general downward trend in the data, and any other dataset with such a trend will produce a high correlation. For instance, merely correlating the male proportion over time with incrementally increasing numbers [1, 2, 3, ..., n] produces a correlation of .92.
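The point about trend-driven correlations can be demonstrated with synthetic data. The sketch below is written in Python 3 and uses a made-up downward-trending series plus noise, not the study's data. The slope, noise level and series length are arbitrary assumptions. The sign of the correlation is negative here because the synthetic series falls while the index rises; the magnitude is what matters for the argument.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


random.seed(0)  # fixed seed so the sketch is reproducible
n = 80          # hypothetical number of yearly data points

# Synthetic "male proportion": a steady decline plus random noise.
series = [0.95 - 0.003 * t + random.gauss(0, 0.02) for t in range(n)]

# The "predictor" is nothing but the counting numbers 1..n.
index = list(range(1, n + 1))

r = pearson(index, series)

if __name__ == "__main__":
    print(f"correlation with a bare index: {r:.2f}")
```

Any predictor that merely trends over the same period would correlate about as strongly, which is why a shared trend alone is weak evidence of a substantive relationship.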
For this reason, one must be careful to use only predictors that have some theoretical grounding. At the same time, predictors must be found that have a general upward or downward trend but which also have a bulge at some point that fits with the increase in the male proportion in the 1960-70s (using the 20 year moving average).

³ The specific method is this: Wikipedia articles are assigned to many categories, some of which indicate belonging to a certain country. We used this fact to find the comedians that were in a category beginning with “Category:American”. For example, Bud Abbott is listed in the category “Category:American male film actors”, which is detected by our method. This method probably misses some Americans who are not assigned to a category of Americans.

3.1 Abortion rate
Conceivably, abortion rates are a proxy for women’s liberties in the US. The US legalized abortion in 1973[9], and data are available from that year onward, as shown in Figure 4.

Figure 4: The abortion rate in the US over time. Based on data from [10, 11]

While there is an increase in the abortion rate in the years 1973 to 1978 which fits somewhat with the comedian data, the fit is not very good. The correlation is .51 (N=28). However, these data are clearly too limited, since there are no data from before the change in comedian composition. A better test might be possible if data could be obtained that included estimates of illegal abortions before 1973.

3.2 Total fertility rate
Total fertility rate (TFR, births per woman) is a crude measure of how much time women have to pursue interests other than taking care of children. Fertility data are available for the US for the period 1917 to 2012 and are shown in Figure 5.⁴

⁴ When combining the sources, we noticed that they disagree slightly for the 4 overlapping years 1970-1973. We consulted the Centers for Disease Control and they told us to go with the newer source in case of discrepancies.

Figure 5: The total fertility rate in the US over time.
Based on data from the Centers for Disease Control.[12, Table 1A] [13, Table 4]

There is clearly an upward bulge in the TFR around 1945-1965, known as the Baby Boom[14]. However, the years do not fit, as the change in the male proportion of comedians came much later. One solution to this is to time-lag the correlation so as to produce the best fit. For the 20 year interval data, the time-lag that produces the highest correlation is 22 years, r=.77. The relationship is shown in Figure 6.

Figure 6: Time series showing male proportion of US comedians over time (moving average interval 20) and the total fertility in the US lagged by 22 years.

We are not sure how much weight to put on this finding. Lagging a correlation like this reduces the sample size. The N for the years in Figure 6 is reduced to 62 (fertility data 1917-1978 and comedian data 1939-2000).

Does the finding work with the other moving averages? To some degree. The corresponding analysis for interval 10 is shown in Figure 7. The best fit is when the lag is 20 years, not 22 as before. N=56 and r=.50.

Figure 7: Time series showing male proportion of US comedians over time (moving average interval 10) and the total fertility in the US lagged by 20 years.

If one uses the 5 year interval data, the best fitting lag is 13 years and r=.56 (N=48), as shown in Figure 8.

Figure 8: Time series showing male proportion of US comedians over time (moving average interval 5) and the total fertility in the US lagged by 13 years.

References
[1] Christopher Hitchens. Why women aren’t funny. Vanity Fair, January 2007.
[2] Christopher Hitchens. Why women still don’t get it. Vanity Fair, March 2008.
[3] Alessandra Stanley. Who says women aren’t funny? Vanity Fair, April 2008.
[4] Gil Greengross and Geoffrey Miller. Humor ability reveals intelligence, predicts mating success, and is higher in males. Intelligence, 39(4):188--192, 2011.
[5] Gil Greengross, Rod A Martin, and Geoffrey Miller.
Personality traits, intelligence, humor styles, and humor production ability of professional stand-up comedians compared to college students. Psychology of Aesthetics, Creativity, and the Arts, 6(1):74, 2012.
[6] Wikipedia. Data mining --- wikipedia, the free encyclopedia, 2014. URL http://en.wikipedia.org/w/index.php?title=Data_mining&oldid=607689170. [Online; accessed 13-May-2014].
[7] Wikipedia. Statistics. Statistics. URL https://en.wikipedia.org/wiki/Wikipedia:
[8] Wikipedia. List of comedians --- wikipedia, the free encyclopedia, 2014. URL http://en.wikipedia.org/w/index.php?title=List_of_comedians&oldid=604966226. [Online; accessed 21-April-2014].
[9] Wikipedia. Roe v. wade --- wikipedia, the free encyclopedia, 2014. URL http://en.wikipedia.org/w/index.php?title=Roe_v._Wade&oldid=606243847. [Online; accessed 13-May-2014].
[10] Rachel K Jones and Kathryn Kooistra. Abortion incidence and access to services in the united states, 2008. Perspectives on sexual and reproductive health, 43(1):41--50, 2011.
[11] Rachel K Jones and Jenna Jerman. Abortion incidence and service availability in the united states, 2011. Perspectives on sexual and reproductive health, 46(1):3--14, 2014.
[12] Robert L Heuser, National Center for Health Statistics (US), et al. Fertility tables for birth cohorts by color: United States, 1917-73. US Department of Health, Education, and Welfare, Public Health Service, Health Resources Administration, National Center for Health Statistics, 1976. URL http://www.cdc.gov/nchs/data/misc/fertiltbacc.pdf.
[13] Joyce A Martin, Brady E Hamilton, Stephanie J Ventura, Michelle JK Osterman, and TJ Matthews. Births: final data for 2011. National Vital Statistics Report, 62(1), 2013. URL http://www.cdc.gov/nchs/data/nvsr/nvsr62/nvsr62_09.pdf.
[14] Wikipedia. Post–world war ii baby boom --- wikipedia, the free encyclopedia, 2014. URL http://en.wikipedia.org/w/index.php?title=Post%E2%80%93World_War_II_baby_boom&oldid=605675368.
[Online; accessed 19-May-2014].

Appendix
The lines only show the years for which N>=20.

Figure 9: Male proportion of US comedians over time.

Figure 10: Male proportion of US comedians over time.

Figure 11: Male proportion of US comedians over time.