Proceedings of The National Conference On Undergraduate Research (NCUR) 2012
Weber State University, Ogden, Utah
March 29 – 31, 2012

Does the Normal Curve Accurately Model the Distribution of Intelligence?

Lindsey R. Godwin and Kyle V. Smith
Department of Behavioral Science
Utah Valley University
800 West University Parkway
Orem, UT 84058

Faculty Advisor: Dr. Russell T. Warne

Abstract

Like many human characteristics, intelligence is theorized to be normally distributed. However, a vocal minority of researchers and practitioners who study individuals with high intelligence have claimed that there are more people in the upper echelons of intelligence than would be expected if the normal curve accurately modeled the distribution of intelligence scores [1,2,3,4]. To test this claim we searched articles from the journal Intelligence dated 1979 to 2012, completed an academic journal search, and reviewed national data sets for samples that permit this claim to be tested. To be included, a sample must have (a) been representative of the population on which the intelligence test was normed, (b) not been the test’s norm sample, (c) included at least 1,000 subjects, and (d) been examined with an intelligence test whose norms were no more than 15 years old. This search yielded one such sample, used in a study by Guldemond et al. [5]. Two national data sets were also identified for use in this review: the National Longitudinal Survey of Youth (NLSY) and the Early Childhood Longitudinal Study (ECLS). We reviewed the information provided by these sources and determined that intelligence is indeed normally distributed.

Keywords: intelligence, normal distribution, human populations

1. Introduction

The scientific study of intelligence dates back to the nineteenth century. Although it is one of the oldest areas of study in the psychological field, there are still many debates over the nature of intelligence.
In an effort to quell some of the disagreements in the field and correct popular misconceptions, a group of scientists signed a statement affirming some of the mainstream findings within the field [6]. Many issues were addressed in this statement, including a definition of intelligence:

    Intelligence is a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. It is not merely book learning, a narrow academic skill, or test-taking smarts. Rather it reflects a broader and deeper capability for comprehending our surroundings – “catching on,” “making sense” of things, or “figuring out” what to do. [6, p. 13]

This statement on “mainstream” understandings of intelligence provided a general consensus on many topics. For example, there are several different tests available for measuring intelligence. All are positively correlated due to the existence of Spearman’s g, the general factor of intelligence [7]. Gottfredson et al. [6] also presented a consensus on one area that has often been debated: the distribution of intelligence. The 52 signers of the “mainstream” document stated that intelligence is “normally distributed” [6, p. 13]. This idea concurred with Terman’s findings from nearly a century ago. When adapting Binet’s test for an American audience, Terman theorized that intelligence was normally distributed, and his observations of 905 children in his norm sample generally supported this assertion [8].

Yet a small group of theorists today insist that intelligence is not normally distributed but rather is skewed, with a higher number of individuals in the upper echelons than would be present under the normal curve [2,3,4,9,10,11,12]. This idea of a non-normal distribution of intelligence is nearly as old as the claim that intelligence follows a normal distribution.
Terman, in the first volume of his longitudinal study of gifted children, was the first to voice dissent against the theory of a normal distribution of intelligence. He stated, “The number of very high cases is larger than the standard deviation of the I.Q. distribution for unselected children would lead one to expect. It is doubtful therefore whether the incidence of superior intelligence follows the normal probability curve” [12]. Since that time, other researchers have noted the same unexpected outcome, leading them to believe that the number of individuals with greater than average intelligence is higher than a normal curve indicates. These findings coincide with those of another critic of the normality hypothesis, Sir Cyril Burt, who found that there are more than ten times as many individuals with IQs over 160 as would occur under the assumption of normality. Burt determined that the true distribution was much more highly peaked, with elongated tails on both ends [10]. Several other researchers have found a similar pattern in their intelligence research [1,2,3,10].

Although the idea that the normal curve does not fit has been supported by some of the most prominent figures in intelligence research over the past century, there has been no change in the general assertion that intelligence is normally distributed, as demonstrated by the widespread support for the mainstream statement on intelligence [6]. Is this because the idea is misguided, or because the field has not yet come to recognize its accuracy? When differing results exist, how can one determine the accurate distribution? The best answer would be to conduct an extensive study with a representative sample and examine the resulting distribution. However, such studies can be very costly because of the extensive time and resources required to acquire such a sample, test subjects, and tabulate results.
A reasonable alternative is to conduct an extensive, systematic review of existing literature and data sets to identify samples that will allow this theory to be tested. Our goal in this paper is to examine appropriate data sets and reports from the literature in order to test the theory that intelligence is normally distributed.

2. Methods

2.1 Literature Review

An in-depth literature review was conducted, using a variety of search criteria to identify articles that showed a distribution of intelligence scores in the upper levels of cognitive ability. The literature review included a thorough search of all the articles published in the journal Intelligence from 1977 to early 2012. A search of articles indexed in academic search engines such as PsycInfo and ScienceDirect was also conducted.

For an article to be included in this review, the sample must have been representative of the population on which the intelligence test was normed. However, the sample could not be the one with which the test was normed, because scores from norm samples are often forced to fit a normal distribution. We decided that a minimum of 1,000 subjects must have been included in the sample because smaller samples were unlikely to have a sufficient number of people in the top percentiles to detect violations of normality. Finally, to avoid the influence of the Flynn effect [13] (i.e., the tendency for mean intelligence scores to gradually increase in a population), the sample must have been examined with an intelligence test whose norms were no more than 15 years old. Samples were excluded if the test used had not been established as a producer of reliable (i.e., consistent) data, if the author noted difficulties during test administration or did not administer the full test, or if the sample was non-representative, such as a sample of volunteer subjects.
The exclusion of volunteers is necessary because volunteers for intelligence testing often have above-average intelligence [14]. Inclusion of such subjects could create selection bias.

2.2 Datasets

The search for applicable samples identified two publicly available data sets: the National Longitudinal Survey of Youth (NLSY) and the Early Childhood Longitudinal Study (ECLS-K:98, hereafter referred to as ECLS). Datasets had to meet the same inclusion criteria as the samples in the literature review. In addition, the tests must have been a reasonably good measure of Spearman’s g. Finally, variables in datasets that demonstrated an obvious ceiling effect were excluded. Both the NLSY and ECLS studies met the criteria for inclusion.

As a part of the NLSY study, the Armed Services Vocational Aptitude Battery (ASVAB) was administered to participants in 1980. The ASVAB was not originally prepared as a measure of intelligence alone, but rather for assessing military preparedness and job placement. Nevertheless, its test items include arithmetic reasoning, word knowledge, paragraph comprehension, general science, and mathematical knowledge, all of which are attributed to intelligence and are generally recognized to be at least moderately related to general intelligence [15]. It should be noted that the NLSY sample was used to norm the ASVAB subtests; however, in the norming process, the subtests were not forced to conform to a normal distribution [16]. Therefore, the data set was appropriate for use in this study.

ECLS is a longitudinal study conducted at seven different points in time that measured participants’ academic achievement, among other variables, from kindergarten (in 1998) to the eighth grade (in 2007).
Students were tested at seven time points: general knowledge was tested at the start of kindergarten (C1), the end of kindergarten (C2), and the start of 1st grade (C3); reading and math were tested at the end of 1st grade (C4); reading, math, and science were tested at the end of 3rd grade (C5); and science was tested at the end of 5th grade (C6). Although ECLS subjects were examined on their academic achievement, we believe that these tests were reasonably strong correlates of g, as most academic tests are [17,18,19,20].

A Kolmogorov-Smirnov test was used to compare the data from both the NLSY and ECLS studies with a normal distribution. However, because of the large sample sizes for both datasets (the ECLS sample size was 21,409, which was weighted to have an effective sample size of 3.56 million to 3.88 million, depending on the measurement occasion), even trivial deviations from normality would be statistically significant. Therefore, we also prepared Q-Q plots that compared the distribution of each variable in the two datasets with a normal distribution.

To avoid the problems associated with statistical significance tests in large samples, we also compared the observed number of subjects in the top percentiles with the number expected in those percentiles under the normality assumption. To make these comparisons, every person’s score was converted to a z-score, which was then used to place participants in four different groups. Participants who scored in the top half percent (z-score of ≥ 2.5758) were placed in Group A, those in the top one percent (z-score of ≥ 2.3263) in Group B, those in the top two percent (z-score of ≥ 2.0537) in Group C, and those in the top five percent (z-score of ≥ 1.6449) in Group D. The proportions of participants who fell into each group were compared to the proportions that would be expected if intelligence were normally distributed.

3. Results

3.1 Literature Review Results

The search for applicable samples identified an article from the Voortgezet Onderwijs Cohort Leerlingen (VOCL) longitudinal cohort study. For the VOCL study, a representative sample of 19,391 students from 126 secondary schools in the Netherlands was collected. The participants started Grade 1 of secondary school in 1999, at an average age of 12 years, and were monitored throughout their secondary education [5]. Guldemond and his coauthors used the data collected in the first three years of the study. The sample included about 13,000 students who completed the Groningen Intelligence test for Secondary Education (GIVO-test), a Dutch intelligence test, and was a representative sample of Dutch school children. Based on their scores on the intelligence test, the students were categorized into four levels of intelligence: students with an IQ of 144 or higher, students with an IQ of 130-143, students with an IQ of 120-129, and students with an IQ between 110 and 119 [5].

It was projected that under a normal distribution there would be about 22 students (21.80) with an IQ of 144 or higher, and 20 were found. It was also projected that there would be about 318 students (317.55) with an IQ higher than 130, and 342 were found. It was projected that about 1,503 students (1,503.29) would be found with an IQ over 120, and 1,286 were found. Lastly, it was projected that about 4,786 students (4,785.70) would be found with an IQ higher than 110, and 3,462 were found. The projected and actual numbers of students with an IQ over 144 were fairly close, and there were more students than projected with an IQ over 130. However, there were far fewer students than projected with an IQ over 120, and fewer still with an IQ over 110.
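The arithmetic behind projections of this kind can be sketched in a few lines. The example below is a minimal illustration, not the procedure used in the original analysis: it assumes 13,000 examinees (the text gives only "about 13,000") and an IQ scale with mean 100 and standard deviation 15, and the names `N`, `IQ`, and `expected_at_or_above` are hypothetical. Under these assumptions the upper-tail count for an IQ of 144 or higher comes out close to the 21.80 quoted above, while the other quoted projections coincide with running totals of the tail counts (summed from the highest cut-off downward) rather than with the tail counts themselves. This is an observation about the arithmetic only.

```python
from statistics import NormalDist

# Assumptions for illustration only: the text says "about 13,000" students,
# and the IQ metric is taken to have mean 100 and SD 15.
N = 13_000
IQ = NormalDist(mu=100, sigma=15)
CUTOFFS = (144, 130, 120, 110)  # cut-offs named in the text, highest first


def expected_at_or_above(cutoff: float, n: int = N) -> float:
    """Expected number of examinees at or above `cutoff` under normality."""
    return n * (1 - IQ.cdf(cutoff))


# Upper-tail count for each cut-off.
tail_counts = {c: expected_at_or_above(c) for c in CUTOFFS}

# Running totals of the tail counts, from the highest cut-off downward.
running_totals = {}
total = 0.0
for c in CUTOFFS:
    total += tail_counts[c]
    running_totals[c] = total

for c in CUTOFFS:
    print(f"IQ >= {c}: tail count {tail_counts[c]:8.2f}   "
          f"running total {running_totals[c]:8.2f}")
```

To try a different sample size, pass `n` explicitly, for example `expected_at_or_above(130, n=19_391)`.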
These results may not show a clear pattern, but they do show that, with the exception of students with an IQ over 120, intelligence seems to generally conform to the distribution of a normal curve.

3.2 Data Set Results

The results from the data sets are listed in Table 1 (for ECLS) and Table 2 (for NLSY). Table 1 shows that the distribution of the sample’s general knowledge at the beginning of kindergarten (C1) was close to the projections of a normal distribution. However, for general knowledge at time points C2 and C3, there were far fewer subjects in the upper echelons than would have been expected from the normal distribution. Results for C4 reading and math achievement showed more high-scoring participants than projected under a normal distribution, but for C5 reading, math, and science achievement, the sample had fewer participants than anticipated in every group examined. C6 science achievement was also far lower than projected under a normal distribution; only 0.4% of students obtained z-scores that would place them in the top 2% of the normal distribution.

Table 1. Percentage of participants who scored in Groups A, B, C, and D

Test             A       B       C       D
C1 Gen Know      0.4     1.2     2.4     5.9
C2 Gen Know      0.1     0.4     1.2     4.5
C3 Gen Know      -       < 0.1   0.4     3.3
C4 Reading       1.5     2.0     4.2     7.1
C4 Math          1.3     2.2     3.6     6.9
C5 Reading       < 0.1   0.2     0.8     3.9
C5 Math          0.1     0.3     1.2     5.0
C5 Science       0.3     0.9     2.0     5.6
C6 Science       -       < 0.1   0.4     3.3

Note. Values that are ≤ the projection of a normal distribution are in boldface.
Note. A = z-score of ≥ 2.5758; B = z-score of ≥ 2.3263; C = z-score of ≥ 2.0537; D = z-score of ≥ 1.6449.
Note. Cells marked with a dash (-) indicate that the test did not have a high enough ceiling to make the comparison between observed scores and the normal distribution.

Table 2. Percentage of participants who scored in Groups A, B, C, and D

Test                  A       B       C       D
ASVAB Gen Science     -       -       -       7.2
ASVAB Arith Reason    -       -       -       8.0
ASVAB Coding Speed    -       0.5     1.5     3.4
ASVAB Math Know       -       -       -       7.5
ASVAB Mech Comp       -       -       2.2     6.8

Note. Values that are ≤ the projection of a normal distribution are in boldface.
Note. A = z-score of ≥ 2.5758; B = z-score of ≥ 2.3263; C = z-score of ≥ 2.0537; D = z-score of ≥ 1.6449.
Note. Cells marked with a dash (-) indicate that the test did not have a high enough ceiling to make the comparison between observed scores and the normal distribution.

In Table 2, results for the ASVAB subtests of general science, arithmetic reasoning, mathematical knowledge, and mechanical comprehension all showed more participants in the top groups than would be projected under a normal distribution. However, for the coding speed subtest, the number of participants was lower in every group than projected under a normal distribution. This is particularly interesting because coding speed is generally recognized as being related to intelligence and is even included as a subtest on some intelligence tests [21].

4. Discussion

Based on the results of Guldemond et al.’s [5] study and the datasets we examined, the claim that more people are located in the top percentiles of intelligence than would be expected in a normal distribution may not be supported. Of the 42 tests of the theory of normally distributed intelligence shown in Tables 1 and 2, 23 (54.8%) showed that there were fewer people than expected in the top ranges of intelligence. Because about half of our results are consistent with a skewed distribution of intelligence and about half are not, we believe that support for the assertion that there are more people than expected in the upper echelons of intelligence is inconsistent and irregular.
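The tally of 42 comparisons can be checked with a short script. The sketch below rests on several assumptions: the cell values are our reading of the flattened Tables 1 and 2, "< 0.1" entries are stored as a placeholder of 0.05, cells without a usable ceiling are stored as `None`, and a cell counts as "fewer than expected" only when it is strictly below the normal-curve percentage. The names `CUTOFFS`, `EXPECTED`, `OBSERVED`, and `tally` are hypothetical. Under this reading, the expected percentages implied by the z-score cut-offs are 0.5, 1.0, 2.0, and 5.0, and the count of cells below expectation matches the 23 of 42 reported above.

```python
from statistics import NormalDist

# z-score cut-offs for Groups A-D, taken from the table notes.
CUTOFFS = {"A": 2.5758, "B": 2.3263, "C": 2.0537, "D": 1.6449}

# Expected percentage at or above each cut-off under a standard normal curve,
# rounded to one decimal place to match the precision of the tables.
EXPECTED = {g: round((1 - NormalDist().cdf(z)) * 100, 1)
            for g, z in CUTOFFS.items()}

LT = 0.05  # placeholder for cells reported as "< 0.1" (an assumption)

# Observed percentages (A, B, C, D); None marks a cell with no usable ceiling.
OBSERVED = {
    "C1 Gen Know": (0.4, 1.2, 2.4, 5.9),
    "C2 Gen Know": (0.1, 0.4, 1.2, 4.5),
    "C3 Gen Know": (None, LT, 0.4, 3.3),
    "C4 Reading": (1.5, 2.0, 4.2, 7.1),
    "C4 Math": (1.3, 2.2, 3.6, 6.9),
    "C5 Reading": (LT, 0.2, 0.8, 3.9),
    "C5 Math": (0.1, 0.3, 1.2, 5.0),
    "C5 Science": (0.3, 0.9, 2.0, 5.6),
    "C6 Science": (None, LT, 0.4, 3.3),
    "ASVAB Gen Science": (None, None, None, 7.2),
    "ASVAB Arith Reason": (None, None, None, 8.0),
    "ASVAB Coding Speed": (None, 0.5, 1.5, 3.4),
    "ASVAB Math Know": (None, None, None, 7.5),
    "ASVAB Mech Comp": (None, None, 2.2, 6.8),
}


def tally(observed, expected):
    """Return (cells strictly below expectation, cells with data)."""
    below = total = 0
    for row in observed.values():
        for group, value in zip("ABCD", row):
            if value is None:
                continue
            total += 1
            if value < expected[group]:
                below += 1
    return below, total


below, total = tally(OBSERVED, EXPECTED)
print(EXPECTED)
print(f"{below} of {total} ({100 * below / total:.1f}%) below expectation")
```

Counting cells that are less than or equal to the expectation instead (the rule stated in the table notes) would give a slightly higher count, because two cells sit exactly on the expected value.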
In general, we believe that these findings support the claim that the distribution of IQ scores can be represented as a normal curve, as stated in the mainstream statement on intelligence [6]. It is true that some departures from normality can be seen in Tables 1 and 2. However, there is no consistent pattern to these departures from the normal distribution. We believe that these variations are merely the result of the expected differences between an idealized, theoretical distribution and real-life data.

The strength of our conclusion comes from the variety of sources on which it rests. Multiple age groups were sampled, ranging from children to adults. The samples also came from multiple countries (the United States and the Netherlands). Moreover, all our results come from nationally representative samples that can be generalized to much larger populations. Finally, the methods of measuring intelligence ranged from traditional intelligence tests to academic achievement and aptitude testing. We believe that the weaknesses of each of these samples and tests are balanced out by the inclusion of the other results.

So why do many researchers [1,2,3,4,9,11] believe that there are more individuals at the extremely high levels of intelligence? We believe there are three main reasons. First, many of these authors have used tests that produce an IQ score based on a ratio IQ (i.e., mental age divided by chronological age). The ratio IQ, however, is problematic because it does not always produce a normal distribution. Also, the standard deviations of ratio IQs are larger than the expected 15 or 16 points.
These reasons are why this method for producing an intelligence test score was abandoned decades ago [21]. For example, Silverman [4,22] has advocated the use of the Stanford-Binet L-M, an old, outdated test (from 1960) that produces an IQ score based on a ratio, and she still uses that test in her clinic for high-ability children [23,24]. Given the flaws in ratio IQs, it is no surprise that Silverman, and others who still use tests that produce these ratio IQs, believe that there are more high-IQ people than the normal curve would predict.

Second, some theorists may believe that there are more people with high intelligence than would be predicted by the normal distribution because of the Flynn effect. The Stanford-Binet L-M was last normed in 1976; in the three and a half decades since, it would be expected (based on data from Flynn [13]) that the average IQ would rise from 100 points to roughly 110. With such severe IQ score inflation, it is no wonder that people who advocate (and use) the Stanford-Binet L-M in the 21st century find so many more subjects obtaining high IQs than the normal curve would lead them to expect. It is true that the Flynn effect seems more severe for lower-intelligence populations than for higher-intelligence populations [25], but it would be naïve to state that the upper echelons are completely immune to the Flynn effect.

Third, many of the authors who suggest that there are more intelligent people than there should be if intelligence were normally distributed have come to this conclusion based on data from non-representative samples [3,4,12]. In all of these studies, the authors claim that they found more gifted children than they expected; however, none of these authors made any attempt to take a nationally representative sample of children. Instead, the children in all of these studies were referred to the researcher, and in many cases large portions of the population (especially rural groups) were undersampled or ignored completely.
It is quite possible that non-representative samples can lead to incorrect conclusions about the generalizability of a study.

There could be a variety of other reasons why the idea that intelligence is non-normally distributed is such a stubbornly persistent notion. Some examiners could be ignoring stop rules recommended by test creators in the administration guidelines (as one author heard Silverman advocate in 2009). It is also possible that these theorists, who often specialize in the study of gifted children, spend a great deal of time around many high-intelligence people, leading them to believe that there are more high-intelligence people than there really are. This frequent exposure to high-intelligence people may lead some researchers to believe that their experience is more common than it actually is, a false consensus effect [26].

5. Conclusion

Our research question, and the title of this paper, was, “Does the normal curve accurately model the distribution of intelligence?” From the results that we presented, we believe that the answer is “mostly yes.” The data in Tables 1 and 2 do not perfectly conform to the normal distribution. However, the normal distribution is highly idealized and minor deviations are to be expected. In response to the question, “Are there more people than expected in the upper echelons of intelligence?” the answer is a resounding no. We could find no pattern in our data that would indicate that the normal distribution is not a reasonable model for the number of people who achieve high scores on g-loaded tests. We believe that it is not possible to reject the hypothesis of a normal distribution of intellectual ability and that there is strong evidence that intelligence scores do fall in a normal distribution. The results come from representative samples that included multiple age groups, multiple countries, and many different ways of measuring Spearman’s g.
The contrary claim has not been sustained, and we encourage intelligence researchers and theorists to rally behind the mainstream statement on intelligence, which states, “The spread of people along the IQ continuum, from low to high, can be represented well by the bell curve (in statistical jargon the ‘normal curve’)” [6, p. 13].

6. References

1. Gallagher, J. J. (2008). According to Jim: The flawed normal curve of intelligence. Roeper Review, 30, 211-212.
2. McGuffog, C., Feiring, C., & Lewis, M. (1987). The diverse profile of the extremely gifted child. Roeper Review, 10, 82-89.
3. Robinson, H. B. (1981). The uncommonly bright child. In M. Lewis & L. A. Rosenblum (Eds.), The uncommon child (pp. 57-81). New York: Plenum Press.
4. Silverman, L. K. (2009). The measurement of giftedness. International Handbook on Giftedness, 947-970. doi:10.1007/978-1-4020-6162-2_48
5. Guldemond, H., Bosker, R., Kuyper, H., & van der Werf, G. (2007). Do highly gifted students really have problems? Educational Research and Evaluation, 13, 555-568. doi:10.1080/13803610701786038
6. Gottfredson, L. S. (1997a). Mainstream science on intelligence: An editorial with 52 signatories, history, and bibliography. Intelligence, 24, 13-23. doi:10.1016/S0160-2896(97)90011-8
7. Neisser, U., Boodoo, G., Bouchard, T. R., Boykin, A., Brody, N., Ceci, S. J., . . . Urbina, S. (1996). Intelligence: Knowns and unknowns. American Psychologist, 51, 77-101. doi:10.1037/0003-066X.51.2.77
8. Terman, L. M. (1916). The measurement of intelligence. Boston, MA: Houghton Mifflin.
9. Burt, C. (1963). Is intelligence distributed normally? The British Journal of Statistical Psychology, 16, 175-194.
10. Parkyn, G. W. (1945). The clinical significance of IQ's on the Revised Stanford-Binet Scale. Journal of Educational Psychology, 36, 114-118. doi:10.1037/h0055705
11. Terman, L. M. (1922). A new approach to the study of genius. Psychological Review, 29, 310-318. doi:10.1037/h0071072
12. Terman, L. M. (1926). Genetic studies of genius: Vol. I. Mental and physical traits of a thousand gifted children (2nd ed.). Stanford, CA: Stanford University Press.
13. Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101(2), 171-191. doi:10.1037/0033-2909.101.2.171
14. Rosenthal, R., & Rosnow, R. L. (1975). The volunteer subject. New York, NY: Wiley.
15. Gottfredson, L. S. (1997b). Why g matters: The complexity of everyday life. Intelligence, 24, 79-132. doi:10.1016/S0160-2896(97)90014-3
16. Bock, R. D., & Mislevy, R. J. (1981). Data quality analysis of the Armed Services Vocational Aptitude Battery. Chicago, IL: National Opinion Research Center.
17. Lohman, D. F. (2006). Beliefs about differences between ability and accomplishment: From folk theories to cognitive science. Roeper Review, 29, 32-40.
18. Merwin, J. C., & Gardner, E. F. (1962). Development and application of tests of educational achievement. Review of Educational Research, 32, 40-50. doi:10.2307/1169202
19. Schmeiser, C. B., & Welch, C. J. (2006). Test development. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 307-353). Westport, CT: Praeger Publishers.
20. Zwick, R. (2006). Higher education admissions testing. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 647-679). Westport, CT: Praeger Publishers.
21. Kaplan, R. M., & Saccuzzo, D. P. (2009). Psychological testing: Principles, applications, and issues (7th ed.). Belmont, CA: Wadsworth.
22. Silverman, L. K., & Kearney, K. (1992). The case for the Stanford-Binet L-M as a supplemental test. Roeper Review, 15, 34-37.
23. Konigsberg, E. (2006). Prairie fire. The New Yorker, 81(44), 44-57.
24. Silverman, L. K. (2002). Why we use the Stanford-Binet (Form L-M). The Examiner: The Journal of the Kansas Association of School Psychologists, 28(3), 20-21.
25. Zhou, X., Zhu, J., & Weiss, L. G. (2010). Peeking inside the “black box” of the Flynn effect: Evidence from three Wechsler instruments. Journal of Psychoeducational Assessment, 28, 399-411.
26. Marks, G., & Miller, N. (1987). Ten years of research on the false-consensus effect: An empirical and theoretical review. Psychological Bulletin, 102, 72-90. doi:10.1037/0033-2909.102.1.72