Chapter 2: Descriptive Statistics

This chapter corresponds to chapters 2 (“Means to an End”) and 3 (“Vive la Difference”) of your book.

What it is: Descriptive statistics are values that describe the characteristics of a sample or population. This chapter will focus on two types of descriptive statistics. The first type is measures of central tendency (the mean, median, and mode), which are statistics that describe the typical value in a sample or population. The second type is measures of variability (the range, standard deviation, and variance), which are statistics that describe how different the scores in a sample or population are from each other.

When to use it: You should use descriptive statistics when you wish to describe the average value and/or the amount of variability in a sample or population.

Questions asked by descriptive statistics: What is the typical value in a set of scores? How variable is a set of scores?

Examples of research questions that would use descriptive statistics:
o What is the average household income of children diagnosed with Attention Deficit Hyperactivity Disorder?
o Do students at Ivy League universities all have the same high school grades and SAT scores (e.g., everyone has a 4.0 and a 1600) or is there a large amount of variability in the students’ grades and SAT scores (such that some students have very high scores and others very low scores)?

Using SPSS to Calculate Descriptive Statistics (dataset: Chapter 2 Example 1.sav)

Stella has noticed that Aggies are always saying “howdy” to her. She wonders if Aggies are just more extraverted than your average person. So, Stella gives 20 Aggies a questionnaire called the Extraversion IQ Instrument, which gives each person an extraversion score. Like a normal IQ measure, the Extraversion IQ Instrument is scaled such that a score of 100 means you have an average amount of extraversion.
Stella wants to know (a) if Aggies are more extraverted than normal and (b) if all Aggies are extraverted or if Aggies are variable in extraversion.

A note on drawing inferences from descriptive statistics

Stella wants to know if Aggies are “more extraverted than normal.” To address this question she is using descriptive statistics such as the mean, median, and mode. For instance, if the mean Extraversion IQ of the Aggie sample is significantly larger than 100, Stella might infer that Aggies are more extraverted than normal. The problem with relying solely on descriptive statistics to draw inferences such as “more extraverted than normal” (or “the mean of group 1 is larger than the mean of group 2”, or “the amount of variability in this sample is large”) is that the meaning of “significantly larger” is vague. How much higher than 100 would the Aggie sample mean have to be for Stella to declare it “significantly larger” than 100 (e.g., is 115 high enough; how about 101)?

There are actually formal statistics called inferential statistics that can be used to quantify exactly how large something needs to be to call it “significant”. You will learn all about inferential statistics later in the semester. For now, you will just use your subjective judgment (based on your knowledge of descriptive statistics) to determine whether an average is larger than another average or whether a sample has a large or small amount of variability. However, you should know that this is, at best, a “quick and dirty” way to do this, and that inferential statistics are much more appropriate.

Selection of the appropriate statistic(s)

Because Stella is interested in describing Aggies’ average amount of extraversion, the mean, median, and mode are each appropriate descriptive statistics.
Additionally, because Stella is interested in describing the degree to which Aggies vary in extraversion, the range, standard deviation, and variance are each appropriate statistics (although you don’t normally report the variance because it’s difficult to interpret squared units such as amount of extraversion squared).

Computation of the statistic(s)

We will use SPSS to calculate the descriptive statistics for us. Open the dataset “Chapter 2 Example 1.sav”. Take a moment to familiarize yourself with the data. Note how data for this type of analysis should be entered.

1) Each participant has one row in the data.
2) One column is used to indicate each participant’s identification number, which is just a number that is assigned to each participant in the study (this variable is “ID” in the present example).
3) The second column indicates each participant’s score on the variable for which we want descriptive statistics (this variable is “exiq” in the present example, meaning the Extraversion IQ score for each participant).

The data should look something like this in SPSS:

If you switch to variable view, you should see that the two variables have labels indicating that they represent “Participant ID” and “Extraversion IQ”, respectively. If you did not like those labels, you could change the labels to whatever you want.

To calculate descriptive statistics in SPSS, click on the “Analyze” drop-down menu, highlight “Descriptive Statistics”, and then click “Frequencies”, as pictured below.

The following pop-up window will appear:

Note that the two variables are listed in the pop-up window by their labels, with their variable names in brackets (e.g., “Extraversion IQ [exiq]”). Highlight the variable(s) for which you wish to calculate descriptive statistics (“Extraversion IQ [exiq]” in this example) and then click on the arrow to make the variable(s) appear in the Variable(s): window, as pictured below.

Now click the “Statistics…” button.
The following popup window will appear:

We use this window to tell SPSS exactly which descriptive statistics to calculate. In the present example, we would like the three main measures of central tendency (mean, median, and mode) and the three main measures of variability (range, standard deviation, and variance). So, click on the boxes to put checkmarks next to those six descriptive statistics. When calculating the range it is useful to know the minimum and maximum values that the range is based on, so it is a good idea to put checkmarks next to “Minimum” and “Maximum” as well. Your screen should look like this:

Click Continue to return to the original “Frequencies” popup window. Uncheck “Display Frequency Tables”. For some purposes, the frequency tables can be useful, but for our present purposes, the frequency tables would just be extra, unneeded output. Your screen should look like this:

The Frequencies popup window also provides an easy way to create a type of graph/chart called a histogram. You will learn more about histograms in the class on “Graphing Data”, so we won’t go into them in too much detail here. In essence, a histogram is a graph that provides a snapshot of your distribution of data when you have continuous/quantitative data (as is the case with our Extraversion IQ data). To create a histogram, click on the “Charts…” button. The following popup window will appear. Click on the circle next to “Histograms:”. Your window should look like this:

Click “Continue” to return to the original popup window. Finally, click “OK” and navigate to the Output window to find your results. The output will generate a table and a histogram that look like this:

Statistics
Extraversion IQ

N (Valid)         20
N (Missing)       0
Mean              101.0500
Median            102.5000
Mode              104.00
Std. Deviation    15.32619
Variance          234.892
Range             56.00
Minimum           74.00
Maximum           130.00

The N rows tell you there were 20 participants and none of them were missing Extraversion IQ scores. The values for each of the descriptive statistics are listed in the remaining rows.

[Histogram of Extraversion IQ. Y-axis: frequency of scores. X-axis: the continuous variable.]

Interpreting the Output

Mean – The mean (sum of Extraversion IQs divided by sample size) is 101.05. This suggests that this sample of Aggies is just about average in their level of extraversion. Remember, a better way to draw this inference that Aggies are “just about average in their level of extraversion” would be to use inferential statistics, but it can be handy to make a “quick and dirty” subjective determination simply based on the descriptive statistics as well.

Median – The median (the middle score in the distribution of Extraversion IQs) is 102.50. This value is very similar to the mean and means that half of the Aggies had scores less than 102.5 and half had scores greater than 102.5. As it does not differ very much from 100, the median also suggests that this sample of Aggies is essentially average in extraversion.

Mode – The mode (the most frequent Extraversion IQ) is 104. This means that a score of 104 occurred most frequently in the sample of Aggies. The mode is a little higher than the mean or median, but all three measures seem to converge on the idea that this sample is about average (maybe just slightly above average) in extraversion.

Standard Deviation – This is essentially the average deviation of the individual Extraversion IQs from the mean Extraversion IQ of 101.05. The standard deviation of 15.33 is relatively large (a subjective judgment) and suggests there is a fair amount of variability in extraversion in this sample. Some Aggies are quite introverted, some are quite extraverted, and others are about average.

Variance – This is the standard deviation squared and essentially carries the same information as the standard deviation (i.e., it tells us there is a fair amount of variability in this sample).
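As a quick arithmetic check on the output reported above, the relationships between these statistics can be verified directly. The following is only a minimal sketch in Python (not part of SPSS); the numbers are copied from the Statistics table, and the variable names are made up for illustration.

```python
import math

# Values copied from the SPSS "Statistics" table in this chapter.
sd = 15.32619
variance = 234.892
minimum = 74.0
maximum = 130.0
reported_range = 56.0

# The variance is the standard deviation squared.
sd_squared = sd ** 2
print(f"SD squared = {sd_squared:.3f} (reported variance: {variance})")

# The range is the maximum minus the minimum.
computed_range = maximum - minimum
print(f"max - min  = {computed_range:.2f} (reported range: {reported_range})")

# Both relationships hold within the rounding of the printed output.
variance_matches = math.isclose(sd_squared, variance, abs_tol=0.001)
range_matches = computed_range == reported_range
print(variance_matches, range_matches)
```

Because the variance is just the squared standard deviation, it adds no new information about the spread of the scores; it simply expresses the same spread in squared units.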
Range, Minimum, and Maximum – The range tells us that there is a difference of 56 Extraversion IQ points between the highest and lowest Extraversion IQs in this sample. The lowest Extraversion IQ was 74 (which is almost two standard deviations below the mean; this person is pretty introverted) and the highest Extraversion IQ was 130 (almost two standard deviations above the mean; this person is pretty extraverted). The wide range substantiates the idea that Aggies vary a great deal in extraversion.

Histogram – A histogram is a type of bar graph with the continuous variable (Extraversion IQ) on the X-axis and the frequency of scores (in this case, the number of times Extraversion IQ scores within each of the ranges on the X-axis occur) on the Y-axis. So, you can see that the first bar on the left shows that there were two Extraversion IQs that were slightly below 80, the next bar shows there were two Extraversion IQs that were slightly above 80, the next bar shows there was one Extraversion IQ slightly below 90, and so on. Although it can be interesting to look at each of the individual bars, probably the most useful aspect of the histogram is its ability to capture the entire distribution of scores in pictorial form. By looking at the histogram as a whole you can see that (a) the Extraversion IQs range from about the upper 70s to around 130, (b) the majority of scores are in the middle of the distribution clustered around 100 or so, and (c) there are fewer scores at the extreme ends of the distribution. A histogram is one of the most effective ways to determine the shape of your distribution of scores, and can act as a useful supplement to the mean, median, and mode when describing your distribution.

Interpretation of the Findings

Based on our descriptive statistic results, it looks like Aggies aren’t really much more extraverted than your average person.
So, Stella’s idea that Aggies greet people with “howdy” so often because they are just so extraverted might not be true; some Aggies are extraverted, some are introverted, and some are average.

Now we report our results. When reporting variability results in a journal article, researchers typically report the standard deviation and range, but not the variance (because the variance is difficult to interpret due to it being based on squared units). Researchers also typically report only one measure of central tendency instead of exhaustively reporting all three. The reported measure of central tendency is the measure that is most appropriate, given the data. See page 29 of your Salkind text for information on when each measure of central tendency is most appropriate. In the present circumstance, the data are quantitative (as opposed to qualitative/categorical) and there are no obvious and influential outliers. So, the most precise of the three measures of central tendency, the mean, is most appropriate.

Here’s an example of how these results might be reported in a journal article:

The sample of 20 Aggies was average to slightly-above-average in Extraversion IQ scores (M = 101.05). There was also substantial variability between Aggies in extraversion scores (SD = 15.33), with Extraversion IQs ranging from 74 to 130 (higher Extraversion IQs mean more extraversion).

For someone unfamiliar with statistics, you might say: “Aggies are about average in extraversion (maybe slightly above average). Some Aggies are very introverted, some are very extraverted, and all points in between.”

Practice Problem #1 for SPSS (answer in Appendix)

Brown University students are tired of hearing that they are just a bunch of rich kids whose parents bought their way into an Ivy League university. To make their point, they draw a random sample of 30 Brown students and record these students’ combined parental income. Below are combined parental incomes of the 30 Brown students.
Use SPSS and descriptive statistics to answer the questions below.

30,000       72,000    61,000    44,000     312,000
59,000       58,000    26,000    225,000    42,000
1,200,000    27,000    77,000    79,000     40,000
59,000       379,000   52,000    55,000     91,000
70,000       145,000   100,000   35,000     63,000
925,000      45,000    60,000    48,000     53,000

A. Calculate the mean, median, and mode of the parental incomes.
B. Do the mean, median, and mode differ from each other? Are the differences large or small? If the measures of central tendency do differ from each other, why do you think this is? Is one of the three measures more appropriate in this instance, and why?
C. Calculate the range, standard deviation, and variance of the parental incomes.
D. What do you conclude about whether all Brown students are “just a bunch of rich kids?”

Practice Problem #2 for Hand Calculation (answer in Appendix)

The makers of the hot new weight loss drug, Xylophone, ran a weight loss study to test how well their drug works. Below is the number of pounds lost (negative numbers) or gained (positive numbers) by the 10 participants during the 4-week weight loss study. Xylophone has been featuring Participant #2 in their commercials, pointing toward Participant #2’s 15 pounds lost as proof that Xylophone is an effective weight loss drug. Use hand calculations and descriptive statistics to answer the questions below. A table for calculating the standard deviation is provided.

Participant ID    Weight Lost/Gained (X)    (X − X̄)    (X − X̄)²
1                 -3
2                 -15
3                 0
4                 5
5                 -2
6                 7
7                 3
8                 0
9                 0
10                5
Sum

$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

A. Calculate the mean, median, and mode of weight lost/gained.
B. Do the mean, median, and mode differ from each other? Why do you think they do or do not differ? Is one of the three measures more appropriate in this instance, and why?
C. Based on the measures of central tendency, do you agree that Xylophone is an effective weight loss drug?
D. Calculate the range, standard deviation, and variance of weight lost/gained.
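The hand-calculation table and the standard deviation formula above can be mirrored step by step in code. The sketch below is in Python and uses a small set of hypothetical scores (not the Xylophone data), so that it illustrates the procedure without working the practice problem; all variable names are illustrative.

```python
import math
import statistics

# Hypothetical scores for illustration only (NOT the practice-problem data).
scores = [2, 4, 4, 6, 9]

n = len(scores)
mean = sum(scores) / n  # X-bar, the sample mean

# Build the hand-calculation table: X, (X - mean), (X - mean) squared.
rows = [(x, x - mean, (x - mean) ** 2) for x in scores]

print(f"{'X':>4} {'X - mean':>10} {'(X - mean)^2':>14}")
for x, deviation, squared in rows:
    print(f"{x:>4} {deviation:>10.2f} {squared:>14.2f}")

# The "Sum" row of the table: the sum of the squared deviations.
sum_of_squares = sum(squared for _, _, squared in rows)

# Divide by n - 1 to get the sample variance; its square root is the SD.
variance = sum_of_squares / (n - 1)
sd = math.sqrt(variance)
print(f"Sum of squares = {sum_of_squares:.2f}")
print(f"Variance = {variance:.2f}, SD = {sd:.4f}")

# Cross-check against the standard library's sample standard deviation.
agrees_with_library = math.isclose(sd, statistics.stdev(scores))
print(agrees_with_library)
```

A useful sanity check when filling in the table by hand is that the (X − X̄) column should always sum to zero, apart from rounding.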
Practice Problem #3 for Hand Calculation and SPSS (answer in Appendix)

The mean score on the math subtest of the SAT is 500 and the standard deviation is 100. Gabe believes that persons aren’t reaching their potential when taking the SAT because they are too tense, so he suggests that people get back massages right before they take the SAT. He gives ten participants back massages right before they take the math subtest of the SAT. Their scores are below. Calculate the six major descriptive statistics. Based on those descriptive statistics do you believe that the back massages helped the participants score better?

Participant ID    SAT Score
1                 406
2                 582
3                 736
4                 565
5                 378
6                 466
7                 521
8                 435
9                 495
10                435
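If you want to cross-check hand or SPSS results, the six descriptive statistics covered in this chapter can also be computed with Python's standard `statistics` module. This is only a sketch under stated assumptions: the `describe` helper and the example scores are hypothetical (they are not the SAT data above), and the standard deviation and variance use the n − 1 formula given in this chapter.

```python
import statistics


def describe(scores):
    """Return the six descriptive statistics discussed in this chapter."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        # multimode returns a list, because several values can tie.
        "modes": statistics.multimode(scores),
        "range": max(scores) - min(scores),
        # stdev and variance are the sample versions (they divide by n - 1).
        "sd": statistics.stdev(scores),
        "variance": statistics.variance(scores),
    }


# Hypothetical scores for illustration only (NOT the practice-problem data).
example = [90, 100, 100, 110, 125]
summary = describe(example)

for name, value in summary.items():
    print(f"{name:>8}: {value}")
```

Reporting the modes as a list is a deliberate choice in this sketch: when two or more values tie for most frequent, a single "mode" would hide that the distribution has more than one peak.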