A bar chart of a quantitative variable with only a few categories (called a discrete variable) communicates the relative number of subjects with each of the possible responses. However, the bar chart does not graphically distinguish between quantitative and qualitative variables. We would recognize the variable as quantitative only after reading its label and values, which takes extra work.
5/24/2017 Slide 1

If the quantitative variable has a large number of categories (called a continuous variable), the bar chart provides little information beyond the fact that there are many different values and that some occur more frequently than others.
Slide 2

Histograms are the preferred graph for quantitative variables. While the bars resemble those of a bar chart, histograms are distinguished by the absence of gaps between consecutive bars. For continuous variables, values are grouped in equally spaced intervals to convey a sense of what the distribution looks like.
Slide 3

While we used counts and percents to describe the distribution of a qualitative variable, we use statistical measures to describe the center, spread, and shape of a quantitative variable. Measures of central tendency identify a value in the center of the distribution. Measures of variability or dispersion summarize how the values for individual cases are spread out around the measure of central tendency.
Slide 4

There are two measures of the shape of the distribution: skewness and kurtosis. Many of the statistics we will use assume that the distribution of a variable is bell-shaped, i.e., the normal distribution.
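The grouping of a continuous variable into equally spaced intervals, described above for histograms, can be sketched in a few lines. This is only an illustration: the function name `equal_width_counts`, the sample ages, and the interval width of 10 are assumptions made for the example, not part of the slides.

```python
from collections import Counter

def equal_width_counts(scores, width, start=None):
    """Count scores in consecutive intervals [start + k*width, start + (k+1)*width).

    Hypothetical helper for illustration; assumes start <= min(scores).
    """
    if start is None:
        start = min(scores)
    # Index of the interval each score falls into.
    counts = Counter(int((x - start) // width) for x in scores)
    n_bins = max(counts) + 1
    return [
        ((start + k * width, start + (k + 1) * width), counts.get(k, 0))
        for k in range(n_bins)
    ]

# Made-up ages, grouped into ten-year intervals starting at 10.
ages = [18, 21, 22, 25, 29, 30, 31, 34, 41, 58]
for (low, high), count in equal_width_counts(ages, width=10, start=10):
    # Adjacent intervals share an edge, which is why histogram bars have no gaps.
    print(f"{low:>3} to {high:<3} {'#' * count}")
```

Each interval's count becomes the height of one histogram bar.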
Skewness measures the symmetry of the distribution on both sides of the average score for the distribution. Having overlaid a blue normal curve on the distribution of this variable, we can see that the bars on either side of the red center line are similar as one moves away from the center. Kurtosis measures the degree to which the distribution is peaked or flat compared to the normal distribution. In this example, the bars at the center of the distribution are close to what would be expected for a normal distribution, and the frequencies decrease as we move away from the center.
Slide 5

Both of these variables have a problem with skewness, caused by atypical scores at one end of the distribution. Skewness is characterized as negative or positive, depending on which side, or tail, of the distribution has the unusual scores. One is an example of negative skewness, where a few small scores have elongated the left tail of the distribution and the tail on the right is truncated. The other is an example of positive skewness, where a few large scores have elongated the right tail of the distribution and the tail on the left is truncated.
Slide 6

Both of these variables have a problem with kurtosis, caused by either too few cases in the center of the distribution or too many cases in the center of the distribution. One is an example of negative kurtosis, where the scores are uniformly distributed through the range of scores. The kurtosis statistic will have a negative value. The other is an example of positive kurtosis, where the scores are heavily concentrated in the center of the distribution. The kurtosis statistic will have a positive value.
Slide 7

There are two measures of central tendency for quantitative variables: the mean and the median. The mean is the average score. The median is the middle score, i.e., half of the scores are higher and half are lower.
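The sign conventions for skewness and kurtosis described above can be checked numerically. The sketch below assumes the simple moment-based formulas (third and fourth central moments scaled by the variance); statistical packages such as SPSS typically report sample-adjusted versions, so their values may differ slightly. The function names and the sample scores are made up for illustration.

```python
from statistics import fmean

def central_moment(scores, k):
    """Average of (score - mean) ** k over all scores."""
    m = fmean(scores)
    return fmean([(x - m) ** k for x in scores])

def skewness(scores):
    """Moment-based skewness: 0 for a symmetric distribution."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5

def excess_kurtosis(scores):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]                  # evenly spread, no tail
right_tail = [1, 1, 1, 2, 2, 3, 10]          # one large score stretches the right tail
left_tail = [-x for x in right_tail]         # mirror image: left tail stretched
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 10]        # scores concentrated in the center

print("right tail skewness:", round(skewness(right_tail), 2))    # positive
print("left tail skewness:", round(skewness(left_tail), 2))      # negative
print("flat kurtosis:", round(excess_kurtosis(symmetric), 2))    # negative
print("peaked kurtosis:", round(excess_kurtosis(peaked), 2))     # positive
```

The mirrored sample shows that flipping which tail holds the unusual scores flips the sign of the skewness statistic.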
When the distribution has minimal skewness and is symmetric, both the red mean line and the green median line fall in the center of the distribution. While both measures reflect the center of the distribution, the mean is the preferred measure because it uses information for all of the cases in the distribution. For each measure of centrality, there is a corresponding measure of spread. The standard deviation is used with the mean, and the interquartile range is used with the median.
Slide 8

When skewing is present, the red mean line moves away from the center of the distribution, as identified by the green median line, in the direction of the skewness. At some level of skewness, the median becomes more effective at representing the center of the distribution. The issue is selecting a defensible rule for deciding the dividing line between acceptable skewness and problematic skewness. The rule of thumb that we will use is that skewness less than -1.0 or greater than +1.0 is problematic and indicates that the median is the preferred measure.
Slide 9

Kurtosis does not affect the location of the measure of central tendency. Kurtosis indicates that there are either more cases than expected in the middle of the distribution (positive kurtosis), or fewer cases than expected (negative kurtosis). When the bars fall below the center of the normal curve overlay, the distribution has negative kurtosis and is referred to as a flat distribution. Bars extending above the normal curve overlay indicate that there is positive kurtosis. A distribution with positive kurtosis is characterized as a “peaked distribution.”
Slide 10

• The homework problems on central tendency and variability focus on describing the distribution of quantitative variables.
• The counts and percents that we used for qualitative variables are not effective for quantitative variables that can have many different scores in the distribution.
• We describe the distribution of quantitative variables with summary statistics that try to communicate the value on which the distribution is centered, the spread of the values from the center of the distribution, the symmetry of the distribution around the center measure, and the degree to which the distribution is bell-shaped or flat.
Slide 11

• The center, or central tendency, of the distribution is usually represented by the mean (average score) or the median (middle score) of the distribution.
• The standard deviation is used as the measure of spread (variability or dispersion) that is paired with the mean. It measures the typical distance between the mean and each of the scores in the distribution (the square root of the average squared difference).
• The range and interquartile range are used to measure the spread around the median. The range is the difference between the highest score and the lowest score. The interquartile range is the difference between the highest and lowest score when the smallest 25% and the largest 25% of the scores are removed from the distribution.
Slide 12

• Both the mean and the median can be computed for the values in the distribution of any quantitative variable.
• However, the degree to which one or the other is a “good” measure or indicator of the central tendency of a distribution differs with the shape of the distribution, specifically the symmetry of the distribution as measured by skewness.
• If the distribution is symmetric, both the mean and the median fall in the center of the distribution. The mean is the preferred measure because it uses all of the cases in the distribution in its calculation, and because it can be used in a broader range of statistical tests.
• If the distribution is not symmetric, the median stays in the middle of the distribution, but the mean is pulled away from the center toward one of the tails of the distribution.
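The measures of center and spread listed above, and the way a skewed tail pulls the mean but not the median, can be illustrated with Python's standard `statistics` module. This is a minimal sketch: the `describe` helper and the scores are invented for the example, `stdev` is the sample standard deviation (dividing by n - 1), and the quartiles use the module's default method, which may differ slightly from the convention used by SPSS or other packages.

```python
from statistics import mean, median, stdev, quantiles

def describe(scores):
    """Summary statistics for center and spread (hypothetical helper)."""
    q1, _, q3 = quantiles(scores, n=4)   # cut points for the middle 50% of scores
    return {
        "mean": mean(scores),
        "median": median(scores),
        "sd": stdev(scores),                    # paired with the mean
        "range": max(scores) - min(scores),     # highest minus lowest score
        "iqr": q3 - q1,                         # paired with the median
    }

symmetric = [2, 4, 4, 5, 6, 6, 8]
skewed = symmetric[:-1] + [40]   # replace the top score with an atypical large one

for name, scores in (("symmetric", symmetric), ("skewed", skewed)):
    print(name, describe(scores))
```

In the symmetric sample the mean and median coincide at 5. After one score is replaced by 40, the median stays at 5 while the mean moves toward the elongated right tail.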
Slide 13

• The degree of symmetry of a distribution of scores for a quantitative variable can vary quite widely. These six histograms show progressively increasing skewness. At what point do we choose the median over the mean?
Slide 14

• There is no universally accepted criterion for the amount of skewness that dictates a preference for the median.
• Most agree that we should be concerned with substantial skewness and ignore minor departures from symmetry, but there is no agreement on what counts as substantial.
• One rule of thumb indicates that a distribution has a substantial skewness problem when the size of the skew statistic is more than twice its standard error (in the SPSS output).
• The rule of thumb that I have used, and which will be used for the problems, is that skewness is a problem if it is less than -1 for negatively skewed distributions or greater than +1 for positively skewed distributions.
Slide 15

Skewness of the six histograms: top row 0.35, 0.84, and 0.94; bottom row 1.09, 1.33, and 1.86. By my rule of thumb, we would use the mean as the measure of central tendency for the top row, and the median for the bottom row. That the rule is arbitrary is shown by the similarity of the last chart on the top row to the first chart on the bottom row.
Slide 16

One rule of thumb suggests that when the value of the skewness statistic is more than 2 times the value of the skewness standard error, the median is preferred. For this variable, the statistic (.401) is more than twice the standard error (.153), so the median would be preferred.
Slide 17

Another rule of thumb uses only the value of the skewness statistic. When the skewness is smaller than -1.0 or larger than +1.0, the distribution is badly skewed and the median is a better measure of central tendency.
This is the rule of thumb used in our problems. The skewness of this distribution (0.40) is in the allowable range, making the mean and standard deviation the preferred measures of center and spread.
Slide 18
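The two rules of thumb can be written as small decision functions, which makes it easy to see where they disagree. This is a sketch only: the function names are hypothetical, and the inputs are the skewness statistic and standard error quoted on the slides.

```python
def preferred_center_by_cutoff(skew, cutoff=1.0):
    """Rule used in the problems: skewness below -cutoff or above +cutoff -> median."""
    return "median" if abs(skew) > cutoff else "mean"

def preferred_center_by_standard_error(skew, std_error):
    """Alternative rule: skewness larger in size than twice its standard error -> median."""
    return "median" if abs(skew) > 2 * std_error else "mean"

# Values quoted on the slides: skewness .401 with standard error .153.
skew, std_error = 0.401, 0.153
print(preferred_center_by_standard_error(skew, std_error))  # median (0.401 > 0.306)
print(preferred_center_by_cutoff(skew))                     # mean (0.401 is within -1 to +1)

# The two adjacent histograms that straddle the cutoff.
for value in (0.94, 1.09):
    print(value, preferred_center_by_cutoff(value))
```

For the same variable the two rules reach opposite conclusions, and the histograms with skewness 0.94 and 1.09 land on different sides of the cutoff despite looking similar, which is the arbitrariness the slides point out.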