Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
EPSY 439 Variability Range Variance Standard Deviation z-score Concept Map Variability - Chapter 3 Index of Vari abil ity D is tribution Standard D ev iation R ange D epends only on ex tr eme s c ores Single Sc ore Same mean, s ame r ange differer nt form D ev iation Sc ores :(X- Mean) C omputation: R aw- Sc ore z Sc ores Graphing Standard D ev iations Meas ure of a s ample's v ariablity Population v ariability C omputation: D ev iation-Sc ore Varianc e C omputation: D ev iation-Sc ore D es c ribes the relation of an X to the Mean with res pec t to the v ar iability of the dis tribution Es timate of a population's v ariability R epres ents # of SD s a s c ore is abov e or below the Mean Giv es y ou a w ay to c ompar e raw s c ores Standard Sc ore R ange betw een -3 and +3 C omputation: R aw- Sc ore C onc ept of N -1 May 24, 2017 Copyright 2000 - Robert J. Hall 2 Overview The mean, as a single value, is a useful measure for describing the center of a distribution and where the most frequent scores in a distribution are located. It does not tell us very much, however, about scores away from the center of the distribution and/or scores that occur infrequently. May 24, 2017 Copyright 2000 - Robert J. Hall 3 Overview Thus, to describe a data set accurately, we need to know not only where scores are centered but also about how much individual scores within the distribution differ from one another. Measures of variability, then, provide context (spread/consistency) for measures of center. May 24, 2017 Copyright 2000 - Robert J. Hall 4 Overview Three different distributions have the same mean score Sample A 0 2 6 10 12 S` X = 6 Sample B 8 7 6 5 4 S` X = 6 Sample C 6 6 6 6 6 S` X = 6 Each of the three samples has a mean of 6, so if you did not look at the raw scores, you might think that we have three identical distributions. Clearly, they are not the same. May 24, 2017 Copyright 2000 - Robert J. Hall 5 Overview Distance between the locations of scores in three distributions X Sample A Sample B Sample C `X = 6 `X = 6 `X = 6 0, 2, 6, 10, 12 4, 5, 6, 7, 8 6, 6, 6, 6, 6 XXXXX X X X X X X X X X 0 2 4 6 8 10 12 0 2 4 6 8 10 12 0 2 4 6 8 10 12 Here you can see differences that separate scores in the three samples. May 24, 2017 Copyright 2000 - Robert J. Hall 6 Statistical Model of Variability The statistical model for abnormality requires two statistical concepts: a measure of the average and a measure of the deviation from the average – We have examined the standard techniques for defining the average score; we will now focus on methods for measuring deviations. – The fact that scores deviate from average means that they are variable. Variability, then, is one of the most basic statistical concepts. May 24, 2017 Copyright 2000 - Robert J. Hall 7 Meaning of Variability The term variability has much the same meaning in statistics as it has in everyday language; to say that things are variable means that they are not all the same. In statistics, our goal is to measure the amount of variability for a particular set of scores, a distribution. – Are scores all clustered together, or are they scattered over a wide range of values? May 24, 2017 Copyright 2000 - Robert J. Hall 8 Measuring Variability A good measure of variability should: provide an accurate picture of the spread of the distribution, and give an indication of how well an individual score (or group of scores) represents the entire distribution. – In inferential statistics, where we often use small samples to answer general questions about large populations, variability is a particularly important concern. Why? May 24, 2017 Copyright 2000 - Robert J. Hall 9 Types of Variability In this section we will consider three different measures of variability: the range, – interquartile range the standard deviation, and the variance Of these three, the standard deviation is by far the most important. May 24, 2017 Copyright 2000 - Robert J. Hall 10 Range The range is the distance between the largest score (Xmax) and the smallest score in the distribution (Xmin). In determining this distance, you must also take into account the real limits of the maximum and minimum X values. The range, therefore, is computed as: range = URL Xmax LRL Xmin May 24, 2017 Copyright 2000 - Robert J. Hall 11 Range The range is the most obvious way of describing how spread out the scores are in the distribution. *** Problem *** Range is completely determined by the two extreme values and ignores the other scores in the distribution. May 24, 2017 Copyright 2000 - Robert J. Hall 12 Range Example: The following two distributions have exactly the same range (10 points in each case) but – Distribution 1 clusters scores at the upper end of the range – Distribution 2 spreads scores out over the range Distribution 1: 1, 8, 9, 9, 10, 10 Distribution 2: 1, 2, 4, 6, 8, 10 May 24, 2017 Copyright 2000 - Robert J. Hall 13 Range Summary Because the range does not consider all the scores in the distribution, it often does not give an accurate description of the variability for the entire distribution. Generally, the range is considered to be a crude and unreliable measure of variability. – John Tukey has developed ways looking at data that give prominence to the dispersion of data using what is called the boxplot and interquartile range. May 24, 2017 Copyright 2000 - Robert J. Hall 14 Interquartile Range A distribution can be divided into four equal parts using quartiles. The first quartile (Q1) is the score that separates the lower 25% of the distribution from the rest. The second quartile (Q2 or median) is the score that has exactly two quarters, or 50%, of the distribution below it. May 24, 2017 Copyright 2000 - Robert J. Hall 15 Interquartile Range Finally, the third quartile (Q3) is the score that divides the bottom three-fourths of the distribution from the top quarter. The interquartile range is the distance between the first quartile and the third quartile: interquartile range = Q3 - Q1 May 24, 2017 Copyright 2000 - Robert J. Hall 16 Interquartile Range Frequency distribution for a set of 16 scores 4.0 Interquartile range (3.5 points) Bottom 3.0 25% Top 25% 2.0 1.0 Std. Dev = 2.50 Mean = 6.3 0.0 N = 16.00 2.0 4.0 3.0 6.0 5.0 8.0 7.0 Q1 = 4.5 10.0 9.0 11.0 Q3 = 8 SCORE May 24, 2017 Copyright 2000 - Robert J. Hall 17 Boxplot Max Q2 = 6 Valu e Q3 = 8 Q1 = 4.5 Min 12 11 10 9 8 7 6 5 4 3 2 1 0 N= IQR = 3.5 16 SCORE May 24, 2017 Copyright 2000 - Robert J. Hall 18 Construction of Boxplot Draw a number line that includes the range of observations. Compute Q1, Q2, and Q3. Above the line drawn in the first step, draw a box extending from Q1 to Q3. Inside the box, draw a line at the median. May 24, 2017 Copyright 2000 - Robert J. Hall 19 Construction of Boxplot To identify the outliers, compute the lower and upper fences, which are located at 1.5 (IQR) below Q1 and above Q3. That is, the lower fence is fL= Q1 - 1.5(IQR) and the upper fence is fU= Q3 + 1.5(IQR) May 24, 2017 Copyright 2000 - Robert J. Hall 20 Construction of Boxplot Observations located beyond the fence are classified as outliers and are identified with an asterisk (*). If there are no outliers, extend horizontal line segments (whiskers) from the ends of the box to the smallest and largest observations. If there are outliers, extend the whiskers to the smallest and largest non-outliers. May 24, 2017 Copyright 2000 - Robert J. Hall 21 May 24, 2017 Copyright 2000 - Robert J. Hall 22 Example Temperatures at time of space shuttle launches: 53 57 58 63 66 67 67 67 68 69 70 70 70 70 72 73 75 75 76 76 78 79 80 81 Five number summary: Low 53 May 24, 2017 Q1 66.84 M 70 Q3 75.5 Copyright 2000 - Robert J. Hall High 81 23 Example - Frequency Distribution T e Median: 70.5 m a u li la r r u r c c c e e e e V 5 3 a 1 2 2 2 5 7 1 2 2 3 5 8 1 2 2 5 6 3 1 2 2 7 6 6 1 2 2 8 6 7 3 5 5 3 6 8 1 2 2 5 6 9 1 2 2 7 7 0 4 7 7 3 7 2 1 2 2 5 7 3 1 2 2 7 7 5 2 3 3 0 7 6 2 3 3 3 7 8 1 2 2 5 7 9 1 2 2 7 8 0 1 2 2 8 8 1 1 2 2 0 T o 4 0 0 T o 4 0 1 58.3 70.0 ? 50 69.5 41.7 16.6 50 41.7 8 .3 .50 16.6 16.6 .50 * 1 .5 69.5 0.5 70.0 May 24, 2017 Copyright 2000 - Robert J. Hall 24 Example - Frequency Distribution T e First Quartile: 67.5 33.3 m a u li la r r u r c c c e e e e V 5 3 a 1 2 2 2 5 7 1 2 2 3 5 8 1 2 2 5 6 3 1 2 2 7 6 6 1 2 2 8 6 7 3 5 5 3 6 8 1 2 2 5 6 9 1 2 2 7 7 0 4 7 7 3 7 2 1 2 2 5 7 3 1 2 2 7 7 5 2 3 3 0 7 6 2 3 3 3 7 8 1 2 2 5 7 9 1 2 2 7 8 0 1 2 2 8 8 1 1 2 2 0 T o 4 0 0 T o 4 0 1 66.84 ? 25 66.5 20.8 12.5 25 20.8 4.2 .34 12.5 12.5 .34 * 1 .34 66.5 0.34 66.84 May 24, 2017 Copyright 2000 - Robert J. Hall 25 Example - Frequency Distribution T e Third Quartile: 75.5 75.0 m a u li la r r u r c c c e e e e V 5 3 a 1 2 2 2 5 7 1 2 2 3 5 8 1 2 2 5 6 3 1 2 2 7 6 6 1 2 2 8 6 7 3 5 5 3 6 8 1 2 2 5 6 9 1 2 2 7 7 0 4 7 7 3 7 2 1 2 2 5 7 3 1 2 2 7 7 5 2 3 3 0 7 6 2 3 3 3 7 8 1 2 2 5 7 9 1 2 2 7 8 0 1 2 2 8 8 1 1 2 2 0 T o 4 0 0 T o 4 0 1 8.3 74.5 66.7 75.0 66.7 8.3 1.00 8.3 8.3 1.00 * 1 1.00 74.5 1.00 75.5 May 24, 2017 Copyright 2000 - Robert J. Hall 26 Example - Summary Statistics S u t e r N V S 4 M S 0 M S 0 M S 0 M S 0 S S 2 V S 7 S S 5 S 2 E K S 4 S 8 E R S 8 P 2 S 0 5 S 0 7 S 5 May 24, 2017 Copyright 2000 - Robert J. Hall 27 Example - Histogram Histogram 6 5 Frequency 4 3 2 Std. Dev = 7.22 1 Mean = 70.0 N = 24.00 0 52.5 57.5 55.0 62.5 60.0 67.5 65.0 72.5 70.0 77.5 75.0 80.0 Space Shuttle Launch Temperatures May 24, 2017 Copyright 2000 - Robert J. Hall 28 Example - Boxplot 90 80 70 60 1 50 N= O1 indicates that case number 1, 53o, is an outlier in this data set. Case 1 was the temperature at the time that the Challenger space shuttle was launched. On the morning of January 28, 1986 Challenger exploded and crashed 73 seconds after take off. There were no survivors. A simple boxplot such as this might have raised enough of a red flag for mission control to abort the launch. 24 Launch Temperature May 24, 2017 Copyright 2000 - Robert J. Hall 29 Victims of Challenger Disaster Seven astronauts were killed when the space shuttle Challenger suddenly exploded shortly after launch on January 28, 1986. The crewmembers who died during the mission are shown here. May 24, 2017 Copyright 2000 - Robert J. Hall 30 May 24, 2017 Copyright 2000 - Robert J. Hall 31 Standard Deviation - Index of Variability The most commonly used measure of the variability of scores in a distribution is the standard deviation. It is an index of the variability (spread) of scores about the mean of a distribution, something like the average dispersion or deviation of the scores (X) about their mean (`X ). – The greater the spread of scores, the greater the standard deviation. May 24, 2017 Copyright 2000 - Robert J. Hall 32 Standard Deviation - Illustration – For example, in the figure below, the scores in distribution A cluster about the mean of the distribution, and the scores in distribution B are spread out. The standard deviation for distribution A, then, would be less than the standard deviation for distribution B. A B May 24, 2017 Copyright 2000 - Robert J. Hall 33 Standard Deviation - Definition More formally, the standard deviation may be defined as the square root of the average squared deviation of scores from the mean of the distribution, measured in units of the original score. This definition is reflected in the deviation score formula. S May 24, 2017 2 ( X X ) N Copyright 2000 - Robert J. Hall 2 x N 34 Standard Deviation - Explanation Major limitation of the range as a measure of the variability of scores in a distribution is that it is based on two extreme, often unstable, scores. Another way to measure the spread of scores in a distribution might be to consider the variability of each of the scores in the distribution about the mean -- the most stable measure of central tendency -- from each of the scores in the distribution. May 24, 2017 Copyright 2000 - Robert J. Hall 35 Standard Deviation - Deviation Scores This process produces deviation scores that can be represented as: x X X These deviation scores provide a measure of the distance of each raw score from the mean of the distribution. May 24, 2017 Copyright 2000 - Robert J. Hall 36 Standard Deviation - S(X - `X) Since we want a measure that takes all deviation scores into account, the most obvious next step would be to sum up the deviation scores. Since the mean is the balance point of the distribution, however, we know that this attempt is doomed to failure: ( X X ) x 0 May 24, 2017 Copyright 2000 - Robert J. Hall 37 Standard Deviation - S(X - `X)2 How do we get around this problem? Square each deviation score to eliminate the sign and to preserve the relative distance between scores and the mean. This would yield: ( X X ) May 24, 2017 2 x 0 Copyright 2000 - Robert J. Hall 2 38 Standard Deviation - Control for N We’re finished, right? No. We still have a problem comparing across distributions because different distributions are likely to have different numbers of people. To control for the size of the distribution, we get a measure of the “average” amount of variability about the mean of the distribution. May 24, 2017 Copyright 2000 - Robert J. Hall 39 Standard Deviation - Average Deviation This average can be obtained by dividing the sum of squared deviation scores by N: ( X X ) N May 24, 2017 2 x Copyright 2000 - Robert J. Hall 2 N 40 Standard Deviation - Average Dispersion This formula will give us something like the average dispersion of scores in the distribution about the mean; it is defined as the variance. The last problem is to return the measure of variability back to the original score scale. This problem can be solved by taking the square root. May 24, 2017 Copyright 2000 - Robert J. Hall 41 Standard Deviation - Square Root ( X X ) N May 24, 2017 2 Copyright 2000 - Robert J. Hall x 2 N 42 Standard Deviation - Example (Deviation Score) X 1 3 5 7 9 Mean 5 5 5 5 5 x -4 -2 0 2 4 S(X - `X)2 2 x 16 4 0 4 16 x 2 N 40 8 5 2 x 40 x N May 24, 2017 2 8 2.83 Copyright 2000 - Robert J. Hall 43 Standard Deviation - Example (Raw Score) X 1 3 5 7 9 25 X2 1 9 25 49 81 165 S X 252 165 5 165 125 5 5 May 24, 2017 Copyright 2000 - Robert J. Hall X 2 2 N N 40 8 2.83 5 44 Organizational Chart - Variability Describing variability (differences between scores) Descriptive measures are used to describe a known sample of scores or a population In formulas final division uses N To describe sample variance compute Sx2 To describe population variance compute sx2 Taking square root gives Taking square root gives Sample standard deviation Sx May 24, 2017 Population standard deviation sx Inferential measures are used to estimate the population based on a sample In formulas final division uses N - 1 To estimate population variance compute sx2 Taking square root gives Estimate population standard deviation sx Copyright 2000 - Robert J. Hall 45