CLASS NOTES: Measures of Central Tendency; Variability

Central tendency: A statistical measure that determines a single score defining the center of a distribution. The most common method for summarizing and describing a distribution is to find a single value that defines the average score & can serve as a representative of the entire distribution.

Mean: The sum of the scores divided by the number of scores.

Mean characteristics:
- Changing the value of any score changes the mean.
- Adding or removing a score changes the mean, unless the score added or removed is equal to the mean.
- Multiplying or dividing each score by a constant multiplies or divides the mean by the same constant.

Calculation/Examples:
For a population: $$\mu = \frac{\sum X}{N}$$
For a sample: $$M = \frac{\sum X}{n}$$
Data set: 3, 7, 4, 6 (N = 4): $$\mu = \frac{20}{4} = 5$$

Application: Remember your symbols for a population versus a sample. µ is the mean symbol & N is the number of scores/subjects for a population. M is the mean symbol & n is the number of scores/subjects for a sample. Scores that represent a population are often written as a capital X. In some versions of the sample standard deviation formula you will see a lowercase x, and a lowercase x with a line over it stands for the sample mean. In this example the population has 4 scores: 3, 7, 4 & 6. Their sum is 20, which is divided by N (4). The population mean is 5.

Simple weighted mean: Combining 2 or more sets of scores & then finding the overall mean for the combined group. There is a more complex version for obtaining the correct mean of 2 data sets that have unequal n values. We will review that formula later on.

Calculation/Examples:
$$M = \frac{\sum X \ \text{(overall sum for the combined group)}}{n \ \text{(total number in the combined group)}} = \frac{\sum X_1 + \sum X_2}{n_1 + n_2}$$

Application: Basically, you are pooling two or more groups of data to find the overall mean for the combined groups. A more complex formula is used when the data sets are unequal.
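As a quick check on the two formulas above, here is a minimal Python sketch. The function names `mean` and `weighted_mean` are illustrative choices, not part of the notes. The first data set is the population of 4 scores used above; the two groups are the six-score intervention-time groups from the weighted mean example in these notes.

```python
def mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


def weighted_mean(*groups):
    """Overall mean of the combined group: (sum X1 + sum X2 + ...) / (n1 + n2 + ...)."""
    combined_sum = sum(sum(group) for group in groups)
    combined_n = sum(len(group) for group in groups)
    return combined_sum / combined_n


population = [3, 7, 4, 6]
print(mean(population))  # 5.0

# Multiplying every score by a constant multiplies the mean by that constant.
print(mean([x * 2 for x in population]))  # 10.0

group_1 = [6, 3, 5, 3, 4, 5]  # sum = 26
group_2 = [4, 1, 2, 3, 1, 3]  # sum = 14
print(round(weighted_mean(group_1, group_2), 2))  # 3.33
```

The sketch pools the raw scores, which mirrors the simple combined-group formula; it does not attempt the more complex version mentioned in the notes.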
Example (weighted mean): Scores represent minutes of intervention time.

Group 1 (n = 6):  6  3  5  3  4  5    ∑X = 26
Group 2 (n = 6):  4  1  2  3  1  3    ∑X = 14

$$M = \frac{26 + 14}{6 + 6} = \frac{40}{12} = 3.33$$

Application: In this example, Group 1 has 6 clients with a total of 26 minutes. Group 2 has 6 clients with a total of 14 minutes. Using the calculation, the weighted mean is 3.33. Remember that to calculate the weighted mean, you are drawing from 2 or more sets of scores. These sets of scores may or may not have the same n.

Median: The score that divides a distribution exactly in half. Exactly 50% of the individuals in a distribution have scores at or below the median. The median is equivalent to the 50th percentile. The goal of the median is to determine the precise midpoint of a distribution.

Calculation/Examples:
When N is an odd number: Data set: 3, 5, 8, 10, 11. List scores in order from lowest to highest; the middle score is the median. In this case, the median is 8.
When N is an even number: Data set: 3, 3, 4, 5, 7, 8. List scores in order from lowest to highest. In this case, 2 numbers are in the middle: 4 & 5. Locate the midpoint between the 2 middle scores. In this case, the midpoint is 4.5.
When there are several scores with the same value in the middle of the distribution: Data set: 1, 2, 2, 3, 4, 4, 4, 4, 4, 5. List scores in order from lowest to highest. In this case, the median is 4.

Mode: The score or category that has the greatest frequency. The mode can be used to determine the typical or average value for any scale of measurement. It is also the only measure of central tendency that can be used with data from a nominal scale of measurement.

Calculation/Examples:
Score:  7  6  5  4  3  2  1  0
f:      1  0  3  2  3  5  4  2

Application: Using this data, the score with the highest frequency is 2 (with a frequency of 5). Keep in mind that it is possible to have more than one mode (scores that share the same highest frequency). A distribution with two modes is called bimodal.

Selecting a Measure of Central Tendency: The best scenario is one where you have enough data to calculate all measures of central tendency, but that may not always be the case. The mean is usually the preferred measure of central tendency, but there may be times when the mean cannot be calculated or is not the best representative of the data.

When to use the median:
- When a data set contains extreme scores or the distribution is skewed. Data set: 5, 6, 4, 4, 3, 28, 6, 1, 33. With extreme scores such as 28 & 33 in the set, the median (which is 5) may be a better representation.
- When not all of the data is available, such as when there are missing values or you are presented with an open-ended distribution.

  Person:      1   2   3   4   5   6
  Time (min):  8   11  12  13  17  Never finished

  The mean cannot be accurately calculated with missing information, so the median is the better representation. The median time in this case is 12.5, with 3 scores below the median & 3 scores (including the undetermined score) above it.
- When there is no upper or lower limit listed for one of the categories.

  Person:  5 or more  4  3  2  1  0
  f:       3          2  2  3  6  4

  Again, the full information is not available to compute the mean.

When to use the mode:
- With nominal scales.

  Color:  Blue  Green  Yellow  Orange  Purple
  f:      9     6      2       3       1

  The mean and median cannot be calculated with nominal scales, only the mode. In this case, the mode is the color "blue" because it is the most frequently chosen color.
- When discrete variables are used. Examples: number of children, room numbers, etc. Variables that exist only in whole, indivisible categories that cannot be split into fractions are best represented by the mode.
- When describing shape. The mode describes the peak of a distribution, so it can be helpful in the description of shape.

Central Tendency & the Shape of a Distribution (figures not reproduced; the letters mark points on each curve):
- First distribution: A = mean, median & mode.
- Second distribution: A = mode, B = median, C = mean.
- Third distribution: A = mean, B = median, C = mode.
- Bimodal distribution: the mean & median are in the center, or "valley," of the distribution, and the two modes are at the tops of the two "hills."
- No mode.
This distribution has a mean & median (the center), but no single score has a greater frequency than the others.

Variability

Variability: Provides a quantitative measure of the degree to which scores in a distribution are spread out or clustered together.

A good measure of variability serves 2 purposes:
* It describes the distribution. It tells whether the scores are clustered close together or spread out over a large distance. Variability is usually defined in terms of distance: how much distance to expect between one score & another, or between an individual score & the mean.
* It measures how well an individual score represents the entire distribution. This is very important in inferential statistics, where small samples are used to answer questions about a population. Variability provides information about how much error to expect if you are using a sample to represent a population.

Application: The standard deviation (SD) is primarily a descriptive measure. It describes how variable or spread out the scores are in a distribution. It allows us to interpret individual scores: where one particular score lies in relation to the mean & in relation to the average distance of all scores from the mean. The mean & the SD are the most common values used to describe a set of data, and the SD is particularly important to inferential statistics.
- Adding a constant to each score will not change the SD.
- Multiplying each score by a constant causes the SD to be multiplied by the same constant.

Range: The difference between the upper real limit of the largest (maximum) X value & the lower real limit of the smallest (minimum) X value.

Application: The range lets you know how spread out your scores are. Although the range gives you some general information about your data, it does not give an accurate description of the variability for the entire distribution because it does not consider all the scores. Therefore, the range is considered a crude & unreliable measure of variability.
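The two SD properties and the range definition above can be checked numerically. The sketch below is illustrative only: the scores are made-up whole numbers, the helper name `range_with_real_limits` is hypothetical, and it assumes that real limits sit half a measurement unit above and below each score. It uses `statistics.pstdev` (the population SD) from the Python standard library.

```python
import math
import statistics


def range_with_real_limits(scores, unit=1):
    """Upper real limit of the largest score minus lower real limit of the smallest.

    Assumes real limits lie half a measurement unit on either side of a score.
    """
    half_unit = unit / 2
    return (max(scores) + half_unit) - (min(scores) - half_unit)


scores = [3, 7, 12, 8, 5, 10]  # illustrative whole-number scores

sd = statistics.pstdev(scores)
shifted_sd = statistics.pstdev([x + 4 for x in scores])  # add a constant
scaled_sd = statistics.pstdev([x * 2 for x in scores])   # multiply by a constant

print(math.isclose(shifted_sd, sd))      # True: adding a constant leaves the SD unchanged
print(math.isclose(scaled_sd, 2 * sd))   # True: the SD is multiplied by the same constant
print(range_with_real_limits(scores))    # 10.0, i.e. 12.5 - 2.5
```

The multiplier here is positive; the sketch does not cover what happens with a negative constant.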
Calculation/Examples (range):

Variable (weight):  170          180          190
Real limits:        169.5-170.5  179.5-180.5  189.5-190.5

$$\text{Range} = \text{URL of } X_{max} - \text{LRL of } X_{min}$$
Data set: 3, 7, 12, 8, 5, 10: 12.5 - 2.5 = 10

Interquartile Range: The range covered by the middle 50% of the distribution.

Calculation/Examples: $$\text{Interquartile range} = Q_3 - Q_1$$

Application: Since the interquartile range covers the middle 50% of the distribution, it spans the distance between the score that falls at the 25th percentile (Q1) & the score that falls at the 75th percentile (Q3). You can draw a frequency histogram or build a frequency distribution table to locate the 25% and 75% points & see which scores fall between these 2 boundaries. The interquartile range may be used as a measure of variability because it shows where the middle half of the scores lie & ignores scores that may be outliers. Its limitation, of course, is that not all scores are represented in its calculation.

Semi-Interquartile Range: Half of the interquartile range.

Calculation/Examples: $$\text{Semi-interquartile range} = \frac{Q_3 - Q_1}{2}$$

Application: Follow the instructions listed above for the interquartile range & then divide by 2. The limitation here is again that not all scores are represented.

Deviation: The distance of one particular score from the mean.

Application: The standard deviation is the most commonly used & most important measure of variability. The SD measures variability by considering the distance between each score & the mean, & determines whether the scores are generally near or far from the mean. Basically, it approximates the average distance of all scores from the mean. It matters for inferential statistics as well, where the goal is to detect meaningful & significant patterns in research results. Deviation here simply describes how one particular value deviates from the mean, or the center of the distribution.
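For the interquartile and semi-interquartile ranges described above, a short Python sketch may help. It relies on `statistics.quantiles` from the standard library (Python 3.8+) with its default "exclusive" method. That quartile convention is an assumption on my part: hand methods based on a frequency table or histogram, as suggested in the notes, can give slightly different Q1 and Q3 values. The function names are illustrative, and the scores are the six values from the range example.

```python
import statistics


def interquartile_range(scores):
    """Q3 - Q1, using the default quartile convention of statistics.quantiles."""
    # With n=4 the function returns the three cut points Q1, Q2 (median), Q3.
    q1, _q2, q3 = statistics.quantiles(scores, n=4)
    return q3 - q1


def semi_interquartile_range(scores):
    """Half of the interquartile range."""
    return interquartile_range(scores) / 2


scores = [3, 7, 12, 8, 5, 10]
iqr = interquartile_range(scores)
semi_iqr = semi_interquartile_range(scores)
print(iqr, semi_iqr)
```

Because only Q1 and Q3 enter the calculation, replacing the largest score with a much larger one would leave the result nearly unchanged, which is the outlier resistance the notes describe.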
Deviation score = X - µ

2 steps to finding the deviation:
1) Determine the deviation, or distance from the mean, for each individual score. Example: X = 53; µ = 50; 53 - 50 = 3.
2) Calculate the mean of the deviation scores, starting from their sum, Σ(X - µ).

X:      8   1   3   0     (µ = 12/4 = 3)
X - µ:  +5  -2  0   -3
Σ(X - µ) = 0

Application: Remember your notation. X is a particular score, while µ is the notation for the population mean. If the mean (µ) of a set of data is 50 & a particular score (X) is 53, then the deviation of the score of 53 from the mean of 50 is 3. Remember that Σ indicates the "sum." Since the mean is the average of a set of scores, each score lies above the mean, below it, or exactly on it. If the score lies above the mean, this direction is marked with a "+" sign in front of the deviation. If the score lies below the mean, it is marked with a "-" sign. To find the average deviation score for a data set, you first add together all the deviation scores. That sum should always be 0, which is why the deviations are squared in the next step.

Sum of squares: The sum of the squared deviation scores. This is the numerator of the standard deviation formula.

Calculation/Examples: $$SS = \sum (X - \mu)^2$$
1) Find each deviation score.
2) Square each deviation score.
3) Sum the squared deviations.

Application: The superscript "2" means that you multiply the value by itself. Ex: 5² = 5 x 5 = 25. Take note of this formula, as it may be used in future formulas to establish "deviation."

Standard deviation formula for a population:
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
The statistical notation indicating the standard deviation for a population is the symbol σ.

Standard deviation formula for a sample:
$$s = \sqrt{\frac{\sum (X - M)^2}{n - 1}}$$

(These formulas were designed to obtain the standard deviation, that is, the average distance of all values from the mean, or center, of the distribution.
However, see Population Variance below to help direct you to a more usable, operational formula for obtaining the standard deviation.)

Population Variance: The mean squared deviation. Variance is the mean (or average) of the squared deviation scores.

Application: This measure of variability is based upon squared distances. That helps with inferential statistical methods, but it may not be the best descriptive measure of variability. To correct for this issue, the calculation is then "square-rooted":
$$\text{standard deviation} = \sqrt{\text{variance}}$$
The statistical notation indicating the standard deviation for a sample is s. The statistical notation for a sample mean is M. The variability of a population is usually greater than the variability of a sample. That is why the sample formula uses "n - 1," called the degrees of freedom.

Standard Deviation: The square root of the variance (the mean squared deviation). This tells you the average distance of all the scores in a data set from the mean by combining deviation, sum of squares & variance.

Application: The more workable way to obtain the standard deviation modifies the individual steps into a single computational formula.

Computational formula for standard deviation for a population:
$$\sigma = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}}$$

Computational formula for standard deviation for a sample:
$$s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}}$$

Example: Data set: 1, 3, 6, 11

Step 1: Place each of your X values (each number in the data set) under the X column.
Step 2: The second column is where you square each individual X value (X²). Here you multiply each X value by itself (ex: X² = 3² = 3 x 3 = 9).
Step 3: Sum all of your X values at the bottom of your X column. Also sum all of your squared values at the bottom of the X² column; this sum is written ΣX².

X      X²
1      1
3      9
6      36
11     121
ΣX = 21        ΣX² = 167
(ΣX)² = 441

$$s = \sqrt{\frac{167 - \frac{441}{4}}{4 - 1}}$$

Remember that Σ is the symbol for "sum," or adding values together.
It sometimes helps with placement to draw "cross arrows" from your table to the formula so that you know you are putting the right values in the correct places in the formula. In Step 3, there are no parentheses, indicating that you are adding up values that are already squared: ΣX². In Step 4, the ΣX is in parentheses, (ΣX), indicating that you must sum the X values first before squaring that sum: (ΣX)². This follows the order of operations rules, where you compute values within parentheses first before computing outside the parentheses.

Step 4: At the bottom of your X column, beneath where you summed your X values, multiply your ΣX (sum of X) by itself, which is written (ΣX)²: 21² = 21 x 21 = 441.

Step 5: Identify your N value, the number of scores in your data set (N for population data, n for sample data).
n = 4
(ΣX)² = 441
ΣX² = 167
At this point, you have all of the values you need. Match up the values that correspond to the symbols in the formula.
$$s = \sqrt{\frac{167 - \frac{441}{4}}{4 - 1}}$$

Step 6: Working with the numerator first: order of operations indicates that we divide before subtracting. So we divide (ΣX)²/n first (441/4 = 110.25).
$$s = \sqrt{\frac{167 - 110.25}{4 - 1}}$$

Step 7: Subtract: ΣX² - 110.25 (167 - 110.25 = 56.75).
$$s = \sqrt{\frac{56.75}{4 - 1}}$$

Step 8: Since we are using a sample for this example, our denominator is n - 1 to control for variations between population & sample groups. Had we been working with population data, we would have only N as our denominator. For this example, we compute n - 1 (4 - 1 = 3).
$$s = \sqrt{\frac{56.75}{3}}$$

Step 9: Now we have 2 values left: one in the numerator & one in the denominator. So we divide (56.75/3 = 18.916666).
$$s = \sqrt{18.916666}$$

Step 10: DO NOT FORGET TO SQUARE ROOT YOUR FINAL VALUE. You do this by pressing the square root button on your calculator.
$$s \approx 4.3493$$

Application: When working a formula that has a numerator & a denominator, complete all calculations within the numerator & all calculations within the denominator separately until you have one value on top & one value on the bottom. Then complete the calculation. One of the most frequent errors made when computing the standard deviation is forgetting to take the square root of the final value. When writing or typing out your formula, you may draw the square root sign over the formula, insert the square root symbol from Word, or write "sq. rt." in front of the formula so that you do not forget this final step. It is recommended that you write out both the population & sample standard deviation formulas on sticky notes & lay them next to every problem you are working, so that you can double-check that each value is in the right spot in the formula & that you do not miss a step! Remember that the more decimal places you keep when rounding, the more accurate your results will be.

Degrees of freedom: The df for the sample variance are defined as df = n - 1. The df determine the number of scores in the sample that are independent and free to vary. This is why n - 1 is used in the sample formula for standard deviation: it corrects for bias in sample variability (since sample variability is typically smaller than population variability). So, df is extremely important for inferential statistics.

Calculation/Examples: df = n - 1

Application: See the example above of working with a sample formula, which shows the df, or n - 1, in the denominator underneath the square root. Degrees of freedom is not a calculated error. Instead, it controls for differences between sample outcomes & population outcomes, as there is always some variance between the two.
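The computational formula worked through in Steps 1 to 10 can be sketched in Python as a way to double-check hand calculations. The function names are illustrative and not part of the notes. The data set is the one from the example (1, 3, 6, 11), and the only difference between the sample and population versions is the denominator (n - 1, the degrees of freedom, versus N).

```python
import math
import statistics


def sum_of_squares(scores):
    """Computational form of SS: sum of X squared minus (sum of X) squared over n."""
    sum_x = sum(scores)                         # ΣX
    sum_x_squared = sum(x * x for x in scores)  # ΣX²: square first, then sum
    return sum_x_squared - sum_x ** 2 / len(scores)


def sample_sd(scores):
    """Sample standard deviation: divide SS by df = n - 1, then take the square root."""
    degrees_of_freedom = len(scores) - 1
    return math.sqrt(sum_of_squares(scores) / degrees_of_freedom)


def population_sd(scores):
    """Population standard deviation: divide SS by N, then take the square root."""
    return math.sqrt(sum_of_squares(scores) / len(scores))


data = [1, 3, 6, 11]
print(sum_of_squares(data))        # 56.75, i.e. 167 - 441/4
print(round(sample_sd(data), 4))   # 4.3493

# Cross-check against the standard library's own implementations.
print(math.isclose(sample_sd(data), statistics.stdev(data)))       # True
print(math.isclose(population_sd(data), statistics.pstdev(data)))  # True
```

Keeping `sum_of_squares` separate from the final division mirrors the advice above to finish the numerator before touching the denominator, and the square root is applied last in both functions.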