Central Tendency

CENTRAL TENDENCY:
• A statistical measure that identifies a single score that is most typical or representative of the entire group
• Usually, a value that reflects the middle of the distribution is used, because this is where most of the scores pile up
• No single measure of central tendency works best in all circumstances, so there are 3 different measures: mean, median, and mode. Each works best in a specific situation

Central Tendency

Examples: The average height of men in Montgomery County is 5’ 6”. The average salary in ACME Inc is $57,000.

[Figure: salary scale at ACME Inc showing $450,000, $150,000, $100,000, $57,000, $50,000, $37,000, $30,000, and $20,000. Labels: ARITHMETICAL AVERAGE = $57,000; MEDIAN = $30,000 (the one in the middle, 12 above him, 12 below); MODE = $20,000 (occurs most frequently)]

Central Tendency (Mode)

MODE:
• The score or category that has the greatest frequency; the most common score
• To find the mode, simply locate the score that appears most often
  – In a frequency distribution table, it will be the score with the largest frequency value
  – In a frequency graph, it will be the tallest bar or point

Example: A sample of class ages is given.

Age   f
23    1
22    0
21    2
20    0
19    3
18    2

* The age with the highest frequency is 19, with a frequency of 3; therefore, the mode is 19.

Central Tendency (Mode)

• A distribution may have more than one mode, or peak: a distribution with 2 modes is said to be bimodal; a distribution with more than 2 modes is said to be multimodal

Example: A sample of class ages. . .
Age   f
23    1
22    3
21    1
20    1
19    3
18    2

* Age 22 and age 19 both have a frequency of 3; if this distribution were graphed, there would be 2 peaks; therefore this distribution is bimodal: both 22 and 19 are modes

Central Tendency (Mode)

Advantages:
• Easiest to determine
• The only measure of central tendency that can be used with nominal (categorical) data
Disadvantages:
• Sometimes is not a unique point in the distribution (bimodal or multimodal)
• Not sensitive to the location of scores in a distribution
• Not often used beyond the descriptive level

Central Tendency (Median)

MEDIAN:
• The score that divides the distribution exactly in half; 50% of the individuals in a distribution have scores at or below the median

Central Tendency (Median)

Method when N is an odd number: List the scores from lowest to highest; the middle score on the list is the median

Example: The ages of a sample of class members are 24, 18, 19, 22, and 20. What is the median value?
• List the scores from lowest to highest: 18, 19, 20, 22, 24
• The middle score is 20; therefore, that is the median

Method when N is an even number: List scores in order from lowest to highest and locate the point halfway between the middle two scores

Example: The ages of a sample of class members are 18, 19, 20, 22, 24 and 30. What is the median age?
The scores are already listed from lowest to highest; select the middle two scores (20, 22) and find the point halfway between them:

median = (20 + 22) / 2 = 21

Central Tendency (Median)

Advantages:
• Is less affected by extreme scores than the mean is; is better for skewed distributions
  Example: compare 2 samples of class ages
  1) 18, 19, 20, 22, 24
  2) 18, 19, 20, 22, 47
  The median in both cases is 20; it is not thrown off by the extreme score of 47
• Can be used with ordinal data
• The only index that can be used with open-ended distributions (distributions without a lower or upper limit for one of the categories)
Disadvantages:
• Can’t be used with nominal level data
• Not sensitive to the location of all scores within a distribution

Central Tendency (Mean)

MEAN (µ, x̄):
• The arithmetical average of the scores
• The amount that each individual would receive if the total (Σx) were divided up equally between everyone in the distribution
• Computed by adding all of the scores in the distribution and dividing that sum by the total number of scores
• Population mean: µ = Σx / N
• Sample mean: x̄ = Σx / n

Central Tendency (Mean)

• Note that, while the computations would yield the same answer, the symbols differ for a population (µ, N) and a sample (x̄, n)

Example: x̄ = Σx / n = (18 + 19 + 19 + 21 + 23) / 5 = 100 / 5 = 20

Central Tendency (Mean)

Advantages:
• Sensitive to the location of every score in a distribution
• Least sensitive to sample fluctuation (if we were to take several samples, these sample means would differ less than if we compared the medians or the modes from the samples)
Disadvantages:
• May only be used with interval or ratio level data
• Sensitive to extreme scores, and therefore may not be desirable when working with highly skewed distributions
  Example: compare 2 samples of class ages
  1) 18, 19, 20, 22, 24    x̄₁ = 20.6
  vs.
  2) 18, 19, 20, 22, 47    x̄₂ = 25.2

Central Tendency

• Select the method of central tendency that gives you the most information, yet is appropriate for the type of data you have. In general, use the mean if it’s appropriate. If your data are skewed, or if they are not measured on an interval or ratio scale, use the median. If your data are measured on a nominal scale, the mode is the only appropriate measure of central tendency

Selecting the best measure of central tendency

• The mean, median, and mode for a ‘normal’ distribution are the same
[Figure: symmetrical single-peaked distribution with the mean, median, and mode marked at the same point]

• This distribution is symmetrical, so the mean and the median are the same, but it is “bi-modal”: it has two modes
[Figure: symmetrical distribution with two peaks; mean and median marked at the center, a mode marked at each peak]

Selecting the best measure of central tendency

• It is also possible to have a distribution with the same mean and median, but no mode, as in the case of a rectangular distribution
[Figure: rectangular distribution with the mean and median marked]

Selecting the best measure of central tendency

• For positively skewed distributions the mode would be the lowest, followed by the median, then the mean
[Figure: positively skewed distribution with the mode, median, and mean marked]

• For negatively skewed distributions the mean would be the lowest, followed by the median, then the mode
[Figure: negatively skewed distribution with the mean, median, and mode marked]

Measures of Variability

VARIABILITY (a.k.a.
“Spread”):
• The degree to which scores in a distribution are spread out or clustered together
• This is important because we need to know not only what the average score in a distribution is, but also how near or far the majority of the scores are in relation to this central value
• Measures of variability include: range, sums of squares (including deviation and mean deviation scores), variance, and standard deviation

Measures of Variability (Range)

RANGE: The distance between the largest score and the smallest score in the distribution

There are 2 methods for computing the range:
1) Subtract the lower real limit (LRL) for the lowest score in the distribution (x_min) from the upper real limit (URL) for the highest score (x_max):
   Range = URL(x_max) − LRL(x_min)
2) Subtract the minimum score from the maximum score and add 1 to the difference:
   Range = x_max − x_min + 1

Example: find the range for the following set of scores: 6, 6, 8, 9, 10
x_max = 10, x_min = 6, URL(x_max) = 10.5, LRL(x_min) = 5.5
Range = 10 − 6 + 1 = 5, or Range = 10.5 − 5.5 = 5

Measures of Variability (Range)

Advantages:
• Easy to obtain
• Gives a quick approximation of variability
Disadvantages:
• Only sensitive to the 2 extreme scores; insensitive to all intermediate scores
  – For 2 sets of scores, 1) 1, 8, 9, 9, 10, and 2) 1, 3, 5, 7, 10, the range is identical although the scores are distributed very differently
• Substantial sample fluctuation: can easily change from sample to sample
• Little used beyond the descriptive level

Measures of Variability

• Deviation: score − mean, (X_i − X̄)
• Mean deviation: the average absolute deviation score, Σ|X_i − X̄| / n
• Sums of Squares: the sum of the squared deviations around the mean, SS = Σ(x − µ)²

Measures of Var.
(Sums of Squares)

Example: Two sets of quiz scores:

Set 1
x     x − µ    (x − µ)²
1     −4       16
3     −2       4
4     −1       1
7     2        4
7     2        4
8     3        9
Σx = 30, Σ(x − µ) = 0, SS = 38, µ = 5.00

Set 2
x     x − µ    (x − µ)²
4     −1       1
5     0        0
5     0        0
5     0        0
5     0        0
6     1        1
Σx = 30, Σ(x − µ) = 0, SS = 2, µ = 5.00

Both distributions have the same mean, but the actual scores are dispersed differently

Measures of Variability (Variance)

VARIANCE:
• The average squared deviation from the mean
• Provides a control for sample size (as N increases, SS will naturally increase)
• σ² denotes population variance
• Formulae: σ² = SS / N, or σ² = Σ(x − µ)² / N

From the previous example, σ² = 38 / 6 = 6.333 for the first set and σ² = 2 / 6 = 0.333 for the second set (each set has N = 6 scores)

Measures of Var. (Standard Deviation)

STANDARD DEVIATION:
• Measure of variability that approximates the average deviation (distance from the mean) for a given set of scores
• σ denotes population standard deviation

Definitional formula: σ = √σ²
For the first problem in the previous example, σ = √6.333 = 2.517
• Our scores differ from the mean by an average of about 2.517 points

Measures of Var. (Standard Deviation)

Properties of the standard deviation:
• Standard deviation provides a measure of the average distance from the mean
• When the standard deviation is small, the scores are close to the mean (the curve is narrow), and when the standard deviation is large, scores are typically spread out farther from the mean (the curve is wide)
• Standard deviation is a very important component of inferential statistics

Measures of Var. (Standard Deviation)

• If a constant is added to or subtracted from each score, the standard deviation does not change
• For example, if an instructor chooses to “curve” a set of test scores by adding 10 points to each score, the distance between individual scores doesn’t change. All of the scores are just shifted up 10 points.
• The mean would increase by 10 points. However, the standard deviation remains the same

Measures of Var.
(Standard Deviation)

• If each score in a distribution is multiplied or divided by a constant, the standard deviation of that distribution is also multiplied or divided by the same constant
• Thus, if an instructor changes a 50-point exam into a 100-point exam by multiplying everyone’s score by 2, the “spread” of the scores is also multiplied by 2. For example, on the old scale the scores could range from 27 to 50 (a difference of 23 points), while on the new scale the scores range from 54 to 100 (a difference of 46 points). Note that this difference is the range, not the standard deviation
• In this example, the standard deviation would also be multiplied by 2.

Measures of Var. (Standard Deviation)

Advantages:
• Sensitive to the location of all scores
• Less sample fluctuation: changes less from sample to sample
• Widely used in both descriptive and advanced statistical procedures
Disadvantages:
• Sensitive to extreme scores: highly skewed distributions can have a negative impact
• Both the range and standard deviation can only be applied to interval or ratio level scales of measurement

Measures of Variability

Population vs. Sample variability
• The variance and standard deviation formulas we have examined so far are population formulas.
These tend to underestimate the population variability when used on a sample; in other words, these are biased statistics
• Thus, when we are computing variances and standard deviations on samples, we correct for this bias by altering the formula; the corrected formula provides a more accurate estimate of the population values

Measures of Variability

Variance formula for a sample estimating a population: s² = SS / (n − 1)
Standard deviation formula for a sample estimating a population: s = √(SS / (n − 1)), or s = √s²

Degrees of Freedom

• We know that in order to calculate variance we must know the mean (X̄)
• This limits the number of scores that are free to vary
• Degrees of freedom (df) are defined as the number of scores in a sample that are free to vary
• df = n − 1, where n is the number of scores in the sample

Degrees of Freedom Cont.

Picture Example
• There are five balloons: one blue, one red, one yellow, one pink, & one green.
• If 5 students (n = 5) are each to select one balloon, only 4 will have a choice of color (df = 4). The last person will get whatever color is left.

Degrees of Freedom Cont.

Statistical Example
• Given that there are 5 students (n = 5) with a mean score of 10 (X̄ = 10)
• There are four degrees of freedom: df = n − 1 = 5 − 1 = 4
• In other words, four of the scores are free to vary, but the fifth is determined by the mean
• If we make the first four scores 9, 10, 11, & 12, then the fifth score must be 8.

Measures of Variability

Statistical term      Population value    Sample value
Mean                  µ = Σx / N          x̄ = Σx / n
Variance              σ² = SS / N         s² = SS / (n − 1)
Standard deviation    σ = √σ²             s = √s²

Measures of Variability

Choosing a Measure of Variability:
• When selecting a measure of variability, choose the one that gives you the most information, but is appropriate for your data situation. In general, use the standard deviation.
If your data are skewed, or if they are not measured on an interval or ratio scale, use the range
• (Sums of squares and variance are steps in computing the standard deviation, but they provide little useful information on their own.)

Standard Scores and Distributions

Standard scores:
• Transform individual scores (raw scores) into standard (transformed) scores that give a precise description of where the scores fall within a distribution
• Use standard deviation units to describe the location of a score within a distribution
• When tests are said to be ‘curved’, the scores are transformed
• One way of transforming scores is to add (or subtract) a constant to each score
• When a constant is added or subtracted, the mean changes by the same amount as the scores, but the standard deviation is unaffected

Transformations of scores

X     X+3
3     6
4     7
5     8
6     9

• The mean of the X distribution is 4.5 and the standard deviation is 1.29
• The mean of the X+3 distribution is 7.5 and the standard deviation is still 1.29
• When a constant is added to or subtracted from every score in a distribution, the shape of the distribution does not change; it simply shifts along the x-axis.
[Figure: frequency graph of the X distribution shifted along the x-axis]

Transformations of scores

• A second way of transforming scores is to multiply (or divide) every score in the distribution by a constant
• This changes the mean as well as the standard deviation, each by the same factor as the scores

X     X(3)
3     9
4     12
5     15
6     18

• The mean of the X’s is 4.5 and the standard deviation is 1.29
• The mean of the X’s multiplied by 3 is 13.5 and the standard deviation is 3.87
• When every score in a distribution is multiplied or divided by a constant, the distribution changes: it is stretched or compressed along the x-axis, so its spread changes as well as its location.
[Figure: frequency graph of the X distribution stretched along the x-axis]
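The effect of adding or multiplying by a constant, and the sum-of-squares arithmetic from the quiz-score example, can be checked numerically. The following is a minimal sketch in Python using only the standard `statistics` module; the helper name `describe` and the variable names are illustrative choices, not part of the slides. It assumes the sample (n − 1) standard deviation for the transformation check, since that is the formula that reproduces the 1.29 quoted for the scores 3, 4, 5, 6.

```python
import statistics

def describe(scores):
    """Return (mean, sample standard deviation) for a list of scores."""
    # statistics.stdev divides the sum of squares by n - 1 (sample formula)
    return statistics.mean(scores), statistics.stdev(scores)

x = [3, 4, 5, 6]
shifted = [score + 3 for score in x]  # add a constant to every score
scaled = [score * 3 for score in x]   # multiply every score by a constant

for label, data in [("X", x), ("X+3", shifted), ("X(3)", scaled)]:
    mean, sd = describe(data)
    print(f"{label}: mean = {mean:.2f}, sd = {sd:.2f}")
# X: mean = 4.50, sd = 1.29
# X+3: mean = 7.50, sd = 1.29
# X(3): mean = 13.50, sd = 3.87

# Sum of squares and variance for the first set of quiz scores
quiz = [1, 3, 4, 7, 7, 8]
mu = statistics.mean(quiz)
ss = sum((score - mu) ** 2 for score in quiz)  # sum of squared deviations
pop_var = ss / len(quiz)                       # population formula: SS / N
sample_var = ss / (len(quiz) - 1)              # sample formula: SS / (n - 1)
print(f"SS = {ss:.0f}, population variance = {pop_var:.3f}, "
      f"sample variance = {sample_var:.3f}")
# SS = 38, population variance = 6.333, sample variance = 7.600
```

Adding 3 leaves the standard deviation at 1.29 while multiplying by 3 triples it to 3.87. The last line shows how the same sum of squares gives a larger variance under the sample formula than under the population formula.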
Standard Scores and Distributions

STANDARDIZED DISTRIBUTIONS: are composed of transformed scores with predetermined values for µ and σ (regardless of the values in the raw score distribution)

Examples:
• IQ scores are standardized with µ = 100 and σ = 15
• SAT scores are standardized with µ = 500 and σ = 100

Standard Scores and Distributions

Z-scores:
• Standard scores that specify the precise location of each raw score in a normal distribution in terms of standard deviation units
• Consist of 2 parts:
  – The sign (+ or −) indicates whether the score is located above or below the mean
  – The magnitude of the actual number indicates how far the score is from the mean in terms of standard deviations

Standard Scores and Distributions

Examples:
• In a distribution of test scores with µ = 100, σ = 15, what is the z-score for a score of 130? For a score of 85?
  – With a score of 130, z = 2; it is a positive z-score because 130 is above the mean (it’s higher than 100); the magnitude is 2 because we can add exactly 2 standard deviations to the mean (100 + 15 + 15) and obtain our score (130)
  – With a score of 85, z = −1; it is below the mean (making it a negative z-score) and it is exactly 1 standard deviation from the mean (100 − 15 = 85)

Standard Scores and Distributions

Formula: z = (x − µ) / σ

• The numerator is a deviation (distance) score that indicates how far away from the mean your score of interest is, thus providing the sign (+ or −) of the z-score
• Dividing by σ expresses the “distance” score in standard deviation units (a z-score). This works the same way as if you knew the number of quarts a bucket held, but you wanted to express that amount in terms of gallons: you just divide the number of quarts by four to get the number of gallons

Standard Scores and Distributions

Examples:
• A distribution of exam scores has µ = 25 and σ = 3.6. You scored 29. What is your z-score?
z = (x − µ) / σ = (29 − 25) / 3.6 = 1.11
  – Thus, you scored 1.11 standard deviations above the mean
• What is your z-score if you scored 22 on the exam?
z = (x − µ) / σ = (22 − 25) / 3.6 = −0.83
  – This time, you fell 0.83 standard deviations below the mean

Standard Scores and Distributions

You can also calculate a person’s raw score (x) when you are provided with a z-score

Formula: x = µ + zσ

Example: A person has a z-score of 1.5 for the SAT math test (µ = 500, σ = 100). What is his raw score?
x = 500 + 1.5(100) = 500 + 150 = 650
Thus, his z-score indicates that his SAT math score was 650

Standard Scores and Distributions

Characteristics of a z-score distribution:
• Shape: The distribution will be exactly the same shape as the distribution of raw scores
• Mean: The mean is always 0, regardless of the raw score distribution
• Standard deviation: The standard deviation of a z-score distribution will always be 1, regardless of the raw score distribution

Standard Scores and Distributions

Using z-scores for making comparisons:
• One benefit to using z-scores is that they allow comparisons between distributions with different characteristics by providing a standard metric or scale (standard deviation units)

EXAMPLE: A student scored 29 on a statistics exam (µ = 24, σ = 3), and 50 on a biology exam (µ = 50, σ = 5). On which exam did the student perform better?
• By standardizing scores using standard deviation units, we can compare scores in 2 completely different distributions (compare apples and oranges); simply convert both raw scores into z-scores, then compare the z-scores to each other

Statistics: z = (29 − 24) / 3 = 1.67
Biology: z = (50 − 50) / 5 = 0

The student performed better on the statistics exam (1.67 standard deviations above the mean) than on the biology exam (exactly at the mean).

Standard Scores and Distributions

Other standardized distributions:
• Many people don’t like the fact that z-score distributions have negative scores and decimal places, so they use other, similar standardized distributions which avoid the negative connotations of a “−” sign
• For example:
  – IQ scores are viewed in terms of a standardized distribution with µ = 100, σ = 15
  – t-scores distribution (which we’ll learn more about later) with µ = 50, σ = 10

Standard Scores and Distributions

How do we standardize raw scores into a distribution we want?

EXAMPLE: A set of exam scores has µ = 43, σ = 4. We would like to create a new standard distribution with µ = 60, σ = 20. What would the new standardized value be for a score of 41 on the exam?

First, change the raw score into a z-score (using the procedure we just learned about):
z = (x − µ) / σ = (41 − 43) / 4 = −0.50

Second, change the z-score into the new standardized score, using x = µ + zσ:
Std_new = µ_new + z·σ_new
Std_new = 60 + (−0.50)(20) = 50

The Normal Distribution

• The normal distribution is not a single distribution; rather, it is an infinite set of distributions that can be described using the mean (X̄) and standard deviation (s).
• The shape of the distribution describes many existing variables, e.g.
weight

The Normal Distribution

• By definition the area under a normal distribution = 1.0
• The normal shape can also be used to determine the proportion of an area in the distribution
• For example, the area between the mean and one standard deviation is about 34%
[Figure: normal curve with the x-axis marked at −2s, −1s, µ, 1s, 2s; the area between µ and 1s is labelled 34%]

The Normal Distribution

[Figure: normal curve divided by lines one standard deviation apart; section areas labelled 2.5%, 13.5%, 34%, 34%, 13.5%, 2.5%, with further labels 50%, 68%, and 95%]
• Each line represents 1 standard deviation; the percentages refer to the entire shaded area

The Normal Distribution

• The areas have been calculated for all z-scores
• Remember that a z-score transforms a raw score into the number of standard deviations it is away from the mean
• Using z-scores allows us to determine proportions or probabilities for normal distributions

The Unit Normal Table

• The unit normal table is a table that contains proportions in a normal distribution associated with z-score values
• See Table A from Pagano
• There are two important things to note:
  – the table includes the area in the “body” and in the “tail”
  – there are no negative z-values

The Unit Normal Table

• Remember that the normal distribution is symmetrical; because of this, the proportion will be the same whether the score is positive or negative
[Figure: normal curve with the tails beyond z = −2.5 and z = 2.5 shaded, each with area .0062]
• Whether the z-score is 2.5 or −2.5, the area beyond, or the area in the tail, is still .0062

The Unit Normal Table

• When you are dealing with the unit normal table it is sometimes confusing whether you are looking at the tail or the body, especially when you have a negative z-score
• Perhaps the best way to deal with this confusion is to draw a picture and shade the area you are looking for

The Unit Normal Table

What proportion of people had a score higher than z = 1.5?
1. Draw a picture of the area you are looking for
2. Draw the line where the z-score is located (z = 1.5)
3. Shade the area you are asked for (people who scored higher, so shade to the right of the line)
4.
Then look up in the table the area in the tail for z = 1.5

ANSWER: the proportion of people with a z-score higher than 1.5 is .0668
[Figure: normal curve with the area to the right of z = 1.5 shaded; proportion = .0668]

The Unit Normal Table

You can also use the table to determine areas between scores, e.g., what proportion of people are between z = −.5 & z = .5?
1. Draw the picture of the area you are looking for (we know that the whole area under the curve equals 1)
2. Draw the lines where the z-scores are located
3. Shade the area you are asked for
4. The light areas are the tails provided in the table: Area A = .31, Area B = .31. So the shaded area equals 1 − Area A − Area B = 1 − .31 − .31 = .38

ANSWER: the proportion of people between z = −.5 & z = .5 is 0.38
[Figure: normal curve with the area between z = −.5 and z = .5 shaded (proportion = .38); the unshaded tails are Area A = .31 and Area B = .31]

The Unit Normal Table

Another method would be:
1. Draw the picture of the area you are looking for
2. Draw the lines where the z-scores are located
3. Shade the area for the body of z = .5 (Area C = .69)
4. Shade the area for the tail of z = −.5 (Area D = .31)
5. Subtract the tail of z = −.5 from the body of z = .5: Area C − Area D = .69 − .31 = .38

ANSWER: the proportion of people between z = −.5 & z = .5 is 0.38
[Figure: normal curve with the body of z = .5 (Area C = .69) and the tail of z = −.5 (Area D = .31) marked; the area between z = −.5 and z = .5 has proportion = .38]
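The z-score formulas and the table lookups above can be reproduced without a printed table. The sketch below is one possible illustration in Python, assuming the standard normal cumulative area is computed from `math.erf` in the standard library. The function names (`z_score`, `raw_score`, `restandardize`, `area_below`, `area_above`) are hypothetical helpers chosen for this example and do not come from Pagano's table.

```python
import math

def z_score(x, mu, sigma):
    """z = (x - mu) / sigma"""
    return (x - mu) / sigma

def raw_score(z, mu, sigma):
    """x = mu + z * sigma"""
    return mu + z * sigma

def restandardize(x, mu, sigma, mu_new, sigma_new):
    """Convert a raw score to a z-score, then to a new standardized scale."""
    return raw_score(z_score(x, mu, sigma), mu_new, sigma_new)

def area_below(z):
    """Proportion of a normal distribution at or below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_above(z):
    """Proportion of a normal distribution above z."""
    return 1.0 - area_below(z)

print(round(z_score(29, 25, 3.6), 2))      # 1.11
print(raw_score(1.5, 500, 100))            # 650.0
print(restandardize(41, 43, 4, 60, 20))    # 50.0

print(round(area_above(1.5), 4))           # 0.0668 (tail for z = 1.5)
print(round(area_above(2.5), 4))           # 0.0062 (tail for z = 2.5)
between = area_below(0.5) - area_below(-0.5)
print(round(between, 2))                   # 0.38
```

For a positive z, `area_below` corresponds to the "body" column of the table and `area_above` to the "tail" column; for a negative z the roles swap, which is why drawing the picture first remains useful.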