CENTRAL TENDENCY, VARIABILITY, NORMAL CURVE, STANDARD SCORES!
LECTURE #2, PSYC218 ANALYSIS OF BEHAV. DATA, DR. OLLIE HULME, 2011, UBC

Housekeeping!
Earn 10 bucks in an experiment at 7.30pm
Feedback
Assignment released onto Vista tonight or early tomorrow, due by 5pm next Thursday, in-class hard copy only
Install SPSS today; no excuses later in the week if you have problems with your disc. A patch for Mac will be on the website tonight.
Coglab 'Memory span' due next Thursday noon
Assignment 1 due next Thursday 5pm
Survey Monkey due next Tuesday noon: www.surveymonkey.com/s/RSZPDPV. You need your Coglab ID # (for tracking).

Roadmap!
Syllabus test
Central tendency (Ch4)
Variability (Ch4)
Normal curve (Ch5)
Standard scores (Ch5)
SPSS Demo
Experiment for money

Syllabus quiz!
Content
a) Only material in the lectures can be tested on the exam
b) Material from lectures and textbook can be tested on the exam
Exams
a) 1 x MT, 1 x Final, all in class, all multiple choice
b) 2 MT in class, 1 final 30th July, mixture of multiple choice and computational questions
c) 1 MT in class, 1 final 30th July, computational questions only
Grades
Combined total for HSP and clicker participation is
a) 5% b) 6% c) 8% d) 2%

Central Tendency & Variability!
Central Tendency
Allows us to describe a group of scores in terms of an average, representative or typical value
Mean, Median, Mode
How the distribution clusters toward the centre

Central Tendency: Mean!
Same as the average (e.g. average exam grade)
Symbolized as $\bar{X}$ (X bar) when the calculation is made on sample data (most common)
Symbolized as $\mu$ (mu) when the calculation is made on population data

Calculating the Mean!
Mean = the sum of the scores divided by the number of scores
Sample data: $\bar{X} = \frac{\sum X}{n}$
Population data: $\mu = \frac{\sum X}{N}$
Both formulas are identical; they just use different symbols so that we can differentiate sample from population
Calculate the mean on the following sample data: X = [110, 103, 121]
$\bar{X} = \frac{\sum X}{n} = \frac{110 + 103 + 121}{3} = 111.33$

Properties of Mean!
Sensitive to the exact value of all the scores in the distribution
If a score is changed, the mean will always change, since every score is used in its calculation
Sample Data: 110, 103, 121 ($\bar{X} = 111.33$)
Change one score: 110 changed to 109 ($\bar{X} = 111$)
If my IQ goes up by 1 point, then the class average IQ will also increase

Properties of the Mean!
It is very sensitive to extreme scores because every score is used in its calculation
Sample Data: 109, 103, 121 ($\bar{X} = 111$)
Add an extreme score: 1001 ($\bar{X} = 333.5$)

Properties of the Mean!
The sum of the deviations about the mean equals zero: $\sum (X - \bar{X}) = 0$
Sample Data: 109, 103, 121, with $\bar{X} = 111$
$\sum (X - \bar{X}) = (109 - 111) + (103 - 111) + (121 - 111) = (-2) + (-8) + (10) = 0$

Properties of the Mean!
The sum of the squared deviations of all the scores about their mean is a minimum: $\sum (X - \bar{X})^2$ is a minimum
While the sum of the squared deviations about the mean does not equal 0, it is smaller than if the squared deviations were taken about any value other than the mean

X      (X - 110)^2            (X - 111)^2            (X - 112)^2
109    (109 - 110)^2 = 1      (109 - 111)^2 = 4      (109 - 112)^2 = 9
103    (103 - 110)^2 = 49     (103 - 111)^2 = 64     (103 - 112)^2 = 81
121    (121 - 110)^2 = 121    (121 - 111)^2 = 100    (121 - 112)^2 = 81
Sum    171                    168                    171

The smallest value (168) is for the mean

Properties of the Mean!
Under most circumstances it is less subject to sampling variation than the other measures of central tendency (median or mode)
Remember, a sample is a subset of the population. So we could take many different samples from the same population and calculate the mean, median and mode for each of these samples. The means, modes and medians would vary from sample to sample.
The means will vary (differ) least across the samples
This is the main reason the mean is used in inferential statistics rather than the median or mode.

The Overall Mean!
Sometimes you need to calculate the overall mean from a collection of means of smaller samples
Let's say we have 3 exams: number of students in exam 1 = $n_1$, mean for exam 1 = $\bar{X}_1$, and likewise for exams 2 and 3
$\bar{X}_{overall} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2 + n_3 \bar{X}_3}{n_1 + n_2 + n_3}$
The mean for the 1st exam is 72 and 97 students write that exam.
The mean for the 2nd exam is 68 and 90 students write that exam. The mean on the 3rd exam is 65 and 88 students write that exam.
What is the overall mean for all exams?
$\bar{X}_{overall} = \frac{(97)(72) + (90)(68) + (88)(65)}{97 + 90 + 88} = \frac{18824}{275} = 68.45$

The Median!
The scale value below which 50% of the scores fall
If the scores are in a grouped frequency distribution, the median equals the 50th percentile point ($P_{50}$)
The median is the score that is slap bang in the middle of the ranking

How to find the Median!
If the scores are raw (ungrouped), then rank order the scores
If there is an odd number of scores, the median is the centermost score.
If there is an even number of scores, the median is the average of the two centermost scores.
Odd Number of Scores
Scores: 9, 4, 2, 9, 7, 5, 2, 6, 3
Rank Ordered: 2, 2, 3, 4, 5, 6, 7, 9, 9
Median = 5
Even Number of Scores
Scores: 9, 4, 2, 9, 7, 5, 2, 6
Rank Ordered: 2, 2, 4, 5, 6, 7, 9, 9
Median = (5 + 6) / 2 = 5.5

Properties of the Median!
Less sensitive to extreme scores than the mean
Sample Data: 101, 109, 103, 105, 121
$\bar{X} = 107.80$, Mdn = 105
Extreme scores will be very high or very low scores, so they will be at the ends of the rank ordered scores and therefore won't be included in the calculation of the median.
Change the last score to an extreme score: 101, 109, 103, 105, 1001
$\bar{X} = 283.80$, Mdn = 105
"Holy crap, that was a big one!"
"I can hardly contain my indifference."

Properties of the Median!
The median is more subject to sampling variability than the mean, but less so than the mode
If we took many different samples from the same population and calculated the mean, median and mode for each of these samples, the medians for these samples would vary.
The medians for the different samples would vary more than the means but less than the modes.

The Mode!
The mode is the most frequent score in the distribution
For scores in a grouped frequency distribution, the mode is the midpoint of the interval with the highest frequency
For raw (ungrouped) scores, no calculations are necessary; just inspect the data to find the most frequent score.

Central Tendency & Symmetry!
Negative skew: Mean < Median < Mode
Bell-shaped: Mean = Median = Mode
Positive skew: Mean > Median > Mode

Skew!
Negative (-ve) skew is like the bell shape, but with extra stuff at the end toward the negative end of the scale. Most values are higher.
Positive (+ve) skew is like the bell shape, but with extra stuff at the end toward the positive end of the scale. Most values are lower.

Variability!
Variability
Allows us to describe how spread out or dispersed the scores are
Range, Standard Deviation, Variance

Measures of Variability: Range!
The difference between the highest and lowest scores in the distribution
Range = Highest Score - Lowest Score
Scores: 2, 3, 7, 18, 6
Range = 18 - 2 = 16
A very crude measure of variability, as it only considers the two most extreme scores

Standard deviation!
The standard deviation is a commonly used measure of the variability in the data.
We will calculate it in a number of steps: calculating the deviation scores, then the sum of squares, then plugging it all together in a simple equation

Standard Deviation: Deviation scores!
Let's first consider deviation scores, which tell us how far away a raw score is from the mean
A deviation score is simply the difference between X and the mean: $X - \bar{X}$
Calculate deviation scores for the following sample data: 109, 103, 121

X      X - mean
109    109 - 111 = -2
103    103 - 111 = -8
121    121 - 111 = 10

Remember! The sum of the deviations about the mean equals zero: (-2) + (-8) + (10) = 0

Standard Deviation: Sum of Squares!
The sum of squares is the sum of the squared deviation scores
$SS = \sum (X - \bar{X})^2$ (sum of squares, sample data)
$SS = \sum (X - \mu)^2$ (sum of squares, population data)

X      X - mean    (X - mean)^2
109    -2          (-2)^2 = 4
103    -8          (-8)^2 = 64
121    10          (10)^2 = 100

$SS = \sum (X - \bar{X})^2 = 168$

The Standard Deviation!
We are interested in some measure of the average deviation about the mean, so for sample data we divide SS by n - 1
Average squared deviation: $\frac{SS}{n - 1} = \frac{168}{2} = 84$
We are still in squared units, so now we need to take the square root of the average squared deviation
The standard deviation: $s = \sqrt{84} = 9.17$
Why n - 1 instead of just n?
This is a trick used to prevent the sample from underestimating the standard deviation of the population

The Standard Deviation!
The standard deviation is symbolized as s for sample data:
$s = \sqrt{\frac{SS}{n - 1}} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
The standard deviation is symbolized as $\sigma$ for population data:
$\sigma = \sqrt{\frac{SS}{N}} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Note: the only difference between the formula for the sample standard deviation (s) and the population standard deviation ($\sigma$) is the denominator (n - 1 vs. N).

The Deviation Method!
Calculate the standard deviation using the following sample data: 1, 2, 3, 6, 8
Step 1: Calculate the mean
$\bar{X} = \frac{1 + 2 + 3 + 6 + 8}{5} = 4$
Step 2: Calculate the deviation scores and the squared deviation scores

X    X - mean    (X - mean)^2
1    -3          9
2    -2          4
3    -1          1
6    2           4
8    4           16

Continued…!
Step 3: Calculate the sum of squares
$SS = 9 + 4 + 1 + 4 + 16 = 34$
Step 4: Calculate s
$s = \sqrt{\frac{34}{5 - 1}} = \sqrt{8.5} = 2.92$
If we had data from the whole population:
$\sigma = \sqrt{\frac{34}{5}} = \sqrt{6.8} = 2.61$

Standard Deviation Properties!
1. It is a measure of dispersion relative to the mean
2. It is sensitive to each score in the distribution
3. If only one score is shifted closer to the mean, the standard deviation will decrease
4. If only one score is shifted further from the mean, the standard deviation will increase
5. It is stable with regard to sampling fluctuations
And that, sir, is why it is so widely used
It is not the average deviation, as some people on the internet might say!

Variance!
Variance = square of the standard deviation
Another way of expressing the spread of the data
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$ (sample variance)
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$ (population variance)

Illustration!
You recently completed a memory test where you were only able to remember 8 of the words the experimenter read to you. How well did you do relative to everyone else?
You don't have access to the data or to a grouped frequency distribution, so you can't determine your percentile rank.
So how can you figure out how your memory compares to others' memory?
Well, what if I gave you the standard deviation and the mean, would that help your fragile little mind?

Z-score!
= a transformed score that expresses how many standard deviations a raw score is above or below the mean
$z = \frac{X - \bar{X}}{s}$ (sample data)
$z = \frac{X - \mu}{\sigma}$ (population data)
Positive z-score: raw score is above the mean
Negative z-score: raw score is below the mean
The value of the z-score indicates how many standard deviations the raw score is from the mean
Essentially, how far is the score away from the mean, in units of the standard deviation

Transform Raw Score to Z-Score!
You recalled 8 words. The mean of the sample is 7 and the standard deviation is 1.88. What is your z-score?
X = 8, $\bar{X} = 7$, s = 1.88
$z = \frac{8 - 7}{1.88} = 0.53$
Your score is 0.53 standard deviations above the mean

Comparing Apples and Oranges!
You participated in another experiment. This time the researcher was assessing your verbal abilities. You got a score of 28. The mean of the sample is 32.33 and the standard deviation is 9.52. What is your z-score?
X = 28, $\bar{X} = 32.33$, s = 9.52
$z = \frac{28 - 32.33}{9.52} = -0.45$
Your verbal ability score is 0.45 standard deviations below the mean.
Relative to the rest of the group, is your memory or your verbal ability better?

Clicker Question!
Calculate the z-score for a score of 25. Assume $\bar{X} = 15$ and s = 5.
a) 1
b) 2
c) 3
d) 4
e) 5

Z Scores to Raw Scores!
To convert a z-score to a raw score, you need to multiply the z-score by the standard deviation and then add the mean
$X = (z)(s) + \bar{X}$ (sample data)
$X = (z)(\sigma) + \mu$ (population data)
It is pretty easy to go back and forth between z-score and raw score if you know the mean and standard deviation
This is simply a re-arrangement of the equation $z = \frac{X - \bar{X}}{s}$ (or the equivalent one for population data)
If you don't know how to do this, you need to brush up on basic algebra and re-arranging equations. Again, Khanacademy.com is highly recommended

Practice Transforming Z to Raw!
Your friend Edgar does the same 2 experiments and determines that he has a z-score of -1.60 on the memory test and a z-score of 1.02 on the verbal abilities test. What were his raw scores?
$X = (z)(s) + \bar{X}$
Memory Test: X = (-1.60)(1.88) + 7 = 3.99
Verbal Test: X = (1.02)(9.52) + 32.33 = 42.04

Characteristics of Z-scores!
The mean of a distribution of z-scores always equals 0
Z-score transformations involve subtracting the mean from each raw score, so the overall mean of the z-scores will be 0
A z-score is just a deviation score in units of standard deviation
The sum of the deviations about the mean always equals 0, so the average or mean deviation will also equal 0 (0/n = 0)
Since we know that the mean of the deviations = 0, we know that the mean of the z-scores will also be zero

Characteristics of Z Scores!
The standard deviation of a distribution of z-scores always equals 1
Z-scores are deviation scores in the metric of the standard deviation.
The formula involves dividing the deviation score by the standard deviation. This puts the deviation score in the metric of standard deviation units.
So a z-score of 1 means your score is 1 standard deviation above the mean. A z-score of -1 means your score is 1 standard deviation below the mean.
So it follows that the standard deviation of a distribution of z-scores will always be 1 (of course the distribution will speak in its own language).

Characteristics of Z Scores!
Z-scores have the same shape as the set of raw scores
Z-score transformations only change the values of the scores in a simple way: each raw score has the mean subtracted and is then divided by the standard deviation.
The shape of the distribution of scores and the relative positions of the scores remain intact
In technical jargon, this is a linear transformation

Characteristics of Z Scores!
Z-scores have the same shape as the set of raw scores
If the distribution of raw scores is negatively skewed, the distribution of z-scores will also be negatively skewed (the scaling of the x-axis will change though)
If the distribution of raw scores is a normal bell-shaped curve, then so will be the distribution of z-scores
[Figures: raw-score distribution next to the matching z-score distribution, for a negatively skewed case and a bell-shaped case]

Clicker Question!
A z-score of 1.75 means…
a) the raw score is below the mean
b) the raw score is 1.75 units above the mean
c) the raw score is 1.75 standard deviations above the mean
d) the average raw score is 1.75 units from the mean

The Normal Curve!
Many variables in nature fall on a normal curve
Important for many inferential statistics (tests of significance)
Considered the most prominent distribution in statistics
Many variables are normally distributed: height, weight, IQ
[Figure: bell-shaped curve, frequency (y-axis) against the variable (x-axis)]

The Normal Curve!
The normal curve is often used as a first approximation to describe random variables that tend to cluster around a single mean value.
Commonly used throughout psychology, natural sciences and social sciences as a simple model for complex phenomena
Its prevalence is explained by the central limit theorem, which shows that under many conditions the sum of a large number of random variables is distributed approximately normally.

Non-normality!
Not everything is normally distributed
Quantities that grow exponentially, such as prices, incomes or populations, are often skewed to the right, and hence may be better described by other distributions, such as the log-normal distribution or Pareto distribution.
E.g. reaction times are often not normally distributed

Areas Under the Normal Curve!
The normal curve has a precise equation which describes it.
For all normal distributions there are special relationships between the mean, the standard deviation and the areas under the curve (Pagano p96)
[Figure: normal curve with the x-axis marked from 3 standard deviations below the mean to 3 standard deviations above the mean; 68.26% of the area lies within 1 standard deviation of the mean]

Areas Under the z-distribution!
These percentages are just something we know to be true of all normal distributions, known as the '68-95-99.7' rule, or the empirical rule.
The same relationship holds for normal data that is transformed into z-scores, since z-scores are in units of standard deviations
[Figure: the same curve with the x-axis in z-scores from -3 to 3; 68.26% of the area lies between z = -1 and z = 1]

IQ scores !
IQ is normally distributed; the average is 100 and the standard deviation is 16
IQ: $\mu = 100$, $\sigma = 16$ (note: these are population parameters)
For any normal distribution this relationship always holds
[Figure: normal curve with the x-axis in z-scores from -3 to 3; 68.26% of the area lies between z = -1 and z = 1]

IQ scores interpretation !
% of people with an IQ between 100 and 132?
34.13% of people have an IQ between 100 and 116
13.59% of people have an IQ between 116 and 132
34.13% + 13.59% = 47.72%

Clicker Question!
Based on the previous figure, what percentage of scores have z-score values of -3 or lower?
a) .13%
b) .26%
c) 2.15%
d) 2.28%
e) 4.56%

Clicker Question!
Based on the previous figure, what percentage of scores have z-score values greater than 2?
a) Less than .13%
b) .13%
c) .26%
d) 2.28%
e) 4.56%
2.15% + 0.13% = 2.28%

Example! Calculate percentile rank from z!
Ronald: "I have an IQ of 107"
Assume we have population data
What is Ronald's percentile rank (what percent of the population has a lower IQ than him)?
Remember, for IQ the mean = 100 and the standard deviation = 16
Step 1: Calculate his z-score
$z = \frac{X - \mu}{\sigma} = \frac{107 - 100}{16} = 0.44$
Step 2: Draw a normal curve and place the z-score on the curve (to aid understanding)
Step 3: Look up the percentage below this z-score
Percentile rank can be calculated from the area of the curve
[Figure: normal curve, frequency against z from -3 to 3, with z = .44 marked]

Z-score table concept !
In the same way that someone has calculated the areas under the curve for the z-scores (-3, -2, -1, 1, 2, 3) in this graph, they have calculated them for the full range of z-scores in between and put them in a z-score table, found in Table A of Appendix D of Pagano
Using this table allows you to calculate the percentage of scores with a z-score above or below any z-score of interest.

Z-score table – Appendix D !
Column A lists all of the various possible z-scores
Note that it only lists positive scores, because the normal curve is symmetrical, so the corresponding values for negative scores are identical.
Column B lists the proportion of scores that fall between the z-score (listed in column A) and the mean
Column C lists the proportion of scores that fall between the z-score (listed in column A) and the closest tail of the distribution
For positive z-scores it gives the proportion of scores that are higher than the z-score; for negative z-scores it is the proportion lower

Clicker Question!
You look up a z-score of -.50 in Table A. Column C shows that .3085 corresponds to that z-score. This means that:
a) 30.85% of the scores in the distribution are lower than the z-score
b) 30.85% of the scores in the distribution are higher than the z-score
c) 30.85% of the scores lie between the mean and the z-score
d) 80.85% of the scores in the distribution are higher than the z-score
e) 80.85% of the scores in the distribution are lower than the z-score

Back to Ronald's percentile rank!
Step 3: Find the corresponding area under the curve by referring to the z-score table
Column B method:
Find z = .44 in column A.
Column B shows the proportion of scores that fall between the mean and the z-score: .1700 x 100 = 17%
The total area below this z-score is then 50% (always 50% below the mean) + 17% = 67%
Percentile rank for an IQ of 107 = 67%
67% of scores fall below Ronald's IQ score

Back to Ronald's percentile rank!
Step 3: Find the corresponding area under the curve by referring to the z-score table
Column C method:
Find z = .44 in column A
Column C shows the proportion of scores that fall above our z-score (since our score is positive): .3300 x 100 = 33%
If 33% fall above, then 67% fall below
Percentile rank for a score of 107 = 67%

Further Illustration!
Determine what percentage of people received a score between the score you received (z = .53) and the score your friend received (z = -1.60)?
There are no negative z-scores in the table, so you have to look up positive 1.60, which gives the same answer
Column B method:
Z-score of -1.60: column B = 0.4452, so 44.52% of people fall between this score and the mean
Z-score of .53: column B = 0.2019, so 20.19% of people fall between this score and the mean
Therefore 64.71% (44.52% + 20.19%) of the scores fall between your score and your friend's score

Percentile Points!
If the memory test was given to an entire population and $\mu = 7$ and $\sigma = 1.88$
Percentile point for 75%: what is the score below which 75% of the scores fall? What is $P_{75}$?
Step 1: Using Table A, locate the area in Column C closest to .2500 (25%) and find its z-score
Closest area = 0.2514, z value = 0.67
Step 2: Transform z = .67 to a raw score.
$X = (z)(\sigma) + \mu$
X = (.67)(1.88) + 7
X = 8.26
So 75% of the population received a memory test score lower than 8.26

Further Illustration!
If the memory test was given to an entire population and $\mu = 7$ and $\sigma = 1.88$
What are the scores that bound (that define the boundary of) the middle 90% of the distribution?
[Figure: normal curve with the middle 90% shaded and 5% in each tail]

Further Illustration!
Step 1: Using Table A, locate the area in column C closest to .0500 (5%)
You want the score above which 5% of scores fall and the score below which 5% of scores fall, as these scores bound the middle 90%: (100 - 90)/2 = 5%
Find the corresponding z-score: z = 1.65
The other z-score will be -1.65, because both boundaries are the same distance from the mean
Step 2: Transform z = 1.65 and z = -1.65 to raw scores via $X = (z)(\sigma) + \mu$
z = 1.65: X = (1.65)(1.88) + 7 = 10.10
z = -1.65: X = (-1.65)(1.88) + 7 = 3.90
The scores 3.90 and 10.10 bound the middle 90% of the distribution
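
The table look-ups in the worked examples above can be cross-checked in code. The following is a minimal sketch, not part of the lecture: it assumes Python 3.8+ and uses the standard library's `statistics.NormalDist` in place of Table A. The helper names (`z_score`, `raw_score`, `percentile_rank`, `area_between`, `percentile_point`, `middle_bounds`) are hypothetical, chosen only for illustration. Because the code does not round z to two decimals the way the table does, its answers can differ from the slide values in the second decimal place.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1): the z-distribution.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def raw_score(z, mean, sd):
    """Inverse of z_score: X = (z)(sd) + mean."""
    return z * sd + mean


def percentile_rank(x, mean, sd):
    """Percentage of a normal population scoring below x."""
    return 100.0 * STANDARD_NORMAL.cdf(z_score(x, mean, sd))


def area_between(z_low, z_high):
    """Proportion of scores falling between two z-scores."""
    return STANDARD_NORMAL.cdf(z_high) - STANDARD_NORMAL.cdf(z_low)


def percentile_point(percent, mean, sd):
    """Raw score below which `percent` percent of a normal population falls."""
    z = STANDARD_NORMAL.inv_cdf(percent / 100.0)
    return raw_score(z, mean, sd)


def middle_bounds(percent, mean, sd):
    """Raw scores that bound the middle `percent` percent of the distribution."""
    tail = (100.0 - percent) / 2.0
    return (percentile_point(tail, mean, sd),
            percentile_point(100.0 - tail, mean, sd))


if __name__ == "__main__":
    # Ronald's IQ of 107 (mu = 100, sigma = 16); the table method gave 67%.
    print(round(percentile_rank(107, 100, 16), 2))
    # Area between z = -1.60 and z = .53; the table method gave 64.71%.
    print(round(100 * area_between(-1.60, 0.53), 2))
    # P75 on the memory test (mu = 7, sigma = 1.88); the table method gave 8.26.
    print(round(percentile_point(75, 7, 1.88), 2))
    # Middle 90% on the memory test; the table method gave 3.90 and 10.10.
    print(tuple(round(b, 2) for b in middle_bounds(90, 7, 1.88)))
```

The functions take the mean and standard deviation as arguments, so the same code serves both the IQ example and the memory-test example. The table is still what you will use in the exam; the code is only a way to check your look-ups.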