Scales, Transformations, & Norms

Norms

Norm-Referenced Tests: one of the most useful ways of describing a person's performance on a test is to compare his or her score with the scores of some other person or group of people.
• Norms are average scores computed for a large, representative sample of the population.
• The arithmetic average (mean) is used to judge whether a score on the scale is above or below the average for the population of interest.
• A representative sample is required to ensure that meaningful comparisons are made.

Norms (cont.)
• No single population can be regarded as the normative group.
• When norms are collected from the test performance of groups of people, these reference groups are labeled normative or standardization samples.

Norms (cont.)
• Which sample is selected as the normative group depends on the particular research question.
• For meaningful comparisons, the normative sample must be representative of the examinee and relevant to the research question being asked.
• For example, norms for a test of attitudes toward federalism that are based only on students in the province of Quebec might be very useful for interpreting scores within Quebec, but their generalizability to other parts of the country (e.g., Yukon or Toronto, Ontario) would be suspect.

Sample Groups
Although the three terms below are often used interchangeably, they have different meanings.
Standardization Sample - the group of individuals on whom the test is standardized in terms of scoring procedures, administration procedures, and the development of the test's norms (e.g., the sample described in the technical manual).
Normative Sample - any group from which norms are gathered (e.g., norms collected after the test is published).
Reference Group - any group of people against which test scores are compared.
(e.g., a designated group such as the students in 3090.03 or World Champions)

Types of Norms
Norms can be developed:
• Locally
• Regionally
• Nationally
Normative data can be expressed as:
• Percentile Ranks
• Age Norms
• Grade Norms

Local Norms
• Test users may wish to evaluate scores against reference groups drawn from a specific geographic or institutional setting. For example, norms can be created for the employees of a particular company or the students of a certain university.
• Regional and national norms examine much broader groups.

Subgroup Norms
• When large samples are gathered to represent broadly defined populations, norms can be reported in aggregate or separated into subgroup norms.
• Provided that the subgroups are of sufficient size and fairly representative of their categories, they can be formed in terms of:
- Age
- Sex
- Occupation
- Education Level
or any other variable that may have a significant impact on test scores or yield comparisons of interest.

Percentile Ranks
• Percentile ranks are the most common form of norms and the simplest method of presenting test data for comparative purposes.
• The percentile rank is the percentage of the norm group that earned a raw score less than or equal to that individual's score. For example, a score at the 50th percentile indicates that the individual did as well as or better than 50% of the norm group.
• When a test score is compared with several different norm groups, its percentile rank may change. For example, a score on a mathematical reasoning test may earn a lower percentile rank when compared with mathematics students than when compared with music students.

Age Norms
• Age norms describe scores in terms of the average or typical age of the respondents who achieve a specific test score.
• Age norms can be developed for any characteristic that changes systematically with age.
• In establishing age norms, we need to obtain a representative sample at each of several ages and measure the particular age-related characteristic in each of these samples.
• There is considerable variability among children of the same age, which means that some children at one age will perform similarly to children at other ages.

Grade Norms
• Most commonly used in school settings.
• Similar to age norms, except that the baseline is grade level rather than age.
• There is considerable variability among children in the same grade, which means that some children in one grade will perform at the level of children in other grades.
• Be extremely careful when interpreting grade norms not to fall into the trap of assuming that, because a child obtains a certain grade equivalent on a particular test, he or she is functioning at that grade level in all areas.

Evaluating Suitability of a Normative Sample
• How large is the normative sample?
• When was the sample gathered?
• Where was the sample gathered?
• How were individuals identified and selected?
• What was the composition of the normative sample?
- age, sex, ethnicity, education level, socioeconomic status

Caution When Interpreting Norms
• Norms may not be based on samples that adequately represent the type of population to which the examinee's scores are compared.
• Normative data can become outdated very quickly.
• The normative sample may be too small to yield stable norms.

Setting Standards/Cutoffs
• Rather than finding out how you stand compared with others, it can be useful to compare your performance on a test with some external standard. For example, if most people in a class get an F on a test and you get a D, your performance compared with the normative group is good; judged against the grading standard, however, your score is still poor.
Criterion-Referenced Tests - assess your performance against some set of standards (e.g., school tests, the Olympics).
Cutoff Scores - should the cutoff be set at 1 SD or 2 SD from the mean?
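The percentile-rank definition under Percentile Ranks (the percentage of the norm group scoring at or below a given raw score), and the contrast with a criterion-referenced standard, can be sketched in a few lines of Python. This is a minimal sketch: the helper name `percentile_rank`, both norm groups, and the pass mark are hypothetical, illustrative values, not real normative data.

```python
def percentile_rank(score, norm_group):
    """Percentage of the norm group whose raw score is <= score."""
    if not norm_group:
        raise ValueError("norm group must not be empty")
    at_or_below = sum(1 for s in norm_group if s <= score)
    return 100.0 * at_or_below / len(norm_group)

# Hypothetical raw scores on a mathematical reasoning test (illustrative only).
math_students = [22, 25, 27, 28, 30, 31, 33, 35, 36, 38]
music_students = [12, 15, 17, 18, 20, 21, 23, 24, 26, 29]

score = 26
print(percentile_rank(score, math_students))   # 20.0
print(percentile_rank(score, music_students))  # 90.0

# Criterion-referenced view: the same score judged against a fixed standard.
PASS_MARK = 30  # hypothetical cutoff, not taken from any real test
print(score >= PASS_MARK)  # False
```

The same raw score of 26 sits at the 90th percentile of the music students but only the 20th percentile of the mathematics students, and it still falls short of the fixed pass mark. Which comparison matters depends on the question being asked.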
Raw Scores
• Raw scores for instruments that use Likert scales (treated as interval or ordinal) are computed by assigning a score to each response and totaling the item scores.
- For positively phrased items, e.g., "I think things will turn out right": 5=Always, 4=Often, 3=Sometimes, 2=Seldom, 1=Never
- For negatively phrased items, the scoring is reversed: 1=Always, 2=Often, 3=Sometimes, 4=Seldom, 5=Never
• The raw score is the sum of the scores for the pertinent items.
• The problem with raw scores is that they are fairly meaningless without some benchmark against which to make a comparison (e.g., what would a raw score of 30 on an Optimism scale mean?).

Transformations
• Raw scores (i.e., the simplest counts of behaviour sampled by a measuring procedure) do not always provide useful information.
• It is often necessary to re-express, or transform, raw scores into a more informative scale.
• The simplest form of transformation is changing raw scores to percentages. For example, if a student answers 35 of 50 questions correctly on a test, that student's score could be re-expressed as 70%.

Linear Transformations
• A linear transformation changes the units of measurement while leaving the relationships among scores unaltered.
• An advantage of this procedure is that the normally distributed scores of tests with different means and score ranges can be meaningfully compared and averaged.
• The most familiar linear transformation is the z score.

Standard Scores
• Standard scores allow each obtained score to be compared with the same reference value.
• They facilitate comparison between an obtained score and the scores of other individuals (i.e., the normative sample), as well as comparison among the various scales and instruments.
• Standard scores are calculated from raw scores so that each scale and subscale has the same mean (average) score and standard deviation. For example, IQ scores are transformed so that the average score is 100 with an SD of 15.
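The scoring and transformation steps above (Likert raw scores with reversed negatively phrased items, the percentage transformation, and a linear conversion to an IQ-style standard score) can be sketched in Python. This is a minimal sketch under stated assumptions: the five responses, the choice of items 2 and 4 as negatively phrased, and the normative mean of 18 and SD of 4 are all hypothetical, as are the function names.

```python
# Five-point Likert key for positively phrased items (from the Raw Scores slide).
LIKERT = {"Always": 5, "Often": 4, "Sometimes": 3, "Seldom": 2, "Never": 1}

def item_score(response, negatively_phrased):
    """Score one response; reverse the 1-5 key for negatively phrased items."""
    value = LIKERT[response]
    return 6 - value if negatively_phrased else value

def raw_score(responses, negative_items):
    """Sum the item scores; negative_items holds indices of reverse-scored items."""
    return sum(item_score(r, i in negative_items) for i, r in enumerate(responses))

def percent_correct(n_correct, n_items):
    """Simplest transformation: re-express a raw count as a percentage."""
    return 100.0 * n_correct / n_items

def to_standard_score(raw, norm_mean, norm_sd, new_mean=100.0, new_sd=15.0):
    """Linear transformation to a scale with a chosen mean and SD (IQ-style by default)."""
    z = (raw - norm_mean) / norm_sd
    return new_mean + new_sd * z

# Hypothetical respondent; items 2 and 4 are assumed to be negatively phrased.
responses = ["Always", "Often", "Never", "Sometimes", "Seldom"]
raw = raw_score(responses, negative_items={2, 4})
print(raw)                                                  # 21
print(percent_correct(35, 50))                              # 70.0
print(to_standard_score(raw, norm_mean=18.0, norm_sd=4.0))  # 111.25
```

Reverse scoring uses 6 minus the response value, which maps 5 to 1 and 1 to 5 on a five-point key. The standard-score conversion keeps the relative positions of all scores intact; only the units change.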
Z Scores
• A z score tells how many standard deviations someone is above or below the mean. Simply put, the mean of the distribution is given a z value of zero (0), and each standard deviation counts as one unit.
• A z score of -1.4 indicates that someone is 1.4 standard deviations below the mean. In a normal distribution, someone in that position would have done as well as or better than about 8% of the students who took the test.
• To calculate a z score, subtract the mean from the raw score and divide the result by the standard deviation. For example, with a raw score of 15, a mean of 10, and a standard deviation of 4: 15 - 10 = 5, and 5 / 4 = 1.25, so the z score is 1.25.
• Z scores can take negative values, which can be difficult to explain to test users. How would you explain to an examinee that his or her z score is -1.5? For this reason it is often convenient to perform a linear transformation on z scores to convert them to values that are easier to record or explain. The general form of such a transformation is:
$X' = \bar{X}' + s' z$, where $\bar{X}'$ and $s'$ are the desired mean and standard deviation of the new scale.

T Scores
• T scores (a type of standardized score) are a linear transformation of z scores into a standard form.
• The scale has its mean set at 50 and its standard deviation at 10: $T = 50 + 10z$.
• An advantage of T scores is that negative values practically never occur: a T score falls below zero only when z is below -5.

Area Transformations
• Area transformations do more than simply put scores on a new and more convenient scale; they change the point of reference.
• Area transformations adjust the mean and standard deviation of the distribution into convenient units.
• The advantages of area transformations follow from the central role of the normal distribution. Out of the infinite number of possible empirical distributions of test scores, the normal distribution is the one most frequently assumed and approximated. It has also been studied in considerably greater detail than other possible test score distributions.
• Normalization thus allows knowledge of the properties of the standard normal distribution to be applied to the interpretation of the obtained scores.

Normal Distribution Curve
• Many human variables fall on a normal or close-to-normal curve, including IQ, height, weight, lifespan, and shoe size.
• Theoretically, the normal curve is bell-shaped, with its highest point at the center. The curve is perfectly symmetrical, with no skewness (skewness being the extent to which symmetry is absent): if you fold it in half at the mean, both sides are exactly the same.
• From the center, the curve tapers on both sides, approaching the X axis but never touching it. In theory, the normal distribution ranges from negative infinity to positive infinity.
• Because of these properties, knowing only the mean and standard deviation lets us estimate what proportion of people will fall within any given range of scores on a variable.

Normal Distribution
The bell-shaped curve has the following properties:
1. It is bilaterally symmetrical (the right and left halves are mirror images).
2. The limits of the curve are plus and minus infinity, so the tails of the curve never quite touch the baseline.
3. About 68% of the total area under the curve lies between one standard deviation below the mean and one standard deviation above the mean.
4. About 95% of the total area lies between two standard deviations below the mean and two standard deviations above the mean.
5. About 99.7% of the total area lies between three standard deviations below the mean and three standard deviations above the mean.

Skewness
• Skewness is the nature and extent to which symmetry is absent.
Positive Skewness - occurs when relatively few of the scores fall at the high end of the distribution. For example, positively skewed examination results may indicate that a test was too difficult.
Negative Skewness - occurs when relatively few of the scores fall at the low end of the distribution.
For example, negatively skewed examination results may indicate that a test was too easy.

Standard Deviations
The standard deviation represents the typical distance of scores from the mean (strictly, it is the square root of the average squared deviation from the mean).
Use of Standard Deviations with Norms:
• Knowing the population average tells us whether a particular respondent scored above or below that average, but not how far above or below it the score falls. The standard deviation supplies this information.
• Scores within 1 SD of the average lie in the middle cluster of the population. Scores between 1 and 2 SDs from the average are moderately above or below the average, and scores more than 2 SDs from the average are markedly above or below the average.
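The z-score, T-score, normal-curve, and standard-deviation material above can be checked numerically with Python's standard-library `statistics.NormalDist` (Python 3.8+). This is a sketch, not a scoring procedure from any test manual: the function names are hypothetical, the exact band boundaries (a z of exactly 1 or 2 counted in the inner band) are an assumption, and `normalize` is one possible area transformation. It swaps in a midpoint percentile-rank convention (scores below plus half of ties) because the "less than or equal to" definition gives the top score a rank of 100%, where the normal-curve inverse is undefined.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def z_score(raw, mean, sd):
    """How many SDs a raw score lies above (+) or below (-) the mean."""
    return (raw - mean) / sd

def t_score(z):
    """Linear transformation of z to a scale with mean 50 and SD 10."""
    return 50 + 10 * z

def percent_at_or_below(z):
    """Percentage of a normal distribution lying at or below z."""
    return 100 * STD_NORMAL.cdf(z)

def describe(z):
    """Verbal band used on the 'Use of Standard Deviations with Norms' slide."""
    distance = abs(z)
    if distance <= 1:
        return "middle cluster"
    direction = "above" if z > 0 else "below"
    if distance <= 2:
        return f"moderately {direction} average"
    return f"markedly {direction} average"

def normalize(norm_group):
    """Area transformation: midpoint percentile rank -> normal-curve z, per score."""
    n = len(norm_group)
    result = []
    for score in norm_group:
        below = sum(1 for s in norm_group if s < score)
        ties = sum(1 for s in norm_group if s == score)
        proportion = (below + 0.5 * ties) / n  # always strictly between 0 and 1
        result.append(STD_NORMAL.inv_cdf(proportion))
    return result

z = z_score(15, mean=10, sd=4)
print(z)                                 # 1.25
print(t_score(z))                        # 62.5
print(round(percent_at_or_below(-1.4)))  # 8
print(describe(-1.4))                    # moderately below average

# Area within k SDs of the mean: about 68.3, 95.4, and 99.7 percent.
for k in (1, 2, 3):
    area = STD_NORMAL.cdf(k) - STD_NORMAL.cdf(-k)
    print(k, round(100 * area, 1))

print(normalize([10, 20, 30, 40]))  # symmetric z values around 0
```

The T score here is just the general linear form with a desired mean of 50 and SD of 10, and the loop reproduces the 68%, 95%, and 99.7% areas from the normal-distribution properties.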