CHAPTER 4: Using and Reporting Standardized Test Results
Assessment in Early Childhood Education, Fifth Edition
Sue C. Wortham
Wortham. Assessment in Early Childhood Education, 5e. © 2008 by Pearson Education, Inc. All Rights Reserved.

Chapter Objectives
1. Explain the difference between norm-referenced and criterion-referenced tests.
2. List common characteristics of norm-referenced and criterion-referenced tests.
3. Explain the advantages and disadvantages of using standardized tests.
4. Understand how test scores are interpreted and reported.
5. Describe how individual and group test results are used to report student progress and program effectiveness.
6. Discuss the advantages and disadvantages of using norm-referenced and criterion-referenced tests with young children.
7. Understand the difficulties in using standardized tests with young children.

Distinctions Between Norm-Referenced and Criterion-Referenced Tests
• Norm-referenced tests provide information on how the performance of an individual compares with that of a known group.
• Criterion-referenced tests provide information on how the individual performed on some standard or objective (without considering the performance of others).

Common Characteristics of Norm- and Criterion-Referenced Tests
• Require a relevant and representative sample of test items
• Require specification of the achievement domain to be measured
• Use the same type of test items
• Use the same rules for item writing (except for item difficulty)
• Judged by the same qualities of goodness (validity and reliability)
• Useful in educational measurement
(Linn and Gronlund, 2000)

Aptitude vs. Achievement Tests
• Aptitude tests predict a student’s ability to learn a skill or accomplish a task. (Stanford-Binet, Wechsler, SAT when used to predict success)
• Achievement tests measure what the student has learned or mastered. (California Achievement Tests, Iowa Tests of Basic Skills, SAT when used to determine what has been learned)

Uses of Norm-Referenced Tests with School-Age Children
Achievement tests are:
• given to measure and analyze individual and group performance resulting from the educational program
• analyzed for trends in achievement
• used to describe program effectiveness: areas of weakness and strength are identified, and plans can be made to improve the curriculum

How Criterion-Referenced Tests with Preschoolers Are Used
• Developmental screening
• Diagnostic evaluation
• Instructional planning
Developmental screenings determine whether further evaluation is needed to identify disabilities and strategies for remediation.

Reasons Criterion-Referenced Tests Are Used with School-Age Children
• Achievement test scores describe individual performance and are used to plan instruction for groups and individual students.
• Diagnostic evaluation: intelligence batteries and tests in academic content areas are used with students who demonstrate learning difficulties.

Savant Syndrome
A condition in which a person otherwise limited in mental ability has an exceptional specific skill, such as:
• Calculation abilities
• Drawing
• Musical ability
The Psychometric Approach

Intelligence
• A single attribute?
– Spearman (1863-1945): two-factor theory of intelligence
  “g” = general ability
  “s” = special abilities
Figure 9.3 According to Spearman (1904), all intelligent abilities have an area of overlap, which he called g (for “general”). Each ability also depends partly on an s (for “specific”) factor.
Figure 9.4a Measurements of sprinting, high jumping, and long jumping correlate with one another because they all depend on the same leg muscles. Similarly, the g factor that emerges in IQ testing could reflect a single ability that all tests tap.
• Many attributes?
– Thurstone: 7 primary mental abilities
  Spatial ability, perceptual speed, numeric reasoning, verbal meaning, word fluency, memory, inductive reasoning

What is Intelligence?
• Fluid intelligence and crystallized intelligence
– Cattell & Horn believed that the “g” factor has two components:
  - Fluid intelligence is the power of reasoning, solving unfamiliar problems, seeing relationships, and gaining new knowledge.
  - Crystallized intelligence is acquired knowledge and the application of that knowledge to experience.

Concept Check:
• A 16-year-old is learning to play chess and is becoming proficient enough to be accepted into the school’s chess club. Is this fluid or crystallized intelligence?

Concept Check:
• Ten years later, the chess player achieves grandmaster status. Is this a result of fluid or crystallized intelligence?
Gardner’s Theory of Multiple Intelligences
• Logical-Mathematical
• Linguistic
• Musical
• Spatial
• Bodily-Kinesthetic
• Interpersonal
• Intrapersonal
• Naturalistic
• Existential
Copyright © Allyn & Bacon 2006

[Figure: Gardner’s Multiple Intelligences]

Sternberg’s Triarchic Theory
• Contextual Component (“street smarts” or practical)
– Adapting to the environment
• Experiential Component (creative)
– Response to novelty
– Automatization
• Componential Component (“academic” or analytical)
– Information processing
– Efficiency of strategies

Theories and Tests of Intelligence
• IQ tests
– Intelligence quotient (IQ) tests attempt to measure an individual’s probable performance in school and similar settings.
– Binet (1857-1911) and Simon created the first IQ test in 1905.

Binet Intelligence Tests
• Mental Age: an individual’s level of mental development relative to others
• Intelligence Quotient (IQ): $IQ = \frac{\text{Mental Age}}{\text{Chronological Age}} \times 100$

Theories and Tests of Intelligence
• The Stanford-Binet test
– The Stanford-Binet test, Fifth Edition (ages 2-85)
– The mean IQ score for all age groups is set at 100, with a standard deviation of 15 (scores of 85-115 fall within one standard deviation of the mean).
– Given individually

Interpreting Test Scores
• A child’s performance on a standardized test is meaningless until it can be compared with other scores.
• A raw score is translated into a standard score that reports how well the child’s performance compares with that of other children who took the same test.
• The bell-shaped normal curve is the graph on which the distribution of standard scores is arranged.

The Normal Curve
• Represents the ideal normal distribution of test scores
• The scores are distributed in a bell-shaped frequency polygon, with most scores clustered toward the center of the curve
– (see Figure 4-5 on p. 87 of the text)
• Standard deviations are used to calculate how an individual scored, compared with the scores of the norming group

[Figure: Normal Distribution]
© 2006 The McGraw-Hill Companies, Inc. All rights reserved. Santrock, Educational Psychology, Second Edition, Classroom Update

[Figure: Bell Curve]

Individual Intelligence Tests
The Wechsler Scales
Report an overall IQ and also verbal and performance IQs.
• WPPSI-III: Wechsler Preschool and Primary Scale of Intelligence, Third Edition. Ages 2 ½ to 7 years, 3 months
• WISC-IV: Wechsler Intelligence Scale for Children, Fourth Edition. Ages 6 to 16 years, 11 months
• WAIS-IV: Wechsler Adult Intelligence Scale, Fourth Edition. Ages 16 to 90 years, 11 months

[Figures: WPPSI-III; WAIS-III]

WISC-IV
• Word Reasoning: measures reasoning with verbal material; child identifies the underlying concept given successive clues.
• Matrix Reasoning: measures fluid reasoning (a highly reliable subtest on WAIS-III and WPPSI-III); child is presented with a partially filled grid and asked to select the item that properly completes the matrix.
• Picture Concepts: measures fluid reasoning, perceptual organization, and categorization (requires categorical reasoning without a verbal response); from each of two or three rows of objects, child selects objects that go together based on an underlying concept.
• Letter-Number Sequencing: measures working memory (adapted from WAIS-III); child is presented a mixed series of numbers and letters and repeats them, numbers first (in numerical order), then letters (in alphabetical order).
• Cancellation: measures processing speed using random and structured animal target forms (foils are common non-animal objects).

[Figure: WAIS-IV]

Theories and Tests of Intelligence
• Raven’s Progressive Matrices
– Psychologists created “culture-reduced” tests without language. Raven’s Progressive Matrices tests abstract reasoning ability (non-verbal intelligence or performance IQ).

Descriptive statistics are the mathematical procedures that are used to describe and summarize data.
Counting the Data: Frequency
Look at the set of data that follows. A tally mark was made to count each time a score occurred.
• Which number most likely represents the average score?
• Which number is the most frequently occurring score?

Frequency Distribution
Score   Tally           Frequency
100     |               1
99      |               1
98      ||              2
94      ||              2
90      |||||           5
89      ||||| ||        7
88      ||||| |||||     10
82      ||||| |         6
75      ||              2
74      |               1
68      |               1
60      |               1

Average score? About 88 (88 is the middle score; the exact mean is about 86.7).
Most frequent score? 88
This frequency count represents data that closely approximate a normal distribution.

Descriptive Statistics: Frequency Polygons
Data: 100 99 98 98 94 94 90 90 90 90 90 89 89 89 89 88 88 75 75 74 68 60
[Figure: frequency polygon; x-axis: Scores (60 to 100), y-axis: frequency (1 to 5)]

Measures of Central Tendency
Measures of central tendency provide information about the average or typical score in a data set.
• Mean: the numerical average of a group of scores
• Median: the score that falls exactly in the middle of a data set
• Mode: the score that occurs most often

Mean
• Central tendency = representative or typical value in a distribution
• Same thing as an average: $M = \frac{\sum X}{N}$
• Computed by
– Summing all the scores (sigma, $\sum$)
– Dividing by the number of scores (N)

Mean: To find the mean, simply add the scores and divide by the number of scores in the set of data.
98 + 94 + 88 + 75 = 355
Divide by the number of scores: 355/4 = 88.75 (the mean)

Measures of Central Tendency
• Steps to computing the median
1. Line up scores from highest to lowest
2. Count up to the middle score
• If there is 1 middle score, that’s the median
• If there are 2 middle scores, the median is their average

Median: The middlemost point in a set of data
Data Set 1: 100 99 99 98 97 96 90 88 85 80 79
The median is 96 for this set.
Data Set 2: 100 99 98 97 86 82 78 72 70 68
The median is 84 for this set.
84 represents the middlemost point in this set of data (the average of the two middle scores, 86 and 82).

Mode: The most frequently occurring score in a set of data.
Find the modes for the following sets of data:
Data Set 3: 99 89 89 89 89 75
Mode: 89
Data Set 4: 99 88 88 87 87 72 70
88 and 87 are both modes for this set of data. This is called a bimodal distribution.

Measures of Variability (Dispersion)
Range: Distance between the highest and lowest scores in a set of data.
100 - 65 = 35
35 is the range in this set of scores.

Variance: Describes how much a set of scores varies from the mean (the average squared difference from the mean).
1. Subtract the mean from each score. When the mean for a set of data is 87, subtract 87 from each score.
   100 - 87 = 13
    98 - 87 = 11
    95 - 87 = 8
    91 - 87 = 4
    85 - 87 = -2
    80 - 87 = -7
    60 - 87 = -27
2. Next, square each difference (multiply each difference by itself).
    13 x 13  = 169
    11 x 11  = 121
     8 x 8   = 64
     4 x 4   = 16
    -2 x -2  = 4
    -7 x -7  = 49
   -27 x -27 = 729
3. Sum these squared differences to get the sum of squares: 1,152
4. Divide the sum of squares by the number of scores.
   1,152 divided by 7 = 164.5714
   This number represents the variance for this set of data.

Standard Deviation: Represents the typical amount that a score is expected to vary from the mean in a set of data.
5. To find the standard deviation, find the square root of the variance. For this set of data, find the square root of 164.5714. The standard deviation for this set of data is 12.83, or approximately 13.
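The worked examples above can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names and the `score_range` helper are illustrative, not part of the slides. It assumes the variance is the population form (dividing by N), as in step 4, which is why `pvariance` and `pstdev` are used rather than the sample versions.

```python
from statistics import mean, median, multimode, pstdev, pvariance


def score_range(scores):
    """Distance between the highest and lowest scores."""
    return max(scores) - min(scores)


# Data sets copied from the worked examples on the slides.
mean_scores = [98, 94, 88, 75]
data_set_2 = [100, 99, 98, 97, 86, 82, 78, 72, 70, 68]
data_set_4 = [99, 88, 88, 87, 87, 72, 70]
variance_scores = [100, 98, 95, 91, 85, 80, 60]

summary = {
    "mean": mean(mean_scores),
    "median": median(data_set_2),      # two middle scores are averaged
    "modes": multimode(data_set_4),    # returns every mode (bimodal here)
    "range": score_range(variance_scores),
    # pvariance/pstdev divide by N, matching step 4 of the slide.
    "variance": pvariance(variance_scores),
    "sd": pstdev(variance_scores),
}

for name, value in summary.items():
    print(name, value)
```

Running it reproduces the mean of 88.75, the median of 84, both modes (88 and 87), a variance of about 164.57, and a standard deviation of roughly 12.8. The range printed here (40) is for the seven-score variance data set, not the separate 100 - 65 example.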
Ceiling and Floor Effects
• Ceiling effects
– Occur when scores can go no higher than an upper limit and “pile up” at the top
– e.g., scores on an easy exam, as shown on the right
– Cause negative skew
• Floor effects
– Occur when scores can go no lower than a lower limit and pile up at the bottom
– e.g., household income
– Cause positive skew

Skewed Frequency Distributions
• Normal distribution (a)
• Skewed right (b)
– Fewer scores right of the peak
– Positively skewed
– Can be caused by a floor effect
• Skewed left (c)
– Fewer scores left of the peak
– Negatively skewed
– Can be caused by a ceiling effect

Understanding Descriptive Statistics
The Normal Distribution: A “bell-shaped” curve in which most of the scores are clustered around the mean; the farther from the mean, the less frequently the score occurs.

[Figure: Commonly Reported Test Scores Based on the Normal Curve]

Z Scores
• When values in a distribution are converted to Z scores, the distribution will have
– Mean of 0
– Standard deviation of 1
• Useful
– Allows variables to be compared to one another even when they are measured on different scales, have very different distributions, etc.
– Provides a generalized standard of comparison

Z Scores
• To compute a Z score, subtract the mean from a raw score and divide by the SD: $Z = \frac{X - M}{SD}$
• To convert a Z score back to a raw score, multiply the Z score by the SD and then add the mean: $X = (Z)(SD) + M$

The Normal Curve
• Derived scores are used to specify where the individual score falls on the curve and how far above or below the mean the score falls
• Raw scores are transformed into percentiles, stanines, or other standard scores
• All scoring scales are drawn parallel to the baseline of the normal curve and use the deviation from the mean as the reference to compare an individual score with the mean score of a group

Percentile Ranks
• Percentiles represent the point on the normal curve below which a percentage of test scores is distributed.
• A student’s percentile rank on a test indicates the percentage of students in the comparison group who scored lower. For example, if a student is ranked in the 55th percentile, the student scored higher than 55% of the comparison group who took the test.

Stanines
Parents find stanine results easiest to understand because their child’s standardized test scores are reported as:
9 Very superior
8 Superior
7 Considerably above average
6 Slightly above average
5 Average
4 Slightly below average
3 Considerably below average
2 Poor
1 Very poor

Z Scores and T Scores
• Called standard scores
• Report how many standard deviations a transformed raw score is located above or below the mean
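The two Z-score conversions above can be sketched in a few lines of Python. The function names are illustrative. Two assumptions go beyond the slides: `t_score` uses the conventional T-score scale (mean 50, SD 10), which the slides do not spell out, and `percentile_rank` assumes the scores are normally distributed so that the standard normal curve applies.

```python
from statistics import NormalDist


def z_score(raw, mean, sd):
    """Z = (X - M) / SD"""
    return (raw - mean) / sd


def raw_score(z, mean, sd):
    """X = (Z)(SD) + M"""
    return z * sd + mean


def t_score(z):
    """Assumed convention: T scores rescale Z to mean 50, SD 10."""
    return 50 + 10 * z


def percentile_rank(z):
    """Percent of a normal distribution falling below z (assumes normality)."""
    return 100 * NormalDist().cdf(z)


# Example on an IQ-style scale with mean 100 and SD 15.
z = z_score(115, 100, 15)
print(z, raw_score(z, 100, 15), t_score(z), round(percentile_rank(z), 1))
```

On a scale with mean 100 and SD 15, a raw score of 115 sits one standard deviation above the mean (Z = 1), which corresponds to a T score of 60 and a percentile rank of about 84 under the normality assumption.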
Grade Equivalent Scores
Test publishers recommend that grade equivalents not be used to report to parents because they may not understand that the score does not mean the child should be placed in a higher or lower grade.
• Grade level results are compared with test results from grades above and below the grade, indicating whether the child performed above or below average.
• The grade equivalent score does not indicate grade level placement in school.

Reporting Standardized Test Results
Both norm- and criterion-referenced information can be organized in a useful form.
• Scores can be reported for an individual, a class, a grade, a school, and a district.
• Strengths and weaknesses can be analyzed by content areas, by school, and by grade level.
• Achievement can be compared over several years to determine long-term improvement or decline.

Reporting Test Results to Parents
A parent–teacher conference may be used to report test results. The teacher should explain:
• both the value and the limitations of the test scores
• why the test was chosen
• how the results will be used, for example, to plan appropriate learning experiences for their child

Advantages of Standardized Tests
Norm-referenced and criterion-referenced achievement tests provide valuable information regarding the effectiveness of curriculum and instruction.
• Teachers can determine curriculum strengths and weaknesses.
• Individual students’ reports determine who would benefit from additional instruction and those who are ready to move to more advanced learning experiences.
Advantages of Standardized Tests
Standardized tests have unique qualities that are advantageous:
• Uniformity in test administration
• Quantifiable scores
• Norm referencing
• Validity and reliability

Disadvantages of Standardized Tests
• Standardized tests are not necessarily the best method of evaluation of young children.
• A variety of strategies should be used in assessing children.

Concerns About the Use of Standardized Tests
• Use of tests with children from a different culture or whose first language is not English
• Use of standardized tests to deny children entrance to school or to retain them in grade

No Child Left Behind Act
Assessment of Students with Disabilities and/or Limited English Proficiency (LEP)
• NCLB requires that all students be assessed regardless of their special needs
• Accommodations have been made for students with disabilities and for those who speak a language other than English or have limited English
• Limitations of the tests designed for NCLB when used with these populations have become an issue

Standardized Tests Have Effects on Curriculum and Instruction
• Standardized tests sample only a few of the curriculum objectives.
• Pressures for higher test scores result in limitations on the curriculum that is taught.
• Instruction becomes focused on what will be tested and limits the balance of the curriculum.
Misapplication of Test Results
Using standardized tests to decide school entry or placement in early childhood programs is inappropriate because:
• tests do not differentiate between limited intelligence and limited opportunities to learn
• decisions on enrollment, retention, and placement in special classes should never be based on a single test score
• other sources of information, including systematic observation and samples of children’s work, should be a part of the evaluation process