Why Statistics?
Notes about behavior
Ideas and ramblings from Devore & Peck: Statistics

Why Stats? Three Reasons
1. To be informed
2. To understand issues and make decisions
3. To be able to evaluate decisions about your life and those whom you may teach

Reason One: Being Informed
Our life is filled with data, but most of that data comes in the form of sound bites.
We are not given sufficient details to make our own decisions.
We are expected to “follow.”
The average American is data ignorant.
Not “understanding” data takes away our control of decision making.

Reason One: Informed Consumer
To be in control of your decisions you must be able to:
1. Extract information from charts and graphs
2. Follow the logic of numerical arguments
3. Know the basic rules of how data should be gathered, summarized, and analyzed to draw valid “truthful” statistical conclusions

Reason Two: Understanding and Making Decisions
Be able to decide if information is adequate and sufficient to make a decision.
Know enough to challenge the data presented by virtue of “knowing” about data.
Analyze the data that is available.
Assess assumptions inherent in (built into) the type of data collected.
Draw conclusions and make decisions about the data.
Assess the risk of an incorrect decision.

Reason Three: Life Decisions
Drug screening for work (or the Olympics): false positives and negatives
Criteria to define financial need
Scores on state achievement exams
Data about teen accidents that affects the rate of payment
Probability of an incorrect medical diagnosis

Types of Data
Nominal: Naming
Ordinal: Ordering
Interval: Equal Intervals
Ratio: True Zero

Working with Interval and Ratio Data
Remember: interval and ratio data are the only two types of data that support arithmetic. Interval data can be added and subtracted; ratio data, which have a true zero, can also be meaningfully multiplied and divided.
Note about the use of symbols to indicate operations:
There are three symbols that mean “multiply”: ×, ·, and ( ). Thus 2×3, 2·3, and 2(3) all mean the same thing.
There are four symbols that mean “divide”: / (virgule), ― (vinculum, the horizontal fraction bar), ÷ (obelus), and the long-division bracket (a closing parenthesis attached to a vinculum).

Measures of Similarity
Measures of Central Tendency
1. Mean
2. Median
3. Mode
Normal Curve: estimating the population from a sample

Mean
Mean (average, or x-bar) = the sum of scores divided by the number of scores.
5 + 7 + 3 + 6 + 4 = 25 total with 5 scores
25 ÷ 5 = 5 = x-bar

Median (middle)
The "middle value" of a list: the smallest number such that at least half the numbers in the list are no greater than it.
If the list has an odd number of entries, the median is the middle entry in the list after sorting the list into increasing order.
If the list has an even number of entries, the median is equal to the sum of the two middle (after sorting) numbers divided by two.
5, 7, 3, 6, 4 changes to 3, 4, 5, 6, 7 after sorting; the middle value is 5.
True Middle = (5 scores + 1) ÷ 2 = 3rd score, which is 5 (for odd numbers of scores ONLY)

Mode (most common)
For lists, the mode is the most common (frequent) value. A list can have more than one mode.
For histograms, a mode is a relative maximum ("bump").
In the list 5, 7, 3, 6, 4 there is no mode, but in the list 5, 7, 3, 6, 6, 4 the most common number is 6.
[Figure: dot plot of the scores on a scale from 1 to 7]

Population Estimates: Normal Curve
The normal curve (aka bell curve) is an estimate of the population of all possible instances of an object, event, or other entity.
[Figure: normal curve showing 68.26%, 95.44%, and 99.74% of the entire population within one, two, and three standard deviations of the mean]

Nature and the role of variability
If all students at PHCC were invariable my job would be really easy!
Sadly, the students at PHCC exhibit high variability in age, education, socioeconomic status, self-assessment, and educational expectations, so I have lots of work in planning and preparing for class, anticipating prior knowledge, anticipating levels of understanding, and predicting the speed of information delivery.
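The mean, median, and mode worked out above for the lists 5, 7, 3, 6, 4 and 5, 7, 3, 6, 6, 4 can be checked with a few lines of Python. This is only a minimal sketch: the helper name `modes` is made up for illustration, and returning an empty list when no value repeats is an assumption chosen to match the "no mode" wording on the slide.

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Return every most-frequent value, or [] if no value repeats.

    Hypothetical helper: treating "every value occurs once" as "no mode"
    follows the slide's wording and is an assumption, not a standard rule.
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


scores = [5, 7, 3, 6, 4]
with_repeat = [5, 7, 3, 6, 6, 4]

print(mean(scores))        # sum of scores (25) divided by number of scores (5)
print(median(scores))      # middle entry after sorting: 3, 4, 5, 6, 7
print(modes(scores))       # [] because no value repeats
print(modes(with_repeat))  # [6] because 6 occurs twice
```

Because `modes` returns a list, it also handles a list with more than one mode.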
What is variability?
Variability refers to the spread of scores. It describes how the data (plural of datum) are organized along the scale of measurement.
Data variability describes the ways in which the data are grouped.

Range, Variance, and Deviation
How do we see and tell about variability, the distance (or spread) of scores across the continuum of scores?
One way is the range. In the example 3, 4, 5, 6, 7 the range is 7 - 3 = 4.
The mean (x-bar) is not enough to describe the data. The following examples all have a mean of 5:
2 + 1 + 6 + 1 + 15 = 25; 25 ÷ 5 = 5
2 + 2 + 1 + 6 + 3 + 3 + 3 + 7 + 18 = 45; 45 ÷ 9 = 5

Data has a mean of 5
3 + 4 + 5 + 6 + 7 = 25; 25 ÷ 5 = 5; Range = 7 - 3 = 4
1 + 1 + 2 + 6 + 15 = 25; 25 ÷ 5 = 5; Range = 15 - 1 = 14
1 + 2 + 2 + 3 + 3 + 3 + 6 + 7 + 18 = 45; 45 ÷ 9 = 5; Range = 18 - 1 = 17

Variability I
Why students drop out (reasons):
1. Academic problems
2. Poor advising or teaching
3. Needed a break
4. Economic reasons
5. Family responsibilities
6. To attend another school
7. Personal problems
8. Other
[Figure: bar chart "Why students drop out" showing frequency for reasons 1 to 8]

Variance
Another way to estimate variability is by calculating the variance.
The variance is built from the differences between each score and the mean (x-bar).
Each difference is squared so that negative differences do not cancel out positive ones.
First calculate the mean of the scores, then measure the amount that each score deviates from the mean, and then square that deviation (by multiplying it by itself).
Numerically, the variance equals the average of the squared deviations from the mean.
[Figure: two distributions, labeled "High variance" and "Low variance"]

Calculating Variance
1. First calculate the mean of the scores.
2. Then measure the amount that each score deviates from the mean.
3. Then square that deviation (by multiplying it by itself).
4. Add up all of the squared deviations.
5. Divide by the number of scores (5).

Score   Score - mean    Squared deviation
5       5 - 5 = 0       0² = 0
7       7 - 5 = 2       2² = 4
3       3 - 5 = -2      (-2)² = 4
6       6 - 5 = 1       1² = 1
4       4 - 5 = -1      (-1)² = 1
                        Sum = 10
Variance = 10 / 5 = 2

Another Variance
1. First calculate the mean of the scores.
2. Then measure the amount that each score deviates from the mean.
3. Then square that deviation (by multiplying it by itself).
4. Add up all of the squared deviations.
5. Divide by the number of scores (5).

Score   Score - mean    Squared deviation
2       2 - 5 = -3      (-3)² = 9
1       1 - 5 = -4      (-4)² = 16
6       6 - 5 = 1       1² = 1
1       1 - 5 = -4      (-4)² = 16
15      15 - 5 = 10     10² = 100
                        Sum = 142
Variance = 142 / 5 = 28.4

Standard Deviation
The standard deviation is simply the square root of the variance.
Problem 1: Variance = 10 / 5 = 2; S.D. or s = √2 = 1.414214
Problem 2: Variance = 142 / 5 = 28.4; S.D. or s = √28.4 = 5.329165

Interquartile Range
The distance from the 75th percentile to the 25th percentile in a group of scores.
As the median divides a data set in half, the quartiles divide the data set into fourths. Hence the second quartile, denoted Q2, is the median.
Scores: 1, 2, 2, 3, 3, 3, 6, 7, 18
True Middle = (9 scores + 1) ÷ 2 = 5th score, so Q2 = 3
True Middle of lower half (1, 2, 2, 3) = (4 scores + 1) ÷ 2 = 2.5th score, halfway between the 2nd and 3rd scores, so Q1 = (2 + 2) ÷ 2 = 2
True Middle of upper half (3, 6, 7, 18) = (4 scores + 1) ÷ 2 = 2.5th score of that half, so Q3 = (6 + 7) ÷ 2 = 6.5
Interquartile Range = Q3 - Q1 = 6.5 - 2 = 4.5
The interquartile range ignores outliers such as the 18; we are interested only in the data just above and below Q2.
In the above example, Q2 itself is not included in either half.

Converted Measures
Scores can be converted to a common denominator to provide equated comparisons between groups.
Z-scores (standard scores), percentiles, and stanine scores are all converted to a common base so that comparisons between groups can be made.

Percentiles
Raw scores, or the total points a student earns on a test, are converted into percentage values.
There are two statistics used for this purpose. The percentile rank is a number between 0 and 100 indicating the percent of cases in a norm group falling at or below that score.
The percentile is a point on a scale of scores at or below which a given percent of the cases falls.
For example, a child who scores at the 42nd percentile is doing as well as, or better than, 42 percent of the students who took the same test.
Percentiles are like quartiles, except that they divide the data set into 100 equal parts instead of four equal parts.

Percentiles Explained
The percentile for an observation x is found by dividing the number of observations less than x by the total number of observations and then multiplying this quantity by 100.
Once you can calculate percentiles, you can also determine deciles and quartiles.
The First Quartile = the 25th Percentile
The Second Quartile = the 50th Percentile
The Third Quartile = the 75th Percentile
Suppose 45 out of 50 students had test scores less than 80. Since 45/50 = 90%, a score of 80 puts you in the 90th percentile.
Scores: 1, 2, 2, 3, 3, 3, 6, 7, 18
Six of the nine scores are less than 6, so the percentile for a score of 6 = (6 ÷ 9) × 100 = 0.6667 × 100 = 66.67%
So a score of 6 is higher than about 67% of the scores.

Stanine Scores
The term stanine is derived from “standard nine.”
Stanine scores range from 1 to 9 with 5 in the center.
Except for 1 and 9, each stanine includes a band of scores one half of a standard deviation wide.
Thus stanine scores are standard scores with a mean of 5 and a standard deviation of 2.
Test scores are commonly expressed using these single-digit scores, which can help students and parents visualize where someone falls on the test scale.
The National Stanine is a scale score that divides the scores of the norming sample into nine groups, ranging from a high of 9 to a low of 1.
Stanines 1-3 are generally considered below average, stanines 4-6 average, and stanines 7-9 above average.
Stanine scores have a constant relationship to percentiles; that is, a given percentile always falls into the same stanine. Stanine 5, for example, always includes percentiles 41-59.
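The variance, standard deviation, percentile, and stanine calculations described in these slides can be sketched in Python as follows. The function names are made up for illustration. The variance divides by the number of scores, as in the Calculating Variance steps, and the percentile counts observations strictly less than x, as in Percentiles Explained. The stanine conversion is derived from the stated mean of 5 and standard deviation of 2; rounding to the nearest whole stanine and clamping to the range 1 to 9 are assumptions, since the slides do not say how fractional values are handled.

```python
from math import floor, sqrt


def variance(scores):
    """Average squared deviation from the mean (divides by n)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)


def std_dev(scores):
    """Square root of the variance."""
    return sqrt(variance(scores))


def percentile_of(x, scores):
    """Percent of observations strictly less than x."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)


def stanine(z):
    """Stanine for a z-score: mean 5, standard deviation 2.

    Assumption: round half up to a whole stanine, then clamp to 1..9.
    """
    nearest = floor(2 * z + 5 + 0.5)
    return max(1, min(9, nearest))


problem_1 = [5, 7, 3, 6, 4]
data = [1, 2, 2, 3, 3, 3, 6, 7, 18]

print(variance(problem_1))               # 2.0
print(round(std_dev(problem_1), 6))      # 1.414214
print(round(percentile_of(6, data), 2))  # 66.67
print(stanine(0))                        # 5, the center stanine
```

Python's built-in `statistics.pvariance` and `statistics.pstdev` compute the same divide-by-n variance and standard deviation.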
Stanine Example
Danville Montessori Third Grade CAT Scores (Total Battery)

YEAR        National Stanine   Scale Score   National Percentile
1998-1999   7.5                740.1         95.0
1999-2000   7.2                725.3         89.0
2000-2001   7.3                730.4         ***
2001-2002   7.7                743.2         98.0
2002-2003   7.4                732.4         ***
2003-2004   8.6                757.0         97.0
2004-2005   9.0                759.0         ***

*** denotes fewer than ten students tested; therefore the National Percentile for the group is not computed.

Stanine Description
The middle stanine is the fifth one; it contains the middle 20% of the scores.
Each stanine interval, except the first and last ones, spans half of a standard deviation.
1, 2, or 3 = "below average"
4, 5, or 6 = "average"
7, 8, or 9 = "above average"

Stanine Calculation
A stanine is calculated from a z-score: stanine = (2 × z-score) + 5
This gives a mean of 5 and an S.D. of 2.

Standard Score
A z-score is a score's distance from the mean in standard deviation units: z = (score - mean) ÷ S.D.
When a set of scores is converted to z-scores, the scores are said to be standardized and are referred to as standard scores.
Standard scores have a mean of 0 and a standard deviation of 1.

Stats Interpretation Summary

Variability II
Estimate of number of each color of M & M’s in a large bag
[Figure: bar chart "Estimated Number of M & M's", N = 660. x-axis: Color; y-axis: Frequency in Large Bag. Brown 198, Yellow 132, Red 132, Orange 66, Green 66, Blue 66]

M & M’s Variability I
Q: What is the percentage of each color in "M&M's" Chocolate Candies?
A: On average, "M&M's" Plain Chocolate Candies and our new "M&M's" Mint Chocolate Candies contain 30% browns, 20% each of yellows and reds and 10% each of oranges, greens, and blues. For "M&M's" Peanut Chocolate Candies, the ratio is 20% each of browns, yellows, reds, greens and oranges.
We use the same ratio for our "M&M's" Peanut Butter and Chocolate

Total   Brown   Yellow   Red   Orange   Green   Blue
660     198     132      132   66       66      66

M & M’s Variability II
Estimate of colors present in large bag
[Figure: bar chart "Estimated Number of M & M's", N = 660. x-axis: Color; y-axis: Frequency in Large Bag. Brown 198 (30%), Yellow 132 (20%), Red 132 (20%), Orange 66 (10%), Green 66 (10%), Blue 66 (10%)]

Count Your M & M’s
Use Excel to graph your counts and convert them to percentages. Do they match what the company says?
[Worksheet: blank tally grid with counts 1 to 12 for red, blue, yellow, green, brown, and orange]

Homework
Find the mean, median, and mode for your M & M’s and for the total group.
Find the variance and the standard deviation for your M & M’s and for the total group.
Convert your group's M & M’s to percentiles.
Answer the question: How does your sample vary from the total group sample?

Homework Format
Results
Description of M & M population (the % estimates from Mars Company)
1. The results (central tendency)…
2. The variability (range, variance, and standard deviation)
3. Most (color) fell in the 90th percentile, while (and so forth)
4. * Provide charts as necessary
Description of Sample
1. Within the present sample (do the same as above, except using the sample’s statistics)
2. * Provide charts as necessary
Summary
1. Summarize your results by comparing your sample to the population
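As a starting point for the homework comparison, the conversion between the company's percentages and candy counts can be sketched in Python. The percentages are the plain-candy figures quoted above. The sample counts in `my_sample` are invented purely for illustration and should be replaced with your own; all function and variable names are hypothetical.

```python
# Company percentages quoted above for plain M & M's.
claimed_percent = {"brown": 30, "yellow": 20, "red": 20,
                   "orange": 10, "green": 10, "blue": 10}


def expected_counts(total, percents):
    """Expected number of each color in a bag of `total` candies."""
    return {color: round(total * p / 100) for color, p in percents.items()}


def to_percent(counts):
    """Convert raw color counts to percentages of the sample."""
    n = sum(counts.values())
    return {color: 100 * c / n for color, c in counts.items()}


# Hypothetical sample of 50 candies, invented for illustration only.
my_sample = {"brown": 14, "yellow": 11, "red": 9,
             "orange": 6, "green": 5, "blue": 5}

population = expected_counts(660, claimed_percent)
sample_percent = to_percent(my_sample)

# Difference between sample percentage and company percentage, per color.
difference = {color: sample_percent[color] - claimed_percent[color]
              for color in claimed_percent}

print(population)   # brown 198, yellow 132, red 132, others 66 each
print(difference)   # positive means the sample has more than claimed
```

With the large-bag total of 660, `expected_counts` reproduces the table above (198, 132, 132, 66, 66, 66).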