Introduction to Descriptive Statistics

“Data have a story to tell. Statistical analysis is detective work in which we apply our intelligence and our tools to discover parts of that story.” -Hamilton (1990)

Objectives:
1. Explain the general role of statistics in assessment & evaluation
2. Explain three methods for describing a data set: shape, center, and spread
3. Explain the relationship between the standard deviation and the normal curve

Levels of Measurement
– Nominal
– Ordinal
– Interval
– Ratio
The level of measurement determines what statistics are appropriate.

Nominal
Naming things.
Creating groups that are qualitatively different or unique, but not necessarily quantitatively different.
Placing individuals or objects into categories.
Making mutually exclusive categories.
Numbers assigned to categories are arbitrary.
Sample variables:
– Gender
– Race
– Ethnicity
– Geographic location
– Hair or eye color

Ordinal
Rank ordering things.
Creating groups or categories when only rank order is known.
Numbers imply order but not exact quantity of anything.
The difference between individuals with adjacent ranks, on relevant quantitative variables, is not necessarily the same across the distribution.
Sample variables:
– Class rank
– Place of finish in a race (1st, 2nd, etc.)
– Judges' ratings
– Responses to Likert scale items (for example: SD, D, N, A, SA)

Interval
Orders observations according to the quantity of some attribute.
Arbitrary origin.
Equal intervals.
Equal differences expressed as equal distances.
Sample variables:
– Test scores
  • SAT
  • GRE
  • IQ tests
– Temperature
  • Celsius
  • Fahrenheit

Ratio
Quantitative measurement.
Equal intervals.
True zero point.
Ratios between values are useful.
Sample variables:
– Financial variables
– Finish times in a race
– Number of units sold
– Test scores scaled as percent correct or number correct

Levels of Measurement Review
What level of measurement?
– Today is a fall day.
– Today is the third hottest day of the month.
– The high today was 70° Fahrenheit.
– The high today was 20° Celsius.
– The high today was 294 kelvins.

Levels of Measurement Review
What level of measurement?
Student #1256 is:
– a male
– from Lawrenceville, GA.
– He came in third place in the race today.
– He scored 550 on the SAT verbal section.
– He has turned in 8 out of the 10 homework assignments.

Levels of Measurement Review
What level of measurement?
Student #3654 is:
– in the third reading group.
  • Nominal?
  • Ordinal?
  • Interval?
  • Ratio?

Descriptive Statistics
Used to describe the basic features of a batch of data.
Uses graphical displays and descriptive quantitative indicators.
The purpose of descriptive statistics is to organize and summarize data so that the data are more readily comprehended. That is, descriptive statistics describe distributions with numbers.

Five Descriptive Questions
– What is the middle of the set of scores?
– How spread out are the scores?
– Where do specific scores fall in the distribution of scores?
– What is the shape of the distribution?
– How do different variables relate to each other?

Five Descriptive Questions
– Middle
– Spread
– Rank or Relative Position
– Shape
– Correlation

Middle
– Mean
– Median
– Mode

Examples of these measures
Mean of 2, 3, 6, 7, 3, 5, 10:
(2 + 3 + 6 + 7 + 3 + 5 + 10) / 7 = 36 / 7 ≈ 5.14
Mode of 2, 3, 6, 7, 3, 5, 10:
3, the only value that occurs more than once.
Median of 2, 3, 6, 7, 3, 5, 10:
First the data are ordered: 2, 3, 3, 5, 6, 7, 10. The middle value is 5, so the median is 5.

Some Important Points
– The mode is the only one of these measures that can be used with nominal data.
– The median is unaffected by extreme values; it is resistant to extreme observations.
– The mean, or average, is affected by extremely small or large values. We say that it is sensitive, or nonresistant, to the influence of extreme observations.
– The mean is the balance point of the distribution.
– In symmetric distributions the mean and median are close together.
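The worked example above can be checked with a short Python sketch. It uses the standard-library `statistics` module on the slide's seven scores. The second data set, in which the largest score is replaced by 100, is a hypothetical illustration (not from the slides) of how the mean reacts to an extreme value while the median and mode do not.

```python
from statistics import fmean, median, mode

scores = [2, 3, 6, 7, 3, 5, 10]  # data set from the slide


def center(data):
    """Return (mean, median, mode), with the mean rounded to 2 decimals."""
    return round(fmean(data), 2), median(data), mode(data)


print(center(scores))  # (5.14, 5, 3)

# Hypothetical illustration: swap the largest score for an extreme value.
with_extreme = [2, 3, 6, 7, 3, 5, 100]
print(center(with_extreme))  # (18.0, 5, 3)
```

The mean moves from about 5.14 to 18.0, while the median stays at 5 and the mode at 3, which is what "resistant" means in practice.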
More Important Points
– In skewed data the mean is pulled toward the tail of the distribution.
– The median is not necessarily preferred over the mean, even though it is resistant.
– However, if the data are known to be strongly skewed, the median is preferable.
– Finally, the mean is usually the measure of central tendency of choice because it is stable from sample to sample.

Spread
– Standard Deviation
– Variance
– Range
– IQR

Describing Data: Center & Spread
How do measures of variability differ when distributions are spread out?
[Figure: three distributions with the same mean and different spread. X = Mean, S = Standard Deviation]
– Large S: X = 50 (S = 20)
– Average or Normal S: X = 50 (S = 10)
– Small S: X = 50 (S = 5)

Rank or Relative Position
– Five number summary: Min, 25th, 50th, 75th, Max
– Identifying specific values that have interpretive meaning
– Identifying where they fall in the set of scores
– Box plots
– Outliers

Shape
– Positive Skewness
– Negative Skewness
– Normality
– Histograms

Shape - Normality
[Figure: histogram and box plot of Scanning. Mean = 38.0, Std. Dev = 4.84, N = 344]

Shape - Positive Skewness
[Figure: histogram and box plot of Total for IIP. Mean = 2.10, Std. Dev = .56, N = 344]

Shape - Negative Skewness
[Figure: histogram and box plot of PREACT. Mean = 3.32, Std. Dev = .42, N = 154]

Describing Data: Center & Spread
Relating the Standard Deviation (S) to the normal distribution.
“68-95-99.7% Rule”
When a distribution of data resembles a normal distribution (or normal curve):
– 68% of the data lie within ±1 standard deviation of the mean
– 95% of the data lie within ±2 standard deviations of the mean
– 99.7% of the data lie within ±3 standard deviations of the mean
[Figure: normal curve with the 68%, 95%, and 99.7% regions marked]

Outliers
[Figure: box plot of BDI Total, N = 344, with individual outlying cases labeled by case number]

Outliers
[Figure: histogram of BDI Total (frequency). Mean = 7.1, Std. Dev = 7.10, N = 344]

Outliers
Statistics: BDI Total

N               Valid      344
                Missing    0
Mean                       7.12
Median                     5.00
Mode                       0
Std. Deviation             7.101
Variance                   50.426
Minimum                    0
Maximum                    40
Percentiles     25         2.00
                50         5.00
                75         10.00
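As a rough check on the two ideas above, the Python sketch below first computes the share of a normal distribution lying within 1, 2, and 3 standard deviations of the mean, using the standard normal relationship P(|Z| ≤ k) = erf(k / √2). It then applies the common 1.5 × IQR box-plot convention to the BDI Total quartiles from the table. That convention is an assumption here: the slides do not say which rule was used to flag outliers, and the function names are illustrative only.

```python
import math


def normal_coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * normal_coverage(k), 1))  # 68.3, 95.4, 99.7


def iqr_fences(q1, q3, k=1.5):
    """Assumed convention: values beyond k * IQR from the quartiles are outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles of BDI Total taken from the statistics table (25th and 75th percentiles).
low, high = iqr_fences(2.00, 10.00)
print(low, high)  # -10.0 22.0
```

Under that assumed convention, any BDI Total score above 22 would be flagged, so the reported maximum of 40 lies well outside the fence. Note also that the mean (7.12) sits above the median (5.00) and that one standard deviation below the mean is already about 0, the minimum score, so the 68-95-99.7% rule is not a good description of this positively skewed variable.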