Measures of Central Tendency and Dispersion
Deviations from Normality: Skewness and Kurtosis

Why do we care if the distribution is not normal?
• It helps you understand how a characteristic exhibits itself in your sample or in the population.
• It impacts what descriptive statistics you might use.
• It impacts the inferential statistics you might use.

Skewness
The majority of scores do not fall in the middle of the distribution. The distribution is asymmetrical. You label the kind of skew according to the longer tail of the distribution.

Normal versus Skewed Distributions
[Figure: frequency distributions. Image: en.wikipedia.org/wiki/Image:Standard_deviation_diagram.png]
• Long tail is on the positive end: positive skew
• Long tail is on the negative end: negative skew
• Normal Distribution: Mean, Median, Mode

What Positive Skew Means
If this represented the results of a quiz, the majority of the participants scored very low; almost no one scored in the highest range. This test must have been very easy or difficult?
[Figure: histogram of quiz scores from 0 to 100, with the mode, median, and mean marked]

What Negative Skew Means
If this represented the results of a quiz, the majority of the participants scored high; almost no one scored in the lowest range. This test must have been very easy or difficult?
[Figure: histogram of quiz scores from 0 to 100, with the mode, median, and mean marked]

Normal vs. Skewed Distributions
Income Distributions: http://www.city-data.com/city/Chicago-Illinois.html

Kurtosis
• leptokurtic
• platykurtic
[Figure: "Leaping" leptokurtic distribution, scores from 0 to 100]
[Figure: platykurtic distribution ("Platykurtic like a platypus"), scores from 0 to 100]
[Figure: leptokurtic, normal (mesokurtic), and platykurtic curves]

Sprinthall - Quick Kurtosis Rule
When you have a distribution and know its standard deviation and range, you can estimate its kurtosis.
How you determine this:
• Range / 6 = Standard Value (SV)
• Compare actual sd to SV
• If sd > SV, platykurtic
Fact: For a normal distribution, the standard deviation is about 1/6 of the range.
If the standard deviation is more than 1/6 of the range, then a distribution is platykurtic. If the standard deviation is less than 1/6 of the range, then a distribution is leptokurtic.
If sd < SV, leptokurtic

Dr. Bellini’s MCC Research
47/6 = SV
SV = 7.83
8.98 > 7.83
Platykurtic

Dr. Bellini’s MCC Research
19/6 = SV
SV = 3.17
3.52 > 3.17
Platykurtic

Skew
[Figure: histograms of Beginning Salary ($20,000 to $60,000) and Current Salary ($25,000 to $125,000), with Count on the vertical axis. Data file: C:\Program Files\SPSS\University of Florida graduate salaries.sav]

Statistics: Starting Salary
N                 1100
Mean              26064.20
Median            26000.00
Mode              20000
Std. Deviation    6967.982
Range             58300
[Figure: histogram of Starting Salary (10000 to 60000), with Count on the vertical axis]

Measurement Scales

Measurement
• Assigning numbers to observations following a set of rules.
How are numbers assigned to observations? What scale is used?
1. Nominal
2. Ordinal
3. Interval
4. Ratio

Nominal Data
• Using numbers to label categories, but the numbers have no inherent numerical qualities
• Male = 1, Female = 2
• social security number
• jersey numbers
• race/ethnicity

Other examples of nominal scaling
• Whether a participant does or does not have a driver’s license (0,1)
• Whether the participant belongs to the experimental group or the control group (0,1)
• The school the participant attends (1,2,3,4,5)

Uses of nominal data
• Generally, the most you can do with nominal data is count it.

Categorical or Continuous?
• Is a variable that uses a nominal scale of measurement categorical or continuous?

Ordinal Scaling
• The assigned number provides information about the rank of an observation.
• Ordinal scales put observations in order.
• Rating scales are often considered ordinal in how they measure characteristics.

Example
Statement: "Statistical Thinking is my favorite class"
Response options, numbered 1 to 5: Strongly agree, Agree, Neither agree nor disagree, Disagree, Strongly disagree (one option is marked with an X)
Another example: Hospital pain scales (1-10)

Ordinal Scales have two basic rules:
1.
Equality/non-equality rule
2. Greater-than-or-less-than rule

Ranking of Tennis Players
http://sports.espn.go.com/sports/tennis/rankings?sport=WOMRANK

Uses of ordinal data
You can express if something is greater than or less than (but you can’t express how much greater than or less than). < >
• Strongly agreeing is more than simply agreeing
• Being tenth in the class is lower than being ninth. How much lower?

Issues
There are limitations on what you can do statistically with ordinal data
• Degree of violation
• Consensus

NOIR
Interval Scaling
Satisfies all the requirements of an ordinal scale (there is a high to low structure to the scale) and
• The intervals between the points on the scale become meaningful because the distances between successive points on an interval scale are equal.
Examples
• Degrees Fahrenheit or Celsius
• Calendar years
Interval scales do not have a meaningful zero point. They may contain a zero point, but it is arbitrary.

NOIR
Ratio
• Satisfies all the requirements of an interval scale
• There is a real and meaningful zero point on a ratio scale
• Weight, height, heart rate, breaths per minute, degrees Kelvin, annual income, miles per hour, pulse, etc.
HINT: If your scale has negative numbers, like with temperature, then it is interval but it probably isn’t ratio.

Our Survey
R N O N N R N O O R

Measures of Central Tendency and Dispersion
Describing data
Describing Data vs. Describing People

Measures of Central Tendency
• Mean (M or X̄)
• Median (Md)
• Mode (Mo)

Issues with Describing Data
1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 4 4 4 4 4 4 4 4 4 5 5 6 6 6 6
n = 34
Mean = 3.15 ??? What does that mean?

The Mean
Most common synonym is the “average”
But what does the mean “mean”?
Definitions:
• “arithmetic average”
• a descriptor of the center of the data, when data are distributed “normally”

When a mean value is most useful
• To simply summarize a data set that is normally distributed
• To summarize data from a sample that can be used to estimate information about an entire population

When a mean can be less useful as a descriptor
• To summarize data that is skewed (especially when it is used as the only descriptor)
• To summarize data where there is an outlier
• To summarize data measured using a nominal scale

Mean and Skew - Find the mean annual income
$7,200  $9,011  $20,074  $24,999  $36,567  $32,145  $54,158  $567,987
Mean: $94,018
How well does this number represent a measure of the center of our data set?

Median
a.k.a. the middle score in a ranked set of scores. It divides the distribution of scores into equal halves.
When there is an odd number of scores: 1 2 2 3 5 5 7 8 10 15 16 16 21
The median score is the middle (7th) score, 7.
When there is an even number of scores: 4 5 7 15 16 19 31 32
The median score is the average of the two middle scores: (15+16)/2 = 15.5

Mean and Skew - Find the mean annual income
$7,200  $9,011  $20,074  $24,999  $32,145  $36,567  $54,158  $567,987
Median: $28,572
How well does this number represent a measure of the center of our data set?
[Figure: histogram of the incomes (VAR00002), with Count on the vertical axis]

Mode
Most frequently occurring score in a group of scores.
Exam scores: 78 85 92 55 87 85 98 84 71 88 85 78 65 99 100 85 62 100

Measures of central tendency
Guide to which measures of central tendency are appropriate to use with each scale of measurement:

            Mean    Median    Mode
Nominal                       X
Ordinal     (X)?    X         X
Interval    X       X         X
Ratio       X       X         X

Mean, Median and/or Mode

Class Survey Age Variable
Describe this dataset in terms of its
• Mean
• Median
• Mode
• Range
[Figure: histogram of age (30 to 50), with Count on the vertical axis]

Descriptive Statistics
                      N     Range    Mean     Std. Deviation
age                   14    37       36.79    10.714
Valid N (listwise)    14

Class Survey Commute Variable
Describe this dataset in terms of its
• Mean
• Median
• Mode
• Range
What about this data set makes it difficult to describe using measures of central tendency?
[Figure: histogram of travel (10.0 to 60.0), with Count on the vertical axis]

Descriptive Statistics
                      N     Range    Mean      Std. Deviation
travel                14    64.5     10.914    16.3124
Valid N (listwise)    14

Commute
Giving more information, such as how skewed this variable is, would be helpful in the description.
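
To make the point about skew concrete, here is a minimal Python sketch that recomputes the measures of central tendency for two data sets taken from the slides: the eight annual incomes from "Mean and Skew" and the exam scores from "Mode". It uses only the standard `statistics` module; the variable names are illustrative and not part of the original material.

```python
from statistics import mean, median, multimode

# Annual incomes from the "Mean and Skew" slide.
incomes = [7200, 9011, 20074, 24999, 36567, 32145, 54158, 567987]

# Exam scores from the "Mode" slide.
exam_scores = [78, 85, 92, 55, 87, 85, 98, 84, 71,
               88, 85, 78, 65, 99, 100, 85, 62, 100]

income_mean = mean(incomes)        # pulled upward by the $567,987 income
income_median = median(incomes)    # average of the two middle ranked incomes
exam_modes = multimode(exam_scores)  # list of the most frequent score(s)

# Count how many incomes fall below the mean, to show how far it sits
# from the bulk of the data.
below_mean = sum(1 for x in incomes if x < income_mean)

print(f"Mean income:   ${income_mean:,.0f}")
print(f"Median income: ${income_median:,.0f}")
print(f"Incomes below the mean: {below_mean} of {len(incomes)}")
print(f"Exam mode(s):  {exam_modes}")
```

With these incomes, seven of the eight values fall below the mean, which is why the median is the more representative descriptor of the center here.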
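
The Sprinthall quick kurtosis rule from earlier in the transcript (compare the standard deviation to range / 6) can be sketched in Python as follows. The function name `quick_kurtosis` is hypothetical, and returning "mesokurtic" when the two values are exactly equal is an assumption; the slides only say that a normal distribution has a standard deviation of about 1/6 of the range.

```python
def quick_kurtosis(sd: float, value_range: float) -> str:
    """Classify a distribution by comparing sd to the Standard Value (range / 6)."""
    standard_value = value_range / 6
    if sd > standard_value:
        return "platykurtic"
    if sd < standard_value:
        return "leptokurtic"
    # Assumption: exact equality is treated as normal (mesokurtic).
    return "mesokurtic"


# The two worked examples from the slides: (sd, range).
first_example = quick_kurtosis(8.98, 47)    # SV = 47 / 6, about 7.83
second_example = quick_kurtosis(3.52, 19)   # SV = 19 / 6, about 3.17

# Applying the same rule to the Starting Salary statistics
# (Std. Deviation 6967.982, Range 58300); SV is about 9716.67.
salary_example = quick_kurtosis(6967.982, 58300)

print(first_example, second_example, salary_example)
```

This is only a rough estimate: the range is sensitive to a single extreme value, so a skewed variable such as salary can be labeled leptokurtic by this rule mainly because of its long tail.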