Research Methods & Design in Psychology
Lecture 3: Descriptives & Graphing
Lecturer: James Neill

Overview
• Univariate descriptives & graphs
• Non-parametric vs. parametric
• Non-normal distributions
• Properties of normal distributions
• Graphing relations between 2 and 3 variables

Empirical Approach to Research
A positivistic approach ASSUMES:
• The world is made up of bits of data which can be ‘measured’, ‘recorded’, & ‘analysed’
• Interpretation of data can lead to valid insights about how people think, feel and behave

What do we want to Describe?
Distributional properties of variables:
• Central tendency(ies)
• Shape
• Spread / Dispersion

Basic Univariate Descriptive Statistics
Central tendency
• Mode
• Median
• Mean
Shape
• Skewness
• Kurtosis
Spread
• Interquartile Range
• Range
• Standard Deviation
• Variance

Basic Univariate Graphs
• Bar Graph – Pie Chart
• Stem & Leaf Plot
• Boxplot
• Histogram

Measures of Central Tendency
• Statistics to represent the ‘centre’ of a distribution
  – Mode (most frequent)
  – Median (50th percentile)
  – Mean (average)
• Choice of measure dependent on
  – Type of data
  – Shape of distribution (esp. skewness)

Measures of Central Tendency (by level of measurement)
Columns: Mode | Median | Mean
Nominal:  X
Ordinal:  X | X
Interval: X | X | X
Ratio:    X? | X | X

Measures of Dispersion
• Measures of deviation from the central tendency
• Non-parametric / non-normal: range, percentiles, min, max
• Parametric: SD & properties of the normal distribution

Measures of Dispersion (by level of measurement)
Columns: Range, Min/Max | Percentiles | SD
Nominal:  (none)
Ordinal:  X
Interval: X | X | X?
Ratio:    X | X | X

Describing Nominal Data
• Frequencies
  – Most frequent?
  – Least frequent?
  – Percentages?
• Bar graphs
  – Examine comparative heights of bars – shape is arbitrary
• Consider whether to use freqs or %s

Frequencies
• Number of individuals obtaining each score on a variable
• Frequency tables
• Graphically (bar chart, pie chart)
• Can also present as %

Frequency table for sex

SEX
                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   female          14      70.0            70.0                 70.0
        male             6      30.0            30.0                100.0
        Total           20     100.0           100.0

Bar chart for frequency by sex
[Figure: bar chart; x-axis: SEX (female, male); y-axis: Frequency]

Pie chart for frequency by sex
[Figure: pie chart of SEX (male, female)]

Bar chart: Do you believe in God?
[Figure: bar chart; x-axis: Do you believe in God? (No, Sort of, Yes); y-axis: Count]

Bar chart for cost by state

Bar chart vs. Radar Chart
[Figure: Bar Chart of Sorted Factor Effect Sizes, Time 1 to 2; x-axis: Factors (Time Management, Social Competence, Achievement Motivation, Intellectual Flexibility, Task Leadership, Emotional Control, Active Initiative, Self Confidence); y-axis: Effect size]

Bar chart vs. Radar Chart
[Figure: Radar Chart of Factor Effect Sizes, Time 1 to 2; axes: Time Management, Social Competence, Self Confidence, Achievement Motivation, Active Initiative, Intellectual Flexibility, Emotional Control, Task Leadership]

Mode
• Most common score - highest point in a distribution
• Suitable for all types of data including nominal (may not be useful for ratio)
• Before using, check frequencies and bar graph to see whether it is an accurate and useful statistic.

Describing Ordinal Data
• Conveys order but not distance (e.g., ranks)
• Descriptives as for nominal (i.e., frequencies, mode)
• Also maybe median – if accurate/useful
• Maybe IQR, min. & max.
• Bar graphs, pie charts, & stem-&-leaf plots

Stem & Leaf Plot
• Useful for ordinal, interval and ratio data
• Alternative to histogram

Box & whisker
• Useful for interval and ratio data
• Represents min., max., median and quartiles

Describing Interval Data
• Conveys order and distance, but no true zero (0 pt is arbitrary).
• Interval data is discrete, but is often treated as ratio/continuous (especially for > 5 intervals)
• Distribution (shape)
• Central tendency (mode, median)
• Dispersion (min, max, range)
• Can also use M & SD if treating as continuous

Describing Ratio Data
• Numbers convey order and distance, true zero point - can talk meaningfully about ratios.
• Continuous
• Distribution (shape – skewness, kurtosis)
• Central tendency (median, mean)
• Dispersion (min, max, range, SD)

Univariate data plot for a ratio variable

The Four Moments of a Normal Distribution
[Figure: bell curve annotated with Mean, SD, Skew and Kurt]

The Four Moments of a Normal Distribution
Four mathematical qualities (parameters) allow one to describe a continuous distribution which at least roughly follows a bell curve shape:
• 1st = mean (central tendency)
• 2nd = SD (dispersion)
• 3rd = skewness (lean / tail)
• 4th = kurtosis (peakedness / flatness)

Mean (1st moment)
• Average score
• Mean = $\frac{\sum X}{N}$
• Use for ratio data or interval (if treating it as continuous).
• Influenced by extreme scores (outliers)

Standard Deviation (2nd moment)
• SD = square root of Variance = $\sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$
• Standard Error (SE) = SD / square root of N

Skewness (3rd moment)
• Lean of distribution
• +ve = tail to right
• -ve = tail to left
• Can be caused by an outlier
• Can be caused by ceiling or floor effects
• Can be accurate (e.g., the number of cars owned per person)

Skewness (3rd moment)
• Negative – ceiling effect
• Positive – floor effect

Kurtosis (4th moment)
• Flatness or peakedness of distribution
• +ve = peaked
• -ve = flattened
• Be aware that by altering the X and Y axis, any distribution can be made to look more peaked or more flat – so add a normal curve to the histogram to help judge kurtosis

Kurtosis (4th moment)
[Figure: Red = positive (leptokurtic); Blue = negative (platykurtic)]

Key Areas under the Curve for Normal Distributions
• For normal distributions, approx.
  – +/- 1 SD ~ 68%
  – +/- 2 SD ~ 95%
  – +/- 3 SD ~ 99.7%

Areas under the normal curve

Types of Non-normal Distribution
• Bi-modal
• Multi-modal
• Positively skewed
• Negatively skewed
• Flat (platykurtic)
• Peaked (leptokurtic)

Non-normal distributions

Rules of Thumb in Judging Severity of Skewness & Kurtosis
• View histogram with normal curve
• Deal with outliers
• Skewness / kurtosis < -1 or > 1
• Skewness / kurtosis significance tests

Histogram of weight
[Figure: histogram; x-axis: WEIGHT (40.0 to 110.0); y-axis: Frequency; Mean = 69.6, Std. Dev. = 17.10, N = 20]

Histogram of daily calorie intake

Histogram of fertility

[Figure 1: histogram; x-axis: 0 to 140; y-axis: Frequency; Mean = 81.21, Std. Dev. = 18.228, N = 188]
[Figure 2: bar chart; x-axis: Femininity-Masculinity (Very feminine, Fairly feminine, Androgynous, Fairly masculine, Very masculine); y-axis: Count]
[Figure 3: bar chart, Gender: male; x-axis: Fairly feminine, Androgynous, Fairly masculine, Very masculine; y-axis: Count]
[Figure 4: bar chart, Gender: female; x-axis: Very feminine, Fairly feminine, Androgynous, Fairly masculine, Very masculine; y-axis: Count]
[Figure 5: histogram; x-axis: Exercise (mins/day), 0 to 250; y-axis: Frequency]

Skewed Distributions & the Mode, Median & Mean
• +vely skewed: mode < median < mean
• Symmetrical (normal): mean = median = mode
• -vely skewed: mean < median < mode

Effects of skew on measures of central tendency

More on Graphing (Visualising Data)

Edward Tufte
Graphs:
• Reveal data
• Communicate complex ideas with clarity, precision, and efficiency

Tufte's Guidelines 1
• Show the data
• Substance rather than method
• Avoid distortion
• Present many numbers in a small space
• Make large data sets coherent

Tufte's Guidelines 2
• Encourage eye to make comparisons
• Reveal data at several levels
• Purpose: Description, exploration, tabulation, decoration
• Closely integrated with statistical and verbal descriptions

Tufte's Graphical Integrity 1
• Some lapses intentional, some not
• Lie Factor = size of effect in graph / size of effect in data
• Misleading uses of area
• Misleading uses of perspective
• Leaving out important context
• Lack of taste and aesthetics

Tufte's Graphical Integrity 2
• Trade-off between amount of information, simplicity, and accuracy
• “It is often hard to judge what users will find intuitive and how [a visualization] will support a particular task” (Tweedie et al)

Chart scale

Cleveland’s Hierarchy

Volume
[Figure: Food Aid Received by Developing Countries; y-axis: $ million in food aid (1988), 0 to 350; countries: Burkina Faso, Ethiopia, Mozambique, Kenya, Morocco, Bangladesh, India, Pakistan, Egypt]

Percentage of Doctors Devoted Solely to Family Practice in California 1964-1990

Distortive Variations in Scale

Restricted Scales

Example Graphs Depicting the Relationship between Two Variables (Bivariate)
People Histogram
Separate Graphs

Example Graphs Depicting the Relationship between Three Variables (Multivariate)
Clustered bar chart
19th vs. 20th century causes of death
Demographic distribution of age
Where partners first met
Line graph
Causes of Mortality
Bivariate Normality

Examples of More Complex Graphs
Sea Temperature

Inferential Statistical Analysis Decision Making Tree

Links
• Presenting Data – Statistics Glossary v1.1 - http://www.cas.lancs.ac.uk/glossary_v1.1/presdata.html
• A Periodic Table of Visualisation Methods - http://www.visualliteracy.org/periodic_table/periodic_table.html
• Gallery of Data Visualization
• Univariate Data Analysis – The Best & Worst of Statistical Graphs - http://www.csulb.edu/~msaintg/ppa696/696uni.htm
• Pitfalls of Data Analysis – http://www.vims.edu/~david/pitfalls/pitfalls.htm
• Statistics for the Life Sciences – http://www.math.sfu.ca/~cschwarz/Stat301/Handouts/Handouts.html
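Worked example: frequency table. The "Frequency table for sex" slide reports 14 females and 6 males out of 20. A minimal Python sketch of how such a table (frequency, percent, cumulative percent) might be computed is given here. The function name `frequency_table` is hypothetical, and the raw list of responses is rebuilt from the counts on the slide rather than taken from the original data file.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, frequency, percent, cumulative percent) rows.

    Rows are ordered from the most to the least frequent category.
    """
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0.0
    for category, freq in counts.most_common():
        percent = 100 * freq / n
        cumulative += percent
        rows.append((category, freq, percent, cumulative))
    return rows


# Responses rebuilt from the counts shown on the slide (14 female, 6 male).
sex = ["female"] * 14 + ["male"] * 6

for category, freq, percent, cumulative in frequency_table(sex):
    print(f"{category:<8}{freq:>4}{percent:>8.1f}{cumulative:>8.1f}")
```

The first row of the result is also the mode, so the same table answers "most frequent?" and "least frequent?" for nominal data.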
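Worked example: the four moments. The mean, SD and SE formulas from the "Mean" and "Standard Deviation" slides can be sketched in Python as follows. The skewness and kurtosis lines use the simple moment-based (population) formulas, which is an assumption: statistical packages often apply small-sample corrections, so their output may differ slightly. The function name `describe` and the example scores are made up for illustration.

```python
import math
from statistics import mean, median, mode, stdev


def describe(scores):
    """Return basic univariate descriptives for a list of numbers."""
    n = len(scores)
    m = mean(scores)
    sd = stdev(scores)          # sample SD: divides by N - 1
    se = sd / math.sqrt(n)      # standard error of the mean

    # Central moments (dividing by N), used for the shape statistics.
    m2 = sum((x - m) ** 2 for x in scores) / n
    m3 = sum((x - m) ** 3 for x in scores) / n
    m4 = sum((x - m) ** 4 for x in scores) / n

    return {
        "mode": mode(scores),
        "median": median(scores),
        "mean": m,
        "sd": sd,
        "se": se,
        "skewness": m3 / m2 ** 1.5,   # > 0 means tail to the right
        "kurtosis": m4 / m2 ** 2 - 3  # excess kurtosis: 0 for a normal curve
    }


# Hypothetical scores with a longer right-hand tail.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
result = describe(scores)
for name, value in result.items():
    print(f"{name:<9}{value:.3f}")
```

For these scores the skewness is positive and the mode (4) is below the median (4.5), which is below the mean (5), matching the ordering given on the "Skewed Distributions & the Mode, Median & Mean" slide.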
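Worked example: areas under the normal curve. The proportions on the "Key Areas under the Curve for Normal Distributions" slide can be checked numerically. This sketch relies on the standard result that the proportion of a normal distribution within k SDs of the mean equals erf(k / √2); the function name `area_within` is hypothetical.

```python
import math


def area_within(k):
    """Proportion of a normal distribution within +/- k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"+/- {k} SD: {area_within(k):.1%}")
# prints:
# +/- 1 SD: 68.3%
# +/- 2 SD: 95.4%
# +/- 3 SD: 99.7%
```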
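Worked example: Lie Factor. The ratio on the "Tufte's Graphical Integrity 1" slide can be sketched as below, assuming that "size of effect" is measured as relative change between two values (the slide itself gives only the ratio). The function names and the numbers are hypothetical, chosen only to illustrate the arithmetic.

```python
def relative_change(first, second):
    """Change from first to second, as a proportion of first."""
    return (second - first) / first


def lie_factor(graph_first, graph_second, data_first, data_second):
    """Size of effect shown in the graph divided by size of effect in the data."""
    graph_effect = relative_change(graph_first, graph_second)
    data_effect = relative_change(data_first, data_second)
    return graph_effect / data_effect


# Hypothetical chart: the data rise from 100 to 150 (a 50% increase),
# but the bars are drawn 1.0 cm and 3.0 cm tall (a 200% increase).
print(lie_factor(1.0, 3.0, 100, 150))  # prints 4.0
```

A value near 1 means the graphic represents the data proportionally; the further the value is from 1, the more the graphic exaggerates or understates the effect.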