Important Properties of Distributions:
• Focus is on summarizing the distribution as a whole, rather than individual values
• The distribution of points represents a combination of:
  – Common patterns or group conditions
  – Unique features or individual conditions
• How to summarize/describe these:
  a) Numerically (in statistical indexes)
  b) Visually (in images and pictures)
  c) Verbally (in words and phrases)

(Parenthetical Note)
• Note: the focus here is on summarizing the distribution of one variable at a time
  – This is called univariate analysis or statistics
• If we consider the combined (joint) distribution of multiple variables (to see how they are interrelated):
  – Analyzing two variables jointly is called bivariate analysis
  – Analyzing three or more variables jointly is called multivariate analysis

Important Properties of Distributions:
1) Central Tendency:
• What is the typical or average value of the distribution?
• Where is the middle of the data?
2) Variation:
• How widely are the data points spread out? (range)
• How concentrated are the data points within the distribution? (variance)
3) Size:
• How numerous are the data points in the distribution?
4) Symmetry (also called skew):
• How lop-sided is the distribution across its range?
5) Peakiness (also called modality):
• Are all data points smoothly spread over the values?
• Are there notable peaks or lumps in the distribution?
• How many and how sharp are the peaks?

Central Tendency: (3 common measures)
1) Mode:
• The most common, popular, or “typical” value.
• Applies to all levels of variables (nominal & up).
2) Median:
• The “midpoint” (50/50) of the ordered distribution.
• Divides the distribution into upper and lower halves.
• Variable must be at least ordinal level (ordered).
3) Mean:
• The “average” (“center of gravity”) of the values.
• Weighted by the size or value of the data points.
• Variable needs to be at least interval level.
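The three measures of central tendency listed above can be tried out in a few lines. This is only a minimal sketch using Python's standard `statistics` module; the `scores` and `colors` lists are made-up illustration data, not values from the course.

```python
from statistics import mean, median, multimode

# Hypothetical interval-level scores (invented for illustration)
scores = [2, 3, 3, 5, 7, 10]

modes = multimode(scores)   # most common value(s); a list, since ties are possible
mid = median(scores)        # midpoint of the ordered values (averages the two middle ones)
avg = mean(scores)          # arithmetic average ("center of gravity")

print("mode(s):", modes)
print("median:", mid)
print("mean:", avg)

# The mode also works for nominal data, where median and mean are undefined
colors = ["red", "blue", "red", "green"]
color_modes = multimode(colors)
print("modal color(s):", color_modes)
```

Note that the mean (5) sits above the median (4) here because the single large score of 10 pulls the average upward.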
Which one is the correct measure of Central Tendency?
1) Depends on the type of data
• Nominal = mode
• Ordinal = median
• Interval/Ratio = mean (quasi-interval?)
2) Depends on the distribution of the variable
• Highly skewed or weirdly distributed variables
• Unusual or extreme outliers (AKA the “Bill Gates effect” or the “New York City effect”)
• Variables with infinitely many “unique” values

How to compute measures of Central Tendency?
A. By hand (& calculator)?
• See the textbook and the handouts
• Notice the difference between formulas for: (a) data list, (b) frequency table, (c) grouped distribution
B. By SPSS?
• Use one of 3 procedures:
1) Frequencies command: computes more kinds of statistics and an accompanying chart; more detailed output
2) Descriptives command: quickly computes the most common statistics, but no median and no charts
3) Explore command: wider array of information

Shape of Distribution: properties
1) Symmetry:
• “Lopsidedness”: unevenness around the center
• “Skew” = the technical name for asymmetry
  – Skew = direction of the longer tail
  – Left-Skew = negative; Right-Skew = positive
• Some statistics assume a symmetric distribution
• If symmetric, mean & median = same
2) Peakiness:
• Multi-modality: number of peaks
• “Kurtosis”: sharpness of peaks
3) Truncation:
• Some values are excluded or “censored”

How to tell Shape of a Distribution?
1) Look at frequency table (if # values = small)
2) Look at frequency graph:
• Bar chart or line graph (if # values = small)
• Histogram (if # values = large)
[Figures: Bar Chart; Histogram]
3) Compare values of median and mean:
• A difference between Mean & Median indicates skew
• If Mean > Median: skewed to the right
• If Mean < Median: skewed to the left
4) Box Plots
[Figure: Box Plots]
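The mean-versus-median rule for judging skew can be sketched as a small helper. This is a rough illustration only: `skew_direction` is a hypothetical function name, the data lists are invented, and the rule is a quick heuristic rather than a formal skewness statistic.

```python
from statistics import mean, median

def skew_direction(values):
    """Apply the rule of thumb: mean > median suggests right skew, mean < median left skew."""
    m, md = mean(values), median(values)
    if m > md:
        return "right (positive)"
    if m < md:
        return "left (negative)"
    return "no skew indicated"

# Hypothetical incomes with one extreme high value (a "Bill Gates effect")
incomes = [30, 35, 40, 45, 50, 1000]
# Hypothetical exam scores with one extreme low value
exam_scores = [1, 96, 97, 98, 99, 100]
# A symmetric list, where mean and median coincide
symmetric = [1, 2, 3, 4, 5]

print(skew_direction(incomes))
print(skew_direction(exam_scores))
print(skew_direction(symmetric))
```

In the income list the single outlier drags the mean far above the median, which is why the median is usually the safer summary for such variables.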
Variation (the spread of the data):
1) Range:
• The difference between the highest and lowest values in the distribution
2) Inter-Quartile Range:
• The difference (range) between the 25th & 75th percentiles of the distribution (the span of the middle 50%)
3) Variance (& standard deviation):
• The overall amount of variation around the mean.
• Counts the amount but not the direction of deviation.
• Weights large deviations more heavily.

How to compute the Variance:
1) Compute the Mean of the distribution
2) Compute the deviation of each score from the Mean of the distribution
3) Square the deviations from the mean
4) Add all the squared deviations together
5) Divide by the total number of scores

To Compute the Standard Deviation:
1) Take the square root of the variance

Two Measures of Variance?
• Note 2 slightly different formulas:
  – Population/Description formula:
    $$\frac{\sum_i (x_i - \bar{x})^2}{N}$$
  – Sample/Estimation formula:
    $$\frac{\sum_i (x_i - \bar{x})^2}{N - 1}$$

How to compute the Variance:
Note two different computing strategies that yield the same answers:
1) Definitional Formula:
• Requires computing the mean first & then the deviations
• Uses deviation scores and decimal fractions
• Messier computations (with decimal fractions)
2) Computational Formula:
• Computations occur in the same step
• Does not compute deviations
• Simpler computations (decimals only at the end)
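The two computing strategies can be compared side by side. The sketch below is illustrative only: the function names and the `scores` list are invented, and the computational version assumes the usual shortcut form (sum of squared scores minus the squared sum divided by N), which the slides name but do not spell out.

```python
from math import sqrt

def variance_definitional(scores, sample=False):
    """Follow the five listed steps: mean, deviations, squares, sum, divide."""
    n = len(scores)
    m = sum(scores) / n                       # 1) mean
    deviations = [x - m for x in scores]      # 2) deviation of each score
    squared = [d ** 2 for d in deviations]    # 3) square the deviations
    total = sum(squared)                      # 4) add them together
    return total / (n - 1 if sample else n)   # 5) divide by N (or N - 1 for a sample)

def variance_computational(scores, sample=False):
    """Assumed shortcut form: uses only the sum of x and the sum of x squared."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    sum_of_squares = sum_x2 - sum_x ** 2 / n  # no deviation scores are computed
    return sum_of_squares / (n - 1 if sample else n)

# Hypothetical scores (invented for illustration)
scores = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = variance_definitional(scores)
pop_var_short = variance_computational(scores)
sample_var = variance_definitional(scores, sample=True)
std_dev = sqrt(pop_var)

print("population variance:", pop_var, pop_var_short)
print("sample variance:", sample_var)
print("standard deviation:", std_dev)
```

Both functions return the same population variance for these scores, and the sample version comes out slightly larger because it divides by N - 1 rather than N.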