Chapter 4. Variability
Variability
Quantitative Methods in HPELS
HPELS 6210

Agenda
  Introduction
  Frequency
  Range
  Interquartile range
  Variance/SD of population
  Variance/SD of sample
  Selection

Introduction
  Statistics of variability:
    Describe how values are spread out
    Describe how values cluster around the middle
  Several statistics
  Appropriate measurement depends on:
    Scale of measurement
    Distribution

Basic Concepts
  Measures of variability:
    Frequency
    Range
    Interquartile range
    Variance and standard deviation
  Each statistic has its advantages and disadvantages

Frequency
  Definition: The number/count of occurrences of any value of a variable
  Scale of measurement:
    Appropriate for all scales
    Only statistic appropriate for nominal data
  Statistical notation: f

Frequency
  Advantages:
    Ease of determination
    Only statistic appropriate for nominal data
  Disadvantages:
    Terminal statistic

Calculation of the Frequency
  Instat:
    Statistics tab
    Summary tab
    Group tab
    Select group
    Select column(s) of interest
    OK

Range
  Definition: The difference between the highest and lowest values in a distribution
  Scale of measurement: Ordinal, interval or ratio

Range
  Advantages:
    Ease of determination
  Disadvantages:
    Terminal statistic
    Disregards all data except extreme scores

Calculation of the Range
  Instat:
    Statistics tab
    Summary tab
    Describe tab
    OK
  Calculates range automatically

Interquartile Range
  Definition: The difference between the 1st quartile and the 3rd quartile
  Scale of measurement: Ordinal, interval or ratio
  Example: Figure 4.3, p 107

Interquartile Range
  Advantages:
    Ease of determination
    More stable than range
  Disadvantages:
    Disregards all values except 1st and 3rd quartiles

Calculation of the Interquartile Range
  Instat:
    Statistics tab
    Summary tab
    Describe tab
    Choose additional statistics
    Choose interquartile range
    OK

Variance/SD: Population
  Variance:
    The average squared distance/deviation of all raw scores from the mean
    The standard deviation squared
    Statistical notation: σ²
  Scale of measurement: Interval or ratio
  Advantages:
    Considers all data
    Not a terminal statistic
  Disadvantages:
    Not appropriate for nominal or ordinal data
    Sensitive to extreme outliers

Variance/SD: Population
  Standard deviation:
    The average distance/deviation of all raw scores from the mean
    The square root of the variance
    Statistical notation: σ
  Scale of measurement: Interval or ratio
  Advantages and disadvantages: Similar to variance

Calculation of the Variance: Population
  Why square all values?
    If all deviations from the mean are summed, the answer always = 0

Calculation of the Variance: Population
  Example: 1, 2, 3, 4, 5
  Mean = 3
  Deviations:
    1 - 3 = -2
    2 - 3 = -1
    3 - 3 = 0
    4 - 3 = 1
    5 - 3 = 2
  Sum of all deviations = 0
  Squared deviations:
    (1 - 3)² = (-2)² = 4
    (2 - 3)² = (-1)² = 1
    (3 - 3)² = (0)² = 0
    (4 - 3)² = (1)² = 1
    (5 - 3)² = (2)² = 4
  Sum of all squared deviations = 10
  Variance = Average squared deviation of all points = 10/5 = 2

Calculation of the Variance: Population
  Step 1: Calculate deviation of each point from mean
  Step 2: Square each deviation
  Step 3: Sum all squared deviations
  Step 4: Divide sum of squared deviations by N

Calculation of the Variance: Population
  σ² = SS / N (number of scores), where SS is either:
    Definitional formula (Example 4.3, p 112): SS = Σ(X - μ)²
    Computational formula (Example 4.4, p 112): SS = ΣX² - [(ΣX)² / N]
  Step 4: Divide SS by N

Computation of the Standard Deviation: Population
  Take the square root of the variance

Variance/SD: Sample
  Process is similar, with two distinctions:
    Statistical notation
    Formula

Statistical Notation Distinctions
  Population    Sample
  σ²            s²
  σ             s
  μ             M
  N             n

Formula Distinctions: Population vs. Sample
  s² = SS / (n - 1), where SS is either:
    Definitional formula: SS = Σ(X - M)²
    Computational formula: SS = ΣX² - [(ΣX)² / n]
  Why n - 1?

N vs. (n - 1): First Reason
  General underestimation of population variance
    Sample variance (s²) tends to underestimate the population variance (σ²)
    Dividing by (n - 1) instead of n will inflate s²
  Example 4.8, p 121
    Actual population σ² = 14
    Average biased s² = 63/9 = 7
    Average unbiased s² = 126/9 = 14

N vs. (n - 1): Second Reason
  Degrees of freedom (df)
    df = number of scores “free” to vary
  Example:
    Assume n = 3, with M = 5
    The sum of values = 15 (n × M)
    Assume two of the values = 8, 3
    The third value has to be 4
    Two values are “free” to vary
    df = (n - 1) = (3 - 1) = 2

Computation of the Standard Deviation of Sample
  Instat:
    Statistics tab
    Summary tab
    Describe tab
    OK
  Calculates standard deviation automatically

Selection
  When to use the frequency
    Nominal data
    With the mode
  When to use the range or interquartile range
    Ordinal data
    With the median
  When to use the variance/SD
    Interval or ratio data
    With the mean

Textbook Problem
  Assignment Problems: 4, 6, 8, 14.
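
Supplementary sketch: frequency, range and interquartile range
  The slides compute these statistics through the Instat menus. As a rough cross-check outside Instat, the same three statistics can be sketched in Python. The data below are hypothetical (not from the textbook), and the quartile convention is an assumption: Python's statistics.quantiles uses its "exclusive" method by default, which may not match the textbook's or Instat's quartile rule, so the interquartile range can differ slightly between tools.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical data for illustration only (not from the textbook).
sport = ["soccer", "tennis", "soccer", "golf", "soccer", "tennis"]  # nominal scale
scores = [1, 2, 3, 4, 5, 6, 7, 8]                                   # interval/ratio scale

# Frequency (f): the count of each value; usable on any scale, including nominal.
f = Counter(sport)

# Range: highest value minus lowest value.
value_range = max(scores) - min(scores)

# Interquartile range: 3rd quartile minus 1st quartile.
# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
# Assumption: Python's default "exclusive" method; other conventions exist.
q1, _median, q3 = quantiles(scores, n=4)
iqr = q3 - q1

print(f["soccer"], value_range, iqr)  # prints: 3 7 4.5
```

  Note that the range uses only the two extreme scores and the interquartile range uses only the two quartiles, which is the disadvantage listed for each in the slides.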
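
Supplementary sketch: population variance and standard deviation
  The four-step calculation of the population variance can be followed in a few lines of Python, using the slide example 1, 2, 3, 4, 5. This is a minimal sketch: the variable names are illustrative, and both the definitional and the computational routes to SS are shown so they can be compared.

```python
from math import sqrt

scores = [1, 2, 3, 4, 5]   # example from the slides
N = len(scores)
mu = sum(scores) / N       # population mean = 3.0

# Step 1: deviation of each score from the mean.
deviations = [x - mu for x in scores]
sum_dev = sum(deviations)  # always 0, which is why the deviations are squared

# Steps 2 and 3 (definitional formula): SS = sum of (X - mu)^2.
ss_definitional = sum(d ** 2 for d in deviations)

# Computational formula: SS = sum(X^2) - (sum(X))^2 / N  ->  55 - 225/5
ss_computational = sum(x ** 2 for x in scores) - sum(scores) ** 2 / N

# Step 4: divide SS by N, then take the square root for the standard deviation.
variance = ss_definitional / N
sd = sqrt(variance)

print(sum_dev, ss_definitional, ss_computational, variance)  # prints: 0.0 10.0 10.0 2.0
```

  Both formulas give SS = 10, so the variance is 10/5 = 2 and the standard deviation is the square root of 2 (about 1.41).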
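
Supplementary sketch: why divide by n - 1?
  The two reasons given for (n - 1) can be checked numerically. The sketch below assumes a small population in which the values 0, 3 and 9 are equally likely; it was chosen because it reproduces the figures quoted from Example 4.8 (σ² = 14, 63/9 = 7, 126/9 = 14), but whether it is the exact population used in the textbook is an assumption. Every possible sample of n = 2 (drawn with replacement) is enumerated, and the biased (SS/n) and unbiased (SS/(n - 1)) sample variances are averaged.

```python
from itertools import product

# Assumed population: reproduces the slide's numbers, but is not confirmed
# to be the population used in the textbook's Example 4.8.
population = [0, 3, 9]
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

def sum_of_squares(sample):
    """SS = sum of squared deviations from the sample mean M."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)

n = 2
samples = list(product(population, repeat=n))  # all 9 ordered samples of size 2

total_biased = sum(sum_of_squares(s) / n for s in samples)          # divide by n
total_unbiased = sum(sum_of_squares(s) / (n - 1) for s in samples)  # divide by n - 1

avg_biased = total_biased / len(samples)
avg_unbiased = total_unbiased / len(samples)
print(sigma2, avg_biased, avg_unbiased)  # prints: 14.0 7.0 14.0

# Degrees of freedom: with n = 3 and M = 5 the sum must be 15, so once two
# values (8 and 3) are known the third is fixed.
third_value = 3 * 5 - (8 + 3)
print(third_value)  # prints: 4
```

  On average, dividing by n underestimates the population variance (7 versus 14), while dividing by (n - 1) recovers it exactly in this example.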