Chapter 4. Variability

Variability
Quantitative Methods in HPELS
HPELS 6210
Agenda
- Introduction
- Frequency
- Range
- Interquartile range
- Variance/SD of population
- Variance/SD of sample
- Selection

Introduction

Statistics of variability:
- Describe how values are spread out
- Describe how tightly values cluster around the middle

Several statistics are available. The appropriate measure depends on:
- Scale of measurement
- Shape of the distribution
Basic Concepts

Measures of variability:
- Frequency
- Range
- Interquartile range
- Variance and standard deviation

Each statistic has its advantages and disadvantages
Frequency

Definition: The number (count) of times a given value or category occurs

Scale of measurement:
- Appropriate for all scales
- Only statistic appropriate for nominal data

Statistical notation: f
Frequency

Advantages:
- Ease of determination
- Only statistic appropriate for nominal data

Disadvantages:
- Terminal statistic (it cannot be carried into further calculations)
Calculation of the Frequency: Instat
- Statistics tab
- Summary tab
- Group tab
- Select group
- Select column(s) of interest
- OK
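Outside Instat, the same count can be sketched in a few lines of Python. This is only an illustration, not part of the course materials; the `positions` data are made up for the example.

```python
from collections import Counter

# Hypothetical nominal data: playing position of eight athletes
positions = ["guard", "forward", "guard", "center",
             "forward", "guard", "center", "guard"]

# f = number of times each category occurs
frequencies = Counter(positions)

for category, f in frequencies.most_common():
    print(f"{category}: f = {f}")
```

Because the categories have no numeric meaning, the count per category is the only summary computed here.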
Range

Definition: The difference between the highest and lowest values in a distribution

Scale of measurement:
- Ordinal, interval or ratio
Range

Advantages:
- Ease of determination

Disadvantages:
- Terminal statistic
- Disregards all data except extreme scores
Calculation of the Range: Instat
- Statistics tab
- Summary tab
- Describe tab
- Calculates range automatically
- OK
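As a rough equivalent outside Instat, the range can be computed directly from its definition. The `scores` below are hypothetical values chosen for the illustration.

```python
# Hypothetical interval-scale scores
scores = [12, 7, 15, 9, 21, 10]

# Range = highest value - lowest value
score_range = max(scores) - min(scores)

print(score_range)  # 21 - 7 = 14
```

Only the two extreme scores enter the calculation, which is the disadvantage noted for this statistic.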
Interquartile Range

Definition: The difference between the 3rd quartile and the 1st quartile (Q3 - Q1)

Scale of measurement:
- Ordinal, interval or ratio
- Example: Figure 4.3, p 107
Interquartile Range

Advantages:
- Ease of determination
- More stable than range

Disadvantages:
- Disregards all values except 1st and 3rd quartiles
Calculation of the Interquartile Range: Instat
- Statistics tab
- Summary tab
- Describe tab
- Choose additional statistics
- Choose interquartile range
- OK
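A minimal Python sketch of the same calculation follows, using the standard library's `statistics.quantiles` (Python 3.8+). The data are hypothetical, and the `"inclusive"` method is one assumption among several quartile conventions, so Instat or the textbook may report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical ordered scores
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# Interquartile range = Q3 - Q1
iqr = q3 - q1

print(q1, q3, iqr)  # 3.0 7.0 4.0
```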
Variance/SD: Population

Variance:
- The average squared distance/deviation of all raw scores from the mean
- The standard deviation squared
- Statistical notation: σ²

Scale of measurement:
- Interval or ratio

Advantages:
- Considers all data
- Not a terminal statistic

Disadvantages:
- Not appropriate for nominal or ordinal data
- Sensitive to extreme outliers
Variance/SD: Population

Standard deviation:
- The standard (approximately the average) distance/deviation of all raw scores from the mean
- The square root of the variance
- Statistical notation: σ

Scale of measurement:
- Interval or ratio

Advantages and disadvantages:
- Similar to variance
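Calculation of the Variance: Population

Why square all values?
- If all deviations from the mean are summed, the answer always = 0, because positive and negative deviations cancel
- Squaring makes every deviation non-negative, so the deviations no longer cancel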
Calculation of the Variance: Population

Example: 1, 2, 3, 4, 5
Mean = 3

Deviations:
- 1 - 3 = -2
- 2 - 3 = -1
- 3 - 3 = 0
- 4 - 3 = 1
- 5 - 3 = 2
Sum of all deviations = 0

Squared deviations:
- (1 - 3)² = (-2)² = 4
- (2 - 3)² = (-1)² = 1
- (3 - 3)² = (0)² = 0
- (4 - 3)² = (1)² = 1
- (5 - 3)² = (2)² = 4
Sum of all squared deviations = 10

Variance = average squared deviation of all points = 10/5 = 2
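The worked example can be checked with a short Python sketch. The variable names are illustrative only; the scores are the ones used in the example.

```python
scores = [1, 2, 3, 4, 5]
N = len(scores)
mean = sum(scores) / N                    # 3.0

# Deviation of each score from the mean
deviations = [x - mean for x in scores]   # [-2.0, -1.0, 0.0, 1.0, 2.0]
sum_of_deviations = sum(deviations)       # always 0, so it says nothing about spread

# Squaring removes the signs so the deviations no longer cancel
squared_deviations = [d ** 2 for d in deviations]
SS = sum(squared_deviations)              # 10.0

# Population variance = average squared deviation
variance = SS / N                         # 2.0

print(sum_of_deviations, SS, variance)
```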
Calculation of the Variance: Population
- Step 1: Calculate the deviation of each point from the mean
- Step 2: Square each deviation
- Step 3: Sum all squared deviations
- Step 4: Divide the sum of squared deviations by N
Calculation of the Variance: Population

$\sigma^2 = SS / N$, where N is the number of scores and SS (the sum of squared deviations) is either:
- $SS = \sum (X - \mu)^2$: definitional formula (Example 4.3, p 112)
- $SS = \sum X^2 - \frac{(\sum X)^2}{N}$: computational formula (Example 4.4, p 112)

With either formula, Step 4 is the same: divide SS by N
Computation of the Standard Deviation: Population

Take the square root of the variance
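As a quick check in Python, the standard library's `statistics.pvariance` and `statistics.pstdev` both treat the data as a whole population (dividing by N). The scores are the same illustrative values used in the variance example.

```python
import math
import statistics

scores = [1, 2, 3, 4, 5]

pop_variance = statistics.pvariance(scores)  # SS / N = 10 / 5 = 2
pop_sd = math.sqrt(pop_variance)             # square root of the variance

# pstdev performs both steps in one call
pop_sd_direct = statistics.pstdev(scores)

print(pop_variance, round(pop_sd, 4))
```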
Variance/SD: Sample

Process is similar, with two distinctions:
- Statistical notation
- Formula
Statistical Notation Distinctions
Population vs. Sample
- Variance: σ² (population) vs. s² (sample)
- Standard deviation: σ vs. s
- Mean: μ vs. M
- Number of scores: N vs. n
Formula Distinctions
Population vs. Sample

$s^2 = SS / (n - 1)$, where SS is either:
- $SS = \sum (X - M)^2$: definitional formula
- $SS = \sum X^2 - \frac{(\sum X)^2}{n}$: computational formula
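A minimal sketch of the sample version follows. The only change from the population calculation is the divisor, n - 1 instead of N. The helper name `sample_variance` and the scores are assumptions made for the illustration.

```python
import math


def sample_variance(scores):
    """s^2 = SS / (n - 1), with SS taken around the sample mean M."""
    n = len(scores)
    M = sum(scores) / n
    SS = sum((x - M) ** 2 for x in scores)
    return SS / (n - 1)


# Hypothetical sample of n = 5 scores
sample = [1, 2, 3, 4, 5]

s2 = sample_variance(sample)  # SS = 10, so s^2 = 10 / 4 = 2.5
s = math.sqrt(s2)             # sample standard deviation

print(s2, round(s, 4))
```

For the same five scores the population formula gives 10/5 = 2, so the sample formula yields the larger value.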
Why n - 1?
N vs. (n - 1): First Reason

General underestimation of population variance
- Sample variance (s²) tends to underestimate the population variance (σ²) when SS is divided by n
- Dividing by (n - 1) instead inflates s² to correct for this
- Example 4.8, p 121

Actual population σ² = 14
Average biased s² = 63/9 = 7
Average unbiased s² = 126/9 = 14
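The figures above can be reproduced with a small enumeration. The sketch assumes a population in which the values 0, 3 and 9 are equally likely (which gives σ² = 14) and draws every possible sample of n = 2 with replacement. These values are an assumption chosen because they match the totals quoted here; they are not taken from the textbook.

```python
from itertools import product

population = [0, 3, 9]  # assumed values: mu = 4, sigma^2 = 14
n = 2                   # size of each sample


def sum_of_squares(sample):
    """SS around the sample's own mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)


mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

# All 3 x 3 = 9 ordered samples of size 2, drawn with replacement
samples = list(product(population, repeat=n))

biased = [sum_of_squares(s) / n for s in samples]          # divide by n
unbiased = [sum_of_squares(s) / (n - 1) for s in samples]  # divide by n - 1

avg_biased = sum(biased) / len(samples)      # 63 / 9 = 7
avg_unbiased = sum(unbiased) / len(samples)  # 126 / 9 = 14

print(sigma2, avg_biased, avg_unbiased)
```

Averaged over all samples, dividing by n gives half the population variance in this setup, while dividing by n - 1 recovers it exactly.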
N vs. (n - 1): Second Reason

Degrees of freedom (df)
- df = number of scores “free” to vary
- Example:
  - Assume n = 3, with M = 5
  - The sum of values = 15 (n*M)
  - Assume two of the values = 8, 3
  - The third value has to be 4
  - Two values are “free” to vary
  - df = (n - 1) = (3 - 1) = 2
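The arithmetic of this example can be written out as a short sketch. The variable names are illustrative.

```python
n = 3            # sample size
M = 5            # sample mean, fixed in advance
known = [8, 3]   # the two values that were free to vary

# The mean fixes the total: sum of values = n * M
required_sum = n * M                     # 15

# Once n - 1 values are chosen, the last one is determined
last_value = required_sum - sum(known)   # 15 - 11 = 4

df = n - 1                               # 2

print(last_value, df)
```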

Computation of the Standard Deviation of Sample: Instat
- Statistics tab
- Summary tab
- Describe tab
- Calculates standard deviation automatically
- OK
Selection

When to use the frequency
- Nominal data
- With the mode

When to use the range or interquartile range
- Ordinal data
- With the median

When to use the variance/SD
- Interval or ratio data
- With the mean
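The selection rules above amount to a simple lookup by scale of measurement, sketched here in Python. The names `RECOMMENDED` and `recommend` are hypothetical, and the table only restates the pairings listed on this slide.

```python
# Scale of measurement -> (measure of variability, measure of central tendency)
RECOMMENDED = {
    "nominal": ("frequency", "mode"),
    "ordinal": ("range or interquartile range", "median"),
    "interval": ("variance/SD", "mean"),
    "ratio": ("variance/SD", "mean"),
}


def recommend(scale):
    """Return the (variability, central tendency) pairing for a scale."""
    return RECOMMENDED[scale.lower()]


variability, center = recommend("Ordinal")
print(f"Report the {variability} with the {center}")
```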
Textbook Problem Assignment

Problems: 4, 6, 8, 14.