Descriptive Statistics
Outline
• Measures of Central Tendency
  - Mode
  - Median
  - Mean
• Measures of Variability
  - Range
  - Variance & standard deviation
Lecture 1
Measures of Central Tendency
• Measures of central tendency tell you what is true on average.
• Three such measures are used regularly:
  - Mode
  - Median
  - Mean
The Mode
• The most common value(s) in a data set.
5 6 6 6 7 7 8 9 10 10 10 11 12
• This is a bi-modal distribution (it has 2 modes): 6 (Mode 1) and 10 (Mode 2).
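The two modes in the data set above can be checked with a short Python sketch. It assumes Python 3.8 or later, where the standard library's `statistics.multimode` is available; the variable names are illustrative only.

```python
from statistics import multimode

# Data set from the slide
scores = [5, 6, 6, 6, 7, 7, 8, 9, 10, 10, 10, 11, 12]

# multimode returns every value tied for the highest count,
# in the order each one first appears in the data
modes = multimode(scores)
print(modes)  # [6, 10]
```

Both 6 and 10 occur three times, which is why the list has two entries.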
The Median
• The median of a data set is the value in the middle.
• That is, half of the scores in the set lie above it and half lie below it:
5 6 6 6 7 7 [8] 9 10 10 10 11 12
6 scores | Median (8) | 6 scores
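As a rough check of the median for this data set, the sketch below sorts the scores, picks the middle one, and counts how many scores fall on each side. It uses only Python's standard `statistics` module; the names `below` and `above` are illustrative.

```python
from statistics import median

# Data set from the slide
scores = [5, 6, 6, 6, 7, 7, 8, 9, 10, 10, 10, 11, 12]

ordered = sorted(scores)
middle = len(ordered) // 2      # index 6 for 13 scores
below = ordered[:middle]        # scores before the middle position
above = ordered[middle + 1:]    # scores after the middle position

print(median(scores))           # 8
print(len(below), len(above))   # 6 6
```

With an even number of scores there is no single middle value, and `median` returns the average of the two middle scores instead.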
The Mean
• The mean is the arithmetic average of a set of numbers.
• Most frequently used of the three measures.
• Most useful when the distribution is not “skewed.”
Note 1: Skewed distributions
• A skewed distribution is one in which the scores “pile up” at one end of the scale, but are less frequent at the other end.
4 5 7 8 11 14 (not skewed)
2 4 8 22 300 14000 (skewed): scores are rare among the larger numbers
Measures of Central Tendency
• We distinguish between SAMPLE and POPULATION values.
• Sample values are shown as English letters.
• Population values are shown as Greek letters.
Measures of Central Tendency
• A sample mean is the average of all values in the sample.
• Sample mean: $\bar{X}$ (pronounced “X-bar”)
• A population mean is the average of all values in the population.
• Population mean: $\mu$ (“Mu”)
Note 2: Sigma notation
• The Greek letter ∑ indicates the addition operation (summation).
$$\sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4$$
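One way to read sigma notation is as a loop that adds one term per value of the index i. The sketch below is only an illustration: the four values are borrowed from the start of the lecture's example data and are an assumption, since the slide itself uses no numbers.

```python
# Illustrative values standing in for x1..x4 (an assumption, not from the slide)
x = [4, 5, 7, 8]

# Sigma from i = 1 to 4: add each term in turn
total = 0
for i in range(1, 5):
    total += x[i - 1]   # Python lists are 0-indexed, so x_i is x[i - 1]

print(total)            # 24
```

The built-in `sum(x)` gives the same result in one step.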
The Mean - calculations
4 5 7 8 11 14
Σx → “The sum of X”: Σx = 49
n = 6, because there are 6 observations in the data set
$$\bar{X} = \frac{\sum x}{n} = \frac{49}{6} = 8.17$$
($\bar{X}$ is read “X-bar.”)
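The same calculation can be sketched in a few lines of Python, following the slide's steps of summing the scores and dividing by the number of observations. The variable names are illustrative.

```python
# Data set from the slide
scores = [4, 5, 7, 8, 11, 14]

sum_x = sum(scores)     # the sum of X: 49
n = len(scores)         # number of observations: 6
x_bar = sum_x / n       # the sample mean

print(sum_x, n)         # 49 6
print(round(x_bar, 2))  # 8.17
```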
The Mean - calculations
4 5 7 8 11 14: Σx = 49, n = 6
$$\bar{X} = \frac{\sum x}{n} = \frac{49}{6} = 8.17$$
2 4 8 22 300 14000: Σx = 14336, n = 6
$$\bar{X} = \frac{\sum x}{n} = \frac{14336}{6} = 2389.333$$
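To see why the mean is less useful for a skewed distribution, it may help to put the mean and the median of the two data sets side by side. This is a minimal sketch using Python's standard `statistics` module; the comparison with the median is an added illustration, not part of the slide.

```python
from statistics import mean, median

# The two data sets from the slides
not_skewed = [4, 5, 7, 8, 11, 14]
skewed = [2, 4, 8, 22, 300, 14000]

for name, data in [("not skewed", not_skewed), ("skewed", skewed)]:
    # Report the mean (rounded to 2 decimals) next to the median
    print(name, round(mean(data), 2), median(data))
```

For the non-skewed set the mean (about 8.17) and the median (7.5) are close. For the skewed set the mean (about 2389.33) is pulled far above the median (15) by the single score of 14000.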
Measures of Variability
• With any sample mean, the question arises: “How useful is this number? To what extent is it descriptive of the data set?”
• The answer depends upon how variable the data set is, that is, how similar each data point is to all the other data points in the set.
Measures of variability
• The range
  - The distance between the highest and lowest numbers in the data set
  - The simplest and least useful measure of variability
5 6 6 6 7 7 8 9 10 10 10 11 12
Here, the range is 12 - 5 = 7
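The range takes one line to compute. A small sketch, using Python's built-in `max` and `min` (the name `data_range` is illustrative):

```python
# Data set from the slide
scores = [5, 6, 6, 6, 7, 7, 8, 9, 10, 10, 10, 11, 12]

# Distance between the highest and lowest numbers
data_range = max(scores) - min(scores)
print(data_range)  # 7
```

Because only the two extreme scores are used, the other eleven scores have no effect on the result, which is one reason the range is the least informative measure.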
Measures of Variability
• The variance
  - Measures how much, on average, each data point differs from the mean (in squared units)
  - Much more useful than the range
• To compute variance:
1. Find the mean, $\bar{X}$
2. Subtract $\bar{X}$ from each data point $X_i$
3. Square the differences and add the squared values up
4. Divide the total by n - 1
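The four steps can be sketched directly in Python. The example below applies them to the data set 4 5 7 8 11 14 used earlier in the lecture (choosing that data set is an assumption made for illustration) and compares the result with `statistics.variance` from the standard library, which also divides by n - 1.

```python
from statistics import variance

# Data set used earlier in the lecture
scores = [4, 5, 7, 8, 11, 14]
n = len(scores)

# Step 1: find the mean
x_bar = sum(scores) / n

# Step 2: subtract the mean from each data point
deviations = [x - x_bar for x in scores]

# Step 3: square the differences and add the squared values up
sum_of_squares = sum(d ** 2 for d in deviations)

# Step 4: divide the total by n - 1
s2 = sum_of_squares / (n - 1)

print(round(s2, 4))                # 14.1667
print(round(variance(scores), 4))  # 14.1667
```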
The variance
• Why do we square the differences before adding them up?
• Because if we didn’t, the positive and negative differences would cancel and always add up to zero.
• Sample variance: $S^2$ (“S-squared”)
• Population variance: $\sigma^2$ (“sigma squared”)
• Important for you to understand how $S^2$ is different from $\sigma^2$.
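The claim that unsquared differences always add up to zero can be checked numerically. This sketch uses `fractions.Fraction` from the standard library so that the arithmetic is exact rather than subject to floating-point rounding; the data set is the one used earlier in the lecture, chosen here for illustration.

```python
from fractions import Fraction

# Data set used earlier in the lecture
scores = [4, 5, 7, 8, 11, 14]

# Exact sample mean: 49/6
x_bar = Fraction(sum(scores), len(scores))

# Differences between each score and the mean
deviations = [x - x_bar for x in scores]

print(sum(deviations))                   # 0
print(sum(d ** 2 for d in deviations))   # 425/6
```

The raw differences cancel exactly, while the squared differences are all non-negative and so carry information about spread.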
Conceptual Formula for the Variance
$$S^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$$
$S^2$ is the sample variance because $\bar{X}$ is the sample mean.
Why “n - 1”?
Note: degrees of freedom
• Suppose we have six scores, and we know that $\bar{X} = 8$.
• Suppose five of the six scores are: 3, 5, 9, 10, 12
• What is the last score?
Note: degrees of freedom
• Sum of the five scores: 3 + 5 + 9 + 10 + 12 = 39
• $\bar{X} = 8$, so the sum of all six scores is 6 * 8 = 48
• So, the unknown score is 48 - 39 = 9
There are n - 1 degrees of freedom in n scores: given n - 1 scores and the sample mean, there is no uncertainty about what the remaining score is (that score is not free to vary).
Observations: 18, 17, ?, 10, 8, 11, 14, 19, 12, 16
$\bar{X} = 14.5$. What is the missing observation?
The sum of the nine observations shown is 125.
Sum of all ten observations = 10 x 14.5 = 145
145 - 125 = 20
Thus, there are nine degrees of freedom in ten observations.
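Both degrees-of-freedom examples follow the same arithmetic: multiply the sample mean by n to get the total, then subtract the scores already known. A minimal sketch, where the helper name `missing_score` is hypothetical and introduced only for illustration:

```python
def missing_score(known_scores, sample_mean):
    """Return the one unlisted score, given the mean of all n scores.

    Hypothetical helper for illustration; assumes exactly one score is missing.
    """
    n = len(known_scores) + 1      # the known scores plus the missing one
    total = n * sample_mean        # the sum of all n scores
    return total - sum(known_scores)

# Six scores with a mean of 8, five of them known
print(missing_score([3, 5, 9, 10, 12], 8))                        # 9

# Ten observations with a mean of 14.5, nine of them known
print(missing_score([18, 17, 10, 8, 11, 14, 19, 12, 16], 14.5))   # 20.0
```

Once the mean is fixed, the last score is fully determined by the others, which is the sense in which only n - 1 scores are free to vary.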
Computational Formula for the Variance
$$S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n-1}$$
Note: these are Xs, not X-bars.
Note: $S = \sqrt{S^2}$
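As a check that the computational formula agrees with the conceptual one, the sketch below evaluates both on the data set 4 5 7 8 11 14 from earlier in the lecture (an illustrative choice) and then takes the square root to get the standard deviation S. Variable names are illustrative.

```python
from math import isclose, sqrt

# Data set used earlier in the lecture
scores = [4, 5, 7, 8, 11, 14]
n = len(scores)

sum_x = sum(scores)                       # sum of X: 49
sum_x_sq = sum(x ** 2 for x in scores)    # sum of X squared: 471

# Computational formula: uses only the raw Xs, not the mean
s2_computational = (sum_x_sq - sum_x ** 2 / n) / (n - 1)

# Conceptual formula: squared differences from the sample mean
x_bar = sum_x / n
s2_conceptual = sum((x - x_bar) ** 2 for x in scores) / (n - 1)

# Standard deviation is the square root of the variance
s = sqrt(s2_computational)

print(round(s2_computational, 4))   # 14.1667
print(isclose(s2_computational, s2_conceptual))  # True
```

The computational form needs only two running totals, the sum of X and the sum of X squared, so it can be worked out in a single pass through the data.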