Download Slides - Open Online Courses

Univariate Descriptive
Statistics
Heibatollah Baghi, and
Mastee Badii
George Mason University
1
Objectives
• Define measures of central tendency
and dispersion.
• Select the appropriate measures to use
for a particular dataset.
2
How to Summarize Data?
• Graphs may be useful, but the
information they offer is often inexact.
• A frequency distribution provides many
details, but often we want to condense a
distribution further.
3
Two Characteristics of
Distributions
1. Measures of Central Tendency.
2. Measures of Variability or Scatter.
4
Measures of Central Tendency:
Mean
• The mean describes the center, or balance point, of a frequency distribution.
• The sample mean: $\bar{X} = \frac{\sum X}{n}$
• Calculate the mean value for the following data: 23, 23, 24, 25, 25, 25, 26, 26, 27, 28.
• 25.2
5
Measures of Central Tendency:
Mode
• The most frequent value or category in
a distribution.
• Calculate the mode for the following set
of values: 20, 21, 21, 22, 22, 22, 22,
23, 23, 24.
• 22
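One way to find the mode is to count how often each value occurs. The sketch below uses `collections.Counter` from the Python standard library; the variable names are illustrative. When several values tie for most frequent, `most_common(1)` reports only one of them, so a multimodal dataset would need extra handling.

```python
from collections import Counter

# Data from the slide above.
data = [20, 21, 21, 22, 22, 22, 22, 23, 23, 24]

# Count how often each value occurs, then take the most frequent one.
counts = Counter(data)
mode_value, mode_count = counts.most_common(1)[0]
print(mode_value, mode_count)  # 22 4
```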
6
Measures of Central Tendency:
Median
• The middle value of a set of ordered
numbers.
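• Calculate for an even number of cases: 21, 22, 22, 23, 24, 26, 26, 27, 28, 29.
• 25 (the average of the two middle values, 24 and 26)
• Calculate for an odd number of cases with no duplicates at the center: 22, 23, 23, 24, 25, 26, 27, 27, 28.
• 25 (the fifth of the nine ordered values)
• The median calculation changes when the values at the center repeat.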
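The two median cases can be sketched as a small function. This is a simple illustration, assuming the plain "middle value" definition; it does not apply any adjustment for repeated values at the center. The function name `median` is an illustrative choice.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


even_case = [21, 22, 22, 23, 24, 26, 26, 27, 28, 29]
odd_case = [22, 23, 23, 24, 25, 26, 27, 27, 28]

print(median(even_case))  # 25.0
print(median(odd_case))   # 25
```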
7
Comparison of Measures of
Central Tendency
• Mode: most frequently occurring value. Use with nominal, ordinal, and (sometimes) interval/ratio-level data.
• Median: exact center (when odd N) of rank-ordered data, or average of the two middle values (when even N). Use with ordinal-level data and interval/ratio-level data (particularly when skewed).
• Mean: arithmetic average (sum of Xs / N). Use with interval/ratio-level data.
8
Comparison of Measures of Central
Tendency in Normal Distribution
• Mean, median and
mode are the same
• Shape is symmetric
9
Comparison of Measures of
Central Tendency in Bimodal
Distribution
• Mean & median are
the same
• Two modes different
from mean and
median
10
Comparison of Measures of
Central Tendency in Negatively
Skewed Distributions
• Mean, median & mode
are different
• Mode > Median > Mean
• Outliers pull the mean away from the median
11
Comparison of Measures of
Central Tendency in Positively
Skewed Distributions
• Mean, median & mode
are different
• Mean > Median > Mode
• Outliers pull the mean away from the median
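The ordering of the three measures under skew can be illustrated with a small made-up dataset. The values below are hypothetical, chosen only so that a few large observations form a right tail; mirroring them produces the negatively skewed case.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed data: most values are small, a few are large.
positive = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

# Mirror the data around 10 to get a negatively skewed version.
negative = [20 - x for x in positive]

print(mean(positive), median(positive), mode(positive))  # mean > median > mode
print(mean(negative), median(negative), mode(negative))  # mode > median > mean
```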
12
Comparison of Measures of
Central Tendency in Uniform
Distribution
• Mean and median are the same point
• There is no single mode, because every value is equally frequent
13
Comparison of Measures of
Central Tendency in J-shape
Distribution
• Mode to extreme right
• Mean to the left of the median (the long tail extends to the left)
14
Measures of Variability or
Scatter
• Reporting only an
average without an
accompanying measure
of variability may
misrepresent a set of
data.
• Two datasets can have
the same average but
very different variability.
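A short sketch makes the point concrete. The two datasets below are hypothetical, invented so that both have a mean of 50 while their spreads differ widely.

```python
from statistics import mean, stdev

# Two hypothetical datasets with the same mean but very different spread.
tight = [48, 49, 50, 51, 52]
spread = [10, 30, 50, 70, 90]

print(mean(tight), mean(spread))                         # both means are 50
print(round(stdev(tight), 2), round(stdev(spread), 2))   # spread is far larger
```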
15
Measures of Variability or
Scatter: Range
• The difference between the highest and
lowest score
• Easy to calculate
• Highly unstable
• Calculate range for the data: 110, 120,
130, 140, 150, 160, 170, 180, 190
• 190 – 110 = 80
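The range calculation is a one-liner in Python. The variable names are illustrative.

```python
# Data from the slide above.
data = [110, 120, 130, 140, 150, 160, 170, 180, 190]

# Range: highest score minus lowest score.
data_range = max(data) - min(data)
print(data_range)  # 80
```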
16
Measures of Variability or Scatter:
Semi-Interquartile Range
• Half of the difference between the first quartile (Q1, the 25th percentile) and the third quartile (Q3, the 75th percentile)
• SQR = (Q3 - Q1)/2
• More stable than the range
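As a rough sketch, the semi-interquartile range can be computed with `statistics.quantiles` (Python 3.8 or later). Two assumptions are made here: the data are borrowed from the range example, since this slide gives none, and the quartiles follow the library's default "exclusive" method. Other quartile conventions can give slightly different values of Q1 and Q3.

```python
from statistics import quantiles

# Assumed data, borrowed from the range example in these slides.
data = [110, 120, 130, 140, 150, 160, 170, 180, 190]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), Q3.
q1, q2, q3 = quantiles(data, n=4)

# Semi-interquartile range: half the distance between Q3 and Q1.
sqr = (q3 - q1) / 2
print(q1, q3, sqr)
```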
17
Measures of Variability: Sample
Variance
• The sum of squared differences between observations and their mean [SS = Σ(X - M)²], divided by n - 1.
• Sample variance: the standard deviation squared
• Formula for sample variance: $s^2 = \frac{SS}{n - 1}$
18
Measures of Variability or
Scatter: Standard Deviation
• The square root of the variance.
$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
• Calculate the standard deviation for the data: 110, 120, 130, 140, 150, 150, 160, 170, 180, 190.
19
Calculating Standard
Deviation
• Sample Sum of Squares: $SS = \sum X^2 - \frac{(\sum X)^2}{n}$
• Sample Variance: $s^2 = \frac{SS}{n - 1}$
• Sample Standard Deviation: $s = \sqrt{\frac{SS}{n - 1}}$
SS is the key to many statistics
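The calculating formula for SS can be checked against the defining formula in a few lines. This sketch assumes the ten values tabulated in these slides (150 appears twice); the variable names are illustrative.

```python
from math import sqrt

# Ten values as tabulated in the slides (150 appears twice).
data = [110, 120, 130, 140, 150, 150, 160, 170, 180, 190]
n = len(data)

# Calculating formula: SS = sum(X^2) - (sum(X))^2 / n
ss_calculating = sum(x ** 2 for x in data) - sum(data) ** 2 / n

# Defining formula: SS = sum((X - mean)^2)
m = sum(data) / n
ss_defining = sum((x - m) ** 2 for x in data)

s2 = ss_calculating / (n - 1)  # sample variance
s = sqrt(s2)                   # sample standard deviation

print(ss_calculating, ss_defining)  # 6000.0 6000.0
print(round(s2), round(s, 1))       # 667 25.8
```

Both formulas give the same SS; the calculating form only needs the running totals of X and X², which is why it was convenient for hand calculation.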
20
Calculating Standard Deviation
Data     X - M    (X - M)²
110      -40      1600
120      -30      900
130      -20      400
140      -10      100
150      0        0
150      0        0
160      10       100
170      20       400
180      30       900
190      40       1600
Total    0        6000 (SS)

N - 1 = 9
Sample Variance = 6000 / 9 ≈ 667
Standard Deviation = √667 ≈ 25.8
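The table can be regenerated programmatically, which is a convenient way to check each row. This is a minimal sketch; the column widths and variable names are arbitrary choices.

```python
# Ten values from the table (150 appears twice).
data = [110, 120, 130, 140, 150, 150, 160, 170, 180, 190]
m = sum(data) / len(data)

# One row per observation: value, deviation from the mean, squared deviation.
rows = [(x, x - m, (x - m) ** 2) for x in data]

print(f"{'Data':>6} {'X - M':>8} {'(X - M)^2':>10}")
for x, dev, sq in rows:
    print(f"{x:>6} {dev:>8.0f} {sq:>10.0f}")

total_dev = sum(dev for _, dev, _ in rows)  # deviations always sum to zero
ss = sum(sq for _, _, sq in rows)           # sum of squares
variance = ss / (len(data) - 1)
sd = variance ** 0.5

print(f"{'Total':>6} {total_dev:>8.0f} {ss:>10.0f}")
print(f"variance = {variance:.0f}, standard deviation = {sd:.1f}")
```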
21
Formula Variations
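Sum of squares
• Calculating formula: $SS = \sum X^2 - \frac{(\sum X)^2}{n}$
• Defining formula: $SS = \sum (X_i - \bar{X})^2$
Variance
• Calculating formula: $s^2 = \frac{SS}{n - 1}$
• Defining formula: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$
Standard deviation
• Calculating formula: $s = \sqrt{\frac{SS}{n - 1}}$
• Defining formula: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$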
22
Comparison of Measures of
Variability and Scatter
• In a normal distribution, the range is approximately 6 standard deviations
• The standard deviation partitions the data in a normal distribution
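The "6 standard deviations" rule of thumb follows from how much of a normal distribution lies within 3 standard deviations on either side of the mean. A small sketch using `statistics.NormalDist` (Python 3.8 or later) shows the proportion; the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of values within 3 standard deviations of the mean.
coverage = standard_normal.cdf(3) - standard_normal.cdf(-3)
print(round(coverage, 4))  # 0.9973
```

Since about 99.7% of values fall in that interval, the observed range of a moderately sized normal sample tends to span roughly 6 standard deviations.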
23
Standardized Scores: Z Scores
• The mean and standard deviation are used to compute standard scores
Z = (x - m) / s
• Calculate the standard score for a blood pressure of 140 if the sample mean is 110 and the standard deviation is 10
• Z = (140 - 110) / 10 = 3
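The blood pressure example can be reproduced with a small helper. The function name `z_score` and its parameter names are illustrative, not taken from the slides.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, expressed in standard deviation units."""
    return (x - mean) / sd


# Blood pressure of 140, sample mean 110, standard deviation 10.
z = z_score(140, mean=110, sd=10)
print(z)  # 3.0
```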
24
Value of Z Scores
• Allows comparison of observed
distribution to expected distribution
[Figure: histogram of observed versus expected frequencies. x-axis: Bin (Z-score bins from -1.71 to 1.36, plus "More"); y-axis: Frequency (0 to 12).]
25
Take Home Lesson
Measures of Central Tendency &
Variability Can Describe the
Distribution of Data
26