Data: Levels of Measurement and Basic Statistics

I. Introduction to Data and Statistics
A. Basic terms and concepts
Data set
- variable
- observation
- data value
[Table: example data set. Observations: Central Gulf States (TX, LA, MS, AL). Variables: age (> 65, < 19), $, Rent $.]
B. Primary and Secondary data
1. Primary data
- original data
- collected for a specific purpose
- sample design and procedures
- time and $
2. Secondary data
- archival data
- agency or organization
- organized in a set format
- time and $
- data quality an issue
- sample design
C. Individual and spatially aggregated data
[Figure: data for State 1 through State 4 shown individually, and the same four states aggregated into a Region]
D. Discrete and Continuous data
1. Discrete
2. Continuous
E. Qualitative and Quantitative data
1. Qualitative (categorical)
Ex: land cover, sex, political party,
race
2. Quantitative
Ex: population, precipitation, grades
II. Scales of Measurement
A. Nominal
B. Ordinal
C. Interval
D. Ratio
For comparison, variables must use the same scale of measurement.
A. Nominal
- Mutually exclusive
- Exhaustive
Ex:
Name: George = 1, Wanda = 2, Bob = 3
Land Cover: Forested = 45, urban = 39, etc...
Climate regimes: polar = 1, temperate = 2,
tropical = 3
Sex: Male = 1, Female = 2
B. Ordinal
- ranked data
- arbitrary
- comparisons
- not a set interval between rankings
Ex:
Places rated (cities, beaches…)
Level of satisfaction (poor, ok, good)
C. Interval
- separated by absolute differences
- does not have an absolute zero
Ex:
- temperature (°C, °F)
- elevation
D. Ratio
- separated by absolute differences
- absolute zero
Ex:
- precipitation
- tree growth
- income
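The difference between the interval and ratio scales shows up when values are divided. The sketch below uses hypothetical temperature and precipitation values (not from the lecture) to illustrate that a ratio of Celsius readings changes when the same readings are expressed in Kelvin, while a ratio of precipitation totals is unaffected by a change of units.

```python
# Hypothetical values chosen only for illustration.
temp_c = (10.0, 20.0)        # interval scale: 0 degrees C is an arbitrary zero
precip_mm = (250.0, 500.0)   # ratio scale: 0 mm means no precipitation at all


def celsius_to_kelvin(c):
    """Shift to the Kelvin scale, which does have an absolute zero."""
    return c + 273.15


# Ratios on the interval scale depend on where zero was placed.
ratio_celsius = temp_c[1] / temp_c[0]
ratio_kelvin = celsius_to_kelvin(temp_c[1]) / celsius_to_kelvin(temp_c[0])

# Ratios on the ratio scale survive a change of units (mm to inches).
ratio_mm = precip_mm[1] / precip_mm[0]
ratio_in = (precip_mm[1] / 25.4) / (precip_mm[0] / 25.4)

# Differences, by contrast, are meaningful on both scales.
diff_celsius = temp_c[1] - temp_c[0]
diff_kelvin = celsius_to_kelvin(temp_c[1]) - celsius_to_kelvin(temp_c[0])

print("temperature ratio:", ratio_celsius, "vs", round(ratio_kelvin, 3))
print("precipitation ratio:", ratio_mm, "vs", round(ratio_in, 3))
print("temperature difference:", diff_celsius, "vs", round(diff_kelvin, 3))
```

So "20 °C is twice as warm as 10 °C" is not a valid statement, whereas "500 mm is twice as much precipitation as 250 mm" is.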
III. Graphing procedures (univariate)
A. frequency histogram
B. cumulative histogram
A. Frequency histogram (frequency polygon)
[Figure: frequency (#, %) on the vertical axis, from (-) to (+), against a variable such as income or grades on the horizontal axis, 0 to 100]
B. Cumulative frequency histogram (cumulative frequency polygon)
[Figure: cumulative frequency (#, %) on the vertical axis, from (-) to (+), against a horizontal axis from 0 to 100]
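As a rough illustration of the two graphs, the sketch below tabulates class frequencies and cumulative frequencies for a set of grades. The grades, the class width of 20, and all variable names are assumptions made for this example, not values from the lecture.

```python
from itertools import accumulate

# Hypothetical grades on a 0-100 scale (not from the lecture).
grades = [12, 35, 47, 51, 55, 58, 62, 66, 71, 78, 84, 93]
bin_width = 20
n_bins = 5  # classes: 0-19, 20-39, 40-59, 60-79, 80-100

# Frequency: how many grades fall in each class.
counts = [0] * n_bins
for g in grades:
    # min() places a grade of exactly 100 in the last class.
    counts[min(g // bin_width, n_bins - 1)] += 1

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(counts))

# The same information expressed as percentages (the "%" axis option).
percent = [100 * c / len(grades) for c in counts]

for i, (c, cum, pct) in enumerate(zip(counts, cumulative, percent)):
    low = i * bin_width
    print(f"class from {low:3d}: freq={c} ({pct:.1f}%)  cumulative={cum}  " + "#" * c)
```

The last cumulative value always equals the number of observations, which is a quick check on the tabulation.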
IV. Descriptive Statistics (univariate)
- descriptive: summary of data characteristics
- inferential (for contrast): extend sample results to a larger population
A. Measures of Central Tendency
B. Measures of Dispersion
C. Measures of Shape
A. Measures of Central Tendency
• attempt to define the most typical value of a
larger data set
1. Mode
2. Median
3. Mean (average)
1. Mode
• value that occurs most frequently
• only measure of central tendency appropriate for nominal level data
• works better for grouped data, not raw values
• many data sets will not have two identical data values
2. Median
• the middle value from a set of ranked
observations
• equal number of observations on either side
• appropriate when data is heavily skewed
• ordinal, interval, or ratio level data, not nominal
3. Mean (average), $\bar{X} = \sum x_i / n$
• most commonly used value of central tendency
• interval or ratio level data
• sensitive to outliers
• most easily understood
• assumptions:
• unimodal
• symmetric distribution
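[Figure: normal distribution on an axis from 0 to 100, where the mode, mean, and median coincide]
[Figure: skewed distribution on an axis from 0 to 100, where the mode, median, and mean fall at different values, labeled in that order]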
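The three measures of central tendency can be compared on a small data set. The sketch below uses Python's standard `statistics` module. The income values and land cover labels are hypothetical, and the value 250 is a deliberately planted outlier.

```python
import statistics

# Hypothetical incomes (in $1000s); 250 is a deliberate outlier.
incomes = [25, 30, 30, 35, 40, 45, 250]

mode = statistics.mode(incomes)      # most frequent value
median = statistics.median(incomes)  # middle value of the ranked observations
mean = statistics.mean(incomes)      # sum of the values divided by n

print("mode:", mode, "median:", median, "mean:", mean)

# The mean is pulled toward the outlier, while the median is not.
trimmed = incomes[:-1]
print("without outlier -> median:", statistics.median(trimmed),
      "mean:", round(statistics.mean(trimmed), 2))

# For nominal data the mode is the only one of the three that applies.
land_cover = ["forest", "urban", "forest", "water", "forest", "urban"]
print("modal land cover:", statistics.mode(land_cover))
```

With the outlier present the mean (65) is larger than every value except the outlier itself, which is why the median is preferred for heavily skewed data.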
B. Measures of Dispersion
• provide information about distribution of data
1. Range
2. Standard deviation
3. Coefficient of variation
1. Range
difference between largest and smallest value
• simplest measure of dispersion
• easy to calculate
• can be misleading
• ignores all other values
• does not take into account clustering of
data
2. Standard deviation
• roughly the average deviation of each value from the mean (the square root of the average squared deviation)
• based on the mean
• better indicator of the dispersion of the entire
sample (in comparison to the range)
• scale dependent value
3. Coefficient of variation
• standard deviation / mean
• allows you to compare dispersion independent
of scale
• should be used to make comparisons where
there are differences in mean
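[Figure: distribution on an axis from 0 to 100, with the mean at 50 and the smallest and largest values at 15 and 85]
Mean: $\bar{X} = 50$
Range: 85 - 15 = 70
Std. dev.: based on the deviations $x_i - \bar{X}$
C.V. = Std. dev. / mean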
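The three measures of dispersion can be sketched as follows. The data values are hypothetical, chosen so that the minimum, mean, and maximum match the 15, 50, and 85 of the figure. The lecture does not say whether the standard deviation divides by n or by n - 1, so the population form (`statistics.pstdev`, dividing by n) is an assumption here.

```python
import statistics

# Hypothetical values with minimum 15, mean 50, and maximum 85.
small = [15, 40, 50, 60, 85]
# The same pattern on a scale ten times larger.
large = [v * 10 for v in small]


def describe(values):
    """Return (range, mean, standard deviation, coefficient of variation)."""
    value_range = max(values) - min(values)
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)  # assumption: population form (divide by n)
    cv = std_dev / mean
    return value_range, mean, std_dev, cv


range_s, mean_s, sd_s, cv_s = describe(small)
range_l, mean_l, sd_l, cv_l = describe(large)

print(f"small: range={range_s} mean={mean_s} sd={sd_s:.2f} cv={cv_s:.3f}")
print(f"large: range={range_l} mean={mean_l} sd={sd_l:.2f} cv={cv_l:.3f}")
```

The range and standard deviation grow tenfold with the scale, while the coefficient of variation stays the same, which is what makes it suitable for comparing data sets with different means.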
C. Measures of Shape
1. Skewness
- Symmetrical (bell shaped)
- (+) skew
- (-) skew
2. Kurtosis
- Leptokurtic
- Mesokurtic
- Platykurtic
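The lecture names the shape measures without giving formulas. As one common convention (an assumption here, not necessarily the one used in the course), skewness and excess kurtosis can be computed from central moments. The function names and data below are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: the average of (x - mean)**k."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Excess kurtosis: m4 / m2**2 - 3 (about 0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Hypothetical data sets.
symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 2, 2, 3, 12]           # long tail toward large values
left_tail = [-v for v in right_tail]    # mirror image

for name, data in [("symmetric", symmetric),
                   ("right tail", right_tail),
                   ("left tail", left_tail)]:
    print(f"{name}: skewness={skewness(data):+.3f} "
          f"excess kurtosis={excess_kurtosis(data):+.3f}")
```

Under this convention a positive skewness corresponds to (+) skew and a negative one to (-) skew. Positive excess kurtosis is described as leptokurtic, a value near zero as mesokurtic, and a negative value as platykurtic.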
Mean Center
I.D.   Xi    Yi
A      2.8   1.5
B      1.6   3.8
C      3.5   3.3
D      4.4   2.0
E      4.3   1.1
F      5.2   2.4
G      4.9   3.5
[Figure: scatter plot of points A through G, X axis 0 to 6 and Y axis 0 to 4, shown first without and then with the mean center marked]
Mean Center (3.81, 2.51)
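The mean center is the mean of the X coordinates paired with the mean of the Y coordinates. A minimal sketch using the seven points A through G from the table (the variable names are illustrative):

```python
# Coordinates of points A-G as given in the table.
points = {
    "A": (2.8, 1.5),
    "B": (1.6, 3.8),
    "C": (3.5, 3.3),
    "D": (4.4, 2.0),
    "E": (4.3, 1.1),
    "F": (5.2, 2.4),
    "G": (4.9, 3.5),
}

n = len(points)
mean_x = sum(x for x, _ in points.values()) / n  # 26.7 / 7
mean_y = sum(y for _, y in points.values()) / n  # 17.6 / 7

print(f"Mean center: ({mean_x:.2f}, {mean_y:.2f})")  # Mean center: (3.81, 2.51)
```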
Weighted Mean Center
I.D.   Xi    Yi    f (w)   w Xi    w Yi
A      2.8   1.5   5       14      7.5
B      1.6   3.8   20      32      76
C      3.5   3.3   8       28      26.4
D      4.4   2.0   4       17.6    8.0
E      4.3   1.1   6       25.8    6.6
F      5.2   2.4   5       26      12
G      4.9   3.5   3       14.7    10.5
[Figure: scatter plot of points A through G, X axis 0 to 6 and Y axis 0 to 4, each point labeled with its weight: A (5), B (20), C (8), D (4), E (6), F (5), G (3), shown first without and then with the weighted mean center marked]
Weighted Mean Center (3.10, 2.88)
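The weighted mean center divides the sums of the w Xi and w Yi columns by the sum of the weights. A minimal sketch using the coordinates and weights from the table (the variable names are illustrative):

```python
# (Xi, Yi, weight) for points A-G as given in the table.
points = {
    "A": (2.8, 1.5, 5),
    "B": (1.6, 3.8, 20),
    "C": (3.5, 3.3, 8),
    "D": (4.4, 2.0, 4),
    "E": (4.3, 1.1, 6),
    "F": (5.2, 2.4, 5),
    "G": (4.9, 3.5, 3),
}

sum_w = sum(w for _, _, w in points.values())       # 51
sum_wx = sum(w * x for x, _, w in points.values())  # 158.1
sum_wy = sum(w * y for _, y, w in points.values())  # 147.0

weighted_x = sum_wx / sum_w
weighted_y = sum_wy / sum_w

print(f"Weighted mean center: ({weighted_x:.2f}, {weighted_y:.2f})")
# Weighted mean center: (3.10, 2.88)
```

Compared with the unweighted mean center (3.81, 2.51), the result moves toward point B, which carries the largest weight (20).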
Correlation
- Bivariate relationship
Scattergrams
1. Direction
negative or positive
2. Strength of relationship
perfect, strong, weak, no
Scattergram examples (in each panel both axes run from (-) to (+)):
- Positive (direct) correlation
- Negative (inverse) correlation
- Perfect correlation
- Strong correlation
- Weak correlation
- No correlation ??
- Controlled correlation
- Controlled correlation (clumping)
- Threshold
- Curvilinear
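Direction and strength can also be summarized with a number. The lecture does not name a specific coefficient, so the sketch below assumes Pearson's r, one common choice for linear relationships. The function name and all data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient: from -1 (perfect negative) to +1 (perfect positive)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sum((x - mean_x) ** 2 for x in xs)
    dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / sqrt(dev_x * dev_y)


# Hypothetical data sets illustrating several scattergram patterns.
x = [1, 2, 3, 4, 5, 6]
perfect_positive = [2 * v + 1 for v in x]     # points on a rising line
perfect_negative = [10 - v for v in x]        # points on a falling line
weak = [3, 1, 4, 2, 5, 3]                     # loosely rising cloud
curvilinear = [(v - 3.5) ** 2 for v in x]     # U-shaped pattern

r_pos = pearson_r(x, perfect_positive)
r_neg = pearson_r(x, perfect_negative)
r_weak = pearson_r(x, weak)
r_curve = pearson_r(x, curvilinear)

for name, r in [("perfect positive", r_pos), ("perfect negative", r_neg),
                ("weak", r_weak), ("curvilinear", r_curve)]:
    print(f"{name}: r = {r:+.3f}")
```

The curvilinear case gives an r of about zero even though the two variables are clearly related, which is one reason to inspect the scattergram before relying on a single coefficient.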