GEOGRAPHICAL STATISTICS
GE 2110
ZAKARIA A. KHAMIS
Descriptive statistics
• Statistics are interesting... "only when they are set in wider context that they begin to come to life"
• Five rules for using statistics, by Danny Dorling:
1. Often there is little point in using statistics
2. If you do use statistics, make sure they can be understood
3. Do not overuse statistics in your work
4. If you find a complex statistic useful, then explain it clearly
5. Recognize and harness the power of statistics
Zakaria Khamis
5/24/2017
Measures of Central Tendency
• In most cases, it is helpful to describe data by a single number that is most representative of the entire collection of data
• Such single numbers tend to lie in the middle of the data distribution and are called measures of central tendency (MCT)
• They act as the fulcrum (center of gravity) at which the data balance
Means
• There are many types of mean. The most commonly used is the arithmetic mean; others include the geometric and harmonic means
Arithmetic Mean
• It is simply the average of the observations (data)
• The arithmetic mean is in most cases referred to simply as the mean and is denoted by $\bar{x}$
• The mean, or average, of n numbers is the sum of the numbers divided by n
• Mathematically,
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
• where $x_i$ denotes the value of observation i, and n denotes the number of observations
• The mean is influenced by extreme measurements
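The definition of the arithmetic mean, and its sensitivity to extreme values, can be sketched in a few lines of Python. The data below are hypothetical and are not taken from the slides; the function name `arithmetic_mean` is likewise only illustrative.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


# Hypothetical observations (an assumption, not data from the slides).
typical = [10, 12, 11, 13, 14]
with_outlier = typical + [100]  # the same data plus one extreme value

mean_typical = arithmetic_mean(typical)       # 60 / 5
mean_outlier = arithmetic_mean(with_outlier)  # 160 / 6

print("mean without the extreme value:", mean_typical)
print("mean with the extreme value:   ", mean_outlier)
```

A single extreme observation pulls the mean above every one of the original five values, which is the sense in which the mean is influenced by extreme measurements.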
Geometric Mean
• The geometric mean applies only to positive numbers
• It is often used for a set of numbers whose values are meant to be multiplied together or are exponential in nature, such as data on the growth of the human population or the interest rates of a financial investment
• The geometric mean of n numbers is the nth root of the product of the numbers
• Mathematically,
$$GM = \sqrt[n]{\prod_{i=1}^{n} x_i}$$
• where $x_i$ denotes the value of observation i, and n denotes the number of observations
• This is rarely used in statistical analysis
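As a minimal sketch of the geometric mean formula, the Python below takes the nth root of the product of the observations. The growth factors are hypothetical, chosen only to illustrate the "multiplied together" case; `math.prod` requires Python 3.8 or later.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs positive numbers")
    return math.prod(values) ** (1 / len(values))


# Hypothetical yearly growth factors (10%, 20% and 5% growth).
growth_factors = [1.10, 1.20, 1.05]
gm = geometric_mean(growth_factors)

# Growing by the geometric mean every year gives the same total growth
# as applying the three individual factors one after another.
print("average growth factor per year:", gm)
print("total growth, both ways:", gm ** 3, math.prod(growth_factors))
```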
Harmonic Mean
• This is most commonly used when an average rate is of interest, e.g. the average speed of a car or the average rate of population increase
• The harmonic mean of n numbers is given by
$$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
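The average-speed case can be illustrated with a short Python sketch. The trip below is hypothetical: a car covers the same distance twice, once at 60 km/h and once at 40 km/h.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals of n positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the harmonic mean needs positive numbers")
    return len(values) / sum(1 / v for v in values)


# Hypothetical speeds (km/h) over two legs of equal distance.
speeds = [60, 40]
hm = harmonic_mean(speeds)
am = sum(speeds) / len(speeds)

print("harmonic mean speed:  ", hm)
print("arithmetic mean speed:", am)
```

The harmonic mean (about 48 km/h) is the true average speed for the whole trip, because more time is spent on the slower leg; the arithmetic mean (50 km/h) overstates it.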
Mode and Median
• The median is the observation that splits the ranked list of observations (arranged in ascending or descending order) in half
• When the number of observations is odd, the median is simply the middle value of the ranked list
• When the number of observations is even, the median is the average of the two middle values of the ranked list
• The mode is the most frequently occurring value
• If two values tie for most frequent occurrence, the collection has two modes and is called bimodal
• Which of the three measures of central tendency is the most representative?
• The answer is that it depends on the distribution of the data and the way in which you plan to use the data
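The odd and even cases of the median, and the possibility of more than one mode, can be sketched as follows. The function names and the small data sets are illustrative assumptions, not part of the slides.

```python
from collections import Counter


def median(values):
    """Middle value of the ranked list (average of the two middle values if n is even)."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


def modes(values):
    """All values that share the highest frequency, in ascending order."""
    counts = Counter(values)
    if not counts:
        raise ValueError("need at least one observation")
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical data.
print(median([7, 1, 5]))           # odd n: the middle of 1, 5, 7
print(median([7, 1, 5, 3]))        # even n: the average of 3 and 5
print(modes([2, 3, 3, 5, 5, 8]))   # two values tie, so the data are bimodal
```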
Measures of Central Tendency
Class example:
12, 33, 11, 45, 45, 34, 20, 67, 87, 19, 12, 12
• Mean =
• Mode =
• Median =
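One way to check a hand calculation for the class example is with Python's standard `statistics` module. This is only a checking aid; the slides themselves leave the answers to be worked out in class.

```python
import statistics

# The twelve observations from the class example.
data = [12, 33, 11, 45, 45, 34, 20, 67, 87, 19, 12, 12]

mean_value = statistics.mean(data)      # sum of the values divided by 12
median_value = statistics.median(data)  # average of the 6th and 7th ranked values
mode_value = statistics.mode(data)      # most frequently occurring value

print("ranked data:", sorted(data))
print("mean   =", round(mean_value, 2))
print("median =", median_value)
print("mode   =", mode_value)
```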
Measures of Dispersion/Variability
• The phenomena and aspects of the world we live in change spatially (from location to location) and temporally (from time to time)
• Examples include changes in human population, in the standard of living, in the literacy rate, and in prices
• This variability attracts experts to study these changes in detail and to relate them to human life
• In statistics, the MCT measure the center of the data, while measures of dispersion describe how the observations spread away from the center
• If the observations are close to the center (arithmetic mean or median), the dispersion, scatter, or variation is small
• If the observations are spread far from the center, the dispersion is large
• Suppose we have three groups of students who obtained the following marks in a test:
Group A: 46, 48, 50, 52, 54 (mean = 50)
Group B: 30, 40, 50, 60, 70 (mean = 50)
Group C: 40, 50, 60, 70, 80 (mean = 60)
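The three groups can be compared with a short Python sketch. It reports the mean together with two measures of spread, the range and the standard deviation; dividing by n in the standard deviation (`statistics.pstdev`) is a choice made for this illustration.

```python
import statistics

# Marks for the three groups given above.
groups = {
    "A": [46, 48, 50, 52, 54],
    "B": [30, 40, 50, 60, 70],
    "C": [40, 50, 60, 70, 80],
}

summary = {}
for name, marks in groups.items():
    summary[name] = {
        "mean": statistics.mean(marks),
        "range": max(marks) - min(marks),
        "sd": statistics.pstdev(marks),  # standard deviation, dividing by n
    }
    print(name, summary[name])
```

Groups A and B have the same mean but very different spread, while groups B and C have the same spread but different means, so neither kind of measure is sufficient on its own.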
• The idea of dispersion is important in the study of workers' wages, commodity prices, the standard of living of different people, the distribution of wealth, the distribution of land among farmers, and various other fields of life
• It helps to identify such variation and to address any problems that may arise from it
Dispersion Range
• The range is the difference between the highest and the lowest value in a series of data
$$\text{Range} = x_{\max} - x_{\min}$$
Variance and Standard Deviation
• The variance represents the average squared deviation of the observations from the mean
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$
• The standard deviation is the square root of the variance
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$
• The standard deviation of a set is a measure of how much a typical number in the set differs from the mean. The greater the standard deviation, the more the numbers in the set vary from the mean
• Imagine a researcher examining the monthly salaries of Zanzibar secondary school teachers. He took a sample of 10 secondary school teachers:
• 44, 50, 38, 96, 42, 47, 40, 39, 46, 50 (in units of 10,000)
• He calculated the mean = 49.2
• This tells us that, on average, a secondary school teacher receives 49.2 per month
• However, there may be variation, because there are different categories of teacher in Zanzibar: diploma, bachelor's degree, and master's degree holders, in privately and publicly owned schools
• Standard deviation = 17
Mean +/- standard deviation
49.2 - 17 = 32.2
49.2 + 17 = 66.2
• This means that most of the secondary school teachers receive between 32.2 and 66.2 per month (TSh, in the same units as the sample)
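The salary example can be reproduced with the Python sketch below. It computes the standard deviation in two ways, because the result depends on the divisor: dividing the sum of squared deviations by n gives about 16.1, while dividing by n - 1 (the usual sample formula) gives about 17.0, which is the figure quoted in the example. Which divisor the author intended is an assumption here.

```python
import math

# The ten sampled salaries from the example.
salaries = [44, 50, 38, 96, 42, 47, 40, 39, 46, 50]

n = len(salaries)
mean = sum(salaries) / n
squared_deviations = sum((x - mean) ** 2 for x in salaries)

sd_n = math.sqrt(squared_deviations / n)                # divide by n
sd_n_minus_1 = math.sqrt(squared_deviations / (n - 1))  # divide by n - 1

# Count the observations that fall within mean +/- 17.
low, high = mean - 17, mean + 17
within_one_sd = [x for x in salaries if low <= x <= high]

print("mean =", round(mean, 1))
print("SD dividing by n     =", round(sd_n, 1))
print("SD dividing by n - 1 =", round(sd_n_minus_1, 1))
print("observations within one SD of the mean:", len(within_one_sd), "of", n)
```

Nine of the ten salaries lie inside the interval; only the value 96 falls outside it.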
Quartiles
• While the standard deviation (SD) is the measure of dispersion associated with the mean, quartiles measure dispersion associated with the median
• Consider an ordered set of numbers whose median is m. The lower quartile is the median of the numbers that occur before m. The upper quartile is the median of the numbers that occur after m
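The "median of each half" definition of the quartiles can be sketched in Python as follows. The helper names are illustrative, and the data are the twelve observations from the earlier class example. When n is odd, the middle observation is left out of both halves, which is one reading of "before m" and "after m".

```python
def median(ranked):
    """Median of a list that is already sorted."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


def quartiles(values):
    """Lower quartile, median and upper quartile using the median-of-halves rule."""
    ranked = sorted(values)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two observations")
    lower_half = ranked[: n // 2]        # values before the median
    upper_half = ranked[(n + 1) // 2:]   # values after the median
    return median(lower_half), median(ranked), median(upper_half)


data = [12, 33, 11, 45, 45, 34, 20, 67, 87, 19, 12, 12]
q1, m, q3 = quartiles(data)
print("lower quartile =", q1, " median =", m, " upper quartile =", q3)
```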
Inter-Quartile Range
• In some statistical analyses we need the difference between the quartiles; for this, the inter-quartile range is calculated
• The inter-quartile range is the difference between the 75th and 25th percentiles
• When the data have been ranked from lowest to highest, with n observations, the 25th percentile is represented by observation
$$\frac{n + 1}{4}$$
• The 75th percentile is represented by observation
$$\frac{3(n + 1)}{4}$$
• This gives a more detailed picture of the variability within the data, because it leaves out the outlying values
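The position rule for the 25th and 75th percentiles can be sketched in Python. The slides do not say what to do when (n + 1)/4 is not a whole number; the linear interpolation between neighbouring observations used below is an assumption. The data are the ten teacher salaries from the standard deviation example.

```python
def value_at_position(ranked, position):
    """Value at a 1-based, possibly fractional, position in a sorted list.

    Fractional positions are resolved by linear interpolation (an assumption).
    """
    lower_index = int(position)          # whole part of the position
    fraction = position - lower_index
    lower = ranked[lower_index - 1]
    if fraction == 0:
        return lower
    upper = ranked[lower_index]
    return lower + fraction * (upper - lower)


def interquartile_range(values):
    ranked = sorted(values)
    n = len(ranked)
    if n < 3:
        raise ValueError("this sketch needs at least three observations")
    q1 = value_at_position(ranked, (n + 1) / 4)      # 25th percentile
    q3 = value_at_position(ranked, 3 * (n + 1) / 4)  # 75th percentile
    return q1, q3, q3 - q1


salaries = [44, 50, 38, 96, 42, 47, 40, 39, 46, 50]
q1, q3, iqr = interquartile_range(salaries)
print("25th percentile =", q1)
print("75th percentile =", q3)
print("inter-quartile range =", iqr)
```

The extreme salary of 96 has no effect on the result, whereas it dominates the range (96 - 38 = 58).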
Skewness and Kurtosis
• Skewness measures the degree of asymmetry exhibited by the data
• The data can exhibit positive or negative skewness
• If the mean of the data is greater than its median, the data are positively skewed; if the mean is less than the median, the data are negatively skewed
• Mathematically,
$$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$$
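A minimal Python sketch of the skewness formula follows. It assumes that s is the standard deviation computed with n in the denominator, and it reuses the ten teacher salaries from the standard deviation example, where one large value stretches the right-hand tail.

```python
import math
import statistics


def skewness(values):
    """Sum of cubed deviations from the mean, divided by n * s**3."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / n)  # assumption: divide by n
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return sum((x - mean) ** 3 for x in values) / (n * s ** 3)


salaries = [44, 50, 38, 96, 42, 47, 40, 39, 46, 50]
skew = skewness(salaries)

print("mean =", statistics.mean(salaries), " median =", statistics.median(salaries))
print("skewness =", round(skew, 2))
```

The mean exceeds the median and the skewness is positive, so both indicators agree that these data are positively skewed.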
• Kurtosis measures how peaked the data are relative to the normal distribution
• Data with a high degree of peakedness are said to be leptokurtic and have a kurtosis value greater than 3
• Flat data have a kurtosis value of less than 3 and are called platykurtic
• Mathematically,
$$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4}$$
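The kurtosis formula can be sketched in Python in the same way. As an assumption, s is computed with n in the denominator. The label "mesokurtic" for a value of exactly 3 is the standard term but does not appear in the slides, and the helper name `describe_peak` is illustrative.

```python
import math


def kurtosis(values):
    """Sum of fourth-power deviations from the mean, divided by n * s**4."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / n)  # assumption: divide by n
    if s == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return sum((x - mean) ** 4 for x in values) / (n * s ** 4)


def describe_peak(k):
    """Classify a kurtosis value relative to the normal distribution's value of 3."""
    if k > 3:
        return "leptokurtic"
    if k < 3:
        return "platykurtic"
    return "mesokurtic"


salaries = [44, 50, 38, 96, 42, 47, 40, 39, 46, 50]  # teacher salary sample
flat = [-1, 1]                                       # hypothetical two-point data

for name, data in [("salaries", salaries), ("flat", flat)]:
    k = kurtosis(data)
    print(name, round(k, 2), describe_peak(k))
```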