Advanced Higher Geography
Descriptive Statistics
Descriptive statistics include:
Types of data
Measures of central tendency
Measures of dispersion
Types of data (1)
Nominal data: data that has names. eg:
rock types (sedimentary, igneous or
metamorphic).
Ordinal data: data that can be placed in
ascending or descending order. eg:
settlement type (city, town, village &
hamlet).
Types of data (2)
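Interval data: numerical data with equal intervals
but no true zero. eg: temperature in °C.
Very uncommon so don’t worry about it.
Ratio data: numerical data with a true zero.
Most numerical data is ratio data.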
Central Tendency
When you calculate the central
tendency of a data set you calculate its
average.
The measurements used for calculating
central tendency include the mean, the
mode and the median.
The Mean
The mean is one of the most commonly
used statistics in geography.
It is found by totalling the values for all
observations (∑x) and dividing by the
total number of observations (n).
The formula for finding the mean is:
Mean = ∑x / n
The Median
The median is the middle value when all
of the data is placed in ascending /
descending order.
Where there are two middle values we
take the average of these.
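The two cases described above (one middle value, or two middle values that are averaged) can be sketched in Python. This is only an illustration: the function name `median` and the sample numbers are made up for the example and are not taken from the course notes.

```python
def median(values):
    """Return the middle value of the data once it is placed in order."""
    ordered = sorted(values)          # ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of observations: there is a single middle value.
        return ordered[middle]
    # Even number of observations: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical data, chosen only to show both cases.
print(median([7, 3, 9]))      # one middle value
print(median([7, 3, 9, 5]))   # two middle values (5 and 7)
```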
The Mode
The mode is the number that occurs the
most often.
Sometimes there are two (or more)
modes. Where there are two modes the
data is said to be bi-modal.
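A short Python sketch of the same idea, assuming we want every value that shares the highest count, so that bi-modal data returns two modes. The function name `modes` and the sample lists are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs the most often, in ascending order."""
    counts = Counter(values)            # value -> number of occurrences
    highest = max(counts.values())      # the largest count found
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical data: one mode, then two modes (bi-modal).
print(modes([2, 3, 3, 5, 7]))
print(modes([2, 3, 3, 5, 5, 7]))
```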
5 mins
Find the mean, median and mode of the
following data.
The weekly pocket money for 9 first year pupils was
found to be:
3 – 12 – 4 – 6 – 1 – 4 – 2 – 5 – 8
Mean = 5
Median = 4
Mode = 4
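One way to check these answers is with Python's built-in `statistics` module. This is a sketch rather than part of the exercise; the variable names are made up, and `multimode` needs Python 3.8 or later.

```python
import statistics

# Weekly pocket money for the 9 first year pupils.
pocket_money = [3, 12, 4, 6, 1, 4, 2, 5, 8]

mean_value = statistics.mean(pocket_money)         # total of 45 divided by 9 observations
median_value = statistics.median(pocket_money)     # 5th value of the ordered data
mode_values = statistics.multimode(pocket_money)   # list of every mode

print(mean_value, median_value, mode_values)
```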
Groups of data
Sometimes the data we collect are in
group form.
Slope Angle (°)   Midpoint (x)   Frequency (f)   Midpoint x frequency (fx)
0-4               2              6               12
5-9               7              12              84
10-14             12             7               84
15-19             17             5               85
20-24             22             0               0
Total                            n = 30          ∑(fx) = 265
Finding the mean is slightly more difficult. We
use the midpoint of the group and multiply this
by the frequency.
The mean is: ∑(fx) / n = 265 / 30 = 8.8 (to 1 decimal place),
which falls in the 5-9 group.
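The grouped-mean calculation can be reproduced in a few lines of Python. This is a minimal sketch using the slope angle groups from the table; the variable names are assumptions made for the example.

```python
# (lower bound, upper bound, frequency) for each slope angle group.
groups = [(0, 4, 6), (5, 9, 12), (10, 14, 7), (15, 19, 5), (20, 24, 0)]

n = 0        # total number of observations
sum_fx = 0   # running total of midpoint x frequency

for lower, upper, frequency in groups:
    midpoint = (lower + upper) / 2   # e.g. (0 + 4) / 2 = 2
    n += frequency
    sum_fx += midpoint * frequency

grouped_mean = sum_fx / n
print(n, sum_fx, round(grouped_mean, 1))
```

Because midpoints stand in for the real readings, the result is an estimate of the mean rather than an exact value.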
We cannot find the mode for grouped data but
we can find the modal group.
The modal group is the group with the highest
frequency (ie: the 5-9 group, with a frequency of 12).
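As a quick sketch, the modal group can be picked out in Python by looking for the largest frequency. The dictionary below is just the slope angle table rewritten; the names are made up for the example.

```python
# Frequency of each slope angle group, copied from the table.
frequencies = {"0-4": 6, "5-9": 12, "10-14": 7, "15-19": 5, "20-24": 0}

# The modal group is the key with the largest frequency.
# Note: if two groups tied, max() would return only the first of them.
modal_group = max(frequencies, key=frequencies.get)

print(modal_group, frequencies[modal_group])
```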
Your turn
Read pages 25 – 29 of ‘Geographical
Measurements and Techniques:
Statistical Awareness’, LT Scotland,
June 2000.
Answer questions 1 & 2 from Task 4 in
this book.
The Interquartile Range
The interquartile range consists of the
middle 50% of the values in a
distribution; 25% each side of the
median (middle value). This calculation is
useful because it shows how closely the
values are grouped around the median.
The benefits
It is easy to calculate
It is unaffected by extreme values
It is a useful way of comparing sets of
similar data.
Interquartile range
We know that the median divides the data into two
halves. We also know that for a set of n ordered
numbers the median is the (n + 1) ÷ 2 th value.
Similarly, the lower quartile divides the bottom half
of the data into two halves, and the upper quartile
also divides the upper half of the data into two
halves.
Lower quartile is the (n + 1) ÷ 4 th value
Upper quartile is the 3 (n + 1) ÷ 4 th value
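The position formulas can be tried out on the pocket money data from earlier. The sketch below makes one assumption that these notes do not state: when a position falls between two values (for example the 2.5th value), it takes a proportional point between the two neighbouring values. The helper name `value_at` is hypothetical, and the sketch assumes at least 3 observations.

```python
def value_at(ordered, position):
    """Return the value at a 1-based position in ordered data.

    Assumption: a fractional position is resolved by linear interpolation
    between the two neighbouring values.
    """
    whole = int(position)            # whole part of the position
    fraction = position - whole      # fractional part of the position
    if fraction == 0:
        return ordered[whole - 1]    # the position lands exactly on a value
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)


data = sorted([3, 12, 4, 6, 1, 4, 2, 5, 8])   # 1, 2, 3, 4, 4, 5, 6, 8, 12
n = len(data)

lower_quartile = value_at(data, (n + 1) / 4)       # 2.5th value
median = value_at(data, (n + 1) / 2)               # 5th value
upper_quartile = value_at(data, 3 * (n + 1) / 4)   # 7.5th value
interquartile_range = upper_quartile - lower_quartile

print(lower_quartile, median, upper_quartile, interquartile_range)
```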
Box and whisker diagrams
A box and whisker plot is used to
display information about the range, the
median and the quartiles. It is usually
drawn alongside a number line, as shown:
[Figure: box and whisker plot drawn alongside a number line]
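The numbers needed to draw a box and whisker plot can be gathered with Python's `statistics` module. This is a sketch under two assumptions: the whiskers are taken to reach the smallest and largest values (the range), and the quartiles use the "exclusive" method, which places them at the (n + 1) ÷ 4 positions. The pocket money data from earlier is reused.

```python
import statistics

data = [3, 12, 4, 6, 1, 4, 2, 5, 8]

# method="exclusive" places the quartiles at the (n + 1) / 4 style positions.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

summary = {
    "minimum": min(data),      # end of the lower whisker (assumed)
    "lower quartile": q1,      # left edge of the box
    "median": q2,              # line inside the box
    "upper quartile": q3,      # right edge of the box
    "maximum": max(data),      # end of the upper whisker (assumed)
}

for name, value in summary.items():
    print(name, value)
```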
Drawbacks of the interquartile range
It can be a laborious process to calculate the location
of the quartiles, especially when there is a large
number of values within the set.
It does not give any indication of how the entire data
set is distributed, just the limits of the middle 50%
of the data.
Not all values are considered and hence a false
impression may be given of the data set being
analysed.
Standard Deviation
You could have 2 sets of data that
produce the same mean, but the data
may have a very different range of
values within them.
Standard Deviation is a tool that
produces a figure indicating the extent
to which the data is clustered around
the mean.
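The point about two data sets sharing a mean can be shown with a short Python sketch. These notes do not give a formula for standard deviation, so the version below is an assumption: the population form, which divides the sum of squared differences from the mean by n. Both data sets are invented for the example.

```python
import math


def standard_deviation(values):
    """Population standard deviation (assumed form; divides by n)."""
    n = len(values)
    mean = sum(values) / n
    squared_differences = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_differences / n)


# Hypothetical data sets with the same mean (5) but different spreads.
clustered = [4, 5, 5, 5, 6]
spread_out = [1, 3, 5, 7, 9]

print(sum(clustered) / len(clustered), round(standard_deviation(clustered), 2))
print(sum(spread_out) / len(spread_out), round(standard_deviation(spread_out), 2))
```

The second set gives the larger figure because its values lie further from the mean.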
The Normal distribution curve
The normal curve assumes that the data in your sample
follow a symmetrical, bell-shaped distribution
around the mean.
The standard deviation gives important information as
it indicates the shape of the normal curve.
If the SD is large then it suggests a wide spread of
data around the mean and a flatter, wider normal
distribution curve.
If the SD is small, it suggests a steep and narrow
normal distribution curve and a narrow spread around
the mean.
A smaller SD suggests a more reliable
mean. There are likely to be few extreme
values.
It is also useful for comparing two sets
of data that may have similar means but
quite different ranges of data within
each set.