GCSE Geographical skills
3.4.1 Cartographic skills
Cartographic skills relating to a variety of maps at different scales.
Atlas maps:
• use and understand coordinates – latitude and longitude
• recognise and describe distributions and patterns of both human and physical features
• maps based on global and other scales may be used and students may be asked to identify and
describe significant features of the physical and human landscape on them, eg population
distribution, population movements, transport networks, settlement layout, relief and drainage
• analyse the inter-relationship between physical and human factors on maps and establish
associations between observed patterns on thematic maps.
Ordnance Survey maps:
• use and interpret OS maps at a range of scales, including 1:50 000 and 1:25 000 and other maps
appropriate to the topic
• use and understand coordinates – four and six-figure grid references
• use and understand scale, distance and direction – measure straight and curved line distances using a
variety of scales
• use and understand gradient, contour and spot height
• numerical and statistical information
• identify basic landscape features and describe their characteristics from map evidence
• identify major relief features on maps and relate cross-sectional drawings to relief features
• draw inferences about the physical and human landscape by interpretation of map evidence, including
patterns of relief, drainage, settlement, communication and land-use
• interpret cross sections and transects of physical and human landscapes
• describe the physical features as they are shown on large scale maps of two of the following
landscapes – coastlines, fluvial and glacial landscapes
• infer human activity from map evidence, including tourism.
Maps in association with photographs:
• be able to compare maps
• sketch maps: draw, label, understand and interpret
• photographs: use and interpret ground, aerial and satellite photographs
• describe human and physical landscapes (landforms, natural vegetation, land-use and settlement) and
geographical phenomena from photographs
• draw sketches from photographs
• label and annotate diagrams, maps, graphs, sketches and photographs.
3.4.2 Graphical skills
Graphical skills to:
• select and construct appropriate graphs and charts to present data, using appropriate scales – line
charts, bar charts, pie charts, pictograms, histograms with equal class intervals, divided bar,
scattergraphs, and population pyramids
• suggest an appropriate form of graphical representation for the data provided
• complete a variety of graphs and maps – choropleth, isoline, dot maps, desire lines, proportional
symbols and flow lines
• use and understand gradient, contour and value on isoline maps
• plot information on graphs when axes and scales are provided
• interpret and extract information from different types of maps, graphs and charts, including
population pyramids, choropleth maps, flow-line maps, dispersion graphs.
3.4.3 Numerical skills
Numerical skills to:
• demonstrate an understanding of number, area and scales, and the quantitative relationships
between units
• design fieldwork data collection sheets and collect data with an understanding of accuracy, sample size
and procedures, control groups and reliability
• understand and correctly use proportion and ratio, magnitude and frequency
• draw informed conclusions from numerical data.
3.4.4 Statistical skills
Statistical skills to:
• use appropriate measures of central tendency, spread and cumulative frequency (median, mean,
range, quartiles and inter-quartile range, mode and modal class)
• calculate percentage increase or decrease and understand the use of percentiles
• describe relationships in bivariate data: sketch trend lines through scatter plots, draw estimated lines of
best fit, make predictions, interpolate and extrapolate trends
• be able to identify weaknesses in selective statistical presentation of data.
3.4.5 Use of qualitative and quantitative data
Use of qualitative and quantitative data from both primary and secondary sources to obtain, illustrate,
communicate, interpret, analyse and evaluate geographical information.
Examples of types of data:
• maps
• fieldwork data
• geo-spatial data presented in a geographical information system (GIS) framework
• satellite imagery
• written and digital sources
• visual and graphical sources
• numerical and statistical information.
3.4.6 Formulate enquiry and argument
Students should demonstrate the ability to:
• identify questions and sequences of enquiry
• write descriptively, analytically and critically
• communicate their ideas effectively
• develop an extended written argument
• draw well-evidenced and informed conclusions about geographical questions and issues.
Comparing development using single and composite measures
• Know the different ways of ranking the level of development of countries
• Understand how the Human Development Index is calculated and correlates with other measures of
development
• Be able to produce and interpret a scatter graph showing the correlation between different development
measures
1. The table below shows some of the ways of measuring the level of development for four countries. After
familiarising yourself with these development measures, your first task is to rank them (the first two HDI
rankings have already been completed for you).
2. Briefly describe how the rankings vary for the countries shown.
Measure                          UK              China           Australia       Saudi Arabia
                                 Data    Rank    Data    Rank    Data    Rank    Data    Rank
Human Development Index          0.892   2       0.719           0.933   1       0.836
Gross National Product           40,000          13,000          46,000          52,000
per capita (US$)
Global Corruption Index          78              36              80              49

Human Development Index: A composite measure that reflects a country’s economic and social development.
Gross National Product per capita: A country’s earnings per person in US dollars. Data are adjusted to
reflect living costs.
Global Corruption Index: A widely used measure that gives high scores to countries believed to be free of
political corruption.
3. Study the diagram below which shows how the Human Development Index is calculated. Based on this information
and your own knowledge, can you suggest reasons for the HDI scores shown in the table above?
[Diagram: three ‘ingredients’ feed into the HDI]
Life expectancy: The number of years that a person can be expected to live from birth. Women generally
live longest in any society.
Income: The HDI uses a measure of wealth derived from the Gross National Income of a country (an
alternative measure to GDP).
Education: The HDI uses an education index based on the average number of years of schooling for people
in a country.
HDI: The Human Development Index (HDI) is a composite measure of development. Three ‘ingredients’ are
processed to produce a number between 0 and 1. In 2014, Norway was ranked in first place (0.944).
4. Your next task is to complete the scatter graph below using the data provided in the table. Label each country
(Indonesia has already been plotted and labelled for you). Do not draw a line.
[Scatter graph: How life expectancy varies in relation to GDP per capita (2014). Vertical axis: Life
expectancy, 0 to 90. Horizontal axis: GDP per capita ($US), 0 to 90 000. Singapore and Sierra Leone are
labelled on the graph.]
Country         GDP per capita (US dollars)   Life expectancy (years)
Singapore       83,000                        83
Kuwait          71,000                        78
Norway          67,000                        82
USA             55,000                        79
Ireland         49,000                        81
Canada          44,000                        82
France          40,000                        82
Spain           34,000                        82
Mexico          18,000                        76
Brazil          16,000                        74
China           13,000                        75
Indonesia       10,500                        69
Nigeria         6,000                         54
Pakistan        5,000                         65
Kenya           3,000                         61
Sierra Leone    1,500                         46
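One way to prepare for the discussion in questions 5 and 6 is to compare how quickly life expectancy rises at the low-income end of the table with how quickly it rises at the high-income end. The sketch below is only an illustration: the helper name `years_gained_per_1000` and the choice of Sierra Leone, Mexico and Singapore as comparison points are assumptions made for this example and are not part of the worksheet.

```python
# (GDP per capita in US dollars, life expectancy in years), copied from the table
data = {
    "Singapore": (83000, 83), "Kuwait": (71000, 78), "Norway": (67000, 82),
    "USA": (55000, 79), "Ireland": (49000, 81), "Canada": (44000, 82),
    "France": (40000, 82), "Spain": (34000, 82), "Mexico": (18000, 76),
    "Brazil": (16000, 74), "China": (13000, 75), "Indonesia": (10500, 69),
    "Nigeria": (6000, 54), "Pakistan": (5000, 65), "Kenya": (3000, 61),
    "Sierra Leone": (1500, 46),
}

def years_gained_per_1000(poorer, richer):
    """Extra years of life expectancy per extra $1,000 of GDP per capita
    when moving from the poorer country to the richer one."""
    gdp_a, life_a = data[poorer]
    gdp_b, life_b = data[richer]
    return (life_b - life_a) / ((gdp_b - gdp_a) / 1000)

# Hypothetical comparison points chosen for illustration
low_income_gain = years_gained_per_1000("Sierra Leone", "Mexico")
high_income_gain = years_gained_per_1000("Mexico", "Singapore")

print(round(low_income_gain, 2), "years per $1,000 at the low-income end")
print(round(high_income_gain, 2), "years per $1,000 at the high-income end")
```

If the two figures are very different, the points on the graph cannot all lie close to one straight line.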
5. In pairs, discuss what the completed scatter graph shows you about the nature of the correlation between GDP per
capita and life expectancy.
6. As part of the discussion, try to answer the following question: why might it be inappropriate to draw a straight
best-fit line on your completed scatter graph?
7. Working on your own, now write a brief paragraph describing the relationship shown by the scatter graph.
8. Finally, discuss the following questions in small groups.
i.
How sure can we be that the data we have used are reliable?
ii.
Why might it be difficult or impossible to collect development data for some countries or regions?
Using proportional flow-line maps to visualise trade patterns and flows
• Describe the main characteristics of the pattern of international trade for a country or region
• Understand proportional flow-lines and be able to draw them
• Be able to calculate a country’s net balance of trade using flow-line maps
The table below shows some of the world’s highest value trade flows in natural resources.
1. Your first task is to describe the characteristics of these flows. Remember that this involves more than just
producing a list of numbers. You could calculate the range of values as part of your answer, for instance.
2. In addition to trade in natural resources, what other types of trade are there?
The four largest bilateral trade flows for agricultural produce (2012)
Type of trade    Exporting country or region   Importing country or region   Value (US$ billion)
Soybeans         United States                 China                         11.6
Soybeans         Brazil                        China                         7.9
Soybeans         Brazil                        European Union                5.9
Wine and beer    European Union                United States                 5.0
The four largest bilateral trade flows for metals and ores (2012)
Type of trade    Exporting country or region   Importing country or region   Value (US$ billion)
Iron ore         Australia                     China                         30.5
Copper           Chile                         China                         14.9
Gold             Switzerland                   India                         14.6
Iron ore         Brazil                        China                         13.6
3. Next, draw proportional flow-lines on the world map (below) to show the trade pattern for China. An example has
already been drawn for you showing trade in copper between Chile and China. The width of the arrow is directly
proportional to the value of the flow (measured in US$). Use an atlas to help you locate the countries shown in the
table and label them.
4. Calculate the total value of Chinese imports shown on your completed flow-line map.
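[World map for drawing trade flow-lines: China and Chile are labelled and the copper flow between them is
already drawn. Key: trade flow-lines (width shows value).]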
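Because the width of each arrow is directly proportional to the value of the flow, every width can be worked out from one chosen scale. The sketch below assumes a scale of 1 mm of arrow width for every US$2 billion; this scale and the name `MM_PER_BILLION` are assumptions made for illustration, so use whatever scale suits your own map.

```python
# Flows into China from the two tables above: (commodity, exporter) -> US$ billion
flows_to_china = {
    ("Soybeans", "United States"): 11.6,
    ("Soybeans", "Brazil"): 7.9,
    ("Iron ore", "Australia"): 30.5,
    ("Copper", "Chile"): 14.9,
    ("Iron ore", "Brazil"): 13.6,
}

MM_PER_BILLION = 0.5  # assumed scale: 1 mm of width per US$2 billion

# Width is the flow value multiplied by the same scale factor every time,
# which is what "directly proportional" means.
widths_mm = {flow: value * MM_PER_BILLION for flow, value in flows_to_china.items()}

for (commodity, exporter), width in widths_mm.items():
    print(f"{commodity} from {exporter}: {width:.2f} mm wide")
```

Whichever scale you choose, the iron ore arrow from Australia should be roughly twice as wide as the copper arrow from Chile, because its value is roughly twice as large.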
5. The graph below shows the pattern of trade imports and exports in manufactured goods for China in 2013. A
different map projection has been used. Can you identify the location of the world’s major continents? What are the
advantages of using this map projection?
6. Using the data shown, calculate the net balance of trade (the difference between exports and imports) for:
(i) trade between China and the USA in 2013
(ii) trade between China and Japan in 2013.
7. What does this tell you about China’s involvement in world trade in 2013?
8. Try to suggest answers for the following questions.
(i) Why is a manufacturing nation like China also importing manufactured goods from other places?
(ii) What does China’s need to import some manufactured goods tell you about its level of development?
Using climate data to construct an argument
Suggest how the data shown in can be used to:
1. support the view that the UK’s climate is changing
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
2. reject the view that the UK’s climate is changing
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
Describing and analysing a complicated data set
Describe the variations shown in the table.
• For each column, say if it varies a lot or a little.
• Identify the maximum and minimum value in each case (you might even want to subtract the two to find
what the range of data is).
• If the maximum value is unusually high, point this out (see Sudan).
• Finally, look for patterns horizontally as well as vertically. Is there a country which is highest- or
lowest-scoring in most categories?
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………
Ordnance Survey map skills
Ordnance Survey maps:
• use and interpret OS maps at a range of scales, including 1:50 000 and 1:25 000 and other maps
appropriate to the topic
• use and understand coordinates – four and six-figure grid references
• use and understand scale, distance and direction – measure straight and curved line distances using a
variety of scales
• use and understand gradient, contour and spot height
• numerical and statistical information
• identify basic landscape features and describe their characteristics from map evidence
• identify major relief features on maps and relate cross-sectional drawings to relief features
• draw inferences about the physical and human landscape by interpretation of map evidence, including
patterns of relief, drainage, settlement, communication and land-use
• interpret cross sections and transects of physical and human landscapes
• describe the physical features as they are shown on large scale maps of two of the following
landscapes – coastlines, fluvial and glacial landscapes
• infer human activity from map evidence, including tourism.
Testing relationships between two sets of data
Spearman’s Rank Correlation
1. Complete the following table by working through steps A to E:
A  Rank median household income from ‘1’ (highest) to ‘20’ (lowest).
B  Rank Level 4 qualifications from ‘1’ (highest) to ‘20’ (lowest).
C  Subtract the first rank figure from the second to find the difference ‘d’.
D  Square the difference in the previous column (to get rid of +/-).
E  Add the d² values together and write the total in the TOTAL (or ∑) d² box.
Note – where two or more numbers tie, rank them equally (e.g. 2=) but remember to skip a number when
dealing with the next rank down.

Ward                         Median household   Rank order of      % Level 4          Rank order of     Difference   d²
                             income (£),        median household   qualifications     Level 4           in ranks
                             2012-13            income             or above (2011)    qualifications    (d)
                                                                                      or above
1. Beckton                   34 100                                27.9
2. Boleyn                    31 630                                28.8
3. Canning Town North        28 910                                23.9
4. Canning Town South        32 870                                33.3
5. Custom House              31 840                                24.2
6. East Ham Central          32 380                                32.6
7. East Ham North            32 340                                31.1
8. East Ham South            31 430                                24.1
9. Forest Gate North         34 270                                33.0
10. Forest Gate South        34 550                                32.7
11. Green Street East        31 570                                30.0
12. Green Street West        31 080                                29.8
13. Little Ilford            29 730                                24.6
14. Manor Park               31 580                                30.2
15. Plaistow North           30 740                                26.8
16. Plaistow South           31 750                                26.2
17. Royal Docks              38 580                                42.7
18. Stratford and New Town   35 840                                40.6
19. Wall End                 32 330                                29.6
20. West Ham                 32 310                                29.8
TOTAL (or ∑) d²
2. Use the following formula to work out the Spearman’s Rank Index (r), which will tell you what relationship
exists between median household income and Level 4 qualifications.
Formula for working out Spearman Rank:
r = 1 – (6 × ∑d²) / (n³ – n)
‘r’ is the value you are trying to find out
‘∑d²’ is the sum total of all the d² values from the table
‘n’ is the number of wards you are dealing with
3. What does your value tell you about the relationship between median household income and Level 4
qualifications? Is this a strong relationship?
4. In a sequence of full sentences, write up what you have found out about the relationship – or correlation –
between median household income and Level 4 qualifications as follows:
a) What were you trying to find out?
b) What data did you use?
c) What did you do to test the relationship between median household income and Level 4 qualifications?
d) How does Spearman’s Rank test a relationship between data differently from a scatter graph? Why?
e) What is the difference between the value that Spearman’s Rank gives you for
♦ a positive,
♦ a negative, and
♦ no (or random) relationships between data?
Draw sketches of scatter graphs to show what each of these looks like.
f) Describe what you had to do and the stages you had to go through to carry out the Spearman Rank correlation.
g) In conclusion, how strong – according to your Spearman’s Rank calculation ‘r’ – is the statistical
relationship between median household income and Level 4 qualifications? Suggest reasons for this.
Statistical Skills (AS and A2)
David Redfern, former AQA Chief Examiner – reproduced with kind permission
• measures of central tendency – mean, mode, median
• measures of dispersion – interquartile range and standard deviation
• Spearman’s rank correlation test
• application of significance level in inferential statistical results.
• Higher level tests – Gini coefficient, chi-squared, Mann Whitney U Test.
MEASURING CENTRAL TENDENCY – MEAN, MODE AND MEDIAN
A measure of central tendency identifies the ‘middle’ value of a data set. There are 3
ways of measuring central tendency:
• Mean
• Mode
• Median
These techniques are very useful to geographers, enabling us to summarise a data set by
giving the mid-value or most frequently occurring data. They can also be used as part of
more complex techniques such as inter-quartile range.
Mean
Mean formula:
x̄ = ∑x / n
The mean (sometimes called the average) is calculated by adding up all the values in a data
set and dividing the total sum by the number of values in the data set.
The mean is particularly useful if the data has a small range. However if the range is large
then the mean will be heavily influenced by the extreme values and could give a distorted
picture
Mode
This is the value that occurs most frequently in a set of data. You need to know all values
before calculating the mode. Mode is of no use if there are no repeating values. There may
be more than one mode – a data set with two modes is called ‘bi-modal’.
Mode is often useful when classifying data, eg Powers’ index of roundness of pebbles in a
river study. It is useful to see which classification occurs most frequently. This is called
the ‘modal class’.
Median
This is the middle value in a data set. The data needs to be rank-ordered before you can
calculate the median.
Median formula
If there is an odd number of values, use the following calculation to find the position of the
median value in the rank order:
(n + 1) / 2
(n = number of values in the data set)
Therefore if you have 23 values in the data set the median will be the 12th value in the rank
order.
If the number of values is even, the median is the mean of the middle two values. So if there
are 24 values add the values for the 12th and 13th positions and divide by 2.
The median value often needs to be supported by other techniques such as inter-quartile
range (IQR). However, unlike the mean it is not affected by extreme values.
Worked example:
The aim of the study is to study coastal processes. As part of this aim, the student has
collected data on pebble sizes from two sites along the beach. 15 pebbles were measured,
at each site, along the a-axis. The results are recorded in the table below in rank-order:
Table 1. Pebble Size (a-axis measured in mm)
Rank   North end of beach   South end of beach
1      58                   81
2      43                   76
3      38                   67
4      33                   67
5      32                   67
6      25                   66
7      24                   63
8      24                   60
9      23                   58
10     19                   47
11     19                   38
12     19                   33
13     14                   25
14     12                   8
15     11                   6
Central Tendency calculations:
            Mean   Mode   Median
North End   26.3   19     24
South End   50.8   67     60
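The figures in this table can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the function name `central_tendency` is an assumption made for this example.

```python
from statistics import mean, median, multimode

# Pebble a-axis lengths in mm, copied from Table 1
north = [58, 43, 38, 33, 32, 25, 24, 24, 23, 19, 19, 19, 14, 12, 11]
south = [81, 76, 67, 67, 67, 66, 63, 60, 58, 47, 38, 33, 25, 8, 6]

def central_tendency(values):
    """Return (mean to 1 decimal place, list of modes, median)."""
    # multimode returns every most-frequent value, so a bi-modal
    # data set would give a list containing two numbers.
    return round(mean(values), 1), multimode(values), median(values)

north_summary = central_tendency(north)
south_summary = central_tendency(south)

print("North End:", north_summary)
print("South End:", south_summary)
```

The mode is returned as a list so that a data set with more than one mode is reported in full.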
None of these measures gives an accurate picture of the distribution of data. On their own
they are of limited value. However, it can be seen from the above example that some
judgments can be made. In all 3 measures the central tendency for pebble size is larger at
the southern end of the beach. The average pebble size is much larger at the southern end.
The spread of data is quite large, particularly at the southern end and the extreme low
values of 8 and 6 are making the mean value lower than the other two values
To improve the usefulness of the above calculations, measures of the dispersion or
variability of the data should also be calculated
Measuring Dispersion – Range and inter-quartile range
These techniques are used to measure the spread of data. Range and inter-quartile range
allow you to analyse your data in more depth, looking at how spread the data is around the
mean or median
Range
This is simply the difference between the highest value and the lowest value. It gives you a
basic idea of the spread of data but like the mean it is affected by extreme values. An
anomaly therefore can give a false picture
Referring back to Table 1 the range is worked out as follows:
Northern end of beach:
Highest Value = 58
Lowest Value = 11
Range: 58 – 11= 47
Southern end of beach
Highest Value = 81
Lowest Value = 6
Range: 81 - 6 = 75
Therefore we can see that the southern end has a much larger range. However this result is
affected by the anomalies of 8 and 6.
Inter-quartile Range
The inter-quartile range is worked out by ranking the data (highest to lowest) and placing
the data into quarters or ‘Quartiles’. The top 25% of the data is placed in the Upper Quartile
(UQ) and the bottom 25% is placed in the Lower Quartile (LQ). The inter-quartile range or
IQR is the difference between the 25% and 75% values.
With the ranked data divided into the 1st, 2nd, 3rd and 4th Quartiles:
• The boundary between the 1st and 2nd Quartiles is called the Upper Quartile
• The boundary between the 3rd and 4th Quartiles is called the Lower Quartile
• The spread between these two boundaries is the Inter-quartile Range
The inter-quartile range is more useful than the range in indicating the spread of data as it
takes away any extreme values (i.e. those occurring in the 1st Quartile and the 4th
Quartile) and considers the spread of the middle 50% of the data around the median or
middle value.
The IQR has a formula to work it out. Look at the worked example to see how to calculate
the IQR.
Inter-quartile Range Formula
Upper Quartile (UQ) = (n + 1) / 4 th position
Lower Quartile (LQ) = 3 × (n + 1) / 4 th position
Inter-quartile Range (IQR) = UQ – LQ
Worked example of Inter-quartile Range
Refer back to Table 1.
The data is already in rank order, ranked from highest to lowest.
• Next find the Upper Quartile (UQ) by using the formula. In this case the number in
the data set is 15 so n = 15. Therefore using the formula:
(15 + 1) / 4 = 4th
So the Upper Quartile for the North End = 33 and for the South End it is 67
• Now calculate the Lower Quartile (LQ):
3 × (15 + 1) / 4 = 12th
The LQ for the North End = 19 and for the South End it is 33
• You are now able to determine the Inter-quartile Range using the formula:
UQ – LQ
The IQR for the North End is 33 – 19 = 14
The IQR for the South End is 67 – 33 = 34
This shows us that there is now only a small variation around the median values. The IQR is
smaller at the northern end, suggesting that there is less variation in pebble size there,
whereas the pebbles are less well sorted at the southern end. However this would need
further investigation. In terms of a comparison between the two sites it would suggest
that the pebbles are smaller and more uniform in size at the northern end.
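The range and inter-quartile range calculations for Table 1 can be reproduced as follows. This sketch follows the position formulas used in this guide, (n + 1) / 4 and 3 × (n + 1) / 4, counted from the top of the ranked list; it assumes both positions come out as whole numbers, which is true for n = 15. The function names are assumptions made for this example.

```python
north = [58, 43, 38, 33, 32, 25, 24, 24, 23, 19, 19, 19, 14, 12, 11]
south = [81, 76, 67, 67, 67, 66, 63, 60, 58, 47, 38, 33, 25, 8, 6]

def quartile_positions(n):
    """Positions of the UQ and LQ, counted from the top of the ranked list."""
    return (n + 1) / 4, 3 * (n + 1) / 4

def spread(values):
    ranked = sorted(values, reverse=True)  # highest to lowest, as in Table 1
    uq_pos, lq_pos = quartile_positions(len(ranked))
    # Assumption: this sketch only handles whole-number positions.
    if uq_pos != int(uq_pos) or lq_pos != int(lq_pos):
        raise ValueError("quartile positions are not whole numbers")
    uq = ranked[int(uq_pos) - 1]  # -1 because Python lists start at 0
    lq = ranked[int(lq_pos) - 1]
    return {"range": ranked[0] - ranked[-1], "UQ": uq, "LQ": lq, "IQR": uq - lq}

north_spread = spread(north)
south_spread = spread(south)
print("North end:", north_spread)
print("South end:", south_spread)
```

Other textbooks and software packages define quartile positions slightly differently, so their answers may not match these exactly.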
Measuring Dispersion – Standard Deviation
Standard deviation is a measure of the degree of dispersion. The Inter-quartile range will tell
you how clustered the data is around the median value and standard deviation is another
method of examining the spread of data but this time around the mean.
Formula for Standard Deviation
σ = √( ∑(x – x̄)² / n )
Where:
σ = Standard Deviation
∑ = Sum of
x = Each value in the data set
x̄ = Mean
n = Number in the sample
Two sets of data could have the same mean but have a very different spread of data.
Standard Deviation will tell you the extent of this - in other words how reliable the mean is.
A low standard deviation indicates that the data points tend to be very close to the mean,
whereas high standard deviation indicates that the data is spread out over a large range of
values and the mean is less reliable as there is obviously a lot of variation in the sample.
Standard deviation is very useful when used to compare two data sets. In other words you
can use standard deviation when you want to compare the dispersion of two or more sets of
data
The standard deviation links the data set to the normal distribution. In a normal distribution:
• 68% of the values lie within ±1 standard deviation of the mean
• 95% of the values lie within ±2 standard deviations of the mean
• 99.7% of the values lie within ±3 standard deviations of the mean
Refer back to our data in Table 1. In the example below the calculation for standard
deviation has been calculated for the northern end of the beach.
Table 2: Standard Deviation calculation for the northern end of beach
A: Pebble size (mm), x   B: x – x̄   C: (x – x̄)²
58                       31.7       1004.89
43                       16.7       278.89
38                       11.7       136.89
33                       6.7        44.89
32                       5.7        32.49
25                       -1.3       1.69
24                       -2.3       5.29
24                       -2.3       5.29
23                       -3.3       10.89
19                       -7.3       53.29
19                       -7.3       53.29
19                       -7.3       53.29
14                       -12.3      151.29
12                       -14.3      204.49
11                       -15.3      234.09
∑x = 394                 x̄ = 26.3   ∑(x – x̄)² = 2270.95

σ = √( ∑(x – x̄)² / n ) = √( 2270.95 / 15 ) = √151.40 = 12.30
Standard Deviation for the northern end = 12.30
How do you use the standard deviation?
In the example above the mean is 26.3. This means that in a normal distribution graph 68%
of the data should lie between 14.0 mm and 38.6 mm. These values are calculated by
subtracting 12.3 from, and adding 12.3 to, 26.3.
Remember the lower the standard deviation score, the more clustered the results are
around the mean and the more reliable the mean is. For this figure to be of any use, we now
need to compare it to the standard deviation for the southern end of the beach.
Repeat the above exercise on the results from the southern end.
Standard Deviation for the southern end = 22.88
This means that 68% of the data should lie between 27.9mm and 73.7mm.
Standard deviation suggests that there is more clustering around the mean at the northern
end of the beach as the figure is smaller. The mean is therefore more reliable at the
northern end. This is also supported by the other measures of dispersion (range and
inter-quartile range), which have indicated that there is less dispersion at the northern end
of the beach.
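The standard deviation figures for both ends of the beach can be checked with the sketch below. It divides by n, as in Table 2, rather than by n – 1; the function names are assumptions made for this example.

```python
from math import sqrt

north = [58, 43, 38, 33, 32, 25, 24, 24, 23, 19, 19, 19, 14, 12, 11]
south = [81, 76, 67, 67, 67, 66, 63, 60, 58, 47, 38, 33, 25, 8, 6]

def standard_deviation(values):
    """Square root of the mean squared deviation from the mean (divide by n)."""
    n = len(values)
    x_bar = sum(values) / n
    return sqrt(sum((x - x_bar) ** 2 for x in values) / n)

def one_sd_band(values):
    """Lower and upper limits of the mean plus or minus one standard deviation."""
    x_bar = sum(values) / len(values)
    sd = standard_deviation(values)
    return x_bar - sd, x_bar + sd

for name, pebbles in (("North", north), ("South", south)):
    low, high = one_sd_band(pebbles)
    print(f"{name}: SD = {standard_deviation(pebbles):.2f}, "
          f"68% band = {low:.1f} mm to {high:.1f} mm")
```

The band limits can differ in the last decimal place from a hand calculation that rounds the mean and standard deviation first.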
Testing Correlation – Using Spearman’s Rank Correlation Test
Spearman’s rank is another way of testing a relationship. For example if you draw a
scattergraph you can see by eye if there is a relationship, but you will probably not be able
to clearly assess the strength of the relationship as many points may be some distance from
the line of best-fit.
Spearman’s rank correlation is used to test the strength of the relationship between two
sets of data, providing you with a numerical value. This is an example of objective data.
Once you have this figure you can then test its significance – this means the likelihood of
your results occurring by chance.
When can you use Spearman’s Rank?
The test can be used with any set of raw data or percentages but it is only suitable if all the
following criteria apply:
• Two data sets which you believe may or may not be related e.g. Hydraulic Radius and
Velocity
• At least 10 pairs of data should be used
• No more than 30 pairs (as this makes the exercise unwieldy)
A worked example is shown below. Once you have completed the table and have your
answer at the end of the calculation you should have a figure between -1 and +1. This
indicates the strength and type of relationship:
• The closer it is to +1, the stronger the positive relationship (i.e. as one set of data
increases so does the other).
• The closer it is to -1, the stronger the negative relationship (i.e. as one set of data
increases the other decreases).
• If the result is close to 0 it means there is no relationship and you would accept the
null hypothesis.
Candidates will not be expected to learn the formula for Spearman’s Rank or any other
statistical test.
Strengths:
• It gives objective data
• It enables you to demonstrate a clear relationship between two sets of data
• You can state whether the relationship is significant or if your results were just a
fluke
• It is less sensitive to anomalies in data as each piece of data is ranked – large
differences could only be one rank different
Weaknesses:
• It does not tell you whether there is a causal link (i.e. that a change in one leads to a
change in the other) – just that a relationship exists
• Too many ‘tied ranks’ can affect the validity of the test
• It could be subject to human error eg inaccurate calculations
Case Study: Using Spearman’s Rank Correlation Test
Investigating plant succession on a shingle ridge
A student has collected data on the changing abiotic factors across a shingle ridge. They
want to complete a statistical test to help prove their hypothesis:
‘Plant height increases as soil depth increases’.
As there are 2 variables (plant height and soil depth) and they have 12 pairs of data they
have decided to use Spearman’s rank correlation test.
Spearman’s Rank should always begin with the assumption that there is no relationship –
this is called the NULL HYPOTHESIS. Always begin your test by writing out the null
hypothesis as well as your chosen hypothesis.
Null hypothesis = There is no relationship between plant height and soil depth
Table 3 – raw data
Site No.   Soil Depth (cm)   Plant Height (cm)
1          0.0               4.0
2          3.2               1.5
3          3.6               6.0
4          1.9               11.5
5          10.1              22.0
6          15.2              65.0
7          20.2              92.0
8          23.8              103.0
9          32.0              129.0
10         32.0              187.4
11         34.1              156.6
12         37.4              189.3
Table 4: Worked example of Spearman’s rank
Site     Soil depth   Plant height   Rank A   Rank B   Rank             Difference
number   (A)          (B)                              Difference (d)   Squared (d²)
1        0.0          4.0            12       11       1                1
2        3.2          1.5            10       12       -2               4
3        3.6          6.0            9        10       -1               1
4        1.9          11.5           11       9        2                4
5        10.1         22.0           8        8        0                0
6        15.2         65.0           7        7        0                0
7        20.2         92.0           6        6        0                0
8        23.8         103.0          5        5        0                0
9        32.0         129.0          3.5      4        -0.5             0.25
10       32.0         187.4          3.5      2        1.5              2.25
11       34.1         156.6          2        3        -1               1
12       37.4         189.3          1        1        0                0
                                                       ∑d² =            13.5
                                                       6∑d² =           81
You are now able to use the Spearman’s rank correlation test formula:
Rs = 1 – (6 × ∑d²) / (n³ – n)
With ∑d² = 13.5 and n = 12: Rs = 1 – 81 / (12³ – 12) = 1 – 81 / 1716
The final calculation is: Rs = 1 – 0.047. Therefore Rs = 0.953
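The worked example can be reproduced step by step with the sketch below. It uses the simple formula given in this guide and gives tied values the average of the ranks they share (so the two soil depths of 32.0 are both ranked 3.5). The function names are assumptions made for this example, and statistical software that corrects for tied ranks may give a slightly different figure.

```python
# Data from Table 3
soil_depth = [0.0, 3.2, 3.6, 1.9, 10.1, 15.2, 20.2, 23.8, 32.0, 32.0, 34.1, 37.4]
plant_height = [4.0, 1.5, 6.0, 11.5, 22.0, 65.0, 92.0, 103.0, 129.0, 187.4, 156.6, 189.3]

def rank_high_to_low(values):
    """Rank 1 = highest value; tied values share the average of their ranks."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # first rank position taken by this value
        ties = ordered.count(v)        # how many times the value occurs
        ranks.append(first + (ties - 1) / 2)
    return ranks

def spearman_rs(a, b):
    """Return (Rs, sum of squared rank differences) using 1 - 6*sum(d^2)/(n^3 - n)."""
    rank_a, rank_b = rank_high_to_low(a), rank_high_to_low(b)
    n = len(a)
    sum_d2 = sum((ra - rb) ** 2 for ra, rb in zip(rank_a, rank_b))
    return 1 - (6 * sum_d2) / (n ** 3 - n), sum_d2

rs, sum_d2 = spearman_rs(soil_depth, plant_height)
print("Sum of d squared:", sum_d2)
print("Rs:", round(rs, 3))
```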
So what does your result of 0.953 mean?
Place your result on a line like the one below:
-1 (Perfect NEGATIVE relationship) ---------- 0 (NO relationship) ---------- +1 (Perfect POSITIVE relationship)
You now need to do another test to check whether your result could have occurred by
chance – in other words, how significant your result is. To do this, compare your
result with a table of critical values (Table 5). First look at the number of pairs of data you
have – in this case there are 12. You then need to decide which significance level you are
going to use. For geographical purposes, you would usually use the 0.05 significance level.
This means that there is a 5 in 100 chance of the results occurring by chance. Or to put it
another way, if other researchers completed the same experiment, 95 out of 100 would get
the same result – therefore there is a relationship.
You then need to see whether your Rs result is above the critical value for the number of
pairs you have. If your Rs value is below the critical value you must accept the null
hypothesis – i.e. you cannot be sure that your relationship is significant.
Table 5: Critical Values for Rs
n    0.05 (95%)           0.01 (99%)
     significance level   significance level
10   +/- 0.564            +/- 0.746
12   0.506                0.712
14   0.456                0.645
16   0.425                0.601
18   0.399                0.564
20   0.377                0.534
22   0.359                0.508
24   0.343                0.485
26   0.329                0.465
28   0.317                0.448
30   0.306                0.432
We can see that in our example the Rs value of 0.953 is well above the critical value of
+0.506 at the 0.05 significance level. In this case it is also well above the critical value
of +0.712 at the 0.01 significance level. This means that there is a very low (less than
1 in 100) chance of the results occurring by chance and we can reject the null hypothesis.
The Rs value was 0.953 and is above the critical value of 0.712 at the 0.01 significance level.
I can therefore reject my null hypothesis and accept that there is a strong relationship
between plant height and soil depth on the shingle ridge and it is highly significant.
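The comparison with the critical values can be sketched in code as follows. The dictionary holds the figures from Table 5; the function name significance and the wording of the returned messages are assumptions made for illustration. The sketch follows the text in requiring the result to be above the critical value.

```python
# Critical values of Rs from Table 5: n -> (0.05 level, 0.01 level)
CRITICAL_RS = {
    10: (0.564, 0.746), 12: (0.506, 0.712), 14: (0.456, 0.645),
    16: (0.425, 0.601), 18: (0.399, 0.564), 20: (0.377, 0.534),
    22: (0.359, 0.508), 24: (0.343, 0.485), 26: (0.329, 0.465),
    28: (0.317, 0.448), 30: (0.306, 0.432),
}


def significance(rs, n):
    """Compare an Rs value with the critical values for n pairs of data."""
    critical_05, critical_01 = CRITICAL_RS[n]
    strength = abs(rs)  # the critical values apply to + and - results alike
    if strength > critical_01:
        return "significant at the 0.01 level"
    if strength > critical_05:
        return "significant at the 0.05 level"
    return "not significant"


print(significance(0.953, 12))  # significant at the 0.01 level
```

Only the sample sizes listed in Table 5 are covered, so any other value of n raises a KeyError rather than guessing a critical value.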
The Gini coefficient
The most widely used summary measure of inequality in the distribution of household
income is the Gini coefficient. The lower its value, the more equally household income is
distributed. For example, the bottom 5% of households might only have a 1% share of total
household income. The bottom 10% of households might have a 3% share; the bottom 20%
might have an 8% share, and so on. The Gini coefficient is a measure of the overall extent to
which these groupings of households, from the bottom of the income distribution upwards,
receive less than an equal share of income.
How is it calculated?
The idea described above is expressed more formally by the Lorenz curve of the household
income distribution, from which the Gini coefficient can be calculated. Based on a ranking of
households in order of ascending income, the Lorenz curve is a plot of the cumulative share
of household income against the cumulative share of households.
The curve will lie somewhere between two extremes:
• complete equality, where income is shared equally among all households, results in a
Lorenz curve represented by a straight diagonal line.
• complete inequality, where only one household has all the income and the rest have
none, is represented by a Lorenz curve which comprises the horizontal axis and the
right-hand vertical axis.
The Gini coefficient is the area between the Lorenz curve of the income distribution and the
diagonal line of complete equality, expressed as a proportion of the triangular area between
the curves of complete equality and inequality.
Complete equality would result in a Gini coefficient of zero, and complete inequality, a
Gini coefficient of 100 (or 1 when the coefficient is expressed as a proportion rather than
as a percentage).
A Lorenz curve illustrates the degree of unevenness in a geographical distribution. It is
drawn on graph paper and makes use of cumulative percentage data. The vertical axis
carries the cumulative data, and points are plotted in the order of the largest, which is then
added to the second largest, then to the third largest, and so on. The horizontal axis simply
records the cumulative process. The plots are then connected by a line. If another line is
drawn onto the graph to represent an even distribution, then the degree of unevenness can
be seen. The greater the deviation the plotted line has from the line of even distribution,
the greater the degree of unevenness. A highly concave Lorenz curve represents a high level
of unevenness, and therefore high level of concentration.
Assume a country has 10 regions A to J:
Region   % of population in region
A        10.5
B        1.6
C        12.2
D        1.8
E        35.3
F        6.7
G        2.1
H        25.3
I        3.5
J        1
Draw the Lorenz curve.
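Before drawing the curve, the points to plot can be worked out as a running total. The sketch below follows the method described earlier (largest value first, then add the second largest, and so on); the variable names are illustrative only, and the figures are the regional percentages listed above.

```python
from itertools import accumulate

# Percentage of the population in each region
regions = {"A": 10.5, "B": 1.6, "C": 12.2, "D": 1.8, "E": 35.3,
           "F": 6.7, "G": 2.1, "H": 25.3, "I": 3.5, "J": 1.0}

# Order the regions from largest to smallest share
ordered = sorted(regions.items(), key=lambda item: item[1], reverse=True)

# Running total of the shares gives the values for the vertical axis
cumulative = [round(total, 1)
              for total in accumulate(share for _, share in ordered)]

# Line of even distribution: each of the 10 regions would hold 10%
even_line = [10.0 * position for position in range(1, len(ordered) + 1)]

for position, ((name, share), total) in enumerate(zip(ordered, cumulative), start=1):
    print(position, name, share, total, even_line[position - 1])
```

Plotting the cumulative values against position (1 to 10), together with the line of even distribution, shows how far the curve bows away from an even spread.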
How is the Gini coefficient used?
A global interpretation of the Gini coefficient is:
• Low inequality: under 0.299 [e.g. Belarus and Hungary]
• Relatively low inequality: 0.3 – 0.399 [e.g. China and Poland]
• Relatively high inequality: 0.4 – 0.449 [e.g. Russia and Malaysia]
• High inequality: 0.45 – 0.499 [e.g. Venezuela and Mexico]
• Very high inequality: 0.5 – 0.599 [e.g. Nigeria and Kenya]
• Extremely high inequality: over 0.6 [all of the countries in this group are in southern
Africa]
A city interpretation using the Gini coefficient has also been produced by the UN. In it
Johannesburg has one of the highest figures in the world. In 2010, the Gini coefficient for
the city was calculated at 0.62.
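The bands above can be expressed as a simple lookup. This is a sketch under stated assumptions: the function name classify_gini is invented, the coefficient is taken to be on the 0 to 1 scale, and the band edges are read as 0.3, 0.4, 0.45, 0.5 and 0.6, with a value of exactly 0.6 placed in the top band.

```python
def classify_gini(gini):
    """Return the inequality band for a Gini coefficient on the 0 to 1 scale."""
    # (upper limit of band, label), checked in ascending order
    bands = [
        (0.3, "Low inequality"),
        (0.4, "Relatively low inequality"),
        (0.45, "Relatively high inequality"),
        (0.5, "High inequality"),
        (0.6, "Very high inequality"),
    ]
    for upper_limit, label in bands:
        if gini < upper_limit:
            return label
    return "Extremely high inequality"


print(classify_gini(0.62))  # Johannesburg, 2010: Extremely high inequality
```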
The Mann Whitney U-Test
The Mann Whitney U Test is a technique that tests to see if there is a difference between
the medians of two sets of data. It is non-parametric. This means it does not assume that the
data are normally distributed. However it does assume that there is similar dispersion of both
sets of data.
When can you use the Mann Whitney U Test?
The test can be used if you want to investigate the differences between two sets of similar
data. Some conditions apply:
• You can only compare two sets of data
• The data must be ordinal (it can be ranked in order - from lowest to highest)
• You need a minimum of 5 values in each data set
• It is not advisable to use more than 20 values in each data set as the exercise
becomes unwieldy
The Mann-Whitney U test starts with a null hypothesis:
There is no significant difference in the medians of the two sets of data
Once you have calculated the value of U, you then have to compare it to the critical values.
If the value of U is less than or equal to the critical value then you can reject the null
hypothesis and accept that there is a difference in the two sets of data.
Strengths:
• You can use two data sets that have different sizes eg one data set could have 10
values and the other only 8
• You can state whether the difference is significant or if your results occurred by
chance
• You can see clearly whether there is a difference in the median of two sets of data
Weaknesses:
• It is a lengthy calculation and prone to human error
• It does not explain why the difference in the two data sets occurs
Worked example of Mann-Whitney U Test
In this example a student is investigating the economic deprivation across the city of
Plymouth. As part of his investigation he has used the National Statistics website
(www.neighbourhood.statistics.gov.uk) to obtain secondary data on indices of deprivation.
From his investigation he has formulated the hypothesis:
There is a greater income deprivation score for inner city areas than outer suburbs.
He has found out the income deprivation score for 8 super output areas (small census unit
areas created by the National Statistics Office) in each of his two study areas – one in the
inner city (St Peter and The Waterfront), and the other in the outer suburbs (Plymstock).
The table below shows his results. The income deprivation score measures the % of people
who are income-deprived. The higher the score, the more income-deprived the area is.
Income Deprivation Score
Inner City (St Peter & The Waterfront)   Outer-Suburb (Plymstock)
0.53                                     0.04
0.40                                     0.09
0.24                                     0.06
0.25                                     0.12
0.08                                     0.05
0.12                                     0.08
0.16                                     0.10
0.20                                     0.13
The student decides to conduct the Mann-Whitney U Test to see whether there is a difference
between the two areas.
First he sets out the null hypothesis: There is no difference in the income deprivation score
between the inner-city and the outer-suburb.
A: Inner City scores (x)   B: Rank (rx)   C: Outer-Suburb scores (y)   D: Rank (ry)
0.53                       16             0.04                         1
0.40                       15             0.09                         6
0.24                       13             0.06                         3
0.25                       14             0.12                         8.5
0.08                       4.5            0.05                         2
0.12                       8.5            0.08                         4.5
0.16                       11             0.10                         7
0.20                       12             0.13                         10
                           Total = 94                                  Total = 42
Now calculate the U values for both samples, using the formulas below:

$$U_x = n_x n_y + \frac{n_x (n_x + 1)}{2} - \sum r_x$$

$$U_y = n_x n_y + \frac{n_y (n_y + 1)}{2} - \sum r_y$$

Where n = number in the sample. In this example it is 8 for both x and y.
The calculations for this example are thus:

$$U_x = (8 \times 8) + \frac{8 \times 9}{2} - 94 = 6$$

$$U_y = (8 \times 8) + \frac{8 \times 9}{2} - 42 = 58$$

NB. The result for Ux + Uy should equal nx × ny (here 6 + 58 = 64 = 8 × 8).
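Because the ranking step is where most slips occur, it can help to check it with a short script. The sketch below is illustrative only: the function names average_ranks and mann_whitney_u are invented here, both samples are ranked together from lowest to highest, and tied values share the mean of the positions they occupy, as in the worked example.

```python
def average_ranks(values):
    """Rank values from lowest (1) upwards, sharing the mean rank between ties."""
    ordered = sorted(values)
    ranks = {}
    for value in set(ordered):
        first = ordered.index(value) + 1        # position of the first tied value
        count = ordered.count(value)            # how many times the value occurs
        ranks[value] = first + (count - 1) / 2  # mean of the tied positions
    return ranks


def mann_whitney_u(x, y):
    """Return (Ux, Uy) for two samples using the formulas in the text."""
    ranks = average_ranks(x + y)  # the two samples are ranked together
    n_x, n_y = len(x), len(y)
    sum_rx = sum(ranks[value] for value in x)
    sum_ry = sum(ranks[value] for value in y)
    u_x = n_x * n_y + n_x * (n_x + 1) / 2 - sum_rx
    u_y = n_x * n_y + n_y * (n_y + 1) / 2 - sum_ry
    return u_x, u_y


inner_city = [0.53, 0.40, 0.24, 0.25, 0.08, 0.12, 0.16, 0.20]
outer_suburb = [0.04, 0.09, 0.06, 0.12, 0.05, 0.08, 0.10, 0.13]

u_x, u_y = mann_whitney_u(inner_city, outer_suburb)
print(u_x, u_y)  # 6.0 58.0
```

The smaller of the two values is the one compared with the critical value.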
You now need to select the smaller figure of 6 and compare it to the table of critical values
to test the significance of the result. If the value is lower than or equal to the critical value
you reject the null hypothesis.
Sample size nx   Sample size ny   Critical value at 0.05 significance level
8                8                13
A table of critical values for U can be found
at: http://math.usask.ca/~laverty/S245/Tables/wmw.pdf
Now write a summary statement to express your result for the Mann-Whitney U Test. In this
case:
The lower value of U is 6. This is less than the critical value of 13 at the 0.05 significance
level. Therefore I can reject the null hypothesis and accept that there is a difference between
the income deprivation scores for the inner-city area of St Peter & the Waterfront and the
outer-suburb of Plymstock.
It is extremely clear just from looking at the values that there is a difference in the income
deprivation scores. The point of doing the Mann-Whitney U test is that it allows us to state
how confident we can be in accepting or rejecting the null hypothesis.
The Chi-Squared Test
The chi-squared test (also referred to as the χ² test) is used to investigate spatial
distributions. It looks at frequencies or the distribution of data that you can put into
categories, eg pebble shapes at different sites along a river’s course or frequencies of plant
types at different stages of a succession.
Chi-squared is a comparative test as it compares actual data collected against a theoretical
random distribution of the data.
The data collected is called the observed data.
The theoretical, random distribution is called the expected data.
What is needed to use the chi-squared test?
• The data needs to be organized into categories
• The data cannot be in the form of percentages and must be displayed as frequencies
• The total amount of observed data must exceed 20
• The expected data for each category needs to exceed 4
As with other statistical tests, chi-squared requires and tests a null hypothesis. The null
hypothesis is:
There is no significant difference between the observed distribution and the expected
distribution
The strengths of Chi-Squared lie in the fact that, as with other statistical tests, you are
checking the significance of your results. As with other statistical tests, the weaknesses
include human error in calculating χ². It also doesn’t explain why there is or isn’t a pattern to
the distribution. This will need further investigation.
Worked example of Chi-squared test
A group of students investigated the orientation of pebbles in an exposed bed of glacial till.
The glacial till was situated near the lip of a corrie in the Lake District. The students wanted
to investigate whether there was a pattern to the orientation of the long axis of the pebbles
in the till.
Their hypothesis was: There is a relationship between the orientation of the glacial till and
the direction of the glacier.
They measured the orientation of 40 pebbles and placed their results into 4 categories:
0 – 45° = 2 pebbles
46 – 90° = 10 pebbles
91 – 135° = 23 pebbles
136 – 180° = 5 pebbles
The data suggests that there is a preferential direction but, as this may be due to chance, a
chi-squared test is carried out. The test begins with the assumption that there is no
preference for any direction, with the null hypothesis:
There is no significant difference between the observed orientation of pebbles and the
expected random orientation.
Next the students created a ‘contingency table’ shown below:
Orientation   Observed (O)   Expected (E)   A: O – E   B: (O – E)²   C: (O – E)² ÷ E
0 – 45°       2              10             -8         64            6.4
46 – 90°      10             10             0          0             0
91 – 135°     23             10             13         169           16.9
136 – 180°    5              10             -5         25            2.5
                                                                     χ² = 25.8
Chi-squared value (χ²) = 25.8
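The three working columns of the contingency table can be reproduced in a few lines. This is a minimal sketch, assuming (as the worked example does) that a random distribution would place an equal number of pebbles in each category; the variable names are illustrative.

```python
# Observed number of pebbles in each orientation category (degrees)
observed = {"0-45": 2, "46-90": 10, "91-135": 23, "136-180": 5}

total = sum(observed.values())    # 40 pebbles measured in all
expected = total / len(observed)  # equal share per category: 10.0

# Column C of the table: (O - E)² ÷ E for each category
contributions = {
    category: (count - expected) ** 2 / expected
    for category, count in observed.items()
}

chi_squared = sum(contributions.values())
degrees_of_freedom = len(observed) - 1

print(round(chi_squared, 1), degrees_of_freedom)  # 25.8 3
```

Keeping the contribution of each category separate shows which categories drive the result; here the 91 to 135 degree class contributes most.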
The result by itself is meaningless. You now need to test its significance.
Work out the degrees of freedom using the formula (n – 1), where n is the number of
observations, in this case the number of categories which contained observed data.
Therefore for this example n = 4, so the degrees of freedom are 4 – 1 = 3.
Using the table below, compare your χ² result with the critical value for your degrees of
freedom at the 0.05 (95%) significance level. If the χ² result is the same or greater than the
value given in the table, then the null hypothesis can be rejected.
Critical Values of chi-squared
Degrees of Freedom   Significance level 0.05   Significance level 0.01
1                    3.84                      6.64
2                    5.99                      9.21
3                    7.82                      11.34
4                    9.49                      13.28
5                    11.07                     15.09
6                    12.59                     16.81
7                    14.07                     18.48
8                    15.51                     20.09
9                    16.92                     21.67
10                   18.31                     23.21
11                   19.68                     24.72
12                   21.03                     26.22
13                   22.36                     27.69
14                   23.68                     29.14
15                   25.00                     30.58
A summary statement:
At 3 degrees of freedom the χ² result of 25.8 is above the 0.01 critical value of 11.34.
Therefore we can reject the null hypothesis and accept that the orientation of the till did not
occur by chance and is not randomly orientated.
AQA exam answers
1. Mann Whitney U calculation given
(a) Interpret the result of the MW U test. (4 marks)
One mark per valid point
The results from the calculation show that 8 (or U2) is less than the critical value of 27 at the
0.05 significance level. This means we can reject the null hypothesis with more than 95%
confidence that the result is not due to chance. We can therefore accept the hypothesis H1
that species diversity will be higher in a managed area than in an unmanaged area. (4)
(b) Suggest why the MWU is suitable to interpret this set of data. (4 marks)
One mark per valid point
A Mann-Whitney U test is suitable to interpret this set of data because the data is collected
from the same location – sand dunes in south west England but is sampled from two
different populations, one managed and one unmanaged. So as geographers we assume
there will be a difference between the two populations so we use the Mann Whitney U Test
to interpret this data. Also there are ten samples from each area so there are enough
samples to perform the test making it an obvious choice to interpret the data. The data
shows there may be a significant difference and the Mann Whitney U Test can confirm this.
The data is non-parametric which also makes the test suitable. (4)
2. Study Figure 1 which shows the details of a statistical investigation into the relative
nature of earthquakes in two countries, Japan and Italy. Comment on the
outcomes of this investigation. (7 marks)
Mark scheme
Level 1 (1-4 marks) (mid point 3)
Simple statements arising from the data, such as simple comparative points, with basic or
general statements of the reasons for/type of activity that would be associated with the
plate boundaries.
Level 2 (5-7 marks) (mid point 6)
Some sophistication of comment, showing a clear understanding of the outcomes of for
example the mean values, ranges and the Mann Whitney U test. Clear interpretation and/or
description/ reasoning, together with evidence of geographical thinking.
The Mann Whitney investigation suggests that there is a statistically significant difference
between the magnitude of Japan’s and Italy’s earthquakes as Japan’s outcome is 3.5 which is
below the critical value of 23. As Japan’s mean measurement of earthquake is 7.0 on the
Richter scale whilst that of Italy is 5.8. This suggests that Japan suffers from significantly
more powerful earthquakes. As the two countries are both MEDCs at similar levels of
development, this may mean that the effects of the earthquakes are also more severe in
Japan. Italy however has a greater range (1.9) between its highest and lowest earthquake
values than Japan (1.3) meaning that it has a greater degree of variation in power of
earthquake. This may make it more difficult to plan accurately as the effects of different
strength earthquakes may vary. Japan has also seen the same amount of earthquakes in 5
years that Italy has in over 30 years suggesting that Japan is more seismically active and this
combined with significantly greater magnitude suggests that Japan requires a greater level
of planning and preparation in order to manage the hazards presented. (7)
3. Chi-squared exercise.
Figure 1 shows a map of wards in Leicester. Figure 2a shows census data (2001) of the
population of four of these wards by ethnic group. For the purposes of a chi squared test,
the data became the observed frequency (O). This is the observed population by ethnicity
(O) for the selected wards, and Figure 2b shows the calculated expected frequencies (E)
for these wards in a contingency table.
For the chi-squared test, the null hypothesis (H 0 ) to be tested is:
‘There is no difference in the distribution of population in different ethnic groups within
selected wards in Leicester’.
The alternative hypothesis (H 1 ) is:
‘There is a difference in the distribution of population in different ethnic groups within
selected wards in Leicester’.
A chi-squared test was applied to these data. The chi-squared result was calculated as
17284.66 and was significant at both the 95% and 99% significance levels.
5 (a) With reference to the outcome of the chi-squared test and Figures 1, 2a and 2b,
comment on the distribution of ethnic groups in the selected wards of Leicester.
(12 marks)
Mark scheme.
Level 1 (1 – 5 marks) (mid point 3 marks)
There is a basic description of the figures and what they show. One figure may be covered
more strongly than others, so there will be some imbalance. There will be little comment on
the distributions and the variations shown. Hence, there may be a concentration on the
Figures 1 and 2a, with less on Figure 2b and the chi-squared test result or vice-versa. There
will be little or no reference to data.
Level 2 (6 – 10 marks) (mid point 8 marks)
There will be a clear summary of the figures, with an attempt at comment on the
distribution of population shown. There may still be some imbalance between the coverage
of the figures and the chi-squared result. There may be greater knowledge shown on some
figures than others, but a full range is not necessary in this band. Reference will be made to
the data.
Level 3 (11 – 12 marks) (mid point 12 marks)
There will be a detailed summary, including balanced reference to all figures and the
chi-squared result. Comment will be full on the information shown about the distribution of
ethnic populations and/or the chi-squared figure. There will be a detailed understanding of
the chi-squared result. There will be detailed reference to the data provided. Thinking like a
geographer.
By looking at the census data from 2001 we can see that in total there are more White
people than any other ethnicity at 26871. The number of Asian people is also high at 24942.
However the total population of Black and Other ethnicities is much lower at 2948 and 1988.
The chi-squared test has statistically proved that the ethnicities difference is not the same
between wards. For example in the city centre there is the highest frequency of White
ethnicity. But in the inner city there are very few at 3739. However the Asian population has
increased from 1786 in the CBD to the highest frequency in the data at 15402. Here also the
frequency of Black ethnicity is at the highest at this point too. It is however at the same level
as the expected frequency whilst Asian is not above this. It was expected to be around 9339.
White ethnicity here is much lower though at 3739 when it was expected to be 10062. None
of the expected frequencies are close to the actual. This is why the chi-squared test showed
that there is a definite link between ethnicities in different areas. So the 18995 number
which you get must be higher than the 95% and 99% significance levels. It shows that the
ethnic minority Black and Other ethnicities dominate frequencies in the Spinney Hills. This is
similar to the expected value. The two data sets Asian and White population decide the
outcome of the chi-squared test due to the differences.
(11)
Sample assessment exercises
1. A group of students was planning a study of a stream. The students’ hypothesis stated
that the velocity of streams increases with distance from the source. They collected data
from 9 sites along the course of the stream. To do this they timed how long it took for a
float to travel a distance of 10 metres. They measured the velocity three times at each site
and worked out an average for each site. The results are shown in the table below.
Site   Distance from source (km)   Average time for float to travel 10m (secs)
1      1.2                         18.8
2      1.7                         15.2
3      2.2                         14.4
4      2.8                         15.2
5      3.5                         13.0
6      4.0                         13.6
7      4.7                         12.0
8      5.6                         6.2
9      6.7                         9.2
(a) The students decided to present their figures on a scatter graph.
(i) Draw the scatter graph on the graph paper
(5 marks)
Mark scheme:
1 mark for each correctly labelled axis;
3 marks for correctly plotted points.
(ii) Add a trend line to your graph.
(1 mark)
(iii) Explain how you decided where to put the trend line. (4 marks)
(b) (i) Name a statistical technique that the students could use to help them analyse their
data and test the hypothesis.
Describe how they would carry out the technique.
(7 marks)
(ii) Discuss the strengths and weaknesses of the technique described in (b) (i) for testing the
students’ hypothesis.
(5 marks)
(c) Write a report on the results of this investigation. You should refer to:
•
the extent to which the information provided support the students’ hypothesis
•
other information that would help them prove or disprove the hypothesis.
(8 marks)
2. The table below shows the figures for total annual rainfall for two recording stations over
a 15 year period.
Year    Annual rainfall total for a station in SE England (mm)   Annual rainfall total for a station in N Nigeria (mm)
1       781                                                      490
2       618                                                      379
3       563                                                      832
4       586                                                      1037
5       545                                                      594
6       884                                                      855
7       600                                                      376
8       793                                                      350
9       580                                                      479
10      699                                                      761
11      842                                                      688
12      737                                                      1143
13      824                                                      356
14      530                                                      635
15      823                                                      986
TOTAL   10405                                                    9961
(a) Calculate the mean annual rainfall at each station, to two decimal places.
• South East England
• North Nigeria
(2 marks)
(b) i. Complete a dispersion diagram for each of South East England and North Nigeria.
(4 marks)
ii. Identify the upper quartile and lower quartile for each set of rainfall figures by drawing
lines on the dispersion diagram.
(4 marks)
(iii) Calculate the inter-quartile range for each recording station.
• South East England
• North Nigeria
(2 marks)
(c) The standard deviation for South East England is 120.09. The standard deviation for
North Nigeria is 254.97. What do you understand by the term standard deviation?
(3 marks)
(d) Using all of the information you have been given, compare the variability of the annual
rainfall at these two stations. Comment on the possible effects that this variability could
have on the people in the two areas.
(10 marks)
3. A group of students undertook a fieldwork enquiry into the proposal to build a new
hospital on the outskirts of their town. The existing hospital is more central to the town. The
group decided to investigate the possible impacts of the development.
The table below shows some data obtained by the students about the location of the
employees of the existing hospital in 2006.
Ward    Total residents of ward employed at hospital in 2006   Number of employees/km² of ward in 2006
1       42                                                     5.1
2       16                                                     0.5
3       14                                                     0.5
4       162                                                    15.8
5       106                                                    6.4
6       95                                                     21.8
7       201                                                    77.1
8       184                                                    18.3
9       85                                                     15
10      109                                                    9
11      273                                                    53.5
12      67                                                     14.4
13      57                                                     11.1
14      61                                                     5.1
15      12                                                     1.5
16      11                                                     0.6
Total   1495
(a) Present the data for the number of employees/km2 of each ward in 2006 in the form of a
choropleth map using four classes:
(i) Give the class boundaries you would use.
(2 marks)
(ii) Complete a choropleth map using the classes you gave in (i). (6 marks)
(b) The group of students then obtained data showing the gender balance of the workforce:
Gender   Total   %
Male     238
Female   1257
(i) Calculate the % for each gender.
(1 mark)
(ii) The students decided to draw a proportional pie chart to represent the data. Draw this
pie chart using the formula

$$r = \sqrt{A / \pi}$$

where A = the area of the circle, in which 1 mm² = 1 worker, and r = the radius of the circle.
(4 marks)
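The radius formula can be applied with a short helper. This is a sketch only: the function name proportional_radius is invented for illustration, and it assumes that the area of the whole circle represents the total workforce at 1 mm² per worker, with each sector sized in proportion to its share of the total.

```python
import math


def proportional_radius(area):
    """Radius of a circle with the given area, from r = sqrt(A / pi)."""
    return math.sqrt(area / math.pi)


# Workforce figures from the gender table
workers = {"Male": 238, "Female": 1257}
total_workers = sum(workers.values())

# Area in mm² equals the number of workers, so the radius comes out in mm
radius_mm = proportional_radius(total_workers)

# Each sector takes the same fraction of 360 degrees as its share of the total
sector_angles = {
    gender: count / total_workers * 360
    for gender, count in workers.items()
}

print(round(radius_mm, 1))
for gender, angle in sector_angles.items():
    print(gender, round(angle, 1))
```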
4. Statistical exercises using earthquake data.
(a) Italian earthquakes that caused fatalities of > 10 (1930 – 2012)
Date of earthquake   Location   Moment Magnitude Scale   Deaths
2012                 Medolla    5.8                      20
2009                 L’Aquila   6.3                      308
2002                 Molise     5.9                      30
1990                 Sicily     5.7                      17
1980                 Irpinia    6.9                      2914
1976                 Friuli     6.5                      989
1971                 Lazio      4.9                      31
1968                 Sicily     6.4                      370
1962                 Irpinia    6.2                      17
1930                 Irpinia    6.7                      1400
(i) Investigate the relationship between the strength of the earthquake and the number of
deaths.
1. Complete a scatter graph of the data including drawing a line of best fit.
2. Carry out a Spearman’s Rank correlation exercise using a copy of the table below.
Location   Moment Magnitude Scale   Rank (Scale)   Deaths   Rank (Deaths)   d   d²
Medolla    5.8                                     20
L’Aquila   6.3                                     308
Molise     5.9                                     30
Sicily     5.7                                     17
Irpinia    6.9                                     2914
Friuli     6.5                                     989
Lazio      4.9                                     31
Sicily     6.4                                     370
Irpinia    6.2                                     17
Irpinia    6.7                                     1400
Σd² =
Calculate the Spearman’s Rank correlation coefficient using the formula:

$$R_s = 1 - \frac{6 \sum d^2}{n^3 - n}$$

Rs =
(ii) Now test the significance of your result using the following table:
(iii) In the light of these two completed tasks, what do you now conclude?
(b) Repeat for: Japanese earthquakes that caused fatalities of >10 (1946 – 2012)
Date of earthquake   Moment Magnitude Scale   Deaths
2011 (Tohoku)        9.0                      15885
2008                 6.9                      12
2007                 6.6                      11
2004                 6.9                      40
1995 (Kobe)          6.8                      6434
1978                 7.7                      28
1968                 8.2                      52
1964                 7.6                      26
1948                 7.1                      3769
1946                 8.1                      1362

Date of earthquake   Moment Magnitude Scale   Rank (Scale)   Deaths   Rank (Deaths)   d   d²
2011                 9.0                                     15885
2008                 6.9                                     12
2007                 6.6                                     11
2004                 6.9                                     40
1995 (Kobe)          6.8                                     6434
1978                 7.7                                     28
1968                 8.2                                     52
1964                 7.6                                     26
1948                 7.1                                     3769
1946                 8.1                                     1362
Σd² =
(c). Using the Mann Whitney U test:
An investigation into the variability in the size of earthquakes at two different plate margins:
Italy and Japan.
(a) Analyse the data below using the Mann Whitney U test.
Null hypothesis: there is no difference between the magnitude of earthquakes in Japan and
Italy
Italy x   Rank (rx)   Japan y   Rank (ry)
5.8                   9.0
6.3                   6.9
5.9                   6.6
5.7                   6.9
6.9                   6.8
6.5                   7.7
4.9                   8.2
6.4                   7.6
6.2                   7.1
6.7                   8.1
No. in sample (Nx) = 10             No. in sample (Ny) = 10
Total of rank scores (Σrx) =        Total of rank scores (Σry) =
Now calculate the U values for both samples, using the formulas below:

$$U_x = n_x n_y + \frac{n_x (n_x + 1)}{2} - \sum r_x$$

$$U_y = n_x n_y + \frac{n_y (n_y + 1)}{2} - \sum r_y$$

Where n = number in the sample. In this example it is 10 for both x and y.
The critical value for U at the 95% confidence level for two samples of ten within the Mann
Whitney U test is 23.
(b) Comment on the outcomes of your completed test.
5. A river based exercise
Students measured a range of variables along the river Eea at a total of 10 sites along its
course between its source and mouth. They wanted to test whether the Bradshaw model
could be applied to this river. The Spearman’s rank correlation test can be used to help
determine whether or not the River Eea conforms to the Bradshaw model. According to this
model there should be a direct relationship between channel width and depth.
(i) State the expected and null hypotheses that might be tested for these channel variables
(ii) Use the table to calculate the Spearman’s Rank Correlation coefficient. Start by ranking
the width values from highest (1) to lowest (10). Then rank the average depth column in the
same way.
Site no.   Width (metres)   Rank   Average depth (metres)   Rank   d   d²
1          0.97                    8.14
2          1.30                    6.00
3          2.01                    8.12
4          3.06                    17.10
5          3.50                    16.50
6          2.55                    14.68
7          6.23                    8.30
8          6.00                    9.40
9          6.42                    24.70
10         7.25                    20.91
Σd² =
Calculation of Rs:
Critical values for Rs at the 5% and 1% significance levels.
Rejection Level
n    0.05    0.02    0.01
5    1       1
6    0.886   0.943   1
7    0.786   0.893   0.929
8    0.738   0.833   0.881
9    0.683   0.783   0.833
10   0.648   0.746   0.794
12   0.591   0.712   0.777
6. A Chi Squared Test
When studying the River Eea students measured the roundness of pebbles at Site 2
(upstream) and at Site 8 (downstream) using Powers’ scale of roundness. Their results are
shown in the table below.
Null hypothesis: There is no difference in the shape of pebbles between an upstream and
downstream location on the River Eea, they are distributed randomly.
Row number (R)   Pebble shape   K1 Upstream: O   K1 Upstream: E   K2 Downstream: O   K2 Downstream: E   ΣR
R1               Angular        22                                7                                     29
R2               Subangular     15                                2                                     17
R3               Sub-round      9                                 10                                    19
R4               Round          4                                 31                                    35
Column totals                   Σk1 = 50                          Σk2 = 50                              n = 100
R = Row Number
K = Column Number
O = Observed frequency of pebbles in each category
(i) Calculate the expected frequency (E) in each cell using the formula:

$$E = \frac{\sum R \times \sum K}{n}$$

E = sum of row (ΣR) multiplied by the sum of its column (ΣK) divided by the sum of all
observed frequencies (n).
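As an illustration of the formula, the sketch below works out the expected frequency for a single cell. The function name expected_frequency is invented for this example; the figures used are the row total for R1 (29), the upstream column total (50) and the overall total (100) from the table.

```python
def expected_frequency(row_total, column_total, n):
    """Expected frequency for one cell: (row total x column total) / n."""
    return row_total * column_total / n


# Row R1 (Angular), column K1 (Upstream)
print(expected_frequency(29, 50, 100))  # 14.5
```

Because both column totals are 50, the expected frequency for each cell in this exercise is half of its row total.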
(ii) Calculate the Chi² value using the formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
(iii) Now check this result with the following significance tables. Firstly, calculate the degrees
of freedom, (R-1) x (K-1). Then read from the Critical values table to see if the null
hypothesis can be rejected. Your Chi² value must be greater than that of the critical value to
be significant.
Degrees of Freedom   Critical Value: 0.05   Critical Value: 0.01
1                    3.84                   6.63
2                    5.99                   9.21
3                    7.82                   11.30
4                    9.49                   13.30
5                    11.10                  15.10
(iv) What does the result tell us about the two samples of pebbles taken from the River Eea?
7. A Mann-Whitney U test
• The long axis of pebbles for 15 samples was measured at the upstream (1) and
downstream site (10) on the River Eea in Cumbria.
• Null hypothesis: There is no significant difference between pebble sizes at the
upstream and downstream site on the River Eea.
i) Use the table format below to set out the ranks in order (lowest value first). Remember
that the total sample is ranked together and not as individual columns as in the Spearman’s
Rank correlation.
Upstream x   Rank (rx)   Downstream y   Rank (ry)
15                       4
8                        8
22                       10
32                       6
16                       19
18.5                     14
34                       6
32                       13.5
19.5                     7
13.5                     5
28                       12.5
10.5                     12
13                       8.5
24.5                     6
45                       13
No. in sample (Nx) = 15             No. in sample (Ny) = 15
Total of rank scores (Σrx) =        Total of rank scores (Σry) =
(ii) Calculate the ‘U’ values for both the upstream site (x) and the downstream site (y) using
the MW formula.

$$U_x = N_x N_y + \frac{N_x (N_x + 1)}{2} - \sum r_x$$

$$U_y = N_x N_y + \frac{N_y (N_y + 1)}{2} - \sum r_y$$

(iii) In order to test the result you must now test for significance. Take the smaller U value
that is calculated and consult the significance table to decide whether or not you can reject
the null hypothesis.
8. Calculation of standard deviation (Stornoway annual precipitation)
Year   Precipitation (x)   x - x̄   (x - x̄)²
1      877
2      1082
3      1203
4      963
5      1241
6      1194
7      1072
8      900
9      1146
10     1094
11     1098
12     1318
13     791
14     1035
15     1151
∑x =
x̄ =
∑(x - x̄)² =
∑(x - x̄)² / n =
Standard deviation σ =
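The steps laid out in the table can be followed in code. The sketch below is illustrative only: the function name population_sd is invented, and it divides by n (rather than n - 1) because that is what the ∑(x - x̄)² / n step in the table indicates. The figures are the 15 Stornoway precipitation totals.

```python
import math


def population_sd(values):
    """Standard deviation following the table: mean, deviations, squares, / n, root."""
    n = len(values)
    mean = sum(values) / n                                 # x-bar
    squared_deviations = [(x - mean) ** 2 for x in values]  # (x - x-bar)² column
    variance = sum(squared_deviations) / n                  # sum of squares divided by n
    return math.sqrt(variance)


stornoway = [877, 1082, 1203, 963, 1241, 1194, 1072, 900,
             1146, 1094, 1098, 1318, 791, 1035, 1151]

total = sum(stornoway)
mean = total / len(stornoway)

print(total, round(mean, 2), round(population_sd(stornoway), 2))
```

A larger standard deviation means the yearly totals are more widely spread around the mean.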
Summarising statistics for annual precipitation (mm)
Statistic              Salina Cruz (Mexico)   Stornoway
Mean precipitation     1063.07
Median precipitation   911                    1094
Standard deviation     476.85
Upper quartile         1188
Lower quartile         701
Interquartile range    487
9. Calculation of Spearman’s Rank Correlation Coefficient between infiltration rate and
gradient
Area   Infiltration rate (millilitres per second)   Rank of slope gradient (steepest ranked 1)   Rank of infiltration rate (highest ranked 1)   Difference in rank (d)   d²
1      1                                            15
2      12                                           17
3      21                                           14
4      18                                           16
5      9                                            8
6      4                                            10
7      31                                           18
8      63                                           9
9      3                                            6
10     5                                            12
11     38                                           2
12     38                                           3
13     11                                           4
14     125                                          13
15     33                                           5
16     2                                            11
17     167                                          1
18     83                                           7
∑d² =
Spearman’s Rank correlation coefficient:

$$R_s = 1 - \frac{6 \sum d^2}{n^3 - n}$$

Rs =
Summarising Spearman’s Rank correlation coefficients between infiltration and selected
variables
Statistic                   Correlation Coefficient (Rs) with infiltration rate
% soil moisture             -0.84
% ground vegetation cover   0.74
Gradient
Infiltration rate = the amount of water in millilitres (ml) passing into the ground in 1 second.
% ground vegetation cover is the % of the ground surface covered by vegetation less than
1m in height
When n=18, R s values exceeding ± 0.40 are significant at the 0.05 (5%) level and R s values
exceeding ± 0.56 are significant at the 0.01 (1%) level.
10. Calculation of Spearman’s Rank Correlation Coefficient between particle size and
gradient on a scree slope.
Study site   Angle of slope (x)   Rank of slope gradient (steepest ranked 1)   Mean median axis (cm) (y)   Rank of mean median axis (largest ranked 1)   Difference in rank (d)   d²
1            29                                                                21
2            29                                                                16.7
3            30                                                                21
4            32                                                                16.1
5            31                                                                15.4
6            31                                                                16.9
7            31                                                                13.1
8            30                                                                15.3
9            35                                                                10.5
10           44                                                                11.5
11           40                                                                8.7
12           40                                                                7.9
13           46                                                                6.1
14           48                                                                4.2
15           45                                                                4.7
∑d² =
Spearman’s Rank correlation coefficient:

$$R_s = 1 - \frac{6 \sum d^2}{n^3 - n}$$

Rs =
When n=15, R s values exceeding ± 0.44 are significant at the 0.05 (5%) level and R s values
exceeding ± 0.62 are significant at the 0.01 (1%) level.
You could also complete a scatter graph using this data.
11. A river in northern England – discharge in cumecs (1991-2009)
Year   January   April   July    October
1991   56.68     15.23   14.04   72.61
1992   70.96     6.35    6.15    42.59
1993   86.42     25.85   6.55    72.93
1994   77.86     7.50    1.88    67.91
1995   24.45     45.05   40.87   30.57
1996   69.94     24.68   6.03    45.62
1997   33.26     27.11   31.89   70.23
1998   81.19     17.29   42.80   47.27
1999   34.51     31.16   3.60    42.68
2000   88.81     13.95   18.01   50.07
2001   49.33     34.81   10.48   33.12
2002   31.69     31.58   6.16    27.56
2003   79.70     42.05   21.00   11.98
2004   79.46     50.62   5.16    31.17
2005   93.93     12.78   7.17    25.22
2006   21.16     16.78   9.56    41.18
2007   10.00     10.00   3.60    10.20
2008   29.98     17.58   31.00   80.60
2009   99.70     32.60   32.05   46.65

Statistic            January   April   July    October
Mean                 58.9      24.37   15.68   44.75
Minimum              10        6.35    1.88    10.20
Maximum              99.70     50.62   42.80   80.60
Upper quartile       81.19     32.60   31      67.91
Lower quartile       31.69     13.95   6.03    30.57
IQR                  49.50     18.65   24.97   37.34
Standard deviation   27.57     12.59   13.10   19.96
These data have been collated as part of the process of setting up flood control schemes.
With the aid of all of the data contained above, write a brief report (approximately 300
words) to summarise the variations in discharge and to assess the reliability of mean
discharge values as a basis for river management.
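The summary statistics used in the report can be recalculated from the yearly values. The sketch below does this for the January column using Python's standard `statistics` module. Two choices here are assumptions rather than anything stated in the exercise: the quartiles are taken at the (n + 1)/4 and 3(n + 1)/4 positions (the `"exclusive"` method), and the standard deviation is the population form (`pstdev`). The function name `summarise` is illustrative.

```python
import statistics

# January discharge in cumecs, 1991-2009, copied from the table.
january = [56.68, 70.96, 86.42, 77.86, 24.45, 69.94, 33.26, 81.19, 34.51,
           88.81, 49.33, 31.69, 79.70, 79.46, 93.93, 21.16, 10.00, 29.98,
           99.70]


def summarise(values):
    """Return the summary statistics listed in the table for one month."""
    # quantiles() returns the three cut points: lower quartile, median, upper quartile.
    lower, _median, upper = statistics.quantiles(values, n=4, method="exclusive")
    return {
        "mean": statistics.mean(values),
        "minimum": min(values),
        "maximum": max(values),
        "upper_quartile": upper,
        "lower_quartile": lower,
        "iqr": upper - lower,
        "std_dev": statistics.pstdev(values),  # population standard deviation
    }


stats = summarise(january)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

The same function can be applied to the April, July and October columns. A large standard deviation or IQR relative to the mean suggests that the mean on its own is a less reliable guide to discharge in that month.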
12. Figure 1. Urban population data – variations in social and economic conditions in an
urban area in England
Indicator A: % households with NCWP born head (percentage of households where the head of the household was born in the New Commonwealth and Pakistan, NCWP)
Indicator B: % unskilled workers
Indicator C: % unemployed
Indicator D: % households with >1 person per room
Indicator E: % households without a car

Ward | A (%) | A rank | B (%) | B rank | C (%) | C rank | D (%) | D rank | E (%) | E rank | Total score (Social and economic index)
1 | 1.8 | 3 | 5.5 | 6 | 9.4 | 3 | 2.3 | 3 | 31 | 3 | 18
2 | 3.6 | 7 | 6.6 | 9 | 15.8 | 13 | 5.8 | 11 | 42 | 10 | 50
3 | 5.7 | 10 | 6 | 8 | 11.7 | 6 | 4 | 7 | 35 | 6 | 37
4 | 3.4 | 5 | 2.2 | 2 | 8.9 | 2 | 1.3 | 2 | 29 | 2 | 13
5 | 50.2 | 18 | 15.6 | 18 | 23 | 18 | 15.8 | 18 | 62 | 17 | 89
6 | 3.9 | 9 | 8.4 | 13 | 18.9 | 16 | 8.2 | 16 | 47 | 11 | 65
7 | 11 | 14 | 7.3 | 11 | 13.9 | 9 | 8 | 15 | 41 | 9 | 58
8 | 10.2 | 13 | 8.5 | 14 | 15.1 | 12 | 6.7 | 13 | 48 | 13 | 65
9 | 7.5 | 12 | 8.9 | 15 | 14.2 | 10 | 5.7 | 10 | 49 | 14 | 61
10 | 12.3 | 16 | 11.6 | 16 | 17.5 | 15 | 7.8 | 14 | 54 | 16 | 77
11 | 29.9 | 17 | 12.5 | 17 | 22 | 17 | 10 | 17 | 68 | 18 | 86
12 | 3.4 | 5 | 5.6 | 7 | 12.4 | 8 | 3.6 | 6 | 40 | 8 | 34
13 | 12 | 15 | 8.1 | 12 | 15.4 | 11 | 6.4 | 12 | 50 | 15 | 65
14 | 2.9 | 4 | 1.8 | 1 | 6.2 | 1 | 1.2 | 1 | 16 | 1 | 8
15 | 1.5 | 2 | 7.2 | 10 | 16.7 | 14 | 4.7 | 9 | 47 | 11 | 46
16 | 3.7 | 8 | 4.3 | 4 | 12.3 | 7 | 3.4 | 5 | 39 | 7 | 31
17 | 1.2 | 1 | 4.2 | 3 | 11.6 | 5 | 3 | 4 | 32 | 4 | 17
18 | 6.3 | 11 | 5.1 | 5 | 9.4 | 3 | 4.6 | 8 | 34 | 5 | 32
Exercises.
1. Draw a dispersion diagram/graph to display the range of values for the Total score
for the Social and economic index.
2. Using your completed graph divide the data into four classes.
3. Using the four classes identified in Question 2, complete Figure 2 to produce a
choropleth map to show the variation in social and economic conditions in this urban
area as indicated by the Social and economic index.
4. Select two sets of paired indicators from the data and perform two correlation tasks
– one by scatter graph and the other by Spearman’s rank to ascertain the degree of
relationship between the two sets of paired data.
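One possible way to approach Exercises 2 and 3 is to use the quartiles of the Total score as class boundaries. This is an assumption made for illustration only: the exercise leaves the choice of classes to the student, and natural breaks read off the dispersion diagram are equally valid. The sketch below, in Python, assigns each ward to one of four classes, and the name `score_class` is illustrative.

```python
import statistics

# Total score (Social and economic index) for wards 1-18, copied from Figure 1.
total_score = {
    1: 18, 2: 50, 3: 37, 4: 13, 5: 89, 6: 65, 7: 58, 8: 65, 9: 61,
    10: 77, 11: 86, 12: 34, 13: 65, 14: 8, 15: 46, 16: 31, 17: 17, 18: 32,
}

# Quartile cut points, taken at the (n + 1)/4 positions (an assumed convention).
q1, q2, q3 = statistics.quantiles(total_score.values(), n=4, method="exclusive")


def score_class(score):
    """Return class 1 (lowest scores) to 4 (highest scores) for one ward."""
    if score <= q1:
        return 1
    if score <= q2:
        return 2
    if score <= q3:
        return 3
    return 4


classes = {ward: score_class(score) for ward, score in total_score.items()}
for ward, cls in classes.items():
    print(f"Ward {ward}: score {total_score[ward]}, class {cls}")
```

Each class would then be given one shade on the choropleth map, normally running from lightest for class 1 to darkest for class 4.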
13. Worcester population data
Ward | Total ethnic minorities | % of total population | No. of people 18-30 | % of total population | Distance of centre of ward from city centre (km) | Total population
Arboretum | 587 | 10.46 | 1195 | 21.29 | 0.8 | 5612
Battenhall | 313 | 6 | 698 | 13.39 | 1.9 | 5214
Bedwardine | 323 | 4.1 | 1072 | 13.61 | 2.4 | 7876
Cathedral | 1138 | 15.26 | 1797 | 23.71 | 0.5 | 7458
Claines | 280 | 3.56 | 940 | 11.94 | 2.8 | 7875
Gorse Hill | 312 | 5.65 | 872 | 15.79 | 1.9 | 5523
Nunnery | 366 | 4.57 | 1199 | 14.97 | 1 | 8011
Rainbow Hill | 294 | 5.02 | 995 | 17.02 | 1.1 | 5845
St Clement | 212 | 3.86 | 1343 | 24.45 | 2.1 | 5493
St John | 293 | 3.65 | 1638 | 20.39 | 2.4 | 8033
St Peter Parish | 249 | 4.43 | 998 | 17.75 | 2.9 | 5622
St Stephen | 234 | 4.64 | 791 | 15.67 | 1.5 | 5048
Warndon | 203 | 3.84 | 1061 | 20.04 | 2.1 | 5294
Warndon Parish N | 258 | 4.94 | 1135 | 21.71 | 3.2 | 5229
Warndon Parish S | 329 | 6.3 | 959 | 18.35 | 2.5 | 5225
Wave Frequency data
The data show wave frequency data at Hornsea, East Riding, 17-18 January 2016
Column A: 17 January, midnight – noon
Column B: 17 January 13:30 – 18 January 01:30

A: Time | A: Waves per minute | B: Time | B: Waves per minute
00:00 | 9.1 | 13:30 | 14.0
00:30 | 9.0 | 14:00 | 14.3
01:00 | 8.7 | 14:30 | 14.3
01:30 | 8.6 | 15:00 | 15.4
02:00 | 8.8 | 15:30 | 16.2
02:30 | 8.2 | 16:00 | 16.2
03:00 | 8.2 | 16:30 | 15.8
03:30 | 8.6 | 17:00 | 15.8
04:00 | 8.5 | 17:30 | 16.2
04:30 | 8.8 | 18:00 | 15.8
05:00 | 8.8 | 18:30 | 16.2
05:30 | 9.2 | 19:00 | 15.8
06:00 | 9.7 | 19:30 | 15.0
06:30 | 9.5 | 20:00 | 15.0
07:00 | 9.5 | 20:30 | 15.0
07:30 | 9.8 | 21:00 | 14.6
08:00 | 10.0 | 21:30 | 14.0
08:30 | 10.2 | 22:00 | 14.6
09:00 | 10.2 | 22:30 | 14.0
09:30 | 10.5 | 23:00 | 15.0
10:00 | 10.9 | 23:30 | 14.6
10:30 | 10.5 | 00:00 | 15.0
11:00 | 10.3 | 00:30 | 15.8
11:30 | 10.0 | 01:00 | 15.0
12:00 | 10.0 | 01:30 | 15.8
a. Using the data, create a dispersion diagram for wave frequency for each of Columns A and B, and then calculate the range for each set of data.
b. Calculate the mean, median and mode for wave frequency in each of Columns A and B.
c. Using the dispersion diagrams, calculate the quartiles for each set of data.
d. Compare the wave data for each period.
e. Explain the likely impact of the waves in each period on Hornsea’s beaches during 17-18 January 2016.
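The calculations asked for in parts a to c can be checked with a short script. The sketch below uses Python's standard `statistics` module. The function name `describe` is illustrative, and the quartile convention (positions (n + 1)/4 and 3(n + 1)/4, the `"exclusive"` method) is an assumption, since quartiles read from a dispersion diagram may be located slightly differently. Because a data set can have more than one mode, every most frequent value is reported.

```python
import statistics

# Waves per minute at Hornsea, copied from Columns A and B.
column_a = [9.1, 9.0, 8.7, 8.6, 8.8, 8.2, 8.2, 8.6, 8.5, 8.8, 8.8, 9.2, 9.7,
            9.5, 9.5, 9.8, 10.0, 10.2, 10.2, 10.5, 10.9, 10.5, 10.3, 10.0,
            10.0]
column_b = [14.0, 14.3, 14.3, 15.4, 16.2, 16.2, 15.8, 15.8, 16.2, 15.8, 16.2,
            15.8, 15.0, 15.0, 15.0, 14.6, 14.0, 14.6, 14.0, 15.0, 14.6, 15.0,
            15.8, 15.0, 15.8]


def describe(values):
    """Return range, mean, median, mode(s) and quartiles for one column."""
    lower, median, upper = statistics.quantiles(values, n=4, method="exclusive")
    return {
        "range": round(max(values) - min(values), 1),
        "mean": round(statistics.mean(values), 2),
        "median": median,
        "modes": sorted(statistics.multimode(values)),  # all most frequent values
        "lower_quartile": lower,
        "upper_quartile": upper,
    }


summary_a = describe(column_a)
summary_b = describe(column_b)
for label, summary in (("A", summary_a), ("B", summary_b)):
    print(f"Column {label}: {summary}")
```

Comparing the two summaries side by side gives a starting point for part d, for example by contrasting the medians and the spread of each period.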
Water and carbon skills
Storm | Precipitation: storm amount (mm) | Precipitation: average intensity (mm per hr) | Precipitation: maximum intensity (mm per hr) | Prior conditions (mm) | Peak discharge (litres per second)
A | 11.8 | 3.05 | 10.16 | 57 | 1034
B | 10.7 | 2.54 | 3.56 | 70 | 694
C | 30.5 | 3.05 | 4.06 | 9 | 1019
D | 16.0 | 1.52 | 3.56 | 79 | 665
The table shows discharge data for a small river basin associated with four storm events, A to D, occurring at
different times of the year.
(a) Explain the variations in peak discharge for each of the four storm events. [8 marks]
(b) How is time-lag likely to be affected by the intensity of a storm event? [4 marks]
(c) State and comment on three factors affecting run-off volume in river basins. [6 marks]
Mark scheme