GEOGRAPHY ● AS/A2 LEVEL ● AQA (1031/2031)
Practical Skills for AS/A2 Geography at BHS
3. Statistical Skills
zigzageducation.co.uk
Contents
2.1 Measures of Central Tendency: Mean, Median and Mode
    Arithmetic Mean ($\bar{x}$)
    Median
    Mode
2.2 Measures of Dispersion
    Range
    Inter-quartile Range
    Dispersion Graphs
    Exercise 2.1
    Comparing Dispersion Graphs
    Box-and-Whisker Diagrams
    Histograms
    Exercise 2.2
    Standard Deviation (σ)
    Exercise 2.3
2.3 Correlation
    Scattergraphs (Scatter Diagrams)
    Spearman’s Rank Correlation Coefficient (rs)
2.4 Comparative Tests
    Chi-squared test (χ2)
    Mann-Whitney U test
    Exercise 2.4
2.5 Examination Questions
    Examination Assignment 2.1
    Examination Assignment 2.2
2.1 Measures of Central Tendency: Mean, Median and Mode.
Arithmetic Mean ($\bar{x}$)
The arithmetic mean, usually called ‘the mean’, is the ‘average’. It is found by adding up the values in a
data set and dividing by the number of values.
It is expressed as:
$$\bar{x} = \frac{\sum x}{n}$$
where $\bar{x}$ (bar x) = mean
$\sum$ (sigma) = the sum of
$x$ = values of the variable
$n$ = number of items in the set
Look at the population data in Figure 2.1 (column 1). To find the mean population, the values of the
individual populations are first added together to give a total population of 3,376 million ($\sum x$). This is
divided by the number of countries in the set ($n = 13$) to give a mean value of 259.69 million ($\bar{x} = 259.69$).
Exam hint:
Remember that when making calculations the answer should be given to 2 decimal places.
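The calculation of the mean can be checked with a short script. This is a minimal sketch in Python (a language choice made here, not part of the workbook); the variable names are illustrative, and the values are the 13 populations from Figure 2.1, column 1, with Afghanistan left out because it has no data.

```python
# Populations (millions) from Figure 2.1, column 1.
# Afghanistan is omitted because no population figure is given.
populations = [75, 134, 74, 27, 108, 146, 1121, 165, 1311, 186, 9, 16, 4]

total = sum(populations)   # sum of x
n = len(populations)       # number of values in the set
mean = total / n           # x-bar = (sum of x) / n

print(total, n, round(mean, 2))  # 3376 13 259.69
```

Rounding is left to the final step so that the answer is given to 2 decimal places without losing accuracy earlier in the working.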
Figure 2.1: Population data for selected countries in Africa/Asia/Latin America (2006)
Country        Population (millions) (1)   Life expectancy at birth (2)   % Urban (3)
Egypt          75                          70                             43
Nigeria        134                         44                             44
Ethiopia       74                          49                             15
Uganda         27                          47                             12
Mexico         108                         75                             75
Bangladesh     146                         61                             23
India          1,121                       63                             29
Pakistan       165                         62                             34
China          1,311                       72                             37
Brazil         186                         72                             81
Bolivia        9                           64                             63
Chile          16                          78                             87
Puerto Rico    4                           77                             94
Afghanistan    No data                     42                             22
Median
The median is the middle value or mid-point in a set of data. The simplest way to find the median is to
arrange the values in sequence from highest to lowest. This can be done by either simply listing the
numbers in a line (Figure 2.2) or by using a column dispersion diagram (see Figure 2.5). To find the middle
value (median) count the number of values. There must be an equal number of points above and below
the median. For example, with 25 values the median is the 13th value (you can count from either the
bottom or the top), with 12 values above and 12 values below the median.
Exam hint:
You might be tempted to try and find the median by observation alone. Be warned
that this frequently results in errors!
Figure 2.2: Population sizes (millions)
There are 13 values (an odd number) in the set, so the 7th is the median (with 6 values above and 6 values
below the median).
1311
1121
186
165
146
134
108   median (7th value in sequence)
75
74
27
16
9
4
With an odd number of values the median should be easy to find. Look again at the population data in
Figure 2.1 (column 1). There are 13 countries in the data set and their values (in millions) have been
arranged in rank order in Figure 2.2. As there are 13 values, the mid-point is the 7th value (there will be 6
points above and below the median), which is 108. So the median value of the 13 countries is 108 million.
With an odd number of values the position of the median can be calculated. The median is the
$\frac{n + 1}{2}$th value in the set.
So the median population in Figure 2.2 is the $\frac{13 + 1}{2}$th value (the 7th) in the set (= 108).
However, with an even number of values there is no single central value, so the median cannot be read
off as one ranked value. You will need to take the mean of the two middle values. So in a sequence of 14
values the two mid-points are the 7th and 8th values. These two values are added together and divided
by 2 (as there are two of them) to give you the median.
To find the median life expectancy for the countries listed in Figure 2.1 (column 2), the data has
again been arranged in rank order in Figure 2.3. There is an even number of values (14) and the two
middle values are 63 and 64. So the median is the mean of 63 and 64:
$$\frac{63 + 64}{2} = 63.5$$
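Both cases (odd and even numbers of values) can be handled by one small function. The sketch below is illustrative only: the function name `median` and the variable names are choices made here, and the data are columns 1 and 2 of Figure 2.1.

```python
def median(values):
    """Rank the values, then take the middle value (odd count)
    or the mean of the two middle values (even count)."""
    ranked = sorted(values, reverse=True)  # highest to lowest, as in Figures 2.2 and 2.3
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]              # the (n + 1) / 2 th value
    return (ranked[middle - 1] + ranked[middle]) / 2


populations = [75, 134, 74, 27, 108, 146, 1121, 165, 1311, 186, 9, 16, 4]
life_expectancy = [70, 44, 49, 47, 75, 61, 63, 62, 72, 72, 64, 78, 77, 42]

print(median(populations))      # 108
print(median(life_expectancy))  # 63.5
```

Sorting first matters: as the exam hint warns, picking the middle value by eye from an unranked list frequently gives the wrong answer.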
Figure 2.3: Life expectancy
There are 14 values (an even number), so the median is the mean of the two middle values (with 6 values
above and 6 values below the middle pair).
78
77
75
72
72
70
64   middle value
63   middle value
62
61
49
47
44
42
Median = $\frac{63 + 64}{2}$ = 63.5
Note: if values occur more than once (as with number 72 in Figure 2.3), list the values next to one another to make
sure they are all counted!
Mode
The mode is the value that occurs most frequently in a data set. However, with continuous data a mode
is a rare occurrence. For example, with data on population sizes there will not be a mode as no two
countries will have the same populations. More important is the grouping of data into a number of
classes, e.g. 0–4, 5–9, 10–14, 15–19. The group that occurs most frequently is called the modal class. This
can be shown in graph form using a histogram (see section 2.2). When grouping into classes aim for 5 or 6
groups and ensure that the class interval is the same!
Of the measures of central tendency, the mean is the one used most frequently. It takes all the values into
consideration and is easy to calculate. However, it can be influenced by one or two extreme values and,
therefore, on its own is of limited value. The median on its own also gives no indication of the spread of
the data (as in Figure 2.2).
2.2 Measures of Dispersion.
To give a more accurate impression of a data set it is useful to look at the dispersion of the data.
Range
This is the difference between the highest and lowest value in a data set. It is of little significance apart
from indicating the spread of the data.
Inter-quartile Range
To find the inter-quartile range, first rank the data and find the median as described above.
The Upper Quartile is the mid-point of the values above the median, and the Lower Quartile is the midpoint of the values below the median. In each case there will be two middle values and you will need to
take the mean of the values. In Figure 2.4 the upper quartile is the median of the values above 108 and
the lower quartile is the median of the values below 108. For the upper quartile the two middle values
are 186 and 165, so the upper quartile is the mean of the two values, which is 175.5. If you carry out the
same procedure for the lower quartile the result is 21.5.
The subtraction of lower quartile from the upper quartile gives the Inter-quartile Range, which is an
index of dispersion. If we divide the inter-quartile range by two we obtain the ‘Quartile Deviation’.
To help with the calculation of the upper and lower quartiles with an odd number of values, the following
formulae can be used.
Upper Quartile = $\frac{n + 1}{4}$th value (ranked from highest to lowest)
So in Figure 2.4 the Upper Quartile would be the $\frac{13 + 1}{4}$th value (= 3.5, i.e. the mean of the 3rd and 4th values)
Lower Quartile = $3 \times \frac{n + 1}{4}$th value (ranked from highest to lowest)
In Figure 2.4 the Lower Quartile would be the $3 \times \frac{13 + 1}{4}$th value (= 10.5, i.e. the mean of the 10th and 11th values)
Note: this formula cannot be used with an even number of values.
The inter-quartile range is a more useful measure than the range as it tells us how the values are
dispersed about the median (above and below it). It tells us the spread of the middle 50% of the data
above and below the median. A small inter-quartile range means that there is a narrow range of values
about the median. The large inter-quartile range in Figure 2.4 indicates a wide spread of data with regard
to the population sizes of the 13 countries.
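The quartile procedure described above (the median of the values above the median, and the median of the values below it) can be sketched in Python. The function names are illustrative, and the code assumes at least two values; for an odd number of values the median itself is left out of both halves, which is how Figure 2.4 treats the value 108.

```python
def median(ranked):
    """Median of a list that is already in rank order."""
    n = len(ranked)
    mid = n // 2
    return ranked[mid] if n % 2 else (ranked[mid - 1] + ranked[mid]) / 2


def quartiles(values):
    """Return (upper quartile, lower quartile) as the medians of the
    values above and below the median. Assumes at least two values."""
    ranked = sorted(values, reverse=True)  # highest to lowest
    half = len(ranked) // 2                # the median is excluded when the count is odd
    return median(ranked[:half]), median(ranked[-half:])


populations = [75, 134, 74, 27, 108, 146, 1121, 165, 1311, 186, 9, 16, 4]
upper, lower = quartiles(populations)
iqr = upper - lower                 # inter-quartile range
quartile_deviation = iqr / 2

print(upper, lower, iqr, quartile_deviation)  # 175.5 21.5 154.0 77.0
```

Other textbooks and software packages use slightly different quartile conventions, so results from a spreadsheet may not match this method exactly.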
Figure 2.4: The inter-quartile range of population size for selected countries
1311
1121
186
      Upper Quartile = $\frac{186 + 165}{2}$ = 175.5
165
146
134
108   median
75
74
27
      Lower Quartile = $\frac{27 + 16}{2}$ = 21.5
16
9
4
Inter-quartile range = 175.5 – 21.5 = 154
Visual Representation of Data
The dispersion of values in a data set may be appreciated better if it is presented in visual form. This can
be done using dispersion diagrams and histograms.
Dispersion Graphs
A dispersion graph shows the range of values in a data set in the form of a graph (Figure 2.5). Dispersion
graphs are visually very effective, as the full range of data can be seen together with the patterns and
groupings of the data. They are particularly useful for making comparisons, either between areas or at
the same location over a period of time.

Technique
A vertical scale is drawn and should cover the full range of data. The independent variable is
represented on the horizontal axis, although a scale may be irrelevant if only one column is used. The
values are then plotted on the graph in the form of a column using dots of uniform size. One dot
represents one value (values which are identical should be placed next to one another on the same
line). The median and quartiles can be shown using horizontal lines or arrows. The data for life
expectancy for Figure 2.1 (column 2) can be seen in Figure 2.5 below.
Figure 2.5: A dispersion graph showing life expectancy (for countries in Figure 2.1)
[Graph not reproduced. Horizontal axis: Countries.]
Exercise 2.1
Skills: statistical, graphical, mean, median, upper/lower quartile, inter-quartile range,
dispersion graphs
1. a) What is the mean population size for the countries shown in Figure 2.6 (column 1)?
..............................................................................................................................................................................
b) List the values for life expectancy shown in Figure 2.6 in rank order (Figure 2.7).
What is the median life expectancy?
..............................................................................................................................................................................
c) Plot the values for life expectancy in Figure 2.6 on a column dispersion graph (Figure 2.8)
d) On the graph (Figure 2.8) mark the median and upper and lower quartiles
e) What is the inter-quartile range for life expectancy?
..............................................................................................................................................................................
Figure 2.6: Population data for selected countries in Europe/N. America (2006)
Country          Population (millions) (1)   Life expectancy at birth (2)   % Urban (3)
UK               61                          78                             89
Belarus          10                          69                             70
Poland           38                          75                             62
Russia           142                         65                             73
Italy            59                          80                             90
Spain            46                          81                             76
Ukraine          47                          68                             68
Canada           33                          80                             79
USA              299                         78                             79
Czech Republic   10                          76                             77
Portugal         11                          78                             53
Sweden           9                           81                             84
Croatia          4                           75                             56
Lithuania        3                           72                             67
Figure 2.7: List of values for life expectancy
[Space for your ranked list.]
Figure 2.8: Dispersion graph (life expectancy)
[Blank graph for plotting. Vertical axis marked 85, 80, 70, 60, 50, 40.]
(6 marks)
2. Compare your completed graph (Figure 2.8) with the dispersion column in Figure 2.5.
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
(2 marks)
3. What problems are shown here with regard to dispersion graphs?
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
(2 marks)
(Total marks = 10)
Comparing Dispersion Graphs
Although the dispersion graph does not provide a statistical record of the spread of data, it is an
excellent visual guide. It is particularly useful when comparisons need to be made between samples. The
scatter of values is plotted for each sample, using the same scale, and the medians and upper and lower
quartiles are marked. Comparisons can be made and are based on the relative positions of the medians
but, in particular, on the positions of the upper and lower quartiles.
If you examine Figure 2.9 you can see there is no significant difference between the two sets of data, as the
lower quartile of B is between the median and lower quartile of A. But if you look at Figure 2.10 which
shows infant mortality rates in different continents, you can see that the lower quartile for Africa is
above the median for Asia but below the upper quartile. This suggests a difference between the data
which is ‘probably significant’. However, if you compare the dispersion for Africa and Asia with that of
Europe you can see that the inter-quartile ranges do not overlap. The lower quartile for Africa, and also
for Asia, lies above the upper quartile for Europe, which indicates a ‘significant difference’ between the
data.
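The visual comparison described above can be written down as a simple rule of thumb. The sketch below is one possible reading of the text, not a statistical test: the function name, the tuple layout and the example figures are all hypothetical, and the behaviour when two values are exactly equal is an assumption made here.

```python
def compare_dispersion(higher, lower):
    """Compare two samples by the positions of their quartiles.

    Each argument is a (lower_quartile, median, upper_quartile) tuple.
    `higher` must be the sample with the larger median.
    """
    lq_high = higher[0]
    median_low, uq_low = lower[1], lower[2]
    if lq_high > uq_low:
        # The inter-quartile ranges do not overlap.
        return "significant difference"
    if lq_high > median_low:
        # Lower quartile of one sample lies above the median of the other.
        return "probably significant"
    return "no significant difference"


# Hypothetical quartile summaries, for illustration only.
print(compare_dispersion((60, 80, 100), (10, 20, 30)))  # significant difference
print(compare_dispersion((25, 40, 55), (10, 20, 30)))   # probably significant
print(compare_dispersion((15, 25, 35), (10, 20, 30)))   # no significant difference
```

A rule like this only summarises what the eye sees on the graph; a comparative test is still needed to put a confidence level on the difference.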
Figure 2.9 Dispersion diagrams showing no significant difference
[Diagram not reproduced. Two dispersion columns, A and B, each marked with UQ, M and LQ.]
M = Median
UQ = Upper Quartile
LQ = Lower Quartile
Figure 2.10: Dispersion diagrams showing the infant mortality for selected countries in Africa, Asia and Europe
[Diagram not reproduced. Three dispersion columns: Africa, Asia and Europe.]
Box-and-Whisker Diagrams
The dispersion graph can easily be converted into a box-and-whisker diagram (Figure 2.11) with a slight
amendment.

Technique
1) Plot the points and mark on the median and upper and lower quartiles as in Figure 2.5.
2) Draw a further two horizontal lines, parallel with the horizontal axis, through the highest and
lowest values.
3) Draw 2 vertical lines from the upper quartile to the lower quartile to ‘box’ the data. The box
represents the inter-quartile range.
4) Draw a central vertical line from the highest to lowest value. This will result in the ‘whiskers’
which show the range of data.
Figure 2.11 A box-and-whisker diagram showing the infant mortality in Africa
[Diagram not reproduced. Labels from top to bottom: Highest value, Upper quartile, Median, Lower quartile, Lowest value.]
Graphs of frequency distributions
Histograms
Histograms are graphs that show the frequency distribution of data grouped into classes. A histogram is
an effective way of showing the distribution of values in a data set. But it can only be used where the
data is in groups or classes. Figure 2.12 shows bars or rectangles rising from the horizontal (x) axis which
is marked off into classes. The vertical (y) axis indicates the frequency of the dependent variable. Notice
that the bars in the histogram are continuous with no gaps between the bars.
Exam hint:
Students often confuse the histogram with the bar graph. But the histogram shows
frequencies and the data must be in classes. Also the bars in the histogram must be
continuous and not separated with spaces.
Figure 2.12 A typical histogram
[Histogram not reproduced. Vertical (y) axis: Frequency, 0 to 6. Horizontal (x) axis: Classes 0–20, 21–40,
41–60, 61–80, 81–100. The tallest bar is labelled as the modal class.]

Technique
Before you can draw the histogram you must decide on the number of classes and the class
intervals. There must be a fixed class interval within the range of data. You must not use intervals that
are different. Histograms should have at least five classes but there are no hard and fast rules that can
be used in deciding on the number of classes. You must look at the range of data. Some standard text
books advise students to use the formula:
Number of classes = 5 × log of the number of items in the set
But it must be stressed that this is the maximum number of classes and there is absolutely no requirement
to find such a figure! Normally 6 or 7 classes are ideal.
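As a quick check of the textbook formula, the sketch below applies it to a data set of 14 items (the number of countries in Figure 2.1, column 3). It assumes that ‘log’ means the base-10 logarithm, which the text does not state explicitly.

```python
import math

n = 14                            # number of items in the set (Figure 2.1, column 3)
max_classes = 5 * math.log10(n)   # assumes 'log' means log to base 10

print(round(max_classes, 1))  # 5.7
```

For 14 items the formula gives a maximum of between 5 and 6 classes, which is in line with the advice to use at least five classes.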
There is, however, a problem with the class interval and the boundaries between the classes, because no
value may be omitted or counted twice. So, for example, you cannot have class intervals of 1–20, 20–40,
40–60 because 20 and 40 would appear in two groups. Instead you would use the class intervals 0–19,
20–39, 40–59, etc. This is fine with discrete data, which are in whole numbers, but most data is not in
whole numbers. So if your class intervals are 0–19 and 20–39, to which class is a value of 19.5 allocated?
Is it the 0–19 class or the 20–39 class? To overcome this problem it is necessary to group the data as
0–19.9, 20–39.9, and so on, so that no value can be omitted. The upper limits of 19.9 and 39.9 are taken
to be recurring figures (19.999…, 39.999…).
Exam hint:
When grouping into classes aim for at least 5 groups and ensure that the class interval
is the same. Make sure you are plotting frequency on the vertical axis.
Look again at Figure 2.1 (column 3). The data ranges from 12% urban to 94% urban so a convenient class
range would be from 10–100%. What about the number of classes and the class interval? You could
group the data in 10s which would produce 9 classes but there is too little data for such a large number
of classes. In groups of 20 there would be too few classes; so 15 would seem to be reasonable. Starting at
10, the groups would be 10–24, 25–39, 40–54, 55–69, 70–84, 85–99. The histogram produced is shown in
Figure 2.13 below:
Figure 2.13 Histogram to show % urban populations for selected countries in Africa, Asia, and Latin
America
[Histogram not reproduced. Vertical axis: frequency, 0 to 5. Horizontal axis: % urban, in classes 10–24,
25–39, 40–54, 55–69, 70–84, 85–99.]
Notes: The modal class in Figure 2.13 is 10–24% urban with a frequency of 4.
Histograms are particularly useful for making visual comparisons between two or more sets of data but you
must ensure the scales and class interval are the same!
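The class frequencies behind a histogram like Figure 2.13 can be tallied with a few lines of Python. This is a sketch under stated assumptions: the data are the % urban values from Figure 2.1 (column 3), the classes start at 10 with an interval of 15 as chosen in the text, and each class includes its lower limit and runs up to (but not including) the next lower limit, so no value is omitted or counted twice. The variable names are illustrative.

```python
# % urban values from Figure 2.1, column 3.
urban = [43, 44, 15, 12, 75, 23, 29, 34, 37, 81, 63, 87, 94, 22]

start, width, n_classes = 10, 15, 6   # classes 10-24, 25-39, ..., 85-99

frequencies = [0] * n_classes
for value in urban:
    # Lower limit <= value < next lower limit, so 24.9 falls in 10-24 and 25 in 25-39.
    index = int((value - start) // width)
    frequencies[index] += 1

labels = [f"{start + i * width}-{start + (i + 1) * width - 1}" for i in range(n_classes)]
modal_class = labels[frequencies.index(max(frequencies))]

for label, frequency in zip(labels, frequencies):
    print(label, frequency)
print("Modal class:", modal_class)  # Modal class: 10-24
```

The tally gives frequencies of 4, 3, 2, 1, 2 and 2 for the six classes, which agrees with the modal class of 10–24% urban noted above.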
Frequency Polygon
This is similar to a histogram and uses the same vertical scale of frequency and horizontal scale of
classes. But instead of bars, a dot is plotted above the mid-point of each class at the appropriate
frequency. The dots are joined by straight lines. If the points are plotted and joined by a
smooth curve, instead of straight lines, then the result is a ‘frequency curve’. Frequency curves may be
cumulative, in which case the vertical axis shows cumulative frequency, for example a Lorenz curve.
Exercise 2.2
Skills: ICT skills, graphical skills, histograms, frequency charts
1) Copy and paste the data for Figure 2.6 into an Excel worksheet. Create a histogram using ICT to
show the percentage urban population for the selected countries in Europe/N. America.
• Label the axes correctly
• Adjust the bars to ensure that they touch and there are no gaps between them
• Divide the horizontal axis into percentage groups – 0–24, 25–39, 40–54, 55–69, 70–84, 85–99
• Give your graph a title
(5 marks)
2) Using the same scale present the same data using ICT in a frequency polygon line graph.
(5 marks)
3) Compare the two graphs you have drawn with the histogram for urban populations in Africa/Asia/Latin
America (Figure 2.13).
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
(5 marks)
4) Using Figure 2.10 and your completed histograms assess the significance of the two methods (dispersion
graph and histogram) as methods of showing dispersion of data and differences between data.
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
(5 marks)
(Total marks = 20)
Standard Deviation (σ)
This is the measure of dispersion of all values in a data set from the arithmetic mean. It is the most
common method for showing dispersion but involves more detailed calculation than the inter-quartile
range. If you don’t understand what that means, don’t worry – here’s how to work it out!

Technique
1. Calculate the arithmetic mean ($\bar{x}$) of all the values.
2. Measure how much each value differs from it by subtracting the mean from each value ($x - \bar{x}$) [values higher (+) and values lower (-)].
3. Each difference is squared: $(x - \bar{x})^2$.
4. Add together all the squared deviations, $\sum (x - \bar{x})^2$, and divide this figure by
the number ($n$) of values. This is the ‘variance’ of the distribution.
5. The standard deviation ($S$) is the square root of the variance:
$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
Note: you do not need to remember formulas at AS level – they will be provided for you.
The standard deviation is the best method for showing the extent to which the values cluster around the
mean value. A low standard deviation, for example, will indicate that the values are clustered around
the mean and there is a small spread of data. A high standard deviation will indicate that the values are
widely spread around the mean and, therefore, dispersion is large.
However, the degree of dispersion should be judged in relation to the mean value itself. If two data sets
have the same standard deviation but different means, the relative dispersion (the standard deviation as
a proportion of the mean) will be greater for the set with the lower mean.
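The five steps of the technique can be followed line by line in a short script. The sketch below uses the life expectancy values from Figure 2.1 (column 2) as example data; the variable names are illustrative, and the formula is the one given in step 5 (dividing by $n$, not by $n - 1$).

```python
import math

# Life expectancy at birth from Figure 2.1, column 2.
life_expectancy = [70, 44, 49, 47, 75, 61, 63, 62, 72, 72, 64, 78, 77, 42]

n = len(life_expectancy)
mean = sum(life_expectancy) / n                           # step 1: arithmetic mean
deviations = [x - mean for x in life_expectancy]          # step 2: x minus the mean
squared_deviations = [d ** 2 for d in deviations]         # step 3: square each difference
variance = sum(squared_deviations) / n                    # step 4: variance
sd = math.sqrt(variance)                                  # step 5: standard deviation

print(round(mean, 2), round(sd, 2))  # 62.57 12.05
```

The mean is kept unrounded throughout and only the final answers are rounded, which avoids the rounding errors that creep in when a rounded mean is used in every row of a hand calculation.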
In a normal distribution, which is symmetrical:
• about 68% of the values will lie within ±1 standard deviation of the mean
• about 95% of the values will lie within ±2 standard deviations of the mean
• about 99.7% of the values will lie within ±3 standard deviations of the mean
Exercise 2.3
Skills: statistical skills, standard deviation
1) Calculate the standard deviation of the population data shown in Figure 2.14.
Note: The mean is rarely a whole number so you will need to work to a reasonable level of accuracy; here we have
worked to 2 decimal places. To avoid introducing rounding errors, use the memory function on your calculator to
store the mean ($\bar{x}$).
Figure 2.14 Table for calculating standard deviation
Country        Population (millions) ($x$)   $x - \bar{x}$   $(x - \bar{x})^2$
Egypt          75                            -184.69         34110.39
Nigeria        134                           -125.69         15797.97
Ethiopia       74                            ........        ........
Uganda         27                            -232.69         54144.63
Mexico         108                           ........        ........
Bangladesh     146                           -113.69         12925.41
India          1121                          861.31          741854.92
Pakistan       165                           -94.69          8966.19
China          1311                          ........        ........
Brazil         186                           ........        ........
Bolivia        9                             -250.69         62845.47
Chile          16                            ........        ........
Puerto Rico    4                             -255.69         65377.37
$\sum x$ = 3376
$\bar{x}$ = 259.69
$\sum (x - \bar{x})^2$ = ........
$n$ = number of values
$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
Standard Deviation = ........
1SD = ± ........
2SD = ± ........
(closer to 0 = less deviation)
(6 marks)
2) Comment on the standard deviation figure.
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
(4 marks)
(Total marks = 10)
2.3 Correlation
Correlation is a measure of the relationship between two variables or two sets of data. It implies that
there is an association between the data, for example rainfall and altitude, but not necessarily that one
causes the other.
It is usual in correlation for one of the variables to depend on the other variable – for this reason it is
known as the dependent variable. The variable it depends on is known as the independent variable.
Correlation can also be either positive or negative.
• Which is the dependent variable in the association – rainfall or altitude?
• Would the association be positive or negative?
• Can you think of a negative association?
Correlation can be shown in different ways. It can be shown in graph form by, for example, a
scattergraph, or by a statistical technique such as Spearman’s rank correlation.
Scattergraphs (Scatter Diagrams)
Technique
This is the simplest and most visual technique to show correlation. The values for the two sets of
data are plotted as dots on a graph using a horizontal (x) axis and a vertical (y) axis. The independent
variable is placed on the horizontal axis and the dependent variable on the vertical axis (Figure 2.15).
Figure 2.15 A typical scattergraph with positive correlation
[Scattergraph: independent variable (x) on the horizontal axis and dependent variable (y) on the vertical axis, both scaled from 0 to 7]
If one of the values increases as the other increases, then it is a positive correlation (Figure 2.15). If all
these points fall in a straight line rising from left to right (Figure 2.16) it is called perfect positive
correlation.
If one value decreases as the other increases then it is a negative correlation. If all the points fall in a
straight line decreasing from left to right (Figure 2.17) it is called a perfect negative correlation.
Figure 2.16 Perfect positive correlation (coefficient = +1)
Figure 2.17 Perfect negative correlation (coefficient = –1)
The perfectly straight correlation lines (at 45°) of Figure 2.16 and Figure 2.17 rarely occur in reality and a
more likely situation is seen in Figure 2.15.
When all the points have been plotted a ‘best fit line’ (trend line) is drawn to show the trend of the data
(Figure 2.18). The closer the points are to the trend line the greater the association between the data.
Points which occur well away from the best fit line are known as residuals or anomalies.
Exam hint:
• You should construct the best fit line so that there are an equal number of points on
either side of the line. For greater accuracy, calculate the mean values of both
variables and lightly mark the point where they intersect – the best fit line should go
through this point.
• The best fit line in geography is a straight line (not a curve) and does not have to go
through zero.
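The mean point mentioned in the exam hint can be worked out numerically. The sketch below goes one step further than the text and uses a least-squares line, a calculated alternative to drawing the line by eye, which always passes through the mean point. The function name `best_fit_line` and the altitude and rainfall figures are hypothetical and are not taken from this booklet.

```python
def best_fit_line(xs, ys):
    """Least-squares straight line y = intercept + slope * x.

    The line always passes through the mean point (mean of x, mean of y).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, (mean_x, mean_y)

# Hypothetical example data (independent variable first)
altitude = [1, 2, 3, 4, 5, 6]
rainfall = [1.5, 2.0, 3.5, 3.0, 5.0, 6.0]

slope, intercept, mean_point = best_fit_line(altitude, rainfall)
print("mean point:", mean_point)
print("positive correlation" if slope > 0 else "negative or no correlation")
```

A line fitted this way will not necessarily leave exactly equal numbers of points on either side, so for hand-drawn work the rule in the exam hint still applies.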
Figure 2.18 A scattergraph with best fit line
[Scattergraph: independent variable (x) on the horizontal axis and dependent variable (y) on the vertical axis, both scaled from 0 to 7. The best fit line passes through the mean point, with an equal number of points (3) on either side of the line]
Advantages and Disadvantages of Scattergraphs
Advantages
• Scattergraphs are useful in identifying patterns and trends from the data and a good visual
impression is produced
• You can easily identify any anomalies/variations in the data – such anomalies cannot be identified
with Spearman’s rank correlation
• Easy to construct
Disadvantages
• It is sometimes difficult to insert best fit lines and to see any clear trend from the plotted data
• They are not an accurate measure of the degree of correlation
Spearman’s Rank Correlation Coefficient (rs)
This is a statistical measure of the strength of the relationship between two variables or two sets of data.
The calculated values will lie within the range of +1 to -1.
• A coefficient of +1 indicates a perfect positive correlation
• A coefficient of –1 indicates a perfect negative correlation
However, it is very rare to find a perfect correlation.
Technique
There are two parts to the correlation:
1. A coefficient is calculated to give the degree of association between two variables (this on its own is
meaningless, as it is just a figure)
2. The coefficient is then tested to determine its significance
1. Calculation of Coefficient
The Spearman’s rank correlation coefficient uses ‘ranked’ data (see Figure 2.28 on page 32) and is carried
out as follows:
a) Place the two variables in rank order (starting with the highest value as rank 1), i.e. 1, 2, 3, 4, 5.
Where two or more values of a variable are the same, sum the ranks they would occupy and divide the
total equally between them, e.g.
Ranks 2 & 3 = $\frac{2 + 3}{2} = 2.5$ (for each value)
Ranks 2 & 3 & 4 = $\frac{2 + 3 + 4}{3} = 3$ (for each value)
b) When this has been completed for both sets of data, subtract the rank for the second set of variables
from the first set to obtain the difference in each rank (d)
c) Square the differences ($d^2$)
d) Total the differences squared ($\sum d^2$)
e) Apply the following formula to determine the coefficient. The coefficient will lie
between +1 and -1
Spearman’s rank: $$ r_s = 1 - \frac{6 \sum d^2}{n^3 - n} $$
where $r_s$ = Spearman’s rank correlation coefficient
d = the difference in ranks of each matched pair
$\sum$ = sum of
n = number of paired values
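Steps a) to e) can be sketched in Python as a way of checking a hand calculation. This is an illustration under stated assumptions: the function names `rank_highest_first` and `spearman_rank` are invented for this sketch, and tied values share the mean of the ranks they occupy, as described in step a).

```python
def rank_highest_first(values):
    """Rank values with the highest as rank 1; tied values share the mean rank."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1            # first rank position taken by v
        ties = ordered.count(v)                 # how many values share this value
        ranks.append(first + (ties - 1) / 2)    # mean of the tied positions
    return ranks

def spearman_rank(xs, ys):
    """Return rs = 1 - 6 * sum(d^2) / (n^3 - n) for paired data."""
    n = len(xs)
    ranks_x = rank_highest_first(xs)
    ranks_y = rank_highest_first(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d2) / (n ** 3 - n)

# Tied values 20 and 20 occupy ranks 2 and 3, so each receives 2.5
print(rank_highest_first([10, 20, 20, 30]))
# Perfectly matching rank orders give +1; fully reversed orders give -1
print(spearman_rank([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
print(spearman_rank([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))
```

The coefficient on its own is only a figure; it still has to be tested for significance as described in the next part.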
2. Testing the Significance (or probability of chance)
i) State the hypothesis in negative terms (called the Null Hypothesis or H0), which implies there is an
absence of any relationship/association, as follows:
H0 = there is no relationship between the 2 variables
ii) State the alternative hypothesis (H1):
H1 = there is a negative/positive correlation between the two variables
iii) The Null Hypothesis will either be accepted or rejected by using a graph (Figure 2.19) or table
(Figure 2.20) to determine the amount of chance association between the 2 variables (Note – the
graph is on a log scale)
iv) Plot the point on the graph using the coefficient and the degrees of freedom (n – 2) to find the
significance level (n – 2 means subtract 2 from the number of pairs in the correlation)
v) H0 is accepted or rejected (5% is the rejection level). If rejected state the significance level:
• If between the 5% and 1% line = 5% significance level
• If between the 1% and 0.1% line = 1% significance level
• If above the 0.1% line = 0.1% significance level (highly or 99.9% significant, i.e. only 0.1%
likelihood that it is due to chance)
Figure 2.19 Graph for use in interpreting Correlation Coefficient
[Graph on a log scale: Spearman’s Rank Correlation Coefficient (0.1 to 1.0) on the vertical axis and degrees of freedom (number of pairs of items in sample – 2), from 2 to 80, on the horizontal axis. Lines mark the 5%, 1% and 0.1% significance levels, i.e. the likelihood of the correlation occurring by chance. Below the 5% line it is not possible to reject H0 as the significance levels are above 5%. Hence the 5% level of significance is known as the Rejection Level]
Critical values can also be found using the table in Figure 2.20. The value of rs for any given number of
pairs (n) must be equal to or greater than the value shown to gain the level of significance.
Figure 2.20 Critical Values of rs for Spearman’s rank correlation coefficient
Number of Pairs (n)    Levels of Significance
                       5%        1%
10                     0.65      0.78
12                     0.59      0.72
14                     0.54      0.67
16                     0.50      0.63
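Reading the table can be expressed as a simple lookup. The sketch below is illustrative only: the name `significance_of_rs` is hypothetical, the dictionary holds just the four rows of Figure 2.20, and it assumes that a negative coefficient is compared with the table by its size (ignoring the sign), which is not stated explicitly in the text.

```python
# Critical values of rs from Figure 2.20: {number of pairs: (5% value, 1% value)}
CRITICAL_RS = {
    10: (0.65, 0.78),
    12: (0.59, 0.72),
    14: (0.54, 0.67),
    16: (0.50, 0.63),
}

def significance_of_rs(rs, n):
    """Return the significance level reached, or None if H0 cannot be rejected."""
    five_percent, one_percent = CRITICAL_RS[n]
    strength = abs(rs)  # assumption: the sign only shows direction
    if strength >= one_percent:
        return "1%"
    if strength >= five_percent:
        return "5%"
    return None

print(significance_of_rs(0.70, 10))   # between the 5% and 1% critical values
print(significance_of_rs(-0.80, 10))  # beyond the 1% critical value
print(significance_of_rs(0.40, 16))   # below the 5% value, H0 not rejected
```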
2.4 Comparative Tests
Chi-squared test (χ2)
The chi-squared (χ2) test is a significance test that examines the difference between a set of collected
data (called observed data) and a theoretical set of data (called expected data). It may also be used to find
whether there is a difference between two sets of observed data. So the test could be used to examine the
angularity/size of river bedload in different stages of a river’s course or pedestrian/traffic flows at
different times of the day.
However, before you use this test you should understand that there are certain conditions that need to
be met:
a) The data must be in the form of ‘frequencies’ counted in a number of different categories (percentages
cannot be used). If the data has been obtained by measurement then it must be grouped into different
classes or categories before it can be used. For example, if data was collected on the size of particles
making up the bedload of a river (measured along the long axis), the particle sizes would need to be
grouped into classes such as: below 15mm, 16–30mm, over 30mm.
b) The total number of observed data must be greater than 20 for the test to have any meaning. The
expected frequency in any one cell should not normally be less than 5.
c) Only one set of collected data is needed for this test. If two sets of collected data are used they should
be independent – one must not be dependent on the other.
Let’s examine how this test could be used in practice with regard to hydrological studies examining
pebble roundness.
Technique (Worked Example – with one set of observed values)
Forty pebbles were collected in the field, using random sampling (see section 4.1), in the upper course of
a river in order to investigate the effect of the river’s course on pebble roundness, using a simple
classification based on observation.
1) State the hypotheses you are going to use – firstly state the null hypothesis (H0) and then the
alternative hypothesis (H1). The null hypothesis implies there is no difference between the observed
and expected data. The alternative states there is a difference between the observed and expected
data.
H0: the upper course of a river will have no effect on pebble roundness
H1: the upper course of a river will have an effect on pebble roundness
2) Make a contingency table (Figure 2.21) into which you can insert the observed data (O) and expected
(theoretical) data (E). Each box in the table is called a ‘cell’. The table shows the actual number of
pebbles collected of various shapes (O) and the expected number (E). Of the 40 pebbles examined, 14
were angular, 16 sub-angular, 7 sub-rounded and 3 rounded. In theory you would expect an
even distribution of pebbles of different roundness, so the expected value (E) is the mean value, i.e. 10
for each shape category (40 divided by 4).
Note: the contingency table may be composed of any number of columns and rows. However, with more columns
there is a greater likelihood that one or more cells in the expected data will fail to achieve the minimum value of 5.
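Figure 2.21 Contingency table for calculating χ2

                          Angular    Sub angular    Sub rounded    Rounded
Observed (O)              ......     ......         ......         ......
Expected (E)              ......     ......         ......         ......
$O - E$                   ......     ......         ......         ......
$(O - E)^2$               ......     ......         ......         ......
$\frac{(O - E)^2}{E}$     ......     ......         ......         ......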
3) Complete the table (Figure 2.22) by calculating the chi-squared value as follows:
• In each category subtract the expected data (E) from the observed data (O) to obtain $O - E$ in each
cell.
• $O - E$ is then squared, $(O - E)^2$, in each cell and then divided by the expected data (E) to give
$\frac{(O - E)^2}{E}$
• The figures obtained in the cells are added together to find the chi-squared value using the
formula:
$$ \chi^2 = \sum \frac{(O - E)^2}{E} $$
Figure 2.22 Calculation of χ2

                          Angular    Sub angular    Sub rounded    Rounded
Observed (O)              14         16             7              3
Expected (E)              10         10             10             10
$O - E$                   4          6              -3             -7
$(O - E)^2$               16         36             9              49
$\frac{(O - E)^2}{E}$     1.6        3.6            0.9            4.9

$$ \chi^2 = \sum \frac{(O - E)^2}{E} = 11.0 $$
4) Interpret the χ2 figure using a ‘Table of critical values’ (Figure 2.23). The values in the table show
levels of probability and degrees of freedom. The correct degrees of freedom are found by subtracting
1 from the number of categories in the set. As there are four categories of pebble shape, the degrees
of freedom are 4 – 1 = 3.
Now check this figure against the level of probability (p). p = 5% (0.05) means that there are only 5
chances in 100 that the result could be due to chance; p = 0.1% is the most stringent level shown and
means there is only 1 chance in 1000 that the result could be due to chance. The critical values given
must be equalled or exceeded at the relevant degrees of freedom to achieve the given level of probability.
It can be seen here that the chi-squared figure of 11.0 is below the critical value at the 1% level (which is
11.35) but above the critical value at the 5% level (7.82). Therefore, it is significant at the 5% level: the
H0 hypothesis can be rejected and H1 accepted.
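The one-sample worked example can be reproduced in a few lines of Python. This is a sketch for checking the arithmetic only; the variable names are assumptions of the sketch, and the critical values are the three figures for 3 degrees of freedom copied from Figure 2.23.

```python
# Observed pebble counts from the worked example (upper course, 40 pebbles)
observed = {"Angular": 14, "Sub angular": 16, "Sub rounded": 7, "Rounded": 3}

total = sum(observed.values())
expected = total / len(observed)  # even distribution: 40 / 4 = 10 per category

# Chi-squared: sum of (O - E)^2 / E over all categories
chi_squared = sum((o - expected) ** 2 / expected for o in observed.values())
degrees_of_freedom = len(observed) - 1

# Critical values for 3 degrees of freedom, copied from Figure 2.23
critical = {"5%": 7.82, "1%": 11.35, "0.1%": 16.27}
levels_reached = [level for level, value in critical.items() if chi_squared >= value]

print(f"chi-squared = {chi_squared:.1f}, degrees of freedom = {degrees_of_freedom}")
print("significant at:", levels_reached)
```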
Figure 2.23 Table of critical values for χ2

Degrees of     Levels of probability (p)
freedom        5% (0.05)    1% (0.01)    0.1% (0.001)
1              3.84         6.64         10.83
2              5.99         9.21         13.82
3              7.82         11.35        16.27
4              9.49         13.28        18.47
5              11.07        15.09        20.52
6              12.59        16.81        22.46
7              14.07        18.48        24.32
8              15.51        20.09        26.13
9              16.92        21.67        27.88
10             18.31        23.21        29.59
Worked Example (with 2 sets of observed data)
Let us examine pebble roundness with two sets of data collected in both the upper and middle course of
a river.
1) Again state the hypotheses you are going to use – first the null hypothesis (H0) and then the
alternative hypothesis (H1)
H0: there is no difference in pebble roundness in the upper and middle course of a river
H1: there is a difference in pebble roundness in the upper and middle course of a river
2) Make a contingency table and insert the observed data (O) for both the upper course and middle
course of the river. The table shows the actual number of pebbles collected of various shapes (O).
There were 14 angular pebbles, 16 sub-angular ones, 7 were sub-rounded and 3 rounded in the upper
course. In the middle course, 38 pebbles were measured – 2 angular, 8 sub-angular, 18 sub-rounded
and 10 rounded. Add up the rows and columns and complete the totals (Figure 2.24).
Figure 2.24 Observed Frequencies (O)

                 Angular    Sub angular    Sub rounded    Rounded    Total
Upper course     14         16             7              3          40 (row)
Middle course    2          8              18             10         38 (row)
Total            16         24             25             13         78 (grand total)
3) Using the formula below, work out the expected frequencies (E) for each cell (Figure 2.25). This is the
number of pebbles of each roundness category you would expect to find at each location if there were
no difference between the locations. Round to one decimal place.
$$ E = \frac{\text{cell row total} \times \text{cell column total}}{\text{grand total}} $$
Figure 2.25 Expected Frequencies (E)

                 Angular                  Sub angular               Sub rounded               Rounded                  Total
Upper course     (40 × 16) / 78 = 8.2     (40 × 24) / 78 = 12.3     (40 × 25) / 78 = 12.8     (40 × 13) / 78 = 6.7     40 (row)
Middle course    (38 × 16) / 78 = 7.8     (38 × 24) / 78 = 11.7     (38 × 25) / 78 = 12.2     (38 × 13) / 78 = 6.3     38 (row)
Total            16                       24                        25                        13                       78 (grand total)
4) Work out the χ2 value for each cell using the formula:
$$ \chi^2 = \sum \frac{(O - E)^2}{E} $$
Total all the values in the cells to find the calculated χ2 value.
χ2 value = 20.3
Figure 2.26: χ2 values

                 Angular                    Sub angular                  Sub rounded                  Rounded                    Total
Upper course     (14 − 8.2)² / 8.2 = 4.1    (16 − 12.3)² / 12.3 = 1.1    (7 − 12.8)² / 12.8 = 2.6     (3 − 6.7)² / 6.7 = 2.0     9.8
Middle course    (2 − 7.8)² / 7.8 = 4.3     (8 − 11.7)² / 11.7 = 1.2     (18 − 12.2)² / 12.2 = 2.8    (10 − 6.3)² / 6.3 = 2.2    10.5
Total            8.4                        2.3                          5.4                          4.2                        20.3 (total)
5) Interpret the chi-squared value of 20.3 using a ‘Table of critical values’ (Figure 2.23). The values in the
table show levels of probability and degrees of freedom (V). Use the following formula to find the
degrees of freedom:
$$ V = (r - 1) \times (c - 1) $$
where r = number of rows in the contingency table
c = number of columns in the contingency table
V = degrees of freedom
Degrees of freedom (V) = (2 – 1) × (4 – 1) = 1 × 3 = 3
It can be seen in Figure 2.23 that, with 3 degrees of freedom, the chi-squared value of 20.3 is above the
0.1% level of probability (which is 16.27). Therefore, it is significant at 0.1% level and the H0
hypothesis can be rejected (at 0.1% there is only 1 chance in a 1000 the result could be due to chance)
and the H1 is accepted. Therefore, there is a difference in pebble roundness in the upper and middle
course of a river.
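The two-sample calculation follows the same pattern and can be sketched as below. The variable names are assumptions of this sketch. Note that the code keeps the expected frequencies unrounded, so its χ2 total comes out a little below the 20.3 obtained in the text, where every cell is rounded to one decimal place before adding; the conclusion is unchanged.

```python
# Observed frequencies from Figure 2.24
# Columns: angular, sub angular, sub rounded, rounded
observed = [
    [14, 16, 7, 3],    # upper course
    [2, 8, 18, 10],    # middle course
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency for each cell: row total x column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-squared: sum of (O - E)^2 / E over every cell
chi_squared = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

# Degrees of freedom: (rows - 1) x (columns - 1)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print([round(e, 1) for e in expected[0]])
print([round(e, 1) for e in expected[1]])
print(f"chi-squared = {chi_squared:.1f}, degrees of freedom = {degrees_of_freedom}")
```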
Mann-Whitney U test
The Mann-Whitney U test can be used if you wish to test for significant differences between two
independent sets of data. It tells us whether there is a significant difference statistically between the
median values of the two sets of data, although it is not necessary to calculate the median values. The
value of the test lies in the simplicity of its calculation. Also it can be used in a wide variety of situations,
even when there are small samples in the data. For example, it could be used to compare food prices in
local shops with supermarkets, to compare numbers of species of vegetation in a sand dune transect, to
compare traffic or pedestrian flows in different locations in the CBD. Its usefulness stems from the
following:
a) The data used can be either at the interval or ordinal level (i.e. an order of magnitude), as long as it
can be arranged in a rank order. For example, it could be used to measure the intensity of colour in a
soil sample, by describing it as light brown, mid-brown, dark brown, black.
b) The data can be counted or calculated (a t-test is used for measured data, such as river velocity) and
the samples can be of different sizes.
c) It can be used for small samples of data (unlike many statistical tests), provided that both samples
have at least one measurement and one of them has at least 5 measurements.
d) The data does not have to come from a population with a normal distribution.
Technique
1) State the null hypothesis (H0) and the alternative hypothesis (H1). The null hypothesis implies
there is no difference between the two sets of data. The alternative states there is a difference
between the two sets of data.
2) Arrange the data in a rank order sequence (lowest to highest), with the identity of each group
retained, usually called sample A and B. If the samples are of different sizes the smaller sample is
usually designated as sample A (nA).
3) Examine the rank order and, for each measurement in sample B, count how many values in
sample A are smaller and record the counts in a table. If a measurement in sample A is the same as the
measurement in sample B, this counts as 0.5. The sum of these counts is called UA.
4) Repeat the procedure for sample A by counting how many values in sample B are smaller than each
value in sample A. But note this figure (UB) can be found from a formula:
UB = nA × nB – UA where n = number in the sample
The smaller of the two values UA or UB is the value taken as the U value.
5) Refer to the table of critical values of U at the 5% significance level (Figure 2.27). Read the sample
sizes from the top and left hand sides of the table to find the critical value.
6) If the U value is less than or equal to the critical value, the null hypothesis can be rejected, i.e. there is
a significant difference between the two samples at the 5% significance level.
Figure 2.27: Critical values of U at the 5% level
Columns: size of the smallest sample (n1). Rows: size of the largest sample (n2).

n2    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
 1    –   –   –   –   –   –   –   –   –   –   –   –   –   –   –   –   –   –   –   –
 2    –   –   –   –   –   –   –   0   0   0   0   1   1   1   1   1   2   2   2   2
 3    –   –   –   –   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7   8
 4    –   –   –   0   1   2   3   4   4   5   6   7   8   9  10  11  11  12  13  13
 5    –   –   0   1   2   3   5   6   7   8   9  11  12  13  14  15  17  18  19  20
 6    –   –   1   2   3   5   6   8  10  11  13  14  16  17  19  21  22  24  25  27
 7    –   –   1   3   5   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34
 8    –   0   2   4   6   8  10  13  15  17  19  22  24  26  29  31  34  36  38  41
 9    –   0   2   4   7  10  12  15  17  20  23  26  28  31  34  37  39  42  45  48
10    –   0   3   5   8  11  14  17  20  23  26  29  33  36  39  42  45  48  52  55
11    –   0   3   6   9  13  16  19  23  26  30  33  37  40  44  47  51  55  58  62
12    –   1   4   7  11  14  18  22  26  29  33  37  41  45  49  53  57  61  65  69
13    –   1   4   8  12  16  20  24  28  33  37  41  45  50  54  59  63  67  72  76
14    –   1   5   9  13  17  22  26  31  36  40  45  50  55  59  64  67  74  78  83
15    –   1   5  10  14  19  24  29  34  39  44  49  54  59  64  70  75  80  85  90
16    –   1   6  11  15  21  26  31  37  42  47  53  59  64  70  75  81  86  92  98
17    –   2   6  11  17  22  28  34  39  45  51  57  63  67  75  81  87  93  99 105
18    –   2   7  12  18  24  30  36  42  48  55  61  67  74  80  86  93  99 106 112
19    –   2   7  13  19  25  32  38  45  52  58  65  72  78  85  92  99 106 113 119
20    –   2   8  13  20  27  34  41  48  55  62  69  76  83  90  98 105 112 119 127

Dashes indicate no decision possible at the stated level of significance
Worked example of Mann-Whitney U test
A study was carried out on a sand dune ecosystem in Lancashire in order to find the differences in
numbers of vegetation species in a succession at 50m and 300m from the shoreline. A 50m tape was laid
down parallel to the shoreline at 50m and 300m inland and points were selected randomly along the
tape, using a table of random numbers. At each randomly selected point a quadrat was laid down and
the number of species of vegetation counted. The numbers of species recorded were as follows:
At 300m inland – 3, 4, 6, 6, 7, 7
At 50m inland – 0, 1, 1, 2, 3, 4, 5
1) State the hypotheses you are going to use – first the null hypothesis (H0) and then the alternative
hypothesis (H1)
H0 – there is no significant difference in the number of species of vegetation at 50m and 300m inland
from the shoreline.
H1 – there is a significant difference in the number of species of vegetation at 50m and 300m inland
from the shoreline.
2) The data was arranged in a rank order sequence (lowest to highest), with the identity of each group
retained, called samples A and B. The smaller sample at 300m is designated as sample A (nA).
Sample A (300m inland): 3, 4, 6, 6, 7, 7
Sample B (50m inland): 0, 1, 1, 2, 3, 4, 5
3) Examine the rank order and, for each measurement in sample B, count how many values in sample A
are smaller and record them. If any measurement in sample A is the same as the one in sample B, this
counts as 0.5 (this would apply to the recordings of 3 and 4). The recording of 4 in sample B scores 1.5
because sample A has one measurement which is smaller and one which is the same. The sum of these
scores is called UA.
Measurements in Sample B: 0, 1, 1, 2, 3, 4, 5
Number of smaller measurements in Sample A: 0, 0, 0, 0, 0.5, 1.5, 2
Therefore, UA = sum of the scores of sample B = 4
4) Repeat the procedure for sample A by counting how many values in sample B are smaller than each
value in sample A.
Recordings at Sample A: 3, 4, 6, 6, 7, 7
Number of smaller measurements at Sample B: 4.5, 5.5, 7, 7, 7, 7
Therefore, UB = sum of the scores of sample A = 38
Note: there is no need to list all the data in this sample because it can be calculated using the following formula:
UB = nA × nB – UA where n = number in the sample
UB = (6 x 7) – 4 = 38
The smaller of the two values UA or UB is the value taken as the U value.
U value is UA = 4
5) Refer to the table of critical values of U at the 5% significance level (Figure 2.27). Read the sample
sizes from the top and left-hand side of the table to find the critical value. If the U value is less than or
equal to the critical value, the null hypothesis can be rejected at the 5% significance level.
The U value of 4 is indeed less than the critical value of 6. Therefore, the null hypothesis can be
rejected at the 5% significance level. We can accept the alternative hypothesis that there is a
significant difference in the number of species of vegetation at 50m and 300m inland from the
shoreline.
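The counting procedure in this worked example can be sketched in Python to confirm the values of UA and UB. The function name `u_count` is an assumption of this sketch, and the critical value of 6 is simply the figure quoted above from Figure 2.27 for samples of 6 and 7.

```python
def u_count(sample, other):
    """For each value in `sample`, count the values in `other` that are smaller.

    A value in `other` that is the same as the value in `sample` counts as 0.5.
    """
    total = 0.0
    for value in sample:
        for candidate in other:
            if candidate < value:
                total += 1
            elif candidate == value:
                total += 0.5
    return total

sample_a = [3, 4, 6, 6, 7, 7]       # 300m inland
sample_b = [0, 1, 1, 2, 3, 4, 5]    # 50m inland

u_a = u_count(sample_b, sample_a)               # scores of sample B against A
u_b = len(sample_a) * len(sample_b) - u_a       # UB = nA x nB - UA
u_value = min(u_a, u_b)

critical_value = 6  # from Figure 2.27 for sample sizes 6 and 7
reject_h0 = u_value <= critical_value

print(f"UA = {u_a}, UB = {u_b}, U = {u_value}, reject H0: {reject_h0}")
```

Counting directly in the other direction, `u_count(sample_a, sample_b)`, gives the same UB as the formula, which is a useful check on the arithmetic.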
Exercise 2.4 – A2 Skills: Statistical skills, Mann Whitney U test
The following data on traffic flow was recorded in the centre and at the edge of the CBD over 5 minute periods at
different times of the day.
Sample        1      2      3      4      5      6      7      8
CBD centre    220    150    162    110    62     85     46     102
CBD edge      143    88     56     97     42     40     63     88
1) State (a) the null hypothesis and (b) the alternative hypothesis.
a) ................................................................................................................................................................................
................................................................................................................................................................................
b)................................................................................................................................................................................
................................................................................................................................................................................
(2 marks)
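2) Carry out the Mann-Whitney U test to determine whether there is a significant difference in the results.
Sample A    ......................................................................    Total UA ......
Sample B    ......................................................................    Total UB ......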
(6 marks)
3) Comment on the significance of the difference between the two sets of results.
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
(2 marks)
(Total marks = 10)
2.5 Examination Questions
Examination Assignment 2.1
(Skills: graphical skills, scattergraphs, best fit lines, handling data)
1) What is the meaning of the term infant mortality rate?
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
(2 marks)
2) Using the data in Figure 2.28 (page 32) construct a scattergraph (Figure 2.27, next page) to show the
relationship between per capita GDP (gross domestic product) and infant mortality rate.
(5 marks)
Exam hint:
Make sure you label the axes and place the independent and dependent variables on the correct axes.
3) Draw a best fit line to show the trend of the data. Circle any anomalies.
(2 marks)
4) Explain how you drew the best fit line.
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
(3 marks)
5) What conclusions can you draw from the completed scattergraph?
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
(4 marks)
Figure 2.27 A scattergraph to show the relationship between per capita GDP and infant mortality rate
6) Describe the advantages of using the scattergraph as a means of analysing data.
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
(4 marks)
7) Describe the factors that influence infant mortality rates in countries at different stages of development
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
(5 marks)
(Total marks = 25)
Examination Assignment 2.2
(Skills: statistical skills, Spearman’s rank correlation, handling statistical data, significance levels, drawing
conclusions from results)
1) Complete the Spearman’s rank correlation table (Figure 2.28)
(6 marks)
Figure 2.28 Infant Mortality/Per Capita GDP* (2007)

Country        per capita GDP (US$)    rank      Infant mortality rate    rank      d         $d^2$
Canada         38200                   1         4.6                      11        -10       100
Belarus        10000                   ......    6.6                      ......    ......    ......
Sweden         36900                   2         2.8                      14        -12       144
N. Zealand     27300                   6         5.7                      9         -3        9
France         33800                   4         3.4                      13        -9        81
U.K.           35300                   3         5.0                      10        -7        49
India          27000                   7         34.6                     1         6         36
Spain          33700                   5         4.3                      12        -7        49
Poland         16200                   8         7.1                      7         1         1
Philippines    3300                    ......    22.1                     ......    ......    ......
Egypt          5400                    ......    29.5                     ......    ......    ......
Romania        10000                   ......    24.6                     ......    ......    ......
Brazil         9700                    ......    27.6                     ......    ......    ......
Russia         14600                   ......    11.1                     ......    ......    ......

$\sum d^2$ = ......
*per capita GDP (or Gross Domestic Product) means the average income per person
2) Use the following formula to calculate the Spearman’s rank correlation coefficient (rs) between per capita GDP
and infant mortality.
Spearman’s rank: $$ r_s = 1 - \frac{6 \sum d^2}{n^3 - n} $$
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
(2 marks)
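As a way of checking a hand calculation, the formula in question 2 can be sketched in Python. This is a minimal illustration, assuming both variables have already been ranked. The function name spearman_rs and the two sets of five ranks are hypothetical and are not the values from Figure 2.28.

```python
def spearman_rs(ranks_x, ranks_y):
    """Spearman's rank coefficient: 1 - 6 * sum(d^2) / (n^3 - n)."""
    if len(ranks_x) != len(ranks_y):
        raise ValueError("both lists of ranks must have the same length")
    n = len(ranks_x)
    # d is the difference between the two ranks for each item.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d2) / (n ** 3 - n)


# Hypothetical ranks for five places (assumption for illustration only).
ranks_a = [1, 2, 3, 4, 5]
ranks_b = [2, 1, 4, 3, 5]          # similar order: strong positive correlation
ranks_c = [5, 4, 3, 2, 1]          # exactly reversed order

print(round(spearman_rs(ranks_a, ranks_b), 3))  # 0.8
print(round(spearman_rs(ranks_a, ranks_c), 3))  # -1.0
```

In the first case the sum of d² is 4, so the coefficient is 1 - 24/120 = 0.8. In the second it is 40, giving 1 - 240/120 = -1, the strongest possible negative correlation.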
3) State the Null Hypothesis (H0)
...................................................................................................................................................................................
...................................................................................................................................................................................
(1 mark)
4) Using the correlation graph (Figure 2.28), give the level of significance of your results. Can you accept or reject
the Null Hypothesis (H0)?
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
(3 marks)
5) What conclusions can be drawn from your results and what are the reasons for them?
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
(7 marks)
Exam hint:
Remember that correlation can be either positive or negative. Both are equally valid as long as the
trend is clear or the coefficient is significant. The coefficient can only vary between +1 and –1 (if your
coefficient falls outside this range, go back and check your calculations!).
6) Assess the strengths and weaknesses of the Spearman’s rank correlation test for analysing data.
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
...................................................................................................................................................................................
(6 marks)
(Total marks = 25)