GCSE Geographical skills

3.4.1 Cartographic skills

Cartographic skills relating to a variety of maps at different scales.

Atlas maps:
• use and understand coordinates – latitude and longitude
• recognise and describe distributions and patterns of both human and physical features
• maps based on global and other scales may be used and students may be asked to identify and describe significant features of the physical and human landscape on them, eg population distribution, population movements, transport networks, settlement layout, relief and drainage
• analyse the inter-relationship between physical and human factors on maps and establish associations between observed patterns on thematic maps.

Ordnance Survey maps:
• use and interpret OS maps at a range of scales, including 1:50 000 and 1:25 000 and other maps appropriate to the topic
• use and understand coordinates – four and six-figure grid references
• use and understand scale, distance and direction – measure straight and curved line distances using a variety of scales
• use and understand gradient, contour and spot height
• numerical and statistical information
• identify basic landscape features and describe their characteristics from map evidence
• identify major relief features on maps and relate cross-sectional drawings to relief features
• draw inferences about the physical and human landscape by interpretation of map evidence, including patterns of relief, drainage, settlement, communication and land-use
• interpret cross sections and transects of physical and human landscapes
• describe the physical features as they are shown on large scale maps of two of the following landscapes – coastlines, fluvial and glacial landscapes
• infer human activity from map evidence, including tourism.
Maps in association with photographs:
• be able to compare maps
• sketch maps: draw, label, understand and interpret
• photographs: use and interpret ground, aerial and satellite photographs
• describe human and physical landscapes (landforms, natural vegetation, land-use and settlement) and geographical phenomena from photographs
• draw sketches from photographs
• label and annotate diagrams, maps, graphs, sketches and photographs.

3.4.2 Graphical skills

Graphical skills to:
• select and construct appropriate graphs and charts to present data, using appropriate scales – line charts, bar charts, pie charts, pictograms, histograms with equal class intervals, divided bar, scattergraphs, and population pyramids
• suggest an appropriate form of graphical representation for the data provided
• complete a variety of graphs and maps – choropleth, isoline, dot maps, desire lines, proportional symbols and flow lines
• use and understand gradient, contour and value on isoline maps
• plot information on graphs when axes and scales are provided
• interpret and extract information from different types of maps, graphs and charts, including population pyramids, choropleth maps, flow-line maps, dispersion graphs.

3.4.3 Numerical skills

Numerical skills to:
• demonstrate an understanding of number, area and scales, and the quantitative relationships between units
• design fieldwork data collection sheets and collect data with an understanding of accuracy, sample size and procedures, control groups and reliability
• understand and correctly use proportion and ratio, magnitude and frequency
• draw informed conclusions from numerical data.
3.4.4 Statistical skills

Statistical skills to:
• use appropriate measures of central tendency, spread and cumulative frequency (median, mean, range, quartiles and inter-quartile range, mode and modal class)
• calculate percentage increase or decrease and understand the use of percentiles
• describe relationships in bivariate data: sketch trend lines through scatter plots, draw estimated lines of best fit, make predictions, interpolate and extrapolate trends
• be able to identify weaknesses in selective statistical presentation of data.

3.4.5 Use of qualitative and quantitative data

Use of qualitative and quantitative data from both primary and secondary sources to obtain, illustrate, communicate, interpret, analyse and evaluate geographical information. Examples of types of data:
• maps
• fieldwork data
• geo-spatial data presented in a geographical information system (GIS) framework
• satellite imagery
• written and digital sources
• visual and graphical sources
• numerical and statistical information.

3.4.6 Formulate enquiry and argument

Students should demonstrate the ability to:
• identify questions and sequences of enquiry
• write descriptively, analytically and critically
• communicate their ideas effectively
• develop an extended written argument
• draw well-evidenced and informed conclusions about geographical questions and issues.

Comparing development using single and composite measures

• Know the different ways of ranking the level of development of countries
• Understand how the Human Development Index is calculated and correlates with other measures
• Be able to produce and interpret a scatter graph showing the correlation between different development measures

1. The table below shows some of the ways of measuring the level of development for five countries. After familiarising yourself with these development measures, your first task is to rank them (the first two HDI rankings have already been completed for you).

2. Briefly describe how the rankings vary for the countries shown.

| Measure | UK data | UK rank | China data | China rank | Australia data | Australia rank | Saudi Arabia data | Saudi Arabia rank |
|---|---|---|---|---|---|---|---|---|
| Human Development Index (a composite measure that reflects a country’s economic and social development) | 0.892 | 2 | 0.719 | | 0.933 | 1 | 0.836 | |
| Gross National Product per capita (a country’s earnings per person in US dollars; data are adjusted to reflect living costs) | 40,000 | | 13,000 | | 46,000 | | 52,000 | |
| Global Corruption Index (a widely used measure that gives high scores to countries believed to be free of political corruption) | 78 | | 36 | | 80 | | 49 | |

3. Study the diagram below, which shows how the Human Development Index is calculated. Based on this information and your own knowledge, can you suggest reasons for the HDI scores shown in the table above?

[Diagram: the three components that feed into the HDI]
• Life expectancy: the number of years that a person can be expected to live from birth. Women generally live longest in any society.
• Income: the HDI uses a measure of wealth derived from the Gross National Income of a country (an alternative measure to GDP).
• Education: the HDI uses an education index based on the average number of years of schooling for people in a country.

The Human Development Index (HDI) is a composite measure of development. Three ‘ingredients’ are processed to produce a number between 0 and 1. In 2014, Norway was ranked in first place (0.944).

4. Your next task is to complete the scatter graph below using the data provided in the table. Label each country (Indonesia has already been plotted and labelled for you). Do not draw a line.
[Scatter graph: How life expectancy varies in relation to GDP per capita (2014). Vertical axis: life expectancy (0 to 90 years). Horizontal axis: GDP per capita ($US, 0 to 90,000). Singapore and Sierra Leone are labelled.]

| Country | GDP per capita (US dollars) | Life expectancy (years) |
|---|---|---|
| Singapore | 83,000 | 83 |
| Kuwait | 71,000 | 78 |
| Norway | 67,000 | 82 |
| USA | 55,000 | 79 |
| Ireland | 49,000 | 81 |
| Canada | 44,000 | 82 |
| France | 40,000 | 82 |
| Spain | 34,000 | 82 |
| Mexico | 18,000 | 76 |
| Brazil | 16,000 | 74 |
| China | 13,000 | 75 |
| Indonesia | 10,500 | 69 |
| Nigeria | 6,000 | 54 |
| Pakistan | 5,000 | 65 |
| Kenya | 3,000 | 61 |
| Sierra Leone | 1,500 | 46 |

5. In pairs, discuss what the completed scatter graph shows you about the nature of the correlation between GDP per capita and life expectancy.

6. As part of the discussion, try to answer the following question: why might it be inappropriate to draw a straight best-fit line on your completed scatter graph?

7. Working on your own, now write a brief paragraph describing the relationship shown by the scatter graph.

8. Finally, discuss the following questions in small groups.
i. How sure can we be that the data we have used are reliable?
ii. Why might it be difficult or impossible to collect development data for some countries or regions?

Using proportional flow-line maps to visualise trade patterns and flows

• Describe the main characteristics of the pattern of international trade for a country or region
• Understand proportional flow-lines and be able to draw them
• Be able to calculate a country’s net balance of trade using flow-line maps

The tables below show some of the world’s highest value trade flows in natural resources.

1. Your first task is to describe the characteristics of these flows. Remember that this involves more than just producing a list of numbers. You could calculate the range of values as part of your answer, for instance.

2. In addition to trade in natural resources, what other types of trade are there?
The four largest bilateral trade flows for agricultural produce (2012)

| Type of trade | Exporting country or region | Importing country or region | Value (US$ billion) |
|---|---|---|---|
| Soybeans | United States | China | 11.6 |
| Soybeans | Brazil | China | 7.9 |
| Soybeans | Brazil | European Union | 5.9 |
| Wine and beer | European Union | United States | 5.0 |

The four largest bilateral trade flows for metals and ores (2012)

| Type of trade | Exporting country or region | Importing country or region | Value (US$ billion) |
|---|---|---|---|
| Iron ore | Australia | China | 30.5 |
| Copper | Chile | China | 14.9 |
| Gold | Switzerland | India | 14.6 |
| Iron ore | Brazil | China | 13.6 |

3. Next, draw proportional flow-lines on the world map (below) to show the trade pattern for China. An example has already been drawn for you showing trade in copper between Chile and China. The width of the arrow is directly proportional to the value of the flow (measured in US$). Use an atlas to help you locate the countries shown in the table and label them.

4. Calculate the total value of Chinese imports shown on your completed flow-line map.

[World map: China and Chile are labelled, with an example trade flow-line drawn between them. Key: trade flow-lines (width shows value).]

5. The graph below shows the pattern of trade imports and exports in manufactured goods for China in 2013. A different map projection has been used. Can you identify the location of the world’s major continents? What are the advantages of using this map projection?

6. Using the data shown, calculate the net balance of trade (the difference between exports and imports) for:
(i) trade between China and the USA in 2013
(ii) trade between China and Japan in 2013.

7. What does this tell you about China’s involvement in world trade in 2013?

8. Try to suggest answers for the following questions.
(i) Why is a manufacturing nation like China also importing manufactured goods from other places?
(ii) What does China’s need to import some manufactured goods tell you about its level of development?

Using climate data to construct an argument

Suggest how the data shown can be used to:

1. support the view that the UK’s climate is changing

2. reject the view that the UK’s climate is changing

Describing and analysing a complicated data set

Describe the variations shown in the table.
• For each column, say if it varies a lot or a little.
• Identify the maximum and minimum value in each case (you might even want to subtract the two to find what the range of the data is).
• If the maximum value is unusually high, point this out (see Sudan).
• Finally, look for patterns horizontally as well as vertically. Is there a country which is highest- or lowest-scoring in most categories?
Ordnance Survey map skills

Ordnance Survey maps:
• use and interpret OS maps at a range of scales, including 1:50 000 and 1:25 000 and other maps appropriate to the topic
• use and understand coordinates – four and six-figure grid references
• use and understand scale, distance and direction – measure straight and curved line distances using a variety of scales
• use and understand gradient, contour and spot height
• numerical and statistical information
• identify basic landscape features and describe their characteristics from map evidence
• identify major relief features on maps and relate cross-sectional drawings to relief features
• draw inferences about the physical and human landscape by interpretation of map evidence, including patterns of relief, drainage, settlement, communication and land-use
• interpret cross sections and transects of physical and human landscapes
• describe the physical features as they are shown on large scale maps of two of the following landscapes – coastlines, fluvial and glacial landscapes
• infer human activity from map evidence, including tourism.

Testing relationships between two sets of data

Spearman’s Rank Correlation

1. Complete the following table using steps A to E. Note: where two or more numbers tie, rank them equally (e.g. 2=) but remember to skip a number when dealing with the next rank down.

A. Rank median household income from ‘1’ (highest) to ‘20’ (lowest).
B. Rank Level 4 qualifications from ‘1’ (highest) to ‘20’ (lowest).
C. Subtract the first rank figure from the second to find the difference ‘d’.
D. Square the difference in the previous column (to get rid of +/-).
E. Add the d² values together and write the total in the TOTAL (∑d²) row.

| Ward | Median household income (£), 2012-13 | Rank order of median household income | % Level 4 qualifications or above (2011) | Rank order of Level 4 qualifications or above | Difference in ranks (d) | d² |
|---|---|---|---|---|---|---|
| 1. Beckton | 34 100 | | 27.9 | | | |
| 2. Boleyn | 31 630 | | 28.8 | | | |
| 3. Canning Town North | 28 910 | | 23.9 | | | |
| 4. Canning Town South | 32 870 | | 33.3 | | | |
| 5. Custom House | 31 840 | | 24.2 | | | |
| 6. East Ham Central | 32 380 | | 32.6 | | | |
| 7. East Ham North | 32 340 | | 31.1 | | | |
| 8. East Ham South | 31 430 | | 24.1 | | | |
| 9. Forest Gate North | 34 270 | | 33.0 | | | |
| 10. Forest Gate South | 34 550 | | 32.7 | | | |
| 11. Green Street East | 31 570 | | 30.0 | | | |
| 12. Green Street West | 31 080 | | 29.8 | | | |
| 13. Little Ilford | 29 730 | | 24.6 | | | |
| 14. Manor Park | 31 580 | | 30.2 | | | |
| 15. Plaistow North | 30 740 | | 26.8 | | | |
| 16. Plaistow South | 31 750 | | 26.2 | | | |
| 17. Royal Docks | 38 580 | | 42.7 | | | |
| 18. Stratford and New Town | 35 840 | | 40.6 | | | |
| 19. Wall End | 32 330 | | 29.6 | | | |
| 20. West Ham | 32 310 | | 29.8 | | | |
| TOTAL (∑d²) | | | | | | |

2. Use the following formula to work out the Spearman’s Rank Index (r), which will tell you what relationship exists between median household income and Level 4 qualifications.

Formula for working out Spearman’s Rank:

$$r = 1 - \frac{6 \sum d^2}{n^3 - n}$$

‘r’ is the value you are trying to find out
‘∑d²’ is the sum total of all the d² values from the table above
‘n’ is the number of wards you are dealing with

3. What does your value tell you about the relationship between median household income and Level 4 qualifications? Is this a strong relationship?

4. In a sequence of full sentences, write up what you have found out about the relationship – or correlation – between median household income and Level 4 qualifications as follows:
a) What were you trying to find out?
b) What data did you use?
c) What did you do to test the relationship between median household income and Level 4 qualifications?
d) How does Spearman’s Rank test a relationship between data differently from a scatter graph? Why?
e) What is the difference between the value that Spearman’s Rank gives you for ♦ a positive, ♦ a negative, and ♦ no (or random) relationship between data? Draw sketches of scatter graphs to show what each of these looks like.
f) Describe what you had to do and the stages you had to go through to carry out the Spearman’s Rank correlation.
g) In conclusion, how strong – according to your Spearman’s Rank calculation ‘r’ – is the statistical relationship between median household income and Level 4 qualifications? Suggest reasons for this.

Statistical Skills (AS and A2)

David Redfern, former AQA Chief Examiner – reproduced with kind permission

• measures of central tendency – mean, mode, median
• measures of dispersion – interquartile range and standard deviation
• Spearman’s rank correlation test
• application of significance level in inferential statistical results
• higher level tests – Gini coefficient, chi-squared, Mann Whitney U Test.

MEASURING CENTRAL TENDENCY – MEAN, MODE AND MEDIAN

Central tendency is a measure of the ‘middle’ value of the data set. There are 3 ways of measuring central tendency:
• Mean
• Mode
• Median

These techniques are very useful to geographers, enabling us to summarise a data set by giving the mid-value or most frequently occurring data. They can also be used as part of more complex techniques such as inter-quartile range.

Mean

Mean formula:

$$\bar{x} = \frac{\sum x}{n}$$

The mean (sometimes called the average) is calculated by adding up all the values in a data set and dividing the total sum by the number of values in the data set. The mean is particularly useful if the data has a small range. However, if the range is large then the mean will be heavily influenced by the extreme values and could give a distorted picture.

Mode

This is the value that occurs most frequently in a set of data. You need to know all values before calculating the mode. Mode is of no use if there are no repeating values. There may be more than one mode; a data set with two modes is called ‘bi-modal’. Mode is often useful when classifying data, eg Power’s index of roundness of pebbles in a river study. It is useful to see which classification occurs most frequently. This is called the ‘modal class’.

Median

This is the middle value in a data set. The data needs to be rank-ordered before you can calculate the median.

Median formula

If there is an odd number of values, perform the following calculation to work out the position of the median value:

$$\frac{n + 1}{2}$$

(n = number of values in the data set)

Therefore if you have 23 values in the data set the median will be the 12th value in the rank order. If the number of values is even, the median is the mean of the middle two values. So if there are 24 values, add the values for the 12th and 13th positions and divide by 2.

The median value often needs to be supported by other techniques such as inter-quartile range (IQR). However, unlike the mean, it is not affected by extreme values.

Worked example:

The aim of the study is to study coastal processes. As part of this aim, the student has collected data on pebble sizes from two sites along the beach. 15 pebbles were measured, at each site, along the a-axis. The results are recorded in the table below in rank-order:

Table 1. Pebble Size (a-axis measured in mm)

| Rank | North end of beach | South end of beach |
|---|---|---|
| 1 | 58 | 81 |
| 2 | 43 | 76 |
| 3 | 38 | 67 |
| 4 | 33 | 67 |
| 5 | 32 | 67 |
| 6 | 25 | 66 |
| 7 | 24 | 63 |
| 8 | 24 | 60 |
| 9 | 23 | 58 |
| 10 | 19 | 47 |
| 11 | 19 | 38 |
| 12 | 19 | 33 |
| 13 | 14 | 25 |
| 14 | 12 | 8 |
| 15 | 11 | 6 |

Central Tendency calculations:

| | Mean | Mode | Median |
|---|---|---|---|
| North End | 26.3 | 19 | 24 |
| South End | 50.8 | 67 | 60 |

None of these measures gives an accurate picture of the distribution of data. On their own they are of limited value. However, it can be seen from the above example that some judgments can be made. In all 3 measures the central tendency for pebble size is larger at the southern end of the beach. The average pebble size is much larger at the southern end.
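The central tendency figures above can be checked with a short Python sketch. It is only an illustration: the variable and function names are assumptions made for this example, and the data are the pebble sizes copied from Table 1. It relies on the standard library `statistics` module (`multimode` needs Python 3.8 or later).

```python
from statistics import mean, median, multimode

# Pebble sizes (a-axis, mm) copied from Table 1.
north = [58, 43, 38, 33, 32, 25, 24, 24, 23, 19, 19, 19, 14, 12, 11]
south = [81, 76, 67, 67, 67, 66, 63, 60, 58, 47, 38, 33, 25, 8, 6]


def central_tendency(values):
    """Return (mean rounded to 1 d.p., list of modes, median)."""
    # multimode returns a list because a data set can have more than one mode.
    return round(mean(values), 1), multimode(values), median(values)


north_summary = central_tendency(north)
south_summary = central_tendency(south)

print("North end (mean, modes, median):", north_summary)
print("South end (mean, modes, median):", south_summary)
```

Returning the modes as a list is a deliberate choice: a bi-modal data set would simply show two values rather than hiding one of them.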
The spread of data is quite large, particularly at the southern end, and the extreme low values of 8 and 6 are making the mean value lower than the other two values.

To improve the usefulness of the above calculations, measures of the dispersion or variability of the data should also be calculated.

Measuring Dispersion – Range and inter-quartile range

These techniques are used to measure the spread of data. Range and inter-quartile range allow you to analyse your data in more depth, looking at how spread out the data is around the mean or median.

Range

This is simply the difference between the highest value and the lowest value. It gives you a basic idea of the spread of data but, like the mean, it is affected by extreme values. An anomaly can therefore give a false picture.

Referring back to Table 1, the range is worked out as follows:

Northern end of beach: Highest Value = 58, Lowest Value = 11. Range: 58 – 11 = 47
Southern end of beach: Highest Value = 81, Lowest Value = 6. Range: 81 – 6 = 75

Therefore we can see that the southern end has a much larger range. However, this result is affected by the anomalies of 8 and 6.

Inter-quartile Range

The inter-quartile range is worked out by ranking the data (highest to lowest) and placing the data into quarters or ‘Quartiles’. The top 25% of the data is placed in the Upper Quartile (UQ) and the bottom 25% is placed in the Lower Quartile (LQ). The inter-quartile range or IQR is the difference between the 25% and 75% values.

With the ranked data divided into the 1st, 2nd, 3rd and 4th Quartiles:
• the boundary between the 1st and 2nd Quartiles is called the Upper Quartile
• the boundary between the 3rd and 4th Quartiles is called the Lower Quartile
• the spread between these two boundaries is the Inter-quartile Range.

The inter-quartile range is more useful than the range in indicating the spread of data, as it takes away any extreme values (i.e. those occurring in the 1st Quartile above the UQ and in the 4th Quartile below the LQ) and considers the spread of the middle 50% of the data around the median or middle value.
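As a check on the range figures above, and on the quartile positions used in the handout's inter-quartile range formula, here is a minimal Python sketch. It assumes the handout's own convention: data ranked from highest to lowest, the Upper Quartile at the (n + 1)/4 th position and the Lower Quartile at the 3(n + 1)/4 th position. The function names are invented for this example, and the sketch deliberately refuses sample sizes where those positions are not whole numbers, because the handout does not say how to handle them.

```python
# Pebble sizes (a-axis, mm) copied from Table 1.
north = [58, 43, 38, 33, 32, 25, 24, 24, 23, 19, 19, 19, 14, 12, 11]
south = [81, 76, 67, 67, 67, 66, 63, 60, 58, 47, 38, 33, 25, 8, 6]


def value_range(values):
    """Difference between the highest and the lowest value."""
    return max(values) - min(values)


def quartiles_by_position(values):
    """Return (UQ, LQ, IQR) using the (n + 1)/4 position rule.

    Assumption: data are ranked from highest to lowest, so the UQ is
    near the top of the ranking and the LQ near the bottom.
    """
    ranked = sorted(values, reverse=True)
    n = len(ranked)
    uq_position = (n + 1) / 4
    lq_position = 3 * (n + 1) / 4
    if not uq_position.is_integer():
        raise ValueError("sketch only handles n where (n + 1)/4 is a whole number")
    upper = ranked[int(uq_position) - 1]  # positions count from 1
    lower = ranked[int(lq_position) - 1]
    return upper, lower, upper - lower


north_range, south_range = value_range(north), value_range(south)
north_iqr = quartiles_by_position(north)
south_iqr = quartiles_by_position(south)

print("Range:", north_range, south_range)
print("North (UQ, LQ, IQR):", north_iqr)
print("South (UQ, LQ, IQR):", south_iqr)
```

Statistical software often interpolates between positions when computing quartiles, so results from other tools may differ slightly from this position-based method.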
The IQR has a formula to work it out. Look at the worked example to see how to calculate the IQR.

Inter-quartile Range Formula

$$\text{Upper Quartile (UQ)} = \frac{n + 1}{4}\text{th position}$$

$$\text{Lower Quartile (LQ)} = \frac{3(n + 1)}{4}\text{th position}$$

$$\text{Inter-quartile Range (IQR)} = UQ - LQ$$

Worked example of Inter-quartile Range

Refer back to Table 1. The data is already in rank order, ranked from highest to lowest.

• Next find the Upper Quartile (UQ) by using the formula. In this case the number in the data set is 15, so n = 15. Therefore, using the formula: (15 + 1) / 4 = 4th position. So the Upper Quartile for the North End = 33 and for the South End it is 67.
• Now calculate the Lower Quartile (LQ): 3 × (15 + 1) / 4 = 12th position. The LQ for the North End = 19 and for the South End it is 33.
• You are now able to determine the Inter-quartile Range using the formula UQ – LQ. The IQR for the North End is 33 – 19 = 14. The IQR for the South End is 67 – 33 = 34.

This shows us that, with the extreme values removed, the variation around the median values is smaller than the range suggested. There is less variation at the northern end, perhaps suggesting that there is less variation in pebble size there, whereas the pebbles are less well sorted at the southern end. However, this would need further investigation. In terms of a comparison between the two sites, it would suggest that the pebbles are smaller and more uniform in size at the northern end.

Measuring Dispersion – Standard Deviation

Standard deviation is a measure of the degree of dispersion. The inter-quartile range will tell you how clustered the data is around the median value; standard deviation is another method of examining the spread of data, but this time around the mean.

Formula for Standard Deviation

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

Where:
σ = Standard Deviation
∑ = Sum of
x̄ = Mean
n = Number in the sample

Two sets of data could have the same mean but have a very different spread of data. Standard Deviation will tell you the extent of this - in other words how reliable the mean is.
A low standard deviation indicates that the data points tend to be very close to the mean, whereas a high standard deviation indicates that the data is spread out over a large range of values and the mean is less reliable, as there is a lot of variation in the sample.

Standard deviation is very useful when you want to compare the dispersion of two or more sets of data.

The standard deviation links the data set to the normal distribution. In a normal distribution:
• 68% of the values lie within ±1 standard deviation of the mean
• 95% of the values lie within ±2 standard deviations of the mean
• about 99.7% of the values lie within ±3 standard deviations of the mean

Refer back to our data in Table 1. In the example below the standard deviation has been calculated for the northern end of the beach.

Table 2: Standard Deviation calculation for the northern end of beach

| A: Pebble size (mm) | B: x – x̄ | C: (x – x̄)² |
|---|---|---|
| 58 | 31.7 | 1004.89 |
| 43 | 16.7 | 278.89 |
| 38 | 11.7 | 136.89 |
| 33 | 6.7 | 44.89 |
| 32 | 5.7 | 32.49 |
| 25 | -1.3 | 1.69 |
| 24 | -2.3 | 5.29 |
| 24 | -2.3 | 5.29 |
| 23 | -3.3 | 10.89 |
| 19 | -7.3 | 53.29 |
| 19 | -7.3 | 53.29 |
| 19 | -7.3 | 53.29 |
| 14 | -12.3 | 151.29 |
| 12 | -14.3 | 204.49 |
| 11 | -15.3 | 234.09 |
| ∑x = 394 | x̄ = 26.3 | ∑(x – x̄)² = 2270.95 |

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{2270.95}{15}} = 12.30$$

Standard Deviation for the northern end = 12.30

How do you use the standard deviation?

In the example above the mean is 26.3. This means that in a normal distribution 68% of the data should lie between 14.0 mm and 38.6 mm. These values are calculated by subtracting 12.3 from, and adding 12.3 to, 26.3. Remember: the lower the standard deviation score, the more clustered the results are around the mean and the more reliable the mean is.

For this figure to be of any use, we now need to compare it to the standard deviation for the southern end of the beach. Repeat the above exercise on the results from the southern end.
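The Table 2 calculation can be reproduced, and repeated for the southern end, with a short Python sketch. This is an illustration under stated assumptions: it divides by n, as Table 2 does, rather than by n – 1, and it uses the unrounded mean, so intermediate values differ slightly from the table while the final figures agree to the precision shown. The function names are invented for this example.

```python
import math

# Pebble sizes (a-axis, mm) copied from Table 1.
north = [58, 43, 38, 33, 32, 25, 24, 24, 23, 19, 19, 19, 14, 12, 11]
south = [81, 76, 67, 67, 67, 66, 63, 60, 58, 47, 38, 33, 25, 8, 6]


def population_sd(values):
    """Standard deviation dividing by n, matching the Table 2 method."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]  # column C
    return math.sqrt(sum(squared_deviations) / n)


def one_sd_band(values):
    """Return (mean - sd, mean + sd): the band expected to hold about 68%
    of the values if the data were normally distributed."""
    mean = sum(values) / len(values)
    sd = population_sd(values)
    return mean - sd, mean + sd


north_sd = population_sd(north)
south_sd = population_sd(south)

print("North SD:", round(north_sd, 2), "band:", [round(v, 1) for v in one_sd_band(north)])
print("South SD:", round(south_sd, 2), "band:", [round(v, 1) for v in one_sd_band(south)])
```

The standard library function `statistics.pstdev` performs the same divide-by-n calculation and could be used instead of the hand-written function.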
Standard Deviation for the southern end = 22.88

This means that 68% of the data should lie between 27.9 mm and 73.7 mm. Standard deviation suggests that there is more clustering around the mean at the northern end of the beach, as the figure is smaller. The mean is therefore more reliable at the northern end. This is also supported by the other measures of dispersion (range and inter-quartile range), which have indicated that there is less dispersion at the northern end of the beach.

Testing Correlation – Using Spearman’s Rank Correlation Test

Spearman’s rank is another way of testing a relationship. For example, if you draw a scattergraph you can see by eye if there is a relationship, but you will probably not be able to clearly assess the strength of the relationship, as many points may be some distance from the line of best-fit.

Spearman’s rank correlation is used to test the strength of the relationship between two sets of data, providing you with a numerical value. This is an example of objective data. Once you have this figure you can then test its significance – this means the likelihood of your results occurring by chance.

When can you use Spearman’s Rank?

The test can be used with any set of raw data or percentages but it is only suitable if all the following criteria apply:
• Two data sets which you believe may or may not be related, e.g. Hydraulic Radius and Velocity
• At least 10 pairs of data should be used
• No more than 30 pairs (as this makes the exercise unwieldy)

A worked example is shown below. Once you have completed the table and have your answer at the end of the calculation you should have a figure between -1 and +1. This indicates the strength and type of relationship:
• The closer it is to +1, the stronger the positive relationship (i.e. as one set of data increases so does the other).
• The closer it is to -1, the stronger the negative relationship (i.e. as one set of data increases the other decreases).
• If the result is close to 0 it means there is no relationship and you would accept the null hypothesis.

Candidates will not be expected to learn the formula for Spearman’s Rank or any other statistical test.

Strengths:
• It gives objective data
• It enables you to demonstrate a clear relationship between two sets of data
• You can state whether the relationship is significant or if your results were just a fluke
• It is less sensitive to anomalies in data as each piece of data is ranked – large differences could only be one rank different

Weaknesses:
• It does not tell you whether there is a causal link (i.e. that a change in one leads to a change in the other) – just that a relationship exists
• Too many ‘tied ranks’ can affect the validity of the test
• It could be subject to human error, eg inaccurate calculations

Case Study: Using Spearman’s Rank Correlation Test

Investigating plant succession on a shingle ridge

A student has collected data on the changing abiotic factors across a shingle ridge. They want to complete a statistical test to help test their hypothesis: ‘Plant height increases as soil depth increases’. As there are 2 variables (plant height and soil depth) and they have 12 pairs of data, they have decided to use Spearman’s rank correlation test.

Spearman’s Rank should always begin with the assumption that there is no relationship – this is called the NULL HYPOTHESIS. Always begin your test by writing out the null hypothesis as well as your chosen hypothesis.

Null hypothesis = There is no relationship between plant height and soil depth

Table 3 – raw data

| Site No. | Soil Depth (cm) | Plant Height (cm) |
|---|---|---|
| 1 | 0.0 | 4.0 |
| 2 | 3.2 | 1.5 |
| 3 | 3.6 | 6.0 |
| 4 | 1.9 | 11.5 |
| 5 | 10.1 | 22.0 |
| 6 | 15.2 | 65.0 |
| 7 | 20.2 | 92.0 |
| 8 | 23.8 | 103.0 |
| 9 | 32.0 | 129.0 |
| 10 | 32.0 | 187.4 |
| 11 | 34.1 | 156.6 |
| 12 | 37.4 | 189.3 |

Table 4: Worked example of Spearman’s rank

| Site number | Soil depth (A) | Plant height (B) | Rank A | Rank B | Rank difference (d) | Difference squared (d²) |
|---|---|---|---|---|---|---|
| 1 | 0.0 | 4.0 | 12 | 11 | 1 | 1 |
| 2 | 3.2 | 1.5 | 10 | 12 | -2 | 4 |
| 3 | 3.6 | 6.0 | 9 | 10 | -1 | 1 |
| 4 | 1.9 | 11.5 | 11 | 9 | 2 | 4 |
| 5 | 10.1 | 22.0 | 8 | 8 | 0 | 0 |
| 6 | 15.2 | 65.0 | 7 | 7 | 0 | 0 |
| 7 | 20.2 | 92.0 | 6 | 6 | 0 | 0 |
| 8 | 23.8 | 103.0 | 5 | 5 | 0 | 0 |
| 9 | 32.0 | 129.0 | 3.5 | 4 | -0.5 | 0.25 |
| 10 | 32.0 | 187.4 | 3.5 | 2 | 1.5 | 2.25 |
| 11 | 34.1 | 156.6 | 2 | 3 | -1 | 1 |
| 12 | 37.4 | 189.3 | 1 | 1 | 0 | 0 |
| | | | | | ∑d² = | 13.5 |
| | | | | | 6∑d² = | 81 |

You are now able to use the Spearman’s rank correlation test formula:

$$R_s = 1 - \frac{6 \sum d^2}{n^3 - n}$$

With ∑d² = 13.5 and n = 12, the final calculation is:

$$R_s = 1 - \frac{81}{12^3 - 12} = 1 - \frac{81}{1716} = 1 - 0.047 = 0.953$$

Therefore Rs = 0.953

So what does your result of 0.953 mean? Place your result on a line like the one below:

-1 (perfect NEGATIVE relationship) ...... 0 (NO relationship) ...... +1 (perfect POSITIVE relationship)

You now need to do another test to check whether your result could have occurred by chance – this means ‘how significant is your result’. To do this you have to compare your result with a table of critical values (Table 5).

First look at the number of pairs of data you have – in this case there are 12.

You then need to decide which significance level you are going to use. For geographical purposes, you would usually use the 0.05 significance level. This means that there is a 5 in 100 chance of the results occurring by chance. Or, to put it another way, you can be 95% confident that the relationship is not just the result of chance.

You then need to see whether your Rs result is above the critical value for the number of pairs you have. If your Rs value is below the critical value you must accept the null hypothesis – i.e. you cannot be sure that your relationship is significant.
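The ranking and calculation steps in Table 4 can be sketched in Python as follows. This is a minimal illustration, not part of the original handout: the function names are invented, ranks run from 1 for the highest value (as in Table 4), tied values share the average of the positions they occupy, and the critical value 0.506 is the 0.05 figure for n = 12 given in the handout's table of critical values.

```python
# Raw data copied from Table 3 (sites 1 to 12).
soil_depth = [0.0, 3.2, 3.6, 1.9, 10.1, 15.2, 20.2, 23.8, 32.0, 32.0, 34.1, 37.4]
plant_height = [4.0, 1.5, 6.0, 11.5, 22.0, 65.0, 92.0, 103.0, 129.0, 187.4, 156.6, 189.3]


def rank_descending(values):
    """Rank values from 1 (highest); ties get the average of their positions."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for value in values:
        first_position = ordered.index(value) + 1  # first place this value holds
        ties = ordered.count(value)                # how many share the value
        ranks.append(first_position + (ties - 1) / 2)
    return ranks


def spearman_rs(data_a, data_b):
    """Return (sum of d squared, Rs) using Rs = 1 - 6*sum(d^2) / (n^3 - n)."""
    rank_a = rank_descending(data_a)
    rank_b = rank_descending(data_b)
    sum_d2 = sum((ra - rb) ** 2 for ra, rb in zip(rank_a, rank_b))
    n = len(data_a)
    return sum_d2, 1 - (6 * sum_d2) / (n ** 3 - n)


sum_d2, rs = spearman_rs(soil_depth, plant_height)

# Critical value for n = 12 at the 0.05 level, as listed in Table 5.
critical_value_005 = 0.506
significant = rs > critical_value_005

print("Sum of d squared:", sum_d2)
print("Rs:", round(rs, 3))
print("Above 0.05 critical value:", significant)
```

Because the result is well above the critical value, the sketch reaches the same decision as the handout: the null hypothesis is rejected.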
Table 5: Critical Values for Rs n 10 12 14 16 18 20 22 24 26 28 30 0.05 (95%) significance level +/- 0.564 0.506 0.456 0.425 0.399 0.377 0.359 0.343 0.329 0.317 0.306 0.01 (99%) Significance level +/- 0.746 0.712 0.645 0.601 0.564 0.534 0.508 0.485 0.465 0.448 0.432 We can see that in our example the Rs value of 0.951 is well above the 0.05 significance level of +0.506. In this case it is also well above the 0.1 significance level of +0.712. This means that there is a very low (1 in 100) chance of the results occurring by chance and we can reject the null hypothesis The Rs value was 0.951 and is above the critical value of 0.712 at the 0.01 significance level. I can therefore reject my null hypothesis and accept that there is a strong relationship between plant height and soil depth on the shingle ridge and it is highly significant. 9 The Gini coefficient The most widely used summary measure of inequality in the distribution of household income is the Gini coefficient. The lower its value, the more equally household income is distributed. For example, the bottom 5% of households might only have a 1% share of total household income. The bottom 10% of households might have a 3% share; the bottom 20% might have an 8% share, and so on. The Gini coefficient is a measure of the overall extent to which these groupings of households, from the bottom of the income distribution upwards, receive less than an equal share of income. How is it calculated? The idea described above is expressed more formally by the Lorenz curve of the household income distribution, from which the Gini coefficient can be calculated. Based on a ranking of households in order of ascending income, the Lorenz curve is a plot of the cumulative share of household income against the cumulative share of households. The curve will lie somewhere between two extremes: • • complete equality, where income is shared equally among all households, results in a Lorenz curve represented by a straight line. 
• complete inequality, where only one household has all the income and the rest have none, is represented by a Lorenz curve which comprises the horizontal axis and the right-hand vertical axis.

The Gini coefficient is the area between the Lorenz curve of the income distribution and the diagonal line of complete equality, expressed as a proportion of the triangular area between the curves of complete equality and inequality. Complete equality would result in a Gini coefficient of zero, and complete inequality a Gini coefficient of 1 (or 100 when expressed as a percentage).

A Lorenz curve illustrates the degree of unevenness in a geographical distribution. It is drawn on graph paper and makes use of cumulative percentage data. The vertical axis carries the cumulative data, and points are plotted in the order of the largest, which is then added to the second largest, then to the third largest, and so on. The horizontal axis simply records the cumulative process. The plots are then connected by a line. If another line is drawn onto the graph to represent an even distribution, then the degree of unevenness can be seen. The greater the deviation the plotted line has from the line of even distribution, the greater the degree of unevenness. A highly concave Lorenz curve represents a high level of unevenness, and therefore a high level of concentration.

Assume a country has 10 regions A to J:

Region   % of population in region
A        10.5
B        1.6
C        12.2
D        1.8
E        35.3
F        6.7
G        2.1
H        25.3
I        3.5
J        1.0

Draw the Lorenz curve.

How is the Gini coefficient used?

A global interpretation of the Gini coefficient is:
• Low inequality: under 0.299 [e.g. Belarus and Hungary]
• Relatively low inequality: 0.3 – 0.399 [e.g. China and Poland]
• Relatively high inequality: 0.4 – 0.449 [e.g. Russia and Malaysia]
• High inequality: 0.45 – 0.499 [e.g. Venezuela and Mexico]
• Very high inequality: 0.5 – 0.599 [e.g.
Nigeria and Kenya]
• Extremely high inequality: over 0.6 [all of the countries in this group are in southern Africa]

A city interpretation using the Gini coefficient has also been produced by the UN. In it Johannesburg has one of the highest figures in the world. In 2010, the Gini coefficient for the city was calculated at 0.62.

The Mann Whitney U-Test

The Mann Whitney U Test is a technique that tests to see if there is a difference between the medians of two sets of data. It is non-parametric. This means it does not assume that the data are normally distributed. However, it does assume that there is similar dispersion of both sets of data.

When can you use the Mann Whitney U Test?

The test can be used if you want to investigate the differences between two sets of similar data. Some conditions apply:
• You can only compare two sets of data
• The data must be ordinal (it can be ranked in order - from lowest to highest)
• You need a minimum of 5 values in each data set
• It is not advisable to use more than 20 values in each data set as the exercise becomes unwieldy

The Mann-Whitney U test starts with a null hypothesis:

There is no significant difference in the medians of the two sets of data

Once you have calculated the value of U, you then have to compare it to the critical values. If the value of U is less than or equal to the critical value then you can reject the null hypothesis and accept that there is a difference in the two sets of data.
Strengths:
• You can use two data sets that have different sizes, eg one data set could have 10 values and the other only 8
• You can state whether the difference is significant or if your results occurred by chance
• You can see clearly whether there is a difference in the median of two sets of data

Weaknesses:
• It is a lengthy calculation and prone to human error
• It does not explain why the difference in the two data sets occurs

Worked example of Mann-Whitney U Test

In this example a student is investigating the economic deprivation across the city of Plymouth. As part of his investigation he has used the National Statistics website (www.neighbourhood.statistics.gov.uk) to obtain secondary data on indices of deprivation. From his investigation he has formulated the hypothesis:

There is a greater income deprivation score for inner city areas than outer suburbs.

He has found out the income deprivation score for 8 super output areas (small census unit areas created by the National Statistics Office) in each of his two study areas – one in the inner city (St Peter and The Waterfront), and the other in the outer suburbs (Plymstock). The table below shows his results. The income deprivation score measures the % of people who are income-deprived. The higher the score, the more income-deprived the area is.

Income Deprivation Score

Inner City (St Peter & The Waterfront)   Outer-Suburb (Plymstock)
0.53                                     0.04
0.40                                     0.09
0.24                                     0.06
0.25                                     0.12
0.08                                     0.05
0.12                                     0.08
0.16                                     0.10
0.20                                     0.13

The student decides to conduct the Mann-Whitney U Test to see whether there is a difference between the two areas.
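The ranking and U calculation for the two sets of scores above can be sketched in Python. This is a minimal sketch rather than the handout's own method of working: the function names joint_ranks and mann_whitney_u are invented here, all 16 scores are assumed to be ranked together from lowest (rank 1) to highest with tied scores sharing the mean rank, and the critical value of 13 is the figure this worked example quotes for two samples of 8.

```python
def joint_ranks(x, y):
    """Rank both samples together (lowest = 1); tied values share the mean rank."""
    pooled = sorted(x + y)
    rank_of = {}
    for value in set(pooled):
        first = pooled.index(value) + 1      # 1-based position of first occurrence
        count = pooled.count(value)          # how many times the value is tied
        rank_of[value] = first + (count - 1) / 2
    return [rank_of[v] for v in x], [rank_of[v] for v in y]


def mann_whitney_u(x, y):
    """Return (Ux, Uy) using U = nx*ny + n(n + 1)/2 - sum of ranks for that sample."""
    ranks_x, ranks_y = joint_ranks(x, y)
    nx, ny = len(x), len(y)
    ux = nx * ny + nx * (nx + 1) / 2 - sum(ranks_x)
    uy = nx * ny + ny * (ny + 1) / 2 - sum(ranks_y)
    return ux, uy


# Income deprivation scores from the table above.
inner_city = [0.53, 0.40, 0.24, 0.25, 0.08, 0.12, 0.16, 0.20]
outer_suburb = [0.04, 0.09, 0.06, 0.12, 0.05, 0.08, 0.10, 0.13]

ux, uy = mann_whitney_u(inner_city, outer_suburb)
smaller_u = min(ux, uy)

# Assumed critical value: the figure quoted in this worked example for nx = ny = 8.
CRITICAL_VALUE = 13
reject_null = smaller_u <= CRITICAL_VALUE

print(f"Ux = {ux}, Uy = {uy}, smaller U = {smaller_u}, reject null: {reject_null}")
```

A quick self-check is that Ux + Uy should equal nx × ny (here 64); if it does not, a rank has probably been assigned incorrectly.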
First he sets out the null hypothesis:

There is no difference in the income deprivation score between the inner-city and the outer-suburb

A Inner City scores (x)   B Rank (rx)   C Outer-Suburb scores (y)   D Rank (ry)
0.53                      16            0.04                        1
0.40                      15            0.09                        6
0.24                      13            0.06                        3
0.25                      14            0.12                        8.5
0.08                      4.5           0.05                        2
0.12                      8.5           0.08                        4.5
0.16                      11            0.10                        7
0.20                      12            0.13                        10
                          Total = 94                                Total = 42

Now calculate the U values for both samples, using the formulas below:

\[ U_x = n_x n_y + \frac{n_x (n_x + 1)}{2} - \sum r_x \]

\[ U_y = n_x n_y + \frac{n_y (n_y + 1)}{2} - \sum r_y \]

Where n = number in the sample. In this example it is 8 for both x and y.

The calculations for this example are thus:

Ux = (8 × 8) + (8 × 9)/2 – 94 = 64 + 36 – 94 = 6

Uy = (8 × 8) + (8 × 9)/2 – 42 = 64 + 36 – 42 = 58

NB. The result for Ux + Uy should equal nx × ny (here 6 + 58 = 64 = 8 × 8).

You now need to select the smaller figure of 6 and compare it to the table of critical values to test the significance of the result. If the value is lower than or equal to the critical value you reject the null hypothesis.

Sample size: nx = 8, ny = 8
Critical value at the 0.05 significance level: 13

A table of critical values for U can be found at: http://math.usask.ca/~laverty/S245/Tables/wmw.pdf

Now write a summary statement to express your result for the Mann-Whitney U Test. In this case:

The lower value of U is 6. This is less than the critical value of 13 at the 0.05 significance level. Therefore I can reject the null hypothesis and accept that there is a difference between the income deprivation scores for the inner-city area of St Peter & the Waterfront and the outer-suburb of Plymstock.

It is extremely clear just from looking at the values that there is a difference in the income deprivation scores. The point of doing the Mann-Whitney U test is that it allows us to state, at a chosen significance level, whether we can reject the null hypothesis.

The Chi-Squared Test

The chi-squared test (also referred to as the χ² test) is used to investigate spatial distributions.
It looks at frequencies or the distribution of data that you can put into categories, eg pebble shapes at different sites along a river’s course or frequencies of plant types at different stages of a succession. Chi-squared is a comparative test as it compares actual data collected against a theoretical random distribution of the data. The data collected is called the observed data. The theoretical, random distribution is called the expected data.

What is needed to use the chi-squared test?
• The data needs to be organized into categories
• The data cannot be in the form of percentages and must be displayed as frequencies
• The total amount of observed data must exceed 20
• The expected data for each category needs to exceed 4

As with other statistical tests, chi-squared requires and tests a null hypothesis. The null hypothesis is:

There is no significant difference between the observed distribution and the expected distribution

The strengths of chi-squared lie in the fact that, as with other statistical tests, you are checking the significance of your results. As with other statistical tests, the weaknesses include human error in calculating χ². It also doesn’t explain why there is or isn’t a pattern to the distribution. This will need further investigation.

Worked example of Chi-squared test

A group of students investigated the orientation of pebbles in an exposed bed of glacial till. The glacial till was situated near the lip of a corrie in the Lake District. The students wanted to investigate whether there was a pattern to the orientation of the long-axis of the till. Their hypothesis was:

There is a relationship between the orientation of the glacial till and the direction of the glacier.
They measured the orientation of 40 pebbles and placed their results into 4 categories:

0 – 45° = 2 pebbles
46 – 90° = 10 pebbles
91 – 135° = 23 pebbles
136 – 180° = 5 pebbles

The data suggests that there is a preferential direction, but as this may be due to chance a chi-squared test is carried out. The test begins with the assumption that there is no preference for any direction, with the null hypothesis:

There is no significant difference between the observed orientation of pebbles and the expected random orientation.

Next the students created a ‘contingency table’, shown below:

Orientation   Observed (O)   Expected (E)   A: O – E   B: (O – E)²   C: (O – E)² / E
0 – 45°       2              10             -8         64            6.4
46 – 90°      10             10             0          0             0
91 – 135°     23             10             13         169           16.9
136 – 180°    5              10             -5         25            2.5
                                                                     χ² = 25.8

Chi-squared value (χ²) = 25.8

The result by itself is meaningless. You now need to test its significance. Work out the degrees of freedom using the formula (n – 1), where n is the number of observations: in this case, the number of categories which contained observed data. Therefore for this example n = 4, so the degrees of freedom are 4 – 1 = 3.

Using the table below, compare your χ² result with the critical value for your degrees of freedom at the 0.05 (95%) significance level. If the χ² result is the same as or greater than the value given in the table, then the null hypothesis can be rejected.

Critical Values of chi-squared

Degrees of freedom   Significance level 0.05   Significance level 0.01
1                    3.84                      6.64
2                    5.99                      9.21
3                    7.82                      11.34
4                    9.49                      13.28
5                    11.07                     15.09
6                    12.59                     16.81
7                    14.07                     18.48
8                    15.51                     20.09
9                    16.92                     21.67
10                   18.31                     23.21
11                   19.68                     24.72
12                   21.03                     26.22
13                   22.36                     27.69
14                   23.68                     29.14
15                   25.00                     30.58

A summary statement: At 3 degrees of freedom the χ² result of 25.8 is above the 0.01 critical value of 11.34. Therefore we can reject the null hypothesis and accept that the orientation of the till did not occur by chance and is not randomly orientated.

AQA exam answers

1. Mann Whitney U calculation given

(a) Interpret the result of the MW U test.
(4 marks)

One mark per valid point

The results from the calculation show that 8 (or U2) is less than the critical value of 27 at the 0.05 significance level. This means we can reject the null hypothesis with more than 95% confidence that the result is not due to chance. We can therefore accept the hypothesis H1 that species diversity will be higher in a managed area than in an unmanaged area. (4)

(b) Suggest why the MWU is suitable to interpret this set of data. (4 marks)

One mark per valid point

A Mann-Whitney U test is suitable to interpret this set of data because the data is collected from the same location – sand dunes in south west England but is sampled from two different populations, one managed and one unmanaged. So as geographers we assume there will be a difference between the two populations so we use the Mann Whitney U Test to interpret this data. Also there are ten samples from each area so there are enough samples to perform the test making it an obvious choice to interpret the data. The data shows there may be a significant difference and the Mann Whitney U Test can confirm this. The data is non-parametric which also makes the test suitable. (4)

2. Study Figure 1 which shows the details of a statistical investigation into the relative nature of earthquakes in two countries, Japan and Italy. Comment on the outcomes of this investigation. (7 marks)

Mark scheme

Level 1 (1-4 marks) (mid point 3)
Simple statements arising from the data, such as simple comparative points, with basic or general statements of the reasons for/type of activity that would be associated with the plate boundaries.

Level 2 (5-7 marks) (mid point 6)
Some sophistication of comment, showing a clear understanding of the outcomes of for example the mean values, ranges and the Mann Whitney U test. Clear interpretation and/or description/reasoning, together with evidence of geographical thinking.
The Mann Whitney investigation suggests that there is a statistically significant difference between the magnitude of Japan’s and Italy’s earthquakes as Japan’s outcome is 3.5 which is below the critical value of 23. As Japan’s mean measurement of earthquake is 7.0 on the Richter scale whilst that of Italy is 5.8. This suggests that Japan suffers from significantly more powerful earthquakes. As the two countries are both MEDCs at similar levels of development, this may mean that the effects of the earthquakes are also more severe in Japan. Italy however has a greater range (1.9) between its highest and lowest earthquake values than Japan (1.3) meaning that it has a greater degree of variation in power of earthquake. This may make it more difficult to plan accurately as the effects of different strength earthquakes may vary. Japan has also seen the same amount of earthquakes in 5 years that Italy has in over 30 years suggesting that Japan is more seismically active and this combined with significantly greater magnitude suggests that Japan requires a greater level of planning and preparation in order to manage the hazards presented. (7)

3. Chi-squared exercise.

Figure 1 shows a map of wards in Leicester. Figure 2a shows census data (2001) of the population of four of these wards by ethnic group. For the purposes of a chi-squared test, the data became the observed frequency (O). This is the observed population by ethnicity (O) for the selected wards, and Figure 2b shows the calculated expected frequencies (E) for these wards in a contingency table.

For the chi-squared test, the null hypothesis (H0) to be tested is: ‘There is no difference in the distribution of population in different ethnic groups within selected wards in Leicester’.

The alternative hypothesis (H1) is: ‘There is a difference in the distribution of population in different ethnic groups within selected wards in Leicester’.

A chi-squared test was applied to these data.
The chi-squared result was calculated as 17284.66 and was significant at both the 95% and 99% significance levels.

5 (a) With reference to the outcome of the chi-squared test and Figures 1, 2a and 2b, comment on the distribution of ethnic groups in the selected wards of Leicester. (12 marks)

Mark scheme.

Level 1 (1 – 5 marks) (mid point 3 marks)
There is a basic description of the figures and what they show. One figure may be covered more strongly than others, so there will be some imbalance. There will be little comment on the distributions and the variations shown. Hence, there may be a concentration on the Figures 1 and 2a, with less on Figure 2b and the chi-squared test result or vice-versa. There will be little or no reference to data.

Level 2 (6 – 10 marks) (mid point 8 marks)
There will be a clear summary of the figures, with an attempt at comment on the distribution of population shown. There may still be some imbalance between the coverage of the figures and the chi-squared result. There may be greater knowledge shown on some figures than others, but a full range is not necessary in this band. Reference will be made to the data.

Level 3 (11 – 12 marks) (mid point 12 marks)
There will be a detailed summary, including balanced reference to all figures and the chi-squared result. Comment will be full on the information shown about the distribution of ethnic populations and/or the chi-squared figure. There will be a detailed understanding of the chi-squared result. There will be detailed reference to the data provided. Thinking like a geographer.

By looking at the census data from 2001 we can see that in total there are more White people than any other ethnicity at 26871. The number of Asian people is also high at 24942. However the total population of Black and Other ethnicities is much lower at 2948 and 1988. The chi-squared test has statistically proved that the ethnicities difference is not the same between wards.
For example in the city centre there is the highest frequency of White ethnicity. But in the inner city there are very few at 3739. However the Asian population has increased from 1786 in the CBD to the highest frequency in the data at 15402. Here also the frequency of Black ethnicity is at the highest at this point too. It is however at the same level as the expected frequency whilst Asian is not above this. It was expected to be around 9339. White ethnicity here is much lower though at 3739 when it was expected to be 10062. None of the expected frequencies are close to the actual. This is why the chi-squared test showed that there is a definite link between ethnicities in different areas. So the 18995 number which you get must be higher than the 95% and 99% significance levels. It shows that the ethnic minority Black and Other ethnicities dominate frequencies in the Spinney Hills. This is similar to the expected value. The two data sets Asian and White population decide the outcome of the chi-squared test due to the differences. (11)

Sample assessment exercises

1. A group of students was planning a study of a stream. The students’ hypothesis stated that the velocity of streams increases with distance from the source. They collected data from 9 sites along the course of the stream. To do this they timed how long it took for a float to travel a distance of 10 metres. They measured the velocity three times at each site and worked out an average for each site. The results are shown in the table below.

Site   Distance from source (km)   Average time for float to travel 10 m (secs)
1      1.2                         18.8
2      1.7                         15.2
3      2.2                         14.4
4      2.8                         15.2
5      3.5                         13.0
6      4.0                         13.6
7      4.7                         12.0
8      5.6                         6.2
9      6.7                         9.2

(a) The students decided to present their figures on a scatter graph.

(i) Draw the scatter graph on the graph paper (5 marks)
Mark scheme: 1 mark for each correctly labelled axis; 3 marks for correctly plotted points.

(ii) Add a trend line to your graph.
(1 mark)

(iii) Explain how you decided where to put the trend line. (4 marks)

(b) (i) Name a statistical technique that the students could use to help them analyse their data and test the hypothesis. Describe how they would carry out the technique. (7 marks)

(ii) Discuss the strengths and weaknesses of the technique described in (b) (i) for testing the students’ hypothesis. (5 marks)

(c) Write a report on the results of this investigation. You should refer to:
• the extent to which the information provided supports the students’ hypothesis
• other information that would help them prove or disprove the hypothesis.
(8 marks)

2. The table below shows the figures for total annual rainfall for two recording stations over a 15 year period.

Year    Annual rainfall total for a station in SE England (mm)   Annual rainfall total for a station in N Nigeria (mm)
1       781                                                      490
2       618                                                      379
3       563                                                      832
4       586                                                      1037
5       545                                                      594
6       884                                                      855
7       600                                                      376
8       793                                                      350
9       580                                                      479
10      699                                                      761
11      842                                                      688
12      737                                                      1143
13      824                                                      356
14      530                                                      635
15      823                                                      986
TOTAL   10405                                                    9961

(a) Calculate the mean annual rainfall at each station, to two decimal places.
• South East England
• North Nigeria
(2 marks)

(b) (i) Complete a dispersion diagram for each of South East England and North Nigeria. (4 marks)

(ii) Identify the upper quartile and lower quartile for each set of rainfall figures by drawing lines on the dispersion diagram. (4 marks)

(iii) Calculate the inter-quartile range for each recording station.
• South East England
• North Nigeria
(2 marks)

(c) The standard deviation for South East England is 120.09. The standard deviation for North Nigeria is 254.97. What do you understand by the term standard deviation? (3 marks)

(d) Using all of the information you have been given, compare the variability of the annual rainfall at these two stations. Comment on the possible effects that this variability could have on the people in the two areas. (10 marks)

3.
A group of students undertook a fieldwork enquiry into the proposal to build a new hospital on the outskirts of their town. The existing hospital is more central to the town. The group decided to investigate the possible impacts of the development. The table below shows some data obtained by the students about the location of the employees of the existing hospital in 2006.

Ward    Total residents of ward employed at hospital in 2006   Number of employees/km² of ward in 2006
1       42                                                     5.1
2       16                                                     0.5
3       14                                                     0.5
4       162                                                    15.8
5       106                                                    6.4
6       95                                                     21.8
7       201                                                    77.1
8       184                                                    18.3
9       85                                                     15
10      109                                                    9
11      273                                                    53.5
12      67                                                     14.4
13      57                                                     11.1
14      61                                                     5.1
15      12                                                     1.5
16      11                                                     0.6
Total   1495

(a) Present the data for the number of employees/km² of each ward in 2006 in the form of a choropleth map using four classes:

(i) Give the class boundaries you would use. (2 marks)

(ii) Complete a choropleth map using the classes you gave in (i). (6 marks)

(b) The group of students then obtained data showing the gender balance of the workforce:

Gender   Total   %
Male     238
Female   1257

(i) Calculate the % for each gender. (1 mark)

(ii) The students decided to draw a proportional pie chart to represent the data. Draw this pie chart using the formula

\[ r = \sqrt{\frac{A}{\pi}} \]

where A = the area of the circle, in which 1 mm² = 1 worker, and r = the radius of the circle. (4 marks)

4. Statistical exercises using earthquake data.

(a) Italian earthquakes that caused fatalities of > 10 (1930 – 2012)

Date of earthquake   Location   Moment Magnitude Scale   Deaths
2012                 Medolla    5.8                      20
2009                 L’Aquila   6.3                      308
2002                 Molise     5.9                      30
1990                 Sicily     5.7                      17
1980                 Irpinia    6.9                      2914
1976                 Friuli     6.5                      989
1971                 Lazio      4.9                      31
1968                 Sicily     6.4                      370
1962                 Irpinia    6.2                      17
1930                 Irpinia    6.7                      1400

(i) Investigate the relationship between the strength of the earthquake and the number of deaths.

1. Complete a scatter graph of the data including drawing a line of best fit.

2. Carry out a Spearman’s Rank correlation exercise using a copy of the table below.
Location   Moment Magnitude Scale   Rank (Scale)   Deaths   Rank (Deaths)   d   d²
Medolla    5.8                                     20
L’Aquila   6.3                                     308
Molise     5.9                                     30
Sicily     5.7                                     17
Irpinia    6.9                                     2914
Friuli     6.5                                     989
Lazio      4.9                                     31
Sicily     6.4                                     370
Irpinia    6.2                                     17
Irpinia    6.7                                     1400

Σd² =

Calculate the Spearman’s Rank correlation coefficient using the formula:

\[ R_s = 1 - \frac{6 \sum d^2}{n^3 - n} \]

Rs =

(ii) Now test the significance of your result using the following table:

(iii) In the light of these two completed tasks, what do you now conclude?

(b) Repeat for: Japanese earthquakes that caused fatalities of >10 (1946 – 2012)

Date of earthquake   Moment Magnitude Scale   Rank (Scale)   Deaths   Rank (Deaths)   d   d²
2011 (Tohoku)        9.0                                     15885
2008                 6.9                                     12
2007                 6.6                                     11
2004                 6.9                                     40
1995 (Kobe)          6.8                                     6434
1978                 7.7                                     28
1968                 8.2                                     52
1964                 7.6                                     26
1948                 7.1                                     3769
1946                 8.1                                     1362

Σd² =

(c) Using the Mann Whitney U test: an investigation into the variability in the size of earthquakes at two different plate margins, Italy and Japan.

(a) Analyse the data below using the Mann Whitney U test.

Null hypothesis: there is no difference between the magnitude of earthquakes in Japan and Italy

Italy (x)   Rank (rx)   Japan (y)   Rank (ry)
5.8                     9.0
6.3                     6.9
5.9                     6.6
5.7                     6.9
6.9                     6.8
6.5                     7.7
4.9                     8.2
6.4                     7.6
6.2                     7.1
6.7                     8.1

No in sample (nx) = 10                  No in sample (ny) = 10
Total of rank scores (Σrx) =            Total of rank scores (Σry) =

Now calculate the U values for both samples, using the formulas below:

\[ U_x = n_x n_y + \frac{n_x (n_x + 1)}{2} - \sum r_x \]

\[ U_y = n_x n_y + \frac{n_y (n_y + 1)}{2} - \sum r_y \]

Where n = number in the sample. In this example it is 10 for both x and y.

The critical value for U at the 95% confidence level for two samples of ten within the Mann Whitney U test is 23.

(b) Comment on the outcomes of your completed test.

5.
A river based exercise

Students measured a range of variables along the river Eea at a total of 10 sites along its course between its source and mouth. They wanted to test whether the Bradshaw model could be applied to this river.

The Spearman’s rank correlation test can be used to help determine whether or not the River Eea conforms to the Bradshaw model. According to this model there should be a direct relationship between channel width and depth.

(i) State the expected and null hypotheses that might be tested for these channel variables.

(ii) Use the table to calculate the Spearman’s Rank Correlation coefficient. Start by ranking the width values from highest (1) to lowest (10). Then rank the average depth column in the same way.

Site no.   Width (metres)   Rank   Average depth (metres)   Rank   d   d²
1          0.97                    8.14
2          1.30                    6.00
3          2.01                    8.12
4          3.06                    17.10
5          3.50                    16.50
6          2.55                    14.68
7          6.23                    8.30
8          6.00                    9.40
9          6.42                    24.70
10         7.25                    20.91

Σd² =

Calculation of Rs:

Critical values for Rs at the 5% and 1% significance levels.

      Rejection level
n     0.05    0.02    0.01
5     1       1
6     0.886   0.943   1
7     0.786   0.893   0.929
8     0.738   0.833   0.881
9     0.683   0.783   0.833
10    0.648   0.746   0.794
12    0.591   0.712   0.777

6. A Chi Squared Test

When studying the River Eea students measured the roundness of pebbles at Site 2 (upstream) and at Site 8 (downstream) using Powers’ scale of roundness. Their results are shown in the table below.

Null hypothesis: There is no difference in the shape of pebbles between an upstream and downstream location on the River Eea; they are distributed randomly.
Row number (R)   Shape        Upstream (K1)    Downstream (K2)   ΣR
                              O       E        O       E
R1               Angular      22               7                 29
R2               Subangular   15               2                 17
R3               Sub-round    9                10                19
R4               Round        4                31                35
                              ΣK1 = 50         ΣK2 = 50          n = 100

R = Row Number
K = Column Number
O = Observed frequency of pebbles in each category

(i) Calculate the expected frequency (E) in each cell using the formula:

\[ E = \frac{\sum R \times \sum K}{n} \]

E = sum of row (ΣR) multiplied by the sum of its column (ΣK) divided by the sum of all observed frequencies (n).

(ii) Calculate the Chi² value using the formula:

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

(iii) Now check this result with the following significance tables. Firstly, calculate the degrees of freedom, (R – 1) × (K – 1). Then read from the critical values table to see if the null hypothesis can be rejected. Your Chi² value must be greater than the critical value to be significant.

Degrees of freedom   Critical value: 0.05   Critical value: 0.01
1                    3.84                   6.63
2                    5.99                   9.21
3                    7.82                   11.30
4                    9.49                   13.30
5                    11.10                  15.10

(iv) What does the result tell us about the two samples of pebbles taken from the River Eea?

7. A Mann-Whitney U test

• The long axis of pebbles for 15 samples was measured at the upstream (1) and downstream site (10) on the River Eea in Cumbria.
• Null hypothesis: There is no significant difference between pebble sizes at the upstream and downstream site on the River Eea.

(i) Use the table format below to set out the ranks in order (lowest value first). Remember that the total sample is ranked together and not as individual columns as in the Spearman’s Rank correlation.

Upstream (x)   Rank (rx)   Downstream (y)   Rank (ry)
15                         4
8                          8
22                         10
32                         6
16                         19
18.5                       14
34                         6
32                         13.5
19.5                       7
13.5                       5
28                         12.5
10.5                       12
13                         8.5
24.5                       6
45                         13

No in sample (Nx) = 15                  No in sample (Ny) = 15
Total of rank scores (Σrx) =            Total of rank scores (Σry) =

(ii) Calculate the ‘U’ values for both the upstream site (x) and the downstream site (y) using the MW formula.
Ux = N x N y + N x (N x + 1) - Σr x 2 Uy = N x N y + N y (N y + 1) - Σr y 2 (iii) In order to test the result you must now test for significance. Take the smaller U value that is calculated and consult the significance table to decide whether or not you can reject the null hypothesis. 27 8. Calculation of standard deviation (Stornoway annual precipitation) Year 1 Precipitation 877 2 1082 3 1203 4 963 5 1241 6 1194 7 1072 8 900 9 1146 10 1094 11 1098 12 1318 13 791 14 1035 15 1151 x - x̄ (x - x̄ ) 2 2 ∑x = ∑(x - x̄ ) = x̄ = ∑(x - x̄ ) /n = 2 Standard deviation σ = Summarising statistics for annual precipitation (mm) Statistic Mean precipitation Salina Cruz (Mexico) 1063.07 Stornoway Median precipitation Standard deviation 911 476.85 1094 Upper quartile 1188 Lower quartile 701 Interquartile range 487 28 9. Calculation of Spearman’s Rank Correlation Coefficient between infiltration rate and gradient Area Infiltration rate (millilitres per second) 1 Rank of slope gradient (steepest ranked 1) 15 2 17 12 3 14 21 4 16 18 5 8 9 6 10 4 7 18 31 8 9 63 1 9 6 3 10 12 5 11 2 38 12 3 38 13 4 11 14 13 125 15 5 33 16 11 2 17 1 167 18 7 83 Rank of infiltration rate (highest ranked 1) Difference in rank d 2 ∑d2= Spearman’s Rank correlation coefficient (R s ) = 1 – 6 ∑d2 n3 – n = Summarising Spearman’s Rank correlation coefficients between infiltration and selected variables Statistic % soil moisture Correlation Coefficient (R s ) with infiltration rate -0.84 % ground vegetation cover 0.74 Gradient Infiltration rate = the amount of water in millilitres (ml) passing into the ground in 1 second. % ground vegetation cover is the % of the ground surface covered by vegetation less than 1m in height When n=18, R s values exceeding ± 0.40 are significant at the 0.05 (5%) level and R s values exceeding ± 0.56 are significant at the 0.01 (1%) level. 10. Calculation of Spearman’s Rank Correlation Coefficient between particle size and gradient on a scree slope. 
Study site Angle of Rank of slope Mean median Rank of mean Difference in 29 slope (x) gradient (steepest ranked 1) axis (cm) (y) 29 21 2 29 16.7 3 30 21 4 32 16.1 5 31 15.4 6 31 16.9 7 31 13.1 8 30 15.3 9 35 10.5 10 44 11.5 11 40 8.7 12 40 7.9 13 46 6.1 14 48 4.2 15 45 4.7 median axis (largest ranked 1) Spearman’s Rank correlation coefficient (R s ) = 1 – 6 ∑d2 n3 – n rank d ∑d2= = When n=15, R s values exceeding ± 0.44 are significant at the 0.05 (5%) level and R s values exceeding ± 0.62 are significant at the 0.01 (1%) level. You could also complete a scatter graph using this data. 30 11. A river in northern England – discharge in cumecs (1991-2009) Year 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 Mean Minimum Maximum Upper quartile Lower quartile IQR Standard deviation January 56.68 70.96 86.42 77.86 24.45 69.94 33.26 81.19 34.51 88.81 49.33 31.69 79.70 79.46 93.93 21.16 10.00 29.98 99.70 April 15.23 6.35 25.85 7.50 45.05 24.68 27.11 17.29 31.16 13.95 34.81 31.58 42.05 50.62 12.78 16.78 10.00 17.58 32.60 July 14.04 6.15 6.55 1.88 40.87 6.03 31.89 42.80 3.60 18.01 10.48 6.16 21.00 5.16 7.17 9.56 3.60 31.00 32.05 October 72.61 42.59 72.93 67.91 30.57 45.62 70.23 47.27 42.68 50.07 33.12 27.56 11.98 31.17 25.22 41.18 10.20 80.60 46.65 January 58.9 10 99.70 81.19 31.69 49.50 27.57 April 24.37 6.35 50.62 32.60 13.95 18.65 12.59 July 15.68 1.88 42.80 31 6.03 24.97 13.10 October 44.75 10.20 80.60 67.91 30.57 37.34 19.96 These data have been collated as part of the process of setting up flood control schemes. With the aid of all of the data contained above, write a brief report (approximately 300 words) to summarise the variations in discharge and to assess the reliability of mean discharge values as a basis for river management. 31 12. Figure 1. 
Urban population data – variations in social and economic conditions in an urban area in England Indicator Ward 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 A % households with NCWP born head Rank B % unskilled workers Rank C % unemployed Rank 1.8 3.6 5.7 3.4 50.2 3.9 11 10.2 7.5 12.3 29.9 3.4 12 2.9 1.5 3.7 1.2 6.3 3 7 10 5 18 9 14 13 12 16 17 5 15 4 2 8 1 11 5.5 6.6 6 2.2 15.6 8.4 7.3 8.5 8.9 11.6 12.5 5.6 8.1 1.8 7.2 4.3 4.2 5.1 6 9 8 2 18 13 11 14 15 16 17 7 12 1 10 4 3 5 9.4 15.8 11.7 8.9 23 18.9 13.9 15.1 14.2 17.5 22 12.4 15.4 6.2 16.7 12.3 11.6 9.4 3 13 6 2 18 16 9 12 10 15 17 8 11 1 14 7 5 3 D % households with >1 person per room 2.3 5.8 4 1.3 15.8 8.2 8 6.7 5.7 7.8 10 3.6 6.4 1.2 4.7 3.4 3 4.6 Rank E % households without a car Rank Total score Social and economic index 3 11 7 2 18 16 15 13 10 14 17 6 12 1 9 5 4 8 31 42 35 29 62 47 41 48 49 54 68 40 50 16 47 39 32 34 3 10 6 2 17 11 9 13 14 16 18 8 15 1 11 7 4 5 18 50 37 13 89 65 58 65 61 77 86 34 65 8 46 31 17 32 Indicator A: Percentage of households where the head of the household was born in the New Commonwealth and Pakistan (NCWP) Exercises. 1. Draw a dispersion diagram/graph to display the range of values for the Total score for the Social and economic index. 2. Using your completed graph divide the data into four classes. 3. Using the four classes identified in Question 2, complete Figure 2 to produce a choropleth map to show the variation in social and economic conditions in this urban area as indicated by the Social and economic index. 4. Select two sets of paired indicators from the data and perform two correlation tasks – one by scatter graph and the other by Spearman’s rank to ascertain the degree of relationship between the two sets of paired data. 32 33 13. 
Worcester population data

| Ward | Total ethnic minorities | % of total population | No. of people 18-30 | % of total population | Distance of centre of ward from city centre (km) | Total population |
|---|---|---|---|---|---|---|
| Arboretum | 587 | 10.46 | 1195 | 21.29 | 0.8 | 5612 |
| Battenhall | 313 | 6 | 698 | 13.39 | 1.9 | 5214 |
| Bedwardine | 323 | 4.1 | 1072 | 13.61 | 2.4 | 7876 |
| Cathedral | 1138 | 15.26 | 1797 | 23.71 | 0.5 | 7458 |
| Claines | 280 | 3.56 | 940 | 11.94 | 2.8 | 7875 |
| Gorse Hill | 312 | 5.65 | 872 | 15.79 | 1.9 | 5523 |
| Nunnery | 366 | 4.57 | 1199 | 14.97 | 1 | 8011 |
| Rainbow Hill | 294 | 5.02 | 995 | 17.02 | 1.1 | 5845 |
| St Clement | 212 | 3.86 | 1343 | 24.45 | 2.1 | 5493 |
| St John | 293 | 3.65 | 1638 | 20.39 | 2.4 | 8033 |
| St Peter Parish | 249 | 4.43 | 998 | 17.75 | 2.9 | 5622 |
| St Stephen | 234 | 4.64 | 791 | 15.67 | 1.5 | 5048 |
| Warndon | 203 | 3.84 | 1061 | 20.04 | 2.1 | 5294 |
| Warndon Parish N | 258 | 4.94 | 1135 | 21.71 | 3.2 | 5229 |
| Warndon Parish S | 329 | 6.3 | 959 | 18.35 | 2.5 | 5225 |

Wave frequency data

The data show wave frequency at Hornsea, East Riding, 17–18 January 2016.

Column A: 17 January, midnight – noon
Column B: 17 January 13:30 – 18 January 01:30

| Time (A) | Waves per minute (A) | Time (B) | Waves per minute (B) |
|---|---|---|---|
| 00:00 | 9.1 | 13:30 | 14.0 |
| 00:30 | 9.0 | 14:00 | 14.3 |
| 01:00 | 8.7 | 14:30 | 14.3 |
| 01:30 | 8.6 | 15:00 | 15.4 |
| 02:00 | 8.8 | 15:30 | 16.2 |
| 02:30 | 8.2 | 16:00 | 16.2 |
| 03:00 | 8.2 | 16:30 | 15.8 |
| 03:30 | 8.6 | 17:00 | 15.8 |
| 04:00 | 8.5 | 17:30 | 16.2 |
| 04:30 | 8.8 | 18:00 | 15.8 |
| 05:00 | 8.8 | 18:30 | 16.2 |
| 05:30 | 9.2 | 19:00 | 15.8 |
| 06:00 | 9.7 | 19:30 | 15.0 |
| 06:30 | 9.5 | 20:00 | 15.0 |
| 07:00 | 9.5 | 20:30 | 15.0 |
| 07:30 | 9.8 | 21:00 | 14.6 |
| 08:00 | 10.0 | 21:30 | 14.0 |
| 08:30 | 10.2 | 22:00 | 14.6 |
| 09:00 | 10.2 | 22:30 | 14.0 |
| 09:30 | 10.5 | 23:00 | 15.0 |
| 10:00 | 10.9 | 23:30 | 14.6 |
| 10:30 | 10.5 | 00:00 | 15.0 |
| 11:00 | 10.3 | 00:30 | 15.8 |
| 11:30 | 10.0 | 01:00 | 15.0 |
| 12:00 | 10.0 | 01:30 | 15.8 |

a. Using the data, create a dispersion diagram for wave frequency for each of Columns A and B, and then calculate the range for each set of data.
b. Calculate the mean, median and mode for wave frequency in each of Columns A and B.
c. Using the dispersion diagrams, calculate the quartiles for each set of data.
d. Compare the wave data for each period.
e. Explain the likely impact of the waves in each period on Hornsea’s beaches during 17–18 January 2016.
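Hand calculations for parts a to c can be checked with a short script. The following is a minimal Python sketch, not part of the original exercise. It assumes the quartiles sit at positions (n + 1)/4 and 3(n + 1)/4 in the ordered data, which is what `statistics.quantiles` does with `method="exclusive"`; other quartile conventions can give slightly different values. The function name `summarise` and the variable names are illustrative only.

```python
import statistics

# Waves per minute at Hornsea, copied from Columns A and B of the wave data.
column_a = [9.1, 9.0, 8.7, 8.6, 8.8, 8.2, 8.2, 8.6, 8.5, 8.8, 8.8, 9.2, 9.7,
            9.5, 9.5, 9.8, 10.0, 10.2, 10.2, 10.5, 10.9, 10.5, 10.3, 10.0, 10.0]
column_b = [14.0, 14.3, 14.3, 15.4, 16.2, 16.2, 15.8, 15.8, 16.2, 15.8, 16.2,
            15.8, 15.0, 15.0, 15.0, 14.6, 14.0, 14.6, 14.0, 15.0, 14.6, 15.0,
            15.8, 15.0, 15.8]


def summarise(values):
    """Return the range, averages and quartiles for one column of readings."""
    # Assumption: quartiles at the (n + 1)/4 positions of the sorted data.
    lower_q, median, upper_q = statistics.quantiles(
        values, n=4, method="exclusive"
    )
    return {
        "range": round(max(values) - min(values), 2),
        "mean": round(statistics.mean(values), 2),
        "median": round(median, 2),
        # multimode lists every value tied for most frequent.
        "modes": sorted(statistics.multimode(values)),
        "lower_quartile": round(lower_q, 2),
        "upper_quartile": round(upper_q, 2),
        "iqr": round(upper_q - lower_q, 2),
    }


summary_a = summarise(column_a)
summary_b = summarise(column_b)

for label, summary in (("A", summary_a), ("B", summary_b)):
    print(f"Column {label}: {summary}")
```

Both columns have more than one modal value, which is why the sketch reports a list of modes rather than a single figure. The comparison and explanation in parts d and e still need to be written from the dispersion diagrams and the summary values.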
Water and carbon skills

| Storm | Storm amount (mm) | Average intensity (mm per hr) | Maximum intensity (mm per hr) | Prior conditions (mm) | Peak discharge (litres per second) |
|---|---|---|---|---|---|
| A | 11.8 | 3.05 | 10.16 | 57 | 1034 |
| B | 10.7 | 2.54 | 3.56 | 70 | 694 |
| C | 30.5 | 3.05 | 4.06 | 9 | 1019 |
| D | 16.0 | 1.52 | 3.56 | 79 | 665 |

The table shows precipitation and discharge data for a small river basin associated with four storm events, A to D, occurring at different times of the year.

(a) Explain the variations in peak discharge for each of the four storm events. [8 marks]

(b) How is time-lag likely to be affected by the intensity of a storm event? [4 marks]

(c) State and comment on three factors affecting run-off volume in river basins. [6 marks]

Mark scheme