Visualisation of UK Census interaction and housing market interaction data

Crispin Cooper
Department of City and Regional Planning, King Edward VII Avenue, Cardiff University
Tel. +44 (0)29 208 74022
[email protected]

KEYWORDS: visualisation, census, interaction, house prices, migration

1. Introduction

This paper presents two examples of the application of a technique useful for the visualisation of interaction matrices, allowing rapid, broad comprehension of the contents of complex datasets. The examples presented are UK Census migration data from the year 2000-2001, and inter-regional house price cross-correlations derived from Land Registry data, both at Local Authority level.

The traditional method of displaying such data would be with a flow map (Bertin 1984, page 350), although more modern techniques have also been used to calculate flow density (Rae 2009). With a flow map, however, it is necessary to threshold the data (removing smaller flows of migrants) to ensure readability of the output, and both approaches miss patterns in the smaller flows of migrants which may be of interest to researchers. By way of contrast, the pixel matrix plots presented here use data with a log function applied, which emphasizes smaller flows instead of removing them from the plot.

Pixel matrix plots lie in the tradition of Exploratory Data Analysis as advocated by Tukey (1977). While Bertin (1984) recommends displaying interactions in a matrix, and reordering matrix rows and columns to achieve greater readability, his methods of diagonalization and triangulation would produce different orderings for different data sets, depending on the data values, all the while discarding spatial information. Instead, the approach taken is similar to that of Marble et al. (1997), albeit with a more sophisticated technique for ordering matrix rows and columns (Guo & Gahegan 2006, Guo 2007), and the inclusion of pixel methods (Keim 1996, Keim et al. 2001).
Alternative approaches to visualisation of census flow data are discussed in Openshaw (1995), Kwan (2000) and Yan & Thill (2009), the last of which uses a self-organizing map to classify different types of interaction.

2. Methodology

An example of pixelation is shown in Figure 1. Cells of an interaction matrix representing flows between five locations, a-e, are shaded according to their values. It is then possible, without discarding information, to reduce the size of the graphic until each data point is represented by only one pixel.

The problem with this approach alone is that when a large number of origins and destinations are used, the plot becomes hard to comprehend, because position along the X and Y axes carries no real-world meaning. It is better if a more intuitive ordering of the axes can be derived. The ordering aimed for is one in which places that are physically close together in 2-d space are close together in the ordering, and vice versa. Thus, in the sense of Bertin (1984) the matrix itself becomes a map: a graphic where “the elements of a geographic component are arranged on a plane in the manner of their observed geographic order on the surface of the Earth” (page 285).

Figure 1. Pixelation applied to a simple interaction data set.

The method chosen to achieve the desired ordering is that proposed by Guo and Gahegan (2006), in which a variety of different algorithms are investigated for their ability to fulfil the criteria described above. The best of the algorithms investigated, CLO-OPT, was first developed in the field of bioinformatics (Bar-Joseph et al. 2003). In particular it outperforms the space-filling curve techniques used by Marble et al. (1997). The algorithm works by first hierarchically clustering all locations in geographical space, and then re-ordering the cluster tree to find the shortest path that visits all points.
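As a rough illustration of this two-step idea (and not the CLO-OPT implementation of Guo and Gahegan), the sketch below first builds a cluster tree by naive average-linkage clustering, then searches all orderings obtainable by swapping the two children of any tree node for the one with the shortest path through the points. The function names are hypothetical, the choice of average linkage and Euclidean distance is an assumption, and the code is written for clarity with no attempt at efficiency.

```python
import math
from itertools import combinations


def dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def cluster_tree(points):
    """Naive average-linkage clustering (assumed linkage, for illustration).

    Returns a binary tree as nested tuples whose leaves are point indices.
    """
    # Each cluster is (tree, list of member indices).
    clusters = [(i, [i]) for i in range(len(points))]
    while len(clusters) > 1:
        def avg_dist(pair):
            a, b = clusters[pair[0]][1], clusters[pair[1]][1]
            total = sum(dist(points[p], points[q]) for p in a for q in b)
            return total / (len(a) * len(b))

        # Merge the two clusters with the smallest average distance.
        i, j = min(combinations(range(len(clusters)), 2), key=avg_dist)
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


def best_orders(tree, points):
    """Map (first leaf, last leaf) -> (path length, leaf order).

    Covers every ordering reachable by swapping children within the tree,
    keeping only the shortest path for each pair of end points.
    """
    if isinstance(tree, int):
        return {(tree, tree): (0.0, [tree])}
    left = best_orders(tree[0], points)
    right = best_orders(tree[1], points)
    best = {}
    for first, second in ((left, right), (right, left)):
        for (a0, a1), (cost_a, order_a) in first.items():
            for (b0, b1), (cost_b, order_b) in second.items():
                # Join the two sub-paths end to start.
                cost = cost_a + cost_b + dist(points[a1], points[b0])
                if (a0, b1) not in best or cost < best[(a0, b1)][0]:
                    best[(a0, b1)] = (cost, order_a + order_b)
    return best


def spatial_order(points):
    """Linear ordering of point indices with the shortest tree-consistent path."""
    tree = cluster_tree(points)
    cost, order = min(best_orders(tree, points).values(), key=lambda v: v[0])
    return order
```

For four points on a line, `spatial_order([(0, 0), (10, 0), (1, 0), (11, 0)])` keeps each nearby pair adjacent and runs along the line, giving `[0, 2, 1, 3]` or its reverse. The resulting index list would then be used to permute both the rows and the columns of the interaction matrix.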
Restricting the path to re-orderings of the cluster tree has the effect that (i) urban areas formed by dense clusters of points tend to be kept together, and (ii) the shortest-path calculation is computationally feasible: O(n^3), rather than the O(n!) of the general case without such a constraint.

A related re-ordering method is that of Wood et al. (2009), which approximates a map whereby each originating region is itself replaced by a miniature map of the entire data set, showing the destinations of the flows originating from that region. A key difference from that method is symmetry: while Wood’s method encourages contemplation of flows as properties either of their origins or destinations, the visualisations presented here treat both symmetrically, thus emphasizing the structure of the interaction matrix itself (albeit at the cost of a less intuitive pixel ordering). Each method will therefore tend to emphasize different patterns in the data.

Interactive software has been developed to assist reading of the plots presented here; an example screenshot is shown in Figure 2. This also illustrates the ordering of UK Local Authorities chosen by the algorithm.

Figure 2. Screenshot of interactive software developed to assist exploration of pixel matrix plots. The map to the right hand side displays the linear ordering of points in 2-d space, with the current origin and destination highlighted on the map by red and green circles and on the matrix by lines of the corresponding colour. The data displayed are commuting flows from the 2001 Census.

3. Visualisation of intra-UK migration flows

Figure 3 shows the resulting visualisation of intra-UK migration. The ordering presented is the same as that shown in Figure 2. Several patterns in this plot are worthy of discussion.

- The grouping of the lightest pixels towards the diagonal of the image shows that the vast majority of migrations take place on a local basis.
- The fact that the remainder of the plot appears to consist mainly of vertical and horizontal lines shows that for non-local migrations, distance is not a deciding factor; rather it is the inherent repulsiveness of origins and attractiveness of destinations that determine migration flows.
- The four yellow squares of feature a represent London, for which a lifecycle migration pattern is visible. Thick orange lines, extending horizontally outwards from region a, indicate a flow of younger people from all over the country migrating into the capital. Fainter green lines extending vertically out of region a indicate the middle-aged leaving London, while the strong blue patch below (marked b) shows an older population migrating from London to East Anglia and surrounding areas. The yellow colour of London itself shows intra-London migration to be dominated by 16-40 year olds. The City of London is clearly visible as a black cross centred in the lower right yellow square. This is because it has little residential population, and therefore very few migrations to and from the City occur.
- Urban polycentricity is visible in London. This is as defined by Hall (2001), who notes that rather than having a single centre which exceeds all other parts of the region in its provision of products and services, London is “now the centre of a system of some 30-40 centres”. The pixel matrix plot arguably shows that Greater London is polycentric in terms of migration movements, because no discernible internal structure is visible within the yellow squares of region a. This is in contrast to other areas in the UK, for example region c, which represents South Wales: the bright internal cross shape there represents Cardiff, which is clearly a monocentric keystone of interaction for the region.
- The feature marked f represents Northern Ireland, easily identifiable because it has strong internal structure but little interaction with the rest of the UK.
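The log scaling and colour coding discussed in this section can be sketched as follows. This is a minimal illustration rather than the software used to produce the plots. It assumes that three age bands are mapped to the red, green and blue channels (an inference from the colours described above, such as yellow for mixed 16-40 flows), that log(1 + flow) is the log function, and that all bands share one scale; the names `log_shade` and `pixel_matrix` are hypothetical.

```python
import math


def log_shade(flow, max_flow):
    """Map a non-negative flow to 0..255 on a log scale (zero stays black)."""
    if flow <= 0 or max_flow <= 0:
        return 0
    return round(255 * math.log1p(flow) / math.log1p(max_flow))


def pixel_matrix(bands, order):
    """Build one (r, g, b) pixel per origin-destination pair.

    bands: three origin x destination flow matrices, assumed here to be
           ages 16-25, 25-40 and 40+, mapped to red, green and blue.
    order: linear ordering of location indices, used for both axes.
    """
    # A single maximum across all bands keeps the channels comparable.
    max_flow = max(max(max(row) for row in m) for m in bands)
    return [
        [tuple(log_shade(m[o][d], max_flow) for m in bands) for d in order]
        for o in order
    ]
```

With a maximum flow of 1000, a flow of a single migrant still receives roughly a tenth of full brightness under this scaling, whereas on a linear scale it would be indistinguishable from zero.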
Figure 3 legend (colour swatches not reproduced): colour key by age group (Age 16-25, mix, Age 25-40, mix, Age 40+, All ages) and by flow volume (0, ~1-50, ~50-1000).

Figure 3. Pixel visualisation of UK Local Authority migration flows for the years 2000-2001, with logarithmic scaling. The features labelled a-f in white are discussed in Section 3.

4. Visualisation of UK housing market cross-correlations

Figure 4 shows a similar plot for cross-correlations in the housing market. Cross-correlation for an origin A and destination B is defined as the coefficient of correlation between house price increases at A and price increases at B over the following 200 days. The pattern exhibited is remarkably different to the pattern of migration shown in Figure 3. Two key features of the structure of the housing market as presented in this plot are:

- Large blocks of red along the diagonal axis, indicating large areas with strongly correlated house price time series;
- Numerous horizontal and vertical lines of similar-coloured pixels, indicating the existence of certain places which tend to drive the market more strongly than others, or are driven by the market more strongly than others.

Figure 4 legend (colour swatches not reproduced): correlation greater than average; correlation less than average.

Figure 4. Pixel visualisation of England & Wales house price time series cross-correlations, 2000-2006.

5. Conclusions

The features seen in these visualisations should be taken not as complete deductions, but as hypotheses for further, more rigorous deductive testing. In the vein of Exploratory Data Analysis, the primary value of these plots is in hypothesis generation. The patterns seen are also heavily dependent on the order of pixels in the matrix; the absence of a visible pattern should therefore not be taken to imply the absence of a feature in the data.

A key limitation of the technique is that any linearization of 2-d space will necessarily not be a perfect representation of that space.
Thus, some features seen in the plot are artefacts of linearization rather than real features of the data. The features marked d and e in Figure 3 are an example of this. For comparison, Figure 5 shows a visualisation of inter-LA distances for England and Wales. It can be seen that the majority of points separated by <50 km are close to the diagonal; however, a few problematic regions exist.

The primary justification of the plots presented here is that the author has found them useful in the analysis of large data sets, in the development of models of the data and in the debugging of related software. While such displays of information require a certain amount of practice to read effectively, this effort enables quick viewing of more patterns in the data than would be discernible by most existing techniques.

Figure 5. Visualisation of England/Wales inter-LA distance, illustrating the quantity and nature of linearization artefacts endemic to the technique.

6. Acknowledgements

The author would like to thank the Landmark Information Group Ltd for providing the house price data used in this paper.

References

Bar-Joseph et al. (2003), K-ary clustering with optimal leaf ordering for gene expression data, Bioinformatics 19(9), 1070-1078.

Bertin, J. (1984), Semiology of Graphics.

Guo, D. (2007), Visual analytics of spatial interaction patterns for pandemic decision support, International Journal of Geographical Information Science 21(8).

Guo, D. & Gahegan, M. (2006), Spatial ordering and encoding for geographic data mining and visualisation, Journal of Intelligent Information Systems 27, 243-266.

Hall, P. (2001), Christaller for a global age: Redrawing the urban hierarchy, in Stadt und Region: Dynamik von Lebenswelten, Tagungsbericht und wissenschaftliche Abhandlungen, 53. Deutscher Geographentag Leipzig. URL: http://www.lboro.ac.uk/gawc/rb/rb59.html

Keim, D. A. (1996), Pixel-oriented database visualizations, SIGMOD Record 25(4).

Keim, D., Hao, M. C., Ladisch, J., Hsu, M. & Dayal, U. (2001), Pixel bar charts: a new technique for visualizing large multi-attribute data sets without aggregation, in Proceedings of the IEEE Symposium on Information Visualization 2001 (INFOVIS'01).

Marble, D., Guo, Z., Liu, L. & Saunders, J. (1997), Recent advances in the exploratory analysis of interregional flows in space and time, in Z. Kemp, ed., Innovations in GIS 4, Taylor and Francis.

Openshaw, S., ed. (1995), Census Users' Handbook.

Rae, A. (2009), From spatial interaction data to spatial interaction information? Geovisualisation and spatial structures of migration from the 2001 census, Computers, Environment and Urban Systems. doi:10.1016/j.compenvurbsys.2009.01.007.

Tukey, J. W. (1977), Exploratory Data Analysis, Addison Wesley.

Wood, J., Dykes, J., Slingsby, A. & Radburn, R. (2009), Flow trees for exploring spatial trajectories, in D. Fairbairn, ed., Proceedings of the GIS Research UK 17th Annual Conference, pp. 229-234, University of Durham, Durham, UK.

Yan, J. & Thill, J.-C. (2009), Visual data mining in spatial interaction analysis with self-organizing maps, Environment and Planning B 36.

Biography

Crispin Cooper began his research career in the Intelligent Systems group at the University of York. He is now completing his thesis on advanced numerical analysis of census and house price data at the Department of City and Regional Planning, Cardiff. His first degree is in Computer Science (Cambridge, 2002).
