Visualisation of UK Census and housing market interaction data
Crispin Cooper
Department of City and Regional Planning, King Edward VII Avenue, Cardiff University
Tel. +44 (0)29 208 74022
[email protected]
KEYWORDS: visualisation, census, interaction, house prices, migration
1. Introduction
This paper presents two examples of the application of a technique useful for the visualisation of
interaction matrices, allowing rapid, broad comprehension of the contents of complex datasets. The
examples presented are UK Census migration data from the year 2000-2001, and inter-regional house
price cross-correlations derived from Land Registry data, both at Local Authority level.
The traditional method of displaying such data would be with a flow map (Bertin 1984, page 350)
although more modern techniques have also been used to calculate flow density (Rae 2009). In the
first case, however, it is necessary to threshold the data (removing smaller flows of migrants) to
ensure readability of the output, and both approaches miss patterns in the smaller flows of migrants
which may be of interest to researchers.
By way of contrast, the pixel matrix plots presented here use data with a log function applied, which
actually emphasizes smaller flows instead of trying to remove them from the plot. Pixel matrix plots
lie in the tradition of Exploratory Data Analysis as advocated by Tukey (1977). While Bertin (1984)
recommends displaying interactions in a matrix, and reordering matrix rows and columns to achieve
greater readability, his methods of diagonalization and triangulation would produce different
orderings based on the data values in different sets, all the while discarding spatial information.
Instead, the approach taken is similar to Marble (1997) albeit with a more sophisticated technique for
ordering matrix rows and columns (Guo & Gahegan 2006, Guo 2007), and the inclusion of pixel
methods (Keim 1996, 2001).
Alternative approaches to visualisation of census flow data are discussed in Openshaw (1995), Kwan
(2000) and Yan (2009), the latter of which uses a self organizing map to classify different types of
interaction.
2. Methodology
An example of pixelation is shown in Figure 1. Cells of an interaction matrix representing flows
between five locations, a-e, are shaded according to their values. It is then possible, without
discarding information, to reduce the size of the graphic until each data point is represented by only
one pixel.
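As a minimal sketch of the shading step, the following Python maps each cell of a small interaction matrix to a grey level on a logarithmic scale, so that small flows remain visible next to large ones. The function name, the flow values and the use of log(1 + flow) to keep zero flows defined are assumptions made for illustration, not details taken from the software described in this paper.

```python
import math

def pixel_shades(matrix):
    """Map each flow to a grey level 0-255 using log(1 + flow), so zero flows stay black."""
    peak = max(max(row) for row in matrix)
    scale = math.log1p(peak) or 1.0  # guard against an all-zero matrix
    return [[round(255 * math.log1p(v) / scale) for v in row] for row in matrix]

# Hypothetical flows between five locations a-e (rows: origins, columns: destinations).
flows = [
    [0, 5, 1000, 2, 0],
    [8, 0, 40, 0, 1],
    [900, 30, 0, 12, 3],
    [1, 0, 15, 0, 60],
    [0, 2, 4, 70, 0],
]
shades = pixel_shades(flows)
for row in shades:
    print(" ".join(f"{s:3d}" for s in row))
```

With linear scaling a flow of 5 against a peak of 1000 would be almost indistinguishable from zero; on the log scale it receives roughly a quarter of the available brightness.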
The problem with this approach alone is that when a large number of origins and destinations are
used, it is not easy to comprehend the plot because the X and Y axes don't represent anything real. It
is better if a more intuitive ordering of the axes can be derived. The ordering aimed for is one in
which places that are physically close together in 2-d space are close together in the ordering, and
vice versa. Thus, in the sense of Bertin (1984), the matrix itself becomes a map: a graphic where “the
elements of a geographic component are arranged on a plane in the manner of their observed
geographic order on the surface of the Earth” (page 285).
Figure 1. Pixelation applied to a simple interaction data set.
The method chosen to achieve the desired ordering is that proposed by Guo and Gahegan (2006), in
which a variety of different algorithms are investigated for their ability to fulfil the criteria described
above. The best of the algorithms investigated, CLO-OPT, was first developed in the field of
bioinformatics (Bar-Joseph et al. 2003). In particular it outperforms the space filling curve techniques used
by Marble (1997). The algorithm works by first hierarchically clustering all locations in geographical
space, and then re-ordering the cluster tree to find the shortest path that visits all points. This has the
effect that (i) urban areas formed by dense clusters of points tend to be kept together, and (ii) shortest
path calculation is computationally feasible, at O(n^3) rather than O(n!) as is the general case without
such a constraint.
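The two stages can be illustrated with a deliberately simplified Python sketch. It is not CLO-OPT: the clustering uses average linkage (an assumption), and the re-ordering step tries every combination of child flips in the cluster tree exhaustively rather than using the dynamic programming of Bar-Joseph et al., so it is only practical for a handful of points. All function names and coordinates are hypothetical.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def cluster_tree(points):
    """Average-linkage agglomerative clustering; returns nested pairs of point indices."""
    clusters = [(i, [i]) for i in range(len(points))]  # (subtree, member indices)
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ma, mb = clusters[a][1], clusters[b][1]
                d = sum(dist(points[i], points[j]) for i in ma for j in mb)
                d /= len(ma) * len(mb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][0]

def orderings(tree):
    """Yield every leaf order reachable by swapping the two children of any node."""
    if isinstance(tree, int):
        yield [tree]
        return
    left, right = tree
    for l in orderings(left):
        for r in orderings(right):
            yield l + r
            yield r + l

def path_length(order, points):
    return sum(dist(points[a], points[b]) for a, b in zip(order, order[1:]))

def best_order(points):
    """Shortest path through all points that keeps every cluster contiguous."""
    tree = cluster_tree(points)
    return min(orderings(tree), key=lambda o: path_length(o, points))

# Two hypothetical "urban areas": indices 0, 2, 4 near x=0 and indices 1, 3 near x=10.
points = [(0, 0), (10, 0), (1, 0), (11, 0), (0.5, 0)]
order = best_order(points)
print(order, path_length(order, points))
```

Because only orderings consistent with the cluster tree are considered, the members of each cluster always occupy adjacent positions in the result, which is the property that keeps dense urban areas together.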
A related re-ordering method is that of Wood (2009), which approximates a map whereby each
originating region is itself replaced by a miniature map of the entire data set, showing the destinations
of the flows originating from that region. A key difference from that method is symmetry: while
Wood’s method encourages contemplation of flows as properties either of their origins or
destinations, the visualisations presented here treat both symmetrically, thus emphasizing the
structure of the interaction matrix itself (albeit at the cost of a less intuitive pixel ordering). Thus,
each method will have a tendency to emphasize different patterns in the data.
Interactive software has been developed to assist reading of the plots presented here; an example
screenshot is shown in Figure 2. This also illustrates the ordering of UK Local Authorities chosen by
the algorithm.
Figure 2. Screenshot of interactive software developed to assist exploration of pixel matrix plots.
The map to the right hand side displays the linear ordering of points in 2-d space, with the current
origin and destination highlighted on the map by red and green circles and on the matrix by lines of
the corresponding colour. The data displayed are commuting flows from the 2001 Census.
3. Visualisation of intra-UK migration flows
Figure 3 shows the resulting visualisation of intra-UK migration. The ordering presented is the same
as that shown in Figure 2. Several patterns in this plot are worthy of discussion.
• The grouping of the lightest pixels towards the diagonal of the image shows that the vast
  majority of migrations take place on a local basis.
• The fact that the remainder of the plot appears to consist mainly of vertical and horizontal
  lines shows that for non-local migrations, distance is not a deciding factor; rather it is the
  inherent repulsiveness of origins and attractiveness of destinations that determine migration
  flows.
• The four yellow squares of feature a represent London, for which a lifecycle migration
  pattern is visible. Thick orange lines, extending horizontally outwards from region a,
  indicate a flow of younger people from all over the country migrating into the capital. Fainter
  green lines extending vertically out of region a indicate the middle-aged leaving London,
  while the strong blue patch below (marked b) shows an older population migrating from
  London to East Anglia and surrounding areas. The yellow colour of London itself shows
  intra-London migration to be dominated by 16-40 year olds.
• The City of London is clearly visible as a black cross centred in the lower right yellow
  square. This is because it has little residential population and therefore very few migrations
  to and from the City occur.
• Urban polycentricity is visible in London. This is as defined by Hall (2001), who notes that
  rather than having a single centre which exceeds all other parts of the region in its provision
  of products and services, London is “now the centre of a system of some 30-40 centres”. The
  pixel matrix plot visualisation arguably shows that Greater London is polycentric in terms of
  migration movements, because no discernible internal structure is visible within the yellow
  squares of region a. This is in contrast to other areas in the UK, for example region c, which
  represents South Wales, with the bright internal cross shape representing Cardiff, which is
  clearly a monocentric keystone of interaction for the region.
• The feature marked f represents Northern Ireland, easily identifiable because of its strong
  internal structure but having little interaction with the rest of the UK.
[Figure 3 legend: colour columns for Age 16-25, (mix), Age 25-40, (mix), Age 40+ and All ages; brightness rows for flow volumes of 0, ~1-50 and ~50-1000.]
Figure 3. Pixel visualisation of UK Local Authority migration flows for the years 2000-2001, with
logarithmic scaling. The features labelled a-f in white are discussed in Section 3.
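One plausible reading of the legend and of the colours discussed in Section 3 is that the three age groups drive the red, green and blue channels of each pixel respectively, each with logarithmic scaling. The sketch below illustrates that reading only; the channel assignment, the function name and the peak value of 1000 are assumptions inferred from the legend, not a confirmed description of how Figure 3 was produced.

```python
import math

def age_mix_colour(age_16_25, age_25_40, age_40_plus, peak=1000):
    """Return an (R, G, B) pixel from three age-specific flows, log-scaled per channel.

    Assumed channel assignment: red = 16-25, green = 25-40, blue = 40+.
    """
    def channel(flow):
        return min(255, round(255 * math.log1p(flow) / math.log1p(peak)))
    return (channel(age_16_25), channel(age_25_40), channel(age_40_plus))

# Hypothetical flows: equal 16-25 and 25-40 volumes with no older migrants give yellow.
print(age_mix_colour(200, 200, 0))
# A flow made up only of the 40+ group gives a pure blue pixel.
print(age_mix_colour(0, 0, 300))
```

Under this encoding, a mixture of the two younger groups appears yellow and a flow dominated by the oldest group appears blue, which is consistent with the descriptions of features a and b above.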
4. Visualisation of UK housing market cross-correlations
Figure 4 shows a similar plot for cross-correlations in the housing market. Cross-correlation for an
origin A and destination B is defined as the coefficient of correlation between house price increases at
A, and price increases at B over the following 200 days. The pattern exhibited is remarkably different
to the pattern of migration shown in Figure 3. Two key features of the structure of the housing
market as presented in this plot are:
• Large blocks of red along the diagonal axis, indicating large areas with strongly correlated
  house price time series;
• Numerous horizontal and vertical lines of similar-coloured pixels, indicating the existence of
  certain places which tend to drive the market more strongly than others, or are driven by the
  market more strongly than others.
[Figure 4 legend: colour indicates correlation, from greater than average to less than average.]
Figure 4. Pixel visualisation of England & Wales house price time series cross-correlations, 2000-2006.
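The cross-correlation defined in this section can be sketched as a lagged Pearson correlation between price increases. The version below assumes both price series are sampled at the same regular interval and expresses the lag as a number of samples rather than days; how the original analysis aggregated increases over the 200-day window is not specified here, so this is an illustrative simplification with hypothetical names and data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def increases(prices):
    """Period-to-period price changes."""
    return [b - a for a, b in zip(prices, prices[1:])]

def lagged_cross_correlation(prices_a, prices_b, lag):
    """Correlate increases at origin A with increases at destination B `lag` samples later."""
    da, db = increases(prices_a), increases(prices_b)
    return pearson(da[:len(da) - lag], db[lag:])

# Hypothetical series in which B repeats A's price movements one sample later.
prices_a = [100, 101, 103, 102, 106, 107, 111]
prices_b = [99] + prices_a[:-1]
print(lagged_cross_correlation(prices_a, prices_b, lag=1))
print(lagged_cross_correlation(prices_b, prices_a, lag=1))
```

Swapping the roles of A and B generally gives a different value, so the resulting matrix is not symmetric; this asymmetry is what allows some places to appear as drivers of the market and others as driven by it.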
5. Conclusions
It should be noted that the data features seen in these visualisations should not be taken as complete
deductions, but as hypotheses for further (more rigorous) deductive testing. In the vein of
Exploratory Data Analysis, the primary value of these plots is in hypothesis generation. It should also
be noted that the patterns seen are heavily dependent on the order of pixels in the matrix; therefore the
absence of a visible pattern should not imply the absence of a feature in the data.
A key limitation of the technique is that any linearization of 2-d space will necessarily not be a
perfect representation of that space. Thus, some features seen in the plot are artefacts of linearization
rather than real features of the data. The features marked d and e in Figure 3 are an example of this.
For comparison, Figure 5 shows a visualisation of inter-LA distances for England and Wales. It can
be seen that the majority of points separated by <50km are close to the diagonal; however, a few
problematic regions exist.
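A simple check in the spirit of Figure 5 is to list the pairs of locations that are geographically close but far apart in the linear ordering. The following sketch assumes planar coordinates in kilometres; the function name, thresholds and coordinates are hypothetical.

```python
import math

def linearization_artefacts(points, order, max_km, max_gap):
    """Return index pairs closer than max_km whose positions in `order` differ by more than max_gap."""
    position = {idx: pos for pos, idx in enumerate(order)}
    artefacts = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            close = math.dist(points[i], points[j]) < max_km
            if close and abs(position[i] - position[j]) > max_gap:
                artefacts.append((i, j))
    return artefacts

# Six hypothetical locations on a 3 x 2 grid, ordered as a path that doubles back on itself.
points = [(0, 0), (10, 0), (20, 0), (20, 10), (10, 10), (0, 10)]
order = [0, 1, 2, 3, 4, 5]
print(linearization_artefacts(points, order, max_km=12, max_gap=2))
```

Even this small example yields neighbouring pairs that end up far apart in the ordering, which illustrates why no linear ordering of 2-d space can place every pair of nearby points near the diagonal.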
The primary justification of the plots presented here is that the author has found them useful in the
analysis of large data sets, in the development of models of the data and in the debugging of related
software. While such displays of information require a certain amount of practice to read effectively,
this effort enables quick viewing of more patterns in the data than would be discernible by most
existing techniques.
Figure 5. Visualisation of England/Wales inter-LA distance, illustrating the quantity and
nature of linearization artefacts endemic to the technique.
6. Acknowledgements
The author would like to thank the Landmark Information Group Ltd for providing the house price
data used in this paper.
References
Bar-Joseph et al. (2003), K-ary clustering with optimal leaf ordering for gene expression data,
Bioinformatics 19(9), 1070-1078.
Bertin, J. (1984), Semiology of Graphics.
Guo, D. (2007), Visual analytics of spatial interaction patterns for pandemic decision support,
International Journal of Geographical Information Science 21(8).
Guo, D. & Gahegan, M. (2006), Spatial ordering and encoding for geographic data mining and
visualisation, Journal of Intelligent Information Systems 27, 243-266.
Hall, P. (2001), Christaller for a global age: Redrawing the urban hierarchy, in Stadt und Region:
Dynamik von Lebenswelten, Tagungsbericht und wissenschaftliche Abhandlungen, 53. Deutscher
Geographentag Leipzig. URL: http://www.lboro.ac.uk/gawc/rb/rb59.html
Keim, D. A. (1996), Pixel-oriented database visualizations, SIGMOD record 25(4).
Keim, D., Hao, M. C., Ladisch, J., Hsu, M. & Dayal, U. (2001), Pixel bar charts: a new technique for
visualizing large multi-attribute data sets without aggregation, in Proceedings of the IEEE Symposium
on Information Visualization 2001 (INFOVIS'01).
Marble, D., Guo, Z., Liu, L. & Saunders, J. (1997), Recent advances in the exploratory analysis of
interregional flows in space and time, in Z. Kemp, ed., Innovations in GIS 4, Taylor and Francis.
Openshaw, S., ed. (1995), Census Users' Handbook.
Rae, A. (2009), From spatial interaction data to spatial interaction information? Geovisualisation and
spatial structures of migration from the 2001 census, Computers, Environment and Urban Systems.
doi:10.1016/j.compenvurbsys.2009.01.007.
Tukey, J. W. (1977), Exploratory Data Analysis, Addison Wesley.
Wood, J., Dykes, J., Slingsby, A. and Radburn, R. (2009), Flow trees for exploring spatial
trajectories, in Fairbairn, D., Ed., Proceedings of the GIS Research UK 17th Annual Conference, pp.
229-234, University of Durham, Durham, UK.
Yan, J. & Thill, J.-C. (2009), Visual data mining in spatial interaction analysis with self-organizing
maps, Environment and Planning B 36.
Biography
Crispin Cooper began his research career in the Intelligent Systems group at the University of York.
He is now completing his thesis on advanced numerical analysis of census and house price data at the
department of City and Regional Planning, Cardiff. His first degree is in Computer Science
(Cambridge, 2002).