Quantitative Representations of Place
Florian A. Twaroch, Mark M. Hall and Christopher B. Jones
Cardiff University, School of Computer Science
{f.a.twaroch;m.m.hall;[email protected]}
Every day, people refer to places using names. Although people use the same names for certain places, they often disagree about the spatial extent of the named places. This is because individuals can represent places in many different ways, and each representation is the result of perceptual and cognitive processes.
Spatial language concerned with real-world features and with the relationships between them is essentially vague. While this vagueness is managed quite effectively in natural language communication between people, there are currently only very limited facilities for interpreting such language when it is used to communicate with computers. For the purpose of spatial search engines, we need formal models of place and spatial relationships that can mimic the cognitive processes used to represent place. We would like to be able to derive different representations of place depending on the user's context and experience with a place. To investigate formal models of place, we have to start by studying a number of research sub-questions that are involved in creating formal representations of cognitive processes; we discuss these in this position paper.
Our aim is to obtain formal representations that suit a large number of people in a similar context who have a similar level of experience. To meet users' needs in a public information system, we need to understand how individuals quantify spatial relationships, and we face the challenge of acquiring knowledge of the intended spatial interpretation of vernacular place names.
More recently it has become apparent that the Web itself is a valuable source of such knowledge. It has been observed that Web pages that include a vernacular place name often include the names of other places and associated spatial relationships. In what follows we will review two lines of research that exploit data retrieved from the Web to investigate the geographic concept of 'place' and associated spatial relationships.
Vernacular Place Names
We define vernacular place names as names that are commonly used to refer to places and that reflect a common perception of the spatial extent associated with them. In many cases, the name may not correspond to an officially designated region or place. Examples are the South of France, the English Midlands and the American Midwest.
The names people use often differ from administrative names and therefore cannot be found in gazetteers, yet gazetteers are currently the main source of knowledge for information services. We therefore investigate ways first to collect such names and second to model their spatial extent.
Earlier techniques for acquiring vernacular place name knowledge are labour intensive (Montello et al. 2003). More recent work has aimed to automate, to some degree, the process of boundary construction. Automated definitions of vague regions have been based on, among other things:
● census and socio-economic data (Thurstain-Goodwin and Unwin 2000)
● morphometric classes and multiscale analysis of digital terrain models (Fisher 2004)
● Web-based map tools (Evans and Waters 2007)
● parsing and geotagging of websites utilizing trigger phrases (Purves et al. 2005, Jones et al. 2008)
We have been exploiting a few individual Web resources that contain numerous georeferenced place names (in Cardiff, UK) that relate to business entities and to other private or community services and facilities (Twaroch et al. 2008). One of these sources, Google Maps, enables retrieval of the coordinates of businesses, georeferenced by their address, and of other “community” entities with vernacular place names that have been georeferenced in Google Earth. Another source, the Gumtree website, enables retrieval (Web scraping) of the georeferences of advertised services for which a place name has been provided.
Our work is based on a model that uses kernel density estimators (KDE). A KDE determines a weighted average of the data points within a moving window centred, in turn, on each point p of a grid. The KDE method provides a tool to determine the shape of the probability distribution from the given set of data points, assuming that the collected data points are drawn from the estimated distribution. Each point on the density surface represents the relative likelihood that a further occurring point is part of the region. The method is similar to fuzzy region approaches. Thresholding these models at different levels yields footprints with certain confidence values, expressing degrees of familiarity with the modelled region. Based on the evaluation of the retrieved data sets, we classify place names into three types:
1. Place names whose commonly perceived extent coincides with the administrative definition.
2. Place names whose commonly perceived extent does not coincide with the existing administrative definition.
3. Place names that exist in people’s minds but not in the administrative geography.
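The kernel density estimation and thresholding steps described above can be illustrated with a minimal sketch. It assumes a Gaussian kernel, a fixed bandwidth, planar coordinates and a footprint defined as a fraction of the peak density; none of these choices is specified in the text, and the function names `gaussian_kde_grid` and `footprint` as well as the sample points are purely illustrative.

```python
import math


def gaussian_kde_grid(points, grid_xs, grid_ys, bandwidth):
    """Evaluate a 2D Gaussian kernel density estimate on a regular grid.

    Assumption: an isotropic Gaussian kernel with a fixed bandwidth.
    """
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    surface = []
    for gy in grid_ys:
        row = []
        for gx in grid_xs:
            total = 0.0
            for px, py in points:
                d2 = (gx - px) ** 2 + (gy - py) ** 2
                total += math.exp(-d2 / (2.0 * bandwidth ** 2))
            row.append(norm * total)
        surface.append(row)
    return surface


def footprint(surface, fraction):
    """Mark grid cells whose density is at least `fraction` of the peak.

    Assumption: the threshold is expressed relative to the peak density.
    """
    peak = max(max(row) for row in surface)
    return [[value >= fraction * peak for value in row] for row in surface]


# Hypothetical georeferenced points: a cluster near (0, 0) and one outlier.
points = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.1), (0.0, -0.1), (2.0, 2.0)]
axis = [i * 0.5 for i in range(-4, 7)]  # grid coordinates from -2.0 to 3.0
surface = gaussian_kde_grid(points, axis, axis, bandwidth=0.3)
mask = footprint(surface, fraction=0.5)
```

Lowering `fraction` widens the footprint, so that a single isolated point such as the outlier at (2, 2) is only included at low thresholds.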
Based on Ordnance Survey Data © Crown Copyright 2008
Figure 1. Vernacular and administrative definition coincide (Plasnewydd – ward in Cardiff)
Figure 1 shows an example where the derived region coincides with the administrative geography. All points derived from Google community maps are within the boundary of the administrative definition, suggesting that people’s use of the place name coincides with its administrative definition. The name does not seem very popular as neither Gumtree nor Google business queries yield enough data points to derive further representations.
Mining data for the neighbouring region “Roath” reveals that people’s cognition of “Roath” differs significantly from the administrative geography (see Figure 2). Data from Gumtree even suggest that Plasnewydd, the place discussed above, is subsumed by “Roath” in people’s minds. A possible explanation for this result is that “Roath” is a popular area where students and families with children live. A number of Web documents that promote real estate would therefore refer more often to “Roath” than to less popular adjacent areas. Future research will investigate such effects by mining and analysing further information from the Web sources, such as the author’s identity, the intention of the description and the age of the data source.
[Map labels: Roath, Plasnewydd, Cathays, Splott]
Based on Ordnance Survey Data © Crown Copyright 2008
Figure 2. Vernacular and administrative definition do not coincide (Roath – ward in Cardiff)
We found that the relatively high scatter of points observed above for Roath applied in general to data derived from business maps. The regions represented by business data differed considerably in size from regions derived solely from community-driven data. Further research is necessary here, and other data sources such as Yellow Pages data and place name data derived through Web questionnaires should be considered. Sound methods to combine data from different sources will then be required.
Recently we compared representations derived from Web data with the results of a human-subject study based on two UK regions: Sheffield City Centre (an urban area) and The Midlands, a larger imprecise region (Twaroch et al. under review). The main finding of this study was that the shape of the region produced seems to depend on the source of the data: community-based content reflected the boundary more faithfully than data mined from directory listings. This suggests that multiple sources could be used to generate various types of boundary, rather than relying on a single source. This complements previous work in this area, which has typically studied only single sources.
Quantifying Spatial Relations to Describe Places
Not every possible location has a place name associated with it. To refer to such a location, it must therefore be related to a place that does have a name. This is achieved through the use of spatial prepositions, which construct 2- to 5-ary relations between the located object (figure) and the reference objects (ground). Spatial prepositions can be chained together to create spatial phrases like “Sheep in a field near Stackpole Head”.
An approach that is purely focused on place names would only be able to locate “Stackpole Head” and could not provide any more detailed location information. Our aim is to gain an understanding of how people use spatial prepositions, what geometric extents are associated with them and then use this to extend our understanding and model of locational phrases to those including spatial prepositions. To achieve this a quantitative model for the spatial prepositions needs to be developed and grounded in data generated from actual human use of the prepositions.
One Web data source that we have mined to achieve an initial model is the database of image captions provided by the Geograph project (http://www.geograph.org.uk). The Geograph project's aim is to provide representative photographs for every square kilometre of the UK and Ireland. From a spatial language point of view the dataset is ideal, as the goal of providing representative photographs gives the photographic subjects a clear focus and also leads to photo captions that contain more spatial language.
We used natural language processing to find spatial prepositions and toponyms in the image captions. Phrases matching the pattern “<something> <spatial preposition> <toponym>” were then identified, with each matching phrase forming one valid use of the spatial preposition. The toponyms were localised using the Geonames.org (http://www.geonames.org) gazetteer, and the distance and angle from the toponym to the GPS coordinates stored for the image were then calculated. This resulted in lists of distance/angle usage pairs for each spatial preposition. To ensure that scale issues did not distort the results too much, only those toponyms referring to populated places were used.
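The distance/angle calculation described above can be sketched as follows. This is only an illustration: it assumes latitude/longitude pairs in decimal degrees, a spherical Earth with a mean radius of 6371 km, the haversine formula for distance and the initial great-circle bearing for the angle. The text does not state which formulas were actually used, and the function name `distance_and_bearing` and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)


def distance_and_bearing(ground, figure):
    """Return (distance_km, bearing_deg) from the ground toponym to the figure.

    Both arguments are (latitude, longitude) pairs in decimal degrees.
    The bearing is measured clockwise from north, in the range [0, 360).
    """
    lat1, lon1 = map(math.radians, ground)
    lat2, lon2 = map(math.radians, figure)
    dlat = lat2 - lat1
    dlon = lon2 - lon1

    # Haversine great-circle distance.
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    distance = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

    # Initial bearing from the ground location towards the figure location.
    y = math.sin(dlon) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return distance, bearing


# Hypothetical example: an image taken 0.01 degrees of latitude north of a toponym.
toponym = (51.0, -3.0)
image = (51.01, -3.0)
dist_km, angle_deg = distance_and_bearing(toponym, image)
```

Collecting one such pair per matched caption yields, for each spatial preposition, the list of distance/angle usages referred to above.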
The most interesting initial result is that the distances used with all spatial prepositions are not very large, with most distances being less than 5 km. The most probable cause is that, when describing an image location, there are multiple candidate toponyms that could be used, and it seems that within the UK there is always a place within 5 km that can serve as the ground toponym. Other results are that only the four main cardinal directions are used frequently and that the angles involved are not evenly split across the whole 360° arc, but show a slight shift towards north. A similar effect can be seen for the preposition “near”, where there is a preference for picking a ground toponym south of the location being described. One possible explanation is that, when seen on a map, the toponym lies underneath the described location and thus provides a kind of “virtual support”.
Figure 3. Angle histogram for the spatial preposition “near”
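An angle histogram of this kind can be sketched as a simple binning of bearings into compass sectors. The sketch assumes bearings in degrees clockwise from north and sectors centred on the compass directions; the number of bins, the function name `angle_histogram` and the sample values are illustrative and not taken from the study.

```python
def angle_histogram(bearings, bins=8):
    """Count bearings (degrees clockwise from north) per compass sector.

    Sector 0 is centred on north; with bins=8 the sectors are
    N, NE, E, SE, S, SW, W, NW in that order.
    """
    width = 360.0 / bins
    counts = [0] * bins
    for bearing in bearings:
        # Shift by half a sector so that sector 0 straddles 0 degrees.
        index = int(((bearing % 360.0) + width / 2) // width) % bins
        counts[index] += 1
    return counts


# Hypothetical bearings for one spatial preposition.
sample_bearings = [350.0, 5.0, 10.0, 90.0, 180.0, 185.0, 270.0]
sector_counts = angle_histogram(sample_bearings)
```

A skew such as the shift towards north described above would show up as unequal counts between opposite sectors.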
We are working on a field-based model for representing the extents of spatial prepositions and for deriving crisp representations from the continuous field representation. This will allow complex reasoning about, and combination of, multiple spatial prepositions, while maintaining compatibility with existing crisp GIS methods and applications.
Figure 4. Point cloud for “north of” and smoothed field representation
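As a rough illustration of how continuous fields might be combined and then made crisp, the following sketch treats each field as a grid of membership values in [0, 1], combines two fields with a cell-wise minimum (a fuzzy intersection) and extracts the cells above a threshold. Both the minimum operator and the threshold cut are assumptions made for this example; the field-based model itself is still under development, and the grids and function names below are hypothetical.

```python
def combine_fields(field_a, field_b):
    """Cell-wise minimum of two fields defined on the same grid.

    Assumption: the minimum acts as a fuzzy intersection of the two
    prepositions' applicability values.
    """
    return [
        [min(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(field_a, field_b)
    ]


def crisp_cells(field, level):
    """Return the set of (row, column) cells whose value is at least `level`."""
    return {
        (r, c)
        for r, row in enumerate(field)
        for c, value in enumerate(row)
        if value >= level
    }


# Hypothetical 3x3 fields around a ground toponym in the centre cell;
# row 0 is the northernmost row of the grid.
north_of = [
    [0.9, 1.0, 0.9],
    [0.3, 0.4, 0.3],
    [0.0, 0.0, 0.0],
]
near = [
    [0.5, 0.8, 0.5],
    [0.8, 1.0, 0.8],
    [0.5, 0.8, 0.5],
]
combined = combine_fields(north_of, near)
region = crisp_cells(combined, level=0.5)
```

The resulting cell set is a crisp region that could be handed to conventional GIS operations, while the fields themselves remain available for further combination.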
Conclusions
Perceptual and cognitive processes are involved when representations of place are constructed. This holds for models of the extent of a spatial region and also for the quantification of a spatial relationship. The Web provides evidence for spatial regions and spatial relationships, but a number of open questions have to be resolved in the future.
We will have to develop methods to compare different Web sources and to investigate the origin and intention of the place descriptions. To evaluate our data, we plan a large-scale Web questionnaire capturing contextual information and user experience for descriptions of places and of spatial relationships between places. Methods to collect place names and describe their spatial extent for non-populated locations will also have to be developed.
A number of alternative methods, such as qualitative models to represent vernacular regions, will be investigated. We are especially interested in cognitive models for the representation of vernacular regions. The complexity of the problem and the variety of factors that influence human cognition of place constitute a significant challenge for future work. The results can be expected to contribute to the improvement of geographic information retrieval systems and to a better understanding of people’s definition of place.
Acknowledgements
We would like to gratefully acknowledge contributors to Geograph British Isles (see http://www.geograph.org.uk/credits/2007-02-24), whose work is made available under the Creative Commons Attribution-ShareAlike 2.5 Licence (http://creativecommons.org/licenses/by-sa/2.5/). We thank Ordnance Survey for funding our research and supporting the project on representation of place for geographic information retrieval.
References
Evans, A. and T. Waters (2007): Mapping Vernacular Geography: Web-based GIS Tools for Capturing 'Fuzzy' or 'Vague' Entities. International Journal of Technology, Policy and Management, 7 (2): p. 134-150.
Fisher, P. (2004): Where is Helvellyn? Fuzziness of multi-scale landscape morphometry. Transactions of the Institute of British Geographers, 29 (1): p. 106-128.
Jones, C. B., et al. (2008): Modelling Vague Places with Knowledge from the Web. International Journal of Geographic Information Systems: p. in print.
Landau, B. and R. Jackendoff (1993): “What” and “Where” in Spatial Language and Spatial Cognition. Behavioral and Brain Sciences, 16 (2): p. 217-238.
Montello, D., et al. (2003): Where's Downtown?: Behavioural Methods for Determining Referents of Vague Spatial Queries. Spatial Cognition & Computation, 3 (2-3): p. 185-204.
Purves, R., P. Clough, and H. Joho (2005): Identifying imprecise regions for geographic information retrieval using the Web. In: GIS Research UK 13th Annual Conference.
Robinson, V. B. (2000): Individual and multipersonal fuzzy spatial relations acquired using human–machine interaction. Fuzzy Sets and Systems, 113 (1): p. 133-145.
Thurstain-Goodwin, M. and D. Unwin (2000): Defining and delineating the central areas of towns for statistical monitoring using continuous surface representations. Transactions in GIS: p. 305-317.
Twaroch, F. A., C. B. Jones, and A. I. Abdelmoty (2008): Acquisition of a Vernacular Gazetteer from Web Sources. LocWeb 2008: p. in print.