SPIN!, IST-99-10536, 15.06.1999
Part B
B1. Title. Spatial Mining for Data of Public Interest
SPIN!
Proposal No. IST-1999-10536
Proposal for: IST programme, 1.1.2-5.1.4 Cross-Programme Action CPA4: New Indicators and statistical methods
B3. OBJECTIVES
B4. CONTRIBUTION TO PROGRAMME/KEY ACTION OBJECTIVES
B5. INNOVATIONS
STATE OF THE ART
TECHNOLOGICAL & SCIENTIFIC ADVANCES
DISTRIBUTION OF WORKLOAD ON WORK PACKAGES
INTRODUCTION TO WORKPACKAGES
RISK MANAGEMENT
PERT DIAGRAM
WORK PACKAGE DESCRIPTION
C2. CONTENTS FOR PART C
C3. COMMUNITY ADDED VALUE AND CONTRIBUTION TO EU POLICIES
C4. CONTRIBUTION TO COMMUNITY SOCIAL OBJECTIVES
C5. PROJECT MANAGEMENT
C6. DESCRIPTION OF THE CONSORTIUM
C7. DESCRIPTION OF THE PARTICIPANTS
GMD - GERMAN NATIONAL RESEARCH CENTER FOR INFORMATION TECHNOLOGY
DEPARTMENT OF INFORMATICS OF THE UNIVERSITY OF BARI
SCHOOL OF GEOGRAPHY AT THE UNIVERSITY OF LEEDS
THE INSTITUTE FOR INFORMATION TRANSMISSION PROBLEMS, RUSSIAN ACADEMY OF SCIENCES (IITP RAS)
DIALOGIS SOFTWARE & SERVICES GMBH, ST. AUGUSTIN, GERMANY
PROFESSIONAL GEO SYSTEMS B.V. (PGS), AMSTERDAM
GEOFORSCHUNGSZENTRUM, POTSDAM, GERMANY DESCRIPTION OF THE PARTNER
MANCHESTER METROPOLITAN UNIVERSITY/MIMAS
C8. ECONOMIC DEVELOPMENT AND SCIENTIFIC AND TECHNOLOGICAL PROSPECTS
APPENDIX – PUBLICATIONS OF PARTNERS CITED IN PART B
REFERENCES PARTNER P1 – GMD
REFERENCES PARTNER P2 - UNIVERSITY OF BARI
REFERENCES PARTNER P3 – IITP, RUSSIAN ACADEMY OF SCIENCES
REFERENCES PARTNER 4 – LEEDS
REFERENCES PARTNER P5 – DIALOGIS
REFERENCES PARTNER P6 – PGS
B3. Objectives
To develop an integrated interactive internet-enabled spatial data mining system. Data mining
systems (DMS) and geographical information systems (GIS) are complementary tools for describing,
transforming, analysing and modelling data about real world systems. Most contemporary GIS
facilitate only very basic spatial analysis and data mining functionality and many are confined to
simplistic analysis that involves comparing maps or descriptive statistical displays like histograms and
pie charts. There is growing demand for integrated geographical or spatial data mining systems
(SDMS) from public and private sector organisations who need both enhanced decision making
capabilities and innovative solutions to a wide range of different problems. An integrated, user
friendly SDMS operable over the internet offers exciting new possibilities for all manner of
geographical research and spatial decision making. Thus the overall objective of SPIN! is to develop a
state of the art, fully functional, truly integrated, internet-enabled, easily extendable and modifiable
GIS-DMS platform, SPIN - a comprehensive and intuitive SDMS for data of public interest. In recent
years, a number of project partners have developed the technological components and scientific tools
that are needed to develop the kernel of this type of SDMS. During this project these individual
efforts and the associated expertise and experience will be united in a joint European effort. SPIN!
Consortium partners from statistical offices and seismic research centres will use the system in
applied research and provide feedback to direct the development efforts. The applications of SPIN
will clearly demonstrate the generic utility and additional benefits that this type of SDMS will have
over existing technologies. Industrial partners will develop a business model for web-based
information brokering with georeferenced statistical data, and estimate the likely economic impacts of
the technology. The following scenarios describe some of the wide ranging potential benefits that
statistical analysts, environmental decision makers, seismic data experts, biodiversity researchers and
other public and private sector users can expect from such a system and introduce some of the main
features that SPIN will include.
To improve knowledge discovery by providing an enhanced capability to visualise data mining
results in spatial temporal and attribute dimensions. Imagine a statistical officer has to prepare a
report describing unusual aspects of African demography inter-related with socio-economics and the
physical environment. Suppose initially the officer applies a data mining technique to classify all
countries based on death rate and life expectancy, and one subgroup with an unusually high
death rate and low life expectancy contains only 51 countries in all, 40 of them African. Suppose the
officer creates a statistical display of all the classified groups (Fig. 1) and then decides to map the
geographical distribution of the unusual subgroup distinguishing between African countries and those
elsewhere (Fig. 2). The geographical distribution of the subgroups shown by the map may initiate
ideas for further analysis. For instance, the analyst may wish to select sets of countries from the map
to take a closer look at their demography and other geographical variables that describe socio-economic and environmental conditions. In addition, the officer may wish to discover what
demographic attributes best characterise each continent at different points in time and investigate
which groups of demographic attributes have interesting spatio-temporal co-distributions and inter-relationships with other socio-economic and environmental variables. All the analysis, some of which
is quite complex, could clearly be performed more quickly and easily if an integrated SDMS with a linked
display component and reporting system were available for use. It would be a major benefit if the
maps and other data displays were automatically generated by a knowledge base of statistical display
and thematic data mapping and these were automatically linked so that information the officer is
focussing on during the analysis is simultaneously highlighted in all the relevant displays. This type of
linked GIS style display component will be developed as a fundamental part of the integrated
visualisation component of SPIN, which would facilitate this kind of statistical analysis (see partner
P1, publication 3).
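The linked displays described above can be pictured as a shared selection that every view listens to. The following is a minimal sketch under stated assumptions: the names `LinkedSelection`, `Display` and `build_linked_displays` are hypothetical and do not come from SPIN or any partner's software. It only illustrates how a selection made in one display could be highlighted in all the others at the same time.

```python
from typing import Callable, Iterable, List, Set, Tuple


class LinkedSelection:
    """Shared selection model: every registered display is told about changes."""

    def __init__(self) -> None:
        self._selected: Set[str] = set()
        self._listeners: List[Callable[[Set[str]], None]] = []

    def register(self, listener: Callable[[Set[str]], None]) -> None:
        """Add a callback that receives a copy of the current selection."""
        self._listeners.append(listener)

    def select(self, ids: Iterable[str]) -> None:
        """Replace the selection and notify all registered displays."""
        self._selected = set(ids)
        for listener in self._listeners:
            listener(set(self._selected))


class Display:
    """Stand-in for a map or chart; it only records what it should highlight."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.highlighted: Set[str] = set()

    def on_selection(self, ids: Set[str]) -> None:
        self.highlighted = ids


def build_linked_displays(names: Iterable[str]) -> Tuple[LinkedSelection, List[Display]]:
    """Create displays (hypothetical helper) that all listen to one selection."""
    selection = LinkedSelection()
    displays = [Display(name) for name in names]
    for display in displays:
        selection.register(display.on_selection)
    return selection, displays
```

Selecting a set of countries once, for example `selection.select(["Chad", "Mali"])`, then updates the map and the statistical display together, which is the behaviour the scenario asks for.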
Figure 1. Descriptions of interesting subgroups
Figure 2. Visualisation of the subgroup.
To develop new and integrated ways of revealing complex patterns in spatio-temporally
referenced data that were previously undiscovered using existing methods. Suppose an
environmental decision maker is asked to look for relations between lung cancer and environmental
pollution. What may be desired initially is some kind of exploratory spatial data analysis (ESDA)
technique that automatically detects unusual spatial clustering of lung cancer incidence in the entire
data set and for specific time periods. Additional spatial and aspatial analysis methods might then
be used to try to explain any unusual spatial clustering patterns observed using a range of other spatio-temporal and aspatio-temporal variables. In SPIN, exploratory spatio-temporal pattern analysis
techniques derived from existing ESDA tools will be integrated with a wide variety of temporal,
spatial and aspatial analysis methods. Partner P4 has developed a suite of ESDA tools that detect
unusual clusters of incidence and produce mappable output that reveals the clustering pattern.
Temporal versions of these tools and outputs will be developed along with the mechanisms for
exporting the results of the analysis into other temporal, spatial and aspatial data mining techniques.
Having all the tools available in one integrated SDMS would allow the decision maker to perform an
in-depth, spatio-temporal analysis quickly and thereby help develop understanding of the geographical
processes and inter-relationships that may result in an increased risk of contracting lung cancer. The
analytical speed up will allow the decision maker to generate and test more hypotheses regarding the
observed spatial, temporal and spatio-temporal patterns and to investigate even more advanced
hypotheses about causal relationships.
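To make the idea of automatically detecting unusual spatial clustering more concrete, here is a minimal sketch of one possible approach: scanning circles over the study region and flagging those with more cases than a Poisson model would expect. This is an illustrative assumption, not a description of Partner P4's tools. The function names, the tuple layout `(x, y, population, cases)` and the significance level are hypothetical, and no correction for multiple testing is applied.

```python
import math
from typing import List, Sequence, Tuple

Area = Tuple[float, float, int, int]  # (x, y, population, cases), assumed layout


def poisson_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def scan_circles(
    areas: Sequence[Area],
    centres: Sequence[Tuple[float, float]],
    radius: float,
    alpha: float = 0.01,
) -> List[Tuple[float, float, int, float, float]]:
    """Flag circles whose case count is unusually high given their population.

    Returns (centre_x, centre_y, cases, expected_cases, p_value) per flagged circle.
    """
    total_pop = sum(a[2] for a in areas)
    total_cases = sum(a[3] for a in areas)
    rate = total_cases / total_pop  # overall incidence rate
    flagged = []
    for cx, cy in centres:
        inside = [a for a in areas if math.hypot(a[0] - cx, a[1] - cy) <= radius]
        pop = sum(a[2] for a in inside)
        cases = sum(a[3] for a in inside)
        if pop == 0:
            continue
        expected = rate * pop
        p_value = poisson_tail(cases, expected)
        if p_value < alpha:
            flagged.append((cx, cy, cases, expected, p_value))
    return flagged
```

The flagged circles can be drawn on a map, and running the same scan on the records of a single time period gives a simple temporal variant.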
To enhance decision making capabilities by developing interactive GIS techniques, which
provide an integrated exploratory and statistical basis for investigating spatial patterns. Seismic
data experts regularly use GIS to help them spot geoenvironmental data patterns related to seismic
activity. However, the complexity of geoenvironmental processes and noise in the spatial patterns of
these variables makes it very difficult to objectively compare seismic maps with other
geoenvironmental maps and identify interesting patterns and relationships. To help reduce the
likelihood of becoming overly subjective, a seismologist may wish to initially classify and select
groups of areas with similar geoenvironmental characteristics and then perform statistical tests to
investigate general differences in localised distributions of selected areas belonging to the same
geoenvironmental group in the classification. An interactive version of SPIN will clearly aid the
seismologist in the process of classifying and selecting these areas and in performing the statistical tests.
By simplifying this analysis task, the user can focus on looking for interesting patterns and testing a
great number of alternative hypotheses.
To deepen the understanding of spatio-temporal patterns by visual simulation. Imagine a
biodiversity researcher wants to investigate the migratory flight route of a flock of storks travelling
from Europe to Africa. Suppose the researcher uses a global positioning system (GPS) to track the
progress of these birds and wishes to visually simulate the migration to provide an overview of the
migratory route, the speed of different parts of the journey and identify areas where the storks rested
along the way. SPIN will provide the capability to develop and play back this type of simulation over
the internet. The same technique can be applied in many other areas, for example, logistics companies
may want to use it to help keep track of orders and optimise transport routes or transport planners may
desire it to aid the development of integrated transport networks.
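The quantities named in this scenario (speed on each part of the journey and the places where the birds rested) can be derived directly from timestamped GPS fixes. The sketch below is illustrative only: the tuple layout `(hours, latitude, longitude)`, the function names and the 1 km/h resting threshold are assumptions, and distances use the haversine great-circle formula on a spherical Earth.

```python
import math
from typing import List, Sequence, Tuple

Fix = Tuple[float, float, float]  # (hours since start, latitude, longitude)

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def leg_speeds(track: Sequence[Fix]) -> List[float]:
    """Average speed in km/h for each leg between consecutive fixes."""
    speeds = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(track, track[1:]):
        hours = t1 - t0
        if hours <= 0:
            raise ValueError("fixes must be in strictly increasing time order")
        speeds.append(haversine_km(lat0, lon0, lat1, lon1) / hours)
    return speeds


def resting_legs(track: Sequence[Fix], max_kmh: float = 1.0) -> List[int]:
    """Indices of legs slow enough to be treated as rest stops."""
    return [i for i, speed in enumerate(leg_speeds(track)) if speed < max_kmh]
```

A playback tool could step through the fixes in time order and colour each leg by its speed, marking the legs returned by `resting_legs` as resting areas.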
To publish and disseminate geographical data mining services over the internet. Suppose the
various analysts described above (i.e. the statistical officer, the environmental decision maker, the
seismic data expert and the biodiversity researcher) want to distribute their results quickly and cost
effectively to encourage similar applications and promote world-wide scientific exchange of their
research. Furthermore, suppose they want to publish both the conclusions and the details of their
entire geographical data mining investigation so that other similar research can extend, generalise and
build on their analyses. Imagine also that these researchers want to enable others to access and use the
same analysis tools that were available to them. To realise all of this, they would probably need a
relatively automatic way to plug their specific application into a Java-based, internet-enabled SDMS.
This would then enable anyone with a standard web-browser to replicate and perform similar analyses
wherever and whenever desired (see partner P4, publications 2 and 9; partner P1, publication 1,2;
partner P3, publication 1,2). The proposed SDMS, SPIN, will provide this type of capability in an
integrated, organised fashion.
B4. Contribution to programme/key action objectives
The proposal contributes to the IST programme objective of building key, user-friendly applications
that enable the potential of the information society in several ways:



• Merging data mining and GIS based technology offers exciting new possibilities for spatial data
research that is applicable in a wide variety of problem domains. Much expert geographical
analysis has been restricted by prescribing in advance and exclusively following either a statistical
or a GIS based approach. When both approaches have been applied, error-prone and cumbersome
data transfer between different applications has been necessary. Nonetheless, useful information
has been extracted from georeferenced data much more effectively by employing both approaches
simultaneously. Clearly an integrated SPIN will facilitate such analysis and help to develop
understanding of a wide range of geographical processes faster, enhancing research and decision
making in diverse application areas.
• SPIN will provide a user friendly interface to advanced data mining functionality, GIS and
exploratory spatial data analysis tools that can be accessed via the internet.
• The system will enable quick and cost effective dissemination of information via the internet and
enhance web-based research capabilities.
The objective of nurturing emergent technologies is supported by the development of an innovative
business model. A web-based brokering service is proposed that is designed to add value to the
dissemination of data and information providing a key to the commercialisation of the software and
the service it facilitates.
The proposal contributes to CPA4 (New indicators and statistical methods) by developing new tools
for extracting information from data by adapting data mining functions specifically for spatial
analysis. This includes adapting methods from Bayesian statistics, machine learning and other
adaptive techniques so they can be launched from an integrated environment, which assists
experimental comparison of their relative strengths and weaknesses.
A further contribution to CPA4 derives from developing technology for the user-friendly
dissemination of statistical data. SPIN will enable the dissemination of interactive statistical maps and
provide data mining services over the internet, where the users need nothing but a standard web-browser such as Netscape or Internet Explorer. Many of the problems relevant to this use of SPIN will
be addressed in an application that aims to facilitate the analysis of census data over the internet. The
proposed web-based brokering service aims to go even further by enhancing the user-friendly and
cost-effective dissemination of data.
The proposed system will be generic and easily adaptable to diverse application areas and the research
is specifically relevant to the following key actions of the cross-programmatic action (CPA) of the
IST programme:





• Key Action I.4: Systems and services for citizen administration; systems enhancing the efficiency
and user-friendliness of administrations. This is addressed in work package WP9 by the
application to develop user friendly dissemination of statistical data.
• Key Action I.5: Intelligent environmental monitoring and management systems; environmental
risk and emergency management systems (in conjunction with hazards and earth observation).
These are addressed in work package WP8 by an application of the proposed system to the
analysis of seismic and volcano data.
• Key Action II.3.2: New methods of work and electronic commerce. New market mediation
systems, to develop innovative market place concepts and technologies. This will be addressed in
the web-based brokering application in work package WP9.
• Key Action II.4.3: Digital object transfer. This will be addressed by a specific task within work
package WP2 that aims to develop efficient and appropriate means of distributing data and maps
over the internet.
• Key Action III.1: The future priority action line concerning geographic information is also clearly
addressed.
B5. Innovations
State of the Art
Contemporary GIS are monolithic closed systems that can be difficult to use and are usually very
expensive. In the last few years a new generation of GIS has been emerging that enables interactive,
dynamic maps to be disseminated via the Internet (see partner P1, publication 1, 3; partner P4,
publication 4; partner P3, publication 10, 11). So far, most of these systems are confined to projecting
descriptive statistical displays, such as histograms or pie charts, onto geographical space (maps). As
decision making and inference using these projected map displays are not always straightforward, data
mining offers great potential benefits. The range of application areas is huge and there are many
different types of applications, for example in statistical analysis, urban planning, environmental
decision making, and geomarketing.
Largely unconnected to GIS research, a wide range of analysis techniques now commonly referred to
as data mining functions have been developed. These data mining functions are extensions of
analytical techniques known for decades and have been packaged in various ways to form a large
number of essentially very similar data mining systems (DMS). Some DMS provide user friendly
interfaces and visual programming environments that the non-expert can use to help automate the
search for hidden patterns in large databases. Interest in DMS has boomed in recent years partly as a
result of the packaged nature of the technology and improving graphical user interfaces, but mainly
because of the desperate need for commercial enterprises to make returns on often large investments
in data warehouses. Since the GIS revolution in the early 1980s there has been an explosion of
geographically referenced information forming a rapidly expanding geocyberspace (see partner P4,
publication 1), wherein much of the data is also temporally referenced. Commercial enterprises and
government organisations have been swamped by this data explosion with few tools to extract useful
information that can be applied in decision making contexts to solve problems and improve their
function. By combining the strengths of GIS and DMS the proposed SDMS, SPIN, will have even
greater functionality and should be a huge help to decision makers and spatial analysts charged with
the task of backing up their intuitive insights using real world data. Some of the integrated
components not currently present in either GIS or DMS include exploratory spatial data analysis
methods that search for geographical patterns and relationships in complex space-time-attribute
domains.
Extending and integrating GIS and DMS to develop an internet enabled geographical data mining
system is a logical progression for spatial data analysis technology. This development is poised to
play a major role in the proposed terms of reference 1999-2003 of the Commission on Visualisation
and Virtual Environments of the International Cartographic Association (MacEachren and Kraak
1999 [1]), and a great deal of research effort to this end can be expected in the coming
years. DMS and GIS are quite complex tools with wide ranging functionality and capabilities, so the
SPIN! Consortium does not propose to start from scratch, but to build on existing tools. Many of these
existing tools have been developed by various partners during 4th framework research, and many have
passed the prototype stage and have well established user communities. One major advantage of the
SPIN! Consortium is that the software developers will have access to the source code of all the
various module components, which facilitates a seamless integration of all the technology in SPIN.
(This would not be possible if the system were to be developed on top of third party proprietary
products.) The system will be based on open standards such as Java and TCP/IP. The evolutionary
prototype development approach proposed has many benefits. Users will be able to provide feedback
on SPIN prototype requirements and performance throughout the project (starting from day one), and
progressive prototype versions of the system will guide the development effort to fulfil user
expectations by the end. The early development of prototypes is known to be one of the most effective
counter-measures to limit the risks of such software development.
Technological & Scientific Advances
First system that tightly integrates state of the art GIS and data mining functionality in an
open, extensible, internet-enabled plug-in architecture. The system will integrate a rich
functionality:
• a data mining platform (see partner P1 and P5, publication 10);
• an internet enabled tool for interactive manipulation of statistical maps (P1, publication 1,2);
• an application for exploratory spatial data analysis (partner P4, publication 2);
• new modules for spatial data mining (see below);
• new modules for visualising temporal data and spatial data mining results; and
• a Java based GIS (partner P6, publication 1).
The generic system architecture is easily adaptable to diverse application areas such as seismic data
analysis and hazard management, environmental decision making, and census data dissemination.
[1] See the following URL for details: http://www.geovista.psu.edu/ica/icavis/terms.html
Adapting machine learning methods to spatial analysis. It is generally accepted that currently there
exists no single data mining or machine learning method that is efficacious in every case. Available
methods differ in many ways in terms of complexity, representational power, accuracy, scalability,
comprehensibility, their ability to cope with noise and missing values, and many other factors.
Different methods based on different approaches make different assumptions about the data being
analysed, which may not matter in some cases and may be totally inappropriate in others. It is
therefore important that users have access to a variety of spatial data mining methods, and help in
choosing and combining whichever methods seem most appropriate for their task. In developing
SPIN we will advance the state of the art in spatial data mining in several ways.
Symbolic machine learning methods will be adapted to spatial data analysis, in particular, inductive
logic programming (ILP) algorithms for the discovery of subgroups and spatial association rules.
Efficient methods for the discovery of (non-spatial) association rules have been proposed in the field
of data mining, most of which can deal with propositional, or zeroth-order, representations; however,
they are unsuitable for expressing higher-order spatial relationships. ILP is based on first-order predicate
logic which allows for the representation of relations such as adjacent_to, inside, and close_to. This
makes ILP a natural and promising approach to many forms of spatial data mining. Methods for the
induction of first-order rules have been extensively investigated within ILP. Some of these methods
have already been applied to the automated interpretation of topographic maps (see partner P2,
publication 2,3). In this case, symbolic first-order descriptions of cells of a map are automatically
extracted from a vector representation of maps stored in an object-oriented database. Intelligent map
feature extraction is a challenging task. Advances in this field would open new possibilities for
enhancing intelligent automated map design; also first-order descriptions of maps could be fed into
(future) first-order learning systems as background knowledge, e.g. for topographically informed
subgroup discovery.
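As a rough illustration of what a spatial association rule over a relation such as adjacent_to looks like in practice, the sketch below evaluates one fixed rule of the form "kind(X) = A and adjacent_to(X, Y) and kind(Y) = B implies X is in a target set". It is not an ILP system and does not search for rules. The function name, the data layout and the particular definitions of support (hits divided by all objects of kind A) and confidence (hits divided by objects matching the rule body) are assumptions chosen for the example.

```python
from typing import Dict, Set, Tuple


def rule_stats(
    kind: Dict[str, str],
    adjacent: Set[Tuple[str, str]],
    target: Set[str],
    body_kind: str,
    neighbour_kind: str,
) -> Tuple[float, float]:
    """Support and confidence of one spatial association rule.

    Rule: kind(X) = body_kind AND adjacent_to(X, Y) AND kind(Y) = neighbour_kind
          => X in target
    """
    # adjacent_to is treated as symmetric, so store both directions.
    symmetric = adjacent | {(b, a) for a, b in adjacent}
    candidates = [x for x, k in kind.items() if k == body_kind]
    covered = [
        x
        for x in candidates
        if any(a == x and kind.get(y) == neighbour_kind for a, y in symmetric)
    ]
    hits = [x for x in covered if x in target]
    support = len(hits) / len(candidates) if candidates else 0.0
    confidence = len(hits) / len(covered) if covered else 0.0
    return support, confidence
```

The existential variable Y in the rule body is what a purely propositional representation cannot express directly; here it appears as the `any(...)` test over neighbouring objects.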
Combining the expressive power of first-order learning methods with the coherence and
scalability of Bayesian statistics. First-order machine learning methods tend to be search intensive,
and when dealing with large sets of data and highly dimensional dependencies, scalability might
become a problem. To overcome this problem, we will investigate how scalability can be improved by
the use of adaptive sampling, i.e. active learning techniques based on Bayesian Decision Theory. This
will also help to bridge the gap between first-order learning and statistics.
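One simple way to picture adaptive sampling is to examine records in batches and stop as soon as the estimate of interest is certain enough, instead of scanning the whole data set. The sketch below estimates the share of records matching a subgroup description using a Beta posterior and stops when the posterior standard deviation falls below a tolerance. This is a deliberately simplified stand-in, not the Bayesian Decision Theory approach the project proposes to investigate; the function names, the uniform Beta(1, 1) prior, the batch size and the stopping rule are all assumptions.

```python
import math
import random
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def beta_mean_std(alpha: float, beta: float) -> Tuple[float, float]:
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    n = alpha + beta
    return alpha / n, math.sqrt(alpha * beta / (n * n * (n + 1)))


def adaptive_estimate(
    records: Sequence[T],
    predicate: Callable[[T], bool],
    tolerance: float = 0.02,
    batch: int = 50,
    seed: int = 0,
) -> Tuple[float, float, int]:
    """Estimate the share of records satisfying predicate from a growing sample.

    Returns (posterior mean, posterior std, number of records examined).
    """
    rng = random.Random(seed)
    order = list(range(len(records)))
    rng.shuffle(order)  # random order gives sampling without replacement
    alpha, beta = 1.0, 1.0  # uniform prior on the unknown share
    used = 0
    mean, std = beta_mean_std(alpha, beta)
    while used < len(order) and std > tolerance:
        for i in order[used:used + batch]:
            if predicate(records[i]):
                alpha += 1
            else:
                beta += 1
        used = min(used + batch, len(order))
        mean, std = beta_mean_std(alpha, beta)
    return mean, std, used
```

With a tolerance of 0.02 the loop needs at most a few hundred records regardless of how large the data set is, which is the scalability benefit the paragraph refers to.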
Applies advanced Bayesian classification, prediction, and interpolation to spatial data. In recent
years, computationally intensive Bayesian methods have been developed that compare favourably with
classical approaches. Instead of selecting an “optimal” model they generate a whole distribution of
models which characterise their uncertainty in the light of the available data. On the one hand they
derive predictive distributions for new inputs reflecting the actual uncertainty and information. On the
other hand they allow a rigorous assessment of the adequacy of different model types. This method
has already been successfully applied by partner P1 (see partner P1, publication x13) to credit scoring
and will now be adapted to spatial data.
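The two uses of the model distribution described above can be written out in standard Bayesian notation. The symbols are introduced here for illustration and do not appear in the proposal: $D$ is the available data, $\theta$ the model parameters, $x^*$ a new input with unknown output $y^*$, $\theta^{(s)}$ one of $S$ posterior samples (for example from Markov Chain Monte Carlo), and $M_1, \dots, M_K$ the candidate model types.

```latex
% Predictive distribution for a new input, averaged over the posterior
% instead of using a single "optimal" parameter value:
\[
p(y^* \mid x^*, D)
  = \int p(y^* \mid x^*, \theta)\, p(\theta \mid D)\, d\theta
  \approx \frac{1}{S} \sum_{s=1}^{S} p(y^* \mid x^*, \theta^{(s)}),
  \qquad \theta^{(s)} \sim p(\theta \mid D)
\]

% Assessing the adequacy of different model types via their posterior probability:
\[
p(M_k \mid D)
  = \frac{p(D \mid M_k)\, p(M_k)}{\sum_{j=1}^{K} p(D \mid M_j)\, p(M_j)}
\]
```

The spread of the predictive distribution reflects how uncertain the prediction is given the data, and the second expression compares model types on a common scale.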
Automating the exploratory spatial data analysis of geographical data. Various exploratory
spatial data analysis tools have been developed by partner P4 (see partner P4, publication 2) and made
available for research via the internet. However the current format of the application may be criticised
in that it is not user-friendly enough, and users are restricted to a select few input and output data
formats. The search methods used in it are unintelligent brute force heuristics that could be improved
by the application of artificial intelligence methods to direct the search. Early experiments by partner
P4 indicate that there is great potential for these heuristics especially when analysing data in a multi-attribute space-time-attribute tri-space (see partner P4, publication 3). By improving the quality of
the search procedure, it is expected that much larger and more complex data sets can be investigated in a
scalable way. To address the need for the system to communicate with other packages, both local and
remote, the tool developed will make use of CORBA for data input and results output. Partner P4 also
plans to develop improved visualisation tools to allow users to view the outputs of the tools developed
in an easy and obvious way that aids their understanding of the results instead of hampering them as
many current tools do.
Uses knowledge based systems technology to apply expertise in thematic cartography to the
visual mining of spatial and temporal data. There is currently a recognised need to
combine cartographic visualisation (meaning building maps to facilitate visual data exploration)
with data mining (see, for example, special issue of Int. J. Geographical Information Science on
Visualization for Exploration of Spatial Data, v.13(4), June 1999). Within the project we plan to
develop both a cartographic interface for preparing (selecting, preprocessing, etc.) data for data
mining and an interactive map presentation of data mining results, dynamically linked with specially
designed non-geographic illustrations. Special attention will be paid to the interactivity of maps and
other graphical displays and to the visualisation and analysis of the temporal aspect of data.
Use of new techniques for efficient distribution of large maps for low bandwidth networks.
Special attention will be given to developing efficient mechanisms that reduce the amount of data that
has to be transferred between the server and the client.
B1. Workpackage list
Workpackage No [2] | Workpackage title | Lead contractor No [3] | Person-months [4] | Start month [5] | End month [6] | Phase [7] | Deliverable No [8]
WP1 | Coordination | P1 | 34 | 0 | 36 | - | D1.1-1.4
WP2 | Identify user needs, define and realize a generic system architecture that integrates GIS and Data Mining functionality | P1 | 69 | 0 | 36 | - | D2.1-2.6
WP3 | Extend machine-learning methods to spatial mining | P2 | 42 | 0 | 36 | - | D3.1-3.9
WP4 | Generalize Bayesian Markov Chain Monte Carlo to spatial mining | P1 | 40 | 0 | 36 | - | D4.1-4.7
WP5 | Adapt and integrate methods for spatial pattern analysis | P4 | 40 | 0 | 36 | - | D5.1-5.7
WP6 | Develop support of visual analysis of time-dependent spatial data | P1 | 40 | 0 | 36 | - | D6.1-6.6
WP7 | Develop methods for visualization of Data Mining results within GIS | P1 | 40 | 0 | 36 | - | D7.1-7.6
WP8 | Application to seismic and volcano data | P7 | 70 | 0 | 36 | - | D8.1-8.9
WP9 | Application to web-based dissemination of data from statistical offices | P8 | 49 | 0 | 36 | - | D9.1-9.6
WP10 | Develop a business model for web based information and service brokering with geo-referenced data | P6 | 24 | 0 | 36 | - | D10.1-10.5
WP11 | Dissemination | P8 | 38 | 0 | 36 | - | D11.1-11.5
TOTAL | | | 482 | | | |
[2] Workpackage number: WP 1 – WP n.
[3] Number of the contractor leading the work in this workpackage.
[4] The total number of person-months allocated to each workpackage.
[5] Relative start date for the work in the specific workpackages, month 0 marking the start of the project, and all other start dates being relative to this start date.
[6] Relative end date, month 0 marking the start of the project, and all end dates being relative to this start date.
[7] Only for combined research and demonstration projects: Please indicate R for research and D for demonstration.
[8] Deliverable number: Number for the deliverable(s)/result(s) mentioned in the workpackage: D1 – Dn.
Distribution of Workload on work packages

Work package | Short title | Total person-months
WP1 | Coord | 34
WP2 | Techn. Dev. | 69
WP3 | ML | 42
WP4 | Bayes | 40
WP5 | ESDA | 36
WP6 | Vis. Spa-T | 40
WP7 | Vis. DM | 40
WP8 | Seis.Dat | 70
WP9 | Stat. Off. | 49
WP10 | Web-Brok. | 24
WP11 | Dissem. | 38
Total | | 482

Per-partner person-months (partners P1 to P8; cell positions not recoverable from the layout):
P1: 28, 30, 18, 30, 28, 28, 3, 3, 2, 2 (total 172)
Other partners: 2, 6, 9, 18, 10, 24, 18, 24, 20, 4, 36, 12, 12, 3, 6, 8, 96, 6, 2, 2, 12, 2, 36, 12, 4, 10, 14, 56, 32, 34, 4, 36, 8, 42
B2. Deliverables list

Deliverable No [9] | Deliverable title | Delivery date [10] | Nature [11] | Dissemination level [12]
D1.1 | Project workplan | 3 | R | PU
D1.2 | Reports for EC | periodic | R | PU
D1.3 | Project handbook | 6 | R | PU
D1.4 | Project meetings | periodic | R | PU
D2.1 | System design document | 8 | R | CO
D2.2 | Prototype 0 (incl. documentation) | 12 | P | CO
D2.3 | Implementation of efficient methods for map transfer | 15 | P | CO
D2.4 | Prototype 1 (incl. documentation) | 18 | P | CO
D2.5 | Prototype 2 (incl. documentation) | 30 | P | CO
D2.6 | Revision Release Prototype 2 (incl. documentation) (Final Release) | 32 | P | CO
D3.1 | Theoretical report on spatio-temporal subgroup discovery | 6 | R | PU
D3.2 | Theoretical report on adaptive sampling | 21 | R | PU
D3.3 | Theoretical report on spatial association rules | 5 | R | PU
D3.4 | Specifications of the descriptions to be automatically extracted from vectorized maps | 15 | R | CO
D3.5 | Implementation of subgroup discovery | 8 | P | CO
D3.6 | Implementation of adaptive sampling for subgroup discovery | 23 | P | CO
D3.7 | Implementation of spatial association rules | 11 | P | CO
D3.8 | Software for the extraction of symbolic descriptions from vectorized maps | 18 | P | CO
D3.9 | Report evaluating the application of first-order learning methods to spatial data | 36 | R | PU
D4.1 | Report reviewing current Bayesian approaches | 6 | R | PU
D4.2 | Software Implementation for bootstrap | 11 | P | CO
D4.3 | Report on advanced spatial models and corresponding Bayesian models | 15 | R | PU
D4.4 | Implementation of MCMC | 18 | P | CO
D4.5 | Implementation of model selection | 28 | P | CO
D4.6 | Performance evaluation and guidelines | 36 | R | PU
D4.7 | Generic software library for spatial data transformations | 6 | P | CO
D5.1 | Theoretical paper on algorithms for handling interaction with spatial location | 5 | R | PU
D5.2 | Software for handling interaction with spatial location | 11 | P | CO
D5.3 | Theoretical paper evaluating statistical clustering tests | 14 | R | PU
D5.4 | Implementation of selected statistical clustering tests | 18 | P | CO
D5.5 | Theoretical paper on algorithms for multiple search | 24 | R | PU
D5.6 | Implementation of algorithms for multiple search | 30 | P | CO
D5.7 | Reports on testing and evaluation of Spatial Analysis software tool | 36 | R | PU
D6.1 | Rule base on application of visualisation and interaction techniques depending on characteristics of data and the type of their time variation | 16 | P | CO
D6.2 | Software library implementing the proposed methods | 26 | P | CO
D6.3 | Expert system engine performing selection of methods according to characteristics of data | 30 | P | CO
D6.4 | Theoretical paper on algorithms for investigation of temporal changes | 18 | R | PU
D6.5 | Implementation of algorithms for investigation of temporal changes | 24 | P | CO
D6.6 | Evaluation report | 36 | R | PU
D7.1 | Description of the presentation methods proposed to apply to results of the considered data mining methods | 6 | R | PU
D7.2 | Implementation of visualization method for subgroup discovery | 11 | P | CO
D7.3 | Implementation of visualization method for spatial association rules | 12 | P | CO
D7.4 | Implementation of visualization method for Bayesian classification | 17 | P | CO
D7.5 | Implementation of best-practice methods for visualisation in ESDA | 17 | P | CO
D7.6 | Report on current & potential application methods in ESDA | 36 | R | PU
D8.1 | Definition of user requirements | 3 | R | PU
D8.2 | Description of the methods of space-time analysis and data mining of seismic data | 10 | R | PU
D8.3 | Description of the methodology for designing seismic hazard information models | 15 | R | PU
D8.4 | Software implementing the proposed methods within the SPIN! architecture | 26 | P | CO
D8.5 | Evaluation report | 24 | R | PU
D8.6 | Application of the software tools to the seismic active Eastern Mediterranean region | 34 | P | CO
D8.7 | Application of the software tools to the high risk Merapi volcano | 36 | P | CO
D8.8 | Integration of continuous monitoring data into the analysis process | 36 | P | CO
D8.9 | Report on the application of Spatial Mining to seismic and volcano data | 36 | R | PU
D9.1 | User requirements document for dissemination of statistical data | 3 | R | PU
D9.2 | Description of data model | 12 | R | CO
D9.3 | A prototype web site with interactive thematic maps that can be accessed over the internet | 16 | P | CO
D9.4 | Prototype web-site based on SPIN prototype 2 | 30 | P | CO
D9.5 | Report about different user acceptance, recommendation for use, etc. | 24 | R | PU
D9.6 | Report: recommendation of use | 36 | R | PU
D10.1 | Define requirements for web-brokering | 3 | R | PU
D10.2 | Report describing existing brokering services, business model and property of rights problematic | 8 | R | PU
D10.3 | Report addressing technical infrastructure | 24 | R | CO
D10.4 | Prototype web-site for web-brokering | 30 | R | PU
D10.5 | Final report on web-brokering | 36 | R | CO
D11.1 | Project web page | 3 | R | PU
D11.2 | Project description for the general public | 2 | P | PU
D11.3 | First dissemination workshop | 24 | O | PU
D11.4 | Second dissemination workshop | 36 | O | PU
D11.5 | Feasibility study about commercialization | 33 | R | PU

[9] Deliverable numbers in order of delivery dates: D1 – Dn.
[10] Month in which the deliverables will be available. Month 0 marking the start of the project, and all delivery dates being relative to this start date.
[11] Please indicate the nature of the deliverable using one of the following codes:
R = Report
P = Prototype
D = Demonstrator
O = Other
[12] Please indicate the dissemination level using one of the following codes:
PU = Public
PP = Restricted to other programme participants (including the Commission Services).
RE = Restricted to a group specified by the consortium (including the Commission Services).
CO = Confidential, only for members of the consortium (including the Commission Services).
Introduction to workpackages
The workpackages fall into several categories: technology development, research, application, and
exploitation. Figure 3 shows the main dependencies between the workpackages, but does not display
the feedback mechanisms which will be set up between all workpackages, as described in the section
about project management.
Building a spatial mining system is a demanding task. It requires expertise in many fields including
Geographic Information Systems, Cartography, Statistics, Machine Learning, and Databases, as well
as excellent software engineering skills. The consortium has been carefully chosen to ensure
uncompromising competence in all these areas. It includes
• two industrial partners active in Data Mining and Geographic Information Systems (partners P5
and P6),
• a university and a national research center active in the areas of Data Mining, Machine Learning,
and GIS (partners P2 and P1),
• an institute for geography active in Exploratory Spatial Data Analysis since the 1980s (partner P4),
• a university having a leading role in the dissemination of statistical data (partner P8), and
• two institutes active in seismic data research (partners P3 and P7).
Each partner in the consortium has a unique area of competence not shared by the others, and brings
its expertise as well as its technologies into the consortium.
Figure 3. Main dependencies between work packages.
[Diagram boxes: Coordination; Design, integrate GIS & DM platform; Develop, adapt Machine Learning
algorithms to Spatial Mining; Adapt Bayes Markov Chain Monte Carlo to Spatial Mining; Develop, adapt
Spatial Point Pattern Analysis; Methods for spatio-temporal visualization; Visualization of Data Mining
results; Extending system for application to Seismic Data; Application to Statistical Offices; Web-Based
Information Brokering; Dissemination. Categories: Technology, Research, Application, Exploitation.]
Risk management
Many research and technology development projects fail because their typical risks are not taken
into account. The workplan has therefore been designed to address typical causes of failure in
advance. The main approaches taken towards risk management are:
• software reuse and incremental evolution of existing technology
• modular design of software components (plug-in architecture)
• strong user involvement
• early delivery of prototypes
Involving users at all stages of the system's development is of utmost importance. An original
prototype will be delivered early so that users can evaluate it and suggest generic design
modifications; the development process will then make iterative improvements to successive versions
of the system. The users will be involved in defining the system analysis requirements and in
designing and testing the system right from the start. The users are responsible for providing
evaluation reports, which serve as input to specific system design modifications.
Since important modules of the final system already exist in a preliminary and non-integrated form,
the users will be trained in using the individual systems at an early stage. This will help to shape their
expectations and provide valuable feedback to the software developers. The users in work package
WP9 already use the GIS technology developed by partner P1, so they can formulate specific
requirements at an early stage, minimising the likelihood that generic system requirements will
undergo continuous change.
The base integrating system platform will be an object-oriented plug-in style architecture to facilitate
technological integration. The dependencies between work packages are reduced, as plug-in
components can be incorporated incrementally as they become available. In this way, revisions to the
internal structure of either the client or the server should not affect the other parts. CORBA and RMI
will be evaluated as integrating middleware.
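The plug-in idea can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the Java used by the partners, and every name in it (`BaseSystem`, `register`, `run`, the two example plug-ins) is hypothetical; it only shows how components could be added incrementally without touching the base system.

```python
from typing import Callable, Dict, List

# A plug-in is modelled as a function from a list of records to a result.
Plugin = Callable[[List[dict]], object]


class BaseSystem:
    """Hypothetical base system that knows nothing about concrete methods."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def register(self, name: str, plugin: Plugin) -> None:
        # Refuse silent replacement so two groups cannot clash on one name.
        if name in self._plugins:
            raise ValueError(f"plug-in {name!r} is already registered")
        self._plugins[name] = plugin

    def available(self) -> List[str]:
        return sorted(self._plugins)

    def run(self, name: str, records: List[dict]) -> object:
        if name not in self._plugins:
            raise KeyError(f"plug-in {name!r} has not been delivered yet")
        return self._plugins[name](records)


def count_records(records: List[dict]) -> int:
    """Trivial example plug-in: number of records."""
    return len(records)


def mean_value(records: List[dict]) -> float:
    """Example plug-in: mean of the (assumed) 'value' field."""
    return sum(r["value"] for r in records) / len(records)


system = BaseSystem()
system.register("count", count_records)   # delivered early
data = [{"value": 2.0}, {"value": 4.0}]
early_result = system.run("count", data)
system.register("mean", mean_value)       # delivered later, no core changes
late_result = system.run("mean", data)
```

In this sketch a module that is delivered late only delays its own entry in the registry; the modules already registered keep working.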
Strong modularization should minimise the dangers of integrating technology developed separately by
different groups. If for some reason one module were not delivered on time, this would not necessarily
affect the implementation of other modules. Since partners P1, P3, and P4 have implemented major
parts of the existing technology in Java anyway, risks of technology integration problems are already
low. The Unified Modelling Language (UML) will be used for documentation and design to ensure
product quality.
Potential performance bottlenecks should be easy to spot at an early stage by applying the existing
technology on test data provided by the users. The system needs to be interactive and users should not
be made to wait too long for analysis results. Performance issues are addressed in a special task within
WP2.
Our approach to risk management has been tightly integrated within the overall technology
development cycle of SPIN. Since an evolutionary approach containing several iterations is chosen,
all work packages start at the kick-off meeting and end with the final workshop.
Gantt Chart
Main stages of technology development cycle

Month | Event | Description of event
1 | Kick-off meeting | A kick-off meeting will be held, where the users are informed in detail about the prospects of developing an SDMS, where alternative approaches will be discussed, and where the users will articulate specific expectations and requirements for the system. There will also be a tutorial session on Spatial Mining based on the existing technology.
3 | User requirements report | The developer teams and the users will jointly define the user requirements report, which is due by month 3 and for which the users are responsible. This will be a major input for the system design.
5 | Test applications | The existing, non-integrated systems will be applied to example data sets to further clarify user needs, to spot performance bottlenecks at an early stage, etc.
8 | Design specification | The design specification is due in month 8. It is located mainly in WP2, but all work packages will contribute from their perspective. The report defines the intended applications on a detailed level. On the basis of this document, the integration of the existing technologies will start and they will be merged in a single, coherent architecture.
12 | Developer version (prototype 0) | A developer version (prototype 0) is due by month 12. This will be used for integrating the modules developed in WP3-7, which will start at month 12. Users will get access to this version as a technology preview.
15 | Revised system design document | Initial feedback from users and developers will be used for making a revised system design document, which is due in month 15.
18 | Prototype 1 | The revised design document will be used for developing prototype 1, which is due in month 18. In this prototype, functionality from all work packages WP3-WP7 will be integrated; however, some functionality will still be missing (e.g. adaptive sampling for subgroup discovery in WP3). This prototype will be delivered to the users, who will use it in their experimental applications.
24 | User evaluation report | Users will evaluate whether the system meets the requirements specified in the user requirements, and whether it meets the system design. The users will write an evaluation report, which is due in month 24. In this month, an external workshop will be held (WP11), where additional user groups and partners for commercial exploitation (WP10) will be targeted. Users will have installed web-sites, accessible internally and in part externally, which will feature initial applications of the technology.
27 | Final design document revision | The user evaluation of prototype 1 will lead to modifications of the system design; the final design document will be delivered in month 27.
30 | Prototype 2 | The final design document will be input for the development of prototype 2, which is due in month 30. It will integrate all technology developed in work packages WP3-WP7, and will be delivered to the users. With the full functionality available, the users will work intensely on their applications. The web-sites should be publicly accessible, so that feedback from a wider audience can be gathered.
32 | Revision release of prototype 2 | Experience in applications will lead to a revision release of prototype 2 in month 32. The revision will cover the base system as well as the modules from work packages WP3-WP7.
36 | Final user evaluation; Dissemination workshop | At the end of the project, the users will deliver a report describing their applications, and they will give a final evaluation. A workshop for dissemination to a wider audience, for identifying partners for follow-up projects (WP11), and for partners for potential commercialisation (WP10) will be held in this month.
Pert diagram
The diagram shows dependencies between tasks. To give a better overview, we have grouped tasks by
category. Task numbers refer to the Gantt chart, which shows the exact start and end dates of tasks.

Stage | Tasks | Month
Kick-off meeting | 2.1 | 1
User requirements | 8.1, 8.2, 9.1, 10.1 |
Visualization requirements | 6.1, 7.1 |
System design | 2.2 | 8
Data Mining | 3.1, 3.3, 3.5, 3.7, 4.1, 4.2, 4.7, 5.1, 5.4 |
Visualization | 7.1-7.2 |
Prototype 0 | 2.4 | 12
Test & Evaluation | 8.3, 9.2, 9.3, 9.4, 10.2 |
Design revision | 2.2 | 15
Data Mining | 3.4, 3.8, 4.3, 4.4, 5.3, 5.6 |
Visualization | 6.1, 6.4 |
Prototype 1 | 2.5 | 18
Seismic data & statistical offices | 8.4, 8.5, 9.4, 9.5, 10.3, 10.4, 11.3 |
Evaluation | 2.2 | 24
Data Mining | 3.2, 3.6, 4.5, 5.2, 5.5 |
Visualization | 6.2, 6.3, 6.5 |
Prototype 2 | 2.6 | 30
Real-world application | 8.6, 8.7, 8.8, 8.9, 9.6, 9.7, 10.5 |
Final workshop | 11.5 | 36
Work package description
Co-ordination
The project brings together researchers, software developers, and users from a number of European
countries, with different backgrounds and different approaches to spatial analysis and geographical
modelling. To manage technology development and research, and to exploit the component tools and
system effectively, work package WP1 is devoted to co-ordination. Special attention has been given to
defining clear responsibilities and modular work packages and deliverables. The SPIN
consortium will meet approximately every four months to establish and maintain an effective team.
The management plan is based on one applied successfully in an EU project co-ordinated by partner P1;
it is detailed in section C5 below.
Technology development
WP2 has the objective of designing an integrated system for Data Mining and GIS. This work
package has the overall task of integrating the existing GIS and Data Mining software and of
incorporating the modules developed in the other work packages in a coherent manner.
It is the project's technological hub, to which all partners will deliver, and whose deliverables all
partners will need to access at some point. This will serve as a technological basis. We
conceptually distinguish a base system and an integrated Spatial Mining system.
Figure 4. The basic architecture of SPIN. Spatial mining and visualization methods can be added as
plug-ins to the base system. Clients can access the system over the internet.
The base system contains
• internet enabled GIS for automatic generation of interactive thematic maps
• Data Mining methods for nearest neighbour, decision trees, association rules, subgroup discovery,
inductive logic programming,
• visualisation for these methods
• data transformation capabilities for discretization, restriction, projection, union, join, and
calculated rows
• access to heterogeneous data sources (JDBC-compliant databases, ODBC, flat files, spatial data
interfaces etc.), also over the internet
• facilities for organising and documenting analysis tasks.
The existing Data Mining methods complement the spatial mining methods in the task of “explaining”
spatial patterns in terms of non-spatial attributes. The internet enabled base GIS module contains
facilities for interactive manipulation of thematic maps. To provide automated visualisation, the GIS
incorporates the knowledge of thematic cartography in the form of generic, domain-independent rules.
To choose adequate presentation techniques for given data, it takes into account data
characteristics and relations among data components or attributes. The automation of map generation
frees users from having to think about how to present the data and from the routine work of
map building, and allows them to concentrate on the analysis of their data. This work package includes
the steps of requirement analysis, design, implementation, testing, and documentation.
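How generic, domain-independent rules might map data characteristics to a presentation technique can be sketched as follows. The `Attribute` fields and the rules themselves are assumptions based on common cartographic conventions; they are not taken from the rule base of the GIS described here.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    """Minimal description of a data attribute (hypothetical fields)."""
    name: str
    kind: str                # "nominal", "ordinal" or "numeric"
    is_ratio: bool = False   # True for rates such as persons per km2


def choose_presentation(attr: Attribute) -> str:
    """Pick a map presentation technique from generic rules.

    The rules are common cartographic conventions used here as
    assumptions; they are not the rule base of the system described above.
    """
    if attr.kind == "nominal":
        return "qualitative colouring"
    if attr.kind == "numeric" and attr.is_ratio:
        return "choropleth map"
    if attr.kind == "numeric":
        return "proportional symbols"
    return "graduated colour classes"


density = Attribute("population density", "numeric", is_ratio=True)
land_use = Attribute("land use", "nominal")
population = Attribute("population", "numeric")
quality = Attribute("water quality class", "ordinal")
```

Because the rules depend only on the kind of attribute and not on its subject matter, the same function could serve census, environmental, or seismic data.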
Building the base system requires integrating an already existing GIS tool and an existing Data
Mining platform, both developed by partner P1. For tight integration, a common Task manager, Data
Management Layer, Extension API, and user interface have to be defined and implemented. The
integrated system incorporates the Spatial Mining and visualisation methods developed in WP3-7
into the base system.
The main inputs of this work package are the existing Data Mining and GIS systems and the modules
developed in WP3-7; the main output will be the integrated system. This integrated system will be
developed in three main stages: prototype 0 (developer version), prototype 1 and prototype 2. User
feedback will be gathered and evaluated from the first day on and will be used for improving the
system.
Research
Work packages WP3, WP4, WP5 develop methods for Spatial Data Mining that can be added as
plug-ins to the base system. A variety of methods have been selected for implementation, partially
depending on previous experiences and results of the partners. Each partner has chosen to adapt a
method to whose advancement it has already made a theoretical and practical contribution, so that
it is well acquainted with the subtleties of the chosen method; yet by combining the project partners'
expertise, a broad range of advanced Data Mining techniques will be covered, from
• Bayesian Statistics (Partner P1, publication 6,8,9) and
• Neural Networks (Partner P1, publication 7) to
• symbolic approaches from Machine Learning and Inductive Logic Programming (Partner P1,
publication 4, 10,11, Partner P2, publication 1,2,3) and
• genuine approaches to Spatial Cluster Analysis (Partner P4, publication 2,4).
This gives the project a unique blend of depth of expertise and breadth of methods
covered. Since all these methods can be launched within a single, coherent platform, the project can
also contribute to a comparison of the relative strengths and weaknesses of the methods and develop
guidelines for their use in spatial mining.
All these work packages include a) state of the art review; b) theoretical advances, which will be
communicated in a report; c) implementation and validation of the methods; d) integration with the
base system; e) application to real-world tasks; f) documentation and final report.
These stages are synchronised with the technology development cycle. These work packages have as
their input previous theoretical and practical work of the partners and will have as their main output a
theoretical description of the respective methods.
Machine Learning (WP3). This work package is mainly concerned with the adaptation of symbolic
machine learning methods to spatial data analysis. In particular, the methods to be adapted are Inductive
Logic Programming algorithms for the discovery of subgroups and spatial association rules. They tend
to be search intensive, and when dealing with large sets of data and high dimensional dependencies,
scalability might become a problem. Moreover, most have been developed to satisfy the classical
properties of consistency and completeness, while in spatial data mining users are interested in
detecting patterns that satisfy minimum criteria for support and consistency. Adaptation of these machine
learning tools will be based on the use of adaptive sampling, i.e. active learning techniques based on
Bayesian Decision Theory, or on more efficient search strategies. Another contribution of this work
package is the definition of appropriate algorithms for the automated extraction of symbolic
descriptions of parts (e.g., cells) of a map from vectorised maps.
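The idea of accepting patterns that only meet minimum criteria, rather than holding without exception, can be sketched with the standard association-rule measures support and confidence. These are used here as stand-ins for the criteria mentioned above; the predicates, data, and thresholds are invented for illustration.

```python
from typing import FrozenSet, List

# Each spatial object (e.g. a map cell) is described by a set of predicates.
# Predicates and data are invented for illustration.
cells: List[FrozenSet[str]] = [
    frozenset({"near_river", "high_flood_risk"}),
    frozenset({"near_river", "high_flood_risk", "urban"}),
    frozenset({"near_river", "urban"}),
    frozenset({"urban"}),
]


def support(itemset: FrozenSet[str], data: List[FrozenSet[str]]) -> float:
    """Fraction of objects that satisfy every predicate in `itemset`."""
    return sum(1 for d in data if itemset <= d) / len(data)


def confidence(body: FrozenSet[str], head: FrozenSet[str],
               data: List[FrozenSet[str]]) -> float:
    """Estimated P(head | body) = support(body and head) / support(body)."""
    return support(body | head, data) / support(body, data)


def is_interesting(body, head, data, min_support=0.4, min_confidence=0.6):
    """Accept a rule body -> head if it clears both minimum thresholds."""
    return (support(body | head, data) >= min_support
            and confidence(body, head, data) >= min_confidence)


body = frozenset({"near_river"})
head = frozenset({"high_flood_risk"})
rule_support = support(body | head, cells)        # 2 of 4 cells
rule_confidence = confidence(body, head, cells)   # 2 of 3 river cells
```

The rule "near_river implies high_flood_risk" has one counterexample in this toy data, so a strictly consistent learner would reject it, while the threshold-based check accepts it.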
Bayesian Statistics (WP4). A spatial relation may be described by a number of different models,
leading to widely varying results. Currently the support for assessing and selecting models in GIS is
very limited. Based on the extrapolation of the uncertainty of individual predictions of different
models, we will develop methods for a well-founded selection or combination of models. In recent
years, computationally intensive Bayesian methods have been developed that compare favourably with
classical approaches. Instead of selecting an “optimal” model they generate a whole distribution of
models which characterise their uncertainty in the light of the available data. On the one hand they
derive predictive distributions for new inputs reflecting the actual information. On the other hand they
allow a rigorous assessment of the adequacy of different model types. Partner P1 (publication 8,9) has
developed Bayesian classification methods which use a Bayesian ensemble of decision trees or neural
networks. These methods have already been successfully applied to credit scoring and will now be
adapted to spatial data.
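Combining models instead of selecting a single one can be illustrated by Bayesian model averaging. The sketch below assumes that a log marginal likelihood is already available for each model (in practice it would have to be estimated, for example from MCMC output) and that the prior over models is uniform. All numbers are invented.

```python
import math
from typing import List


def posterior_weights(log_evidence: List[float]) -> List[float]:
    """Posterior model probabilities under a uniform prior over models.

    Subtracting the maximum before exponentiating avoids underflow
    when the evidences are very small.
    """
    top = max(log_evidence)
    unnormalised = [math.exp(v - top) for v in log_evidence]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def averaged_prediction(weights: List[float], predictions: List[float]) -> float:
    """Model-averaged predictive probability: sum over k of w_k * p_k."""
    return sum(w * p for w, p in zip(weights, predictions))


# Invented numbers: three candidate models, their log marginal likelihoods,
# and each model's predicted probability of the event at a new location.
log_evidence = [math.log(0.02), math.log(0.06), math.log(0.02)]
predictions = [0.9, 0.5, 0.1]

weights = posterior_weights(log_evidence)            # approx. 0.2, 0.6, 0.2
p_event = averaged_prediction(weights, predictions)  # approx. 0.5
```

The spread of the individual predictions (0.1 to 0.9) around the averaged value is a simple indication of how much the answer depends on the choice of model.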
Exploratory Spatial Data Analysis (WP5). This work package will explore ways of extending
existing methods of spatial pattern detection. Currently ESDA methods tend to be concerned solely
with the detection of spatial pattern and often overlook other data attributes. This shortcoming will be
addressed by extending existing tools developed by partner P4 to handle attribute interaction with
spatial location and to consider how temporal changes in spatial data can be investigated (see partner
P4, publications 4 and 2). The tool will be expanded to use multiple search methods in addition to the
heuristic search used currently. There is also potential to investigate how different statistical
tests of clustering can be used in the tool.
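One simple statistical test of clustering can be sketched as follows: count the cases inside a circle and compare the count with the number expected under a Poisson null model. This is only one possible test, chosen here for illustration; it is not stated to be the test used in partner P4's tool, and the data and the expected count are invented.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def count_within(points: List[Point], centre: Point, radius: float) -> int:
    """Number of points no farther than `radius` from `centre`."""
    cx, cy = centre
    return sum(1 for x, y in points if math.hypot(x - cx, y - cy) <= radius)


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(math.exp(-expected) * expected ** k / math.factorial(k)
                for k in range(observed))
    return 1.0 - below


def is_cluster(points, centre, radius, expected, alpha=0.01):
    """Flag the circle if its count is improbably high under the null model."""
    observed = count_within(points, centre, radius)
    return poisson_upper_tail(observed, expected) < alpha


# Invented data: five cases close together and one far away.
cases = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
observed = count_within(cases, centre=(0.05, 0.05), radius=0.2)   # five cases
p_value = poisson_upper_tail(observed, expected=1.0)
```

A search method would apply such a test to many candidate circles, which raises the multiple-testing question that motivates comparing different tests.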
Work packages WP6 and WP7 develop methods for visualisation of spatial and temporal
information, and for the visualisation of Data Mining methods developed in WP3-5.
Visualisation of spatial and temporal data (WP6). In most areas, spatially referenced data also
refer to different moments or intervals in time. The study of such data is meaningless if their
development in time is not taken into account. Analysis of spatially referenced data should be
supported by their visual presentation in maps. Spatio-temporal data require substantial advancement
of the traditional map form of presentation towards dynamics and high user interactivity. The work
package aims at development of methods of visualisation of spatio-temporal data that can facilitate
analysis of such data. The methods include not only graphical presentation by itself but also various
data transformations and interactive manipulation of the displays.
Visualisation of Data Mining results (WP7). The form of presentation of data mining results to the
user is crucial for their appropriate interpretation. Large amounts of information or complex concepts
can be more easily comprehended when represented graphically. This especially applies to data and
concepts having spatial reference or distribution. The objective of this work package is to design
appropriate graphical techniques to represent results of the data mining methods developed within the
project. The approach to be taken is a combination of cartographic and non-cartographic displays
linked together through simultaneous dynamic highlighting of the corresponding parts (see partner P1,
publication 1). The non-cartographic displays will represent the data mining results in summarised,
generalised form while maps will provide the transition from general descriptions to individual spatial
objects and phenomena characterised by them.
Application
The system will be used in several applications. One criterion for the selection of application areas is
that a broad range of problem domains of special importance for the EU is covered, underlining the
generality of the approach. A second criterion is that each of these areas should contribute in a unique
way to evaluating/validating the adequacy of the chosen approach to Spatial Mining. This makes the
evaluation process more focussed. An objective common to all application areas is to explore the
applicability of advanced Data Mining methods. Specifically, spatial subgroup discovery, spatial
Markov Chain Monte Carlo, and localised Spatial Point Pattern Analysis will be evaluated in each
application area.
Application to Seismic Data (WP8). In WP1-7 a generic Spatial Mining System is developed. Such
a system has the important advantage that it has a potentially broad range of application areas
and promotes technology reuse. However, some application areas will also need to incorporate
specialised analysis methods. One of the main risks associated with the development of generic
information technology is that an architecture that is not extensible may end up not addressing the
real needs of the user. Work package WP8 addresses this problem in an exemplary way. This will
ensure that the generic system will be designed in a modular and extensible way right from the start.
A key component is the plug-in architecture of the already existing Data Mining platform developed
by partner P1, which allows for easy integration of new modules. The application area selected for
this task is earthquake prediction. This is a well-established scientific field belonging to physical
geography, where a great amount of spatio-temporally referenced data from different sources is
available. Research in this area has an obvious and great potential benefit for public health and quality
of life. Advances in earthquake prediction could help to prevent massive financial losses. The
objective of this work package is to adapt the generic system to the specialised application area of
earthquake prediction and hazard assessment by integrating methods for natural hazard assessment
that have been developed by partner P3. To achieve this goal, an integration layer between the
generic Spatial Mining system and the specialised methods implemented by partner P3 has to be
designed.
Partner P7, which has long been active in the area of earthquake prediction, will profit from this
technology by getting access to advanced and complementary methods for data analysis and by
getting an instrument for the web-based dissemination of research results.
Web-based dissemination of census data from statistical offices (WP9). A second application area is the
analysis and web-based dissemination of census data from statistical offices. Here the main objective
is to put to practical use the timely, cost-effective dissemination of statistical information over the
internet. Partner P8 has several years’ experience in developing tools for web based access to large
spatial data sets and provides an academic service for access to census data. These tools are primarily
for visualising database contents, data browsing, and locating and mapping spatial data, and they can
handle spatial and aspatial referencing systems. Partner P8 also has access to a SUN E6500
super-server for academic applications. Additionally, the project will be supported by the national census
agency, which is currently planning, together with the partner, the tools and services for public access
to the forthcoming national census in 2001.
This work package will allow evaluation of the efficiency of the developed methods and of the
responsiveness of the application as well as acceptance by customers of statistical offices. Potential
problem areas are the availability of bandwidth, the number of concurrent users, and the size of maps
and data sets. Especially if Data Mining analysis over the internet is permitted, the performance of the
server will be of central importance. Experiences in this application area will be crucial for improving
the prototype 1 system for better efficiency (which is a task within WP2).
Dissemination and Exploitation
Web-based brokering (WP10). Statistical offices, public agencies, and scientific institutions often face the
problem that their initial efforts to build up a public database are externally funded, but the
maintenance of such a service is not. Funding agencies increasingly require that these institutions
develop business plans for commercialising such a service in the long run (at least for non-scientific
use). The aim of this work package, for which the industrial partners will be responsible, is to develop
a detailed concept for a web based information brokering service with georeferenced data as a
foundation for a cost-effective dissemination of data.
Web-based, interactive Spatial Mining can add tremendous value to the mere distribution of data.
This added value can be the key to commercialising the distribution of data for statistical offices,
public agencies, and scientific institutions. What is new about this proposal is that customers do
not need to buy or install any complex and expensive software on their computers, yet are not confined
to the usual printed, non-interactive reports. An interactive thematic map is delivered over the internet
using Java technology. This map can be used by the customer for further exploration as well as for
presentation and decision making. There will be different levels of service, as suggested by the
following example business scenarios. The project will deliver technology to solve tasks 1-4 and
provide the technological basis for task 5. The feasibility of this concept will be tested in a
demonstrator.
Customer needs | Business solution | Customer supplies | Customer gets
1. An institute for ecological studies prepares an environmental report and needs a visualisation for their vegetation data and vegetation maps to make a presentation. | Building a thematic map for predefined data and map | Data & maps | Interactive map on the internet
2. A statistical office needs a visualisation of data about land use. | Building a thematic map for predefined data | Data | Interactive map on the internet
3. A department for urban development needs a local map showing hazard risks for decision making. | Building a map, data & map brokering | Description of data & location | Interactive map with cluster detection, significance testing
4. A company running a power plant needs visualisation of monthly aggregated environmental data for monitoring. | Maps periodically updated from a database via the internet | Description; location; data that have to be periodically refreshed | Interactive map with cluster detection, significance testing, periodically updated
5. A consulting company prepares a market study for the chances of sustainable tourism; for this it needs access to data from different sources such as census data and data about nature protection and pollution in this area. | Geomarketing consulting | A descriptive task | Interactive map with cluster detection, significance testing, visualisation of data mining results; a summary report about Data Mining results
Dissemination. The technology developed in this project is of a generic nature and has a broad range
of potential applications. Yet potential user groups may be unaware of the existence of the type of
technology the project develops, or they may have false expectations about it. The aim of this work
package is to address the general public, as well as potential users and partners for commercial
exploitation.
Dissemination will be an ongoing activity and will include organisation of workshops, maintaining a
project web page, systematically identifying additional user groups that could act as partners in
follow-up projects, and providing project descriptions for the general public.
Partner 6 will perform a feasibility study for commercialising technology developed especially within
the application to seismic data. To this end they will actively search for a partner in the area of
noise-level zoning. This is expected to become a major issue in the next two to three years in Holland,
because of anticipated new legislation. This third application, where the partner will not be directly
involved in the project, also demonstrates the potential of the technology for
environmental decision making.
A project sheet will be due in month 3, as well as a project web-site. Beginning with month 12, when
a technological preview version will be available, potential additional user groups and potential
customers will be systematically identified and contacted, so that knowledge about the project will be
spread around. This activity will increase when the prototype 1 becomes available in month 18. A
public workshop will be organised bringing together users, developers, potential users, as well as
other interested people, in month 24. A second public workshop will be organised in month 36,
concluding the project.
B3. Workpackage description
Workpackage number: WP1 - Coordination
Start date or starting event: 0
Participant number: P1, P4
Person-months per participant: 28, 6
Objectives
Overall and technical management. This will involve
A) Overall Management
• Ensure that the various phases of the project are properly coordinated
• Development of project workplan
• Monitoring and reviewing progress of work
• Handling administrative procedures relating to European Commission
• Reporting to the European Commission
• Supporting a good communication between the partners
B) Technical Management
• Writing of a project handbook including quality management plan
• Responsibility for critical technical decisions which affect the project as a whole
• Definition of quality standards relevant to the project and determination of how to satisfy them
Description of work
A) Overall Management
T1. Ensure that the various phases of the project are properly coordinated
T2. Development of project workplan (partners P1, P4)
T3. Monitoring and reviewing progress of work
T4. Handling administrative procedures relating to European Commission
T5. Reporting to the European Commission
T6. Scheduling of meetings
B) Technical Management
T7. Write a project handbook including quality management plan (partners P1, P4)
T8. Take responsibility for critical technical decisions which affect the project as a whole (partners P1, P4)
T9. Define quality standards relevant to the project and determine how to satisfy them (partners P1, P4)
Deliverables
D1. Project workplan (T2)
D2. Reports for EC (T5)
D3. Project handbook (T7)
D4. Periodic project meetings (T6)
Milestones and expected result
Milestones of this workpackage are synchronized with the milestones of WP2:
M1: System design (8), M2: Prototype 0 (12), M3: Prototype 1 (18), M4: Prototype 2 (30)
B3. Workpackage description
Workpackage number: WP2 Integrate Data Mining and GIS (Technology development)
Start date or starting event: 0
Participant number: P1, P4, P3, P5, P6
Person-months per participant: 30, 9, 2, 18, 10
Objectives
This workpackage has the overall task of integrating the existing GIS and Data Mining software technologically
and of incorporating the modules developed in the other workpackages in a coherent manner. It is the project's
technological hub, to which all partners will deliver, and whose deliverables all partners will need to access at
some point. For tight integration of existing components, a common Task Manager, Data Management Layer, Extension
API, and user interface have to be defined and implemented. The base system is designed as an object-oriented plug-in
architecture, facilitating technological integration. The Unified Modelling Language (UML) will be used for documentation
and design to ensure product quality. CORBA and RMI will be evaluated as middleware for integration. The
integrated system incorporates the Spatial Mining and visualization methods developed in WP3-7 into the base system.
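The plug-in idea behind the base system can be illustrated with a minimal sketch. It is written in Python for brevity, although the project itself targets Java, and every name in it (MiningPlugin, TaskManager, MeanByRegion) is a hypothetical illustration rather than part of the SPIN! design, which is only to be specified in T2.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class MiningPlugin(ABC):
    """Hypothetical extension interface; the proposal does not specify one."""

    name: str

    @abstractmethod
    def run(self, records: List[dict]) -> dict:
        """Analyse records handed over by the data management layer."""


class TaskManager:
    """Keeps a registry of plug-ins and dispatches analysis tasks to them."""

    def __init__(self) -> None:
        self._plugins: Dict[str, MiningPlugin] = {}

    def register(self, plugin: MiningPlugin) -> None:
        if plugin.name in self._plugins:
            raise ValueError(f"plug-in already registered: {plugin.name}")
        self._plugins[plugin.name] = plugin

    def available(self) -> List[str]:
        return sorted(self._plugins)

    def execute(self, name: str, records: List[dict]) -> dict:
        return self._plugins[name].run(records)


class MeanByRegion(MiningPlugin):
    """Toy plug-in: average of a numeric attribute per region."""

    name = "mean_by_region"

    def run(self, records: List[dict]) -> dict:
        sums: Dict[str, float] = {}
        counts: Dict[str, int] = {}
        for rec in records:
            sums[rec["region"]] = sums.get(rec["region"], 0.0) + rec["value"]
            counts[rec["region"]] = counts.get(rec["region"], 0) + 1
        return {region: sums[region] / counts[region] for region in sums}


# Example: register one plug-in and run it on three made-up records.
manager = TaskManager()
manager.register(MeanByRegion())
result = manager.execute(
    "mean_by_region",
    [
        {"region": "A", "value": 2.0},
        {"region": "A", "value": 4.0},
        {"region": "B", "value": 10.0},
    ],
)
```

The point of the sketch is that a new analysis method only has to implement one interface and register itself; the task manager and user interface need no changes.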
Description of work
T1. Organize kick-off meeting for identification of user needs
T2. Design of the SPIN! system architecture
T3. Develop efficient methods for transfer of data and maps over the internet (partner P6)
T4. Implementation of developer version (prototype 0)
T5. Technological integration of software developed in Task 1.3, 1.4 with spatial mining modules and visualization
modules, resulting in prototype 1
T6. Testing and validation, revision of design, getting user input, improving system, resulting in prototype 2
T7. Revision release of second prototype (final release)
Deliverables
D1. System design document (T1, T2)
D2. Prototype 0 (software & documentation) (T3)
D3. Implementation of efficient methods for transfer of data and maps over the internet (partner P6)
D4. Prototype 1 (software & documentation) (T4, T5)
D5. Prototype 2 (software & documentation) (T6)
D6. Revision release of prototype 2 (Final Release) (software & documentation) (T7)
Milestones and expected result
• A user-friendly, internet-enabled, extensible Spatial Mining software tightly integrating Data Mining and GIS
functionality
• A system providing a broad variety of methodological approaches to Spatial Mining that can be operated within a
single environment
M1. Specification of design (month 8)
M2. Delivery of Prototype 0 (month 12)
M3. Delivery of prototype 1 (month 18)
M4. Delivery of prototype 2 (month 30)
B3. Workpackage description
Workpackage number: WP3 – Extending machine learning methods to spatial mining
Start date or starting event: 0
Participant number: P2, P1
Person-months per participant: 24, 18
Objectives
This workpackage is mainly concerned with the adaptation of symbolic machine learning methods to spatial data analysis.
In particular, the methods to be adapted are Inductive Logic Programming algorithms for the discovery of subgroups and
spatial association rules. Some of these methods were developed to satisfy the classical properties of consistency and
completeness, whereas in spatial data mining one is interested in detecting patterns that satisfy minimum criteria for
support and consistency. Adaptation of these machine learning tools will be based on the use of adaptive sampling, i.e.
active learning techniques based on Bayesian Decision Theory, or on more efficient search strategies, to increase
scalability. Another contribution of this workpackage is the definition of appropriate algorithms for the automated
extraction of symbolic descriptions of parts (e.g., cells) of a map from vectorized maps.
By evaluating Bayesian posterior distributions or their approximations, the uncertainty of subgroup quality indicators
may be assessed. Relatively large subgroups with potentially high indicator values have a high utility and the sampling
of new data from the corresponding spatial locations is rewarding. Active learning stops if the cost (negative utility) of
collecting new data is higher than the expected utility of the subgroups that might be discovered.
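The stopping rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method to be developed in T2: the subgroup quality indicator is taken to be the share of target cases, its uncertainty is modelled with a Beta posterior under a uniform prior, and the utility proxy (subgroup size times the optimistic excess of the share over a baseline) is a hypothetical choice. All names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class SubgroupPosterior:
    """Beta posterior for the share of target cases in one spatial subgroup."""

    size: int            # number of objects covered by the subgroup
    positives: int = 0   # sampled objects showing the target property
    negatives: int = 0   # sampled objects not showing it

    def update(self, is_positive: bool) -> None:
        if is_positive:
            self.positives += 1
        else:
            self.negatives += 1

    def mean(self) -> float:
        # Posterior mean under a uniform Beta(1, 1) prior.
        a, b = 1 + self.positives, 1 + self.negatives
        return a / (a + b)

    def std(self) -> float:
        # Posterior standard deviation of the Beta(a, b) distribution.
        a, b = 1 + self.positives, 1 + self.negatives
        return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def expected_utility(post, baseline):
    """Assumed utility proxy: size times the optimistic excess over baseline."""
    return post.size * max(0.0, post.mean() + post.std() - baseline)


def keep_sampling(post, baseline, cost_per_sample):
    """Continue only while the expected utility exceeds the sampling cost."""
    return expected_utility(post, baseline) > cost_per_sample


# A subgroup with few, mostly positive samples is still worth sampling ...
promising = SubgroupPosterior(size=200)
for outcome in (True, True, False, True):
    promising.update(outcome)

# ... while one with many negative samples is not.
weak = SubgroupPosterior(size=200, positives=1, negatives=30)
```

With a baseline share of 0.3 and a cost of 5 per sample, `keep_sampling` returns True for `promising` and False for `weak`, mirroring the rule that sampling stops once the cost outweighs the expected utility.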
Description of work
T1. Develop concepts for the definition of subgroup criteria linking space, time, domain knowledge.
T2. Define criteria for adaptive sampling integrating the utility of subgroups as well as the cost of data collection and
computation. Develop adaptive sampling methods based on Bayesian posterior distributions or their approximations
T3. Investigate properties of spatial association rules and adapt a rule discovery system to spatial association rules
T4. Investigate the representation language to be adopted for the representation of parts of a vectorized map.
T5. Software implementation of spatio-temporal subgroup discovery (without adaptive sampling)
T6. Software implementation of spatio-temporal subgroup discovery with adaptive sampling
T7. Software for the discovery of spatial association rules
T8. Develop algorithms for the extraction of symbolic descriptions from vectorized maps
T9. Application and evaluation of implemented methods to real-world data
Deliverables
D1. Theoretical report on spatio-temporal subgroup discovery (T1)
D2. Theoretical report on adaptive sampling (T2)
D3. Theoretical report on spatial association rules (T3)
D4. Specifications of descriptions to be automatically extracted from vectorized maps (T4)
D5. Software for spatio-temporal subgroup discovery (T5)
D6. Software for adaptive sampling (T6)
D7. Software for the discovery of spatial association rules (T7)
D8. Software for the extraction of symbolic descriptions from vectorized maps (T8)
D9. Report evaluating the application of first-order learning methods to spatial data (T9)
Milestones and expected result
The work done in this workpackage will advance the state of the art in spatial data analysis by adapting methods from
Machine Learning to Spatial Mining, especially first-order learning methods. They are a natural and promising approach
to Spatial Mining, since they allow spatial relations to be represented directly. Work in this package is synchronized with the
milestones M1-M4 of WP2: for each prototype a set of methods will be delivered.
B3. Workpackage description
Workpackage number: WP4 - Generalize Bayesian Markov Chain Monte Carlo to Spatial Mining
Start date or starting event: 0
Participant number: P1, P4, P6
Person-months per participant: 30, 4, 6
Objectives
Currently the support for assessing and selecting models in GIS is very limited. Based on the extrapolation of the
uncertainty of individual predictions of different models we will develop methods for a well-founded selection or
combination of models. Partner P1 has developed Bayesian classification methods which use a Bayesian ensemble of
decision trees or neural networks, which will be adapted to spatial data. We will use the Bayesian approach in several
directions: calculation of a predictive density characterizing the predictive or classification uncertainty for new inputs
(the main algorithms use asymptotic expansions and Markov Chain Monte Carlo (MCMC)); selection of optimal models
by comparing their performance according to the Bayes factor and related methods; and generation of ensembles of models
of different types, e.g. using Bayesian model averaging and reversible jump MCMC. An approximate Bayesian
technique is the bootstrap. We will analyse the relative merits of this approach in comparison to Bayesian models.
Besides the classical spatial statistics models (e.g. kriging) we will concentrate on localized models which adaptively
partition the input area and generate different submodels. Promising candidates are radial basis functions, mixtures of
experts and multivariate adaptive regression splines. The selection criterion is their adequacy for the intended application.
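As an illustration of how the bootstrap yields an uncertainty estimate for a spatial prediction, the following minimal sketch resamples the observations and records the spread of the resulting predictions. Inverse distance weighting is used here only as a simple stand-in interpolator; it is an assumption of this sketch and not one of the models named above, and the function names and sample data are made up.

```python
import random
import statistics


def idw_predict(samples, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (xi, yi, value) samples."""
    num = den = 0.0
    for xi, yi, value in samples:
        d2 = (x - xi) ** 2 + (y - yi) ** 2
        if d2 == 0.0:
            return value  # query point coincides with an observation
        w = 1.0 / d2 ** (power / 2.0)
        num += w * value
        den += w
    return num / den


def bootstrap_prediction(samples, x, y, n_boot=500, seed=0):
    """Mean and standard deviation of predictions over bootstrap resamples."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_boot):
        # Draw as many observations as in the original data, with replacement.
        resample = [rng.choice(samples) for _ in samples]
        preds.append(idw_predict(resample, x, y))
    return statistics.mean(preds), statistics.stdev(preds)


# Four made-up observations on the corners of a unit square.
observations = [(0, 0, 1.0), (1, 0, 2.0), (0, 1, 3.0), (1, 1, 4.0)]
centre_estimate = idw_predict(observations, 0.5, 0.5)
boot_mean, boot_std = bootstrap_prediction(observations, 0.2, 0.3)
```

The standard deviation over resamples serves as a rough measure of prediction uncertainty at the query location; a Bayesian model would instead report a posterior predictive density.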
Description of work
T1. Report reviewing current approaches of spatial classification, prediction and interpolation
T2. Implementation of selected current approaches using bootstrap techniques.
T3. Report on advanced spatial models and the corresponding Bayesian algorithms.
T4. A basic implementation of Bayesian MCMC for selected models.
T5. Implementation of MCMC- or approximate Bayesian model selection / averaging.
T6. Report on performance evaluation for spatial mining methods and guidelines for selecting models depending on data
and prior conditions.
T7. Implement a generic library for spatial data transformations used by the mining algorithms (Partner P6, P4)
Deliverables
D1. Report reviewing current approaches of spatial classification, prediction and interpolation (T1)
D2. Implementation for bootstrap (T2)
D3. Report on advanced spatial models and the corresponding Bayesian models (T3)
D4. Implementation for MCMC (T4)
D5. Implementation for model selection (T5)
D6. Report on performance evaluation for spatial mining methods and guidelines (T6)
D7. Generic software library for spatial data transformations (T7)
Milestones and expected result
• adaptation of several advanced statistical models to the spatial domain,
• a comprehensive assessment of prediction/classification uncertainty for GIS,
• flexible framework for model formation, and model checking in a GIS-context.
Work in this package is synchronized with the milestones M1-M4 of WP2, where methods will be delivered.
B3. Workpackage description
Workpackage number: WP5 – Adapt and integrate methods for spatial pattern analysis
Start date or starting event: 0
Participant number: P4
Person-months per participant: 36
Objectives
This work package will explore ways of extending existing methods of spatial pattern detection. Currently ESDA
methods tend to be concerned solely with the detection of spatial pattern and often overlook other data attributes. This
shortcoming will be addressed by extending existing tools developed by partner P4 (Partner P4, publication 3) to handle
attribute interaction with spatial location and to consider how temporal changes in spatial data can be investigated.
The tool will be expanded to use multiple search methods in addition to the heuristic search it uses at present.
These methods will include genetic algorithms, artificial life, and multi-agent techniques (WP 3). Partner P4 has already
carried out some limited experiments with these techniques (Partner P4, publication 3) but will also investigate ways that
the search techniques can be used together in the form of a hybrid search system.
There is also potential to investigate how different statistical tests of clustering can be used in the tool. The development
of the system as a modular Java based program allows other tests to be dropped into the tool for testing and comparison.
Combined with this work, the methods developed in this work package will be designed to work closely with input and
output functions developed in work packages 2 and 7. This will include the evaluation of CORBA and ODBC methods
for data input and output.
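The idea of dropping alternative clustering tests into the tool can be sketched as follows. The example is in Python rather than Java for brevity, and it is only an illustration under assumptions: the search examines circles of fixed radius, the default test is a one-sided Poisson test of the case count against the count expected from the overall rate, and no correction for multiple testing is applied. Function names and data are hypothetical and do not describe partner P4's existing tool.

```python
import math


def poisson_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    cdf = sum(
        math.exp(-expected) * expected ** i / math.factorial(i)
        for i in range(observed)
    )
    return max(0.0, 1.0 - cdf)


def circle_counts(points, cx, cy, radius):
    """Sum cases and population at risk for points inside a circle."""
    cases = population = 0
    for x, y, n_cases, n_pop in points:
        if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
            cases += n_cases
            population += n_pop
    return cases, population


def significant_circles(points, centres, radius, test=poisson_tail, alpha=0.01):
    """Return (cx, cy, p) for circles whose case count is unusually high.

    `test` is the pluggable part: any function taking the observed and
    expected counts and returning a p-value can be passed in.
    """
    total_cases = sum(p[2] for p in points)
    total_pop = sum(p[3] for p in points)
    rate = total_cases / total_pop
    hits = []
    for cx, cy in centres:
        cases, population = circle_counts(points, cx, cy, radius)
        if population == 0:
            continue
        p_value = test(cases, rate * population)
        if p_value < alpha:
            hits.append((cx, cy, p_value))
    return hits


# Made-up data: (x, y, cases, population at risk) per small area.
areas = [
    (0, 0, 12, 1000), (0.5, 0, 10, 1000),
    (5, 5, 1, 1000), (6, 5, 2, 1000), (5, 6, 0, 1000),
    (9, 9, 1, 1000),
]
hits = significant_circles(areas, centres=[(0, 0), (5, 5)], radius=1.5)
```

Because the test is passed as a parameter, a different statistic can be compared on the same search without touching the search code.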
Description of work
T1. Investigate algorithms for handling attribute interaction with spatial location
T2. Implement attribute interaction with spatial location
T3. Evaluate statistical clustering tests
T4. Implement selected statistical clustering tests
T5. Investigate algorithms for multiple search
T6. Implement algorithms for multiple search
T7. Testing and evaluation of software tool.
Deliverables
D1. Theoretical paper on algorithms for handling attribute interaction with spatial location (T1)
D2. Implementation of attribute interaction with spatial location (T2)
D3. Theoretical paper evaluating statistical clustering tests (T3)
D4. Implementation of selected statistical clustering tests (T4)
D5. Theoretical paper on algorithms for multiple search (T5)
D6. Implementation of algorithms for multiple search (T6)
D7. Reports of testing and evaluation of software tool. (T7)
Milestones and expected result
This workpackage will provide a variety of spatial pattern analysis methods for the SPIN! system.
Work in this package is synchronized with the milestones M1-M4 of WP2, where the implemented methods will be
successively integrated into the prototype.
B3. Workpackage description
Workpackage number: WP6 - Support of visual analysis of time-dependent spatial data
Start date or starting event: 0
Participant number: P1, P4
Person-months per participant: 28, 12
Objectives
In most areas, spatially referenced data also refer to different moments or intervals in time. The study of such data is
meaningless if their development in time is not taken into account. Analysis of spatially referenced data should be
supported by their visual presentation in maps. Spatio-temporal data require substantial advancement of the traditional
map form of presentation towards dynamics and high user interactivity. The workpackage aims to develop
visualisation methods for spatio-temporal data that facilitate the analysis of such data. The methods include not only
graphical presentation itself but also various data transformations (e.g. calculation of the absolute or relative
magnitude or the rate of change since the previous or the specified moment, time aggregation, etc.) and interactive
manipulation of the displays. Thus, the user may move back and forth along the time axis, vary the animation step or the
length of the aggregation interval, select objects or areas in the map to view data and temporal trends for them in detail,
possibly in supplementary non-cartographic displays, and so on. The results of data transformation may be directed to
the data mining procedures.
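The data transformations mentioned above are simple to state precisely. The following sketch shows one possible reading of them for a single time series attached to one spatial object; the function names and the sample values are illustrative assumptions.

```python
def absolute_change(series):
    """Difference between each value and the previous one."""
    return [b - a for a, b in zip(series, series[1:])]


def relative_change(series):
    """Change relative to the previous value; None where that value is 0."""
    return [(b - a) / a if a != 0 else None for a, b in zip(series, series[1:])]


def change_since(series, reference_index):
    """Difference between each value and the value at a specified moment."""
    reference = series[reference_index]
    return [value - reference for value in series]


def aggregate(series, interval):
    """Sum consecutive values over windows of `interval` time steps."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return [sum(series[i:i + interval]) for i in range(0, len(series), interval)]


# Six made-up monthly values for one district.
monthly = [10, 12, 9, 9, 15, 20]
step_changes = absolute_change(monthly)
since_first = change_since(monthly, 0)
quarterly = aggregate(monthly, 3)
```

Varying `interval` corresponds to the user changing the length of the aggregation interval, and varying `reference_index` to choosing the specified moment against which change is measured.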
Description of work
T1. Review the existing types of time variation (e.g. changes in object existence, position, shape, or associated attribute
values) and analysis tasks that can emerge in relation to these types.
T2. Develop combined visualisation-interaction methods productive for fulfilling these analysis tasks.
T3. Software implementation of the visualisation and interaction methods and their selection depending on
characteristics of data and their temporal variation.
T4. Develop algorithms for investigation of temporal changes
T5. Implementation of algorithms for investigation of temporal changes
T6. Evaluate methods developed in T1-T5 in applications
Deliverables
D1. Rule base on application of visualisation and interaction techniques depending on characteristics of data and the
type of their time variation. (T1)
D2. Software library implementing the combined visualisation-interaction methods proposed. (T2)
D3. Expert system engine performing selection of methods according to characteristics of data. (T3)
D4. Report describing the implemented visualisation-interaction methods (T3)
D5. Theoretical paper on algorithms for investigation of temporal changes (partner 4) (T4)
D6. Implementation of algorithms for investigation of temporal changes (partner 4) (T5)
D7. Evaluation report (T6)
Milestones and expected result
Advancing the state of the art in visualization methods especially for the visualization of temporal data
Work in this package is synchronized with the milestones M1-M4 of WP2, where the implemented methods will be
successively integrated into the prototype.
B3. Workpackage description
Workpackage number: WP7 - Visualisation of data mining results
Start date or starting event: 0
Participant number: P1, P4
Person-months per participant: 28, 12
Objectives
The form of presentation of data mining results to the user is crucial for their appropriate interpretation. Large amounts
of information or complex concepts can be more easily comprehended when represented graphically. This especially
applies to data and concepts having spatial reference or distribution. However, to play their role effectively, graphical
displays must be properly designed with respect to the principles of human perception. The objective of this workpackage
is to design appropriate graphical techniques to represent results of the data mining methods developed within the
project. This work will be informed by the results obtained by partner P4 (publication 9) during development of public
access GIS. The approach to be taken is a combination of cartographic and non-cartographic displays linked together
through simultaneous dynamic highlighting of the corresponding parts (Partner P1, publication 3, Partner P4,
publication 3). The non-cartographic displays will represent the data mining results in summarised, generalised form
while maps will provide the transition from general descriptions to individual spatial objects and phenomena
characterised by them.
The techniques to be developed will apply general principles of graphical presentation established through cognitive
psychological studies and analysis of "best practice" in visualisation. These principles are expounded in the literature on
graphics design and cartography.
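The linking of cartographic and non-cartographic displays through simultaneous highlighting, as described in the objectives above, can be sketched with a shared selection model that notifies every display. This is a minimal illustration in Python; the class names are hypothetical and the sketch does not describe the partners' existing software.

```python
class SelectionModel:
    """Shared selection that notifies every linked display on change."""

    def __init__(self):
        self._selected = frozenset()
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def select(self, object_ids):
        self._selected = frozenset(object_ids)
        for callback in self._listeners:
            callback(self._selected)


class Display:
    """A map or chart that highlights whichever of its objects are selected."""

    def __init__(self, name, object_ids, model):
        self.name = name
        self.object_ids = set(object_ids)
        self.highlighted = set()
        model.subscribe(self.on_selection)

    def on_selection(self, selected):
        # Highlight only the objects this display actually shows.
        self.highlighted = self.object_ids & selected


# A map and a chart share one selection; selecting objects updates both.
model = SelectionModel()
map_view = Display("map", {"d1", "d2", "d3"}, model)
chart_view = Display("chart", {"d2", "d3", "d4"}, model)
model.select({"d2", "d4"})
```

Neither display knows about the other; both depend only on the shared selection, so further display types can be linked without changing existing ones.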
Description of work
T1. Identify the types and formats of results produced by the data mining and statistical methods to be developed and
develop a methodology of visual representation of the results based on principles of graphics design. (partners P1, P4)
T2. Implementation of visualization method for spatial subgroup discovery (partner P1)
T3. Implementation of visualization method for spatial association rules (partner P1)
T4. Implementation of visualization method for Bayesian classification (partner P1)
T5. Implementation of best practice in visualisation methods in ESDA (partner P4)
T6. Testing & validation in applications (partners P1, P4)
Deliverables
D1. Description of the presentation methods proposed to apply to results of the considered data mining methods (T1)
D2. Implementation of visualization method for spatial subgroup discovery (T2)
D3. Implementation of visualization method for spatial association rules (T3)
D4. Implementation of visualization method for Bayesian classification (T4)
D5. Implementation of best practice in visualisation methods for ESDA (T5)
D6. Report on current and potential visualisation methods in ESDA (T6)
Milestones and expected result
This WP provides visualizations for the methods developed in WP3-5, so that they can be used in linked displays.
Work in this package is synchronized with the milestones M1-M4 of WP2, where the implemented methods will be
successively integrated into the prototype
B3. Workpackage description
Workpackage number: WP8 – Application to seismic and volcano data
Start date or starting event: 0
Participant number: P7, P3, P1, P4, P6, P5
Person-months per participant: 32, 18, 3, 3, 12, 2
Objectives
The objective of this workpackage is to adapt the generic system to the specialised application area of earthquake and
volcanic eruption prediction, and seismic hazard assessment. This will be achieved by integrating Data Mining methods
for natural hazard assessment that have been developed by partners P3 and P7. Partner P7 runs monitoring observatories
at the Merapi volcano, which has been classified as a high-risk volcano. Integration of these monitoring data into the
analysis process may give new results for understanding volcanic hazards. As an extension to the core SPIN! system, an
online intelligent system for processing and analysis of natural hazard monitoring data as well as for decision making
support will be integrated. For achieving this goal, an integration layer between the generic Spatial Mining system and
the specialized methods implemented by partner P3 has to be designed.
From the point of view of the application area, the technical objectives of using the SPIN! system are:
• Cartographic representation of earthquakes and volcano monitoring information.
• Automatic extraction of the most essential information on seismic hazard (parameters of seismic regime such as
seismic activity, seismic energy, b-value and so on) from seismic monitoring data.
• Automatic extraction of essential information on volcanic hazard (seismic activity, dome deformation measurement,
rock fall, gas chromatography as well as geology, topography and land use data).
• Support of the administrative decisions referenced to seismic and volcano hazard.
• Support of earthquake and volcano eruption prediction research.
This application will test the SPIN! system on scientifically highly important real-world problems and will provide
valuable feedback for directing the development efforts. A successful application would demonstrate the usefulness of the
SPIN! system for applications in the natural sciences.
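One of the seismic regime parameters listed above, the b-value, is the slope of the Gutenberg-Richter frequency-magnitude relation log10 N = a - bM. As an illustration of automatic extraction from a catalogue, the sketch below uses the maximum-likelihood estimator b = log10(e) / (mean(M) - Mc), with the usual half-bin correction for rounded magnitudes. The choice of this estimator is an assumption of the sketch; the proposal does not state which method partners P3 and P7 use, and the function name and sample magnitudes are made up.

```python
import math


def b_value(magnitudes, completeness_magnitude, bin_width=0.0):
    """Maximum-likelihood b-value for events with M >= completeness magnitude."""
    selected = [m for m in magnitudes if m >= completeness_magnitude]
    if len(selected) < 2:
        raise ValueError("need at least two events above the completeness magnitude")
    mean_m = sum(selected) / len(selected)
    # Half a bin is subtracted when magnitudes are rounded to `bin_width`.
    denom = mean_m - (completeness_magnitude - bin_width / 2.0)
    if denom <= 0:
        raise ValueError("mean magnitude must exceed the completeness magnitude")
    return math.log10(math.e) / denom


# Made-up catalogue; the event of magnitude 1.5 lies below Mc and is ignored.
catalogue = [1.5, 2.0, 2.2, 2.4, 2.6, 3.0]
b = b_value(catalogue, completeness_magnitude=2.0)
```

In a monitoring setting the same function could be applied to a moving time window or to spatial cells, giving a b-value map or time series for display.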
Description of work
T1. Define user requirements
T2. Investigate methods of space-time analysis and data mining of seismic data.
T3. Investigate methodology for designing seismic hazard information models
T4. Implement software for the proposed methods as a plug-in to the SPIN! architecture.
T5. Evaluation report
T6. Apply software tools to the seismically active Eastern Mediterranean region.
T7. Apply software tools to the high risk Merapi volcano.
T8. Integrate continuous monitoring data into the analysis process.
T9. Final Report on the application of Spatial Mining to seismic and volcano data
Deliverables
D1. User requirements report (T1)
D2. Description of the methods of space-time analysis and data mining of seismic data. (T2)
D3. Description of the methodology for designing seismic hazard information models (T3)
D4. Software implementing the proposed methods as a plug-in to the SPIN! architecture. (T4)
D5. Evaluation report (T5)
D6. Application of the software tools to the seismically active Eastern Mediterranean region. (T6)
D7. Application of the software tools to the high risk Merapi volcano. (T7)
D8. Integration of continuous monitoring data into the analysis process. (T8)
D9. Report on the application of Spatial Mining to seismic and volcano data (T9)
Milestones and expected result
Cartographic representation of earthquakes and volcanic monitoring information; automatic extraction of the most
essential information on seismic hazard from seismic monitoring data; automatic extraction of essential information on
volcanic hazard; support of the administrative decisions referenced to seismic and volcanic hazard; and support of
earthquake and volcanic eruption prediction research.
M1. Description of methodology (10)
M2. Software integration into SPIN! system
M3. Applications to seismic and volcano data
Work in this package is synchronized with the milestones M1-M4 of WP2, where the application area helps to direct the
development by formulating user requirements and evaluations.
B3. Workpackage description
Workpackage number: WP9 – Web based dissemination of data from statistical offices
Start date or starting event: 0
Participant number: P8, P4, P1, P5, P6
Person-months per participant: 34, 6, 3, 2, 4
Objectives
A second application area is the analysis and web-based dissemination of census data from statistical offices. Here the
main objective is to put to practical use the timely, cost-effective dissemination of statistical information over the
internet. This will allow evaluation of the efficiency of the developed methods, the responsiveness of the application, as
well as acceptance by customers of statistical offices. Potential problem areas are the availability of bandwidth, the
number of concurrent users, and the size of maps and data sets. Especially if Data Mining analysis over the internet is
permitted, the performance of the server will be of central importance. Experiences in this application area will be
crucial for improving the prototype 1 system for better efficiency (which is a task within WP2).
Description of work
T1. Define user requirements
T2. Select and prepare maps and data for the application
T3. Adapt the generic Spatial Mining system to the specific needs of the application and build a prototype web site with
maps and data
T4. Use the GIS as a web-based front end for Data Mining tasks, using prototype 1 from WP2
T5. Collect user experiences and improve the system
T6. Deliver the final release, using prototype 2 from WP2
T7. Write a final report about the application and about user acceptance
Deliverables
D1. User requirements document (T1)
D2. Report containing description of data model and data characterization schema (T3)
D3. A prototype web site with interactive thematic maps that can be accessed over the internet (T2, T4)
D4. Evaluation report (T5)
D5. A prototype web site with interactive thematic maps based on SPIN! prototype 2 (T6)
D6. Report about user acceptance, recommendation for use (T7)
Milestones and expected result
• A web site based at a statistical office, used for web-published dissemination of statistical data and spatial mining
over the internet
• Knowledge about different types of users accessing such a system
• Practical experience in the use of internet-based spatial mining
M1. Prototype web site
M2. Extended web site containing prototype 1 from WP2
M3. Extended web site containing prototype 2 from WP2
Work in this package is synchronized with the milestones M1-M4 of WP2, where the application area helps to direct the
development by formulating user requirements and evaluations
B3. Workpackage description
Workpackage number: WP10 – Web-based information brokering service
Start date or starting event: 0
Participant number: P5, P6, P1
Person-months per participant: 12, 10, 2
Objectives
Statistical offices, public agencies, and scientific institutions often face the problem that their initial efforts to build up a
public database are externally funded, but the maintenance of such a service is not. Funding agencies require more and
more that these institutions develop business plans for commercializing such a service in the long run (at least for
non-scientific use). The aim of this workpackage, for which the industrial partners will be responsible, is to develop a
detailed concept for a web-based information brokering service with geo-referenced data as a foundation for a
cost-effective dissemination of data.
Web-based, interactive Spatial Mining can add a tremendous value to the mere distribution of data. This added value
can be the key for commercializing the distribution of data for statistical offices, public agencies, and scientific
institutions. A web based information brokering service which distributes interactive thematic maps over the internet is
proposed. What is new about this proposal is that the customer does not need to buy or to install any complex and
expensive software on his computer, yet is not confined to the usual printed, non-interactive reports. An interactive
thematic map is delivered over the internet using the Java technology. This map can be used by the customer for further
exploration as well as for presentation and decision making. There can be different levels of service.
Description of work
T1. Identify needs of statistical offices, public agencies, and scientific institutions with respect to dissemination of geo-referenced data
T2. Survey existing services, describe business models applicable to this service, and address property rights issues
T3. Describe required technical infrastructure for web-based information brokering service
T4. Build a prototype web site for information brokering
T5. Prepare final report with specific recommendations for setting up such a service
Deliverables
D1. Report defining requirements of the application (T1)
D2. Report addressing existing services, business model and property rights issues (T2)
D3. Report on technical infrastructure (T3)
D4. Prototype website (T4)
D5. Final report (T5)
Milestones and expected result
Detailed technical and economic guidelines for setting up a web based dissemination service
Identification of advantages and risks
M1. Delivery of D2
M2. Delivery of D3
M3. Delivery of D4
B3. Workpackage description
Workpackage number:             WP11 - Dissemination
Start date or starting event:   0
Participant number:             P6   P5   P1   P8   P7   P4
Person-months per participant:  14    2    2    8    4    8
Objectives
The technology developed in this project is of a generic nature and has a broad range of potential applications. Yet
potential user groups may be unaware of the existence of the type of technology the project develops, or they may have
false expectations about it. The aim of this workpackage is to address the general public, as well as potential users and
partners.
Dissemination will be an ongoing activity and will include organization of workshops, maintaining a project web page,
systematically identifying additional user groups that could act as partners in follow-up projects, and providing project
descriptions for the general public.
Partner P6 will make a feasibility study for commercializing the SPIN! system in the area of noise-pollution.
Description of work
T1. Maintaining a project web page
T2. Providing project descriptions for the general public
T3. Organization of dissemination workshop 1
T4. Feasibility study for commercialization
T5. Organization of dissemination workshop 2
T6. Systematically identifying additional user groups that could act as partners in follow-up projects
Deliverables
D1. Project web page (T1)
D2. Project description for the general public (T2)
D3. First dissemination workshop (T3)
D4. Second dissemination workshop (T5)
D5. Feasibility study on prospects of commercialization (partner P6) (T4)
Milestones and expected result
• Effective dissemination of project results
• Identification of user groups for follow-up projects
M1. Project web-page (month 3)
M2. Workshop 1 (month 24)
M3. Workshop 2 (month 36)
Part C
C1. Title. Spatial Mining for Data of Public
Interest
SPIN!
Proposal No. IST-1999-10536
Proposal for:
IST programme, 1.1.2-5.1.4 Cross-Programme Action CPA4: New Indicators and statistical
methods
C2. Contents for part C
C3. Community added value and contribution to EU policies
C4. Contribution to Community social objectives
C5. Project management
C6. Description of the consortium
C7. Description of the participants
C8. Economic development and scientific and technological prospects
Appendix – Publications of partners cited in part B
C3. Community added value and contribution to EU policies
Building a spatial mining system is a demanding task, since it requires expertise in many fields
including
• Geographic Information Systems,
• Cartography,
• Statistics,
• Machine Learning, and
• Databases,
as well as excellent software engineering skills. The consortium has been carefully chosen to ensure
uncompromising competence in all these areas. It includes
• two industrial partners active in Data Mining and Geographic Information Systems (Dialogis
GmbH, Germany; PGS, Holland),
• a university and a national research center for informatics active in the areas of Data Mining,
Machine Learning, and GIS (University of Bari, Italy; GMD, Germany),
• an institute for geography active in Exploratory Spatial Data Analysis since the 1980s (University
of Leeds, England),
• a university having a leading role in the dissemination of statistical data in the UK (Manchester
Metropolitan University/MIMAS), and
• two institutes active in seismic data research (IITP, Russian Academy of Sciences, Moscow;
GeoForschungszentrum Potsdam, Germany).
Thus partners come from four different EU countries – England, Holland, Italy, and Germany – and
from one NIS country (Russia), forming a truly European consortium.
Involvement of the Russian Academy of Sciences promotes scientific exchange with NIS countries.
Europe gains added value through access to the work of a group that has more than 20 years of
expertise in this field and has developed some very mature technologies. The group combines
technological skill with expertise in its application area, earthquake prediction and hazard
management, in a unique way. Earthquake prediction and hazard management is an area that has an
enormous and obvious potential impact on quality of life and health. It could help to prevent massive
financial losses. It is in the vital interest of the EU to gain access to technologies for improved hazard
management. Independently of the proposal, GMD and GFZ have made arrangements to invite
members of IITP as guest researchers. This will promote scientific interchange with NIS countries and
will make an intense collaboration possible.
In recent years, the partners have individually developed many of the technological and
methodological pieces needed to build an integrated spatial mining system. A project that wanted to
build a spatial mining system from scratch would need dozens of person years for developing the tools
which are already available as the starting point for the SPIN! consortium. The existence of this body
of technology is a precondition for the iterative approach to software development chosen by the
consortium, since user input can be gathered right from the start of the project. This in turn reduces
the risk of failure. A concentration of expertise and existing tools such as that in the SPIN! consortium
cannot be found within a single European country. Only by joining efforts on a European scale can the
critical mass needed to develop a spatial mining system ready for real-world applications be
achieved. This will offer perspectives for the dissemination and exploitation of the results that would be
impossible on a national level.
One such area for further exploitation is European biodiversity research. The Organisation for
Economic Co-operation and Development (OECD) working group for Biodiversity Informatics has
recommended the installation of a Global Biodiversity Information Facility (GBIF). Key technology
needs that have been identified include some kind of integrated DMS and GIS. Here is an exciting
opportunity to develop a spatial mining solution as a coordinated European effort that can help
to establish a European perspective within GBIF, which is currently dominated by research focussed in
the USA and Australia. Biological informatics is perceived as a key technology of the next century.
From the strategic perspective of the GMD Knowledge Discovery team, biodiversity informatics will be a
major application area in which the techniques developed in this project can be put to very good use,
supporting several European conventions, especially the Convention on Biological Diversity (CBD).
The project supports EU policies directed towards SMEs. Both Dialogis and PGS are SMEs active
in the Data Mining and GIS market. These companies will be responsible for the exploitation of the
SPIN! technology. The Data Mining platform Kepler and the GIS tool Descartes are the technological
basis on which the SPIN! system will be built. Both systems have been co-developed and
commercially distributed by Dialogis, and its market position will be significantly strengthened by the
new technology. PGS also plans to incorporate the technology in its product line (see C8 for details).
C4. Contribution to Community social objectives
New economic prospects
The SPIN! project has the goal of combining state of the art research with commercial exploitation.
Both goals have been kept firmly in mind in the design of the workpackages. The commercial
potential of this new technology has been specifically addressed in workpackages 10 and 11. In WP10
a business concept for a web-based information brokering service is developed by the industrial
partners Dialogis GmbH and PGS.
A key goal of this business concept is to support public agencies etc. in the cost-effective
dissemination of data of public interest. Funding agencies increasingly require that such services
are commercialized in the long run (at least for non-scientific use). The added value which the SPIN!
technology offers can be a key for commercialization.
Sustainable Development
The SPIN! project can have a major impact on promoting the Local Agenda 21. The Local Agenda 21
is the process that aims to involve local people and communities in the design of a way of life that can
be sustained and thus protect the quality of life for future generations. It originates in the Rio conference in 1992, which led to the agreement of an Agenda 21 document detailing a series of
strategies for action world-wide. The Local Agenda 21 is a highly democratic, consensus-building and
empowering process. This can only be achieved with the help of leading-edge information technology.
The SPIN! project, with its focus on data of public interest, provides such a technology. Specifically,
it helps statistical offices in the user-friendly dissemination of census data, where customers get
access to powerful yet easy to use analysis tools. The Descartes system is already used by statistical
offices and by urban planners in several European countries, and they are also potential users of the
new technology.
Quality of life & health
As the German national research centre for Earth sciences, the GFZ carries out research and
development projects across a very broad range of fields which are of direct relevance to the fulfilment of
the principle of sustainable development as enshrined in the Treaty of Amsterdam. In particular, the Fifth
European Community environment programme, "Towards sustainability" (see European Parliament
and Council Decision 2179/98/EC), calls for appropriate measures to improve health and safety, in
particular in relation to the management of natural and industrial hazards, nuclear safety and radiation
protection, as well as improvements in energy efficiency, a reduction in the consumption of fossil
fuels and the promotion of renewable energy sources (see e.g. Communication from the Commission
COM(1998) 571 final). Co-operative research executed on a long-term basis at a very high level
makes substantial contributions to the construction of the legislative framework aimed at combating
pollution and protecting the environment, as documented in Communication from the Commission
COM(97) 592 final. Through the creation of the European Macroseismic Scale 1998 (EMS-98) and
the resulting and associated regulations and standards such as EUROCODE-8 with all its
accompanying National Application Documents, the GFZ has set an important cornerstone for the
safety of the local communities of the Europe of tomorrow. The application of the SPIN! system to
hazard management will help GFZ to contribute to those policies by improving the quality of data
analysis and by providing means for a timely distribution of data relevant for hazard management.
Protection of the environment
PGS will apply the SPIN! system to the problem of noise-level zoning, which is expected to become a
major issue in the next two to three years in Holland, because of anticipated new legislation (WP11).
This will demonstrate the potential of the new technology for the protection of the environment.
C5. Project management
Work Organization
The work is organized in a set of well identified work-packages. For each work-package a partner
acting as coordinator is identified. The work-package coordinator is responsible for managing the
execution of tasks associated with his work-package. In turn, for each single task an operative
partner is identified. The work-package coordinators have been chosen among the partners according
to their past experience and present role in the specific technical fields.
Work-package coordinators are responsible for the performance of their associated operative partners
and will have discretion to manage the resources allocated to them. Furthermore, work-package
coordinators will report directly to the Technical Committee (see below). See the work-plan for a
detailed description of the work organization.
Team Organization
The team organization directly reflects the division of the work into work-packages. For each work-package a working team will be formed, consisting of the operative partners and the respective
coordinator.
In addition to these work-package teams, a Technical Committee will be appointed. Its mission will
be to manage the project developments in terms of technical content. It will consist of one
member from each SPIN! partner.
Overall Management
The project management of SPIN! is itself treated as a work-package and as such it will have an
appointed coordinator. GMD will be coordinator of this work-package, thus acting as the overall
project-manager of this proposal.
The overall project-manager will act as the contact point between the consortium and the Commission
Project Officer, and is responsible for the overall execution and performance of the project.
The management work-package of SPIN! is divided into the following tasks:
• Overall management
• Technical management
Overall Management. The overall management objective is to ensure that the various phases of the
project are properly coordinated in order to maximize the project success.
Two complementary steering activities are foreseen:
• Development of the project work plan. This activity should result in a formally approved work-plan
document used to manage and control the project execution. The work-plan will serve as a
basis for follow-up. Nevertheless, it should be expected to be complemented and modified over
time as more detailed information becomes available.
• Project monitoring and review. Project performance must be measured on a regular basis to
identify deviations from the plan.
In addition to the “continuous” monitoring done in an informal way, mainly by e-mail, status
review meetings are scheduled on a quarterly basis. These meetings will be attended by all the work-package
coordinators. The result of these meetings should be a detailed assessment of the work
progress of the consortium. All project plan change requests should be presented and discussed in this
forum. The result of these meetings will be updated work-plan documents. At each status review
meeting, the work-package coordinators should present a progress report which addresses the
following issues:
• Current progress of the work-package in general and for each task in particular,
• Unresolved issues and required actions to solve them.
All administrative procedures between the consortium and the Commission are the
responsibility of the overall project manager. They include:
• Distribution of EC funding to the participating partners.
• Preparation of documentation in view of the required periodic reports.
• Managing all the procedures related to the potential commercial exploitation of the results.
GMD will co-ordinate the project and will be the point of contact to the Commission for all
administrative and financial business.
Technical Management. Technical management will be carried out by the Technical Committee. Its
role is to address specific technical issues, namely
• To take critical technical decisions which affect the project as a whole, such as general system
architecture, integration requirements for the several software components, common development
tools, and so on.
• To define which quality standards are relevant to the project and to determine how to satisfy them.
It will also be responsible for monitoring specific project results, determining whether they
comply with the relevant quality standards, and identifying ways to eliminate causes of
unsatisfactory quality performance.
The Technical Committee will meet whenever considered necessary, and at least once every
four months. Ideally, the technical meetings will be merged with the status review meetings. In the
course of the first three months of the project, the Technical Committee should document the
organization of the technical work in written form by providing a Project Handbook that also includes
a Quality Management Plan.
IPR. The SPIN! Consortium has agreed to handle IPR related matters along the following lines:
1. Each partner will keep the rights for the software and methods he brings into the project
2. Each partner will get the rights for the commercial exploitation of his main deliverables
3. If a partner contributes a small part to another partner's main deliverable, this second partner can exploit the contribution for free
4. If a partner wants to exploit the main deliverable of another partner, a special agreement is needed
Several such agreements are foreseen and are vital for a joint exploitation; the details, however, have to
be settled in the first three months after the project starts. Dialogis and PGS have already made a
successful joint exploitation agreement in the CommonGIS project.
C6. Description of the consortium
GMD will be the coordinator of the project. It will provide the technology on which the project is
based and will bring in its expertise in Data Mining and GIS. Within several EU projects, original
Data Mining methods have been developed. The GIS tool Descartes is used in several web based
applications in the areas of nature protection, urban decision making and census data in several
countries. GMD has extensive experience in the design and implementation of Data Mining tools and
client/server systems. In recent years, the data mining platform Kepler and the GIS tool
Descartes have been developed.
The University of Bari (Machine Learning) will be active in the evaluation, adaptation and
development of machine learning algorithms for the task of spatial analysis. More specifically, they
will be in charge of the specification of quality measures for spatial association rules and the
adaptation/implementation and test of the algorithms for the discovery of such rules. A further
contribution from the University of Bari will be the specification and implementation of algorithms
for the automated feature extraction from maps.
The University of Leeds (Geography) has theoretical and practical expertise in web based
mapping and in developing and applying spatial analysis and modelling tools to geographical data, and will
be responsible for the spatial pattern analysis module. In the last year, an internet enabled version of
the Geographical Analysis Machine, whose origins go back to the 1980s, has been developed to allow
exploratory spatial data analysis to be carried out over the internet. This work has also developed
other more advanced spatial analysis tools, which can take attributes relating to cases into account.
The group also has experience in development of web based mapping tools based on a Java toolkit,
GeoTools.
The Institute for Information Transmission Problems, Russian Academy of Sciences (IITP
RAS) will bring in their expertise in seismic data analysis, spatial statistics, and decision support. It
will join the project as a full partner, yet without funding. The group of Valeri Gitis has been working in the
field of geoinformation technology for more than 20 years. Members of the group have fundamental
knowledge and experience both in modern information technology and in seismology. The group has
obtained original results in pattern recognition and artificial intelligence. Part of these results received an
award from Hewlett Packard in the Competition of Works on Pattern Recognition in 1992. The group
has developed several original geoinformation technologies for natural hazard assessment and
environmental zonation. The group's main activity at present is the development of
intelligent network geoinformation technologies and systems.
Dialogis GmbH, an SME located in Sankt Augustin, will be responsible for technology integration and
exploitation. Dialogis commercially distributes Descartes and Kepler, is active in the areas of
Data Mining and GIS consulting, and develops Data Mining solutions for database marketing. It has
strong experience in software design and development, as well as in Data Mining and GIS consulting.
PGS, Amsterdam. PGS will make its Lava/Magma products available to the project. It will adapt and
extend the interfaces to Lava/Magma so that they can be integrated with the knowledge acquisition tools
and the knowledge based visualization environment. PGS will develop software modules for data
characterization tools and the data visualization environment. PGS will be responsible, together with
Dialogis, for the packaging of the developed software into commercially viable components that can
be integrated into its Lava/Magma product line.
GeoForschungszentrum Potsdam, Physik des Erdkörpers und Desasterforschung. The Section
Earthquakes and Volcanism of GFZ is a research group (10 scientists, 8 PhD students, 10 technicians
and engineers, several students) with research focus and experience on origins of hazards,
development and installation of monitoring networks and early warning systems, and training experts
in seismic hazard assessment, in particular in developing countries.
Manchester Metropolitan University is the largest non-federal university in the UK. Within the
Department of Environmental and Geographical Sciences is the GIS and Remote Sensing Research
Group. The main areas of research are in Internet mapping, the access to spatial databases over the
Internet, web based educational technologies, satellite remote sensing, digital image processing and
environmental modelling.
C7. Description of the participants
GMD - German National Research Center for Information Technology
Description of the partner
GMD is Germany's national research center for information technology. It is a non-profit, limited
liability private company (GmbH) whose shareholders are the Federal Republic of Germany and the
Federal States of Hesse, Berlin and North Rhine-Westphalia. GMD has a staff of about 1300. The
annual budget is approximately Euro 95 million, almost 30% of which comes from externally funded
R&D projects. GMD's main research areas are communication and co-operation, intelligent
multimedia systems, system design technology and scientific computing. Research and development
activities are application-oriented, and most projects co-operate with partners from industry and
science. The Institute for Autonomous Intelligent Systems is one of the eight institutes of GMD and
has a staff of about 150 people.
The Knowledge Discovery Team (KD) is a research group (12 scientists, 2 PhD students, several
students) working in the field of artificial intelligence. A recent survey established that GMD as a
whole is the leading German research institution in this area in terms of publications and citations.
Professional experience and expertise in this group include Data Mining, Inductive Logic
Programming, Bayesian Statistics, Neural Networks, Geographic Information Systems, and databases.
The Knowledge Discovery team has extensive experience with EU projects: it participates in several
EU projects (currently ILP2 (Inductive Logic Programming), KESO (Knowledge Extraction for
Statistical Offices), and MLNet 2 (Machine Learning Network of Excellence 2)), coordinates one
(CommonGIS (Common Access to Geographically Referenced Data)), and has participated in several
others in the past.
The team has extensive experience in the design and implementation of commercial quality software
systems. In recent years, it developed the data mining platform Kepler and the GIS tool
Descartes. Kepler and Descartes are also used by scientific partners in the USA, Russia, the Netherlands, the UK,
Portugal, and Germany.
Key personnel
Dr. Willi Klösgen has developed methods and tools for partially automating data exploration at GMD
since the mid eighties. He has led various projects for a wide range of data mining applications,
including market research, tax and transfer legislation, medical research, production control. Willi
Klösgen has contributed to the main KDD (Knowledge Discovery in Databases) workshops and
conferences and has organized the international conferences on New Techniques and Technologies for
Statistics for Eurostat. He is the chief editor of the Handbook of Data Mining and Knowledge
Discovery, which will appear later this year with Oxford University Press. He is a member of the
editorial boards of KDD and related journals. He studied mathematics, statistics, and physics at
several German universities and received his Ph.D. in 1972 from Bonn University.
Dr. Gerhard Paass has designed statistical and knowledge-based algorithms and tools for extracting
structure from data at GMD since the mid eighties. Among others he has worked on probabilistic
Bayes networks, neural networks, bootstrap methods, Bayesian Markov Chain Monte Carlo
procedures and Bayesian decision theory. He has led a number of projects aiming at the elicitation of
the information content and uncertainty of statistical procedures with applications in database
security, vague reasoning, adaptive sampling and exploration as well as credit scoring of enterprises.
He is adjunct professor of Neurocomputing at the Queensland University of Technology, Brisbane, and is on
the editorial board of the International Journal of Uncertainty, Fuzziness and Knowledge-Based
Systems. He studied mathematics, statistics, computer science, and economics and received his
Ph.D. in 1983 from Bonn University.
Dr. Gennady Andrienko and Dr. Natalia Andrienko received a Ph.D. equivalent in Computer
Science from Moscow State University in 1992 and 1993, respectively. They worked on knowledge-based
systems at the Mathematics Institute of the Moldavian Academy of Sciences (Kishinev, Moldova),
then at the Institute on Mathematical Problems of Biology of the Russian Academy of Science (Pushchino
Research Center, Russia). Dr. Gennady Andrienko also worked as an assistant professor at Pushchino
State University, teaching a course on GIS and supervising students for their master's degrees. In 1995
and 1996, they visited GMD as guest researchers. Since July 1997 they have held research positions at
GMD. Since November 1998 they have played key roles in the EU-funded CommonGIS project. They are
authors of many papers that have been published in international journals and conference
proceedings. Their research interests and experience include interactive computer graphics, automated
knowledge-based cartographic visualization, and visual geo-data exploration.
Dr. Michael May is the team leader of the GMD Knowledge Discovery Team and coordinates the
research efforts at the intersection of Data Mining and Knowledge Discovery, visual knowledge
exploration, and databases. He studied Philosophy of Science and Computer Science and holds a PhD
degree in Philosophy of Science, where he worked on a computer simulation of causal reasoning. He
has professional experience as a software consultant and database developer for major companies,
among them Deutsche Shell AG and SAP AG. His research interests are the formalization of causal
reasoning processes and the application of Data Mining and Data Warehousing especially to the
analysis of biological data.
Dr. Hans Voss received his Diploma in Computer Science from Bonn University in 1981, and his
Research Doctorate in Computer Science from the University of Kaiserslautern in 1986. Since 1986 he has been
working at GMD, Sankt Augustin, Germany. He was head of projects on real-time expert systems, on
the product development of the hybrid expert system tool Babylon, and on knowledge engineering in
the context of diagnostic expert systems. From 1992 until 1998 he was head of the research area on
Cooperative Design (some twenty researchers). He was coordinator of the EU-funded project
GeoMed-F, and he is currently coordinator of the EU-funded project CommonGIS. He is particularly
interested in the integration of various technologies in order to support cooperative/competitive
spatio-temporal planning and decision-making.
Department of Informatics of the University of Bari
Description of the partner
The “Università degli Studi” of Bari is one of the largest state universities of Italy both in the number
of enrolled students (above sixty-thousand) and in the number of curricula and specialisation courses
available (above forty).
The Department of Informatics was founded in 1991 in order to guarantee administrative autonomy to
the former Institute of Information Science created in 1973. It recruits 200-250 undergraduates
annually for the Diploma (three-years) and Laurea (five-years) Degree in Informatics, and currently
has an academic board of 32 full/associate/assistant professors and administrative/technical staff of 15
people. The research strengths of the Department fall into four main categories: Machine Learning,
Image Processing and Pattern Recognition, Software Engineering, Human-Computer Interaction.
The Machine Learning Group, which will be the main research group of the Department involved in
the SPIN! project, comprises 3 permanent members, 2 Ph.D. students and approximately 3 external
collaborators. The Group has access to the facilities of the research laboratory LACAM (Laboratorio di
Acquisizione della Conoscenza ed Apprendimento nelle Macchine). Other members of LACAM
laboratory that collaborate with the Machine Learning Group have competence in human-computer
interaction and automated interpretation of topographic/cadastral maps.
The Machine Learning Group has been active in the area of knowledge acquisition and machine
learning since 1986. Its members have developed several machine learning systems, both supervised
and unsupervised ones. They have worked on real-world applications of machine learning tools and
techniques, such as intelligent document processing, digital libraries and geographic information
systems. The Group has been involved in several national and European research projects, which
include: Esprit Project N.5203 INTREPID (INnovative Techniques for REcognition and ProcessIng of
Documents), 1991-93; ESPRIT project SODAS 20821 (Symbolic Official Data Analysis System);
ESPRIT project CONCERTO 29159 (Conceptual Indexing, Querying and Retrieval of Digital
Documents), 1998-2000; and the national project Intelligent Agents (funded by the Italian Ministry for
Universities and Scientific/Technological Research), 1998-1999. In this project our unit has to
develop a learning server available on-line for intelligent agent-based applications.
The Machine Learning Group is a node of MLNet, the European Network of Excellence on
Machine Learning (Esprit Projects 7115 and 29288), and of Compunet, the European Network of
Excellence on Computational Logic (Esprit Project 7230). Its members have also participated in the
project LHM (Human and Machine Learning) funded by the European Science Foundation.
Key personnel
Floriana Esposito is full professor of Computer Science and is responsible for the LACAM laboratory.
Since 1997 she has been director of the Interdepartmental Centre for Logic and Applications of the
University of Bari, and Dean of the Faculty of Informatics of the University of Bari. She currently
lectures on “Algorithms and Data Structures” and “Knowledge Engineering and Expert Systems”.
Her main research interests are similarity-based learning, multistrategy learning,
incremental learning and discovery of causal models. She is the author of more than 100 papers published
in refereed journals and conference proceedings. She sits on the board of directors of the Italian
Association for Artificial Intelligence (AI*IA) and is currently responsible for the national Machine
Learning Group. She has served on the program committees of many international conferences
(ECML’94-98-2000, AI*IA’93-95-97-99, ECAI’96, ICDAR’97-99, ICML’99); she organised the 13th
Int. Conf. on Machine Learning (ICML’96) and co-chaired the 4th Int. Workshop on Multistrategy
Learning (MSL’98).
Donato Malerba is an associate professor at the University of Bari, Department of Informatics,
where he lectures on “Databases and Knowledge-Base Systems” and “Computer Programming.” For
the past decade, he has been active in machine learning and its applications to intelligent document
processing, knowledge discovery in databases, map interpretation, and intelligent interfaces. He has
published several papers in refereed conferences and journals and received the best paper award of
the Symposium on “Knowledge Discovery in Databases” - 13th European Meeting on Cybernetics and
Systems Research. He has served on the program committees of the Int. Conf. on Machine Learning
(ICML’96, ICML’99), of the AI*IA workshop on Machine Learning and Natural Language
Processing (Turin, December 1997), of the ICML’99 Workshop on Machine Learning in Text Data
Analysis, and of the 2nd Int. Conf. on Innovation through Electronic Commerce (IeC’99). He
has acted, or is acting, as key personnel in all ESPRIT projects in which the Machine Learning Group
of Bari has been involved.
Antonietta Lanza is assistant professor at the University of Bari, Department of Informatics, where
she teaches courses for the computer science curriculum. She received her first appointment with the
University of Bari in July 1984. From 1978 to 1981 she held a fellowship with the C.S.A.T.A.
(Centro Studi di Automazione e Tecnologie Avanzate) in Bari and the Institute of Physics of the
University of Bari. Initially her research interests were in student modeling and computer-based
instruction with applications of many technologies (CAI, CBT, Hypertext, AI, ITS). At present, her
main research activity is in man-machine interaction, machine learning and knowledge acquisition;
applications include pre-processing, feature extraction and interpretation of topographic charts and
cadastral maps. She has published several papers on the above topics in national and international
journals and conferences.
School of Geography at the University of Leeds
Description of the partner
The Centre for Computational Geography is the largest research group within the School of
Geography at the University of Leeds consisting of 14 researchers and 8 postgraduate research
students. The School of Geography was rated five (on a scale of 1-5*) in the last UK university
research assessment exercise. Research in the centre involves problems from both human and physical
geography. The group specialises in the development and application of exploratory spatial data
analysis tools and other artificial intelligence techniques in geography. This work has included the
application of fuzzy logic to areas such as flood forecasting and new geodemographic systems. Other
projects the group have worked on include the design of census output areas for both the UK and
Italy, and flexible output systems that assure the confidentiality of the data. The group has recently
finished a project looking at predicting the interaction of human systems and land use degradation
processes in the Mediterranean basin (MEDALUS).
The centre is also the premier high performance computing group involved in social science research
in the UK. The group have been active in developing a culture of use of high performance computers
in the social sciences and especially in geography by developing key parallel applications for others to
use.
Key personnel
Stan Openshaw is the professor of Human Geography and a fellow of the Royal Geographical
Society and of the Royal Statistical Society. He is the director of the Centre for Computational
Geography. He has been researching intelligent hyperspace search methods and AI techniques for a
significant period of his career. He has written a book on AI in geography and developed several
generations of search machines for cluster location in multi-dimensional space. His other research
interests are in the application of parallel computing techniques to the development of computational
geography and human systems modelling.
He gained his PhD from the University of Newcastle, UK in 1974 and was employed at the University
of Newcastle until 1992 when he moved to the School of Geography at Leeds University to become
professor of human geography.
Ian Turton is a senior departmental research fellow in the Centre for Computational Geography.
After completing his BSc in Geophysics and Planetary Physics at the University of Newcastle in 1988
he moved to the University of Edinburgh where he completed a PhD in Geophysics in 1992. He has
been a researcher in the Centre for Computational Geography for the past seven years. During this
time he has worked on a variety of projects utilising artificial intelligence methods in geographical
applications and in the application of parallel programming methods to hard problems in geography.
At the present time he is working on the development of smart pattern detection methods for the
analysis of rare diseases. He is the co-author, with Prof. Openshaw, of a textbook on parallel
programming applications for geography. He and Prof. Openshaw teach a masters level module in
Java for Geographers and are writing a book on this topic. Ian is also working with a postgraduate
student in the CCG on the development of a portable mapping toolkit written in Java.
Linda See is a research fellow in the Centre for Computational Geography. She obtained a degree in
physical geography and environmental management at the University of Toronto in 1988 and then
completed an MSc in Climatology at McMaster University in 1990. Linda then moved to become an
Associate Professional Officer at the Max-Planck-Institut für Aeronomie in Germany. In 1991 she
became the technical co-ordinator in the Global Information and Early Warning System of the Food
and Agriculture Organisation (FAO) of the UN in Rome. In 1995 she began a PhD in Fuzzy Logic
Applications in Geography at the School of Geography, Leeds. Since completing this in 1998 she has
worked on the application of soft computing methods, i.e., fuzzy logic, neural networks and genetic
algorithms, to spatial problems. She is currently involved in the development of better
geodemographic systems using these technologies.
Andy Turner is a research fellow in the Centre for Computational Geography. In 1996 he completed
a degree in Maths, Statistics and Geography at the University of Leeds. Following this he gained an
MA in GIS from the School of Geography at Leeds. Since 1997 he has been employed in the Centre
for Computational Geography on a variety of projects. Recently Andy has compared the performance
of two major commercial data mining packages with the capabilities of in-house spatial analysis tools
for geodemographic targeting. He also has extensive experience with neuro-fuzzy methods, spatial
pattern analysis and geographical information systems (GIS).
The Institute for Information Transmission Problems, Russian Academy of
Sciences (IITP RAS)
Description of the partner
IITP RAS was founded in 1961. At present its main research directions are information theory
and applied mathematics, and computer and communication sciences in engineering, management, nature,
language and living systems. Among the most important topics researched by the Institute are
problems in the theory and linear analysis of complex systems, image processing, pattern recognition,
intelligent geoinformation technologies and error-correcting coding. There is a stable scientific body of
highly trained and young specialists, composed of mathematicians, physicists, biologists, linguists,
computer scientists and engineers, a total of 320 collaborators. The staff currently includes 9 members and
corresponding members of the Russian Academy of Sciences and 205 full professors and doctors of science.
IITP RAS will bring in its expertise in seismic data analysis, spatial statistics, and decision support.
It will join the project as a partner, but without funding. The group of Valeri Gitis has been working in the
field of geoinformation technology for more than 20 years. Members of the group have fundamental
knowledge and experience both in modern information technology and in seismology. The group has
obtained original results in pattern recognition and artificial intelligence. Some of these results received an
award from Hewlett Packard in the Competition of Works on Pattern Recognition in 1992. The Group
has developed several original geoinformation technologies for natural hazard assessment and
environmental zonation. The group's current work is devoted to developing
intelligent network geoinformation technologies and systems.
Key personnel
Valeri Gitis holds a Ph.D. in Technical Cybernetics and Information and is the head of the department
of Geoinformation Technologies and Systems. His fields of research are geoinformation technology,
artificial intelligence, seismic hazard and risk assessment, and earthquake prediction. He holds grants from
the Russian Basic Research Foundation (N 97-07-90326), “Information technology for space-time
forecasting in Earth sciences”, as project leader; INCO-COPERNICUS (IC 15 CT97 0200),
“Assessment of Seismic Potential in European Large Earthquake Areas (ASPELEA)”, as leader of the
Russian part of the project; and the Russian Basic Research Foundation (N 99-07-90326), “Word Data Center
Online”, as researcher.
Arkadi Vainchtok graduated from the Moscow Electrotechnical Institute of Telecommunication in 1970. He is
a Member of the Artificial Intelligence Association of Russia. His fields of research are pattern
recognition, expert systems, speech analysis and recognition, geoinformation technology and
instrumental environment.
Boris Osher holds a Ph.D. in Physics and Mathematics (thesis: “Uncertainties in earthquake maximum
magnitude estimation”) from the Institute of Physics of the Earth, Russian Acad. Sciences, Moscow
(1997). His main scientific interests are geographic information systems; estimation of seismic
hazard and risk; time and space variations and interconnection of geophysical features; and earthquake
records processing.
Dialogis Software & Services GmbH, St. Augustin, Germany
Description of the partner
Dialogis Software & Services GmbH is a spin-off company of GMD, the German National Research
Center for Information Technology. Dialogis was founded in March 1997. Dialogis turns, in close
collaboration with GMD, research prototypes into marketable products. The currently evolving
product line of Dialogis is composed of three packages that offer support for different facets of
decision making processes:
• The data mining system Kepler offers facilities to extract knowledge from collections of data,
hence providing the user with information for an informed decision;
• The geographic visualization tool dialoGIS (Descartes) displays geographically based data in an
easily understood manner, hence making otherwise often incomprehensible data intuitive;
• The Zeno system for mediated decision making processes helps groups of users to effectively
employ each participant's knowledge to arrive at a decision that optimally reflects the entire
group's knowledge and opinions.
Dialogis is also a partner in the EU ESPRIT project CommonGIS number 28983.
Key personnel
Dietrich Wettschereck holds a Ph.D. in Computer Science (machine learning) from Oregon State
University, USA. From 1994 to 1997 he was a post-doctoral researcher at GMD where he pursued
research topics related to data mining and machine learning, participated in several European research
projects, and participated in the development of the data mining system Kepler. He is co-founder and
technical director of Dialogis Software & Services GmbH. His responsibilities at Dialogis include
supervision of and participation in research and development projects as well as data mining
consultancy.
Andrea Lüthje works as a Data Mining Consultant and product manager for Kepler at Dialogis. Her
responsibilities at Dialogis include coordination of user requirements and software development as
well as providing data mining consultancy and end-user training. She has a Diploma in the Social
Sciences.
Professional GEO Systems B.V. (PGS), Amsterdam
Description of the partner
Professional GEO Systems B.V. was founded in March 1996, by a group of researchers from the TNO
(a national Dutch research organization), and the University of Amsterdam. PGS has developed the
first commercial pure Java map viewing environment (Lava/Magma). This was a further development
of GEO++ (developed by TNO and marketed by PGS).
Currently PGS offers the following products and services:
• The Java based map/viewing environment Lava/Magma (used for example in a decision support
system built by the United States Geological Survey (http://dss1.er.usgs.gov/)).
• The ‘Vastgoed Informatie Web’ (Real-Estate Information Web). A datawarehouse based system
for integrated information management for all real-estate related information in a municipality
(including cadastral, environmental, planning, zoning and management information). The map
viewing environment is based on Lava/Magma, and the system is completely Internet based.
• Services for implementing the VIW at municipalities in Holland.
• General consulting services related to the design and implementation of geographical information
systems.
Key personnel
Frank Tuijnman holds a Ph.D. in Computer Science from the University of Amsterdam. As a
lecturer at the University of Amsterdam he pursued research topics related to AI, robotics and
distributed database management systems, and has published numerous articles on these topics. He
has participated in several European research projects. He is co-founder and a director of Professional
GEO Systems B.V.
GeoForschungsZentrum, Potsdam, Germany
Description of the partner
The GeoForschungsZentrum (GFZ) is the German national research centre for Earth sciences,
founded on January 1st, 1992 on the Telegrafenberg in Potsdam. Financing is provided by the Federal
Ministry of Education and Research. GFZ has a staff of about 600, of whom 300 are scientists.
The annual budget is approximately Euro 35 million, of which about 30% is externally funded.
As the first of its kind world-wide, the GFZ combines all solid earth science fields including geodesy,
geology, geophysics, mineralogy and geochemistry, in a multidisciplinary research centre. Its 22 sections
are organised in five divisions according to the main topics of the GFZ: Kinematics and Dynamics of
the Earth; Solid Earth Physics and Disaster Research; Structure and Evolution of the Lithosphere;
Material Properties and Transport Processes; and Rock Mechanics and Management of Drilling
Projects. Research is accomplished by the use of a broad spectrum of methods and techniques, such as
satellite geodesy and remote sensing, geophysical deep sounding, scientific drilling, experiments
under in-situ conditions and modelling of geo-processes.
The GFZ maintains various instrument pools for field research and global measurement campaigns, a
team of engineers for the development of geoscientific instruments and a group of specialists for the
Task Force Earthquake. An underlying principle is to combine the geoscientific know-how of
universities and other research centres in national and international joint projects.
The Section Earthquakes and Volcanism is a research group (15 Scientists, 8 PhD students, 10
technicians and engineers, several students) with research focus and experience on origins of hazards,
development and installation of monitoring networks and early warning systems, and training experts
in seismic hazard assessment, in particular in developing countries. The group has experience with
EU projects (PRENLAB 1,2; BBMT 1,2 and EPOC-CT91/0043).
Key Personnel
Prof. Dr. Jochen Zschau has been director of the GFZ division “Disaster Research” since 1992 and, since
1996, director of the GFZ division “Solid Earth Physics and Disaster Research” and head of the section
“Earthquakes and Volcanism”. He has held a Ph.D. from Kiel University since 1974 and became a
professor of geophysics there in 1980. His fields of research are general and theoretical
geophysics, potential theory, regional and global dynamics, rheology, earthquake prediction and
volcano monitoring. He is a member of the European Seismological Commission, Subcommission on Earthquake
Prediction (Chairman, since 1996), the European Advisory Evaluation Committee for Earthquake
Prediction (Council of Europe, Vice-president, since 1994), the Scientific Advisory Board of the German
Committee for the IDNDR (Member, since 1994), the IASPEI Subcommission on Earthquake Prediction
(Member, since 1993) and the German Task Force Committee for Earthquakes (Chairman, since 1993).
Heiko Woith has held a Ph.D. in geology from the University of Kiel since 1996, with research in
hydrology, nuclear physics and earthquake prediction. He is the responsible scientist and manager of
the project READINESS (REAltime Data Information Network in Earth ScienceS), which is related to
large scale fault zone interaction in the Eastern Mediterranean.
Claus Milkereit has held a Ph.D. in geophysics from the University of Potsdam since 1998, with
research in theoretical geophysics, time series analysis, seismology and earthquake prediction. His
main task is monitoring the seismic activity at the western end of the North Anatolian Fault near
Istanbul.
Malte Westerhaus has held a Ph.D. in geophysics from the University of Kiel since 1996, with research
in tilt and well level tides along active faults, volcano monitoring and earthquake prediction. He is the
responsible scientist and manager of ground deformation within the project MERAPI and of the
deformation network at the North Anatolian Fault in Turkey.
Anita Pfaff holds a Diploma in Geography and is a Ph.D. student working on the presentation of
geological and geophysical mapping and monitoring data.
Manchester Metropolitan University/MIMAS
Description of the partner
Manchester Metropolitan University is the largest non-federal university in the UK. Within the
Department of Environmental and Geographical Sciences is the GIS and Remote Sensing Research
Group. The main areas of research are in Internet mapping, the access to spatial databases over the
Internet, web based educational technologies, satellite remote sensing, digital image processing and
environmental modelling. The group also hosts UNIGIS, a world-wide consortium of
educational establishments providing a common programme of distance education in GIS. Currently
this comprises over twenty institutions in sixteen countries and operates through a web-based system
of education management and delivery.
The group has led several research projects in the field of GIS and World Wide Web technologies.
The main ones are the KINDS projects for the access to large spatial data sets over the Internet
(http://midas.ac.uk/kinds). A summary of the functionality of the KINDS system is given in Table 1.
The KINDS Projects are undertaken in collaboration with the University of Salford IT Institute and
Manchester Computing, which is home to MIMAS (formerly MIDAS), the national academic data provider
that hosts and supports the use of national spatial data sets including the census and Bartholomew’s
and Ordnance Survey map data. A major collaborator in the KINDS Project is the Office of National
Statistics, which produces and distributes the UK Census.
Key Personnel
Dr. Jim Petch is the leader of the GIS and Remote Sensing Group. He is coordinating several projects
in the Application of Mathematical and Statistical Models of Complex Systems to the Analysis of
Spatial Structure of Remotely Sensed Images, the Effective Sustainable Use of Network Accessible
Datasets, and Catchment modelling of Hydrological Parameters.
C8. Economic development and scientific and technological
prospects
Public access to the immense volume of existing geo-data and their exploitation is of significant value
for the development of an open and democratic ”information society” and a true global market. The
widespread use of geo-data and GIS will promote general public awareness and further social
cohesion. Publicly available geo-data is, however, of little use unless people can easily access and
easily exploit it. Here, the SPIN! system will advance the state of the art, and we expect the
results to be exploited along the following axes: Software Components;
Demonstrators; Application framework; Additional user groups; Other dissemination activities.
Software Components. The software architecture of SPIN! is based on a set of reusable and
self-contained components. The great advantage of the generic approach to Spatial Mining in SPIN! is that the
components are independent of the specific application domain and can be used as building blocks for
developing particular applications. Robustness, scalability, platform-independence and timeliness are
the foreseen benefits when following this approach for developing new applications. Two results of
the project will be of particular commercial value:
(i) Integrated software system
(ii) Application to web based brokering
The target market for the first product will be end users themselves, software developers aiming to
incorporate in their GIS applications the “intelligence” of an automatic data analysis mechanism, and
GIS companies who would like to add this functionality to their solutions. The target market for the
second product is information providers in the public sector, but also commercial companies, e.g.
those active in geomarketing.
Dialogis intends to commercialize the results of this project. The Descartes environment already
enables government agencies or statistical analysts to make their geographic information available
over the Internet. Kepler allows users in the industry to analyze their data with Data Mining methods.
The markets for data mining tools and services as well as for geographical information systems are
growing at a rate substantially higher than that for the entire sector of information technology. The
results of the proposed project will enable Dialogis to market a greatly improved and highly
competitive data mining / GIS product. We strongly believe that the resulting product will:
• enable Dialogis to establish the resulting system world-wide as a highly competitive European
data mining tool / GIS tool,
• make data mining and GIS technology accessible to and affordable for clients that currently
refrain from investing in such technology due to the high recurring consulting costs,
• open up entirely new markets for Dialogis due to the substantially enhanced functionality of
Kepler and dialoGIS that will be part of the resulting product.
Dialogis sees the resulting product at the core of its product line, and will exploit the results of the
project by marketing the improved product to all existing and future customers of Dialogis. Existing
customers will serve as reference customers for the end result of SPIN!. Further customers can
be acquired through the standard sales channels of Dialogis (own sales activities, affiliates outside of
Germany, value added resellers and OEM partners). The results of this project are of such importance
to Dialogis that we see it as a core precondition for the international expansion of Dialogis in the data
mining market. Conservatively estimated, we expect a two-fold increase in sales through the proposed
project.
PGS expects to integrate the results of the project into its VIW and Lava/Magma product lines. The
VIW is a data-warehouse system for real-estate related information for Dutch municipalities. The
system is completely based on Internet technology, so that all information is accessible to any
(authorized) internet user. Currently most projects with VIW concentrate on building-up the data
warehouse. Typically this is a complex process, requiring organizational changes. The reason is that a
unified view in the entire organization is required on all information. Even for basic information (such
as addresses and who lives where) different departments (social security, tax, planning) use different
datasets. We expect that in two to three years a substantial number of municipalities will have
constructed a datawarehouse with VIW and will have a good, consistent dataset. We anticipate that at that
time great interest will arise in an easy-to-use analysis and data-mining environment to fully exploit
the information in the datawarehouse.
PGS will also exploit the results of the project to improve its consulting capabilities for complex
geographical analysis operations. In particular, we expect noise-level zoning to
become a major issue in the next two to three years in Holland, because of anticipated new legislation.
The Lava/Magma environment already enables local and other government agencies to effectively
make their geographic information available through the Internet to a large public. With the results of
this project added to that environment, PGS expects to dramatically improve the attractiveness of its
product line for expert users who want to carry out complex analysis operations on their data sets.
The technology developed in this project is uniquely suited for innovative ways of web based
information brokering and has a broad range of applications. PGS and Dialogis are convinced that a
shared exploitation of the SPIN! results will considerably enhance the market potential of both
companies. They therefore intend joint marketing and sales activities building on their respective
expertise and customer base. They already made a similar agreement for Lava/Magma and Descartes
within the CommonGIS project.
Manchester Metropolitan University (MMU), with MIMAS (formerly MIDAS), runs the KINDS
Project (http://www.midas.ac.uk/kinds) for accessing national spatial data sets over the Internet. The
SPIN! Project will enhance considerably the service which can be offered to academic users by
providing a major extension of functionality to complement the data browsing, data access and
visualisation services which are currently available. MIMAS provides the main academic data service
to the UK academic community and the SPIN! Project will have an immediate and maintained role in
this service.
A collaborator of MMU in the SPIN! Project is the Office of National Statistics, which produces and
distributes the UK National Census. A new census will be undertaken in 2001. The SPIN! Project is
timely, falling in the planning phase for the distribution of the Census data to commercial and academic
users, and is expected to be a major platform for the dissemination of data.
Additional user groups and Scientific exploitation. The Global Biodiversity Information Facility
(GBIF), whose installation is recommended by the OECD, has identified DMS and GIS as key
technologies. Here is an exciting opportunity to develop a spatial mining solution as a coordinated
European effort which can be linked to develop a European perspective within GBIF. From the
strategic perspective of the GMD knowledge discovery team, biodiversity informatics will be a major
application area in which the techniques developed in this project can be put to very good use,
supporting several European conventions.
Partners at the University of Leeds have also been involved in environmental EU research, namely,
MEDALUS III. This Mediterranean desertification and land use project completed a third stage of
research this year and a further proposal has been submitted to the framework 5 research program.
CCG research in MEDALUS III was geared to designing and developing a Synoptic Prediction
System (SPS) which aimed to be able to forecast future land use change impacts and land degradation
risks based on imposing climate change scenarios.
The researchers of the Department of Informatics of the University of Bari (I) have been working for
a decade on the application of machine learning tools and techniques to problems related to image
processing and computer vision. The SPIN! project provides a natural extension of the work done in
Bari (I) along two new directions: the embedding of the machine learning algorithms in a platform
that tightly integrates GIS and Data Mining tools, and the application of the developed research
techniques and tools to a new domain, namely earthquake prediction and hazard assessment. The
former extension is important in order to define standard algorithms for the automated extraction of
features from maps. Currently, many proprietary formats of vectorised maps are available, which
makes tools for automated extraction of information from maps hardly reusable. The collaboration
with researchers experienced in GIS will give researchers from the University of Bari (I) a
better understanding of how to develop interoperable feature extraction algorithms for vectorised
maps. Moreover, close collaboration with end users requiring innovative tools for discovery of
geographic knowledge in data-rich environments will potentially result in a better understanding of
the possible application areas and also open some new research problems. Experiences and
non-confidential research results will be disseminated in the scientific community by publishing papers
and by organizing a workshop in collaboration with other partners of the European Network of
Excellence on Machine Learning II (Esprit Project 29288), especially those actively involved in the
“Industrial Application Initiative”, and partners of the ESPRIT project SODAS 20821 (Symbolic
Official Data Analysis System), namely the statistical offices that already have geographically
referenced data.
The partner GFZ will contribute to the design and application of a demonstrator for seismic data
research. GFZ will profit from the technology by getting access to advanced and complementary
methods for data analysis. It intends to maintain a service to make research results on seismic and
volcano data as well as on hazard management accessible via the Internet using the technology
developed in this project.
The research activities of the research group "Earthquakes and Volcanism" of the GFZ are
world-wide, with permanent observation networks in the Mediterranean region and Indonesia. Co-operative
partner institutions are, for example, in Greece, Italy, Turkey, Armenia, Israel, Venezuela, China and
Indonesia. World-wide exchange of data and of scientific results between research groups
is therefore crucial. As many countries do not yet possess a fast Internet connection,
exchange of large data sets or graphic information is still a bottleneck. As long as there are
limitations in the bandwidth of telecommunication lines, scientists need intelligent and
effective methods for transferring geographical information and results, so that results can be
examined by scientists and administrators not only in developed but also in developing countries.
Other Dissemination Activities. With regard to promotion and diffusion, the final goal of the
partnership should be to offer end users concrete examples of how to disseminate geo-data to a
wide audience of users and how to exploit such data: the success of these demonstrators will
contribute significantly to the visibility of European GIS and Data Mining technology.
In the course of the project, when the first prototype becomes available, the establishment of an
Advisory Board with external members will be seriously considered. The project will then have to
allocate some financial resources to cover the expenses of the Board members. We have already asked
the EEA (European Environment Agency) to participate, and they have expressed explicit
interest. The EEA has also shown interest in becoming an end user of the project results, with a view to
the recently started work on Sustainable Local indicators, together with DG XI, and to
its information gathering, assessment and reporting cycle activities. For example, EIONET –
the European Environment Information and Observation NETwork – was created as the main vehicle
of the European Environment Agency to collect data, information and knowledge for the process of
reporting on the state of the environment.
Appendix – Publications of partners cited in part B
References partner P1 – GMD
The Descartes system can be found at the following places:
http://allanon.gmd.de/and/java/iris/
http://ais.gmd.de/descartes/IcaVisApplet/
1. Andrienko, G. and Andrienko N. Interactive Maps for Visual Data Exploration.
   International Journal of Geographical Information Science, 1999, 13(4), pp 355-374.
2. Andrienko, G. and Andrienko N. Intelligent Visualization and Dynamic Manipulation:
   Two Complementary Instruments to Support Data Exploration with GIS. In
   Proceedings of AVI'98: Advanced Visual Interfaces Int. Working Conference
   (L'Aquila - Italy, May 24-27, 1998), ACM Press, pp 66-75.
3. Andrienko, G. and Andrienko N. Knowledge-Based Visualization to Support Spatial
   Data Mining. In Proceedings Intelligent Data Analysis IDA'99, Springer-Verlag, 1999
   (accepted).
4. Klösgen, W. (1998). Deviation and association patterns for subgroup mining in
   temporal, spatial, and textual data bases. In: Polkowski, L., Skowron, A. (eds): Rough
   sets and current trends in computing. Lecture Notes in Artificial Intelligence, Vol.
   1424, pp 1-18, Springer, Berlin, Heidelberg, New York.
5. Klösgen, W. and Zytkow, J. (eds) (1999). Handbook of Data Mining and Knowledge
   Discovery, Oxford University Press, New York.
6. G. Paass and J. Kindermann (1995), Bayesian Query Construction for Neural Network
   Models, in: G. Tesauro, D. Touretzky, T. Leen (eds.): Advances in Neural
   Information Processing Systems 7, pp 443-450, MIT Press.
7. Kindermann, J., Paaß, G. and Weber, F. (1995), Query Construction for Neural
   Networks Using the Bootstrap, in: Fogelman-Soulie, F. and Gallinari, P. (eds.), Proc.
   ICANN 95, International Conference on Artificial Neural Networks, Paris, 135-140,
   EC2 & Cie.
8. Gerhard Paaß, Jörg Kindermann: Bayesian Classification Trees with Overlapping
   Leaves Applied to Credit-Scoring. In: X. Wu, R. Kotagiri, K.B. Korb (eds.): Research
   and Development in Knowledge Discovery and Data Mining. Springer-Verlag, Berlin,
   1998, pp 234-245.
9. J. Kindermann and G. Paass (1998), Model Switching for Bayesian Classification
   Trees with Soft Splits, in: J. Zytkow and M. Quafafou (eds.): Principles of Data Mining and
   Knowledge Discovery, 148-157, Springer.
10. Stefan Wrobel. Scalability Issues in Inductive Logic Programming Data. In Proc. 9th
   Int. Workshop on Algorithmic Learning Theory (ALT-98), Berlin, 1998. Springer
   Verlag.
References partner P2 - University of Bari
1. Malerba D., Esposito F., and Lisi, F.A. (1998). Learning recursive theories with
   ATRE. In H. Prade (Ed.), Proceedings of the 13th European Conference on Artificial
   Intelligence, 435-439, John Wiley & Sons, Chichester, England.
2. F. Esposito, A. Lanza, D. Malerba, & G. Semeraro (1997). Machine learning for map
   interpretation: An intelligent tool for environmental planning. Applied Artificial
   Intelligence: An Artificial Intelligence Journal, 11, 10, 673-696.
3. F. Esposito, A. Lanza, D. Malerba, & G. Semeraro (1998). Information capture from
   topographic maps using machine learning. Proceedings of the Joint Workshop of the
   Italian Association for Artificial Intelligence (AI*IA) and the International Association
   for Pattern Recognition - Italian Chapter (IAPR-IC) on "Artificial Intelligence and
   Pattern Recognition Techniques for Computer Vision", 122-127.
References partner P3 – IITP, Russian Academy of Sciences
The GeoProcessor system can be accessed at:
http://www.iitp.ru/projects/geo/index.html
http://www.iitp.ru/projects/geo/geoprocessor.html
1. Gitis V., Dovgyallo A., Osher B. An information technology for analysis of geological
   and geophysical data in INTERNET. Proceedings of VI national conference on
   Artificial Intelligence, Puschino, 1998, 473-479 (in Russian).
2. Gitis V., Dovgyallo A., Osher B., Gergely T. GeoNet: an information technology for
   WWW on-line intelligent Geodata analysis. Abstracts of 4th EC-GIS Workshop,
   Hungary, 1998.
3. Gitis V., Dovgyallo A., Osher B., Gergely T. An approach to Online Geoinformation
   Modeling. Proceedings of the 1st International Workshop on Computer Science and
   Information Technologies, Moscow, January 18-22, 1999, 181-186.
4. Gitis V.G. GIS technology for the design of computer-based models in seismic hazard
   assessment. Geographical Information Systems in Assessing Natural Hazards,
   A. Carrara and F. Guzzetti (eds), 1995, Kluwer Academic Publishers, 219-233.
5. Gitis V.G., Jurkov E.F., Osher B.V., Pirogov S.A., Ponomarev A.V., Sobolev G.A. A
   system for analysis of geological catastrophe precursors. Journal of Earthquake
   Prediction Research 3, 1994, 540-555.
References partner P4 – Leeds
The internet version of the GAM system can be found at:
http://www.ccg.leeds.ac.uk/smart/gam/gam.html
1. Openshaw, S. and Perrée, T. (1996) ‘User centred intelligent spatial analysis of point
   data’, in Parker, D. (ed.) Innovations in GIS 3, Taylor and Francis, London, 119-134.
2. Openshaw, S., Turton, I., Macgill, J. and Davy, J. (1999) Putting the Geographical
   Analysis Machine on the Internet, in Gittings, B. (ed.) Innovations in GIS 6, Taylor and
   Francis, London (in press).
3. Openshaw, S., Turner, A., Turton, I., Macgill, J. and Brunsdon, C. (2000) Testing
   space-time and more complex hyperspace geographical analysis tools, in Martin, D.
   (ed.) Innovations in GIS and GeoComputation 7, Taylor and Francis, London (in
   press).
4. Openshaw, S. (1998) ‘Building automated Geographical Analysis and Exploration
   Machines’, in Longley, P. A., Brooks, S. M. and Mcdonnell, B. (eds) Geocomputation:
   A primer Macmillan Wiley Chichester, p95-115.
5. Turton, I. (1999) Using Pattern Recognition to Discover Concepts in Spatial Data, in
   Gittings, B. (ed.) Innovations in GIS 6, Taylor and Francis, London (in press).
6. Turton, I. (1999) Application of Pattern Recognition to Concept Discovery in
   Geography, in Allan, R.J., Guest, M.F., Simpson, A., Henty, D. and Nicole, D.,
   High-Performance Computing, p467-486, Plenum Press, New York.
7. Openshaw, S. and Turton, I. (1998) Application of GAM to crime analysis, Crime
   Mapping Research Centre Report, U.S. Department of Justice.
8. Openshaw, S., Turton, I. and Macgill, J. (1999) Using the Geographical Analysis
   Machine to analyse census limiting long term illness, Geographical & Environmental
   Modelling, vol. 3.1, p83-99.
9. Carver, S., Blake, M., Turton, I. and Duke-Williams, O. (1997) Open spatial decision
   making: Evaluation of the potential of the world wide web, in Z. Kemp (ed.),
   Innovations in GIS 4, pp 267-278, Taylor and Francis, London.
References partner P5 – Dialogis
1. Wrobel, S., Wettschereck, D., Sommer, E., Emde, W. (1996). Extensibility in Data
   Mining Systems, Proceedings of the 2nd International Conference on Knowledge
   Discovery and Data Mining, AAAI Press, Menlo Park, California.
References partner P6 – PGS
The Lava/Magma system can be accessed at:
http://www.pgs.nl/
1. C. van den Berg, F. Tuinman, T. Vijbrief, C. Meijer, P. van Oosterom, and H. Uitermark
   (1999), Multi-server Internet GIS: Standardization and Practical Experiences, in
   Goodchild, M., Egenhofer, M., Fegeas, R., and Kottman, C. (eds.) Interoperating
   Geographic Information Systems. Boston: Kluwer Academic Publishers, 1999,
   pp 365-377.