TerraLib: Open Source Tools for GIS Application Development
Gilberto Câmara
INPE – Brazil
www.terralib.org
International Course on Geographic Information Technologies

The issue
“Developing countries and their donor partners should review policies for procurement of computer software, with a view to ensuring that options for using low-cost and/or open-source software products are properly considered and their costs and benefits carefully evaluated” (UK IPR report, 2002)

Yes, but…
- We need much more than Linux!
- Who will develop the open source software we need?
- Can it be done in developing countries?

The discussion today
- The nature of open source software
- Spatial information technology
- The need for open source GIS and Remote Sensing software
- Developing an open source GIS in Brazil
- A realistic model for OS projects
- 20 years of institutional, nation-wide efforts
- Technology as social construction
- Some lessons learned
- How can we do OS software in the South?

Some questions
- What is open source software?
- How is open source software built?
- What is an open source GIS?
- Why build a custom GIS instead of using a ready-made one?
- How does a distributed/enterprise GIS work?
- How are spatial data structures stored in an object-relational DBMS? Why do we need to do this?
- Are there certain paradigms for GIS libraries that are general enough not to be vendor-specific?
- How much effort is involved in building a custom-made GIS?
- How does one design a GIS library to be extensible?
- How does one design a GIS to include space-time models?

What is Open Source Software?
Open source software
- OSS is software whose source code is available and can be used, copied and distributed with or without change
- OSS can be charged for, but its source cannot be hidden
- Examples of OSS: Linux, Apache, Open Office, PERL
- 2/3 of web servers use Apache

Linux and Scalability

General Advantages of Open Source Software
- Greater social benefit
- Independence from proprietary technology
- Longer hardware life
- Avoiding the “software bloat”
- Possibility of building custom applications and redistributing them
- User-oriented products
- Optimization of available competence
- Reconfigurable systems and applications (in general)
- Distribution and testing of new ideas and new conceptions

What Open Source Hopes to Achieve
- When an OSS project reaches a “critical size” we obtain:
  - Robustness and security: many programmers have access to the source code, which increases the capacity to detect errors
  - Distributed support: many companies provide support for OSS

Open Source Software Licences
- Copyrights
  - Before making OSS available, the authors choose the degree of freedom they allow for modification and distribution of the software.
- Some OSS licences
  - GNU General Public License (GPL, “copyleft”): any modification of GPL-ed software should also be GPL. Forbids that OSS be included in proprietary software.
  - BSD (Berkeley): few restrictions on use, modification and redistribution of derived works. The software can be sold and there is no obligation to include the source code; it can also be included in proprietary software.
  - GNU Library General Public License (LGPL): OSS can be included in proprietary software. Final product must have the OSS portion freely available.
- More OSS licenses at www.opensource.org

How is open source software built?
Idealized view of OS community
- Network of committed individuals (“peer production”)
- Based on a limited number of examples

Reality of software projects
- Problem granularity
- Conceptual design
- Degree of innovation
- Social context of technology

Naïve view of open source projects
- Software
  - Development network
  - Large number of developers, single repository
- Open source products
  - Product of an individual or small group (peer-pressure)
  - Based on a “kernel” with “plausible promise”
  - View as complex, innovative systems (Linux)
- Incentives to participate
  - Operate at an individual level (“self-esteem”)
  - Wild-west libertarian (“John Waynes of the modern era”)

Idealized model of OS software
- Networks of committed individuals

The reality of open source projects
- Problem granularity
  - Effective peer-production requires high granularity (Benkler)
  - Each type of software induces a breakdown strategy
- Conceptual design and innovation
  - What works for an operating system will not work for a database!
  - Most OS software is based on established paradigms (Linux is a 1970’s design)
  - Design is the hardest part of software (Fred Brooks)
- Social context of technology
  - Software development requires closely-knit teams
  - Software will do nothing by itself
  - Complex software requires informed users

Limiting Factors for Open Source
- Previous existence of conceptual designs of similar products (the potential for reverse engineering)
- Problem granularity (the potential for distributed development)

Potential for Reverse Engineering
- Post-mature
  - A private company develops a software product.
  - Product becomes popular and it becomes part of the “public commons”.
  - Others develop a public domain equivalent (e.g., Open Office)
- Standards-led
  - Standards consolidate a technology
  - Allow compatible solutions to compete in the marketplace.
  - SQL database standard (e.g., MySQL and PostgreSQL).
  - POSIX standard (guidance to Linux)
  - OpenGIS specifications (e.g., deegree, MapServer, GeoServer)

Potential for Distributed Development
- Parts of a software product
- Operating systems (Linux)
  - kernel and additional functions that use it (its periphery)
  - well-defined kernel for process control
  - periphery consisting of programs such as device drivers, applications, compilers and network tools
- Database management systems
  - strong kernel of highly integrated functions (such as the parser, scheduler, and optimizer)
  - much smaller periphery

Potential for Distributed Development
- Each type of software product has a periphery/kernel ratio that constrains the potential for distributed development
- Kernel
  - a tightly-organized and highly-skilled programming team
- Periphery
  - more widespread programmers of various skills
- Example
  - Out of more than 400 developers, the top 15 programmers of the Apache web server contribute 88% of added lines (Mockus, 2002).

Four Types of Open Source Software
- High reverse engineering, high distribution potential
- High reverse engineering, low distribution potential
- Low reverse engineering, high distribution potential
- Low reverse engineering, low distribution potential

Type 1 – High-High
- High reverse engineering, high distribution potential
- Archetypical open source projects
  - The “Linux” model
  - Community-led projects
- Developers
  - May have a separate job
  - Time allocated in agreement with their employer

Type 2 – High-Low
- High reverse engineering, low distribution potential
- Large number of projects
  - Databases, office automation tools, web services
- Large presence of private companies
  - products similar to market leaders
  - reduced risk in reverse engineering
  - main design decisions take place within the institution
- Examples
  - MySQL and PostgreSQL DBMS, GNOME from Ximian
- Corporation-led projects
Type 3 – Low-High
- Low reverse engineering, high distribution potential
- Stable kernel, innovative periphery
- Origin: academic environments
  - usually there is no commercial counterpart
  - share a relatively simple software kernel
- Examples
  - GRASS GIS software and the R suite of statistical tools
- Academic-led projects

Type 4 – Low-Low
- Low reverse engineering, low distribution potential
- Innovative kernel, small periphery
- Small teams under a public R&D contract
  - addressing specific requirements
  - aiming to demonstrate novel scientific work
- High mortality rate
  - most of them are restricted to the lifetime of a research grant
- Innovation-led products

[Figure: quadrant chart with axes "Potential for reverse engineering" and "Potential for distributed development" and quadrants High-Low, High-High, Low-Low and Low-High. Products plotted: MySQL, Linux, PostgreSQL, OpenOffice, Apache, Postgres, PHP, NCSA browser, TerraLib]

Sustainability of Open Source Projects
- The Low-Low case
- Research community
  - many scientific areas
  - not enough market incentives for commercial companies
  - usually not interested in a direct involvement in long-term open source projects
- Maintaining and supporting an open source software project requires considerable resources, beyond the reach of most university groups.
- Taking low-low projects to the marketplace
  - Migration to high-low or low-high situations

The reality of open source projects
- Linux model is not scalable
  - Other types of software are less modular
- Key components
  - Reverse engineering potential
  - Modularity
- Requirements for success
  - Long-term investment
  - Very qualified personnel
  - Accessible mostly to organizations, not to individuals

Real-life model of OS software
- Networks of committed organizations

Geoinformation Technologies: Promises and Challenges
- Earth observation and GIS technologies
  - Satellite images and digital maps
  - Great successes of advanced information technology
  - Transformed our understanding of geographical space
- Developing nations
  - Essential for public policies
  - Deforestation assessment, urban planning, agriculture, ...
Turning Observations into Knowledge
[Figure; visible label: Products]
Source: Ghassem Asrar (NASA)

Knowledge gap for spatial data
- Imbalance of public expenditure
  - Governments build data-gathering satellites…
  - …and they hope the market will do the rest
  - ENVISAT = US$ 1 billion
  - EOS (Terra/Aqua) = US$ 1 billion
  - Leading remote sensing software product = US$ 25 M (gross)
- The model does not add up!
  - There is not enough market to cover large R&D expenses
  - The result is the “knowledge gap”

Knowledge gap for spatial data
source: John McDonald (MDA)

Bridging the Knowledge gap
- “Deadlock” situation
  - Small size of commercial GIS and Remote Sensing market
  - Improvements on information extraction
    - Needed for the market to grow
    - Making use of the deluges of data
  - Not enough income for large R&D investment
- Government-funded software development
  - Strong integration with scientific community
- Open Source GIS projects
  - Provide innovative ways to use spatio-temporal data
  - Effective means of advancing environmental applications

Knowledge gap for spatial data: An Example
- Extracting information from remote sensing imagery
- Recipe analogy
  - Most applications use the “snapshot” paradigm
  - Take 1 image (“raw”)
  - “Cook” the image (correction + interpretation)
  - Add “salt” (i.e., ancillary data)
  - Serve while hot (on a “GIS plate”)
- But we have lots of images!
  - Immense data archives (terabytes of historical images)
  - How many image database mining applications do we have?
- Images shown: MSS – Landsat 2 – Manaus (1977); TM – Landsat 5 – Manaus (1987)

Spatial information technology
- Basis of the technology
  - Computer representation of spatio-temporal phenomena
  - Discrete objects (e.g., parcels)
  - Continuous fields (e.g., topography)
- Uses of GIS (geographical information systems)
  - Commercial applications
    - Location-based services
    - Business geographics
  - Public good applications
    - Urban cadastral systems
    - Environmental protection and prediction
    - Agriculture crop forecasting
    - Hydrological modeling

Why Spatial Information Engineering?
BIG GIS
- Geographic Information Systems are built for
  - Urban and Regional Planning
  - Public Utilities
  - Logistics Companies
- These GIS have as a single user a large organization, for which various types of information in adapted formats are produced.
- The systems are large and the sizable costs are paid by the organization.
- Cost-benefit evaluation is typically difficult.
source: Andrew Frank (TU-Wien)

Why Spatial Information Engineering?
- Service providers build GIS to provide information of value to many users. For example:
  - Real estate agent support systems to find a suitable new apartment or home
  - Systems for navigation assistance for drivers
  - Routing for service vehicles
  - Hotel locator
- The information is produced by the service provider and consumed by many individuals not related to the service provider.
source: Andrew Frank (TU-Wien)

Why Spatial Information Engineering?
- Technological change
- Current generation of GIS
  - Built on proprietary architectures
  - Interface + function + database = “monolithic” system
  - Geometric data structures = archived outside of the DBMS
- New generation of object-relational DBMS
  - All data will be handled by the DBMS
  - Standardized access methods (e.g. OpenGIS)
  - Users can develop customized applications

Evolution of Spatial Information Technology
- Global data centre
- Institutional spatial database
- Individual GIS

Different GIS Architectures
- Desktop GIS
  - Single-user
  - Emphasis on friendly interfaces and analysis functions
- Distributed GIS
  - Multiple users
  - Data sharing
  - Emphasis on access and concurrency control
- Web services
  - Using the Internet to disseminate data and services
  - Emphasis on usability and flexibility

Tools Challenge
- Why open-source GIS+IP tools?
  - “Learning by doing”
  - Unresolved need of spatial analysis
- Learning by doing
  - Provides understanding of core aspects of GIS
  - Capacity to dissect the “black-boxes”
  - Establishing a strategy for continuous innovation

Who Builds Open Source GIS?
- Survey of 70 GIS open source projects (freegis.org)
- Individual-size projects: the project team consists of 1-3 individuals
  - shapelib and Gstat
- Collaborative networks: the project core team consists of 15+ individuals, geographically distributed
  - GRASS and R
- Corporation-based: the project core team is part of an institution
  - PostgreSQL, TerraVision

Who Builds Open Source GIS?

                    Total      Post-mature  Standards-led  Innovation-led
Individual-based    37 (53%)   12           19             6
Networked team      4 (6%)     1            1              2
Corporation-based   29 (41%)   6            18             5
Total               70         19 (27%)     38 (54%)       13 (19%)

Maturity and Support of Open Source GIS
Scale: 1 to 5, where 1 is worst and 5 is best

                    Maturity   Support   Functionality
Individual-led      2.3        1.7       1.8
Networked team      3.7        3.7       3.7
Corporation-based   3.2        3.1       3.0

- A corporate environment is much better suited for long-term software development than an individual’s perspective
- “Critical mass” community-developed software
  - Best of all (but only 6% of all projects)

What’s the Current Status of Open Source GIS?
- High-Low products: standards-based
  - Spatial DBMS: MySQL, PostgreSQL
  - OpenGIS + Web: MapServer, deegree
- Low-High products: stable kernel, innovation at the periphery
  - GRASS and R
- What about GIScience challenges?
  - spatio-temporal data models, geographical ontologies, spatial statistics and spatial econometrics, dynamic modelling and cellular automata, environmental modelling, neural networks for spatial data

TerraLib: Open source GIS library
- Data management
  - All data (spatial + attributes) is in the database
- Functions
  - Spatial statistics, Image Processing, Map Algebra
- Innovation
  - Based on state-of-the-art techniques
  - Same timing as similar commercial products
- Web-based co-operative development
  - http://www.terralib.org

Operational Vision of TerraLib
[Figure: a Geographic Application on top of an API for Spatial Operations, TerraLib (spatial operations) and the DBMS (Access, Oracle Spatial, MySQL, PostgreSQL; spatial operations)]
- TerraLib: MapObjects + ArcSDE + cell spaces + spatio-temporal models

TerraLib applications
- Cadastral Mapping
  - Improving urban management of large Brazilian cities
- Public Health
  - Spatial statistical tools for epidemiology and health services
- Social Exclusion
  - Indicators of social exclusion in inner-city areas
- Land-use change modelling
  - Spatio-temporal models of deforestation in Amazonia
- Emergency action planning
  - Oil refineries and pipelines (Petrobras)

Spatial database components
- DBMS
  - Support for blobs (Access)
  - Support for spatial data types (ORACLE, PostgreSQL)
- Middleware
  - Function libraries
  - TerraLib, ArcSDE
- Interface
  - TerraView, SIGMUN, ArcGIS 8.0

Geoprocessing and Public Policy: Territorial Planning

TerraCrime

TerraCadastre for Santos
- 150,000 cadastral units on-line + 3 GB of aerial photos
- Palm-top

Examples of Web Products

Requirements for Spatio-Temporal Models for Dynamic Modelling
- Representation of Space
  - Spaces of places + spaces of networks (anisotropy)
  - Cells as autonomously evolving entities
- Dealing with Data
  - Storage and retrieval of large-scale datasets
  - Inclusion of data from external sources
- Extensibility of Models
  - Algorithms should be independent of data structures
  - Different models have different rules (CA, Markov chain, regression)
- Dealing with Modellers
  - Cognitively meaningful interfaces (language?, data-flow?)
  - Suitable visualization environments

TerraLib Structure
[Figure: layered architecture with Java Interface, COM Interface, OGIS Services and C++ Interface; Functions; a kernel with Visualization Controls, Spatio-Temporal Data Structures and File and DBMS Access; I/O Drivers; External Files and DBMS]

Software structure
- Kernel
  - Data Structures: Vector, Raster
  - Topology Ops
  - Data Containers
  - Generic DBMS API
  - Spatial Reference Systems
- DBMS Drivers
  - Oracle, Oracle Spatial, PostgreSQL, MySQL, ADO

Software Structure
- Visualization
  - View, Theme
- Algorithms
  - Simple Statistics, Spatial Autocorrelation, OLS Regression, Kernel Estimator, GWR, Regionalization, Variogram, Kriging
- Data Conversion
  - Vector: MapInfo, ArcView, SPRING
  - Raster: GeoTIFF, JPEG

TerraLib Data Model
[Figure: Database and Layer; Static Properties (Attributes; Geometries: Polygons, Lines, Cells, Network, Points); Dynamic Properties (Events, Dynamic Models); annotated with Concrete Classes and Abstract Concepts]

Requirements for a Good GIS Library
- Modularity
  - Divided into independent components: database, memory containers, algorithms
  - Changes in one component should not affect the others
- Extensibility
  - Library can be extended with no disruption of existing code
  - Algorithms do not know about the data model
  - Changes in the data model affect the database component only

Key Design Decisions
- Algorithms are independent of data structures and data containers
  - Same algorithm will work on a variety of spatial geometries
- Visualization control is separate from data retrieval
  - Same data can be shown in various ways
  - Visualization types should be separate from data types
- Data model has spatio-temporal data representations
  - Include provisions for events, dynamic models, changing values

Key concepts in TerraLib
- Spatio-Temporal Data Model
  - How data is organized on a spatio-temporal DBMS
  - Concrete choice (dictated by technological possibilities)
- Data containers
  - Contain organized “chunks” of data
  - Layer – set of spatial objects of the same type
  - STObjectSet – set of objects resulting from a query
- Visualization concepts
  - View – abstract description of a canvas (contains a set of Themes)
  - Theme – graphical presentation of a set of objects of the same type

Modularity in TerraLib
- Components of TerraLib are designed to be independent of each other
- “Glues” bind components together
  - Algorithms and data containers – iterators
  - Data containers and database API – query processor
  - Data containers and visualization applications – view, theme

Glues that bind TerraLib together
[Figure: components (Spatio-Temporal Queries, Visualization (theme), Algorithms, Containers, Database API, Database) and the glues between them (query parser, grouping, iterators, query processor)]

TerraLib and OpenGIS
- Vector data structures and topological operators
  - TL follows OpenGIS specifications
- Vector data storage
  - TL does not follow OpenGIS
  - TL has additional data structures (e.g., cell spaces)
  - TL keeps information on visualization and processing status
- Simple Feature Query Language (SF-SQL)
- Web Services (WMS, WCS, WFS)
  - TL will fully support OpenGIS specifications
- Data conversion (GML)
  - Fully implementable over TerraLib (included in roadmap)
  - TL will fully support GML-based data exchange
- Building OpenGIS services over TerraLib is straightforward!

OpenGIS Geometry Model
- Geometry
  - Point
  - Curve
    - LineString
      - Line
      - LinearRing
  - Surface
    - Polygon
  - GeometryCollection
    - MultiSurface
      - MultiPolygon
    - MultiCurve
      - MultiLineString
    - MultiPoint

TerraLib Geometry Model
- TeGeometry
  - TePoint
  - TeLine2D
  - TeLinearRing
  - TePolygon
  - TePolygonSet
  - TeLineSet
  - TePointSet

TerraLib: Geometry Model (correspondence with OpenGIS)
- TePoint: Point
- TeLine2D: LineString
- TePolygon: Polygon
- TePolygonSet: MultiPolygon
- ...
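The correspondence shown on the slides above can be made concrete with a small lookup table. The sketch below is illustrative only: it is plain Python rather than TerraLib's C++ API, the coordinate tuples and the `to_wkt` helper are hypothetical, and only the four pairs shown explicitly on the slides are included (the others are elided in the source). The output strings use the OpenGIS well-known text (WKT) notation.

```python
from typing import Sequence, Tuple

Coord = Tuple[float, float]

# TerraLib class name -> OpenGIS geometry name, as paired on the slides.
# Other pairs are elided in the source and deliberately left out here.
TERRALIB_TO_OPENGIS = {
    "TePoint": "Point",
    "TeLine2D": "LineString",
    "TePolygon": "Polygon",
    "TePolygonSet": "MultiPolygon",
}


def _coords(coords: Sequence[Coord]) -> str:
    """Format coordinates as 'x y, x y, ...'."""
    return ", ".join(f"{x:g} {y:g}" for x, y in coords)


def _rings(rings: Sequence[Sequence[Coord]]) -> str:
    """Format the rings of one polygon as '(ring), (ring), ...'."""
    return ", ".join(f"({_coords(ring)})" for ring in rings)


def to_wkt(te_type: str, data) -> str:
    """Hypothetical helper: render plain-tuple data as OpenGIS WKT."""
    tag = TERRALIB_TO_OPENGIS[te_type].upper()  # KeyError for unmapped types
    if te_type == "TePoint":
        body = _coords([data])
    elif te_type == "TeLine2D":
        body = _coords(data)
    elif te_type == "TePolygon":
        body = _rings(data)
    else:  # TePolygonSet: a sequence of polygons
        body = ", ".join(f"({_rings(poly)})" for poly in data)
    return f"{tag} ({body})"


square = [[(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
print(to_wkt("TePoint", (1, 2)))         # POINT (1 2)
print(to_wkt("TePolygonSet", [square]))  # MULTIPOLYGON (((0 0, 1 0, 1 1, 0 1, 0 0)))
```

A type that has no OpenGIS counterpart on the slides, such as a cell space, is intentionally absent from the table, so looking it up fails instead of guessing a mapping.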
TerraLib: Geometry Model
- TerraLib has additional geometries
  - TeCell and TeCellSet
  - TeArc and TeArcSet
  - TeNode and TeNodeSet
  - TeSample and TeSampleSet
  - TeContourLine and TeContourLineSet
  - TeText and TeTextSet

Storage of spatial data structures
- TerraLib is different from OpenGIS:
  - The geometry sets of TerraLib (TePointSet, TeLineSet, TePolygonSet) are split into as many rows as there are elements in each set.
  - Each row of a TerraLib spatial table holds one geometry (TePoint, TeLine2D, TePolygon).
  - The geometries of an object are joined through an “object_id” field that identifies all the rows of the geometry table that pertain to the same object.
- Rationale
  - Multi-temporal storage of different versions of the same object
  - Better performance

Topological Operations on Vector Geometries
- OpenGIS recommends the Egenhofer operators (9-intersection dimension-extended matrix)
- TerraLib complies with OpenGIS specifications
- Ex: TeOverlaps(X, Y), where $X^{\circ}$ is the interior and $X^{-}$ the exterior of X
  - General: $(\dim(X^{\circ}) = \dim(Y^{\circ}) = \dim(X^{\circ} \cap Y^{\circ})) \land (X \cap Y \neq X) \land (X \cap Y \neq Y)$
  - Area/Area: $(X^{\circ} \cap Y^{\circ} \neq \emptyset) \land (X^{\circ} \cap Y^{-} \neq \emptyset) \land (X^{-} \cap Y^{\circ} \neq \emptyset)$
  - Line/Line: $(\dim(X^{\circ} \cap Y^{\circ}) = 1) \land (X^{\circ} \cap Y^{-} \neq \emptyset) \land (X^{-} \cap Y^{\circ} \neq \emptyset)$

Metadata
- OpenGIS metadata
  - Restricted to geometrical data types
- TerraLib metadata
  - Tables with geometries
  - Many other tables for spatial data handling
  - Visualization information (themes/views)
  - Name of table and of column with a geometry type
  - Coordinate systems
  - Spatial, temporal and attribute restrictions
- Rationale
  - TerraLib data model is more than a storage data model
  - Built to support visualization and data analysis applications

Operations on Vector Data
- SQL functions for vector data:
  - Equals(g1 Geometry, g2 Geometry) : Integer
  - Disjoint(g1 Geometry, g2 Geometry) : Integer
  - Touches(g1 Geometry, g2 Geometry) : Integer
  - Within(g1 Geometry, g2 Geometry) : Integer
  - Overlaps(g1 Geometry, g2 Geometry) : Integer
  - Crosses(g1 Geometry, g2 Geometry) : Integer
  - Intersects(g1 Geometry, g2 Geometry) : Integer
  - Contains(g1 Geometry, g2 Geometry) : Integer
  - Relate(g1 Geometry, g2 Geometry, patternMatrix String) : Integer

Image Data Handling
- Satellite images
  - Growing importance in GIS
  - Large data volumes
- Images in TerraLib
  - Indexed by tiles
  - Multi-level structure for efficient visualization

SQL extensions
- Vector x Raster (statistic values):
  - Count(g1 Geometry, r1 Raster) : Integer
  - Minimum(g1 Geometry, r1 Raster) : Double
  - Maximum(g1 Geometry, r1 Raster) : Double
  - Average(g1 Geometry, r1 Raster) : Double
  - Variance(g1 Geometry, r1 Raster) : Double
  - StdDeviation(g1 Geometry, r1 Raster) : Double
  - Median(g1 Geometry, r1 Raster) : Double
  - Value(point Geometry, r1 Raster) : Double
  - Others: skewness, kurtosis, coefficient of variation, mode

SQL extensions - Raster
- Function
  - Histogram(r1 Raster) : Integer Array
- Spatial operators
  - WC2RC(wc PointGeometry, r1 Raster) : PointGeometry
  - RC2WC(rc PointGeometry, r1 Raster) : PointGeometry
  - Mask(r1 Raster, r2 Raster) : Raster
  - Mask(g1 Geometry, r1 Raster) : Raster
  - Reclassify(r1 Raster, rl Rules) : Raster
  - Slice(r1 Raster, rl Rules) : Raster
  - Weight(r1 Raster, rl Rules) : Raster
  - Calculate(r1 Raster, ..., rn Raster, mathexp String) : Raster

Spatio-Temporal Models in TerraLib
- Static data
- Events (Ex: crimes)
- Dynamic objects
  - Cell spaces
  - Moving objects
  - Ex: evolution of parcels in an urban cadastre

Events
- Near in space, near in time?
[Figure: events plotted on axes x, y and time]

Dynamical Spatial Model
[Figure: sequence $f(I(t)) \xrightarrow{F} f(I(t+1)) \xrightarrow{F} f(I(t+2)) \cdots f(I(t_n))$]
“A dynamical spatial model is a mathematical representation of a real-world process when a location changes in response to external forces” (Burrough)

[Figure: simulation outputs S2 and S3 compared with reality (Bauru in 1988)]

Cell Spaces

Algorithms in TerraLib
- Algorithms are the basic core of most successful GIS
- A large number of them do not depend on a particular implementation of a data structure, but only on a few fundamental semantic properties of the structure
- Such properties can be, for example, the ability to get from one element of the data structure to the next, and to compare two elements of the data structure
- Spatial analysis algorithms can be abstracted away from a particular data structure and described only in terms of their properties

Same Algorithm, Different Geometries

Generic GIS Programming
- How to decouple algorithms from data structures?
- Idea: iterators (“intelligent pointers”)
- Algorithms are not classes!
- “Decide which algorithms you want; parametrize them so they work for a variety of suitable types and data structures”
[Figure: Algorithms, Iterators, Geometries]

Generic GIS Programming: a motivating example
- Type: stack of element
- Functions
  - $\text{new}: \; \to \text{stack}$
  - $\text{push}: \text{element} \times \text{stack} \to \text{stack}$
  - $\text{empty}: \text{stack} \to \{\text{true}, \text{false}\}$
  - $\text{pop}: \text{stack} \to \text{stack}$
  - $\text{top}: \text{stack} \to \text{element}$
- This builds stacks of anything!

Generic GIS Programming
- Idea
  - Find fundamental laws that drive software components
  - Design interoperable modules based on these laws
- How can we apply generic programming to GIS?
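As one possible illustration of the ideas above, the sketch below shows a "stack of anything" and a single algorithm that works on different geometry containers because it relies only on iteration. It is written in Python for brevity and is not TerraLib code: `Stack`, `PointSet`, `CellSpace` and `bounding_box` are hypothetical names introduced for this example, and Python's iterator protocol stands in for the iterators mentioned on the slides.

```python
from typing import Generic, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
Coord = Tuple[float, float]


class Stack(Generic[T]):
    """Stack of any element type: new, push, empty, pop, top.

    Unlike the functional signatures on the slide, push and pop
    modify the stack in place instead of returning a new stack.
    """

    def __init__(self) -> None:  # plays the role of "new"
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def empty(self) -> bool:
        return not self._items

    def pop(self) -> None:
        self._items.pop()

    def top(self) -> T:
        return self._items[-1]


class PointSet:
    """Hypothetical container: an explicit list of points."""

    def __init__(self, points: Iterable[Coord]) -> None:
        self._points = list(points)

    def __iter__(self) -> Iterator[Coord]:
        return iter(self._points)


class CellSpace:
    """Hypothetical container: a regular grid, iterated by cell centre."""

    def __init__(self, x0: float, y0: float, size: float,
                 ncols: int, nrows: int) -> None:
        self.x0, self.y0, self.size = x0, y0, size
        self.ncols, self.nrows = ncols, nrows

    def __iter__(self) -> Iterator[Coord]:
        for row in range(self.nrows):
            for col in range(self.ncols):
                yield (self.x0 + (col + 0.5) * self.size,
                       self.y0 + (row + 0.5) * self.size)


def bounding_box(geometry: Iterable[Coord]) -> Tuple[float, float, float, float]:
    """Return (xmin, ymin, xmax, ymax); needs only iteration over a non-empty input."""
    xs, ys = zip(*geometry)
    return min(xs), min(ys), max(xs), max(ys)


print(bounding_box(PointSet([(0, 0), (2, 3), (1, -1)])))
print(bounding_box(CellSpace(0.0, 0.0, 1.0, 2, 2)))
```

The algorithm is a free function rather than a method of either container, which mirrors the statement that algorithms are not classes: adding a new container only requires that it can be iterated.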
- Find regularities in spatial data handling
- Generalize these regularities into abstract types
- Process of formalization

TerraLib Community in Brazil
- Brazilian Army

RoadMap for TerraLib
- Kernel
  - Improve spatio-temporal model to include objects with changing boundaries
  - Improve performance by enhanced indexing
  - Improve the Java-TerraLib and COM-TerraLib connections
- Algorithms
  - Build a language for dynamic models using cell spaces
  - New algorithms for spatial data analysis
  - Programming environment for external users
- Documentation, documentation, documentation!
- Aim
  - To put TerraLib in the “Low-High” quadrant
  - Support innovation while allowing distributed development

What does it take to do it?
- SPRING and TerraLib project
- Development and Application Team
  - Software: 40 senior programmers (10 with PhD)
  - Applications: 30 PhDs in Earth Sciences plus students
- Building a resource base
  - Major emphasis on “learning-by-doing”
  - Graduate Programs in Computer Science and Remote Sensing
  - SPRING and TerraLib: 20 PhD theses and 35 MSc dissertations
- Institutional effort
  - Requires long-term planning and vision

The Road Ahead: Can Technology Help?
Advances in remote sensing are giving computer networks the eyes and ears they need to observe their physical surroundings. Sensors detect physical changes in pressure, temperature, light, sound, or chemical concentrations and then send a signal to a computer that does something in response. Scientists expect that billions of these devices will someday form rich sensory networks linked to digital backbones that put the environment itself online. (Rand Corporation, “The Future of Remote Sensing”)

The Road Ahead: Smart Sensors
- SMART DUST
  - Autonomous sensing and communication in a cubic millimeter
Sources: Silvio Meira and Univ Berkeley, SmartDust project

The Carbon Sink of Amazonian Forest and Climate
- Sink strength: 1 to 7 $\mathrm{t\,C\,ha^{-1}\,yr^{-1}}$
[Figure: carbon cycle diagram with flux values]
Preliminary synthesis of the carbon cycle for Amazonian forests. Units: $\mathrm{t\,C\,ha^{-1}\,yr^{-1}}$.
GPP = gross primary productivity; Ra = autotrophic respiration; Rh = heterotrophic respiration; VOC = volatile organic carbon compounds.
Source: Carlos Nobre, Alterra, INPA, IH, Edinburgh Univ., Washington Univ.

Source: LUCC

Limits for Models
[Figure: fields of science plotted by uncertainty on basic equations against complexity of the phenomenon: Social and Economic Systems, Quantum Gravity, Particle Physics, Living Systems, Global Change, Chemical Reactions, Applied Sciences, Solar System Dynamics, Meteorology]
source: John Barrow

Conclusions
- Open Source software model
  - The Linux example is not applicable to all situations
  - Moving from the individual level to the organization level
- Spatial information technology
  - Large R&D is needed to bridge the “knowledge gap”
  - Open source GIS software has a large role
- Open source projects in developing nations
  - Combination of institutional vision, qualified personnel and strong links to user community
  - Need to be government-funded to be viable