Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Brigham Young University BYU ScholarsArchive International Congress on Environmental Modelling and Software 6th International Congress on Environmental Modelling and Software - Leipzig, Germany - July 2012 Jul 1st, 12:00 AM Knowledge Discovery for Biodiversity: from Data Mining to Sign Management Noel Conruyt David Grosser Régine Vignes Lebbe Follow this and additional works at: http://scholarsarchive.byu.edu/iemssconference Conruyt, Noel; Grosser, David; and Lebbe, Régine Vignes, "Knowledge Discovery for Biodiversity: from Data Mining to Sign Management" (2012). International Congress on Environmental Modelling and Software. 297. http://scholarsarchive.byu.edu/iemssconference/2012/Stream-B/297 This Event is brought to you for free and open access by the Civil and Environmental Engineering at BYU ScholarsArchive. It has been accepted for inclusion in International Congress on Environmental Modelling and Software by an authorized administrator of BYU ScholarsArchive. For more information, please contact [email protected]. International Environmental Modelling and Software Society (iEMSs) 2012 International Congress on Environmental Modelling and Software Managing Resources of a Limited Planet, Sixth Biennial Meeting, Leipzig, Germany R. Seppelt, A.A. Voinov, S. Lange, D. Bankamp (Eds.) http://www.iemss.org/society/index.php/iemss-2012-proceedings Knowledge Discovery for Biodiversity: from Data Mining to Sign Management 1 1 2 Noel Conruyt , David Grosser , Régine Vignes Lebbe LIM - IREMIA – EA 25-25 – Laboratory of Mathematics and Computer Science University of Reunion Island, 2 rue Joseph Wetzell, 97490, Sainte-Clotilde, France 2 LIS, UMR 7207-CR2P, Université Pierre et Marie Curie, Paris Universitas, France [email protected] 1 Abstract: Knowledge discovery from data in environmental sciences is becoming more and more important nowadays because of the deluge of information found in databases of digital ecosystems, coming altogether from institutions and amateurs. For example in biodiversity science, all these data need to be validated by specialists with the help of Intelligent Environmental Decision Support Systems (IEDSSs), then enhanced and certified into qualitative knowledge before reaching their audience. Data mining through classification or clustering is the dedicated inductive process of grouping descriptions based on similarity measures, then building classes and naming them. Later, the formed concepts can be reused for identification purpose with new observations. The problem is that when using such knowledge-based systems, we tend to forget the fundamental role of subjects (endusers) in the definition, observation and description of objects. In order to get good identification results, a consensus must be found between these experts and amateurs for interpreting correctly the observed objects. Thus, a new method of Knowledge discovery is necessary by switching from Data mining to Sign management. The method focuses on the process of building knowledge by sharing signs and significations (Semiotic Web), more than on knowledge transmission with intelligent object representations (Semantic Web). Sign management is the shift of paradigm for Biodiversity Informatics that we have investigated in such domains as enhancing natural heritage with ICT. In this paper, we will present Sign management and illustrate this concept with two knowledge bases built in La Reunion Island for corals’ classification with IKBS (Iterative Knowledge Base System) and plants identification with Xper2 software platform. Keywords: Data mining; Sign management; Knowledge discovery; Environmental sciences; Biodiversity informatics. 1 KNOWLEDGE DISCOVERY IN ENVIRONMENTAL SCIENCES th At the end of the 20 century, a lot of researches have been made in computer science for enhancing the use of computers for data analysis (Breiman et al. 1984) (Diday 1994), machine learning (Mitchell 1997) and knowledge management (Skyrme 1998). Across a wide variety of fields such as science, business, industry, data have been collected and accumulated at a dramatic pace, generating a new st challenge for the 21 century, called the Big Data problem (Nature 2008) (Cukier 2010). In the domain of environmental sciences, the management of natural resources such as soil, water, air, biodiversity, alimentation is facing the same deluge of information in databases (Fukuda 2009). To tackle this problem, Fayyad et al. (1996) explained that there is an urgent need for a new generation of computational theories and tools to assist humans in extracting useful information or knowledge from the rapidly growing volumes of Noel Conruyt et al. / Knowledge Discovery for Biodiversity: from Data Mining to Sign Management digital data. These topics were the subject of the emerging field of Knowledge Discovery in Databases (KDD). KDD is the “non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data”. st But in the digital era of the 21 century, with these torrents of new data on the Web, discovering knowledge in environmental sciences is adding a more complex challenge (Poch et al. 2004). Complexity comes from the fact that environmental 1 data are natural (i.e. not invented by humans). They are polymorph (variable), incomplete (missing), noisy (difficult to observe), and therefore source of errors (Gibert et al. 2008). This is particularly true for data management in the new field of Biodiversity Informatics (Sarkar 2009). It leads to different interpretations of data by 2 people, called information . These pieces of information can be managed to 3 discover knowledge but information issued from human sensors (sensitive organs such as the eye) are more subjective than data issued from machine sensors (probes). Thus, Knowledge Discovery in Biodiversity Informatics (KDBI) is not only a problem of extracting knowledge from a flow of data with rational mining techniques and algorithms (Gibert 2010). It is also a problem of human and machine interaction with qualitative contents. Quality is a matter of good interpretation, visualisation and signification of the triptych (data, information and knowledge) from a subject viewpoint. Indeed, the challenge of KDBI is to make sense of the deluge of contents and forms of delivery altogether in space (geo location of data), in time (cycling or iterative process of generating knowledge) and at a certain level of observation (multi-layered study’s scale). In fact, KDBI is embedded in the Life Science problem of understanding emerging theories (Umwelt, Activity and Semiotics) and tools that are the foundations of the concept of Sign management (Uexküll 1926) (Engeström 1987) (Peirce 1965). For taking the right decision (i.e. make a diagnosis or identify a problem) and act properly, the application of computer-assisted tools to extract knowledge from experts in knowledge bases (by representation and classification of objects) is not enough; Ontology building and case-based reasoning with the description logics formats (OWL, RDFs, SPARQL) of the Semantic Web brings syntactical advantages, but this is not sufficient for its utility and usability. We do need to know the purpose of knowledge discovery and formulate the right question before acquiring data. The alternative that we suggest for specimens’ identification is then to apply a more natural or human-centred approach to knowledge discovery by applying descriptive logics of the Semiotic Web, based on the aim to tackle real usage. Our idea is to manage not only objective knowledge with formal representations for ontologies (written notations). We must also share subjective know-how or interpretations of knowledge with living significations (multimedia annotations) that can be shown by individual subjects. As suggested by Frey (2008), let us reinvent a more living knowledge discovery on the Web for st biodiversity management in the 21 century! 2 BIODIVERSITY INFORMATICS Biodiversity Informatics is a young and rapidly growing field that brings information science and technologies to bear on the data and information generated by the study of organisms, their genes, and their interactions. In doing so, it is creating unprecedented global access to information on biological species and their role in nature (e-Biosphere 2009). The term "Biodiversity Informatics" is generally used in the broad sense of applying to computerized handling of any biodiversity information and must not be confused with the term “Bioinformatics” that 1 A Datum is a raw element of Information that is attached to a thing or entity (Object) and qualifies it with a Value measured by an instrument. 2 Information is the interpretation of a Datum measured on an Object by a Subject. It is the shaping of the datum in a message that circulates between Subjects by a channel. The interpreter makes a more or less explicit characterization of the Datum with an Attribute (character or variable) to qualify the Object. 3 Knowledge is the result of negotiating pieces of information from different Subjects by analyzing and synthesizing them (through comparison, discrimination and generalization). Noel Conruyt et al. / Knowledge Discovery for Biodiversity: from Data Mining to Sign Management computerizes data in the specialized area of molecular biology. At University of Reunion Island, our application domain in environmental sciences has been to manage Biodiversity with Information and Communication Technologies (ICT) at the level of Systematics. It is the scientific discipline that deals with listing, describing, naming, classifying and identifying living beings (Matile et al. 1987) (Winston 1999). In this domain, specific IEDSSs are needed to help taxonomists describe data for classification and identification purposes (Conruyt and Grosser 2007) (Ung et al. 2010). This phenetic domain of knowledge discovery, also called alpha e-taxonomy or cyber-taxonomy (Mayo et al. 2008) is concerned with ontogenetic descriptions of specimens and taxa and is not to confuse with phylogeny that deals with species history. Indeed, being able to describe, classify and identify a specimen from morphological characters is a first step for monitoring biodiversity, because it gives access to information relative to its species name (Biology, Geography, Ecology, Taxonomy, Bibliography, Photography, etc.). These tasks can be assisted in Biodiversity Informatics with a databases management system for storing unstructured information, and a knowledge bases management system for acquiring codified data and knowledge (Conruyt et al. 2008). These modules are part of a Biodiversity Information System that we want to open to 4 5 other platforms such as GBIF and ViBRANT e-infrastructures by using Web Services (Conruyt et al. 2010). 3 METHOD OF KNOWLEDGE DISCOVERY In Grosser et al. (2010), we present an inductive approach for identifying specimen descriptions with domain knowledge. An expert is assisted with computer science decision support tools for helping him to define a descriptive model (ontology), describe related cases, classify these descriptions and let other biologists identify specimens. For these complex tasks, classical discrimination methods developed in the frame of data analysis or machine learning, such as classification (Breiman 1984) or decision trees (Quinlan 1993) or others methods developed in the data mining field such as association rules mining (Piatetsky-Shapiro 1991), Multifactor dimensionality reduction (Zhu and Davidson 2007) or in computer vision (Mansur et al. 2006) are not sufficient. They do not cope with relations between attributes, or with missing data, and they are not also very tolerant to errors in descriptions. The considered problem that we are faced with is to determine the class of a structured description that is partially answered and eventually contains errors, from a referenced case base, this last one being a priori classified by qualified experts in k-classes. The proposed discrimination method proceeds by inference of successive neighbouring. It is inductive, interactive, iterative and semi-directed. It combines inductive techniques of discriminatory variables and neighbours search, with the help of a similarity measure that takes into account the structure (dependencies of variables) and the content (missing and unknown values) of descriptions. It is based on the description space (inferior half-lattice) to guarantee the coherency of descriptions and the computation of generalized descriptions. Finally, the aim of our symbolic and numeric biological objects analysis is to facilitate description and classification (for specialists) and identification (for practitioners) of complex biological specimens on the Internet. 3.1 From data mining for classification and identification of taxa … Statistical work in taxonomy has already been done for these processing tasks since forty years with computer science tools for data analysis of phenetic attributevalue pairs (Sneath and Sokal 1973), i.e. flat attributes. All these numerical approaches led to knowledge discovery from data mining in large databases (Pankhurst 1991), (Fayyad et al. 1996). On the knowledge acquisition process, artificial intelligence techniques (Frames) and object oriented languages (Smalltalk, Java) brought also symbolic representational capacities for modelling more complex specimen descriptions. At the same time, live memory capacities of 4 5 [http://www.gbif.org, visited on 05/10/2012] [http://www.vbrant.eu, visited on 05/10/2012] Noel Conruyt et al. / Knowledge Discovery for Biodiversity: from Data Mining to Sign Management computers has grown fast to store more and more complex object descriptions. For biological objects, Lebbe (1995) mentioned that there was a real need for implementing representation models of the class/instance type, as well as following knowledge acquisition methodologies, in order to process, refine and communicate such evolving knowledge. As a result, we use Object-Attribute-Value representations that give nowadays experts and knowledge engineers the possibility to model morphological descriptions by structuring them in knowledge bases rather than in databases. Knowledge bases are applications of Knowledge Based Management Systems (KBMS) funded on a more qualitative reasoning than databases: they focus on the quality of descriptions with a lot of dependant objects, attributes and values rather than on the quantity of records (for statistical significance) with independent variables. The difference comes also from the fact that this type of knowledge bases enhances the individual level of description (collection specimens) represented by individual cases rather than the conceptual level of domain definition (taxa) represented in a record of a database or a line of a data table. The whole qualitative work that we have done in this context of specimen description is to capture this morphological and individual complexity in an Iterative Knowledge Base System (Conruyt and Grosser 2007). In fact, acquiring structured specimen descriptions is particularly important in zoology because it gives sense to the representation of entities in description trees with body subparts, viewpoints, specializations of objects and values (figure 1). Figure 1. A part of description tree of a specimen of coral (Stylophora subseriata) This anthropomorphic and individual descriptive approach is less used in botanical modelling of specimens because plants are mostly considered as species entities representing populations with independent morphological characters (Operational Taxonomic Units) (Dallwitz et al. 1995). This approach at the species level is 2 covered by another type of KBMS called Xper , which is a management system for storage, editing, analysis and on-line distribution of descriptive data (Ung et al. 2 2010). A description made with Xper is represented by a set of character-states describing one species (figure 2), whereas a description made with IKBS is represented by a set of Object-Attribute-Value describing one specimen (figure 1). These two knowledge acquisition processes are complementary: they have been 6 7 used on both corals and plants classification and identification tasks at University 6 [http://coraux.univ-reunion.fr/, visited on 05/10/2012] Noel Conruyt et al. / Knowledge Discovery for Biodiversity: from Data Mining to Sign Management 2 of Reunion Island. Xper knowledge bases are built in intension (conceptual description), starting from species description found in monographs and applying a top-down approach for identification. IKBS knowledge bases are made in extension (class description), for enhancing the collections of specimens found in museums and applying a bottom-up approach for classification, then identification. Figure 2. A description list of a species of plant (Trochetia granulata) The first method is interesting when experts want to transmit an encyclopaedic knowledge rather quickly and when taxa are well known and stable. Some difficulties reveals once the domain knowledge evolves with new knowledge (a new specimen description, a new character), and that consequently the application needs to maintain its consistency (Pullan et al. 2000). The second method is more time consuming for the expert who has to consider the intra-specific variation of species and decide how many specimen descriptions he will store in the knowledge base. 3.2 … to sign management for definition and description of specimens For our coral species identification process, specimen entities to describe are complex: they are altogether individuals, objects and populations. Colonies that are found in museum collections are composed of corallites (objects) that form the habitat of polyps (individuals). The colony (population of polyps) presents polymorphic characters with conjunction of values (simultaneous presence of states or values). It expresses intra-specific variability that depends on the biological and geographical sampling context of colonies (reef zone, lightning, profoundness, waves, etc.). Le Renard and Conruyt (1994) have specifically designed descriptive logics for making descriptive models of these biological entities. These descriptive logics of the Semiotic Web must not be confused with the description logics of the Semantic Web. Description logics (OWL, RDF) are the propositions of computer scientists that conceive syntactical formal representations for expressing ontologies and descriptions on the business object side of the machine’s interaction. Conversely, descriptive logics are propositions of domain experts for guiding the building of ontologies that are meaningful and understandable by the community. These other significant logics are user-centred; they constitute the usage object side of the interaction with humans. First, it makes sense that a description tree is better suited for visualization with dependent 7 [http://mahots.univ-reunion.fr/, visited on 05/10/2012] Noel Conruyt et al. / Knowledge Discovery for Biodiversity: from Data Mining to Sign Management objects and attributes (the absence of components causes the absence of subcomponents and the irrelevance of its attributes, see figure 1). Second, the knowledge acquisition of these descriptions is made at a multi-level scale with macroscopic and microscopic observations; these viewpoints are sources of error (noisy characters, what to observe, influence of light under the binocular and the microscope) and can lead to a lot of unknown and missing values. So, it is important to establish visually the spatial level of observation by introducing these viewpoints in the descriptive model (ecology, geography, morphology, molecular, etc.). The structuration of the domain could also propose the temporal level of observation of a specimen by introducing life cycles as another dimension of the specimen description (i.e. larval, juvenile, adult). Third, modelling such ontologies closes the world of possible descriptions. The expert defines what is observable in the domain at the root of the description tree. All observed descriptions are then comparable by using the same frame of reference. He can decide to update his descriptive model by changing objects, attributes and values and then iterate on the description process. Moreover, different persons can give their own description of the same specimen in the case base. This was verified as a source of robustness for identification purpose to avoid misinterpretations of end-users (Conruyt and Grosser 2007). The design of a descriptive model with the help of descriptive logics and the possibility to introduce inter-observer variation is what makes sense of creating useful and usable knowledge bases, then used because the service espouses the needs of end-users for teaching and learning observational sciences. This communication process is called signification or construction of the Sign. It is at the root of a new paradigm for utility, usability and effective use of e-services in context called Sign management. A Sign (figure 3) is the interpretation of an Object communicated by a Subject at a given time and place, which takes into account its content (Data, fact, event), its form (Information), and its meaning (Knowledge). Figure 3. What is a Sign? Sign management involves end-users with researchers and entrepreneurs for making them participate to the design of the product/service that they want. The problem that we have to face when making knowledge bases is that their usefulness depends on the correct interpretation of questions that are proposed by the system to obtain a good result of identification. Hence, in order to get a right identification, it is necessary to acquire qualitative descriptions. But these descriptions rely themselves on the observation guide that is proposed by the descriptive model. Moreover, the definition of this ontology is dependent upon easy visualization of descriptive logics. At last, the objects that are part of the descriptive model must be explained in a thesaurus for them to be correctly interpreted by targeted end-users. Behind each Object, there is a Subject that models this Object and gives it an interpretation. In life sciences, these objects can Noel Conruyt et al. / Knowledge Discovery for Biodiversity: from Data Mining to Sign Management be shown to other interpreters and this communication between Subjects is compulsory for sharing interpretations, and not only transmitting knowledge. The challenge of Sign management for KDES is to involve all types of end-users in the co-design of Sign bases for them to be really used (e-service). It is why we emphasize the instantiation of a Living Lab in Teaching and Learning at University of Reunion Island (Conruyt 2010) for sharing interpretations of objects and specimens on the table rather than concepts and taxa in the head of subjects. VALIDATION Our knowledge management methodology has been applied in the domains of Mascarene Islands corals classification and plants identification for several years 8 now. For our experts on corals of the Mascarene archipelago, the iterative process of knowledge acquisition is the most important task for building a robust knowledge base. IKBS brings them a reflexive tool for evaluating their observed descriptions in regards to the evolution of their projections in observable descriptive models. For families of coral knowledge bases, there are more than 30 iterations with IKBS before the expert finds a robust ontology that well explains his interpretations of observations. In Conruyt and Grosser (1999), we analysed the results of the identification process with end-users. We have found that better identification results are produced when we introduce other non-expert descriptions of the same specimens in the initial case base. Consequently, the expert, aware of the importance of transmitting his knowledge to other biologists, postulates more precise and relevant characters that may be easier to observe and/or offer less ambiguous values (easier to interpret) in his descriptive model. For example, he will refine on the basis of mutually exclusive values, monosemic attributes, frames of reference, warning signals, and enhanced illustrations on noisy questions. Misinterpretations between end-users are also at the root of identification problems 9 for plant identification. As we can watch it in the film on the BACOMAR project on an endemic family of the Mascarene Islands (Dombeyoideae), a lot of illustrations are needed to explain the terminology and characters to observe when using Xper2. This is the reason why it is mandatory to pass from knowledge transmission with data mining to know-how sharing with sign management. A collaborative workplace for teaching and learning will be the physical and virtual space for exchanging interpretations of objects between the different actors that want to manage biodiversity. CONCLUSION Since several years, we proposed a knowledge management method based on top-down transmission of experts’ knowledge, i.e. acquisition of a descriptive model and structured cases and then processing of these specimens’ descriptions with decision trees and case-based reasoning. We designed tools to build knowledge bases. But the fact is that Knowledge is transmitted with text, not shared with multimedia, and there is a gap between interpretations of specialists and end-users that prevents these lasts from getting the right identification. Today, we prefer to deliver a sign management method for teaching and learning how to identify these specimens on a Co-Design or Creativity Platform. This bottom-up approach is more pragmatic and user-centred than the previous one because it implicates end-users at will and is open to questions and answers. The role of experts is to show amateurs how to observe, interpret and describe specimens in situ and on the table. The responsibility of biomaticians (biologists and computer scientists) is to store and share experts’ interpretations of their observation, i.e. know-how rather than knowledge in sign bases with multimedia annotations for helping experts to define terms, model their domain, and allow end-users to describe and identify correctly the objects. 8 9 [http://coraux.univ-reunion.fr/spip.php?article101&lang=en, visited on 05/10/2012] [ttp://www.canal-u.tv/video/universite_de_la_reunion_sun/bacomar_mahots.7598] Noel Conruyt et al. / Knowledge Discovery for Biodiversity: from Data Mining to Sign Management ACKNOWLEDGMENTS We are grateful to the PO-FEDER 2-06 ICT measure for supporting our NEXTIC and BACOMAR projects: [http://www.reunioneurope.org/UE_PO-FEDER.asp] and to FP7 VIBRANT project [www.vbrant.eu] for giving us access to the International Biodiversity Informatics community for enhancing new ideas and concepts. REFERENCES Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J., Classification and Regression Trees. Belmont, Calif.: Wadsworth, 1984. Cukier, K., Data, data everywhere, special report, The economist, 2010. Conruyt, N., Grosser, D., Managing complex in knowledge environmental sciences, Proceedings of the International Conference on Case-Based Reasoning (ICCBR'99), 401–414, Munich, July 27-30, 1999. Conruyt, N., Grosser, D., Knowledge management in environmental sciences with IKBS: application to Systematics of Corals of the Mascarene Archipelago, Selected Contributions in Data Analysis and Classification, Series: Studies in Classification, Data Analysis, and Knowledge Organization, Brito, P., Bertrand, P., Cucumel, G., De Carvalho, F. (Eds.), 333–344, Springer, 2007. Conruyt, N., Sébastien, D., Payet, D., Geynet, Y., Caron, D., and Grosser D., Managing Insular Tropical Environment through Data and Knowledge Bases by using Web Services: a case study on Corals and Herbarium of La Réunion Island, 4th International Congress on Environmental Modelling and Software, iEMSs’2008, M. Sànchez-Marrè, J. Béjar, J. Comas, A. Rizzoli and G. Guariso (Eds.), volume 1, 264–271, Barcelona, 2008. Conruyt, N., Sebastien, D., Vignes Lebbe, R., Cosadia, S., Touraivane, Moving from Biodiversity Information Systems to Biodiversity Information Services, L. Maurer, K. Tochtermann (Eds.): Information and Communication Technologies for Biodiversity Conservation and Agriculture, Shaker Verlag, Aachen, 2010. Conruyt, N., The Reunion Living Lab model: towards Sign Management on a Creativity Platform, 1st European Living Labs Summer School, Collaborative Innovation through Living Labs, August 25-27, Paris, France, 2010. Dallwitz, M., Pain, T., Zurcher, E., User’s guide to the DELTA system, a general system for coding taxonomic descriptions, CSIRO Div. Entomol., 1995. Diday, E., New approaches in classification and data analysis, Studies in classification, data analysis, and knowledge organization, NATO Asi Series. Series F, Computer and Systems Sciences, Springer-Verlag, 1994. e-Biosphere, book of abstracts, 2009. [http://www.e-biosphere09.org/] Engeström, Y., Learning by expanding: an activity-theoretical approach to developmental research, Orienta-Konsultit Oy, Helsinki, 1987. Fayyad, U. M., Piatetsky-Shapiro, G., and Smyth, P., From Data Mining to Knowledge Discovery: An Overview. In Advances in Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1–30. Menlo Park, Calif.: AAAI Press, 1996. Frey, J.G., The Semiotic Web: Don’t forget the user amongst all the semantics, Proceedings of the First International Workshop on Understanding Web Evolution, WebEvolve 2008, 27–34, Beijing, China, 2008. Fukuda, K., Computer-enhanced knowledge discovery in environmental science, Phd of the University of Canterbury, 2009. Gibert, K., Spate, J., Sànchez-Marré, M., Athanasiadis, I., and Comas, J., Data Mining for Environmental Systems. In Environmental Modelling, Software and Decision Support: State of the art and new perspective, Jakeman, A., Voinov, A., Rizzoli, A. E., and Chen, S., (Eds.), 205–228, Elsevier, 2008. Gibert, K., Sànchez-Marré, M., and Codina, V., Choosing the Right Data Mining Technique: Classification of Methods and Intelligent Recommendation, 5th International Congress on Environmental Modelling and Software, iEMSs’2010, David A. Swayne, Wanhong Yang, A. A. Voinov, A. Rizzoli, T. Filatova (Eds.), Ottawa, Canada, 2010. [http://www.iemss.org/iemss2010/proceedings.html] Grosser, D., Conruyt, N., and Ralambondrainy, H., Identification with iterative nearest neighbors using domain knowledge, Proceedings of the International Noel Conruyt et al. / Knowledge Discovery for Biodiversity: from Data Mining to Sign Management Congress on Tools for Identifying Biodiversity, Progress and problems, BioIdentify, P. Luigi and R. Vignes Lebbe (eds.), Paris, September 20-22, 2010. Lebbe J., Systématique et informatique. Systématique et biodiversité, Bourgoin T. (Ed), Biosystema, 13:71–79, Paris, 1995. Le Renard, J., and Conruyt, N., On the representation of observational data used for classification and identification of natural objects, Studies in Classification, Data Analysis and Knowledge Organization, New approaches in Classification and Data Analysis, Diday, E., Lechevallier, Y., Shader, M., Bertrand, P., Burtschy, B. (Eds.), Springer Verlag, 308–315, Paris, 1994. Mansur, A., Hossain, M. A., and Kuno, Y., Integration of Multiple Methods for Class and Specific Object Recognition, chap 4, 841–849, Springer LNCS, Berlin, 2006. Matile, L., Tassy, P., Goujet, D., Introduction à la systématique zoologique, In Biosystema, vol. 1, Société Française de Systématique (Eds.), 1987. Mayo, S. J., Allkin, R., Baker, W., Blagoderov, V., Brake, I., Clark, B., Govaerts, R., Godfray, C., Haigh, A., Hand, R., Harman, K., Jackson, M., Kilian, N., Kirkup, D. W., Kitching, I., Knapp, S., Lewis, G. P., Malcolm, P., von Raab-Straube, E., Roberts, D. M., Scoble, M., Simpson, D. A., Smith, C., Smith, V., Villalba, S., Walley, L. and Wilkin, P., Alpha e-taxonomy: responses from the systematics community to the biodiversity crisis, Kew Bulletin, 63:1–16, 2008. Mitchell, T., Machine Learning, McGraw Hill, 1997. Nature 455,1, 2008. [http://www.nature.com/news/specials/bigdata/index.html] Pankhurst, R.J., Practical taxonomic computing. Cambridge University Press, Cambridge, 1–202, 1991. Peirce, C.S., Elements of Logic, In Collected Papers of C. S. Peirce (1839 - 1914), C. H. Hartshone & P. Weiss, Eds. The Belknap Press, Harvard Univ. Press, Cambridge, MA, 1965. Piatetsky-Shapiro, G., Discovery, analysis, and presentation of strong rules, Knowledge Discovery in Databases, AAAI/MIT Press, Cambridge, MA, 1991. Poch, M., Comas, J., Rodríguez-Roda, I., Sànchez-Marrè, M., and Cortés, U., Designing and Building real Environmental Decision Support Systems, Environmental Modelling & Software, 19(9): 857–873, 2004. Pullan, M.R., Watson, M.F., Kennedy, J.B., Raguenaud, C., and Hyam, R., The Prometheus Taxonomic Model: a practical approach to representing multiple classifications, Taxon 49: 55–75, 2000. Quinlan, J. R., C4.5: Programs for Machine Learning. Morgan Kaufmann Series in Machine Learning, 1993. Sarkar, I.N., Biodiversity Informatics: the emergence of a field, BMC Bioinformatics, 10(14), 2009. Skyrme, D.J., Measuring The Value Of Knowledge, Business Intelligence, 1998. Sneath, E., and Sokal, E., Numerical taxonomy, W.H. Freeman, San Francisco, 1973. Uexküll, J. von, Theoretical Biology. (Transl. by D. L. MacKinnon. International Library of Psychology, Philosophy and Scientific Method.) London: Kegan Paul, Trench, Trubner & Co. xvi+362, 1926. Ung, V., Dubus, G., Zaragueta-Bagils R., and Vignes-Lebbe, R., Xper2: introducing e-taxonomy, Bioinformatics, 26 (5): 703-704, 2010. [http://lis-upmc.snv.jussieu.fr/lis/?q=en/resources/software/xper2]. Winston, J. E., Describing Species: Practical Taxonomic Procedure for Biologists. New York: Columbia University Press, 1999. Zhu, X. and Davidson, I., Knowledge Discovery and Data Mining: Challenges and Realities with Real World Data. Idea Group Inc, 2007.