Language Technology in Industrial Applications
Or: How we have commercialized linguistic technologies

Jon Atle Gulla
Norwegian University of Science and Technology, Trondheim, Norway
Email: [email protected]
Language Technology and Innovation
ICEIS 2008

Outline
1. Linguistics in search
2. Semantics for interoperability
3. Ontologies in process mining
4. Linguistics in news reporting

Who am I?
Professor, Information Systems group, IDI/NTNU
Education:
- Siv.ing./dr.ing. (information systems, NTH)
- Cand.philol. (linguistics, AVH)
- MSc (management, London Business School)
Work experience:
- Fast Search & Transfer, Munich (linguistics in search)
- Norsk Hydro, Brussels (enterprise systems)
- GMD, Darmstadt (information retrieval)
Fields of research:
- Search technologies
- Semantic Web
- Social Web
- Sentiment analysis and recommendations

1. The FAST Alltheweb.com Site
- 2000: Alltheweb.com was one of the largest search engines on the Internet
- FAST acquired Elexir Sprachtechnologie in Munich, intending to add linguistics to the search engine
[Screenshot: Alltheweb.com query and retrieved documents]

Linguistic Techniques in FAST
[Figure: linguistic search options, from <none> (keyword-based search) to all selected, applied to the query, the documents and the result list]
Linguistics in search:
- Transformational techniques (lemmatization, phrasing, anti-phrasing): transformed query and transformed documents for content-based search (increased semantics)
- Categorizing techniques (language identification, spam detection, topic categorization): category-based selection of relevant documents from categories of documents (reduced search space)
- Presentational techniques (presentation of document list, clustering): content-based rather than title-based access to the list of documents (improved transparency)

The FAST Experience
- Linguistics was a small part of a large system
- Linguistics served as a behind-the-scenes technology
- Linguistics was not a major breakthrough
- Linguistics is not easy: it is data-intensive, and only statistical approaches were feasible at the time

What happened to FAST?
- 2003: Internet part sold to Overture (Yahoo)
- 2008: Enterprise part sold to Microsoft

2. Semantics in Interoperability
- Semantic Web: adding semantics to data and services so that humans and computers can communicate better
- Ontology: explicit representation of a shared conceptualization (a domain terminology model)
- Semantic markup languages for ontology building (OWL, RDF)
- 2003: Petromax IIP project, constructing an ontology for the oil & gas sector (based on ISO 15926)
- 2011: EU LinkedDesign project, using ontologies in manufacturing processes

Silly Semantic Conflicts Prevent Data Harmonization
Even simple terms are misunderstood. Definitions of “mean time between failures” (MTBF):
1. “A period of time which is the mean period of time interval between failures”
2. “The time duration between two consecutive failures of a repaired item”
   (International Electrotechnical Vocabulary online database)
3. “The expectation of the time between failures”
   (International Electrotechnical Vocabulary online database)
4. “The expectation of the operating time between failures”
   (MIL-HDBK-29612-4)
5. “Total time duration of operating time between two consecutive failures of a repaired item”
   (International Electrotechnical Vocabulary online database)
6. “Predicts the average number of hours that an item, assembly, or piece part will operate before it fails”
   (Jones, J. V. Integrated Logistics Support Handbook, McGraw Hill Inc, 1987)
7. “For a particular interval, the total functional life of a population of an item divided by the total number of failures within the population during the measurement interval. The definition holds for time, rounds, miles, events, or other measure of life units.”
   (MIL-PRF-49506, 1996, Performance Specification Logistics Management Information)
8. “The average length of time a system or component works without failure”
   (MIL-HDBK-29612-4)

OWL Petroleum Ontology
    <owl:Class rdf:about="#CHRISTMAS_TREE">
      …
      <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
        An artefact that is an assembly of pipes and piping parts, with valves and associated control equipment that is connected to the top of a wellhead and is intended for control of fluid from a well.
      </dc:description>
      <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
        CHRISTMAS TREE
      </dc:title>
      …
      <rdfs:subClassOf rdf:resource="#ARTEFACT"/>
    </owl:Class>

Semantic Web: Lessons Learned
- Data integration and harmonization improved in the sector
- But: the technologies are demanding and complex
- Semantic Web technologies are still immature and expensive
- So far, few commercial solutions use semantic technologies (some work on ontology-driven search applications)

3. Ontologies in Process Mining
- Process mining: techniques and tools for discovering process flow, control, data, organizational and social structures from enterprise systems' event logs
- Dynamic reporting for exposing real business flows and explaining interesting transaction patterns
- Semantic process mining: using ontologies to improve the interpretation of event logs and the construction of business flows

Semantic Process Mining
[Figure: detected process flow linked to an ontology that gives a formal definition of the process terminology]

Commercialization of Technology
- 2004: Businesscape founded
- Ongoing work on the Enterprise Visualization Suite:
  - Combines two challenging technologies (data mining and Semantic Web)
  - Substantial improvement over traditional process mining (and traditional reporting tools)
- However:
  - It is difficult to explain the complexity and capability of the solution to customers
  - Few customers are competent enough to distinguish process mining from traditional reporting

4. Linguistics in News Reporting
Semantic approaches to news reporting:
- Extract content from news articles
- Validate the content of articles
- Opinion mining from news articles and social sites
- Model user preferences for news recommendation
- Combine and aggregate knowledge from heterogeneous sources
Commercial potential uncertain

Conclusions
- Linguistics is often a supporting technology
- Good linguistic resources are tedious and expensive to develop
- It is not always easy to justify the inclusion of linguistics
- Linguistics in our projects:
  - Enables new services and products
  - Enhances existing services and products
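The transformational techniques on the Linguistic Techniques in FAST slide are lemmatization, phrasing and anti-phrasing. They can be illustrated with a minimal Python sketch.

This is not FAST's implementation. The anti-phrase list, lemma dictionary and phrase list are hypothetical toy resources. Real systems depend on large, language-specific resources, which the talk describes as data-intensive and expensive to build.

```python
import re

# Hypothetical toy resources; production systems use large, language-specific ones.
ANTI_PHRASES = ["where can i find", "information about", "i am looking for"]
LEMMAS = {"engines": "engine", "trees": "tree", "searching": "search", "searched": "search"}
PHRASES = {("search", "engine"), ("christmas", "tree")}


def anti_phrase(query: str) -> list[str]:
    """Lower-case the query and drop stock phrases that carry no topical content."""
    text = query.lower()
    for stock in ANTI_PHRASES:
        # Word boundaries keep a stock phrase from matching inside longer words.
        text = re.sub(r"\b" + re.escape(stock) + r"\b", " ", text)
    return text.split()


def lemmatize(tokens: list[str]) -> list[str]:
    """Map inflected forms to base forms by dictionary lookup; unknown words pass through."""
    return [LEMMAS.get(token, token) for token in tokens]


def phrase(tokens: list[str]) -> list[str]:
    """Join adjacent tokens that form a known two-word phrase into one quoted unit."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in PHRASES:
            out.append(f'"{tokens[i]} {tokens[i + 1]}"')
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def transform_query(query: str) -> list[str]:
    """Apply anti-phrasing, then lemmatization, then phrasing."""
    return phrase(lemmatize(anti_phrase(query)))


print(transform_query("Where can I find information about search engines"))
# ['"search engine"']
```

The order matters in this sketch. Lemmatizing before phrasing lets "search engines" match the base-form phrase "search engine".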
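The MTBF definitions quoted under Silly Semantic Conflicts Prevent Data Harmonization differ in more than wording. Read literally, they give different numbers for the same item.

The sketch below assumes a hypothetical repairable item observed for 1000 hours. It fails three times, with a constant 20-hour repair after each failure. The code computes three of the quoted readings:
- The IEV "time duration between two consecutive failures" (repair time included).
- The IEV reading based on operating time (repair time excluded).
- The MIL-PRF-49506 ratio of total functional life to the number of failures.

```python
from statistics import mean

# Hypothetical data for one repairable item, in hours.
OBSERVATION_HOURS = 1000.0
FAILURES = [200.0, 500.0, 900.0]  # moments at which the item fails
REPAIR_HOURS = 20.0               # assumed constant downtime after each failure


def mtbf_calendar(failures):
    """Mean time between consecutive failures, with repair time counted."""
    gaps = [b - a for a, b in zip(failures, failures[1:])]
    return mean(gaps)


def mtbf_operating(failures, repair):
    """Mean operating time between consecutive failures (repair time excluded).

    Assumes each repair finishes before the next failure.
    """
    gaps = [b - a - repair for a, b in zip(failures, failures[1:])]
    return mean(gaps)


def mtbf_population(failures, repair, window):
    """Total functional life in the window divided by the number of failures."""
    # Downtime after the last failure is clipped at the end of the window.
    downtime = sum(min(repair, window - t) for t in failures)
    return (window - downtime) / len(failures)


for name, value in [
    ("calendar", mtbf_calendar(FAILURES)),
    ("operating", mtbf_operating(FAILURES, REPAIR_HOURS)),
    ("population", mtbf_population(FAILURES, REPAIR_HOURS, OBSERVATION_HOURS)),
]:
    print(f"{name}: {value:.1f} h")
```

With these assumed figures, the three readings give 350.0, 330.0 and 313.3 hours. An agreed, ontology-backed definition is meant to remove exactly this kind of discrepancy.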
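The CHRISTMAS_TREE class from the OWL petroleum ontology can be read with Python's standard-library ElementTree. Some assumptions apply:
- The file here is a hypothetical, trimmed RDF/XML document containing only that class.
- The namespace IRIs are the standard RDF, RDFS, OWL and Dublin Core ones, assumed to match those in the real ontology.

ElementTree sees only XML structure, not RDF semantics, so a dedicated RDF/OWL toolkit would normally be preferred. The standard library is used here only to keep the sketch dependency-free.

```python
import xml.etree.ElementTree as ET

# Standard namespace IRIs (assumed to match the ontology's prefixes).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
RDF = "{" + NS["rdf"] + "}"  # Clark-notation prefix for rdf:* attributes

# Hypothetical trimmed ontology file containing only the class shown on the slide.
OWL_DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <owl:Class rdf:about="#CHRISTMAS_TREE">
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
      An artefact that is an assembly of pipes and piping parts, with valves and associated control equipment that is connected to the top of a wellhead and is intended for control of fluid from a well.
    </dc:description>
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
      CHRISTMAS TREE
    </dc:title>
    <rdfs:subClassOf rdf:resource="#ARTEFACT"/>
  </owl:Class>
</rdf:RDF>"""


def class_summary(xml_text):
    """Return {class id: (title, parent class id)} for every owl:Class element."""
    root = ET.fromstring(xml_text)
    summary = {}
    for cls in root.findall("owl:Class", NS):
        class_id = cls.get(RDF + "about")
        title = cls.findtext("dc:title", default="", namespaces=NS).strip()
        parent = cls.find("rdfs:subClassOf", NS)
        parent_id = parent.get(RDF + "resource") if parent is not None else None
        summary[class_id] = (title, parent_id)
    return summary


print(class_summary(OWL_DOC))
# {'#CHRISTMAS_TREE': ('CHRISTMAS TREE', '#ARTEFACT')}
```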
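Semantic process mining uses an ontology to interpret event-log labels before constructing the business flow. The idea can be sketched with a directly-follows graph, a basic building block of many process discovery algorithms.

This is an illustration under stated assumptions, not the method behind Businesscape's Enterprise Visualization Suite. The event log, the raw activity labels from two hypothetical source systems, and the label-to-concept mapping are all invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case id, timestamp, raw activity label).
EVENT_LOG = [
    ("c1", 1, "Create PO"), ("c1", 2, "Approve PO"), ("c1", 3, "Goods Receipt"),
    ("c2", 1, "PO creation"), ("c2", 3, "GR posted"), ("c2", 2, "PO release"),
]

# Hypothetical ontology fragment: raw labels mapped to shared process concepts.
ONTOLOGY = {
    "Create PO": "CreatePurchaseOrder", "PO creation": "CreatePurchaseOrder",
    "Approve PO": "ApprovePurchaseOrder", "PO release": "ApprovePurchaseOrder",
    "Goods Receipt": "ReceiveGoods", "GR posted": "ReceiveGoods",
}


def directly_follows(log, label_map=None):
    """Count how often activity a is directly followed by activity b within a case.

    If label_map is given, raw labels are first lifted to ontology concepts;
    labels missing from the map are kept unchanged.
    """
    mapping = label_map or {}
    traces = defaultdict(list)
    # Order events by case, then by timestamp, to rebuild each trace.
    for case, _ts, label in sorted(log, key=lambda e: (e[0], e[1])):
        traces[case].append(mapping.get(label, label))
    counts = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))
    return counts


print(directly_follows(EVENT_LOG))
print(directly_follows(EVENT_LOG, ONTOLOGY))
```

Without the mapping, the two cases produce four unrelated edges that each occur once. After the labels are lifted to ontology concepts, they collapse into one flow: CreatePurchaseOrder → ApprovePurchaseOrder → ReceiveGoods, with each edge seen twice.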