Language Technology in Industrial Applications (Språkteknologi i industrielle anvendelser)
Or: How we have commercialized linguistic technologies
Jon Atle Gulla
Norwegian University of Science and Technology, Trondheim, Norway
1. Linguistics in search
2. Semantics for interoperability
3. Ontologies in process mining
4. Linguistics in news reporting
Språkteknologi og innovasjon (Language Technology and Innovation)
Who am I?
Professor, Information Systems group, IDI/NTNU
Education:
Siv.ing./dr.ing. (MSc and PhD in information systems, NTH)
Cand.philol. (master's-level degree in linguistics, AVH)
MSc (management, London Business School)
Work experience:
Fast Search & Transfer, Munich (linguistics in search)
Norsk Hydro, Brussels (enterprise systems)
GMD, Darmstadt (information retrieval)
Field of research:
Search technologies
Semantic Web
Social Web
Sentiment analysis and recommendations
ICEIS 2008
1. The FAST Alltheweb.com site
2000:
Alltheweb.com was one of the largest search engines on the Internet
FAST acquired Elexir Sprachtechnologie in Munich
Intended to add linguistics to the search engine
[Figure: query and retrieved documents]
Linguistic Techniques in FAST
Linguistics in search:
Presentational techniques (improved transparency): clustering, presentation of document list
Transformational techniques (increased semantics): lemmatization, phrasing, anti-phrasing
Categorizing techniques (reduced search space): language identification, spam detection, topic categorization
[Figure: query and search options flowing through category-based selection and keyword-based or content-based search over transformed queries and documents, to a list of relevant documents with title-based and content-based access]
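The transformational techniques (lemmatization, phrasing, anti-phrasing) can be illustrated with a toy query rewriter. This is a minimal sketch, not FAST's implementation: the anti-phrase list, lemma table and phrase dictionary are hypothetical stand-ins for the large, data-driven resources a production engine would need.

```python
import re

# Hypothetical anti-phrases: query padding that carries no topical content.
ANTI_PHRASES = ["where can i find", "information about", "how do i"]

# Hypothetical lemma table mapping inflected forms to base forms.
LEMMAS = {"engines": "engine", "searching": "search", "trees": "tree"}

# Hypothetical multi-word phrases that should be matched as one unit.
PHRASES = {("search", "engine"), ("christmas", "tree")}


def remove_anti_phrases(query):
    """Strip known anti-phrases (on word boundaries) from a lower-cased query."""
    q = query.lower()
    for anti in ANTI_PHRASES:
        q = re.sub(r"\b" + re.escape(anti) + r"\b", " ", q)
    return " ".join(q.split())


def lemmatize(tokens):
    """Replace each token by its lemma when the table knows it."""
    return [LEMMAS.get(t, t) for t in tokens]


def group_phrases(tokens):
    """Greedily join adjacent token pairs listed in PHRASES into quoted phrases."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in PHRASES:
            out.append(f'"{tokens[i]} {tokens[i + 1]}"')
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def transform_query(query):
    """Anti-phrasing, then lemmatization, then phrasing."""
    return group_phrases(lemmatize(remove_anti_phrases(query).split()))


print(transform_query("Where can I find information about search engines"))
# prints ['"search engine"']
```

Lemmatizing before phrasing lets "search engines" and "search engine" map to the same phrase; the greedy pairwise grouping is a simplification of real phrase detection.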
The FAST Experience
Linguistics is a small part of a large system
Linguistics works as a behind-the-scenes technology
Linguistics was not a major breakthrough
Linguistics is not easy:
Data-intensive
Only statistical approaches feasible at the time
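As an example of the statistical style of linguistics mentioned above, language identification can be done by comparing character trigram frequencies. This is a minimal sketch using cosine similarity between trigram frequency vectors; it is not a description of FAST's actual method, and the one-sentence training samples are hypothetical (real identifiers are trained on large corpora, which is exactly why the approach is data-intensive).

```python
import math
from collections import Counter


def trigram_profile(text):
    """Counts of overlapping character trigrams, padded with spaces at both ends."""
    padded = "  " + text.lower() + "  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity of two trigram count vectors (missing grams count as 0)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical, tiny training samples; far too small for real use.
SAMPLES = {
    "en": "the search engine returns the documents that match the query",
    "no": "søkemotoren returnerer dokumentene som passer til spørringen",
    "de": "die suchmaschine liefert die dokumente die zur anfrage passen",
}
PROFILES = {lang: trigram_profile(text) for lang, text in SAMPLES.items()}


def identify_language(text):
    """Return the language whose trigram profile is most similar to the text."""
    profile = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine(profile, PROFILES[lang]))


print(identify_language("the search engine returns the documents"))
```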
What happened to FAST?
2003: Internet part sold to Overture (Yahoo)
2008: Enterprise part sold to Microsoft
2. Semantics in Interoperability
Semantic Web:
Adding semantics to data and services so that humans and computers can communicate better
Ontology: explicit representation of a shared conceptualization (domain terminology model)
Semantic markup languages for ontology building (OWL, RDF)
2003: Petromax IIP project for the construction of an ontology for the oil & gas sector (based on ISO 15926)
2011: EU LinkedDesign project for the use of ontologies in manufacturing processes
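To make "explicit representation of a shared conceptualization" concrete, the sketch below stores a few statements as RDF-style subject-predicate-object triples and derives implied class memberships by following rdfs:subClassOf transitively. It is a hand-rolled illustration with plain tuples, not how the projects above were built; a real system would use an RDF store and an OWL reasoner. CHRISTMAS_TREE and ARTEFACT echo the petroleum ontology in this deck; WELLHEAD, PHYSICAL_OBJECT and tree_17 are hypothetical.

```python
# RDF-style statements as plain (subject, predicate, object) tuples.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

TRIPLES = {
    ("CHRISTMAS_TREE", SUBCLASS_OF, "ARTEFACT"),
    ("WELLHEAD", SUBCLASS_OF, "ARTEFACT"),         # hypothetical class
    ("ARTEFACT", SUBCLASS_OF, "PHYSICAL_OBJECT"),  # hypothetical parent
    ("tree_17", RDF_TYPE, "CHRISTMAS_TREE"),       # hypothetical instance
}


def superclasses(cls, triples):
    """All classes reachable from cls by following rdfs:subClassOf transitively."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found


def inferred_types(instance, triples):
    """Asserted rdf:type classes of an instance plus all of their superclasses."""
    asserted = {o for s, p, o in triples if s == instance and p == RDF_TYPE}
    result = set(asserted)
    for cls in asserted:
        result |= superclasses(cls, triples)
    return result


print(sorted(inferred_types("tree_17", TRIPLES)))
# prints ['ARTEFACT', 'CHRISTMAS_TREE', 'PHYSICAL_OBJECT']
```

The point of the shared model is that any application querying for ARTEFACT instances also finds tree_17, without each system hard-coding the hierarchy.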
Silly Semantic Conflicts Prevent Data Harmonization
Mean time between failures (MTBF)
Even simple terms are misunderstood:
1. “A period of time which is the mean period of time interval between failures”
2. “The time duration between two consecutive failures of a repaired item” (International Electrotechnical Vocabulary online database)
3. “The expectation of the time between failures” (International Electrotechnical Vocabulary online database)
4. “The expectation of the operating time between failures” (MIL-HDBK-29612-4)
5. “Total time duration of operating time between two consecutive failures of a repaired item” (International Electrotechnical Vocabulary online database)
6. “Predicts the average number of hours that an item, assembly, or piece part will operate before it fails” (Jones, J. V. Integrated Logistics Support Handbook, McGraw Hill Inc, 1987)
7. “For a particular interval, the total functional life of a population of an item divided by the total number of failures within the population during the measurement interval. The definition holds for time, rounds, miles, events, or other measure of life units” (MIL-PRF-49506, 1996, Performance Specification Logistics Management Information)
8. “The average length of time a system or component works without failure” (MIL-HDBK-29612-4)
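A small calculation shows why these definitions of mean time between failures are not interchangeable. The sketch applies three readings to one hypothetical repair log: the calendar "time duration between two consecutive failures" (repair time included), the "operating time between failures" (repair time excluded), and the MIL-PRF-49506 style total functional life divided by the number of failures, for a population of one item. The log, the 500-hour observation window and the function names are assumptions for illustration.

```python
# Hypothetical repair log for ONE repairable item:
# (failure time in hours since entering service, repair duration in hours).
FAILURES = [(100.0, 10.0), (250.0, 20.0), (400.0, 5.0)]


def mean_time_between_consecutive_failures(log):
    """Mean calendar gap between consecutive failures (repair time included)."""
    times = [t for t, _ in log]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


def mean_operating_time_between_failures(log):
    """Mean gap between consecutive failures minus the repair downtime in between."""
    gaps = [
        t_next - t_prev - repair
        for (t_prev, repair), (t_next, _) in zip(log, log[1:])
    ]
    return sum(gaps) / len(gaps)


def functional_life_per_failure(log, window_end_h):
    """Operating time in [0, window_end_h] divided by the number of failures.

    Assumes every repair finishes before window_end_h.
    """
    downtime = sum(repair for _, repair in log)
    return (window_end_h - downtime) / len(log)


print(
    mean_time_between_consecutive_failures(FAILURES),
    mean_operating_time_between_failures(FAILURES),
    functional_life_per_failure(FAILURES, 500.0),
)
# prints 150.0 135.0 155.0
```

The same log yields 150, 135 and 155 hours, so two organizations can both report an MTBF for identical equipment and still disagree.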
OWL petroleum ontology
<owl:Class rdf:about="#CHRISTMAS_TREE">
…
<dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
An artefact that is an assembly of pipes and piping parts, with valves and
associated control equipment that is connected to the top of a wellhead and
is intended for control of fluid from a well.
</dc:description>
<dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
CHRISTMAS TREE
</dc:title>
…
<rdfs:subClassOf rdf:resource="#ARTEFACT"/>
</owl:Class>
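A class definition like the one above can be read programmatically with Python's standard XML parser. This is a minimal sketch: because the excerpt elides parts and shows no namespace declarations, the example wraps a trimmed copy in an rdf:RDF root with the standard RDF, RDFS, OWL and Dublin Core namespace URIs. Those bindings are assumptions, not taken from the original ontology file.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs; the prefix bindings are assumed, not from the source file.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Trimmed copy of the CHRISTMAS_TREE class wrapped in an rdf:RDF root element.
OWL_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <owl:Class rdf:about="#CHRISTMAS_TREE">
    <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
      An artefact that is an assembly of pipes and piping parts, with valves and
      associated control equipment that is connected to the top of a wellhead and
      is intended for control of fluid from a well.
    </dc:description>
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">CHRISTMAS TREE</dc:title>
    <rdfs:subClassOf rdf:resource="#ARTEFACT"/>
  </owl:Class>
</rdf:RDF>"""

# ElementTree stores namespaced attributes as "{namespace-uri}local-name".
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def summarize_classes(xml_text):
    """Return {class name: {title, description, parents}} for each owl:Class."""
    root = ET.fromstring(xml_text)
    summary = {}
    for cls in root.findall("owl:Class", NS):
        name = cls.get(RDF_ABOUT, "").lstrip("#")
        title = cls.findtext("dc:title", default="", namespaces=NS)
        description = cls.findtext("dc:description", default="", namespaces=NS)
        parents = [
            sub.get(RDF_RESOURCE, "").lstrip("#")
            for sub in cls.findall("rdfs:subClassOf", NS)
        ]
        summary[name] = {
            "title": " ".join(title.split()),              # collapse whitespace
            "description": " ".join(description.split()),
            "parents": parents,
        }
    return summary


print(summarize_classes(OWL_XML)["CHRISTMAS_TREE"]["parents"])
# prints ['ARTEFACT']
```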
Semantic Web Lessons Learned
Data integration and harmonization improved in the sector
But:
Demanding and complex technologies
Semantic Web technologies still immature and expensive
So far, few commercial solutions use semantic technologies
(Some work on ontology-driven search applications)
3. Ontologies in Process Mining
Process mining:
Techniques and tools for discovering process flow, control, data, organizational and social structures from enterprise systems’ event logs
Dynamic reporting for exposing real business flows and explaining interesting transaction patterns
Semantic process mining:
Using ontologies to improve the interpretation of event logs and the construction of business flows
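A common first step in discovering process flow from an event log is the directly-follows relation: how often activity B immediately follows activity A within the same case, which algorithms such as the alpha algorithm build on. The sketch below computes it from a hypothetical log of (case id, activity, timestamp) records; it illustrates the general idea, not the tools discussed in this deck.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case_id, activity, timestamp), not sorted by case.
EVENT_LOG = [
    ("order-1", "Create order", 1),
    ("order-1", "Approve order", 2),
    ("order-2", "Create order", 3),
    ("order-2", "Ship goods", 4),
    ("order-1", "Ship goods", 5),
    ("order-2", "Send invoice", 6),
    ("order-1", "Send invoice", 7),
]


def traces(log):
    """Group events by case and order each case's activities by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, ts in log:
        by_case[case_id].append((ts, activity))
    return {case: [a for _, a in sorted(events)] for case, events in by_case.items()}


def directly_follows(log):
    """Count (a, b) pairs where b immediately follows a in the same case."""
    dfg = Counter()
    for trace in traces(log).values():
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


for (a, b), n in sorted(directly_follows(EVENT_LOG).items()):
    print(f"{a} -> {b}: {n}")
```

The resulting counts are the edges of a process graph; here one order skips approval, which is the kind of real business flow that static reports tend to hide.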
Semantic Process Mining
[Figure: detected process flow linked to an ontology that gives a formal definition of process terminology]
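One way to read the idea of an ontology formally defining process terminology: raw event names from the logs are first mapped to ontology concepts, and flows are then counted between concepts, optionally at a more abstract level of the concept hierarchy. This is a minimal sketch under stated assumptions; the raw event names, the concept mapping and the two-level hierarchy are all hypothetical, and collapsing consecutive repeats of the same concept is a design choice made here, not a prescribed step.

```python
from collections import Counter

# Hypothetical raw traces: event names as they might appear in system logs.
RAW_TRACES = [
    ["PO_CREATE_WEB", "PO_RELEASE", "GR_POST"],
    ["PO_CREATE_EDI", "GR_POST"],
]

# Hypothetical ontology: raw event name -> concept, concept -> parent concept.
EVENT_TO_CONCEPT = {
    "PO_CREATE_WEB": "CreatePurchaseOrder",
    "PO_CREATE_EDI": "CreatePurchaseOrder",
    "PO_RELEASE": "ApprovePurchaseOrder",
    "GR_POST": "ReceiveGoods",
}
PARENT = {
    "CreatePurchaseOrder": "Purchasing",
    "ApprovePurchaseOrder": "Purchasing",
    "ReceiveGoods": "Logistics",
}


def lift(event, level):
    """Map a raw event to its concept (level 0) or to the concept's parent (level 1)."""
    concept = EVENT_TO_CONCEPT.get(event, event)
    if level >= 1:
        concept = PARENT.get(concept, concept)
    return concept


def concept_flows(traces, level):
    """Directly-follows counts between lifted concepts."""
    dfg = Counter()
    for trace in traces:
        lifted = [lift(e, level) for e in trace]
        # Merge consecutive events that lift to the same concept into one step.
        steps = [c for i, c in enumerate(lifted) if i == 0 or c != lifted[i - 1]]
        for a, b in zip(steps, steps[1:]):
            dfg[(a, b)] += 1
    return dfg


print(concept_flows(RAW_TRACES, level=0))
print(concept_flows(RAW_TRACES, level=1))
```

Two differently named creation events merge into one CreatePurchaseOrder node, and at the higher level the whole log reduces to a single Purchasing to Logistics flow.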
Commercialization of Technology
2004: Businesscape founded
Ongoing work on the Enterprise Visualization Suite:
Combines two challenging technologies (data mining and Semantic Web)
Substantial improvement over traditional process mining (and traditional reporting tools)
However:
Difficult to explain the complexity and capability of the solution to customers
Few customers competent enough to distinguish process mining from traditional reporting
4. Linguistics in News Reporting
Semantic approaches to news reporting:
Extract content from news articles
Validate content of articles
Opinion mining from news articles and social sites
Model user preferences for news recommendation
Combine/aggregate knowledge from heterogeneous sources
Commercial potential uncertain
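As one simple way to model user preferences for news recommendation, a content-based recommender can represent the user profile and each article as term-weight vectors and rank articles by cosine similarity. This is a minimal sketch with hypothetical headlines and profile weights; a real system would use proper tokenization, TF-IDF or learned weights, and reading-behavior feedback.

```python
import math
from collections import Counter

# Hypothetical headlines standing in for full articles.
ARTICLES = {
    "a1": "oil price rises as north sea production falls",
    "a2": "new search engine uses linguistic analysis of queries",
    "a3": "football club wins cup final",
}

# Hypothetical user profile: term weights assumed to come from earlier reading.
USER_PROFILE = {"search": 2.0, "linguistic": 1.5, "oil": 0.5}


def vectorize(text):
    """Bag-of-words term counts (whitespace tokenization only)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(profile, articles, k=2):
    """Top-k article ids by similarity to the profile, skipping zero scores."""
    scored = sorted(
        ((cosine(profile, vectorize(text)), art_id) for art_id, text in articles.items()),
        reverse=True,
    )
    return [art_id for score, art_id in scored[:k] if score > 0]


print(recommend(USER_PROFILE, ARTICLES))
# prints ['a2', 'a1']
```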
Conclusions
Linguistics often a supporting technology
Good linguistic resources are tedious and expensive to develop
Not always easy to justify the inclusion of linguistics
Linguistics in our projects:
Enable new services and products
Enhance existing services and products