Data Mining and Intelligent Agents
20-751 ECOMMERCE TECHNOLOGY
SUMMER 2001
COPYRIGHT © 2001 MICHAEL I. SHAMOS
Outline
• Data Mining Overview
Proliferation of Data
• Indexes
– PAC-INFO
– Public Records Online
– Florida gun licenses (look up John Smith)
– Lee County property records (look up John F. Smith)
– Death index
– Investigative Resources National
– STR82U
– Allegheny County Property
– Online Public Records (fosson.com)
• Pay services
– uspublicinfo.com
– USsearch.com
Data Mining
“The key in business is to know something that
nobody else knows.”
— Aristotle Onassis
“To understand is to perceive patterns.”
— Sir Isaiah Berlin
PHOTO: LUCINDA DOUGLAS-MENZIES
PHOTO: HULTON-DEUTSCH COLL
Data Mining
• Extracting previously unknown relationships from
large datasets
– discover trends, relationships, dependencies
– make predictions
– target customers
• In eCommerce, data comes from
– customers themselves
– cookies
– external databases
– data matching
– DoubleClick, etc.
– Digital rights management tools (what we read and how much)
– library records
Taxonomy of Data Mining Methods
Data Mining Methods
• Predictive Modeling
– Decision Trees
– Neural Networks
– Naive Bayesian
– Branching criteria
• Database Segmentation
– Clustering
– K-Means
• Link Analysis
– Rule Association
• Text Mining
– Semantic Maps
• Deviation Detection
– Visualization
SOURCE: WELGE & REINCKE, NCSA
Predictive Modeling
• Objective: use data about the past to predict future
behavior
• Sample problems:
– Will this (new) customer pay his bill on time? (classification)
– What will the Dow-Jones Industrial Average be on October
15? (prediction)
• Technique: supervised learning
– decision trees
– neural networks
– naive Bayesian
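As a rough illustration of the classification problem above ("will this customer pay his bill on time?"), here is a minimal naive Bayesian sketch. The customer records, the feature names (`late_before`, `employed`) and the labels are all hypothetical, invented for the example; the add-one smoothing is also an assumption, not something the slides specify.

```python
from collections import Counter, defaultdict

# Hypothetical past customers: (features, known outcome).
HISTORY = [
    ({"late_before": "no", "employed": "yes"}, "on_time"),
    ({"late_before": "no", "employed": "yes"}, "on_time"),
    ({"late_before": "no", "employed": "no"}, "on_time"),
    ({"late_before": "yes", "employed": "yes"}, "on_time"),
    ({"late_before": "yes", "employed": "no"}, "late"),
    ({"late_before": "yes", "employed": "no"}, "late"),
    ({"late_before": "no", "employed": "no"}, "late"),
    ({"late_before": "yes", "employed": "yes"}, "late"),
]

def train(records):
    """Count how often each class and each (class, feature, value) occurs."""
    class_counts = Counter(label for _, label in records)
    value_counts = defaultdict(Counter)   # (label, feature) -> value counts
    feature_values = defaultdict(set)     # feature -> distinct values seen
    for features, label in records:
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            feature_values[name].add(value)
    return class_counts, value_counts, feature_values

def posterior(model, features):
    """Return P(class | features), treating features as independent."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total                          # prior P(class)
        for name, value in features.items():
            k = len(feature_values[name])
            # Add-one smoothing so an unseen value never gives probability 0.
            score *= (value_counts[(label, name)][value] + 1) / (n + k)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

model = train(HISTORY)
new_customer = {"late_before": "no", "employed": "yes"}
result = posterior(model, new_customer)
print(max(result, key=result.get), result)
```

The past records play the role of the supervised training data; the new customer is classified by whichever class has the larger posterior.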
Neural Networks
Networks of processing units called neurons. This is the jth neuron:
• n INPUTS x1, …, xn
• n WEIGHTS w1j, …, wnj
• Neuron computes a linear function of the inputs
• 1 OUTPUT yj depends only on the linear function
• Neurons are easy to simulate
SOURCE: CONSTRUCTING INTELLIGENT AGENTS WITH JAVA
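To show how easy a neuron is to simulate, here is a minimal sketch of the jth neuron described above: it forms the weighted sum of its n inputs and passes only that sum to an output function. The function name `neuron_output` and the choice of a sigmoid as the default output function are assumptions made for illustration; the slide does not say which output function is used.

```python
import math

def sigmoid(s):
    """A common squashing function; assumed here, not given in the slides."""
    return 1.0 / (1.0 + math.exp(-s))

def neuron_output(inputs, weights, activation=sigmoid):
    """Compute y_j for one neuron from inputs x_1..x_n and weights w_1j..w_nj."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    # The linear function of the inputs.
    linear = sum(x * w for x, w in zip(inputs, weights))
    # The output depends only on that linear function.
    return activation(linear)

print(neuron_output([1.0, 2.0], [0.5, -0.25]))
```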
Neural Networks
Learning through back-propagation
1. Network is trained by giving it many inputs whose output is known
2. Deviation is “fed back” to the neurons to adjust their weights
3. Network is then ready for live data
SOURCE: CONSTRUCTING INTELLIGENT AGENTS WITH JAVA
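The three training steps above can be sketched for the simplest possible case, a single sigmoid neuron. This is the one-neuron special case of the weight update (often called the delta rule), not full multi-layer back-propagation. The training task (logical OR), the learning rate and the number of passes are assumptions chosen for the example.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def predict(weights, bias, inputs):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_neuron(samples, epochs=5000, rate=0.5):
    """Adjust weights from examples whose correct output is known."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            y = predict(weights, bias, inputs)
            deviation = target - y            # step 2: the error to feed back
            delta = deviation * y * (1.0 - y) # scaled by the sigmoid's slope
            weights = [w + rate * delta * x for w, x in zip(weights, inputs)]
            bias += rate * delta
    return weights, bias

# Step 1: many inputs whose output is known (here, the OR function).
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_neuron(OR_SAMPLES)

# Step 3: the trained neuron is applied to data.
for inputs, target in OR_SAMPLES:
    print(inputs, target, round(predict(weights, bias, inputs), 3))
```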
Neural Network Demos
• Demo: Notre Dame football
• Financial applications:
– Churning: are trades being instituted just to generate
commissions?
– Fraud detection in credit card transactions
– Kiting: isolate float on uncollected funds
– Money Laundering: detect suspicious money transactions
(US Treasury's Financial Crimes Enforcement Network)
• Insurance applications:
– Auto Insurance: detect a group of people who stage
accidents to collect on insurance
– Medical Insurance: detect professional patients, rings of
doctors, and rings of references
Database Segmentation (Clustering)
• “The art of finding groups in data” (Kaufman &
Rousseeuw)
• Objective: gather items from a database into sets
according to (unknown) common characteristics
• Much more difficult than classification since the
classes are not known in advance (no training)
• Examples:
– Demographic patterns
– Topic detection (words about the topic often occur together)
• Technique: unsupervised learning
Clustering Example
• Are there natural clusters in the data (36,10), (12,8),
(38,42), (13,6), (36,38), (16,9), (40,36), (35,19),
(37,7), (39,8)?
Clustering
• K-means algorithm
• To divide a set into K clusters
• Pick K points at random. Use them to divide the set
into K clusters based on nearest distance
• Loop:
– Find the mean of each cluster. Move the point there.
– Redefine the clusters.
– If no point changes cluster, done
• K-means demo
• Agglomerative clustering: start with N clusters & merge
• Agglomerative clustering demo
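The K-means loop above can be sketched as follows, run on the ten points from the clustering example. The choice K = 3 and the fixed starting points are assumptions made so the run is reproducible; the slide picks the K starting points at random, which the function also does when no starting points are supplied.

```python
import math
import random

POINTS = [(36, 10), (12, 8), (38, 42), (13, 6), (36, 38),
          (16, 9), (40, 36), (35, 19), (37, 7), (39, 8)]

def k_means(points, k, centers=None, max_iter=100):
    """Divide points into k clusters; returns (clusters, centers)."""
    # Pick K points at random unless starting points are given.
    centers = list(centers) if centers is not None else random.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Redefine the clusters: each point joins its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break                      # no point changed cluster: done
        assignment = new_assignment
        # Find the mean of each cluster and move its center there.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    clusters = [[p for p, a in zip(points, assignment) if a == c]
                for c in range(k)]
    return clusters, centers

clusters, centers = k_means(POINTS, 3, centers=POINTS[:3])
for cluster, center in zip(clusters, centers):
    print(center, cluster)
```

With these starting points the loop settles after one update, splitting the data into a low-left group, an upper-right group and a lower-right group. Different random starts can converge to different clusterings.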
Rule Association Demos
• Magnum Opus (RuleQuest, free download)
• See5/C5.0 (RuleQuest, free download)
• Cubist numerical rule finder (RuleQuest, free
download)
Text Mining
• Objective: discover relationships among people &
things from their appearance in text
• Generation of “knowledge map”, a graph
representing terms/topics and their relationships
• SemioMap demo (Semio Corp.)
– Phrase extraction
– Concept clustering (through co-occurrence), not by document
– Graphic navigation (link means concepts co-occur)
– Processing time: 90 minutes per gigabyte
• Semio Taxonomy available for legal documents
• Automatic summarization (Extractor demo)
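A knowledge map built from co-occurrence can be sketched in a few lines: count how often two terms appear in the same passage, and treat each nonzero count as a link in the graph. This is only an illustration of the idea, not how SemioMap works; the term list, the sample passages and the use of a single sentence as the co-occurrence unit are all assumptions.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence(passages, terms):
    """Count, for each pair of terms, the passages in which both appear."""
    counts = Counter()
    for passage in passages:
        words = set(re.findall(r"[a-z]+", passage.lower()))
        present = sorted(t for t in terms if t in words)
        for pair in combinations(present, 2):
            counts[pair] += 1          # one more link between the two terms
    return counts

# Hypothetical passages and terms, for illustration only.
PASSAGES = [
    "The shopping agent compares each price.",
    "An agent can negotiate a price with a seller.",
    "The seller ships the order.",
]
TERMS = ["agent", "price", "seller"]

links = cooccurrence(PASSAGES, TERMS)
for (a, b), n in sorted(links.items()):
    print(f"{a} -- {b}: co-occur in {n} passage(s)")
```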
Visualization
• Objective: produce a graphic view of data so it
becomes understandable to humans
• Hyperbolic trees (Inxight.com) grocery, UTC
• Table Lens (inxight.com)
• SpotFire (free download from www.spotfire.com)
• OpenViz
• Internetivity
Intelligent Agents
Outline
• What is an agent?
• Why do we need them?
– Important tasks are too time-consuming, not economical
– Too much information (filtering)
• What kinds of agents are there?
• How do they work?
What is an Agent?
• In real life, a person who acts on your behalf
• In ecommerce, a computer program that acts on your
behalf
• Agents often perform tasks usually associated with
humans
• But: there is no magic
• An agent is just a computer program
• Synonyms: bot, daemon (a supernatural being of Greek
mythology intermediate between gods and men)
Sample Shopping Agent
[Figure: sample shopping agent; the user communicates needs to the agent]
SOURCE: DAVID ELLIMAN
Agent Properties
• Autonomous
– Acts by itself (independent of user)
• Reactive
– Responds to its environment, initiates actions
• Communicative
– Communicates with people and other agents
• Goal-driven
– Acts until it accomplishes its purpose or learns that it can’t
Examples of Agents
• Search agents
– Find web pages. FastSearch, Google,
NorthernLight
– Find search engines. Searchenginecollosus.com
• Metacrawlers
– Search multiple indexes. LEXIBOT
• Text agents
– Summarization. Extractor demo
• News agents
– Locate relevant news stories. TotalNEWS
Information Agents
• Monitors, update agents
– Notify user when events occur, e.g. page is modified
Mind-it, javElink, CyberAlert (company news), Enfish tracker
(tracks email, web pages, files), EoMonitor, MorningPaper
– eWatch, CyberAlert
• Web intelligence. NetCurrents
• Addresses, phone numbers, reverse directories
– AT&T AnyWho, BigYellow, InfoSpace (by address!)
• Stock bots (financial information, charts, news)
– StockPoint, StreetEYE, Yahoo
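A monitor agent of the kind listed above (notify the user when a page is modified) can be sketched by remembering a digest of each page and comparing it on the next visit. This is an assumed design for illustration, not a description of how any of the named services work. The class name `PageMonitor` and the `fetch` callback are hypothetical; the callback stands in for a real page download so the sketch runs without network access.

```python
import hashlib

class PageMonitor:
    """Reports whether a page changed since the last time it was checked."""

    def __init__(self, fetch):
        self.fetch = fetch        # function: url -> page text (hypothetical)
        self.digests = {}         # url -> digest seen on the previous check

    def check(self, url):
        text = self.fetch(url)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        previous = self.digests.get(url)
        self.digests[url] = digest
        # The first visit only records the page; later visits compare.
        return previous is not None and previous != digest

# Simulated web: a dictionary we can edit to mimic a page being modified.
fake_web = {"http://example.com/news": "old headline"}
monitor = PageMonitor(lambda url: fake_web[url])

first = monitor.check("http://example.com/news")      # baseline visit
unchanged = monitor.check("http://example.com/news")  # nothing modified
fake_web["http://example.com/news"] = "new headline"
changed = monitor.check("http://example.com/news")    # page was modified
print(first, unchanged, changed)
```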
Shopping Agents
• Price bots
– BestBookBuys, BottomDollar, PriceGrabber, StoreRunner
(CBS)
• Sale locators
– ShoppingList (brick & mortar), ValueFind
• Auction notification
– AuctionWatch, BidFind
• Browser buttons
– ValueSpeed
• Recommenders
– ActiveBuyersGuide, ProductReviewNet
Travel Agents
• Information about flights, trains, purchase tickets
– Orbot, USAirways, Travelocity
• Discount Hotels
– hoteldiscount!com
• Price auctions
• Where is the human travel agent going?
• Airplanes in flight
– FlightTracker
– JFK Tower audio
• CMU Bot List
Agent Technologies
• Table-driven (data lookup)
• Rule-based
• Goal-directed
• Utility-based
Rule-Based Agents
Condition-action rule:
if car-in-front-is-braking then start-braking
SOURCE: ANDREAS GEYER-SCHULZ
Rule-Based Agents
• Businessmen are not programmers
• Need natural rule specification language + rule
follower
• Need memory modified and accessed by rules
• Example: classifying a vehicle
IF wheels < 1 THEN vehicle = NOT land_vehicle
IF wheels == 1 THEN vehicle = unicycle
IF wheels > 2 AND wheels < 4 THEN vehicle = cycle
IF wheels > 4 THEN vehicle = truck
IF wheels > 3 AND weight < 2400 AND length < 8
THEN vehicle = car
;logic incomplete here
IF wheels > 12 THEN vehicle = semi
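The "rule follower" and the "memory modified and accessed by rules" can be sketched as below. The rules are transcribed from the slide as written, including its gaps (for example, no rule covers two wheels); the representation of rules as (condition, conclusion) pairs, the later-rule-wins policy and the sample vehicles are assumptions made for the example. The slide gives no units for weight or length, so none are assumed here.

```python
# Each rule: (condition on memory, value to store in memory["vehicle"]).
RULES = [
    (lambda m: m["wheels"] < 1, "not land_vehicle"),
    (lambda m: m["wheels"] == 1, "unicycle"),
    (lambda m: 2 < m["wheels"] < 4, "cycle"),
    (lambda m: m["wheels"] > 4, "truck"),
    (lambda m: m["wheels"] > 3 and m["weight"] < 2400 and m["length"] < 8,
     "car"),
    # The slide marks the logic as incomplete at this point.
    (lambda m: m["wheels"] > 12, "semi"),
]

def classify(facts):
    """Follow the rules in order; a later matching rule overwrites earlier ones."""
    memory = dict(facts)       # working memory read and written by the rules
    memory["vehicle"] = None   # stays None if no rule fires
    for condition, conclusion in RULES:
        if condition(memory):
            memory["vehicle"] = conclusion
    return memory["vehicle"]

print(classify({"wheels": 1, "weight": 30, "length": 1}))
print(classify({"wheels": 4, "weight": 2000, "length": 7}))
print(classify({"wheels": 18, "weight": 30000, "length": 50}))
print(classify({"wheels": 2, "weight": 25, "length": 2}))
```

Running the last case shows the incompleteness directly: a two-wheeled vehicle matches no rule and is left unclassified.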
Business Rules
• Grocery store example
IF inBasket(french_fries) AND NOT asked(ketchup)
THEN ask(ketchup)
; ask “Would you care for ketchup to go
; with your french fries?”
• Rules that learn
IF inBasket(french_fries)
THEN prob(want_ketchup) = SQL( <sql_query> )
; query might involve customer data and
; demographics
IF prob(want_ketchup) > 0.3 AND NOT asked(ketchup)
THEN ask(ketchup)
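The "rules that learn" idea can be sketched with an in-memory SQLite database standing in for the store's records. The slide leaves the actual query unspecified, so everything about the data here is hypothetical: the `baskets` table, its columns and the six sample rows exist only to make the example run. The 0.3 threshold is the one from the rule above.

```python
import sqlite3

def make_history():
    """Build a tiny, made-up purchase history (1 = item was bought)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE baskets (fries INTEGER, ketchup INTEGER)")
    db.executemany(
        "INSERT INTO baskets VALUES (?, ?)",
        [(1, 1), (1, 0), (1, 1), (1, 0), (0, 1), (0, 0)],
    )
    return db

def prob_want_ketchup(db):
    """Estimate P(ketchup | french fries) from past baskets."""
    row = db.execute(
        "SELECT AVG(ketchup) FROM baskets WHERE fries = 1"
    ).fetchone()
    return row[0] or 0.0       # AVG is NULL when there is no history

def should_ask_ketchup(basket, asked, db, threshold=0.3):
    """IF inBasket(french_fries) AND prob > threshold AND NOT asked(ketchup)."""
    return ("french_fries" in basket
            and "ketchup" not in asked
            and prob_want_ketchup(db) > threshold)

db = make_history()
asked = set()
if should_ask_ketchup({"french_fries"}, asked, db):
    print("Would you care for ketchup to go with your french fries?")
    asked.add("ketchup")       # remember, so the customer is asked only once
```

Because the probability is recomputed from the data each time, the rule's behavior changes as purchase history accumulates, without anyone rewriting the rule.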
Goal-Directed Agents
Actions are evaluated
with respect to goals
Will this action get me closer to the goal state?
SOURCE: ANDREAS GEYER-SCHULZ
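The question "will this action get me closer to the goal state?" can be sketched as a loop that scores every available action and takes the one that reduces the distance to the goal the most. The grid world, the four moves and the use of Manhattan distance are assumptions made for illustration. The agent stops when it reaches the goal or when no action brings it closer.

```python
def run_agent(state, goal, actions, distance, max_steps=100):
    """Greedy goal-directed agent. Returns (path taken, goal reached?)."""
    path = [state]
    for _ in range(max_steps):
        if state == goal:
            return path, True
        # Evaluate each action by the state it would lead to.
        best = min((act(state) for act in actions),
                   key=lambda s: distance(s, goal))
        if distance(best, goal) >= distance(state, goal):
            return path, False     # no action gets closer: give up
        state = best
        path.append(state)
    return path, state == goal

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

# Hypothetical actions: move one step east, west, north or south.
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]
ACTIONS = [lambda s, d=d: (s[0] + d[0], s[1] + d[1]) for d in MOVES]

path, reached = run_agent((0, 0), (2, 1), ACTIONS, manhattan)
print(path, reached)
```

A greedy rule like this can get stuck when every single step looks worse even though a longer route exists, which is one reason practical goal-directed agents use search or planning instead.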
Static versus Mobile Agents
[Figure: static agent system versus mobile agent system]
SOURCE: MITSUBISHI
Cooperating Agents
SOURCE: PETER FINGAR
Applications
• Intelligent freight planning
• TeleTruck DFKI GmbH Saarbrücken
SOURCE: K. FISCHER
Key Takeaways
• Agents are the wave of the future
– laziness + information overload = agents
• Agent systems are object-oriented and distributed
• Agents are mobile
• Agents negotiate with and talk to other agents
Q&A