CS717
Application of AI- and ML-Techniques
to Fault-Tolerant Routing
Arjun Rao
CS 717
November 16 and 18, 2004
CS717
Papers Covered
• [1] Loh, Peter K.K., “Artificial Intelligence
Search Techniques as Fault-Tolerant Routing
Strategies”
• [2] Loh, Shaw, “A Genetic-Based Fault-Tolerant Routing Strategy for Multiprocessor
Networks”
CS717
Papers Covered (cont.)
• [3] Loh, Schröder, Hsu, “Fault-Tolerant
Routing on Complete Josephus Cubes” (not
AI-related but interesting nevertheless)
If time permits, also:
• [4] Bradley, Tyrrell, “Immunotronics:
Hardware Fault Tolerance Inspired by the
Immune System”
CS717
The Problem of Routing
• Communication between nodes
– Servers
– Microprocessors
• Desire shortest, most efficient paths
– Multiprocessor network topologies, e.g.
hypercubes, Josephus cubes, etc.
• Desire availability of paths
– What to do when links/nodes fail?
– How to remain (close to) optimal?
CS717
Intro to Fault-Tolerant Routing
• Current algorithms adaptive but non-minimal
• Misrouting
• Routing strategies tied to specific topologies
– k-ary n-cubes, meshes, etc.: Regular structures
and symmetry
– Constrained by fault number and types
• More general strategies vulnerable to
deadlock and livelock
CS717
“Turn Model” [Glass, Ni]
• Widest application scope
– k-ary n-cubes, nD-meshes, torus geometries, etc.
• “West-First” algorithm (on 2D-mesh)
– Messages prevented from turning “west” again
– Prevents cycles, and therefore deadlocks
– Routing along virtual channels in strictly
decreasing or increasing order
CS717
Turn Model and Channel Numbering
CS717
Turn Model (cont.)
• Three examples of
routing
• “F” = FAILURE
• Full adaptation w/o
deadlock and livelock
requires more global
info → more overhead
CS717
AI Search Techniques
• Arbitrary topology → Search space
• Search space → Search tree(s)
• Adaptive but still non-minimal
• Characteristic recursion impractical on
loosely-coupled, distributed network
CS717
AI Logical Abstraction
• Abstraction:
– S: Problem space
– O: Set of objectives
– P: Search paths
– S = (O, P), where oi ∈ O and pj ∈ P; each pj connects a tuple (ok, ol), k ≠ l
• Abstraction used to model…
CS717
Multiprocessor Network w/ Generic
Topology
• Network
– N: Nodes
– L: Links between nodes
– G = (N, L), where ni ∈ N and lj ∈ L; each lj connects a tuple (nk, nl), k ≠ l
• Objective → Node
• Search path → Link
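A minimal sketch of the G = (N, L) model in Python. The class name, the method names, and the choice to treat links as undirected are assumptions made for illustration; the slides only state that each link connects two distinct nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """G = (N, L): a set of nodes and a set of links between distinct nodes."""
    nodes: set = field(default_factory=set)
    links: set = field(default_factory=set)  # each link stored as frozenset({nk, nl})

    def add_link(self, nk, nl):
        # The model requires k != l, so self-links are rejected.
        if nk == nl:
            raise ValueError("a link must connect two distinct nodes")
        self.nodes.update((nk, nl))
        self.links.add(frozenset((nk, nl)))

    def neighbors(self, n):
        # Nodes reachable from n over a single link (assumed undirected).
        return sorted(m for link in self.links if n in link for m in link if m != n)


g = Network()
g.add_link(0, 1)
g.add_link(1, 2)
print(g.neighbors(1))
```

Under the analogy on this slide, each node plays the role of an objective and each link the role of a search path, so a search over this structure is a routing attempt.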
CS717
Abstract Routing Model
• Search:
– (os, ot): S × S → S*, where S = (O, P) and S* = (O*, P*)
– ox, oy ∈ O and ox, oy ∈ O* → Successful search
– ox, oy ∈ O and ox ∈ O*, oy ∉ O* → Unsuccessful search
• Routing attempt R:
– R(ns, nd): G × G → G*, where G = (N, L) and G* = (N*, L*)
– ni, nj ∈ N and ni, nj ∈ N* → Complete route
– ni, nj ∈ N and ni ∈ N*, nj ∉ N* → Incomplete route
CS717
Routing Analogy
• AI search equivalent to routing attempt
• Successful search → Route between source
and destination nodes
• Unsuccessful search → Incomplete route to
destination
CS717
Caveats of Analogy
• No specific search algorithm → No routing
strategy
• No optimality constraints
• Nothing about deadlocks/livelocks
• Nothing about fault tolerance!!
CS717
Fault-Tolerant Routing Model
• Model considers two aspects:
– Routing system configuration
• Must be generic enough!
– Message propagation protocols and policies
• Following slides introduce what is needed for
AI searches (w/ physical message
backtracking)
CS717
FT Routing Model (cont.)
CS717
FT Routing Model (cont.)
• Eager readership of input messages
• Single input buffer to avoid polling
• Multiple output buffers to accommodate different
delivery rates
• Router process:
– AI/FT routing strategy implemented here
– Physical message backtracking → Increased message sizes
– Increased message sizes/overhead → Requires
communications router at each node
CS717
Communications Router
CS717
Communications Router (cont.)
• Communication router constitutes router
process and connections
• Main components: LCM and CP
• ROM: Stores link management and routing
software
• RAM: Stores routing table, link status table,
associated link lists
CS717
CR Data Structure: Routing Table
CS717
CR Routing Table
• For each node, up to n links
• For each link:
– Connected with status OK and node ID of
neighbor
– Not connected with status NC and node ID –1
• Link fault represented by timeout:
– Status reset to NC
• Processor fault represented by timeouts in
neighbors
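The table rules above can be sketched as follows. The function names are hypothetical, and clearing the neighbor ID to –1 on a timeout is an assumption (the slide only says the status is reset to NC).

```python
OK, NC = "OK", "NC"
NO_NEIGHBOR = -1  # node ID recorded for a link that is not connected


def make_table(max_links):
    # One (status, neighbor ID) entry per link; all links start unconnected.
    return [(NC, NO_NEIGHBOR)] * max_links


def connect(table, link, neighbor_id):
    table[link] = (OK, neighbor_id)


def on_timeout(table, link):
    # A link fault shows up as a timeout: the entry falls back to NC.
    # Assumption: the neighbor ID is cleared to -1 at the same time.
    table[link] = (NC, NO_NEIGHBOR)


def live_neighbors(table):
    return [node_id for status, node_id in table if status == OK]


table = make_table(8)
connect(table, 0, 17)
connect(table, 3, 42)
on_timeout(table, 0)
print(live_neighbors(table))
```

A processor fault needs no separate entry in this sketch: every neighbor of the failed processor sees a timeout on its own link and marks that link NC.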
CS717
CR Data Structures: Link Status Table,
Lists
CS717
Message Packets
• Six fields:
– Router Control (4 bits): Type of message,
including NORMAL and BACKTRACK
– Destination Node ID (10 bits): Supports network
of size up to 1024 nodes
– Pending Nodes (20 bytes): Stack of node IDs that
may receive packet but have not yet
– Traversed Nodes (20 bytes): Stack of nodes
traversed, with most recent on top
CS717
Message Packets (cont.)
– Traversed Nodes Index (10 bits): Index to
previous traversed nodes field. Supports
simulation of physical message backtracking
– Data Field (n-bit pointer): Points to information
content of packet
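A rough Python model of the packet header, useful for reasoning about its limits. The numeric encodings of NORMAL and BACKTRACK, the field names, and the tight packing of 10-bit node IDs into the 20-byte stacks are all assumptions, not details taken from the paper.

```python
from dataclasses import dataclass, field

NORMAL, BACKTRACK = 0, 1          # hypothetical encodings of Router Control
MAX_NODE_ID = 2 ** 10 - 1         # a 10-bit destination field addresses 1024 nodes
STACK_CAPACITY = (20 * 8) // 10   # 10-bit IDs that fit in 20 bytes if tightly packed


@dataclass
class Packet:
    destination: int
    data: object = None                             # stands in for the n-bit data pointer
    router_control: int = NORMAL
    pending: list = field(default_factory=list)     # stack, top is the last element
    traversed: list = field(default_factory=list)   # stack, most recent node last
    traversed_index: int = 0

    def __post_init__(self):
        if not 0 <= self.destination <= MAX_NODE_ID:
            raise ValueError("destination does not fit in 10 bits")


p = Packet(destination=5)
print(MAX_NODE_ID, STACK_CAPACITY, p.router_control)
```

If the packing assumption holds, each stack holds at most 16 node IDs, which bounds how much search state a packet can carry.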
CS717
(Finally) AI Search Strategies
• Brute Force:
– Depth-First Search
– Random Climbing
• Heuristic:
– Hill Climbing
– Best-First Search
– A*
CS717
AI Search Strategies (cont.)
• In presence of network faults:
– Prevent cycles → No deadlocks
– Prevent more than two traversals of nodes/links →
No livelocks and necessary for AI searches
• Adaptations of search algorithms
• Problems:
– Recursion? Nope (PMB)
– Overhead? Fixed (Well, mostly…)
CS717
Common Beginning
Extract header and disassemble it
IF Destination Node is reached, pass packet to host processor
ELSE
    IF Router Control is BACKTRACK
        IF Pending Nodes top node is directly linked
            Route packet to that node
            Set Router Control to NORMAL
        ELSE
            Backtrack packet to previous node in Traversed Nodes
    Pop current node ID from Pending Nodes
    Push current node ID onto Traversed Nodes
CS717
Depth-First Search
• Travel as far as possible
– Do not consider alternative paths just yet
• If fault or dead-end, backtrack to most recent
possible path
CS717
DFS (cont.)
Following common beginning:
Look for directly linked successor nodes
    IF they are already traversed, ignore
    ELSE IF they are in Pending Nodes, ignore
    ELSE push them onto Pending Nodes
Read top node of Pending Nodes
    IF directly linked (no fault), route packet to it
    ELSE Set BACKTRACK and route to last traversed node
END
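The DFS routing steps can be simulated centrally as in the sketch below. It is a simplification under stated assumptions: faults are static link faults, the Traversed Nodes stack is split into a current path plus a visited set, and the function and variable names are hypothetical. It returns every hop the packet makes, including backtracking hops.

```python
def dfs_route(adj, src, dst, faulty=frozenset()):
    """Simulate DFS routing with physical backtracking.

    adj maps each node to its neighbors; faulty holds failed links as
    frozenset({a, b}). Returns the hop sequence, or None if dst is unreachable.
    """
    def link_up(a, b):
        return b in adj[a] and frozenset((a, b)) not in faulty

    pending = []          # Pending Nodes stack (top is the last element)
    path = [src]          # nodes on the current partial route
    traversed = {src}     # every node the packet has visited
    hops = [src]
    backtracking = False
    while path[-1] != dst:
        current = path[-1]
        if not backtracking:
            # Push successors that are neither traversed nor already pending.
            for n in adj[current]:
                if link_up(current, n) and n not in traversed and n not in pending:
                    pending.append(n)
        if not pending:
            return None   # nothing left to try
        top = pending[-1]
        if link_up(current, top):
            pending.pop()
            traversed.add(top)
            path.append(top)
            backtracking = False
        else:
            path.pop()    # physically send the packet back one hop
            if not path:
                return None
            backtracking = True
        hops.append(path[-1])
    return hops


adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(dfs_route(adj, 0, 3))
print(dfs_route(adj, 0, 3, faulty={frozenset((2, 3))}))
```

In the second call the packet reaches node 2, finds the link to node 3 down, backtracks to node 0, and then routes through node 1, which shows why the resulting route is adaptive but not minimal.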
CS717
DFS Example
CS717
DFS Example (cont.)
CS717
Random Climbing
Following the common beginning:
…
ELSE
Select a successor node randomly
Push unselected successor nodes onto
Pending Nodes
…
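A sketch of the random climbing step only; the elided parts of the slide are left out. The function name and the `rng` parameter are hypothetical, and `successors` is assumed to be a non-empty list of distinct node IDs.

```python
import random


def random_climb_step(pending, successors, rng=random):
    """Pick one successor at random and stack the others on Pending Nodes."""
    chosen = rng.choice(successors)
    # Unselected successors remain available if the packet must backtrack.
    pending.extend(n for n in successors if n != chosen)
    return chosen


pending = []
next_hop = random_climb_step(pending, [4, 7, 9], rng=random.Random(0))
print(next_hop, pending)
```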
CS717
Hill Climbing
• Heuristic: Estimated remaining distance
Following common beginning:
…
ELSE
Sort successor nodes according to est. remaining
distance
Push sorted nodes onto Pending Nodes
…
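The hill climbing step can be sketched as below. The function name and the heuristic `h` (estimated remaining distance to the destination) are hypothetical; only the new successors are sorted, so the closest one ends up on top of Pending Nodes.

```python
def hill_climb_push(pending, successors, h):
    """Push successors so the one with the smallest estimate h(n) is on top."""
    # Descending order of h puts the most promising node last, i.e. on top.
    pending.extend(sorted(successors, key=h, reverse=True))
    return pending


def h(n):
    # Toy heuristic: nodes on a line, destination at position 10.
    return abs(10 - n)


pending = hill_climb_push([3], [7, 12, 9], h)
print(pending, "next hop:", pending[-1])
```

Because older entries in `pending` are never re-sorted, the choice at each hop is purely local, which matches the remark later in the talk that hill climbing considers only immediate neighbors.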
CS717
Best-First Search
• Resumes partial routes not previously
considered
• Looks at immediate neighbors, neighbors of
predecessors
– Sorts by est. remaining distance
• Leads to non-minimal routes!
CS717
Best-First Search (cont.)
…
ELSE
Push (directly linked successor nodes) onto
Pending Nodes
Sort Pending Nodes according to est. remaining
distance
…
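A sketch of the best-first step, again with a hypothetical function name and heuristic `h`. In this reading, the difference from hill climbing is that the whole Pending Nodes list is re-sorted, so an older partial route can come back on top.

```python
def best_first_push(pending, successors, h):
    """Add successors, then sort all pending nodes so the best estimate is on top."""
    pending.extend(successors)
    pending.sort(key=h, reverse=True)   # smallest h(n) last, i.e. on top
    return pending


def h(n):
    # Toy heuristic: nodes on a line, destination at position 10.
    return abs(10 - n)


pending = best_first_push([11], [7, 4], h)
print(pending, "next hop:", pending[-1])
```

Here node 11 was pending from an earlier hop and beats both new successors, so the packet would resume that partial route. Reaching it may require backtracking, which is one way the resulting route becomes non-minimal.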
CS717
A*
• Two heuristics:
– Estimated remaining distance: h
– Path length traversed: g
• Partial paths sorted by f = g + h
• When no faults, always finds minimal route
CS717
A* (cont.)
After current ID processing:
Record path length traversed, g
…
ELSE
Calculate and store f for new successor
nodes
Push them onto Pending Nodes sorted by f
…
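For reference, a textbook A* search with f = g + h, written centrally rather than as the per-node packet adaptation described on the slide. Unit link cost, the function name, and the example heuristic table are assumptions; with an admissible h this form returns a minimal route over the links that are still up.

```python
import heapq


def a_star(adj, src, dst, h, faulty=frozenset()):
    """Return a shortest route from src to dst avoiding faulty links, or None."""
    frontier = [(h(src), 0, src, [src])]      # entries are (f, g, node, path)
    best_g = {src: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == dst:
            return path
        if g > best_g.get(node, float("inf")):
            continue                          # stale entry, a shorter path was found
        for n in adj[node]:
            if frozenset((node, n)) in faulty:
                continue                      # skip failed links
            g2 = g + 1                        # path length traversed so far
            if g2 < best_g.get(n, float("inf")):
                best_g[n] = g2
                heapq.heappush(frontier, (g2 + h(n), g2, n, path + [n]))
    return None


adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
hops_to_3 = {0: 2, 1: 1, 2: 1, 3: 0}          # estimated remaining distance h
print(a_star(adj, 0, 3, hops_to_3.get))
print(a_star(adj, 0, 3, hops_to_3.get, faulty={frozenset((1, 3))}))
```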
CS717
Performance Testing
• Simulated 125-node multiprocessor network
• Max 8 links per node (maps to many
topologies)
• Faulty links and processors
– Pre-specified or dynamically generated
• Testing:
– Messages between every pair of nodes
– 20 trials at 0%, 5%, 10%, 15%, 20% faulty links
– 125 x 125 x 20 x 6 = 1,875,000 tests (??)
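A quick check of the test count. The slide lists five fault levels but multiplies by six, which may be what the presenter's "(??)" refers to; which figure matches the paper cannot be determined from the slide. Counting 125 x 125 treats source and destination as an ordered pair and includes a node paired with itself.

```python
nodes, trials = 125, 20
fault_levels = [0, 5, 10, 15, 20]              # percentages listed on the slide

per_level = nodes * nodes * trials             # messages per fault level
slide_total = per_level * 6                    # factor used on the slide
listed_total = per_level * len(fault_levels)   # using the five listed levels

print(per_level, slide_total, listed_total)
```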
CS717
Test Results
• As faults increase, heuristic strategies fare
better (esp. > 15%)
• A* best search technique but slow
• Hill climbing and best-first search do not consider nodes
traversed
– Hill climbing considers only immediate neighbors
CS717
Test Results (cont.)
CS717
Main Point
Using AI search techniques, we abstract from
routing in networks to searching in trees
(topology-independent, quantity and type of
faults irrelevant)
CS717
Next Paper
• [1] Loh, Peter K.K., “Artificial Intelligence
Search Techniques as Fault-Tolerant Routing
Strategies”
• [2] Loh, Shaw, “A Genetic-Based Fault-Tolerant Routing Strategy for Multiprocessor
Networks”
CS717
Our Little Problem…
• AI search techniques topology- and fault-type
independent…
• …but non-minimal routes utilized
• Follow-up work shows how genetic
algorithms (combined with heuristics) can find
minimal routes in presence of network faults
CS717
Genetic Algorithms: Overview
• Optimization strategy
• Population of potential solutions evolve over
series of generations
• Each element of population is chromosome;
each unit of chromosome is gene
• Chromosomes undergo crossover and
mutation
• Most fit chromosomes selected for next
generation, based upon fitness function
CS717
Abstract Model
• Same as before (including definitions of S
and G)
• Pure abstraction suffers from same caveats
as before
• Basic idea: Instead of AI search for adaptive
route, optimize over population of routes to
find best
CS717
Message Packets
• Simplified version:
CS717
Chromosome
• Route → Chromosome
• Node on route → Gene in chromosome
• Length of route → Size of chromosome
– Chromosome size directly reflects routing
performance!
• Distance traversed basis of fitness
CS717
Population Creation
CS717
Mutation and Crossover
• Mutation: Swap and/or shift
• Normal crossover destroys routes, messes
with source and destination; problem w/
different lengths
– Use one-point random crossover
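One way a one-point crossover can keep routes valid is to cut both parents at a node they share, so each child is still a contiguous route with the same source and destination. This is an illustrative assumption; the slide does not say how the paper chooses the crossover point, and the function name is hypothetical.

```python
import random


def one_point_crossover(route_a, route_b, rng=random):
    """Swap the tails of two routes at a randomly chosen shared intermediate node."""
    common = [n for n in route_a[1:-1] if n in route_b[1:-1]]
    if not common:
        # No shared intermediate node: return unchanged copies.
        return list(route_a), list(route_b)
    cut = rng.choice(common)
    i, j = route_a.index(cut), route_b.index(cut)
    # Children may differ in length and may revisit a node.
    return route_a[:i] + route_b[j:], route_b[:j] + route_a[i:]


a = [0, 1, 4, 5, 9]
b = [0, 2, 4, 6, 9]
print(one_point_crossover(a, b))
```

Cutting at a shared node sidesteps the two problems named on the slide: the endpoints are preserved, and parents of different lengths can still be combined.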
CS717
Fitness Function
• F = (Dmax – Droute) / Dmax + constant
– Dmax: Maximum distance between source and
destination
– Droute: Distance traveled by specific route
– constant: Predefined value to ensure non-zero fitness
• Higher value → More fit
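The fitness function is simple enough to write out directly. The parameter name `offset` and its default of 0.01 are hypothetical stand-ins for the predefined value; the slide does not give a number.

```python
def fitness(d_route, d_max, offset=0.01):
    """F = (Dmax - Droute) / Dmax + offset; shorter routes score higher."""
    if d_max <= 0:
        raise ValueError("d_max must be positive")
    return (d_max - d_route) / d_max + offset


print(fitness(4, 10), fitness(8, 10), fitness(10, 10))
```

A route as long as Dmax would score zero without the offset, which would make it impossible to select under fitness-proportional schemes.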
CS717
Selection Scheme
• Roulette Wheel
– Sum of fitness values * random value from [0,1]
– Select chromosomes with fitness greater than product
• Tournament Selection
– Most fit chromosomes selected
• Stochastic Remainder
– Probabilities used to select route
• Which scheme has best performance selecting
optimal route?
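For comparison, the standard cumulative-sum form of roulette wheel selection is sketched below. It draws the same product (sum of fitness values times a random value) but compares it against a running total, which may differ from the exact procedure used in the paper; the function name is hypothetical.

```python
import random


def roulette_select(population, fitnesses, rng=random):
    """Pick one chromosome with probability proportional to its fitness."""
    threshold = sum(fitnesses) * rng.random()   # sum of fitness * random in [0, 1)
    running = 0.0
    for chromosome, fit in zip(population, fitnesses):
        running += fit
        if running > threshold:
            return chromosome
    return population[-1]                       # guard against rounding error


routes = [[0, 1, 9], [0, 2, 4, 9], [0, 3, 5, 6, 9]]
scores = [0.61, 0.41, 0.21]
print(roulette_select(routes, scores, rng=random.Random(1)))
```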
CS717
Reroute
CS717
Genetic Hybrid Algorithm