Emerging Architectures
Architecting for Causal Intelligence at Nanoscale
Csaba Andras Moritz
Santosh Khasanvis (PhD student)
Copyright © 2016 C. Andras Moritz and Santosh Khasanvis – All rights reserved
Outline
 An example of unconventional architecture with emerging nanotechnology
• One of the 5 selected papers for the IEEE Computer “Rebooting Computing” Special Issue, December 2015
Introduction
 Emerging opportunities with recent advances in critical research areas
• Personalized medicine, big data analytics, cyber-security, etc.
• Cognitive computing frameworks such as Bayesian networks (BNs) may be helpful
 Challenges
• High computational complexity; persistence required
• Implementation on CMOS von Neumann microprocessors is inefficient: layers of abstraction, emulation on deterministic Boolean logic, rigid separation of memory and computation
 Rethink computing from the ground up, leveraging emerging nanotechnology
• Architecting with Physical Equivalence: as direct a mapping as possible from the conceptual framework to the physical layer
• Disruptive technology: potential for orders-of-magnitude efficiency gains
• This talk: architecting for probabilistic reasoning with BNs
Bayesian Networks (BNs)
 Probabilistic modeling of domain knowledge for reasoning under uncertainty
 Graphical representation of a domain
• Structure: directed acyclic graph; nodes  domain variables (with several states); edges  relationships/dependence between variables
• Parameters: conditional probability distributions (or tables; CPTs) capture the strength of each relationship
• Inference task: find the probability of unobserved variables given observed quantities (evidence)
Bayesian Networks are graphs representing domain knowledge using probabilities; inference involves probability computations
[Figure: example BN with evidence nodes; inference yields BEL(lung cancer)]
Adapted from slides by Irina Rish, IBM – “A Tutorial on Inference and Learning in Bayesian Networks”
Available online: http://www.ee.columbia.edu/~vittorio/Lecture12.pdf
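The inference task above can be sketched for the smallest possible case: one unobserved variable and one observed evidence variable, with the posterior computed by Bayes' rule. The numbers below are illustrative, not taken from the talk.

```python
# Illustrative two-variable inference (invented numbers): given evidence e,
# compute the belief in an unobserved variable c via Bayes' rule.

p_c = 0.01                 # prior P(c)
p_e_given_c = 0.9          # CPT entry P(e | c)
p_e_given_not_c = 0.05     # CPT entry P(e | not c)

# Bayes' rule: P(c | e) = P(e | c) * P(c) / P(e)
p_e = p_e_given_c * p_c + p_e_given_not_c * (1 - p_c)
bel = p_e_given_c * p_c / p_e
```

In a full BN this computation is repeated locally at each node, which is what makes a direct hardware mapping attractive.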
Overview of Approach: Architecting for Causal Intelligence
Architectural Approach
• Reconfigurable Bayesian Cell Architecture to map Bayesian Networks
Information Encoding
• Probabilities tied to the physical layer, encoded in electrical signals / S-MTJ resistances used in circuits
Circuit Framework
• Mixed-signal hybrid circuits (S-MTJ + CMOS)
• Direct computation on probabilities (memory in-built)
• Bayesian Cells incorporate these circuits
Physical Layer
• Non-volatile straintronic magnetic tunneling junctions (S-MTJs) + CMOS
Outline
 Technology Overview: Nanoscale Straintronic MTJs (S-MTJs)
 Physically Equivalent Intelligent System for Reasoning with BNs
• Data Encoding: Mapping probabilities in physical layer
• Circuit Framework: Mixed-signal circuits operating on probabilities for Bayesian computations
• Reconfigurable Bayesian Cell Architecture for BN Mapping
 Evaluation
 Summary
Non-Volatile Straintronic-MTJ (S-MTJ)
[Figure: S-MTJ device structure and circuit schematics, with device characteristics (input voltage vs. resistance, Rhigh/Rlow). A. K. Biswas, Prof. Bandyopadhyay, Prof. Atulasimha, Virginia Commonwealth Univ.]
 Voltage-controlled magneto-electric devices
 Stacked nanomagnets separated by a spacer layer: resistance depends on the relative magnetization orientation of the nanomagnets
 Strain-based switching
A. K. Biswas, S. Bandyopadhyay and J. Atulasimha, “Energy-efficient magnetoelastic non-volatile memory,” Appl. Phys. Lett., 104, 232403, 2014.
Outline
 Technology Overview: Nanoscale Straintronic MTJs
 Physically Equivalent Intelligent System for Reasoning with BNs
• Data Encoding: Mapping probabilities physically using S-MTJs
• Circuit Framework: Mixed-signal circuits operating on probabilities for Bayesian computations
• Reconfigurable Bayesian Cell Architecture for BN Mapping
 Evaluation
 Summary
Encoding Probability
 Represented as a non-Boolean flat probability vector of spatially distributed digits p1, p2, p3, …, pn
• Resolution = 1/n, where n: #digits
 Physical Equivalence: direct correlation to S-MTJ resistances and electrical signals
• E.g. using 10 digits, pi ∈ {0, 1} ↔ resistance ri ∈ {ROFF, RON} ↔ voltages Vi1, Vi2 ∈ {0V, 40mV}
[Figure: example encoding of P = 0.4 with 10 digits; digits p1–p4 = 1 (r1–r4 = Rlow) and p5–p10 = 0 (r5–r10 = Rhigh), shown as equivalent digital voltages and equivalent S-MTJ resistances]
 Digit pi is related to S-MTJ resistance ri through device constants β and ε
Circuit Framework
 Unconventional magneto-electric mixed-signal circuit framework
 Physical Equivalence: directly implements Bayesian computations on probabilities using underlying circuit principles in the analog domain
• Input: digital; Output: analog
 Approach
• Spatial probability digit vectors are converted into an analog representation of a single probability value  referred to as a Probability Composer
• Probability Addition and Multiplication Composers internally use Probability Composers
• Computational blocks are cascaded for Bayesian functions, enabled by Decomposers*
[Figure: framework incorporating S-MTJs + CMOS support for mixed-signal computations on probabilities]
* S. Khasanvis, et al., “Self-similar magneto-electric nanocircuit technology for probabilistic inference engines,” IEEE Transactions on Nanotechnology, Special Issue on Cognitive Computing with Nanotechnology, in press, 2015.
Probability Composer Circuit
 Needed to convert the spatial probability representation (digital) into an analog quantity representing the total probability value in the current/voltage domain
 Parallel topology of S-MTJs; the effective resistance encodes the probability
• Individual S-MTJ resistances are set using digital voltages as shown earlier
• RPC: effective resistance; ri: resistance of i-th S-MTJ; P: encoded probability value; n: no. of digits = no. of S-MTJs; β, ε: S-MTJ device parameters
 Probability Composer: collection of S-MTJs; non-volatile
 Probability value encoded in 1/RPC; resistance read out using a reference voltage into a read-out current/voltage: Vout = Iout·RL, with RL << RPC
[Figure: simulated output characteristics (HSPICE), output voltage vs. input probability for 10 S-MTJs; VREF = 1V, RL = 100KΩ, RPC = 2-4MΩ, Radj = 4MΩ; curves for all S-MTJs OFF, 1 S-MTJ ON, 2 S-MTJs ON]
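A behavioral sketch of the Probability Composer follows: n S-MTJs in parallel, so the effective conductance 1/RPC grows with the number of low-resistance (digit = 1) devices. VREF and RL are taken from the slide; the per-device resistances are assumed values chosen so that ten parallel devices give RPC in the slide's 2-4MΩ range.

```python
# Behavioral sketch of the Probability Composer: parallel S-MTJs whose
# effective conductance tracks the encoded probability, read out as a
# voltage. Per-device R_ON / R_OFF are assumed values (not device data).

R_ON, R_OFF = 20e6, 40e6   # assumed per-device low/high resistances (ohms)
V_REF = 1.0                # reference read-out voltage (from the slide)
R_L = 100e3                # load resistance, R_L << R_PC (from the slide)

def composer_resistance(digits):
    """Effective resistance R_PC of the parallel S-MTJ bank."""
    conductance = sum(1.0 / (R_ON if d else R_OFF) for d in digits)
    return 1.0 / conductance

def readout_voltage(digits):
    """Read-out: Vout = Iout * R_L with Iout ~ V_REF / R_PC."""
    return V_REF / composer_resistance(digits) * R_L
```

More digits set to 1 lower RPC and raise Vout, which is the monotone probability-to-voltage mapping the simulated characteristics show.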
Elementary Arithmetic Composer Circuits
Addition Composer Circuit
• Current addition; Vout = Iout·RL
Multiplication Composer Circuit
• Ohm’s law; input PA in the voltage domain, input PB as an S-MTJ resistance; Vout = Iout·RL
[Figure: simulated output characteristics (HSPICE), output voltage vs. sum of probabilities (addition) and vs. output probability (multiplication)]
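The two elementary operations can be sketched behaviorally: multiplication exploits Ohm's law (I = V/R, with PA encoded as a voltage and PB as a conductance), and addition sums currents at a node. The scale factors below are assumptions chosen so the output voltage numerically equals the probability.

```python
# Behavioral sketch of the elementary composers. v_scale, g_scale, and R_L
# are assumed values, picked so that g_scale * R_L = 1 and the read-out
# voltage numerically equals the resulting probability.

R_L = 100e3        # assumed load resistance for read-out

def multiply(pa, pb, v_scale=1.0, g_scale=1e-5):
    """Ohm's law: Iout = (PA * v_scale) * (PB * g_scale); Vout = Iout * R_L."""
    i_out = (pa * v_scale) * (pb * g_scale)
    return i_out * R_L           # proportional to PA * PB

def add(currents):
    """Current addition at a node: Vout = (sum of currents) * R_L."""
    return sum(currents) * R_L
```

Choosing g_scale · R_L = 1 keeps probabilities in [0, 1] through a multiply stage, mirroring how the circuit keeps values in a fixed analog range.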
Combining Elementary Composers: Add-Multiply
 Example: Pout = Pa·Pb + Pc·Pd, typical in BN inference computations
• ADD{ MUL(Pa, Pb), MUL(Pc, Pd) }; two levels of hierarchical instantiation
• Elementary Composers = MUL, arranged in a topology self-similar to ADD (Dominator Composer)
[Figure: Add-Multiply Composer circuit with simulated output characteristics (HSPICE), output voltage vs. output probability]
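The two-level hierarchy above reduces to a simple composition, shown here as a pure-arithmetic stand-in for the analog circuit behavior:

```python
# Arithmetic stand-in for the hierarchical Add-Multiply composition:
# Pout = Pa*Pb + Pc*Pd, expressed as ADD over two MUL composers.

def mul(pa, pb):
    return pa * pb

def add(*terms):
    return sum(terms)

def add_multiply(pa, pb, pc, pd):
    """Two-level hierarchy: ADD{ MUL(Pa, Pb), MUL(Pc, Pd) }."""
    return add(mul(pa, pb), mul(pc, pd))
```

For example, add_multiply(0.5, 0.4, 0.2, 0.3) evaluates 0.5·0.4 + 0.2·0.3.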
Outline
 Technology Overview: Nanoscale Straintronic MTJs
 Physically Equivalent Intelligent System for Reasoning with BNs
• Data Encoding: Mapping probabilities in physical layer
• Circuit Framework: Mixed-signal circuits operating on probabilities for Bayesian computations
  • Elementary Arithmetic Composers
  • Inference in BNs: Belief Propagation Algorithm Overview
  • Composers for BN Inference Operations
• Reconfigurable Bayesian Cell Architecture for BN Mapping
 Evaluation
 Summary
Bayesian Inference: Pearl’s Belief Propagation
 Compute belief P(Xi | E) based on evidence E using local computations and message propagation (repeated application of Bayes’ rule)
 Each node maintains
• Conditional probability tables (CPTs): CPTjk(Xi) = P(Xi = j | Pa(Xi) = k)
• Likelihood λ(Xi) = P(E− | Xi) and prior π(Xi) = P(Xi | E+) vectors
• Belief vector BEL(Xi) = P(Xi | E)
 Local node computations using messages from neighbors
• λ messages from child to parent to compute λ(Xi)
• π messages from parent to child nodes for π(Xi)
• BEL(Xi) = λ(Xi)·π(Xi)
 Applicable to trees and poly-trees
J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1988.
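Pearl's message passing can be sketched for the smallest tree, a parent X with one child Y carrying hard evidence. The CPT and prior below are invented for illustration; the λ message and belief update follow the standard algorithm (with BEL normalized).

```python
# Minimal sketch of Pearl's belief propagation on a two-node tree X -> Y
# with hard evidence on the child Y. Numbers are illustrative only.

import numpy as np

prior_x = np.array([0.7, 0.3])            # pi(X): prior over X's two states
cpt_y_given_x = np.array([[0.9, 0.1],     # rows: states of X
                          [0.2, 0.8]])    # cols: states of Y; P(Y | X)

lam_y = np.array([0.0, 1.0])              # evidence: Y observed in state 1

# lambda message from Y to X: lam_msg(x) = sum_y P(y | x) * lam(y)
lam_msg = cpt_y_given_x @ lam_y

# belief update: BEL(X) proportional to lam(X) * pi(X), then normalize
bel = lam_msg * prior_x
bel /= bel.sum()
```

Here lam_msg is [0.1, 0.8], so the evidence on Y pulls the belief strongly toward X's second state despite the prior favoring the first.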
Composer Circuits for BN Inference Operations
 Uses elementary arithmetic composers, alone or in combination, for:
• Likelihood Estimation
• Prior Estimation
• Belief Update
• Diagnostic Support to Parent
• Predictive Support to Child nodes
 Add-Multiply Composers for Prior Estimation and Diagnostic Support
 Multiplication Composers for Likelihood Estimation, Belief Update, and Predictive Support
Outline
 Technology Overview: Nanoscale Straintronic MTJs
 Physically Equivalent Intelligent System for Reasoning with BNs
• Data Encoding: Mapping probabilities in physical layer
• Circuit Framework: Mixed-signal circuits operating on probabilities for Bayesian computations
• Reconfigurable Bayesian Cell Architecture for BN Mapping
 Evaluation
 Summary
Physically Equivalent Architecture for BNs
 Physical Equivalence: every node in the DAG is mapped to a Bayesian Cell in H/W, incorporating non-volatile Arithmetic Composers for Bayesian computations
 Reconfigurable links using Switch Boxes (similar to FPGAs) to map any BN structure
 Persistence in configuration + computation through non-volatile Composers; no need for external memory
Outline
 Technology Overview: Nanoscale Straintronic MTJs
 Physically Equivalent Intelligent System for Reasoning with BNs
 Evaluation
• Methodology
• System-level Evaluation for BN Inference using Physically Equivalent Framework
• Analytical Modeling of BN Inference Performance on CMOS Multi-core Processors and Comparison
 Summary
Example Bayesian Graph to Estimate System-level Performance
 Assuming a balanced binary tree structure for system-level performance estimation
• Each parent has 2 child nodes; each node has 4 states (applications like gene expression networks require 3*)
• All leaf nodes are treated as evidence variables
 Total number of nodes scaled from ~100 to ~1 million
[Figure: balanced binary tree with the root at level n−1 and leaf (evidence) nodes at level 0]
 BN inference execution time estimated from the worst-case critical path delay (TBC) in each Bayesian Cell and the Switch Box communication delay (TSB)
 For a Bayesian Network with n levels (active nodes in a time-step operate in parallel):
Texec = (2n − 1) × TBC + Tcomm
* N. Friedman, M. Linial, I. Nachman, and D. Pe'er, “Using Bayesian networks to analyze expression data,” J. Comput. Biol., 7(3-4), pp. 601-620, 2000.
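The execution-time model can be sketched directly: one upward and one downward message sweep over n levels gives 2n − 1 cell steps. TBC below is the worst-case Bayesian Cell path delay reported later in the talk; the per-step Switch Box term folded into Tcomm is an assumed placeholder.

```python
# Sketch of Texec = (2n - 1) * T_BC + T_comm for an n-level balanced
# binary tree. T_BC is the worst-case cell delay from the talk (998.2 ns);
# the per-step communication delay T_SB is an assumed placeholder.

import math

T_BC = 998.2e-9    # worst-case Bayesian Cell path delay (seconds)
T_SB = 240e-9      # assumed Switch Box delay per step (placeholder)

def exec_time(num_nodes):
    """Estimated inference time for a balanced binary tree of num_nodes."""
    n = math.ceil(math.log2(num_nodes + 1))   # number of tree levels
    steps = 2 * n - 1                         # up + down message sweeps
    return steps * T_BC + steps * T_SB        # assumed: one SB hop per step
```

Because the per-level work is parallel, execution time grows only logarithmically in the node count, which is why ~1M-node networks remain tractable in this model.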
Evaluation Methodology for BN Composer Circuits
 Delay and power measured using HSPICE simulations
• HSPICE behavioral macromodels built for S-MTJs (collaboration: data provided by the VCU group, Prof. Atulasimha, Prof. Bandyopadhyay)
 S-MTJ spacing chosen to minimize magnetic interactions (dipole coupling)
• 500nm × 500nm S-MTJ cell area (500nm center-center distance); low coupling energy implies minimal magnetic interaction
 Area determined by number of S-MTJs + CMOS support

Module                                              | Area (μm²) | Critical Path Delay (ns) | Worst-case Power (μW)
Likelihood Estimation (Multiplication Composers ×4) | 144        | 20                       | 4.57
Belief Update (Multiplication Composers ×4)         | 144        | 20                       | 4.57
Prior Estimation (Add-Multiply Composers ×4)        | 137        | 50                       | 11.24
Diagnostic Support (Add-Multiply Composers ×4)      | 137        | 50                       | 11.24
Predictive Support (Multiplication Composers ×8)    | 144        | 40                       | 9.14
Switch Box                                          | 132.9      | 240                      | 11.37
Decomposer (×60)                                    | 100        | 95.4                     | 89.32
CMOS Op-Amp (×176)                                  | 10         | 398.8                    | 0.85
Path Delays within Bayesian Cell for Inference
[Figure: Bayesian Cell for node X (parent A; children Y, Z) showing all possible paths for information flow, labeled 1–4, through the BEL, λ-to-parent, π-from-parent, λ-from-child, and π-to-child computations]

Path Label | Total Path Delay (ns)
1          | 746.8
2          | 754.2
3          | 998.2
4          | 991.2

Worst-case delay TBC = 998.2 ns
Implementation of BNs on Multi-core Processors
 Hardware platform: multi-core processor (100 cores) based on TILEPro from Tilera Corp.*
 Lower-bound execution time analytically estimated from computation + memory requirements for inference using the Belief Propagation algorithm
• Maximum idealized parallelism and operation cost; no network contention, no synchronization cost
 Power and area from specifications
[Figure: architecture of a Tilera 100-core processor]
* “Tile Processor Architecture Overview for the TILEPro Series”, Doc No. UG120, Feb. 2013, Tilera Corporation.
* C. Ramey, “TILE-Gx100 manycore processor: Acceleration interfaces and architecture”, Aug. 2011, Tilera Corporation.
Comparison vs. Multi-Core Processors
[Figure: delay comparison for Bayesian inference (log scale); speedups over 100-core processors of 12x, 80x, and up to 8686x (PEAR)]
Comparison vs. Multi-Core Processors (contd.)
[Figure: power and area comparisons (log scale); 4788x efficiency (power × delay)]
Summary
 Physically equivalent intelligent system for probabilistic reasoning using Bayesian Networks (BNs)
• Architected from the ground up and enabled by emerging nanotechnology
• Probability-encoding-based mixed-signal magneto-electric circuit framework
• Reconfigurable Bayesian Cell architecture
 Up to 8686x inference speed-up and 4788x lower energy vs. a 100-core processor, for BNs with ~1M nodes at resolution 0.1
 Reasoning/learning tasks on complex problems with a million variables made feasible
 Embeds real-time intelligence capabilities everywhere, at smaller scales (100s of variables)
Thank you
Acknowledgements
 Collaboration with Prof. Atulasimha, Prof. Bandyopadhyay, VCU
 Sponsored by the National Science Foundation (CCF-1407906, ECCS-1124714, CCF-1216614, CCF-1253370)