Emerging Architectures
Architecting for Causal Intelligence at Nanoscale
Csaba Andras Moritz
Santosh Khasanvis (PhD student)
2016 Copyright © Csaba Andras Moritz and Santosh Khasanvis – All rights reserved
Outline
An example of unconventional architecture with emerging nanotechnology
• One of the 5 selected papers for the IEEE Computer “Rebooting Computing” Special Issue, December 2015
Introduction
Emerging opportunities with recent advances in critical research areas
• Personalized medicine, big data analytics, cyber-security, etc.
• Cognitive computing frameworks such as Bayesian networks (BNs) may be helpful
Challenges
• High computational complexity; requires persistence
• Implementation on CMOS von Neumann microprocessors is inefficient
  • Layers of abstraction, emulation on deterministic Boolean logic, rigid separation of memory and computation
Rethink computing from the ground up, leveraging emerging nanotechnology
• Architecting with Physical Equivalence – as direct a mapping as possible of the conceptual framework to the physical layer
• Disruptive technology: potential for orders-of-magnitude efficiency gains
• This talk: Architecting for probabilistic reasoning with BNs
Bayesian Networks (BNs)
Probabilistic modeling of domain knowledge for reasoning under uncertainty
Graphical representation of a domain
• Structure: Directed Acyclic Graph (DAG); nodes are domain variables (with several states); edges are relationships/dependences between variables
• Parameters: Conditional probability distributions (or tables; CPTs) capture the strength of each relationship
• Inference task: Find the probability of unobserved variables given observed quantities (evidence)
Bayesian Networks are graphs that represent domain knowledge using probabilities and involve probability computations for inference
[Figure: example BN for lung-cancer diagnosis, showing evidence nodes and the inferred belief BEL(lung cancer)]
Adapted from slides by Irina Rish, IBM – “A Tutorial on Inference and Learning in Bayesian Networks”
Available online: http://www.ee.columbia.edu/~vittorio/Lecture12.pdf
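The inference task above can be illustrated with a minimal sketch. This is a hypothetical two-node network (Cancer → X-ray) with made-up CPT values, not an example from the talk; it shows how a belief over an unobserved variable follows from evidence via Bayes' rule.

```python
# Hypothetical two-node BN: Cancer -> Xray. All probability values below are
# illustrative assumptions, chosen only to demonstrate the inference task.
p_cancer = {True: 0.01, False: 0.99}              # prior P(Cancer)
p_xray_given_cancer = {True: 0.90, False: 0.05}   # P(Xray = positive | Cancer)

def belief_cancer(xray_positive: bool) -> float:
    """BEL(Cancer) = P(Cancer | Xray evidence), by direct Bayes' rule."""
    def likelihood(c: bool) -> float:
        p = p_xray_given_cancer[c]
        return p if xray_positive else 1.0 - p
    num = p_cancer[True] * likelihood(True)
    den = sum(p_cancer[c] * likelihood(c) for c in (True, False))
    return num / den

print(round(belief_cancer(True), 4))  # posterior rises well above the 1% prior
```

A positive test raises the belief from the 1% prior to roughly 15%, which is exactly the kind of probability computation the rest of the talk maps into hardware.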
Overview of Approach: Architecting for Causal Intelligence
Architectural Approach
• Reconfigurable Bayesian Cell architecture to map Bayesian Networks
Information Encoding
• Probabilities tied to the physical layer, encoded in electrical signals/S-MTJ resistances used in circuits
Circuit Framework
• Mixed-signal hybrid circuits (S-MTJ + CMOS)
• Direct computation on probabilities (memory built in)
• Bayesian Cells incorporate these circuits
Physical Layer
• Non-volatile straintronic magnetic tunnel junctions (S-MTJs) + CMOS
Outline
Technology Overview: Nanoscale Straintronic MTJs (S-MTJs)
Physically Equivalent Intelligent System for Reasoning with BNs
• Data Encoding: Mapping probabilities in the physical layer
• Circuit Framework: Mixed-signal circuits operating on probabilities for Bayesian computations
• Reconfigurable Bayesian Cell Architecture for BN mapping
Evaluation
Summary
Non-Volatile Straintronic-MTJ (S-MTJ)
[Figure: device structure schematic, circuit schematic, and device characteristics — input voltage vs. resistance, switching between Rhigh and Rlow]
A. K. Biswas, Prof. Bandyopadhyay, Prof. Atulasimha, Virginia Commonwealth Univ.
Voltage-controlled magneto-electric devices
Stacked nanomagnets separated by a spacer layer: resistance depends on the relative magnetization orientation of the nanomagnets
Strain-based switching
A. K. Biswas, S. Bandyopadhyay and J. Atulasimha, “Energy-efficient magnetoelastic non-volatile memory,” Appl. Phys. Lett., 104, 232403,
2014.
Outline
Technology Overview: Nanoscale Straintronic MTJs
Physically Equivalent Intelligent System for Reasoning with BNs
• Data Encoding: Mapping probabilities physically using S-MTJs
• Circuit Framework: Mixed-signal circuits operating on probabilities for Bayesian computations
• Reconfigurable Bayesian Cell Architecture for BN mapping
Evaluation
Summary
Encoding Probability
Represented as a non-Boolean flat probability vector of spatially distributed digits p1, p2, …, pn
• Resolution = 1/n, where n is the number of digits
Physical Equivalence: Direct correlation to S-MTJ resistances and electrical signals
• E.g., using 10 digits, pi ∈ {0, 1} ↔ resistance ri ∈ {ROFF, RON} ↔ voltages Vi1, Vi2 ∈ {0V, 40mV}
[Figure: P = 0.4 encoded in 10 digits — p1–p4 = 1 with r1–r4 = Rlow, p5–p10 = 0 with r5–r10 = Rhigh, shown both as equivalent digital voltages and as equivalent S-MTJ resistances]
Digit pi is related to the S-MTJ resistance ri through the device constants β and ε
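The encoding can be sketched behaviorally. This is an assumed functional model, following the 10-digit example: a probability P at resolution 1/n becomes a flat vector of n binary digits, each digit tied to one S-MTJ resistance state (the RON/ROFF values below are placeholders, not device data).

```python
# Behavioral sketch of the flat probability-vector encoding (assumed model).
R_ON, R_OFF = 2.0e6, 4.0e6   # illustrative Rlow/Rhigh values in ohms (assumed)

def encode(p: float, n: int = 10) -> list[int]:
    """Spatially distributed digits: round(p*n) digits set to 1, rest to 0."""
    k = round(p * n)
    return [1] * k + [0] * (n - k)

def to_resistances(digits: list[int]) -> list[float]:
    """Digit 1 -> low resistance (RON), digit 0 -> high resistance (ROFF)."""
    return [R_ON if d else R_OFF for d in digits]

def decode(digits: list[int]) -> float:
    """Recover P as the fraction of ON digits (resolution 1/n)."""
    return sum(digits) / len(digits)

vec = encode(0.4)    # [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(decode(vec))   # 0.4
```

With 10 digits, any probability is quantized to the nearest multiple of 0.1, which is the 0.1 resolution used in the evaluation later in the talk.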
Circuit Framework
Unconventional magneto-electric mixed-signal circuit framework
Physical Equivalence: Directly implements Bayesian computations on probabilities using underlying circuit principles in the analog domain
• Input: digital; Output: analog
Approach
• Operate on spatial digital probability vectors that are converted into an analog representation of a single probability value; this circuit is referred to as a Probability Composer
• Probability Addition and Multiplication Composers internally use Probability Composers
• Cascade computational blocks for Bayesian functions: enabled by Decomposers*
Incorporates S-MTJs + CMOS support for mixed-signal computations on probabilities
* S. Khasanvis, et al., “Self- similar magneto-electric nanocircuit technology for probabilistic inference engines,” IEEE Transactions on Nanotechnology,
Special Issue on Cognitive Computing with Nanotechnology, in press, 2015.
Probability Composer Circuit
Needed to convert the spatial probability representation (digital) into an analog quantity representing the total probability value in the current/voltage domain
Parallel topology of S-MTJs; the effective resistance encodes the probability
• Individual S-MTJ resistances are set using digital voltages as shown earlier
• RPC – effective resistance; ri – resistance of the i-th S-MTJ; P – encoded probability value; n – number of digits = number of S-MTJs; β, ε – S-MTJ device parameters
Probability Composer: a collection of S-MTJs; non-volatile
Resistance read-out using a reference voltage; the probability value is encoded in 1/RPC and read out as a current/voltage: Vout = Iout·RL, with RL << RPC
[Figure: simulated output characteristics (HSPICE) — output voltage vs. input probability for 10 S-MTJs (VREF = 1V, RL = 100KΩ, RPC = 2-4MΩ, Radj = 4MΩ), with curves for all S-MTJs OFF, 1 S-MTJ ON, and 2 S-MTJs ON]
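The parallel-topology read-out can be sketched numerically. This is an idealized behavioral model, not the circuit: it assumes two-state S-MTJs with the placeholder RON/ROFF values below, combines them in parallel (1/RPC = Σ 1/ri), and inverts that relation to recover the encoded probability.

```python
# Idealized Probability Composer read-out (assumed two-state device model).
R_ON, R_OFF = 2.0e6, 4.0e6   # illustrative Rlow/Rhigh in ohms (assumed)
N = 10                       # number of digits = number of S-MTJs

def effective_resistance(digits: list[int]) -> float:
    """Parallel combination of the S-MTJ resistances: 1/RPC = sum(1/ri)."""
    return 1.0 / sum(1.0 / (R_ON if d else R_OFF) for d in digits)

def read_probability(rpc: float) -> float:
    """Invert 1/RPC = k/R_ON + (N-k)/R_OFF to recover P = k/N, where k
    is the number of ON (low-resistance) S-MTJs."""
    k = (1.0 / rpc - N / R_OFF) / (1.0 / R_ON - 1.0 / R_OFF)
    return k / N

digits = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # encodes P = 0.4
rpc = effective_resistance(digits)
print(round(read_probability(rpc), 3))    # 0.4
```

The key property this illustrates is the linearity stated on the slide: 1/RPC grows linearly with the number of ON digits, so the conductance directly encodes P.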
Elementary Arithmetic Composer Circuits
Addition Composer Circuit: current addition based on Ohm’s law; Vout = Iout·RL
Multiplication Composer Circuit: input PA in the voltage domain, input PB as an S-MTJ resistance; Vout = Iout·RL
[Figure: simulated output characteristics (HSPICE) — output voltage vs. sum of probabilities (addition) and output voltage vs. output probability (multiplication)]
Combining Elementary Composers: Add-Multiply
Example: Pout = Pa·Pb + Pc·Pd; typical in BN inference computations
• ADD{ MUL(Pa, Pb), MUL(Pc, Pd) }: two levels of hierarchical instantiation
• Elementary Composers = MUL, arranged in a topology self-similar to ADD (the Dominator Composer)
[Figure: Add-Multiply Composer circuit and simulated output characteristics (HSPICE) — output voltage vs. output probability]
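The hierarchical instantiation above can be mirrored in a purely functional sketch: two Multiplication Composers feeding an Addition Composer. This is an idealized behavioral model of what the circuits compute, not a circuit-level description.

```python
# Behavioral model of ADD{MUL(Pa, Pb), MUL(Pc, Pd)} (idealized, lossless).

def mul_composer(pa: float, pb: float) -> float:
    """Multiplication Composer: one input in the voltage domain, the other as
    an S-MTJ resistance; behaviorally, the output encodes the product."""
    return pa * pb

def add_composer(p1: float, p2: float) -> float:
    """Addition Composer: current addition (Ohm's law) sums the encoded values."""
    return p1 + p2

def add_multiply(pa: float, pb: float, pc: float, pd: float) -> float:
    # Two levels of hierarchical instantiation, as on the slide.
    return add_composer(mul_composer(pa, pb), mul_composer(pc, pd))

print(round(add_multiply(0.5, 0.4, 0.3, 0.2), 2))  # 0.5*0.4 + 0.3*0.2 = 0.26
```

Sum-of-products terms of exactly this shape appear in the belief-propagation messages discussed next, which is why the Add-Multiply Composer is a natural building block.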
Outline
Technology Overview: Nanoscale Straintronic MTJs
Physically Equivalent Intelligent System for Reasoning with BNs
• Data Encoding: Mapping probabilities in the physical layer
• Circuit Framework: Mixed-signal circuits operating on probabilities for Bayesian computations
  • Elementary Arithmetic Composers
  • Inference in BNs: Belief Propagation algorithm overview
  • Composers for BN inference operations
• Reconfigurable Bayesian Cell Architecture for BN mapping
Evaluation
Summary
Bayesian Inference: Pearl’s Belief Propagation
Compute the belief P(Xi | E) based on evidence E using local computations and message propagation (repeated application of Bayes’ rule)
Each node maintains
• Conditional probability tables (CPTs): CPTjk(Xi) = P(Xi = j | Pa(Xi) = k)
• Likelihood λ(Xi) = P(E− | Xi) and prior π(Xi) = P(Xi | E+) vectors
• Belief vector BEL(Xi) = P(Xi | E)
Local node computations using messages from neighbors
• λ messages from child to parent to compute λ(Xi)
• π messages from parent to child nodes for π(Xi)
• BEL(Xi) = α·λ(Xi)·π(Xi), where α is a normalizing constant
Applicable to trees and poly-trees
J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1988.
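The message-passing step can be made concrete with a minimal sketch on a three-node chain A → B → C with binary variables, illustrative CPT values (assumed, not from the talk), and evidence observed at the leaf C. Node B combines the π message from its parent with the λ message from its child.

```python
# Minimal Pearl belief propagation on a chain A -> B -> C (illustrative CPTs).
P_A   = [0.6, 0.4]                   # prior over A
CPT_B = [[0.9, 0.1], [0.2, 0.8]]     # P(B | A), rows indexed by A's state
CPT_C = [[0.7, 0.3], [0.1, 0.9]]     # P(C | B), rows indexed by B's state

evidence_c = 1                       # observed state of the leaf C

# pi message into B from parent A: pi_B[b] = sum_a P(a) * P(b | a)
pi_B = [sum(P_A[a] * CPT_B[a][b] for a in range(2)) for b in range(2)]

# lambda message into B from child C: likelihood of the evidence for each B state
lambda_B = [CPT_C[b][evidence_c] for b in range(2)]

# BEL(B) = alpha * lambda(B) * pi(B), with alpha normalizing the vector
unnorm = [pi_B[b] * lambda_B[b] for b in range(2)]
alpha = 1.0 / sum(unnorm)
bel_B = [alpha * u for u in unnorm]
print(bel_B)
```

Note that the π computation is a sum of products (Add-Multiply Composer territory) while the belief update is an elementwise product (Multiplication Composers), matching the mapping on the next slide.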
Composer Circuits for BN Inference Operations
Uses either elementary arithmetic composers or combinations of them
• Operations: Likelihood Estimation, Prior Estimation, Belief Update, Diagnostic Support to the parent, Predictive Support to child nodes
• Add-Multiply Composers for Prior Estimation and Diagnostic Support
• Multiplication Composers for Likelihood Estimation, Belief Update, and Predictive Support
Outline
Technology Overview: Nanoscale Straintronic MTJs
Physically Equivalent Intelligent System for Reasoning with BNs
• Data Encoding: Mapping probabilities in the physical layer
• Circuit Framework: Mixed-signal circuits operating on probabilities for Bayesian computations
• Reconfigurable Bayesian Cell Architecture for BN mapping
Evaluation
Summary
Physically Equivalent Architecture for BNs
Physical Equivalence: Every node in the DAG is mapped to a Bayesian Cell in hardware, incorporating non-volatile Arithmetic Composers for Bayesian computations
Reconfigurable links using Switch Boxes (similar to FPGAs) to map any BN structure
Persistence in configuration + computation through non-volatile Composers; no need for external memory
Outline
Technology Overview: Nanoscale Straintronic MTJs
Physically Equivalent Intelligent System for Reasoning with BNs
Evaluation
• Methodology
• System-level evaluation for BN inference using the physically equivalent framework
• Analytical modeling of BN inference performance on CMOS multi-core processors, and comparison
Summary
Example Bayesian Graph to Estimate System-level Performance
Assuming a balanced binary tree structure for system-level performance estimation
• Each parent has 2 child nodes; each node has 4 states (applications like gene-expression networks require 3*)
• All leaf nodes are treated as evidence variables
• Total number of nodes scaled from ~100 to ~1 million
[Figure: balanced binary tree with the root at level n−1 and leaf (evidence) nodes at level 0]
BN inference execution time is estimated from the worst-case critical path delay (TBC) in each Bayesian Cell and the Switch Box communication delay (TSB)
For a Bayesian Network with n levels (active nodes in a time-step operate in parallel):
Texec = (2n − 1) × TBC + Tcomm
* N. Friedman, M. Linial, I. Nachman, and D. Pe'er, “Using Bayesian networks to analyze expression data,” J. Comput. Biol., 7(3-4), pp. 601-620, 2000.
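The execution-time model is simple enough to evaluate directly. A small sketch: TBC defaults to the 998.2 ns worst-case Bayesian Cell path delay reported in the talk, while Tcomm is left as a free parameter since it depends on the Switch Box delays along the path.

```python
# Texec = (2n - 1) * TBC + Tcomm for an n-level balanced binary tree.

def t_exec(n_levels: int, t_bc_ns: float = 998.2, t_comm_ns: float = 0.0) -> float:
    """Estimated worst-case BN inference time in nanoseconds; active nodes
    within a time-step are assumed to operate fully in parallel."""
    return (2 * n_levels - 1) * t_bc_ns + t_comm_ns

# ~1M nodes in a balanced binary tree corresponds to about 20 levels (2^20 ~ 1M),
# so ignoring Tcomm the cell delays alone contribute roughly 39 microseconds.
print(round(t_exec(20), 1))  # 38929.8
```

This linear scaling in the number of levels (i.e., logarithmic in the number of nodes) is what drives the large speedups over sequential multi-core execution reported in the comparison slides.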
Evaluation Methodology for BN Composer Circuits
Delay and power measured using HSPICE simulations
• HSPICE behavioral macromodels built for the S-MTJs
• Collaboration: data provided by the VCU group (Prof. Atulasimha, Prof. Bandyopadhyay)
Area determined by the number of S-MTJs + CMOS support
• Accounting for S-MTJ spacing (500nm center-to-center) to minimize magnetic interactions; low dipole-coupling energy implies minimal magnetic interaction
[Figure: dipole coupling vs. S-MTJ center-center distance; S-MTJ cell area]

Module | Area (μm²) | Critical Path Delay (ns) | Worst-case Power (μW)
Likelihood Estimation (Multiplication Composers ×4) | 144 | 20 | 4.57
Belief Update (Multiplication Composers ×4) | 144 | 20 | 4.57
Prior Estimation (Add-Multiply Composers ×4) | 137 | 50 | 11.24
Diagnostic Support (Add-Multiply Composers ×4) | 137 | 50 | 11.24
Predictive Support (Multiplication Composers ×8) | 144 | 40 | 9.14
Decomposer (×60) | 132.9 | 240 | 11.37
CMOS Op-Amp (×176) | 100 | 95.4 | 89.32
Switch Box | 10 | 398.8 | 0.85
Path Delays within Bayesian Cell for Inference
[Figure: Bayesian Cell for node X with parent A and child nodes Y and Z, showing all possible paths for information flow — λ messages from the children, the π message from the parent, the BEL computation, π messages to the children, and the λ message to the parent]

Path Label | Total Path Delay (ns)
1 | 746.8
2 | 754.2
3 | 998.2
4 | 991.2

Worst-case delay gives TBC
Implementation of BNs on Multi-core Processors
Hardware platform: Multi-core processor (100 cores) based on TILEPro from Tilera Corp.*
Lower-bound execution time analytically estimated from computation + memory requirements for inference using the Belief Propagation algorithm
• Maximum idealized parallelism and operation cost; no network contention, no synchronization cost
Power and area taken from specifications
[Figure: architecture of a Tilera 100-core processor]
* “Tile Processor Architecture Overview for the TILEPro Series,” Doc No. UG120, Feb. 2013, Tilera Corporation.
* C. Ramey, “TILE-Gx100 manycore processor: Acceleration interfaces and architecture,” Aug. 2011, Tilera Corporation.
Comparison vs. Multi-Core Processors
[Figure: delay comparison for Bayesian inference (log scale) — speedups of 12x, 80x, and up to 8686x over the 100-core processor for the proposed architecture (PEAR)]
Comparison vs. Multi-Core Processors (contd.)
[Figure: power comparison (log scale), showing 4788x efficiency (Power × Delay); area comparison (log scale)]
Summary
Physically equivalent intelligent system for probabilistic reasoning using Bayesian Networks (BNs)
• Architected from the ground up and enabled by emerging nanotechnology
• Probability-encoding-based mixed-signal magneto-electric circuit framework
• Reconfigurable Bayesian Cell architecture
Up to 8686x inference speed-up and 4788x lower energy vs. a 100-core processor, for BNs with ~1M nodes at resolution 0.1
Reasoning/learning tasks on complex problems with a million variables made feasible
Embeds real-time intelligence capabilities at smaller scales (100s of variables) everywhere
Thank you
Acknowledgements
Collaboration with Prof. Atulasimha, Prof. Bandyopadhyay, VCU
Sponsored by National Science Foundation (CCF-1407906, ECCS-1124714, CCF-1216614,
CCF-1253370)