Emerging Architectures
Architecting for Causal Intelligence at Nanoscale
Csaba Andras Moritz, Santosh Khasanvis (PhD student), 2016
Copyright C. Andras Moritz and S. Khasanvis – All rights reserved

An example of an unconventional architecture with emerging nanotechnology
• One of the 5 selected papers for the IEEE Computer "Rebooting Computing" Special Issue, December 2015

Introduction
Emerging opportunities with recent advances in critical research areas
• Personalized medicine, big data analytics, cyber-security, etc.
• Cognitive computing frameworks such as Bayesian networks (BNs) may be helpful
Challenges
• High computational complexity; inference requires persistence of state
• Implementation on CMOS von Neumann microprocessors is inefficient: layers of abstraction, emulation on deterministic Boolean logic, rigid separation of memory and computation
Rethink computing from the ground up, leveraging emerging nanotechnology
• Architecting with Physical Equivalence – as direct a mapping as possible from the conceptual framework to the physical layer
• Disruptive technology: potential for orders-of-magnitude efficiency gains
• This talk: architecting for probabilistic reasoning with BNs

Bayesian Networks (BNs)
Probabilistic modeling of domain knowledge for reasoning under uncertainty
Graphical representation of a domain
• Structure: directed acyclic graph (DAG); nodes are domain variables (each with several states); edges are relationships/dependences between variables
• Parameters: conditional probability distributions (or tables, CPTs) capture the strength of each relationship
• Inference task: find the probability of unobserved variables given observed quantities (evidence)
Bayesian networks are graphs that represent domain knowledge using probabilities; inference involves probability computations
[Figure: example BN with evidence node D and inferred belief BEL(lung cancer). Adapted from slides by Irina Rish, IBM – "A Tutorial on Inference and Learning in Bayesian Networks", available online: http://www.ee.columbia.edu/~vittorio/Lecture12.pdf]

Overview of Approach: Architecting for Causal Intelligence
Architectural approach
• Reconfigurable Bayesian Cell architecture to map Bayesian networks
Information encoding
• Probabilities tied to the physical layer, encoded in the electrical signals and S-MTJ resistances used in circuits
Circuit framework
• Mixed-signal hybrid circuits (S-MTJ + CMOS)
• Direct computation on probabilities (memory built in)
• Bayesian Cells incorporate these circuits
Physical layer
• Non-volatile straintronic magnetic tunnel junctions (S-MTJs) + CMOS

Outline
Technology Overview: Nanoscale Straintronic MTJs (S-MTJs)
Physically Equivalent Intelligent System for Reasoning with BNs
• Data Encoding: mapping probabilities in the physical layer
• Circuit Framework: mixed-signal circuits operating on probabilities for Bayesian computations
• Reconfigurable Bayesian Cell Architecture for BN mapping
Evaluation
Summary

Non-Volatile Straintronic MTJ (S-MTJ)
Voltage-controlled magneto-electric devices
• Stacked nanomagnets separated by a spacer layer: resistance depends on the relative magnetization orientation of the nanomagnets (Rhigh or Rlow)
• Strain-based switching
[Figure: device structure schematic, circuit schematic (inputs V1, V2, Vh), and input-voltage vs. resistance characteristics. Devices from A. K. Biswas, Prof. Bandyopadhyay, and Prof. Atulasimha, Virginia Commonwealth University.]
A. K. Biswas, S. Bandyopadhyay and J. Atulasimha, "Energy-efficient magnetoelastic non-volatile memory," Appl. Phys. Lett., 104, 232403, 2014.
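As a software illustration of the device behavior just described, the sketch below models an S-MTJ as a voltage-controlled, non-volatile two-state resistor. The resistance values, the switching threshold, and the `apply` interface are illustrative assumptions for this sketch, not measured device data.

```python
# Behavioral toy model of a straintronic MTJ (S-MTJ): a two-state
# resistor whose state is written by a differential input voltage and
# retained when the voltage is removed (non-volatility).
# All numeric values below are placeholders, not device parameters.

R_LOW, R_HIGH = 2.0e5, 2.0e7   # ohms (assumed Rlow / Rhigh)
V_SWITCH = 0.02                 # volts (assumed switching threshold)

class SMTJ:
    def __init__(self):
        self.low_state = False   # magnetization state, retained without power

    def apply(self, v1, v2):
        """Strain-based write: a large enough differential voltage flips
        the state; a small one leaves the stored state unchanged."""
        dv = v1 - v2
        if dv > V_SWITCH:
            self.low_state = True      # switch to Rlow
        elif dv < -V_SWITCH:
            self.low_state = False     # switch to Rhigh
        # |dv| <= V_SWITCH: no change -> non-volatile retention

    @property
    def resistance(self):
        return R_LOW if self.low_state else R_HIGH
```

Reading the device is then just a matter of sensing `resistance`; removing the write voltage (`apply(0, 0)`) leaves the stored state intact, which is what lets the later composer circuits double as memory.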
Encoding Probability
Probability is represented as a non-Boolean flat vector of spatially distributed digits p1, p2, …, pn
• Resolution = 1/n, where n is the number of digits
• Physical equivalence: direct correlation to S-MTJ resistances and electrical signals
• E.g., using 10 digits: pi ∈ {0, 1} ↔ resistance ri ∈ {ROFF, RON} ↔ voltages Vi1, Vi2 ∈ {0 V, 40 mV}
• Digit pi is related to S-MTJ resistance ri through a device relation with constants β and ε [equation shown on slide]
[Figure: P = 0.4 encoded as the digit vector 1 1 1 1 0 0 0 0 0 0, i.e., r1–r4 = Rlow and r5–r10 = Rhigh, shown both as equivalent digital voltages and as equivalent S-MTJ resistances]

Circuit Framework
An unconventional magneto-electric mixed-signal circuit framework
• Physical equivalence: directly implements Bayesian computations on probabilities using the underlying circuit principles, in the analog domain
• Input: digital; output: analog
Approach
• Spatially distributed digital probability vectors are converted into an analog representation of a single probability value; the circuit that does this is referred to as a Probability Composer
• Probability Addition and Multiplication Composers internally use Probability Composers
• Computational blocks are cascaded for Bayesian functions, enabled by Decomposers*
• Incorporates S-MTJs + CMOS support for mixed-signal computations
* S. Khasanvis, et al., "Self-similar magneto-electric nanocircuit technology for probabilistic inference engines," IEEE Transactions on Nanotechnology, Special Issue on Cognitive Computing with Nanotechnology, in press, 2015.
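The encoding and the parallel read-out idea can be sketched numerically. This is a toy model of the scheme on the slides: a probability becomes a digit vector, digits become S-MTJ resistances, and the parallel conductance of the collection recovers the value. The resistance constants are assumptions, and the decode step uses the simplifying approximation R_OFF >> R_ON rather than the slide's exact device equation.

```python
# Toy numeric sketch of the flat probability-vector encoding and the
# parallel-composer read-out. R_ON / R_OFF values are illustrative.

R_ON, R_OFF = 2.0e5, 2.0e7   # ohms (assumed low / high S-MTJ resistances)

def encode(p, n=10):
    """Encode probability p as n binary digits (resolution 1/n)."""
    k = round(p * n)
    return [1] * k + [0] * (n - k)       # e.g. 0.4 -> four 1s, six 0s

def to_resistances(digits):
    """Map each digit to an S-MTJ resistance, as on the encoding slide."""
    return [R_ON if d else R_OFF for d in digits]

def decode(resistances):
    """Parallel conductance G = sum(1/r_i). With R_OFF >> R_ON,
    G ~ k/R_ON, so the encoded value k/n is readable from 1/R_PC."""
    n = len(resistances)
    G = sum(1.0 / r for r in resistances)
    return round(G * R_ON) / n
```

Round-tripping `decode(to_resistances(encode(0.4)))` returns 0.4, mirroring how the composer's effective resistance R_PC carries the probability into the analog domain.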
Probability Composer Circuit
Converts the spatial (digital) probability representation into an analog quantity – a current/voltage representing the total probability value
• Parallel topology of S-MTJs; the effective resistance encodes the probability
• Individual S-MTJ resistances are set using digital voltages as shown earlier
• RPC – effective resistance; ri – resistance of the i-th S-MTJ; P – encoded probability value; n – number of digits = number of S-MTJs; β, ε – S-MTJ device parameters [equation shown on slide]
• Non-volatile: a Probability Composer is a collection of S-MTJs
• Read-out: the probability value is encoded in 1/RPC; the resistance is read out using a reference voltage, giving Vout = Iout·RL with RL << RPC
[Figure: simulated output characteristics (HSPICE) – output voltage vs. input probability with VREF = 1 V, RL = 100 kΩ, RPC = 2–4 MΩ, Radj = 4 MΩ, 10 S-MTJs; curves for all S-MTJs OFF, 1 S-MTJ ON, 2 S-MTJs ON, etc.]

Elementary Arithmetic Composer Circuits
Addition Composer circuit
• Sums probabilities by current addition; Vout = Iout·RL (Ohm's law)
Multiplication Composer circuit
• Input PA in the voltage domain; input PB as an S-MTJ resistance; Vout = Iout·RL
[Figure: simulated output characteristics (HSPICE) – output voltage vs. sum of probabilities (addition) and vs. output probability (multiplication)]

Combining Elementary Composers: Add-Multiply
Example: Pout = Pa·Pb + Pc·Pd, typical of BN inference computations
• ADD{MUL(Pa, Pb), MUL(Pc, Pd)}: two levels of hierarchical instantiation
• Elementary Composers (MUL) are arranged in a topology self-similar to ADD (the dominator composer)
[Figure: add-multiply composer circuit and simulated output characteristics (HSPICE) – output voltage vs. output probability]

Outline
Technology Overview: Nanoscale Straintronic MTJs
Physically Equivalent Intelligent System for Reasoning with BNs
• Data Encoding: mapping probabilities in the physical layer
• Circuit Framework: mixed-signal circuits operating on probabilities for Bayesian computations
• Elementary Arithmetic Composers
• Inference in BNs: Belief Propagation Algorithm Overview
• Composers for BN Inference Operations
• Reconfigurable Bayesian Cell Architecture for BN Mapping
Evaluation
Summary

Bayesian Inference: Pearl's Belief Propagation
Compute the belief P(Xi | E) from evidence E using local computations and message propagation
Each node maintains
• Conditional probability tables (CPTs): CPTjk(Xi) = P(Xi = j | Pa(Xi) = k)
• Likelihood λ(Xi) = P(E− | Xi) and prior π(Xi) = P(Xi | E+) vectors
• Belief vector BEL(Xi) = P(Xi | E)
Local node computations use messages from neighbors (repeated application of Bayes' rule)
• λ messages from children to a parent are used to compute λ(Xi)
• π messages from a parent to its children are used to compute π(Xi)
• BEL(Xi) = λ(Xi)·π(Xi)
Applicable to trees and polytrees
J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1988.

Composer Circuits for BN Inference Operations
Each inference operation uses either elementary arithmetic composers or combinations of them
• Add-Multiply Composers: prior estimation, diagnostic support to parent
• Multiplication Composers: likelihood estimation, belief update, predictive support to child nodes

Physically Equivalent Architecture for BNs
• Physical equivalence: every node in the DAG is mapped to a Bayesian Cell in hardware, incorporating non-volatile Arithmetic Composers for the Bayesian computations
• Reconfigurable links using switch boxes (similar to FPGAs) allow any BN structure to be mapped
• Persistence of both configuration and computation through non-volatile composers; no external memory needed

Evaluation
• Methodology
• System-level evaluation of BN inference using the physically equivalent framework
• Analytical modeling of BN inference performance on CMOS multi-core processors, and comparison

Example Bayesian Graph for System-Level Performance Estimation
A balanced binary tree structure is assumed
• Each parent has 2 child nodes; each node has 4 states (applications such as gene-expression networks require 3*)
• All leaf nodes are treated as evidence variables
• Total number of nodes scaled from ~100 to ~1 million
BN inference execution time is estimated from the critical-path delay (TBC) in each Bayesian Cell and the worst-case switch-box communication delay (TSB); active nodes in a time step operate in parallel. For a Bayesian network with n levels:
Texec = (2n − 1) × TBC + Tcomm
* N. Friedman, M. Linial, I. Nachman, and D. Pe'er, "Using Bayesian networks to analyze expression data," J. Comput. Biol., 7(3-4), pp. 601-20, 2000.

Evaluation Methodology for BN Composer Circuits
• Delay and power measured using HSPICE simulations; HSPICE behavioral macromodels built for the S-MTJs (collaboration: data provided by the VCU group, Prof. Atulasimha and Prof. Bandyopadhyay)
• S-MTJ spacing (500 nm center-to-center) chosen to minimize magnetic dipole coupling; the low coupling energy implies minimal magnetic interaction
• Module area determined by the number of S-MTJs plus CMOS support

Module | Area (μm²) | Critical Path Delay (ns) | Worst-case Power (μW)
Likelihood Estimation (Multiplication Composers ×4) | 144 | 20 | 4.57
Belief Update (Multiplication Composers ×4) | 144 | 20 | 4.57
Prior Estimation (Add-Multiply Composers ×4) | 137 | 50 | 11.24
Diagnostic Support (Add-Multiply Composers ×4) | 137 | 50 | 11.24
Predictive Support (Multiplication Composers ×8) | 144 | 40 | 9.14
Decomposer (×60) | 132.9 | 240 | 11.37
CMOS Op-Amp (×176) | 100 | 95.4 | 89.32
Switch Box | 10 | 398.8 | 0.85

Path Delays within a Bayesian Cell for Inference
[Figure: node X with parent A and children Y, Z; paths 1–4 cover all possible information flows – λ to parent, π from parent, λ from children, π to children, and the BEL update]
Path Label | Total Path Delay (ns)
1 | 746.8
2 | 754.2
3 | 998.2
4 | 991.2
Worst-case delay TBC = 998.2 ns

Implementation of BNs on Multi-core Processors
• Hardware platform: 100-core multi-core processor based on the TILEPro from Tilera Corp.*
• Lower-bound execution time analytically estimated from the computation and memory requirements of belief-propagation inference, assuming maximum idealized parallelism and operation cost, no network contention, and no synchronization cost
• Power and area taken from specifications
[Figure: architecture of a Tilera 100-core processor]
* "Tile Processor Architecture Overview for the TILEPro Series", Doc. No. UG120, Feb. 2013, Tilera Corporation.
* C. Ramey, "TILE-Gx100 manycore processor: Acceleration interfaces and architecture", Aug. 2011, Tilera Corporation.

Comparison vs. Multi-Core Processors
[Figure: log-scale delay comparison for Bayesian inference; speedup of the physically equivalent architecture (PEAR) over the 100-core processor ranges from 12× and 80× up to 8686×]

Comparison vs. Multi-Core Processors (contd.)
[Figure: log-scale power comparison (up to 4788× better power × delay efficiency) and log-scale area comparison]

Summary
A physically equivalent intelligent system for probabilistic reasoning with Bayesian networks (BNs)
• Architected from the ground up, enabled by emerging nanotechnology
• Probability-encoding-based mixed-signal magneto-electric circuit framework
• Reconfigurable Bayesian Cell architecture
Up to 8686× inference speed-up and 4788× lower energy vs. a 100-core processor, for BNs with ~1M nodes at resolution 0.1
• Makes reasoning/learning tasks on complex problems with a million variables feasible
• Enables embedded real-time intelligence at smaller scales (100s of variables) everywhere

Thank You – Acknowledgements
Collaboration with Prof. Atulasimha and Prof. Bandyopadhyay, VCU
Sponsored by the National Science Foundation (CCF-1407906, ECCS-1124714, CCF-1216614, CCF-1253370)
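As a closing software analogue of the local computation each Bayesian Cell performs in hardware, here is a minimal sketch of Pearl's belief update, BEL(X) ∝ λ(X)·π(X), for a 4-state node as used in the evaluation. The λ and π numbers are purely illustrative; the normalization constant is made explicit here even though the slides write the update without it.

```python
# Minimal sketch of the local belief update from Pearl's belief
# propagation: combine the likelihood vector lam (from children, E-)
# with the prior vector pi (from parents, E+) and normalize.

def belief(lam, pi):
    """BEL(x) = alpha * lam(x) * pi(x), normalized over the states."""
    raw = [l * p for l, p in zip(lam, pi)]
    z = sum(raw)                      # normalization constant alpha = 1/z
    return [r / z for r in raw]

# Illustrative 4-state node (4 states per node, as in the evaluation):
lam = [0.9, 0.3, 0.1, 0.1]           # likelihood P(E-|X), assumed values
pi  = [0.25, 0.25, 0.25, 0.25]       # prior P(X|E+), uniform for the example
bel = belief(lam, pi)                 # posterior belief over the 4 states
```

The elementwise product is exactly what the Multiplication Composers implement for the belief-update step; in the hardware framework the result stays resident in the non-volatile composers instead of being written back to a separate memory.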