Equivalence Verification of Large Galois Field Arithmetic Circuits using Word-Level Abstraction via Gröbner Bases ∗

Tim Pruss (ECE, University of Utah), Priyank Kalla (ECE, University of Utah), Florian Enescu (Math & Stats, Georgia State University)

ABSTRACT
Custom arithmetic circuits designed over Galois fields $\mathbb{F}_{2^k}$ are prevalent in cryptography, where the field size $k$ is very large (e.g. $k = 571$ bits). Equivalence checking of such large custom arithmetic circuits against baseline golden models is beyond the capabilities of contemporary techniques. This paper addresses the problem by deriving word-level canonical polynomial representations from gate-level circuits as $Z = \mathcal{F}(A)$ over $\mathbb{F}_{2^k}$, where $Z$ and $A$ represent the output and input bit-vectors of the circuit, respectively. Using algebraic geometry, we show that the canonical polynomial abstraction can be derived by computing a Gröbner basis of a set of polynomials extracted from the circuit, using a specific elimination (abstraction) term order. By efficiently applying these concepts, we can derive the canonical abstraction in hierarchically designed, custom arithmetic circuits with up to 571-bit datapath, whereas contemporary techniques can verify only up to 163-bit circuits.

Categories and Subject Descriptors: B.6.3 [Logic Design]: design aids – verification
General Terms: Verification, Arithmetic Circuits
Keywords: Hardware Verification, Word-Level Abstraction, Gröbner Bases

1. INTRODUCTION
Arithmetic circuits designed over Galois fields of the type $\mathbb{F}_{2^k}$ find application in areas such as hardware security, cryptography, error-correction codes, and VLSI testing, among others. In such applications, the field size, and thus the circuit datapath size ($k$), can be very large. For example, the US National Institute of Standards and Technology (NIST) recommends fields $\mathbb{F}_{2^k}$ corresponding to $k = 163, 233, 283, 409$, and $571$ for Elliptic Curve Cryptography (ECC).
The large size and high complexity of such architectures necessitate custom hierarchical design [1] [2]. Custom design raises the potential for bugs in the implementation. As bugs can compromise the security of cryptosystems [3], formal verification of Galois field circuits becomes imperative. Verification of these circuits is very challenging, as custom architectures are usually structurally very dissimilar from the baseline specification (golden) models. Contemporary verification techniques [4] (including the recent approaches targeted at Galois field circuits [5]) are unable to prove equivalence of such large circuits.

This paper presents an automatic combinational equivalence verification approach for very large Galois field arithmetic circuits. At the core of our approach is a symbolic method to derive the word-level, canonical, polynomial representation from a given combinational circuit. It employs concepts from commutative algebra and algebraic geometry, notably the theory of Gröbner bases [6], to derive the word-level abstraction. The approach is well-suited to arithmetic circuits that are hierarchically designed and to cases where the verification instances are structurally dissimilar.

∗ This research is funded in part by NSF grants CCF-1320335 and CCF-1320385.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
DAC '14, June 01 - 05 2014, San Francisco, CA, USA. ACM 978-1-4503-2730-5/14/06 ...$15.00. http://dx.doi.org/10.1145/2593069.2593134.
Verification Problem: Given: i) a Galois field $\mathbb{F}_{2^k}$, along with the primitive polynomial $P(X)$ used for its construction; ii) the golden model circuit $C_1$ (called Spec); iii) the custom implementation $C_2$ (called Impl), along with any available design hierarchy. Prove or disprove the functional equivalence $C_1 \equiv C_2$; i.e. prove whether or not $C_1, C_2$ implement the same (polynomial) function over $\mathbb{F}_{2^k}$.

To solve this problem, we analyze the circuits $C_1, C_2$ separately, and derive unique canonical polynomial representations $\mathcal{F}_1, \mathcal{F}_2$, respectively. The equivalence test is then performed by simply matching the coefficients of $\mathcal{F}_1, \mathcal{F}_2$. The polynomial extraction approach is based on the following novel mathematical insights.

The mathematical framework: A combinational circuit $C$ with $k$-bit inputs and $k$-bit outputs implements Boolean functions that are mappings between $k$-dimensional Boolean spaces: $f : \mathbb{B}^k \to \mathbb{B}^k$, where $\mathbb{B} = \{0, 1\}$. The function $f$, which is a mapping among $2^k$ elements, can also be construed as a function over a Galois field of $2^k$ elements, $f : \mathbb{F}_{2^k} \to \mathbb{F}_{2^k}$. There is a well-known "textbook" result [7] which states that: i) over a Galois field $\mathbb{F}_q$ of $q$ elements, every function $f : \mathbb{F}_q \to \mathbb{F}_q$ is a polynomial function; and ii) there exists a unique canonical polynomial $\mathcal{F}$ that describes $f$. Motivated by this fundamental result, we devise an approach to derive a word-level, canonical, polynomial abstraction of the function as $Z = \mathcal{F}(A)$ over $\mathbb{F}_{2^k}$, where $Z = \{z_0, \dots, z_{k-1}\}$ and $A = \{a_0, \dots, a_{k-1}\}$ are, respectively, the output and input bit-vectors (words) of the circuit, and $\mathcal{F}$ denotes a polynomial representation of the circuit's functionality. The approach easily generalizes to circuits with an arbitrary number of word-level inputs, i.e. (multivariate) functions $f : \mathbb{F}_{2^k}^n \to \mathbb{F}_{2^k}$ represented by a polynomial $Z = \mathcal{F}(A_1, \dots, A_n)$. The polynomial $\mathcal{F}$ can be derived by means of the Lagrange interpolation formula [7] [8].
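As a small illustration of the interpolation route, the sketch below recovers the canonical polynomial of a function over a tiny field using the standard formula $\mathcal{F}(X) = \sum_{a \in \mathbb{F}_q} f(a)\,(1 - (X - a)^{q-1})$. It is not the method of this paper; the helper names (`gf_mul`, `poly_mul`, `interpolate`, `poly_eval`) and the integer encoding of field elements are assumptions made for the example.

```python
def gf_mul(a, b, k, poly):
    """Multiply a and b in F_{2^k}. Elements are k-bit ints (bit i is the
    coefficient of alpha^i); poly is P(x) as a bit mask including x^k."""
    r = 0
    while b:
        if b & 1:
            r ^= a              # addition in characteristic 2 is XOR
        b >>= 1
        a <<= 1
        if (a >> k) & 1:        # degree reached k: reduce modulo P(x)
            a ^= poly
    return r


def poly_mul(p, q, k, poly):
    """Product of two polynomials in X with coefficients in F_{2^k}
    (coefficient lists, lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] ^= gf_mul(pi, qj, k, poly)
    return out


def interpolate(f, k, poly):
    """Coefficients of F(X) = sum_a f(a) * (1 - (X - a)^(q-1)), q = 2^k.
    Every field element is visited, so the cost is exponential in k."""
    q = 1 << k
    coeffs = [0] * q
    for a in range(q):
        fa = f(a)
        if fa == 0:
            continue
        term = [1]
        for _ in range(q - 1):              # (X + a)^(q-1); minus equals plus here
            term = poly_mul(term, [a, 1], k, poly)
        term[0] ^= 1                        # 1 - (X - a)^(q-1)
        for i, t in enumerate(term):
            coeffs[i] ^= gf_mul(fa, t, k, poly)
    return coeffs


def poly_eval(coeffs, x, k, poly):
    """Evaluate the polynomial at x by Horner's rule."""
    r = 0
    for c in reversed(coeffs):
        r = gf_mul(r, x, k, poly) ^ c
    return r
```

For $\mathbb{F}_4$ with $P(x) = x^2 + x + 1$ (mask `0b111`), interpolating the squaring map returns `[0, 0, 1, 0]`, i.e. $X^2$. The loop over all $2^k$ elements is what rules this route out for cryptographic field sizes.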
However, interpolation requires analyzing $f$ over the entire field $\mathbb{F}_{2^k}$, which is exhaustive and infeasible. To make this approach practical, we propose a symbolic method based on computer algebra and algebraic geometry to derive the canonical polynomial abstraction and employ it for design verification.

Contributions: Using polynomial abstractions, we analyze the given circuits and model the gate-level Boolean operators as elements of a multivariate polynomial ring with coefficients in $\mathbb{F}_{2^k}$. By exploiting concepts of Nullstellensatz, Gröbner bases, elimination ideals and projections of varieties [6], we formulate the polynomial abstraction problem as one of computing a Gröbner basis of this set of polynomials, using a specific elimination term order, termed the abstraction term order $>$. Computing Gröbner bases using elimination orders is infeasible for large circuits. To overcome this limitation, we refine the term order based on a topological analysis of the circuit. Using this refinement, we guide the S-polynomial computations in Buchberger's algorithm [9] to derive the polynomial representation of the circuit's functionality. This approach identifies the function implemented by the given Galois field arithmetic circuits for verification. We experiment with different architectures of Galois field multipliers and show that: i) when the circuits are given as flattened netlists, we can abstract the polynomial for up to 409-bit NIST-specified fields; and ii) when the design hierarchy is available, our approach can identify the polynomial up to 571 bits, i.e. for all NIST-specified Galois fields $\mathbb{F}_{2^k}$ used in ECC. Our approach scales well for practical verification, whereas other techniques [5] fail beyond 163-bit circuits.

2. RELATED PREVIOUS WORK
Canonical Representations: The Reduced Ordered Binary Decision Diagram (ROBDD) [10], and its variants (FDDs, ADDs, BMDs, etc.), are canonical DAG representations of functions that are employed in design verification.
The various decomposition principles behind these diagrams are based on point-wise, binary decomposition w.r.t. each (Boolean) variable. As such, these do not fully provide word-level abstraction capabilities from bit-level representations. Taylor Expansion Diagrams (TEDs) [11] are a word-level canonical representation of a polynomial expression, but they do not represent a polynomial function canonically. MODDs [12] are a DAG representation of the characteristic function of a circuit over Galois fields $\mathbb{F}_{2^k}$. MODDs come close to satisfying our requirements as a canonical word-level representation that can be employed over Galois fields. However, MODDs do not scale well w.r.t. the circuit size: they are known to be infeasible for representing functions over bit-vectors larger than 32 bits [12].

Equivalence Checking: Modern equivalence checkers employ techniques based on AIG-based reductions [4] and circuit-SAT solvers [13]. Such techniques are able to identify internal structural equivalences between the Spec and Impl circuits and reduce the instances for verification. However, when the arithmetic circuits are structurally very dissimilar, these techniques fail to prove equivalence (Tables I and II in [5] depict such experiments).

Word-Level Verification of Galois field circuits: In [14], the authors present the BLUEVERI tool for verification of Galois field circuits for error correcting codes against an algorithmic spec. The implementation consists of a set of (pre-designed and verified) circuit blocks that are interconnected to form the system. Their objective is to prove the equivalence of the implementation against a "check file" (spec), for which they employ a Nullstellensatz and Gröbner basis formulation. In their setting, the polynomial function representation of the sub-circuit blocks is already available. In [5], Lv et al. present computer algebra techniques for formal verification of Galois field arithmetic circuits.
Given a specification polynomial $\mathcal{F}$ and a circuit $C$, they formulate the verification problem as an ideal membership test using Gröbner basis theory. Verification is performed by a sequence of divisions modulo the polynomials of the circuit. This approach moves the verification complexity solely to that of polynomial division, which results in the size explosion of intermediate remainders in the division. As a result, their approach does not scale beyond 163-bit circuits.

In contrast to [14] [5], we are not given the specification polynomial $\mathcal{F}$. Given the circuit $C$, we have to derive (extract) the word-level specification $\mathcal{F}$. Moreover, we perform a Gröbner basis computation on a subset of polynomials to derive the abstraction polynomial, which is the reason behind the success of our approach.

Polynomial Interpolation: Interpolation can be used to derive a polynomial representation for a function over $\mathbb{F}_{2^k}$. However, Newton's dense interpolation techniques exhibit very high complexity. While such techniques have been investigated by the logic synthesis and testing communities [8], they are feasible only over small fields, e.g. for computing Reed-Muller forms for multi-valued logic.

3. PRELIMINARIES
Galois fields and Polynomial functions: A Galois field $\mathbb{F}_q$ is a field with a finite number ($q$) of elements, where $q$ is a power of a prime integer, i.e. $q = p^k$, where $p$ is a prime integer and $k \geq 1$. We consider fields where $p = 2$ and $k > 1$, i.e. binary Galois extension fields $\mathbb{F}_{2^k}$, as they are employed in hardware implementations of cryptography primitives. To construct $\mathbb{F}_{2^k}$, we take the polynomial ring $\mathbb{F}_2[x]$, where $\mathbb{F}_2 = \{0, 1\}$, and an irreducible polynomial $P(x) \in \mathbb{F}_2[x]$ of degree $k$, and construct $\mathbb{F}_{2^k}$ as $\mathbb{F}_2[x] \pmod{P(x)}$. As a result, all field operations are performed modulo the irreducible polynomial $P(x)$ and the coefficients are reduced modulo $p = 2$.
Any element $A \in \mathbb{F}_{2^k}$ can be represented in polynomial form as $A = a_0 + a_1\alpha + \cdots + a_{k-1}\alpha^{k-1}$, where $a_i \in \mathbb{F}_2$, $i = 0, \dots, k-1$, and $\alpha$ is a root of the irreducible polynomial, i.e. $P(\alpha) = 0$. Note that $A$ is essentially represented as a $k$-bit vector. The field $\mathbb{F}_{2^k}$ can therefore be construed as a $k$-dimensional vector space over $\mathbb{F}_2$, so $\mathbb{F}_2 \subset \mathbb{F}_{2^k}$.

Polynomial Functions $f : \mathbb{F}_{2^k} \to \mathbb{F}_{2^k}$: Arbitrary mappings among $k$-bit vectors can be constructed; each such mapping generates a function $f : \mathbb{B}^k \to \mathbb{B}^k$. Every such function is also a polynomial function over Galois fields: $f : \mathbb{F}_{2^k} \to \mathbb{F}_{2^k}$.

THEOREM 3.1. From [7]: Any function $f : \mathbb{F}_q \to \mathbb{F}_q$ is a polynomial function over $\mathbb{F}_q$; that is, there exists a polynomial $\mathcal{F} \in \mathbb{F}_q[x]$ such that $f(a) = \mathcal{F}(a)$ for all $a \in \mathbb{F}_q$.

An important property of Galois fields is that for all elements $A \in \mathbb{F}_q$, $A^q = A$, and hence $A^q - A = 0$. Therefore, the polynomial $X^q - X$ vanishes on all points in $\mathbb{F}_q$. Consequently, any polynomial $\mathcal{F}(X)$ can be reduced $\pmod{X^q - X}$ to obtain a canonical representation $(\mathcal{F}(X) \pmod{X^q - X})$ with degree at most $q - 1$.

DEFINITION 3.1. Any function $f : \mathbb{F}_q^d \to \mathbb{F}_q$ has a unique canonical representation (UCR) as a polynomial $\mathcal{F} \in \mathbb{F}_q[x_1, \dots, x_d]$ such that all its nonzero monomials are of the form $x_1^{i_1} \cdots x_d^{i_d}$, where $0 \leq i_j \leq q - 1$ for all $j = 1, \dots, d$.

Modulo-multipliers over $\mathbb{F}_{2^k}$: Over Galois fields $\mathbb{F}_{2^k}$, multiplication is performed as $Z = A \times B \pmod{P(x)}$, where $A, B \in \mathbb{F}_{2^k}$ are $k$-bit inputs and $P(x)$ is the given irreducible polynomial. The multiplier circuit takes bit-level inputs $\{a_0, \dots, a_{k-1}, b_0, \dots, b_{k-1}\}$ and produces output $Z = \{z_0, \dots, z_{k-1}\}$, such that $A = \sum_{i=0}^{k-1} a_i \alpha^i$, $B = \sum_{i=0}^{k-1} b_i \alpha^i$ and $Z = \sum_{i=0}^{k-1} z_i \alpha^i$. First, the bit-wise multiplication $S = A \times B$ is computed using an array multiplier architecture, and then the result $S$ is reduced $\pmod{P(x)}$ to obtain $Z = S \pmod{P(x)}$. Such architectures are termed Mastrovito multipliers [15].
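The multiply-then-reduce structure described above can be sketched in software as follows. This is only a functional model under assumed conventions (field elements as Python integers with bit $i$ holding the coefficient of $\alpha^i$, and `poly` as the bit mask of $P(x)$ including the $x^k$ term); the function names are hypothetical and say nothing about the gate-level structure of an actual Mastrovito multiplier.

```python
def clmul(a, b):
    """Carry-less product S = A x B: polynomial multiplication over F_2."""
    s = 0
    while b:
        if b & 1:
            s ^= a
        a <<= 1
        b >>= 1
    return s


def reduce_mod(s, k, poly):
    """Reduce S modulo P(x), clearing bits k and above from the top down."""
    for i in range(s.bit_length() - 1, k - 1, -1):
        if (s >> i) & 1:
            s ^= poly << (i - k)
    return s


def gf_mul(a, b, k, poly):
    """Z = A x B (mod P(x)) in F_{2^k}: multiply first, then reduce."""
    return reduce_mod(clmul(a, b), k, poly)


def gf_pow(a, e, k, poly):
    """a^e in F_{2^k} by square-and-multiply."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a, k, poly)
        a = gf_mul(a, a, k, poly)
        e >>= 1
    return r
```

With $k = 2$ and $P(x) = x^2 + x + 1$, `gf_mul(0b10, 0b10, 2, 0b111)` gives `0b11`, i.e. $\alpha^2 = \alpha + 1$, and `gf_pow(a, 4, 2, 0b111) == a` holds for every element, which is the property $A^q = A$ stated above.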
Mastrovito multipliers are inefficient, especially for cryptosystems where multiplication is often performed repeatedly. For such applications, Montgomery reduction operations are proposed [1] [2]. Montgomery reduction (MR) computes: $MR(A, B) = A \cdot B \cdot R^{-1} \pmod{P(x)}$, where $A, B$ are $k$-bit inputs, $R$ is suitably chosen as $R = \alpha^k$, $R^{-1}$ is the multiplicative inverse of $R$ in $\mathbb{F}_{2^k}$, and $P(x)$ is the irreducible polynomial. Since Montgomery reduction cannot directly compute $A \cdot B \pmod{P(x)}$, we need to pre-compute $A \cdot R$ and $B \cdot R$, as shown in Fig. 1.

Figure 1: Montgomery multiplication over $\mathbb{F}_{2^k}$ using four MRs. (Labels recoverable from the figure: inputs $A$, $B$, $R^2$ and "1"; intermediate signals $AR$, $BR$, $ABR$; output $G = A \cdot B \pmod{P}$.)

Clearly, Montgomery multipliers are hierarchically designed as an interconnection of MR blocks (Fig. 1). These circuits are structurally dissimilar from the baseline Mastrovito multipliers. In this paper, Mastrovito and Montgomery multipliers are used as Spec and Impl benchmarks, respectively, for equivalence verification.

3.1 Computer Algebra Preliminaries
Let $\mathbb{F}_q[x_1, \dots, x_d]$ be the polynomial ring with indeterminates $x_1, \dots, x_d$, where $q = 2^k$. A monomial is a power product $X = x_1^{\alpha_1} \cdot x_2^{\alpha_2} \cdots x_d^{\alpha_d}$, where $\alpha_i \geq 0$, $i \in \{1, \dots, d\}$. A polynomial $f \in \mathbb{F}_q[x_1, \dots, x_d]$, $f \neq 0$, is a finite sum of terms $f = c_1 X_1 + c_2 X_2 + \cdots + c_t X_t$. Here $c_1, \dots, c_t$ are coefficients and $X_1, \dots, X_t$ are monomials. A monomial ordering $>$ is imposed on the ring such that $X_1 > X_2 > \cdots > X_t$. Subject to such an ordering, $lt(f) = c_1 X_1$, $lm(f) = X_1$, $lc(f) = c_1$ are the leading term, leading monomial and leading coefficient of $f$, respectively. Similarly, $tail(f) = c_2 X_2 + \cdots + c_t X_t$. Division of a polynomial $f$ by a polynomial $g$ gives a remainder polynomial $r$, denoted $f \xrightarrow{g}_+ r$. Similarly, $f$ can be reduced (divided) w.r.t. a set of polynomials $F = \{f_1, \dots, f_s\}$ to obtain a remainder $r$, denoted $f \xrightarrow{F}_+ r$, such that no term in $r$ is divisible by the leading term of any polynomial in $F$.

An ideal $J$ generated by polynomials $f_1, \dots, f_s \in \mathbb{F}_q[x_1, \dots, x_d]$ is: $J = \langle f_1, \dots, f_s \rangle = \{\sum_{i=1}^{s} h_i \cdot f_i : h_i \in \mathbb{F}_q[x_1, \dots, x_d]\}$. The polynomials $f_1, \dots, f_s$ form the basis or generators of $J$.

Let $a = (a_1, \dots, a_d) \in \mathbb{F}_q^d$ be a point, and $f \in \mathbb{F}_q[x_1, \dots, x_d]$ be a polynomial. We say that $f$ vanishes on $a$ if $f(a) = 0$. For any ideal $J = \langle f_1, \dots, f_s \rangle \subseteq \mathbb{F}_q[x_1, \dots, x_d]$, the affine variety of $J$ over $\mathbb{F}_q$ is: $V(J) = \{a \in \mathbb{F}_q^d : \forall f \in J, f(a) = 0\}$. In other words, the variety corresponds to the set of all solutions to $f_1 = \cdots = f_s = 0$.

DEFINITION 3.2. For any subset $V$ of $\mathbb{F}_q^d$, the ideal of polynomials that vanish on $V$, called the vanishing ideal of $V$, is defined as: $I(V) = \{f \in \mathbb{F}_q[x_1, \dots, x_d] : \forall a \in V, f(a) = 0\}$.

Therefore, if a polynomial $f$ vanishes on a variety $V$, then $f \in I(V)$.

THEOREM 3.2. Strong Nullstellensatz over $\mathbb{F}_q$ (from [16]): Let $J \subseteq \mathbb{F}_q[x_1, \dots, x_d]$ be an ideal, and let $J_0 = \langle x_1^q - x_1, \dots, x_d^q - x_d \rangle$ be the ideal of all vanishing polynomials. Let $V_{\mathbb{F}_q}(J)$ denote the variety of $J$ over $\mathbb{F}_q$. Then, $I(V_{\mathbb{F}_q}(J)) = J + J_0 = J + \langle x_1^q - x_1, \dots, x_d^q - x_d \rangle$.

Gröbner Bases: An ideal $J$ may have many different generators (representations): i.e. $F = \{f_1, \dots, f_s\}$ and $G = \{g_1, \dots, g_t\}$ such that $J = \langle f_1, \dots, f_s \rangle = \langle g_1, \dots, g_t \rangle$ and $V(J) = V(f_1, \dots, f_s) = V(g_1, \dots, g_t)$. A Gröbner basis is a representation of an ideal which allows one to solve many polynomial decision questions.

DEFINITION 3.3. [Gröbner Basis] For a monomial ordering $>$, a set of non-zero polynomials $G = \{g_1, g_2, \dots, g_t\}$ contained in an ideal $J$ is called a Gröbner basis for $J$ $\iff$ $\forall f \in J$, $f \neq 0$, there exists $i \in \{1, \dots, t\}$ such that $lm(g_i)$ divides $lm(f)$; i.e., $G = GB(J) \Leftrightarrow \forall f \in J : f \neq 0, \exists g_i \in G : lm(g_i) \mid lm(f)$.

As a consequence of Definition 3.3, the set $G$ is a Gröbner basis of ideal $J$ if and only if for all $f \in J$, dividing $f$ by polynomials of $G$ gives 0 remainder: $G = GB(J) \iff \forall f \in J, f \xrightarrow{G}_+ 0$.

Buchberger's algorithm [9], shown in Algorithm 1, computes a Gröbner basis over a field. $Spoly(f, g) = \frac{L}{lt(f)} \cdot f - \frac{L}{lt(g)} \cdot g$, where $L = LCM(lm(f), lm(g))$. Note that $Spoly(f, g)$ cancels the leading terms of $f, g$, and the remainder $r$ obtained in $Spoly(f, g) \xrightarrow{G'}_+ r$ gives a new leading term. A Gröbner basis is computed when all $Spoly(f, g) \xrightarrow{G'}_+ 0$. A Gröbner basis can be further reduced; a reduced Gröbner basis is a canonical representation of the ideal w.r.t. the set monomial order.

Algorithm 1: Buchberger's Algorithm
  Input: $F = \{f_1, \dots, f_s\}$
  Output: $G = \{g_1, \dots, g_t\}$
  $G := F$;
  repeat
    $G' := G$;
    for each pair $\{f, g\}$, $f \neq g$ in $G'$ do
      $Spoly(f, g) \xrightarrow{G'}_+ r$;
      if $r \neq 0$ then
        $G := G \cup \{r\}$;
      end
    end
  until $G = G'$;

4. WORD-LEVEL ABSTRACTION USING GRÖBNER BASIS
We are given a circuit $C$ with $k$-bit inputs and outputs that performs a polynomial computation $Z = \mathcal{F}(A)$ over $\mathbb{F}_q = \mathbb{F}_{2^k}$. Let $P(x)$ be the given irreducible or primitive polynomial used for field construction, and let $\alpha$ be its root, i.e. $P(\alpha) = 0$. Note that we do not know the polynomial representation $\mathcal{F}(A)$ and our objective is to identify (the coefficients of) $\mathcal{F}(A)$. Let $\{a_0, \dots, a_{k-1}\}$ denote the primary inputs and let $\{z_0, \dots, z_{k-1}\}$ be the primary outputs of $C$. Then, the word-level and bit-level correspondences are:

$A = a_0 + a_1\alpha + \cdots + a_{k-1}\alpha^{k-1}$; $\quad Z = z_0 + z_1\alpha + \cdots + z_{k-1}\alpha^{k-1}$. $\quad$ (1)

We analyze the circuit and model all the gate-level Boolean operators as polynomials in $\mathbb{F}_2 \subset \mathbb{F}_{2^k}$. To this set of Boolean polynomials, append the polynomials of Eqn. (1) that relate the word-level and bit-level variables. Denote this set of polynomials as $F = \{f_1, \dots, f_s\}$ over the ring $R = \mathbb{F}_q[x_1, \dots, x_d, Z, A]$. Here $x_1, \dots, x_d$ denote, collectively, all the bit-level variables of the circuit (i.e. primary inputs, primary outputs and the intermediate circuit variables), and $Z, A$ are the word-level variables. Denote the generated ideal as $J = \langle F \rangle \subset R$. Also, denote the (unknown) "specification" of the circuit as a polynomial $f : Z - \mathcal{F}(A)$, or equivalently as $f : Z + \mathcal{F}(A)$, as $-1 = +1$ in $\mathbb{F}_{2^k}$. As $Z = \mathcal{F}(A)$, clearly $f : Z + \mathcal{F}(A)$ agrees with the solutions to the circuit equations $f_1 = \cdots = f_s = 0$. This means that $f : Z + \mathcal{F}(A)$ vanishes on the variety $V_{\mathbb{F}_q}(J)$. If $f : Z + \mathcal{F}(A)$ vanishes on $V_{\mathbb{F}_q}(J)$, then due to Definition 3.2, $f : Z + \mathcal{F}(A)$ is a member of the ideal $I(V_{\mathbb{F}_q}(J))$. Strong Nullstellensatz over Galois fields (Theorem 3.2) tells us that $I(V_{\mathbb{F}_q}(J)) = J + J_0$, where $J_0 = \langle x_1^2 - x_1, \dots, x_d^2 - x_d, Z^q - Z, A^q - A \rangle$ is the ideal of all vanishing polynomials in $R$. From these results, we deduce that:

PROPOSITION 4.1. The (unknown) specification polynomial $f : Z + \mathcal{F}(A) \in (J + J_0)$.

The variety $V_{\mathbb{F}_q}(J)$ is the set of all consistent assignments to the nets (signals) in the circuit $C$. If we project this variety on the word-level input and output variables of the circuit $C$, we essentially generate the function $f$ implemented by the circuit. Projection of varieties from the $d$-dimensional space $\mathbb{F}_q^d$ onto a lower dimensional subspace $\mathbb{F}_q^{d-l}$ is equivalent to eliminating $l$ variables from the corresponding ideal.

DEFINITION 4.1. (Elimination Ideal) From [6]: Given $J = \langle f_1, \dots, f_s \rangle \subset \mathbb{F}_q[x_1, \dots, x_d]$, the $l$th elimination ideal $J_l$ is the ideal of $\mathbb{F}_q[x_{l+1}, \dots, x_d]$ defined by $J_l = J \cap \mathbb{F}_q[x_{l+1}, \dots, x_d]$.

In other words, the $l$th elimination ideal does not contain variables $x_1, \dots, x_l$, nor do its generators. Moreover, Gröbner bases may be used to generate an elimination ideal by using an "elimination term order." One such ordering is a pure lexicographic ordering, which features in the following theorem:

THEOREM 4.1. (Elimination Theorem) From [6]: Let $J \subset \mathbb{F}_q[x_1, \dots, x_d]$ be an ideal and let $G$ be a Gröbner basis of $J$ with respect to a lex ordering where $x_1 > x_2 > \cdots > x_d$. Then for every $0 \leq l \leq d$, the set $G_l = G \cap \mathbb{F}_q[x_{l+1}, \dots, x_d]$ is a Gröbner basis of the $l$th elimination ideal $J_l$.

EXAMPLE 4.1. Consider polynomials $f_1 : x^2 - y - z - 1$, $f_2 : x - y^2 - z - 1$, $f_3 : x - y - z^2 - 1$ and ideal $J = \langle f_1, f_2, f_3 \rangle \subset \mathbb{C}[x, y, z]$. Let us compute a Gröbner basis $G$ of $J$ w.r.t. lex term order with $x > y > z$. Then $G = \{g_1, \dots, g_4\}$ is obtained as: $g_1 : x - y - z^2 - 1$; $g_2 : y^2 - y - z^2 - z$; $g_3 : 2yz^2 - z^4 - z^2$; $g_4 : z^6 - 4z^4 - 4z^3 - z^2$. Notice that the polynomial $g_4$ contains only the variable $z$, and it eliminates variables $x, y$. Similarly, polynomials $g_2, g_3, g_4$ contain variables $y, z$ and eliminate $x$. According to Theorem 4.1, $G_1 = G \cap \mathbb{C}[y, z] = \{g_2, g_3, g_4\}$ and $G_2 = G \cap \mathbb{C}[z] = \{g_4\}$ are the Gröbner bases of the 1st and 2nd elimination ideals of $J$, respectively.

The above example motivates our approach: since we want to derive a polynomial representation from a circuit in variables $Z, A$, we can compute a Gröbner basis of $J + J_0$ w.r.t. an elimination order that eliminates all the ($d$) bit-level variables of the circuit. The Gröbner basis $G_d = G \cap \mathbb{F}_q[Z, A]$ of the $d$th elimination ideal of $(J + J_0)$ will contain polynomials in only $Z, A$.

PROBLEM SETUP 4.1. Given a circuit $C$ with $k$-bit inputs and outputs which computes a polynomial function $f : \mathbb{F}_{2^k} \to \mathbb{F}_{2^k}$. Let $A = \{a_0, \dots, a_{k-1}\}$ and $Z = \{z_0, \dots, z_{k-1}\}$ be the inputs and outputs of the circuit, respectively, such that $A = a_0 + a_1\alpha + \cdots + a_{k-1}\alpha^{k-1}$ and $Z = z_0 + \cdots + z_{k-1}\alpha^{k-1}$, where $P(\alpha) = 0$. Let $Z = \mathcal{F}(A)$ be the unknown polynomial function implemented by the circuit. Denote by $x_i$, $i = 1, \dots, d$ all the Boolean variables of the circuit. Let $R = \mathbb{F}_{2^k}[x_i, Z, A : i = 1, \dots, d]$ denote the corresponding polynomial ring and let ideal $J \subset \mathbb{F}_{2^k}[x_i, Z, A : i = 1, \dots, d]$ be generated by the bit-level and word-level polynomials of the circuit. Let $J_0 = \langle x_i^2 - x_i, Z^{2^k} - Z, A^{2^k} - A : i = 1, \dots, d \rangle$ denote the ideal of vanishing polynomials in $R$. ✷

DEFINITION 4.2. Abstraction Term Order $>$: Using the variable order $x_1 > x_2 > \cdots > x_d > Z > A$, impose a lex term order $>$ on the polynomial ring $R = \mathbb{F}_q[x_1, \dots, x_d, Z, A]$. This elimination term order $>$ is defined as the Abstraction Term Order. The relative ordering among $x_1, \dots, x_d$ can be chosen arbitrarily.

THEOREM 4.2. Abstraction Theorem: Using the setup and notations from Problem Setup 4.1 above, compute a Gröbner basis $G$ of ideal $(J + J_0)$ using the abstraction term order $>$. Then: (i) $G$ must contain a polynomial of the form $Z + \mathcal{G}(A)$; and (ii) $Z + \mathcal{G}(A)$ is such that $\mathcal{F}(A) = \mathcal{G}(A)$, $\forall A \in \mathbb{F}_q$. In other words, $\mathcal{G}(A)$ and $\mathcal{F}(A)$ are equal as polynomial functions over $\mathbb{F}_q$.

PROOF. (i) Since $f : Z + \mathcal{F}(A)$ is a polynomial representation of the circuit, $Z + \mathcal{F}(A) \in J + J_0$, due to Proposition 4.1. Therefore, according to the definition of a Gröbner basis (Definition 3.3), the leading term of $Z + \mathcal{F}(A)$ (which is $Z$) should be divisible by the leading term of some polynomial $g_i \in G$. The only way $lt(g_i)$ can divide $Z$ is when $lt(g_i) = Z$ itself. Moreover, due to our abstraction (lex) term order, $Z > A$, so this polynomial must be of the form $Z + \mathcal{G}(A)$.
(ii) As $Z = \mathcal{F}(A)$ represents the function of the circuit, $Z + \mathcal{F}(A) \in J + J_0$. Moreover, $V(J + J_0) \subset V(Z + \mathcal{F}(A))$. Project this variety $V(J + J_0)$ onto the co-ordinates corresponding to $(A, Z)$. What we obtain is the graph of the function $A \mapsto \mathcal{F}(A)$ over $\mathbb{F}_{2^k}$. Since $Z + \mathcal{G}(A)$ is an element of the Gröbner basis of $J + J_0$, $V(J + J_0) \subset V(Z + \mathcal{G}(A))$ too. Due to this inclusion of varieties, the points that satisfy $(J + J_0)$ also satisfy $Z + \mathcal{G}(A) = 0$ and $Z + \mathcal{F}(A) = 0$. Therefore, $Z = \mathcal{G}(A)$ gives the same function as $Z = \mathcal{F}(A)$, for all $A \in \mathbb{F}_{2^k}$.

COROLLARY 4.1.
Computing a reduced Gröbner basis $G_r$ of $J + J_0$, we will obtain one and only one polynomial in $G_r$ of the form $Z + \mathcal{G}(A)$, such that $Z = \mathcal{G}(A)$ is the unique, minimal, canonical representation of the function $f$ implemented by the circuit.

As a consequence of Theorem 4.2 and Corollary 4.1, if we compute a reduced Gröbner basis $G$ of $J + J_0$ using the abstraction term order, we will always find the one and only polynomial of the form $Z + \mathcal{G}(A)$ in the Gröbner basis, such that $Z = \mathcal{G}(A)$ is the unique canonical polynomial representation of the circuit.

The above results trivially extend to circuits with multiple word-level input variables $A_1, \dots, A_n$, and the canonical polynomial representation obtained by computing a reduced Gröbner basis $G_r$ of $J + J_0$ using $>$ is of the form $Z = \mathcal{F}(A_1, \dots, A_n)$.

EXAMPLE 4.2. Demonstration of our approach: Consider the 2-bit multiplier circuit over $\mathbb{F}_{2^2}$ given in Fig. 2, which implements a polynomial function: $Z = A \times B$, $Z, A, B \in \mathbb{F}_4$. Here, $A = a_0 + a_1\alpha$, $B = b_0 + b_1\alpha$ are the word-level inputs and $Z = z_0 + z_1\alpha$ is the output in $\mathbb{F}_4$, and $P(x) = x^2 + x + 1$ (given), where $P(\alpha) = 0$.

Figure 2: A 2-bit multiplier over $\mathbb{F}_{2^2}$. The gate ⊗ corresponds to an AND gate, i.e. bit-level multiplication modulo 2. The gate ⊕ corresponds to an XOR gate, i.e. addition modulo 2.

The functionality of the circuit is described using the following polynomials derived from the Boolean gate-level operators: $f_1 : z_0 + z_1\alpha + Z$; $f_2 : b_0 + b_1\alpha + B$; $f_3 : a_0 + a_1\alpha + A$; $f_4 : s_0 + a_0 \cdot b_0$; $f_5 : s_1 + a_0 \cdot b_1$; $f_6 : s_2 + a_1 \cdot b_0$; $f_7 : s_3 + a_1 \cdot b_1$; $f_8 : r_0 + s_1 + s_2$; $f_9 : z_0 + s_0 + s_3$; $f_{10} : z_1 + r_0 + s_3$. Ideal $J = \langle f_1, \dots, f_{10} \rangle$. Generate $J_0$ as the ideal of vanishing polynomials. Impose the following abstraction term order, i.e. a lex order with "circuit variables" > "Output $Z$" > "Inputs $A, B$", and compute a Gröbner basis $G$ of $J + J_0$.
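Before any Gröbner basis computation, the gate-level polynomials $f_4, \dots, f_{10}$ of this example can be sanity-checked by exhaustive simulation, which is feasible only because the circuit has four input bits. The sketch below is such a check and not the abstraction method itself; the function names and the encoding of $\mathbb{F}_4$ elements as 2-bit integers are assumptions made for illustration.

```python
from itertools import product


def gf4_mul(a, b):
    """Multiply in F_4 = F_2[x]/(x^2 + x + 1). Elements are 2-bit ints:
    bit 0 is the constant term, bit 1 the coefficient of alpha."""
    s = 0
    for i in range(2):
        if (b >> i) & 1:
            s ^= a << i         # carry-less partial products
    if s & 0b100:               # alpha^2 term present: alpha^2 = alpha + 1
        s ^= 0b111
    return s


def circuit(a0, a1, b0, b1):
    """Gate-level netlist of Fig. 2, one line per polynomial f4..f10."""
    s0 = a0 & b0                # f4 : s0 + a0*b0
    s1 = a0 & b1                # f5 : s1 + a0*b1
    s2 = a1 & b0                # f6 : s2 + a1*b0
    s3 = a1 & b1                # f7 : s3 + a1*b1
    r0 = s1 ^ s2                # f8 : r0 + s1 + s2
    z0 = s0 ^ s3                # f9 : z0 + s0 + s3
    z1 = r0 ^ s3                # f10: z1 + r0 + s3
    return z0, z1


def circuit_matches_product():
    """True if Z = z0 + z1*alpha equals A*B in F_4 for all 16 input vectors."""
    for a0, a1, b0, b1 in product((0, 1), repeat=4):
        z0, z1 = circuit(a0, a1, b0, b1)
        A, B, Z = a0 | (a1 << 1), b0 | (b1 << 1), z0 | (z1 << 1)
        if Z != gf4_mul(A, B):
            return False
    return True
```

Here `circuit_matches_product()` returns `True`. The same enumeration has $2^{2k}$ cases for a $k$-bit multiplier, which is why a symbolic derivation is needed at cryptographic sizes.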
We find the following polynomials in the basis: $g_1 : z_0 + z_1\alpha + Z$; $g_2 : b_0 + b_1\alpha + B$; $g_3 : a_0 + a_1\alpha + A$; $g_4 : s_3 + r_0 + z_1$; $g_5 : s_1 + s_2 + r_0$; $g_6 : s_0 + s_3 + z_0$; $g_7 : Z + AB$; $g_8 : a_1 b_1 + a_1 B + b_1 A + z_1$; $g_9 : r_0 + a_1 b_1 + z_1$; $g_{10} : s_2 + a_1 b_0$, and the polynomials of $J_0$. The polynomial $g_7 : Z + AB$ describes $Z = AB$ as the (canonical) polynomial function implemented by the circuit.

5. IMPROVING OUR APPROACH
Computing Gröbner bases w.r.t. elimination orders is infeasible for large circuits. The worst-case complexity of computing $GB(J + J_0)$ in $\mathbb{F}_q[x_1, \dots, x_d]$ is known to be bounded by $q^{O(d)}$ [16], which is prohibitive over large fields. Therefore, we need to improve our approach to overcome this complexity. Notice that our approach "searches" for only one polynomial ($Z + \mathcal{G}(A)$) in the Gröbner basis, and it does so by computing the entire Gröbner basis. This motivates us to investigate whether it is possible to guide a sequence of $Spoly(f, g) \xrightarrow{J + J_0}_+ r$ computations to arrive at the desired word-level polynomial, without considering other polynomials in the generating set. For this purpose, we exploit the well-known product criterion:

LEMMA 5.1. [Product Criterion [17]] Let $f, g \in \mathbb{F}[x_1, \dots, x_d]$ be polynomials. If the equality $lm(f) \cdot lm(g) = LCM(lm(f), lm(g))$ holds, then $Spoly(f, g) \xrightarrow{G}_+ 0$.

The above result states that when the leading monomials of $f, g$ are relatively prime, then $Spoly(f, g)$ always reduces to 0 modulo $G$. Thus $Spoly(f, g)$ need not be considered in Buchberger's algorithm. Recall that in the Abstraction Term Order (Definition 4.2), we have "circuit variables $x_1, \dots, x_d$" $> Z > A$, where the relative ordering among $x_1, \dots, x_d$ is not important. We will now further refine the abstraction term order while exploiting the product criterion.

DEFINITION 5.1. Refined Abstraction Term Order $>_r$: Starting from the primary outputs of the circuit $C$, perform a reverse topological traversal toward the primary inputs.
Order each variable of the circuit according to its reverse topological level: i.e. $x_i > x_j$ if $x_i$ appears earlier in the reverse topological order. Impose a lex term order $>_r$ on $\mathbb{F}_q[x_1, \dots, x_d, Z, A]$ with "circuit variables ordered reverse topologically" $> Z > A$. This term order $>_r$ is called the refined abstraction term order (RATO).

When RATO is imposed on the set of polynomials $F = \{f_1, \dots, f_s\}$, $J = \langle F \rangle$, it is easy to see that each polynomial in $F$ is of the form $f_i = x_i + P_i$, where $x_i$ is a gate output and $P_i = tail(f_i)$ represents the function implemented by that gate. Moreover, each indeterminate $x_j$ that appears in $P_i$ satisfies $x_i > x_j$ (acyclic circuit). Furthermore, each gate output is a leading term of some polynomial in $F$. Since each gate output is a unique signal, $f_i = x_i + P_i$ and $f_j = x_j + P_j$ have relatively prime leading terms ($x_i \neq x_j$). So, $Spoly(f_i, f_j)$ need not be considered in the Gröbner basis computation.

However, there is one (and only one) pair of polynomials $(f_w, f_g) \in F$ with leading terms that are not relatively prime: i) the word-level polynomial ($f_w$) corresponding to the outputs: $f_w = z_0 + z_1\alpha + \cdots + z_{k-1}\alpha^{k-1} + Z$, with gate output $z_0$ as the leading term; and ii) the polynomial $f_g$ that models the function at the gate $z_0$. Due to RATO, $Spoly(f_w, f_g) \xrightarrow{J + J_0}_+ r$ is the only candidate critical pair to be evaluated at the start of Buchberger's algorithm. Based on these concepts, we devise the following approach to efficiently search for the polynomial function:

1. Impose RATO on the ring. Select the only critical pair $(f_w, f_g)$ that does not have relatively prime leading terms, and compute $Spoly(f_w, f_g) \xrightarrow{F, F_0}_+ r$.

2. Then $r$ will contain only the following variables: i) the bit-level primary input variables of the circuit; ii) the word-level output $Z$; and iii) the word-level input $A$. The remainder $r$ will not contain any bit-level variable corresponding to the output of any gate in the design; i.e. primary output bits and intermediate variables of the circuit do not appear in $r$. To prove this, assume that a non-primary-input variable $x_j$ appears in a monomial term $m_j$ in $r$. Since there always exists a polynomial $f_j \in F$ such that $f_j = x_j + tail(f_j)$, $lt(f_j)$ divides monomial $m_j$ and $m_j$ can be canceled. Therefore, all such terms $m_j$ with non-primary-input bit-level variables can be eliminated.

3. Two cases need to be considered:
   (a) (Case 1:) Remainder $r$ does not even contain the primary input bits. Then, $r$ contains only the word-level variables $Z, A$. Since RATO is lex with $Z > A$, the remainder $r$ corresponds to the desired canonical polynomial representation: $r : Z + \mathcal{G}(A)$.
   (b) (Case 2:) Remainder $r$ contains both the bit-level primary input variables (call this set $X_{PI}$), as well as the word-level variables. Then, due to Lemma 5.1, we only need to consider the set $F' = \{r, f_{w_i}\}$ and $F_0' = \{x_i^2 - x_i, Z^q - Z, A^q - A : x_i \in X_{PI}\}$, where $f_{w_i} = a_0 + a_1\alpha + \cdots + a_{k-1}\alpha^{k-1} + A$ is the polynomial that relates the word-level ($A$) and bit-level inputs $\{a_0, \dots, a_{k-1}\}$. Compute the reduced Gröbner basis $G'$ of $F' \cup F_0'$, which is a much simplified computation. Then, $G'$ will definitely contain a polynomial of the form $Z + \mathcal{G}(A)$, which will be the canonical polynomial representation of the function of the circuit.

EXAMPLE 5.1. Consider, again, the example shown in Example 4.2, corresponding to the multiplier circuit of Fig. 2. Impose RATO: $\{z_0 > z_1\} > \{r_0 > s_0 > s_3\} > \{s_1 > s_2\} > \{a_0 > a_1 > b_0 > b_1\} > Z > A$. Then, the polynomials $f_1, \dots, f_{10}$ shown in Example 4.2 are already represented in RATO. Assume that the circuit is correct and it has no bugs. Then $f_1$ and $f_9$ are the only two polynomials whose leading terms are not relatively prime. Computing $Spoly(f_1, f_9) \xrightarrow{F, F_0}_+ r$, we find that $r = Z + A \cdot B$, which is the word-level polynomial representation of the circuit.

Now, let us introduce a bug in the design.
Replace the polynomial $f_8 : r_0 + s_1 + s_2$ in $F$ with $f_8 : r_0 + s_0 + s_2$ (bug introduced). Computing $\mathrm{Spoly}(f_1, f_9) \xrightarrow{F, F_0}_+ r$, we find that $r = \alpha a_1 b_1 + (\alpha + 1) a_1 B + b_1 A + Z + (\alpha + 1) A B$. Note that in addition to the word-level variables $Z, A, B$, the bit-level primary inputs $a_1, b_1$ also appear in $r$. Moreover, all other polynomials in $F$ have leading terms that are relatively prime w.r.t. $\mathrm{lt}(r)$. Now we take $F' = \{r,\ a_0 + a_1\alpha + A,\ b_0 + b_1\alpha + B\}$ and $F_0' = \{a_0^2 - a_0,\ a_1^2 - a_1,\ b_0^2 - b_0,\ b_1^2 - b_1,\ A^4 - A,\ B^4 - B,\ Z^4 - Z\}$ and compute the reduced Gröbner basis $G'$ of $F' \cup F_0'$. We find the polynomial $Z + \alpha A^2 B^2 + A^2 B + (\alpha + 1) A B^2 + (\alpha + 1) A B$ in $G'$, which is indeed the polynomial representation of the buggy circuit!

6. EXPERIMENTAL RESULTS

Using the approach described in Section 5, we have performed experiments to prove equivalence between Mastrovito ($C_1$) and Montgomery ($C_2$) multiplier circuits. The Mastrovito multiplier, the baseline golden model (Spec), is provided as a bit-blasted/flattened gate-level netlist. The implementation (Impl) is given as the hierarchically designed Montgomery multiplier, as shown in Fig. 1; i.e. each MR block is given as a flattened gate-level netlist, and these MR blocks are interconnected to construct the multiplier circuit.

For equivalence checking using AIG and SAT-based methods, a miter is constructed between Spec and Impl, and the ABC tool [4] and CSAT solver [13] are used. These tools cannot prove equivalence beyond 16-bit multiplier circuits within 24 hours; none of the NIST-specified ECC circuits can be verified. This is exactly the same observation made by the authors of [5] (cf. Table I & II in [5]). When we apply the approach of [5], we are able to prove equivalence only up to 163-bit multipliers, beyond which the verification tool of [5] runs into a memory explosion.
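At the 2-bit scale of Example 5.1, the two abstractions ($Z = A \cdot B$ for the correct circuit and the degree-4 polynomial for the buggy one) can be cross-checked by exhaustive simulation over $\mathbb{F}_4$. The sketch below is illustrative only and is not part of the authors' tool. Example 4.2 is not reproduced in this section, so the gate equations in `circuit` are an assumed reconstruction of a standard 2-bit multiplier with $\alpha^2 = \alpha + 1$; only $f_8$ is quoted in the text. The names `gf4_mul`, `circuit`, `buggy_poly` and `agrees` are hypothetical.

```python
from itertools import product

ALPHA = 0b10  # F_4 element encoded in two bits: value = bit0 + bit1 * alpha


def gf4_mul(x, y):
    """Multiply in F_4 = F_2[alpha] / (alpha^2 + alpha + 1)."""
    p = 0
    for i in range(2):            # carry-less (XOR) multiplication
        if (y >> i) & 1:
            p ^= x << i
    if p & 0b100:                 # reduce alpha^2 to alpha + 1
        p ^= 0b111
    return p


def circuit(a0, a1, b0, b1, buggy=False):
    """Assumed gate-level netlist (AND is product, XOR is sum over F_2)."""
    s0, s1, s2, s3 = a0 & b0, a0 & b1, a1 & b0, a1 & b1
    r0 = (s0 if buggy else s1) ^ s2   # bug: s1 replaced by s0, as in f8
    z0 = s0 ^ s3
    z1 = r0 ^ s3
    return z0 | (z1 << 1)             # Z = z0 + z1 * alpha


def buggy_poly(A, B):
    """alpha*A^2*B^2 + A^2*B + (alpha+1)*A*B^2 + (alpha+1)*A*B over F_4."""
    A2, B2, ap1 = gf4_mul(A, A), gf4_mul(B, B), ALPHA ^ 1
    return (gf4_mul(ALPHA, gf4_mul(A2, B2)) ^ gf4_mul(A2, B)
            ^ gf4_mul(ap1, gf4_mul(A, B2)) ^ gf4_mul(ap1, gf4_mul(A, B)))


def agrees(poly, buggy):
    """True if poly(A, B) equals the circuit output on all 16 inputs."""
    return all(
        circuit(a0, a1, b0, b1, buggy) == poly(a0 | (a1 << 1), b0 | (b1 << 1))
        for a0, a1, b0, b1 in product((0, 1), repeat=4)
    )
```

Under these assumptions, `agrees(gf4_mul, buggy=False)` and `agrees(buggy_poly, buggy=True)` both hold, while `agrees(gf4_mul, buggy=True)` does not. Such enumeration needs $2^{2k}$ evaluations, which is why it cannot replace the symbolic abstraction at cryptographic sizes.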
We apply our abstraction-based approach to derive the canonical word-level polynomials $\mathcal{F}_1, \mathcal{F}_2$ from circuits $C_1, C_2$ and then prove equivalence by checking whether $\mathcal{F}_1 = \mathcal{F}_2$ (coefficient matching). First, we use the SINGULAR computer algebra tool [18] to derive the polynomial abstraction by computing a full Gröbner basis of $J + J_0$ (using the slimgb command). We find that this technique is infeasible (memory explosion) beyond 32-bit circuits, because the full Gröbner basis under elimination orders is extremely large. Finally, we apply the approach presented in Section 5 to specifically guide the search for the abstraction polynomial. Since this approach consists only of a sequence of polynomial divisions, we exploit an F4-style reduction approach, described in [5] (Section 7), for which we built a custom tool.

All experiments are conducted on a 6-core Intel Xeon CPU running Scientific Linux 6.2 x86_64 with 96 GB RAM. The timeout for all experiments and all tools was 24 hours.

Table 1 shows the time required to derive the polynomial abstraction from Mastrovito circuits. The tool takes the circuit as input, performs a reverse topological traversal to determine RATO, applies the approach presented in Section 5, and derives the polynomial representation $Z = A \cdot B$. Our approach is successful for multipliers of up to 409 bits, with 508K gates.

Table 2 shows the results for Montgomery multipliers. In the table, 'Blk A' and 'Blk B' denote the input MR blocks, 'Blk Mid' denotes the middle block, and 'Blk Out' is the output block. While each block is an MR block, some have been simplified by constant propagation (recall, $R = \alpha^k$), hence they have different sizes. First, a polynomial is extracted for each MR block (gate-level to word-level abstraction), and then the approach is re-applied at word level to derive the input-output relation (solved trivially in < 1 second). Our approach can extract the word-level polynomial for up to 571-bit circuits!
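The reverse topological traversal that determines RATO can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tool: the netlist encoding (a dict from gate output to its fanin signals), the function name `rato_order`, the use of longest-path distance from the primary outputs as the level, and alphabetical tie-breaking within a level are all hypothetical choices. The 2-bit netlist is an assumed reconstruction of the multiplier used in Example 5.1.

```python
from collections import defaultdict


def rato_order(gates, primary_inputs, word_vars):
    """Return variables in decreasing RATO order (largest variable first).

    gates: dict mapping each gate output to the list of its fanin signals.
    Gate outputs are sorted by reverse topological level, followed by the
    bit-level primary inputs and then the word-level variables.
    """
    fanouts = defaultdict(list)
    for out, fanins in gates.items():
        for sig in fanins:
            fanouts[sig].append(out)

    level = {}

    def depth(sig):
        # Longest path to a primary output; outputs themselves get level 0.
        if sig not in level:
            level[sig] = 1 + max((depth(f) for f in fanouts[sig]), default=-1)
        return level[sig]

    by_level = defaultdict(list)
    for g in gates:
        by_level[depth(g)].append(g)

    order = []
    for lv in sorted(by_level):
        order.extend(sorted(by_level[lv]))   # tie-break is arbitrary (assumed)
    return order + list(primary_inputs) + list(word_vars)


# Assumed 2-bit multiplier netlist (gate output -> fanins).
NETLIST = {
    "z0": ["s0", "s3"], "z1": ["r0", "s3"], "r0": ["s1", "s2"],
    "s0": ["a0", "b0"], "s1": ["a0", "b1"],
    "s2": ["a1", "b0"], "s3": ["a1", "b1"],
}
ORDER = rato_order(NETLIST, ["a0", "a1", "b0", "b1"], ["Z", "A", "B"])
```

Because a fanin always sits at a strictly deeper level than the gate it feeds, every gate output precedes the variables in its tail, which is the property that makes each $x_i$ the leading term of $f_i = x_i + P_i$. For this netlist the gate grouping matches the one stated in Example 5.1: $\{z_0, z_1\}$, then $\{r_0, s_0, s_3\}$, then $\{s_1, s_2\}$.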
Table 1: Abstraction of Mastrovito multipliers (our tool). Time given in seconds, memory given in MB. TO = timeout (24 hours).

| Size (k) | # of Gates | Time | Max Mem |
| 163 | 153K | 4,351 | 162 |
| 233 | 167K | 5,777 | 168 |
| 283 | 399K | 40,114 | 381 |
| 409 | 508K | 72,708 | 509 |
| 571 | 1.6M | TO | - |

Table 2: Abstraction of Montgomery blocks (our tool). Time given in seconds, memory given in MB.

| Size (k) | Gates: Blk A | Gates: Blk B | Gates: Blk Mid | Gates: Blk Out | Time: Blk A | Time: Blk B | Time: Blk Mid | Time: Blk Out | Total Time | Max Mem |
| 163 | 33K | 33K | 85K | 32K | 145 | 101 | 264 | 126 | 636 | 34 |
| 233 | 55K | 55K | 163K | 54K | 322 | 306 | 1,014 | 267 | 1,909 | 71 |
| 283 | 82K | 82K | 241K | 81K | 1,011 | 1,058 | 5,085 | 1,032 | 8,186 | 104 |
| 409 | 168K | 168K | 502K | 168K | 5,084 | 5,381 | 20,294 | 3,243 | 34,002 | 224 |
| 571 | 330K | 330K | 980K | 328K | 14,288 | 12,298 | 47,364 | 13,508 | 87,458 | 477 |

7. CONCLUSION

This paper has presented a technique to derive a word-level, canonical polynomial representation from a circuit by modeling the function over the Galois field $\mathbb{F}_{2^k}$. We show that this can be achieved by computing a Gröbner basis of the ideal generated by the constraints derived from the circuit using an elimination term order. To overcome the complexity of computing the Gröbner basis, we have proposed a refinement of the abstraction term order, using which we can more efficiently guide the search for the word-level polynomial abstraction. Using our approach, we can identify the polynomial function and thus prove the correctness of Galois field multiplier circuits with up to 571-bit data-path size.

8. REFERENCES

[1] C. K. Koc and T. Acar, "Montgomery Multiplication in GF(2^k)", Designs, Codes and Cryptography, vol. 14, pp. 57–69, 1998.
[2] Huapeng Wu, "Montgomery Multiplier and Squarer for a Class of Finite Fields", IEEE Transactions on Computers, vol. 51, May 2002.
[3] E. Biham, Y. Carmeli, and A. Shamir, "Bug Attacks", in Proceedings on Advances in Cryptology, pp. 221–240, 2008.
[4] A. Mishchenko, S. Chatterjee, R. Brayton, and N. Een, "Improvements to Combinational Equivalence Checking", in Proc. Intl. Conf. on CAD (ICCAD), pp. 836–843, 2006.
[5] J. Lv, P. Kalla, and F. Enescu, "Efficient Gröbner Basis Reductions for Formal Verification of Galois Field Arithmetic Circuits", in IEEE Trans. on CAD, vol. 32, pp. 1409–1420, 2013.
[6] D. Cox, J. Little, and D. O'Shea, Ideals, Varieties and Algorithms, Springer-Verlag, 1997.
[7] Rudolf Lidl and Harald Niederreiter, Finite Fields, Cambridge University Press, 1997.
[8] Z. Zilic and Z. Vranesic, "A deterministic multivariate interpolation algorithm for small finite fields", IEEE Trans. Comp., vol. 51, 2002.
[9] B. Buchberger, Ein Algorithmus zum Auffinden der Basiselemente des Restklassenringes nach einem nulldimensionalen Polynomideal, PhD thesis, Philosophische Fakultät an der Leopold-Franzens-Universität, Austria, 1965.
[10] R. E. Bryant, "Graph Based Algorithms for Boolean Function Manipulation", IEEE Trans. on Comp., vol. C-35, pp. 677–691, 1986.
[11] M. Ciesielski, P. Kalla, and S. Askar, "Taylor Expansion Diagrams: A Canonical Representation for Verification of Data-Flow Designs", IEEE Transactions on Computers, vol. 55, pp. 1188–1201, 2006.
[12] A. Jabir et al., "A Technique for Representing Multiple Output Binary Functions with Applications to Verification and Simulation", IEEE Trans. on Comp., vol. 56, pp. 1133–1145, 2007.
[13] F. Lu, L. Wang, K. Cheng, and R. Huang, "A Circuit SAT Solver With Signal Correlation Guided Learning", in IEEE Design, Automation and Test in Europe, pp. 892–897, 2003.
[14] A. Lvov, L. Lastras-Montaño, V. Paruthi, R. Shadowen, and A. El-Zein, "Formal Verification of Error Correcting Circuits using Computational Algebraic Geometry", in Proc. Formal Methods in Computer-Aided Design (FMCAD), pp. 141–148, 2012.
[15] E. Mastrovito, "VLSI Designs for Multiplication Over Finite Fields GF(2^m)", Lecture Notes in CS, vol. 357, pp. 297–309, 1989.
[16] S. Gao, "Counting Zeros over Finite Fields with Gröbner Bases", Master's thesis, Carnegie Mellon University, 2009.
[17] B. Buchberger, "A criterion for detecting unnecessary reductions in the construction of Gröbner bases", in EUROSAM, 1979.
[18] W. Decker, G.-M. Greuel, G. Pfister, and H. Schönemann, "SINGULAR 3-1-3: A computer algebra system for polynomial computations", 2011, http://www.singular.uni-kl.de.