Security and Testing

by Kurt Rosenfeld
M.S., City College of New York, 2004
B.S., City College of New York, 2002

A thesis submitted to the Faculty of the Graduate School of the Polytechnic Institute of NYU in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Department of Computer Science and Engineering, 2012.

Microfilm or copies of this dissertation may be obtained from:
UMI Dissertation Publishing
ProQuest CSA
789 E. Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106-1346

Vita

Kurt Rosenfeld was born in 1972 in Palo Alto, California. He received the B.S. degree in Electrical Engineering from The City College of New York in 2002 and the M.S. degree in Electrical Engineering from The City College of New York in 2004. From 2005 to the present, he has been in the Information Systems and Internet Security Lab at the Polytechnic Institute of NYU, studying with Professors Ramesh Karri and Nasir Memon. His research interests include hardware security, distributed system security, and testing. He is an engineer at Google, Inc.

Acknowledgements

I would like to express my gratitude for all of the help I got from my family, advisors, friends, colleagues, and my employer. Without your support I would have failed. Your patience, generosity, and encouragement have been my extremely good fortune. This work was partly supported by NSF award numbers 0831349 and 0621856.

Abstract

This dissertation presents research on improving the security of computing platforms at a physical and logical level. The main contributions are to improve the security of:

1. test data communication between chips
2. test data communication within chips
3. communication between sensors and chips
4. verification of chip authenticity

We investigated the security of IEEE 1149.1 JTAG and studied existing attacks. We invented two new attacks and experimentally verified them.
After generalizing the threats, we designed and implemented a security-enhanced, backward-compatible version of JTAG. We identified security vulnerabilities that stem from the use of shared on-chip test data wiring in system-on-chip (SoC) designs, particularly where trusted and untrusted cores coexist. We developed an efficient architecture and protocol that mitigates test-related risks. We extended the concept of physical unclonable functions (PUFs) to encompass sensors. The result is a sensor whose measurement can be verified by the logic inside the trust perimeter. We propose countermeasures to the growing problem of counterfeit components. We developed an inexpensive end-to-end scheme for ensuring the authenticity of parts received by a system integrator.

The four platform security enhancements we developed complement each other. They solve non-overlapping problems that exist today, and they can be applied individually or together. Applied together, they significantly raise the bar for platform security.

Contents

1 Introduction
  1.1 The Core Root of Trust
  1.2 Trustworthy Hardware
  1.3 Testing

2 Security of Digital System Testing
  2.1 Introduction
    2.1.1 Development of Test Interfaces
    2.1.2 Example: Testing a Two-bit State Machine
    2.1.3 Fault Testing versus Trojan Detection
    2.1.4 VLSI Testing: Goals and Metrics
    2.1.5 Conflict Between Testability and Security
  2.2 Scan-based Testing
    2.2.1 Scan-based Attacks
    2.2.2 Countermeasures for Scan Attacks
  2.3 BIST
  2.4 JTAG
    2.4.1 JTAG Hacks
    2.4.2 JTAG Defenses
  2.5 SoC Test Infrastructure
    2.5.1 SoC Test Hacks
    2.5.2 Defenses for SoC Test Mechanisms
  2.6 Emerging Areas of Test Security
    2.6.1 OBD-II for Automobile
    2.6.2 Medical Implant Interface Security
  2.7 Recapitulation and Projection

3 JTAG Security
  3.1 Introduction
  3.2 JTAG Overview
    3.2.1 BYPASS Mode
    3.2.2 EXTEST Mode
    3.2.3 INTEST Mode
  3.3 JTAG Attacks
    3.3.1 Sniff Secret Data
    3.3.2 Read Out Secret
    3.3.3 Obtain Test Vectors and Responses
    3.3.4 Modify State of Authentic Part
    3.3.5 Return False Responses to Test
    3.3.6 Forcing TMS and TCK
  3.4 Prior Work on Attacks and Defenses
  3.5 Defenses for JTAG
    3.5.1 Secure JTAG Communication Protocol
    3.5.2 Level 0 Protocol
    3.5.3 Level 1 Protocol
    3.5.4 Level 2 Protocol
    3.5.5 Level 3 Protocol
  3.6 Costs and Benefits Associated with JTAG Defenses
    3.6.1 Die Area Overhead
    3.6.2 Test Time Overhead
    3.6.3 Operational Costs
    3.6.4 Impact of Defenses on Known Threats
  3.7 Conclusion

4 Testing Cores in a System on Chip
  4.1 Introduction
    4.1.1 Assumptions
    4.1.2 Constraints
    4.1.3 Core Test Wrappers
  4.2 Prior Work
  4.3 Proposed Approach
    4.3.1 Security-enhanced Test Wrapper
    4.3.2 A Security Overwrapper for Prewrapped Cores
    4.3.3 Interoperability with Noncompliant Cores
  4.4 Costs
    4.4.1 Die Area Cost
    4.4.2 Test Time
    4.4.3 Effort for SoC Integrator
  4.5 Conclusion and Future Work

5 Integrity and Authenticity of Sensors
  5.1 Introduction
    5.1.1 Related Work
    5.1.2 Our Contribution
    5.1.3 Security Properties
  5.2 Candidate Sensor PUF
    5.2.1 Structure
    5.2.2 Protocol
  5.3 Electrical Analysis and Experimental Results
    5.3.1 Assumptions
    5.3.2 Distribution of Cut Points
    5.3.3 Hamming Distance
    5.3.4 Verification of Offset Generator
  5.4 Security Context of Sensor PUFs
    5.4.1 Substitution
    5.4.2 Tampering
    5.4.3 Manufacturer Resistance
    5.4.4 Sensor Decoupling
  5.5 Security Analysis
    5.5.1 Attack Model
    5.5.2 Attack Trees
  5.6 Future Work in Sensor PUFs
6 Future Work in Hardware Security

7 Publications

Bibliography

Figures

2.1 A two-bit counter with synchronous reset has four states. From each state, there are two possible next states. This realization provides an output signal that is asserted when the counter is in state S03.

2.2 The machine behaves similarly to the machine shown in Figure 2.1, but deviates for certain rare inputs. Starting in initial state S00, if RST is given the sequence 0,1,0,0,1,0,0,0, the machine enters state S23, at which point the behavior of the system deviates from that shown in Figure 2.1. S23 is a terminal state. The only way to exit S23 is to reinitialize the system (e.g., cycle the power).

2.3 A cascadable section of a synchronous binary counter with synchronous reset.

2.4 The simplest scan flip-flop cell is composed of a multiplexer and a regular D flip-flop. The Q output of one scan cell can be connected to the TEST INPUT of another scan cell, enabling a chain configuration.

2.5 Secure Scan state diagram. The only way to get from secure mode, where the mission key is loaded, to insecure mode, where the chip is testable, is to go through a power-cycle reset, which wipes all volatile state variables.

2.6 Secure Scan architecture. The mirror key register (MKR) is loaded only when Load Key is active, which is controlled by the state machine shown in Figure 2.5.

2.7 Bed-of-nails test fixture. Automated test equipment (ATE) generates stimulus signals and measures responses. The ATE is connected to the test fixture, which contains one nail per test channel. Each nail is spring-loaded so it maintains a controlled pressure when contacting the test points on the printed circuit board being tested.

2.8 The JTAG state machine. There are 16 states. The TMS signal determines the next state. The SHIFT DR state is used for applying stimuli and collecting responses. From any state, the TEST LOGIC RESET state can be reached by holding TMS high for five clock cycles.

2.9 A typical JTAG system. TMS, TCK, and TRST are bussed to all of the devices. TDO of each component is connected to TDI of the next component, thereby forming a daisy-chain topology.

2.10 The essential components of a basic JTAG implementation include a test access port state machine, an instruction register, one or more data registers, and an output multiplexer. Each chain of scan flip-flop cells (internal or boundary) appears to JTAG as a data register that can be selected with the appropriate instruction.

2.11 A chain of scan cells is used for distributing keys to each of the cores. The scan cells are configured not to expose key bits at their outputs while they are being shifted.

3.1 The typical deployment of JTAG is a chain of several devices on a printed circuit board. Each device may come from a different vendor. The test mode select (TMS), test clock (TCK), and test reset (TRST) signals are typically common to all chips. The test data in (TDI) and test data out (TDO) signals loop through the chips. The path returns to the source, which is usually either a PC or an embedded microcontroller, functionally called a "system controller."

3.2 Conceptual security model: a set of attackers A1, A2, and A3 have a set of goals G1 through G6. Each attacker has a set of attack capabilities, some or all of which are masked by defenses that are in place. There is a set of attacks, K1 through K4, each of which requires a certain set of unmasked attack capabilities. Each attack can be used to reach some set of goals. This example shows that attacker A1 can achieve goals G2 and G4 since it has capabilities P2 and P4, which are the requirements for attack K2. Attackers A2 and A3 do not have sufficient unmasked capabilities to execute any of the attacks.

3.3 The attacker obtains secret data by sniffing the JTAG data path.

3.4 The attacker obtains an embedded secret by forcing test vectors onto the JTAG lines.

3.5 The attacker obtains a copy of the test vectors and normal responses of a chip in the JTAG chain. This can be a passive attack.

3.6 The attacker can intercept test vectors that are sent to another chip, and can send false responses to the tester.

3.7 A Philips 8052 (the system controller) was programmed to keep one of its pins low. The I-V curve for this output driver was extracted using a pulsed I-V measurement. A Xilinx Spartan 3e was programmed to keep one of its pins high. This pin's I-V curve was also extracted. The result is shown. The solid line is the FPGA; the dashed line is the microcontroller. The intersection is at 2.1 V, exceeding VIH for most 3.3 V logic.

3.8 A length of PCB wiring connects the hijacker to the JTAG master. This allows the attacker to inject short pulses onto the wiring without being hindered by the master.

3.9 We define four levels of assurance. Levels correspond to the set of assurances that are provided.

3.10 Area overhead is shown for protection levels 1 through 3, from bottom to top. The cost of the security enhancements is independent of design complexity, so the percentage overhead is lower for more complex designs. The protection levels provide progressively higher levels of assurance. An indication of the area cost of each protection level is given by the number of additional FPGA slices used by the enhanced JTAG circuitry. These figures are for a Spartan 3e. There are no fuses in the FPGA, so fuses are modeled as hard-coded bit vectors. Overhead in an ASIC will be less.

4.1 Data sent from the test controller to core 2 passes through core 1, giving core 1 an opportunity to intercept it. Likewise, data passing from core 2 back to the tester passes through core 3, giving core 3 an opportunity to intercept it.

4.2 A chain of scan cells is used for distributing keys to each of the cores. The scan cells are configured not to expose key bits at their outputs while they are being shifted.

4.3 The key setup scan chain conveys data from the test controller to the core wrapper key registers without allowing it to be sniffed or modified by other cores. Other than the basic distributed shift register functionality, the only extra functionality we require of our scan cell is an output inhibit input (O INH) to ensure that the key is not leaked during shifting. After the tester has the key bits shifted to their intended location, the tester deasserts the output inhibit signal so that the cores receive their key data.

4.4 In a typical security-enhanced wrapped core, a word of compressed test data arrives via the parallel data input, is decrypted instantaneously, decompressed, and applied to the inputs of the core's scan chains. The outputs are compressed, encrypted, and sent out. Standard wrapper components, such as the parallel bypass, are not shown.

5.1 The naïve secure sensor architecture does not bind the sensing with the cryptography, allowing the analog link between the sensing element and the crypto processor to be easily attacked.

5.2 A conventional silicon PUF has a binary input and a binary output. The sensor PUF has a binary input, a physical quantity being sensed, and a binary output.

5.3 The analog portion of the light level sensor PUF includes the coating, the photodiode groups, the switches, the summing junctions, and the analog comparator. The challenge applied to the sensor PUF determines the keystream input to the control circuit, which controls the random selection of the left gate signals GL.i and the right gate signals GR.i, which determine the set of sensors that are included in the summations. The left and right sums are compared, producing one raw bit.

5.4 (a) The offset generator produces a DC voltage that is determined by the optical transmittance of the coating at the sites of photodiodes PD1 and PD2. (b) The slope generator produces a voltage proportional to the light input at photodiode PD3.

5.5 In all four subfigures, the sensor input level is on the x-axis and the electrical response is on the y-axis. Subfigure (a) shows eight photodiode group response lines generated by simulation of the candidate light sensor PUF. The bold line is the sum of the eight lines. Subfigures (b), (c), and (d) show pairs of response lines that occur for different values of the left and right gate signals. Assuming a sensor input value of 10, and assuming that solid line > dashed line is interpreted as a "1", evaluating the comparisons for the line pairs shown in (b), (c), and (d) gives the raw bit sequence "0", "1", "0". In our simulations, the raw bit sequences are 256 bits long.

5.6 The probability density function of offset current ratios observed in simulation.

5.7 The density function of the offset signal of each individual photodiode group.

5.8 The green trace shows the probability density function of the cut points in the sensor input domain, as observed in simulation. The red trace shows the Cauchy density function for χ = 0 and γ = 20.

5.9 Hamming distance for five statistically independent instances of the candidate sensor PUF.

5.10 The offset generator circuit shown in Figure 5.2.1 was constructed and tested using light from an LED driven by a variable current source. The offset voltage is plotted across the range of currents for three different relative transmittances. We see ±2.5% variation over the range.

5.11 SPICE simulation of the offset generator output voltage across a range of light intensity values.

5.12 Attack tree for replay.

5.13 Attack tree for cloning.

5.14 Attack tree for inducing errors in measurement.

Chapter 1 Introduction

1.1 The Core Root of Trust

Trust in information systems is built from the ground up. Just as human knowledge is built on a set of core beliefs, information system security is based on a set of assumptions. These assumptions form the root of trust for the system.
For example, common assumptions in microprocessor-based systems are that data written to memory will be read back correctly and that the computer's internal state is secret unless an I/O operation is made that explicitly writes the data out. The root of trust can be different for different systems, but if a system is to provide any kind of assurance, there must be a root of trust.

Untrustworthy components can be useful parts of a trustworthy system as long as they are outside the core root of trust, explicitly untrusted, and appropriate design measures are taken. For example, consider a computer where the secrecy of the data in main memory cannot be ensured. The architecture can encrypt data when writing it to main memory and decrypt it when reading it, thereby eliminating exposure of secret data. Here, although the main memory is untrusted, the encryption, decryption, and key management are trusted. This is a typical example of trust relocation. The architect has freedom to relocate the root of trust but cannot eliminate it.

1.2 Trustworthy Hardware

Traditionally, in most information systems, the root of trust has been the behavior of the hardware. This trust has been informally justified by the relative difficulty for an attacker to change the behavior of the hardware. Although it remains to be seen whether trustworthy hardware is actually a requirement for trustworthy computation, it is certainly a convenient assumption. The field of hardware security aims to make the assumption correct.

When we say that hardware is secure, we mean that it provides functionality that can be relied on despite physical threats. For example, a soda vending machine is secure hardware. It is intended to behave correctly in a moderately hostile physical environment. In contrast, personal computers generally are not secure hardware. In a hostile physical environment, most personal computers yield total control to an attacker.
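The trust-relocation example above, where a trusted controller encrypts data on its way to untrusted main memory and decrypts it on the way back, can be sketched as follows. This is an illustrative toy, not a design from this dissertation: the class names are invented, and the XOR-with-hashed-keystream cipher merely stands in for a real memory encryption scheme so the sketch has no external dependencies.

```python
import hashlib

class UntrustedRAM:
    """Models main memory whose contents an attacker can freely inspect."""
    def __init__(self):
        self.cells = {}

    def write(self, addr, word):
        self.cells[addr] = word

    def read(self, addr):
        return self.cells[addr]

class EncryptingController:
    """Trusted encrypt-on-write / decrypt-on-read path. The root of trust
    is relocated from the RAM to this controller and its key.
    NOTE: the toy keystream below is NOT a secure cipher; it only
    illustrates the architectural pattern."""
    def __init__(self, ram, key: bytes):
        self.ram = ram
        self.key = key

    def _pad(self, addr: int) -> bytes:
        # Per-address keystream derived from the secret key.
        return hashlib.sha256(self.key + addr.to_bytes(8, "big")).digest()

    def write(self, addr: int, data: bytes):
        pad = self._pad(addr)
        self.ram.write(addr, bytes(d ^ p for d, p in zip(data, pad)))

    def read(self, addr: int) -> bytes:
        pad = self._pad(addr)
        return bytes(c ^ p for c, p in zip(self.ram.read(addr), pad))

ram = UntrustedRAM()
ctrl = EncryptingController(ram, key=b"device-unique secret")
ctrl.write(0x1000, b"secret data")
assert ram.read(0x1000) != b"secret data"   # attacker sees only ciphertext
assert ctrl.read(0x1000) == b"secret data"  # trusted path recovers the data
```

The point of the sketch is architectural: the RAM object never holds plaintext, so its trustworthiness is no longer an assumption the system depends on; the controller and its key are.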
Once a personal computer enters the attacker's physical control, the machine can no longer be trusted by its rightful owner. Trusted platform modules are only a partial remedy. In their most common use case, they are islands of trust in a sea of untrustworthy components. Personal computers with trusted platform modules offer little security in physically hostile environments.

1.3 Testing

Testing and security are closely related. Both aim to provide some kind of assurance to humans about what we can expect from a system. A battery tester tells us how much more life we can expect from a battery. A security evaluation of a system tells us what threats the system can be expected to resist. Testing, as opposed to measurement, is rarely passive. Testing typically involves application of a stimulus and observation of a response. Since this generalization encapsulates all functional interactions between modules in a system, the stimulus-response frameworks that are built for testing are often used for other maintenance tasks, like configuring and programming the system.

Chapter 2 Security of Digital System Testing

2.1 Introduction

Test interfaces are present in nearly all digital hardware. In many cases, the security of the system depends on the security of the test interfaces. Systems have been hacked in the field using test interfaces as an avenue for attack. Researchers in industry and academia have developed defenses over the past twenty years. A diligent designer can significantly reduce the chance of system exploitation by understanding known threats and applying known defenses.

2.1.1 Development of Test Interfaces

Test interfaces have been part of man-made systems for at least 100 years. They address a need for applying a stimulus and/or making an observation via a path other than the primary functional path. For example, large tanks holding liquids or gases usually have inspection ports.
These ports allow the internal condition of the tank to be visually evaluated to avoid unexpected failure due to corrosion. Otherwise, it would be difficult to predict failure. Brake systems on cars often have inspection holes in the calipers. This allows the condition of the brake pads to be assessed without disassembling the brakes. More than just the functional question of whether the brakes work, the inspection hole allows the mechanic to answer the deeper question of how much more life is left in the brake pads. In areas where operational reliability and efficiency are valued, features are added to products to make them testable, to let their maintainers probe their internal condition.

As electronic devices grew more complex in the mid-20th century, it became difficult to tune them or diagnose problems with only an input-output view of the system. Take, for example, a 1960s radio receiver. These receivers contain several filters and mixers cascaded to form the desired frequency response. There are dozens of adjustments, many of which interact, and all of which affect the output. Optimal receiver performance is achieved for a specific vector of settings. Applying a signal to the input while observing the output, it is almost impossible for the technician to infer which adjustment to change to bring the receiver closer to correct alignment. To make their equipment maintainable, manufacturers provided test points in their circuits, where signals could be measured or injected. This allowed the circuit to be structurally decomposed to make maintenance straightforward. Each section can be independently aligned, a process involving only a small number of adjustments.

When electronic computers were first developed in the 1940s and 50s, it was customary to write "test" or "checkout" programs that could be run on the system to verify correct functionality of the hardware.
Test programs were designed so that if one failed, it would provide the technician with an indication of where it failed, speeding diagnosis and repair. The method of running programs on the computer to test the computer is really just functional testing, and since there isn't enough time for the tests to cover all possible states and transitions of the hardware, this testing paradigm can never provide rigorous assurance of the hardware even if all of the tests pass.

As the complexity of computers grew in the 1960s, designers sought stronger assurance from testing, and faster fault isolation. From an operational standpoint in the field, designers wanted to minimize the mean time between when a fault is detected and when the system is back up. As computers began to play crucial roles in real-time operations, high availability became a goal, in addition to the traditional performance goals. All of these factors led major computer developers such as IBM to develop techniques for testing the structural blocks independently.

2.1.2 Example: Testing a Two-bit State Machine

As an illustration of a digital circuit testing problem, consider testing a 2-bit synchronous circuit with the state diagram shown in Figure 2.1. The circuit has two inputs: clock and reset. It has one output signal, which is high if and only if the state is S3. On every rising edge of the clock signal, the next state is determined as follows:

• If the reset signal is asserted, the next state is S0.
• Otherwise, the next state is (oldstate + 1) mod 4.

Figure 2.1: A two-bit counter with synchronous reset has four states. From each state, there are two possible next states. This realization provides an output signal that is asserted when the counter is in state S03.

The process of testing a practical state machine implementation is affected by three main considerations: (1) What are we trying to determine?
(2) What can we assume about the device under test? (3) How is our testing constrained?

2.1.2.1 What are we trying to determine?

For the first question, one possible answer is that we are trying to determine whether the device under test is equivalent to a reference implementation or model. Equivalence, in this case, means that for all possible input sequences, the outputs are the same. Another possibility is that we aim to determine whether the device is equivalent to the reference implementation for some restricted set of inputs. Yet another possibility is that we are checking whether an invariant rule is satisfied for some set of input sequences.

2.1.2.2 What can we assume?

Assumption 1: Number of States

The second question we must ask when testing a circuit is what we can assume. A state machine is composed of a set of states, a set of edges (transitions), a set of inputs, a set of outputs, and an initial state. It is profoundly helpful to know how many states the system has, that is, how many flip-flops are in the circuit. If we know how many states it has, we can, for example, apply the pumping lemma for regular languages [3] to place limits on the state machine's behavior. The pumping lemma implies that if the number of states is finite, there must be an infinite set of input sequences that result in the same final state. This has practical ramifications for testing. If the number of states is bounded, it is possible, at least in theory, to completely test the circuit using a finite set of test vectors.

Under adverse security circumstances, we are not able to safely assume that the device under test has the number of states it is specified to have. A trojan horse might be inserted into the circuit during design or fabrication. A typical trojan horse waits for a trigger, which is a special pattern or sequence. When the trigger occurs in, for example, the input data stream, the trojan activates its payload.
A payload carries out a malicious action which can be as complex as executing an embedded program, or as simple as halting or resetting the system. Trojans are designed to pass normal testing, so they typically contain all of the benign specified logic, plus extra logic for trigger detection and payload execution. Consequently, from a testing standpoint, the trojan is more an extra feature than a defect. Testing for the presence of extra state variables is exceedingly difficult. Consider testing a system that is intended to implement the four-state state machine shown in Figure 2.1. There are two inputs: clock and reset. There is one output. The output is "1" when the state is S3. The system could faithfully implement the intended state machine (Figure 2.1), or it might, for example, implement the state machine shown in Figure 2.2, where the output is "1" when the state is S03, S13, S23, or S33.

Figure 2.2: The machine behaves similarly to the machine shown in Figure 2.1, but deviates for certain rare inputs. Starting in initial state S00, if RST is given the sequence 0,1,0,0,1,0,0,0 the machine enters state S23, at which point the behavior of the system deviates from that shown in Figure 2.1. S23 is a terminal state. The only way to exit S23 is to reinitialize the system (e.g., cycle the power).

For normal inputs, the machine shown in Figure 2.2 might be indistinguishable from the machine in Figure 2.1, but it has more states than the intended design. Certain rare sequences of inputs cause the two machines to differ in their outputs. Black-box testing would be much more likely to conclude that the state machines are the same than to find the difference.
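To make this concrete, the following Python sketch simulates both machines. The intended counter follows Figure 2.1, with the output asserted in the counter's final state. The trojaned machine is one hypothetical realization consistent with the state traces described in the text: its hidden-counter update rule is an assumption for illustration, not taken from the figure.

```python
def clean_step(state, rst):
    # Intended two-bit counter (Figure 2.1): synchronous reset to S0,
    # otherwise increment mod 4.
    return 0 if rst else (state + 1) % 4

def trojan_step(state, rst):
    # Hypothetical 16-state realization in the spirit of Figure 2.2.
    # A hidden counter i advances only when reset arrives while the
    # visible counter j equals i+1; state (i=2, j=3) is terminal.
    # This update rule is an illustrative assumption.
    i, j = state
    if (i, j) == (2, 3):
        return (2, 3)                        # locked up: the payload
    if rst:
        return (i + 1, 0) if j == i + 1 else (i, 0)
    return (i, (j + 1) % 4)

def run(step, s0, inputs, out):
    # Apply an RST sequence and record the output after each clock.
    outs = [out(s0)]
    for r in inputs:
        s0 = step(s0, r)
        outs.append(out(s0))
    return outs

# The RST sequence of Table 2.1, which exercises every specified edge.
table_2_1 = [1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1]
clean_out = run(clean_step, 0, table_2_1, lambda s: int(s == 3))
troj_out = run(trojan_step, (0, 0), table_2_1, lambda s: int(s[1] == 3))
print(clean_out == troj_out)  # True: the test cannot tell them apart

# The rare trigger sequence, followed by a reset and one more clock.
trigger = [0, 1, 0, 0, 1, 0, 0, 0]
c = run(clean_step, 0, trigger + [1, 0], lambda s: int(s == 3))
t = run(trojan_step, (0, 0), trigger + [1, 0], lambda s: int(s[1] == 3))
print(c == t)  # False: after the trigger, the trojaned machine locks up
```

The sketch shows why black-box testing is so weak here: the full edge-covering test sequence produces identical output traces, while only the one rare trigger sequence separates the two machines.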
The rare sequence that causes them to differ can be the trigger for an embedded trojan, and the way in which they differ can be the payload of that trojan. For example, if we assume that the machine has four states, as intended, then we would expect the test sequence shown in Table 2.1 to thoroughly test the circuit.

RST  Current Output  Transition
1    X               X → S0
0    0               S0 → S1
0    0               S1 → S2
0    0               S2 → S3
0    1               S3 → S0
1    0               S0 → S0
0    0               S0 → S1
0    0               S1 → S2
1    0               S2 → S0
0    0               S0 → S1
1    0               S1 → S0
0    0               S0 → S1
0    1               S1 → S2
0    0               S2 → S3
1    1               S3 → S0

Table 2.1: Test routine for a two-bit counter. All edges of the specified state transition graph are tested. Since we cannot force the counter directly into an arbitrary state, we must sequentially visit the states and test each of the edges while observing the functional output.

However, consider the effect of that test sequence on the machine shown in Figure 2.2. The machine goes through the following sequence of states: S00, S01, S02, S03, S00, S00, S01, S02, S00, S01, S10, S11, S12, S13, S10. At no point during the test sequence does its externally observable behavior differ from the intended behavior, that shown in Figure 2.1, although the final state is not the initial state. In the case of this example, running the test sequence repeatedly will not uncover any differences between the actual state machine and the specified one. Although the Figure 2.1 system and Figure 2.2 system are the same for the Table 2.1 test sequence, they behave quite differently for other test sequences, specifically, any sequence that puts the Figure 2.2 system into the S23 state, where it locks up. In summary, when testing a state machine, we make an assumption about the number of states. If our assumption is wrong, we are likely to make invalid inferences about the system under test.

Assumption 2: Determinism or Randomness

A pivotal assumption is that the device under test is deterministic.
If it is not, then we must characterize its behavior statistically instead of logically. That completely changes the nature of the testing procedure. One example is testing a pseudorandom bit sequence generator. The output is expected to satisfy certain statistical requirements. There are standard requirements, such as Golomb's randomness postulates. A single test is not sufficient to establish the randomness of a sequence. Standard suites of tests have been developed for the purpose of comparatively evaluating cryptographic systems [4]. Related to the testing of pseudorandom sequence generators is the testing of "true" randomness sources, which derive their randomness from a physical source. Typically a diode is used as a noise source, which is then digitized to produce random bits. Testing such devices for their fitness in security-critical applications involves several special criteria, such as their immunity to external factors influencing their output.

2.1.2.3 How is our testing constrained?

Systems that only allow interaction in their normal functional use pattern demand black-box testing. This constraint can appear for a variety of reasons, and has wide-ranging implications. One reason for the absence of test features is their perceived cost, either in engineering effort or in production cost. The implications of being forced to resort to black-box testing are an exponential increase in the time required to test a system, and decreased confidence that it is thoroughly tested. When we are not constrained to black-box testing, the practical approach to testing, for example, a counter, is to structurally decompose it and test the components and the interconnections, and then argue that if the components are good, and the interconnections are good, then the system is good. When we decompose the circuit, we break it into small islands of logic that are easily testable, avoiding the exponential explosion of test complexity described above.
A 2-bit counter can be implemented as two 1-bit sections cascaded. A 128-bit counter can be implemented as 128 cascaded sections, each containing a flip-flop and three logic gates, as shown in Figure 2.3.

Figure 2.3: A cascadable section of a synchronous binary counter with synchronous reset.

If suitable test structures are provided, the 128 sections can be tested independently. The number of tests necessary for one isolated section is

#tests = 2^F × 2^I

where F is the number of flip-flops in each section and I is the number of inputs to each section. We have F=1 and I=2, so eight tests are required per section of the counter. If the stages of a B-bit counter are tested sequentially, the number of tests is

#tests = 2^F × 2^I × B = 8B

Without structural decomposition, we have to visit each state and test the circuit's response to the RESET=0 input and the RESET=1 input. This requires two tests per state, so

#tests = 2 × 2^B = 2^(B+1)

Structural testing is not just somewhat more efficient than black-box testing. It is profoundly more efficient. The number of tests required for structural testing is of linear complexity, O(B), while black-box testing is of exponential complexity, O(2^B). Similar general results apply to circuits other than counters. The total complexity of the pieces of a logic circuit is almost always less than the complexity of the circuit as a whole. Consequently black-box testing of whole circuits is avoided for all but the simplest systems.

2.1.3 Fault Testing versus Trojan Detection

The standard goals of test engineering are to detect flaws that occur naturally during fabrication and isolate the exact location of the flaw. Any significant deviation from the behavior intended by the designers is considered a fault. By this definition, a piece of malicious logic that is added to the design by an attacker before fabrication is a fault.
Although it might seem elegant, and it is certainly correct, to group malicious modifications with manufacturing faults, it is not practical to do so. The fault models assumed when testing for manufacturing faults do not include the changes that would be made by a malicious party who adds some kind of trojan horse to the design.

2.1.4 VLSI Testing: Goals and Metrics

VLSI testing is always done in terms of a prescribed fault model. For example, a common fault model for logic circuitry is that each node can be correct, stuck at 1, or stuck at 0. If this stuck-at fault model is used, the goal of testing is to determine, for each node, which of the three conditions describes it. A set of tests is said to cover a fault if the test would detect the fault, if the fault were present in the device being tested. Test coverage is the percentage of possible faults that are covered by a given set of tests. In many cases, it is practical to create a set of tests that has 100% coverage. In some cases, 100% coverage is not reachable for a practical number of test vectors. The number of test vectors that are needed for 100% test coverage is an indication of the testability of the circuit. Two factors are important in determining the testability of the circuit:

• controllability
• observability

Certain topologies are known to result in poor testability. One example is reconvergent fanout. This is when a signal fans out from a single node, follows multiple parallel paths, and then reconverges into a single node. This topology exhibits poor testability because the signals along the parallel paths are not independently controllable. Logic gates with a large number of inputs, particularly XOR gates, are also problematic. Design for Test (DFT) is a design process that runs in parallel with the functional design process. The goal of DFT is to ensure that the final design meets testability goals while minimizing the costs associated with testing.
There are four main costs that relate to testing.

• Die area cost
• Time required to apply tests
• Cost of the required testing station
• Computation required to generate test vectors

2.1.5 Conflict Between Testability and Security

Conventional security best practices are to conceptualize the system as a set of modules which expose simple interfaces and hide their internal implementation. This type of control limits the complexity of the interactions in the system. However, black-box testing is profoundly inefficient. Providing controllability and observability of the internal elements within a module makes that module testable, but the module then loses the security that comes from having a single, restricted interface. Thus, there is an apparent conflict between security and testability.

2.2 Scan-based Testing

As we discussed in the previous section, an important approach for achieving testability in a VLSI chip is to include elements in the design that allow it to be structurally decomposed for testing. Most designs are synchronous, meaning that they are composed of logic and flip-flops, and the flip-flops only change state during the rising or falling edges of the clock. Synchronous designs can be separated into a set of logic gates, which can be tested by verifying their truth table, and a set of flip-flops, which can be tested by chaining them to form a shift register. This scan-based testing paradigm replaces the regular flip-flops in the design with "scan flip-flops" as shown in Figure 2.4.

Figure 2.4: The simplest scan flip-flop cell is simply composed of a multiplexer and a regular D flip-flop. The Q output of one scan cell can be connected to the TEST INPUT of another scan cell, enabling a chain configuration.

These are flip-flops with input multiplexers which select between the regular "functional" input and the test-mode input.
Typically, the test-mode input comes from the output of another flip-flop, thus forming a scan chain. The operation of a scan chain can be thought of as having three phases:

• Assert test mode. All flip-flops are configured into a distributed shift register. Test data is shifted in. This data is applied to the inputs of the logic.
• Deassert test mode. Flip-flops are configured to get their data from the outputs of the logic. The flip-flops are clocked, thus latching in the output value of the logic.
• Reassert test mode. All flip-flops are configured into a distributed shift register. Test data is shifted out. This data is returned to the tester for analysis and comparison with the expected output.

Using this testing method, the tester only tests combinational logic, instead of testing state machines. This amounts to a profound reduction in the testing time required to achieve a given level of test coverage.

2.2.1 Scan-based Attacks

Systems that support scan-based testing are potentially vulnerable to attacks that use the scan chains as vectors for reading or modifying sensitive data contained in the device. Cryptographic keys are common targets for this kind of attack. The most basic scan attack applies to chips that contain a key register that can be scanned. In this attack the attacker typically connects to the JTAG port of the victim chip, selects the scan chain that contains the key register, and shifts out the key. Less naive chips avoid having the key directly scannable. However, simply excluding the key register from the scan chain is not sufficient to prevent a skilled attacker from extracting the key. Yang, Wu, and Karri [5] present a practical attack against an AES hardware implementation with a non-scannable key register. They exploit the scan chains as an information leakage path, allowing recovery of the crypto key. They assume that the key is not directly scannable, and their attack uses indirect information to reconstruct the key.
They also assume that the attacker does not have knowledge of the internal scan chain structure of the crypto chip. The attack begins with discovering the scan chain structure, determining which bit positions in the scan chain correspond to the bits of the intermediate result register. Next, using the observability of the intermediate result, the attacker recovers the round key. The attack shows that even if the key register is not exposed directly through any scan chain, the byproducts of the key contain enough information for the attacker to infer the key bits.

2.2.2 Countermeasures for Scan Attacks

Countermeasures are available for protecting against scan attacks. There is a tradeoff between security and testability. Effective countermeasures for scan attacks must simultaneously provide acceptable levels of security and testability. Hely, Flottes, Bancel, et al. [6] observe that a static assignment of functional registers to positions in the scan chain is risky because an attacker can infer the assignment and then use the scan chain to extract secret information from the chip. To mitigate this threat, they introduce Scan Chain Scrambling. For the authorized user, a test key is provided to the chip and the assignment of registers to scan chain positions is static. For an attacker without the ability to authenticate, the assignment is semi-random. Chunks of the set of scannable flip-flops are mapped to chunks of the scan chain sequence, but the order will be unknown to the attacker. The permutation changes periodically while the chip is operating. Yang, Wu, and Karri [7] propose the "Secure Scan" scheme for protecting embedded keys from being leaked via scan chains. To allow the crypto hardware to be fully tested without exposing the key or its byproducts, Secure Scan introduces a second embedded key, the mirror key, for use during testing.

Figure 2.5: Secure Scan state diagram.
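The mode discipline summarized in Figure 2.5 can be captured in a few lines of Python. This is a behavioral sketch only; the class and method names are invented for illustration and are not the authors' implementation.

```python
class SecureScanChip:
    """Behavioral sketch of the Secure Scan mode discipline (Figure 2.5).

    In insecure (test) mode the mirror key is in use and scanning works;
    entering secure (mission) mode loads the mission key and disables
    scanning; only a power cycle returns the chip to insecure mode.
    All names here are illustrative, not from the Secure Scan paper.
    """

    def __init__(self, mission_key):
        self._mission_key = mission_key      # nonvolatile; survives power cycles
        self.power_cycle()

    def power_cycle(self):
        self.mode = "insecure"               # volatile state is wiped
        self.active_key = "MIRROR_KEY"       # scan-safe dummy key

    def enter_secure_mode(self):
        self.mode = "secure"
        self.active_key = self._mission_key  # mission key now in use

    def scan_out(self):
        if self.mode == "secure":
            raise PermissionError("scan access disabled in secure mode")
        return self.active_key               # only ever exposes the mirror key

chip = SecureScanChip(mission_key="SECRET")
print(chip.scan_out())        # MIRROR_KEY: testable without exposing the key
chip.enter_secure_mode()
try:
    chip.scan_out()
except PermissionError:
    print("scan blocked")     # no scan path while the mission key is loaded
chip.power_cycle()            # the only way back to insecure mode
print(chip.scan_out())        # MIRROR_KEY again; mission key never scanned
```

The key property the sketch captures is the one-way transition: there is no method that re-enables scanning while the mission key is loaded.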
The only way to get from secure mode, where the mission key is loaded, to insecure mode, where the chip is testable, is to go through a power cycle reset, which wipes all volatile state variables. At any moment, as shown in Figure 2.5, the chip is either in mission mode ("Secure mode") or test mode ("Insecure mode"). When the chip is in secure mode, the mission key is used but scanning is disallowed. When the chip is in insecure mode, scanning is allowed, but only the mirror key is used. It is impossible to get from secure mode to insecure mode without powering off the chip. Secure Scan thus allows almost complete structural testability without exposing the mission key. Since the mission key is not used while the chip is in insecure mode, the mission key is obviously untestable. This is, however, not a serious drawback in practice since the correctness of the key can be verified quickly by functional testing.

Figure 2.6: Secure Scan architecture. The mirror key register (MKR) is loaded only when Load Key is active, which is controlled by the state machine shown in Figure 2.5.

Lee, Tehranipoor, and Plusquellic [8] point out that the intellectual property contained in an integrated circuit is also at risk because of scan chains. An attacker that can control and observe signals inside a chip may be able to infer the logical structure of the chip, and thereby learn the design. To prevent this, the authors introduce a technique they call Low-Cost Secure Scan, which blocks unauthorized access to the scan chains. The technique requires modification to the algorithms used for inserting scan chains in designs, but the scope of the protection includes the intellectual property of the design, not just embedded keys. To use the Low-Cost Secure Scan system, the test vectors that are applied contain key bits in addition to the test stimulus. If the key bits are correct, the chip produces the pre-calculated output. Otherwise, the test response is pseudorandom. The pseudorandom response is intended to raise the difficulty for an attacker who wishes to launch a guessing attack, as opposed to giving an immediate explicit indication of whether the attacker's guess is correct.

2.3 BIST
Built-In Self Test (BIST) is a popular technique for testing hardware without requiring external test equipment. There are many varieties of BIST designed to test different kinds of circuits and to detect different classes of faults. A common feature of all BIST systems is that a large volume of test data moves between the on-chip BIST logic and the circuit under test, while a minimal amount of data moves between the chip and its surrounding system. In the extreme, the chip can simply produce a one-bit status indication of whether it is working correctly or there is an error. Two of the most common varieties of BIST are memory BIST and logic BIST. In principle, differentiating between memory and logic is not necessary. In practice, testing techniques that target a specific type of circuit are much more efficient in terms of test time required to get adequate fault coverage. Typical memory BIST uses a state machine or a small embedded microprocessor to generate memory access signals that carry out a standard memory test algorithm such as the March series of tests [9]. In a typical logic BIST setup, pseudorandom test vectors are generated on-chip and applied to the circuit under test. The responses are compacted and aggregated during many test cycles, using a Multiple-Input Signature Register (MISR). This produces a fixed-length final value that is compared with the expected value, which is hard-coded into the chip. From a security standpoint, BIST has many ramifications. In both logic and memory testing, the BIST controller can act as a trusted proxy between the tester and the chip's core logic. This architecture can raise security by enforcing a limited set of actions that can be taken by a tester.
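The logic BIST flow described above, pseudorandom patterns from an LFSR, response compaction in a MISR, and comparison against a hard-coded signature, can be sketched in a few lines. The 4-bit circuit, feedback polynomial, and stuck-at fault below are toy choices for illustration, not from any particular chip.

```python
def lfsr_step(state, width=4, taps=(3, 2)):
    # 4-bit maximal-length LFSR (x^4 + x^3 + 1): on-chip pattern generator.
    fb = 0
    for t in taps:
        fb ^= (state >> t) & 1
    return ((state << 1) | fb) & ((1 << width) - 1)

def misr_step(sig, response):
    # MISR: shift the signature through the same feedback structure,
    # then fold in the parallel response bits.
    return lfsr_step(sig) ^ response

def run_bist(circuit, n_patterns=15, seed=1):
    # Apply pseudorandom patterns, compact responses into one signature.
    sig, pat = 0, seed
    for _ in range(n_patterns):
        sig = misr_step(sig, circuit(pat))
        pat = lfsr_step(pat)
    return sig

good = lambda a: (a + 3) & 0xF          # toy 4-bit circuit under test
faulty = lambda a: good(a) & 0b1101     # same circuit, output bit 1 stuck at 0

good_sig = run_bist(good)               # "golden" signature, hard-coded on chip
faulty_sig = run_bist(faulty)
print(good_sig != faulty_sig)           # True: the fault corrupts the signature
```

Note that compaction admits aliasing: a faulty circuit can, with low probability, produce the golden signature, which is one reason MISR widths and pattern counts are chosen generously in practice.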
The enforcement of limited interfaces is consistent with good security design (e.g., the principle of least privilege). The tester of cryptographic logic needs assurance that the hardware is working correctly, but shouldn't necessarily have access to secret data in the chip, such as an embedded key. Similarly, BIST can provide assurance that a memory is error-free while eliminating the possibility of misuse of memory "debugging" functionality for tampering with security-critical data. Despite BIST's advantages of running at full functional clock speed and improving security, it has two problems. First, it typically provides fault detection, but not fault isolation. Second, it adds area to the chip. A BIST implementation contains a test pattern generator and an output response analyzer. Both of these hardware modules occupy area. However, in certain applications this area cost can be eliminated. A technique called Crypto BIST [10] uses a symmetric cipher core (AES) to test itself. By looping the output of the AES core back into the input, the AES core functions as both the test pattern generator and the output response analyzer. Crypto BIST achieves 100% stuck-at fault coverage of the AES core in 120 clock cycles of test time. The essence of the technique is the observation that the strict avalanche criterion of the cryptographic algorithm causes the AES core to act as both a diverse source of test patterns and a sensitive output response analyzer, leading to high test coverage in few cycles.

2.4 JTAG

In the 1970s and early 1980s, a common way of testing printed circuit boards was to add test points, and to probe these test points with a bed-of-nails test fixture, as shown in Figure 2.7.

Figure 2.7: Bed of nails test fixture. Automated test equipment (ATE) generates stimulus signals and measures responses. The ATE is connected to the test fixture, which contains one nail per test channel.
Each nail is spring-loaded so it maintains a controlled pressure when contacting the test points on the printed circuit board being tested. This approach could not keep up with increases in component density and pin spacing. Alternative methods of testing were developed. As always, cost was a major factor affecting the choice of test method. Interoperability was also a factor, since components from many manufacturers coexist on large printed circuit boards. Having a single test interface from which all components could be tested was desired. The solution was developed by a working group known as the Joint Test Access Group in the 1980s. This became IEEE Standard 1149.1 and is widely referred to simply as JTAG. IEEE 1149.1 standardizes the set of signals used to access test logic inside chips. The standard specifies the use of scan-based testing for the internal logic of the chip and also for the inter-chip wiring. JTAG uses synchronous serial communication with separate signals for data and control. Using 1149.1, a tester can force signals on pins, read signals on pins, apply signals to the core logic, read signals from the core logic, and invoke arbitrary custom test functions that might exist in certain chips. However complicated the testing task might be, the communication always takes place over the same wires:

• TCK - test clock; while in test mode, all events happen on edges of TCK
• TMS - test mode select; determines the next state of the JTAG port
• TDI - test data in; test vectors and JTAG instructions are applied via TDI
• TDO - test data out; test responses or data that loops through
• TRST - test reset; optional hardware reset signal for the test logic

Figure 2.8: The JTAG state machine. There are 16 states.
The TMS signal determines the next state. The SHIFT DR state is used for applying stimuli and collecting responses. From any state, the TEST LOGIC RESET state can be reached by holding TMS high for five clock cycles. An important feature of JTAG is its support for daisy chaining. Devices can be wired in a chain where the TDO (output) of one device is applied to the TDI (input) of the next device in the chain, as shown in Figure 2.9. The TCK, TMS, and TRST signals can be simply bussed to all chips in the chain, within fan-out limits. Otherwise, buffers can be used. Each chip in the JTAG chain has a state machine implementing the protocol shown in Figure 2.8.

Figure 2.9: A typical JTAG system. TMS, TCK, and TRST are bussed to all of the devices. TDO of each component is connected to TDI of the next component, thereby forming a daisy-chain topology.

One of the state variables controlled by the state machine is the Instruction Register (IR), shown in Figure 2.10. The instruction register is typically between 4 and 16 bits. Some instructions are mandated by the JTAG standard, while implementers are free to define as many of their own instructions as they like. One of the most important instructions is the required instruction, BYPASS. When the IR contains the BYPASS opcode, a JTAG-compliant chip places a single flip-flop in the path from its TDI input to its TDO output. Therefore a chain of chips in the BYPASS state behaves like a shift register.

Figure 2.10: The essential components of a basic JTAG implementation include a test access port state machine, an instruction register, one or more data registers, and an output multiplexer.
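The 16-state TAP controller of Figure 2.8 is small enough to express as a transition table. The following Python sketch encodes the standard IEEE 1149.1 state graph and checks the property mentioned above: five clocks with TMS high reach TEST LOGIC RESET from any state.

```python
# IEEE 1149.1 TAP controller (Figure 2.8):
# state -> (next state on TMS=0, next state on TMS=1)
TAP = {
    "TEST_LOGIC_RESET": ("RUN_TEST_IDLE", "TEST_LOGIC_RESET"),
    "RUN_TEST_IDLE":    ("RUN_TEST_IDLE", "SELECT_DR"),
    "SELECT_DR":        ("CAPTURE_DR",    "SELECT_IR"),
    "CAPTURE_DR":       ("SHIFT_DR",      "EXIT1_DR"),
    "SHIFT_DR":         ("SHIFT_DR",      "EXIT1_DR"),
    "EXIT1_DR":         ("PAUSE_DR",      "UPDATE_DR"),
    "PAUSE_DR":         ("PAUSE_DR",      "EXIT2_DR"),
    "EXIT2_DR":         ("SHIFT_DR",      "UPDATE_DR"),
    "UPDATE_DR":        ("RUN_TEST_IDLE", "SELECT_DR"),
    "SELECT_IR":        ("CAPTURE_IR",    "TEST_LOGIC_RESET"),
    "CAPTURE_IR":       ("SHIFT_IR",      "EXIT1_IR"),
    "SHIFT_IR":         ("SHIFT_IR",      "EXIT1_IR"),
    "EXIT1_IR":         ("PAUSE_IR",      "UPDATE_IR"),
    "PAUSE_IR":         ("PAUSE_IR",      "EXIT2_IR"),
    "EXIT2_IR":         ("SHIFT_IR",      "UPDATE_IR"),
    "UPDATE_IR":        ("RUN_TEST_IDLE", "SELECT_DR"),
}

def step(state, tms):
    return TAP[state][tms]

# Holding TMS high for five clocks resets the port from any state.
for start in TAP:
    s = start
    for _ in range(5):
        s = step(s, 1)
    assert s == "TEST_LOGIC_RESET"
print("all 16 states reach TEST_LOGIC_RESET after five TMS=1 clocks")
```

This synchronizing property is what lets a tester (or an attacker) put every chip in a daisy chain into a known state using only the shared TMS wire.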
Each chain of scan flip-flop cells (internal or boundary) appears to JTAG as a data register that can be selected with the appropriate instruction.

2.4.1 JTAG hacks

JTAG has played a part in many attacks on the security of digital hardware. Attackers have used it to copy cryptographic keys out of satellite boxes for the purpose of pirating satellite TV service [11]. The JTAG port in Microsoft's Xbox 360 has been exploited to circumvent the DRM policies of the device [12]. Powerful low-level capabilities are often exposed through the JTAG interfaces of systems. Attackers have learned this, and when they attack a device, they look for a JTAG port, among other things. Rosenfeld and Karri [13] examine the threat of JTAG-level hacking. Specific attention is given to the vulnerabilities that result from the common daisy-chain topology of JTAG wiring. They consider the possibility of one malicious node in a JTAG chain attacking other nodes or deceiving the tester. They examine the threat of a malicious node hijacking the bus by forcing the control signals. With two nodes (the tester and the attacker) both driving a bus wire at the same time, it becomes an analog question of which node will win. The research showed that it was often possible for the attacking node to hijack the bus when the JTAG bus wires are short, and always possible to hijack when the bus wires are long, due to the pulse propagation properties of transmission lines.

2.4.2 JTAG Defenses

Several defenses for JTAG have been proposed over the years. When considering JTAG defenses, it is important to keep in mind the many constraints and requirements that affect the design process. For example, flexibility to provide in-field firmware updates is often valuable, but for this to be secure, some sort of authentication mechanism is required. Some applications have tight requirements on cost and cannot tolerate the extra circuitry required for authentication.
As always in engineering, there are trade-offs, and making the best choice requires a detailed understanding of the application.

2.4.2.1 Elimination of JTAG

One way to eliminate the risks associated with JTAG is to eliminate JTAG from the design. There are several ways this can be done while maintaining a low escape rate, the probability of a defective part being shipped to a customer. One method is simply to use conservative design rules. Common sources of manufacturing faults are shorts and opens in the metal wiring of the chip. If wires are made wider, and spacing between wires is kept greater, many manufacturing faults are eliminated. If transistors have non-minimum gate length, that eliminates another source of faults. This approach has a high cost in area and speed. Another method, and one that is very popular, is to use Built-In Self Test (BIST), discussed in Section 2.3. The result of running BIST can be as simple as a single bit indicating whether the chip passes the tests or fails. In this form, BIST provides security benefits because internal scan can be eliminated from the set of supported JTAG instructions, thus significantly reducing the chip's attack surface. BIST, however, is not always a satisfactory replacement for scan-based testing. Since BIST test vectors are generated pseudorandomly instead of deliberately, using an automated test pattern generation (ATPG) algorithm, it can be difficult to get full test coverage using BIST. This is partially offset by the fact that BIST is typically done at-speed, meaning that test vectors are applied at the same rate that functional data would normally be applied. In contrast, when test vectors are applied using external automated test equipment, the test clock is typically an order of magnitude slower than the functional clock. Another disadvantage of BIST is that it does not provide fault isolation.
For the engineers developing a chip, it is essential to be able to quickly iterate toward a successful design that can be shipped. Without the ability to determine the location and type of the fault, designers are not able to fix the problem. For this reason, BIST is more useful for testing during full production and in the field, where a failure will simply cause the part to be discarded. Scan-based test infrastructure is often retained in the design, in case it is needed for engineering purposes. 2.4.2.2 Destruction of JTAG Hardware After Use In many circumstances, an argument can be made that the test structures in a chip are only needed at the factory, and constitute nothing more than a security risk once the chip is shipped. In such cases designers sometimes elect to disable the JTAG circuitry on the chip after testing. A common way of implementing this is with fuses that can be electrically blown by the tester. For finer-grained control over what capabilities remain enabled, the design can contain more than one fuse. A patent by Sourgen [14] in 1993 discusses these techniques. The IMX31 microprocessor from Freescale Semiconductor is an ARM-based chip intended for mobile and media applications. This type of embedded processor is often required to protect the data that it processes, as in the case of digital rights management, and the code that it runs, in cases where the system software contains valuable intellectual property. The IMX31 supports four JTAG security modes, selectable by blowing fuses. In mode 4, the JTAG logic allows all possible JTAG operations. In mode 1, only the JTAG operations necessary for interoperability are allowed. Blowing fuses is an irreversible operation. Therefore, the security mode can be raised, but never lowered. This fits well with a common use case of JTAG, where it is used at the factory for testing and perhaps by engineers for in-system debugging in their labs, but should not be used in the field by hackers. 
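The raise-only behavior of fuse-gated security modes can be sketched in software. This is a behavioral model only; the per-mode capability sets below are illustrative, since the text specifies only the most permissive mode (4) and the most restrictive mode (1) of the IMX31:

```python
class JtagSecurityFuses:
    """Model of one-time-programmable fuses gating JTAG capability.

    Blowing a fuse is irreversible, so the security mode can only be
    raised (made more restrictive), never lowered.
    """

    # Hypothetical capability sets; the actual IMX31 mode definitions
    # are not spelled out in the text beyond modes 4 and 1.
    MODE_CAPABILITIES = {
        4: {"bypass", "idcode", "extest", "intest", "debug"},  # everything
        3: {"bypass", "idcode", "extest", "intest"},
        2: {"bypass", "idcode", "extest"},
        1: {"bypass", "idcode"},  # interoperability only
    }

    def __init__(self):
        self.fuses_blown = 0  # ships in the most permissive mode

    def blow_fuse(self):
        """Irreversibly raise the security mode by one step."""
        if self.fuses_blown < 3:
            self.fuses_blown += 1

    @property
    def mode(self):
        return 4 - self.fuses_blown

    def is_allowed(self, instruction):
        return instruction in self.MODE_CAPABILITIES[self.mode]

chip = JtagSecurityFuses()
assert chip.is_allowed("intest")        # mode 4: everything permitted
chip.blow_fuse(); chip.blow_fuse(); chip.blow_fuse()
assert chip.mode == 1
assert chip.is_allowed("bypass")        # interoperability survives
assert not chip.is_allowed("intest")    # internal scan is gone for good
```

Because there is no operation that clears a fuse, the model has no way to move back toward mode 4, mirroring the one-way life-cycle described above.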
2.4.2.3 Password Protection of JTAG Buskey and Frosik developed a scheme they call Protected JTAG [15], which enhances the security of JTAG by requiring authentication and authorization to access particular features of the chip’s test structures. The scheme makes use of a trusted server which uses a pre-shared elliptic curve key pair to prove to the chip that the user’s JTAG access request has been authenticated. The authors anticipate a use case where the tester connects directly to the chip and connects to the trusted server via the Internet, using standard Internet communication security protocols. Once authentication is complete, the chip stays in the authenticated state for the duration of a test session. The benefits of having a separate trusted server for authentication and authorization are that these can be managed independently, after the chip is deployed. For example, a new user can be added anytime, with access to an arbitrary set of test features. A disadvantage of the scheme is the reliance on the continued security and availability of the authentication server. 2.4.2.4 Hiding the JTAG Behind a System Controller One approach to JTAG security is to use a system controller chip. In this architecture, the system controller acts as a proxy for test-related communication with one or more chips, typically on a printed circuit board. This scheme adds security to a system without requiring any modification to the chips themselves. The system controller can enforce sensible security policies such as:
• All accesses must be authenticated.
• Authenticated testers can only access the resources for which they are authorized.
• Only signed and verified firmware updates are permitted.
• Backrev (reverting to a previous version) of firmware is not permitted.
• All communication between the tester and the system controller is cryptographically protected against man-in-the-middle attacks.
The system controller can play an active role in the testing and maintenance of the system, beyond simply acting as a proxy [16]. The system controller can store test sequences and run the tests automatically at power-up time or when a specific test routine is externally invoked. This architecture provides the benefits of BIST as discussed in Section 2.3. A controller supporting this type of approach is commercially available under the name SystemBIST [17]. It also provides functionality for verifying the state of JTAG-connected devices, for example, to verify that the configuration bit file of an FPGA was correctly programmed. As with all practical security, it is not absolute. Successful practical approaches have to strike a balance between cost, functionality, and security. The value of the system controller approach is that it preserves the economy of scale of using commodity chips that don’t directly support any JTAG security enhancements, while providing increased functionality in the form of BIST and defeating some common classes of attacks. 2.4.2.5 Crypto JTAG with Embedded Keys Rosenfeld and Karri [13] introduce a set of security enhancements for JTAG that are backward compatible and interoperable with the original IEEE standard. Compact cryptography modules are included in the security-enhanced chips, allowing the tester to authenticate the chip, preventing unauthorized testing, encrypting test data going to and from the chip, and protecting against modification of test data. Keys are programmed into each chip at the factory and delivered to the user through an out-of-band secure channel. This allows the customer of the chip (e.g., a system integrator) to confirm that it came from the real factory, thus thwarting supply-chain attacks. 2.5 SoC Test Infrastructure The system-on-chip (SoC) is an important paradigm for the integrated circuit industry. The essential idea is that modules can be designed independently and integrated onto a single chip. 
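The confidentiality and integrity protection that the crypto-JTAG scheme of Section 2.4.2.5 provides for test data can be sketched as follows. This is not the authors' implementation: counter-mode SHA-256 stands in for the compact hardware stream cipher, HMAC stands in for the message authentication code, and all key and nonce names are illustrative:

```python
import hashlib
import hmac

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Counter-mode SHA-256 as a software stand-in for the compact
    hardware stream cipher used in the security-enhanced chips."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def protect(key_enc: bytes, key_mac: bytes, nonce: bytes, test_data: bytes):
    """Encrypt test data and tag it, so a malicious node on the chain
    can neither read nor undetectably modify it."""
    ks = keystream(key_enc, nonce, len(test_data))
    ct = bytes(a ^ b for a, b in zip(test_data, ks))
    tag = hmac.new(key_mac, nonce + ct, hashlib.sha256).digest()
    return ct, tag

def unprotect(key_enc: bytes, key_mac: bytes, nonce: bytes, ct: bytes, tag: bytes):
    """Verify the tag, then decrypt; raises if the data was tampered with."""
    expected = hmac.new(key_mac, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("test data modified in transit")
    ks = keystream(key_enc, nonce, len(ct))
    return bytes(a ^ b for a, b in zip(ct, ks))

vectors = b"\x0f\xf0\xaa\x55"  # illustrative test vector bytes
ct, tag = protect(b"K-enc", b"K-mac", b"nonce0", vectors)
assert unprotect(b"K-enc", b"K-mac", b"nonce0", ct, tag) == vectors
```

Since the keys are programmed at the factory and delivered out of band, only a party holding them can produce a valid tag, which is what ties authenticity of the traffic to authenticity of the chip.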
An SoC is in many ways similar to a printed circuit board containing many chips from different manufacturers. Compared to a conventional fully in-house chip development process, SoC development presents new security challenges in how to test, configure, and debug the modules within the chip. A large number of independent entities are involved in a typical SoC development cycle.
• SoC integrator
• Core designer
• CAD tool provider
• IC testing service
• IC fabrication service
• IC packaging service
An additional set of entities are affected by the security of the device after it is deployed.
• End users
• Service and infrastructure providers
• Content creators and other holders of intellectual property
SoCs are subject to a variety of threats at different stages of their life-cycle. The goals of the attacks that exploit the test mechanisms include grabbing cryptographic keys, changing system behavior, and learning secrets about the intellectual property contained in the SoC. 2.5.1 SoC Test Hacks The test access mechanisms that are used in SoCs evolved from those that are used on printed circuit boards. In particular, JTAG has been used both for external interfacing of the chip to the test equipment as well as for internal test access to cores within the die. All of the threats that apply to the test interfaces of monolithic ICs also apply to SoCs. At the same time, a number of new threats affect SoCs, primarily due to the fragmented nature of their design, with different trust assumptions applying to the different modules. 2.5.1.1 Test Bus Snooping SoC test signals can be routed on the chip in several different ways. Several engineering considerations affect test signal routing. Traditionally, the trade-off has been between speed and cost. Wide data paths and dedicated per-core wiring raise cost but give good testing speed. Narrow, often bit-wide, data paths and shared wiring lower cost, but reduce testing speed. 
The trade-offs in test bus design have been studied extensively [18]. In an SoC made of several cores, the optimal configuration often has multiple cores timesharing the same test wiring. Typically, the intention is for test data to be communicated between the tester (master) and the test target (slave). Some implementations of shared test wiring allow malicious cores to snoop the test bus, receiving messages that go to or from another core. The actions that a malicious core can take with snooped test data depend on the system. If, for example, the test data contains cryptographic keys, a malicious core can leak the keys to an attacker through a side-channel. 2.5.1.2 Test Bus Hijacking Another concern for the SoC designer is that on shared test wiring, a malicious core could actively interfere with communications between the tester and the target core. This type of attack is most threatening when test data actually passes through the untrustworthy test logic of the malicious core, as it does in daisy-chained architectures like JTAG. 2.5.2 Defenses for SoC Test Mechanisms Several techniques have been applied by chip companies to address the problem of securing SoC test mechanisms. Successful techniques must, as always, balance security, cost, functionality, and performance goals. Additionally, human factors affect whether new mechanisms will succeed. Engineers are inundated with information during the design process, and often prefer to use mechanisms that are known to work and don’t require learning. An example of this effect can be seen in the continued use of RS-232, even in brand new systems with no legacy. To succeed in the market, enhancements that add security to SoC test interfaces must meet security, cost, functionality, and performance goals and also should not burden the engineers that use them. 2.5.2.1 Elimination of Test Mechanisms The most straightforward approach to securing the test interface is, as always, to eliminate it. 
Elimination of test interfaces is usually not a practical solution because they are often required, for example, for programming firmware into chips and for configuring platform-specific settings. For reasons of economy of scale, it is preferable for a single chip to support multiple use cases. Test interfaces provide a convenient mechanism for low-level configuration of complex chips. Eliminating the test interface from a chip design would limit the market for the chip. The applications where security is a very high concern, such as electronics for defense systems, require in-field testability for maintaining high availability. 2.5.2.2 Elimination of Shared Test Wiring SoC test infrastructures can be secured against attacks by hostile cores by assigning each core its own test bus. This results in a star architecture, where the cores are the points of the star, and the test controller is the center. If the trust assumption is that the SoC integrator, CAD tools, and fabrication are trustworthy, but the third-party IP cores are not trustworthy, then the star architecture provides good security, since it minimizes the damage that can result from an untrustworthy core. However, it is the most expensive topology from a standpoint of wiring cost. Figure 2.11: A chain of scan cells is used for distributing keys to each of the cores. The scan cells are configured not to expose key bits at their outputs while they are being shifted. 2.5.2.3 Cryptographic Protection for Shared Wiring To capture the cost savings of shared wiring while restricting the security risks of untrustworthy cores on the shared test bus, cryptography can be used. Rosenfeld and Karri [19] developed a low-cost technique that adds security without breaking compatibility with existing test interface standards. 
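The key-distribution chain of Figure 2.11 can be modeled behaviorally: key bits are shifted serially through a chain of scan cells that snakes through every core, and each core latches its own segment of the chain on a single update strobe. This is a sketch of the concept only, not the authors' implementation; core names and the key width are illustrative:

```python
def deliver_keys(key_bits, cores, key_width):
    """Shift concatenated per-core key bits through a scan chain that
    snakes through every core, then latch each core's segment on one
    update strobe.  While shifting, the cells do not expose bits at
    their parallel outputs, so no core sees another core's key."""
    chain = [0] * (len(cores) * key_width)
    for bit in key_bits:              # serial shift: the first bit shifted
        chain = [bit] + chain[:-1]    # in ends up deepest in the chain
    keys = {}
    for i, core in enumerate(cores):  # update strobe: latch per-core segments
        segment = chain[i * key_width:(i + 1) * key_width]
        keys[core] = int("".join(map(str, segment)), 2)
    return keys

# The tester shifts the far core's key first, so bits land in place.
keys = deliver_keys([1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1],
                    ["core1", "core2", "core3"], key_width=4)
assert keys == {"core1": 0b1010, "core2": 0b0110, "core3": 0b0001}
```

The point of the model is the cost argument in the text: this chain is one bit wide regardless of how wide the shared test data path is, and it is laid out by the (trusted) SoC integrator.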
They introduce a structure that is essentially a scan chain that snakes through the SoC and provides a trusted mechanism for delivering a crypto key to each core. The technique is based on the assumption that the logic and wiring that is placed by the SoC integrator is trustworthy, while the third-party cores are not trustworthy. After delivering separate crypto keys to each core during initialization, the tester can use shared test wiring for the actual testing and configuration. The test wiring data path can be arbitrarily wide while the key delivery wiring need only be one bit wide. 2.6 Emerging Areas of Test Security As engineered systems become more complex year after year, they often acquire testability features. Sometimes standard solutions like JTAG are used, but sometimes new testing schemes are invented and deployed. Each testing scenario has its own security concerns, its own strengths, and its own weaknesses. A common mistake is to develop a new system without listing security among the design requirements, even though it is rare to find a non-trivial system where security is not in fact a requirement. Automobiles and medical implants are two areas where test and management interfaces have been deployed without security. Both of these areas have seen recent research results demonstrating the vulnerabilities of the actual systems in the field. 2.6.1 OBD-II for Automobiles Modern cars are heavily computerized. They contain multiple computers performing real-time tasks such as engine control, climate control, braking, traction control, navigation, entertainment, and drivetrain control. These subsystems are typically linked together on a communication bus. From a central point, a technician can check the status of all of the subsystems by communicating with them over the bus. This development is helpful for reducing the time needed to diagnose a problem with the car. 
The most common interface for electronic testing of cars is On-Board Diagnostics II (OBD-II). OBD-II is accessed through a connector, typically located under the dashboard. Like all test interfaces, OBD-II exposes the system to attacks that would not otherwise be possible. Koscher et al. [20] present an analysis of automobile security with a focus on the OBD-II interface and the Controller-Area Network (CAN) that is used to interconnect the components in the car. They show that by plugging a malicious computer into the OBD-II test interface, they can severely undermine the security of the entire car, causing it to behave in ways that would be dangerous or even deadly for the occupants of the car and for others around them. It is a common saying in the field of information security that security cannot be an afterthought; it must be considered from the beginning and built into the product. It is unlikely that the security flaws in existing cars will be fixed. Most likely, a new version of the standards will be required. OBD-II was mandated by the US government for new cars sold on the American market. Similarly, the government could mandate the use of a secure successor to OBD-II and CAN. 2.6.2 Medical Implant Interface Security Some medical implants have remote management features. This allows the physician or technician to interact with the device after it has been implanted, to test, configure, and tune the device. The noninvasive nature of these interactions is important for the patient’s health and comfort. Some implant products use wireless radio frequency communication to link the programming console with the implanted device. As always, if implemented naively, radio links are subject to sniffing, unauthorized access, and jamming attacks. In the context of medical implants, attacks against the management interface could have very serious consequences. 
Unfortunately, many implants that have already been deployed are in fact vulnerable to remote attacks on their management interface. Halperin et al. [21] performed a security analysis of a commercial implantable cardioverter defibrillator. They reverse-engineered the communication protocol used by the management interface. The authors were able to sniff private information, to control the implanted device (potentially inducing fibrillation in the patient), and to cause it to exhaust its battery at an accelerated rate. The first phase of their work focused on assessing the level of security present. What they found was that the only obstacle to attack was security through obscurity, which they circumvented by capturing, analyzing, and emulating the link protocol. The second phase of their work was on suggesting practical measures for improving security. They offer three countermeasures against the attacks they discovered. One countermeasure is simply for the implant to make noise when communication is taking place. This alerts the patient and reduces the likelihood of an undetected attack. The other countermeasures focus on authentication and key exchange within the power and space constraints of medical implants. 2.7 Recapitulation and Projection We have examined the way security and testability interact in various areas. Internal (structural) testability of systems is often required, particularly as systems reach a certain complexity. Testability features that provide controllability and observability of the internals of systems are major security risks if not designed with security in mind. Attackers have exploited unsecured test interfaces in the past and it is likely that this will continue. There is a significant and growing literature on how to provide testability without security vulnerability. The defenses that already exist in the literature are relevant to their intended target systems. 
To some extent, the existing solutions can be adapted to secure new testing scenarios as they emerge. Some testing scenarios will not lend themselves to existing solutions, and will require inventive minds to create new approaches for secure internal testing. Chapter 3 JTAG Security This chapter addresses some security issues surrounding JTAG. We look at the threat of a malicious chip in a JTAG chain. We outline attack scenarios where trust in a digital system is downgraded by the presence of such a chip. To defend against this, we propose a protection scheme that hardens JTAG by making use of lightweight cryptographic primitives, namely stream ciphers and incremental message authentication codes. The scheme defines four levels of protection. For each of the attack scenarios, we determine which protection level is needed to prevent it. Finally, we discuss the practical aspects of implementing these security enhancements, such as area, test time, and operational overheads. 3.1 Introduction JTAG [1] is the dominant standard for in-circuit test. It has been in use for 20 years and, like many mature standards seen in the landscape of information systems, JTAG was conceived with a friendly environment in mind. It was designed to handle the natural enemies of digital systems: faults in design, fabrication, packaging, and PC boards. It has been quite successful, largely due to the economy and flexibility of the system, and has been extended in various directions. It has evolved into the de facto method for in-circuit configuration and debug. The companion standard IEEE 1532 [22] has extended it even further to support on-board programming. While JTAG has great utility in friendly environments, in even moderately hostile environments it can lead to undesirable forms of exposure. This chapter discusses attacks and defenses for JTAG-enabled systems. 
Our goals are to provide a practical path toward better security while maintaining strict compliance with the IEEE 1149.1 JTAG standard, all without significantly affecting test economics. We do not regard JTAG as a security threat. Indeed, the presence of JTAG on a board enables stronger platform security than would typically be achievable in an equivalent system without JTAG. Our view is that security problems arise when there is a discrepancy between what people expect and what assurances the system actually provides. JTAG is a synchronous serial link intended for system management tasks. As shown in Figure 3.1, it supports daisy-chaining of an arbitrary number of devices. A single master device controls the protocol state of all other devices on a chain. From a standpoint of functionality, the order of the devices in the chain does not matter. Shared wiring and freedom over the order of the devices reduce the burden on the PCB designer, which translates into lower cost for JTAG compared with other in-system test and management solutions. In Section 3.3, we see that the shared wiring and ordering of devices both have security implications. Figure 3.1: The typical deployment of JTAG is a chain of several devices on a printed circuit board. Each device may come from a different vendor. The test mode select (TMS), test clock (TCK), and test reset (TRST) signals are typically common to all chips. The test data in (TDI) signal and test data out (TDO) signals loop through the chips. The path returns to the source, which is usually either a PC or an embedded microcontroller, functionally called a “system controller.” The concept of JTAG does not preclude providing protection and assurances, but it is typically deployed in a minimal form which provides little or no protection. 
We will describe several ways in which an attacker can exploit a typical JTAG deployment to achieve his goals. Then we will review the aspects of JTAG that are relevant to the execution of the attacks. Section 3.4 surveys some of the prior work in this area, both on the attack side and on the countermeasure side. Following that, we will present defenses with the objective of improving security without incurring heavy extra costs. 3.2 JTAG Overview The JTAG protocol defines a bidirectional communication link with one master and an arbitrary number of slaves. The master can initiate an interaction with any of the slaves. Slaves never initiate interactions. A chip is JTAG-enabled by the presence of two essential physical components. The first is the test access port (TAP) controller, which is a state machine that interprets the JTAG serial protocol. The other physical component is the boundary scan register (BSR), which is a register that is interposed between the core logic of the chip and the I/O modules of the chip. The BSR can be arbitrarily wide, but can be preset and read serially. JTAG has three basic modes: BYPASS, EXTEST, and INTEST. 3.2.1 BYPASS Mode In BYPASS mode, the serial bit stream is copied from the TDI pin to the TDO pin, delayed by one cycle of the test clock, TCK. 3.2.2 EXTEST Mode In EXTEST (EXternal TEST) mode, the BSR is connected to the pins of the chip. The BSR specifies the values that are to be asserted on the chip’s output pins. The BSR captures the values that are present at the chip’s input pins. The data in the BSR are subsequently shifted out through the TDO pin. The TDO signal is either routed back to the master directly or through other JTAG-enabled chips. 3.2.3 INTEST Mode In INTEST (INternal TEST) mode, the BSR is serially loaded from the TDI pin and the contents of the BSR are applied to the terminals of the internal logic of the chip. 
The BSR captures the values that are present at the output terminals of the internal logic of the chip. The JTAG standard requires supporting the basic instructions. Implementors are free to add their own instructions, and for these user-defined instructions they are free to define the semantics more or less arbitrarily. Whatever the instruction, the protocol follows the same basic pattern: enter the Shift-IR state → shift the instruction into the instruction register (IR) → enter the Shift-DR state → shift the relevant data into and/or out of the data register (DR). 3.3 JTAG Attacks Figure 3.2: Conceptual security model: A set of attackers A1, A2, and A3 have a set of goals G1 through G6. Each attacker has a set of attack capabilities, some or all of which are masked by defenses that are in place. There is a set of attacks, K1 through K4, each of which requires a certain set of unmasked attack capabilities. Each attack can be used to reach some set of goals. This example shows that attacker A1 can achieve goals G2 and G4 since it has capabilities P2 and P4, which are the requirements for attack K2. Attackers A2 and A3 do not have sufficient unmasked capabilities to execute any of the attacks. A general model of the JTAG security landscape is shown in Figure 3.2. The purpose of this model is to provide a way of analyzing the security risks associated with a JTAG deployment. A system can have multiple potential attackers. For example, one potential attacker of a system is the manufacturer of one of the chips. Another potential attacker is a hostile end user. Each attacker has a set of capabilities, some subset of P1 through P4. For example, an attack capability is to be able to sniff the data that is sent through the data lines. 
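The capability-masking logic of the Figure 3.2 model can be made concrete with a small feasibility check. The capability and attack labels follow the figure; the requirement sets themselves are illustrative, since the figure does not enumerate them beyond the K2 example:

```python
def feasible_attacks(capabilities, defense_mask, attack_requirements):
    """Per the model of Figure 3.2: an attack is available to an attacker
    iff every capability it requires is possessed by the attacker and
    not masked by a deployed defense."""
    effective = capabilities - defense_mask
    return {attack for attack, required in attack_requirements.items()
            if required <= effective}

# Illustrative requirement sets (only K2's is given in the figure).
requirements = {
    "K1": {"P1"},
    "K2": {"P2", "P4"},
    "K3": {"P3"},
}

# Attacker A1 holds P2 and P4; a defense masks P1 system-wide.
assert feasible_attacks({"P2", "P4"}, {"P1"}, requirements) == {"K2"}
# An attacker whose only capability is masked can execute nothing.
assert feasible_attacks({"P1"}, {"P1"}, requirements) == set()
```

This mirrors the figure's example: A1, holding unmasked P2 and P4, can mount K2 and thereby reach K2's goals, while attackers lacking a full unmasked requirement set are blocked.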
Some of the attackers’ capabilities might be blocked by defenses that are in place. For example, the attacker might have access to sniff the bits on the JTAG bus. If those bits are encrypted, sniffing attacks will not work. There is a set of possible attacks, each of which has attack requirements. For example, an attack to capture the configuration stream of an FPGA as it is being programmed over JTAG requires the ability to sniff the JTAG data line. Each potential attacker has a set of attack capabilities that are not masked by defenses, and this set determines his set of possible attacks. Finally, we consider the attackers’ goals. One possible goal an attacker might have is to cause a denial of service. Another possible goal might be to clone the system. Having looked at the general model of JTAG security, we examine five basic JTAG-based attacks. The goal here is to see the general range of possibilities, not to exhaustively list all possible scenarios. After examining these attacks in terms of the security model in Figure 3.2, we will be able to see what defense mechanisms we need. In a given attack scenario, the attacker will possess a limited set of capabilities:
• sniff the TDI/TDO signals
• modify the TDI/TDO signals
• control the TMS and TCK signals
• access keys used by testers
The system topology in all of the examples is the same as shown in Figure 3.1. We assume throughout that the attacker is capable of providing or otherwise controlling the attack chip. An attacker can combine one or more attacks to achieve his goals. Some attacks have already been carried out against real systems, as discussed in Section 3.4. 3.3.1 Sniff Secret Data Figure 3.3: The attacker obtains secret data by sniffing the JTAG data path. 
The sniffing attack is shown in Figure 3.3. The attacker’s goal is to learn a secret that is being programmed into the victim chip using JTAG. An additional requirement for this attack is that the victim chip is downstream from the attack chip on the same JTAG chain. As shown in Figure 3.3, the attack chip exhibits a false BYPASS mode which is externally identical to true BYPASS mode, but which parses JTAG signals. When the secret is programmed into the victim chip, the attack chip captures a copy. The secret is then delivered to the attacker either through JTAG interrogation of the attack chip in the field, or through a side-channel. Alternatively, the attack chip might directly make use of the sniffed data. 3.3.2 Read Out Secret Figure 3.4: The attacker obtains an embedded secret by forcing test vectors onto the JTAG lines. The read-out attack is shown in Figure 3.4. Here, the attacker’s goal is to learn a secret that is contained in the victim chip. We assume that the attacker is capable of using I/O drivers in the upstream attack chip (attacker 1) to forcefully control the TMS and TCK lines. The details of forcing these signals are discussed in Section 3.3.6. An additional requirement for this scenario is that the attack and victim chips are on the same JTAG chain with the victim chip sandwiched between the attack chips. The upstream attack chip forcefully acts as JTAG bus master, and performs a scan operation on the victim chip to access embedded secrets. The downstream attack chip collects the secret as it emerges from the TDO of the victim chip. This attack can be used by “embedded attackers” as described, or it can be used by an attacker who can attach external hardware to the system under attack. 
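The false BYPASS mode at the heart of the sniffing attack is easy to model: a compliant BYPASS register is just a one-TCK delay from TDI to TDO, and a malicious implementation can match that timing exactly while recording every bit that passes through. A behavioral sketch:

```python
class BypassRegister:
    """The standard 1-bit BYPASS register: TDO is TDI delayed by one TCK."""
    def __init__(self):
        self.bit = 0

    def clock(self, tdi):
        out, self.bit = self.bit, tdi
        return out

class FalseBypass(BypassRegister):
    """A malicious chip's bypass: identical timing on the wires, but it
    keeps a copy of every bit that passes through (the sniffing attack)."""
    def __init__(self):
        super().__init__()
        self.sniffed = []

    def clock(self, tdi):
        self.sniffed.append(tdi)
        return super().clock(tdi)

honest, evil = BypassRegister(), FalseBypass()
secret = [1, 0, 1, 1, 0, 0, 1]  # illustrative secret bitstream
out_honest = [honest.clock(b) for b in secret]
out_evil = [evil.clock(b) for b in secret]
assert out_honest == out_evil   # indistinguishable from the tester's side
assert evil.sniffed == secret   # but the attacker now holds the secret
```

Because the two registers are cycle-for-cycle identical at TDO, the tester cannot detect the substitution from the bus alone, which is why the countermeasures in this chapter turn to encrypting the bitstream rather than trying to detect the tap.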
3.3.3 Obtain Test Vectors and Responses Figure 3.5: The attacker obtains a copy of the test vectors and normal responses of a chip in the JTAG chain. This can be a passive attack. This attack is shown in Figure 3.5. The attacker’s goal is to learn the vectors that are used to test the victim chip, and the normal responses. A requirement for this attack is that the attack chip is downstream from the victim on the same JTAG chain. The attack chip waits for the victim chip to be tested by the authorized party. The attack chip collects the test vectors as they are shifted out of the instruction register, and collects the responses as they are shifted out of the selected register (BSR or others) on their way back to the tester. Knowledge of the test vectors and responses helps an attacker further infiltrate a system by providing trojaned parts that pass normal tests. 3.3.4 Modify State of Authentic Part The attacker’s goal is to modify the state of the victim chip. We also assume that the attacker is capable of inserting strong I/O drivers in the attack chip to forcefully control the TMS and TCK lines. An additional requirement for this attack is that the victim chip is downstream from the attack chip on the same JTAG chain. The attack chip takes control of the TMS and TCK lines, and puts the JTAG TAP of the victim chip into a state where it can shift data in, thereby setting the state of registers within the victim chip, including registers that affect its normal operation. 
3.3.5 Return False Responses to Test Figure 3.6: The attacker can intercept test vectors that are sent to another chip, and can send false responses to the tester. The false responses attack is shown in Figure 3.6. The attacker’s goal is to deceive the tester about the true state of the victim chip. An additional requirement for this attack is that the victim chip is downstream from the attack chip on the same JTAG chain. The tester attempts to apply test vectors to the victim chip, which is not the first chip in the JTAG chain. To do this, the tester attempts to place the other chips into BYPASS mode. The attack chip ignores this request and intercepts the test vectors, while instructing the victim chip and other downstream chips to enter BYPASS mode. The attack chip can then transmit bogus test responses back to the tester. 3.3.6 Forcing TMS and TCK Whether an attacker can forcefully control TMS and TCK depends on various factors such as:
• strength of output drivers on the JTAG master
• strength of output drivers of the attack chip
• presence of buffers in the TMS and TCK lines
• presence of a series output resistor (typically 100 ohms)
• topology of the JTAG bus (star or daisy chain)
• physical layout of the JTAG bus
• the input logic threshold and hysteresis, if any
For the attacker to successfully hijack the TMS and TCK lines, he must be able to change the voltage seen by the victim’s TMS and TCK input pins. This change must be sufficient for the voltage to cross the logic threshold. If the JTAG is set up in a star topology, with separate TMS and TCK lines for each chip, this attack will be impossible. 
However, it is quite common, in practice, for system designers to take advantage of the economy of wiring JTAG in a daisy-chain topology. Designers can put buffers at various points in the JTAG wiring. There are various reasons for doing this, including noise immunity and fanout. However, these buffers add cost and complexity, and are not required by the JTAG standard. It is common practice to add a resistor, typically 100 ohms, in series with each of the outputs of the JTAG master. This resistor reduces ringing on the line by providing series termination and/or slowing the slew rate. In systems where this series resistor is present, there is a reduction in the amount of current that must flow in order for the hijacker to force a TMS bit, or to force a TCK edge.

The strength of the drivers in the master and in the attacker is also relevant. If the attack chip is an ASIC, the attacker can specify essentially any strength drivers. If the attack chip is an FPGA where the attacker controls the configuration bits, the attacker can program the I/O block to use a high-current I/O standard. If the attacker can control the PCB design, he can bond multiple pins of the attack chip together to form a mega driver. Finally, if the TMS and TCK lines are simply bussed around without buffers, multiple attackers can gang up to overpower the master. These last possibilities are included only for completeness. Next, we present practical experimental results testing the basis of the JTAG hijacking attack under normal conditions.

3.3.6.1 Experimental Validation

Our JTAG hijacking experimental setup consists of three chips: a system controller, a victim, and a hijacker. The system controller is a Philips 8052 microcontroller configured to bit-bang JTAG using port 1 directly, not through an I/O expansion chip (8255, etc.), nor through any resistor. The victim is a Xilinx Spartan 3e FPGA. The attacker is a Spartan 3 FPGA. The 8052 is programmed to keep TCK low all the time.
The experiment tests the hypothesis that the attacker can raise the TCK line sufficiently to edge-trigger the victim chip. We programmed the victim chip to count TCK transitions and to display the count on a set of seven-segment LEDs. This way, we were able to confirm the extent of the attacker’s control over the TCK signal. We programmed the attack chip to send 1000 pulses. Comparing 1000 with the number of TCK transitions counted by the victim chip, we confirmed that the Spartan 3 attack chip easily overpowers the 8052 system controller. In multiple runs, the victim always showed the correct count.

When one chip overpowers the output of another chip, there is a risk that the output driver circuitry of one or both chips will be damaged. In the experiment, we minimized the chance of overheating either chip by using a TCK waveform with 80 ns pulses and a 0.0015% duty cycle. We left the setup running for over an hour and saw no evidence of any sort of damage to any of the chips.

3.3.6.2 Common I/O Driver Characteristics

Electrically, the two output drivers fighting can be represented by DC models. Each driver has an I-V curve for its logic-low state and an I-V curve for its logic-high state. The intersection of the logic-low I-V curve of the 8052 and the logic-high curve of the Spartan 3 gives the voltage at the node where their outputs meet. This is similar to solving for the quiescent point of an amplifier using load-line analysis.

Figure 3.7: A Philips 8052 (the system controller) was programmed to keep one of its pins low. The I-V curve for this output driver was extracted using a pulsed I-V measurement. A Xilinx Spartan 3e was programmed to keep one of its pins high. This pin’s I-V curve was also extracted. The result is shown. The solid line is the FPGA; the dashed line is the microcontroller.
The intersection is at 2.1 V, exceeding VIH for most 3.3 V logic. Using a pulse method, we obtained the I-V curves of the output drivers of our 8052 and our Spartan 3. From these curves, we solved for the expected voltage when the drivers fight. The critical issue here is whether this voltage exceeds VIH of the victim chip. In our experimental platform, the receiving chip is a Spartan 3e using the 3.3 volt CMOS I/O standard, where VIH is 1.1 volts and there is no input hysteresis. The Spartan 3 easily overpowers the 8052 and can force the TMS and TCK signals. In our measurements, the pulse width was 80 ns, with a repetition period of 1.3 ms. This low duty cycle, 0.0061%, allows hostile JTAG communication but results in a worst-case dissipation of only 22 µW in the output driver of either chip, not enough to cause damage.

3.3.6.3 PCB Layout Effects

We discussed the TMS and TCK hijacking in terms of DC characteristics such as I-V curves and logic thresholds. We now examine a transient phenomenon that allows even weak hijackers to control the bus regardless of the strength of the master.

Figure 3.8: A length of PCB wiring connects the hijacker to the JTAG master. This allows the attacker to inject short pulses onto the wiring without being hindered by the master.

As seen in Figure 3.8, there is typically a PCB trace of some length connecting the master’s JTAG signals to each of the slaves. The trace length from the master to the hijacker, d_MH, results in a round-trip wiring delay of at least rtt = 2 d_MH / v_prop, where v_prop is the propagation speed of a pulse in the PCB trace. This propagation speed is typically around 2 × 10^8 m/s. If we assume a PCB trace of 30 cm, we obtain a round-trip time of at least 3 ns. Even if the master can present a zero-ohm impedance at its end of the transmission line, each time the hijacker applies a 0 → 1 step to the TMS or TCK line, a pulse at least 3 ns long will appear at the victim.
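The round-trip bound above is a one-line calculation; the following check uses exactly the figures quoted in the text (30 cm trace, 2 × 10^8 m/s propagation speed):

```python
# Check of the transmission-line argument: rtt = 2 * d_MH / v_prop.
d_mh = 0.30     # master-to-hijacker trace length in metres (30 cm, from the text)
v_prop = 2e8    # pulse propagation speed in the PCB trace, m/s (from the text)

rtt = 2 * d_mh / v_prop
assert abs(rtt - 3e-9) < 1e-12   # 3 ns, matching the figure quoted in the text
```

The point of the calculation is that the bound depends only on geometry: no matter how strong the master's driver is, its cancelling reflection cannot arrive at the victim sooner than this round trip.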
This is sufficient to control a JTAG TAP. We experimentally verified this phenomenon using the setup shown in Figure 3.8, wired together using short sections of CAT 5 cable (Zo = 100 Ω). The hijacker can control the victim’s TAP even when the TMS and TCK lines are shorted to ground at the master’s end.

3.4 Prior Work on Attacks and Defenses

Yang et al. [5] showed that JTAG boundary scans enable an attacker to read secret data out of a chip. They use a hardware DES implementation as their example. Satellite television receivers make use of trusted hardware to prevent non-paying users from viewing the encrypted broadcast content. There is an active underground effort [11] [23] to hack satellite TV boxes by extracting their keys using JTAG [24]. This has been quite successful and continues. On one hand, it is tempting to suggest that JTAG is a security risk and should be excluded from trusted hardware, but untestable hardware is a prohibitive business risk. Yang et al. [25] proposed a DFT architecture that allows testing crypto hardware with high fault coverage yet without exposing hard-coded crypto keys to leakage.

Novak and Biasizzo [26] propose a locking mechanism for JTAG with negligible added cost by adding LOCK and KEY registers. To lock the TAP, a key is entered using a special LOCK instruction. To unlock the TAP, a special UNLOCK instruction is issued, which requires shifting in the key via TDI. The TAP state machine is not altered. Compliance with the 1149.1 standard is maintained except for the rejection of standard instructions (including ones that are mandated by 1149.1) while the TAP is locked.

3.5 Defenses for JTAG

Our security goals are to ensure the authenticity of the devices in the JTAG chain, to ensure the secrecy of communication between the JTAG master and the chip, and to reject inauthentic JTAG messages.
To achieve these goals, we make use of three standard security primitives: a keyed hash function, a stream cipher, and a message authentication code. Using these primitives, we construct protocols that significantly enhance the security of JTAG.

JTAG communication takes place between a master and a chip. The master is typically a microcontroller or a test station. The threat model of the link is similar to the threat model assumed by designers of network security protocols. Since the threats are similar, it is tempting to apply existing solutions for network communication, such as SSH and SSL. Circuit complexity and key distribution are just two of the problems. SSH and SSL require significant amounts of computation for performing public key cryptography. Even on the full-size computers for which the protocols were designed, the crypto arithmetic is performed by software routines, not native opcodes. These big-integer calculations do not lend themselves to lightweight hardware implementation. Furthermore, the semantics of JTAG include prompt synchronous response, so multi-cycle crypto operations would be problematic.

A mechanism that is suitable for securing JTAG is the authentication and key establishment scheme described in [27]. This lightweight scheme uses a PUF [28] [29] [27] [30] to authenticate the chip and to establish a cipher key for secure communication. The requirements for challenge-response device authentication are repeatability and unpredictability. Given the same challenge, the chip should always calculate the same response. Without having observed the chip’s response to a given challenge, it should be impossible for an attacker to predict the response. We make use of an idea similar to the PUF authentication and key establishment scheme, but where fuses and a keyed hash are used instead of a PUF. In a PUF, each chip’s uniqueness comes from subtle physical differences between chips.
For our keyed hash, the uniqueness comes from fuse bits that are programmed at the factory. Either a PUF or a keyed hash can support a challenge-response protocol for device authentication. There are pros and cons to each of these two mechanisms. Briefly, PUFs have the advantage of being intrinsically unique, not needing uniqueness to be programmed into them. Fuses have the advantage of using less area, and being a mature technology that is extremely reliable across temperature, aging, and power variations.

For verifying the authenticity of a part in the JTAG chain, a challenge is sent to the chip, which feeds the challenge to its keyed hash function. The response is returned to the tester, where it is compared with the expected response. The challenge is shifted into the chip via the TDI line. The response is shifted out via TDO. A custom JTAG instruction is invoked to trigger the calculation of the hash.

We also require a cipher to encrypt the communication. A block cipher is not suitable due to its large die area. Block ciphers are also problematic in this application because they operate on whole blocks of data, and the delay this introduces conflicts with standard JTAG timing. The preferred mechanism for encrypting JTAG communication is a stream cipher. Robshaw [31] discusses stream ciphers in depth. Our current implementation uses the Trivium stream cipher [32]. Other stream ciphers can be used.

To protect against inauthentic JTAG messages, we need a message authentication code (MAC) scheme. A MAC algorithm uses a key that is known to the sender and receiver to verify that the message was sent by the authentic sender (not spoofed) and was not modified in transit. The simplest MAC scheme is to calculate the hash of the message and the key, concatenated together. This simplistic scheme is not cryptographically strong. Current popular MAC schemes such as HMAC [33] use nested hash functions, each operating on a block of text.
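The challenge-response verification described earlier in this section can be sketched in a few lines. This is a rough software model, not the hardware design: HMAC-SHA256 stands in for the chip's keyed hash, the key plays the role of the factory-programmed fuse bits, and all names are illustrative.

```python
# Sketch (illustrative, assumed primitives): challenge-response authentication
# using a keyed hash whose key is unique per chip.
import hashlib
import hmac
import os

def chip_response(fuse_key: bytes, challenge: bytes) -> bytes:
    """The chip's keyed-hash module, modeled as HMAC-SHA256 over the fuse key."""
    return hmac.new(fuse_key, challenge, hashlib.sha256).digest()

# At manufacturing time, the tester records challenge-response pairs (CRPs).
fuse_key = os.urandom(16)           # stands in for the randomly blown fuse bits
crp_db = {}
for _ in range(4):
    c = os.urandom(10)              # 80-bit challenge, matching the reference design
    crp_db[c] = chip_response(fuse_key, c)

# In the field: the tester picks a stored CRP and verifies the chip's answer.
challenge, expected = next(iter(crp_db.items()))
assert hmac.compare_digest(chip_response(fuse_key, challenge), expected)
```

Because each CRP is used against stored expected values rather than a transmitted key, a sniffer on the chain learns only challenge-response pairs that the tester will not reuse.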
Arazi [34] describes the construction of MAC functions from stream ciphers in embedded platforms where computational power is a constraint. Because of the timing of JTAG, a MAC verification scheme that produces the answer immediately upon receipt of the final bit of the message is highly desirable. Introducing a delay while a message is authenticated would complicate the timing of the application of test vectors. Therefore, block-based algorithms are not desirable for our purpose. We make use of an incremental MAC function. Incremental hashing can be performed so that computation is being done while the bits of the message are being serially received [35].

Another mechanism that is required for securing JTAG is access control. Access control normally means that there is a set of subjects, a set of objects, and, at each subject-object intersection, a set of permissions. Some basic access control schemes have been proposed for JTAG and some have been included in commodity parts. A rudimentary but strong access control mechanism is a side-effect of using a MAC on the messages. If only the tester has the challenge-response pairs for the keyed hash, only the legitimate tester will be able to negotiate a MAC key, so the MAC algorithm assures that an unauthorized party will not be able to successfully issue test instructions. The next subsections discuss how the security primitives are used to secure JTAG communication.

3.5.1 Secure JTAG Communication Protocol

Due to the wide range of cost and security sensitivity for different chips, a single rigid protocol will either fail to provide all of the necessary assurances or, alternatively, add excessive overhead to the cost of the parts. Therefore we define four levels of protection and provide solutions for each of the levels. Chips of varying levels can be freely mixed on the same JTAG chain without interoperability problems.

• Level 0: No assurances.
This is equivalent to the vast majority of JTAG-enabled chips currently made.

• Level 1: Authenticity is assured. The authentic chip is guaranteed to be involved in the JTAG authenticity probe. No further assurances are available.

• Level 2: In addition to the assurance provided by Level 1, secrecy of JTAG signals is assured.

• Level 3: In addition to the assurance provided by Level 2, the JTAG link is protected against active attacks: insertion, modification, reordering, replay, and deletion of messages.

level  authenticity  secrecy  integrity
0      no            no       no
1      yes           no       no
2      yes           yes      no
3      yes           yes      yes

Figure 3.9: We define four levels of assurance. Levels correspond to the set of assurances that are provided.

The JTAG assurance level of each of the chips in the JTAG chain is known by the tester. This attribute can be defined in the BSDL file for the chip. Note that level 0 involves no active participation in the proposed protocol. Level 0 can be considered an explicit provision for backward compatibility. Our scheme allows participating and nonparticipating chips to coexist in the JTAG chain with each part functioning at its full assurance level. The presence of low assurance chips in the chain does not affect the assurances provided by higher assurance level chips in the same chain. It is, however, the designer’s responsibility to choose the right JTAG assurance level for his chip and its intended application. This section describes the protocol used for communication between the tester and the chip.

3.5.2 Level 0 Protocol

Communication with a level-0 device is exactly the same as what is defined in the 1149.1 standard.

3.5.3 Level 1 Protocol

Communication with a level-1 device is exactly the same as what is defined in the 1149.1 standard except for the addition of an authentication operation.

(1) Tester randomly extracts a CRP from storage.
(2) Tester sends challenge to chip.
(3) Chip applies challenge to keyed hash module.
(4) Chip sends result to tester.
(5) Tester compares chip’s response with CRP.
(6) Regular JTAG operations commence.

3.5.4 Level 2 Protocol

Communication with a level-2 device involves sending and receiving encrypted data to and from the data register (DR). The encryption only affects the signal on the TDI and TDO lines. The TCK, TMS, and TRST lines retain the exact semantics defined in the 1149.1 standard. The TAP controller’s JTAG state machine is also unaffected. Level-2 communication occurs in two phases: setup and communication. The purpose of the setup phase is to establish a shared secret between the tester and the chip. The setup phase proceeds as follows:

(1) Tester randomly selects and extracts a CRP.
(2) Tester sends challenge to chip.
(3) Chip applies challenge to keyed hash module.
(4) Chip uses response as key for stream cipher.

In the case of the tester writing to the data register on the chip, the protocol is as follows:

(1) Tester encrypts JTAG DR contents using shared secret.
(2) Tester places chip into SHIFT-DR state.
(3) Tester shifts in the encrypted contents of DR.
(4) Chip decrypts data into plaintext register as it is shifted in.
(5) Tester places chip into EXIT-IR or EXIT-DR state.
(6) Plaintext register is latched in and used.

The bits that are shifted out via TDO during a tester-writes-to-chip operation are ciphertext. The protocol for the tester to read data from the chip is as follows:

(1) Tester initiates read operation.
(2) Chip encrypts contents of DR using stream cipher as it transmits the bits on TDO.
(3) Tester decrypts bits using stream cipher.

3.5.5 Level 3 Protocol

The level-3 protocol has two phases: setup and communication. The setup phase establishes the secrets that are shared between the tester and the chip for the cipher and MAC algorithms. The communication phase encapsulates, transmits, and de-encapsulates the contents of the IR and DR.
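The level-2 setup and data-register encryption, which the level-3 protocol builds on, can be sketched as follows. This is a software model under stated assumptions: a SHA-256 counter keystream stands in for Trivium, HMAC-SHA256 stands in for the keyed hash, and all names are illustrative rather than taken from the design.

```python
# Sketch of the level-2 flow (assumed primitives): the keyed-hash response
# doubles as the stream-cipher key, and DR bits are XORed with keystream.
import hashlib
import hmac
import os

def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream generator standing in for Trivium (NOT a real cipher)."""
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

fuse_key = os.urandom(16)
challenge = os.urandom(10)

# Setup: both sides derive the same session key -- the tester from its stored
# CRP, the chip from its keyed-hash module.
session_key = hmac.new(fuse_key, challenge, hashlib.sha256).digest()[:10]  # 80 bits

# Communication: tester encrypts DR contents; chip decrypts as bits arrive.
dr_plain = b"\xa5\x5a\x3c"
ks = keystream(session_key, len(dr_plain))
dr_cipher = bytes(p ^ k for p, k in zip(dr_plain, ks))    # shifted in via TDI
dr_decoded = bytes(c ^ k for c, k in zip(dr_cipher, ks))  # recovered on-chip
assert dr_decoded == dr_plain
```

A stream cipher fits JTAG's bit-serial timing because each TDI/TDO bit can be XORed with one keystream bit on the same test-clock cycle, with no block-boundary delay.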
The behavior at level 3 is the same as level 2, but an additional key establishment operation is performed for the MAC key. Using that additional MAC key, the chip calculates the MAC of the bits as they are received. There are two distinct ways of doing this. The first is to use an incremental MAC algorithm of the form shown by Lai [35]. The second is to use a more conventional MAC scheme that requires full blocks of data before commencing MAC processing. The MAC is checked when the TAP transitions out of the SHIFT-DR state. If the MAC passes, the contents of dr are transferred to a validated dr register and a flag on validated dr is set, indicating that the contents of the register have been validated. Later operations can make use of the contents of the validated dr register. On one hand, this approach adds area overhead. On the other hand, it enables the use of a conventional (non-incremental) MAC algorithm, the kind that has been more extensively studied.

The steps of the MAC setup phase are the same as the crypto key setup in the level-2 protocol. The keyed hash is used along with challenge-response pairs that are stored by the tester. Rudimentary access control is provided by virtue of MACs being checked. Only the authentic sender can successfully negotiate a MAC key.

3.6 Costs and Benefits Associated with JTAG Defenses

Successful mechanisms for enhancing the security of JTAG must satisfy cost and operational constraints, in addition to providing the required forms of assurance.

3.6.1 Die Area Overhead

Some of the systems we are protecting are high-volume commodity items which are price-sensitive. We cannot add more than 10% to the die area of any protected part. We see in Figure 3.10 that for designs of non-trivial die area, the proposed schemes will not add significant cost. From Figure 3.10 we see that for a design that uses 10,000 slices, the area overhead is less than 9% even for level 3 defenses, and about 4% for level 1.
10,000 slices of a Xilinx Spartan 3e FPGA is equivalent to 1.4 million gates [36].

Figure 3.10: Area overhead is shown for the protection levels 1 through 3, from bottom to top. The cost of the security enhancements is independent of design complexity, so the percentage overhead is lower for more complex designs.

The four levels provide progressively stronger assurances. An indication of the area cost of each protection level is given by the number of additional FPGA slices used by the enhanced JTAG circuitry. These figures are for a Spartan 3e. There are no fuses in the FPGA, so fuses are modeled as hard-coded bit vectors. Overhead in an ASIC will be less.

3.6.2 Test Time Overhead

There is test time overhead associated with the security enhancements we propose. For the level 1 defenses, the time overhead is as follows:

(1) Time to shift in challenge (80 cycles of test clock).
(2) Time to initialize stream cipher (1152 cycles of functional clock).
(3) Time to shift out response (80 cycles of test clock).

The stream cipher we use in our reference implementation is Trivium [32]; it uses an 80-bit key and takes 1152 cycles to initialize. For the level 2 defenses, the time overhead is spent once per test session, to set up the cipher key and initialize the stream cipher.

(1) Shift in the challenge (80 cycles of test clock).
(2) Initialize stream cipher 1 using challenge (1152 cycles of functional clock).
(3) Extract 80 bits of keystream from cipher 1 (80 cycles of functional clock).
(4) Initialize stream cipher 2 using 80 bits from cipher 1 (1152 cycles of functional clock).

After this setup, there is no test time overhead associated with the level-2 defenses. Stream cipher 2 operates synchronously with the test clock to encrypt and decrypt data.
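The per-session setup costs listed above can be totaled with simple arithmetic. Note that each total mixes test-clock and functional-clock cycles, so it is a cycle count rather than a single wall-clock duration:

```python
# Cycle-count check of the setup overheads quoted in the text,
# assuming Trivium's 1152-cycle initialization.
level1 = {
    "shift_challenge": 80,    # test clock
    "init_cipher": 1152,      # functional clock
    "shift_response": 80,     # test clock
}
level2 = {
    "shift_challenge": 80,    # test clock
    "init_cipher1": 1152,     # functional clock
    "extract_keystream": 80,  # functional clock
    "init_cipher2": 1152,     # functional clock
}

assert sum(level1.values()) == 1312
assert sum(level2.values()) == 2464
```

Since this cost is paid once per test session, it is amortized over the entire session and does not scale with the number of test vectors applied.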
For level 3 defenses, the minimum additional test time overhead is the same as for level 2.

3.6.3 Operational Costs

The security enhancements that we propose require that challenge-response pairs be extracted at manufacturing test time, before the chips leave the manufacturing facility. If fuses are used, as opposed to a PUF, the fuses should be blown with a random pattern prior to the extraction of challenge-response pairs. For each chip that the customer receives, he needs the challenge-response pairs. These have to be delivered to the customer over a secure channel. It is also possible to eliminate the burden of keeping track of the identity of each chip on a reel of parts. Instead of querying the chip maker for the challenge-response pairs for a part, the chip can contain a unique code which can then be used for retrieving the challenge-response pairs from the chip maker.

3.6.4 Impact of Defenses on Known Threats

The proposed defenses protect against a range of attacks discussed in this document. We will now illustrate how two otherwise practical attacks can be prevented.

• Attacker Obtains Embedded Secrets: The attacker could either sniff the JTAG link as the key is being programmed, or the attacker could perform an unauthorized debugging session to read out the key. These attacks are popular in hacking satellite TV boxes to extract the box keys using JTAG [24] and in taking a snapshot of the firmware in the box [37]. Novak and Biasizzo [26] propose a solution to unauthorized JTAG access based on access control. A key must be presented to the chip before it can be manipulated with JTAG. Under an expansive threat model, where hostile chips may be present in the JTAG chain, the scheme is not effective. On the other hand, the proposed level-2 and level-3 defenses are comprehensive and guard against passive sniffing by encrypting the link. They also prevent unauthorized debugging as a side effect of MACing the data.
• Cloning an Existing System: Cloning would involve different attacks depending on the implementation style of the system. If it is a CPU- or PLD-based system, cloning would involve reading out firmware or a bitstream. The attacks are similar to obtaining an embedded secret, and the defenses are the same. For example, at levels 2 and 3, the attacker will need to negotiate keys with the chip in order to communicate over JTAG, and this will be impossible for the attacker since he lacks the challenge-response pairs that were set up offline.

3.7 Conclusion

In systems where sensitive data are transported or accessible via JTAG, we do not recommend using a daisy-chain topology with conventional unprotected JTAG. A star topology protects against many of the attacks discussed in this article. However, the star topology increases PCB complexity and cost and doesn’t eliminate all JTAG-based system vulnerabilities. An alternative is to retain the daisy-chain topology, and to derive the needed security from cryptographic enhancements built on top of JTAG. We have devised a scheme that provides a significant improvement in JTAG security with reasonable added cost. The scheme is flexible in the sense that it can provide high assurance for important chips and lower assurance (and lower cost) for less important chips. Compatibility is maintained across the assurance levels, and compatibility is maintained with the IEEE 1149.1 JTAG standard to the maximum extent possible.

The scope of this work is restricted to JTAG. There is a multitude of other threats to hardware that do not involve JTAG. For instance, exploits via bus probing have been successfully executed in the past [38] and will probably be used again. However, it should be noted that the original reason for the existence of JTAG was to facilitate boundary scan of I/O pins because it was difficult to probe the wiring on modern printed circuit boards.
Securing JTAG denies the attacker what would otherwise be an easy way of probing a bus. We do not address several other threats, including the threat of manufacturers inserting hostile functionality in the chips that they supply. Detecting the presence of such functionality is an active research topic [39] [40].

Chapter 4

Testing Cores in a System on Chip

4.1 Introduction

SoC design cycles are getting shorter and designs are getting more complex. Pressure for more productivity per designer per day has led to reuse of design modules and, in many cases, obtaining modules from external sources of intellectual property (IP). It is impractical, and sometimes impossible, for SoC designers to manually assess the security of each of the cores they use. Modular design methodologies hide the internals of the modules, which boosts productivity, but unfortunately it shifts designers from knowing the logic of the chip to merely hoping that each piece actually operates according to its interface specifications. Continued progress in VLSI requires that we not let complexity and short design cycles undermine our ability to produce chips that are trustworthy. In the long run, we need to work toward designing in such a way that the security failure of one module does not result in the security failure of the entire system. One step in that direction is the topic of this research: to reduce the risk of cascading security failure in the test subsystems of chips we design.

Test access mechanisms are critical components in digital systems. They affect not only production and operational economics, but also system security. We propose a security enhancement for system-on-chip (SoC) test access that addresses the threat posed by untrustworthy cores. The scheme maintains the economy of shared wiring (bus or daisy-chain) while achieving most of the security benefits of star-topology test access wiring.
Using the proposed scheme, the tester is able to establish distinct cryptographic session keys with each of the cores, significantly reducing the exposure in cases where one or more of the cores contains malicious or otherwise untrustworthy logic. The proposed scheme is out of the functional path and does not affect functional timing or power consumption.

Test access mechanisms (TAMs) are present in every complex chip and are used for a large and growing variety of purposes. Their original purpose was to enable efficient structural testing of digital systems instead of the notoriously inefficient process of black-box functional testing. As system design has evolved, the role of TAMs has been extended to include invoking BIST, programming nonvolatile memory, debugging embedded microprocessors, initializing the configuration of FPGAs, initializing volatile run-time configuration registers, and enabling and disabling system components including the TAM itself. A security-aware TAM improves upon conventional TAM concepts by protecting the data that traverses it, thus reducing the risks arising from untrustworthy cores.

4.1.1 Assumptions

The threat model addressed by this research is that one or more untrustworthy cores intercept or modify the test data that passes between the tester and a core. We make the following assumptions:

(1) The SoC contains many cores, some of which are untrustworthy.
(2) The SoC contains trustworthy inter-core functional wiring.
(3) The SoC contains trustworthy test access wiring.
(4) Cores are connected to the tester by shared wiring. This can be either a daisy-chain scheme or a bus scheme.
(5) Some or all cores, including their test wrappers, are opaque.
(6) The SoC design is known to the attacker as much as it is known to the designer. If a crypto key is embedded in the design, the key is known to the attacker.
Figure 4.1: Data sent from the test controller to core 2 passes through core 1, giving core 1 an opportunity to intercept it. Likewise, data passing from core 2 back to the tester passes through core 3, giving core 3 an opportunity to intercept it.

Some examples of TAM-related threats in a daisy-chained TAM architecture, as shown in Figure 4.1, include the following:

(1) Core 1 sniffs data passing from tester to core 2. Core 1 leaks the data. Core 2 can be an FPGA or microprocessor, and the data can be its bitfile or executable program. This attack results in leakage of intellectual property, which can result in monetary losses or expose the leaked data to reverse engineering and additional security exposure.

(2) Core 1 modifies data passing from the tester to core 2. Core 2 can be a crypto core and the data can be a key. This attack can create a back door into the system.

4.1.2 Constraints

One way to reduce TAM-related risk is to use a star topology for test data. The star topology avoids the threats listed above by avoiding the placement of two cores on the same test wiring where they might interfere with each other. This topology is seldom used, since it results in high wiring cost. As the number of cores in SoC designs increases, star-topology TAM wiring becomes less and less practical. Instead, we reject that technique in favor of cryptography, which allows us to simultaneously obtain the low cost of the bus topology and the good security of the star topology.

4.1.3 Core Test Wrappers

To facilitate design automation, modular design, and efficient reuse of cores, the cores are often delivered to SoC integrators in a wrapped form, which means that the complexity of their internal test structures is hidden behind some wrapper logic, which exposes a simplified, standardized interface to the outside. IEEE 1500 [2] defines this interface.
Although wrapping can improve the productivity of SoC integrators, it requires the SoC integrator to trust the wrapper implementations provided by the core vendors. The choice between wrapped and unwrapped cores is a complex decision with security ramifications.

4.2 Prior Work

The threat of malicious inclusions in chip designs has been discussed at a technical level in the security and VLSI communities, and at a policy level in defense communities. DARPA has led US DoD research in the area, funding the TRUST program [41], which aims to ensure that critical government operations are able to source trustworthy chips. King [42] shows the construction of malicious modifications to a CPU design that give the attacker the flexibility to choose the details of the attack at run time, in the field, after deployment. Wang [43] presents a taxonomy of malicious hardware. Kim [44] examines the threat of malicious modules in an SoC abusing their bus-master capability. They provide a security-enhanced bus arbiter that traps the bus transactions of rogue modules and inhibits them from further action by cutting power to the offending module.

The risks of having malicious chips on a JTAG [1] chain were studied in [13], and the authors developed countermeasures. The SoC TAM security problem differs significantly from the JTAG security problem. Primarily, the difference is that all cores of an SoC are fabricated together. The JTAG security solutions proposed in [13] assume that each chip exists first in isolation, gets packaged and tested, and then shipped to the customer. While in isolation, before being shipped, keys can be set up for use by the tester for secure communication over the untrustworthy test bus in the field. Since this isolated stage does not exist for SoC cores, the TAM security solutions developed for JTAG do not apply to SoCs.

4.3 Proposed Approach

We propose TAM enhancements that leverage the SoC designer’s control over the inter-core wiring.
The result is that untrustworthy cores are prevented from sniffing communication on the test bus. As with most communication security schemes, key exchange is a pivotal issue. Many cores are connected to a single test bus, and to single out a target core for communication, the TAM must provide the tester with a mechanism for securely distinguishing the target core from the rest of the cores. We propose that the SoC integrator construct a scan chain outside of any of the wrapped cores, and connect each core to the output of one of the cells in the scan chain. The foundation of the security of our system is the pair of assumptions that: (1) a core, however malicious and devious it may be, cannot affect the inter-core wiring of the SoC; and (2) a core cannot control where in the inter-core wiring its terminals are connected. Thus, the tester securely distinguishes each core by which scan cell it connects to, as shown in Figure 4.2.

As stated in Section 4.1.2, TAMs are under significant pressure to minimize their cost. For this reason, SoCs use shared wiring to connect the tester with the cores. During testing, the target core is addressed using any of various schemes, while the other cores are expected to be passive. We describe three scenarios where untrusted cores are placed on the shared TAM wiring. For each scenario, we discuss how the economy of shared wiring can be retained while protecting sensitive test data from sniffing as it passes by or through malicious cores. In the first scenario, a wrapped core is obtained already containing the security-aware TAM enhancements described in this research. In the second, a wrapped core is obtained without any TAM security enhancements, and has sensitive test data. In the third, a core is obtained that is untrusted, but no sensitive test data will be exchanged with it.

4.3.1 Security-enhanced Test Wrapper

We enhance the security of the standard core test wrapper by using cryptography to protect the data.
Standard crypto primitives are used, and the details of their design and implementation are outside the scope of this research. Here, we focus on the practical aspects of key exchange and on issues specifically relevant to the SoC test problem.

A common technique for session key establishment is the Diffie-Hellman [45] protocol. In Diffie-Hellman, each party generates a random number, sends a message, and does some arithmetic, and the result is that the two parties agree on a secret that was never explicitly transmitted over the wire. Although Diffie-Hellman is a powerful building block, for our application we can obtain better security at lower cost by taking advantage of practical constraints on what a malicious core can do. Keys can be generated by the test controller or external tester and distributed to each core at test initialization time, as shown in Figure 4.2. Cheap and secure, this is our preferred key distribution scheme.

Figure 4.2: A chain of scan cells is used for distributing keys to each of the cores. The scan cells are configured not to expose key bits at their outputs while they are being shifted.

The SoC designer can choose between on-chip key generation and off-chip key generation. If done on-chip, the external tester (i.e., ATE) does not have access to keys, which improves security in some situations. However, for reasons of cost, as discussed in Section 4.4, in most circumstances we expect SoC designers to prefer off-chip key generation and cryptography.

Key generation entails selecting a key for each core that has a security-enhanced wrapper. The most important characteristic of the keys is that no core should be able to learn the key of another core. As stated in Section 4.1.1, we assume that the design is known to the attacker. This precludes the possibility of hard-coding the keys.
Instead, for on-chip key generation, a hardware random number generator is used. Holleman [46] reported a hardware random number generator requiring 0.031 square millimeters of die area in 0.35-micron four-metal, double-poly CMOS. If implemented in a current fabrication process with smaller feature sizes, the die area for the circuit would be correspondingly smaller.

When key bits are transmitted to the cores during key setup, they are also stored by the test controller or external tester. The storage of key bits or cipher state is necessary because it is the basis for encrypted communication. However, the SoC designer has a choice of whether to maintain crypto sessions when accessing other cores. For example, if the test schedule involves communicating with core 1, then with core 2, and then with core 1 again, the question is whether the test controller should maintain the crypto session (cipher state) associated with core 1 while accessing core 2. If it does, it can resume communications with core 1 without the delay of reinitializing the cipher state. On the other hand, if the on-chip test controller is performing the crypto, then registers must be added to the test controller to store the cipher state, which increases the die area overhead. In the case of off-chip key generation and crypto, storing session state is not a problem. Offloading the crypto to the ATE is consistent with our goals and with the threat model stated in Section 4.1.1.

The scan cells in the key setup scan chain, though very simple, provide two properties that are very important for security. First, as shown in Figure 4.3, they accept an output inhibit signal, O_INH*, which forces the output to zero when it is pulled low. This blocks cores from observing other cores' key bits while they are being shifted in. Second, the logic gates are, for all intents and purposes, unilateral.
There is no way for a core to actively force a value onto the flip-flop to affect the key bits that are received by other cores.

Figure 4.3: The key setup scan chain conveys data from the test controller to the core wrapper key registers without allowing it to be sniffed or modified by other cores.

Other than the basic distributed shift register functionality, the only extra functionality we require of our scan cell is an output inhibit input (O_INH) to ensure that the key is not leaked during shifting. After the tester has shifted the key bits to their intended locations, it deasserts the output inhibit signal so that the cores receive their key data.

Communication requirements of cores vary over a wide range, and optimal test access design involves allocation of test buses and scheduling of tests [47]. Cores using BIST exclusively may be interfaced using only a serial test interface. Cores exposing extensive internal scan chains to the tester typically make use of a parallel test bus to increase test speed. In either case, standard low-cost symmetric cryptographic modules are placed between the test interface and the core, as shown in Figure 4.4. The costs of these modules are discussed in Section 4.4.

Figure 4.4: In a typical security-enhanced wrapped core, a word of compressed test data arrives via the parallel data input, is decrypted instantaneously, decompressed, and applied to the inputs of the core's scan chains. The outputs are compressed, encrypted, and sent out. Standard wrapper components, such as the parallel bypass, are not shown.

If an untrusted core is obtained in an unwrapped state, the test wrapper described in this subsection should be added by the SoC integrator. Purely from a security standpoint, it is good to obtain cores in an unwrapped state.
However, it adds significant work for the SoC integrator, losing the productivity benefits of test integration standards like IEEE 1500. Obtaining cores prewrapped with security-aware wrappers is probably the option most SoC integrators would prefer.

An untrustworthy core can be provided to the SoC integrator prewrapped with a security-enhanced wrapper. The presence of the security-enhanced wrapper does not make the core trustworthy, even if the wrapper itself is free of malicious features. However, the untrustworthy core, even if its wrapper contains malicious features, cannot undermine the TAM security of the SoC. This is in stark contrast with the conventional daisy-chain TAM architecture, where a single untrustworthy core breaks the security of all other cores on the chain.

4.3.2 A Security Overwrapper for Prewrapped Cores

In cases where an important module is only available as a conventionally wrapped core, a security overwrapper can be used. The functionality provided by the overwrapper, when combined with the core's included wrapper, is equivalent to that of the security-enhanced wrapper discussed in Section 4.3.1. The security overwrapper contains functional blocks for decrypting input test data and encrypting output test data.

4.3.3 Interoperability with Noncompliant Cores

The TAM security scheme described here does not require that all cores in an SoC comply. Noncompliant wrapped cores can be used as they are, provided that no security-critical data passes over their test interfaces. The presence of noncompliant cores on the test bus does not undermine the security guarantees provided to the compliant cores.

4.4 Costs

The SoC TAM security enhancements presented in this research were designed to be efficient in terms of die area, test time, and effort for the SoC integrator.

4.4.1 Die Area Cost

The security enhancements contribute to die area in three ways.
To a certain extent, these costs can be traded off, and the optimal choice depends on economics.

4.4.1.1 Wiring Cost

The extra wiring cost of the security enhancements is the cost of three extra wires on the test bus. In a typical SoC with serial control of the core wrappers and a 32-bit path for test data, there is a minimum of 40 wires without the enhancements, and 43 with the enhancements, a 7.5% overhead.

4.4.1.2 Core Wrapper Area

Each core whose test data the SoC integrator decides to protect needs its own crypto hardware, whether provided by the core supplier or by the SoC integrator. Assuming a 32-bit parallel test data path, the wrapper must decrypt 32 bits of input test data while encrypting 32 bits of output test data. To implement this with no additional latency, a stream cipher is used. 64 bits of keystream are required for each cycle of the test clock. This is achieved by using a keystream generator that produces multiple bits per clock cycle. The main additional hardware requirements for the security-enhanced core wrapper are:

• 32 XOR gates to decrypt the input test stimulus
• 32 XOR gates to encrypt the output test response
• a stream cipher that generates 64 bits of keystream per cycle of test clock

The Trivium [32] stream cipher meets the requirements. In its 64-bit form, it is equivalent to 5504 NAND gates. Assuming each XOR gate is equivalent to 2.5 NAND gates, the XOR gates used in each core wrapper are equivalent to 160 NAND gates. The total area overhead is therefore approximately 5700 NAND gates. For an SoC with n cores with security-enhanced wrappers, the gate count cost is

overhead = n × 5700    (4.1)

The percentage gate count overhead is

\left( \frac{\sum_{i=1}^{n} (g_i + 5700)}{\sum_{i=1}^{n} g_i} - 1 \right) \times 100\%    (4.2)

where g_i is the number of NAND gate equivalents in core i. For example, for an SoC with an average of 100,000 NAND equivalent gates per core, the area overhead is 5.7%.
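The gate-count arithmetic of Eqs. (4.1) and (4.2) can be checked with a short script. The figures are taken from the text (5504 NAND equivalents for Trivium plus 160 for the XOR gates, rounded to 5700); the core sizes in the example are illustrative.

```python
WRAPPER_GATES = 5504 + 64 * 2.5   # Trivium (64-bit form) + 64 XOR gates at
                                  # 2.5 NAND equivalents each; the text
                                  # rounds this 5664 to ~5700.

def area_overhead_percent(core_gates, wrapper=5700):
    """Percentage gate-count overhead per Eq. (4.2):
    (sum(g_i + wrapper) / sum(g_i) - 1) * 100."""
    return (sum(g + wrapper for g in core_gates) / sum(core_gates) - 1) * 100

# The example from the text: cores averaging 100,000 NAND equivalents
# give a 5.7% overhead, independent of the number of cores.
assert round(area_overhead_percent([100_000] * 10), 1) == 5.7
assert round(area_overhead_percent([50_000, 150_000]), 1) == 5.7
```

Because the overhead is fixed per wrapped core, the percentage depends only on the average core size, as the second assertion illustrates.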
The die area overhead can be cut in half in cases where the core is already architected for scan-based test. The flip-flops already associated with scan chains can be reconfigured to serve as the 288-flip-flop chain of the Trivium cipher, leaving the cipher's combinational logic as the remaining area overhead.

4.4.1.3 Test Controller Area

The cryptography can be handled by the test station or by the on-chip test controller. If it is handled by the test controller, the controller must have hardware for generating random key bits and for storing them. It also needs the encryptor and decryptor blocks. If the cryptography is handled by the test station, the test controller's complexity is essentially the same as without the security enhancements.

If the designer prefers to perform the cryptography and key generation using the on-chip test controller, then there are three cases. In the first case, the test controller maintains information about only a single core at a time. This requires a key setup whenever a new core is addressed. Here, the die area overhead in the test controller associated with the security-enhanced TAM is the area of the stream cipher and XOR gates, the equivalent of 5700 NAND gates. In the second case, where all of the key setup is done at initialization time, the die area overhead is 5700 + 12nk NAND gates, where n is the number of cores and k is the key length. The third case is where cipher state is maintained by the test controller for each security-enhanced wrapped core. Essentially, this means keeping a copy of the state register of the stream cipher, which is almost the same as the test controller simply having a separate instance of the stream cipher module for each security-enhanced wrapped core with which it will communicate. The die area overhead of this is approximately 5700n NAND gates. This minimizes test time, but is the most expensive option in terms of die area.
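The word-at-a-time XOR encryption and decryption performed by the wrapper can be sketched in software. This is a behavioral model only: the keystream generator below is a hypothetical hash-based stand-in, not Trivium, and the "core" is a toy function; the point is the dataflow of one test-clock cycle (32 XOR gates decrypting the stimulus, 32 encrypting the response, 64 keystream bits consumed per cycle).

```python
import hashlib
from itertools import count

def keystream(key: bytes):
    """Stand-in keystream generator (hypothetical, NOT Trivium): yields
    64 bits per 'test clock' as two 32-bit words, one to decrypt the
    stimulus and one to encrypt the response."""
    for i in count():
        d = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        yield int.from_bytes(d[:4], "big"), int.from_bytes(d[4:8], "big")

def wrapper_cycle(ks, encrypted_stimulus, core):
    """One cycle inside the wrapper: XOR-decrypt the stimulus word,
    exercise the core, XOR-encrypt the response word."""
    k_in, k_out = next(ks)
    stimulus = encrypted_stimulus ^ k_in
    return core(stimulus) ^ k_out

# The tester holds the same per-core key, so it runs an identical
# keystream instance to pre-encrypt stimuli and decrypt responses.
key = b"per-core session key"
core = lambda w: (~w) & 0xFFFFFFFF        # toy stand-in for the scan logic

tester_ks, wrapper_ks = keystream(key), keystream(key)
k_in, k_out = next(tester_ks)
encrypted_response = wrapper_cycle(wrapper_ks, 0xDEADBEEF ^ k_in, core)
assert encrypted_response ^ k_out == (~0xDEADBEEF) & 0xFFFFFFFF
```

A malicious core on the bus sees only the encrypted words; without the per-core key it cannot recover the stimulus or response.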
4.4.2 Test Time

The security enhancements do not affect the test clock speed, test duration, or test scheduling. However, the security-enhanced cores need to have their key registers initialized before testing can commence. The worst-case test time overhead is the time to program all key registers, back-to-back. For a k-bit key and n cores in the SoC, the worst-case key initialization time is

t_{init} = kn    (4.3)

The key setup clock frequency can be assumed to be the same as the test clock frequency. The percentage test time overhead is

\left( \frac{\sum_{i=1}^{n} (t_i + k)}{\sum_{i=1}^{n} t_i} - 1 \right) \times 100\%    (4.4)

where t_i is the number of test clock cycles required to test core i. For example, assuming an 80-bit key length and 50 cores in the SoC, the worst-case key initialization time is 4000 cycles of the key setup clock. If the average core requires more than 8000 bits of test data, the test time overhead of the security enhancements is less than 1%.

4.4.3 Effort for SoC Integrator

Under the proposed scheme, cores fall into three categories: (1) cores with no security enhancements, (2) cores shipped with security-enhanced wrappers as described in Section 4.3.1, and (3) cores that were shipped with non-security-aware wrappers and had the security overwrapper added as described in Section 4.3.2. The effort for the SoC integrator for category-1 cores is zero. The effort for using category-2 cores is very low: the signaling of the cores is consistent, so the per-core additional effort is just assigning the signals to connect the core to the key-setup scan chain. Category-2 is the preferred category, achieving all of the security benefits with minimal effort. Category-3 cores are slightly more effort for the SoC integrator, but still minimal work. The difference is that category-2 cores come with the crypto modules already in place, whereas category-3 cores require the SoC integrator to add the crypto module between the core and the test bus terminals.
4.5 Conclusion and Future Work

We presented a scheme that eliminates the risk of a malicious SoC core sniffing test data. The essential contribution is a straightforward way of establishing cipher keys without any hard-coded secrets in the design. The area overhead is under 6% and the test time overhead is under 1%. For typical chips where a minority of the cores have secrecy-sensitive test data, only those cores need the security-aware wrapper, and the total area overhead can be correspondingly reduced to 3% or less. Additional area savings are available when the functional clock is more than 8 times higher in frequency than the test clock. In such cases, the Trivium stream cipher can be used in a configuration that produces 8 bits at a time instead of 64 bits at a time, and it can be clocked by the functional clock instead of by the test clock. This reduces the size of the main source of area overhead, the stream cipher, by 32%. We showed how the SoC integrator can use his control over the inter-core wiring to maintain the security of the test data. Future work using the same key setup scheme will provide not only secrecy guarantees for the test data, but integrity guarantees as well. As the number of cores in SoCs increases, and cores are obtained from an increasingly wide variety of sources, limiting the damage done by a single rogue core has become an important concern with a practical solution.

Chapter 5

Integrity and Authenticity of Sensors

We propose a novel variety of sensor that extends the functionality of conventional physical unclonable functions to provide authentication, unclonability, and verification of a sensed value. This new class of device addresses the vulnerability in typical sensing systems whereby an attacker can spoof measurements by interfering with the analog signals that pass from the sensor element to the embedded microprocessor. The concept can be applied to any type of analog sensor.
5.1 Introduction

For sensing applications, it is desirable that the system provide some degree of assurance regarding the authenticity and veracity of measurements. One scheme for achieving this is to couple the sensor with a trusted cryptography module that digitally signs the sensor data. Unfortunately, that scheme can provide only limited assurance because the sensor is separate from the crypto module. As a result, the crypto module has no mechanism for verifying the sensor data before signing it. Sensor-crypto separation is an architectural vulnerability.

Figure 5.1: The naïve secure sensor architecture does not bind the sensing with the cryptography, allowing the analog link between the sensing element and the crypto processor to be easily attacked.

In the naïve architecture shown in Fig. 5.1, we illustrate how an attacker could interpose circuitry between the sensor and the microcontroller. Using this extra circuitry, the attacker could cause the microcontroller to falsely report the sensed data. Our work aims to make this and other sensor attacks impractical.

5.1.1 Related Work

The technique of integrating a physical quantity into a PUF challenge-response computation is new, but issues relating to the security of sensors in general have been studied. Secure remote sensors have been developed for high-security applications such as nuclear and chemical materials tracking [48]. Cryptographic protocols and infrastructures have been developed for securing communication in sensor networks [49]. However, neither can extend the trust perimeter to include the sensing element itself. Our work also builds on work from the PUF community. PUFs have emerged over the past decade as a potent tool for hardware authentication and key generation at low cost [50].
Their low cost makes them particularly attractive for use in the cost-sensitive sensor market, and they serve as the foundation for our work. Specifically, our example sensor makes use of non-homogeneous coatings, which have been used to achieve per-chip uniqueness and unclonability [51] in conventional PUFs. We also make use of comparisons of on-chip quantities that are selected by the challenge, a concept borrowed from ring-oscillator PUFs [52].

The problem of protecting analog sensor data before it enters the crypto-enabled digital domain is related to the problem that digital rights management (DRM) systems have of protecting the media after it leaves the crypto-enabled digital domain. This "analog hole" has been a vexing problem for the content protection community [53]. Additionally, related work has been done in the media forensics community with camera image sensors. Digital photos can be forensically attributed to the camera that took them because of distinctive anomalies introduced by camera image sensors [54]. These features, like a PUF, are unique to each sensor that is fabricated, and are reasonably stable across time and environmental conditions. However, unlike a PUF, they are not functions in the sense of having a challenge and a response.

5.1.2 Our Contribution

We propose an architecture that eliminates sensor-crypto separation. By merging sensing with cryptography, we raise the strength of the assurances that the system can provide. The device we propose is a form of PUF that entwines sensing and challenge-response processing. We provide the following:

• a definition of the security properties of the new device,
• a candidate design that targets those properties,
• a protocol used for making secure measurements, and
• an analysis of the candidate design.

Figure 5.2: A conventional silicon PUF has a binary input and a binary output.
The sensor PUF has a binary input, the physical quantity being sensed, and a binary output.

5.1.3 Security Properties

A traditional PUF [55] is a physical device that takes in a challenge and produces a response, with the following properties: (1) For a given binary challenge, a PUF always produces the same response. (2) One challenge-response pair leaks nothing about other pairs. (3) The manufacturer of the PUF cannot predetermine the mapping.

The variation that we propose extends conventional PUFs by including two inputs: a physical quantity and a traditional binary challenge. This system, which we call a sensor physical unclonable function, has the following properties: (1) For a given challenge and a given sensed quantity, the sensor PUF always produces the same response. (2) One challenge-quantity-response triple leaks nothing about other triples. (3) The manufacturer of the sensor PUF cannot predetermine the challenge-quantity-response mapping.

The third property on both lists is known as manufacturer resistance. Here, the manufacturer is considered a potential adversary. From a black-box point of view, a PUF looks like a message authentication code (MAC) operation. From the outside, it is difficult to distinguish a true PUF from a MAC operation with an embedded key. If the adversary replaces the PUF with a MAC module, the adversary could predetermine the mapping without the user's knowledge. Manufacturer resistance is therefore limited in practice.

5.2 Candidate Sensor PUF

We present an example of a sensor PUF that measures light level. The inputs to the device are light and a series of challenge bits. The output is a series of response bits. The light sensor PUF is profoundly different from Pappu's optical PUF [28]. Pappu's challenge is a beam of coherent light, and the response is the light that scatters when the coherent beam passes through a mixture containing tiny beads. Pappu's optical PUF does not measure a physical quantity.
Where a conventional PUF has a single input, the challenge, the sensor PUF has two inputs. Operational protocols for using conventional PUFs dictate that a challenge not be reused in the field. The physical quantity input to a sensor PUF can be repeated in the field without compromising security; it is only the challenge bits that must not be reused.

5.2.1 Structure

The candidate sensor PUF we propose consists of an array of on-chip photodiodes, a coating, and some on-chip circuitry, as shown in Figure 5.3. The photodiodes are organized in groups of three. A coating containing swirls of dark material in a translucent base is applied onto the sensor area. The nonuniform optical transmittance of the coating results in per-chip variations in the optical sensitivity of each of the photodiodes.

Figure 5.3: The analog portion of the light level sensor PUF includes the coating, the photodiode groups, the switches, the summing junctions, and the analog comparator. The challenge applied to the sensor PUF determines the keystream input to the control circuit, which controls the random selection of the left gate signals GL.i and the right gate signals GR.i, which determine the set of sensors that are included in the summations. The left and right sums are compared, producing one raw bit.

Each of the three identical photodiodes (PD1, PD2, and PD3) within a sensor group has its own linear response function. The slope of the response function is generated by photodiode PD3 and the slope generator circuit (see Fig. 5.4).
The slope is determined by the optical transmittance of the coating covering the photodiode, along with non-varying factors (photodiode area, quantum efficiency, etc.) and external factors such as temperature.

Figure 5.4: a: The offset generator produces a DC voltage that is determined by the optical transmittance of the coating at the sites of photodiodes PD1 and PD2. b: The slope generator produces a voltage proportional to the light input at photodiode PD3.

The offset of the response function for the group is determined by the optical transmittance from the light source to photodiodes PD1 and PD2 (see Fig. 5.4). The offset generator produces a DC voltage that is determined by the coating on the sensor PUF as it independently affects each photodiode group. As shown in Fig. 5.3, the currents from each photodiode group are brought to two summing junctions, where independent subsets of these currents are added together to produce two currents, P and Q. These currents are compared to produce one bit of the raw binary result. The input challenge determines which of the sensor groups will be included in the summations producing P and Q. The gate signals that control the summations shown in Figure 5.3 are generated by the summation control logic. A conventional silicon PUF is used as a component in the architecture to transform the public challenge into a volatile secret initialization vector for the stream cipher. The stream cipher generates a keystream that is used by the summation control logic for selecting which gate signals to enable for each comparison. The summation control logic shifts the raw response bits from the comparator into a shift register.
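The raw-bit generation described above can be sketched behaviorally in a few lines. Everything here is a simplifying assumption for illustration: each group is reduced to a (slope, offset) pair induced by the coating, and a seeded PRNG stands in for the conventional-PUF-keyed stream cipher that selects the left and right subsets.

```python
import math
import random

def make_instance(n_groups, seed):
    """Model one fabricated device: the coating gives each photodiode
    group a random slope (proportional to c3) and offset (proportional
    to ln(c1/c2)), with transmittances assumed uniform in [0.25, 0.75]."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_groups):
        c1, c2, c3 = (rng.uniform(0.25, 0.75) for _ in range(3))
        groups.append((c3, math.log(c1 / c2)))   # (slope, offset)
    return groups

def raw_bits(groups, challenge, x, n_bits=256):
    """For each comparison, a challenge-keyed stream (here a seeded PRNG,
    in the real device a PUF-keyed stream cipher) selects the left and
    right group subsets; comparing the two summed linear responses at
    sensor input x yields one raw bit."""
    rng = random.Random(challenge)
    bits = []
    for _ in range(n_bits):
        p = sum(m * x + b for m, b in groups if rng.random() < 0.5)  # left
        q = sum(m * x + b for m, b in groups if rng.random() < 0.5)  # right
        bits.append(1 if p > q else 0)
    return bits

puf = make_instance(8, seed=1)
# Property 1: same challenge and same sensed quantity -> same response.
assert raw_bits(puf, challenge=42, x=10) == raw_bits(puf, challenge=42, x=10)
```

Sweeping `x` with a fixed challenge reproduces the behavior studied in Section 5.3: bits flip only where `x` crosses a cut point of a selected line pair.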
Figure 5.5: In all four subfigures, the sensor input level is on the x-axis and the electrical response is on the y-axis. Subfigure (a) shows eight photodiode group response lines generated by simulation of the candidate light sensor PUF. The bold line is the sum of the eight lines. Subfigures (b), (c), and (d) show pairs of response lines that occur for different values of the left and right gate signals. Assuming a sensor input value of 10 and assuming that solid line > dashed line is interpreted as a "1", evaluating the comparisons for the line pairs shown in (b), (c), and (d) gives the raw bit sequence "0", "1", "0". In our simulations, the raw bit sequences are 256 bits long.

5.2.2 Protocol

The conventional PUF that is used as a component of the sensor PUF must be enrolled before the sensor PUF itself can be enrolled. The exact procedure for enrolling the conventional PUF depends on what kind of conventional PUF is used. The design and operation of conventional PUFs is outside the scope of this research but has been developed by the PUF community [27]. The enrollment procedure for the sensor PUF consists of the following steps for each point along the sensor domain: (1) The enrolling party randomly selects (and removes) a challenge-response pair from the database. The challenge is sent to the conventional PUF and the response is used as a cipher key. (2) The enrolling party generates a unique random challenge and sends it to the sensor PUF. (3) The sensor PUF uses the challenge to initialize the measurement logic, generates a vector of raw bits, and sends the raw bits back to the enrolling party, encrypted.
(4) The enrolling party decrypts the raw response bits and inserts the challenge-measurement-response triple into the database.

Making a measurement with the sensor PUF has the following steps. (1) The querying party sends a challenge to the device. This challenge is used to establish a volatile shared secret key between the querying party and the sensor PUF. (2) The sensor PUF makes a conventional measurement and sends the encrypted conventional measurement to the querying party. This measurement is the claim. (3) The querying party randomly selects (and removes) an entry in the challenge-measurement-response database that matches the claimed measurement. The querying party sends the challenge to the sensor PUF. (4) The sensor PUF initializes the measurement logic using the challenge, performs n comparisons to generate an n-bit vector of raw bits, encrypts them, and sends them to the querying party. (5) The querying party decrypts the raw bits and compares them to the response that is in the challenge-measurement-response database. If the Hamming distance is below a threshold, the measurement claim has been validated.

5.3 Electrical Analysis and Experimental Results

In order to thoroughly develop the analysis of the sensor PUF, the candidate sensor PUF was studied using:

• SPICE simulations of the analog modules,
• a C program to simulate the statistical operations of the sensor PUF,
• analysis according to standard models, and
• discrete implementation of critical analog modules.

5.3.1 Assumptions

We assume that the optical coating is a non-homogeneous mixture of two epoxy-like materials which attenuate the wavelengths of interest: one material has an optical transmittance of 0.25, and the other has an optical transmittance of 0.75. Additionally, we assume the transmittances affecting each of the photodiodes are independent and uniformly distributed over the interval [0.25, 0.75].
Lastly, we assume that the photodiodes that are operated in the reverse-biased mode have a linear current-versus-light response.

5.3.2 Distribution of Cut Points

To evaluate the sensing resolution of the sensor PUF, we analyze the probability distribution of the cut points in the sensing domain. The cut points define the intervals in the sensing domain that can be distinguished by the sensor's response. Each cut point is the x-coordinate of the intersection of the response lines formed by the left and right summations shown in the previous section and illustrated in Figure 5.5. Since we have designed each sensor group response function to have an independent slope and offset, we can statistically analyze the expected crossing points, which determine the cut points.

The offset of the voltage-versus-light response of each photodiode group follows from the linearity of photodiodes and the simplified Shockley ideal diode equation:

I_D = I_S e^{V_D / V_T}    (5.1)

We assume that all of the photodiodes have equal intrinsic response, and that the differences in their actual response are caused purely by the differences in their coatings. Using this, we can calculate the offset voltage as follows:

I_{PD_{i,1}} = \frac{c_{i,1}}{c_{i,2}} I_{PD_{i,2}}    (5.2)
V_D = V_T (\ln I_D - \ln I_S)    (5.3)
V_{D_{i,1}} = V_T (\ln I_{PD_{i,1}} - \ln I_S)    (5.4)
V_{D_{i,1}} = V_T \left( \ln \frac{c_{i,1} I_{PD_{i,2}}}{c_{i,2}} - \ln I_S \right)    (5.5)
V_{D_{i,1}} = V_T \left( \ln \frac{c_{i,1}}{c_{i,2}} + \ln I_{PD_{i,2}} - \ln I_S \right)    (5.6)
V_{D_{i,2}} = V_T (\ln I_{PD_{i,2}} - \ln I_S)    (5.7)
V_{D_{i,1}} - V_{D_{i,2}} = V_T \ln \frac{c_{i,1}}{c_{i,2}}    (5.8)

where I_{PD_{i,1}} is the current in photodiode 1 of photodiode group i, and c_{i,j} is the optical transmittance of the coating over photodiode j of group i. The slopes of the response functions are simply proportional to c_{i,3}, the optical transmittance of the coating affecting photodiode 3 in photodiode group i.

The offset generator takes two uniform random variables and calculates their ratio. This results in a uniform ratio distribution, as shown in Figure 5.6. The natural log of the ratio distribution produces the distribution shown in Figure 5.7.
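The uniform-ratio and log-ratio distributions just described are easy to reproduce numerically; the sample count and seed below are arbitrary choices for illustration.

```python
import math
import random
import statistics

rng = random.Random(7)
# Transmittances drawn uniformly from [0.25, 0.75], per Section 5.3.1.
ratios = [rng.uniform(0.25, 0.75) / rng.uniform(0.25, 0.75)
          for _ in range(100_000)]
offsets = [math.log(r) for r in ratios]   # offset ∝ V_T ln(c1/c2), Eq. (5.8)

# The ratio c1/c2 is confined to [1/3, 3], and the log-ratio is
# symmetric about zero, consistent with Figures 5.6 and 5.7.
assert 1 / 3 <= min(ratios) and max(ratios) <= 3
assert abs(statistics.mean(offsets)) < 0.01
```

Because ln(c1/c2) = ln c1 − ln c2 with c1 and c2 identically distributed, the offset density is symmetric about zero, which the sample mean check reflects.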
Figure 5.6: The probability density function of offset current ratios observed in simulation.

Figure 5.7: The density function of the offset signal of each individual photodiode group.

The sum lines are the sum of logs of uniform ratio distributions. Since these distributions have finite mean and variance, we can invoke the central limit theorem to infer that the resulting distribution is approximately Gaussian. To determine the cut point, we considered the random variable that represents the x-coordinate of the intersection of the two sum lines that are selected by the challenge. The cut point in the sensor input domain is at:

cut = (B_b − A_b) / (A_m − B_m)    (5.9)

where B_b is the offset of line B, A_b is the offset of line A, A_m is the slope of line A, and B_m is the slope of line B. The PDF of the numerator, f_num, is the difference of two independent random variables and is given by the cross-correlation of their density functions. The same holds for the PDF of the denominator, f_den. Since we approximate the slopes and offsets of the sum lines as Gaussian, the numerator and denominator are also Gaussian. Simulation results support the zero-mean Gaussian model for these variables. The cut points are the ratio of these zero-mean Gaussian variables, and the resulting ratio distribution is Cauchy. This result is further supported by the simulation results, as shown in Figure 5.8.

Figure 5.8: The green trace shows the probability density function of the cut points in the sensor input domain, as observed in simulation. The red trace shows the Cauchy density function for χ = 0 and γ = 20.
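The Cauchy shape of the cut-point distribution follows directly from Eq. (5.9) and can be checked numerically. The standard deviations below are illustrative assumptions chosen so that the resulting scale parameter is γ = 20, matching Figure 5.8.

```python
import random

# Monte Carlo check that the cut point of Eq. (5.9) is Cauchy-distributed.
# Assumptions: offsets and slopes of the two sum lines are i.i.d. zero-mean
# Gaussians; sigma values chosen so gamma = 20, as in Figure 5.8.
random.seed(1)
N = 200_000
SIGMA_OFFSET, SIGMA_SLOPE = 20.0, 1.0

cuts = []
for _ in range(N):
    Ab = random.gauss(0, SIGMA_OFFSET)  # offset of line A
    Bb = random.gauss(0, SIGMA_OFFSET)  # offset of line B
    Am = random.gauss(0, SIGMA_SLOPE)   # slope of line A
    Bm = random.gauss(0, SIGMA_SLOPE)   # slope of line B
    cuts.append((Bb - Ab) / (Am - Bm))  # Eq. (5.9)

# Numerator and denominator are zero-mean Gaussians with standard deviations
# sqrt(2)*20 and sqrt(2)*1, so their ratio is Cauchy(0, gamma) with gamma = 20.
# For a Cauchy(0, gamma), exactly half the probability mass lies in |x| < gamma.
gamma = 20.0
frac = sum(abs(c) < gamma for c in cuts) / N
print(round(frac, 2))  # ~0.5
```

The heavy Cauchy tails explain why the cut points remain spread over a wide range of sensor input values even though the underlying slopes and offsets are tightly concentrated.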
5.3.3 Hamming Distance

To evaluate whether the raw response of the sensor PUF is brittle with respect to changes in the input, we simulated the raw sensor response. For several randomly chosen challenges and randomly generated instances of the candidate sensor PUF, we selected a reference sensor input value and measured the Hamming distance between the raw response for the reference input and the raw response for sensor input values in the neighborhood of the reference value.

Figure 5.9: Hamming distance for five statistically independent instances of the candidate sensor PUF.

In Figure 5.9, the Hamming distance is shown for five statistically independent instances of the candidate sensor PUF design. The raw bit vectors are 256 bits in length. A reference bit vector is taken for arbitrary reference sensor input values -10, 0, and 10 using a reference challenge. While continuing to apply the reference challenge, the sensor input value is then swept from -30 to 30, in steps of 1, while the resulting raw output bits are compared with the raw output bits produced for each of the reference sensor inputs. From the figure, we see that the candidate sensor PUF's raw response is not brittle with respect to the sensor input. For a given challenge, measuring a set of close physical quantities will generate a set of raw bit responses that are close in code space. This makes it suitable for error correction using a linear code. However, it also means that the raw bit responses leak information. If a challenge is repeated (which should never happen in practice), an attacker can infer from the raw bit responses whether the sensor inputs are close or far. From a practical standpoint, this problem is addressed by encrypting the response that is sent back to the querying party. Figures 5.8 and 5.9 define the range and sensitivity of the sensor PUF with respect to the physical quantity being measured.
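The gradual growth of Hamming distance in Figure 5.9 can be illustrated with a toy model of the raw response: each raw bit compares two challenge-selected lines at the sensor input, so a bit differs from the reference response exactly when a cut point lies between the two inputs. The bit count matches the text; the line parameters are illustrative assumptions.

```python
import random

# Toy model of the raw response (cf. Figure 5.9): each of 256 raw bits is a
# comparison of two response lines evaluated at the sensor input x.
# Slope/offset spreads are illustrative assumptions, not the fabricated design.
random.seed(2)
N_BITS = 256

# One pair of (slope, offset) lines per raw bit, fixed by the challenge.
lines = [((random.gauss(0, 1), random.gauss(0, 20)),
          (random.gauss(0, 1), random.gauss(0, 20))) for _ in range(N_BITS)]

def raw_response(x):
    """n-bit raw response: bit i compares the two selected lines at input x."""
    return [(am * x + ab) > (bm * x + bb) for (am, ab), (bm, bb) in lines]

def hamming(r1, r2):
    return sum(b1 != b2 for b1, b2 in zip(r1, r2))

ref = raw_response(0)  # reference sensor input value 0
dists = [hamming(ref, raw_response(x)) for x in range(-30, 31)]

# A bit flips only where its two lines cross, so the distance is zero at the
# reference input and grows monotonically as |x| moves away from it.
print(dists[30])              # 0 (distance to itself)
print(dists[0] >= dists[25])  # True (farther input, larger distance)
```

This reproduces the qualitative shape of Figure 5.9: no cliff at the reference input, with distance accumulating one cut point at a time.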
The Cauchy PDF of the cut points peaks at the origin and falls to half of its peak value at x = 20. The cut point density determines the sensitivity (slope) of the Hamming distance to differences in physical input value. From a design standpoint, the gains of the offset generator and slope generator circuits can be tuned to widen or narrow the cut point PDF, which in turn widens or narrows the range of the sensor. Outside the useful range of physical quantity values, the raw response saturates, converging to a fixed pattern for a given challenge.

5.3.4 Verification of Offset Generator

The simulations of the behavior of the candidate light sensor PUF assume that the offset generator produces a voltage that is independent of the sensor input. If the offset generator's output is affected by the sensor input, the response will be nonlinear. Linearity is not a requirement for the light sensor PUF to function, but the assumption does simplify analysis. We validated the independence assumption with SPICE simulation and with experimental data in the lab using discrete optoelectronic components, shown in Figures 5.10 and 5.11.

Figure 5.10: The offset generator circuit shown in Figure 5.2.1 was constructed and tested using light from an LED driven by a variable current source. The offset voltage is plotted across the range of currents for three different relative transmittances. We see ±2.5% variation over the range.

Figure 5.11: SPICE simulation of the offset generator output voltage across a range of light intensity values.

5.4 Security Context of Sensor PUFs

Sensor modules are often deployed in the field, outside the physical control of those who are deploying them. They are subject to tampering. A motivated attacker can disassemble, break, or modify any part of a sensor module.
The attacker can also attempt to substitute an inauthentic sensor. These attacks will be much more difficult to execute against a sensor PUF than against a conventional sensor. The most straightforward tampering attack is to decouple the sensing from the challenge-response calculation. For example, if the calculation is performed by a microcontroller with an analog input and the sensor has an analog output connected to the microcontroller, that analog voltage is vulnerable to tampering.

5.4.1 Substitution

The resistance of a sensor PUF to being substituted by an inauthentic sensor comes from two things. First, the attacker needs to defeat the conventional PUF within the sensor PUF architecture. Without defeating that, the attacker is unable to establish a shared crypto key. Second, without knowledge of the slopes and offsets of the sensor responses, it is impossible for the attacker to know what response corresponds to a particular physical quantity and a challenge sent by the querying party.

5.4.2 Tampering

The main objective of the sensor PUF is to defend against the low-budget attack of tampering with the analog signal that goes from the sensing element to the microprocessor. In the candidate light sensor PUF, tampering with the analog signal is not a low-budget attack. The attacker could probe the on-chip signals while sweeping the physical quantity with the goal of learning the slopes and offsets of each of the photodiode groups. This would have to be done without disturbing the optical coating. This could only be done in a properly-equipped lab by skillful staff. The candidate sensor PUF thus significantly raises the cost of a tampering attack. There is a tradeoff between tamper resistance and robustness of operation. Assuming that the protocol outlined in subsection 5.2.2 is followed, a threshold is applied to the Hamming distance between the enrolled response and the response produced when the sensor has been deployed.
Beyond this threshold, the sensor's response will be rejected. To an extent, the threshold can be minimized by extracting more challenge-response pairs during enrollment so the distance between the claim and the enrolled measurement is small, thus minimizing the expected distance of the raw responses. Even without constraints on the duration of the enrollment process or the size of the database, physical variation puts a practical lower bound on the threshold that can be used for reliable results. If the threshold is too high, the attacker can perform a less accurate and therefore cheaper extraction of the sensor's physical unique features, thus weakening the security of the device. The optimal threshold balances these concerns.

5.4.3 Manufacturer Resistance

Many sensors are high-volume, low-cost items and are subject to the risks of outsourced fabrication and assembly [56]. If there are significant monetary or strategic incentives for breaking the security of the system, and not too much risk, then a rational model of human behavior predicts that such an attempt will be made. The issue of manufacturer resistance is not absolute. It has gradations of strength in the context of conventional PUFs, and the same applies to sensor PUFs. Since the application of the optical coating is a security-critical procedure, it can be delayed until after the chips are delivered from the foundry, assuming they are not packaged. This somewhat reduces the risk of the manufacturer predetermining the responses instead of faithfully executing the design that produces a random challenge-response function. Nevertheless, a determined adversary with control over the mask and chip fabrication can manipulate the behavior of conventional silicon PUFs, sensor PUFs, and practically any other IC.

5.4.4 Sensor Decoupling

Sensor decoupling means that the entire sensor unit is separated from the physical quantity it is intended to measure.
For example, a thermometer can be placed inside an insulated box, causing it to report the temperature inside the box instead of the ambient temperature. This threat depends on what is being sensed and how the sensor is deployed. This important problem is unfortunately outside the scope of the assurances provided by a sensor PUF.

5.5 Security Analysis

The security guarantees the sensor PUF aims to provide to the reader are that:

(1) The response reflects the current physical observation; one made between when the challenge was sent and when the response was received by the reader.
(2) The response is authentic; it was unambiguously generated by the individual sensor PUF the reader intended to query.
(3) The response is accurate; it conveys unambiguous and correct information about the quantity being measured.

Any threat to the security of the sensor PUF must entail at least one of the security guarantees being violated. For example, there exists the threat that a sensor PUF can be cloned. How that threat materializes depends on the attack or combination of attacks chosen by the attacker.

5.5.1 Attack Model

The prospective sensor PUF attacker can choose from several possible attacks. The attacks differ in their cost, in what they aim to achieve, and in their requirements.

(1) Black-box Passive: The attacker eavesdrops on the communication between the reader and the sensor PUF (e.g., over a network).
(2) Black-box Half-active: The attacker (without authorization) sends requests to the sensor PUF and receives the responses.
(3) Black-box Active: The attacker acts as an active man in the middle between the reader and the sensor PUF.
(4) Passive Probing: The attacker directly observes secret bits or signals internal to the sensor PUF.
(5) Active Probing: The attacker forces bits internal to the sensor PUF.
(6) Incremental Reassembly: The attacker begins by collecting many challenge-response-measurement triples from the sensor PUF.
Then the attacker removes the coating from one photodiode group and then incrementally adds the coating back while querying the sensor PUF. When the responses match the responses obtained before tampering, the attacker concludes that the amount of coating added back provides the same optical transmittance as the original coating over that photodiode group. The attacker then repeats this process at the next photodiode group until he has learned the optical transmittance of the coating over every photodiode group.

(7) Reader host compromise: The attacker penetrates the reader host, gaining root-level access.
(8) Reader software compromise: The attacker replaces the authentic reader software with malicious reader software. This includes rollback to a previous software version.
(9) Sensor-Environment Decoupling: As described in Section 5.4.4.

5.5.2 Attack Trees

We approach each of these threats as the topmost node of an attack tree [57]. The sensor PUF security guarantees given above can be reformulated as attack goals. The attacker aims to:

(1) Have the reader accept old sensor PUF measurements as being fresh.
(2) Clone a sensor PUF.
(3) Have the reader believe that the sensed quantity is X when it is Y, where X and Y are significantly different.

5.5.2.1 Replaying an Old Measurement

One attack goal is to replay an old measurement. This can succeed if the challenge is repeated. Normally the challenge is never repeated. As shown in Figure 5.12, there are two ways to get the reader to repeat the challenge.

Figure 5.12: Attack tree for replay.

The attacker can compromise the computer that is acting as the reader (i.e., getting root-level access), or the attacker can undermine the reader software, arranging for it to reuse challenges.
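The one-time-challenge discipline that blocks replay, described in the protocol of Section 5.2.2, can be sketched in reader-side pseudocode. The class name, database layout, and threshold value below are illustrative assumptions, not the implemented protocol; the key point is that an enrolled entry is deleted the moment its challenge is issued.

```python
import secrets

# Sketch of the reader-side, one-time-challenge discipline: an entry is
# removed from the challenge-measurement-response database as soon as it is
# issued, so even a later-compromised host cannot induce challenge reuse.
THRESHOLD = 32  # illustrative acceptance threshold (max Hamming distance)

class Reader:
    def __init__(self, cmr_db):
        # cmr_db maps challenge -> (enrolled_measurement, enrolled_response)
        self.db = dict(cmr_db)

    def issue_challenge(self, claimed_measurement):
        # Randomly select, and immediately REMOVE, an enrolled entry that
        # matches the claim; the same challenge can never be reissued.
        matches = [c for c, (m, _) in self.db.items() if m == claimed_measurement]
        if not matches:
            raise LookupError("no enrolled entry matches the claim")
        challenge = secrets.choice(matches)
        expected = self.db.pop(challenge)[1]
        return challenge, expected

    @staticmethod
    def accept(expected, received):
        # Fuzzy match: accept if the Hamming distance is within the threshold.
        distance = sum(a != b for a, b in zip(expected, received))
        return distance <= THRESHOLD

# Tiny usage example with fabricated 8-bit responses.
db = {b"ch1": (10, [0, 1, 1, 0, 1, 0, 0, 1]),
      b"ch2": (10, [1, 1, 0, 0, 1, 1, 0, 0])}
reader = Reader(db)
challenge, expected = reader.issue_challenge(claimed_measurement=10)
print(challenge in reader.db)             # False: the entry was consumed
print(reader.accept(expected, expected))  # True: zero Hamming distance
```

Note that deletion-on-issue protects only against the reader itself reusing a challenge; a compromised host can still leak past responses, which is why the protocol also encrypts the raw bits in transit.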
If the reader functionality is offloaded to trusted hardware, it is possible to ensure that compromising the reader host system will not enable challenge reuse. With a trusted reader module, the reader module can sign the measurement it gets from the sensor PUF. The application that uses the data still needs to ensure that its communication with the trusted reader module is trustworthy, including resistance to replay.

5.5.2.2 Cloning a Sensor PUF

Cloning a sensor PUF is another attack goal. The black-box passive sniffing attack is to observe and record a large number of interactions between the reader and the sensor PUF. Eventually the eavesdropper will have observed sufficiently many readings that he can answer future queries from memory. This kind of passive attack is not practical because a very large number of observations is needed and the actual rate of sensor PUF read operations in the benign application is unlikely to be high.

Figure 5.13: Attack tree for cloning.

Passive probing, as shown in Figure 5.13, is another approach to cloning a sensor PUF. It entails measuring signals while the sensor PUF is operating. For example, the analog outputs of each of the photodiode groups can be probed. For each photodiode group, a minimum of two measurements is needed to determine the optical intensity response line. When the optical intensity response line for every photodiode group is known to the attacker, the sensor PUF can be cloned or emulated easily. However, the attacker needs to probe the signals without disturbing the signals or the coating on the surface of the chip. The incremental reassembly attack, described above, is invasive but does not involve die probing. It is similar to guessing a password letter by letter. Properly designed systems preclude this type of incremental guessing in a black-box attack model.
However, when invasive techniques are applied, incremental guessing becomes possible. As with the passive probing attack, incremental reassembly provides the attacker with complete information regarding the randomness resulting from the optical coating on the sensor PUF. However, a working cloning attack still requires cloning the conventional PUF in the sensor PUF.

5.5.2.3 Inducing Error in the Reported Measurement

To induce errors in the measurements that the reader gets from the sensor PUF, the attacker can use the attacks that are used for cloning or replay, as shown in Figure 5.14.

Figure 5.14: Attack tree for inducing errors in measurement.

Additionally, the attacker can execute the decoupling attack described in Section 5.4.4. The attacker can keep the authentic sensor PUF in place and just alter the physical input that arrives at the sensor. For example, the attacker can put a piece of tinted glass in front of the light sensor PUF to cause it to under-report the actual light level.

5.6 Future Work in Sensor PUFs

The sensor PUF is a promising mechanism for securing remote sensors. The effects of temperature are very important for conventional PUFs and for sensor PUFs as well [58]. We are currently evaluating temperature effects in the candidate sensor PUF design. The candidate design uses a conventional PUF for communication security. This allows the raw bits from the comparator to be sent back to the querying party, which in turn allows fuzzy matching to be performed by the querying party. Error correction is well established for conventional PUFs [59], and can be applied to sensor PUFs as well.
If error correction coding is applied in the sensor PUF to obtain a precisely repeatable response across time and environmental conditions, the sensor PUF would gain interesting capabilities. For example, a message could be encoded so that the sensor PUF can decrypt it only under bright light, or only in the dark, or, for an alcohol sensor PUF, only when the alcohol concentration is in a certain range. The raw response bits generated by the comparator leak information about what the raw response will be for other physical quantities with the same challenge. In our candidate design, we encrypt the output to prevent an attacker from exploiting this leakage. It would be better if the sensor PUF had this property as an integral part of the basic challenge-measurement-response functionality. How to achieve this is an open problem.

Chapter 6
Future Work in Hardware Security

Hardware security has the potential to protect existing technology applications and to enable new ones. Future work at the intersection of hardware testing and hardware security can maximize its enabling effect by focusing on providing practical assurance to system designers and ultimately to system users. Practical assurance is a fusion of security and reliability. Although engineers have traditionally kept security and reliability separate, the distinction is in many cases not useful. System designers have the burden of considering the failure of each of the subsystems of their design. Practical assurance would provide all-encompassing statements about the likelihood of a subsystem failing. This would promote high-level system design, as opposed to system design where the low-level details of each of the subsystems are considered at multiple levels of system abstraction and at multiple phases of the design process. Practical assurance has the promise of simplifying the designer's job, improving the dependability of the end product, and reducing design costs in both labor and time.
A recurring theme in this dissertation is the trade-off between security and testability. A common way of making progress in a trade-off space is to introduce a figure of merit that is the conjunction of the desired features. The simplest such figure of merit in our context would be the security-testability product. Abstractly, a figure of merit like this would guide progress toward practical solutions that do not favor one metric over the other (e.g., good security with bad testability). However, although quantitative metrics for security are being researched [60], they are not yet mature. Rigorously defining a combined figure of merit like the security-testability product would be an important step toward architectures that can provide practical assurance.

Chapter 7
Publications

(1) Security-Aware SoC Test Access Mechanisms, Rosenfeld, K.; Gavas, E.; Karri, R.; 2011 IEEE VLSI Test Symposium
(2) Roadmap for Trusted Hardware; Part II: Trojan Detection Solutions and Design-for-Trust Challenges, Tehranipoor, M.; Salmani, H.; Zhang, X.; Wang, M.; Karri, R.; Rajendran, J.; Rosenfeld, K.; IEEE Computer Magazine, 2011
(3) Security and Testing, Rosenfeld, K.; Book Chapter in Introduction to Hardware Security and Trust, Edited by Mohammad Tehranipoor and Clifford Wang
(4) Security Challenges During VLSI Test, Hely, D.; Rosenfeld, K.; 2011 International NEWCAS Conference
(5) Sensor Physical Unclonable Functions, Rosenfeld, K.; Gavas, E.; Karri, R.; 2010 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST)
(6) Attacks and Defenses for JTAG, Rosenfeld, K.; Karri, R.; IEEE Design and Test of Computers, 2010
(7) Trustworthy Hardware: Identifying and Classifying Hardware Trojans, Karri, R.; Rajendran, J.; Rosenfeld, K.; Tehranipoor, M.; IEEE Computer Magazine, 2010
(8) JTAG Attacks and Defenses, Rosenfeld, K.; Karri, R.; Presented at the 2009 IEEE North Atlantic Test Workshop
(9) Volleystore: A Parasitic Storage Framework, Rosenfeld, K.; Sencar, H.;
Memon, N.; 2007 IEEE Information Assurance and Security Workshop
(10) A Study of the Robustness of PRNU-based Camera Identification, Rosenfeld, K.; Sencar, H.; 2007 SPIE Media Forensics and Security

Bibliography

[1] IEEE Std 1149.1-2001, Test Access Port and Boundary-Scan Architecture.
[2] F. DaSilva, Y. Zorian, L. Whetsel, K. Arabi, and R. Kapur, "Overview of the IEEE P1500 standard," vol. 1, Sep. 2003, pp. 988–997.
[3] M. Sipser, Introduction to the Theory of Computation, 1st ed. International Thomson Publishing, 1996.
[4] A. Rukhin, J. Soto, J. Nechvatal, E. Barker, S. Leigh, M. Levenson, D. Banks, A. Heckert, J. Dray, S. Vo, M. Smid, M. Vangel, and L. E. Bassham III, "A statistical test suite for random and pseudorandom number generators for cryptographic applications," 2001.
[5] B. Yang, K. Wu, and R. Karri, "Scan based side channel attack on dedicated hardware implementations of data encryption standard," in Proceedings of the IEEE Int. Test Conference, 2004, pp. 339–344.
[6] D. Hely, M.-L. Flottes, F. Bancel, B. Rouzeyre, N. Berard, and M. Renovell, "Scan design and secure chip [secure IC testing]," in On-Line Testing Symposium, 2004. IOLTS 2004. Proceedings. 10th IEEE International, July 2004, pp. 219–224.
[7] B. Yang, K. Wu, and R. Karri, "Secure scan: a design-for-test architecture for crypto chips," in Proceedings of IEEE/ACM Design Automation Conference, 2005, pp. 135–140.
[8] J. Lee, M. Tehranipoor, and J. Plusquellic, "A low-cost solution for protecting IPs against scan-based side-channel attacks," in Proc. VLSI Test Symp. Citeseer, 2006, pp. 94–99.
[9] R. Rajsuman, "Design and test of large embedded memories: An overview," Design & Test of Computers, IEEE, vol. 18, no. 3, pp. 16–27, May 2001.
[10] B. Yang and R. Karri, "Crypto BIST: A built-in self test architecture for crypto chips," in Proceedings of the 2nd Workshop on Fault Diagnosis and Tolerance in Cryptography (FDTC05), 2005, pp. 95–108.
[11] The Dishnewbies Team. JTAG guide. [Online]. Available: http://dishnewbies.com
[12] Free60. Free60 SMC hack. [Online]. Available: http://www.free60.org/SMC Hack
[13] K. Rosenfeld and R. Karri, "Attacks and defenses for JTAG," Design & Test of Computers, IEEE, vol. 27, no. 1, pp. 36–47, Jan. 2010.
[14] L. Sourgen, "US patent 5264742, security locks for integrated circuit," 1993.
[15] R. Buskey and B. Frosik, "Protected JTAG," in Parallel Processing Workshops, 2006. ICPP 2006 Workshops. 2006 International Conference on, 2006, pp. 8 pp.–414.
[16] C. Clark and M. Ricchetti, "A code-less BIST processor for embedded test and in-system configuration of boards and systems," in Test Conference, 2004. Proceedings. ITC 2004. International, Oct. 2004, pp. 857–866.
[17] C. J. Clark. Business considerations for systems with RAM-based FPGA configuration. [Online]. Available: http://www.intellitech.com/pdf/FPGA-security-FPGA-bitstream-Built-in-Test.pdf
[18] V. Iyengar, K. Chakrabarty, and E. Marinissen, "Efficient test access mechanism optimization for system-on-chip," Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 22, no. 5, pp. 635–643, May 2003.
[19] K. Rosenfeld and R. Karri, "Security-aware SoC test access mechanisms," in Proceedings of the 2011 IEEE VLSI Test Symposium, 2011.
[20] K. Koscher, A. Czeskis, F. Roesner, S. Patel, T. Kohno, S. Checkoway, D. McCoy, B. Kantor, D. Anderson, H. Shacham, and S. Savage, "Experimental security analysis of a modern automobile," in Security and Privacy (SP), 2010 IEEE Symposium on, May 2010, pp. 447–462.
[21] D. Halperin, T. Heydt-Benjamin, B. Ransford, S. Clark, B. Defend, W. Morgan, K. Fu, T. Kohno, and W. Maisel, "Pacemakers and implantable cardiac defibrillators: Software radio attacks and zero-power defenses," in Security and Privacy, 2008. SP 2008. IEEE Symposium on, May 2008, pp. 129–142.
[22] IEEE Std 1532-2002, In-System Configuration of Programmable Devices.
[23] satcardsrus.com. Secure loading advanced blocker 3M's for ROM images. [Online]. Available: http://www.satcardsrus.com/dish net%203m.htm
[24] The Dishnewbies Team. JTAG guide. [Online]. Available: http://dishnewbies.com/jtag.shtml
[25] B. Yang, R. Karri, and K. Wu, "Secure scan: a design-for-test architecture for crypto chips," in Proceedings of the IEEE/ACM Design Automation Conference, 2005, pp. 135–140.
[26] F. Novak and A. Biasizzo, "Security extension for IEEE Std 1149.1," Journal of Electronic Testing, vol. 22, no. 3, pp. 301–303, 2006.
[27] G. Suh and S. Devadas, "Physical unclonable functions for device authentication and secret key generation," in Proceedings of IEEE/ACM Design Automation Conference, June 2007, pp. 9–14.
[28] R. Pappu, "Physical one-way functions," Massachusetts Institute of Technology, Tech. Rep., 2001.
[29] B. Gassend, D. Clarke, M. V. Dijk, and S. Devadas, "Silicon physical random functions," in Proceedings of the ACM Computer and Communication Security Conference, 2002, pp. 148–160.
[30] M. Majzoobi, F. Koushanfar, and M. Potkonjak, "Testing techniques for hardware security," in Proceedings of IEEE International Test Conference, Oct. 2008, pp. 1–10.
[31] M. J. B. Robshaw, "Stream ciphers," RSA Laboratories, Tech. Rep. TR-701, 1995.
[32] C. D. Canniere and B. Preneel, "Trivium specifications," ECRYPT Stream Cipher Project, 2006.
[33] R. Canetti, "HMAC: keyed-hashing for message authentication," RFC 2104, 1997.
[34] B. Arazi, "Message authentication in computationally constrained environments," IEEE Transactions on Mobile Computing, vol. 8, no. 7, pp. 968–974, July 2009.
[35] X. Lai, R. Rueppel, and J. Woollven, "A fast cryptographic check-sum algorithm based on stream ciphers," in Advances in Cryptology-AusCrypt. Springer-Verlag, 1992, pp. 339–348.
[36] Xilinx. Spartan-3E FPGA family: Introduction and ordering information. [Online].
Available: http://www.xilinx.com/support/documentation/data sheets/ds312.pdf
[37] Fwaggle. Howto: JTAG interface on a Dish 3700 receiver. [Online]. Available: http://www.hungryhacker.com/articles/misc/dish3700 jtag
[38] A. Huang, "Keeping secrets in hardware: The Microsoft XBox case study," in Proceedings of Workshop on Cryptographic Hardware and Embedded Systems, 2002, pp. 213–227.
[39] Y. Jin and Y. Makris, "Hardware trojan detection using path delay fingerprint," in Proceedings of IEEE International Workshop on Hardware-Oriented Security and Trust, June 2008, pp. 51–57.
[40] D. Agrawal, S. Baktir, D. Karakoyunlu, P. Rohatgi, and B. Sunar, "Trojan detection using IC fingerprinting," in Proceedings of IEEE Symposium on Security and Privacy, May 2007, pp. 296–310.
[41] D. Dean and R. Collins, "Trust, a proposed plan for trusted integrated circuits," http://www.dtic.mil/cgibin/GetTRDoc?AD=ADA456459.
[42] S. T. King, J. Tucek, A. Cozzie, C. Grier, W. Jiang, and Y. Zhou, "Designing and implementing malicious hardware," USENIX Workshop on Large-Scale Exploits and Emergent Threats, 2008.
[43] X. Wang, M. Tehranipoor, and J. Plusquellic, "Detecting malicious inclusions in secure hardware: Challenges and solutions," IEEE International Workshop on Hardware-Oriented Security and Trust, pp. 15–19, June 2008.
[44] L.-W. Kim and J. D. Villasenor, "A system-on-chip bus architecture for thwarting integrated circuit trojan horses," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. PP, no. 99, pp. 1–5, 2010.
[45] W. Diffie and M. Hellman, "New directions in cryptography," IEEE Transactions on Information Theory, vol. 22, no. 6, pp. 644–654, Nov. 1976.
[46] J. Holleman, B. Otis, S. Bridges, A. Mitros, and C. Diorio, "A 2.92 microwatt hardware random number generator," Proceedings of the 32nd European Solid-State Circuits Conference, pp. 134–137, Sep. 2006.
[47] K.
Chakrabarty, "Optimal test access architectures for system-on-a-chip," ACM Transactions on Design Automation of Electronic Systems, vol. 6, pp. 26–49, 2001.
[48] B. Schoeneman and S. Blankenau, "Secure sensor platform (SSP) for materials' sealing and monitoring applications," Proceedings of International Carnahan Conference on Security Technology, pp. 29–32, Oct. 2005.
[49] F. Bagci, T. Ungerer, and N. Bagherzadeh, "SecSens - Security architecture for wireless sensor networks," Proceedings of International Conference on Sensor Technologies and Applications, pp. 449–454, June 2009.
[50] G. Suh and S. Devadas, "Physical unclonable functions for device authentication and secret key generation," Proceedings of ACM/IEEE Design Automation Conference, pp. 9–14, June 2007.
[51] D. Roy, J. Klootwijk, N. Verhaegh, H. Roosen, and R. Wolters, "Comb capacitor structures for on-chip physical uncloneable function," IEEE Transactions on Semiconductor Manufacturing, vol. 22, no. 1, pp. 96–102, Feb. 2009.
[52] V. Vivekraja and L. Nazhandali, "Circuit-level techniques for reliable physically uncloneable functions," IEEE International Workshop on Hardware-Oriented Security and Trust, pp. 30–35, July 2009.
[53] E. Diehl and T. Furon, "Copy watermark: closing the analog hole," Proceedings of IEEE International Conference on Consumer Electronics, pp. 52–53, June 2003.
[54] J. Lukas, J. Fridrich, and M. Goljan, "Digital camera identification from sensor pattern noise," IEEE Transactions on Information Forensics and Security, vol. 1, no. 2, pp. 205–214, June 2006.
[55] B. Gassend, D. Clarke, M. van Dijk, and S. Devadas, "Controlled physical random functions," Proceedings of Computer Security Applications Conference, pp. 149–160, 2002.
[56] D. Agrawal, S. Baktir, D. Karakoyunlu, P. Rohatgi, and B. Sunar, "Trojan detection using IC fingerprinting," Proceedings of IEEE Symposium on Security and Privacy, pp. 296–310, May 2007.
[57] Bruce Schneier. Attack trees. [Online].
Available: http://www.schneier.com/paper-attacktrees-ddjft.html
[58] G. Qu and C.-E. Yin, "Temperature-aware cooperative ring oscillator PUF," Proceedings of IEEE International Workshop on Hardware-Oriented Security and Trust, pp. 36–42, July 2009.
[59] M.-D. Yu and S. Devadas, "Secure and robust error correction for physical unclonable functions," IEEE Design & Test of Computers, vol. 27, no. 1, pp. 48–65, Jan.-Feb. 2010.
[60] T. Heyman, R. Scandariato, C. Huygens, and W. Joosen, "Using security patterns to combine security metrics," in Availability, Reliability and Security, 2008. ARES 08. Third International Conference on, Mar. 2008, pp. 1156–1163.