Security and Testing
by
Kurt Rosenfeld
M.S., City College of New York, 2004
B.S., City College of New York, 2002
A thesis submitted to the
Faculty of the Graduate School of the
Polytechnic Institute of NYU in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Department of Computer Science and Engineering
2012
Microfilm or copies of this dissertation may be obtained from:
UMI Dissertation Publishing
ProQuest CSA
789 E. Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106-1346
Vita
Kurt Rosenfeld was born in 1972, in Palo Alto, California. He received the B.S.
degree in Electrical Engineering from The City College of New York in 2002 and the
M.S. degree in Electrical Engineering from The City College of New York in 2004.
From 2005 to the present, he has been in the Information Systems and Internet
Security Lab at Polytechnic Institute of NYU, studying with Professors Ramesh
Karri and Nasir Memon.
His research interests include hardware security, distributed system security, and
testing. He is an engineer at Google, Inc.
Acknowledgements
I would like to express my gratitude for all of the help I got from my family,
advisors, friends, colleagues, and my employer. Without your support I would have
failed. Your patience, generosity, and encouragement have been my extremely good
fortune.
This work was partly supported by NSF award numbers 0831349 and 0621856.
Abstract
This dissertation presents research on improving the security of computing platforms at a physical and logical level. The main contributions are to improve the
security of:
1. test data communication between chips
2. test data communication within chips
3. communication between sensors and chips
4. verification of chip authenticity
We investigated the security of IEEE 1149.1 JTAG and studied existing attacks.
We invented two new attacks and experimentally verified them. After generalizing
the threats, we designed and implemented a security-enhanced backwards compatible
version of JTAG.
We identified security vulnerabilities that stem from the use of shared on-chip test data wiring in system-on-chip (SoC) designs, particularly where trusted
and untrusted cores coexist. We developed an efficient architecture and protocol
that mitigates test-related risks.
We extended the concept of physical unclonable functions to encompass sensors. The result is a sensor whose measurement can be verified by the logic inside
the trust perimeter.
We propose countermeasures to the growing problem of counterfeit components.
We developed an inexpensive end-to-end scheme for ensuring the authenticity of
parts received by a system integrator.
The four platform security enhancements we developed complement each other.
They solve non-overlapping problems that exist today and they can be applied individually or together. Applied together, they significantly raise the bar for platform
security.
Contents

1 Introduction   1
1.1 The Core Root of Trust   1
1.2 Trustworthy Hardware   1
1.3 Testing   2

2 Security of Digital System Testing   3
2.1 Introduction   3
2.1.1 Development of Test Interfaces   3
2.1.2 Example: Testing a Two-bit State Machine   4
2.1.3 Fault Testing versus Trojan Detection   10
2.1.4 VLSI Testing: Goals and Metrics   11
2.1.5 Conflict Between Testability and Security   12
2.2 Scan-based Testing   12
2.2.1 Scan-based Attacks   13
2.2.2 Countermeasures for Scan Attacks   14
2.3 BIST   16
2.4 JTAG   18
2.4.1 JTAG hacks   22
2.4.2 JTAG Defenses   23
2.5 SoC test Infrastructure   26
2.5.1 SoC Test Hacks   27
2.5.2 Defenses for SoC Test Mechanisms   28
2.6 Emerging Areas of Test Security   30
2.6.1 OBD-II for Automobile   30
2.6.2 Medical Implant Interface Security   31
2.7 Recapitulation and Projection   32

3 JTAG Security   33
3.1 Introduction   33
3.2 JTAG Overview   35
3.2.1 BYPASS Mode   35
3.2.2 EXTEST Mode   35
3.2.3 INTEST Mode   35
3.3 JTAG Attacks   36
3.3.1 Sniff Secret Data   38
3.3.2 Read Out Secret   39
3.3.3 Obtain Test Vectors and Responses   40
3.3.4 Modify State of Authentic Part   41
3.3.5 Return False Responses to Test   41
3.3.6 Forcing TMS and TCK   42
3.4 Prior Work on Attacks and Defenses   46
3.5 Defenses for JTAG   46
3.5.1 Secure JTAG Communication Protocol   48
3.5.2 Level 0 Protocol   49
3.5.3 Level 1 Protocol   50
3.5.4 Level 2 Protocol   50
3.5.5 Level 3 Protocol   51
3.6 Costs and Benefits Associated with JTAG Defenses   52
3.6.1 Die Area Overhead   52
3.6.2 Test Time Overhead   53
3.6.3 Operational Costs   54
3.6.4 Impact of Defenses on Known Threats   54
3.7 Conclusion   55

4 Testing Cores in a System on Chip   57
4.1 Introduction   57
4.1.1 Assumptions   58
4.1.2 Constraints   59
4.1.3 Core Test Wrappers   60
4.2 Prior Work   60
4.3 Proposed Approach   61
4.3.1 Security-enhanced Test Wrapper   61
4.3.2 A Security Overwrapper for Prewrapped Cores   66
4.3.3 Interoperability with Noncompliant Cores   66
4.4 Costs   66
4.4.1 Die Area Cost   66
4.4.2 Test Time   68
4.4.3 Effort for SoC Integrator   69
4.5 Conclusion and Future Work   69

5 Integrity and Authenticity of Sensors   71
5.1 Introduction   71
5.1.1 Related Work   72
5.1.2 Our Contribution   73
5.1.3 Security Properties   74
5.2 Candidate Sensor PUF   74
5.2.1 Structure   75
5.2.2 Protocol   79
5.3 Electrical Analysis and Experimental Results   80
5.3.1 Assumptions   80
5.3.2 Distribution of Cut Points   80
5.3.3 Hamming Distance   84
5.3.4 Verification of Offset Generator   85
5.4 Security Context of Sensor PUFs   87
5.4.1 Substitution   87
5.4.2 Tampering   87
5.4.3 Manufacturer Resistance   88
5.4.4 Sensor Decoupling   88
5.5 Security Analysis   89
5.5.1 Attack Model   89
5.5.2 Attack Trees   90
5.6 Future Work in Sensor PUFs   93

6 Future Work in Hardware Security   95

7 Publications   96

Bibliography   98
Figures

2.1 A two-bit counter with synchronous reset has four states. From each state, there are two possible next states. This realization provides an output signal that is asserted when the counter is in state S03.   5
2.2 The machine behaves similarly to the machine shown in Figure 2.1, but deviates for certain rare inputs. Starting in initial state S00, if RST is given the sequence 0,1,0,0,1,0,0,0 the machine enters state S23, at which point the behavior of the system deviates from that shown in Figure 2.1. S23 is a terminal state. The only way to exit S23 is to reinitialize the system (e.g., cycle the power).   7
2.3 A cascadable section of a synchronous binary counter with synchronous reset.   10
2.4 The simplest scan flip-flop cell is simply composed of a multiplexer and a regular D flip-flop. The Q output of one scan cell can be connected to the TEST INPUT of another scan cell, enabling a chain configuration.   13
2.5 Secure Scan state diagram. The only way to get from secure mode, where the mission key is loaded, to insecure mode, where the chip is testable, is to go through a power cycle reset, which wipes all volatile state variables.   15
2.6 Secure Scan architecture. The mirror key register (MKR) is loaded only when Load Key is active, which is controlled by the state machine shown in Figure 2.5.   16
2.7 Bed of nails test fixture. Automated test equipment (ATE) generates stimulus signals and measures responses. The ATE is connected to the test fixture, which contains one nail per test channel. Each nail is spring-loaded so it maintains a controlled pressure when contacting the test points on the printed circuit board being tested.   18
2.8 The JTAG state machine. There are 16 states. The TMS signal determines the next state. The SHIFT DR state is used for applying stimuli and collecting responses. From any state, the TEST LOGIC RESET state can be reached by holding TMS high for five clock cycles.   20
2.9 A typical JTAG system. TMS, TCK, and TRST are bussed to all of the devices. TDO of each component is connected to TDI of the next component, thereby forming a daisy-chain topology.   21
2.10 The essential components of a basic JTAG implementation include a test access port state machine, an instruction register, one or more data registers, and an output multiplexer. Each chain of scan flip-flop cells (internal or boundary) appears to JTAG as a data register that can be selected with the appropriate instruction.   22
2.11 A chain of scan cells is used for distributing keys to each of the cores. The scan cells are configured not to expose key bits at their outputs while they are being shifted.   29
3.1 The typical deployment of JTAG is a chain of several devices on a printed circuit board. Each device may come from a different vendor. The test mode select (TMS), test clock (TCK), and test reset (TRST) signals are typically common to all chips. The test data in (TDI) signal and test data out (TDO) signals loop through the chips. The path returns to the source, which is usually either a PC or an embedded microcontroller, functionally called a "system controller."   34
3.2 Conceptual security model: A set of attackers A1, A2, and A3 have a set of goals G1 through G6. Each attacker has a set of attack capabilities, some or all of which are masked by defenses that are in place. There is a set of attacks, K1 through K4, each of which requires a certain set of unmasked attack capabilities. Each attack can be used to reach some set of goals. This example shows that attacker A1 can achieve goals G2 and G4 since it has capabilities P2 and P4, which are the requirements for attack K2. Attackers A2 and A3 do not have sufficient unmasked capabilities to execute any of the attacks.   36
3.3 The attacker obtains secret data by sniffing the JTAG data path.   38
3.4 The attacker obtains an embedded secret by forcing test vectors onto the JTAG lines.   39
3.5 The attacker obtains a copy of the test vectors and normal responses of a chip in the JTAG chain. This can be a passive attack.   40
3.6 The attacker can intercept test vectors that are sent to another chip, and can send false responses to the tester.   41
3.7 A Philips 8052 (the system controller) was programmed to keep one of its pins low. The I-V curve for this output driver was extracted using a pulsed I-V measurement. A Xilinx Spartan 3e was programmed to keep one of its pins high. This pin's I-V curve was also extracted. The result is shown. The solid line is the FPGA; the dashed line is the microcontroller. The intersection is at 2.1V, exceeding VIH for most 3.3V logic.   44
3.8 A length of PCB wiring connects the hijacker to the JTAG master. This allows the attacker to inject short pulses onto the wiring without being hindered by the master.   45
3.9 We define four levels of assurance. Levels correspond to the set of assurances that are provided.   49
3.10 Area overhead is shown for the protection levels 1 through 3, from bottom to top. The cost of the security enhancements is independent of design complexity, so the percentage overhead is lower for more complex designs. The four levels of assurance provide progressively higher levels of assurance. An indication of the area cost of each protection level is given by the number of additional FPGA slices used by the enhanced JTAG circuitry. These figures are for a Spartan 3e. There are no fuses in the FPGA so fuses are modeled as hard-coded bit vectors. Overhead in an ASIC will be less.   53
4.1 Data sent from the test controller to core 2 passes through core 1, giving core 1 an opportunity to intercept it. Likewise, data passing from core 2 back to the tester passes through core 3, giving core 3 an opportunity to intercept it.   59
4.2 A chain of scan cells is used for distributing keys to each of the cores. The scan cells are configured not to expose key bits at their outputs while they are being shifted.   62
4.3 The key setup scan chain conveys data from the test controller to the core wrapper key registers without allowing it to be sniffed or modified by other cores. Other than the basic distributed shift register functionality, the only extra functionality we require of our scan cell is an output inhibit input (O INH) to ensure that the key is not leaked during shifting. After the tester has the key bits shifted to their intended location, the tester deasserts the output inhibit signal so that the cores receive their key data.   64
4.4 In a typical security-enhanced wrapped core, a word of compressed test data arrives via the parallel data input, is decrypted instantaneously, decompressed and applied to the inputs of the core's scan chains. The outputs are compressed, encrypted, and sent out. Standard wrapper components are not shown, such as the parallel bypass.   65
5.1 The naïve secure sensor architecture does not bind the sensing with the cryptography, allowing the analog link between the sensing element and the crypto processor to be easily attacked.   72
5.2 A conventional silicon PUF has a binary input and a binary output. The sensor PUF has a binary input, physical quantity being sensed, and a binary output.   73
5.3 The analog portion of the light level sensor PUF includes the coating, the photodiode groups, the switches, the summing junctions, and the analog comparator. The challenge applied to the sensor PUF determines the keystream input to the control circuit, which controls the random selection of left gate signals GL.i and the right gate signals GR.i, which determine the set of sensors that are included in the summations. The left and right sums are compared, producing one raw bit.   76
5.4 a: The offset generator produces a DC voltage that is determined by the optical transmittance of the coating at the sites of photodiodes PD1 and PD2. b: The slope generator produces a voltage proportional to the light input at photodiode P3.   77
5.5 In all four subfigures, the sensor input level is on the x-axis and the electrical response is on the y-axis. Subfigure (a) shows eight photodiode group response lines generated by simulation of the candidate light sensor PUF. The bold line is the sum of the eight lines. Subfigures (b), (c), and (d) show pairs of response lines that occur for different values of the left and right gate signals. Assuming a sensor input value of 10 and assuming that solid line > dashed line is interpreted as a "1", evaluating the comparisons for the line pairs shown in (b), (c), and (d) gives the raw bit sequence "0", "1", "0". In our simulations, the raw bit sequences are 256 bits long.   78
5.6 The probability density function of offset current ratios observed in simulation.   82
5.7 The density function of the offset signal of each individual photodiode group.   82
5.8 The green trace shows the probability density function of the cut points in the sensor input domain, as observed in simulation. The red trace shows the Cauchy density function for χ = 0 and γ = 20.   83
5.9 Hamming distance for five statistically independent instances of the candidate sensor PUF.   84
5.10 The offset generator circuit shown in Figure 5.2.1 was constructed and tested using light from an LED driven by a variable current source. The offset voltage is plotted across the range of currents for three different relative transmittances. We see ±2.5% variation over the range.   86
5.11 SPICE simulation of the offset generator output voltage across a range of light intensity values.   86
5.12 Attack tree for replay.   91
5.13 Attack tree for cloning.   92
5.14 Attack tree for inducing errors in measurement.   93
Chapter 1
Introduction
1.1
The Core Root of Trust
Trust in information systems is built from the ground up. Just as human knowledge is built on a set
of core beliefs, information system security is based on a set of assumptions. These assumptions form the
root of trust for the system. For example, common assumptions in microprocessor-based systems are that
data written to memory will be read back correctly and that the computer’s internal state is secret unless
an I/O operation is made that explicitly writes the data. The root of trust can be different for different
systems, but if a system is to provide any kind of assurance, there must be a root of trust.
Untrustworthy components can be useful parts of a trustworthy system as long as they are outside the
core root of trust, explicitly untrusted, and appropriate design measures are taken. For example, consider a
computer where the secrecy of the data in main memory cannot be ensured. The architecture can encrypt
data when writing it to main memory and decrypt it when reading it, thereby eliminating exposure of secret
data. Here, although the main memory is untrusted, the encryption, decryption, and key management is
trusted. This is a typical example of trust relocation. The architect has freedom to relocate the root of trust
but cannot eliminate it.
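As a rough sketch of the trust-relocation idea (an illustrative toy, not a real memory-encryption architecture; the class name and the hash-based keystream are invented for the example), the trusted side holds the key and transforms every access so that the untrusted memory only ever stores ciphertext:

```python
# Toy illustration of trust relocation (not a real memory-encryption design):
# the untrusted main memory stores only ciphertext, while the trusted side of
# the boundary holds the key and encrypts/decrypts on every access.
import hashlib

class EncryptedMemory:
    def __init__(self, key: bytes):
        self.key = key          # held inside the trust perimeter
        self.ram = {}           # untrusted storage sees ciphertext only

    def _pad(self, addr: int, length: int) -> bytes:
        # Address-dependent keystream from a hash (illustrative only, not secure).
        return hashlib.sha256(self.key + addr.to_bytes(8, "big")).digest()[:length]

    def write(self, addr: int, data: bytes):
        self.ram[addr] = bytes(d ^ k for d, k in zip(data, self._pad(addr, len(data))))

    def read(self, addr: int) -> bytes:
        ct = self.ram[addr]
        return bytes(c ^ k for c, k in zip(ct, self._pad(addr, len(ct))))

mem = EncryptedMemory(key=b"secret key material")
mem.write(0x1000, b"plaintext")
assert mem.read(0x1000) == b"plaintext"      # trusted path recovers the data
assert mem.ram[0x1000] != b"plaintext"       # untrusted RAM never holds the plaintext
```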
1.2
Trustworthy Hardware
Traditionally, in most information systems, the root of trust has been the behavior of the hardware.
This trust has been informally justified by the relative difficulty for an attacker to change the behavior of
the hardware. Although it remains to be seen whether trustworthy hardware is actually a requirement for
trustworthy computation, it is certainly a convenient assumption. The field of hardware security aims to
make the assumption correct.
When we say that hardware is secure, we mean that it provides functionality that can be relied on
despite physical threats. For example, a soda vending machine is secure hardware. It is intended to behave
correctly in a moderately hostile physical environment. In contrast, personal computers generally are not
secure hardware. In a hostile physical environment, most personal computers yield total control to an
attacker. Once a personal computer enters the attacker’s physical control, the machine can no longer be
trusted by its rightful owner. Trusted platform modules are only a partial remedy. In their most common
use case, they are islands of trust in a sea of untrustworthy components. Personal computers with trusted
platform modules offer little security in physically hostile environments.
1.3
Testing
Testing and security are closely related. Both aim to provide some kind of assurance to humans
about what we can expect from a system. A battery tester tells us how much more life we can expect from
a battery. A security evaluation of a system tells us what threats the system can be expected to resist.
Testing, as opposed to measurement, is rarely passive. Testing typically involves application of a stimulus
and observation of a response. Since this generalization encapsulates all functional interactions between
modules in a system, the stimulus-response frameworks that are built for testing are often used for other
maintenance tasks, like configuring and programming the system.
Chapter 2
Security of Digital System Testing
2.1
Introduction
Test interfaces are present in nearly all digital hardware. In many cases, the security of the system
depends on the security of the test interfaces. Systems have been hacked in the field using test interfaces as
an avenue for attack. Researchers in industry and academia have developed defenses over the past twenty
years. A diligent designer can significantly reduce the chance of system exploitation by understanding known
threats and applying known defenses.
2.1.1
Development of Test Interfaces
Test interfaces have been part of man-made systems for at least 100 years. They address a need for
applying a stimulus and/or making an observation via a path other than the primary functional path. For
example, large tanks holding liquids or gases usually have inspection ports. These ports allow the internal
condition of the tank to be visually evaluated to avoid unexpected failure due to corrosion. Otherwise, it
would be difficult to predict failure. Brake systems on cars often have inspection holes in the calipers. This
allows the condition of the brake pads to be assessed without disassembling the brakes. More than just
the functional question of whether the brakes work, the inspection hole allows the mechanic to answer the
deeper question of how much more life is left in the brake pads. In areas where operational reliability and
efficiency are valued, features are added to products to make them testable, to let their maintainers probe
their internal condition.
As electronic devices grew more complex in the mid 20th century, it became difficult to tune them or
diagnose problems with only an input-output view of the system. Take, for example, a 1960’s radio receiver.
These receivers contain several filters and mixers cascaded to form the desired frequency response. There
are dozens of adjustments, many of which interact, and all of which affect the output. Optimal receiver
performance is achieved for a specific vector of settings. Applying a signal to the input while observing the
output, it is almost impossible for the technician to infer which adjustment to change to bring the receiver
closer to correct alignment. To make their equipment maintainable, manufacturers provided test points
in their circuits, where signals could be measured or injected. This allowed the circuit to be structurally
decomposed to make maintenance straightforward. Each section can be independently aligned, a process
involving only a small number of adjustments.
When electronic computers were first developed in the 1940’s and 50’s, it was customary to write “test”
or “checkout” programs that could be run on the system to verify correct functionality of the hardware. Test
programs were designed so that if a test failed, it would provide the technician with an indication of where it
failed, speeding diagnosis and repair. The method of running programs on the computer to test the computer
is really just functional testing, and since there isn’t enough time for the tests to cover all possible states
and transitions of the hardware, this testing paradigm can never provide rigorous assurance of the hardware
even if all of the tests pass.
As the complexity of computers grew in the 1960’s, designers sought stronger assurance from testing,
and faster fault isolation. From an operational standpoint in the field, designers wanted to minimize the
mean time between when a fault is detected and when the system is back up. As computers began to
play crucial roles in real-time operations, high availability became a goal, in addition to the traditional
performance goals. All of these factors led major computer developers such as IBM to develop techniques
for testing the structural blocks independently.
2.1.2
Example: Testing a Two-bit State Machine
As an illustration of a digital circuit testing problem, consider testing a 2-bit synchronous circuit with
the state diagram shown in Figure 2.1. The circuit has two inputs: clock and reset. It has one output
signal, which is high if and only if the state is S3. On every rising edge of the clock signal, the next state is
determined as follows:
• If the reset signal is asserted, the next state is S0.
• Otherwise, the next state is oldstate+1 mod 4.
Figure 2.1: A two-bit counter with synchronous reset has four states. From each state, there are two possible
next states. This realization provides an output signal that is asserted when the counter is in state S03.
The process of testing a practical state machine implementation is affected by three main considerations:
(1) What are we trying to determine?
(2) What can we assume about the device under test?
(3) How is our testing constrained?
2.1.2.1
What are we trying to determine?
For the first question, one possible answer is that we are trying to determine whether the device
under test is equivalent to a reference implementation or model. Equivalence, in this case, means that for all
possible input sequences, the outputs are the same. Another possibility is that we aim to determine whether
the device is equivalent to the reference implementation for some restricted set of inputs. Yet another
possibility is that we are checking whether an invariant rule is satisfied for some set of input sequences.
2.1.2.2
What can we assume?
Assumption 1: Number of States
The second question we must ask when testing a circuit is what we can assume. A state machine is composed
of a set of states, a set of edges (transitions), a set of inputs, a set of outputs, and an initial state. It is
profoundly helpful to know how many states the system has and how many flip-flops are in the circuit. If we
know how many states it has, we can, for example, apply the pumping lemma for regular languages [3], to
place limits on the state machine’s behavior. The pumping lemma states that if the number of states is
finite, there must be an infinite set of input sequences that result in the same final state. This has practical
ramifications for testing. If the number of states is bounded, it is possible, at least in theory, to completely
test the circuit using a finite set of test vectors.
Under adverse security circumstances, we are not able to safely assume that the device under test has
the number of states it is specified to have. A trojan horse might be inserted into the circuit during design
or fabrication. A typical trojan horse waits for a trigger, which is a special pattern or a sequence. When the
trigger occurs in, for example, the input data stream, the trojan activates its payload. A payload carries out
a malicious action which can be as complex as executing an embedded program, or as simple as halting or
resetting the system. Trojans are designed to pass normal testing, so they typically contain all of the benign
specified logic, plus extra logic for trigger detection and payload execution. Consequently, from a testing
standpoint, the trojan is more an extra feature than a defect.
Testing for the presence of extra state variables is exceedingly difficult. Consider testing a system that
is intended to implement the four-state state machine shown in Figure 2.1. There are two inputs: clock and
reset. There is one output. The output is “1” when the state is S3. The system could faithfully implement
the intended state machine (Figure 2.1), or it might, for example, implement the state machine shown in
Figure 2.2, where the output is “1” when the state is S03, S13, S23, or S33.
Figure 2.2: The machine behaves similarly to the machine shown in Figure 2.1, but deviates for certain rare
inputs. Starting in initial state S00, if RST is given the sequence 0,1,0,0,1,0,0,0 the machine enters state
S23, at which point the behavior of the system deviates from that shown in Figure 2.1. S23 is a terminal
state. The only way to exit S23 is to reinitialize the system (e.g., cycle the power).
For normal inputs, the machine shown in Figure 2.2 might be indistinguishable from the machine in
Figure 2.1, but it has more states than the intended design. Certain rare sequences of inputs cause the two machines
to differ in their outputs. Black-box testing would be much more likely to conclude that the state machines
are the same than to find the difference. The rare sequence that causes them to differ can be the trigger for
an embedded trojan, and the way in which they differ can be the payload of that trojan. For example,
if we assume that the machine has four states, as intended, then we would expect that the test sequence shown
in Table 2.1 would thoroughly test the circuit.
RST   Current Output   Transition
1     X                X → S0
0     0                S0 → S1
0     0                S1 → S2
0     0                S2 → S3
0     1                S3 → S0
1     0                S0 → S0
0     0                S0 → S1
0     0                S1 → S2
1     0                S2 → S0
0     0                S0 → S1
1     0                S1 → S0
0     0                S0 → S1
0     1                S1 → S2
0     0                S2 → S3
1     1                S3 → S0
Table 2.1: Test routine for a two-bit counter. All edges of the specified state transition graph are tested.
Since we cannot force the counter directly into an arbitrary state, we must sequentially visit the states
and test each of the edges while observing the functional output.
However, consider the effect of that test sequence on the machine shown in Figure 2.2. The machine
goes through the following sequence of states: S00, S01, S02, S03, S00, S00, S01, S02, S00, S01, S10, S11,
S12, S13, S10. At no point during the test sequence does its externally observable behavior differ from the
intended behavior, that shown in Figure 2.1, although the final state is not the initial state. In the case
of this example, running the test sequence repeatedly will not uncover any differences between the actual
state machine and the specified one. Although the Figure 2.1 system and Figure 2.2 system are the same for
the Table 2.1 test sequence, they behave quite differently for other test sequences, specifically, any sequence
that puts the Figure 2.2 system into the S23 state, where it locks up. In summary, when testing a state
machine, we make an assumption about the number of states. If our assumption is wrong, we are likely to
make invalid inferences about the system under test.
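The point can be made concrete with a small simulation. The sketch below assumes the output is asserted in state S3 and uses a hypothetical trojan variant (not the exact machine of Figure 2.2) that tracks the counter faithfully until a particular RST history appears and then locks up; the Table 2.1 sequence cannot separate the two machines, while the trigger sequence can:

```python
# Sketch of the hidden-state problem, assuming output = 1 in state S3.
# The "trojan" machine here is a hypothetical stand-in, not the exact Figure 2.2
# design: it matches the 2-bit counter until a specific RST sequence is seen.

def intended(state, rst):
    return 0 if rst else (state + 1) % 4

def trojan(state, rst):
    count, history = state                    # hidden history of recent RST inputs
    if history == (0, 1, 0, 0, 1, 0, 0, 0):
        return state                          # terminal (locked-up) state
    return (intended(count, rst), (history + (rst,))[-8:])

def run(step, state, seq):
    outs = []
    for rst in seq:
        state = step(state, rst)
        count = state if isinstance(state, int) else state[0]
        outs.append(1 if count == 3 else 0)
    return outs

table_2_1 = [1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1]   # RST column of Table 2.1
trigger   = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]            # trigger plus extra clocks

# The structural test sequence cannot tell the two machines apart...
assert run(intended, 0, table_2_1) == run(trojan, (0, ()), table_2_1)
# ...but after the trigger the trojan stops counting while the real counter rolls over.
assert run(intended, 0, trigger) != run(trojan, (0, ()), trigger)
```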
Assumption 2: Determinism or Randomness
A pivotal assumption is that the device under test is deterministic. If it is not, then we must characterize
its behavior statistically instead of logically. That completely changes the nature of the testing procedure.
One example is testing a pseudorandom bit sequence generator. The output is expected to satisfy certain
statistical requirements. There are standard requirements, such as Golomb’s randomness postulates. A single
test is not sufficient to establish the randomness of a sequence. Standard suites of tests have been developed
for the purpose of comparatively evaluating cryptographic systems [4]. Related to the testing of pseudorandom
sequence generators is the testing of “true” randomness sources, which derive their randomness from a
physical source. Typically a diode is used as a noise source, which is then digitized to produce random bits.
Testing such devices for their fitness in security-critical applications involves several special criteria such as
their immunity to external factors influencing their output.
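As a flavor of what such suites contain, the sketch below implements one of the simplest tests, a monobit (frequency) test; the generators being compared are illustrative, and a real evaluation would run many complementary tests:

```python
# A minimal sketch of one statistical test of the kind found in standard suites:
# the monobit (frequency) test. A generator whose ones/zeros balance is far from
# 1/2 fails; passing says nothing by itself, which is why suites combine many tests.
import math, random

def monobit_p_value(bits):
    s = sum(1 if b else -1 for b in bits)               # +1 for each 1, -1 for each 0
    return math.erfc(abs(s) / math.sqrt(2 * len(bits)))

good = [random.getrandbits(1) for _ in range(100000)]    # stand-in for a generator under test
biased = [1 if random.random() < 0.52 else 0 for _ in range(100000)]

print(monobit_p_value(good))     # typically well above a 0.01 rejection threshold
print(monobit_p_value(biased))   # essentially zero: the 52/48 bias is detected
```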
How is our testing constrained?
Systems that only allow interaction in their normal functional use pattern demand black-box testing. This
constraint can appear for a variety of reasons, and has wide-ranging implications. One reason for the absence
of test features is their perceived cost, either in engineering effort or in production cost. The implications of
being forced to resort to black-box testing are an exponential increase in the time required to test a system,
and decreased confidence that it is thoroughly tested.
When we are not constrained to black-box testing, the practical approach to testing, for example, a
counter, is to structurally decompose it and test the components and the interconnections, and then argue
that if the components are good, and the interconnections are good, then the system is good. When we
decompose the circuit, we break it into small islands of logic that are easily testable, avoiding the exponential
explosion of test complexity seen in the previous paragraph. A 2-bit counter can be implemented as two
1-bit sections cascaded. A 128-bit counter can be implemented as 128 cascaded sections, each containing a
flip flop and three logic gates, as shown in Figure 2.3.
Figure 2.3: A cascadable section of a synchronous binary counter with synchronous reset.
If suitable test structures are provided, the 128 sections can be tested independently. The number of
tests necessary for one isolated section is
#tests = 2^F · 2^I
where F is the number of flip-flops in each section and I is the number of inputs to each section. We
have F = 1 and I = 2, so eight tests are required per section of the counter. If the stages of a B-bit counter are
tested sequentially, the number of tests is
#tests = 2^F · 2^I · B = 8B
Without structural decomposition, we have to visit each state and test the circuit’s response to the
RESET=0 input and the RESET=1 input. This requires two tests per state, so
#tests = 2 · 2^B = 2^(B+1)
Structural testing is not just somewhat more efficient than black-box testing. It is profoundly more
efficient. The number of tests required for structural testing is of linear complexity, O(B), while black-box
testing is of exponential complexity, O(2^B). Similar general results apply to circuits other than counters.
The total complexity of the pieces of a logic circuit is almost always less than the complexity of the circuit
as a whole. Consequently black-box testing of whole circuits is avoided for all but the simplest systems.
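The arithmetic is easy to check numerically; the snippet below just evaluates both counts for a few counter widths:

```python
# Quick check of the counting argument for a B-bit counter built from cascaded
# sections with F = 1 flip-flop and I = 2 inputs each.
F, I = 1, 2
per_section = 2**F * 2**I            # 8 tests for one isolated section

for B in (2, 16, 128):
    structural = per_section * B     # O(B): test the B sections one after another
    black_box = 2 * 2**B             # O(2^B): two reset values in each of 2^B states
    print(B, structural, black_box)

# For B = 128: 1024 structural tests versus 2**129 black-box tests.
```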
2.1.3
Fault Testing versus Trojan Detection
The standard goals of test engineering are to detect flaws that occur naturally during fabrication and
isolate the exact location of the flaw. Any significant deviation from the behavior intended by the designers
is considered a fault. By this definition, a piece of malicious logic that is added to the design by an attacker
before fabrication is a fault. Although it might seem elegant, and it is certainly correct, to group malicious
modifications with manufacturing faults, it is not practical to do so. The fault models assumed when testing
for manufacturing faults do not include the changes that would be made by a malicious party who adds some
kind of trojan horse to the design.
2.1.4
VLSI Testing: Goals and Metrics
VLSI testing is always done in terms of a prescribed fault model. For example, a common fault model
for logic circuitry is that each node can be correct, stuck at 1, or stuck at 0. If this stuck-at fault model is
used, the goal of testing is determine, for each node, which of the three conditions describes it. A set of tests
is said to cover a fault if the test would detect the fault, if the fault were present in the device being tested.
Test coverage is the percentage of possible faults that are covered by a given set of tests. In many cases, it
is practical to create a set of tests that has 100% coverage. In some cases, 100% coverage is not reachable
for a practical number of test vectors.
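The coverage calculation can be illustrated on a toy circuit; the circuit, fault list, and test set below are invented for the example:

```python
# Sketch of stuck-at fault coverage for a tiny circuit y = (a AND b) OR c.
# The nodes are the three inputs, the internal AND output, and y; each node can
# be stuck-at-0 or stuck-at-1, giving 10 single faults under this fault model.
NODES = ["a", "b", "c", "ab", "y"]

def simulate(a, b, c, fault=None):
    v = {"a": a, "b": b, "c": c}
    if fault and fault[0] in v:
        v[fault[0]] = fault[1]
    v["ab"] = v["a"] & v["b"]
    if fault and fault[0] == "ab":
        v["ab"] = fault[1]
    v["y"] = v["ab"] | v["c"]
    if fault and fault[0] == "y":
        v["y"] = fault[1]
    return v["y"]

faults = [(n, sv) for n in NODES for sv in (0, 1)]
tests = [(1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)]    # a candidate test set

covered = [f for f in faults
           if any(simulate(*t) != simulate(*t, fault=f) for t in tests)]
print(f"coverage: {len(covered)}/{len(faults)}")         # 10/10 for this test set
```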
The number of test vectors that are needed for 100% test coverage is an indication of the testability
of the circuit. Two factors are important in determining the testability of the circuit:
• controllability
• observability
Certain topologies are known to result in poor testability. One example is reconvergent fanout. This
is when a signal fans out from a single node and follows multiple parallel paths, and then reconverges into
a single node. This topology exhibits poor testability because the signals along the parallel paths are not
independently controllable. Logic gates with a large number of inputs, particularly XOR gates, are also
problematic.
Design for Test (DFT) is a design process that runs in parallel with the functional design process. The
goal of DFT is to ensure that the final design meets testability goals while minimizing the costs associated
with testing. There are four main costs that relate to testing.
• Die area cost
• Time required to apply tests
• Cost of the required testing station
• Computation required to generate test vectors
2.1.5
Conflict Between Testability and Security
Conventional security best practices are to conceptualize the system as a set of modules which expose
simple interfaces and hide their internal implementation. This type of control limits the complexity of the
interactions in the system. However, black-box testing is profoundly inefficient. Providing controllability
and observability of the internal elements within a module makes that module testable, but the module then
loses the security that comes from having a single, restricted interface. Thus, there is an apparent conflict
between security and testability.
2.2
Scan-based Testing
As we discussed in the previous section, an important approach for achieving testability in a VLSI
chip is to include elements in the design that allow it to be structurally decomposed for testing. Most designs
are synchronous, meaning that they are composed of logic and flip-flops, and the flip-flops only change state
during the rising or falling edges of the clock. Synchronous designs can be separated into a set of logic gates,
which can be tested by verifying their truth table, and a set of flip-flops, which can be tested by chaining
them to form a shift register. This scan-based testing paradigm replaces the regular flip-flops in the design
with “scan flip-flops” as shown in Figure 2.4.
Figure 2.4: The simplest scan flip-flop cell is simply composed of a multiplexer and a regular D flip-flop.
The Q output of one scan cell can be connected to the TEST INPUT of another scan cell, enabling a chain
configuration.
These are flip-flops with input multiplexers which select between the regular “functional” input, and
the test-mode input. Typically, the test-mode input comes from the output of another flip-flop, thus forming
a scan chain. The operation of a scan chain can be thought of as having three phases:
• Assert test mode. All flip-flops are configured into a distributed shift register. Test data is shifted
in. This data is applied to the inputs of the logic.
• Deassert test mode. Flip-flops are configured to get their data from the outputs of the logic. The
flip-flops are clocked, thus latching in the output value of the logic.
• Reassert test mode. All flip-flops are configured into a distributed shift register. Test data is shifted
out. This data is returned to the tester for analysis and comparison with the expected output.
Using this testing method, the tester only tests combinational logic, instead of testing state machines.
This amounts to a profound reduction in the testing time required to achieve a given level of test coverage.
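The three phases can be sketched in a few lines; the chain length, stimulus, and stand-in combinational block below are arbitrary choices for illustration:

```python
# Minimal model of the three scan phases for one chain of 4 scan flip-flops
# feeding a block of combinational logic (here: logic that inverts each bit).

def comb_logic(bits):
    return [b ^ 1 for b in bits]      # stand-in for the combinational block under test

def scan_test(vector, logic):
    chain = [0, 0, 0, 0]
    # Phase 1: test mode asserted, shift the stimulus in one bit per clock.
    for bit in vector:
        chain = [bit] + chain[:-1]
    # Phase 2: test mode deasserted for one clock; flip-flops capture the logic outputs.
    chain = logic(chain)
    # Phase 3: test mode reasserted, shift the response out for comparison.
    response = []
    for _ in range(len(chain)):
        response.append(chain[-1])
        chain = [0] + chain[:-1]
    return response

stimulus = [1, 0, 1, 1]
print(scan_test(stimulus, comb_logic))   # compare against the expected fault-free response
```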
2.2.1
Scan-based Attacks
Systems that support scan-based testing are potentially vulnerable to attacks that use the scan chains
as vectors for reading or modifying sensitive data contained in the device. Cryptographic keys are common
targets for this kind of attack.
The most basic scan attack applies to chips that contain a key register that can be scanned. In
this attack the attacker typically connects to the JTAG port of the victim chip, selects the scan chain that
contains the key register, and shifts out the key.
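A sketch of that naive case, with a hypothetical 8-bit key register sitting directly in the scannable chain:

```python
# Sketch of the basic scan attack: if the key register sits in a scannable chain,
# whoever controls the test port can simply clock the chain and read the key out.
secret_key = [1, 0, 1, 1, 0, 0, 1, 0]       # hypothetical 8-bit embedded key
scan_chain = list(secret_key)                # key flip-flops included in the chain

leaked = []
for _ in range(len(scan_chain)):
    leaked.append(scan_chain[-1])            # attacker observes TDO on each clock
    scan_chain = [0] + scan_chain[:-1]       # attacker shifts in arbitrary data

assert leaked[::-1] == secret_key            # the whole key is recovered
```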
Less naive chips avoid having the key directly scannable. However, simply excluding the key register
from the scan chain is not sufficient to prevent a skilled attacker from extracting the key. Yang, Wu, and
Karri [5] present a practical attack against an AES hardware implementation with a non-scannable key
register. They exploit the scan chains as an information leakage path, allowing recovery of the crypto key.
They assume that the key is not directly scannable, and their attack uses indirect information to reconstruct
the key. They also assume that the attacker does not have knowledge of the internal scan chain structure of
the crypto chip. The attack begins with discovering the scan chain structure, determining which bit positions
in the scan chain correspond to the bits of the intermediate result register. Next, using the observability of
the intermediate result, the attacker recovers the round key. The attack shows that even if the key register
is not exposed directly through any scan chain, the byproducts of the key contain enough information for the
attacker to infer the key bits.
2.2.2
Countermeasures for Scan Attacks
Countermeasures are available for protecting against scan attacks. There is a tradeoff between security
and testability. Effective countermeasures for scan attacks must simultaneously provide acceptable levels of
security and testability.
Hely, Flottes, Bancel, et al. [6] observe that a static assignment of functional registers to positions
in the scan chain is risky because an attacker can infer the assignment and then use the scan chain to extract
secret information from the chip. To mitigate this threat, they introduce Scan Chain Scrambling. For the
authorized user, a test key is provided to the chip and assignment of registers to scan chain positions is
static. For an attacker without the ability to authenticate, the assignment is semi-random. Chunks of the
set of scannable flip-flops are mapped to chunks of the scan chain sequence, but the order will be unknown
to the attacker. The permutation changes periodically while the chip is operating.
Yang, Wu, and Karri [7] propose the “Secure Scan” scheme for protecting embedded keys from being
leaked via scan chains. To allow the crypto hardware to be fully tested without exposing the key or its
byproducts, Secure Scan introduces a second embedded key, the mirror key for use during testing.
Figure 2.5: Secure Scan state diagram. The only way to get from secure mode, where the mission key is
loaded, to insecure mode, where the chip is testable, is to go through a power cycle reset, which wipes all
volatile state variables.
At any moment, as shown in Figure 2.5, the chip is either in mission mode (“Secure mode”) or test
mode (“Insecure mode”). When the chip is in secure mode, the mission key is used but scanning is disallowed.
When the chip is in insecure mode, scanning is allowed, but only the mirror key is used. It is impossible
to get from secure mode to insecure mode without powering off the chip. Secure Scan thus allows
complete structural testability without exposing the mission key. Since the mission key is not used while the
chip is in insecure mode, the mission key is obviously untestable. This is, however, not a serious drawback
in practice since the correctness of the key can be quickly verified by functional testing.
Figure 2.6: Secure Scan architecture. The mirror key register (MKR) is loaded only when Load Key is
active, which is controlled by the state machine shown in Figure 2.5.
Lee, Tehranipoor, and Plusquellic [8] point out that the intellectual property contained in an integrated
circuit is also at risk because of scan chains. An attacker that can control and observe signals inside a chip
may be able to infer the logical structure of the chip, and thereby learn the design. To prevent this, the
authors introduce a technique they call Low-Cost Secure Scan, that blocks unauthorized access to the scan
chains. The technique requires modification to the algorithms used for inserting scan chains in designs, but
the scope of the protection includes the intellectual property of the design, not just embedded keys. To
use the Low-Cost Secure Scan system, the test vectors that are applied contain key bits in addition to the
test stimulus. If the key bits are correct, the chip produces the pre-calculated output. Otherwise, the test
response is pseudorandom. The pseudorandom response is intended to raise the difficulty for an attacker
who wishes to launch a guessing attack, as opposed to giving an immediate explicit indication of whether
the attacker’s guess is correct.
2.3
BIST
Built-In Self Test (BIST) is a popular technique for testing hardware without requiring external test
equipment. There are many varieties of BIST designed to test different kinds of circuits and to detect
different classes of faults. A common feature of all BIST systems is that a large volume of test data moves
between the on-chip BIST logic and the circuit under test, while a minimal amount of data moves between
the chip and its surrounding system. In the extreme, the chip can simply produce a one-bit status indication
of whether it is working correctly, or there is an error.
Two of the most common varieties of BIST are memory BIST and logic BIST. In principle, differentiating between memory and logic is not necessary. In practice, testing techniques that target a specific type
of circuit are much more efficient in terms of test time required to get adequate fault coverage. Typical memory BIST uses a state machine or a small embedded microprocessor to generate memory access signals that
carry out a standard memory test algorithm such as one of the March series of tests [9]. In a typical logic BIST
setup, pseudorandom test vectors are generated on-chip and applied to the circuit under test. The responses
are compacted and aggregated during many test cycles, using a Multiple-Input Signature Register (MISR). This
produces a fixed-length final value that is compared with the expected value, which is hard-coded into the
chip.
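A sketch of that loop, with an illustrative 16-bit LFSR and MISR and a stand-in circuit under test (the polynomials, seed, and pattern count are arbitrary choices, not taken from any particular BIST implementation):

```python
# Sketch of a logic BIST loop: an LFSR supplies pseudorandom test vectors, the
# responses are compacted by a MISR, and only the final signature is compared
# against a golden value.

def lfsr16(state):
    # 16-bit Fibonacci LFSR, taps 16, 14, 13, 11.
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return ((state >> 1) | (bit << 15)) & 0xFFFF

def misr16(signature, response):
    # Multiple-input signature register: fold each response word into the signature.
    fb = ((signature >> 15) & 1) * 0b101101          # illustrative feedback mask
    return ((signature << 1) & 0xFFFF) ^ fb ^ (response & 0xFFFF)

def circuit_under_test(vector):
    return (vector ^ (vector >> 4)) & 0xFFFF         # stand-in for the core logic

def run_bist(cut, patterns=1000, seed=0xACE1):
    state, signature = seed, 0
    for _ in range(patterns):
        state = lfsr16(state)
        signature = misr16(signature, cut(state))
    return signature

golden = run_bist(circuit_under_test)
faulty = run_bist(lambda v: circuit_under_test(v) | 0x0001)   # output bit 0 stuck-at-1
print(hex(golden), hex(faulty), golden == faulty)             # almost certainly differ
# (aliasing to the same 16-bit signature happens with probability near 2**-16)
```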
From a security standpoint, BIST has many ramifications. In both logic and memory testing, the
BIST controller can act as a trusted proxy between the tester and the chip’s core logic. This architecture
can raise security by enforcing a limited set of actions that can be taken by a tester. The enforcement of
limited interfaces is consistent with good security design (e.g., principle of least privilege). The tester of
cryptographic logic needs assurance that the hardware is working correctly, but shouldn’t necessarily have
access to secret data in the chip, such as an embedded key. Similarly, BIST can provide assurance that
a memory is error-free while eliminating the possibility of misuse of memory “debugging” functionality for
tampering with security-critical data.
Despite BIST’s advantages of running at full functional clock speed and improving security, it has
two problems. First, it typically provides fault detection, but not fault isolation. Second, it adds area to the
chip. A BIST implementation contains a test pattern generator and an output response analyzer. Both of
these hardware modules occupy area. However, in certain applications this area cost can be eliminated. A
technique called Crypto BIST [10] uses a symmetric cipher core (AES) to test itself. By looping the output
of the AES core back into the input, the AES core functions as both the test pattern generator and output
response analyzer. Crypto BIST achieves 100% stuck-at fault coverage of the AES core in 120 clock cycles of
test time. The essence of the technique is the observation that strict avalanche criterion of the cryptographic
algorithm causes the AES core to act as both a diverse source of test patterns and a sensitive output response
analyzer, leading to high test coverage in few cycles.
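The loop-back structure of Crypto BIST can be sketched behaviorally in a few lines. In the sketch below, a hash truncated to 128 bits stands in for the AES core; the real technique iterates the actual cipher hardware, and the seed and cycle count here are arbitrary.

# Behavioral sketch of Crypto BIST: the cipher core acts as both the test
# pattern generator and the output response analyzer.  hashlib stands in
# for the AES hardware; a real implementation loops the actual core.
import hashlib

def cipher_round(block: bytes) -> bytes:
    """Stand-in for one pass through the cipher core (assumption)."""
    return hashlib.sha256(block).digest()[:16]

def crypto_bist(seed: bytes, cycles: int = 120) -> bytes:
    block = seed
    for _ in range(cycles):
        block = cipher_round(block)   # output looped back into the input
    return block                      # final value acts as the test signature

EXPECTED = crypto_bist(b"\x00" * 16)          # golden signature from a fault-free model
print(crypto_bist(b"\x00" * 16) == EXPECTED)  # pass/fail indication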
2.4
JTAG
In the 1970’s and early 1980’s, a common way of testing printed circuit boards was to add test points,
and to probe these test points with a bed-of-nails test fixture, as shown in Figure 2.7.
Figure 2.7: Bed of nails test fixture. Automated test equipment (ATE) generates stimulus signals and
measures responses. The ATE is connected to the test fixture, which contains one nail per test channel.
Each nail is spring-loaded so it maintains a controlled pressure when contacting the test points on the printed
circuit board being tested.
This approach could not keep up with increases in component density and pin spacing. Alternative
methods of testing were developed. As always, cost was a major factor affecting the choice of test method.
Interoperability was also a factor, since components from many manufacturers coexist on large printed circuit
boards. Having a single test interface from which all components could be tested was desired. The solution
was developed by a working group known as the Joint Test Access Group in the 1980’s. This became IEEE
Standard 1149.1 and is widely referred to simply as JTAG.
IEEE 1149.1 standardizes the set of signals used to access test logic inside chips. The standard specifies
the use of scan-based testing for the internal logic of the chip and also for the inter-chip wiring. JTAG uses
synchronous serial communication with separate signals for data and control. Using 1149.1, a tester can force
signals on pins, read signals on pins, apply signals to the core logic, read signals from the core logic, and
invoke arbitrary custom test functions that might exist in certain chips. However complicated the testing
task might be, the communication always takes place over the same wires:
• TCK - test clock, while in test mode, all events happen on edges of TCK
• TMS - test mode select, determines the next state of the JTAG port
• TDI - test data in, test vectors and JTAG instructions are applied via TDI
• TDO - test data out, test responses or data that loops through
• TRST - test reset, optional hardware reset signal for the test logic
Figure 2.8: The JTAG state machine. There are 16 states: TEST_LOGIC_RESET, RUN_TEST_IDLE, SELECT_DR,
CAPTURE_DR, SHIFT_DR, EXIT1_DR, PAUSE_DR, EXIT2_DR, UPDATE_DR, and the corresponding IR-path states
SELECT_IR, CAPTURE_IR, SHIFT_IR, EXIT1_IR, PAUSE_IR, EXIT2_IR, and UPDATE_IR. The TMS signal determines
the next state. The SHIFT_DR state is used for applying stimuli and collecting responses. From any state, the
TEST_LOGIC_RESET state can be reached by holding TMS high for five clock cycles.
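The behavior summarized in this caption can be modeled with a small lookup table. The following Python sketch, a software model for illustration rather than RTL, encodes the sixteen TAP states and checks that holding TMS high for five TCK cycles reaches TEST_LOGIC_RESET from every starting state.

# Software model of the IEEE 1149.1 TAP controller: the next state is a
# function of the current state and the TMS value sampled on each TCK edge.
NEXT_STATE = {  # state: (next if TMS=0, next if TMS=1)
    "TEST_LOGIC_RESET": ("RUN_TEST_IDLE", "TEST_LOGIC_RESET"),
    "RUN_TEST_IDLE":    ("RUN_TEST_IDLE", "SELECT_DR"),
    "SELECT_DR":        ("CAPTURE_DR",    "SELECT_IR"),
    "CAPTURE_DR":       ("SHIFT_DR",      "EXIT1_DR"),
    "SHIFT_DR":         ("SHIFT_DR",      "EXIT1_DR"),
    "EXIT1_DR":         ("PAUSE_DR",      "UPDATE_DR"),
    "PAUSE_DR":         ("PAUSE_DR",      "EXIT2_DR"),
    "EXIT2_DR":         ("SHIFT_DR",      "UPDATE_DR"),
    "UPDATE_DR":        ("RUN_TEST_IDLE", "SELECT_DR"),
    "SELECT_IR":        ("CAPTURE_IR",    "TEST_LOGIC_RESET"),
    "CAPTURE_IR":       ("SHIFT_IR",      "EXIT1_IR"),
    "SHIFT_IR":         ("SHIFT_IR",      "EXIT1_IR"),
    "EXIT1_IR":         ("PAUSE_IR",      "UPDATE_IR"),
    "PAUSE_IR":         ("PAUSE_IR",      "EXIT2_IR"),
    "EXIT2_IR":         ("SHIFT_IR",      "UPDATE_IR"),
    "UPDATE_IR":        ("RUN_TEST_IDLE", "SELECT_DR"),
}

def step(state, tms):
    return NEXT_STATE[state][tms]

# Holding TMS high for five TCK cycles reaches TEST_LOGIC_RESET from any state.
for start in NEXT_STATE:
    state = start
    for _ in range(5):
        state = step(state, 1)
    assert state == "TEST_LOGIC_RESET"
print("all", len(NEXT_STATE), "states reset after 5 cycles of TMS=1")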
An important feature of JTAG is its support for daisy chaining. Devices can be wired in a chain where
the TDO (output) of one device is applied to the TDI (input) of the next device in the chain, as shown in
Figure 2.9. The TCK, TMS, and TRST signals can be simply bussed to all chips in the chain, within fan-out
limits. Otherwise, buffers can be used. Each chip in the JTAG chain has a state machine implementing the
protocol shown in Figure 2.8.
Figure 2.9: A typical JTAG system. TMS, TCK, and TRST are bussed to all of the devices. TDO of each
component is connected to TDI of the next component, thereby forming a daisy-chain topology.
One of the state variables controlled by the state machine is the Instruction Register (IR), shown in
Figure 2.10. The instruction register is typically between 4 and 16 bits. Some instructions are mandated
by the JTAG standard, while implementers are free to define as many of their own instructions as they like.
One of the most important instructions is the required instruction, BYPASS. When the IR contains the
BYPASS opcode, a JTAG-compliant chip places a single flip-flop in the path from its TDI input to its TDO
output. Therefore a chain of chips in the BYPASS state behaves like a shift register.
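A quick way to see this shift-register behavior is to model each chip's bypass flip-flop in software. In the illustrative sketch below, a pattern written by the tester reappears on TDO of a five-device chain after five TCK cycles, one cycle per bypassed chip.

# Each chip in BYPASS contributes one flip-flop between TDI and TDO,
# so an N-device chain delays the serial stream by N clock cycles.
def shift_through_chain(bits_in, num_devices):
    chain = [0] * num_devices            # one bypass flip-flop per chip
    tdo_stream = []
    for tdi in bits_in:
        tdo_stream.append(chain[-1])     # tester samples TDO before the TCK edge
        chain = [tdi] + chain[:-1]       # all bypass flip-flops capture on the edge
    return tdo_stream

pattern = [1, 0, 1, 1, 0, 0, 1, 0]
out = shift_through_chain(pattern + [0] * 5, num_devices=5)
print(out[5:5 + len(pattern)] == pattern)  # pattern emerges 5 cycles later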
Figure 2.10: The essential components of a basic JTAG implementation include a test access port state
machine, an instruction register, one or more data registers, and an output multiplexer. Each chain of
scan flip-flop cells (internal or boundary) appears to JTAG as a data register that can be selected with the
appropriate instruction.
2.4.1
JTAG hacks
JTAG has played a part in many attacks on the security of digital hardware. Attackers have used it
to copy cryptographic keys out of satellite boxes for the purpose of pirating satellite TV service [11]. The
JTAG port in Microsoft’s Xbox 360 has been exploited to circumvent the DRM policies of the device [12].
Powerful low-level capabilities are often exposed through the JTAG interfaces of systems. Attackers have
learned this, and when they attack a device, they look for a JTAG port, among other things.
Rosenfeld and Karri [13] examine the threat of JTAG-level hacking. Specific attention is given to
the vulnerabilities that result from the common daisy-chain topology of JTAG wiring. They consider the
possibility of one malicious node in a JTAG chain attacking other nodes or deceiving the tester. They
examine the threat of a malicious node hijacking the bus by forcing the control signals. With two nodes (the
tester and the attacker) both driving a bus wire at the same time, it becomes an analog question of which
driver wins. The research showed that it was often possible for the attacking node to hijack the bus when the JTAG
bus wires are short, and always possible to hijack when the bus wires are long, due to pulse properties of
transmission lines.
2.4.2
JTAG Defenses
Several defenses for JTAG have been proposed over the years. When considering JTAG defenses, it
is important to keep in mind the many constraints and requirements that affect the design process. For
example, flexibility to provide in-field firmware updates is often valuable, but for this to be secure, some
sort of authentication mechanism is required. Some applications have tight requirements on cost and cannot
tolerate the extra circuitry required for authentication. As always in engineering, there are trade-offs, and
making the best choice requires a detailed understanding of the application.
2.4.2.1
Elimination of JTAG
One way to eliminate the risks associated with JTAG is to eliminate JTAG from the design. There
are several ways this can be done while maintaining low escape rate, the probability of a defective part being
shipped to a customer.
One method is simply to use conservative design rules. Common sources of manufacturing faults are
shorts and opens in the metal wiring of the chip. If wires are made wider, and spacing between wires is
kept greater, many manufacturing faults are eliminated. If transistors have non-minimum gate length, that
eliminates another source of faults. This approach has a high cost in area and speed.
Another method, and one that is very popular, is to use Built-In Self Test (BIST), discussed in Section
2.3. The result of running BIST can be as simple as a single bit indicating whether the chip passes the tests
or fails. In this form, BIST provides security benefits because internal scan can be eliminated from the set
of supported JTAG instructions, thus significantly reducing the chip’s attack surface.
BIST, however, is not always a satisfactory replacement for scan-based testing. Since BIST test vectors
are generated pseudorandomly instead of deliberately, using an automated test pattern generation (ATPG)
algorithm, it can be difficult to get full test coverage using BIST. This is partially offset by the fact that BIST
is typically done at-speed, meaning that test vectors are applied at the same rate that functional data would
normally be applied. In contrast, when test vectors are applied using external automated test equipment,
the test clock is typically an order of magnitude slower than the functional clock. Another disadvantage of
BIST is that it does not provide fault isolation. For the engineers developing a chip, it is essential to be
able to quickly iterate toward a successful design that can be shipped. Without the ability to determine the
location and type of the fault, designers are not able to fix the problem. For this reason, BIST is more useful
for testing during full production and in the field, where a failure will simply cause the part to be discarded.
Scan-based test infrastructure is often retained in the design, in case it is needed for engineering purposes.
2.4.2.2
Destruction of JTAG Hardware After Use
In many circumstances, an argument can be made that the test structures in a chip are only needed at
the factory, and constitute nothing more than a security risk once the chip is shipped. In such cases designers
sometimes elect to disable the JTAG circuitry on the chip after testing. A common way of implementing
this is with fuses that can be electrically blown by the tester. For finer grained control over what capabilities
remain enabled, the design can contain more than one fuse. A patent by Sourgen [14] in 1993 discusses these
techniques.
The IMX31 microprocessor from Freescale Semiconductor is an ARM-based chip intended for mobile
and media applications. This type of embedded processor is often required to protect the data that it
processes, in the case of digital rights management, and the code that it runs, in cases where the system
software contains valuable intellectual property. The IMX31 supports four JTAG security modes, selectable
by blowing fuses. In mode 4, the JTAG logic allows all possible JTAG operations. In mode 1, only the JTAG
operations necessary for interoperability are allowed. Blowing fuses is an irreversible operation. Therefore,
the security mode can be raised, but never lowered. This fits well with a common use case of JTAG,
where it is used at the factory for testing and perhaps by engineers for in-system debugging in their labs,
but should not be used in the field by hackers.
2.4.2.3
Password Protection of JTAG
Buskey and Frosik developed a scheme they call Protected JTAG [15], which enhances the security of
JTAG by requiring authentication and authorization to access particular features of the chip’s test structures.
The scheme makes use of a trusted server which uses a pre-shared elliptic curve key pair to prove to the chip
that the user’s JTAG access request has been authenticated. The authors anticipate a use case where the
tester connects directly to the chip and connects to the trusted server via the Internet, using standard Internet
communication security protocols. Once authentication is complete, the chip stays in the authenticated state
for the duration of a test session. The benefits of having a separate trusted server for authentication and
authorization are that these can be managed independently, after the chip is deployed. For example, a new
user can be added anytime, with access to an arbitrary set of test features. A disadvantage of the scheme is
the reliance on the continued security and availability of the authentication server.
2.4.2.4
Hiding the JTAG Behind a System Controller
One approach to JTAG security is to use a system controller chip. In this architecture, the system
controller acts as a proxy for test-related communication with one or more chips, typically on a printed circuit
board. This scheme adds security to a system without requiring any modification to the chips themselves.
The system controller can enforce sensible security policies such as:
• All accesses must be authenticated.
• Authenticated testers can only access the resources for which they are authorized.
• Only signed and verified firmware updates are permitted.
• Backrev (reverting to a previous version) of firmware is not permitted.
• All communication between the tester and the system controller is cryptographically protected
against man-in-the-middle attacks.
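As a rough sketch of how a system controller might enforce the last three policies above, the following Python fragment verifies a firmware image's signature and version before forwarding it over JTAG. The key, the version store, and the forward_update_over_jtag routine are hypothetical placeholders, not features of any particular product.

# Sketch of a system-controller update gate: verify the image signature,
# refuse version rollback, and only then drive the JTAG chain.
import hmac, hashlib

TRUSTED_KEY = b"factory-provisioned-secret"   # placeholder for a real verification key
installed_version = 7                          # placeholder persistent version counter

def forward_update_over_jtag(image: bytes) -> None:
    pass  # in a real controller this would shift the image into the target device

def verify_signature(image: bytes, tag: bytes) -> bool:
    expected = hmac.new(TRUSTED_KEY, image, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

def accept_update(image: bytes, tag: bytes, version: int) -> bool:
    global installed_version
    if not verify_signature(image, tag):       # only signed and verified updates
        return False
    if version <= installed_version:           # no backrev to older firmware
        return False
    installed_version = version
    forward_update_over_jtag(image)
    return True

good_tag = hmac.new(TRUSTED_KEY, b"new image", hashlib.sha256).digest()
print(accept_update(b"new image", good_tag, version=8))   # accepted
print(accept_update(b"new image", good_tag, version=3))   # rejected: rollback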
The system controller can play an active role in the testing and maintenance of the system, beyond
simply acting as a proxy [16]. The system controller can store test sequences and run the tests automatically
at power-up time or when a specific test routine is externally invoked. This architecture provides the benefits
of BIST as discussed in Section 2.3. A controller supporting this type of approach is commercially available
under the name SystemBIST [17]. It also provides functionality for verifying the state of JTAG-connected
devices, for example, to verify that the configuration bit file of an FPGA was correctly programmed. As with
all practical security, it is not absolute. Successful practical approaches have to strike a balance between cost,
functionality, and security. The value of the system controller approach is that it preserves the economy of
scale of using commodity chips that don’t directly support any JTAG security enhancements, while providing
increased functionality in the form of BIST, while defeating some common classes of attacks.
2.4.2.5
Crypto JTAG with Embedded Keys
Rosenfeld and Karri [13] introduce a set of security enhancements for JTAG that are backward compatible and interoperable with the original IEEE standard. Compact cryptography modules are included in
the security-enhanced chips, allowing the tester to authenticate the chip, preventing unauthorized testing,
encrypting test data going to and from the chip, and protecting against modification of test data. Keys are
programmed into each chip at the factory and delivered to the user through an out-of-band secure channel.
This allows the customer of the chip (e.g., system integrator) to confirm that it came from the real factory,
thus thwarting supply-chain attacks.
2.5
SoC test Infrastructure
The system-on-chip (SoC) is an important paradigm for the integrated circuit industry. The essential
idea is that modules can be designed independently and integrated onto a single chip. An SoC is in many
ways similar to a printed circuit board containing many chips from different manufacturers. Compared to a
conventional fully in-house chip development process, SoC development presents new security challenges in
how to test, configure, and debug the modules within the chip.
A large number of independent entities are involved in a typical SoC development cycle.
• SoC integrator
• Core designer
• CAD tool provider
• IC testing service
• IC fabrication service
• IC packaging service
An additional set of entities are affected by the security of the device after it is deployed.
• End users
• Service and infrastructure providers
• Content creators and other holders of intellectual property
SoCs are subject to a variety of threats at different stages of their life-cycle. The goals of the attacks that exploit the test mechanisms include grabbing cryptographic keys, changing system behavior, and
learning secrets about the intellectual property contained in the SoC.
2.5.1
SoC Test Hacks
The test access mechanisms that are used in SoCs evolved from those that are used on printed circuit
boards. In particular, JTAG has been used both for external interfacing of the chip to the test equipment
as well as for internal test access to cores within the die. All of the threats that apply to the test interfaces
of monolithic ICs also apply to SoCs. At the same time, a number of new threats affect SoCs, primarily due
to the fragmented nature of their design, with different trust assumptions applying to the different modules.
2.5.1.1
Test Bus Snooping
SoC test signals can be routed on the chip several different ways. Several engineering considerations
affect test signal routing. Traditionally, the trade-off has been between speed and cost. Wide data paths
and dedicated per-core wiring raise cost but give good testing speed. Narrow, often bit-wide, data paths
and shared wiring lower cost, but reduce testing speed. The trade-offs in test bus design have been studied
extensively [18]. In an SoC made of several cores, the optimal configuration often has multiple cores timesharing the same test wiring. Typically, the intention is for test data to be communicated between the tester
(master) and the test target (slave). Some implementations of shared test wiring allow malicious cores to
snoop the test bus, receiving messages that go to or from another core. The actions that a malicious core
can take with snooped test data depend on the system. If, for example, the test data contains cryptographic
keys, a malicious core can leak the keys to an attacker through a side-channel.
2.5.1.2
Test Bus Hijacking
Another concern for the SoC designer is that on shared test wiring, a malicious core could actively
interfere with communications between the tester and the target core. This type of attack is most threatening
when test data actually passes through the untrustworthy test logic of the malicious core, as it does in
daisy-chained architectures like JTAG.
2.5.2
Defenses for SoC Test Mechanisms
Several techniques have been applied by chip companies to address the problem of securing SoC test
mechanisms. Successful techniques must, as always, balance security, cost, functionality, and performance
goals. Additionally, human factors affect whether new mechanisms will succeed. Engineers are inundated
with information during the design process, and often prefer to use mechanisms that are known to work and
don’t require learning. An example of this effect can be seen in the continued use of RS-232, even in brand
new systems with no legacy. To succeed in the market, enhancements that add security to SoC test interfaces
must meet security, cost, functionality, and performance goals and also should not burden the engineers that
use them.
2.5.2.1
Elimination of Test Mechanisms
The most straightforward approach to securing the test interface is, as always, to eliminate it. Elimination of test interfaces is usually not a practical solution because they are often required, for example,
for programming firmware into chips and for configuring platform-specific settings. For reasons of economy
of scale, it is preferable for a single chip to support multiple use cases. Test interfaces provide a convenient mechanism for low-level configuration of complex chips. Eliminating the test interface from a chip
design would limit the market for the chip. The applications where security is a very high concern, such as
electronics for defense systems, require in-field testability for maintaining high availability.
2.5.2.2
Elimination of Shared Test Wiring
SoC test infrastructures can be secured against attacks by hostile cores by assigning each core its own
test bus. This results in a star architecture, where the cores are the points of the star, and the test controller
is the center. If the trust assumption is that the SoC integrator, CAD tools, and fabrication are trustworthy,
but the third-party IP cores are not trustworthy, then the star architecture provides good security, since
it minimizes the damage that can result from an untrustworthy core. However, it is the most expensive
topology from a standpoint of wiring cost.
Figure 2.11: A chain of scan cells is used for distributing keys to each of the cores. The scan cells are
configured not to expose key bits at their outputs while they are being shifted.
2.5.2.3
Cryptographic Protection for Shared Wiring
To capture the cost savings of shared wiring while restricting the security risks of untrustworthy cores
on the shared test bus, cryptography can be used. Rosenfeld and Karri [19] developed a low-cost technique
that adds security without breaking compatibility with existing test interface standards. They introduce a
structure that is essentially a scan chain that snakes through the SoC and provides a trusted mechanism for
delivering a crypto key to each core. The technique is based on the assumption that the logic and wiring
that is placed by the SoC integrator is trustworthy, while the third-party cores are not trustworthy. After
delivering separate crypto keys to each core during initialization, the tester can use shared test wiring for the
actual testing and configuration. The test wiring data path can be arbitrarily wide while the key delivery
wiring need only be one bit wide.
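A behavioral sketch of this arrangement is given below: per-core keys are assumed to have been delivered over the one-bit key chain, after which test data on the shared bus is protected by a keystream derived from the addressed core's key. The keystream derivation (a hash in counter mode) is a software stand-in for the lightweight hardware cipher, and the core names, key length, and test vector are arbitrary.

# Sketch of secure SoC test data delivery: a one-bit key chain gives each
# core its own secret, then the wide shared test bus carries data that only
# the addressed core can decrypt.
import hashlib

def keystream(key: bytes, nonce: int, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce.to_bytes(8, "big") +
                              counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Phase 1: the integrator-owned key chain delivers a distinct key to each core.
core_keys = {name: hashlib.sha256(name.encode()).digest()[:16]
             for name in ("core1", "core2", "core3")}   # illustrative keys

# Phase 2: test data for core2 travels on the shared bus as ciphertext.
plaintext_vector = b"\x3c\xa5\x0f\xf0\x55\xaa\x01\xfe"
ct = xor(plaintext_vector, keystream(core_keys["core2"], nonce=1, length=8))

# Only core2 recovers the vector; a snooping core lacking the key sees noise.
print(xor(ct, keystream(core_keys["core2"], nonce=1, length=8)) == plaintext_vector)
print(xor(ct, keystream(core_keys["core1"], nonce=1, length=8)) == plaintext_vector)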
2.6
Emerging Areas of Test Security
As engineered systems become more complex year after year, they often acquire testability features.
Sometimes standard solutions like JTAG are used, but sometimes new testing schemes are invented and
deployed. Each testing scenario has its own security concerns, its own strengths, and its own weaknesses.
A common mistake is to develop a new system without listing security in the design requirements while,
in fact, it is rare to find a non-trivial system where security is not in fact a requirement. Automobiles and
medical implants are two areas where test and management interfaces have been deployed without security.
Both of these areas have seen recent research results demonstrating the vulnerabilities of the actual systems
in the field.
2.6.1
OBD-II for Automobile
Modern cars are heavily computerized. They contain multiple computers performing real-time tasks
such as engine control, climate control, braking, traction control, navigation, entertainment, and drivetrain
control. These subsystems are typically linked together on a communication bus. From a central point, a
technician can check the status of all of the subsystems by communicating with them over the bus. This
development is helpful for reducing the time needed to diagnose a problem with the car. The most common
interface for electronic testing of cars is On-Board Diagnostics II (OBD-II). OBD-II is a connector, typically
under the dashboard.
Like all test interfaces, OBD-II exposes the system to attacks that would not otherwise be possible.
Koscher, Czeskis, Roesner, et al. [20] present an analysis of automobile security with a focus on the
OBD-II and the Controller-Area Network (CAN) that is used to interconnect the components in the car.
They show that by plugging a malicious computer into the OBD-II test interface, they can severely undermine
the security of the entire car, causing it to behave in ways that would be dangerous or even deadly for the
occupants of the car and for others around them.
It is a common saying in the field of information security that security cannot be an afterthought;
it must be considered from the beginning and built into the product. It is unlikely that the security flaws
in existing cars will be fixed. Most likely, a new version of the standards will be required. OBD-II was
mandated by the US government for new cars sold on the American market. Similarly, the government could mandate
the use of a secure successor to OBD-II and CAN.
2.6.2
Medical Implant Interface Security
Some medical implants have remote management features. This allows the physician or technician to
interact with the device after it has been implanted, to test, configure, and tune the device. The noninvasive
nature of these interactions is important for the patient’s health and comfort. Some implant products
use wireless radio frequency communication to link the programming console with the implanted device.
As always, if implemented naively, radio links are subject to sniffing, unauthorized access, and jamming
attacks. In the context of medical implants, attacks against the management interface could have very
serious consequences. Unfortunately many implants that have already been deployed are in fact vulnerable
to remote attacks on their management interface.
Halperin et al. [21] performed a security analysis of a commercial implantable cardioverter defibrillator.
They reverse-engineered the communication protocol used by the management interface. The authors were
able to sniff private information and to control the implanted device (potentially inducing fibrillation in the
patient) and to cause it to exhaust its battery at an accelerated rate. This first phase of their work focused
on assessing the level of security present. What they found was that the only obstacle to attack was
security by obscurity, which they circumvented by capturing, analyzing, and emulating the link protocol.
The second phase of their work was on suggesting practical measures for improving security. They offer
three countermeasures against the attacks they discovered. One countermeasure is simply for the implant
to make noise when communication is taking place. This alerts the patient and reduces the likelihood of an
undetected attack. The other countermeasures focus on authentication and key exchange within the power
and space constraints of medical implants.
2.7
Recapitulation and Projection
We have examined the way security and testability interact in various areas. Internal (structural)
testability of systems is often required, particularly as systems reach a certain complexity. Testability
features that provide controllability and observability of the internals of systems are major security risks if
not designed with security in mind. Attackers have exploited unsecured test interfaces in the past and it
is likely that this will continue. There is a significant and growing literature on how to provide testability
without security vulnerability. The defenses that already exist in the literature are relevant to their intended
target system. To some extent, the existing solutions can be adapted to secure new testing scenarios as
they emerge. Some testing scenarios will not lend themselves to existing solutions, and will require inventive
minds to create new approaches for secure internal testing.
Chapter 3
JTAG Security
This chapter addresses some security issues surrounding JTAG. We look at the threat of a malicious
chip in a JTAG chain. We outline attack scenarios where trust in a digital system is downgraded by the
presence of such a chip. To defend against this, we propose a protection scheme that hardens JTAG by making
use of lightweight cryptographic primitives, namely stream ciphers and incremental message authentication
codes. The scheme defines four levels of protection. For each of the attack scenarios, we determine which
protection level is needed to prevent it. Finally, we discuss the practical aspects of implementing these
security enhancements, such as area, test time and operational overheads.
3.1
Introduction
JTAG [1] is the dominant standard for in-circuit test. It has been in use for 20 years and, like
many mature standards seen in the landscape of information systems, JTAG was conceived with a friendly
environment in mind. It was designed to handle the natural enemies of digital systems: faults in design,
fabrication, packaging, and PC boards. It has been quite successful, largely due to the economy and flexibility
of the system, and has been extended in various directions. It has evolved into the de facto method for in-circuit configuration and debug. The companion standard IEEE 1532 [22] has extended it even further to
support on-board programming. While JTAG has great utility in friendly environments, in even moderately
hostile environments it can lead to undesirable forms of exposure.
This chapter discusses attacks and defenses for JTAG-enabled systems. Our goals are to provide a
practical path toward better security while maintaining strict compliance with the IEEE 1149.1 JTAG standard, all without significantly affecting test economics. We do not regard JTAG as a security threat. Indeed,
the presence of JTAG on a board enables stronger platform security than would typically be achievable in
an equivalent system without JTAG. Our view is that security problems arise when there is a discrepancy
between what people expect and what assurances the system actually provides.
JTAG is a synchronous serial link intended for system management tasks. As shown in Fig 3.1, it
supports daisy-chaining of an arbitrary number of devices. A single master device controls the protocol state
of all other devices on a chain. From a standpoint of functionality, the order of the devices in the chain does
not matter. Shared wiring and freedom over the order of the devices reduce the burden on the PCB designer,
which translates into lower cost for JTAG compared with other in-system test and management solutions.
In section 3.3, we see that the shared wiring and ordering of devices both have security implications.
Figure 3.1: The typical deployment of JTAG is a chain of several devices on a printed circuit board. Each
device may come from a different vendor. The test mode select (TMS), test clock (TCK), and test reset
(TRST) signals are typically common to all chips. The test data in (TDI) signal and test data out (TDO)
signals loop through the chips. The path returns to the source, which is usually either a PC or an embedded
microcontroller, functionally called a “system controller.”
The concept of JTAG does not preclude providing protection and assurances, but it is typically
deployed in a minimal form which provides little or no protection. We will describe several ways in which
an attacker can exploit a typical JTAG deployment to achieve his goals. Then we will review the aspects of
JTAG that are relevant to the execution of the attacks. Section 3.4 surveys some of the prior work in this
area, both on the attack side and on the countermeasure side. Following that, we will present defenses with
the objective of improving security without incurring heavy extra costs.
3.2
JTAG Overview
The JTAG protocol defines a bidirectional communication link with one master and an arbitrary
number of slaves. The master can initiate an interaction with any of the slaves. Slaves never initiate
interactions. A chip is JTAG-enabled by the presence of two essential physical components. The first is the
test access port (TAP) controller, which is a state machine that interprets the JTAG serial protocol. The
other physical component is the boundary scan register (BSR), which is a register that is interposed between
the core logic of the chip and the I/O modules of the chip. The BSR can be arbitrarily wide, but can be
preset and read serially. JTAG has three basic modes: BYPASS, EXTEST, and INTEST.
3.2.1
BYPASS Mode
In BYPASS mode, the serial bit stream is copied from the TDI pin to the TDO pin, delayed by one
cycle of the test clock, TCK.
3.2.2
EXTEST Mode
In EXTEST (EXternal TEST) mode, the BSR is connected to the pins of the chip. The BSR specifies
the values that are to be asserted on the chip’s output pins. The BSR captures the values that are present
at the chip’s input pins. The data in the BSR are subsequently shifted out through the TDO pin. The TDO
signal is either routed back to the master directly or through other JTAG-enabled chips.
3.2.3
INTEST Mode
In INTEST (INternal TEST) mode, the BSR is serially loaded from the TDI pin and the contents of
the BSR are applied to the terminals of the internal logic of the chip. The BSR captures the values that are
present at the output terminals of the internal logic of the chip.
The JTAG standard requires supporting the basic instructions. Implementors are free to add their
own instructions, and for these user-defined instructions they are free to define the semantics more or less
arbitrarily. Whatever the instruction, the protocol follows the same basic pattern: Enter the shift ir state
→ Shift the instruction into the instruction register (ir) → Enter the shift dr state → Shift the relevant
data into and/or out of the data register (dr).
3.3
JTAG Attacks
Figure 3.2: Conceptual security model: A set of attackers A1, A2, and A3 have a set of goals G1 through
G6. Each attacker has a set of attack capabilities, some or all of which are masked by defenses that are in
place. There is a set of attacks, K1 through K4, each of which requires a certain set of unmasked attack
capabilities. Each attack can be used to reach some set of goals. This example shows that attacker A1
can achieve goals G2 and G4 since it has capabilities P2 and P4, which are the requirements for attack K2.
Attackers A2 and A3 do not have sufficient unmasked capabilities to execute any of the attacks.
A general model of the JTAG security landscape is shown in Figure 3.2. The purpose of this model
is to provide a way of analyzing the security risks associated with a JTAG deployment. A system can have
multiple potential attackers. For example, one potential attacker of a system is the manufacturer of one
of the chips. Another potential attacker is a hostile end user. Each attacker has a set of capabilities,
some subset of P1 through P4. For example, an attack capability is to be able to sniff the data that is sent
through the data lines. Some of the attackers’ capabilities might be blocked by defenses that are in place.
For example, the attacker might have access to sniff the bits on the JTAG bus. If those bits are encrypted,
sniffing attacks will not work. There is a set of possible attacks, each of which has attack requirements.
For example, an attack to capture the configuration stream of an FPGA as it is being programmed over
JTAG requires the ability to sniff the JTAG data line. Each potential attacker will have a set of attack
capabilities that are not masked by defenses that determines his set of possible attacks. Finally, we consider
the attackers’ goals. One possible goal an attacker might have is to cause a denial of service. Another
possible goal might be to clone the system.
Having looked at the general model of JTAG security, we examine five basic JTAG-based attacks.
The goal here is to see the general range of possibilities, not to exhaustively list all possible scenarios. After
examining these attacks in terms of the security model in Figure 3.2, we will be able to see what defense
mechanisms we need.
In a given attack scenario, the attacker will possess a limited set of capabilities:
• sniff the TDI/TDO signals
• modify the TDI/TDO signals
• control the TMS and TCK signals
• access keys used by testers
The system topology in all of the examples is the same as shown in Figure 3.1. We assume throughout
that the attacker is capable of providing or otherwise controlling the attack chip. An attacker can combine
one or more attacks to achieve his goals. Some attacks have already been carried out against real systems
as discussed in section 3.4.
3.3.1
Sniff Secret Data
Figure 3.3: The attacker obtains secret data by sniffing the JTAG data path.
The sniffing attack is shown in Figure 3.3. The attacker’s goal is to learn a secret that is being
programmed into victim chip using JTAG. An additional requirement for this attack is that the victim chip
is downstream from the attack chip on the same JTAG chain.
As shown in Figure 3.3, attack chip exhibits a false BYPASS mode which is externally identical to
true bypass mode, but which parses JTAG signals. When the secret is programmed into victim chip, the
attack chip captures a copy. The secret is then delivered to the attacker either through JTAG interrogation
of attack chip in the field, or through a side-channel. Alternatively, the attack chip might directly make use
of the sniffed data.
3.3.2
Read Out Secret
Figure 3.4: The attacker obtains an embedded secret by forcing test vectors onto the JTAG lines.
The read-out attack is shown in Figure 3.4. Here, the attacker’s goal is to learn a secret that is
contained in the victim chip. We assume that the attacker is capable of using I/O drivers in the upstream
attack chip (attacker 1) to forcefully control the TMS and TCK lines. The details of forcing these signals
are discussed in section 3.3.6. An additional requirement for this scenario is that the attack and victim chips
are on the same JTAG chain with the victim chip sandwiched between the attack chips.
The upstream attack chip forcefully acts as JTAG bus master, and performs a scan operation on
victim chip to access embedded secrets. Downstream attack chip collects the secret as it emerges from TDO
of victim chip. It can be used by “embedded attackers” as described, or it can be used by an attacker who
can attach external hardware to the system under attack.
3.3.3
Obtain Test Vectors and Responses
Figure 3.5: The attacker obtains a copy of the test vectors and normal responses of a chip in the JTAG
chain. This can be a passive attack.
This attack is shown in Figure 3.5. The attacker’s goal is to learn the vectors that are used to test
victim chip, and the normal responses. A requirement for this attack is that the attack chip is downstream
from the victim on the same JTAG chain.
The attack chip waits for the victim chip to be tested by the authorized party. The attack chip
collects the test vectors as they are shifted out of the instruction register, and collects the responses as they
are shifted out of the selected register (BSR or others) on their way back to the tester. Knowledge of the
test vectors and responses helps an attacker further infiltrate a system by providing trojaned parts that pass
normal tests.
3.3.4
Modify State of Authentic Part
The attacker’s goal is to modify the state of victim chip. We also assume that the attacker is capable
of inserting strong I/O drivers in the attack chip to forcefully control the TMS and TCK lines. An additional
requirement for this attack is that the victim chip is downstream from the attack chip on the same JTAG
chain.
The attack chip takes control of the TMS and TCK lines, and puts the JTAG TAP of the victim chip
into a state where it can shift data in, thereby setting the state of registers within victim chip, including
registers that affect the normal operation of victim chip.
3.3.5
Return False Responses to Test
Figure 3.6: The attacker can intercept test vectors that are sent to another chip, and can send false responses
to the tester.
The false responses attack is shown in Figure 3.6. The attacker’s goal is to deceive the tester about
the true state of victim chip. An additional requirement for this attack is that the victim chip is downstream
from the attack chip on the same JTAG chain.
The tester attempts to apply test vectors to the victim chip, which is not the first chip in the JTAG
chain. To do this, the tester attempts to place the other chips into BYPASS mode. The attack chip ignores
this request and intercepts the test vectors, while instructing victim chip and other downstream chips to
enter the BYPASS mode. Attack chip can then transmit the bogus test responses back to the tester.
3.3.6
Forcing TMS and TCK
Whether an attacker can forcefully control TMS and TCK depends on various factors such as
• strength of output drivers on JTAG master
• strength of output drivers of attack chip
• presence of buffers in the TMS and TCK lines
• presence of series output resistor (typically 100 ohms)
• topology of JTAG bus (star or daisy chain)
• physical layout of JTAG bus
• the input logic threshold and hysteresis, if any
For the attacker to successfully hijack the TMS and TCK lines, he must be able to change the voltage
seen by the victim’s TMS and TCK input pins. This change must be sufficient for the voltage to cross
the logic threshold. If the JTAG is set up in a star topology, with separate TMS and TCK lines for each
chip, this attack will be impossible. However, it is quite common, in practice, for system designers to take
advantage of the economy of wiring JTAG in a daisy chain topology. Designers can put buffers at various
points in the JTAG wiring. There are various reasons for doing this, including noise immunity and fanout.
However, these buffers add cost and complexity, and are not required by the JTAG standard. It is common
practice to add a resistor, typically 100 ohms, in series with each of the outputs of the JTAG master. This
resistor reduces ringing on the line by providing series termination and/or slowing the slew rate. In systems
where this series resistor is present there is a reduction in the amount of current that must flow in order for
the hijacker to force a TMS bit, or to force a TCK edge. The strength of the drivers in the master and in
the attacker are also relevant. If the attack chip is an ASIC, the attacker can specify essentially any strength
drivers. If the attack chip is an FPGA where the attacker controls the configuration bits, the attacker can
program the I/O block to use a high-current I/O standard. If the attacker can control the PCB design, he
can bond multiple pins of the attack chip together to form a mega driver. Finally, if the TMS and TCK lines
are simply bussed around without buffers, multiple attackers can gang up to overpower the master. These
last possibilities are included only for completeness. Next, we present practical experimental results testing
the basis of the JTAG hijacking attack under normal conditions.
3.3.6.1
Experimental Validation
Our JTAG hijacking experimental setup is made of three chips: a system controller, a victim, and a
hijacker. The system controller is a Philips 8052 microcontroller configured to bit-bang JTAG using port 1
directly, not through an I/O expansion chip (8255, etc.), nor through any resistor. The victim is a Xilinx
Spartan 3e FPGA. The attacker is a Spartan 3 FPGA. The 8052 is programmed to keep TCK low all the
time.
The experiment tests the hypothesis that the attacker can raise the TCK line sufficiently to edge-trigger the victim chip. We programmed the victim chip to count TCK transitions and to display the count
on a set of seven-segment LEDs. This way, we were able to confirm the extent of the attacker’s control over
the TCK signal. We programmed the attack chip to send 1000 pulses. Comparing 1000 with the number of
TCK transitions counted by the victim chip, we confirmed that the Spartan 3 attack chip easily overpowers
the 8052 system controller. In multiple runs, the victim always showed the correct count.
When one chip overpowers the output of another chip, there is a risk that the output driver circuitry
of one or both chips will be damaged. In the experiment, we minimized the chance of overheating either
chip by using a TCK waveform with 80 nsec pulses and a 0.0015% duty cycle. We left the setup running
for over an hour and saw no evidence of any sort of damage to any of the chips.
3.3.6.2
Common I/O Driver Characteristics
Electrically, DC models can represent the two output drivers fighting. Each driver will have an I-V
curve for its logic-low state and an I-V curve for its logic-high state. The intersection of the logic-low I-V
curve of the 8052 and the logic-high curve of the Spartan 3 gives the voltage at the node where their outputs
meet. This is similar to solving for the quiescent point of an amplifier using load-line analysis.
[Plot: "8052 versus Spartan 3 IO Drivers"; pin current (A) versus pin voltage (V) for the two contending output drivers.]
Figure 3.7: A Philips 8052 (the system controller) was programmed to keep one of its pins low. The I-V curve
for this output driver was extracted using a pulsed I-V measurement. A Xilinx Spartan 3e was programmed
to keep one of its pins high. This pin’s I-V curve was also extracted. The result is shown. The solid line is
the FPGA; the dashed line is the microcontroller. The intersection is at 2.1V, exceeding VIH for most 3.3V
logic.
Using a pulse method, we obtained the I-V curves of the output drivers of our 8052 and our Spartan
3. From these curves, we solved for the expected voltage when the drivers fight. The critical issue here is
whether this voltage exceeds VIH of the victim chip. In our experimental platform the receiving chip is a
Spartan 3e using the 3.3 volt CMOS IO standard where VIH is 1.1 volts and there is no input hysteresis.
The Spartan 3 easily overpowers the 8052 and can force the TMS and TCK signals. In our measurements,
the pulse width was 80nsec, with a repetition period of 1.3msec. This low duty cycle, 0.0061%, allows hostile
JTAG communication but results in a worst-case dissipation of only 22µW in the output driver of either
chip, not enough to cause damage.
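Both the load-line step and the duty-cycle arithmetic can be reproduced with a short calculation. The I-V sample points below are made-up values with roughly the shape of the measured curves and are included only to show the method; the duty-cycle figure uses the pulse width and repetition period quoted above.

# Load-line style solution for the contested node voltage, plus the duty
# cycle of the hostile TCK waveform.  The I-V points are illustrative
# placeholders, not the measured curves of Figure 3.7.
voltages    = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.3]                  # volts
i_8052_low  = [0.000, 0.010, 0.019, 0.027, 0.034, 0.040, 0.045, 0.048]  # current the 8052 low driver sinks
i_fpga_high = [0.090, 0.075, 0.060, 0.045, 0.030, 0.016, 0.004, 0.000]  # current the FPGA high driver sources

# The contested node settles where the FPGA sources exactly what the 8052 sinks.
diffs = [src - snk for src, snk in zip(i_fpga_high, i_8052_low)]
idx = next(i for i in range(len(diffs) - 1) if diffs[i] >= 0 >= diffs[i + 1])
frac = diffs[idx] / (diffs[idx] - diffs[idx + 1])          # linear interpolation
v_node = voltages[idx] + frac * (voltages[idx + 1] - voltages[idx])
print(f"contested node voltage ~ {v_node:.2f} V (hijack succeeds if this exceeds VIH)")

# Duty cycle of the hostile TCK waveform used in the measurements.
pulse_width, period = 80e-9, 1.3e-3
print(f"duty cycle = {pulse_width / period:.6%}")   # about 0.0061 %, as quoted above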
3.3.6.3
PCB Layout Effects
We discussed the TMS and TCK hijacking in terms of DC characteristics such as I-V curves and logic
thresholds. We now examine a transient phenomenon that allows even weak hijackers to control the bus
regardless of the strength of the master.
Figure 3.8: A length of PCB wiring connects the hijacker to the JTAG master. This allows the attacker to
inject short pulses onto the wiring without being hindered by the master.
As seen in Figure 3.8, there is typically a PCB trace of some length connecting the master’s JTAG
signals to each of the slaves. The trace length from the master to the hijacker, d_MH, results in a round-trip
wiring delay of at least rtt = 2*d_MH / v_prop, where v_prop is the propagation speed of a pulse in the PCB trace. This
propagation speed is typically around 2 × 10^8 m/s. If we assume a PCB trace of 30 cm, we obtain a round-trip
time of at least 3ns. Even if the master can present a zero-ohm impedance at its end of the transmission line,
each time the hijacker applies a 0 → 1 step to the TMS or TCK line, a pulse at least 3ns long will appear at
the victim. This is sufficient to control a JTAG tap. We experimentally verified this phenomenon using the
setup shown in Figure 3.8, wired together using short sections of CAT 5 cable, Zo = 100Ω. The hijacker can
control the victim’s TAP even when the TMS and TCK lines are shorted to ground at the master’s end.
3.4
Prior Work on Attacks and Defenses
Yang, et al. [5] showed that JTAG boundary scans enable an attacker to read secret data out of a
chip. They use a hardware DES implementation as their example. Satellite television receivers make use of
trusted hardware to prevent non-paying users from viewing the encrypted broadcasted content. There is an
active underground effort [11] [23] to hack satellite TV boxes by extracting their keys using JTAG [24]. This
has been quite successful and continues.
On one hand, it is tempting to suggest that JTAG is a security risk and should be excluded from
trusted hardware, but untestable hardware is a prohibitive business risk. Yang, et al. [25] proposed a DFT
architecture that allows testing crypto hardware with high fault coverage yet without exposing hard-coded
crypto keys to leakage. Novak and Biasizzo [26] propose a locking mechanism for JTAG with negligible
added cost by adding LOCK and KEY registers. To lock the TAP, a key is entered using a special LOCK
instruction. To unlock the TAP, a special UNLOCK instruction is issued, which requires shifting in the key
via TDI. The TAP state machine is not altered. Compliance with the 1149.1 standard is maintained except
for the rejection of standard instructions (including ones that are mandated by 1149.1) while the TAP is
locked.
3.5
Defenses for JTAG
Our security goals are to ensure the authenticity of the devices in the JTAG chain, to ensure the secrecy
of communication between the JTAG master and the chip, and to reject inauthentic JTAG messages. To
achieve these goals, we make use of three standard security primitives: a keyed hash function, a stream
cipher, and a message authentication code. Using these primitives, we construct protocols that significantly
enhance the security of JTAG.
JTAG communication takes place between a master and a chip. The master is typically a microcontroller or a test station. The threat model of the link is similar to the threat model assumed by designers
of network security protocols. Since the threats are similar, it is tempting to apply existing solutions for
network communication, such as SSH and SSL. Circuit complexity and key distribution are just two of the
problems. SSH and SSL require significant amounts of computation for performing public key cryptography.
Even on the full-size computers for which the protocols were designed, the crypto arithmetic is performed by
software routines, not native opcodes. These big-integer calculations do not lend themselves to lightweight
hardware implementation. Furthermore, the semantics of JTAG include prompt synchronous response, so
multi-cycle crypto operations would be problematic.
A mechanism that is suitable for securing JTAG is the authentication and key establishment scheme
described in [27]. This lightweight scheme uses a PUF [28] [29] [27] [30] to authenticate the chip and to
establish a cipher key for secure communication. The requirements for challenge-response device authentication are repeatability and unpredictability. Given the same challenge, the chip should always calculate the
same response. Without having observed the chip’s response to a given challenge, it should be impossible
for an attacker to predict the response. We make use of a similar idea to the PUF authentication and key
establishment scheme, but where fuses and a keyed hash are used instead of a PUF. In a PUF, each chip’s
uniqueness comes from subtle physical differences between chips. For our keyed hash, the uniqueness comes
from fuse bits that are programmed at the factory. Either a PUF or a keyed hash can support a challenge-response protocol for device authentication. There are pros and cons to each of these two mechanisms.
Briefly, PUFs have the advantage of being intrinsically unique, not needing uniqueness to be programmed
into them. Fuses have the advantage of using less area, and being a mature technology that is extremely
reliable across temperature, aging, and power variations.
For verifying the authenticity of a part in the JTAG chain, a challenge is sent to the chip, which feeds
the challenge to its keyed hash function. The response is returned to the tester, where it is compared with
the expected response. The challenge is shifted into the chip via the TDI line. The response is shifted out
via TDO. A custom JTAG instruction is invoked to trigger the calculation of the hash.
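The exchange can be prototyped as follows. HMAC-SHA-256 stands in for the compact keyed hash inside the chip, and the fuse key, challenge width, and CRP database are illustrative assumptions.

# Sketch of device authentication over JTAG: the tester shifts a challenge
# in via TDI, the chip hashes it with its fuse-programmed key, and the
# response is shifted out via TDO and compared against a stored CRP.
import hmac, hashlib, os

FUSE_KEY = os.urandom(16)          # stands in for bits burned into fuses at the factory

def chip_keyed_hash(challenge: bytes) -> bytes:
    """What the chip computes when the custom authentication instruction runs."""
    return hmac.new(FUSE_KEY, challenge, hashlib.sha256).digest()

# At enrollment, the factory records challenge-response pairs for this chip.
crp_database = {}
for _ in range(4):
    c = os.urandom(8)
    crp_database[c] = chip_keyed_hash(c)

# In the field, the tester picks an unused CRP and checks the chip's answer.
challenge, expected = next(iter(crp_database.items()))
response = chip_keyed_hash(challenge)          # shifted out via TDO in practice
print("chip authenticated:", hmac.compare_digest(response, expected))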
We also require a cipher to encrypt the communication. A block cipher is not suitable due to its
large die area. Block ciphers are also problematic in this application because they operate on whole blocks
of data, and the delay this introduces conflicts with standard JTAG timing. The preferred mechanism for
encrypting JTAG communication is a stream cipher. Robshaw [31] discusses stream ciphers in depth. Our
current implementation uses the Trivium stream cipher [32]. Other stream ciphers can be used.
To protect against inauthentic JTAG messages, we need a message authentication code (MAC) scheme.
A MAC algorithm uses a key that is known to the sender and receiver to verify that the message was sent
by the authentic sender (not spoofed) and was not modified in transit. The most simplistic MAC scheme
is just to calculate the hash of the message and the key, concatenated together. This simplistic scheme is
not cryptographically strong. Current popular MAC schemes such as HMAC [33] use nested hash functions,
each operating on a block of text. Recently, Arazi [34] describes the construction of MAC functions from
stream ciphers in embedded platforms where computational power is a constraint.
Because of the timing of JTAG, a MAC verification scheme that produces the answer immediately upon
receipt of the final bit of the message is highly desirable. Introducing a delay while a message is authenticated
would complicate the timing of the application of test vectors. Therefore, block-based algorithms are not
desirable for our purpose. We make use of an incremental MAC function. Incremental hashing can be
performed so that computation is being done while the bits of the message are being serially received [35].
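To make the timing point concrete, the toy class below exposes the bit-serial interface such a module presents: one update per TCK as each message bit arrives, and a tag that is available as soon as the last bit is in. The per-bit mixing function is a keyed toy accumulator chosen only to illustrate the interface; it is not a cryptographically strong MAC and is not the construction of [35].

# Toy bit-serial accumulator illustrating the *interface* of an incremental
# MAC: work happens while bits arrive, so the tag is available immediately
# after the final bit, with no block-sized delay.
class BitSerialTag:
    MASK = (1 << 64) - 1

    def __init__(self, key: int):
        self.key = key & self.MASK
        self.state = key & self.MASK

    def clock_in(self, bit: int) -> None:
        """Called once per TCK as each message bit is shifted in."""
        self.state = ((self.state << 1) | (bit & 1)) & self.MASK
        self.state = (self.state * 0x9E3779B97F4A7C15 + self.key) & self.MASK

    def tag(self) -> int:
        """Ready as soon as the last bit has been clocked in."""
        return self.state

key = 0x0123456789ABCDEF
message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]

sender, receiver = BitSerialTag(key), BitSerialTag(key)
for b in message:
    sender.clock_in(b)
    receiver.clock_in(b)          # receiver keeps pace with the incoming bits
print("tags match:", sender.tag() == receiver.tag())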
Another mechanism that is required for securing JTAG is access control. By access control, normally
what is meant is that there is a set of subjects, a set of objects, and at each subject-object intersection, there
is a set of permissions. Some basic access control schemes have been proposed for JTAG and some have been
included in commodity parts. A rudimentary but strong access control mechanism is a side-effect of using
a MAC on the messages. If only the tester has the challenge-response pairs for the keyed hash, only the
legitimate tester will be able to negotiate a MAC key, so the MAC algorithm assures that an unauthorized
party will not be able to successfully issue test instructions.
The next subsections discuss how the security primitives are used to secure JTAG communication.
3.5.1
Secure JTAG Communication Protocol
Due to the wide range of cost and security sensitivity for different chips, a single rigid protocol will
either fail to provide all of the necessary assurances or, alternatively, add excessive overhead to the cost of
the parts. Therefore we define four levels of protection and provide solutions for each of the levels. Chips of
varying levels can be freely mixed on the same JTAG chain without interoperability problems.
• Level 0: No assurances. This is equivalent to the vast majority of JTAG-enabled chips currently
made.
• Level 1: Authenticity is assured. The authentic chip is guaranteed to be involved in the JTAG
authenticity probe. No further assurances are available.
• Level 2: In addition to the assurance provided by Level 1, secrecy of JTAG signals is assured.
• Level 3: In addition to the assurance provided by Level 2, the JTAG link is protected against active
attacks: insertion, modification, reordering, replay, and deletion of messages.
level   authenticity   secrecy   integrity
0       no             no        no
1       yes            no        no
2       yes            yes       no
3       yes            yes       yes

Figure 3.9: We define four levels of assurance. Levels correspond to the set of assurances that are provided.
The JTAG assurance level of each of the chips in the JTAG chain is known by the tester. This attribute
can be defined in the BSDL file for the chip. Note that level 0 involves no active participation in the proposed
protocol. Level 0 can be considered an explicit provision for backward compatibility. Our scheme allows
participating and nonparticipating chips to coexist in the JTAG chain with each part functioning at its full
assurance level. The presence of low assurance chips in the chain does not affect the assurances provided
by higher assurance level chips in the same chain. It is, however, the designer’s responsibility to choose the
right JTAG assurance level for his chip and its intended application.
This section describes the protocol used for communication between the tester and the chip.
3.5.2
Level 0 Protocol
Communication with a level-0 device is exactly the same as what is defined in the 1149.1 standard.
3.5.3
Level 1 Protocol
Communication with a level-1 device is exactly the same as what is defined in the 1149.1 standard
except for addition of an authentication operation.
(1) Tester randomly extracts a CRP from storage.
(2) Tester sends challenge to chip.
(3) Chip applies challenge to keyed hash module.
(4) Chip sends result to tester.
(5) Tester compares chip’s response with CRP.
(6) Regular JTAG operations commence.
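A minimal sketch of this exchange, with the chip's keyed-hash module modeled as HMAC-SHA-256 over a device-unique secret; the 80-bit challenge length matches the reference implementation, while the class and function names are purely illustrative and not part of the protocol definition:

```python
import hmac, hashlib, os, random, secrets

# Hypothetical model of a level-1 chip: the keyed-hash module is abstracted as
# HMAC-SHA-256 over a device-unique secret (a real device would derive this from
# fuses or a PUF, and the hash function is the designer's choice).
class Level1Chip:
    def __init__(self, device_secret: bytes):
        self._secret = device_secret

    def respond(self, challenge: bytes) -> bytes:
        return hmac.new(self._secret, challenge, hashlib.sha256).digest()

def enroll(chip: Level1Chip, n_pairs: int) -> dict:
    """Run in the secure facility: extract challenge-response pairs before shipment."""
    db = {}
    for _ in range(n_pairs):
        challenge = secrets.token_bytes(10)      # 80-bit challenge, as in the reference design
        db[challenge] = chip.respond(challenge)
    return db

def authenticate(chip: Level1Chip, crp_db: dict) -> bool:
    challenge = random.choice(list(crp_db))      # (1) randomly select a stored CRP
    expected = crp_db.pop(challenge)             #     and remove it so it is never reused
    response = chip.respond(challenge)           # (2)-(4) challenge in, keyed hash, response out
    return hmac.compare_digest(response, expected)  # (5) compare against the stored response

chip = Level1Chip(os.urandom(16))
db = enroll(chip, 100)
assert authenticate(chip, db)                    # (6) regular JTAG operations may commence
```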
3.5.4
Level 2 Protocol
Communication with a level-2 device involves sending and receiving encrypted data to and from the
data register (DR). The encryption only affects the signal on the TDI and TDO lines. The TCK, TMS, and
TRST lines retain the exact semantics defined in the 1149.1 standard. The TAP controller’s JTAG state
machine is also unaffected. Level-2 communication occurs in two phases: setup and communication.
The purpose of the setup phase is to establish a shared secret between the tester and the chip. The
setup phase proceeds as follows:
(1) Tester randomly selects and extracts a CRP.
(2) Tester sends challenge to chip.
(3) Chip applies challenge to keyed hash module.
(4) Chip uses response as key for stream cipher.
In the case of the tester writing to the data register on the chip, the protocol is as follows:
(1) Tester encrypts JTAG DR contents using shared secret.
(2) Tester places chip into SHIFT-DR state.
(3) Tester shifts in the encrypted contents of DR.
(4) Chip decrypts data into plaintext register as it is shifted in.
(5) Tester places chip into EXIT-IR or EXIT-DR state.
(6) Plaintext register is latched in and used.
The bits that are shifted out via TDO during a tester-writes-to-chip operation are ciphertext. The
protocol for the tester to read data from the chip is as follows:
(1) Tester initiates read operation.
(2) Chip encrypts contents of DR using stream cipher as it transmits the bits on TDO.
(3) Tester decrypts bits using stream cipher.
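Both the write and read paths above amount to XORing the serial DR bits with a keystream shared by the tester and the chip. The sketch below uses a hash-based keystream generator purely as a stand-in for the Trivium instance in the reference implementation; only the XOR structure and the symmetry between the two ends are the point:

```python
import hashlib

def keystream_bits(key: bytes, n: int):
    """Illustrative keystream generator (NOT Trivium): expand `key` into n bits.
    In the actual design a Trivium instance keyed with the negotiated secret
    produces one keystream bit per TCK cycle."""
    out, counter = [], 0
    while len(out) < n:
        block = hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        out.extend((byte >> i) & 1 for byte in block for i in range(8))
        counter += 1
    return out[:n]

def shift_dr_encrypted(key: bytes, plaintext_bits):
    """Tester side: encrypt the DR contents bit-by-bit before shifting them in on TDI."""
    ks = keystream_bits(key, len(plaintext_bits))
    return [p ^ k for p, k in zip(plaintext_bits, ks)]

def chip_receive(key: bytes, ciphertext_bits):
    """Chip side: decrypt into the plaintext register as bits arrive; the register
    is latched for use when the TAP leaves SHIFT-DR."""
    ks = keystream_bits(key, len(ciphertext_bits))
    return [c ^ k for c, k in zip(ciphertext_bits, ks)]

shared_secret = b"\x01" * 10                      # negotiated via the CRP exchange above
dr_in = [1, 0, 1, 1, 0, 0, 1, 0]
assert chip_receive(shared_secret, shift_dr_encrypted(shared_secret, dr_in)) == dr_in
```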
3.5.5
Level 3 Protocol
The level-3 protocol has two phases: setup and communication. The setup phase establishes the secrets
that are shared between the tester and the chip for the cipher and MAC algorithms. The communication
phase encapsulates, transmits, and de-encapsulates the contents of the IR and DR.
The behavior at level 3 is the same as level 2, but an additional key establishment operation is
performed for the MAC key. Using that additional MAC key, the chip calculates the MAC of the bits as they are
received. There are two distinct ways of doing this. First is to use an incremental MAC algorithm of the
form shown by Lai [35]. The second is to use a more conventional MAC scheme that requires full blocks
of data before commencing MAC processing. The MAC is checked when the TAP transitions out of the
SHIFT-DR state. If the MAC passes, the contents of dr are transferred to a validated dr register and a
flag on validated dr is set, indicating that the contents of the register have been validated. Later operations
can make use of the contents of the validated dr register. On one hand, this approach adds area overhead.
On the other hand, it enables the use of a conventional (non-incremental) MAC algorithm, the kind that
has been more extensively studied.
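A behavioral sketch of the conventional (block-oriented) variant, with HMAC standing in for whichever MAC the designer selects; the register and method names mirror the description above but are otherwise invented for illustration:

```python
import hmac, hashlib

class Level3DataRegister:
    """Toy model of the level-3 DR path: bits accumulate during SHIFT-DR and are
    promoted to validated_dr only if the MAC received with them verifies."""
    def __init__(self, mac_key: bytes):
        self._mac_key = mac_key
        self._dr = bytearray()
        self.validated_dr = None
        self.validated = False            # flag set when the contents pass the MAC check

    def shift_in(self, data: bytes):      # bits arriving while the TAP is in SHIFT-DR
        self._dr.extend(data)

    def exit_shift_dr(self, received_tag: bytes):
        tag = hmac.new(self._mac_key, bytes(self._dr), hashlib.sha256).digest()
        if hmac.compare_digest(tag, received_tag):
            self.validated_dr = bytes(self._dr)   # later operations use this copy
            self.validated = True
        self._dr.clear()

key = b"mac-session-key"
reg = Level3DataRegister(key)
payload = b"\xde\xad\xbe\xef"
reg.shift_in(payload)
reg.exit_shift_dr(hmac.new(key, payload, hashlib.sha256).digest())
assert reg.validated and reg.validated_dr == payload
```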
The steps of the MAC setup phase are the same as the crypto key setup in the level-2 protocol. The
keyed hash is used along with challenge-response pairs that are stored by the tester. Rudimentary access
control is provided by virtue of MACs being checked. Only the authentic sender can successfully negotiate
a MAC key.
3.6
Costs and Benefits Associated with JTAG Defenses
Successful mechanisms for enhancing the security of JTAG must satisfy cost and operational con-
straints, in addition to providing the required forms of assurances.
3.6.1
Die Area Overhead
Some of the systems we are protecting are high-volume commodity items which are price-sensitive.
We cannot add more than 10% to the die area of any protected part. We see in Figure 3.10 that for designs
of non-trivial die area, the proposed schemes will not add significant cost. From Figure 3.10 we see that for
a design that uses 10,000 slices, the area overhead is less than 9% even for level 3 defenses, and about 4%
for level 1. 10,000 slices of a Xilinx Spartan 3e FPGA are equivalent to 1.4 million gates [36].
[Figure 3.10 plot: area overhead (%) on a logarithmic axis versus design complexity (Spartan 3 slices).]
Figure 3.10: Area overhead is shown for protection levels 1 through 3, from bottom to top. The cost of
the security enhancements is independent of design complexity, so the percentage overhead is lower for more
complex designs. The levels provide progressively higher assurance. An indication
of the area cost of each protection level is given by the number of additional FPGA slices used by the
enhanced JTAG circuitry. These figures are for a Spartan 3e. There are no fuses in the FPGA so fuses are
modeled as hard-coded bit vectors. Overhead in an ASIC will be less.
3.6.2
Test Time Overhead
There is test time overhead associated with the security enhancements we propose. For the level 1
defenses, the time overhead is as follows:
(1) Time to shift in challenge. (80 cycles of test clock)
(2) Time to initialize stream cipher. (1152 cycles of functional clock)
(3) Time to shift out response. (80 cycles of test clock)
The stream cipher we use in our reference implementation is Trivium [32]; it uses an 80-bit key and takes 1152 cycles to
initialize.
For the level 2 defenses, the time overhead is spent once per test session, to set up the cipher key and
initialize the stream cipher.
(1) Shift in the challenge (80 cycles of test clock).
(2) Initialize stream cipher 1 using challenge (1152 cycles of functional clock).
(3) Extract 80 bits of keystream from cipher 1 (80 cycles of functional clock).
(4) Initialize stream cipher 2 using 80 bits from cipher 1 (1152 cycles of functional clock).
After this setup, there is no test time overhead associated with the level-2 defenses. Stream cipher 2 operates
synchronously with the test clock to encrypt and decrypt data. For level 3 defenses, the minimum additional
test time overhead is the same as for level 2.
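Tallying the steps above gives the per-session overhead directly. The cycle counts are those quoted for the 80-bit-key Trivium reference implementation; the clock frequencies in the example are assumptions introduced only for illustration, not part of the scheme:

```python
def level1_overhead_cycles():
    # challenge shifted in + response shifted out (test clock), cipher init (functional clock)
    return {"test_clock": 80 + 80, "functional_clock": 1152}

def level2_setup_cycles():
    # challenge in, init cipher 1, draw 80 keystream bits, init cipher 2
    return {"test_clock": 80, "functional_clock": 1152 + 80 + 1152}

def level2_setup_time_us(test_clk_hz=10e6, func_clk_hz=100e6):
    """Example only: a 10 MHz test clock and 100 MHz functional clock are assumed."""
    c = level2_setup_cycles()
    return (c["test_clock"] / test_clk_hz + c["functional_clock"] / func_clk_hz) * 1e6

print(level1_overhead_cycles())     # {'test_clock': 160, 'functional_clock': 1152}
print(f"{level2_setup_time_us():.1f} us of one-time setup per level-2 test session")
```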
3.6.3
Operational Costs
The security enhancements that we propose require that challenge-response pairs be extracted at
manufacturing test time, before the chips leave the manufacturing facility. If fuses are used, as opposed to a
PUF, the fuses should be blown with a random pattern prior to the extraction of challenge-response pairs.
For each chip that the customer receives, he needs the challenge-response pairs. These have to be delivered
to the customer over a secure channel.
It is also possible to eliminate the burden of keeping track of the identity of each chip on a reel of
parts. Instead of tracking which chip corresponds to which set of pairs, each chip can contain a unique
code which the customer then uses to retrieve that chip's challenge-response pairs from the chip maker.
3.6.4
Impact of Defenses on Known Threats
The proposed defenses protect against a range of attacks discussed in this document. We will now
illustrate how two otherwise practical attacks can be prevented.
• Attacker Obtains Embedded Secrets: The attacker could either sniff the JTAG link as the key is
being programmed, or the attacker could perform an unauthorized debugging session to read out the
key. These attacks are popular in hacking satellite TV boxes to extract the box keys using JTAG [24]
and in taking a snapshot of the firmware in the box [37]. Novak and Biasizzo [26] propose a solution
to unauthorized JTAG access based on access control. A key must be presented to the chip before
it can be manipulated with JTAG. Under an expansive threat model, where hostile chips may be
present in the JTAG chain, the scheme is not effective. On the other hand, the proposed level-2 and
level-3 defenses are comprehensive and guard against passive sniffing by encrypting the link. They
also prevent unauthorized debugging as a side effect of MACing the data.
• Cloning an existing system would involve different attacks depending on the implementation style
of the system. If it is a CPU- or PLD-based system, cloning would involve reading out firmware
or a bitstream. The attacks are similar to obtaining an embedded secret, and the defenses are the
same. For example, at levels 2 and 3, the attacker will need to negotiate keys with the chip in
order to communicate over JTAG, and this will be impossible for the attacker since he lacks the
challenge-response pairs that were set up offline.
3.7
Conclusion
In systems where sensitive data are transported or accessible via JTAG, we do not recommend using
a daisy-chain topology with conventional unprotected JTAG. A star topology protects against many of the
attacks discussed in this article. However, the star topology increases PCB complexity and cost and doesn’t
eliminate all JTAG-based system vulnerabilities. An alternative is to retain the daisy-chain topology, and
to derive needed security from cryptographic enhancements built on top of JTAG.
We have devised a scheme that provides a significant improvement in JTAG security with reasonable
added cost. The scheme is flexible in the sense that it can provide high assurance for important chips and
lower assurance (and lower cost) for less important chips. Compatibility is maintained across the assurance
levels and compatibility is maintained with the IEEE 1149.1 JTAG standard to the maximum extent possible.
The scope of this work is restricted to JTAG. There is a multitude of other threats to hardware
that do not involve JTAG. For instance, exploits via bus probing have been successfully executed in the
past [38] and will probably be used again. However, it should be noted that the original reason for the
existence of JTAG was to facilitate boundary scan of I/O pins because it was difficult to probe the wiring
on modern printed circuit boards. Securing JTAG denies the attacker what would otherwise be an easy way
of probing a bus. We do not address several other threats including the threat of manufacturers inserting
hostile functionality in the chips that they supply. Detecting the presence of such functionality is an active
research topic [39] [40].
Chapter 4
Testing Cores in a System on Chip
4.1
Introduction
SoC design cycles are getting shorter and designs are getting more complex. Pressure for more
productivity per designer per day has led to reuse of design modules and, in many cases, obtaining modules
from external sources of intellectual property (IP). It is impractical, and sometimes impossible, for SoC
designers to manually assess the security of each of the cores they use. Modular design methodologies
hide the internals of the modules, which boosts productivity, but unfortunately it shifts designers from
knowing the logic of the chip to merely hoping that each piece actually operates according to its interface
specifications. Continued progress in VLSI requires that we not let complexity and short design cycles
undermine our ability to produce chips that are trustworthy. In the long run, we need to work toward
designing in such a way that the security failure of one module does not result in the security failure of the
entire system. One step in that direction is the topic of this research, to reduce the risk of cascading security
failure in the test subsystems of chips we design.
Test access mechanisms are critical components in digital systems. They affect not only production
and operational economics, but also system security. We propose a security enhancement for system-on-chip (SoC) test access that addresses the threat posed by untrustworthy cores. The scheme maintains the
economy of shared wiring (bus or daisy-chain) while achieving most of the security benefits of star-topology
test access wiring. Using the proposed scheme, the tester is able to establish distinct cryptographic session
keys with each of the cores, significantly reducing the exposure in cases where one or more of the cores
contains malicious or otherwise untrustworthy logic. The proposed scheme is out of the functional path and
does not affect functional timing or power consumption.
Test access mechanisms (TAMs) are present in every complex chip and are used for a large and growing
variety of purposes. Their original purpose was to enable efficient structural testing of digital systems instead
of the notoriously inefficient process of black-box functional testing. As system design has evolved, the role of
TAMs has been extended to include invoking BIST, programming nonvolatile memory, debugging embedded
microprocessors, initializing the configuration of FPGAs, initializing volatile run-time configuration registers,
and enabling and disabling system components including the TAM itself. A security-aware TAM improves
upon conventional TAM concepts by protecting the data that traverses it, thus reducing the risks arising
from untrustworthy cores.
4.1.1
Assumptions
The threat model addressed by this research is that one or more untrustworthy cores intercept or
modify the test data that passes between the tester and a core. We make the following assumptions:
(1) The SoC contains many cores, some of which are untrustworthy.
(2) The SoC contains trustworthy inter-core functional wiring.
(3) The SoC contains trustworthy test access wiring.
(4) Cores are connected to the tester by shared wiring. This can be either a daisy-chain scheme or a
bus scheme.
(5) Some or all cores, including their test wrappers, are opaque.
(6) The SoC design is known to the attacker as much as it is known to the designer. If a crypto key is
embedded in the design, the key is known to the attacker.
[Figure 4.1 diagram: three cores (core 1, core 2, core 3) daisy-chained on the test interface through their WSI/WSO ports, each with its WSC control signals.]
Figure 4.1: Data sent from the test controller to core 2 passes through core 1, giving core 1 an opportunity
to intercept it. Likewise, data passing from core 2 back to the tester passes through core 3, giving core 3 an
opportunity to intercept it.
Some examples of TAM-related threats in a daisy-chained TAM architecture, as shown in Figure 4.1,
include the following:
(1) Core 1 sniffs data passing from tester to core 2. Core 1 leaks the data. Core 2 can be an FPGA or
microprocessor, and the data can be its bitfile or executable program. This attack results in leakage
of intellectual property which can result in monetary losses or expose the leaked data to reverse
engineering and additional security exposure.
(2) Core 1 modifies data passing from the tester to core 2. Core 2 can be a crypto core and the data
can be a key. This attack can create a back door into the system.
4.1.2
Constraints
One way to reduce TAM-related risk is to use a star topology for test data. The star topology avoids
the threats listed above by avoiding the placement of two cores on the same test wiring where they might
interfere with each other. This topology is seldom used, since it results in high wiring cost. As the number
of cores in SoC designs increases, star-topology TAM wiring becomes less and less practical. We therefore
reject that technique in favor of cryptography, which allows us to simultaneously obtain the low cost of the
bus topology and the good security of the star topology.
4.1.3
Core Test Wrappers
To facilitate design automation, modular design, and efficient reuse of cores, the cores are often
delivered to SoC integrators in a wrapped form, which means that the complexity of their internal test
structures is hidden behind some wrapper logic, which exposes a simplified, standardized interface to the
outside. IEEE 1500 [2] defines this interface. Although wrapping can improve the productivity of SoC
integrators, it requires the SoC integrator to trust the wrapper implementations provided by the core vendors.
The choice between wrapped and unwrapped cores is a complex one with security ramifications.
4.2
Prior Work
The threat of malicious inclusions in chip designs has been discussed at a technical level in the security
and VLSI communities, and at a policy level in defense communities. DARPA has led US DoD research
in the area, funding the TRUST program [41], which aims to ensure that critical government operations
are able to source trustworthy chips. King [42] shows the construction of malicious modifications to a CPU
design that give the attacker the flexibility to choose the details of the attack at run time, in the field,
after deployment. Wang [43] presents a taxonomy of malicious hardware. Kim [44] examines the threat of
malicious modules in an SoC abusing their bus-master capability. They provide a security-enhanced bus
arbiter that traps the bus transactions of rogue modules and inhibits them from further action by cutting
power to the offending module. The risks of having malicious chips on a JTAG [1] chain were studied in [13],
and the authors developed countermeasures.
The SoC TAM security problem differs significantly from the JTAG security problem. Primarily, the
difference is that all cores of an SoC are fabricated together. The JTAG security solutions proposed in [13]
assume that each chip exists first in isolation, gets packaged and tested, and then shipped to the customer.
While in isolation, before being shipped, keys can be set up for use by the tester for secure communication
over the untrustworthy test bus in the field. Since this isolated stage does not exist for SoC cores, the TAM
security solutions developed for JTAG do not apply to SoCs.
4.3
Proposed Approach
We propose TAM enhancements that leverage the SoC designer’s control over the inter-core wiring.
The result is that untrustworthy cores are prevented from sniffing communication on the test bus. As with
most communication security schemes, key exchange is a pivotal issue. Many cores are connected to a single test bus,
and to single out a target core for communication, the TAM must provide the tester with a mechanism
for securely distinguishing the target core from the rest of the cores. We propose that the SoC integrator
construct a scan chain outside of any of the wrapped cores, and connect each core to the output of one of
the cells in the scan chain. The foundation of the security of our system is the set of assumptions that:
(1) A core, however malicious and devious it may be, cannot affect the intercore wiring of the SoC.
(2) A core cannot control where in the intercore wiring its terminals are connected.
Thus, the tester securely distinguishes each core by which scan cell it connects to, as shown in Figure 4.2.
As stated in Section 4.1.2, TAMs are under significant pressure to minimize their cost. For this reason,
SoCs use shared wiring to connect the tester with the cores. During testing, the target core is addressed
using any of various schemes, while the other cores are expected to be passive. We describe three scenarios
where untrusted cores are placed on the shared TAM wiring. For each scenario, we discuss how the economy
of shared wiring can be retained while protecting sensitive test data from sniffing as it passes by or through
malicious cores. The first scenario is where a wrapped cores is obtained already containing the security-aware
TAM enhancements described in this research. The second scenario is where a wrapped cores is obtained
without any TAM security enhancements, and has sensitive test data. The third scenario is where a core is
obtained that is untrusted but no sensitive test data will be exchanged with it.
4.3.1
Security-enhanced Test Wrapper
We enhance the security of the standard core test wrapper by using cryptography to protect the data.
Standard crypto primitives are used and the details of their design and implementation are outside the scope
of this research. Here, we focus on the practical aspects of key exchange and issues specifically relevant to
the SoC test problem.
A common technique for session key establishment is to use the Diffie-Hellman [45] protocol. In
Diffie-Hellman, each party generates a random number, sends a message, and does some arithmetic, and the
result is that the two parties agree on a secret that was never explicitly transmitted over the wire. Although
Diffie-Hellman is a powerful building block, for our application, we can obtain better security at lower cost
by taking advantage of practical constraints on what a malicious core can do. Keys can be generated by the
test controller or external tester, and distributed to each core at test initialization time as shown in Figure
4.2. Cheap and secure, this is our preferred key distribution scheme.
[Figure 4.2 diagram: a key generator in the test controller drives a chain of scan cells; each scan cell feeds the key register of one core (core 1, core 2, core 3), alongside the cores' WSI, WSO, and WSC test connections.]
Figure 4.2: A chain of scan cells is used for distributing keys to each of the cores. The scan cells are configured
not to expose key bits at their outputs while they are being shifted.
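A behavioral sketch of this key distribution, assuming 80-bit keys and a chain whose first cell sits nearest the test controller (both assumptions are illustrative): the concatenated key stream is shifted through the chain while O_INH* holds every cell output low, and the cores latch their keys only when the inhibit is released.

```python
def distribute_keys(core_keys, key_bits=80):
    """core_keys[i] is the key intended for the i-th core along the chain,
    counting from the test controller. Returns the keys the cores actually latch."""
    n = len(core_keys)
    chain = [0] * (n * key_bits)             # the scan cells; O_INH* holds every output at 0

    def shift_in(bit):
        chain.insert(0, bit)                 # a new bit enters the cell nearest the controller
        chain.pop()                          # every other bit moves one cell along the chain

    # The key destined for the core farthest from the controller is shifted in first,
    # MSB first, so that after n*key_bits clocks each key sits in front of its own core.
    for word in reversed(core_keys):
        for b in range(key_bits - 1, -1, -1):
            shift_in((word >> b) & 1)

    # Deassert O_INH*: each core latches the key_bits cells it is wired to.
    latched = []
    for i in range(n):                       # core i is wired to cells i*key_bits .. (i+1)*key_bits-1
        segment = chain[i * key_bits:(i + 1) * key_bits]
        latched.append(sum(bit << j for j, bit in enumerate(segment)))
    return latched

keys = [0xAAAA_BBBB_CCCC_DDDD_EEEE, 0x1234_5678_9ABC_DEF0_0123, 0x0F0F_0F0F_0F0F_0F0F_0F0F]
assert distribute_keys(keys) == keys         # every core receives exactly its own key
```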
The SoC designer can choose between on-chip key generation and off-chip key generation. If done on-chip, the external tester (i.e., ATE) does not have access to keys, which improves security in some situations.
However, for reasons of cost, as discussed in Section 4.4, in most circumstances we expect SoC designers to
prefer off-chip key generation and cryptography. Key generation entails selecting a key for each core that
has a security-enhanced wrapper. The most important characteristic of the keys is that no core should be
able to learn the key of another core. As stated in Section 4.1.1, we assume that the design is known to
the attacker. This precludes the possibility of hard-coding the keys. Instead, for on-chip key generation, a
hardware random number generator is used. Holleman [46] reported a hardware random number generator requiring
0.031 square millimeters of die area in 0.35-micron four-metal, double-poly CMOS. If implemented in a
current fabrication process with smaller feature sizes, the die area for the circuit would be correspondingly
smaller.
When key bits are transmitted to the cores during key setup, they are also stored by the test controller
or external tester. The storage of key bits or cipher state is necessary because it is the basis for encrypted
communication. However, the SoC designer has a choice of whether to maintain crypto sessions when
accessing other cores. For example, if the test schedule involves communicating with core 1 and then
with core 2, and then with core 1 again, the question is whether the test controller should maintain the
crypto session (cipher state) associated with core 1 while accessing core 2. If it does, then it can resume
communications with core 1 without the delay of reinitializing the cipher state associated with core 1. On
the other hand, if the on-chip test controller is performing the crypto, then registers must be added to the
test controller to store the cipher state, which increases the die area overhead. In the case of off-chip key
generation and crypto, storing session state is not a problem. Offloading the crypto to the ATE is consistent
with our goals and with the threat model stated in Section 4.1.1.
The scan cells in the key setup scan chain, though very simple, provide two properties that are very
important for security. First, as shown in Figure 4.3, they accept an output inhibit signal, O INH*, which
forces the output to zero when it is pulled low. This has the effect of blocking cores from observing other
cores’ key bits while they are being shifted in. Second, the logic gates are, for all intents and purposes,
unilateral. There is no way for a core to actively force a value onto the flip-flop to affect the key bits that
are received by other cores.
[Figure 4.3 schematic: a key-setup scan cell built around a DFF with data input D_IN and output Q_OUT; an O_INH signal gates the cell's GATE_OUT output.]
Figure 4.3: The key setup scan chain conveys data from the test controller to the core wrapper key registers
without allowing it to be sniffed or modified by other cores. Other than the basic distributed shift register
functionality, the only extra functionality we require of our scan cell is an output inhibit input (O INH) to
ensure that the key is not leaked during shifting. After the tester has the key bits shifted to their intended
location, the tester deasserts the output inhibit signal so that the cores receive their key data.
Communication requirements of cores vary over a wide range, and optimal test access design involves
allocation of test buses and scheduling of tests [47]. Cores using BIST exclusively may be interfaced using
only a serial test interface. Cores exposing extensive internal scan chains to the tester typically make use of
a parallel test bus to increase test speed. In either case standard low-cost symmetric cryptographic modules
are placed between the test interface and the core, as shown in Figure 4.4. The costs of these modules are
discussed in Section 4.4.
[Figure 4.4 block diagram: a 32-bit parallel input word is XORed with keystream from a stream cipher (loaded via key initialization), decompressed, and applied to scan chains 1 through i of the core under test; the scan-chain outputs are compressed, XORed with keystream, and driven onto the 32-bit parallel output.]
Figure 4.4: In a typical security-enhanced wrapped core, a word of compressed test data arrives via the
parallel data input, is decrypted instantaneously, decompressed and applied to the inputs of the core’s scan
chains. The outputs are compressed, encrypted, and sent out. Standard wrapper components are not shown,
such as the parallel bypass.
If an untrusted core was obtained in an unwrapped state, the test wrapper described in this subsection
should be added by the SoC integrator. Purely from a security standpoint, it is good to obtain cores in an
unwrapped state. However, it adds significant work for the SoC integrator, losing the productivity benefits
of test integration standards like IEEE 1500. Obtaining cores prewrapped with security-aware wrappers is
probably the option most SoC integrators would prefer.
An untrustworthy core can be provided to the SoC integrator prewrapped with a security-enhanced
wrapper. The presence of the security-enhanced wrapper does not make the core trustworthy, even if the
wrapper itself is free of malicious features. However, the untrustworthy core, even if its wrapper contains
malicious features, cannot undermine the TAM security of the SoC. This is in stark contrast with the
conventional daisy-chain TAM architecture, where a single untrustworthy core breaks the security of all
other cores on the chain.
4.3.2
A Security Overwrapper for Prewrapped Cores
In cases where an important module is only available as a conventionally wrapped core, a security
overwrapper can be used. The functionality provided by the overwrapper, when combined with the core’s
included wrapper, is equivalent to that of the security-enhanced wrapper discussed in Section 4.3.1.
The security overwrapper contains functional blocks for decrypting input test data and encrypting output
test data.
4.3.3
Interoperability with Noncompliant Cores
The TAM security scheme described here does not require that all cores in an SoC comply. Noncom-
pliant wrapped cores can be used as they are, provided that no security-critical data passes over their test
interfaces. The presence of noncompliant cores on the test bus does not undermine the security guarantees
provided to the compliant cores.
4.4
Costs
The SoC TAM security enhancements presented in this research were designed to be efficient in terms
of die area, test time, and effort for the SoC integrator.
4.4.1
Die Area Cost
The security enhancements contribute to die area in three ways. To a certain extent, these costs can
be traded off, and the optimal choice depends on economics.
4.4.1.1
Wiring Cost
The extra wiring cost of the security enhancements is the cost of three extra wires on the test bus. In
a typical SoC with serial control of the core wrappers and a 32-bit path for test data, there is a minimum of
40 wires without the enhancements, and 43 with the enhancements, a 7.5% overhead.
4.4.1.2
Core Wrapper Area
Each core whose test data the SoC integrator decides to protect needs to have its own crypto hardware,
whether provided by the core supplier or by the SoC integrator. Assuming a 32-bit parallel test data path, the
wrapper must decrypt 32 bits of input test data while encrypting 32 bits of output test data. To implement
this with no additional latency, a stream cipher is used. 64 bits of keystream are required for each cycle of
the test clock. This is achieved by using a keystream generator that produces multiple bits per clock cycle.
The main additional hardware requirements for the security-enhanced core wrapper are:
• 32 XOR gates to decrypt the input test stimulus
• 32 XOR gates to encrypt the output test response
• a stream cipher that generates 64 bits of keystream per cycle of test clock
The Trivium [32] stream cipher meets the requirements. In its 64-bit form, it is equivalent to 5504 NAND
gates. Assuming the XOR gates are equivalent to 2.5 NAND gates, the XOR gates used in each core wrapper
are equivalent to 160 NAND gates. The total area overhead is therefore approximately 5700 NAND gates.
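The per-cycle datapath implied by these requirements can be written as a small behavioral model: 64 fresh keystream bits arrive each test-clock cycle, the lower 32 decrypt the incoming stimulus word and the upper 32 encrypt the outgoing response word. The keystream source is abstracted away here; in the design it is the 64-bit-per-cycle Trivium instance.

```python
def wrapper_cycle(stimulus_ct: int, response_pt: int, keystream64: int):
    """One test-clock cycle of the security-enhanced wrapper (behavioral sketch).
    stimulus_ct : 32-bit encrypted test stimulus word arriving on the parallel TAM
    response_pt : 32-bit plaintext test response word leaving the core
    keystream64 : 64 fresh keystream bits for this cycle"""
    ks_in = keystream64 & 0xFFFF_FFFF            # 32 bits feed the input XOR bank
    ks_out = (keystream64 >> 32) & 0xFFFF_FFFF   # 32 bits feed the output XOR bank
    stimulus_pt = stimulus_ct ^ ks_in            # decrypted word goes to the decompressor
    response_ct = response_pt ^ ks_out           # encrypted word goes out on the TAM
    return stimulus_pt, response_ct

# Round trip: the tester, holding the same keystream, recovers the response.
ks = 0x0123_4567_89AB_CDEF
pt_in, ct_out = wrapper_cycle(0xDEAD_BEEF ^ (ks & 0xFFFF_FFFF), 0xCAFE_F00D, ks)
assert pt_in == 0xDEAD_BEEF and ct_out ^ (ks >> 32) == 0xCAFE_F00D
```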
For an SoC with n cores with security-enhanced wrappers, the gate count cost is
overhead = n × 5700
(4.1)
The percentage gate count overhead is
\left( \frac{\sum_{i=1}^{n} (g_i + 5700)}{\sum_{i=1}^{n} g_i} - 1 \right) \times 100\%
(4.2)
where g_i is the number of NAND gate equivalents in core i. For example, for an SoC with an average of 100,000
NAND-equivalent gates per core, the area overhead is 5.7%.
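Equation (4.2) is straightforward to check numerically; the core sizes below are merely an example chosen to match the 100,000-gate average used in the text.

```python
def area_overhead_percent(core_gate_counts, per_core_cost=5700):
    """Percentage gate-count overhead of Equation (4.2)."""
    total = sum(core_gate_counts)
    with_security = sum(g + per_core_cost for g in core_gate_counts)
    return (with_security / total - 1) * 100

cores = [100_000] * 8          # illustrative SoC: eight cores of 100k NAND equivalents each
print(f"{area_overhead_percent(cores):.1f}%")   # prints 5.7%
```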
The die area overhead can be cut in half in cases where the core is already architected for scan-based
test. The flip-flops already associated with scan chains can be reconfigured to serve as the 288 flip-flop chain
of the Trivium cipher. The area overhead associated with the combinational logic of the cipher remains.
4.4.1.3
Test Controller Area
The cryptography can be handled by the test station or by the on-chip test controller. If it is handled
by the test controller, it must have hardware for generating random key bits, and for storing them. It
also needs the encryptor and decryptor blocks. If the cryptography is handled by the test station, the test
controller’s complexity is essentially the same as without the security enhancements.
If the designer prefers to perform the cryptography and key generation using the on-chip test controller,
then there are three cases. In the first case, the test controller only maintains information about a single core
at a time. This requires a key setup whenever addressing a new core. For this, the die area overhead in the
test controller associated with the security-enhanced TAM is the area of the stream cipher and XOR gates,
the equivalent of 5700 NAND gates. The second case is where all of the key setup is done at initialization time; the
die area overhead is 5700 + 12nk NAND gates, where n is the number of cores and k is the key length. The
third case is where cipher state is maintained by the test controller for each security-enhanced wrapped core.
Essentially, this means keeping a copy of the state register of the stream cipher, which is almost the same as
the test controller simply having separate instances of the stream cipher module for each security-enhanced
wrapped core with which it will communicate. The die area overhead of this is approximately 5700n NAND
gates. This minimizes test time, but is the most expensive option in terms of die area.
4.4.2
Test Time
The security enhancements do not affect the test clock speed, test duration, or test scheduling. How-
ever, the security-enhanced cores need to have their key registers initialized before testing can commence.
The worst-case test time overhead is the time to program all key registers, back-to-back. For a k-bit key
and n cores in the SoC, the worst case key initialization time is
tinit = kn
(4.3)
The key setup clock frequency can be assumed to be the same as the test clock frequency. The
percentage test time overhead is
\left( \frac{\sum_{i=1}^{n} t_i + kn}{\sum_{i=1}^{n} t_i} - 1 \right) \times 100\%
(4.4)
where t_i is the number of test clock cycles required to test core i. For example, assuming an 80-bit key length
and 50 cores in the SoC, the worst-case key initialization time is 4000 cycles of the key setup clock. If the
average core requires more than 8000 bits of test data, the test time overhead of the security enhancements
is less than 1%.
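The same worked example follows directly from Equations (4.3) and (4.4); the per-core test length of 8000 cycles is an assumption chosen to match the 8000-bit figure quoted above under one-bit-per-cycle delivery.

```python
def key_init_cycles(key_bits, n_cores):
    return key_bits * n_cores                         # Equation (4.3), worst case

def test_time_overhead_percent(per_core_test_cycles, key_bits):
    total = sum(per_core_test_cycles)                 # Equation (4.4)
    n = len(per_core_test_cycles)
    return (total + key_bits * n) / total * 100 - 100

cycles = [8000] * 50                                  # 50 cores, 8000 test cycles each (assumed)
print(key_init_cycles(80, 50))                        # 4000 cycles of key setup
print(f"{test_time_overhead_percent(cycles, 80):.2f}%")   # 1.00% at exactly 8000 cycles per core
```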
4.4.3
Effort for SoC Integrator
Under the proposed scheme, cores fall into three categories:
(1) cores with no security enhancements,
(2) cores shipped with security-enhanced wrappers as described in Section 4.3.1, and
(3) cores that were shipped with non-security-aware wrappers and had the security overwrapper added
as described in Section 4.3.2.
The effort for the SoC integrator for category-1 cores is zero. The effort for using category-2 cores is very
low. The signaling of the cores is consistent, so the per-core additional effort is just assigning the signals
to connect the core to the key-setup scan chain. Category-2 is the preferred category, achieving all of the
security benefits with minimal effort. Category-3 cores are slightly more effort for the SoC integrator, but
still minimal work. The difference is that category-2 cores come with the crypto modules already in place
whereas category-3 cores require the SoC integrator to add the crypto module between the core and the test
bus terminals.
4.5
Conclusion and Future Work
We presented a scheme that eliminates the risk of a malicious SoC core sniffing test data. The essential
contribution is a straightforward way of establishing cipher keys without any hard-coded secrets in the design.
The area overhead is under 6% and the test time overhead is under 1%. For typical chips where a minority
of the cores have secrecy-sensitive test data, only those cores need the security-aware wrapper, and the total
area overhead can be correspondingly reduced to 3% or less. Additional area savings are available when the
functional clock is more than 8 times higher in frequency than the test clock. In such cases, the Trivium
stream cipher can be used in a configuration that produces 8 bits at a time instead of 64 bits at a time,
and it can be clocked by the functional clock instead of by the test clock. This reduces the size of the main
source of area overhead, the stream cipher, by 32%. We showed how the SoC integrator can use his control
over the inter-core wiring to maintain the security of the test data. Future work using the same key setup
scheme will provide not only secrecy guarantees for the test data, but integrity guarantees as well. As the
number of cores in SoCs increases, and cores are obtained from an increasingly wide variety of sources, limiting
the damage done by a single rogue core has become an important concern with a practical solution.
Chapter 5
Integrity and Authenticity of Sensors
We propose a novel variety of sensor that extends the functionality of conventional physical unclonable
functions to provide authentication, unclonability, and verification of a sensed value. This new class of
device addresses the vulnerability in typical sensing systems whereby an attacker can spoof measurements
by interfering with the analog signals that pass from the sensor element to the embedded microprocessor.
The concept can be applied to any type of analog sensor.
5.1
Introduction
For sensing applications it is desirable that the system provide some degree of assurance regarding
the authenticity and veracity of measurements. One scheme for achieving this is to couple the sensor with
a trusted cryptography module that digitally signs the sensor data. Unfortunately, that scheme can provide
only limited assurance because the sensor is separate from the crypto module. As a result, the crypto module
has no mechanism for verifying the sensor data before signing it. Sensor-crypto separation is an architectural
vulnerability.
[Figure 5.1 diagram: (a) a sensor feeding a microcontroller (uC) that turns a challenge into a response; (b) the same arrangement with a deception device inserted between the sensor and the uC.]
Figure 5.1: The naïve secure sensor architecture does not bind the sensing with the cryptography, allowing
the analog link between the sensing element and the crypto processor to be easily attacked.
In the naïve architecture shown in Fig. 5.1, we illustrate how an attacker could interpose circuitry
between the sensor and the microcontroller. Using this extra circuitry, the attacker could cause the microcontroller to falsely report the sensed data. Our work aims to make this and other sensor attacks impractical.
5.1.1
Related Work
The technique of integrating a physical quantity into a PUF challenge-response computation is new,
but issues relating to the security of sensors in general have been studied. Secure remote sensors have been
developed for high security applications such as nuclear and chemical materials tracking [48]. Cryptographic
protocols and infrastructures have been developed for securing communication in sensor networks [49]. However, neither can extend the trust perimeter to include the sensing element itself.
Our work also builds on work from the PUF community. PUFs have emerged over the past decade as
a potent tool for hardware authentication and key generation at low cost [50]. Their low cost makes them
particularly attractive for use in the cost-sensitive sensor market and they serve as the foundation for our
work. Specifically, our example sensor makes use of non-homogeneous coatings, which have been used to
achieve per-chip uniqueness and unclonability [51] in conventional PUFs. We also make use of comparisons
of on-chip quantities that are selected by the challenge, a concept borrowed from ring-oscillator PUFs [52].
The problem of protecting analog sensor data before it enters the crypto-enabled digital domain is
related to the problem that digital rights management (DRM) systems have of protecting the media after
it leaves the crypto-enabled digital domain. This “analog hole” has been a vexing problem for the content
protection community [53].
Additionally, related work has been done in the media forensics community with camera image sensors.
Digital photos can be forensically attributed to the camera that took them because of distinctive anomalies
introduced by camera image sensors [54]. These features, like a PUF, are unique to each sensor that is
fabricated, and are reasonably stable across time and environmental conditions. However, unlike a PUF,
they are not functions in the sense of having a challenge and a response.
5.1.2
Our Contribution
We propose an architecture that eliminates sensor-crypto separation. By merging sensing with cryp-
tography, we raise the strength of the assurances that the system can provide. The device we propose is a
form of PUF that entwines sensing and challenge-response processing. We provide the following:
• definition of the security properties of the new device,
• a candidate design that targets those properties,
• a protocol used for making secure measurements, and
• an analysis of the candidate design.
[Figure 5.2 diagram: a standard PUF maps challenge bits to response bits; a sensor PUF maps challenge bits together with a physical quantity to response bits.]
Figure 5.2: A conventional silicon PUF has a binary input and a binary output. The sensor PUF has a
binary input, the physical quantity being sensed, and a binary output.
5.1.3
Security Properties
A traditional PUF [55] is a physical device that takes in a challenge and produces a response with the
following properties:
(1) For a given binary challenge, a PUF always produces the same response.
(2) One challenge-response pair leaks nothing about other pairs.
(3) The manufacturer of the PUF cannot predetermine the mapping.
The variation that we propose extends conventional PUFs by including two inputs: a physical quantity
and a traditional binary challenge. This system, which we call a sensor physical unclonable function,
has the following properties:
(1) For a given challenge and a given sensed quantity, the sensor PUF always produces the same response.
(2) One challenge-quantity-response triple leaks nothing about other triples.
(3) The manufacturer of the sensor PUF cannot predetermine the challenge-quantity-response mapping.
The third property on both lists is known as manufacturer resistance. Here, the manufacturer is
considered a potential adversary. From a black-box point of view, a PUF looks like a message authentication
code (MAC) operation. From the outside it is difficult to distinguish a true PUF from a MAC operation with
an embedded key. If the adversary replaces the PUF with a MAC module, the adversary could predetermine
the mapping without the user’s knowledge. Manufacturer resistance is limited in practice.
5.2
Candidate Sensor PUF
We present an example of a sensor PUF that measures light level. The inputs to the device are light
and a series of challenge bits. The output is a series of response bits. The light sensor PUF is profoundly
different from Pappu’s optical PUF [28]. Pappu’s challenge is a beam of coherent light and the response is
the light that scatters when the coherent beam passes through a mixture containing tiny beads. Pappu’s
optical PUF does not measure a physical quantity. Where a conventional PUF has a single input, the
challenge, the sensor PUF has two inputs. Operational protocols for using conventional PUFs dictate that
the challenge not be reused in the field. The physical quantity input to a sensor PUF can be repeated in the
field without compromising security. It is only the challenge bits that need to be excluded from reuse.
5.2.1
Structure
The candidate sensor PUF we propose consists of an array of on-chip photodiodes, a coating, and
some on-chip circuitry, as shown in Figure 5.3. The photodiodes are organized in groups of three. A coating
containing swirls of dark material in a translucent base is applied onto the sensor area. The nonuniform
optical transmittance of the coating results in per-chip variations in the optical sensitivity of each of the
photodiodes.
[Figure 5.3 block diagram: photodiode groups (PD1, PD2, PD3) under the coating feed offset and slope generators and adders; gate signals GL.1..GL.n and GR.1..GR.n select groups into two summations P and Q, which are compared to produce raw bits; the summation control logic is driven by a stream cipher whose key comes from a conventional PUF keyed by the challenge.]
Figure 5.3: The analog portion of the light level sensor PUF includes the coating, the photodiode groups,
the switches, the summing junctions, and the analog comparator. The challenge applied to the sensor PUF
determines the keystream input to the control circuit, which controls the random selection of left gate signals
GL.i and the right gate signals GR.i, which determine the set of sensors that are included in the summations.
The left and right sums are compared, producing one raw bit.
Each of the three identical photodiodes (PD1, PD2, and PD3) within a sensor group has its own
linear response function. The slope of the response function is generated by photodiode PD3 and the slope
generator circuit (see Figure 5.4). The slope is determined by the optical transmittance of the coating covering the
photodiode, along with non-varying factors (photodiode area, quantum efficiency, etc) and external factors
such as temperature.
[Figure 5.4 schematic: (a) an offset generator built from photodiodes PD1 and PD2, op-amps OPAMP1, OPAMP2, and OPAMP3, diodes D1 and D2, and a bias voltage V_bias; (b) a slope generator built around photodiode PD3.]
Figure 5.4: a: The offset generator produces a DC voltage that is determined by the optical transmittance of
the coating at the sites of photodiodes PD1 and PD2. b: The slope generator produces a voltage proportional
to the light input at photodiode PD3.
The offset of the response function for the group is determined by the optical transmittance from the
light source to photodiodes PD1 and PD2 (see Figure 5.4). The offset generator produces a DC voltage that
is determined by the coating on the sensor PUF as it independently affects each photodiode group.
As shown in Fig. 5.3, the currents from each photodiode group are brought to two summing junctions
where independent subsets of these currents are added together to produce two currents, P and Q. These
currents are compared to produce one bit of the raw binary result. The input challenge determines which of
the sensor groups will be included in the summations producing P and Q.
The gate signals that control the summations shown in Figure 5.3 are generated by summation control
logic. A conventional silicon PUF is used as a component in the architecture, to transform the public challenge
into a volatile secret initialization vector for the stream cipher. The stream cipher generates a keystream
that is used by the summation control logic to select which gate signals to enable for each comparison.
The summation control logic shifts the raw response bits from the comparator into a shift register.
[Figure 5.5: four panels plotting the output of the summations against the sensor input.]
Figure 5.5: In all four subfigures, the sensor input level is on the x-axis and the electrical response is on the
y-axis. Subfigure (a) shows eight photodiode group response lines generated by simulation of the candidate
light sensor PUF. The bold line is the sum of the eight lines. Subfigures (b), (c), and (d) show pairs of
response lines that occur for different values of the left and right gate signals. Assuming a sensor input value
of 10 and assuming that solid line > dashed line is interpreted as a “1”, evaluating the comparisons for the
line pairs shown in (b), (c), and (d) gives the raw bit sequence “0”, “1”, “0”. In our simulations, the raw
bit sequences are 256 bits long.
5.2.2
Protocol
The conventional PUF that is used as a component of the sensor PUF must be enrolled before the
sensor PUF itself can be enrolled. The exact procedure for enrolling the conventional PUF depends on what
kind of conventional PUF is used. The design and operation of conventional PUFs is outside the scope of
this research but has been developed by the PUF community [27].
The enrollment procedure for the sensor PUF consists of the following steps for each point along the
sensor domain:
(1) The enrolling party randomly selects (and removes) a challenge-response pair from the database.
The challenge is sent to the conventional PUF and the response is used as a cipher key.
(2) The enrolling party generates a unique random challenge and sends it to the sensor PUF.
(3) The sensor PUF uses the challenge to initialize the measurement logic, generates a vector of raw
bits, and sends the raw bits back to the enrolling party, encrypted.
(4) The enrolling party decrypts the raw response bits and inserts the challenge-measurement-response
triple into the database.
Making a measurement with the sensor PUF has the following steps.
(1) The querying party sends a challenge to the device. This challenge is used to establish a volatile
shared secret key between the querying party and the sensor PUF.
(2) The sensor PUF makes a conventional measurement and sends the encrypted conventional measurement to the querying party. This measurement is the claim.
(3) The querying party randomly selects (and removes) an entry in the challenge-measurement-response
database that matches the claimed measurement. The querying party sends the challenge to the
sensor PUF.
(4) The sensor PUF initializes the measurement logic using the challenge, performs n comparisons to
generate an n-bit vector of raw bits, encrypts them, and sends them to the querying party.
(5) The querying party decrypts the raw bits and compares them to the response that is in the challenge-measurement-response database. If the Hamming distance is below a threshold, the measurement
claim has been validated.
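The querying party's side of the steps above reduces to a lookup keyed by the claimed measurement plus a Hamming-distance test on the raw bits. Everything in the sketch below (the threshold, the database layout, the omission of the session-key establishment step) is illustrative rather than prescriptive.

```python
import random

THRESHOLD = 40          # maximum Hamming distance accepted as a match (illustrative)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def verify_claim(claimed_value, sensor_respond, crm_db):
    """crm_db maps a measured value to a list of (challenge, enrolled_response) pairs.
    sensor_respond(challenge) returns the raw response bits from the deployed sensor PUF
    (already decrypted; the session-key exchange is omitted from this sketch)."""
    entries = crm_db.get(claimed_value)
    if not entries:
        return False
    challenge, enrolled = entries.pop(random.randrange(len(entries)))  # never reuse a triple
    observed = sensor_respond(challenge)
    return hamming(observed, enrolled) <= THRESHOLD

# Toy demonstration with a noiseless stand-in for the deployed sensor PUF:
db = {10: [(0x1234, 0xABCDEF)]}
assert verify_claim(10, lambda ch: 0xABCDEF ^ 0b111, db)   # 3 flipped bits, within threshold
```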
5.3
Electrical Analysis and Experimental Results
In order to thoroughly develop the analysis of the sensor PUF, the candidate sensor PUF was studied
using:
• SPICE simulations of the analog modules,
• a C program to simulate the statistical operations of the sensor PUF,
• analysis according to standard models, and
• discrete implementation of critical analog modules.
5.3.1
Assumptions
We assume that the optical coating is a non-homogeneous mixture of two epoxy-like materials which
attenuate the wavelengths of interest: one material has an optical transmittance of 0.25, and the other
material has an optical transmittance of 0.75. Additionally, we assume the transmittances affecting each of
the photodiodes are independent and uniformly distributed over the range [0.25, 0.75]. Lastly, we assume
that the photodiodes that are operated in the reverse-biased mode have a linear current-versus-light response.
5.3.2
Distribution of Cut Points
To evaluate the sensing resolution of the sensor PUF, we analyze the probability distribution of the
cut points in the sensing domain. The cut points define the intervals in the sensing domain that can be
distinguished by the sensor’s response. Each cut point is the x-coordinate of the intersection of the response
lines formed by the left and right summations shown in the previous section and illustrated in Figure 5.5.
Since we have designed each sensor group response function to have an independent slope and offset, we can
statistically analyze the expected crossing points which determine the cut points.
The offset of the voltage-versus-light response of each photodiode group follows from the linearity of
photodiodes and the simplified Shockley ideal diode equation.
I_D = I_S \, e^{V_D / V_T}
(5.1)
We assume that all of the photodiodes have equal intrinsic response, and the differences in their actual
response are caused purely by the differences in their coatings. Using this, we can calculate the offset voltage
as follows:
\frac{I_{PD_{i,1}}}{I_{PD_{i,2}}} = \frac{c_{i,1}}{c_{i,2}}
(5.2)
V_D = V_T \left( \ln I_D - \ln I_S \right)
(5.3)
V_{D_{i,1}} = V_T \left( \ln\!\left( \frac{c_{i,1}}{c_{i,2}} I_{PD_{i,2}} \right) - \ln I_S \right)
(5.4)
V_{D_{i,1}} = V_T \left( \ln \frac{c_{i,1}}{c_{i,2}} + \ln I_{PD_{i,2}} - \ln I_S \right)
(5.5)
V_{D_{i,2}} = V_T \left( \ln I_{PD_{i,2}} - \ln I_S \right)
(5.6)
V_{D_{i,1}} - V_{D_{i,2}} = V_T \left( \ln \frac{c_{i,1}}{c_{i,2}} + \ln I_{PD_{i,2}} - \ln I_S - \ln I_{PD_{i,2}} + \ln I_S \right)
(5.7)
V_{D_{i,1}} - V_{D_{i,2}} = V_T \ln \frac{c_{i,1}}{c_{i,2}}
(5.8)
where I_{PD_{i,1}} is the current in photodiode 1 in photodiode group i.
The slopes of the response function are simply proportional to c_{i,3}, the optical transmittance of the
coating affecting photodiode 3 in photodiode group i.
The offset generator takes two uniform random variables and calculates their ratio. This results in a
uniform ratio distribution, as shown in Figure 5.6. The natural log of the ratio distribution produces the
distribution shown in Figure 5.7.
[Figure 5.6: histogram of P(ratio) versus ratio.]
Figure 5.6: The probability density function of offset current ratios observed in simulation.
[Figure 5.7: histogram of P(offset) versus offset.]
Figure 5.7: The density function of the offset signal of each individual photodiode group.
The sum lines are the sum of logs of uniform ratio distributions. Since these distributions have
finite mean and variance, we can invoke the central limit theorem to infer that the resulting distribution is
approximately Gaussian.
To determine the cut point, we considered the random variable that represents the x-coordinate of
intersection of the two sum lines that are selected by the challenge. The cut point in the sensor input domain
is at:
\mathrm{cut} = \frac{B_b - A_b}{A_m - B_m}
(5.9)
where Bb is the offset of line B, Ab is the offset of line A, Am is the slope of line A, and Bm is the
slope of line B. The PDF of the numerator, fnum , is the difference of two independent random variables
and is given by the cross-correlation of their density functions.
The PDF of the denominator, fden , is also the difference of two independent random variables and is
given by the cross-correlation of their density functions.
Since we approximate the slopes and offsets of the sum lines as Gaussian, the numerator and denominator are also Gaussian. Simulation results support the zero-mean Gaussian model for these variables. The
cut points are the ratio of these zero mean Gaussian variables, and the resulting ratio distribution is Cauchy.
This result is further supported by the simulation results, as shown in Figure 5.8.
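A few lines of simulation reproduce this heavy-tailed behavior: draw slopes and offsets for two sum lines under the stated coating assumptions and examine the crossing points of Equation (5.9). The group count and unit gains are arbitrary illustrative choices, so the scale of the resulting distribution will differ from the γ = 20 of Figure 5.8.

```python
import math, random

def sum_line(n_groups=8):
    """Slope and offset of one summation line under the stated assumptions:
    transmittances uniform on [0.25, 0.75], per-group offset proportional to
    ln(c1/c2), per-group slope proportional to c3 (gain constants set to 1)."""
    slope = offset = 0.0
    for _ in range(n_groups):
        c1, c2, c3 = (random.uniform(0.25, 0.75) for _ in range(3))
        offset += math.log(c1 / c2)
        slope += c3
    return slope, offset

def cut_point():
    (a_m, a_b), (b_m, b_b) = sum_line(), sum_line()
    return (b_b - a_b) / (a_m - b_m)             # Equation (5.9)

samples = sorted(cut_point() for _ in range(100_000))
q1, med, q3 = samples[25_000], samples[50_000], samples[75_000]
tail = sum(abs(x - med) > 10 * (q3 - q1) for x in samples) / len(samples)
print(f"median {med:.2f}, IQR {q3 - q1:.2f}, fraction beyond 10 IQRs {tail:.3f}")
# A Gaussian would put essentially nothing beyond 10 IQRs; the heavy Cauchy-like
# tail of the cut points leaves a noticeable fraction of the samples out there.
```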
[Figure 5.8: probability density P(cutpoint = x) versus sensor input value x.]
Figure 5.8: The green trace shows the probability density function of the cut points in the sensor input
domain, as observed in simulation. The red trace shows the Cauchy density function for χ = 0 and γ = 20.
5.3.3
Hamming Distance
To evaluate whether the raw response of the sensor PUF is brittle with respect to changes in the
input, we simulated the raw sensor response. For several randomly chosen challenges and randomly generated
instances of the candidate sensor PUF, we selected a reference sensor input value and measured the Hamming
distance between the raw response for the reference input and the raw response for sensor input values in
the neighborhood of the reference value.
[Figure 5.9: Hamming distance versus sensor input.]
Figure 5.9: Hamming distance for five statistically independent instances of the candidate sensor PUF
In Figure 5.9, the Hamming distance is shown for five statistically independent instances of the
candidate sensor PUF design. The raw bit vectors are 256 bits in length. A reference bit vector is taken
for arbitrary reference sensor input values -10, 0, and 10 using a reference challenge. While continuing to
apply the reference challenge, the sensor input value is then swept from -30 to 30, in steps of 1, while the
resulting raw output bits are compared with the raw output bits produced for each of the reference sensor
inputs. From the figure, we see that the candidate sensor PUF’s raw response is not brittle with respect to
the sensor input. For a given challenge, measuring a set of close physical quantities will generate a set of raw
bit responses that are close in code space. This makes it suitable for error correction using a linear code.
However, it also means that the raw bit responses leak information. If a challenge is repeated (which should
never happen in practice), an attacker can infer from the raw bit responses whether the sensor inputs are
close or far. From a practical standpoint, this problem is addressed by encrypting the response that is sent
back to the querying party.
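The sweep described in this subsection can be mimicked with the same simplified line model; the group count, the challenge model, and the reference input below are illustrative stand-ins for the full candidate design.

```python
import math, random

def make_instance(n_groups=64):
    """One simulated sensor PUF instance: per-group (slope, offset), with the
    offset drawn from the same ln(c1/c2) coating model as above."""
    inst = []
    for _ in range(n_groups):
        c1, c2, c3 = (random.uniform(0.25, 0.75) for _ in range(3))
        inst.append((c3, math.log(c1 / c2)))
    return inst

def raw_bits(instance, challenge_rng, x, n_bits=256):
    """For each comparison the challenge-derived keystream assigns every group to
    the left sum, the right sum, or neither; the raw bit is 1 if left > right at x."""
    bits = []
    for _ in range(n_bits):
        sides = [challenge_rng.choice((0, 1, None)) for _ in instance]
        left = sum(m * x + b for (m, b), s in zip(instance, sides) if s == 0)
        right = sum(m * x + b for (m, b), s in zip(instance, sides) if s == 1)
        bits.append(int(left > right))
    return bits

puf = make_instance()
reference = raw_bits(puf, random.Random(42), x=10)       # reference input value 10
for x in (-30, -10, 0, 10, 30):
    probe = raw_bits(puf, random.Random(42), x=x)        # same seed = same challenge
    distance = sum(a != b for a, b in zip(reference, probe))
    print(f"sensor input {x:>4}: Hamming distance {distance}")
```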
Figures 5.8 and 5.9 define the range and sensitivity of the sensor PUF with respect to the physical
quantity being measured. The Cauchy PDF of the cut points peaks at the origin and falls to half of its
peak value at x = 20. The cut point density determines the sensitivity (slope) of the Hamming distance to
differences in physical input value. From a design standpoint, the gains of the offset generator and slope
generator circuits can be tuned to widen or narrow the cut point PDF, which in turn widens or narrows
the range of the sensor. Outside the useful range of physical quantity values, the raw response saturates,
converging to a fixed pattern for a given challenge.
5.3.4
Verification of Offset Generator
The simulations of the behavior of the candidate light sensor PUF assume that the offset generator
produces a voltage that is independent of the sensor input. If the offset generator’s output is affected by
the sensor input, the response will be nonlinear. Linearity is not a requirement for the light sensor PUF
to function, but the assumption does simplify analysis. We validated the independence assumption with
SPICE simulation and with experimental data in the lab using discrete optoelectronic components, shown
in Figures 5.10 and 5.11.
[Figure 5.10: measured offset voltage (mV) versus LED drive current (A), for three relative transmittances.]
Figure 5.10: The offset generator circuit shown in Figure 5.4 was constructed and tested using light from
an LED driven by a variable current source. The offset voltage is plotted across the range of currents for
three different relative transmittances. We see ±2.5% variation over the range.
[Figure 5.11: simulated offset voltage (mV) versus LED drive current (A).]
Figure 5.11: SPICE simulation of the offset generator output voltage across a range of light intensity values.
5.4
Security Context of Sensor PUFs
Sensor modules are often deployed in the field, outside the physical control of those who are deploying
them. They are subject to tampering. A motivated attacker can disassemble, break, or modify any part of
a sensor module. The attacker can also attempt to substitute an inauthentic sensor. These attacks will be
much more difficult to execute against a sensor PUF than a conventional sensor. The most straightforward
tampering attack is to decouple the sensing from the challenge-response calculation. For example, if the
calculation is performed by a microcontroller with an analog input and the sensor has an analog output
connected to the microcontroller, that analog voltage is vulnerable to tampering.
5.4.1
Substitution
The resistance of a sensor PUF to being substituted by an inauthentic sensor comes from two things.
First, the attacker needs to defeat the conventional PUF within the sensor PUF architecture. Without
defeating that, the attacker is unable to establish a shared crypto key. Second, without knowledge of the
slopes and offsets of the sensor responses, it is impossible for the attacker to know what response corresponds
to a particular physical quantity and a challenge sent by the querying party.
5.4.2
Tampering
The main objective of the sensor PUF is to defend against the low-budget attack of tampering with
the analog signal that goes from the sensing element to the microprocessor. In the candidate light sensor
PUF, tampering with the analog signal is not a low-budget attack. The attacker could probe the on-chip
signals while sweeping the physical quantity with the goal of learning the slopes and offsets of each of the
photodiode groups. This would have to be done without disturbing the optical coating. This could only be
done in a properly-equipped lab by skillful staff. The candidate sensor PUF thus significantly raises the cost
of a tampering attack.
There is a tradeoff between tamper resistance and robustness of operation. Assuming that the protocol
outlined in subsection 5.2.2 is followed, a threshold is applied to the Hamming distance between the enrolled
response and the response produced when the sensor has been deployed. Beyond this threshold, the sensor’s
response will be rejected. To an extent, the threshold can be minimized by extracting more challenge-response pairs during enrollment, so that the distance between the claim and the enrolled measurement is small, thus
minimizing the expected distance of the raw responses. Even without constraints on the duration of the
enrollment process or the size of the database, physical variation puts a practical lower bound on the threshold
that can be used for reliable results. If the threshold is too high, the attacker can perform a less accurate and
therefore cheaper extraction of the sensor's unique physical features, thus weakening the security of the device.
The optimal threshold balances these concerns.
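The acceptance rule itself is simple; a minimal sketch, with an arbitrary threshold chosen only to illustrate the trade-off, follows:

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

def accept(claimed_bits, enrolled_bits, threshold):
    # Accept the reading only if it is close enough to the enrolled response.
    # A low threshold improves tamper resistance but rejects more honest noise;
    # a high threshold tolerates noise but makes approximate cloning easier.
    return hamming(claimed_bits, enrolled_bits) <= threshold

enrolled = [0, 1, 1, 0, 1, 0, 0, 1]
noisy    = [0, 1, 1, 1, 1, 0, 0, 1]   # one bit flipped by measurement noise
print(accept(noisy, enrolled, threshold=2))   # True: within tolerance
print(accept(noisy, enrolled, threshold=0))   # False: rejected as too far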
5.4.3
Manufacturer Resistance
Many sensors are high-volume, low-cost items and are subject to the risks of outsourced fabrication
and assembly [56]. If there are significant monetary or strategic incentives for breaking the security of the
system, and not too much risk, then a rational model of human behavior predicts that such an attempt will
be made. The issue of manufacturer resistance is not absolute. It has gradations of strength in the context
of conventional PUFs and the same applies to sensor PUFs. Since the application of the optical coating is a
security-critical procedure, it can be delayed until after the chips are delivered from the foundry, assuming
they are not packaged. This somewhat reduces the risk of the manufacturer predetermining the responses
instead of faithfully executing the design that produces a random challenge-response function. Nevertheless,
a determined adversary with control over the mask and chip fabrication can manipulate the behavior of
conventional silicon PUFs, sensor PUFs, and practically any other IC.
5.4.4
Sensor Decoupling
Sensor decoupling means that the entire sensor unit is separated from the physical quantity it is
intended to measure. For example, a thermometer can be placed inside an insulated box, causing it to report
the temperature inside the box instead of the ambient temperature. This threat depends on what is being
sensed and how the sensor is deployed. This important problem is unfortunately outside the scope of the
assurances provided by a sensor PUF.
5.5
Security Analysis
The security guarantees that the sensor PUF aims to provide to the reader are:
(1) The response reflects the current physical observation; one made between when the challenge was
sent and when the response was received by the reader.
(2) The response is authentic; it was unambiguously generated by the individual sensor PUF the reader
intended to query.
(3) The response is accurate; it conveys unambiguous and correct information about the quantity
being measured.
Any threat to the security of the sensor PUF must entail at least one of the security guarantees
being violated. For example, there exists the threat that a sensor PUF can be cloned. How that threat
materializes depends on the attack or combination of attacks chosen by the attacker.
5.5.1
Attack Model
The prospective sensor PUF attacker can choose from several possible attacks. The attacks differ in
their cost, in what they aim to achieve, and in their requirements.
(1) Black-box Passive: The attacker eavesdrops on the communication between the reader and the sensor
PUF (e.g., over a network).
(2) Black-box Half-active: The attacker (without authorization) sends requests to the sensor PUF and
receives the responses.
(3) Black-box Active: The attacker acts as an active man in the middle between the reader and the
sensor PUF.
(4) Passive Probing: The attacker directly observes secret bits or signals internal to the sensor PUF.
(5) Active Probing: The attacker forces bits internal to the sensor PUF.
(6) Incremental Reassembly: The attacker begins by collecting many challenge-response-measurement
triples from the sensor PUF. Then the attacker removes the coating from one photodiode group
and then incrementally adds the coating back while querying the sensor PUF. When the responses
match the responses obtained before tampering, the attacker concludes that the amount of coating
added back provides the same optical transmittance as the original coating over that photodiode
group. The attacker then repeats this process at the next photodiode group until he has learned the
optical transmittance of the coating over every photodiode group.
(7) Reader host compromise: The attacker penetrates the reader host, gaining root-level access.
(8) Reader software compromise: The attacker replaces the authentic reader software with malicious
reader software. This includes rollback to a previous software version.
(9) Sensor-Environment Decoupling: As described in Section 5.4.4.
5.5.2
Attack Trees
We approach each of these threats as the topmost node of an attack tree [57]. The sensor PUF security
guarantees given above can be reformulated as attack goals. The attacker aims to:
(1) Have the reader accept old sensor PUF measurements as being fresh.
(2) Clone a sensor PUF.
(3) Have the reader believe that the sensed quantity is X when it is Y, where X and Y are significantly different.
5.5.2.1
Replaying an Old Measurement
One attack goal is to replay an old measurement. This can succeed if the challenge is repeated.
Normally the challenge is never repeated. As shown in Figure 5.12, there are two ways to get the reader to
repeat the challenge.
Figure 5.12: Attack tree for replay. (Root: Replay an Old Measurement; subgoal: Reader Reuses Challenge; leaves: Compromise Reader Host, Distribute Faulty Software to Reader.)
The attacker can compromise the computer that is acting as the reader (i.e., getting root-level access),
or the attacker can undermine the reader software, arranging for it to reuse challenges. If the reader
functionality is offloaded to trusted hardware, it is possible to ensure that compromising the reader host
system will not enable challenge reuse. With a trusted reader module, the reader module can sign the
measurement it gets from the sensor PUF. The application that uses the data still needs to ensure that its
communication with the trusted reader module is trustworthy, including resistance to replay.
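A hedged sketch of this reader-side discipline is shown below. The trusted_reader_module interface, the shared MAC key, and the placeholder measurement are assumptions made for illustration; the point is that challenges come from a CSPRNG (so reuse is effectively impossible) and that a measurement is accepted only with a valid tag bound to the fresh challenge.

import hashlib, hmac, secrets

MODULE_KEY = secrets.token_bytes(32)  # assumed key shared by application and trusted reader module

def trusted_reader_module(challenge):
    # Stand-in for the trusted hardware: it queries the sensor PUF (stubbed
    # here) and binds the measurement to the challenge with a MAC.
    measurement = b"light=42"  # placeholder for the real sensor PUF reading
    tag = hmac.new(MODULE_KEY, challenge + measurement, hashlib.sha256).digest()
    return measurement, tag

def application_read():
    challenge = secrets.token_bytes(16)  # fresh random challenge, never reused in practice
    measurement, tag = trusted_reader_module(challenge)
    expected = hmac.new(MODULE_KEY, challenge + measurement, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # replayed or altered data fails here
        raise ValueError("measurement rejected")
    return measurement

print(application_read())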
5.5.2.2
Cloning a Sensor PUF
Cloning a sensor PUF is another attack goal. The black-box passive sniffing attack is to observe and
record a large number of interactions between the reader and the sensor PUF. Eventually the eavesdropper
will have observed sufficiently many readings that he can answer future queries from memory. This kind of
passive attack is not practical because a very large number of observations is needed and the actual rate of
sensor PUF read operations in the benign application is unlikely to be high.
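In effect, the passive eavesdropper is limited to building a lookup table of the exchanges he happens to observe, as in the sketch below; because the challenge space is large and benign reads are infrequent, the table covers a negligible fraction of possible queries.

observed = {}  # the eavesdropper's "clone" is just a table of seen exchanges

def eavesdrop(challenge, response):
    observed[challenge] = response

def emulate(challenge):
    # The clone can only answer challenges it has already seen.
    return observed.get(challenge)

eavesdrop(b"\x01\x02", b"raw bits ...")
print(emulate(b"\x01\x02"))   # answered from memory
print(emulate(b"\x99\x98"))   # unseen challenge: no answer (None)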
Figure 5.13: Attack tree for cloning. (Root: Obtain a Clone of a Sensor PUF; leaves: Incremental Reassembly, Black-box Passive Sniffing, Passive Probing.)
Passive probing, as shown in Figure 5.13 is another approach to cloning a sensor PUF. It entails
measuring signals while the sensor PUF is operating. For example, the analog outputs of each of the
photodiode groups can be probed. For each photodiode group, a minimum of two measurements is needed
to determine the optical intensity response line. When the optical intensity response line for every photodiode
group is known to the attacker, the sensor PUF can be cloned or emulated easily. However, the attacker
needs to probe the signals without disturbing the signals or the coating on the surface of the chip.
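The payoff of successful probing is easy to state concretely: two probed (intensity, voltage) points per photodiode group determine that group's response line, after which the group can be emulated. A small sketch with made-up probe values:

def fit_line(p1, p2):
    # Recover slope and offset of one photodiode group's response line from
    # two probed (light_intensity, output_voltage) points.
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

# Hypothetical probe data for three photodiode groups (arbitrary values).
probed = {
    "group0": ((0.1, 0.22), (0.5, 0.90)),
    "group1": ((0.1, 0.15), (0.5, 0.35)),
    "group2": ((0.1, 0.40), (0.5, 1.60)),
}
model = {g: fit_line(*pts) for g, pts in probed.items()}

def emulate_group(group, intensity):
    slope, offset = model[group]
    return slope * intensity + offset  # emulated analog output of that group

print(model)
print(emulate_group("group0", 0.3))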
The incremental reassembly attack, described above, is invasive but does not involve die probing. It
is similar to guessing a password letter by letter. Properly designed systems preclude this type of incremental guessing in a black-box attack model. However, when invasive techniques are applied, incremental
guessing becomes possible. As with the passive probing attack, incremental reassembly provides the attacker
with complete information regarding the randomness resulting from the optical coating on the sensor PUF.
However, a working cloning attack still requires cloning the conventional PUF in the sensor PUF.
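The structure of the incremental reassembly attack is essentially the following loop; the strip, add_coating, and query_device callables are stand-ins for physical lab steps rather than software:

def responses_match(recorded_triples, query_device):
    # True if the tampered device now answers every recorded
    # (challenge, measurement) pair with the originally recorded response.
    return all(query_device(c, m) == r for c, m, r in recorded_triples)

def incremental_reassembly(groups, recorded_triples, strip, add_coating, query_device):
    learned = {}
    for g in groups:
        strip(g)                    # remove the coating over one group
        thickness = 0.0
        while not responses_match(recorded_triples, query_device):
            add_coating(g, 0.01)    # add coating back in small increments
            thickness += 0.01
        learned[g] = thickness      # match => same transmittance as before
    return learned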
5.5.2.3
Inducing Error in the Reported Measurement
To induce errors in the measurements that the reader gets from the sensor PUF, the attacker can use
the attacks that are used for cloning or replay, as shown in Figure 5.14.
Figure 5.14: Attack tree for inducing errors in measurement. (Root: Induce Error in Measurement; branches: Reader-side Compromise, via Compromise Reader OS or Distribute Faulty Software to Reader; Alter Authentic Sensor PUF's Physical Input; Tunnel Queries to Cloned SPUF, via Clone SPUF by Incremental Reassembly or Black-box Half-active.)
Additionally, the attacker can execute the decoupling attack described in Section 5.4.4: the attacker can keep the authentic sensor PUF in place and simply alter the physical input that arrives at the sensor. For example, the attacker
can put a piece of tinted glass in front of the light sensor PUF to cause it to under-report the actual light
level.
5.6
Future Work in Sensor PUFs
The sensor PUF is a promising mechanism for securing remote sensors. The effects of temperature are
very important for conventional PUFs and sensor PUFs as well [58]. We are currently evaluating temperature
effects in the candidate sensor PUF design.
The candidate design uses a conventional PUF for communication security. This allows the raw bits
from the comparator to be sent back to the querying party, which in turn allows fuzzy matching to be
performed by the querying party. Error correction is well established for conventional PUFs [59], and can be
applied to sensor PUFs as well. If error correction coding is applied in the sensor PUF to obtain a precisely
repeatable response across time and environmental conditions, the sensor PUF would obtain interesting
capabilities. For example, a message could be encoded so that the sensor PUF can decrypt it only under
bright light, or only in the dark, or, for an alcohol sensor PUF, only when the alcohol concentration is in a
certain range.
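A hedged sketch of this idea follows. The majority-vote "correction" and the hash-based key derivation are placeholders chosen only to make the behavior concrete; a real design would use an established error-correcting code and key-derivation function:

import hashlib

def correct(raw_bits, block=8):
    # Hypothetical error-correction stand-in: majority-vote each block of raw
    # bits so that small measurement noise maps to the same codeword.
    blocks = [raw_bits[i:i + block] for i in range(0, len(raw_bits), block)]
    return bytes(int(sum(b) > len(b) // 2) for b in blocks)

def derive_key(raw_bits):
    return hashlib.sha256(correct(raw_bits)).digest()

# Enrollment: record the key derived under the target condition (bright light).
bright_raw = [1] * 60 + [0] * 4           # toy raw response in bright light
unlock_key = derive_key(bright_raw)

# In the field, the key can be re-derived only if the current measurement is
# close enough to the enrolled condition.
noisy_bright = [1] * 58 + [0] * 6         # bright light again, slightly noisy
dark = [0] * 64                           # dark: corrects to a different codeword
print(derive_key(noisy_bright) == unlock_key)   # True  -> message can be unwrapped
print(derive_key(dark) == unlock_key)           # False -> message stays locked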
The raw response bits generated by the comparator leak information about what the raw response
will be for other physical quantities with the same challenge. In our candidate design, we encrypt the output
to prevent an attacker from exploiting this leakage. It would be better if the sensor PUF had this property
as an integral part of the basic challenge-measurement-response functionality. How to achieve this is an open
problem.
Chapter 6
Future Work in Hardware Security
Hardware security has the potential to protect existing technology applications and to enable new
ones. Future work at the intersection of hardware testing and hardware security can maximize its enabling
effect by focusing on providing practical assurance to system designers and ultimately to system users.
Practical assurance is a fusion of security and reliability. Although engineers have traditionally kept
security and reliability separate, the distinction is in many cases not useful. System designers have the burden
of considering the failure of each of the subsystems of their design. Practical assurance would provide all-encompassing statements about the likelihood of a subsystem failing. This would promote high-level system
design, as opposed to system design where the low-level details of each of the subsystems are considered at
multiple levels of system abstraction and at multiple phases of the design process. Practical assurance has
the promise of simplifying the designer’s job, improving the dependability of the end product, and reducing
design costs in both labor and time.
A recurring theme in this dissertation is the trade-off between security and testability. A common
way of making progress in a trade-off space is to introduce a figure of merit that is the conjunction of the
desired features. The simplest such figure of merit in our context would be the security-testability product.
Abstractly, a figure of merit like this would guide progress toward practical solutions that do not favor one
metric over the other (e.g., good security with bad testability). However, although quantitative metrics
for security are being researched [60], they are not yet mature. Rigorously defining a combined figure of
merit like the security-testability product would be an important step toward architectures that can provide
practical assurance.
Chapter 7
Publications
(1) Security-Aware SoC Test Access Mechanisms, Rosenfeld, K.; Gavas, E.; Karri, R.; 2011 IEEE VLSI
Test Symposium
(2) Roadmap for Trusted Hardware; Part II: Trojan Detection Solutions and Design-for-Trust Challenges, Tehranipoor, M.; Salmani, H.; Zhang, X.; Wang, M.; Karri, R.; Rajendran, J.; Rosenfeld,
K.; IEEE Computer Magazine, 2011
(3) Security and Testing; Rosenfeld, K.; Book Chapter in Introduction to Hardware Security and Trust,
Edited by Mohammad Tehranipoor and Clifford Wang
(4) Security Challenges During VLSI Test, Hely, D.; Rosenfeld, K.; 2011 International NEWCAS Conference
(5) Sensor physical unclonable functions, Rosenfeld, K.; Gavas, E.; Karri, R.; 2010 IEEE International
Symposium on Hardware-Oriented Security and Trust (HOST)
(6) Attacks and Defenses for JTAG, Rosenfeld, K.; Karri, R., IEEE Design and Test of Computers, 2010
(7) Trustworthy Hardware: Identifying and Classifying Hardware Trojans, Karri, R.; Rajendran, J.;
Rosenfeld, K.; Tehranipoor, M., IEEE Computer Magazine, 2010
(8) JTAG Attacks and Defenses, Rosenfeld, K.; Karri, R., Presented at the 2009 IEEE North Atlantic
Test Workshop
(9) Volleystore: A Parasitic Storage Framework, Rosenfeld, K.; Sencar, H.; Memon, N.; 2007 IEEE
Information Assurance and Security Workshop
(10) A study of the robustness of PRNU-based camera identification, Rosenfeld, K.; Sencar, H.; 2007
SPIE Media Forensics and Security
Bibliography
[1] IEEE Std 1149.1-2001, Test Access Port and Boundary-Scan Architecture.
[2] F. DaSilva, Y. Zorian, L. Whetsel, K. Arabi, and R. Kapur, “Overview of the IEEE P1500 standard,”
vol. 1, sep. 2003, pp. 988 – 997.
[3] M. Sipser, Introduction to the Theory of Computation, 1st ed. International Thomson Publishing, 1996.
[4] A. Rukhin, J. Soto, J. Nechvatal, E. Barker, S. Leigh, M. Levenson, D. Banks, A. Heckert, J. Dray,
S. Vo, A. Rukhin, J. Soto, M. Smid, S. Leigh, M. Vangel, A. Heckert, J. Dray, and L. E. Bassham III, “A
statistical test suite for random and pseudorandom number generators for cryptographic applications,”
2001.
[5] B. Yang, K. Wu, and R. Karri, “Scan based side channel attack on dedicated hardware implementations
of data encryption standard,” in Proceedings of the IEEE Int Test Conference, 2004, pp. 339–344.
[6] D. Hely, M.-L. Flottes, F. Bancel, B. Rouzeyre, N. Berard, and M. Renovell, “Scan design and secure
chip [secure ic testing],” in On-Line Testing Symposium, 2004. IOLTS 2004. Proceedings. 10th IEEE
International, july 2004, pp. 219 – 224.
[7] B. Yang, K. Wu, and R. Karri, “Secure scan: a design-for-test architecture for crypto chips,” in
Proceedings of IEEE/ACM Design Automation Conference, 2005, pp. 135–140.
[8] J. Lee, M. Tehranipoor, and J. Plusquellic, “A low-cost solution for protecting IPs against scan-based
side-channel attacks,” in Proc. VLSI Test Symp. Citeseer, 2006, pp. 94–99.
[9] R. Rajsuman, “Design and test of large embedded memories: An overview,” Design Test of Computers,
IEEE, vol. 18, no. 3, pp. 16 –27, may 2001.
[10] B. Yang and R. Karri, “Crypto bist: A built-in self test architecture for crypto chips,” in Proceedings
of the 2nd Workshop on Fault Diagnosis and Tolerance in Cryptography (FDTC05), 2005, pp. 95–108.
[11] The Dishnewbies Team. JTAG guide. [Online]. Available: http://dishnewbies.com
[12] Free60. Free60 smc hack. [Online]. Available: http://www.free60.org/SMC Hack
[13] K. Rosenfeld and R. Karri, “Attacks and defenses for jtag,” Design Test of Computers, IEEE, vol. 27,
no. 1, pp. 36 –47, jan. 2010.
[14] L. Sourgen, “Us patent 5264742, security locks for integrated circuit,” 1993.
[15] R. Buskey and B. Frosik, “Protected jtag,” in Parallel Processing Workshops, 2006. ICPP 2006
Workshops. 2006 International Conference on, 0-0 2006, pp. 8 pp.–414.
[16] C. Clark and M. Ricchetti, “A code-less bist processor for embedded test and in-system configuration
of boards and systems,” in Test Conference, 2004. Proceedings. ITC 2004. International, oct. 2004, pp.
857 – 866.
[17] C. J. Clark. Business considerations for systems with rambased fpga configuration. [Online]. Available:
http://www.intellitech.com/pdf/FPGA-security-FPGA-bitstream-Built-in-Test.pdf
[18] V. Iyengar, K. Chakrabarty, and E. Marinissen, “Efficient test access mechanism optimization for
system-on-chip,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on,
vol. 22, no. 5, pp. 635 – 643, may 2003.
[19] K. Rosenfeld and R. Karri, “Security-Aware SoC Test Access Mechanisms,” in Proceedings of the 2011
IEEE VLSI Test Symposium, 2011.
[20] K. Koscher, A. Czeskis, F. Roesner, S. Patel, T. Kohno, S. Checkoway, D. McCoy, B. Kantor, D. Anderson, H. Shacham, and S. Savage, “Experimental security analysis of a modern automobile,” in Security
and Privacy (SP), 2010 IEEE Symposium on, may 2010, pp. 447 –462.
[21] D. Halperin, T. Heydt-Benjamin, B. Ransford, S. Clark, B. Defend, W. Morgan, K. Fu, T. Kohno, and
W. Maisel, “Pacemakers and implantable cardiac defibrillators: Software radio attacks and zero-power
defenses,” in Security and Privacy, 2008. SP 2008. IEEE Symposium on, may 2008, pp. 129 –142.
[22] IEEE Std 1532-2002, In-System Configuration of Programmable Devices.
[23] satcardsrus.com. Secure loading advanced blocker 3m’s for rom images. [Online]. Available:
http://www.satcardsrus.com/dish net%203m.htm
[24] The Dishnewbies Team. JTAG guide. [Online]. Available: http://dishnewbies.com/jtag.shtml
[25] B. Yang, R. Karri, and K. Wu, “Secure scan: a design-for-test architecture for crypto chips,” in
Proceedings of the IEEE/ACM Design Automation Conference, 2005, pp. 135–140.
[26] F. Novak and A. Biasizzo, “Security extension for IEEE Std 1149.1,” Journal of Electronic Testing,
vol. 22, no. 3, pp. 301–303, 2006.
[27] G. Suh and S. Devadas, “Physical unclonable functions for device authentication and secret key generation,” in Proceedings of IEEE/ACM Design Automation Conference, June 2007, pp. 9–14.
[28] R. Pappu, “Physical one-way functions,” Massachusetts Institute of Technology, Tech. Rep., 2001.
[29] B. Gassend, D. Clarke, M. V. Dijk, and S. Devadas, “Silicon physical random functions,” in Proceedings
of the ACM Computer and Communication Security Conference, 2002, pp. 148–160.
[30] M. Majzoobi, F. Koushanfar, and M. Potkonjak, “Testing techniques for hardware security,” in
Proceedings of IEEE International Test Conference, Oct. 2008, pp. 1–10.
[31] M. J. B. Robshaw, “Stream ciphers,” RSA Laboratories, Tech. Rep. TR-701, 1995.
[32] C. D. Canniere and B. Preneel, “Trivium specifications,” ECRYPT Stream Cipher Project, 2006.
[33] R. Canetti, “HMAC: keyed-hashing for message authentication,” RFC, vol. 2104, p. 2104, 1997.
[34] B. Arazi, “Message authentication in computationally constrained environments,” IEEE Transactions
on Mobile Computing, vol. 8, no. 7, pp. 968–974, July 2009.
[35] X. Lai, R. Rueppel, and J. Woollven, “A fast cryptographic check-sum algorithm based on stream
ciphers,” in Advances in Cryptology-AusCrypt. Springer-Verlag, 1992, pp. 339–348.
[36] Xilinx. Spartan-3e fpga family: Introduction and ordering information. [Online]. Available:
http://www.xilinx.com/support/documentation/data sheets/ds312.pdf
[37] Fwaggle. Howto: JTAG interface on a Dish 3700 receiver. [Online]. Available: http://www.hungryhacker.com/articles/misc/dish3700 jtag
[38] A. Huang, “Keeping secrets in hardware: The Microsoft XBox case study,” in Proceedings of Workshop
on Cryptographic Hardware and Embedded Systems, 2002, pp. 213–227.
[39] Y. Jin and Y. Makris, “Hardware trojan detection using path delay fingerprint,” in Proceedings of IEEE
International Workshop on Hardware-Oriented Security and Trust, June 2008, pp. 51–57.
[40] D. Agrawal, S. Baktir, D. Karakoyunlu, P. Rohatgi, and B. Sunar, “Trojan detection using IC fingerprinting,” in Proceedings of IEEE Symposium on Security and Privacy, May 2007, pp. 296–310.
[41] D. Dean. R. Collins, “Trust, a proposed plan for trusted integrated circuits,” http://www.dtic.mil/cgibin/GetTRDoc?AD=ADA456459.
[42] S. T. King, J. Tucek, A. Cozzie, C. Grier, W. Jiang, and Y. Zhou, “Designing and implementing
malicious hardware,” USENIX Workshop on Large-Scale Exploits and Emergent Threats, 2008.
[43] X. Wang, M. Tehranipoor, and J. Plusquellic, “Detecting malicious inclusions in secure hardware:
Challenges and solutions,” IEEE International Workshop on Hardware-Oriented Security and Trust,
pp. 15 –19, jun. 2008.
[44] L.-W. Kim and J. D. Villasenor, “A system-on-chip bus architecture for thwarting integrated circuit
trojan horses,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. PP, no. 99,
pp. 1 –5, 2010.
[45] W. Diffie and M. Hellman, “New directions in cryptography,” IEEE Transactions on Information Theory,
vol. 22, no. 6, pp. 644 – 654, nov. 1976.
[46] J. Holleman, B. Otis, S. Bridges, A. Mitros, and C. Diorio, “A 2.92 microwatt hardware random number
generator,” Proceedings of the 32nd European Solid-State Circuits Conference, pp. 134 –137, sep. 2006.
[47] K. Chakrabarty, “Optimal test access architectures for system-on-a-chip,” ACM Transactions on Design
Automation of Electronic Systems, vol. 6, pp. 26–49, 2001.
[48] B. Schoeneman and S. Blankenau, “Secure sensor platform (SSP) for materials’ sealing and monitoring
applications,” Proceedings of International Carnahan Conference On Security Technology, pp. 29 – 32,
Oct. 2005.
[49] F. Bagci, T. Ungerer, and N. Bagherzadeh, “SecSens - Security architecture for wireless sensor networks,” Proceedings of International Conference on Sensor Technologies and Applications, pp. 449
–454, Jun. 2009.
[50] G. Suh and S. Devadas, “Physical unclonable functions for device authentication and secret key generation,” Proceedings of ACM/IEEE Design Automation Conference, pp. 9 –14, Jun. 2007.
[51] D. Roy, J. Klootwijk, N. Verhaegh, H. Roosen, and R. Wolters, “Comb capacitor structures for on-chip
physical uncloneable function,” IEEE Transactions on Semiconductor Manufacturing, vol. 22, no. 1, pp.
96–102, Feb. 2009.
[52] V. Vivekraja and L. Nazhandali, “Circuit-level techniques for reliable physically uncloneable functions,”
IEEE International Workshop on Hardware-Oriented Security and Trust, pp. 30–35, Jul. 2009.
[53] E. Diehl and T. Furon, “Copy watermark: closing the analog hole,” Proceedings of IEEE International
Conference on Consumer Electronics, pp. 52 – 53, Jun. 2003.
[54] J. Lukas, J. Fridrich, and M. Goljan, “Digital camera identification from sensor pattern noise,” IEEE
Transactions on Information Forensics and Security, vol. 1, no. 2, pp. 205 – 214, Jun. 2006.
[55] B. Gassend, D. Clarke, M. van Dijk, and S. Devadas, “Controlled physical random functions,”
Proceedings of Computer Security Applications Conference, pp. 149 – 160, 2002.
[56] D. Agrawal, S. Baktir, D. Karakoyunlu, P. Rohatgi, and B. Sunar, “Trojan detection using IC fingerprinting,” Proceedings of IEEE Symposium on Security and Privacy, pp. 296–310, May 2007.
[57] Bruce Schneier. Attack trees. [Online]. Available: http://www.schneier.com/paper-attacktrees-ddjft.html
[58] G. Qu and C.-E. Yin, “Temperature-aware cooperative ring oscillator PUF,” Proceedings of IEEE
International Workshop on Hardware-Oriented Security and Trust, pp. 36–42, July 2009.
[59] M.-D. Yu and S. Devadas, “Secure and robust error correction for physical unclonable functions,” IEEE
Design Test of Computers, vol. 27, no. 1, pp. 48–65, Jan.-Feb. 2010.
[60] T. Heyman, R. Scandariato, C. Huygens, and W. Joosen, “Using security patterns to combine security
metrics,” in Availability, Reliability and Security, 2008. ARES 08. Third International Conference on,
march 2008, pp. 1156 –1163.