Microarchitectural Side-Channel Attacks
Jean-François Gallais
PhD defense, February 5th, 2013
Introduction
Cryptography
Symmetric encryption
Alice: C = encrypt_K(M)
Alice sends C to Bob
Bob: M = decrypt_K(C)
Symmetric schemes and asymmetric schemes aim to provide
Confidentiality
Data integrity
Authentication
Non-repudiation
The entire secrecy of a scheme lies in the key.
Introduction
Physical attacks
Implementation
Cryptographic algorithms executed by electronic circuits
Fault attacks
exploit the effect of disruption of normal functioning of the device
Side-channel attacks
Reduce the space of possible keys (analytic attacks)
Bias the a priori (uniform) distribution of the correct subkey (divide and conquer attacks)
with information derived from side-channel leakage
1 Trace-driven cache-collision attacks
2 DPA on modular addition
3 Microarchitectural Trojans
4 Key enumeration in divide-and-conquer side-channel attacks
1 Trace-driven cache-collision attacks
  Generalities
  Efficient key recoveries against AES
  Error tolerance
  Attack complexity
  Countermeasures
  Conclusion
2 DPA on modular addition
3 Microarchitectural Trojans
4 Key enumeration in divide-and-conquer side-channel attacks
Trace-driven cache-collision attacks – Generalities
Cache collisions
Lookup index = (a_i, b_i): higher nibble a_i (16 values), lower nibble b_i
[Diagram: lookup table in non-volatile memory, lines paged into the cache]
1 Lookup (a1, b1): Miss
2 Lookup (a2, b2), a2 = a1: Hit
3 Lookup (a3, b3), a3 ≠ a1: Miss
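The three lookups above can be replayed with a few lines of Python. This is only a sketch under stated assumptions: the cache starts empty, the higher nibble of the index selects the cache line, and no line is ever evicted. The function name `cache_profile` and the example indices are hypothetical.

```python
def cache_profile(indices, line_bits=4):
    """Return one 'M' or 'H' per 8-bit lookup index, starting from an empty cache.

    Assumption: the cache line is selected by the higher nibble of the index
    and the cache is large enough that no line is evicted.
    """
    cached_lines = set()
    profile = []
    for index in indices:
        line = index >> line_bits          # higher nibble a_i
        if line in cached_lines:
            profile.append("H")            # line already paged in: hit
        else:
            profile.append("M")            # first access to this line: miss
            cached_lines.add(line)
    return "".join(profile)


# Lookups (a1, b1), then (a2, b2) with a2 = a1, then (a3, b3) with a3 != a1.
print(cache_profile([0x3A, 0x35, 0x7A]))   # prints "MHM"
```

The resulting string is the cache profile that a trace-driven attacker tries to read off the side-channel trace.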
Trace-driven cache-collision attacks – Generalities
EM measurements
[Figure: two EM traces, amplitude (mV) versus CPU clock cycles]
EM traces: M M M (top) vs M H M (bottom)
NXP LPC2124 ARM7 chip with H-field passive probe
Trace-driven cache-collision attacks – Generalities
Framework
Cryptographic software parametrized by key K
Table lookups indexed by functions of K
[Diagram labels: control & inputs, outputs, control, trigger, power / EM traces]
Applicability
µC with cache
Trace-driven cache-collision attacks – Generalities
Attack flow
LOOP
1 acquire side-channel trace from routine execution
2 derive cache profile from side-channel trace
3 deduce algebraic relations involving K
4 retrieve K from these relations
Approach
adaptive (chosen-plaintext attack)
non-adaptive (known-plaintext attack)
Trace-driven cache-collision attacks – Generalities
ACISP’06 attack from Fournier & Tunstall
Observation for AES
MH...H pattern for S-box lookups in 1st round implies
$\langle k_0 \oplus k_i \rangle = \langle p_0 \oplus p_i \rangle, \quad \forall i \in [1, 15]$
where $\langle x \rangle$ denotes the higher nibble of byte $x$
Method
Adaptive approach
Goal
Find plaintext for MH...H
[Figure: relative frequency versus number of inputs]
60-bit reduction of key entropy
average complexity: 127.5 inputs
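A small simulation can illustrate the adaptive search. It is a sketch and not the authors' implementation: it assumes a clean cache before each encryption, one cache line per higher nibble, and a naive search that tries the 16 higher nibbles of each plaintext byte in order. The names `first_round_profile` and `adaptive_attack` are hypothetical. Under this model each byte needs between 1 and 16 tries, 8.5 on average, so 15 bytes need 15 × 8.5 = 127.5 inputs on average, which agrees numerically with the figure quoted above.

```python
import random


def first_round_profile(plaintext, key):
    """Cache profile (list of 'M'/'H') of the 16 first-round S-box lookups."""
    cached, profile = set(), []
    for p, k in zip(plaintext, key):
        line = (p ^ k) >> 4                # higher nibble selects the cache line
        profile.append("H" if line in cached else "M")
        cached.add(line)
    return profile


def adaptive_attack(key, rng):
    """Recover the higher nibble of k_0 xor k_i for i = 1..15.

    Returns (relations, number of chosen plaintexts used).
    """
    plaintext = [rng.randrange(256) for _ in range(16)]
    relations, inputs = {}, 0
    for i in range(1, 16):
        low = plaintext[i] & 0x0F
        for high in range(16):             # try the 16 higher nibbles of p_i
            plaintext[i] = (high << 4) | low
            inputs += 1
            if first_round_profile(plaintext, key)[i] == "H":
                break                      # lookup i now collides with lookup 0
        relations[i] = (plaintext[0] ^ plaintext[i]) >> 4
    return relations, inputs


rng = random.Random(1)                     # arbitrary seed, for repeatability
key = [rng.randrange(256) for _ in range(16)]
relations, inputs = adaptive_attack(key, rng)
print(all(relations[i] == (key[0] ^ key[i]) >> 4 for i in range(1, 16)))
print(inputs)                              # between 15 and 240 in this model
```

The final plaintext produces the MH...H pattern, and each recovered relation fixes 4 bits, giving the 60-bit reduction.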
Trace-driven cache-collision attacks – Efficient key recoveries against AES
Improvement to ACISP’06 attack
Method & goal
same as ACISP attack
Additional observation
Every subsequent M implies one more line paged in, with index $\langle k_j \oplus p_j \rangle$.
$\Delta_{j,k} \leftarrow \Delta_{j,k} \cup \langle p_j \oplus p_k \rangle$
Δ used for optimally selecting next plaintext.
[Figure: relative frequency versus number of inputs]
60-bit reduction of key entropy
average complexity: 14.5 inputs
Trace-driven cache-collision attacks – Efficient key recoveries against AES
Known-plaintext attack
Sieve
lookup i is M:
$\forall j \in \Gamma, \ \langle k_i \oplus k_0 \rangle \neq \langle p_i \oplus p_j \rangle \oplus \langle k_j \oplus k_0 \rangle$
lookup i is H:
$\exists! j \in \Gamma, \ \langle k_i \oplus k_0 \rangle = \langle p_i \oplus p_j \rangle \oplus \langle k_j \oplus k_0 \rangle$
reduce possibilities for $\langle k_i \oplus k_0 \rangle$ to 1
same approach in 2nd round, although more complex
[Figure: relative frequency versus number of inputs]
60-bit reduction of key entropy
average complexity: 19.4 inputs
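The sieve can be sketched in Python as follows. This is a simplified illustration, not the authors' algorithm: it assumes a clean cache before each encryption, takes Γ to be the set of earlier lookups that missed, and in the M case only uses relations that are already fully determined. All function names are hypothetical, and this simplified version may need a different number of traces than the average quoted above.

```python
import random


def first_round_profile(plaintext, key):
    """Cache profile (list of 'M'/'H') of the 16 first-round S-box lookups."""
    cached, profile = set(), []
    for p, k in zip(plaintext, key):
        line = (p ^ k) >> 4                # higher nibble selects the cache line
        profile.append("H" if line in cached else "M")
        cached.add(line)
    return profile


def sieve_update(cand, plaintext, profile):
    """Refine cand[i], the candidates for the higher nibble of k_i xor k_0."""
    gamma = []                             # earlier lookups that were misses
    for i in range(16):
        if i > 0 and profile[i] == "H":
            # Collision with some j in gamma: keep only compatible values.
            allowed = {((plaintext[i] ^ plaintext[j]) >> 4) ^ d
                       for j in gamma for d in cand[j]}
            cand[i] &= allowed
        elif i > 0:
            # No collision with any j in gamma: discard values that would
            # collide with a lookup j whose relation is already known.
            for j in gamma:
                if len(cand[j]) == 1:
                    (d,) = cand[j]
                    cand[i].discard(((plaintext[i] ^ plaintext[j]) >> 4) ^ d)
        if profile[i] == "M":
            gamma.append(i)


def known_plaintext_attack(key, rng, max_traces=10000):
    """Feed random plaintexts to the sieve until every relation is unique."""
    cand = [{0}] + [set(range(16)) for _ in range(15)]
    for n in range(1, max_traces + 1):
        plaintext = [rng.randrange(256) for _ in range(16)]
        sieve_update(cand, plaintext, first_round_profile(plaintext, key))
        if all(len(c) == 1 for c in cand):
            return [next(iter(c)) for c in cand], n
    raise RuntimeError("sieve did not converge")


rng = random.Random(7)                     # arbitrary seed, for repeatability
key = [rng.randrange(256) for _ in range(16)]
deltas, traces = known_plaintext_attack(key, rng)
print(deltas == [(k ^ key[0]) >> 4 for k in key], traces)
```

Both update rules only ever discard values that contradict the observed profile, so the correct nibble always stays in its candidate set.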
Trace-driven cache-collision attacks – Error tolerance
Approaches to two practical conditions
Noise in measurements
Difficult event detection:
⇒ Consider Uncertain events in the sieve and embrace the highest number of candidates possible (U = H or M)
[Figure: relative frequency of a statistic (e.g. height of peak) for H and M events, with thresholds t_H and t_M]
Partially preloaded cache
Cache not clean of AES data prior to encryption:
⇒ Skip H and U [Bonneau’06]
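A three-way classification with two thresholds might look like the sketch below. The direction of the comparison is an assumption: it supposes that hits give a smaller statistic than misses, so that t_H < t_M, which the slide does not state explicitly. The function name and the numeric values are hypothetical.

```python
def classify_event(statistic, t_h, t_m):
    """Label one lookup as hit ('H'), miss ('M') or uncertain ('U').

    Assumption: hits yield a smaller statistic than misses, so t_h < t_m.
    """
    if statistic <= t_h:
        return "H"
    if statistic >= t_m:
        return "M"
    return "U"                             # between the thresholds: H or M


# Hypothetical peak heights with thresholds t_H = 2.0 and t_M = 4.0.
print([classify_event(s, 2.0, 4.0) for s in (1.2, 3.1, 5.6)])
# prints ['H', 'U', 'M']
```

Widening the gap between the two thresholds lowers the risk of a wrong H or M at the cost of more U events, which the sieve then treats as carrying no information.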
Trace-driven cache-collision attacks – Error tolerance
Simulation results
[Figure: number of inputs (log. scale) versus uncertainty rate (0 to 0.8); curves: M only with 8 preloaded lines, M only with 4 preloaded lines, M only with 0 preloaded lines, M & H; values labelled on the plot: 29.2, 60.6, 118.7, 158, 296.0, 1384, 2377, 5318]
Average number of traces required for full AES key recovery
Trace-driven cache-collision attacks – Attack complexity
Univariate model
[Figure: number of inputs versus S-box number (1 to 15); empirical and theoretical curves for ρ = 0, ρ = 0.25 and ρ = 0.5]
Number of inputs for the 1st round lookups with uncertainty rate ρ
Trace-driven cache-collision attacks – Attack complexity
Multivariate model
Events not statistically independent:
$E(\max_i N_i) \neq \max_i (E N_i)$
[Figure: relative frequency versus number of inputs]
Empirical distributions for (N1, N2) (bivariate) and for N1, N2 and max(N1, N2) (univariate), dashed lines showing the means
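The gap between the expectation of the maximum and the maximum of the expectations can be checked exactly on a toy joint PMF. The distribution below (two independent variables, uniform on 1 to 6) is purely illustrative and not taken from the experiments; the helper names are hypothetical. The functions accept any joint PMF, including one with dependent variables.

```python
from fractions import Fraction
from itertools import product


def expectation_of_max(joint):
    """E[max(N1, N2)] for a joint PMF given as {(n1, n2): probability}."""
    return sum(prob * max(n1, n2) for (n1, n2), prob in joint.items())


def max_of_expectations(joint):
    """max(E[N1], E[N2]) computed from the marginal means."""
    e1 = sum(prob * n1 for (n1, _), prob in joint.items())
    e2 = sum(prob * n2 for (_, n2), prob in joint.items())
    return max(e1, e2)


# Toy joint PMF: N1 and N2 independent and uniform on {1, ..., 6}.
joint = {(a, b): Fraction(1, 36) for a, b in product(range(1, 7), repeat=2)}
print(expectation_of_max(joint))           # prints 161/36
print(max_of_expectations(joint))          # prints 7/2
```

Since the attack finishes only when the slowest lookup is resolved, the relevant quantity is the first one, which a per-lookup (univariate) estimate underestimates.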
Trace-driven cache-collision attacks
Countermeasures
Ad hoc countermeasures
pre-fetch the entire lookup table: ✓
disable cache mechanism: ✓
DPA countermeasures
Shuffling: ✓
Masking: B
Random delays or dummy instructions: B
Trace-driven cache-collision attacks
Conclusion
Verified that cache events are distinguishable on EM measurements
Improved a chosen plaintext attack (14.5 measurements instead of 127.5 for 60-bit reduction of key entropy)
Proposed a known plaintext attack (29.2 measurements for full key recovery)
Made our attacks tolerant to errors and partially preloaded cache
Scrutinized a univariate model for estimation of measurement complexity. Only a multivariate model is sound.
Reviewed several DPA and ad hoc countermeasures against trace-driven cache-collision attacks
1 Trace-driven cache-collision attacks
2 DPA on modular addition
  Generalities
  Practical approach with combination
  Application to Threefish
  Conclusion
3 Microarchitectural Trojans
4 Key enumeration in divide-and-conquer side-channel attacks
Previous works
Lemke et al.: DPA on modular addition
Zohner et al.: butterfly attack (least-square approach to identify the symmetric points of the correlation trace)
[Figure: correlation coefficient for each key hypothesis; labelled hypotheses: 50, 114, 178, 242]
Simulated correlation coefficients for all key hypotheses involved in an 8-bit modular addition; the correct key value is 50
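A noise-free simulation of this kind can be sketched as follows. It is an illustration under assumptions not stated on the slide: the leakage is the Hamming weight of (p + k) mod 256, every plaintext byte is used exactly once, and Pearson correlation is the distinguisher. All function names are hypothetical. The hypothesis 178 = 50 + 128 differs from the correct key only in the most significant bit of the sum, which is why it correlates so strongly; the other labelled values are 114 = 50 + 64 and 242 = 50 + 192.

```python
from math import sqrt


def hamming_weight(x):
    """Number of bits set in x."""
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def correlation_per_hypothesis(true_key, bits=8):
    """Correlate noise-free leakage with the prediction for every key guess."""
    mod = 1 << bits
    plaintexts = range(mod)
    leakage = [hamming_weight((p + true_key) % mod) for p in plaintexts]
    return [pearson(leakage,
                    [hamming_weight((p + guess) % mod) for p in plaintexts])
            for guess in range(mod)]


rho = correlation_per_hypothesis(50)
best = max(range(256), key=lambda guess: rho[guess])
print(best)                                # prints 50
print(round(rho[178], 3))                  # prints 0.75
```

Without noise the correct key still ranks first, but the margin over such related hypotheses is small, which is consistent with the difficulties reported for measured traces.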
DPA on modular addition – Generalities
DPA against 8-bit AVR µC
[Figure: two plots of correlation coefficient (ρ) versus time (µs)]
Unsuccessful recovery of the 1st (left) and 8th key byte (right)
DPA on modular addition – Practical approach with combination
Contributions
Practical and generic approach to circumvent the problem induced by carry bits in DPA: recover 2 key blocks at a time in a more complex (= nonlinear) target function
[Diagram labels: p_{i,t}, k_{i,t}, p_{j,t}, k_{j,t}]
⇒ Works with ⊕ and ⊞ as combination
⇒ Works with Hamming weight and Hamming distance power models
⇒ Divide and conquer strategy to keep attack complexity low enough
⇒ Yields higher success when one block is known (e.g. in TEA)
Application to fast implementation of block cipher Threefish
DPA on modular addition – Practical approach with combination
Recovery of 2 key blocks combined with ⊞
[Figure: correlation traces and correlation evolution]
Recovery of a pair of bytes in two key blocks
DPA on modular addition – Application to Threefish
Structure of Threefish-256 block cipher
[Diagram: first round of Threefish-256 on four 64-bit words; inputs p0 to p3, subkey words k0 to k3 (Subkey i), two MIX functions and a Permute; the target function is marked on the first words]
[Diagram: four rounds of Threefish-256 from Subkey i to Subkey i + 1, each round made of MIX, MIX and Permute]
DPA on modular addition – Application to Threefish
DPA against Threefish on 8-bit AVR µC
[Figure: correlation traces, correlation coefficient (ρ) versus time (µs)]
[Figure: correlation evolution, correlation coefficient (ρ) versus number of traces (×100)]
Recovery of the first nibbles in two key bytes in Threefish
DPA on modular addition
Conclusion
Verified the failure of the standard DPA attack against modular addition (simulations and experiments)
Proposed attacking a more complex operation (possibly involving more key bits): the combination of 2 modular additions
Applied divide and conquer to maintain feasible offline complexity
Verified the validity of our approach with experiments against modular additions combined with ⊕ and ⊞, and against Fhreefish, the recommended implementation of Threefish for 8-bit AVR µC
1 Trace-driven cache-collision attacks
2 DPA on modular addition
3 Microarchitectural Trojans
  Generalities
  Activation mechanism
  Payload section
  Case studies
  Conclusion
4 Key enumeration in divide-and-conquer side-channel attacks
Microarchitectural Trojans
Generalities
Hardware Trojan
activation mechanism
payload section
Microarchitectural Trojan
Makes a fault or side-channel attack possible
Use
Backdoor on general-purpose processor
IP watermarking
Detection
full reverse engineering
functional testing
DPA analysis against golden samples
Microarchitectural Trojans
Activation mechanism
Goal
Help Trojan remain undetected
Internal trigger
low area and performance overhead
Same pattern used for Trojan deactivation
Different ways
Snooping the data bus
  Activation pattern := (pre-defined block ‖ parameter)
Snooping operands of instruction
  For architectures of size ≥ 32 bits
Microarchitectural Trojans
Payload section
Fault induction
Zero lookups
Single bit-flip in instruction execution
Timing & power variations
Pipeline stall from execution of 32-bit xor:
byte index             B4  B3  B2  B1
nb. of clock cycles     8   4   2   1
Allows identification of zero input bytes (i.e. k_i = p_i)
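One way to read the table is that the stall length encodes which bytes of the xor result are zero. The sketch below makes two assumptions that the slide does not confirm: stalls for several zero bytes add up, and B1 is the least significant byte. The names `stall_cycles` and `zero_bytes_from_stall` and the example words are hypothetical.

```python
# Byte position (0 = least significant, assumed to be B1) -> extra clock cycles.
STALL_CYCLES = {3: 8, 2: 4, 1: 2, 0: 1}


def stall_cycles(key_word, plaintext_word):
    """Extra cycles a hypothetical Trojan adds to one 32-bit xor."""
    result = key_word ^ plaintext_word
    total = 0
    for byte_index, cycles in STALL_CYCLES.items():
        if (result >> (8 * byte_index)) & 0xFF == 0:
            total += cycles                # zero byte, i.e. k_i = p_i
    return total


def zero_bytes_from_stall(total):
    """Decode an observed stall length into the set of zero bytes."""
    return {f"B{bit + 1}" for bit in range(4) if total & (1 << bit)}


stall = stall_cycles(0x11223344, 0x11AA33BB)   # xor result is 0x008800FF
print(stall)                                   # prints 10
print(sorted(zero_bytes_from_stall(stall)))    # prints ['B2', 'B4']
```

Because the four stall lengths are distinct powers of two, every combination of zero bytes maps to a different total under this additive reading.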
Microarchitectural Trojans – Case studies
AES
Challenge-response authentication protocol:
Verifier (attacker) → Prover (trojanized): nonce
Prover (trojanized) → Verifier (attacker): AES_K(nonce)
Microarchitectural Trojans – Case studies
RSA
Handshake protocol with SSL or TLS server:
Client (attacker) → Server (trojanized): $C = r^e \pmod{N}$
Server (trojanized): $\tilde{M} = \text{RSA-decrypt}_d(C)$
Microarchitectural Trojans – Conclusion
Conclusion
Introduced microarchitectural Trojans for inducing or amplifying side-channel leakage or inserting computation faults
Proposed software-based activation mechanisms
  Snoop data bus
  Snoop operands of instruction
Proposed payload sections
  fault induction
  timing & power variation
Described two practical scenarios where such Trojans would allow an adversary to retrieve the secret/private key.
1 Trace-driven cache-collision attacks
2 DPA on modular addition
3 Microarchitectural Trojans
4 Key enumeration in divide-and-conquer side-channel attacks
  Generalities
  Key enumeration algorithm
  Practical results
  Conclusion
"We already have quite a few people who know how to divide. So essentially, we’re now looking for people who know how to conquer."
Key enumeration in divide-and-conquer side-channel attacks
Generalities
Divide and conquer attacks
1 Divide part: recovery of key chunks
2 Conquer part: retrieve full key from multiple candidates for each key chunk
Conquer part
Key divided into n chunks; consider top m candidates for each chunk: key space of size $O(m^n)$
From probability mass function (PMF) output by Bayesian distinguisher for a chunk, build full key PMF
Key enumeration in divide-and-conquer side-channel attacks
Metrics for side-channel adversary
Metrics for subkey recovery [Standaert et al.’09]
o-th order success rate $SR_o$: probability that the correct key is within the top o candidates.
the guessing entropy E: expected rank of the correct key.
Probability for full key candidate to be correct
product of the individual probabilities $\zeta_j^{(i_j)}$ for each subkey candidate, with index j and rank $i_j$:
$Z(i_0, i_1, \ldots, i_{15}) = \prod_{j=0}^{15} \zeta_j^{(i_j)}$
Key enumeration in divide-and-conquer side-channel attacks
Sorting with pairwise multiplications
Contributions
Sorting algorithm with pairwise computations and recursive decomposition of enumeration problem
Comparison with lexicographical sorting
Our key enumeration algorithm
From two PMFs:
1 multiply pairwise
2 sort
3 truncate
Iterate on two resulting lists
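The three steps and the iteration can be sketched in Python. This is an illustration and not the thesis implementation: the pairing of neighbouring lists, the truncation length `keep`, the function names and the toy PMFs are all assumptions made for the example.

```python
def merge_pmfs(pmf_a, pmf_b, keep):
    """Multiply two candidate lists pairwise, sort by probability, truncate."""
    products = [(ka + kb, pa * pb) for ka, pa in pmf_a for kb, pb in pmf_b]
    products.sort(key=lambda item: item[1], reverse=True)
    return products[:keep]


def enumerate_keys(subkey_pmfs, keep):
    """Merge neighbouring lists repeatedly until one full-key list remains."""
    # Each entry is (tuple of subkey candidates, probability).
    lists = [[((cand,), prob) for cand, prob in pmf] for pmf in subkey_pmfs]
    while len(lists) > 1:
        merged = [merge_pmfs(lists[i], lists[i + 1], keep)
                  for i in range(0, len(lists) - 1, 2)]
        if len(lists) % 2:                 # an odd list out is carried over
            merged.append(lists[-1])
        lists = merged
    return lists[0]


# Toy subkey PMFs for a 4-chunk key: lists of (candidate, probability).
toy_pmfs = [
    [(0, 0.6), (1, 0.4)],
    [(5, 0.7), (6, 0.3)],
    [(2, 0.9), (3, 0.1)],
    [(7, 0.8), (8, 0.2)],
]
ranked = enumerate_keys(toy_pmfs, keep=3)
print(ranked[0][0])                        # prints (0, 5, 2, 7)
```

Truncating after each merge keeps the intermediate lists at most `keep` entries long, so the cost grows with the number of chunks rather than with the full key space.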
Key enumeration in divide-and-conquer side-channel attacks
Practical results
Expected rank of the correct key. $o_{lex} = 3^{16} \approx 2^{26} = o_{prw}$
t     1-byte   lex       prw
7     1.1828   1662500   17497
14    1.0124   266890    4.4721
21    1.0011   23677     1.1584
28    1.0002   4305.7    1.0275
t: number of traces for building one template
1-byte: subkey PMF
lex: full key PMF lexicographically sorted
prw: full key PMF sorted with pairwise multiplications
Key enumeration in divide-and-conquer side-channel attacks
Conclusion
Addressed key enumeration problem in divide and conquer side-channel attack
Proposed sorting algorithm that uses pairwise multiplications and recursive decomposition of problem
Compared our algorithm against lexicographical sorting
⇒ Optimized sorting enables adversary to decrease complexity of template-building phase in template DPA
Thank you!
List of publications
Jean-François Gallais, Arnab Roy and Praveen Kumar Vadnala. Full key recovery attacks on modular addition: an application to Threefish. Proceedings of WESS 2012.
Jean-François Gallais and Ilya Kizhvatov. Error-tolerance in trace-driven cache collision attacks. Proceedings of COSADE 2011.
Jean-François Gallais, Johann Großschädl, Neil Hanley, Markus Kasper, Marcel Medwed, Francesco Regazzoni, Jörn-Marc Schmidt, Stefan Tillich, Marcin Wójcik. Hardware Trojans for inducing or amplifying side-channel leakage. In INTRUST 2010, volume 6802 of LNCS, pages 253–270. Springer, 2010.
Jean-François Gallais, Ilya Kizhvatov, and Michael Tunstall. Improved trace-driven cache-collision attacks against embedded AES implementations. In WISA 2010, volume 6513 of LNCS, pages 243–257. Springer, 2011. Extended version available at http://eprint.iacr.org/2010/408.