# Clueless: A Tool Characterising Values Leaking as Addresses

Xiaoyue Chen Uppsala University Uppsala, Sweden xiaoyue.chen@it.uu.se Pavlos Aimoniotis Uppsala University Uppsala, Sweden pavlos.aimoniotis@it.uu.se Stefanos Kaxiras Uppsala University Uppsala, Sweden stefanos.kaxiras@it.uu.se

# ABSTRACT

Clueless is a binary instrumentation tool that characterises explicit cache side channel vulnerabilities of programs. It detects the transformation of data values into addresses by tracking dynamic instruction dependencies. Clueless tags data values in memory if it discovers that they are used in address calculations to further access other data.

Clueless can report on the amount of data that are used as addresses at each point during execution. It can also be specifically instructed to track certain data in memory (e.g., a password) to see if they are turned into addresses at any point during execution. If they are, it returns a trace of how the tracked data are turned into addresses.

We demonstrate Clueless on SPEC 2006 and characterise, for the first time, the amount of data values that are turned into addresses in these programs. We further demonstrate Clueless on a micro benchmark and on a case study. The case study covers the different implementations of AES in OpenSSL: T-table, Vector Permutation AES (VPAES), and Intel Advanced Encryption Standard New Instructions (AES-NI). Clueless shows how the encryption key is transformed into addresses in the T-table implementation, while explicit cache side channel vulnerabilities are not detected in the other implementations.

# **CCS CONCEPTS**

• Security and privacy  $\rightarrow$  Side-channel analysis and countermeasures; Information flow control.

## 1 INTRODUCTION

Cache side-channel attacks leak information through a microarchitectural covert channel: the cache. By observing changes in the shared cache state, a spy process can bypass process isolation and read secret data from a victim process. Cache side-channel attacks have been demonstrated on processors of different architectures and on different algorithms, e.g., RSA [1], AES [2-5], and ElGamal [6]. Speculative side-channel attacks such as Spectre [7], Meltdown [8] and their variants [9-14] have caused major changes in how the architecture community views security. These attacks exploit speculative instructions that are to be squashed (e.g., instructions in mispredicted branches) to access and then transmit secret data over the shared cache. While non-speculative cache side-channel attacks can usually be mitigated by improving the implementations of vulnerable algorithms (e.g., by avoiding secret-dependent lookups in large tables), their speculative variants are difficult to prevent by changing software implementations because the information leakage happens during speculation.

Fig. 1 shows Spectre Variant 1 where an attacker can exploit the branch misprediction to access arbitrary program data and transmit the secret over a shared cache [7]. The victim program is correctly implemented with the appropriate bound check, yet it is still vulnerable due to speculative execution.

Speculative side-channel attacks have been found to be an enormous security threat. Different hardware approaches have been proposed to protect against them. For example, InvisiSpec [15] and GhostMinion [16] make speculation invisible in the data cache hierarchy using additional speculative buffers so that secrets cannot be transmitted over cache channels. Delay-on-Miss (DoM) [17] delays all speculative loads that miss in the data cache and thus prevents observable timing differences, while Speculative Taint Tracking (STT) [18] focuses on blocking only the transmitter instruction. STT uses dynamic information flow tracking (DIFT) to taint secret data. It allows the results of speculative instructions to be forwarded if they cannot leak secrets via any potential covert channel.

This work does not propose new mitigation methods for speculative side-channel attacks. Instead, we intend to understand how prevalent these vulnerabilities are in programs from a new perspective. Side-channel attacks rely on a fundamental programming feature to leak the value of secrets – *the transformation of data values into memory addresses.* Besides the victim program in Fig. 1, consider for example sorting, hashing, or many other algorithms that create addresses based on data values. While we understand the mechanism that leaks data as addresses, there is no clear indication of how serious the problem is in our workloads: How many values do "leak" as addresses in a given application?

This work aims to shed some light on how exposed we are to this potential vulnerability. *Clueless* is a tool (based on binary rewriting) that tracks dynamic instruction dependencies and tags data values in memory if it discovers that they are used in address calculations to further access other data.

Clueless can be used in two modes: *aggregating mode*, where it reports on the amount of data that are used as addresses at each point during execution, and *tracking mode*, where the tool is specifically asked to track certain data in memory (e.g., a password) to see if they are turned into addresses at any point during execution. If they are, tracking mode returns a trace of how the tracked data are turned into addresses.

```c
#include <stddef.h>
#include <stdint.h>

uint8_t A[10];
uint8_t B[256 * 64];

void victim(size_t addr) {
    if (addr < 10) {                         /* bounds check; the branch may be mispredicted */
        uint8_t val = A[addr];               /* secret accessed */
        volatile uint8_t sink = B[64 * val]; /* secret transmitted */
        (void)sink;
    }
}
```

Figure 1: Spectre Variant 1.



Figure 2: Code that leaks secret.

We demonstrate Clueless in aggregating mode on SPEC 2006 and characterise, for the first time, the amount of data values that are turned into addresses in these programs. We further demonstrate Clueless in tracking mode on a micro benchmark and on a case study. The case study is the different implementations of AES in OpenSSL: T-table, Vector Permutation AES (VPAES), and Intel Advanced Encryption Standard New Instructions (AES-NI). The T-table AES implementation can be easily broken with a cache side-channel attack (e.g., Prime+Probe), but VPAES and AES-NI are immune to cache-timing attacks. Clueless readily shows how the encryption key is transformed into addresses in the T-table implementation and a lack of the corresponding transformations in the other two implementations.

## 2 METHODS

Clueless is a dynamic instrumentation tool that analyses instructions at run-time to track values that leak as memory addresses. Values are data that should not be used, directly or indirectly, as memory addresses, e.g., password hashes or private encryption keys. A value can leak as a memory address when there is information flow from the value to a memory address. The scope of the tool is limited to detecting data flow: it tracks data dependences but disregards control dependences. In other words, Clueless is able to detect explicit channels [18], where a value is used as an address by a load instruction, but not implicit channels [18], where the value is leaked through control flow interaction. Assuming secret is a value, the leakage in the code in Fig. 2a will not be detected by the tool because &A[0] and &A[128] only have a control dependence on secret. On the other hand, the tool will detect the leakage in Fig. 2b because secret is involved in the computation of &A[i]. Furthermore, addr will be tagged as a leak point. A leak point is a memory location where a leaked value resides.
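The difference between the two kinds of channel can be sketched in C. This is a hypothetical illustration and not a reproduction of Fig. 2: the array `A`, its length, and both function names are assumptions, and the first function only forms an implicit channel if the compiler emits a real branch rather than a conditional move.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define A_LEN 256
uint8_t A[A_LEN];   /* hypothetical lookup array */

/* Implicit channel: `secret` only decides WHICH constant address is loaded.
 * &A[0] and &A[128] depend on `secret` through control flow only, so a
 * tracker that follows data dependences alone does not flag this load. */
uint8_t implicit_leak(int secret) {
    if (secret)
        return A[0];
    return A[128];
}

/* Explicit channel: `secret` flows into the address computation itself. */
uint8_t explicit_leak(size_t secret) {
    assert(secret < A_LEN);
    size_t i = secret;   /* data flow: secret -> i     */
    return A[i];         /* data flow: i      -> &A[i] */
}
```

Under the data-flow-only scope described above, only the load in `explicit_leak` would be reported.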

Clueless uses an algorithm based on dynamic information flow tracking (DIFT) [19–21]. DIFT has been successfully applied to prevent attacks on software [20–25] and has been used in hardware protection proposals against speculative execution attacks [18]. DIFT tracks information flow by associating taints with data and propagating the taints according to the data flow. In addition, Clueless's algorithm needs to automatically assign taints to data and maintain the taints.

## 2.1 Taint assignment

A new taint is assigned to a memory location whenever a *value* (i.e., data that should not be used to address memory) is loaded from that memory location. Each taint is associated with the address of a value. In the example in Fig. 3, suppose that values reside at memory locations addrX and addrY. A new taint $t_x$ is assigned to addrX when the load instruction on line 3 executes, and then another taint $t_y$ is assigned to addrY when line 4 executes.

| Line | Instruction     |
|------|-----------------|
| 1    | r1 = addrX      |
| 2    | r2 = addrY      |
| 3    | load rX <- (r1) |
| 4    | load rY <- (r2) |
| 5    | r3 = rX + 64    |
| 6    | r4 = r3 + rY    |
| 7    | load r5 <- (r4) |

Figure 3: Instructions where addrX and addrY are leak points.

Clueless needs to know if the loaded data is a value. Most contemporary Instruction Set Architectures (ISAs) do not make a distinction between *value* and *address* loads nor between *value* and *address* registers. As a binary instrumentation tool, Clueless is clueless about which loads actually load values. Clueless provides two solutions to this problem.

*Everything is a value.* One solution is to initially regard all data in memory as values, i.e., Clueless assumes that nothing in memory should be used as a memory address. For every load instruction, a new taint is assigned to the memory address of the load. Consequently, all memory locations that contain memory addresses will be considered leak points. Clueless effectively provides a way to classify any data in memory as addresses or non-addresses based on the past execution. This provides a new perspective for analysing programs: how much of a process's memory is potentially observable by another process through a cache side channel? We call this *aggregating mode*. Aggregating mode indicates how visible a program's memory can be; Section 3 presents its results.

*Users set watchpoints.* Another solution is to let users mark the memory regions that contain values. A new taint is assigned to the memory address of a load only if that address is within a marked memory region. Clueless supports this solution by providing an API that can dynamically register and unregister memory regions to watch. This requires users to modify the source code of instrumented programs by inserting Clueless watchpoint API calls. We call this *tracking mode*. Section 4 presents its results.
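The bookkeeping behind watchpoints can be sketched as a small region registry. This is a minimal sketch under stated assumptions and not Clueless's actual API: the names `watch_register`, `watch_unregister`, and `is_watched`, as well as the fixed capacity `MAX_REGIONS`, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 16   /* assumed capacity */

typedef struct { uintptr_t begin, end; bool used; } region_t;   /* [begin, end) */
static region_t regions[MAX_REGIONS];

/* Mark [ptr, ptr + len) as containing values. Returns a slot id, or -1 if full. */
int watch_register(const void *ptr, size_t len) {
    assert(ptr != NULL && len > 0);
    for (int i = 0; i < MAX_REGIONS; i++) {
        if (!regions[i].used) {
            regions[i].begin = (uintptr_t)ptr;
            regions[i].end   = (uintptr_t)ptr + len;
            regions[i].used  = true;
            return i;
        }
    }
    return -1;
}

/* Stop watching a previously registered region. */
void watch_unregister(int id) {
    assert(id >= 0 && id < MAX_REGIONS);
    regions[id].used = false;
}

/* Taint-assignment check: does a load from `addr` read a watched value? */
bool is_watched(uintptr_t addr) {
    for (int i = 0; i < MAX_REGIONS; i++)
        if (regions[i].used && addr >= regions[i].begin && addr < regions[i].end)
            return true;
    return false;
}
```

In tracking mode, a load would receive a new taint only when a check like `is_watched` succeeds for its address; in aggregating mode the check is effectively always true.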

# 2.2 Taint propagation

Clueless uses bit arrays to store taint sets, where each bit represents a different taint. With this representation, set union operations are equivalent to bit-wise or operations, which are efficient to perform. Each bit array is associated to a register or a memory location. The number of bits in a bit array is finite and can be configured when compiling the tool. Consequently, the maximal number of taints is equal to the number of bits in a bit array.
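The bit-array representation can be sketched as follows. The type and function names, and the limit of 128 taints, are assumptions made for illustration; they are not taken from the Clueless source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NTAINT 128               /* assumed compile-time limit on taints */
#define NWORD  (NTAINT / 64)

typedef struct { uint64_t w[NWORD]; } taintset_t;   /* bit t set <=> taint t present */

void ts_clear(taintset_t *s) { memset(s, 0, sizeof *s); }

void ts_add(taintset_t *s, unsigned t) {
    assert(t < NTAINT);
    s->w[t / 64] |= UINT64_C(1) << (t % 64);
}

bool ts_has(const taintset_t *s, unsigned t) {
    assert(t < NTAINT);
    return (s->w[t / 64] >> (t % 64)) & 1u;
}

/* Set union is a word-wise bitwise OR. */
void ts_union(taintset_t *dst, const taintset_t *a, const taintset_t *b) {
    for (unsigned i = 0; i < NWORD; i++)
        dst->w[i] = a->w[i] | b->w[i];
}

bool ts_empty(const taintset_t *s) {
    for (unsigned i = 0; i < NWORD; i++)
        if (s->w[i] != 0)
            return false;
    return true;
}
```

With this layout a union costs `NWORD` OR operations regardless of how many taints the sets contain, which is why the number of taints is fixed at compile time.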

At the instruction level, data flow can be divided into two categories: register-register flow and register-memory flow. One of the main differences between the two categories in the context of dependence tracking is that the space required by register-register flow tracking is upper-bounded by the number of architectural registers, while that of register-memory flow tracking is upper-bounded by the number of virtual memory locations. For example, pairs of a load and a store (both cause register-memory flow) can copy some data throughout the entire virtual memory and result in every memory location being tainted by the taints of the data, requiring an enormous amount of space to store the taint sets. This might not be an issue when a few pieces of data are tracked, because the data are not likely to flow through a large part of the memory. When the number of tracked locations is large, however, the space overhead makes complete tracking of register-memory flow impractical. This is the case for Clueless in aggregating mode, which regards everything in memory as a value and tracks the entire memory. On the other hand, storing a taint set for each register requires much less space because the number of architectural registers is low.

Table 1: Example taint propagation.

| Instruction     | rX        | rY        | r3        | r4             | Remark                    |
|-----------------|-----------|-----------|-----------|----------------|---------------------------|
| r1 = addrX      | {}        | {}        | {}        | {}             |                           |
| r2 = addrY      | {}        | {}        | {}        | {}             |                           |
| load rX <- (r1) | $\{t_x\}$ | {}        | {}        | {}             | $t_x$ associated to addrX |
| load rY <- (r2) | $\{t_x\}$ | $\{t_y\}$ | {}        | {}             | $t_y$ associated to addrY |
| r3 = rX * 64    | $\{t_x\}$ | $\{t_y\}$ | $\{t_x\}$ | {}             |                           |
| r4 = r3 + rY    | $\{t_x\}$ | $\{t_y\}$ | $\{t_x\}$ | $\{t_x, t_y\}$ |                           |
| load r5 <- (r4) | {}        | {}        | {}        | {}             | addrX, addrY tagged       |

*Tracking dependences via registers.* Clueless tracks register-register flow by examining instructions, identifying source, destination, and memory addressing registers and following propagation rules. Table 1 demonstrates how taints propagate through the registers as instructions from Fig. 3 are executed. The propagation rules are listed below:

- (1) For each load instruction, the taint set of its destination registers becomes either a singleton or an empty set. If a value is loaded, the taint set is a singleton whose element is the new taint associated to the value's address. If what is loaded is not a value, the taint set is the empty set.
- (2) For instructions that set their destination registers to a constant (e.g., xor of a register with itself, or mov of a constant to a register), the taint sets of their destination registers become the empty set.
- (3) For instructions whose source and destination operands are all registers except the instructions in rule 2, the taint sets of their destination registers become the union of the taint sets of their source registers.
- (4) For load and store instructions, memory addressing registers have their taint sets emptied. All the memory addresses associated with the emptied taints are tagged as leak points.
- (5) For store instructions, if the taint sets of all the memory addressing registers are the empty set, the address is no longer a leak point and is untagged.
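Rules 1 to 4 can be sketched as a small register-only tracker. This is an illustrative sketch under several assumptions, not the Clueless implementation: taint sets are single 64-bit words, taints are never recycled, rule 5 is omitted, and all names (`tracker_t`, `op_load`, `op_alu`, `op_const`) are hypothetical. Following Section 2.3, taints that reach an address are removed from every register, not only from the addressing register.

```c
#include <assert.h>
#include <stdint.h>

enum { R1, R2, R3, R4, R5, RX, RY, NREG };   /* registers of the running example */
#define NTAINT 64

typedef uint64_t taintset_t;                 /* bit t set <=> taint t in the set */

typedef struct {
    taintset_t reg[NREG];        /* taint set of each register             */
    uintptr_t  origin[NTAINT];   /* address each taint is associated with  */
    uintptr_t  leak[NTAINT];     /* addresses tagged as leak points        */
    unsigned   nleak;
    unsigned   next_taint;
} tracker_t;

/* Rule 4: taints in an addressing register mark their origins as leak points. */
static void use_as_address(tracker_t *tr, int base) {
    taintset_t leaked = tr->reg[base];
    for (unsigned t = 0; t < NTAINT; t++)
        if (leaked & (UINT64_C(1) << t)) {
            assert(tr->nleak < NTAINT);
            tr->leak[tr->nleak++] = tr->origin[t];
        }
    for (int r = 0; r < NREG; r++)           /* leaked taints carry no more information */
        tr->reg[r] &= ~leaked;
}

/* Rule 2: the destination is set to a constant. */
void op_const(tracker_t *tr, int dst) { tr->reg[dst] = 0; }

/* Rule 3: register-to-register operation, destination = union of sources. */
void op_alu(tracker_t *tr, int dst, int a, int b) {
    tr->reg[dst] = tr->reg[a] | tr->reg[b];
}

/* Rules 4 and 1: load dst <- (base), where `addr` is the effective address. */
void op_load(tracker_t *tr, int dst, int base, uintptr_t addr, int is_value) {
    use_as_address(tr, base);
    tr->reg[dst] = 0;
    if (is_value) {
        assert(tr->next_taint < NTAINT);
        unsigned t = tr->next_taint++;
        tr->origin[t] = addr;
        tr->reg[dst] = UINT64_C(1) << t;
    }
}
```

Replaying the instruction sequence of Fig. 3 through these functions ends with every register's taint set empty and with addrX and addrY recorded in `leak`, matching the last row of Table 1.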

*Expanding dependence tracking to memory.* Using register-register flow tracking alone, the taint sets of data could be lost because programs often store some data to the memory, use the register containing the data for something else, and later reload the data from the memory. These cases require tracking register-memory flow to store and reload the taint sets. Two additional propagation rules are introduced to expand dependence tracking to memory:

- (6) For each store instruction, the taint set of the memory address becomes the taint set of the source register that contains the stored data.
- (7) For each load instruction, in addition to rule 1, the taint set of a destination register becomes the union of the resulting taint set from rule 1 and the taint set of the memory address.

Although tracking all the register-memory flow using a complete method is impractical due to the space requirements, it is still important to track these flows because temporarily storing data to memory is very common. For this reason, a set-associative cache is used as a best-effort approach to store the taint sets that are associated with memory addresses. The cache uses a first-in-first-out replacement policy. The number of sets as well as the associativity of the cache can be configured when compiling the tool.
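A set-associative taint cache with first-in-first-out replacement can be sketched as below. The sizes `NSETS` and `NWAYS`, the modulo index function, and the function names are assumptions; the sketch also chooses not to allocate an entry when an empty taint set is stored to an untracked address, which is a design choice of this example rather than something stated in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSETS 64   /* assumed number of sets  */
#define NWAYS 4    /* assumed associativity   */

typedef uint64_t taintset_t;

typedef struct {
    uintptr_t  addr[NWAYS];
    taintset_t taint[NWAYS];
    uint8_t    valid[NWAYS];
    unsigned   next;              /* FIFO pointer: the way to replace next */
} cache_set_t;

typedef struct { cache_set_t set[NSETS]; } taint_cache_t;

/* Rule 6 (store): remember the taint set of the data stored at `addr`. */
void tc_store(taint_cache_t *c, uintptr_t addr, taintset_t ts) {
    assert(c != NULL);
    cache_set_t *s = &c->set[addr % NSETS];
    for (unsigned w = 0; w < NWAYS; w++)
        if (s->valid[w] && s->addr[w] == addr) {   /* hit: overwrite in place */
            s->taint[w] = ts;
            return;
        }
    if (ts == 0)
        return;                                    /* nothing worth tracking */
    unsigned w = s->next;                          /* miss: replace the oldest way */
    s->next = (s->next + 1) % NWAYS;
    s->addr[w] = addr;
    s->taint[w] = ts;
    s->valid[w] = 1;
}

/* Rule 7 (load): taint set recorded for `addr`, or the empty set on a miss. */
taintset_t tc_load(const taint_cache_t *c, uintptr_t addr) {
    assert(c != NULL);
    const cache_set_t *s = &c->set[addr % NSETS];
    for (unsigned w = 0; w < NWAYS; w++)
        if (s->valid[w] && s->addr[w] == addr)
            return s->taint[w];
    return 0;
}
```

A miss on load silently returns the empty set, which is what makes the tracking best-effort: a taint set evicted from the cache is simply forgotten.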

# 2.3 Taint maintenance

Clueless has a finite number of taints because it uses statically sized bit arrays as taint sets. Therefore, taints must be maintained and reused. A taint can only be reused when it is in none of the taint sets. Propagation rules 1, 2, and 4 are the rules that can empty taint sets and make taints reusable. Since the addresses associated with the emptied taints are already tagged as leak points according to propagation rule 4, the emptied taints no longer carry useful information and can thus be removed from all the taint sets, which makes them immediately reusable.

Taints can still be exhausted in spite of the recycling. For example, a program can have a loop that loads many values from the memory and sums them. In these cases, Clueless makes the taint assigned by the earliest load available by removing it from all the taint sets.
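Taint reuse with eviction of the earliest taint can be sketched as follows. The pool layout, the logical clock used to find the earliest assignment, and the deliberately tiny `NTAINT` are assumptions chosen to make exhaustion easy to observe; the sketch only scrubs register taint sets, whereas a full tracker would also scrub the sets stored for memory.

```c
#include <assert.h>
#include <stdint.h>

#define NTAINT 4                         /* tiny on purpose; assumed */
#define NREG   4
#define BIT(t) (UINT64_C(1) << (t))

typedef uint64_t taintset_t;

typedef struct {
    taintset_t reg[NREG];                /* taint set of each register         */
    taintset_t in_use;                   /* taints currently assigned          */
    uint64_t   birth[NTAINT];            /* logical time each was assigned     */
    uint64_t   clock;
} pool_t;

/* Remove taint t from every taint set so that it can be reused. */
static void scrub(pool_t *p, unsigned t) {
    for (unsigned r = 0; r < NREG; r++)
        p->reg[r] &= ~BIT(t);
}

/* Called when t has been consumed, e.g., after its address was tagged (rule 4). */
void taint_release(pool_t *p, unsigned t) {
    assert(t < NTAINT);
    scrub(p, t);
    p->in_use &= ~BIT(t);
}

/* Return a free taint; if none is free, evict the one assigned earliest. */
unsigned taint_alloc(pool_t *p) {
    unsigned pick = NTAINT;
    for (unsigned t = 0; t < NTAINT; t++) {
        if (!(p->in_use & BIT(t))) { pick = t; break; }
        if (pick == NTAINT || p->birth[t] < p->birth[pick])
            pick = t;
    }
    assert(pick < NTAINT);
    if (p->in_use & BIT(pick))
        scrub(p, pick);                  /* forced reuse: forget the old taint */
    p->in_use |= BIT(pick);
    p->birth[pick] = p->clock++;
    return pick;
}
```

Evicting a taint that has not yet reached an address loses that dependence, which is one source of the incompleteness discussed in Section 2.4.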

# 2.4 Limitations

*No tracking of speculative execution.* Clueless is a binary instrumentation tool. It is not a hardware simulator and does not obtain microarchitectural information such as instructions executed in speculation. As a result, Clueless cannot track speculative execution.

*Incomplete tracking.* Clueless is a characterisation tool as opposed to a verification tool. The tracking of Clueless is incomplete: it can track data dependences only within a limited window. The incompleteness is a consequence of our implementation, which uses a finite number of taints and a finite-sized cache. When compiling the tool, users can adjust these parameters to find the desired size of the tracking window.

*Dependencies.* Clueless depends on Intel Pin [26]. Clueless is compiled into a shared library and needs to be loaded by Intel Pin. The propagation algorithms of Clueless are implemented in a platform-independent way, but Intel Pin only supports instrumentation of the IA-32, x86-64 and MIC ISAs. As a result, Clueless currently only supports these ISAs.

# 2.5 Source code

The source code of Clueless is published under the GNU General Public License, Version 3. Its git repository is accessible at https://github.com/xiaoyuechen/dift-addr.git.

## 3 AGGREGATING MODE

Clueless in aggregating mode regards everything in the instrumented program's memory as values. Clueless in this mode tags any memory locations whose data transform into addresses as leak points. In addition, Clueless collects a set of all memory addresses used by the program, i.e., addresses used in any memory accessing instructions. With the set of leak points and the set of all memory addresses, we could introduce a metric that describes the proportion of data that are used as addresses for a given execution of a program.

# 3.1 The $\Lambda$ metric

Let $L_i$ be the set of leak points and $A_i$ be the set of all addresses after the execution of the $i$-th instruction of a program (trivially, $L_i \subseteq A_i$). Let $n$ be the number of instructions in the entire execution of the program. The metric $\Lambda$, defined by

$$\Lambda = \frac{\sum_{i=1}^{n} |L_i|}{\sum_{i=1}^{n} |A_i|}$$

indicates the average proportion of data that transform into addresses during the entire execution of the program. Figuratively, $\Lambda$ is the area under the $|L_i|$ curve divided by the area under the $|A_i|$ curve in Fig. 5.
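The definition of $\Lambda$ translates directly into code. The sketch below assumes that the per-instruction sizes $|L_i|$ and $|A_i|$ are already available as arrays; the function name and the array-based interface are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lambda = (sum of |L_i|) / (sum of |A_i|) over the n executed instructions. */
double lambda_metric(const uint64_t *L, const uint64_t *A, size_t n) {
    assert(L != NULL && A != NULL && n > 0);
    double sum_l = 0.0, sum_a = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(L[i] <= A[i]);            /* L_i is a subset of A_i */
        sum_l += (double)L[i];
        sum_a += (double)A[i];
    }
    return sum_a > 0.0 ? sum_l / sum_a : 0.0;
}
```

For example, a four-instruction execution with $|L_i| = 0, 1, 1, 2$ and $|A_i| = 1, 2, 2, 3$ gives $\Lambda = 4/8 = 0.5$.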

# 3.2 $\Lambda$ of SPEC benchmarks

We used Clueless's aggregating mode on SPEC 2006 to characterise data transformation into memory addresses by analysing how $|A_i|$ and $|L_i|$ change and comparing the $\Lambda$ values of different benchmarks. Since Clueless uses incomplete methods to track $L_i$ while the tracking of $A_i$ is complete, the reported values of $\Lambda$ are lower bounds of the actual $\Lambda$.

The prevalence of data-address transformations, indicated by $\Lambda$, is an innate property of a program. Fig. 4 shows the values of $\Lambda$ of different benchmark programs. The astar program and soplex program use more than one third of their memory to store addresses. In the bwaves program and sjeng program, on the other hand, such transformations are rarely seen.

# 3.3 A closer look

For more insights into data-address transformation, we further study how much data are transformed into addresses at each point of execution of some benchmark programs.



Figure 4:  $\Lambda$  of SPEC benchmarks.

*Astar.* Fig. 5a shows how $|A_i|$ and $|L_i|$ change during an execution of the astar program. Note that $|A_i|$ increases monotonically while $|L_i|$ does not. When an address $l \in L_i$ is written to by the $(i+1)$-th instruction, $L_{i+1} = L_i \setminus \{l\}$. This is very common when $l$ is a stack address, as the stack memory tends to be rewritten often. In the astar program, $|L_i|$ fluctuates when $i \in [2 \times 10^{11}, 3 \times 10^{11}]$. The cause of such fluctuations is that the same blocks of memory containing addresses are repeatedly loaded from and written to.

*Bzip2.* Fig. 5b shows that the bzip2 program periodically stores new addresses to the same blocks of memory. One common data-address transformation pattern can be found when $i \in [0, 3.9 \times 10^{10}]$, $i \in [3.9 \times 10^{10}, 7.9 \times 10^{10}]$, and $i \in [7.9 \times 10^{10}, 1.75 \times 10^{11}]$: a rapid increase in $|L_i|$, which then fluctuates periodically, followed by another rapid but smaller increase in $|L_i|$, which again fluctuates periodically. The cause of the repeated pattern could be that the bzip2 program reallocates memory to store memory addresses, but the same algorithm is used on the reallocated memory.

*Calculix.* Fig. 5c shows that the calculix program has an obvious periodic memory access pattern. After the initial increase of both $|A_i|$ and $|L_i|$, $|A_i|$ becomes stable while $|L_i|$ becomes periodic. The amplitude of $|L_i|$ is relatively large at approximately $2.1 \times 10^6$, indicating that blocks containing $2.1 \times 10^6$ addresses are repeatedly written with new addresses.

*Soplex.* Fig. 5d shows how $|A_i|$ and $|L_i|$ of the soplex program change. After the initial increase, both $|A_i|$ and $|L_i|$ become stable. This does *not* mean that data in this program are transformed to memory addresses only once. After the data are tagged, they may still be transformed into memory addresses multiple times in different ways, but $L_i$ would remain the same. The stable $|L_i|$ only indicates that no new data are tagged, and no tagged memory location is written to.

## 4 TRACKING MODE

Clueless in tracking mode allows users to dynamically register and unregister watchpoints, i.e., memory blocks that contain values. If data from any watchpoint are transformed into memory addresses, Clueless provides a detailed diagnosis of each leakage. The diagnostic information includes the leak point, the memory address that the value in the leak point transforms into, a trace of instructions that shows the value-address transformation, and the relevant routine and image names where the leakage happens. This mode can be used to test the side-channel vulnerability of programs and help understand where and how secrets are leaked if such a vulnerability exists.

Figure 5: $|A_i|$ and $|L_i|$ of benchmark programs.

Figure 6: The micro benchmark program.

# 4.1 The micro benchmark

We demonstrate Clueless in tracking mode on a micro benchmark program in Fig. 6. Array T[] and function foo are defined in a shared library that the victim program links against. The victim program has a secret stored in the s[] array. The victim calls foo with the secret as its parameter. Function foo loads each byte of its parameter array, multiplies the byte value by 64, and uses the result as the index of a constant array T[] to do some lookup.
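The behaviour described for foo can be sketched as below. This is a hypothetical reconstruction from the prose and not the code of Fig. 6: the signature, the use of unsigned bytes, the size of `T`, and the XOR accumulation are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STRIDE 64                 /* one table entry per 64-byte cache line */
uint8_t T[256 * STRIDE];          /* lookup table; size assumed             */

/* Look up one table entry per secret byte; the byte value selects the line. */
uint8_t foo(const uint8_t *s, size_t n) {
    assert(s != NULL);
    uint8_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        size_t idx = (size_t)s[i] * STRIDE;   /* value -> index          */
        acc ^= T[idx];                        /* index -> memory address */
    }
    return acc;
}
```

Every load from `T` uses an address computed from a byte of `s`, which is exactly the value-to-address transformation that tracking mode reports.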

This program is vulnerable to side-channel attacks such as Flush+Reload [1]. The attacking program may mmap the shared library, flush the cache lines containing T[], wait for the victim to call the foo function, and measure the time to reload the cache lines to find out which lines are accessed by foo. Assuming that the victim's machine has 64-byte cache lines, the attacker can recover the secret completely: each access of T[s[i]\*64] will be on a different cache line, so each byte of the secret can be computed as $(l - T)/64$, where $l$ is the address of an accessed line. The offset of T can also be found trivially because it is just a symbol in a shared library. In our example, T has an offset of 0x2020.

# 4.2 Pinpointing the leakage

Clueless's aggregating mode can be used to characterise the micro benchmark program. Fig. 7 shows how its $|A_i|$ and $|L_i|$ change. With Clueless in tracking mode, we can pinpoint which increase of $|L_i|$ results in the tracked secrets being leaked.

Figure 7: $|A_i|$ and $|L_i|$ of the micro benchmark program.

| Line | (a) Propagation                    | (b) Instruction    |
|------|------------------------------------|--------------------|
| 7    | [ %rcx { 0 } ] = 0x55baae46d8e0 -> | mov (%rcx),%rax    |
| 6    | %rdx { 0 } -> %rcx                 | mov %rdx,%rcx      |
| 5    | %rdx %rax { 0 } -> %rdx            | add %rax,%rdx      |
| 4    | %rax { 0 } -> %rax                 | cltq               |
| 3    | %rax { 0 } -> %rax                 | shl \$0x6,%eax     |
| 2    | %rax { 0 } -> %rax                 | movsbl %al,%eax    |
| 1    | 0x7fff41801683 { 0 } -> %rax       | movzbl (%rax),%eax |

Figure 8: Tracing the micro benchmark.

The annotations in Fig. 7 reveal the content of the leaked secrets at the point of their leakage. For example, 0x38e0 is leaked when $i = 20$, i.e., &T[s[0]\*64] evaluates to 0x38e0 (after subtracting the image load offset) and is used as a memory address in a load. To recover s[0], we compute (0x38e0 - 0x2020)/64 = 0x63, which is the ASCII code for 'c'.
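The recovery arithmetic can be checked with a few lines of C. The function name is hypothetical; the 64-byte line size, the table offset 0x2020, and the leaked address 0x38e0 are the values given in the text.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_SIZE 64   /* cache line size assumed in the text */

/* Recover one secret byte from the image-relative address of the accessed line. */
uint8_t recover_byte(uintptr_t line_addr, uintptr_t table_offset) {
    assert(line_addr >= table_offset);
    uintptr_t idx = (line_addr - table_offset) / LINE_SIZE;
    assert(idx <= UINT8_MAX);
    return (uint8_t)idx;
}
```

With the values above, `recover_byte(0x38e0, 0x2020)` evaluates to (0x18c0)/64 = 99 = 0x63, the character 'c'.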

# 4.3 Tracing the transformation

To further understand the leakage, Clueless gives a trace of the propagation that causes it. Fig. 8a shows the part of the propagation trace that causes s[0] to leak, and Fig. 8b shows the corresponding instructions.

The propagation trace has the following syntax:

- { NUM, ... } is a taint set, e.g., %rax { 0, 3 } means register rax has taint set {t<sub>0</sub>, t<sub>3</sub>}.
- -> represents data flow and taint propagation, e.g., %rdx { 0 } -> %rcx means that data flows from register rdx to register rcx, and taint t<sub>0</sub> propagates to register rcx.
- [REG { NUM, ... } ] represents a register being used as a memory address, followed by an equal sign and the effective address, e.g., [ %rcx { 0 } ] = 0x55baae46d8e0 -> means that register rcx, whose taint set is {t<sub>0</sub>}, is used as a memory address in a load.

By analysing the trace of the micro benchmark, we find that the instruction at line 1 loads s[0] from address &s[0], which is 0x7fff41801683, into register rax with taint set { $t_0$ }. The following three instructions propagate { $t_0$ } from register rax to itself. Then the instruction at line 5 propagates { $t_0$ } to register rdx by adding register rax to it. { $t_0$ } is further propagated to register rcx, which is eventually used as the memory address 0x55baae46d8e0.

# 5 CASE STUDY: AES

We have seen how the micro benchmark in Section 4 could leak secrets due to its value-address transformations. Some implementations of Advanced Encryption Standard (AES) [27] are susceptible to cache side-channel attacks for the same reason. These implementations often depend on large tables to speed up the encryption process [28]. If encryption keys are transformed into indices of large tables for lookups, attackers may partially or completely recover the keys by observing the corresponding cache state changes. Numerous attacks on AES exploiting this class of vulnerability have been demonstrated in the past [2–5]. Different implementations of AES have also been proposed to protect against such attacks while retaining or improving the speed of encryption [3, 28, 29].

In this case study, we use Clueless in tracking mode to analyse three different implementations of AES present in OpenSSL 3.0.3 — T-table, Vector Permutation AES (VPAES), and Intel Advanced Encryption Standard New Instructions (AES-NI). The expanded encryption key is set as the watchpoint in order to observe if it is transformed into memory addresses.

# 5.1 T-table

The T-table implementation of AES in OpenSSL uses 9 T-tables, i.e., pre-computed lookup tables, with 8 of them being 8kiB each and 1 of them being 2kiB. The encryption key is first expanded to round keys. The first round key is combined with the 16-byte plaintext using xor to form the initial state vector. The elements in the state vector are then used as indices of the T-tables to look up values which are combined with the next round key to form the next state vector. This implementation could be easily broken using Prime+Probe, with the 128-bit encryption key fully recovered after only 300 encryptions [3].
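Why a table lookup indexed by a state byte exposes key bits can be sketched as follows. The sketch is independent of OpenSSL's actual table layout: it assumes 4-byte table entries, 64-byte cache lines, a line-aligned table, and an attacker who knows the plaintext byte and observes which line was touched. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRY_SIZE 4    /* assumed: 32-bit table entries */
#define LINE_SIZE  64   /* assumed cache line size       */

/* First-round table index: plaintext byte XOR first round-key byte. */
uint8_t first_round_index(uint8_t pt, uint8_t rk) {
    return (uint8_t)(pt ^ rk);
}

/* Cache line, within a line-aligned table, touched by a lookup at `idx`. */
unsigned line_of_index(uint8_t idx) {
    return ((unsigned)idx * ENTRY_SIZE) / LINE_SIZE;
}

/* Key bits revealed by one observed line when the plaintext byte is known. */
uint8_t key_high_bits(uint8_t pt, unsigned line) {
    assert(line < 256u * ENTRY_SIZE / LINE_SIZE);
    uint8_t idx_hi = (uint8_t)(line * (LINE_SIZE / ENTRY_SIZE));   /* idx & 0xF0 */
    return (uint8_t)((idx_hi ^ pt) & 0xF0);
}
```

Under these assumptions 16 entries share a cache line, so one observation reveals the upper four bits of the corresponding round-key byte; the lower bits require the additional analysis described in the attacks cited above.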

Clueless detects the potential leak and marks all the bytes of the key as leak points. In addition, Clueless gives a propagation trace that shows how the key is transformed into memory addresses. Fig. 9a shows the propagation trace of the first round of the encryption (all rounds are similar) while Fig. 9b shows the corresponding instructions. The trace shows that the first round key is leaked as follows:

| <pre>0x7fff609ec660 { 1 } -&gt; %xmm5<br/>%xmm2 %xmm5 { 1 } -&gt; %xmm2<br/>%xmm0 %xmm2 { 1 } -&gt; %xmm0<br/>%xmm1 %xmm0 { 1 } -&gt; %xmm1<br/>%xmm1 { 1 } -&gt; %xmm1<br/>%xmm0 { 1 } -&gt; %xmm0<br/>%xmm3 %xmm1 { 1 } -&gt; %xmm5<br/>%xmm0 %xmm1 { 1 } -&gt; %xmm0<br/>%xmm3 %xmm1 { 1 } -&gt; %xmm3<br/>%xmm4 %xmm5 { 1 } -&gt; %xmm4<br/>%xmm4 %xmm5 { 1 } -&gt; %xmm4<br/>%xmm2 %xmm3 { 1 } -&gt; %xmm2<br/>%xmm2 %xmm0 { 1 } -&gt; %xmm2<br/>%xmm2 %xmm0 { 1 } -&gt; %xmm2<br/>%xmm2 %xmm0 { 1 } -&gt; %xmm3<br/></pre> |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| %xmm2 %xmm2 { 1 } -> %xmm2 |
| %xmm0 %xmm2 { 1 } -> %xmm0 |
| %xmm1 %xmm0 { 1 } -> %xmm0 |
| %xmm1 { 1 } -> %xmm1 |
| %xmm0 { 1 } -> %xmm0 |
| %xmm0 %xmm1 { 1 } -> %xmm5 |
| %xmm0 %xmm1 { 1 } -> %xmm0 |
| %xmm3 %xmm1 { 1 } -> %xmm3 |
| %xmm3 %xmm5 { 1 } -> %xmm3 |
| %xmm4 %xmm4 { 1 } -> %xmm4 |
| %xmm2 %xmm3 { 1 } -> %xmm2 |
| %xmm2 %xmm0 { 1 } -> %xmm2 |
| %xmm3 %xmm4 { 1 } -> %xmm3 |
|  |
| %xmm3 %xmm1 { 1 2 10 } -> %xmm3 |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| %xmm4 %xmm2 { 1 2 10 } -> %xmm4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| %xmm4 %xmm5 { 1 2 11 } -> %xmm4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| %xmm0 %xmm3 { 1 2 10 } -> %xmm0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| %xmm0 %xmm4 { 1 2 11 } -> %xmm0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| %xmm0 { 1 2 11 } -> %xmm0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |

Figure 10: Tracing the VPAES implementation.

- (1) Instructions from line 1 to 4 load the first round key and xor it with the plaintext in registers rax, rbx, rcx, and rdx to form the initial state vector. Different taints are assigned to the watched memory locations and propagated to the destination registers.
- (2) Instructions from line 5 to 8 extract bytes from the initial state vector and copy them to a different set of registers (r10, r11, r12, and r8). Taints propagate from the source registers to the destination registers.
- (3) Instructions from line 13 to 16 use the extracted bytes to perform lookups in one of the T-tables. The tainted registers are used to form effective addresses. At this point, Clueless marks the memory locations of the first round key as leak points.
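The three steps above can be replayed in a toy model of taint propagation. The following Python sketch is illustrative only and is not the Clueless implementation: the class `TaintTracker`, its methods, and the constant `TABLE_ADDR` are hypothetical names, and the model tracks whole registers rather than individual bytes.

```python
from dataclasses import dataclass, field


@dataclass
class TaintTracker:
    """Toy model of tracking mode; every name here is hypothetical."""
    reg_taint: dict = field(default_factory=dict)  # register -> set of taint ids
    mem_taint: dict = field(default_factory=dict)  # watched address -> taint id
    leaks: set = field(default_factory=set)        # watched addresses that leaked

    def watch(self, addr, taint_id):
        self.mem_taint[addr] = taint_id

    def load(self, dst, addr, addr_regs=(), keep_dst=False):
        # A tainted register in the effective address leaks its origins.
        for reg in addr_regs:
            ids = self.reg_taint.get(reg, set())
            self.leaks |= {a for a, t in self.mem_taint.items() if t in ids}
        # xor-style loads keep the destination's taint; plain loads replace it.
        taints = set(self.reg_taint.get(dst, set())) if keep_dst else set()
        if addr in self.mem_taint:
            taints.add(self.mem_taint[addr])
        self.reg_taint[dst] = taints

    def move(self, dst, *srcs):
        # Register-to-register operations propagate the union of source taints.
        self.reg_taint[dst] = set().union(
            *(self.reg_taint.get(s, set()) for s in srcs)
        )


tracker = TaintTracker()
key_addrs = [0x7FFC4AFE86B0 + 4 * i for i in range(4)]  # addresses from Fig. 9
for taint_id, addr in enumerate(key_addrs, start=1):
    tracker.watch(addr, taint_id)

# Step (1): xor the round key words into rax, rbx, rcx, rdx.
for reg, addr in zip(["rax", "rbx", "rcx", "rdx"], key_addrs):
    tracker.load(reg, addr, addr_regs=["r15"], keep_dst=True)

# Step (2): copy extracted bytes to r10, r11, r12, r8.
for dst, src in [("r10", "rax"), ("r11", "rbx"), ("r12", "rcx"), ("r8", "rdx")]:
    tracker.move(dst, src)
leaks_before_lookup = set(tracker.leaks)

# Step (3): table lookups indexed by the tainted registers.
TABLE_ADDR = 0x555E939AD4C0  # placeholder for the untainted table location
for reg in ["r10", "r11", "r12", "r8"]:
    tracker.load(reg, TABLE_ADDR, addr_regs=["r14", reg])

print(sorted(hex(a) for a in tracker.leaks))
```

In this model no leak is recorded until step (3), when all four watched key locations are marked at once, which mirrors the order of events in the trace.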

# 5.2 VPAES

VPAES is a technique for accelerating AES using vector permute instructions. It avoids key-dependent memory references and is therefore immune to known cache-timing attacks [28]. VPAES requires hardware support for vector permute instructions.

Clueless in tracking mode shows that no part of the key is leaked as memory addresses in OpenSSL's VPAES implementation. The reported propagation trace in Fig. 10 indicates that the key is loaded (e.g., by instruction at 86e1) and combined in different rounds, but no part of the key is used to reference memory.
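To illustrate why vector permute instructions avoid data-dependent addresses, the sketch below emulates the semantics of a 16-byte permute (as in the x86 `pshufb` instruction) and uses it to evaluate a byte-wise linear map from two 16-entry nibble tables. The map `f` and the tables are hypothetical examples, not the tables used by OpenSSL. The Python emulation itself indexes memory; it only models the instruction, whose table operand resides in a vector register on real hardware.

```python
def pshufb(table: bytes, idx: bytes) -> bytes:
    """Emulate a 16-byte permute: out[i] = table[idx[i] & 0x0F], or 0 if bit 7 is set."""
    if len(table) != 16 or len(idx) != 16:
        raise ValueError("both operands must be 16 bytes")
    return bytes(0 if i & 0x80 else table[i & 0x0F] for i in idx)


def f(b: int) -> int:
    """Hypothetical GF(2)-linear byte map used only for this example."""
    return b ^ ((b << 1) & 0xFF)


# Because f is linear, f(b) = f(low nibble) ^ f(high nibble << 4).
LO_TABLE = bytes(f(n) for n in range(16))
HI_TABLE = bytes(f(n << 4) for n in range(16))


def nibble_lookup(state: bytes) -> bytes:
    """Apply f to every byte of a 16-byte state using two in-register lookups."""
    lo = bytes(b & 0x0F for b in state)
    hi = bytes(b >> 4 for b in state)
    return bytes(x ^ y for x, y in zip(pshufb(LO_TABLE, lo), pshufb(HI_TABLE, hi)))


state = bytes(range(0, 256, 17))  # 0x00, 0x11, ..., 0xFF
result = nibble_lookup(state)
```

Since the state bytes only ever select lanes within a register, a tracker like Clueless sees register-to-register propagation and no tainted effective address.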

# 5.3 AES-NI

Intel introduced the AES-NI instruction set in 2010 to provide direct hardware support for AES. These instructions run in data-independent time and do not use tables [29]. A single AES-NI instruction performs an entire encryption round. AES implementations that use AES-NI properly should be immune to cache side-channel attacks, as these instructions involve no cache accesses.

| Line | (a) Propagation                    | (b) Instruction            |
|------|------------------------------------|----------------------------|
| 1    | 0x7ffc4afe86b0 { 1 } -> %rax       | xor (%r15),%eax            |
| 2    | 0x7ffc4afe86b4 { 2 } -> %rbx       | xor 0x4(%r15),%ebx         |
| 3    | 0x7ffc4afe86b8 { 3 } -> %rcx       | xor 0x8(%r15),%ecx         |
| 4    | 0x7ffc4afe86bc { 4 } -> %rdx       | xor 0xc(%r15),%edx         |
| 5    | %rax { 1 } -> %r10                 | movzbl %al,%r10d           |
| 6    | %rbx { 2 } -> %r11                 | movzbl %bl,%r11d           |
| 7    | %rcx { 3 } -> %r12                 | movzbl %cl,%r12d           |
| 8    | %rdx { 4 } -> %r8                  | movzbl %dl,%r8d            |
| 9    | %rbx { 2 } -> %rsi                 | movzbl %bh,%esi            |
| 10   | %rcx { 3 } -> %rdi                 | movzbl %ch,%edi            |
| 11   | %rcx { 3 } -> %rcx                 | shr \$0x10,%ecx            |
| 12   | %rdx { 4 } -> %rbp                 | movzbl %dh,%ebp            |
| 13   | [ %r10 { 1 } ] = 0x555e939ad533 -> | movzbl (%r14,%r10,1),%r10d |
| 14   | [ %r11 { 2 } ] = 0x555e939ad527 -> | movzbl (%r14,%r11,1),%r11d |
| 15   | [ %r12 { 3 } ] = 0x555e939ad4c0 -> | movzbl (%r14,%r12,1),%r12d |
| 16   | [ %r8 { 4 } ] = 0x555e939ad4c0 ->  | movzbl (%r14,%r8,1),%r8d   |

Figure 9: Tracing the T-table implementation.

```
0x7ffd85426fd0 { 1 } -> %xmm0
0x7ffd85426fe0 { 2 } -> %xmm1
%xmm2 %xmm0 { 1 } -> %xmm2
%xmm2 %xmm1 { 1 2 } -> %xmm2
0x7ffd85426ff0 { 3 } -> %xmm1
%xmm2 %xmm1 { 1 2 3 } -> %xmm1
...
%xmm2 %xmm1 { 1 2 3 ... 10 } -> %xmm2
0x7ffd85427000 { 4 } -> %xmm1
...
%xmm2 %xmm1 { 1 2 3 ... 10 } -> %xmm2
0x7ffd85427070 { 11 } -> %xmm1
%xmm2 %xmm1 { 1 2 ... 11 } -> %xmm2
%xmm0 { 1 } -> %xmm1
%xmm1 { 1 2 ... 11 } -> 0x55feae1dd130
```

Figure 11: Tracing the AES-NI implementation.

The AES implementation in OpenSSL that uses AES-NI has not been found to transform the key into memory addresses. The propagation trace in Fig. 11 shows that no part of the key is used to reference memory.
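A propagation trace can also be checked mechanically for address uses. The sketch below assumes, based only on the lines shown in Figs. 9 to 11, that a leak appears as a line of the form `[ %reg { taints } ] = address ->`; this grammar is inferred from the figures rather than taken from a format specification, and `find_leaks` is a hypothetical helper.

```python
import re

# Inferred form of a leak line: "[ %r10 { 1 } ] = 0x555e939ad533 ->"
LEAK_LINE = re.compile(
    r"^\[\s*(%\w+)\s*\{([\d\s]*)\}\s*\]\s*=\s*(0x[0-9a-fA-F]+)\s*->"
)


def find_leaks(trace_lines):
    """Return (register, taint ids, effective address) for each address use."""
    leaks = []
    for line in trace_lines:
        match = LEAK_LINE.match(line.strip())
        if match:
            reg, taints, addr = match.groups()
            leaks.append((reg, [int(t) for t in taints.split()], int(addr, 16)))
    return leaks


# Excerpts copied from the T-table and AES-NI traces.
T_TABLE = [
    "0x7ffc4afe86b0 { 1 } -> %rax",
    "%rax { 1 } -> %r10",
    "[ %r10 { 1 } ] = 0x555e939ad533 ->",
]
AES_NI = [
    "0x7ffd85426fd0 { 1 } -> %xmm0",
    "%xmm2 %xmm0 { 1 } -> %xmm2",
    "%xmm1 { 1 2 ... 11 } -> 0x55feae1dd130",
]

print(find_leaks(T_TABLE))
print(find_leaks(AES_NI))
```

The last AES-NI line stores a tainted value to memory, but the address itself is not derived from tainted data, so it is not reported.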

# 6 CONCLUSION AND FUTURE WORK

We have presented Clueless: a tool characterising values leaking as addresses. Using Clueless in aggregating mode, we have characterised, for the first time, the amount of data values that are transformed into memory addresses in the SPEC 2006 benchmark programs. In some benchmark programs, more than one third of the accessed memory is used to reference memory. Clueless in tracking mode has provided traces of how secrets propagate and leak in a micro benchmark and in the AES implementations in OpenSSL. The T-table implementation of AES exhibits potential vulnerabilities to cache side-channel attacks, while the VPAES and AES-NI implementations show no such leaks.

The "leaks" reported by Clueless are to be further studied. We hope to distinguish the value-to-address transformations that actually risk leaking sensitive information from false positives (e.g., secrets that transform only into addresses on the same cache line). We are interested in applying similar dynamic information flow tracking techniques to hardware models to mitigate cache side-channel attacks such as Spectre. The high frequency of data-to-address transformations in some programs also indicates optimisation opportunities in cache systems: data that will transform into memory addresses may be associated with the data that the resulting addresses point to. This may be a focus of our future work.
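One possible filter for the same-cache-line false positives mentioned above is sketched below. It assumes a 64-byte cache line and an attacker who observes only which line is touched; both are assumptions of this example, and the function names and base addresses are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumption: common line size on x86 processors


def cache_lines_touched(base, indices, elem_size=1, line_bytes=CACHE_LINE_BYTES):
    """Set of cache-line numbers reachable by base + index * elem_size."""
    return {(base + i * elem_size) // line_bytes for i in indices}


def distinguishable_by_line(base, indices, elem_size=1, line_bytes=CACHE_LINE_BYTES):
    """True if different secret indices can land on different cache lines."""
    return len(cache_lines_touched(base, indices, elem_size, line_bytes)) > 1


# A 256-entry byte table spans four lines, so the index is partly observable.
large_table = distinguishable_by_line(0x1000, range(256))
# A 16-entry byte table aligned to a line start stays within one line.
small_aligned = distinguishable_by_line(0x1000, range(16))
# The same table at an unaligned base straddles a line boundary.
small_unaligned = distinguishable_by_line(0x1038, range(16))

print(large_table, small_aligned, small_unaligned)
```

Under these assumptions, a reported leak whose reachable addresses all fall on one line would be a candidate false positive, although channels with finer granularity than a cache line would invalidate that conclusion.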

# ACKNOWLEDGMENTS

This work was supported by Microsoft Research through its EMEA PhD Scholarship Programme grant no. 2021-020, the Swedish Research Council (VR) grant 2018-05254, the VINNOVA grant 2021-02422, the SSF grant FUS21-0067, and Uppsala University funds for Cybersecurity.

# REFERENCES

- [1] Y. Yarom and K. Falkner, "Flush+Reload: A high resolution, low noise, L3 cache side-channel attack," in 23rd USENIX Security Symposium (USENIX Security 14), 2014, pp. 719–732.
- [2] O. Aciçmez, W. Schindler, and Ç. K. Koç, "Cache based remote timing attack on the AES," in *Cryptographers' track at the RSA conference*. Springer, 2007, pp. 271–286.
- [3] D. A. Osvik, A. Shamir, and E. Tromer, "Cache attacks and countermeasures: the case of AES," in *Cryptographers' track at the RSA conference*. Springer, 2006, pp. 1–20.
- [4] J. Bonneau and I. Mironov, "Cache-collision timing attacks against AES," in International Workshop on Cryptographic Hardware and Embedded Systems. Springer, 2006, pp. 201–215.
- [5] D. Gullasch, E. Bangerter, and S. Krenn, "Cache games-bringing access-based cache attacks on AES to practice," in 2011 IEEE Symposium on Security and Privacy. IEEE, 2011, pp. 490–505.
- [6] F. Liu, Y. Yarom, Q. Ge, G. Heiser, and R. B. Lee, "Last-level cache side-channel attacks are practical," in 2015 IEEE Symposium on Security and Privacy, 2015, pp. 605–622.
- [7] P. Kocher, J. Horn, A. Fogh, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Mangard, T. Prescher, M. Schwarz, and Y. Yarom, "Spectre attacks: Exploiting speculative execution," in SP, May 2019, pp. 1–19.
- [8] M. Lipp, M. Schwarz, D. Gruss, T. Prescher, W. Haas, A. Fogh, J. Horn, S. Mangard, P. Kocher, D. Genkin et al., "Meltdown: Reading kernel memory from user space," in 27th USENIX Security Symposium (USENIX Security 18), 2018, pp. 973–990.
- [9] A. Bhattacharyya, A. Sandulescu, M. Neugschwandtner, A. Sorniotti, B. Falsafi, M. Payer, and A. Kurmus, "SMoTherSpectre: Exploiting speculative execution through port contention," in *Proceedings of the ACM SIGSAC Conference on Computer and Communications Security*. Association for Computing Machinery, Nov. 2019, pp. 785–800. [Online]. Available: https://doi.org/10.1145/3319535.3363194
- [10] V. Kiriansky and C. Waldspurger, "Speculative buffer overflows: Attacks and defenses," arXiv:1807.03757 [cs], Jul. 2018. [Online]. Available: http://arxiv.org/abs/1807.03757
- [11] jannh@google.com, "Issue 1528: speculative execution, variant 4: speculative store bypass - project-zero," Feb. 2018. [Online]. Available: https://bugs.chromium.org/p/project-zero/issues/detail?id=1528

- [12] E. M. Koruyeh, K. N. Khasawneh, C. Song, and N. Abu-Ghazaleh, "Spectre returns! speculation attacks using the return stack buffer," in 12th USENIX Workshop on Offensive Technologies (WOOT 18), 2018.
- [13] G. Maisuradze and C. Rossow, "ret2spec: Speculative execution using return stack buffers," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018, pp. 2109–2122.
- [14] P. Aimoniotis, C. Sakalis, M. Själander, and S. Kaxiras, "Reorder buffer contention: A forward speculative interference attack for speculation invariant instructions," vol. 20, pp. 162–165, Jul. 2021.
- [15] M. Yan, J. Choi, D. Skarlatos, A. Morrison, C. Fletcher, and J. Torrellas, "InvisiSpec: Making speculative execution invisible in the cache hierarchy," in *Proceedings of the IEEE/ACM International Symposium on Microarchitecture*, Oct. 2018, pp. 428–441.
- [16] S. Ainsworth, "GhostMinion: A strictness-ordered cache system for spectre mitigation," in *Proceedings of the IEEE/ACM International Symposium on Microarchitecture*. Association for Computing Machinery, Oct. 2021, pp. 592–606. [Online]. Available: https://doi.org/10.1145/3466752.3480074
- [17] C. Sakalis, S. Kaxiras, A. Ros, A. Jimborean, and M. Själander, "Efficient invisible speculative execution through selective delay and value prediction," in *Proceedings of the International Symposium on Computer Architecture*, Jun. 2019, pp. 723–735.
- [18] J. Yu, M. Yan, A. Khyzha, A. Morrison, J. Torrellas, and C. W. Fletcher, "Speculative taint tracking (STT): A comprehensive protection for speculatively accessed data," in *Proceedings of the IEEE/ACM International Symposium on Microarchitecture*. Association for Computing Machinery, Oct. 2019, pp. 954–968. [Online]. Available: https://doi.org/10.1145/3352460.3358274
- [19] D. E. Denning and P. J. Denning, "Certification of programs for secure information flow," *Commun. ACM*, vol. 20, no. 7, pp. 504–513, Jul. 1977. [Online]. Available: https://doi.org/10.1145/359636.359712

- [20] G. E. Suh, J. W. Lee, D. Zhang, and S. Devadas, "Secure program execution via dynamic information flow tracking," ACM Sigplan Notices, vol. 39, no. 11, pp. 85–96, 2004.
- [21] J. Clause, W. Li, and A. Orso, "Dytan: a generic dynamic taint analysis framework," in Proceedings of the 2007 international symposium on Software testing and analysis, 2007, pp. 196–206.
- [22] L. C. Lam and T.-c. Chiueh, "A general dynamic information flow tracking framework for security applications," in 2006 22nd Annual Computer Security Applications Conference (ACSAC'06). IEEE, 2006, pp. 463–472.
- [23] V. Haldar, D. Chandra, and M. Franz, "Dynamic taint propagation for Java," in 21st Annual Computer Security Applications Conference (ACSAC'05). IEEE, 2005, pp. 9-pp.
- [24] J. Kong, C. C. Zou, and H. Zhou, "Improving software security via runtime instruction-level taint checking," in *Proceedings of the 1st workshop on Architectural and system support for improving software dependability*, 2006, pp. 18–24.
- [25] F. Qin, C. Wang, Z. Li, H.-s. Kim, Y. Zhou, and Y. Wu, "LIFT: A low-overhead practical information flow tracking system for detecting security attacks," in 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06). IEEE, 2006, pp. 135–148.
- [26] "Pin - a dynamic binary instrumentation tool." [Online]. Available: https://www.intel.com/content/www/us/en/developer/articles/tool/pin-a-dynamic-binary-instrumentation-tool.html
- [27] M. Dworkin, E. Barker, J. Nechvatal, J. Foti, L. Bassham, E. Roback, and J. Dray, "Advanced encryption standard (AES)," Nov. 2001.
- [28] M. Hamburg, "Accelerating AES with vector permute instructions," in International Workshop on Cryptographic Hardware and Embedded Systems. Springer, 2009, pp. 18–32.
- [29] S. Gueron, "Intel advanced encryption standard (AES) new instructions set," 2010.