FPGA Optimal Implementation of PRINCE against Power Analysis Attacks

Yi Zou

College of Computer Science and Technology, Hengyang Normal University
Hunan Provincial Key Laboratory of Intelligent Information Processing and Application
Hengyang, 421002, China
E-mail: 51674074@qq.com

Lang Li

College of Computer Science and Technology, Hengyang Normal University
Hengyang, 421002, China
E-mail: lilang911@126.com

The algorithm PRINCE proposed in ASIACRYPT 2012 is a lightweight cipher, currently suitable for the implementation on RFID Smart Card of the IoT. It's studied to achieve the optimal implementation of PRINCE encryption algorithm, and to add a fixed random mask to enhance the anti power attacking ability of PRINCE. In this paper, we constructed the same operations into a module called PrinceRound, and used counter to control the PrinceRound module, while the same round operation runs repeatedly. As a result, it can save registers and effectively reduce the amount of computation. Furthermore, we used a special method by adding the random mask to S-box. The experimental results show that the optimized PRINCE occupies area 9.8% fewer than original PRINCE on FPGA and the encryption with a fixed random mask can run correctly. Last but not least, our research is the first one about the implementation of PRINCE algorithm on FPGA, and can serve as a reference for further application of IOT encryption.
1. Introduction

With the rapid development of the Internet of Things (IoT), great changes have taken place in all aspects of people's lives in recent years. Many lightweight block ciphers have been proposed to provide cipher blocks for resource constrained devices such as RFID tags. Among the best studied ciphers are PRESENT, KATAN, LED and PRINT Cipher [1-4].

PRINCE is a novel lightweight block cipher which was proposed by Borgho in 2012 [5]. It is optimized with respect to latency when implemented in hardware. This is the first lightweight block cipher which gives priority to latency with block length 64 bits and the key 128 bits. There are few research papers about PRINCE cryptographic algorithms but there is no research paper about optimization of the PRINCE cipher algorithm and resisting power analysis.

It is very important to reduce area resources when the cipher is implemented on resource constrained chips. At the same time, in order to obtain the area optimization, the anti attack ability of the cipher is correspondingly declining. So it is a very meaningful to find out how to make the lightweight cipher more secure, occupy less resource and run faster.

This paper is organized as follows: Section 2 briefly describes the PRINCE block cipher; Section 3 elaborates on area optimization of PRINCE; in Section 4, resisting power analysis for PRINCE encryption algorithm is proposed; Section 5 discusses the experimental results, and finally the conclusion is presented in the last section.

2. Description of PRINCE

PRINCE is a 64-bit block cipher with a 128-bit key. The key is split into two parts of 64 bits each, \( k = k_0 || k_1 \), and extended to 192 bits by the mapping \((k_0 || k_1) \rightarrow (k_0 || k_0' || k_1) := (k_0 || (k_0 \gg \geq 1) \oplus (k_0 \gg \geq 63)|| k_1)\). During the encryption, the first two subkeys \( k_0 \) and \( k_0' \) are used as whitening keys, while the third subkey \( k_1 \) is the key for a 12-round block cipher referred to as PRINCEcore. The high level structure of PRINCE is demonstrated in Fig. 1.

The 12-round process of PRINCEcore is depicted in Fig. 2. A typical round of PRINCEcore consists of an S-box layer, a linear layer and an addition layer. The intermediate computation result called state, is usually represented by a 64-bit vector or a 16-nibble vector.

![Figure 1: The structure of PRINCE](image1)

![Figure 2: PRINCEcore](image2)

S-box layer: The cipher uses a 4-bit S-box which is given in Table 1. We denote the S-box and its inverse by \( S \) and \( S^{-1} \) respectively.

<table>
<thead>
<tr>
<th>X</th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>A</th>
<th>B</th>
<th>C</th>
<th>D</th>
<th>E</th>
<th>F</th>
</tr>
</thead>
<tbody>
<tr>
<td>S(X)</td>
<td>B</td>
<td>F</td>
<td>3</td>
<td>2</td>
<td>A</td>
<td>C</td>
<td>9</td>
<td>1</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>0</td>
<td>E</td>
<td>5</td>
<td>D</td>
</tr>
</tbody>
</table>

Table 1: S-box of PRINCE
3. The Area Optimization and Verification of PRINCE

In the area optimization method, the PRINCE algorithm mainly has five modules: MatrixMutil, ShiftRow, SubCell, PrinceRound, and the main program Prince. Five modules are described as follows:

- **MatrixMutil**: the column vector of matrix multiplication (the intermediate state is the column vector);
- **ShiftRow**: including the row shift and reverse row shift, through the control signal to choose the corresponding results as output;
- **SubCell**: including S-box replacement and reverse S-box replacement, through the control signal to choose the corresponding results as output;
- **PrinceRound**: complete a round wheel transformation of Prince algorithm
- **The main program Prince**: The module which controls all modules.

### 3.1 The Optimization of MatrixMutil and Experiment

In MatrixMutil, we found that according to the characteristics of the matrix, the element on each line is a corresponding element of the corresponding column vector, the result is obtained by adding these elements. Using this method, we only need to do 64*3 XOR on MatrixMutil, effectively reducing the computational complexity. The Verilog HDL code is given as follows.

```verilog
module MatrixMutil(res,state);
    input [0:63] state;
    output[0:63] res;
    assign res={
        ...........
        s[48]^s[56]^s[60],s[49]^s[53]^s[61],s[50]^s[54]^s[58],s[55]^s[59]^s[63]
    };
endmodule
```

The simulation results of MatrixMutil by Modelsim 6.1f are as follows.

![Simulation results of MatrixMutil module](image)

**Figure 3**: Simulation results of MatrixMutil module

### 3.2 Optimization of Repeated Round Modules and Experiment

The same operation module of the PRINCE encryption is placed in the round operation called PrinceRound so as to realize the repeated calling. It can effectively reduce the area when the encryption is running on hardware. Thus, it is feasible to apply encryption into sensors and other nodes.

In PrinceRound module, the operation order of the corresponding module was controlled by the control signal, which can implement encryption/decryption operations within the module, and it can reduce the number of registers. The Verilog HDL code of main programming is described as follows.

```verilog
assign tag=(r<=5)?1:0;
```
assign t[0]=tag? state:tem[1], ......t[3]=tag?tem[2]:state,
result=tag?tem[3]:tem[0];
MatrixMutil MM(tem[0],t[0]);
ShiftRow SR(tem[1],t[1],tag);
SubCell SC(tem[3],t[3],tag);

Simulation results for the PRINCE round module are shown as Fig. 4.

Figure 4: Simulation Results for the PRINCE Round Module

3.3 Main Program of Prince

The realization of the Prince master control program is to use the counter which controls
PrinceRound module to run transformation round 10 times. The code is shown as follows.

```verilog
always @(posedge clk) begin
  cnt <= (cnt^10)? cnt+1: cnt;
  res <= ready ? ( (cnt ?t:SC ) :res;
  ready <= (cnt^10)? 1:0;
end
PrinceRound PR(t_res,res,key[64:127],cnt);
```

Owing to the method of continuous assignment and the clock signal controlled by the
calculator to update, the encryption requires only 12 clock cycles.

<table>
<thead>
<tr>
<th>plaintext</th>
<th>k0</th>
<th>k1</th>
<th>ciphertext</th>
</tr>
</thead>
<tbody>
<tr>
<td>0000000000000000</td>
<td>0000000000000000</td>
<td>0000000000000000</td>
<td>818665aa0d02dfa</td>
</tr>
<tr>
<td>mmmmmmmmmmmmmmm</td>
<td>0000000000000000</td>
<td>0000000000000000</td>
<td>604ae6ca03c20ada</td>
</tr>
<tr>
<td>0000000000000000</td>
<td>mmmmmmmmmmmmmmm</td>
<td>0000000000000000</td>
<td>9fb51935fc3df524</td>
</tr>
<tr>
<td>0000000000000000</td>
<td>0000000000000000</td>
<td>mmmmmmmmmmmmmmm</td>
<td>78a54cbe737bb7ef</td>
</tr>
<tr>
<td>0123456789abcdef</td>
<td>0000000000000000</td>
<td>fedcba9876543210</td>
<td>ae25ad3ca8fa9ccf</td>
</tr>
</tbody>
</table>

Table 2: Test Vector of PRINCE

Figure 5: Simulation Results of Main Program Prince

The test vector in Table 2 is the vector used in Paper [5], and we also used the same test
values on the fifth group in Paper [5]. The simulation results can be seen in Fig. 5, the plaintext
for input is 012345789abcdef, the key for input is 0000000000000000 and fedcba9876543210,
the ciphertext is ae25ad3ca8fa9ccf. So, the method of optimization is correct.

4. Algorithm of Resisting Power Analysis for PRINCE

The random mask is considered for the S-box transform and inverse S-box transform of
PRINCE encryption algorithm. Due to the special design of the PRINCE algorithm, it also
needs to do an S-box transform and inverse S-box transformation in the control section. In order to construct S-boxes with random mask, a large amount of memory is needed. So we use a special way to avoid consuming more memory. Before the S-box transform, the intermediate state is XOR with a random array once, then after S-box transform, the result XOR with a random array runs again. The modified code is suitable for completely random mask, partial random mask, and fixed random mask. The algorithm is shown as follows:

Algorithm 1: Add(state, key0); Add(state, key1); Add(state, RC[0]);
Add(state, R); SubCell(state, R); Add(state, R);
for (i = 1; i < 5; i++) {
    M_layer(state); M_layer(R);
    ShiftRow(state); ShiftRow(R); Add(state, RC[i]); Add(state, key1);
    Add(state, R); // add random mask
    SubCell(state, R);
    Add(state, R); // compensation mask
} M_layer(state); M_layer(R);
for (i = 10; i < 15; i++) {
    r_ShiftRow(state); r_ShiftRow(R);
    M_layer(state); M_layer(R);...} // the next 5 rounds

From Fig. 6, with the input of plaintext and key, the output of the PRINCE algorithm which added a random mask can verify that the random mask is correct.

Figure 6: Verification Result of PRINCE Random Mask Algorithm in VC

5. Experiment on FPGA and Analysis

5.1 Area Comparison

Among the above method, the optimized PRINCE algorithm, un-optimized PRINCE algorithm and the PRINCE random mask algorithm are run on FPGA, synthesized by ISE13.2 and downloaded on Xilinx Virtex-5 LX50T FPGA. The result of the un-optimized PRINCE algorithm on FPGA is shown in Fig. 7. The result of optimized is shown in Fig. 8. The result of PRINCE random mask algorithm is shown in Fig. 9 respectively. Table 3 is the comparison of logic resource usage on FPGA.
FPGA Optimal Implementation of PRINCE Against Power Analysis Attacks

Yi Zou

5.2 Speed Comparison

Throughput is equal System clock frequency * length of block / clock of encryption and decryption. The length of cipher block is 64, and clock of encryption and decryption is 12. The clock frequency of the optimized PRINCE algorithm is 102.459, so

\[
T_{\text{throughput}} = 102.459 \times 64/12 = 546.448 (\text{Mbps})
\]

The clock frequency of un-optimized PRINCE algorithm is 109.393, so

\[
T_{\text{throughput}} = 109.393 \times 64/12 = 538.43 (\text{Mbps})
\]

The clock frequency of PRINCE random mask algorithm is 100.472, so

\[
T_{\text{throughput}} = 100.472 \times 64/12 = 535.850 (\text{Mbps})
\]

From the above results, we can see that the optimized PRINCE with less area and the encryption speed is faster, the throughput is increased by 1.5%, and it can be testified that we added a fixed value.
6. Conclusion

This paper proposed a method based on area optimization for PRINCE, and realized the algorithm of resisting power analysis for PRINCE based on Verilog HDL language and running on FPGA. The experimental results have shown that random mask added correctly, and the algorithm of resisting power analysis which is proposed in this paper added fixed random mask value. Theoretically, it has a certain ability of defending attack.

References


