Exploiting parallelism opportunities in non-parallel architectures to improve NLFSR software implementations

50 %
50 %
Information about Exploiting parallelism opportunities in non-parallel architectures to...
Technology

Published on March 14, 2014

Author: GreenLSI

Source: slideshare.net

Description

Presentation by Pedro Malagón at DCIS 2013 conference, organized by CEIT (Nov 27th, 2013)

Exploiting parallelism opportunities in non-parallel architectures to improve NLFSR software implementations Pedro Malagón Juan-Mariano de Goyeneche José M. Moya 1 / 20

Context • Remote Keyless Entry Systems (RKE) – Small communications – Two sides of communication know state – Knowing previous state/message provides no information of next state/message (ideally) 2

Global goal 3 • Automatic generation of different implementations of the same encryption algorithm • Random execution of implementations in order to introduce variability that increases resistance against Side-Channel Attacks

LFSR (I) • Linear Feedback Shift Registers • Implementation – Very simple in Hardware – One-bit at a time in Software 4

LFSR (II) • Pros: – Pseudo-random sequence – Long period: n-bits → 2n – Simple implementation • Cons: – Berlekamp-Massey algorithm • Observing 2n gives complete information of LFSR 5

NLFSR (I) • Add non linearity to improve security • Non-Linear Feedback Shift Registers 6

NLFSR (II) • Implementation – Focus on the NLF – bit LUT – Run-time computed: ANF – Automatically detection of ci values 7 { } { } ( ) ∑ − = −− − ••••= → 12 0 11010 110 ,, 1,01,0 n n i i n ii in n xxxcxxf KK

Concrete goal 8 • Goal: different implementations potentially automatic • Two completley different implementations: – ANF based and LUT based • ANF drawbacks – Too many run-time operations (boolean) • Optimization of ANF based implementations

Round processing • Feedback inputs can be available • Available processing capabilities – min (j - i, n) n-bit ALU, j-bit data, i bit – Similar to MMX in AES implementations 9 round i+1 round i+1

LLVM Passes 10 • ANF implementation • DAG building • CFG generation • Masking meta → valid bits • Instruction scheduling (maximize bits) • Loop instruction motion → Nested loops – Power of two step

Test case 11 • KeeLoq in MSP430 (16-bit) • Inputs: d0, d1, d9, d16, d20, d26, d31, k0 • Data: 32-bits

Experimental 12 • Compare 5 implementations – 3 LUT based – tb041: official PIC implementation – nlf_tb041: mask calculation – gen_tb041: official generic Microchip – 2 ANF based – bin_ops: one bit at a time – par_bin_ops: applying optimizer 16-round processing < 33 Setup output

par_bin_ops 13 • Implementation

Cycles (16 rounds) 14

Instructions (16 rounds) 15

Memory (16 rounds) 16

Conclusions 17 • Worst case – Cycles improvement: 2.45 – Code size grows in 2.27 • Automatically generated

Thank you 18 Thank you for coming Any questions?

Add a comment

Related presentations