Annual Computer Security Applications Conference (ACSAC) 2022

Full Program »

Parallel Small Polynomial Multiplication for Dilithium: A Faster Design and Implementation

The lattice-based signature scheme CRYSTALS-Dilithium is one of the two signature finalists in the third round NIST post-quantum cryptography (PQC) standardization project. For applications of low-power Internet-of-Things (IoT) devices, recent research efforts have been focusing on the performance optimization of PQC algorithms on embedded systems. In particular, performance optimization is more demanding for PQC signature algorithms that are usually significantly more time-consuming than PQC public-key encryption counterparts. For most cryptographic algorithms based on algebraic lattices including Dilithium, the fundamental and most time-consuming operation is polynomial multiplication over rings. For this computational task, number theoretic transform (NTT) is the most efficient multiplication method for NTT-friendly rings, and is now the typical technique for performing fast polynomial multiplications when implementing lattice-based PQC algorithms. The key observation of this work is that, besides multiplications of polynomials of standard forms, Dilithium involves a list of multiplications for polynomials of very small coefficients. Can we have more efficient methods for multiplying such polynomials of small coefficients? Under this motivation, we present in this work a parallel small polynomial multiplication algorithm to speed up the implementations of Dilithium. We complete both C reference implementation and ARM Neon implementation. Moreover, we conducted some speed tests in combination with Becker’s Neon NTT [4]. The results show that, in comparison with the C reference implementation of Dilithium submitted to the third round of the NIST PQC competition, our reference implementation with the proposed parallel small polynomial multiplication is faster: specifically, our Sign and Verify speed up 18% and 19% respectively for Dilithium-2 (30% and 7% for Dilithium-3, 27% and 3% for Dilithium5, respectively). As for the Arm Neon implementation, we achieved a performance improvement of about 64% in Sign and 50% in Verify for Dilithium-2 (60% and 32% for Dilithium-3) compared with the C reference implementation of Dilithium submitted to the third round of the NIST PQC competition. We aslo compared our work with the state-of-the-art Arm Neon implementation of Dilithium [4], the results show our speed of Sign is 13.4% faster for Dilithium-2 and 8.0% faster for Dilithium-3, achieving the new record of fast Dilithium implementation.

Jieyu Zheng
Fudan university

Feng He
Fudan university

Shiyu Shen
Fudan university

Chenxi Xue
Fudan university

Yunlei Zhao
Fudan university

Paper (ACM DL)

Slides

 



Powered by OpenConf®
Copyright©2002-2023 Zakon Group LLC