December 11-15, 2000

New Orleans, Louisiana

*The Chinese Remainder Theorem and its Application in a High-Speed RSA Crypto-Chip*

**Johann GroßschädlIAIK, Graz University of TechnologyAustria**

The performance of RSA hardware is primarily determined by an efficient implementation of the long integer modular arithmetic and the ability to utilize the Chinese Remainder Theorem (CRT) for the private key operations. This paper presents the multiplier architecture of the RSAΓ crypto chip, a high-speed hardware accelerator for long integer modular arithmetic. The RSAΓ multiplier datapath is reconfigurable to execute either one 1024 bit modular exponentiation or two 512 bit modular exponentiations in parallel. Another significant characteristic of the multiplier core is its high degree of parallelism. The actual RSAΓ prototype contains a 1056*16 bit word-serial multiplier which is optimized for modular multiplications according to Barret's modular reduction method. The multiplier core is dimensioned for a clock frequency of 200 MHz and requires 227 clock cycles for a single 1024 bit modular multiplication. Pipelining in the highly parallel long integer unit allows to achieve a decryption rate of 560 kbit/s for a 1024 bit exponent. In CRT-mode, the multiplier executes two 512 bit modular exponentiations in parallel, which increases the decryption rate by a factor of 3.5 to almost 2 Mbit/s.