chacha

Verilog 2001 implementation of the ChaCha stream cipher.

Purpose

This implementation mainly serves as a proof of concept and as a basis for comparing hardware complexity and performance against Salsa20 and other ciphers.

Functionality

This core implements ChaCha with support for 128-bit and 256-bit keys. The number of rounds can be set from two to 32 in steps of two. The default number of rounds is eight.
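As an illustration of these parameters, here is a minimal Python sketch of the ChaCha block function as defined in the original ChaCha specification (64-bit counter, 64-bit IV). It is not the RTL or the model in this repo; the names chacha_block, quarterround and rotl32 are hypothetical, and the round limits simply mirror the ones stated above.

```python
import struct

MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarterround(s, a, b, c, d):
    """Apply the ChaCha quarterround in place to state words a, b, c, d."""
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 7)


def chacha_block(key, iv, counter=0, rounds=8):
    """Return one 64-byte keystream block (hypothetical helper)."""
    if len(key) not in (16, 32):
        raise ValueError("key must be 128 or 256 bits")
    if len(iv) != 8:
        raise ValueError("iv must be 64 bits")
    if rounds % 2 or not 2 <= rounds <= 32:
        raise ValueError("rounds must be even and in the range 2..32")

    if len(key) == 16:
        # 128-bit keys are used twice and select a different constant.
        constants = b"expand 16-byte k"
        key = key + key
    else:
        constants = b"expand 32-byte k"

    block_ctr = struct.pack("<Q", counter & 0xFFFFFFFFFFFFFFFF)
    init = list(struct.unpack("<16I", constants + key + block_ctr + iv))
    s = init[:]

    for _ in range(rounds // 2):
        # Column round.
        quarterround(s, 0, 4, 8, 12)
        quarterround(s, 1, 5, 9, 13)
        quarterround(s, 2, 6, 10, 14)
        quarterround(s, 3, 7, 11, 15)
        # Diagonal round.
        quarterround(s, 0, 5, 10, 15)
        quarterround(s, 1, 6, 11, 12)
        quarterround(s, 2, 7, 8, 13)
        quarterround(s, 3, 4, 9, 14)

    # Feed-forward: add the initial state to the permuted state.
    out = [(x + y) & MASK32 for x, y in zip(s, init)]
    return struct.pack("<16I", *out)
```

Calling chacha_block(bytes(32), bytes(8)) gives the first block for an all-zero 256-bit key with the default eight rounds; passing rounds=20 gives the ChaCha20 variant.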

The core contains an internal 64-bit block counter that is automatically updated for each data block.
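The counter behaviour can be sketched in a few lines of Python. This assumes the counter increments by one per block, wraps at 2^64, and is presented to the cipher as two 32-bit words (low word first), as in the original ChaCha specification. The helper names are hypothetical and do not correspond to signals in the RTL.

```python
MASK64 = (1 << 64) - 1


def next_counter(counter):
    """Advance the 64-bit block counter by one block, wrapping at 2**64."""
    return (counter + 1) & MASK64


def counter_words(counter):
    """Split the 64-bit counter into (low, high) 32-bit state words."""
    return counter & 0xFFFFFFFF, (counter >> 32) & 0xFFFFFFFF
```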

Branch for VHDL interoperability

There is a branch, vhdl_interop, that changes the port name "next" in chacha_core.v, since "next" is a reserved word in VHDL. If you are instantiating the core in a mixed-language environment, use this branch. The branch will not be merged into master, but it will track changes to the master branch.

Performance

The core now has four separate quarterround modules, which means that one round takes one cycle.
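The reason four quarterround modules can complete a round in a single cycle is that the four quarterrounds of a ChaCha round operate on disjoint state words. The short Python check below illustrates this with the word indices from the ChaCha specification; the table and function names are hypothetical.

```python
# State word indices used by each quarterround in the two round types.
COLUMN_ROUND = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
DIAGONAL_ROUND = [(0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]


def is_parallelizable(round_indices):
    """True if the quarterrounds touch each of the 16 state words exactly once."""
    touched = [index for qr in round_indices for index in qr]
    return sorted(touched) == list(range(16))
```

Both is_parallelizable(COLUMN_ROUND) and is_parallelizable(DIAGONAL_ROUND) hold, so no quarterround in a round depends on the result of another one in the same round.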

Implementation

Implementation results using the Altera Quartus Prime 15.1 design tool.

Cyclone IV E

4748 LEs
1940 registers
55 MHz
11 cycles latency

Cyclone V GX

1939 ALMs for logic
1940 registers
60 MHz
11 cycles latency

Implementation results using Xilinx ISE 14.7.

Xilinx Spartan-6

xc6slx75-3fgg676
3843 slice LUTs
1049 slices
1994 registers
83 MHz
11 cycles latency

Xilinx Artix-7

xc7a200t-3fbg484
3837 slice LUTs
1076 slices
1949 registers
100 MHz
11 cycles latency
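
A rough way to compare the results above is to convert clock frequency and latency into keystream throughput. The Python sketch below assumes, without confirmation from the source, that the core produces one 512-bit block every 11 cycles with no overlap between blocks. throughput_mbps is a hypothetical helper, and the numbers it yields are estimates, not measurements.

```python
def throughput_mbps(fmax_mhz, latency_cycles=11, block_bits=512):
    """Estimate keystream throughput in Mbit/s for one block per latency period."""
    return block_bits * fmax_mhz / latency_cycles


# Clock frequencies taken from the implementation results listed above.
for device, fmax_mhz in [("Cyclone IV E", 55), ("Cyclone V GX", 60),
                         ("Spartan-6", 83), ("Artix-7", 100)]:
    print(f"{device}: {throughput_mbps(fmax_mhz):.0f} Mbit/s")
```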

Status

(2016-09-16) Even more cleanup has been done. Several bugs that affected the performance and size of the design were found. There are still a few cleanups that could be done, but the core is now much more compact, especially in terms of registers, and yet much faster.

(2016-09-10) As part of developing an implementation of RFC 7539, the ChaCha core has seen quite extensive cleanup and optimization. Things like:

Updated interface to match what has become the standard interface for my cores.
Updated API that matches what is used in other cores, especially in the Cryptech project.
The number of QR modules has increased to four, which means that the performance has basically quadrupled.
The double set of states has been reduced to a single set of state registers.
The core no longer samples all inputs. Instead it expects the higher level module to provide stable key, IV and data in registers. This matches what is being used in the top level module.

The implementation results will have to be updated to include all these changes.

(2014-11-07)

Added implementation results for Xilinx Spartan-6.

(2014-09-03)

Added a new port in the core to allow setting of the initial value of the counter. The top level wrapper currently sets this value to a constant zero.

(2014-03-18)

Renamed the repo to chacha. We don't need the sw-prefix.

(2013-12-29)

Changed to four parallel QR engines to more than quadruple the performance using 5% more logic. The latency is down from 35 cycles to 11 cycles.

(2013-12-20)

Top and core are verified. A somewhat major overhaul of the design was needed to fix proper handling of multiple block messages.
The design has been synthesized and sent through the Altera Quartus backend to get completed implementations for Altera's Cyclone IV GX and Cyclone V GX devices.

(2013-12-17)

Top is almost completely verified. Top testbench runs self checking test cases.

(2013-12-10)

Core is now functionally correct. It still needs to be pushed through the backend and cleaned up.
Top is not functionally verified.

(2013-11-03)

The core round processing is (seems to be) functionally correct.
The Python model has been cleaned up. A few minor display bugs have been fixed.

(2013-10-13)

The Python model is now complete and passes all test cases generated by the reference model.

(2013-10-06)

Added TC1 from chacha_testvectors. Testing shows that the Python model generates the correct result, but the bytes within each word are in big-endian order, not little-endian. We need to flip the words around.
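
The byte order issue can be illustrated with Python's struct module. The sketch below is not the code from src/model; the helper names are hypothetical. It shows the same 32-bit word serialized both ways, and a byte swap that converts between them.

```python
import struct


def word_to_bytes_le(word):
    """Serialize a 32-bit word in little-endian byte order, as ChaCha expects."""
    return struct.pack("<I", word)


def word_to_bytes_be(word):
    """Serialize a 32-bit word in big-endian byte order (the flipped output)."""
    return struct.pack(">I", word)


def swap32(word):
    """Reverse the byte order within a 32-bit word."""
    return int.from_bytes(word.to_bytes(4, "big"), "little")
```

For example, the first ChaCha constant word 0x61707865 serializes little-endian as the ASCII bytes "expa", while the big-endian form yields the same bytes reversed. Applying swap32 to each word before a big-endian dump gives the little-endian result.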

(2013-10-04)

The reference model has been moved to a separate project created just to generate and document ChaCha test vectors. See: https://github.com/secworks/chacha_testvectors

(2013-10-03)

There is now also a C reference model that is intended to generate test vectors for different combinations of keys, IVs, blocks and rounds.

(2013-09-26)

Debugging of the core is ongoing with quarterround at the focus.
The core goes through the motions of processing blocks.
There is a testbench for the top level, and the top seems to work OK.
There is a Python model being developed in src/model.
Quarterround in the Python model works. Used to debug the RTL.

(Older stuff)

The current implementation consists of a ChaCha core with (very) wide data interfaces.
There is also a top level wrapper, chacha.v that provides a memory like 32-bit API. Note that with
The core is currently not completed. The datapath and control are basically completed but need to be debugged. The initial version of the testbench for the core is done, but the core is not yet connected.
The top level wrapper is functionally completed, but not yet debugged. There is no testbench.
