Commit

update README
GiacomoPope committed Jul 22, 2024
1 parent 23d4c9f commit afef20d
Showing 1 changed file with 149 additions and 102 deletions.
251 changes: 149 additions & 102 deletions README.md
@@ -1,67 +1,114 @@
[![GitHub CI](https://github.com/GiacomoPope/kyber-py/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/GiacomoPope/kyber-py/actions/workflows/ci.yml)
[![Documentation Status](https://readthedocs.org/projects/kyber-py/badge/?version=latest)](https://kyber-py.readthedocs.io/en/latest/?badge=latest)

# CRYSTALS-Kyber Python Implementation
# ML-KEM / CRYSTALS-Kyber Python Implementation

This repository contains a pure python implementation of CRYSTALS-Kyber
following (at the time of writing) the most recent
[specification](https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf)
(v3.02)
:warning: **Under no circumstances should this be used for a cryptographic
application.** :warning:

## A note on ML-KEM
This repository contains a pure python implementation of both:

There is also a ML-KEM implementation passing KAT vectors compliant with the NIST spec in this repo, documentation of ML-KEM is a work in progress.
1. **CRYSTALS-Kyber**: following (at the time of writing) the most recent
[specification](https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf)
(v3.02)
2. **ML-KEM**: The NIST Module-Lattice-Based Key-Encapsulation Mechanism
Standard following the [FIPS 203 (Initial Public
Draft)](https://csrc.nist.gov/pubs/fips/203/ipd) based off the Kyber submission
to the NIST post-quantum cryptography project.

## Disclaimer

:warning: **Under no circumstances should this be used for a cryptographic application.** :warning:
`kyber-py` has been written as an educational tool. The goal of this project was
to learn about how Kyber works, and to try and create a clean, well commented
implementation which people can learn from.

I have written `kyber-py` as a way to learn about the way Kyber works, and to
try and create a clean, well commented implementation which people can learn
from.
This code is not constant time, or written to be performant. Rather, it was
written so that the python code closely follows Algorithms 1-9 in the original
[specification](https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf).

This code is not constant time, or written to be performant. Rather, it was
written so that reading though Algorithms 1-9 in the
[specification](https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf)
closely matches the code which is seen in `kyber.py`.
## History of this Repository

This work started by simply implementing Kyber for fun, however after NIST
picked Kyber to standardise as ML-KEM, the repository grew and now includes
implementations of both Kyber and ML-KEM. I assume as this repository ages, the
Kyber implementation will get less useful and the ML-KEM one will be the focus,
but for historical reasons we will include both, if only so that people can
study the differences which NIST introduced during the standardisation of the
protocol.

### KATs

This implementation currently passes all KAT tests from the reference implementation.
For more information, see the unit tests in [`test_kyber.py`](test_kyber.py).
This implementation currently passes all KAT tests for `kyber` and `ml_kem`. For
more information, see the unit tests in [`test_kyber.py`](tests/test_kyber.py)
and [`test_ml_kem.py`](tests/test_ml_kem.py).

The KAT files were either downloaded or generated:

**Note**: there is a discrepancy between the specification and reference implementation.
To ensure all KATs pass, I have to generate the public key **before** the random
bytes $z = \mathcal{B}^{32}$ in algorithm 7 of the
1. For **Kyber**, the KAT files were generated from the project's [GitHub
repository](https://github.com/pq-crystals/kyber/) and are included in
`assets/PQCLkemKAT_*.rsp`
2. For **ML-KEM**, the KAT files were downloaded from the GitHub repository
[post-quantum-cryptography/KAT](https://github.com/post-quantum-cryptography/KAT/tree/main/MLKEM)

**Note**: for Kyber v3.02, there is a discrepancy between the specification and
reference implementation. To ensure all KATs pass, one has to generate the
public key **before** the random bytes $z = \mathcal{B}^{32}$ in algorithm 7 of
the
[specification](https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf)
(v3.02).

### Dependencies

Originally this was planned to have zero dependencies, however to make this work
pass the KATs, I needed a deterministic CSRNG. The reference implementation uses
Originally this project was planned to have zero dependencies, however to make this work
pass the KATs, we needed a deterministic CSRNG. The reference implementation uses
AES256 CTR DRBG. I have implemented this in [`aes256_ctr_drbg.py`](aes256_ctr_drbg.py).
However, I have not implemented AES itself, instead I import this from `pycryptodome`.
However, I have not implemented AES itself; instead I import it from `pycryptodome`. If this dependency is a problem, please open an issue and we can include a pure-python AES in the repo.

To install dependencies, run `pip install -r requirements.txt`.

If you're happy to use system randomness (`os.urandom`) then you don't need
this dependency.
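
To illustrate why a deterministic byte source is needed for reproducing KATs, here is a minimal sketch. It is an assumption-laden stand-in: it uses SHAKE-128 from the standard library rather than the AES256 CTR DRBG in this repository, and the names `DeterministicBytes`, `random_bytes` and `system_bytes` are hypothetical.

```python
import hashlib
import os


class DeterministicBytes:
    """Toy deterministic byte source (NOT the AES256 CTR DRBG from the repo)."""

    def __init__(self, seed: bytes):
        self._seed = seed
        self._offset = 0  # number of stream bytes already handed out

    def random_bytes(self, n: int) -> bytes:
        # SHAKE-128 is an extendable-output function: requesting a longer
        # digest only appends bytes, so slicing yields a consistent stream.
        stream = hashlib.shake_128(self._seed).digest(self._offset + n)
        chunk = stream[self._offset:]
        self._offset += n
        return chunk


def system_bytes(n: int) -> bytes:
    """Non-deterministic alternative backed by the operating system."""
    return os.urandom(n)


# Two generators with the same seed produce the same sequence of draws.
rng_a = DeterministicBytes(b"fixed seed for a test vector")
rng_b = DeterministicBytes(b"fixed seed for a test vector")
first_a, second_a = rng_a.random_bytes(32), rng_a.random_bytes(32)
first_b, second_b = rng_b.random_bytes(32), rng_b.random_bytes(32)
```

Because the stream is consumed in order, swapping two draws changes which bytes each value receives. This is why the order of operations matters when trying to match a KAT file byte for byte.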

## Using kyber-py

There are three functions exposed on the `Kyber` class which are intended
for use:
### ML-KEM

- `Kyber.keygen()`: generate a keypair `(pk, sk)`
- `Kyber.enc(pk)`: generate a challenge and a shared key `(c, K)`
- `Kyber.dec(c, sk)`: generate the shared key `K`
There are three functions exposed on the `ML_KEM` class which are intended for
use:

To use `Kyber()` it must be initialised with a dictionary of the
protocol parameters. An example can be seen in `DEFAULT_PARAMETERS`.
- `ML_KEM.keygen()`: generate a keypair `(ek, dk)`
- `ML_KEM.encaps(ek)`: generate a key and ciphertext pair `(key, ct)`
- `ML_KEM.decaps(ct, dk)`: generate the shared key `key`

Additionally, the class has been initialised with these default parameters,
so you can simply import the NIST level you want to play with:
#### Example

```python
>>> from ml_kem import ML_KEM128
>>> ek, dk = ML_KEM128.keygen()
>>> key, ct = ML_KEM128.encaps(ek)
>>> _key = ML_KEM128.decaps(ct, dk)
>>> assert key == _key
```

The above example would also work with `ML_KEM192` and `ML_KEM256`.

#### Benchmarks

Included here are some approximate benchmarks, although the purpose of this project is not speed, but rather education!

| Params | keygen | keygen/s | encap | encap/s | decap | decap/s |
|------------|---------:|-----------:|--------:|----------:|--------:|---------:|
|ML_KEM128 | 4.91ms| 203.76| 7.69ms| 130.08| 12.11ms| 82.56 |
|ML_KEM192 | 6.93ms| 144.31| 10.89ms| 91.86| 17.10ms| 58.47 |
|ML_KEM256 | 9.85ms | 101.54| 14.71ms| 67.97| 22.96ms| 43.56 |

All times recorded using an Intel Core i7-9750H CPU and averaged over 1000 runs.

### Kyber

There are three functions exposed on the `Kyber` class which are intended for
use:

- `Kyber.keygen()`: generate a keypair `(pk, sk)`
- `Kyber.enc(pk)`: generate a challenge and a shared key `(c, key)`
- `Kyber.dec(c, sk)`: generate the shared key `key`

#### Example

@@ -75,54 +122,61 @@ so you can simply import the NIST level you want to play with:

The above example would also work with `Kyber768` and `Kyber1024`.

### Benchmarks
We expect users to pick one of the three initialised classes which use the
default parameters of the Kyber specification. The three options are `Kyber512`,
`Kyber768` and `Kyber1024`. However, by following the layout of
`DEFAULT_PARAMETERS` one could tweak these values to look at how Kyber behaves
for different parameter choices.

For now, here are some approximate benchmarks, although the purpose of this project is not speed, but rather education!
**NOTE**: it is relatively easy to change the parameters $k$, $\eta_1$, $\eta_2$,
$d_u$ and $d_v$ from the Kyber specification. However, if you wish to change the
polynomial ring itself, then you will lose access to the NTT transforms which
currently only support $q = 3329$ and $n = 256$.
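
As a rough illustration of how these parameters interact, the sketch below lists the values from the Kyber v3.02 specification and derives the public key and ciphertext sizes from them. The dictionary layout and the helper names are assumptions made for this example; the key names in the repository's `DEFAULT_PARAMETERS` may differ.

```python
N = 256   # number of coefficients per polynomial, fixed by the NTT
Q = 3329  # coefficient modulus, fixed by the NTT

# Hypothetical layout: the repository's DEFAULT_PARAMETERS may use other keys.
PARAMS = {
    "kyber_512": {"k": 2, "eta_1": 3, "eta_2": 2, "du": 10, "dv": 4},
    "kyber_768": {"k": 3, "eta_1": 2, "eta_2": 2, "du": 10, "dv": 4},
    "kyber_1024": {"k": 4, "eta_1": 2, "eta_2": 2, "du": 11, "dv": 5},
}


def public_key_bytes(p):
    # k polynomials encoded with 12 bits per coefficient, plus a 32-byte seed
    return 12 * p["k"] * N // 8 + 32


def ciphertext_bytes(p):
    # vector u compressed to du bits per coefficient, polynomial v to dv bits
    return (p["du"] * p["k"] * N + p["dv"] * N) // 8


sizes = {
    name: (public_key_bytes(p), ciphertext_bytes(p)) for name, p in PARAMS.items()
}
```

Changing $k$, $d_u$ or $d_v$ only changes these sizes and the security/failure trade-off, whereas changing `N` or `Q` would invalidate the NTT.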

#### Benchmarks

Included here are some approximate benchmarks, although the purpose of this project is not speed, but rather education!

| Params | keygen | keygen/s | encap | encap/s | decap | decap/s |
|------------|---------:|-----------:|--------:|----------:|--------:|---------:|
|Kyber512 | 4.82ms| 207.59| 7.10ms| 140.80| 11.65ms| 85.82 |
|Kyber768 | 6.87ms| 145.60| 10.11ms| 98.92| 16.51ms| 60.58 |
|Kyber1024 | 9.72ms| 102.91| 13.71ms| 72.94| 22.20ms| 45.05 |

All times recorded using a Intel Core i7-9750H CPU.
All times recorded using an Intel Core i7-9750H CPU and averaged over 1000 runs.

## Documentation (under active development)

- https://kyber-py.readthedocs.io/en/latest/

## Future Plans
## Polynomials and Modules

* Add documentation on `NTT` transform for polynomials
* Add documentation for working with DRBG and setting the seed
There are two main things to worry about when implementing Kyber/ML-KEM. The
first thing to consider is the mathematics, which requires performing linear
algebra in a module with elements in the ring $R_q = \mathbb{F}_q[X] /(X^n + 1)$
and the second is the sampling, compression and decompression, which links to
the cryptographic assurance of the protocol.

## Discussion of Implementation

### Kyber

```
TODO:
Add some more information about how working with Kyber works with this
library...
```
For those who don't know, a module is a generalisation of a vector space, where
elements of a matrix are not selected from a field (such as the rationals, or
elements of a finite field $\mathbb{F}_{p^k}$), but rather from a ring (we do not
require each element in a ring to have a multiplicative inverse). The ring in question for Kyber/ML-KEM is a polynomial ring where polynomials have coefficients in $\mathbb{F}_{q}$ with $q = 3329$ and the polynomial ring has a modulus $X^n + 1$ with $n = 256$ (and so every element of the polynomial ring has at most 256 coefficients).
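
To make the reduction modulo $X^n + 1$ concrete, here is a small library-independent sketch of schoolbook multiplication in $R_q$. Polynomials are plain lists of coefficients, constant term first; the function name `mul` is chosen for this example only and is not the repository's API.

```python
def mul(f, g, q):
    """Schoolbook product of f and g in F_q[X] / (X^n + 1), with n = len(f)."""
    n = len(f)
    out = [0] * n
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            k = i + j
            if k < n:
                out[k] = (out[k] + a * b) % q
            else:
                # X^n = -1, so the term wraps around with a sign flip
                out[k - n] = (out[k - n] - a * b) % q
    return out


q, n = 11, 8
x = [0, 1, 0, 0, 0, 0, 0, 0]  # the polynomial X
f = [0, 0, 0, 3, 0, 0, 0, 4]  # 3*X^3 + 4*X^7
fx = mul(f, x, q)             # 3*X^4 + 4*X^8 = 3*X^4 - 4 = 7 + 3*X^4 (mod 11)
```

This quadratic-time loop is only meant for intuition; the NTT exists precisely to avoid it for $q = 3329$ and $n = 256$.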

### Polynomials

The file [`polynomials.py`](polynomials.py) contains the classes
`PolynomialRing` and
`Polynomial`. This implements the univariate polynomial ring
To help with experimenting with these polynomial rings themselves, the file [`polynomials_generic.py`](polynomials/polynomials_generic.py) has an implementation of the univariate polynomial ring

$$
R_q = \mathbb{F}_q[X] /(X^n + 1)
$$

The implementation is inspired by `SageMath` and you can create the
where the user can select any $q, n$. For example, you can create the
ring $R_{11} = \mathbb{F}_{11}[X] /(X^8 + 1)$ in the following way:

#### Example

```python
>>> from polynomials.polynomials_generic import PolynomialRing
>>> R = PolynomialRing(11, 8)
>>> x = R.gen()
>>> f = 3*x**3 + 4*x**7
@@ -136,72 +190,63 @@ ring $R_{11} = \mathbb{F}_{11}[X] /(X^8 + 1)$ in the following way:
0
```

We additionally include functions for `PolynomialRingKyber` and `PolynomialKyber`
to move from bytes to polynomials (and back again).
We hope that this allows for some hands-on experience at working with these
polynomials before starting to play with the whole of Kyber/ML-KEM.

For the "Kyber-specific" functions, needed to implement the protocol itself, we
have made a child class `PolynomialRingKyber(PolynomialRing)` which has the
following additional methods:

- `PolynomialRingKyber`
- `parse(bytes)` takes $3n$ bytes and produces a random polynomial in $R_q$
- `decode(bytes, l)` takes $\ell n$ bits and produces a polynomial in $R_q$
- `cbd(beta, eta)` takes $\eta \cdot n / 4$ bytes and produces a polynomial in $R_q$ with coefficents taken from a centered binomial distribution
- `cbd(beta, eta)` takes $\eta \cdot n / 4$ bytes and produces a polynomial in
$R_q$ with coefficients taken from a centered binomial distribution
- `PolynomialKyber`
- `self.encode(l)` takes the polynomial and returns a length $\ell n / 8$ bytearray
- `encode(l)` takes the polynomial and returns a length $\ell n / 8$ bytearray
- `to_ntt()` converts the polynomial into the NTT domain for efficient
polynomial multiplication and returns an element of type
`PolynomialKyberNTT`
- `PolynomialKyberNTT`
- `from_ntt()` converts the polynomial back from the NTT domain and returns an
element of type `PolynomialKyber`

#### Example

```python
TODO
```
This class fixes $q = 3329$ and $n = 256$.
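
As an illustration of the sampling step, the following sketch follows the centered binomial distribution described in the Kyber specification: each coefficient is the difference of two sums of $\eta$ bits. It is a standalone function returning a list of coefficients, not the `cbd` method of `PolynomialRingKyber`, and its signature is an assumption made for this example.

```python
def bytes_to_bits(data):
    """Expand bytes into bits, least significant bit of each byte first."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]


def cbd(data, eta, n=256, q=3329):
    """Map eta*n/4 bytes to n coefficients from a centered binomial distribution."""
    if len(data) != eta * n // 4:
        raise ValueError("expected eta * n / 4 bytes")
    bits = bytes_to_bits(data)
    coeffs = []
    for i in range(n):
        # each coefficient consumes 2*eta bits: eta for a, eta for b
        a = sum(bits[2 * i * eta + j] for j in range(eta))
        b = sum(bits[2 * i * eta + eta + j] for j in range(eta))
        coeffs.append((a - b) % q)
    return coeffs


# First byte 0b00000011: the first coefficient gets a = 2, b = 0.
small = cbd(bytes([0b00000011]) + bytes(127), eta=2)
```

Every coefficient lies in $\{-\eta, \ldots, \eta\}$ modulo $q$, which is what makes the sampled polynomials "small".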

Lastly, we define a `self.compress(d)` and `self.decompress(d)` method for
polynomials following page 2 of the
polynomials following page 2 of the
[specification](https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf)

$$
\textsf{compress}_q(x, d) = \lceil (2^d / q) \cdot x \rfloor \textrm{mod}^+ 2^d,
$$
$$ \textsf{compress}_q(x, d) = \lceil (2^d / q) \cdot x \rfloor \textrm{mod}^+
2^d, $$

$$
\textsf{decompress}_q(x, d) = \lceil (q / 2^d) \cdot x \rfloor.
$$
$$ \textsf{decompress}_q(x, d) = \lceil (q / 2^d) \cdot x \rfloor. $$

The functions `compress` and `decompress` are defined for the coefficients
of a polynomial and a polynomial is (de)compressed by acting the function
on every coefficient.
Similarly, an element of a module is (de)compressed by acting the
The functions `compress` and `decompress` are defined for the coefficients of a
polynomial and a polynomial is (de)compressed by acting the function on every
coefficient. Similarly, an element of a module is (de)compressed by acting the
function on every polynomial.

#### Example

```python
TODO
```

**Note**: compression is lossy! We do not get the same polynomial back
by computing `f.compress(d).decompress(d)`. They are however *close*.
See the specification for more information.
**Note**: compression is lossy! We do not get the same polynomial back by
computing `f.compress(d).decompress(d)`. They are however *close*. See the
specification for more information.
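
To see how lossy the round trip is, here is a sketch of the two formulas on single coefficients using integer arithmetic only, with ties rounded up. The helper names `centered` and `max_error` are made up for this example.

```python
Q = 3329


def compress(x, d, q=Q):
    # round(2^d * x / q) with ties rounded up, then reduced mod 2^d
    return ((2 * (x << d) + q) // (2 * q)) % (1 << d)


def decompress(y, d, q=Q):
    # round(q * y / 2^d) with ties rounded up (requires d >= 1)
    return (q * y + (1 << (d - 1))) >> d


def centered(a, q=Q):
    """Representative of a mod q in the range (-q/2, q/2]."""
    a %= q
    return a - q if a > q // 2 else a


def max_error(d):
    """Largest centered difference between x and its round trip, over all x."""
    return max(abs(centered(decompress(compress(x, d), d) - x)) for x in range(Q))


y = compress(1000, 4)   # 16 * 1000 / 3329 is about 4.81, so y = 5
x_back = decompress(y, 4)  # 3329 * 5 / 16 is about 1040.3, so x_back = 1040
```

Here the coefficient 1000 comes back as 1040: not equal, but within the bound $\lceil q / 2^{d+1} \rfloor$ given in the specification.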

### Number Theoretic Transform

```
TODO:
Talk about what is available, and how it is used and the two Polynomial types
we have to handle this.
```
**TODO**: it would be good to write something more detailed here.
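
As a small starting point, the sketch below checks the number-theoretic fact the Kyber NTT relies on: $\zeta = 17$ is a primitive 256th root of unity modulo $q = 3329$, so $X^{256} + 1$ splits into 128 quadratic factors $X^2 - \zeta^{2i+1}$. This only verifies the arithmetic; it is not the transform implemented in this repository, and the helper name `divides_modulus` is hypothetical.

```python
Q = 3329
ZETA = 17  # primitive 256th root of unity mod Q, as in the Kyber specification

# zeta^128 = -1 (mod q), so zeta has order exactly 256
assert pow(ZETA, 128, Q) == Q - 1

# The 128 odd powers of zeta are the values r for which X^2 - r divides X^256 + 1.
roots = [pow(ZETA, 2 * i + 1, Q) for i in range(128)]


def divides_modulus(r):
    # Modulo X^2 - r we have X^256 = (X^2)^128 = r^128, so we need r^128 = -1.
    return pow(r, 128, Q) == Q - 1


all_divide = all(divides_modulus(r) for r in roots)
```

Since there is no primitive 512th root of unity modulo 3329, the factors stay quadratic, which is why elements in the NTT domain are handled as 128 pairs of coefficients.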

### Modules

The file [`modules.py`](modules.py) contains the classes `Module` and `Matrix`.
A module is a generalisation of a vector space, where the field
of scalars is replaced with a ring. In the case of Kyber, we
need the module with the ring $R_q$ as described above.
Building on `polynomials_generic.py` we also include a file
[`modules_generic.py`](modules/modules_generic.py) which has all of the
functions needed to perform linear algebra given a ring.

`Matrix` allows elements of the module to be of size $m \times n$
but for Kyber, we only need vectors of length $k$ and square
matricies of size $k \times k$.
Note that `Matrix` allows elements of the module to be of size $m \times n$ but
for Kyber, we only need vectors of length $k$ and square matrices of size $k
\times k$.
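
To make the shapes concrete, here is a library-independent sketch of a $k \times k$ matrix of polynomials acting on a length-$k$ vector, using the small ring $R_{11} = \mathbb{F}_{11}[X] /(X^8 + 1)$. Polynomials are coefficient lists with the constant term first, and `poly_add`, `poly_mul` and `mat_vec` are names invented for this example rather than methods of `Module` or `Matrix`.

```python
def poly_add(f, g, q):
    return [(a + b) % q for a, b in zip(f, g)]


def poly_mul(f, g, q):
    """Product in F_q[X] / (X^n + 1) with n = len(f); X^n wraps to -1."""
    n = len(f)
    out = [0] * n
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            sign = 1 if i + j < n else -1
            out[(i + j) % n] = (out[(i + j) % n] + sign * a * b) % q
    return out


def mat_vec(A, v, q):
    """Multiply a matrix of polynomials by a column vector of polynomials."""
    n = len(v[0])
    result = []
    for row in A:
        acc = [0] * n
        for a, b in zip(row, v):
            acc = poly_add(acc, poly_mul(a, b, q), q)
        result.append(acc)
    return result


q = 11
one = [1, 0, 0, 0, 0, 0, 0, 0]
zero = [0] * 8
x = [0, 1, 0, 0, 0, 0, 0, 0]   # X
x7 = [0, 0, 0, 0, 0, 0, 0, 1]  # X^7

A = [[one, x], [zero, x7]]
v = [x, x7]
Av = mat_vec(A, v, q)  # [X + X^8, X^14] = [X - 1, -X^6]
```

Each entry of the result is itself a polynomial in $R_q$, which is the only real difference from ordinary linear algebra over a field.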

As an example of the operations we can perform with out `Module`
lets revisit the ring from the previous example:
As an example of the operations we can perform with our `Module`, let's revisit
the ring from the previous example:

#### Example

@@ -245,6 +290,8 @@ lets revisit the ring from the previous example:
[ 2 + 6*x^4 + x^5]
```

### TODO

Explain the extra functions available in `ModuleKyber` and `MatrixKyber`.
On top of this class, we have the classes `ModuleKyber(Module)` and
`MatrixKyber(Matrix)` which have helper functions which (for example) encode
every element of a matrix, or convert every element to or from the NTT domain.
These are simple functions which call the respective `PolynomialKyber` methods
for every element.
