update README

GiacomoPope · Jul 22, 2024 · afef20d · afef20d
1 parent 23d4c9f
commit afef20d
Showing 1 changed file with 149 additions and 102 deletions.
diff --git a/README.md b/README.md
@@ -1,67 +1,114 @@
 [![GitHub CI](https://github.com/GiacomoPope/kyber-py/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/GiacomoPope/kyber-py/actions/workflows/ci.yml)
 [![Documentation Status](https://readthedocs.org/projects/kyber-py/badge/?version=latest)](https://kyber-py.readthedocs.io/en/latest/?badge=latest)
 
-# CRYSTALS-Kyber Python Implementation
+# ML-KEM / CRYSTALS-Kyber Python Implementation
 
-This repository contains a pure python implementation of CRYSTALS-Kyber 
-following (at the time of writing) the most recent 
-[specification](https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf)
-(v3.02)
+:warning: **Under no circumstances should this be used for a cryptographic
+application.** :warning:
 
-## A note on ML-KEM
+This repository contains a pure python implementation of both:
 
-There is also a ML-KEM implementation passing KAT vectors compliant with the NIST spec in this repo, documentation of ML-KEM is a work in progress.
+1. **CRYSTALS-Kyber**: following (at the time of writing) the most recent
+[specification](https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf)
+(v3.02)
+2. **ML-KEM**: The NIST Module-Lattice-Based Key-Encapsulation Mechanism
+Standard following the [FIPS 203 (Initial Public
+Draft)](https://csrc.nist.gov/pubs/fips/203/ipd) based off the Kyber submission
+to the NIST post-quantum cryptography project.
 
 ## Disclaimer
 
-:warning: **Under no circumstances should this be used for a cryptographic application.** :warning:
+`kyber-py` has been written as an educational tool. The goal of this project was
+to learn about how Kyber works, and to try and create a clean, well commented
+implementation which people can learn from.
 
-I have written `kyber-py` as a way to learn about the way Kyber works, and to
-try and create a clean, well commented implementation which people can learn 
-from.
+This code is not constant time, or written to be performant. Rather, it was
+written so that the python code closely follows Algorithms 1-9 in the original
+[specification](https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf).
 
-This code is not constant time, or written to be performant. Rather, it was 
-written so that reading though Algorithms 1-9 in the 
-[specification](https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf)
-closely matches the code which is seen in `kyber.py`.
+## History of this Repository
+
+This work started by simply implementing Kyber for fun. However, after NIST
+picked Kyber to standardise as ML-KEM, the repository grew and now includes
+implementations of both Kyber and ML-KEM. I assume as this repository ages, the
+Kyber implementation will get less useful and the ML-KEM one will be the focus,
+but for historical reasons we will include both, if only so that people can
+study the differences which NIST introduced during the standardisation of the
+protocol.
 
 ### KATs
 
-This implementation currently passes all KAT tests from the reference implementation. 
-For more information, see the unit tests in [`test_kyber.py`](test_kyber.py).
+This implementation currently passes all KAT tests for `kyber` and `ml_kem`. For
+more information, see the unit tests in [`test_kyber.py`](tests/test_kyber.py)
+and [`test_ml_kem.py`](tests/test_ml_kem.py).
+
+The KAT files were either downloaded or generated:
 
-**Note**: there is a discrepancy between the specification and reference implementation.
-To ensure all KATs pass, I have to generate the public key **before** the random
-bytes $z = \mathcal{B}^{32}$ in algorithm 7 of the 
+1. For **Kyber**, the KAT files were generated from the project's [GitHub
+   repository](https://github.com/pq-crystals/kyber/) and are included in
+   `assets/PQCLkemKAT_*.rsp`
+2. For **ML-KEM**, the KAT files were downloaded from the GitHub repository
+   [post-quantum-cryptography/KAT](https://github.com/post-quantum-cryptography/KAT/tree/main/MLKEM)
+
+**Note**: for Kyber v3.02, there is a discrepancy between the specification and
+reference implementation. To ensure all KATs pass, one has to generate the
+public key **before** the random bytes $z = \mathcal{B}^{32}$ in algorithm 7 of
+the
 [specification](https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf)
 (v3.02).
 
 ### Dependencies
 
-Originally this was planned to have zero dependencies, however to make this work
-pass the KATs, I needed a deterministic CSRNG. The reference implementation uses
+Originally this project was planned to have zero dependencies. However, to pass
+the KATs, we needed a deterministic CSPRNG. The reference implementation uses
 AES256 CTR DRBG. I have implemented this in [`aes256_ctr_drbg.py`](aes256_ctr_drbg.py). 
-However, I have not implemented AES itself, instead I import this from `pycryptodome`.
+However, I have not implemented AES itself; instead I import this from `pycryptodome`. If this dependency is too annoying, then please open an issue and we can include a pure-python AES in the repo.
 
 To install dependencies, run `pip install -r requirements.txt`.
 
-If you're happy to use system randomness (`os.urandom`) then you don't need
-this dependency.
-
 ## Using kyber-py
 
-There are three functions exposed on the `Kyber` class which are intended
-for use:
+### ML-KEM
 
-- `Kyber.keygen()`: generate a keypair `(pk, sk)`
-- `Kyber.enc(pk)`: generate a challenge and a shared key `(c, K)`
-- `Kyber.dec(c, sk)`: generate the shared key `K`
+There are three functions exposed on the `ML_KEM` class which are intended for
+use:
 
-To use `Kyber()` it must be initialised with a dictionary of the 
-protocol parameters. An example can be seen in `DEFAULT_PARAMETERS`.
+- `ML_KEM.keygen()`: generate a keypair `(ek, dk)`
+- `ML_KEM.encaps(ek)`: generate a key and ciphertext pair `(key, ct)`
+- `ML_KEM.decaps(ct, dk)`: generate the shared key `key`
 
-Additionally, the class has been initialised with these default parameters, 
-so you can simply import the NIST level you want to play with:
+#### Example
+
+```python
+>>> from ml_kem import ML_KEM128
+>>> ek, dk = ML_KEM128.keygen()
+>>> key, ct = ML_KEM128.encaps(ek)
+>>> _key = ML_KEM128.decaps(ct, dk)
+>>> assert key == _key
+```
+
+The above example would also work with `ML_KEM192` and `ML_KEM256`.
+
+#### Benchmarks
+
+Included here are some approximate benchmarks, although the purpose of this project is not speed, but rather education!
+
+|  Params    |  keygen  |  keygen/s  |  encap  |  encap/s  |  decap  |  decap/s |
+|------------|---------:|-----------:|--------:|----------:|--------:|---------:|
+|ML_KEM128   |    4.91ms|      203.76|   7.69ms|     130.08|  12.11ms|    82.56 |
+|ML_KEM192   |    6.93ms|      144.31|  10.89ms|      91.86|  17.10ms|    58.47 |
+|ML_KEM256   |    9.85ms|      101.54|  14.71ms|      67.97|  22.96ms|    43.56 |
+
+All times recorded using an Intel Core i7-9750H CPU and averaged over 1000 runs.
+
+### Kyber
+
+There are three functions exposed on the `Kyber` class which are intended for
+use:
+
+- `Kyber.keygen()`: generate a keypair `(pk, sk)`
+- `Kyber.enc(pk)`: generate a challenge and a shared key `(c, key)`
+- `Kyber.dec(c, sk)`: generate the shared key `key`
 
 #### Example
 
@@ -75,54 +122,61 @@ so you can simply import the NIST level you want to play with:
 
 The above example would also work with `Kyber768` and `Kyber1024`.
 
-### Benchmarks
+We expect users to pick one of the three initialised classes which use the
+default parameters of the Kyber specification. The three options are `Kyber512`,
+`Kyber768` and `Kyber1024`. However, by following the values in
+`DEFAULT_PARAMETERS` one could tweak these values to look at how Kyber behaves
+for different parameter choices.
 
-For now, here are some approximate benchmarks, although the purpose of this project is not speed, but rather education!
+**NOTE**: it is relatively easy to change the parameters $k$, $\eta_1$, $\eta_2$,
+$d_u$ and $d_v$ from the Kyber specification. However, if you wish to change the
+polynomial ring itself, then you will lose access to the NTT transforms which
+currently only support $q = 3329$ and $n = 256$.
+
+#### Benchmarks
+
+Included here are some approximate benchmarks, although the purpose of this project is not speed, but rather education!
 
 |  Params    |  keygen  |  keygen/s  |  encap  |  encap/s  |  decap  |  decap/s |
 |------------|---------:|-----------:|--------:|----------:|--------:|---------:|
 |Kyber512    |    4.82ms|      207.59|   7.10ms|     140.80|  11.65ms|    85.82 |
 |Kyber768    |    6.87ms|      145.60|  10.11ms|      98.92|  16.51ms|    60.58 |
 |Kyber1024   |    9.72ms|      102.91|  13.71ms|      72.94|  22.20ms|    45.05 |
 
-All times recorded using a Intel Core i7-9750H CPU. 
+All times recorded using an Intel Core i7-9750H CPU and averaged over 1000 runs.
 
 ## Documentation (under active development)
 
 - https://kyber-py.readthedocs.io/en/latest/
 
-## Future Plans
+## Polynomials and Modules
 
-* Add documentation on `NTT` transform for polynomials
-* Add documentation for working with DRBG and setting the seed
+There are two main things to worry about when implementing Kyber/ML-KEM. The
+first thing to consider is the mathematics, which requires performing linear
+algebra in a module with elements in the ring $R_q = \mathbb{F}_q[X] /(X^n + 1)$
+and the second is the sampling, compression and decompression, which links to
+the cryptographic assurance of the protocol.
 
-## Discussion of Implementation
-
-### Kyber
-
-```
-TODO:
-
-Add some more information about how working with Kyber works with this
-library...
-```
+For those who don't know, a module is a generalisation of a vector space, where
+elements of a matrix are not selected from a field (such as the rationals, or
+elements of a finite field $\mathbb{F}_{p^k}$), but rather from a ring (we do not
+require each element in a ring to have a multiplicative inverse). The ring in question for Kyber/ML-KEM is a polynomial ring where polynomials have coefficients in $\mathbb{F}_{q}$ with $q = 3329$ and the polynomial ring has a modulus $X^n + 1$ with $n = 256$ (and so every element of the polynomial ring has at most 256 coefficients).
 
 ### Polynomials
 
-The file [`polynomials.py`](polynomials.py) contains the classes 
-`PolynomialRing` and 
-`Polynomial`. This implements the univariate polynomial ring
+To help with experimenting with these polynomial rings themselves, the file [`polynomials_generic.py`](polynomials/polynomials_generic.py) has an implementation of the univariate polynomial ring
 
 $$
 R_q = \mathbb{F}_q[X] /(X^n + 1) 
 $$
 
-The implementation is inspired by `SageMath` and you can create the
+where the user can select any $q, n$. For example, you can create the
 ring $R_{11} = \mathbb{F}_{11}[X] /(X^8 + 1)$ in the following way:
 
 #### Example
 
 ```python
+>>> from polynomials.polynomials_generic import PolynomialRing
 >>> R = PolynomialRing(11, 8)
 >>> x = R.gen()
 >>> f = 3*x**3 + 4*x**7
@@ -136,72 +190,63 @@ ring $R_{11} = \mathbb{F}_{11}[X] /(X^8 + 1)$ in the following way:
 0
 ```
 
-We additionally include functions for `PolynomialRingKyber` and `PolynomialKyber`
-to move from bytes to polynomials (and back again). 
+We hope that this allows for some hands-on experience at working with these
+polynomials before starting to play with the whole of Kyber/ML-KEM.
+
+For the "Kyber-specific" functions, needed to implement the protocol itself, we
+have made a child class `PolynomialRingKyber(PolynomialRing)` which has the
+following additional methods:
 
 - `PolynomialRingKyber`
   - `parse(bytes)` takes $3n$ bytes and produces a random polynomial in $R_q$
   - `decode(bytes, l)` takes $\ell n$ bits and produces a polynomial in $R_q$
-  - `cbd(beta, eta)` takes $\eta \cdot n / 4$ bytes and produces a polynomial in $R_q$ with coefficents taken from a centered binomial distribution
+  - `cbd(beta, eta)` takes $\eta \cdot n / 4$ bytes and produces a polynomial in
+    $R_q$ with coefficients taken from a centered binomial distribution
 - `PolynomialKyber`
-  - `self.encode(l)` takes the polynomial and returns a length $\ell n / 8$ bytearray
+  - `encode(l)` takes the polynomial and returns a length $\ell n / 8$ bytearray
+  - `to_ntt()` converts the polynomial into the NTT domain for efficient
+    polynomial multiplication and returns an element of type
+    `PolynomialKyberNTT`
+- `PolynomialKyberNTT`
+  - `from_ntt()` converts the polynomial back from the NTT domain and returns an
+    element of type `PolynomialKyber`
 
-#### Example
-
-```python
-TODO
-```
+This class fixes $q = 3329$ and $n = 256$.
 
 Lastly, we define a `self.compress(d)` and `self.decompress(d)` method for
-polynomials following page 2 of the 
+polynomials following page 2 of the
 [specification](https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf)
 
-$$
-\textsf{compress}_q(x, d) = \lceil (2^d / q) \cdot x \rfloor \textrm{mod}^+ 2^d,
-$$
+$$ \textsf{compress}_q(x, d) = \lceil (2^d / q) \cdot x \rfloor \textrm{mod}^+
+2^d, $$
 
-$$
-\textsf{decompress}_q(x, d) = \lceil (q / 2^d) \cdot x \rfloor.
-$$
+$$ \textsf{decompress}_q(x, d) = \lceil (q / 2^d) \cdot x \rfloor. $$
 
-The functions `compress` and `decompress` are defined for the coefficients 
-of a polynomial and a polynomial is (de)compressed by acting the function
-on every coefficient. 
-Similarly, an element of a module is (de)compressed by acting the
+The functions `compress` and `decompress` are defined for the coefficients of a
+polynomial and a polynomial is (de)compressed by applying the function to every
+coefficient. Similarly, an element of a module is (de)compressed by applying the
 function on every polynomial.
 
-#### Example
-
-```python
-TODO
-```
-
-**Note**: compression is lossy! We do not get the same polynomial back 
-by computing `f.compress(d).decompress(d)`. They are however *close*.
-See the specification for more information.
+**Note**: compression is lossy! We do not get the same polynomial back by
+computing `f.compress(d).decompress(d)`. They are however *close*. See the
+specification for more information.
 
 ### Number Theoretic Transform
 
-```
-TODO:
-
-Talk about what is available, and how it is used and the two Polynomial types
-we have to handle this.
-```
+**TODO**: it would be good to write something more detailed here.
 
 ### Modules
 
-The file [`modules.py`](modules.py) contains the classes `Module` and `Matrix`.
-A module is a generalisation of a vector space, where the field
-of scalars is replaced with a ring. In the case of Kyber, we 
-need the module with the ring $R_q$ as described above. 
+Building on `polynomials_generic.py` we also include a file
+[`modules_generic.py`](modules/modules_generic.py) which has all of the
+functions needed to perform linear algebra given a ring.
 
-`Matrix` allows elements of the module to be of size $m \times n$
-but for Kyber, we only need vectors of length $k$ and square
-matricies of size $k \times k$.
+Note that `Matrix` allows elements of the module to be of size $m \times n$ but
+for Kyber, we only need vectors of length $k$ and square matrices of size
+$k \times k$.
 
-As an example of the operations we can perform with out `Module`
-lets revisit the ring from the previous example:
+As an example of the operations we can perform with our `Module`, let's revisit
+the ring from the previous example:
 
 #### Example
 
@@ -245,6 +290,8 @@ lets revisit the ring from the previous example:
 [        2 + 6*x^4 + x^5]
 ```
 
-### TODO
-
-Explain the extra functions available in `ModuleKyber` and `MatrixKyber`.
+On top of this class, we have the classes `ModuleKyber(Module)` and
+`MatrixKyber(Matrix)` which have helper functions which (for example) encode
+every element of a matrix, or convert every element to or from the NTT domain.
+These are simple functions which call the respective `PolynomialKyber` methods
+for every element.
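
The `compress` and `decompress` formulas quoted in the README text of this diff can be tried out on single coefficients with a short standalone sketch. The function names below (`round_half_up`, `compress`, `decompress`, `centered_distance`) are illustrative assumptions and are not taken from the `kyber-py` API. The integer-only rounding is one possible way to realise rounding to the nearest integer with ties rounded up.

```python
Q = 3329  # modulus used by Kyber / ML-KEM


def round_half_up(num, den):
    """Return num / den rounded to the nearest integer, ties rounded up.

    Uses integer arithmetic only, so there are no floating point surprises.
    """
    return (2 * num + den) // (2 * den)


def compress(x, d, q=Q):
    """Map a coefficient x in [0, q) to a d-bit value: round((2^d / q) * x) mod 2^d."""
    return round_half_up((1 << d) * x, q) % (1 << d)


def decompress(y, d, q=Q):
    """Map a d-bit value y back to a coefficient: round((q / 2^d) * y)."""
    return round_half_up(q * y, 1 << d)


def centered_distance(a, b, q=Q):
    """Distance between a and b when both are viewed modulo q."""
    diff = (a - b) % q
    return min(diff, q - diff)


# Compression is lossy: the round trip returns a nearby value, not the original.
x = 1000
y = compress(x, 4)       # 5
x_back = decompress(y, 4)  # 1040
print(x, y, x_back, centered_distance(x, x_back))
```

A polynomial would be (de)compressed by applying these functions to each coefficient in turn. The round-trip error stays within `round_half_up(Q, 2 ** (d + 1))`, which is why the two values are described as close rather than equal.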