Cool-chic (pronounced /kul ʃik/ as in French 🥖🧀🍷) is a low-complexity neural image and video codec based on overfitting. Image coding performance is on par with VVC for 2000 multiplications per decoded pixel, while video coding performance competes with AVC with as few as 500 multiplications per decoded pixel.
All the documentation is available on the Cool-chic page.
- Feb. 24: version 3.1
  - Cool-chic video from Cool-chic video: Learned video coding with 800 parameters, Leguay et al.
  - Random access and low-delay video coding, competitive with AVC.
- Jan. 24: version 3.0
  - Re-implements most of the encoder-side improvements proposed by C3: High-performance and low-complexity neural compression from a single image or video, Kim et al.
  - 15% to 20% rate decrease compared to Cool-chic 2
- July 23: version 2, Low-complexity Overfitted Neural Image Codec, Leguay et al.
  - Several architecture changes for the decoder: convolution-based synthesis, learnable upsampling module
  - Friendlier usage: support for YUV 420 input format in 8-bit and 10-bit, and fixed-point arithmetic for cross-platform entropy (de)coding
- March 23: version 1, COOL-CHIC: Coordinate-based Low Complexity Hierarchical Image Codec, Ladune et al.
Coming up: a fast decoder implementation will be released soon for near real-time CPU decoding 🏎️ 🔥.
Cool-chic results are provided for both image and video compression inside the results/ directory, alongside the compressed bitstreams.
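Image compression performance is reported on the kodak, clic20-pro-valid and jvet datasets. Each "Vs." column gives the rate change of Cool-chic relative to that anchor: a negative value means Cool-chic needs less rate.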
Dataset | Vs. Cool-chic 2 | Vs. Cool-chic 1 | Vs. C3, Kim et al. | Vs. HEVC (HM 16.20) | Vs. VVC (VTM 19.1) | Min decoder complexity [MAC / pixel] | Max decoder complexity [MAC / pixel] | Avg decoder complexity [MAC / pixel] |
---|---|---|---|---|---|---|---|---|
kodak | - 19.4 % | - 29.1 % | - 1.6 % | - 14.6 % | + 6.6 % | 299 | 2291 | 1841 |
clic20-pro-valid | - 16.8 % | / | + 3.3 % | - 21.4 % | + 2.3 % | 545 | 2295 | 1897 |
jvet | - 23.0 % | / | / | - 13.7 % | + 25.4 % | 300 | 2295 | 1680 |
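To put the MAC-per-pixel figures in perspective, the total decoding work for one image is the per-pixel complexity multiplied by the number of pixels. The snippet below is a minimal sketch of that arithmetic. The 768 x 512 resolution is an assumption (a typical Kodak image size, not stated in the table), and `total_macs` is an illustrative helper, not part of Cool-chic.

```python
def total_macs(width: int, height: int, mac_per_pixel: float) -> float:
    """Total multiply-accumulate operations needed to decode one image."""
    return width * height * mac_per_pixel


# Assumed resolution: 768 x 512 pixels (not stated in the table above).
# 1841 MAC / pixel is the average kodak decoder complexity from the table.
macs = total_macs(768, 512, 1841)
print(f"{macs / 1e6:.1f} million MAC per image")
```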
Video compression performance is reported on the first 33 frames (≈ 1 second) of the CLIC24 validation subset, composed of 30 high-resolution videos. We provide results for 2 coding configurations:
- Low-delay P: addresses use cases where low latency is mandatory;
- Random access: addresses use cases where compression efficiency is paramount, e.g. video streaming.
Dataset | Config | Vs. HEVC (HM 16.20) | Vs. x265 medium | Vs. x264 medium | Min decoder complexity [MAC / pixel] | Max decoder complexity [MAC / pixel] | Avg decoder complexity [MAC / pixel] |
---|---|---|---|---|---|---|---|
clic24-valid-subset | random-access | + 60.4 % | + 18.1 % | - 15.5 % | 460 | 460 | 460 |
clic24-valid-subset | low-latency | + 122.0 % | + 73.8 % | + 28.9 % | 460 | 460 | 460 |
More details are available on the Cool-chic page.
Python version should be at least 3.10!
python3 --version # Should be at least 3.10
python3 -m pip install virtualenv # Install virtual env if needed
python3 -m virtualenv venv && source venv/bin/activate # Create and activate a virtual env named "venv"
(venv) pip install -r requirements.txt # Install the required packages
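The version requirement can also be checked from inside Python, for instance at the top of a script. This is a minimal sketch using only the standard library. The helper name `python_is_recent_enough` is illustrative and not part of Cool-chic.

```python
import sys

MIN_VERSION = (3, 10)  # Minimum Python version required by this README


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if python_is_recent_enough():
    print("Python version is OK")
else:
    print(f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]} or newer is required")
```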
Already encoded files are provided as bitstreams in results/<configuration>/<dataset_name>/.
- <configuration> can be image, video-low-latency or video-random-access
- <dataset_name> can be kodak, clic20-pro-valid, clic24-valid-subset or jvet
For each dataset, a script is provided to decode all the bitstreams.
(venv) python results/decode_one_dataset.py <configuration> <dataset_name> # Can take a few minutes
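To decode several datasets in one go, the command above can be generated for every configuration and dataset pair whose directory exists. The sketch below only builds the command lines and assumes it is run from the repository root. `decode_commands` is a hypothetical helper, not a script shipped with Cool-chic.

```python
import sys
from pathlib import Path

# Values listed in this README for <configuration> and <dataset_name>
CONFIGURATIONS = ("image", "video-low-latency", "video-random-access")
DATASETS = ("kodak", "clic20-pro-valid", "clic24-valid-subset", "jvet")


def decode_commands(results_root="results"):
    """Build one decode command per existing <configuration>/<dataset_name> directory."""
    root = Path(results_root)
    commands = []
    for configuration in CONFIGURATIONS:
        for dataset_name in DATASETS:
            # Not every pair exists, so skip missing directories
            if (root / configuration / dataset_name).is_dir():
                script = str(root / "decode_one_dataset.py")
                commands.append([sys.executable, script, configuration, dataset_name])
    return commands


for command in decode_commands():
    print(" ".join(command))
```

Each returned list can be passed to `subprocess.run(command, check=True)` to actually run the decoding.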
The file results/<configuration>/<dataset_name>/results.tsv provides the results that should be obtained.
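One way to compare your own figures against the reference is to load both tab-separated files and list the cells that differ. The sketch below makes no assumption about column names. It assumes you saved your own results in a TSV file with the same columns, which this README does not describe, so the path `my_results.tsv` in the usage comment is hypothetical.

```python
import csv


def load_tsv(path):
    """Read a tab-separated file into a list of {column: value} dictionaries."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f, delimiter="\t"))


def differing_cells(expected, obtained):
    """Return (row_index, column, expected_value, obtained_value) for each mismatch.

    Values are compared as strings, and only rows present in both lists are checked.
    """
    diffs = []
    for i, (exp, obt) in enumerate(zip(expected, obtained)):
        for column, value in exp.items():
            if obt.get(column) != value:
                diffs.append((i, column, value, obt.get(column)))
    return diffs


# Usage (paths are examples; "my_results.tsv" is hypothetical):
# reference = load_tsv("results/image/kodak/results.tsv")
# mine = load_tsv("my_results.tsv")
# for row, column, want, got in differing_cells(reference, mine):
#     print(f"row {row}, {column}: expected {want}, got {got}")
```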
Special thanks go to:
- Robert Bamler for the constriction package, which serves as our entropy coder. More details in Understanding Entropy Coding With Asymmetric Numeral Systems (ANS): a Statistician's Perspective, Bamler.
- Hyunjik Kim, Matthias Bauer, Lucas Theis, Jonathan Richard Schwarz and Emilien Dupont for their great work enhancing Cool-chic: C3: High-performance and low-complexity neural compression from a single image or video, Kim et al.