libfqfft is a C++ library for Fast Fourier Transforms (FFTs) in finite fields with multithreading support (via OpenMP). The library is developed by SCIPR Lab and contributors (see AUTHORS file) and is released under the MIT License (see LICENSE file). The library provides functionality for fast multipoint polynomial evaluation, fast polynomial interpolation, and fast computation of Lagrange polynomials. Check out the Performance section for memory, runtime, and field operation profiling data.
- Directory structure
- Introduction
- Background
- Domains
- Performance
- Build guide
- Testing
- Profiling
- Tutorials
- References
The directory structure is as follows:
- libfqfft: C++ source code, containing the following modules:
- evaluation_domain: declaration of interfaces for evaluation domains
- kronecker_substitution: Kronecker substitution for polynomial multiplication
- polynomial_arithmetic: polynomial arithmetic and extended GCD
- profiling: functionality to profile running time, space usage, and number of field operations
- tests: collection of GTests
- tools: tools for evaluation domains
- depends: dependency libraries
- tutorials: tutorials for getting started with libfqfft
This library implements Fast Fourier Transform (FFT) for finite fields, built to be flexible for a variety of software applications. The implementation has support for fast evaluation/interpolation, by providing FFTs and Lagrange-coefficient computations for various domains. These include the standard radix-2 FFT, along with the arithmetic sequence and geometric sequence. The library has multicore support using OpenMP for parallelized computations, where applicable.
Check out the Tutorials section for examples with our high-level API.
There is currently a variety of algorithms for computing the Fast Fourier Transform (FFT) over the field of complex numbers. For this situation, there exists many libraries, such as FFTW, that have been rigorously developed, tested, and optimized. Our goal is to use these existing techniques and develop novel implementations to address the more interesting case of FFT in finite fields. We will see that in many instances, these algorithms can be used for the case of finite fields, but the construction of FFT in finite fields remains, in practice, challenging.
Consider a finite field F with 2^m elements. We can define a discrete Fourier transform by choosing a 2^m − 1 root of unity ω ∈ F. Operating over the complex numbers, there exists a variety of FFT algorithms, such as the Cooley-Tukey algorithm along with its variants, to choose from. And in the case that 2^m - 1 is prime - consider the Mersenne primes as an example - we can turn to other algorithms, such as Rader's algorithm and Bluestein's algorithm. In addition, if the domain size is an extended power of two or the sum of powers of two, variants of the radix-2 FFT algorithms can be employed to perform the computation.
However, in a finite field, there may not always be a root of unity. If the domain size is not as mentioned, then one can consider adjoining roots to the field. Although, there is no guarantee that adjoining such a root to the field can render the same performance benefits, as it would produce a significantly larger structure that could cancel out benefits afforded by the FFT itself. Therefore, one should consider other algorithms which continue to perform better than the naïve evaluation.
Given a domain size, the library will determine and perform computations over the best-fitted domain. Ideally, it is desired to perform evaluation and interpolation over a radix-2 FFT domain, however, this may not always be possible. Thus, the library provides the arithmetic and geometric sequence domains as fallback options, which we show to perform better than naïve evaluation.
Basic Radix-2 | Extended Radix-2 | Step Radix-2 | Arithmetic Sequence | Geometric Sequence | |
---|---|---|---|---|---|
Evaluation | O(n log n) | O(n log n) | O(n log n) | M(n) log(n) + O(M(n)) | 2M(n) + O(n) |
Interpolation | O(n log n) | O(n log n) | O(n log n) | M(n) log(n) + O(M(n)) | 2M(n) + O(n) |
The radix-2 FFTs are comprised of three domains: basic, extended, and step radix-2. The radix-2 domain implementations make use of pseudocode from [CLRS 2n Ed, pp. 864].
The basic radix-2 FFT domain has size m = 2^k and consists of the m-th roots of unity. The domain uses the standard FFT algorithm and inverse FFT algorithm to perform evaluation and interpolation. Multi-core support includes parallelizing butterfly operations in the FFT operation.
The extended radix-2 FFT domain has size m = 2^(k + 1) and consists of the m-th roots of unity, union a coset of these roots. The domain performs two basic_radix2_FFT operations for evaluation and interpolation in order to account for the extended domain size.
The step radix-2 FFT domain has size m = 2^k + 2^r and consists of the 2^k-th roots of unity, union a coset of 2^r-th roots of unity. The domain performs two basic_radix2_FFT operations for evaluation and interpolation in order to account for the extended domain size.
The arithmetic sequence domain is of size m and is applied for more general cases. The domain applies a basis conversion algorithm between the monomial and the Newton bases. Choosing sample points that form an arithmetic progression, a_i = a_1 + (i - 1)*d, allows for an optimization of computation over the monomial basis, by using the special case of Newton evaluation and interpolation on an arithmetic progression, see [BS05].
The geometric sequence domain is of size m and is applied for more general cases. The domain applies a basis conversion algorithm between the monomial and the Newton bases. The domain takes advantage of further simplications to Newton evaluation and interpolation by choosing sample points that form a geometric progression, a_n = r^(n-1), see [BS05].
We now discuss performance data of the library in terms of number of field operations, running time, and memory usage, across all evaluation domains.
Machine Specification: The following benchmark data was obtained on a 64-bit Intel i7 Quad-Core machine with 16GB RAM (2x8GB) running Ubuntu 14.04 LTS. The code is compiled using g++ 4.8.4.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 4
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 94
Stepping: 3
CPU MHz: 4008.007
BogoMIPS: 8016.01
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
The library has the following dependencies:
The library has been tested on Linux, but it is compatible with Windows and Mac OS X. (Nevertheless, memory profiling works only on Linux machines.)
On Ubuntu 14.04 LTS:
sudo apt-get install build-essential git libboost-all-dev cmake libgmp3-dev libssl-dev libprocps3-dev pkg-config gnuplot-x11
Fetch dependencies from their GitHub repos:
git submodule init && git submodule update
To compile, starting at the project root directory, create the Makefile:
mkdir build && cd build && cmake ..
The following flags change the behavior of the compiled code:
-
cmake .. -DCMAKE_INSTALL_PREFIX=/install/path
Specifies the install location from the provided install prefix path. -
cmake .. -DMULTICORE=ON
Enables parallelized execution using OpenMP. This will utilize all cores on the CPU for heavyweight parallelizable operations such as FFT. -
cmake .. -DOPT_FLAGS={ FLAGS }
Passes specified optimizations flags to compiler. -
cmake .. -PROF_DOUBLE=ON
Enables profiling with Double (default: ON). If the flag is turned off, profiling will useFr<edwards_pp>
. -
cmake .. -DWITH_PROCPS=OFF
Links libprocps for usage in memory profiling. If this flag is turned off, memory profiling will not work. -
cmake .. -DDEPENDS_DIR=...
Sets the dependency installation directory to the provided absolute path (default: installs dependencies in the respective submodule directories)
Then, to compile the library and profiler, run:
make
The above makes the build
folder and compiles the profiling executables to the project root directory. To remove all executables, from the build folder, run make clean
.
To install the libfqfft library:
make install
Depending on the specified install location from the optional -DCMAKE_INSTALL_PREFIX
, this will install the requisite headers into /install/path/include; so your application should be compiled using -I/install/path/include.
The library uses Google Test for its unit tests. The unit tests cover polynomial evaluation, polynomial interpolation, Lagrange polynomials evaluation, and vanishing polynomial evaluation, for all evaluation domains. There are also unit tests for polynomial arithmetic, Kronecker substitution, and extended Euclidean GCD. The test suite is easily extensible to support a wide range of fields and domain sizes.
To run the tests for this library, run:
make check
This will compile and run the tests. Alternatively, from the build
folder, one can also run ./libfqfft/gtests
after compiling.
The unit tests are divided into three GTest files located under libfqfft/tests
:
- Evaluation domains:
evaluation_domain_test.cpp
- Polynomial arithmetic:
polynomial_arithmetic_test.cpp
- Kronecker substitution:
kronecker_substitution_test.cpp
Warning: Profiling of memory usage is Linux-specific as it makes use of getrusage()
from <sys/resource.h>
. Compatibility of the getrusage()
BSD syscall equivalent is kernel specific, such as with the getrusage()
call listed under Darwin/OSX XNU-3248.20.55 in bsd/kern/kern_resource.c
.
The library includes functionality for profiling running time, memory usage, and number of field operations, and also for plotting the resulting data with gnuplot. All profiling and plotting activity is logged in the folder libfqfft/profiling/logs
; logs are sorted into a directory hierarchy by profiling type and timestamp, respectively. The running time and memory usage profiling also supports multi-threading.
To compile the profiler, run:
make profiler profiling_menu
To start the profiler, navigate to the project root directory and run:
./profiling_menu
Below is an explanation of profiling and plotting options.
Radix-2 FFT profiling numbers are in accordance to a vector of input size n. Polynomial multiplication computes two polynomials of degree n by performing FFT on a resulting vector of size 2n. For arithmetic and geometric sequence profiling, both evaluation and interpolation take in vectors of size n and return a vector of degree n.
Profiling options include:
- Profiling type: Running time, memory usage, and/or fieldops count (any combination)
- Domain type: All domains, radix-2 domains, or arithmetic/geometric sequence domains
- Domain sizes: Preset small, preset large, or custom size
Profiling results are saved in libfqfft/profiling/logs/{datetime}
.
Plotting uses gnuplot scripts that are generalized for varying requests per profiling type. Runtimes are plotted for all domains, comparing domain size to runtime in seconds. Memory usage are plotted for all domains by comparing domain size to memory usage in kilobytes. Field operations are plotted in two graphs: one comparing domain size to total operation counts, another comparing each type of operation - addition, subtraction, multiplication, division, and negation - with its respective count, for all domains.
Plotting options include:
- Profiling type: Plotting Runtime, Memory, or Field Operation Counts
- File selection: Lists all previous profile logs of profiling type
Plots are saved in the directory chosen at step 2, File Selection.
The library includes the following tutorial examples, found in the tutorials folder. To compile the tutorials, run:
make tutorials
The above will compile the executables to the build/tutorials
folder, and then run them.
-
File: tutorials/polynomial_multiplication_on_fft_example.cpp
-
Run:
./polynomial_multiplication
We construct two polynomials, a and b, and then call the _polynomial_multiplication()
function in libfqfft/polynomial_arithmetic/basic_operations.hpp
to perform our operation. The result is stored into polynomial c, and then printed out. Note that polynomials are stored in C++ STL vectors in order from lowest to highest degree.
-
Run:
./polynomial_evaluation
We construct a polynomial f and domain size m. Then, we get an evaluation domain by calling get_evaluation_domain(m)
, which will determine the best suitable domain to perform evaluation on given the domain size. Now we compute the FFT over our determined domain of the polynomial f, then print out the result.
-
Run:
./lagrange_polynomial_evaluation
We define an element t and domain size m. Then, we determine our evaluation domain by invoking get_evaluation_domain(m)
as before. Next, we call evaluate_all_lagrange_polynomials(t)
to evaluate all Lagrange polynomials. The output is a vector (a[0], ... ,a[m-1]), where a[i] is the evaluation of L_{i,S}(z) at z = t. Lastly, we print out this result.
Evaluation Domains:
[BS05] Polynomial Evaluation and Interpolation on Special Sets of Points, Alin Bostan and Eric Schost 2005
[BLS03] Tellegen’s Principle into Practice, Alin Bostan, Gregoire Lecerf, and Eric Schost 2003
Kronecker Substitution:
[S15] Arithmetic in Finite Fields, Andrew Sutherland 2015
[H07] Faster Polynomial Multiplication via Multipoint Kronecker Substitution, David Harvey 2007