The RAPIDS cuSignal project leverages CuPy, Numba, and the RAPIDS ecosystem for GPU-accelerated signal processing. In some cases, cuSignal is a direct port of SciPy Signal to GPU compute resources via CuPy, but it also contains Numba CUDA kernels for additional speedups in selected functions. cuSignal achieves its best gains on large signals and compute-intensive functions, and it also targets online (streaming) processing via zero-copy memory (pinned, mapped) shared between CPU and GPU.
NOTE: For the latest stable README.md, ensure you are on the master branch.
cuSignal has an API that mimics SciPy Signal. In-depth functionality is demonstrated in the notebooks section of the repo, but let's examine the workflow for Polyphase Resampling under multiple scenarios:
SciPy Signal (CPU)
import numpy as np
from scipy import signal
start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3
cx = np.linspace(start, stop, num_samps, endpoint=False)
cy = np.cos(-cx**2/6.0)
cf = signal.resample_poly(cy, resample_up, resample_down, window=('kaiser', 0.5))
This code executes on 2x Xeon E5-2600 in 2.36 sec.
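To reproduce a timing like this on your own hardware, the run can be wrapped with `time.perf_counter`. The sketch below is a scaled-down version of the snippet above (3e5 samples instead of 1e8, so it finishes quickly); it also confirms that `resample_poly` changes the sample count by the up/down ratio:

```python
import time

import numpy as np
from scipy import signal

num_samps = int(3e5)  # smaller than the benchmark's 1e8 so this runs quickly
resample_up, resample_down = 2, 3

cx = np.linspace(0, 10, num_samps, endpoint=False)
cy = np.cos(-cx**2 / 6.0)

t0 = time.perf_counter()
cf = signal.resample_poly(cy, resample_up, resample_down, window=('kaiser', 0.5))
elapsed = time.perf_counter() - t0

# Polyphase resampling by up/down scales the sample count by that ratio:
# 300,000 * 2 / 3 = 200,000 output samples.
print(f"{num_samps} -> {cf.shape[0]} samples in {elapsed:.3f} s")
```

Scale `num_samps` back up to `int(1e8)` to compare against the numbers quoted here.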
cuSignal with Data Generated on the GPU with CuPy
import cupy as cp
import cusignal
start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3
gx = cp.linspace(start, stop, num_samps, endpoint=False)
gy = cp.cos(-gx**2/6.0)
gf = cusignal.resample_poly(gy, resample_up, resample_down, window=('kaiser', 0.5))
This code executes on an NVIDIA P100 in 258 ms.
cuSignal with Data Generated on the CPU with Mapped, Pinned (zero-copy) Memory
import cupy as cp
import numpy as np
import cusignal
start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3
# Generate Data on CPU
cx = np.linspace(start, stop, num_samps, endpoint=False)
cy = np.cos(-cx**2/6.0)
# Create shared memory between CPU and GPU and load with CPU signal (cy)
gpu_signal = cusignal.get_shared_mem(num_samps, dtype=np.complex128)
gpu_signal[:] = cy
gf = cusignal.resample_poly(gpu_signal, resample_up, resample_down, window=('kaiser', 0.5))
This code executes on an NVIDIA P100 in 154 ms.
cuSignal with Data Generated on the CPU and Copied to GPU [AVOID THIS FOR ONLINE SIGNAL PROCESSING]
import cupy as cp
import numpy as np
import cusignal
start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3
# Generate Data on CPU
cx = np.linspace(start, stop, num_samps, endpoint=False)
cy = np.cos(-cx**2/6.0)
gf = cusignal.resample_poly(cp.asarray(cy), resample_up, resample_down, window=('kaiser', 0.5))
This code executes on an NVIDIA P100 in 728 ms.
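A rough back-of-envelope calculation shows why the explicit `cp.asarray` copy hurts: the 1e8-sample float64 signal is 800 MB, so each host-to-device transfer costs tens of milliseconds before any compute begins, and a pageable (non-pinned) host buffer makes the copy slower still. The ~12 GB/s figure below is an assumed effective PCIe 3.0 x16 bandwidth, not a measured value for this system:

```python
num_samps = int(1e8)
bytes_per_sample = 8                     # float64
signal_bytes = num_samps * bytes_per_sample

pcie_bandwidth = 12e9                    # assumed effective PCIe 3.0 x16, bytes/s
transfer_ms = signal_bytes / pcie_bandwidth * 1e3

# ~800 MB per copy, on the order of 65-70 ms of pure transfer time
print(f"{signal_bytes / 1e6:.0f} MB per copy, ~{transfer_ms:.0f} ms over PCIe")
```

This transfer overhead (plus pageable-memory staging) accounts for much of the gap between this scenario and the zero-copy and GPU-generated variants above, which is why the explicit-copy pattern should be avoided for online processing.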
- NVIDIA GPU (Maxwell or Newer)
- CUDA Drivers
- Anaconda/Miniconda (Python 3.7 version)
- CuPy >= 6.2.0
- Optional: RTL-SDR or other SDR driver/packaging. Find more information and follow the setup instructions here. NOTE: pyrtlsdr is automatically installed with the default cusignal environment. To make use of some of the examples in the notebooks, you'll need to buy/install an RTL-SDR.
conda env create -f cusignal_conda_env.yml
conda activate cusignal
python setup.py install
conda env update -f cusignal_conda_env.yml
pytest -v
for verbose mode, or
pytest -v -k <function name>
for more selective testing.
Review the CONTRIBUTING.md file for information on how to contribute code and issues to the project.
You can learn more about the cuSignal stack and motivations by viewing these GTC DC 2019 slides, located here. The recording of this talk can be found on GTC On Demand.