The RAPIDS cuSignal project leverages CuPy, Numba, and the RAPIDS ecosystem for GPU-accelerated signal processing. In some cases, cuSignal is a direct port of SciPy Signal to GPU compute resources via CuPy, but it also contains Numba CUDA kernels for additional speedups in selected functions. cuSignal achieves its best gains on large signals and compute-intensive functions, and it also targets online (streaming) processing via zero-copy memory (pinned, mapped) shared between CPU and GPU.
NOTE: For the latest stable README.md, ensure you are on the master branch.
cuSignal has an API that mimics SciPy Signal. In-depth functionality is demonstrated in the notebooks section of the repo, but let's examine the workflow for Polyphase Resampling under multiple scenarios:
SciPy Signal (CPU)
import numpy as np
from scipy import signal
start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3
cx = np.linspace(start, stop, num_samps, endpoint=False)
cy = np.cos(-cx**2/6.0)
cf = signal.resample_poly(cy, resample_up, resample_down, window=('kaiser', 0.5))
This code executes on 2x Xeon E5-2600 in 2.36 sec.
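To reproduce a timing like this on your own hardware, the run can be wrapped with `time.perf_counter`. The sketch below is a scaled-down version of the snippet above (3e5 samples instead of 1e8, so it finishes quickly); it also confirms that `resample_poly` changes the sample count by the up/down ratio:

```python
import time

import numpy as np
from scipy import signal

num_samps = int(3e5)  # smaller than the benchmark's 1e8 so this runs quickly
resample_up, resample_down = 2, 3

cx = np.linspace(0, 10, num_samps, endpoint=False)
cy = np.cos(-cx**2 / 6.0)

t0 = time.perf_counter()
cf = signal.resample_poly(cy, resample_up, resample_down, window=('kaiser', 0.5))
elapsed = time.perf_counter() - t0

# Polyphase resampling by up/down scales the sample count by that ratio:
# 300,000 * 2 / 3 = 200,000 output samples.
print(f"{num_samps} -> {cf.shape[0]} samples in {elapsed:.3f} s")
```

Scale `num_samps` back up to `int(1e8)` to compare against the numbers quoted here.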
cuSignal with Data Generated on the GPU with CuPy
import cupy as cp
import cusignal
start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3
gx = cp.linspace(start, stop, num_samps, endpoint=False)
gy = cp.cos(-gx**2/6.0)
gf = cusignal.resample_poly(gy, resample_up, resample_down, window=('kaiser', 0.5))
This code executes on an NVIDIA P100 in 258 ms.
cuSignal with Data Generated on the CPU with Mapped, Pinned (zero-copy) Memory
import cupy as cp
import numpy as np
import cusignal
start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3
# Generate Data on CPU
cx = np.linspace(start, stop, num_samps, endpoint=False)
cy = np.cos(-cx**2/6.0)
# Create shared memory between CPU and GPU and load with CPU signal (cy)
gpu_signal = cusignal.get_shared_mem(num_samps, dtype=np.complex128)
gpu_signal[:] = cy
gf = cusignal.resample_poly(gpu_signal, resample_up, resample_down, window=('kaiser', 0.5))
This code executes on an NVIDIA P100 in 154 ms.
cuSignal with Data Generated on the CPU and Copied to GPU [AVOID THIS FOR ONLINE SIGNAL PROCESSING]
import cupy as cp
import numpy as np
import cusignal
start = 0
stop = 10
num_samps = int(1e8)
resample_up = 2
resample_down = 3
# Generate Data on CPU
cx = np.linspace(start, stop, num_samps, endpoint=False)
cy = np.cos(-cx**2/6.0)
gf = cusignal.resample_poly(cp.asarray(cy), resample_up, resample_down, window=('kaiser', 0.5))
This code executes on an NVIDIA P100 in 728 ms.
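A rough back-of-envelope calculation shows why the explicit `cp.asarray` copy hurts: the 1e8-sample float64 signal is 800 MB, so each host-to-device transfer costs tens of milliseconds before any compute begins, and a pageable (non-pinned) host buffer makes the copy slower still. The ~12 GB/s figure below is an assumed effective PCIe 3.0 x16 bandwidth, not a measured value for this system:

```python
num_samps = int(1e8)
bytes_per_sample = 8                     # float64
signal_bytes = num_samps * bytes_per_sample

pcie_bandwidth = 12e9                    # assumed effective PCIe 3.0 x16, bytes/s
transfer_ms = signal_bytes / pcie_bandwidth * 1e3

# ~800 MB per copy, on the order of 65-70 ms of pure transfer time
print(f"{signal_bytes / 1e6:.0f} MB per copy, ~{transfer_ms:.0f} ms over PCIe")
```

This transfer overhead (plus pageable-memory staging) accounts for much of the gap between this scenario and the zero-copy and GPU-generated variants above, which is why the explicit-copy pattern should be avoided for online processing.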
- NVIDIA GPU (Maxwell or Newer)
- CUDA Drivers
- Anaconda/Miniconda (Python 3.7 version)
- CuPy >= 6.2.0
- Optional: RTL-SDR or other SDR driver/packaging. Find more information and follow the setup instructions here. NOTE: pyrtlsdr is automatically installed with the default cusignal environment. To make use of some of the examples in the notebooks, you'll need to buy/install an RTL-SDR.
conda env create -f cusignal_conda_env.yml
conda activate cusignal
python setup.py install
conda env update -f cusignal_conda_env.yml
pytest -v
for verbose mode, or
pytest -v -k <function name>
for more selective testing.
Review the CONTRIBUTING.md file for information on how to contribute code and issues to the project.
You can learn more about the cuSignal stack and motivations by viewing these GTC DC 2019 slides, located here. The recording of this talk can be found on GTC On Demand.