error applying lfilter

Geodude-93 opened this issue


I get an error when I try to apply a bandpass filter using cupyx.scipy.signal.lfilter.
Other functions like resample_poly work well.
I have attached a code example. See the error below.

File ~/.conda/envs/cupy/lib/python3.12/site-packages/cupy/cuda/ in compile

nvrtc.compileProgram(self.ptr, options)

File cupy_backends/cuda/libs/nvrtc.pyx:125 in cupy_backends.cuda.libs.nvrtc.compileProgram

File cupy_backends/cuda/libs/nvrtc.pyx:138 in cupy_backends.cuda.libs.nvrtc.compileProgram

File cupy_backends/cuda/libs/nvrtc.pyx:53 in cupy_backends.cuda.libs.nvrtc.check_status


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File ~/.conda/envs/cupy/lib/python3.12/site-packages/spyder_kernels/ in compat_exec
exec(code, globals, locals)

File /mnt/Datastore/usr/keving/Scripts/Test/
data_cp_filt = cpsignal.lfilter(b_bp, a_bp, data_cp, axis=0, zi=None)

File ~/.conda/envs/cupy/lib/python3.12/site-packages/cupyx/scipy/signal/ in lfilter
out = apply_iir(out, a_r, axis=axis, zi=prev_out, dtype=iir_dtype)

File ~/.conda/envs/cupy/lib/python3.12/site-packages/cupyx/scipy/signal/ in apply_iir
corr_kernel = _get_module_func(

File ~/.conda/envs/cupy/lib/python3.12/site-packages/cupyx/scipy/signal/ in _get_module_func
kernel = module.get_function(kernel_name)

File cupy/_core/raw.pyx:472 in cupy._core.raw.RawModule.get_function

File cupy/_core/raw.pyx:396 in cupy._core.raw.RawModule.module.get

File cupy/_core/raw.pyx:404 in cupy._core.raw.RawModule._module

File cupy/_util.pyx:64 in cupy._util.memoize.decorator.ret

File cupy/_core/raw.pyx:538 in cupy._core.raw._get_raw_module

File cupy/_core/core.pyx:2236 in cupy._core.core.compile_with_cache

File cupy/_core/core.pyx:2254 in cupy._core.core.compile_with_cache

File ~/.conda/envs/cupy/lib/python3.12/site-packages/cupy/cuda/ in _compile_module_with_cache
return _compile_with_cache_cuda(

File ~/.conda/envs/cupy/lib/python3.12/site-packages/cupy/cuda/ in _compile_with_cache_cuda
ptx, mapping = compile_using_nvrtc(

File ~/.conda/envs/cupy/lib/python3.12/site-packages/cupy/cuda/ in compile_using_nvrtc
return _compile(source, options, cu_path,

File ~/.conda/envs/cupy/lib/python3.12/site-packages/cupy/cuda/ in _compile
compiled_obj, mapping = prog.compile(options, log_stream)

File ~/.conda/envs/cupy/lib/python3.12/site-packages/cupy/cuda/ in compile
raise CompileException(log, self.src,, options,

CompileException: /tmp/tmpicj34uwe/ catastrophic error: cannot open source file "cuda_runtime.h"

1 catastrophic error detected in the compilation of "/tmp/tmpicj34uwe/".
Compilation terminated.

To Reproduce

import numpy as np
import cupy as cp 
import cupyx.scipy.signal as cpsignal

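# settings data
winlen = 10_000 # in secs
num_samps = 10_000_000 # assumed value (missing from the report); gives fs of about 1 kHz
freq1, freq2 = 50, 220 # in Hz

# bandpass
order_bp = 4 # assumed value (missing from the report)
cutoff_freqs=(10,100) # in Hz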

t = np.linspace(0, winlen, num_samps)
dt = np.round( t[1]-t[0], 6 ) 

# create data
noise = np.random.rand(num_samps)
data = 1 *np.sin((2*np.pi)*t*freq1) +  0.5 *np.sin((2*np.pi)*t*freq2) + 2.5*noise

data_cp = cp.asarray(data)
b_bp, a_bp = cpsignal.butter(order_bp, cutoff_freqs, btype="bandpass", fs=1/dt)
data_cp_filt = cpsignal.lfilter(b_bp, a_bp, data_cp, axis=0, zi=None)
data_filt = cp.asnumpy(data_cp_filt)



Installation

Conda-Forge (conda install ...)

Environment


OS                           : Linux-5.11.0-36-generic-x86_64-with-glibc2.31
Python Version               : 3.12.1
CuPy Version                 : 13.0.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.26.3
SciPy Version                : 1.11.4
Cython Build Version         : 0.29.37
Cython Runtime Version       : None
CUDA Root                    : /home/keving/.conda/envs/cupy
nvcc PATH                    : None
CUDA Build Version           : 11080
CUDA Driver Version          : 11020
CUDA Runtime Version         : 11080 (linked to CuPy) / 11020 (locally installed)
cuBLAS Version               : (available)
cuFFT Version                : 10401
cuRAND Version               : 10203
cuSOLVER Version             : (11, 1, 0)
cuSPARSE Version             : (available)
NVRTC Version                : (11, 2)
Thrust Version               : 200200
CUB Build Version            : 200200
Jitify Build Version         : 95b2a2d
cuDNN Build Version          : 8800
cuDNN Version                : 8907
NCCL Build Version           : None
NCCL Runtime Version         : None
cuTENSOR Version             : None
cuSPARSELt Build Version     : None
Device 0 Name                : GeForce RTX 2080 Ti
Device 0 Compute Capability  : 75
Device 0 PCI Bus ID          : 0000:65:00.0
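
The version fields above use CUDA's integer encoding (major * 1000 + minor * 10), so they can be hard to compare by eye. A minimal sketch for decoding them follows; the helper name `decode_cuda_version` is hypothetical, and whether the 11.8 versus 11.2 difference contributes to the failure is not established in this thread.

```python
def decode_cuda_version(value):
    """Split a CUDA integer version (major * 1000 + minor * 10) into (major, minor)."""
    major, rest = divmod(value, 1000)
    return major, rest // 10


# Values copied from the environment report above.
linked = decode_cuda_version(11080)  # CUDA runtime linked to CuPy
local = decode_cuda_version(11020)   # locally installed runtime / driver
print(linked, local, linked == local)  # (11, 8) (11, 2) False
```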

Additional Information

No response

What's the output of conda list?

here is the output of conda list:

Name Version Build Channel

Thanks, @Geodude-93, nothing looks unusual at a glance. Would you be able to rerun the reproducer, but this time with the environment variable CUPY_DUMP_CUDA_SOURCE_ON_ERROR=1 set, and attach the output? I cannot reproduce the error, so I'd like to see what the generated code was.

How/where do I set the environment variable? (sorry, I am new to cupy)

It's a Linux thing, not CuPy specific. This should work in a terminal:
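
The variable can also be set from Python when launching the reproducer. This is a minimal sketch, assuming the reproducer is saved as a script; the helper name `run_with_cuda_source_dump` and the script path are hypothetical, and only the variable name comes from this thread.

```python
import os
import subprocess
import sys


def run_with_cuda_source_dump(script_path):
    """Run a Python script with CUPY_DUMP_CUDA_SOURCE_ON_ERROR=1 set.

    Hypothetical helper: it copies the current environment, adds the
    variable, and returns the completed process with captured output.
    """
    env = dict(os.environ, CUPY_DUMP_CUDA_SOURCE_ON_ERROR="1")
    return subprocess.run(
        [sys.executable, script_path],
        env=env,
        capture_output=True,
        text=True,
    )
```

Calling `run_with_cuda_source_dump("reproducer.py")` and printing both `stdout` and `stderr` of the result should give the full output to attach, including anything printed before the traceback.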


#2100 seems to be the same issue?

Setting CUPY_DUMP_CUDA_SOURCE_ON_ERROR=1 did not help; I still get the same error.

As suggested earlier, could you attach the output? Adding the env var is meant for us to debug, not to solve or work around the bug.

ah sorry, I read over that part. Here is the output with CUPY_DUMP_CUDA_SOURCE_ON_ERROR=1 :

/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupyx/jit/ FutureWarning: cupyx.jit.rawkernel is experimental. The interface can change in the future.
Traceback (most recent call last):
File "/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupy/cuda/", line 677, in compile
nvrtc.compileProgram(self.ptr, options)
File "cupy_backends/cuda/libs/nvrtc.pyx", line 125, in cupy_backends.cuda.libs.nvrtc.compileProgram
File "cupy_backends/cuda/libs/nvrtc.pyx", line 138, in cupy_backends.cuda.libs.nvrtc.compileProgram
File "cupy_backends/cuda/libs/nvrtc.pyx", line 53, in cupy_backends.cuda.libs.nvrtc.check_status
cupy_backends.cuda.libs.nvrtc.NVRTCError: NVRTC_ERROR_COMPILATION (6)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/mnt/Datastore/usr/keving/Scripts/Test/", line 32, in
data_cp_filt = cpsignal.lfilter(b_bp, a_bp, data_cp, axis=0, zi=None)
File "/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupyx/scipy/signal/", line 797, in lfilter
out = apply_iir(out, a_r, axis=axis, zi=prev_out, dtype=iir_dtype)
File "/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupyx/scipy/signal/", line 577, in apply_iir
corr_kernel = _get_module_func(
File "/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupyx/scipy/signal/", line 509, in _get_module_func
kernel = module.get_function(kernel_name)
File "cupy/_core/raw.pyx", line 472, in cupy._core.raw.RawModule.get_function
File "cupy/_core/raw.pyx", line 396, in cupy._core.raw.RawModule.module.get
File "cupy/_core/raw.pyx", line 404, in cupy._core.raw.RawModule._module
File "cupy/_util.pyx", line 64, in cupy._util.memoize.decorator.ret
File "cupy/_core/raw.pyx", line 538, in cupy._core.raw._get_raw_module
File "cupy/_core/core.pyx", line 2236, in cupy._core.core.compile_with_cache
File "cupy/_core/core.pyx", line 2254, in cupy._core.core.compile_with_cache
File "/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupy/cuda/", line 484, in _compile_module_with_cache
return _compile_with_cache_cuda(
File "/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupy/cuda/", line 562, in _compile_with_cache_cuda
ptx, mapping = compile_using_nvrtc(
File "/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupy/cuda/", line 319, in compile_using_nvrtc
return _compile(source, options, cu_path,
File "/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupy/cuda/", line 303, in _compile
compiled_obj, mapping = prog.compile(options, log_stream)
File "/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupy/cuda/", line 696, in compile
raise CompileException(log, self.src,, options,
cupy.cuda.compiler.CompileException: /tmp/tmpciit15lq/ catastrophic error: cannot open source file "cuda_runtime.h"

1 catastrophic error detected in the compilation of "/tmp/tmpciit15lq/".
Compilation terminated.

Hi @Geodude-93, this is not the full output. There should be more before the traceback; in particular, there should be a line that says "CUDA source:" followed by the CUDA source code sent to the compiler.

ah sorry, here is the first part:

/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupyx/jit/ FutureWarning: cupyx.jit.rawkernel is experimental. The interface can change in the future.
NVRTC compilation error: /tmp/tmpyyhgi8xo/ catastrophic error: cannot open source file "cuda_runtime.h"

1 catastrophic error detected in the compilation of "/tmp/tmpyyhgi8xo/".
Compilation terminated.

Name: /tmp/tmpyyhgi8xo/
Options: -I/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupy/_core/include/cupy/_cccl/cub -I/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupy/_core/include/cupy/_cccl/thrust -I/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupy/_core/include/cupy/_cccl/libcudacxx -std=c++11 -I/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupy/_core/include -I/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupy/_core/include/cupy/_cuda/cuda-11 -I/home/keving/.conda/envs/cupy/include -ftz=true -arch=sm_75
CUDA source:

#include <cuda_runtime.h>
#include <device_launch_parameters.h>

#include <cupy/math_constants.h>
#include <cupy/carray.cuh>
#include <cupy/complex.cuh>

template<typename U, typename T>
__global__ void compute_correction_factors(
        const int m, const int k, const T* b, U* out) {
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if(idx >= k) {
        return;
    }

    U* out_start = out + idx * (k + m);
    U* out_off = out_start + k;

    for(int i = 0; i < m; i++) {
        U acc = 0.0;
        for(int j = 0; j < k; j++) {
            acc += ((U) b[j]) * out_off[i - j - 1];

        }
        out_off[i] = acc;
    }
}

template<typename T>
__global__ void first_pass_iir(
        const int m, const int k, const int n, const int n_blocks,
        const int carries_stride, const T* factors, T* out,
        T* carries) {
    int orig_idx = blockDim.x * (blockIdx.x % n_blocks) + threadIdx.x;

    int num_row = blockIdx.x / n_blocks;
    int idx = 2 * orig_idx + 1;

    if(idx >= n) {
        return;
    }

    int group_num = idx / m;
    int group_pos = idx % m;

    T* out_off = out + num_row * n;
    T* carries_off = carries + num_row * carries_stride;

    T* group_start = out_off + m * group_num;
    T* group_carries = carries_off + k * group_num;

    int pos = group_pos;
    int up_bound = pos;
    int low_bound = pos;
    int rel_pos;

    for(int level = 1, iter = 1; level < m; level *=2, iter++) {
        int sz = min(pow(2.0f, ((float) iter)), ((float) m));

        if(level > 1) {
            int factor = ceil(pos / ((float) sz));
            up_bound = sz * factor - 1;
            low_bound = up_bound - level + 1;
        }

        if(level == 1) {
            pos = low_bound;
        }

        if(pos < low_bound) {
            pos += level / 2;
        }

        if(pos + m * group_num >= n) {
            break;
        }

        rel_pos = pos % level;
        T carry = 0.0;
        for(int i = 1; i <= min(k, level); i++) {
            T k_value = group_start[low_bound - i];
            const T* k_factors = factors + (m + k) * (i - 1) + k;
            T factor = k_factors[rel_pos];
            carry += k_value * factor;
        }

        group_start[pos] += carry;
        __syncthreads();
    }

    if(pos >= m - k) {
        if(carries != NULL) {
            group_carries[pos - (m - k)] = group_start[pos];
        }
    }

}

template<typename T>
__global__ void correct_carries(
    const int m, const int k, const int n_blocks, const int carries_stride,
    const int offset, const T* factors, T* carries) {

    int idx = threadIdx.x;
    int pos = idx + (m - k);
    T* row_carries = carries + carries_stride * blockIdx.x;

    for(int i = offset; i < n_blocks; i++) {
        T* this_carries = row_carries + k * (i + (1 - offset));
        T* prev_carries = row_carries + k * (i - offset);

        T carry = 0.0;
        for(int j = 1; j <= k; j++) {
            const T* k_factors = factors + (m + k) * (j - 1) + k;
            T factor = k_factors[pos];
            T k_value = prev_carries[k - j];
            carry += factor * k_value;
        }

        this_carries[idx] += carry;
        __syncthreads();
    }
}

template<typename T>
__global__ void second_pass_iir(
        const int m, const int k, const int n, const int carries_stride,
        const int n_blocks, const int offset, const T* factors,
        T* carries, T* out) {

    int idx = blockDim.x * (blockIdx.x % n_blocks) + threadIdx.x;
    idx += offset * m;

    int row_num = blockIdx.x / n_blocks;
    int n_group = idx / m;
    int pos = idx % m;

    if(idx >= n) {
        return;
    }

    T* out_off = out + row_num * n;
    T* carries_off = carries + row_num * carries_stride;
    const T* prev_carries = carries_off + (n_group - offset) * k;

    T carry = 0.0;
    for(int i = 1; i <= k; i++) {
        const T* k_factors = factors + (m + k) * (i - 1) + k;
        T factor = k_factors[pos];
        T k_value = prev_carries[k - i];
        carry += factor * k_value;
    }

    out_off[idx] += carry;
}
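
The Options line in the dump lists the `-I` directories handed to the compiler, and the error says `cuda_runtime.h` could not be opened. One way to check whether a given header is reachable through those directories is sketched below; the helper names `include_dirs` and `find_header` are hypothetical, and the sketch only inspects the `-I` flags, not any search path the compiler may add on its own.

```python
import os
import shlex


def include_dirs(options):
    """Return the directories named by -I flags in a compiler options string."""
    dirs = []
    tokens = shlex.split(options)
    for i, token in enumerate(tokens):
        if token == "-I" and i + 1 < len(tokens):
            # Form: "-I <dir>"
            dirs.append(tokens[i + 1])
        elif token.startswith("-I") and len(token) > 2:
            # Form: "-I<dir>", as in the dump above
            dirs.append(token[2:])
    return dirs


def find_header(options, header):
    """Return the include directories that actually contain `header`."""
    return [d for d in include_dirs(options)
            if os.path.isfile(os.path.join(d, header))]
```

Passing the full Options string from the dump together with `"cuda_runtime.h"` returns an empty list when none of the listed directories holds the header, which would be consistent with the error reported here.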

Thanks, @Geodude-93. Could you try removing this line locally and see if it works?

#include <cuda_runtime.h>

It should be in this file on your filesystem:


Hi, I removed the line. Now I get the following:

(cupy) keving@CGF-C9X299-PG300F:/mnt/Datastore/usr/keving/Scripts/Test$ python
/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupyx/jit/ FutureWarning: cupyx.jit.rawkernel is experimental. The interface can change in the future.
Traceback (most recent call last):
File "/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupy/cuda/", line 677, in compile
nvrtc.compileProgram(self.ptr, options)
File "cupy_backends/cuda/libs/nvrtc.pyx", line 125, in cupy_backends.cuda.libs.nvrtc.compileProgram
File "cupy_backends/cuda/libs/nvrtc.pyx", line 138, in cupy_backends.cuda.libs.nvrtc.compileProgram
File "cupy_backends/cuda/libs/nvrtc.pyx", line 53, in cupy_backends.cuda.libs.nvrtc.check_status
cupy_backends.cuda.libs.nvrtc.NVRTCError: NVRTC_ERROR_COMPILATION (6)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/mnt/Datastore/usr/keving/Scripts/Test/", line 32, in
data_cp_filt = cpsignal.lfilter(b_bp, a_bp, data_cp, axis=0, zi=None)
File "/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupyx/scipy/signal/", line 797, in lfilter
out = apply_iir(out, a_r, axis=axis, zi=prev_out, dtype=iir_dtype)
File "/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupyx/scipy/signal/", line 576, in apply_iir
corr_kernel = _get_module_func(
File "/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupyx/scipy/signal/", line 508, in _get_module_func
kernel = module.get_function(kernel_name)
File "cupy/_core/raw.pyx", line 472, in cupy._core.raw.RawModule.get_function
File "cupy/_core/raw.pyx", line 396, in cupy._core.raw.RawModule.module.get
File "cupy/_core/raw.pyx", line 404, in cupy._core.raw.RawModule._module
File "cupy/_util.pyx", line 64, in cupy._util.memoize.decorator.ret
File "cupy/_core/raw.pyx", line 538, in cupy._core.raw._get_raw_module
File "cupy/_core/core.pyx", line 2236, in cupy._core.core.compile_with_cache
File "cupy/_core/core.pyx", line 2254, in cupy._core.core.compile_with_cache
File "/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupy/cuda/", line 484, in _compile_module_with_cache
return _compile_with_cache_cuda(
File "/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupy/cuda/", line 562, in _compile_with_cache_cuda
ptx, mapping = compile_using_nvrtc(
File "/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupy/cuda/", line 319, in compile_using_nvrtc
return _compile(source, options, cu_path,
File "/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupy/cuda/", line 303, in _compile
compiled_obj, mapping = prog.compile(options, log_stream)
File "/home/keving/.conda/envs/cupy/lib/python3.12/site-packages/cupy/cuda/", line 696, in compile
raise CompileException(log, self.src,, options,
cupy.cuda.compiler.CompileException: /tmp/tmp_cvzik4z/ catastrophic error: cannot open source file "device_launch_parameters.h"

1 catastrophic error detected in the compilation of "/tmp/tmp_cvzik4z/".
Compilation terminated.

Nice, could you try removing that include line too (right below the line that you removed)?

Great, it works now, thanks!

Thanks for reporting/testing, @Geodude-93! We'll get it fixed.
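
The workaround used in this thread was a manual text edit: deleting the two `#include` lines from the kernel source inside the installed CuPy package. The same edit, applied to a source string, can be sketched as follows; the function name `drop_includes` is hypothetical, and this illustrates the manual edit only, not the fix that eventually landed in CuPy.

```python
def drop_includes(source, headers):
    """Return `source` without the #include lines for the given headers."""
    targets = {f"#include <{h}>" for h in headers}
    kept = [line for line in source.splitlines()
            if line.strip() not in targets]
    return "\n".join(kept)


# The two headers removed by hand in this thread.
REMOVED = ["cuda_runtime.h", "device_launch_parameters.h"]

example = (
    "#include <cuda_runtime.h>\n"
    "#include <device_launch_parameters.h>\n"
    "#include <cupy/complex.cuh>\n"
    "__global__ void noop() {}"
)
print(drop_includes(example, REMOVED))
```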