microsoft / goodpoints

A Python package for generating concise, high-quality summaries of a probability distribution

`compress()` exceeds recursion limit (jupyter exit code 3221225725) when the number of rows in X is not a power of 4

hbaniecki opened this issue

Hi, thanks for the great algorithms!

I think compress() does not stop when the number of rows in X is not a power of 2.

How to reproduce: Run compress() on data with 442 rows:

import numpy as np
from sklearn.datasets import load_diabetes
from goodpoints import kt, compress
X, _ = load_diabetes(return_X_y=True)
print(X.shape)  # (442, 10)
# X = X[0:64, :]  # with only the first 64 rows, this works fine
# print(X.shape)
def kernel_gaussian(y, X, gamma=1):
    # Gaussian kernel between the point y and every row of X
    k_vals = np.sum((X-y)**2, axis=1)
    return np.exp(-gamma*k_vals/2)
# Halving routine passed to compress(): kernel thinning with m=1
f_halve = lambda x: kt.thin(X=x, m=1, split_kernel=kernel_gaussian, swap_kernel=kernel_gaussian)
id_compressed = compress.compress(X, halve=f_halve, g=0)
print(id_compressed)

I guess that all experiments in the Compress++ paper are on datasets of size 2**n.

What does the exit code 3221225725 (0xc00000fd) mean?
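For reference, a decimal exit code like this can be converted to hex and compared against Windows NTSTATUS values. 0xC00000FD is STATUS_STACK_OVERFLOW, which fits the runaway recursion described above. The snippet below is only a minimal sketch: `describe_exit_code` and the two-entry lookup table are hypothetical helpers for illustration, not part of goodpoints or of any Windows API.

```python
# Illustrative subset of Windows NTSTATUS codes; this table is not exhaustive.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}

def describe_exit_code(code):
    """Return the hex form of a process exit code plus its name, if known."""
    # Some tools report the same code as a negative signed 32-bit integer,
    # so normalize to the unsigned 32-bit form first.
    code &= 0xFFFFFFFF
    name = NTSTATUS_NAMES.get(code, "unknown")
    return f"0x{code:08X} ({name})"

print(describe_exit_code(3221225725))
```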

Edit:
Apparently, 128 rows (a power of 2 but not a power of 4) does not work either:

import numpy as np
from sklearn.datasets import load_diabetes
from goodpoints import kt, compress
X, _ = load_diabetes(return_X_y=True)
print(X.shape)
X = X[0:128, :]  # keep only the first 128 rows
print(X.shape)
def kernel_gaussian(y, X, gamma=1):
    # Gaussian kernel between the point y and every row of X
    k_vals = np.sum((X-y)**2, axis=1)
    return np.exp(-gamma*k_vals/2)
# Halving routine passed to compress(): kernel thinning with m=1
f_halve = lambda x: kt.thin(X=x, m=1, split_kernel=kernel_gaussian, swap_kernel=kernel_gaussian)
id_compressed = compress.compress(X, halve=f_halve, g=0)
print(id_compressed)

Thank you for bringing this to our attention @hbaniecki. The latest commit ensures appropriate behavior for any number of rows.
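For anyone still on a version from before that commit, one possible workaround is to check the row count and truncate the input to the largest power of 4 before calling compress(). This is only a sketch and assumes the power-of-4 condition from the issue title. The helper names below are hypothetical and not part of goodpoints.

```python
def is_power_of_4(n):
    """True if n is 1, 4, 16, 64, ..."""
    n = int(n)
    # A power of 4 is a power of 2 whose single set bit sits at an even position.
    return n > 0 and (n & (n - 1)) == 0 and (n.bit_length() - 1) % 2 == 0

def largest_power_of_4_at_most(n):
    """Largest power of 4 that does not exceed n (requires n >= 1)."""
    n = int(n)
    if n < 1:
        raise ValueError("n must be at least 1")
    # floor(log2(n)) // 2 is the exponent of the largest power of 4 <= n.
    return 4 ** ((n.bit_length() - 1) // 2)

def truncate_rows_to_power_of_4(X):
    """Keep only the leading rows so that the row count is a power of 4."""
    return X[:largest_power_of_4_at_most(len(X))]

for n_rows in (64, 128, 442):
    print(n_rows, is_power_of_4(n_rows), largest_power_of_4_at_most(n_rows))
```

With the 442-row diabetes data this keeps the first 256 rows. Truncation simply discards the remaining rows, so shuffling the rows first (or padding instead of truncating) may be preferable depending on the use case.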