Teichlab / bbknn

Batch balanced KNN

ValueError: No hyperplanes of adequate size were found! When not using annoy

mtvector opened this issue · comments

Hi there,
I'm having an issue when I try to run BBKNN without annoy. After first hitting this error I freshly installed everything in a new conda environment, but I'm still getting the error, raised from pynndescent, when I run:
bbknn.bbknn(adata,batch_key='batch_name',use_annoy=False,metric='manhattan',neighbors_within_batch=3)

Thanks so much! This package works amazingly for correcting batch-driven compositional problems!!

Full error message below:

    122         batch_list = adata.obs[batch_key].values
    123         #call BBKNN proper
--> 124         bbknn_out = bbknn_matrix(pca=pca, batch_list=batch_list, approx=approx,
    125                                  use_annoy=use_annoy, metric=params['metric'], **kwargs)
    126         #store the parameters in .uns['neighbors']['params'], add use_rep and batch_key

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/bbknn/matrix.py in bbknn(pca, batch_list, neighbors_within_batch, n_pcs, trim, approx, annoy_n_trees, pynndescent_n_neighbors, pynndescent_random_state, use_annoy, use_faiss, metric, set_op_mix_ratio, local_connectivity)
    312         params = check_knn_metric(params, counts)
    313         #obtain the batch balanced KNN graph
--> 314         knn_distances, knn_indices = get_graph(pca=pca,batch_list=batch_list,params=params)
    315         #sort the neighbours so that they're actually in order from closest to furthest
    316         newidx = np.argsort(knn_distances,axis=1)

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/bbknn/matrix.py in get_graph(pca, batch_list, params)
    173                 ind_to = np.arange(len(batch_list))[mask_to]
    174                 #create the faiss/cKDTree/KDTree/annoy, depending on approx/metric
--> 175                 ckd = create_tree(data=pca[mask_to,:params['n_pcs']], params=params)
    176                 for from_ind in range(len(batches)):
    177                         #this is the batch that will have its neighbours identified

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/bbknn/matrix.py in create_tree(data, params)
     95                                                                         n_neighbors=params['pynndescent_n_neighbors'],
     96 									random_state=params['pynndescent_random_state'])
---> 97                 ckd.prepare()
     98         elif params['computation'] == 'faiss':
     99                 ckd = faiss.IndexFlatL2(data.shape[1])

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/pynndescent/pynndescent_.py in prepare(self)
   1524     def prepare(self):
   1525         if not hasattr(self, "_search_graph"):
-> 1526             self._init_search_graph()
   1527         if not hasattr(self, "_search_function"):
   1528             if self._is_sparse:

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/pynndescent/pynndescent_.py in _init_search_graph(self)
    962                 best_trees = [self._rp_forest[idx] for idx in best_tree_indices]
    963                 del self._rp_forest
--> 964                 self._search_forest = [
    965                     convert_tree_format(tree, self._raw_data.shape[0])
    966                     for tree in best_trees

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/pynndescent/pynndescent_.py in <listcomp>(.0)
    963                 del self._rp_forest
    964                 self._search_forest = [
--> 965                     convert_tree_format(tree, self._raw_data.shape[0])
    966                     for tree in best_trees
    967                 ]

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/pynndescent/rp_trees.py in convert_tree_format(tree, data_size)
   1161     if tree.hyperplanes[0].ndim == 1:
   1162         # dense hyperplanes
-> 1163         hyperplane_dim = dense_hyperplane_dim(tree.hyperplanes)
   1164         hyperplanes = np.zeros((n_nodes, hyperplane_dim), dtype=np.float32)
   1165     else:

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/pynndescent/rp_trees.py in dense_hyperplane_dim()
   1143             return hyperplanes[i].shape[0]
   1144 
-> 1145     raise ValueError("No hyperplanes of adequate size were found!")
   1146 
   1147 

ValueError: No hyperplanes of adequate size were found!

I figured out that setting pynndescent_n_neighbors to a lower number solves this issue. Perhaps there should be an internal conditional for this!
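A minimal sketch of that workaround, assuming the failure appears when a batch has no more cells than the requested pynndescent neighbour count. The helper name `capped_pynndescent_neighbors` and the "smallest batch minus one" rule are illustrative assumptions, not part of bbknn; `pynndescent_n_neighbors` is the keyword that appears in the function signature in the traceback.

```python
from collections import Counter


def capped_pynndescent_neighbors(batch_labels, requested=30):
    """Return a neighbour count strictly smaller than the smallest batch.

    Hypothetical helper, not part of bbknn. It assumes pynndescent needs
    more observations in a batch than the number of neighbours requested.
    """
    smallest = min(Counter(batch_labels).values())
    return max(1, min(requested, smallest - 1))


# Toy example: batches of 40, 25 and 12 cells.
labels = ["a"] * 40 + ["b"] * 25 + ["c"] * 12
n_neighbors = capped_pynndescent_neighbors(labels)
print(n_neighbors)

# Possible usage (needs bbknn installed and an AnnData object `adata`):
# bbknn.bbknn(adata, batch_key='batch_name', use_annoy=False,
#             metric='manhattan', neighbors_within_batch=3,
#             pynndescent_n_neighbors=capped_pynndescent_neighbors(
#                 adata.obs['batch_name']))
```

With the toy labels the smallest batch has 12 cells, so the helper returns 11. When every batch is larger than the requested count, the requested count is returned unchanged.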

Thanks for the kind words.

Good catch - I've already got a second condition in place for pynndescent, as it seems unable to process 10 or fewer observations no matter how you tweak the parameterisation:

bbknn/bbknn/matrix.py

Lines 220 to 222 in d2d5a65

#pynndescent wants at least 11 cells per batch, from testing
if np.min(counts) < 11:
    raise ValueError("Not all batches have at least 11 cells in them - required by pynndescent.")
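To see which batches would trip that check before calling bbknn, something like the following could be used. `small_batches` is a hypothetical helper written for illustration; only the threshold of 11 cells comes from the check quoted above.

```python
from collections import Counter


def small_batches(batch_labels, min_cells=11):
    """Map each batch with fewer than `min_cells` cells to its cell count."""
    counts = Counter(batch_labels)
    return {batch: n for batch, n in counts.items() if n < min_cells}


# Toy example: batch "b" has only 10 cells, the others have at least 11.
labels = ["a"] * 50 + ["b"] * 10 + ["c"] * 11
print(small_batches(labels))
```

On an AnnData object the labels would typically come from the batch column of `adata.obs`, for example `adata.obs['batch_name']` as in the call at the top of this issue.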

In testing, it appears that the default pynndescent neighbour count of 30 runs just fine on data with 31 observations. You've got some super tiny batches going on, is that intentional?

Reopening as I'll need to add a workaround into the code. This is not pressing, as having 30-cell batches is not that common.

Oops sorry!