Teichlab / bbknn

Batch balanced KNN

np.matrix and ridge_regression

maryellenlynall opened this issue

Hello,
I'm trying to use bbknn.ridge_regression, but I get the following error when I run:
bbknn.ridge_regression(adata, batch_key=['batch'], confounder_key=['cell_type'])
Is this a compatibility issue with the current version of numpy?
Many thanks


TypeError Traceback (most recent call last)
Cell In[19], line 9
7 import bbknn
8 # bbknn.bbknn(adata_v3)
----> 9 bbknn.ridge_regression(adata_v3, batch_key=['batch'], confounder_key=['cell_type'])
10 # scanpy.tl.pca(adata_v3)
11 # bbknn.bbknn(adata_v3)

File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/bbknn/__init__.py:196, in ridge_regression(adata, batch_key, confounder_key, chunksize, copy, **kwargs)
193 X_exp = X_exp.todense()
194 #fit the ridge regression model, compute the expression explained by the technical
195 #effect, and the remaining residual
--> 196 LR.fit(dummy,X_exp)
197 X_explained.append(dm.dot(LR.coef_[:,batch_index].T))
198 X_remain.append(X_exp - X_explained[-1])

File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/sklearn/base.py:1151, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
1144 estimator._validate_params()
1146 with config_context(
1147 skip_parameter_validation=(
1148 prefer_skip_nested_validation or global_skip_validation
1149 )
1150 ):
-> 1151 return fit_method(estimator, *args, **kwargs)

File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/sklearn/linear_model/_ridge.py:1134, in Ridge.fit(self, X, y, sample_weight)
1114 """Fit Ridge regression model.
1115
1116 Parameters
(...)
1131 Fitted estimator.
1132 """
1133 _accept_sparse = _get_valid_accept_sparse(sparse.issparse(X), self.solver)
-> 1134 X, y = self._validate_data(
1135 X,
1136 y,
1137 accept_sparse=_accept_sparse,
1138 dtype=[np.float64, np.float32],
1139 multi_output=True,
1140 y_numeric=True,
1141 )
1142 return super().fit(X, y, sample_weight=sample_weight)

File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/sklearn/base.py:621, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, cast_to_ndarray, **check_params)
619 y = check_array(y, input_name="y", **check_y_params)
620 else:
--> 621 X, y = check_X_y(X, y, **check_params)
622 out = X, y
624 if not no_val_X and check_params.get("ensure_2d", True):

File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/sklearn/utils/validation.py:1163, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
1143 raise ValueError(
1144 f"{estimator_name} requires y to be passed, but the target y is None"
1145 )
1147 X = check_array(
1148 X,
1149 accept_sparse=accept_sparse,
(...)
1160 input_name="X",
1161 )
-> 1163 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric, estimator=estimator)
1165 check_consistent_length(X, y)
1167 return X, y

File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/sklearn/utils/validation.py:1173, in _check_y(y, multi_output, y_numeric, estimator)
1171 """Isolated part of check_X_y dedicated to y validation"""
1172 if multi_output:
-> 1173 y = check_array(
1174 y,
1175 accept_sparse="csr",
1176 force_all_finite=True,
1177 ensure_2d=False,
1178 dtype=None,
1179 input_name="y",
1180 estimator=estimator,
1181 )
1182 else:
1183 estimator_name = _check_estimator_name(estimator)

File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/sklearn/utils/validation.py:753, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
662 """Input validation on an array, list, sparse matrix or similar.
663
664 By default, the input is checked to be a non-empty 2D array containing
(...)
750 The converted and validated array.
751 """
752 if isinstance(array, np.matrix):
--> 753 raise TypeError(
754 "np.matrix is not supported. Please convert to a numpy array with "
755 "np.asarray. For more information see: "
756 "https://numpy.org/doc/stable/reference/generated/numpy.matrix.html"
757 )
759 xp, is_array_api_compliant = get_namespace(array)
761 # store reference to original array to check if copy is needed when
762 # function returns

TypeError: np.matrix is not supported. Please convert to a numpy array with np.asarray. For more information see: https://numpy.org/doc/stable/reference/generated/numpy.matrix.html
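The last frame shows the cause: in the bbknn frame the sparse chunk is converted with .todense() just before LR.fit, and for a scipy sparse matrix that call returns an np.matrix, which check_array in recent scikit-learn rejects. The small numpy/scipy sketch below uses a toy 2 x 3 matrix (not the data from this issue) to show the type involved and two common ways to get a plain ndarray instead:

```python
import numpy as np
import scipy.sparse as sp

# Toy CSR matrix standing in for a chunk of adata.X
X = sp.csr_matrix(np.array([[0.0, 1.5, 0.0], [2.0, 0.0, 0.5]], dtype=np.float32))

dense = X.todense()            # for a scipy sparse *matrix* this is an np.matrix
print(type(dense))             # <class 'numpy.matrix'>
print(dense[0].shape)          # (1, 3): rows of an np.matrix stay 2-D

fixed = np.asarray(dense)      # plain ndarray, which sklearn accepts
print(type(fixed), fixed[0].shape)  # <class 'numpy.ndarray'> (3,)

also = X.toarray()             # goes straight to an ndarray, skipping np.matrix
```

The 2-D row behaviour is one of the np.matrix quirks that make it awkward for libraries to support, which is consistent with scikit-learn asking for np.asarray in the error message.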

Please provide information on what is in your adata.X.

It is:
<302045x22050 sparse matrix of type '<class 'numpy.float32'>'
with 562021468 stored elements in Compressed Sparse Row format>

The following was applied to the count data to generate adata.X:
sc.pp.normalize_per_cell(adata, counts_per_cell_after=1e4)
sc.pp.log1p(adata)
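Both steps preserve sparsity, which would explain why adata.X was still a CSR matrix when it reached ridge_regression and hit the .todense() path. As a rough illustration of what the two calls do (plain scipy on a toy matrix, not scanpy's own implementation), each cell's counts are rescaled to sum to 1e4 and log1p is then applied to the stored values only, since log1p(0) = 0:

```python
import numpy as np
import scipy.sparse as sp

# Toy count matrix: 2 cells x 3 genes, stored sparse like the raw counts
counts = sp.csr_matrix(np.array([[1, 0, 3], [0, 2, 2]], dtype=np.float32))

# Per-cell normalisation: scale every row so its total is 1e4
totals = np.asarray(counts.sum(axis=1)).ravel()
scaled = sp.diags(1e4 / totals) @ counts

# log1p maps 0 to 0, so the result keeps the same sparsity pattern
logged = scaled.log1p()
print(sp.issparse(logged))  # True
```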

Thanks!

To get you going with your analysis in the meantime, run sc.pp.scale() on the object after subsetting it to highly variable genes; you'll probably want to do that before the PCA anyway. This should turn .X into a dense numpy array, which is the kind of input this function has typically received recently.
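Zero-centring turns almost every stored zero into a non-zero value, so the scaled result is naturally dense; subsetting first keeps that affordable (densifying the full 302045 x 22050 float32 matrix would need roughly 26.6 GB). The sketch below is a rough numpy stand-in for this workaround, not scanpy's implementation: hvg_mask is a hypothetical placeholder for the highly-variable-gene flags, and the variance estimator and clipping options of sc.pp.scale are not reproduced.

```python
import numpy as np
import scipy.sparse as sp

# Toy log-normalised matrix (3 cells x 4 genes), stored sparse as in the issue
X = sp.csr_matrix(np.array([
    [0.0, 1.2, 0.0, 2.3],
    [0.7, 0.0, 0.0, 1.1],
    [0.0, 0.9, 1.4, 0.0],
]))

# Hypothetical highly-variable-gene flags; subset columns before densifying
hvg_mask = np.array([True, True, False, True])
X_hvg = X[:, np.flatnonzero(hvg_mask)]

# Densify to a plain ndarray (not np.matrix), then z-score each gene
dense = X_hvg.toarray()
mean = dense.mean(axis=0)
std = dense.std(axis=0)
std[std == 0] = 1.0            # guard against constant genes
X_scaled = (dense - mean) / std
```

The result is an ordinary ndarray, so .todense() is never reached and sklearn's np.matrix check does not apply.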

I think I know how to fix the sparse case, but I can't get into my local computing setup to test the hypothesis.

I have pushed a fix for the sparse case to GitHub; install it via pip install git+https://github.com/Teichlab/bbknn.git. At some point sklearn stopped supporting numpy matrices, which is what scipy sparse matrices become when converted with .todense(). Thanks for bringing this to my attention.
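The actual patch lives in the bbknn repository and is not shown in this thread. Assuming the fix amounts to converting the densified chunk to an ndarray before fitting, a minimal sketch of that step looks like this; X_chunk, dummy and batch_index are toy stand-ins loosely modelled on the names in the traceback, and using dummy[:, batch_index] in place of the traceback's dm is an assumption:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import Ridge

# Toy expression chunk: 6 cells x 3 genes, stored sparse like adata.X
X_chunk = sp.csr_matrix(np.array([
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 0.0],
    [3.0, 0.0, 1.0],
    [0.0, 2.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 4.0],
]))

# Hypothetical one-hot design: columns 0-1 encode batch, 2-3 encode cell type
dummy = np.array([
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
], dtype=float)
batch_index = [0, 1]

# The key change: np.asarray turns the np.matrix into a plain ndarray
X_exp = np.asarray(X_chunk.todense())   # equivalently: X_chunk.toarray()

LR = Ridge(alpha=1.0)
LR.fit(dummy, X_exp)                     # no np.matrix TypeError now

# Expression explained by the batch columns only, and the remaining residual
X_explained = dummy[:, batch_index] @ LR.coef_[:, batch_index].T
X_remain = X_exp - X_explained
```

Calling .toarray() directly is the simpler choice when the input is known to be a scipy sparse matrix; np.asarray also handles the case where a chunk is already an np.matrix.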

Thank you for updating that so quickly, much appreciated. I'm no longer getting the error.