theislab / ehrapy

Electronic Health Record Analysis with Python.

Home Page: https://ehrapy.readthedocs.io/

sparse encoding

Zethson opened this issue · comments

Description of feature

I looked a bit into sparse encoding, with one-hot encoding being the most important case:

  1. scikit-learn's OneHotEncoder supports a sparse_output parameter which, when enabled, should return a SciPy sparse matrix in CSR format.
  2. We get original_values as NumPy arrays when calling the function, which may or may not be fine.
  3. We currently default the sparse_output parameter to False without checking the type of the matrix.
  4. _update_encoded_data does not take sparse matrices into account.