quinngroup / parkinsons_multimodal

ValueError when trying to do a PCA

himaniyadav opened this issue

My code is up to date on the main branch. I've added a requirements.txt file for ease of reproducibility.

I've gotten the VAE to train. To visualize the latent space, I've been trying to run a simple PCA with 2 components on the resulting latent vectors. I'm definitely not expecting the results to be particularly good, but would just like to see what it looks like at this stage. The resulting latent vectors have 256 dimensions and are of type torch.Tensor. I'm just grabbing the latent vector for each batch after it's trained. So into the PCA I'm passing in a list of size 114 (# batches) with each item having shape [1, 256] (batch size of 1, each vector 256 dimensions). I keep getting the following error: ValueError: only one element tensors can be converted to Python scalars.

There might be a simple solution but a few online searches didn't yield any helpful results. I feel like it could be something obvious I'm missing but I couldn't exactly find what would be causing the error. Would appreciate any help if something stands out!

Command:
python src/main.py

Full stack trace:

Traceback (most recent call last):
  File "main.py", line 49, in <module>
    pca_embedding = PCA(n_components=2).fit_transform(big_mus)
  File "/home/himani/anaconda3/lib/python3.7/site-packages/sklearn/decomposition/_pca.py", line 376, in fit_transform
    U, S, V = self._fit(X)
  File "/home/himani/anaconda3/lib/python3.7/site-packages/sklearn/decomposition/_pca.py", line 398, in _fit
    ensure_2d=True, copy=self.copy)
  File "/home/himani/anaconda3/lib/python3.7/site-packages/sklearn/base.py", line 420, in _validate_data
    X = check_array(X, **check_params)
  File "/home/himani/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File "/home/himani/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 598, in check_array
    array = np.asarray(array, order=order, dtype=dtype)
  File "/home/himani/anaconda3/lib/python3.7/site-packages/numpy/core/_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: only one element tensors can be converted to Python scalars

Try to provide some other diagnostics if you can -- e.g. the observed vs expected shape of big_mus when being fed to the transform, and any other things you think may be relevant. Initial guess would be that big_mus is of the wrong shape.
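As a rough sketch of the kind of diagnostics meant here, something like the following could be run right before the PCA call. The `describe` helper is hypothetical, and the `big_mus` below is a random stand-in built from the description in the issue (114 tensors of shape [1, 256]), not the real latent vectors.

```python
import torch

# Stand-in for the real latent vectors (assumption): 114 tensors of shape [1, 256].
big_mus = [torch.randn(1, 256) for _ in range(114)]


def describe(latents):
    """Hypothetical helper: summarize the container and its first item."""
    first = latents[0]
    return {
        "container": type(latents).__name__,
        "length": len(latents),
        "item_type": type(first).__name__,
        "item_shape": tuple(first.shape),
        "item_dtype": str(first.dtype),
        "requires_grad": first.requires_grad,
    }


info = describe(big_mus)
print(info)
```

If the output shows a Python list of torch tensors rather than a single 2-D array, that is already a strong hint about where the conversion goes wrong.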

"So into the PCA I'm passing in a list of size 114 (# batches) with each item having shape [1, 256] (batch size of 1, each vector 256 dimensions)"

Is the list that you're passing in the same as big_mus? I'm going to assume yes for the rest of this comment. Then its shape is effectively [114, 1, 256], correct? Per sklearn's API, the input to fit_transform should be two-dimensional, with shape (n_samples, n_features), i.e. [114, 256] here. Also make sure to convert from torch tensors to numpy arrays (you can use the tensor's .numpy() method, among other options).
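A minimal sketch of both changes, assuming big_mus is a list of 114 tensors of shape [1, 256] as described. The random tensors are a stand-in for the real latent vectors, and `stacked` and `latents` are illustrative names, not taken from the repo. Using torch.cat along the batch axis is one option among several for getting to [114, 256].

```python
import numpy as np
import torch
from sklearn.decomposition import PCA

# Stand-in for the real latent vectors (assumption): 114 tensors of shape [1, 256].
big_mus = [torch.randn(1, 256) for _ in range(114)]

# Concatenate along the batch axis: 114 x [1, 256] -> [114, 256].
stacked = torch.cat(big_mus, dim=0)

# detach() drops any autograd history and cpu() moves the data off the GPU
# if it lives there; numpy() then yields an array that scikit-learn accepts.
latents = stacked.detach().cpu().numpy()

pca_embedding = PCA(n_components=2).fit_transform(latents)
print(latents.shape, pca_embedding.shape)
```

The detach() and cpu() calls are no-ops for plain CPU tensors, so they are safe to leave in even if the latent vectors were already detached.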

Implement both of those changes and let me know if the problem persists. If it does, try to provide more contextual debugging info.

Passed in a new numpy array of shape (114, 256) and it works now, thank you!