jungmannlab / picasso

A collection of tools for painting super-resolution images

Home Page: https://picassosr.readthedocs.io/en/latest/?badge=latest


Error when opening hdf5 in pandas

ajasja opened this issue · comments

  • Picasso version: 0.4 (pip)
  • Python version: 3.8.5
  • Operating System: Win 10

Description

I opened a tiff file, localized and fit the spots, and then saved the localizations (PS: what is the "save spots" feature?).
Then I wanted to open the file in pandas, following the sample notebook https://github.com/jungmannlab/picasso/blob/master/samples/SampleNotebook.ipynb, but I got an error.

What I Did

import pandas as pd
locs = pd.read_hdf('C2-2B9_SingleMolecule_1nM_A8fibres_25C_75mMNaCl_1_locs_filter.hdf5')
locs.head(5)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
d:\data\2022-04-26_WALKER_DATA\2022-04-26__analysis\10-analyse-picked.ipynb Cell 1' in <cell line: 2>()
      1 import pandas as pd
----> 2 locs = pd.read_hdf('C2-2B9_SingleMolecule_1nM_A8fibres_25C_75mMNaCl_1_locs_filter.hdf5')
      3 locs.head(5)

File C:\bin\python\anaconda64\envs\picasso\lib\site-packages\pandas\io\pytables.py:438, in read_hdf(path_or_buf, key, mode, errors, where, start, stop, columns, iterator, chunksize, **kwargs)
    436 groups = store.groups()
    437 if len(groups) == 0:
--> 438     raise ValueError(
    439         "Dataset(s) incompatible with Pandas data types, "
    440         "not table, or no datasets found in HDF5 file."
    441     )
    442 candidate_only_group = groups[0]
    444 # For the HDF file to have only one dataset, all other groups
    445 # should then be metadata groups for that candidate group. (This
    446 # assumes that the groups() method enumerates parent groups
    447 # before their children.)

ValueError: Dataset(s) incompatible with Pandas data types, not table, or no datasets found in HDF5 file.

Adding the table name explicitly fixes the issue:

import pandas as pd
locs = pd.read_hdf('C2-2B9_SingleMolecule_1nM_A8fibres_25C_75mMNaCl_1_locs_filter.hdf5', 'locs')
locs.head(5)
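If you don't know which key to pass, you can list the dataset names in the file first, e.g. with h5py. A minimal sketch that builds a small file mimicking Picasso's layout (a single structured-array dataset named "locs"; the file name and field selection here are made up) and then reads it back the same way as above:

```python
import h5py
import numpy as np
import pandas as pd

# Build a minimal HDF5 file resembling what Picasso saves: one
# structured-array dataset called "locs" (hypothetical example file).
locs_dtype = np.dtype(
    [("frame", "u4"), ("x", "f4"), ("y", "f4"), ("photons", "f4")]
)
with h5py.File("example_locs.hdf5", "w") as f:
    f.create_dataset("locs", data=np.zeros(3, dtype=locs_dtype))

# List the top-level dataset names to find the key for pd.read_hdf
with h5py.File("example_locs.hdf5", "r") as f:
    keys = list(f.keys())
print(keys)  # ['locs']

# Pass the discovered key explicitly, as in the fix above
df = pd.read_hdf("example_locs.hdf5", "locs")
print(len(df))
```

Listing the keys up front avoids guessing; pandas only auto-detects the dataset when the file was written by pandas itself, which Picasso's files are not.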

Yes, it should be the locs table. Thanks for sharing.

To save locs and allow them to be re-opened, you need to write a yaml file alongside the hdf5. Below is a code snippet from SampleNotebook3:

from picasso import io
import os
from h5py import File

# path, locs, info, clusters, min_samples and min_cluster_size
# are all defined earlier in the notebook
base, ext = os.path.splitext(path)
dbscan_info = {
    "Generated by": "Picasso HDBSCAN",
    "Min samples": min_samples,
    "Min cluster size": min_cluster_size,
}
info.append(dbscan_info)
# writes the hdf5 plus the matching yaml metadata file
io.save_locs(base + "_dbscan.hdf5", locs, info)
with File(base + "_dbclusters.hdf5", "w") as clusters_file:
    clusters_file.create_dataset("clusters", data=clusters)
print('Complete')
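For reference, the yaml sidecar is a multi-document YAML file: one document per processing step, which is why the snippet appends dbscan_info to the existing info list before saving. A minimal sketch with PyYAML of what such a sidecar looks like (file name and the first document's field values are made up for illustration):

```python
import yaml

# One YAML document per processing step; the first document normally
# describes the raw movie, later ones each analysis step.
info = [
    {"Frames": 100, "Height": 512, "Width": 512},  # hypothetical movie info
    {"Generated by": "Picasso HDBSCAN", "Min samples": 10, "Min cluster size": 5},
]
with open("example_dbscan.yaml", "w") as f:
    yaml.dump_all(info, f, default_flow_style=False)

# Reading it back yields the same list of documents
with open("example_dbscan.yaml") as f:
    docs = list(yaml.safe_load_all(f))
print(len(docs))  # 2
```

Keeping the yaml next to the hdf5 (same base name) is what lets Picasso's own tools re-open the localizations later.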