Error when opening hdf5 in pandas
ajasja opened this issue
- Picasso version: 0.4 (pip)
- Python version: 3.8.5
- Operating System: Win 10
Description
I opened a TIFF file, localized and fitted the spots, and then saved the localizations. (P.S. What is the "save spots" feature?)
Then I tried to open the localizations in pandas, following the sample notebook https://github.com/jungmannlab/picasso/blob/master/samples/SampleNotebook.ipynb, but got an error.
What I Did
import pandas as pd
locs = pd.read_hdf('C2-2B9_SingleMolecule_1nM_A8fibres_25C_75mMNaCl_1_locs_filter.hdf5')
locs.head(5)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
d:\data\2022-04-26_WALKER_DATA\2022-04-26__analysis\10-analyse-picked.ipynb Cell 1' in <cell line: 2>()
      1 import pandas as pd
----> 2 locs = pd.read_hdf('C2-2B9_SingleMolecule_1nM_A8fibres_25C_75mMNaCl_1_locs_filter.hdf5')
      3 locs.head(5)
File C:\bin\python\anaconda64\envs\picasso\lib\site-packages\pandas\io\pytables.py:438, in read_hdf(path_or_buf, key, mode, errors, where, start, stop, columns, iterator, chunksize, **kwargs)
    436 groups = store.groups()
    437 if len(groups) == 0:
--> 438     raise ValueError(
    439         "Dataset(s) incompatible with Pandas data types, "
    440         "not table, or no datasets found in HDF5 file."
    441     )
    442 candidate_only_group = groups[0]
    444 # For the HDF file to have only one dataset, all other groups
    445 # should then be metadata groups for that candidate group. (This
    446 # assumes that the groups() method enumerates parent groups
    447 # before their children.)
ValueError: Dataset(s) incompatible with Pandas data types, not table, or no datasets found in HDF5 file.
Passing the key (the table name) explicitly fixes the issue:
import pandas as pd
locs = pd.read_hdf('C2-2B9_SingleMolecule_1nM_A8fibres_25C_75mMNaCl_1_locs_filter.hdf5', 'locs')
locs.head(5)
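pandas can only infer the key when the file contains groups that pandas itself wrote; Picasso writes the localizations with h5py, which is presumably why the key has to be given explicitly here. If you are unsure which key a file uses, a minimal sketch like the one below can list its datasets and, as an alternative to pd.read_hdf, load the compound dataset directly with h5py. It assumes the localizations are stored as a single compound dataset named "locs", as in the file above; list_datasets and read_locs_h5py are hypothetical helper names, not part of Picasso or pandas.

```python
import h5py
import pandas as pd


def list_datasets(path):
    """Return the full names of all datasets stored in an HDF5 file."""
    names = []

    def collect(name, obj):
        # visititems calls this for every group and dataset; keep only datasets
        if isinstance(obj, h5py.Dataset):
            names.append(name)

    with h5py.File(path, "r") as f:
        f.visititems(collect)
    return names


def read_locs_h5py(path, key="locs"):
    """Read one compound dataset into a DataFrame (one column per field).

    Assumption: the localizations live in a compound dataset called "locs".
    """
    with h5py.File(path, "r") as f:
        records = f[key][()]  # compound dataset -> NumPy structured array
    return pd.DataFrame(records)
```

For a Picasso localization file, list_datasets(path) should include "locs", and read_locs_h5py(path).head(5) should give a DataFrame with the same columns as the read_hdf call above.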
Yes, the key should be the locs table. Thanks for sharing.
To save locs so that they can be re-opened in Picasso, you also need to write a YAML file with the metadata; io.save_locs writes it for you from the info list. Below is a code snippet from SampleNotebook3:
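from picasso import io
import os
from h5py import File

# Assumes path, locs, info, clusters, min_samples and min_cluster_size
# were defined earlier in the notebook (e.g. locs, info = io.load_locs(path)
# followed by the HDBSCAN clustering step).
base, ext = os.path.splitext(path)
dbscan_info = {
    "Generated by": "Picasso HDBSCAN",
    "Min samples": min_samples,
    "Min cluster size": min_cluster_size,
}
# Append the clustering parameters to the metadata list before saving.
info.append(dbscan_info)
io.save_locs(base + "_dbscan.hdf5", locs, info)

# Store the per-cluster results in a separate HDF5 file.
with File(base + "_dbclusters.hdf5", "w") as clusters_file:
    clusters_file.create_dataset("clusters", data=clusters)
print("Complete")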
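If the localizations were edited in pandas, they need to be turned back into a structured NumPy array before being passed to io.save_locs as in the snippet above. A minimal sketch of that round trip, assuming all columns are numeric (locs_to_dataframe and dataframe_to_locs are hypothetical helper names, not part of Picasso):

```python
import numpy as np
import pandas as pd


def locs_to_dataframe(locs):
    """Turn a structured localization array into a DataFrame (one column per field)."""
    return pd.DataFrame(locs)


def dataframe_to_locs(df):
    """Turn a DataFrame back into a structured NumPy (record) array.

    index=False drops the pandas index so only the original fields remain.
    Assumption: all columns are numeric; string/object columns are not handled.
    """
    return df.to_records(index=False)
```

For example, after filtering with df = df[df["photons"] > 1000], something like io.save_locs(base + "_filtered.hdf5", dataframe_to_locs(df), info) should work, where info is the metadata list returned by io.load_locs(path). index=False is used so the pandas index does not end up as an extra "index" field in the saved file.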