pycroscopy / pycroscopy

Scientific analysis of nanoscale materials imaging data

Home Page: https://pycroscopy.github.io/pycroscopy/about.html

KMeans Clustering not writing to file

sulaymandesai opened this issue

Hi,

Hope you're well.

I am following this published notebook: https://nbviewer.jupyter.org/github/pycroscopy/papers/blob/master/Notebooks/EM/STEM/Image_Cleaning_Atom_Finding.ipynb

When I try to run the KMeans clustering, I get the following error:

num_clusters = 4
# num_clusters = 32
estimator = px.processing.Cluster(h5_U, KMeans(n_clusters=num_clusters), num_comps=num_comps)

if estimator.duplicate_h5_groups==[]:
    t0 = time()
    h5_kmeans = estimator.compute()
    print('kMeans took {} seconds.'.format(round(time()-t0, 2)))
else:
    h5_kmeans = estimator.duplicate_h5_groups[-1]
    print( 'Using existing results.') 
    
print( 'Clustering results in {}.'.format(h5_kmeans.name))

half_wind = int(win_size*0.5)
# generate a cropped image that was effectively the area that was used for pattern searching
# Need to get the math right on the counting
cropped_clean_image = clean_image_mat[half_wind:-half_wind + 1, half_wind:-half_wind + 1]

# Plot cluster results Get the labels dataset
labels_mat = np.reshape(h5_kmeans['Labels'][()], [num_rows, num_cols])

fig, axes = plt.subplots(ncols=2, figsize=(14,7))
axes[0].imshow(cropped_clean_image,cmap=spiepy.NANOMAP, origin='lower')
axes[0].set_title('Cleaned Image', fontsize=16)
axes[1].imshow(labels_mat, aspect=1, interpolation='none',cmap=spiepy.NANOMAP, origin='lower')
axes[1].set_title('K-means cluster labels', fontsize=16);
for axis in axes:
    axis.get_yaxis().set_visible(False)
    axis.get_xaxis().set_visible(False)
usid.jupyter_utils.save_fig_filebox_button(fig, 'Clustered_Clean_Image.png')

Consider calling test() to check results before calling compute() which computes on the entire dataset and writes results to the HDF5 file
Group: <HDF5 group "/Measurement_000/Channel_000/Plane_Mean_Subtracted_Data-Windowing_000/Image_Windows-SVD_000/U-Cluster_000" (0 members)> had neither the status HDF5 dataset or the legacy attribute: "last_pixel".
Group: <HDF5 group "/Measurement_000/Channel_000/Plane_Mean_Subtracted_Data-Windowing_000/Image_Windows-SVD_000/U-Cluster_001" (0 members)> had neither the status HDF5 dataset or the legacy attribute: "last_pixel".
Group: <HDF5 group "/Measurement_000/Channel_000/Plane_Mean_Subtracted_Data-Windowing_000/Image_Windows-SVD_000/U-Cluster_002" (0 members)> had neither the status HDF5 dataset or the legacy attribute: "last_pixel".
Performing clustering on /Measurement_000/Channel_000/Plane_Mean_Subtracted_Data-Windowing_000/Image_Windows-SVD_000/U.
Took 5.76 sec to compute KMeans
Calculated the Mean Response of each cluster.
Took 340.1 msec to calculate mean response per cluster
Writing clustering results to file.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-6b9a66d30096> in <module>
      7 if estimator.duplicate_h5_groups==[]:
      8     t0 = time()
----> 9     h5_kmeans = estimator.compute()
     10     print('kMeans took {} seconds.'.format(round(time()-t0, 2)))
     11 else:

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pycroscopy-0.60.7-py3.8.egg/pycroscopy/processing/cluster.py in compute(self, rearrange_clusters, override)
    226 
    227         if self.h5_results_grp is None:
--> 228             h5_group = self._write_results_chunk()
    229             self.delete_results()
    230         else:

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pycroscopy-0.60.7-py3.8.egg/pycroscopy/processing/cluster.py in _write_results_chunk(self)
    282         h5_cluster_group = create_results_group(self.h5_main, self.process_name,
    283                                                 h5_parent_group=self._h5_target_group)
--> 284         self._write_source_dset_provenance()
    285 
    286         write_simple_attrs(h5_cluster_group, self.parms_dict)

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pyUSID/processing/process.py in _write_source_dset_provenance(self)
    793 
    794     @staticmethod
--> 795     def _map_function(*args, **kwargs):
    796         """
    797         The function that manipulates the data on a single instance (position). This will be used by

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/sidpy/hdf/hdf_utils.py in write_simple_attrs(h5_obj, attrs, verbose)
    371                         '{}'.format(type(attrs)))
    372     if not isinstance(h5_obj, (h5py.File, h5py.Group, h5py.Dataset)):
--> 373         raise TypeError('h5_obj should be a h5py File, Group or Dataset object'
    374                         ' but is instead of type '
    375                         '{}t'.format(type(h5_obj)))

TypeError: h5_obj should be a h5py File, Group or Dataset object but is instead of type <class 'NoneType'>t

Any help would be appreciated!

Hi,

Just double-checking whether someone has been able to look at this?

Best wishes,
Sulayman

Hi @sulaymandesai - we are looking into this and should be able to make more progress tomorrow. One of us will get in touch with you with a solution.

Hi Ramav, thanks a lot. Pycroscopy is a really useful module and I am enjoying using it. Will let you know if I run into any other issues.

Best wishes,
Sulayman

Hi Ramav,

I have a quick question about the output of the K-Means clustering function. Two datasets are output: the labels matrix, which gives the cluster each pixel is assigned to, and the mean response. Can you give me an intuitive explanation of what the mean response is and how it is calculated?
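
For anyone landing here with the same question, here is a minimal sketch of one common reading of "mean response". It assumes the mean response of a cluster is simply the average of all feature vectors (rows of the data that was clustered) whose label equals that cluster, which is consistent with the log line "Calculated the Mean Response of each cluster." above. This is not taken from the pycroscopy source; the function name `mean_response_per_cluster` and the toy data are hypothetical and only for illustration.

```python
import numpy as np


def mean_response_per_cluster(data, labels, num_clusters):
    """Average the rows of `data` that share a cluster label.

    Hypothetical helper for illustration only; not part of pycroscopy.
    data   : array of shape (num_positions, num_features)
    labels : array of shape (num_positions,), integers in [0, num_clusters)
    """
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    means = np.zeros((num_clusters, data.shape[1]))
    for k in range(num_clusters):
        members = data[labels == k]  # all positions assigned to cluster k
        if members.size:             # leave empty clusters as zeros
            means[k] = members.mean(axis=0)
    return means


# Toy example: 4 positions, 2 features each, already labelled into 2 clusters
data = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [12.0, 14.0]])
labels = np.array([0, 0, 1, 1])
mean_response = mean_response_per_cluster(data, labels, num_clusters=2)
print(mean_response)
```

Under this assumption, each row of the result is the "typical" feature vector for one cluster, so plotting row k shows what an average member of cluster k looks like, while the labels matrix shows where those members sit in the image.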

It looks like this issue has been resolved. Closing.