KMeans Clustering not writing to file
sulaymandesai opened this issue · comments
Hi,
Hope you're well.
I am following this published notebook: https://nbviewer.jupyter.org/github/pycroscopy/papers/blob/master/Notebooks/EM/STEM/Image_Cleaning_Atom_Finding.ipynb
When I try to run the KMeans clustering, I get the following error:
num_clusters = 4
# num_clusters = 32
estimator = px.processing.Cluster(h5_U, KMeans(n_clusters=num_clusters), num_comps=num_comps)
if estimator.duplicate_h5_groups == []:
    t0 = time()
    h5_kmeans = estimator.compute()
    print('kMeans took {} seconds.'.format(round(time()-t0, 2)))
else:
    h5_kmeans = estimator.duplicate_h5_groups[-1]
    print('Using existing results.')
print('Clustering results in {}.'.format(h5_kmeans.name))
half_wind = int(win_size*0.5)
# generate a cropped image that was effectively the area that was used for pattern searching
# Need to get the math right on the counting
cropped_clean_image = clean_image_mat[half_wind:-half_wind + 1, half_wind:-half_wind + 1]
# Plot cluster results Get the labels dataset
labels_mat = np.reshape(h5_kmeans['Labels'][()], [num_rows, num_cols])
fig, axes = plt.subplots(ncols=2, figsize=(14,7))
axes[0].imshow(cropped_clean_image,cmap=spiepy.NANOMAP, origin='lower')
axes[0].set_title('Cleaned Image', fontsize=16)
axes[1].imshow(labels_mat, aspect=1, interpolation='none',cmap=spiepy.NANOMAP, origin='lower')
axes[1].set_title('K-means cluster labels', fontsize=16);
for axis in axes:
    axis.get_yaxis().set_visible(False)
    axis.get_xaxis().set_visible(False)
usid.jupyter_utils.save_fig_filebox_button(fig, 'Clustered_Clean_Image.png')
Consider calling test() to check results before calling compute() which computes on the entire dataset and writes results to the HDF5 file
Group: <HDF5 group "/Measurement_000/Channel_000/Plane_Mean_Subtracted_Data-Windowing_000/Image_Windows-SVD_000/U-Cluster_000" (0 members)> had neither the status HDF5 dataset or the legacy attribute: "last_pixel".
Group: <HDF5 group "/Measurement_000/Channel_000/Plane_Mean_Subtracted_Data-Windowing_000/Image_Windows-SVD_000/U-Cluster_001" (0 members)> had neither the status HDF5 dataset or the legacy attribute: "last_pixel".
Group: <HDF5 group "/Measurement_000/Channel_000/Plane_Mean_Subtracted_Data-Windowing_000/Image_Windows-SVD_000/U-Cluster_002" (0 members)> had neither the status HDF5 dataset or the legacy attribute: "last_pixel".
Performing clustering on /Measurement_000/Channel_000/Plane_Mean_Subtracted_Data-Windowing_000/Image_Windows-SVD_000/U.
Took 5.76 sec to compute KMeans
Calculated the Mean Response of each cluster.
Took 340.1 msec to calculate mean response per cluster
Writing clustering results to file.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-35-6b9a66d30096> in <module>
7 if estimator.duplicate_h5_groups==[]:
8 t0 = time()
----> 9 h5_kmeans = estimator.compute()
10 print('kMeans took {} seconds.'.format(round(time()-t0, 2)))
11 else:
~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pycroscopy-0.60.7-py3.8.egg/pycroscopy/processing/cluster.py in compute(self, rearrange_clusters, override)
226
227 if self.h5_results_grp is None:
--> 228 h5_group = self._write_results_chunk()
229 self.delete_results()
230 else:
~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pycroscopy-0.60.7-py3.8.egg/pycroscopy/processing/cluster.py in _write_results_chunk(self)
282 h5_cluster_group = create_results_group(self.h5_main, self.process_name,
283 h5_parent_group=self._h5_target_group)
--> 284 self._write_source_dset_provenance()
285
286 write_simple_attrs(h5_cluster_group, self.parms_dict)
~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pyUSID/processing/process.py in _write_source_dset_provenance(self)
793
794 @staticmethod
--> 795 def _map_function(*args, **kwargs):
796 """
797 The function that manipulates the data on a single instance (position). This will be used by
~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/sidpy/hdf/hdf_utils.py in write_simple_attrs(h5_obj, attrs, verbose)
371 '{}'.format(type(attrs)))
372 if not isinstance(h5_obj, (h5py.File, h5py.Group, h5py.Dataset)):
--> 373 raise TypeError('h5_obj should be a h5py File, Group or Dataset object'
374 ' but is instead of type '
375 '{}t'.format(type(h5_obj)))
TypeError: h5_obj should be a h5py File, Group or Dataset object but is instead of type <class 'NoneType'>t
Any help would be appreciated!
Hi,
Just double-checking: has anyone been able to look at this?
Best wishes,
Sulayman
Hi @sulaymandesai - we are looking into this and should be able to make more progress tomorrow. One of us will get in touch with you with a solution.
Hi Ramav, thanks a lot. Pycroscopy is a really useful module and I am enjoying using it. Will let you know if I run into any other issues.
Best wishes,
Sulayman
Hi Ramav,
I have a quick question about the output of the K-Means clustering function. Two datasets are output: the labels matrix, which gives the cluster each pixel is assigned to, and the mean response. Can you give me an intuitive explanation of what the mean response is and how it is calculated?
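For readers with the same question, here is a minimal sketch of one common reading of "mean response": the plain average of all input rows (here, the rows of U restricted to the selected components) that were assigned to the same cluster. This is an assumption based on the log line "Calculated the Mean Response of each cluster", not a confirmed copy of pycroscopy's implementation. The function name `mean_response_per_cluster` and the toy arrays are hypothetical and only stand in for the real data.

```python
import numpy as np


def mean_response_per_cluster(data, labels, num_clusters):
    """Average the rows of `data` that share a cluster label.

    Hypothetical helper, not part of pycroscopy.
    data:   (num_positions, num_components) array
    labels: (num_positions,) integer cluster index per position
    """
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    mean_resp = np.zeros((num_clusters, data.shape[1]))
    for k in range(num_clusters):
        members = data[labels == k]  # all rows assigned to cluster k
        if members.shape[0] > 0:     # leave empty clusters as zeros
            mean_resp[k] = members.mean(axis=0)
    return mean_resp


# Toy stand-in for the U matrix: 6 positions x 2 components
toy_data = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0],
                     [12.0, 14.0], [1.0, 1.0], [11.0, 12.0]])
toy_labels = np.array([0, 0, 1, 1, 0, 1])
toy_means = mean_response_per_cluster(toy_data, toy_labels, 2)
print(toy_means)
```

Under this reading, each row of the result is the "typical" feature vector of one cluster, so plotting a row shows what the members of that cluster look like on average. For a converged k-means fit this is essentially the cluster centroid.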
It looks like this issue has been resolved. Closing.