SVD error

sulaymandesai opened this issue 3 years ago · comments

Hi,

I have been following the example notebooks on this GitHub page to perform SVD. I get the following error:

 1 decomposer = px.processing.svd_utils.SVD(h5_main, num_components=100)
----> 2 h5_svd_group = decomposer.compute()
      3 
      4 h5_u = h5_svd_group['U']
      5 h5_v = h5_svd_group['V']

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pycroscopy/processing/svd_utils.py in compute(self, override)
    161         """
    162         if self.__u is None and self.__v is None and self.__s is None:
--> 163             self.test(override=override)
    164 
    165         if self.h5_results_grp is None:

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pycroscopy/processing/svd_utils.py in test(self, override)
    137             raise ValueError('Could not reshape U to N-Dimensional dataset! Error:' + success)
    138 
--> 139         v_mat, success = reshape_to_n_dims(self.__v, h5_pos=np.expand_dims(np.arange(self.__u.shape[1]), axis=1),
    140                                            h5_spec=self.h5_main.h5_spec_inds)
    141         if not success:

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pyUSID/io/hdf_utils/model.py in reshape_to_n_dims(h5_main, h5_pos, h5_spec, get_labels, verbose, sort_dims, lazy)
     84     else:
     85         if not isinstance(h5_main, (h5py.Dataset, np.ndarray, da.core.Array)):
---> 86             raise TypeError('h5_main should either be a h5py.Dataset or numpy array')
     87 
     88     if h5_pos is not None:

TypeError: h5_main should either be a h5py.Dataset or numpy array

Any help would be appreciated!

Suhas Somnath · Answer 1 · Thu Dec 10 2020 11:48:50 GMT+0800 (China Standard Time)

@sulaymandesai Did you manage to identify the source of and/or resolve the problem? One of us would be able to look into this issue on Friday otherwise.

Sulayman Desai · Answer 2 · Thu Dec 10 2020 21:43:15 GMT+0800 (China Standard Time)

Hi, I was unable to solve this issue. I also tried another function from your processing module and got a different error. The code and traceback are below:

# This creates a 4D data set that associates each pixel with a window
fft_mode = None # Options are None, 'abs', 'data+abs', or 'complex'
t0 = time()
h5_wins = iw.do_windowing(win_x=win_size,
                          win_y=win_size,
                          save_plots=False,
                          show_plots=False,
                          win_fft=fft_mode)
print('Windowing took {} seconds.'.format(round(time()-t0, 2)))
print('\nRaw data was of shape {} and the windows dataset is now of shape {}'.format(h5_main.shape, h5_wins.shape))
print('Now each position (window) is described by a set of pixels')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-ed5732a2cf29> in <module>
      2 fft_mode = None # Options are None, 'abs', 'data+abs', or 'complex'
      3 t0 = time()
----> 4 h5_wins = iw.do_windowing(win_x=win_size,
      5                           win_y=win_size,
      6                           save_plots=False,

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pycroscopy/processing/image_processing.py in do_windowing(self, win_x, win_y, win_step_x, win_step_y, win_fft, *args, **kwargs)
    157             win_y = win_test
    158 
--> 159         image, h5_wins, win_pos_mat, have_old = self._setup_window_h5(h5_main, psf_width, win_fft, win_step_x,
    160                                                                       win_step_y, win_type, win_x, win_y)
    161 

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pycroscopy/processing/image_processing.py in _setup_window_h5(self, h5_main, psf_width, win_fft, win_step_x, win_step_y, win_type, win_x, win_y)
    365             ds_group.attrs['psf_width'] = psf_width
    366             ds_group.attrs['fft_mode'] = win_fft
--> 367             image_refs = self.hdf.write(ds_group)
    368 
    369             '''

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pycroscopy/io/hdf_writer.py in write(self, data, print_log)
    222         else:
    223             # For a group we write it and its attributes
--> 224             h5_grp = self._create_group(h5_file[data.parent], data, print_log=print_log)
    225             root = h5_grp.name
    226             ref_list.append(h5_grp)

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pycroscopy/io/hdf_writer.py in _create_group(h5_parent_group, micro_group, print_log)
    319 
    320         # Write attributes
--> 321         write_simple_attrs(h5_new_group, micro_group.attrs, 'group', verbose=print_log)
    322 
    323         return h5_new_group

TypeError: write_simple_attrs() got multiple values for argument 'verbose'
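
For context, Python raises this kind of TypeError when a positional argument lands in a parameter that is also passed by keyword. A minimal sketch, assuming (this is not confirmed anywhere in the thread) that the installed `write_simple_attrs` accepts fewer positional parameters than the caller in `hdf_writer.py` expects. The function `write_attrs` below is a hypothetical stand-in, not the real pyUSID function:

```python
def write_attrs(h5_obj, attrs, verbose=False):
    """Hypothetical stand-in: the third parameter is `verbose`."""
    return dict(attrs)


message = None
try:
    # Same call shape as in the traceback: three positional arguments plus
    # verbose=... The third positional value ('group') fills `verbose`,
    # so the keyword supplies it a second time.
    write_attrs({}, {'psf_width': 3}, 'group', verbose=False)
except TypeError as err:
    message = str(err)

print(message)
```

If that assumption holds, the error would point to a version mismatch between pycroscopy and the package that provides `write_simple_attrs`, rather than to a problem with the data.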

Rajiv Giridharagopal · Answer 3 · Fri Dec 11 2020 02:08:14 GMT+0800 (China Standard Time)

Hi Sulayman,
I tried running this notebook (I assume you are looking at this one).

I could not reproduce this problem: the notebook ran without errors both in Jupyter and, as a sanity check, in Spyder. It looks like the data may not have been read in correctly when you ran it; otherwise h5_main should be a Dataset.

Can you clarify how you ran this? That might help.

Sulayman Desai · Answer 4 · Fri Dec 11 2020 04:35:11 GMT+0800 (China Standard Time)

Maybe the issue arises from my translation? I used the NumpyTranslator example from the pyUSID page. Below is the majority of my code.

h5_path = '/Users/sulaymandesai/Documents/Year_4/MSciProject/LoadData/h5/default_2017Jun09-162147_STM-STM_Spectroscopy--11_1_down-bwd.h5'

h5_file = h5py.File(h5_path, mode='r+')
usid.hdf_utils.print_tree(h5_file)

h5_main = usid.USIDataset(h5_file['/Measurement_000/Channel_000/Raw_Data'])

h5_main

<HDF5 dataset "Raw_Data": shape (122500, 1), type "<f8">
located at: 
	/Measurement_000/Channel_000/Raw_Data 
Data contains: 
	Z-height (nm) 
Data dimensions and original shape: 
Position Dimensions: 
	Y - size: 350 
	X - size: 350 
Spectroscopic Dimensions: 
	arb. - size: 1
Data Type:
	float64

decomposer = px.processing.svd_utils.SVD(h5_main, num_components=100)
h5_svd_group = decomposer.compute()

Sulayman Desai · Answer 5 · Fri Dec 11 2020 04:35:46 GMT+0800 (China Standard Time)

I can provide the code I used to translate my data if that helps?

Suhas Somnath · Answer 6 · Fri Dec 11 2020 06:19:24 GMT+0800 (China Standard Time)

Yes. The source data and complete code would certainly help us. Thanks.

Sulayman Desai · Answer 7 · Fri Dec 11 2020 18:29:04 GMT+0800 (China Standard Time)

The FlatFile class I am using extracts the data (Z-piezo height values) from my flat file. Info is just the metadata associated with each image. Inside the loop over files I also loop over the images in each file, since a single file can contain multiple images, e.g. a forward/up scan and a backward/up scan.

class FlatFileTranslator(usid.NumpyTranslator):
    """
    The above definition of the class states that our FlatFileTranslator inherits all the capabilities and
    behaviors of the NumpyTranslator class and builds on top of it.
    For more information on the Numpy translator: 
    https://pycroscopy.github.io/pyUSID/auto_examples/beginner/plot_numpy_translator.html
    """

    def translate(self, input_folder_path, h5_folder_path):
        """
        This function extracts data and metadata from Omicron Scanning Tunnelling Microscope (STM) Flat Files
        and translates this into the Pycroscopy compatible pyUSID format. 
        
        Parameters
        ----------
        input_folder_path : str
            Path to the input data folder containing all the files and their information
        
        h5_folder_path : str
            Path to h5 data folder to store the USID formatted files

        Returns
        -------
        h5_path_array : list of str
            Paths to the translated USID h5 files, one per image
        """
        """
        --------------------------------------------------------------------------------------------
        1. Extracting data and metadata out of the proprietary file
        --------------------------------------------------------------------------------------------
        1.2 Read the contents of the file into memory
        """
        prevdir = os.getcwd()
        os.chdir(input_folder_path)
        
        h5_path_array = []
        
        for file in os.listdir(input_folder_path):
            file_flat = file
            load_file = FlatFile(file_flat)
            d = load_file.getData()
            
            """
            1.3 Extract all experiment and instrument related parameters
            """
            for image in d:
                
                raw_data = image.data
                metadata = image.info
                
                """
                1.4 Prepare the output file path
                """
                folder_path, file_name = os.path.split(file_flat)
                file_name = file_name[:-7] + '_' + metadata['direction']
                h5_path = os.path.join(h5_folder_path, file_name + '.h5')
                
                """
                1.5 Reshape raw_data into USID 2D shape (position x spectral)
                """
                raw_data_2D = np.reshape(raw_data, (raw_data.shape[0] * raw_data.shape[1], 1))
                
                """
                1.6 Extract or generate parameters that define the position and spectral dimensions
                """
                xaxis = metadata['xreal']
                yaxis = metadata['yreal']
                
                xaxis = xaxis/2
                yaxis = yaxis/2
                
                num_rows = int(metadata['yres'])
                num_cols = int(metadata['xres'])
                num_pos = num_rows * num_cols
                
                y_qty = 'Y'
                y_units = 'nm'
                y_vec = np.linspace(-yaxis, yaxis, num_rows, endpoint=True)

                x_qty = 'X'
                x_units = 'nm'
                x_vec = np.linspace(-xaxis, xaxis, num_cols, endpoint=True)
                
                main_data_name = 'STM'
                main_qty = 'Z-height'
                main_units = 'nm'
                
                """
                --------------------------------------------------------------------------------------------
                2. Writing to h5USID file using pyUSID
                --------------------------------------------------------------------------------------------
                2.2 Expressing the Position and Spectroscopic Dimensions using pyUSID.Dimension objects
                """
                pos_dims = [usid.Dimension(x_qty, x_units, x_vec),
                            usid.Dimension(y_qty, y_units, y_vec)]
                
                spec_dims = usid.Dimension(name='arb.', units='', values=1)
                
                """
                2.3 Call the translate() function of the base NumpyTranslator class   
                """
                _ = super(FlatFileTranslator, self).translate(h5_path, main_data_name,
                                                     raw_data_2D, main_qty, main_units,
                                                     pos_dims, spec_dims,
                                                     parm_dict=metadata)
                
                h5_path_array.append(h5_path)
        
        # Changing back to original directory
        os.chdir(prevdir)
        
        return h5_path_array
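
The reshaping in step 1.5 can be checked in isolation, without pyUSID or the flat-file parser. A minimal NumPy-only sketch, assuming a synthetic 350 × 350 array in place of the real Z-height image (the variable names are illustrative):

```python
import numpy as np

num_rows, num_cols = 350, 350

# Synthetic stand-in for one STM image (rows = Y, columns = X).
image = np.arange(num_rows * num_cols, dtype=float).reshape(num_rows, num_cols)

# Flatten to the USID layout: one row per position, one column per
# spectroscopic value. Row-major order makes the X index vary fastest.
raw_data_2d = image.reshape(num_rows * num_cols, 1)
print(raw_data_2d.shape)

# Pixel (row, col) ends up at flat position row * num_cols + col.
row, col = 2, 5
flat_index = row * num_cols + col

# Reshaping back recovers the original image.
restored = raw_data_2d.reshape(num_rows, num_cols)
```

The result has a single spectroscopic column, which matches the `(122500, 1)` shape reported for `Raw_Data` earlier in the thread.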

ramav87 · Answer 8 · Sat Dec 12 2020 04:10:35 GMT+0800 (China Standard Time)

Do you have an example file that we can take a look at as well?

Sulayman Desai · Answer 9 · Sat Dec 12 2020 22:02:07 GMT+0800 (China Standard Time)

Archive.zip
I've attached a zip file which contains a flat file plus the corresponding translated .h5 files. The parser I used to extract the flat file data in python is here: https://github.com/tobias-gill/STM_file_management/tree/fd40b5da828368c0a3948fb2e8e31a77b5b8ef06

Suhas Somnath · Answer 10 · Tue Dec 15 2020 00:59:09 GMT+0800 (China Standard Time)

Thanks to @nccreang, who has now been able to reproduce this issue using synthetic data. @sulaymandesai - we are working on it.

Sulayman Desai · Answer 11 · Tue Dec 15 2020 01:59:31 GMT+0800 (China Standard Time)

Thanks for letting me know!

Suhas Somnath · Answer 12 · Mon Dec 21 2020 00:00:12 GMT+0800 (China Standard Time)

@sulaymandesai - please try using the master branch of pycroscopy. Your issue should be fixed with the latest pull request.

You will need to uninstall pycroscopy via:
pip uninstall pycroscopy

Then:
pip install -U git+https://github.com/pycroscopy/pycroscopy@master

Fundamentally, SVD is not a useful tool if your data has only a single spectroscopic value per position (a spectroscopic dimension of size 1), as is the case with your example.
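
To illustrate the point, here is a minimal sketch that uses plain NumPy rather than pycroscopy's SVD class, with random numbers standing in for the real data. A matrix with one column can yield only one component, and that component is just the input rescaled:

```python
import numpy as np

rng = np.random.default_rng(0)

# Same shape as the Raw_Data dataset in this thread: 350 * 350 positions,
# one spectroscopic value each. The values are synthetic.
data = rng.normal(size=(350 * 350, 1))

u, s, vt = np.linalg.svd(data, full_matrices=False)
print(u.shape, s.shape, vt.shape)

# The only singular value is the norm of the column, and the only left
# singular vector is the column itself divided by that norm (up to sign).
only_component = u[:, 0] * s[0] * vt[0, 0]
```

Asking for `num_components=100` therefore cannot be satisfied for data of this shape, since the number of components is limited by the smaller matrix dimension.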

Please let us know if you have any questions or if you continue to face any issues.

Sulayman Desai · Answer 13 · Mon Dec 21 2020 00:30:14 GMT+0800 (China Standard Time)

Hi, thanks for this. I tried what you suggested and keep receiving an install failure:

ERROR: Command errored out with exit status 128: git clone -q https://github.com/pycroscopy/pycroscopy /private/var/folders/sl/1sn4ncwj71d45kqy5mnl11q40000gn/T/pip-req-build-4et0paz5 Check the logs for full command output.

Suhas Somnath · Answer 14 · Mon Dec 21 2020 04:58:51 GMT+0800 (China Standard Time)

Indeed. There seems to be an issue with pip.

I would suggest cloning the repository and then installing:
git clone https://github.com/pycroscopy/pycroscopy.git
followed by:
cd pycroscopy
and then install:
python setup.py install
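
After reinstalling, it can help to confirm which versions Python actually sees. A small sketch using only the standard library (`importlib.metadata`, available from Python 3.8). The helper name `installed_version` is illustrative and not part of pycroscopy:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("pycroscopy", "pyUSID"):
    print(name, installed_version(name) or "not installed")
```

Run this in the same environment (and the same Jupyter kernel) that produced the errors, since a notebook may be using a different interpreter than the shell where the install was done.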