pycroscopy / pycroscopy

Scientific analysis of nanoscale materials imaging data

Home Page: https://pycroscopy.github.io/pycroscopy/about.html

SVD error

sulaymandesai opened this issue

Hi,

I have been following the example notebooks on this GitHub page to perform SVD. I get the following error:

 1 decomposer = px.processing.svd_utils.SVD(h5_main, num_components=100)
----> 2 h5_svd_group = decomposer.compute()
      3 
      4 h5_u = h5_svd_group['U']
      5 h5_v = h5_svd_group['V']

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pycroscopy/processing/svd_utils.py in compute(self, override)
    161         """
    162         if self.__u is None and self.__v is None and self.__s is None:
--> 163             self.test(override=override)
    164 
    165         if self.h5_results_grp is None:

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pycroscopy/processing/svd_utils.py in test(self, override)
    137             raise ValueError('Could not reshape U to N-Dimensional dataset! Error:' + success)
    138 
--> 139         v_mat, success = reshape_to_n_dims(self.__v, h5_pos=np.expand_dims(np.arange(self.__u.shape[1]), axis=1),
    140                                            h5_spec=self.h5_main.h5_spec_inds)
    141         if not success:

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pyUSID/io/hdf_utils/model.py in reshape_to_n_dims(h5_main, h5_pos, h5_spec, get_labels, verbose, sort_dims, lazy)
     84     else:
     85         if not isinstance(h5_main, (h5py.Dataset, np.ndarray, da.core.Array)):
---> 86             raise TypeError('h5_main should either be a h5py.Dataset or numpy array')
     87 
     88     if h5_pos is not None:

TypeError: h5_main should either be a h5py.Dataset or numpy array

Any help would be appreciated!

@sulaymandesai Did you manage to identify the source of and/or resolve the problem? One of us would be able to look into this issue on Friday otherwise.

Hi, I was unable to solve this issue. I also tried using another function from pycroscopy's processing module and got this error from the following code:

# This creates a 4D data set that associates each pixel with a window
fft_mode = None # Options are None, 'abs', 'data+abs', or 'complex'
t0 = time()
h5_wins = iw.do_windowing(win_x=win_size,
                          win_y=win_size,
                          save_plots=False,
                          show_plots=False,
                          win_fft=fft_mode)
print('Windowing took {} seconds.'.format(round(time()-t0, 2)))
print('\nRaw data was of shape {} and the windows dataset is now of shape {}'.format(h5_main.shape, h5_wins.shape))
print('Now each position (window) is described by a set of pixels')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-ed5732a2cf29> in <module>
      2 fft_mode = None # Options are None, 'abs', 'data+abs', or 'complex'
      3 t0 = time()
----> 4 h5_wins = iw.do_windowing(win_x=win_size,
      5                           win_y=win_size,
      6                           save_plots=False,

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pycroscopy/processing/image_processing.py in do_windowing(self, win_x, win_y, win_step_x, win_step_y, win_fft, *args, **kwargs)
    157             win_y = win_test
    158 
--> 159         image, h5_wins, win_pos_mat, have_old = self._setup_window_h5(h5_main, psf_width, win_fft, win_step_x,
    160                                                                       win_step_y, win_type, win_x, win_y)
    161 

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pycroscopy/processing/image_processing.py in _setup_window_h5(self, h5_main, psf_width, win_fft, win_step_x, win_step_y, win_type, win_x, win_y)
    365             ds_group.attrs['psf_width'] = psf_width
    366             ds_group.attrs['fft_mode'] = win_fft
--> 367             image_refs = self.hdf.write(ds_group)
    368 
    369             '''

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pycroscopy/io/hdf_writer.py in write(self, data, print_log)
    222         else:
    223             # For a group we write it and its attributes
--> 224             h5_grp = self._create_group(h5_file[data.parent], data, print_log=print_log)
    225             root = h5_grp.name
    226             ref_list.append(h5_grp)

~/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pycroscopy/io/hdf_writer.py in _create_group(h5_parent_group, micro_group, print_log)
    319 
    320         # Write attributes
--> 321         write_simple_attrs(h5_new_group, micro_group.attrs, 'group', verbose=print_log)
    322 
    323         return h5_new_group

TypeError: write_simple_attrs() got multiple values for argument 'verbose'
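This second `TypeError` is a plain Python calling error rather than anything HDF5-specific: it occurs when a positional argument lands in the slot of a parameter that is also passed by keyword. In the call shown, `write_simple_attrs(h5_new_group, micro_group.attrs, 'group', verbose=print_log)`, that can only happen if `verbose` is the third positional parameter of the installed `write_simple_attrs`, which suggests a version mismatch between the installed pycroscopy and pyUSID. The sketch below uses a hypothetical stand-in with an assumed `(h5_obj, attrs, verbose=False)` signature; it is not the real pyUSID function.

```python
# Hypothetical stand-in, NOT the real pyUSID helper: 'verbose' is assumed to
# be the third positional parameter, leaving no slot for an object-type string.
def write_simple_attrs(h5_obj, attrs, verbose=False):
    return dict(attrs)


def old_style_call():
    # Mirrors the call in hdf_writer.py: 'group' binds positionally to
    # 'verbose', and verbose=... then tries to bind it a second time.
    return write_simple_attrs(None, {'fft_mode': None}, 'group', verbose=False)


try:
    old_style_call()
    error_message = None
except TypeError as err:
    error_message = str(err)

print(error_message)

# Dropping the extra positional argument matches the assumed signature.
fixed = write_simple_attrs(None, {'fft_mode': None}, verbose=False)
print(fixed)
```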

Hi Sulayman,
I tried running this notebook (I assume you are looking at this one).

I did not get any errors, so I could not reproduce this problem, either in Jupyter or, as a sanity check, in Spyder. It looks like the data may not have been read in correctly when you ran it; otherwise h5_main should be a Dataset.

Can you clarify how you ran this? That might help.
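One quick follow-up is to check the type of the object being handed around before calling `compute()`. The sketch below writes a throwaway file (the path and the `Raw_Data` name are placeholders, not the real data) and applies an `isinstance` test in the same spirit as the guard in `reshape_to_n_dims` from the first traceback, leaving out the dask case. Note that in that traceback the object being checked is the internal V matrix (`self.__v`), so an `h5_main` that passes does not by itself rule the error out.

```python
import os
import tempfile

import h5py
import numpy as np


def passes_type_guard(obj):
    # Similar to the guard in reshape_to_n_dims (dask arrays omitted here)
    return isinstance(obj, (h5py.Dataset, np.ndarray))


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.h5')  # throwaway placeholder file
    with h5py.File(path, 'w') as h5_f:
        h5_f.create_dataset('Raw_Data', data=np.zeros((4, 1)))

    with h5py.File(path, 'r') as h5_f:
        dset = h5_f['Raw_Data']
        dataset_ok = passes_type_guard(dset)
        print(type(dset), dataset_ok)

# A numpy array passes, while a plain Python list or None does not.
array_ok = passes_type_guard(np.zeros((4, 1)))
list_ok = passes_type_guard([0.0, 1.0])
none_ok = passes_type_guard(None)
print(array_ok, list_ok, none_ok)
```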

Maybe the issue arises from my translation? I used the NumpyTranslator example from the pyUSID page. Below is the majority of my code.

h5_path = '/Users/sulaymandesai/Documents/Year_4/MSciProject/LoadData/h5/default_2017Jun09-162147_STM-STM_Spectroscopy--11_1_down-bwd.h5'

h5_file = h5py.File(h5_path, mode='r+')
usid.hdf_utils.print_tree(h5_file)
h5_main = usid.USIDataset(h5_file['/Measurement_000/Channel_000/Raw_Data'])
h5_main
<HDF5 dataset "Raw_Data": shape (122500, 1), type "<f8">
located at: 
	/Measurement_000/Channel_000/Raw_Data 
Data contains: 
	Z-height (nm) 
Data dimensions and original shape: 
Position Dimensions: 
	Y - size: 350 
	X - size: 350 
Spectroscopic Dimensions: 
	arb. - size: 1
Data Type:
	float64
decomposer = px.processing.svd_utils.SVD(h5_main, num_components=100)
h5_svd_group = decomposer.compute()

I can provide the code I used to translate my data if that helps?

Yes. The source data and complete code would certainly help us. Thanks.

The FlatFile function I am using extracts the data (Z-piezo height values) from my flat file, and info is just the metadata associated with each file. I loop over each file, and over each image within it, because a file can contain multiple images, e.g. forward/up and backward/up scans.

class FlatFileTranslator(usid.NumpyTranslator):
    """
    The above definition of the class states that our FlatFileTranslator inherits all the capabilities and
    behaviors of the NumpyTranslator class and builds on top of it.
    For more information on the Numpy translator: 
    https://pycroscopy.github.io/pyUSID/auto_examples/beginner/plot_numpy_translator.html
    """

    def translate(self, input_folder_path, h5_folder_path):
        """
        This function extracts data and metadata from Omicron Scanning Tunnelling Microscope (STM) Flat Files
        and translates this into the Pycroscopy compatible pyUSID format. 
        
        Parameters
        ----------
        input_folder_path : str
            Path to the input data folder containing all the files and their information
        
        h5_folder_path : str
            Path to h5 data folder to store the USID formatted files

        Returns
        -------
        h5_path_array : list of str
            Paths to the USID h5 files written, one per image
        """
        """
        --------------------------------------------------------------------------------------------
        1. Extracting data and metadata out of the proprietary file
        --------------------------------------------------------------------------------------------
        1.2 Read the contents of the file into memory
        """
        prevdir = os.getcwd()
        os.chdir(input_folder_path)
        
        h5_path_array = []
        
        for file in os.listdir(input_folder_path):
            file_flat = file
            load_file = FlatFile(file_flat)
            d = load_file.getData()
            
            """
            1.3 Extract all experiment and instrument related parameters
            """
            # Each file can hold several images (e.g. forward and backward scans)
            for scan in d:

                raw_data = scan.data
                metadata = scan.info
                
                """
                1.4 Prepare the output file path
                """
                folder_path, file_name = os.path.split(file_flat)
                file_name = file_name[:-7] + '_' + metadata['direction']
                h5_path = os.path.join(h5_folder_path, file_name + '.h5')
                
                """
                1.5 Reshape raw_data into USID 2D shape (position x spectral)
                """
                raw_data_2D = np.reshape(raw_data, (raw_data.shape[0] * raw_data.shape[1], 1))
                
                """
                1.6 Extract or generate parameters that define the position and spectral dimensions
                """
                xaxis = metadata['xreal']
                yaxis = metadata['yreal']
                
                xaxis = xaxis/2
                yaxis = yaxis/2
                
                num_rows = int(metadata['yres'])
                num_cols = int(metadata['xres'])
                num_pos = num_rows * num_cols
                
                y_qty = 'Y'
                y_units = 'nm'
                y_vec = np.linspace(-yaxis, yaxis, num_rows, endpoint=True)

                x_qty = 'X'
                x_units = 'nm'
                x_vec = np.linspace(-xaxis, xaxis, num_cols, endpoint=True)
                
                main_data_name = 'STM'
                main_qty = 'Z-height'
                main_units = 'nm'
                
                """
                --------------------------------------------------------------------------------------------
                2. Writing to h5USID file using pyUSID
                --------------------------------------------------------------------------------------------
                2.2 Expressing the Position and Spectroscopic Dimensions using pyUSID.Dimension objects
                """
                pos_dims = [usid.Dimension(x_qty, x_units, x_vec),
                            usid.Dimension(y_qty, y_units, y_vec)]
                
                spec_dims = usid.Dimension(name='arb.', units='', values=1)
                
                """
                2.3 Call the translate() function of the base NumpyTranslator class   
                """
                _ = super(FlatFileTranslator, self).translate(h5_path, main_data_name,
                                                     raw_data_2D, main_qty, main_units,
                                                     pos_dims, spec_dims,
                                                     parm_dict=metadata)
                
                h5_path_array.append(h5_path)
        
        # Changing back to original directory
        os.chdir(prevdir)
        
        return h5_path_array
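Step 1.5 relies on NumPy's default row-major (C-order) reshape. A small check with a made-up 2 x 3 image (not the STM data) shows that flattening `(num_rows, num_cols)` into `(num_rows * num_cols, 1)` makes the column index vary fastest. As we understand pyUSID's convention, the first entry in `pos_dims` is the fastest-varying dimension, so listing X before Y is consistent with this flattening.

```python
import numpy as np

num_rows, num_cols = 2, 3  # made-up image size for illustration
image = np.arange(num_rows * num_cols).reshape(num_rows, num_cols)

# Same operation as step 1.5 in the translator (C-order by default)
raw_data_2D = np.reshape(image, (image.shape[0] * image.shape[1], 1))

# Recover the (row, col) of every flattened position
rows, cols = np.unravel_index(np.arange(raw_data_2D.shape[0]), image.shape)
print(cols)  # X (column) index changes on every step
print(rows)  # Y (row) index changes only once per full row

# Each flattened value matches the pixel at its recovered (row, col)
print(np.array_equal(raw_data_2D[:, 0], image[rows, cols]))
```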

Do you have an example file that we can take a look at as well?

Archive.zip
I've attached a zip file which contains a flat file plus the corresponding translated .h5 files. The parser I used to extract the flat file data in python is here: https://github.com/tobias-gill/STM_file_management/tree/fd40b5da828368c0a3948fb2e8e31a77b5b8ef06

Thanks to @nccreang, who is now able to reproduce this issue using synthetic data. @sulaymandesai - we are working on this issue.

Thanks for letting me know!

@sulaymandesai - please try using the master branch of pycroscopy. Your issue should be fixed with the latest pull request.

You will need to uninstall pycroscopy via:
pip uninstall pycroscopy

Then:
pip install -U git+https://github.com/pycroscopy/pycroscopy@master
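After reinstalling, it can help to confirm which versions the notebook's interpreter actually picks up, since the tracebacks point at a pyenv Python 3.8.3 environment and the `write_simple_attrs` error hints at a pycroscopy/pyUSID mismatch. A minimal sketch using the standard-library `importlib.metadata` (available from Python 3.8), assuming the distributions are registered under the names `pycroscopy` and `pyUSID`:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    # Returns the installed version string, or None if the distribution is absent
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


for name in ('pycroscopy', 'pyUSID'):
    print(name, installed_version(name))
```

Run it in the same kernel as the notebook; a different Python on the command line may report different versions.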

Fundamentally, SVD is not a useful tool when the spectroscopic dimension has only a single point, as is the case in your example.
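The reason is linear algebra rather than anything specific to pycroscopy: a data matrix of shape (positions, 1), like the (122500, 1) dataset above, has rank at most 1, so it has exactly one singular value no matter how large `num_components` is. A small NumPy sketch with random stand-in data (not the STM image):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 1))  # stand-in for a (num_positions, 1) dataset

# Thin SVD: U is (1000, 1), S has one entry, Vt is (1, 1)
u, s, vt = np.linalg.svd(data, full_matrices=False)
print(u.shape, s.shape, vt.shape)

# The single singular value is just the norm of the column
print(np.isclose(s[0], np.linalg.norm(data)))

# Asking for more components cannot help: the smaller dimension caps them
num_components = 100
usable = min(num_components, *data.shape)
print(usable)
```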

Please let us know if you have any questions or if you continue to face any issues.

Hi, thanks for this. I tried what you suggested and keep receiving an install failure:

ERROR: Command errored out with exit status 128: git clone -q https://github.com/pycroscopy/pycroscopy /private/var/folders/sl/1sn4ncwj71d45kqy5mnl11q40000gn/T/pip-req-build-4et0paz5 Check the logs for full command output.

Indeed. There seems to be an issue with pip.

I would suggest cloning the repository and then installing:
git clone https://github.com/pycroscopy/pycroscopy.git
followed by:
cd pycroscopy
and then install:
python setup.py install