pycroscopy / pycroscopy

Scientific analysis of nanoscale materials imaging data

Home Page: https://pycroscopy.github.io/pycroscopy/about.html


Reorganization

ssomnath opened this issue

Here's one way we could reorganize the package:

  • pycroscopy
    • learn
      • ml
        • unsupervised
          • cluster
          • decomposition
          • other unmixing
          • gaussian mixture modelling
      • DL
        • cleaning / denoising
        • auto encoders
    • stats
      • gaussian processes
      • compressed sensing
    • image
      • general feature extraction
      • geometry feature extraction
      • atom finding
      • denoising
      • windowing ?
      • transforms
    • signal
      • filtering
      • peak finding
      • moving average
      • integration
      • derivation
      • outlier removal
      • transforms
    • fft
      • power spectra
      • smart fft?
      • wavelet
    • corr (2D with 1D, simulation with experiment / ML)
      • pan-sharpening
      • simple correlation
    • viz
      • dashboards (e.g. of hyper spectral data)

@gduscher @ramav87 @ziatdinovmax @rajgiriUW @dxm447 what are your thoughts?

This seems sensible. I see the Phoenix branch is actively moving things around. I very routinely use the SVD part of pycroscopy, and it sort of makes sense to keep this as the "analysis" package in some sense, since there's already a grab bag of useful things in here.

@rajgiriUW - we will indeed have similar capabilities under the learn sub-package. However, the new pycroscopy will not do any file I/O itself. This is part of an effort to lower the barrier to entry (it is easier for users to supply a numpy array than an HDF5 file, and easier for developers to contribute code).

For those, such as yourself, who prefer the classical workflow - USID main dataset in -> USID main dataset out within the HDF5 file - I can think of a few solutions:

  1. Continue using the older (current) version of pycroscopy. However, it will not be possible to have both the old and new versions of pycroscopy installed simultaneously, so this option is for folks who do not need what is coming up.
  2. Use the newer pycroscopy functions instead, but add the file I/O commands to your own scripts to recover the functionality we have right now.
  3. Maintain a copy of the existing SVD, Cluster, Decomposition, etc. modules in your own package.
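Option 2 could look something like the sketch below. This is only an illustration: the dataset paths are hypothetical, and `np.linalg.svd` stands in for whatever numpy-in / numpy-out routine the new pycroscopy will provide. The `h5py` calls recreate the file I/O that the old pycroscopy handled internally.

```python
import h5py
import numpy as np

def svd_with_file_io(h5_path, dataset_path, n_components=3):
    """Read a main dataset from an HDF5 file, run a numpy-in /
    numpy-out compute step (SVD here as a stand-in), and write
    the results back into the same file."""
    with h5py.File(h5_path, "r+") as h5_file:
        data = h5_file[dataset_path][()]                      # HDF5 -> numpy
        u, s, v = np.linalg.svd(data, full_matrices=False)    # compute step
        results = h5_file.require_group(dataset_path + "-SVD")
        for name, arr in [("U", u), ("S", s), ("V", v)]:      # numpy -> HDF5
            if name in results:
                del results[name]
            results.create_dataset(name, data=arr)
    return u[:, :n_components], s[:n_components], v[:n_components, :]
```

The compute step never touches the file, so swapping it for a new pycroscopy function later only changes one line.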

What are your thoughts?

Ah, I see. In that case I suggest option 2 (maintaining file I/O in users' separate packages) rather than trying to maintain a legacy version. That seems the most sensible way to compartmentalize, perhaps?

@rajgiriUW - Option 2 is indeed the way to go in the long run. However, the code for the stats and learn modules in the new pycroscopy will probably take a few months to be built, tested, and deemed usable.

Since you are interested in a file -> compute -> file workflow, I would suggest:

  1. Copying over any legacy code (option 3) that you would like to continue using.
  2. Building helper functions in your own package that do what you desire. Since the existing code already does what you want, your helper functions would be very thin wrappers, if wrappers at all, on top of the legacy code.
  3. When the new pycroscopy code comes online, swapping out the legacy code for the new pycroscopy code, plus the file input / output code, to maintain the same functionality in your helpers.
    These helper functions will insulate the rest of your code from changes underneath.
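The insulation idea in step 2 above can be sketched as follows. The function name and the SVD stand-in are hypothetical; the point is that callers depend only on the wrapper's signature, so the body can later be replaced by the new pycroscopy routine (plus any file I/O) without touching the rest of the analysis code.

```python
import numpy as np

def decompose(data, n_components=3):
    """Thin wrapper that insulates the rest of the analysis code from
    whichever backend performs the decomposition. Today the body calls
    a legacy / stand-in implementation; later it can call the new
    pycroscopy function instead, with the same inputs and outputs."""
    # Stand-in for the legacy compute code (e.g. an existing SVD module).
    u, s, v = np.linalg.svd(data, full_matrices=False)
    # Truncate to the requested number of components.
    return u[:, :n_components], s[:n_components], v[:n_components, :]
```

Because the wrapper is numpy-in / numpy-out, it is also trivial to unit-test, which makes verifying the eventual backend swap straightforward.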

Does this sound good?