[ENH] 1.0.0rc feedback

Question

ryanhammonds opened this issue 4 years ago

I met with @nschawor, who proposed:

Return a single df instead of two. The return_samples kwarg now adds/removes columns, rather than returning two separate dataframes (#75). Edit: A utility func has been added to allow df_samples to be separate from df_features if desired.
Increased flexibility for compute_features_2d and compute_features_3d (global thresholds / edges).
Update and add new examples/tutorials.
~~Rename compute_burst_features~~
- this doesn't actually do burst detection (that's done in burst.detect_bursts_amp) so I don't think this func needs renaming.
Plotting in jupyter vs ipython
- plots need plt.show if not in jupyter
Add self imports to all docstrings so they can be copy/pasted to run.
- This is something that needs to be updated across all repos (ndsp, fooof, etc)

@TomDonoghue brought up these points in #75:

what are the pros / cons to 1 vs 2 DFs. I think at some point someone (Erik?) requested the idea of two, which might have benefits?

A single df makes things a little simpler, I suppose. The dataframe isn't too large. I wish there was some kind of sub-dataframe we could have with pandas that is ~~shown~~ hidden but can still be accessed. Maybe @nschawor could chime in on this.

what does "add self imports to all docstrings so they can be copy/pasted to run" mean? The doctests do run right, considering that we test this with doctest?

If one copies/pastes the docstring examples they won't run because the function the docstring is from isn't imported.
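To make the difference concrete, here is a rough sketch using only the standard library's doctest module. The function `double` and the helper `run_examples` are made up for illustration and are not part of bycycle. The same docstring example passes when the function name is already in the namespace (which is what doctest provides when it runs a module), and fails when run from an empty namespace, which is roughly what a user gets when pasting into a fresh session.

```python
import doctest


def double(x):
    """Return twice the input.

    Examples
    --------
    >>> double(2)
    4
    """
    return 2 * x


def run_examples(func, globs):
    """Run the docstring examples of `func` using `globs` as the namespace."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(func.__doc__, globs, func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Silence the failure report; only the pass/fail counts are needed here
    return runner.run(test, out=lambda msg: None)


# Namespace where the function is already defined, as under doctest
with_name = run_examples(double, {"double": double})

# Empty namespace, similar to pasting the example into a fresh session
without_name = run_examples(double, {})

print(with_name.failed, without_name.failed)
```

Adding an explicit import line to the docstring example would make the second case pass too, at the cost of one extra line per example.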

for plotting, we want to be careful adding plt.show everywhere, I think. It can make for weird interactions when the user tries to use it outside the function, and also if functions get called repeatedly (for example, when we build plots with multiple plot calls). Some modules have a show=True/False and then call plt.show accordingly, which is a possible option. In the end it comes down to what modes we support, and we don't want to make other modes worse to use.

I agree that adding plt.show could be problematic since it would display the plot twice in a notebook (I think). A kwarg may be helpful for those who prefer ipython (or what @nschawor prefers to use, I can't remember what it was).
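For reference, the show=True/False option mentioned above could look something like the sketch below. The function name `plot_signal` and its signature are hypothetical, not an existing bycycle function. Defaulting to `show=False` keeps notebook behavior unchanged and lets several plot calls draw on the same axes before anything is displayed.

```python
import matplotlib.pyplot as plt


def plot_signal(sig, ax=None, show=False):
    """Plot a signal, optionally calling plt.show (hypothetical helper)."""
    # Create new axes only if the caller didn't pass any in
    if ax is None:
        _, ax = plt.subplots()
    ax.plot(sig)
    # Only display when explicitly asked, so repeated calls can build up one plot
    if show:
        plt.show()
    return ax


# Build a plot from two calls on the same axes, without showing in between
ax = plot_signal([0, 1, 0, -1, 0])
ax = plot_signal([0, -1, 0, 1, 0], ax=ax)
```

Outside of a notebook, the user could pass `show=True` on the last call, or call `plt.show()` themselves afterwards.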

Anyone else's feedback from exploring 1.0.0rc is welcome here!

tom · Answer 1 · Tue Oct 27 2020 12:31:51 GMT+0800 (China Standard Time)

Small / quick follow ups:

dataframes

yeh, for the DFs, it feels like it would be nice to have the option of separate / combined. I can definitely imagine, for example, wanting a restricted DF of just the cycle points, to try things with. Maybe there could be some useful helper functions for combining / extracting sub-DFs (so they wouldn't have to be combined)? The worry though is to end up with too many ways to do things, so we should prefer one approach - which is perhaps somewhat simpler with the 1 DF approach?

doctests

Hmm, yeh. As far as I know, we have doctests set up in the current 'standard' way, which means you are assumed to have imported the function you're trying to use. I think it might be a bit weird to explicitly import every function we demo - I don't think other modules do this (?). It makes every example longer, and it makes us have a non-standard approach, which means we have to remember to standardize this - and it's the kind of thing we'd want to be the same across all modules.

plt.show

Yeh, so tbh, I'm a little hesitant to add plt.show, even propagating through the API. It feels potentially error prone / adds burden. In the IPython case, the user can call plt.show after the function call, and it works, right? It's unclear how many users would want this built in more (I haven't heard it come up before), and the simpler approach is to keep it as is.

Ryan Hammonds · Answer 2 · Wed Oct 28 2020 02:36:52 GMT+0800 (China Standard Time)

Dataframes

Right now (in #75) the extra "sample_*" columns are included by default when using compute_features. Those columns can be dropped using return_samples=False. In the case where you want to split these columns out, a one-line pandas call could accomplish splitting the dfs (I think, I'll need to test this to confirm):

df_samples = pd.concat([df_features.pop(col) for col in ['sample_trough', 'sample_rises', 'etc']], axis=1)

I ~~can wrap~~ wrapped this into a utility function.
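A minimal sketch of what such a split could look like, assuming the sample columns all share a "sample_" prefix. The helper name `split_samples` and the example column names and values are made up for illustration; this isn't necessarily how the utility in bycycle is written. Unlike the `pop` one-liner above, this version leaves the input dataframe untouched.

```python
import pandas as pd


def split_samples(df, prefix="sample_"):
    """Split a combined dataframe into (df_features, df_samples).

    Hypothetical helper: columns whose names start with `prefix` go to
    df_samples, everything else stays in df_features.
    """
    sample_cols = [col for col in df.columns if col.startswith(prefix)]
    df_samples = df[sample_cols].copy()
    df_features = df.drop(columns=sample_cols)
    return df_features, df_samples


# Toy combined dataframe, one row per cycle (values are invented)
df = pd.DataFrame({
    "sample_peak": [10, 60],
    "sample_trough": [35, 85],
    "period": [50, 50],
    "amp_fraction": [0.4, 0.9],
})

df_features, df_samples = split_samples(df)
print(list(df_features.columns))
print(list(df_samples.columns))
```

Both outputs keep the original row index, so they can be joined back together later with `pd.concat([df_features, df_samples], axis=1)` if a combined dataframe is needed again.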

Doctests

Some packages do this (scipy's curve_fit does), and some do not (numpy's zeros).

I also see @nschawor's point, and it makes sense that one could copy/paste the doc examples expecting them to run. I think this kinda goes along with the idea of a single df instead of two. That is, we should make bycycle as easy to use/learn as possible, keeping python beginners in mind.

plt.show
~~Yeah, I have the same hesitation. It could be something like: check if jupyter is installed/running, if not, use plt.show. But I agree that may be complicated to implement cleanly.~~
This was actually very easy to fix (#78).

tom · Answer 3 · Thu Nov 26 2020 05:54:01 GMT+0800 (China Standard Time)

Main ByCycle notes are addressed.
Note that plt.show & doctests are broader (relate to all modules), and will be considered for overall updates.