sktime / sktime

A unified framework for machine learning with time series

Home Page: https://www.sktime.net

[DOC] 00_sktime_intro suggestions for improvement

Prakruthi12345 opened this issue

Describe the issue linked to the documentation

Hi, I have gone over the introductory notebook (00_sktime_intro), and it was very informative! I have come up with a couple of suggestions for the notebook regarding clarity, conciseness and ordering of sections.

Suggest a potential alternative/fix

I have included my suggestions below:

  1. Reordering of sections and modification of headings: Clearer headings and a different ordering of sections might improve the flow of the notebook and make it easier for beginners to get a good grasp of the sktime basics.

a. “What is sktime”: Separate section at the top explaining what sktime is (unified, scikit-learn-like toolbox interface to multiple time series learning tasks). Currently there is a definition under contents and in the later sections.
b. “Need for sktime”: A section devoted to explaining the need for a package like sktime specifically. This information is currently present in section 3 (the package space for time series is highly fragmented, integrates the ecosystem, etc)
c. “Tasks performed with sktime”: A section devoted to explaining the tasks sktime is capable of carrying out. Currently this list is in section 2 (Forecasting, Classification, Regression, etc)
d. “sklearn interface”: Section 1’s information can be moved to this section. It could include three subsections: “sklearn interface properties”; “supervised estimator interface”, covering the 3 interface points and the diagram; and “example of supervised learning on IRIS dataset”, covering the code snippets and explanations of what the code is doing.
e. “Examples of sktime interfaces”: The bulk of section 2’s information can be moved to this section. It could include the table of tasks and links to the API, with clear subsection headings, e.g., “Example of forecasting on airline data”, etc.
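To illustrate the kind of snippet the proposed “example of supervised learning on IRIS dataset” subsection would explain, here is a minimal sketch written with plain scikit-learn. It is not the notebook's actual code; the split settings and variable names are assumptions chosen for illustration. It shows that `SVC` and `RandomForestClassifier` can be swapped freely because both expose the same `fit`/`predict` interface:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the iris data as feature matrix X and label vector y.
X, y = load_iris(return_X_y=True)

# Hold out part of the data for prediction (split settings are illustrative).
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two different algorithms, one shared interface: fit, then predict.
predictions = {}
for estimator in (SVC(), RandomForestClassifier(random_state=0)):
    estimator.fit(X_train, y_train)
    predictions[type(estimator).__name__] = estimator.predict(X_test)

for name, y_pred in predictions.items():
    print(name, y_pred[:5])
```

The loop body never refers to a specific algorithm, which is the point a “sklearn interface properties” subsection could make explicit for beginners.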

  2. General formatting/structuring issues:
  • Contents section could be structured better: Could take out the sktime definition at the top and rename the section to “Table of Contents”. This section can detail the various sections outlined in point 1.
  • Quotation marks in the “Training data” phrase in the “supervised estimator conceptual model” picture should be formatted correctly
  • Remove the “above in code” line in section 1 and instead include subsection headings in the manner outlined in point 1
  • “Strategy pattern” in section 1 could be structured better, as a bulleted or numbered list

  3. More information for beginners:
  • Information about what sklearn and skbase are, specifically (for sklearn: simple and efficient tools for predictive data analysis, with various classification, regression, and clustering algorithms, built on NumPy, SciPy, and matplotlib, etc.)
  • Information on the datasets used in the code examples
  • A few lines on what SVC and RandomForest are

  4. Cell outputs in the code examples could be condensed: in the “classification” example in section 2, the shapes could all be printed in one cell, for example using a dictionary.
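As a rough sketch of the last suggestion, the shapes could be collected with a dictionary comprehension and printed once. The arrays below are placeholders with invented shapes, not the notebook's actual data, and the variable names are assumptions:

```python
import numpy as np

# Placeholder arrays standing in for the notebook's train/test splits;
# the shapes are invented for illustration only.
splits = {
    "X_train": np.zeros((30, 1, 100)),
    "y_train": np.zeros(30),
    "X_test": np.zeros((10, 1, 100)),
    "y_test": np.zeros(10),
}

# One dictionary of name -> shape replaces four separate output cells.
shapes = {name: array.shape for name, array in splits.items()}
print(shapes)
```

In the notebook, the dictionary values would be the objects returned by the dataset loader rather than these zero-filled stand-ins.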

Thanks, and please let me know what you think!

Could you kindly share the suggestions here in the issue? Google Docs links expire, and it's good to have a permanent record of issues.

Sure, have edited the original post!

I think these are excellent suggestions!

We could leave them here for someone to pick up, or you could try making these improvements yourself?

I would suggest working on separate sections in different PRs, to avoid one large PR and allow uncontroversial changes to be merged quickly.

I can definitely give it a try! Will make sure to submit different PRs for the various sections!