datalad / datalad

Keep code, data, containers under control with git and git-annex

Home Page: http://datalad.org

Brainstorming: path to DataLad v2?

mih opened this issue

This is not simply about a datalad v2. This is about a strategy to reorganize the DataLad ecosystem, in which datalad and its extensions are only some of the gears in the box.

The primary aim is to create more homogeneous modules with streamlined dependencies. Such modules would decouple code bases that evolve at different paces (a more stable foundation vs. faster iteration on prototypes and focused applications), have disjoint dependencies (not just for installation, but also in how much code needs to be imported to use a particular piece of DataLad), and have different test demands (network operations with specific services vs. local code).

One (possibly more) scenario(s) will be posted below. They should be discussed regarding their individual merits and problems. This issue is about collecting ideas, not about making decisions.

Please do not use this issue for discussions -- github issues don't work well for that. Rather, post any alternative/derived ideas (longform) in a dedicated response. If we keep individual ideas self-contained, and also updated over time, it will be easier to refer to them and also refine them.

To communicate appreciation of or opposition to an individual concept, please use the "reactions" interface.

Factor out a foundational package (FP)

The purpose of such a package would be to serve as a foundation to build DataLad-powered libraries and apps -- implemented in Python. This package is:

The development procedures should be suitable for creating a package that radiates enough confidence to build 3rd-party code on:

  • mandatory code-reviews by two or more people
  • release when "done"
  • benchmarks
  • mandatory "full" (something like >95%) test coverage
  • detailed documentation targeting developers
  • PRs need to be comprehensive (code, test, documentation), all at once
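The rules above could be checked mechanically before a merge. The following is only a minimal sketch of such a gate: the `PullRequest` record, its field names, and `unmet_requirements` are hypothetical and not part of any DataLad tooling, and the 95% threshold simply mirrors the "something like >95%" figure from the list.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    """Hypothetical summary of a PR against the FP (all fields are assumptions)."""
    approvals: int           # number of approving reviews
    coverage_percent: float  # test coverage of the package with this PR applied
    has_code: bool
    has_tests: bool
    has_docs: bool


def unmet_requirements(pr, min_approvals=2, min_coverage=95.0):
    """Return a list of human-readable reasons why the PR cannot be merged yet."""
    problems = []
    if pr.approvals < min_approvals:
        problems.append(f"needs {min_approvals} approvals, has {pr.approvals}")
    # The list asks for coverage *above* the threshold, so equality does not pass.
    if pr.coverage_percent <= min_coverage:
        problems.append(f"coverage {pr.coverage_percent}% is not above {min_coverage}%")
    if not (pr.has_code and pr.has_tests and pr.has_docs):
        problems.append("PR must bring code, tests, and documentation together")
    return problems
```

An empty list means all listed requirements are met. "Release when done" and benchmarks are not modeled here, because they do not reduce to a per-PR check.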

"Phase-in" process

The FP would be introduced gradually, by shifting and elevating code from other projects. Pretty much never would from-scratch implementations be introduced to the FP directly.

This will make sure that code has seen some usage, and some "application" code already exists downstream to illustrate concrete usage patterns, and immediately justify a code addition to serve dependent packages.

After being established, code can flow to the FP from any source. Once a release has been made, the source project sheds that code and adds a dependency on the FP.
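One way a source project could shed code while keeping its old import location alive is a thin forwarding alias. This is a sketch under assumptions, not a described procedure: `fp_iter_lines` stands in for a function that has moved to the FP, `sourceproject.iter_lines` is a made-up legacy name, and the deprecation warning is optional (a project may prefer to keep the old name silently).

```python
import functools
import warnings


def fp_iter_lines(text):
    """Stand-in for an implementation that now lives in the FP (hypothetical)."""
    return text.splitlines()


def moved_to_fp(new_func, old_name):
    """Build an alias for `old_name` that warns and forwards to `new_func`."""
    @functools.wraps(new_func)
    def alias(*args, **kwargs):
        warnings.warn(
            f"{old_name} has moved to the foundational package; "
            f"use {new_func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at the alias
        )
        return new_func(*args, **kwargs)
    return alias


# In the source project the implementation is gone; only the alias remains.
iter_lines = moved_to_fp(fp_iter_lines, "sourceproject.iter_lines")
```

In a real setup `fp_iter_lines` would be imported from the released FP rather than defined next to the alias.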

Envisioned development trajectory for "datalad/datalad"

With respect to a v2 concept, code would flow out of the present main datalad package, and it would gain the dependency on FP. It would continue to be the main entrypoint.

If and when we approach a modernization of the CLI, we would need to reevaluate its role. It could then become an application/meta package:

graph TD;
    FP-->datalad;
    FP-->datalad-cli;
    datalad-cli-->datalad;

or continue as a provider of assorted functionality that is exposed via different APIs (and hence have its own CLI implementation stripped out):

graph TD;
    FP-->datalad;
    FP-->datalad-cli;
    datalad-->datalad-cli;
    datalad-->datalad-gooey;
    FP-->datalad-gooey;
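The two graphs can also be compared as plain dependency tables. The sketch below assumes that an arrow `A-->B` means "B depends on A", and uses the standard-library `graphlib` (Python 3.9+) to derive an order in which the packages could be installed or released; the dictionary names are illustrative only.

```python
from graphlib import TopologicalSorter

# Each key maps a package to the set of packages it depends on.
# Scenario 1: datalad becomes an application/meta package on top of datalad-cli.
meta_package = {
    "datalad-cli": {"FP"},
    "datalad": {"FP", "datalad-cli"},
}

# Scenario 2: datalad stays a provider of functionality that other frontends use.
provider = {
    "datalad": {"FP"},
    "datalad-cli": {"FP", "datalad"},
    "datalad-gooey": {"FP", "datalad"},
}


def install_order(dependencies):
    """Return the packages so that every dependency precedes its dependents."""
    return list(TopologicalSorter(dependencies).static_order())
```

In both scenarios the FP comes first. What changes is whether `datalad` sits at the end of the chain (meta package) or in the middle (provider for `datalad-cli` and `datalad-gooey`).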

Pros

  • starting an FP from scratch has the benefit of laying out clear rules from the start that contributions have to follow, and all code matches them
  • people have expressed discomfort re the complexity of the datalad package, a bottleneck that can be avoided with a clean setup
  • zero impact forced onto present users of datalad. The main package can make independent decisions how to deal with changes, whether or not to grease transitions, or to provide traditional interfaces (forever)

Cons

  • the two-reviewer rule is important for creating a useful (consensus) library. However, it will be hard to make it a reality. @yarikoptic and @mih can do the reviews, but when they do development themselves, at least one additional qualified reviewer must be found.
  • introducing additions to the FP does not simultaneously improve the main package (just like with datalad-next). Demonstrations of impact (if applicable) would need to come as a companion PR to the main package (that diverts the dependency to a PR branch). This is cumbersome.

Discussion

  • ...

Updates

  • the originally employed name datalad-core has been replaced by "foundational package" (FP) to reduce the ambiguity wrt the many purposes for which the label "core" has been used in the past

An effort towards a foundational library has started at https://github.com/datalad/datasalad