Best practices for collaborative ML R&D: How to structure frameworks and collaboration
klieret opened this issue
Examples of challenges/discussion points:
Technological aspects:
- How can we cut boilerplate and standardize interfaces so that people can focus on developing models without sacrificing "hackability"? PyTorch Lightning is a popular option for PyTorch, but IMO the way it is laid out by default has its own challenges (and might lead to duplicated code)
- How can we share results between the collaborators and bring everyone "on the same page" (for example using Weights & Biases)? See the sketch after this list.
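To make both points concrete, here is a minimal sketch (not from the thread) of a LightningModule wired to Lightning's built-in `WandbLogger`; the model, layer sizes, and project name are hypothetical placeholders:

```python
# Minimal sketch, assuming PyTorch Lightning + Weights & Biases are installed.
# The model, sizes, and project name below are hypothetical placeholders.
import torch
import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger


class LitClassifier(pl.LightningModule):
    def __init__(self, n_features: int = 16, n_classes: int = 4):
        super().__init__()
        self.save_hyperparameters()  # hyperparameters end up on the dashboard
        self.model = torch.nn.Linear(n_features, n_classes)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self.model(x), y)
        self.log("train_loss", loss)  # visible to everyone on the shared project
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


# The Trainer handles the training-loop boilerplate; logging to a common W&B
# project puts all collaborators' runs side by side.
trainer = pl.Trainer(
    max_epochs=10,
    logger=WandbLogger(project="shared-project"),  # hypothetical project name
)
# trainer.fit(LitClassifier(), train_dataloaders=...)  # dataloaders omitted here
```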
Social aspects:
- How can we make sure we move in the same direction without constraining ourselves? How do we keep everyone engaged in building a common framework and avoid people "branching off forever"?
- How do we balance more technical software development work with model development? A lot of people want to focus on developing their model; few people want to work on framework issues. A good collaboration needs both.
I originally suggested this as a subtopic for #6 (doing open source). It also overlaps with #1 (packaging), #5 (fitting), and #19 (ML workflows for analysis). However, I think the challenges are very distinct, because this targets development and R&D rather than use in production or integration with other tools (for example, backward compatibility is less of an issue than leaving room for creativity).
This has a large overlap in themes with #19. Usefully different scope and kinds of requirements though!
Yes, I was thinking about this too, but the title of #19 led me to believe that it's mainly about MLOps and facilities (?).
User interfaces necessarily have to deal with collaboration and frameworks.
ML R&D Breakout session (Tuesday)
Present: Philip, Kilian, Richa, Raghav, Josue, Mike
Some of the questions that were discussed:
- What frameworks do people use (Lightning & friends)?
  - PyTorch Lightning
  - MLflow might also cover some of what Lightning does
  - ONNX for plugging ML into other frameworks / model exchange (see the export sketch after this list)
- Dashboards (wandb & friends)?
  - MLflow
  - Weights & Biases
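As a pointer for the ONNX item above, here is a minimal export sketch; `torch.onnx.export` serializes a PyTorch model into a format other frameworks can load. The toy model, file name, and tensor names are hypothetical:

```python
# Minimal ONNX export sketch; the model and all names are hypothetical placeholders.
import torch

model = torch.nn.Linear(16, 4)  # stand-in for a trained model
dummy_input = torch.randn(1, 16)  # example input that fixes the tensor shapes
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["features"],
    output_names=["logits"],
    dynamic_axes={"features": {0: "batch"}},  # allow variable batch size
)
```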
Projects that were mentioned:
- https://github.com/jet-net/JetNet: Interface for Jet datasets (combining different data sources).
- Kilian: https://github.com/gnn-tracking/gnn_tracking
- https://github.com/FAIR4HEP/cookiecutter4fair: Cookiecutter template for data science
Conclusions:
- Dashboards (like W&B or MLflow) are a good way to bring people "on the same page" and to compare/review/debug performance
- Frameworks that are built around hooks and a plugin/callback structure are a good way to allow extensibility without growing "dinosaur classes". For example, Lightning hooks like `on_validation_epoch_end` allow you to write callbacks that do things at the end of an epoch rather than subclassing/modifying your class (see the sketch below)
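A minimal sketch of that pattern (the callback name and the `val_loss` metric are hypothetical; it assumes the shared module logs `val_loss` during validation):

```python
# Minimal callback sketch, assuming the shared LightningModule logs "val_loss".
import pytorch_lightning as pl


class PrintValLoss(pl.Callback):
    def on_validation_epoch_end(self, trainer, pl_module):
        # callback_metrics holds everything logged via self.log(...)
        val_loss = trainer.callback_metrics.get("val_loss")
        if val_loss is not None:
            print(f"epoch {trainer.current_epoch}: val_loss={float(val_loss):.4f}")


# Plugged in without touching the shared model code:
trainer = pl.Trainer(callbacks=[PrintValLoss()])
```

Because the behavior lives in the callback, individual collaborators can add or remove it per experiment without forking the common LightningModule.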