PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.

Home Page:www.pecanproject.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Improve PR Labeling

allgandalf opened this issue · comments

Bug Description

We recently implemented automated labeling of Pull Requests in #3199, #3204,

This was with very limited scope in mind, and by looking at some comments in the review from @infotroph #3204 (comment), #3204 (comment) I am of the opinion that we need to improve the labels to better define the scope of the Pull requests,

The file for the labels is : https://github.com/PecanProject/pecan/blob/develop/.github/labeler.yml , no need to update automation just the labels. :)

If someone could first propose over here the file structure we should follow and some basic implementation details then it would really be helpful for us

Adding good first issue label, as this is a beginner friendly issue

Also, asked for help on slack: https://pecanproject.slack.com/archives/C06E3309CJK/p1709884113573149

I've been brainstorming about our PR labeling and came up with some ideas to tweak our process. Thought I'd share them here :

  1. Refining Labels: How about splitting 'Documentation' into 'User' and 'Developer' docs? It could help us quickly spot the focus of changes.

  2. Sub-Categories: In sections like 'Models' or 'Modules', sub-categories might be a game-changer. For example, 'Models' could have 'atmosphere' for file data.atmosphere` and similar sub-labels.

  3. Language-Specific Labels: Adding labels for R, Fortran, bash etc., could streamline reviews, especially when changes are language-specific.

  4. Smarter Automation?: While initially said no changes to automation, I'm wondering if it's worth exploring a system that picks up keywords from PRs for label suggestions.

Let me know what you think. If we're on the same page, I can get started on these updates.

Here's how I want to proceed with the labeler.yaml:

'Documentation/User':
  - book_source/**
  - documentation/user_guides/**

'Documentation/Developer':
  - CONTRIBUTING.md
  - DEBUGGING.md
  - DEV-INTRO.md

# Introducing sub-categories under Models
'Models/Climate':
  - models/climate/**

'Models/Biophysical':
  - models/biophysical/**

# Adding a new label for Configuration changes
'Configuration':
  - '**/*.config'
  - setup.sh

# Specific labels for different programming languages
'Python':
  - '**/*.py'

'R':
  - '**/*.R'

'JavaScript':
  - '**/*.js'

These changes aim to make our labeling more intuitive and aligned with our project's structure. Let me know what you think! Also, as an addition to the Documentation part. Like, the book-source already has details about changes whgich will be made to book source. Similarly we can maybe move DEV-INTRO.md IN 'DEVELOPER' documentation and README.md within 'USER' guides.

Insights @GandalfGwaihir ??

@infotroph , if you can find time to review this proposaal, overall I find it useful and what @Sweetdevil144 suggested here is really good I guess, but would love to hear what you have to say :)

To define how labeling should work, we should start by agreeing who these labels are for and what they need them to do. Here are a couple user stories that I'm picturing -- please push back if you think these aren't right.

  • "As a first-time contributor to PEcAn who has found a small bug in [module], I want to see if it is fixed in a currently-open PR before I open my own, so that I can avoid duplicating effort"
  • "As a regular contributor to [component] of PEcAn, I want to easily see whether a PR touches [component], so that I can be responsive where my reviews are needed without being overwhelmed by notifications about modules I do not own"
  • "As a maintainer who works across the whole system, I want an overview of what components have changes pending, so that I can group my review effort and make sure that people working on related systems are talking to each other as they work."

For these particular three users (who, to be clear, I just made up!), I suspect the right labeling granularity would be something like individual models (Many people only use PEcAn with one model, so they'll pay attention to the sipnet label but not basgra or vise-versa), a few sub-categories of modules (something like "meteorology", "data assimilation", "format wrangling", ... ?), and single broad-brush labels for topics like "build", "CI", "documentation", "Docker", "web". But my understanding of the labeler patterns is that they need to correspond to paths that actually exist within the directory, so I'm not immediately sure how to define the subcategories other than by listing all the packages that should get that label.

I don't see a need for language-specific labels -- to a first approximation, PEcAn is in R and the exceptions are in places that will be ~obvious from the other labels (e.g. if we had a PHP label it would appear in basically the same PRs labeled web).

@Sweetdevil144 your thoughts on ^ ?

I suspect the right labeling granularity would be something like individual models (Many people only use PEcAn with one model, so they'll pay attention to the sipnet label but not basgra or vise-versa), a few sub-categories of modules (something like "meteorology", "data assimilation", "format wrangling", ... ?), and single broad-brush labels for topics like "build", "CI", "documentation", "Docker", "web".

I agree with this. Splitting labeling according to sub-models/sub-modules would help a User determine the scope of modules/models affected by a particular PR.

other than by listing all the packages that should get that label

That would be the case for now as Regex-based Path Matching would help in determining individual paths. (But it would be entirely manual, that is, defining which specific sub-modules/sub-models to be implemented, maybe any other approach to optimise/automate it?)

I don't see a need for language-specific labels -- to a first approximation,

That works fine for me. Also, an increased number of labels would surely overwhelm a beginner contributor.

I know you're busy with your GSoC project @Sweetdevil144 , when you find time can you give me template of how the labelling should be now after all this discussion ? I'll come up with a PR for the same by next month

Hey. Sorry for late response. I think the only changes that can be done should be

  • The Documentation splitting into Dev-Docs and User-Docs
  • Splitting modules and models into each separate paths.

It would go something like this :

# Documentation divided into User and Developer specific
'Documentation/User':
  - book_source/**
  - documentation/user_guides/**
  - README.md

'Documentation/Developer':
  - CONTRIBUTING.md
  - DEBUGGING.md
  - DEV-INTRO.md
  - documentation/developer_guides/**

... Rest of Blocks

# Model-specific sub-categories
'Models/sipnet':
  - models/sipnet/**

'Models/basgra':
  - models/basgra/**

# Modules with defined sub-categories
'Modules/Date-Atmosphere':
  - modules/data.atmosphere/**

'Modules/Data-Assimilation':
  - modules/data_assimilation/**

Should this work fine?