FR: Topics (alternative to branches)


noahmayr opened this issue 2 months ago

Is your feature request related to a problem? Please describe.

There have been several discussions, both on Discord and spread across GitHub issue and discussion comments (e.g. #2425 (comment)), about a potential jj feature called "topics". They are loosely related to the concept of the same name in Mercurial, but their exact implementation and behavior in jj have yet to be decided.

According to a post linked by @arxanas below, there's a disconnect between the most dominant mental model of branches (option 1 in the poll) and how they actually work in git/jj (option 3).

Having an implementation that's actually associated with all changes of a ~~branch~~ topic instead of just the head could help bridge that disconnect.

Describe the solution you'd like

As of now, jj has branches which, despite sharing a name with git branches, do not behave the same way. As jj has no notion of an "active" or "checked out" branch, the head of the branch is not automatically advanced to new commits (see #2338).

The core difference between a topic and a branch is that branches only ever point to the last revision on that branch while topics would be more like a marker/metadata on the revision.

The current consensus is that topics would be "infectious", meaning new revisions descending from a topic's revision automatically become part of that topic as well.
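
To make the "infectious" behavior concrete, here is a minimal sketch in Rust. It is not jj's actual data model: `TopicStore`, `tag`, `new_revision`, and the `inherit` flag are names invented for illustration, and revisions are identified by plain strings.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory model: revision id -> topics attached to it.
#[derive(Default)]
struct TopicStore {
    topics: HashMap<String, BTreeSet<String>>,
}

impl TopicStore {
    /// Attach a topic to a revision.
    fn tag(&mut self, rev: &str, topic: &str) {
        self.topics
            .entry(rev.to_string())
            .or_default()
            .insert(topic.to_string());
    }

    /// Record a new revision. When `inherit` is true, the child starts
    /// with the union of its parents' topics (the "infectious" part).
    fn new_revision(&mut self, rev: &str, parents: &[&str], inherit: bool) {
        let mut inherited = BTreeSet::new();
        if inherit {
            for parent in parents {
                if let Some(parent_topics) = self.topics.get(*parent) {
                    inherited.extend(parent_topics.iter().cloned());
                }
            }
        }
        self.topics.insert(rev.to_string(), inherited);
    }

    /// Topics on a revision, in sorted order.
    fn topics_of(&self, rev: &str) -> Vec<String> {
        self.topics
            .get(rev)
            .map(|t| t.iter().cloned().collect())
            .unwrap_or_default()
    }
}
```

With this model, a child created on top of a revision tagged `my_topic` reports `my_topic` as well, while passing `inherit = false` gives a revision with no topics.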

They would also likely be jj's model for integrating with git, while the existing branches could be renamed to bookmarks.

However there are still open questions:

Can revisions belong to more than one topic?
Can revisions not belong to any topic, or would they belong to a special unnamed topic? (The latter would be in line with how the working copy is also just a revision like everything else.)
What revisions can be part of the same topic:
- No constraints, a topic is just an arbitrary set of revisions
  
  Can be very powerful, especially for a native jj backend down the line.
  Even in the git world this would allow working on several changes related to one bigger topic and submitting them all as independent PRs for review (or multiple dependent ones if changes have common ancestors). Basically turning a tree (or multiple if disjoint) of changes into an ideal set of independently reviewable units.
- The revisions need to have a single head they are all ancestors of
  
  While the option above is most definitely very powerful, mapping changes in the graph to git (which we would be doing for a while still) could become quite complex: we'd have to keep the same branch on a given head when it's updated or new revisions are added, removed or merged heads could render ongoing pull requests invalid, and so on.
  
  This maps more cleanly to branches with the git backend, since we can always connect the topic to its corresponding branch.
  
  Alternatively, we could limit git export to topics fulfilling this constraint and show a warning for other topics.
What workflows can we enable with topics, that we would not be able to with branches?

Potential usage:

Basic CRUD to relate revisions to topics:
- jj topic set my_topic -r 'trunk()..@' to set the topic my_topic on those revisions (removing all other topics currently associated with them)
- jj topic add my_topic -r 'trunk()..@' to add the topic my_topic on those revisions (keeping all other topics currently associated with them, this assumes that revisions can have more than one topic)
- jj topic clear -r '@' clears the revision's topics
- jj topic remove my_topic -r '@' removes a single topic from a revision (assumes multiple topics per revision)
Integration into existing commands:
- jj new --topic my_topic to start a new revision that is only associated with my_topic
- jj new --no-topic to not inherit the ancestors' topics
- jj git fetch could map remote git branches to topics, starting with trunk() and then marking the remaining revisions trunk()..<branch head> with a topic named after the remote's branch.
- jj git push --topic to push the topic as a branch, depending on other flags/configuration this could either try to create as few branches as possible for code review or create a single branch for every revision (kind of like graphite.dev's stacked reviews). When creating multiple branches, instead of naming them push-<change_id> (like jj git push --change) we would use names based on the topic <topic>-<change_id> (see #1415)
- jj log -r 'topics([...])' and jj log -r my_topic to show revisions of that topic
- jj abandon --topic my_topic drops all revisions related to that topic
Future commands for managing PRs from jj
- jj github push --topic / jj gitlab push --topic or my preferred variant jj submit --topic, which would automatically create the necessary branches and PRs on the forge you use. Ideally managing branches in jj would not be necessary at all. (see #485)
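
The difference between the proposed set, add, remove, and clear subcommands is easy to get wrong, so here is a small Rust sketch of their intended semantics. It assumes revisions can carry more than one topic (still an open question above) and uses a hypothetical `Topics` map from revision id to topic set. None of these function names exist in jj.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical store: revision id -> topics on that revision.
type Topics = HashMap<String, BTreeSet<String>>;

/// `jj topic set`: replace whatever topics the revisions had with `topic`.
fn topic_set(store: &mut Topics, revs: &[&str], topic: &str) {
    for rev in revs {
        store.insert(rev.to_string(), BTreeSet::from([topic.to_string()]));
    }
}

/// `jj topic add`: keep existing topics and add `topic`.
fn topic_add(store: &mut Topics, revs: &[&str], topic: &str) {
    for rev in revs {
        store
            .entry(rev.to_string())
            .or_default()
            .insert(topic.to_string());
    }
}

/// `jj topic remove`: drop a single topic, leaving the others in place.
fn topic_remove(store: &mut Topics, revs: &[&str], topic: &str) {
    for rev in revs {
        if let Some(topics) = store.get_mut(*rev) {
            topics.remove(topic);
        }
    }
}

/// `jj topic clear`: drop every topic from the revisions.
fn topic_clear(store: &mut Topics, revs: &[&str]) {
    for rev in revs {
        store.remove(*rev);
    }
}
```

The key distinction is that `topic_set` overwrites while `topic_add` accumulates. If the design settles on one topic per revision, only set and clear would be needed.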

Describe alternatives you've considered

For interop with the git world, #2338 would be an alternative way to work with branches more effectively, possibly with further changes down the line that make jj branches work more like git branches. However, I think topics could provide a more idiomatic jj approach while still providing great interop with branches.

Additional context

This is meant more as a meta issue tracking progress across several different aspects of how topics would integrate into jj. Based on feedback additional use cases might be added or the current ones may be refined further. If specific issues are opened for any individual use case, those will also be linked.

Discord discussion

Khionu Sybiern · Answer 1 · Wed Apr 03 2024 04:10:24 GMT+0800 (China Standard Time)

Could we rename the issue to have a little more context? Maybe "alternative branch story: topics"

icxc12 · Answer 2 · Mon Apr 15 2024 04:53:35 GMT+0800 (China Standard Time)

Recently also looked into jj new --branch br and found the discord discussion to be very helpful. Was also thinking revsets mapped nicely onto branches and was happy to be redirected here (thanks Ilya/Noah).

Here are some thoughts on your open questions:

Can revisions belong to more than one topic:

Think this is pretty useful. For example, if you start to work on a topic, then switch to a new topic based on work from the previous topic, probably want the original revision in both topics.

Can revisions not belong to any topic, or would they belong to a special unnamed topic:

~~Feel that having a model that diverges from working copy is fine here -- simply because do not always want to be working on/thinking about topics -- only want to use when it is relevant.~~ Edit: This attempted to convey what @necauqua says below, but it’s conveyed better there. Just read that instead.

What revisions can be part of the same topic

Would be curious to hear what others think on this. Do think that there is another option that a topic can include a section of the revset dag (i.e. not just flow in one direction). This would fit nicely with Ilya's suggestion (within the current branching workflow) to be able to do jj new --branch br both "up" and "down" for existing commit id prefix br.

What workflows can we enable with topics, that we would not be able to with branches

The big one for me is just not having to keep track of names of prefixes. An added benefit of using revsets is that you would get all the benefits of revsets in topics (which you currently do not get in branches). Do think it should not attempt to adhere to Mercurial's topic extension (for example, basing topics on branches as opposed to viewing topics as a branch alternative) in a way that would compromise git branch interop.

Anton Bulakh · Answer 3 · Mon Apr 15 2024 06:32:25 GMT+0800 (China Standard Time)

Would be curious to hear what others think on this.

I ~~was~~ am strongly on the side of unconstrained topics, git interop could be dealt with, but topics just being a list of string 'tags' (not git tags) on each commit in jj metadata is both simple and powerful imo.

Opposed to git branches, which are defined as a pointer to a head from which you manually walk back to root to have an idea what the branch includes - it's hard to quantify, but topics feel like they fit better with the jj model, and the infectiousness fixes the issue with branches not advancing, while just making the branches advance feels like a step back for some reason.

My answers to other questions:

Can revisions belong to more than one topic

Yes 🤷

Can revisions not belong to any topic

Yes 🤷
Actually, this one is simple.
Say there is this special unnamed topic.
There are two ways it could be done - all revisions have it, or all revisions without any other topics have it.
The first one is useless, it's just all(). I only mention it out of pedantry, since the question was about the second one.
The only purpose of the second one, I think, would be to have a way to find revisions that have no other topics - but that could be just a revset function, no need to implement an additional concept that's actually pretty weird if you think about it (a transient pseudotopic that exists when the list of topics is empty, and doesn't when it's not empty).

What workflows can we enable with topics

It's a nice fix to the branches not advancing issue, and topics can be disjoint (if they were limited then I'd not see them as much different from branches, it'd be more of a "rename it so it sounds exciting" thing then). Again, thinking about them as each commit having a list of string tags (not git tags) attached enables arbitrary tagging setups to be invented by people.

Matt · Answer 4 · Mon Apr 15 2024 13:36:14 GMT+0800 (China Standard Time)

FWIW, I strongly agree with the use case of topics. A while back, I joined a session where some people were curious about jj and explained it to them, and the biggest feedback that I got was "why does jj punish me for attempting to use my git workflows" (WRT there being no active branch).

However, I think the problem is that different people want different things, and I think we need to acknowledge that no one is necessarily wrong. One thing we may want to consider is to, rather than prescribing our own opinions upon the user, make topics themselves configurable (but have a reasonable set of defaults). For example:

When you create a new commit, does it:
- Stay on the old commit (on deletion: do nothing)
- Get copied to the new commit (on deletion: do nothing)
- Move to the new commit (on deletion: move to parent)
Is a topic unique (not valid for the "copied to new commit" mode)

I think that the biggest problem with an approach like the one I just described will be conveying that to the user. With the things above, there are 5 different configurations you could create for a given topic. I can see potential value (with different use cases) for several of them. For example:

Unique, don't move: See #3482 - This is useful to create aliases for given commits. It's also useful to associate with a gerrit commit, for example (crrev.com/c/123)
Non-unique, don't move: Arbitrary tags you could apply to commits. I've seen requests for this so that you could come up with a tag that you can exclude from the default revset, for example
Copy: See other people's comments in this PR
Move, unique: This is good for anyone who wants to replicate the design of git branches. This is precisely what the people I got feedback from wanted.
Move, not unique: Can't think of any use cases off the top of my head.
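
The count of 5 follows from three "on new commit" behaviors times two uniqueness settings, minus the one invalid pairing (unique plus copy). Here is a sketch of that configuration space in Rust. The type and variant names are invented for illustration and are not part of any jj proposal.

```rust
/// What happens to a topic when a child commit is created (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq)]
enum OnNewCommit {
    Stay,
    Copy,
    Move,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct TopicPolicy {
    on_new_commit: OnNewCommit,
    unique: bool,
}

impl TopicPolicy {
    /// A unique topic sits on only one commit, so it cannot also be copied.
    fn is_valid(&self) -> bool {
        !(self.unique && self.on_new_commit == OnNewCommit::Copy)
    }

    /// Returns (parent keeps the topic, child gets the topic).
    fn apply(&self) -> (bool, bool) {
        match self.on_new_commit {
            OnNewCommit::Stay => (true, false),
            OnNewCommit::Copy => (true, true),
            OnNewCommit::Move => (false, true),
        }
    }
}

/// Enumerate every valid combination of behavior and uniqueness.
fn valid_policies() -> Vec<TopicPolicy> {
    let mut out = Vec::new();
    for on_new_commit in [OnNewCommit::Stay, OnNewCommit::Copy, OnNewCommit::Move] {
        for unique in [false, true] {
            let policy = TopicPolicy { on_new_commit, unique };
            if policy.is_valid() {
                out.push(policy);
            }
        }
    }
    out
}
```

In this framing, git-style branches correspond to `Move` with `unique: true`, and "infectious" topics correspond to `Copy`.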

I think that even if we don't make topics themselves configurable, we should at the very least make it configurable on the backend level, so that when someone wants another one of these things, the work is then trivial.

Philip Metzger · Answer 5 · Mon Apr 15 2024 23:53:18 GMT+0800 (China Standard Time)

I very much agree with @necauqua's assessment of topics and consider them pretty much additional metadata on a commit.

However, I think the problem is that different people want different things, and I think we need to acknowledge that no one is necessarily wrong. One thing we may want to consider is to, rather than prescribing our own opinions upon the user, make topics themselves configurable (but have a reasonable set of defaults). For example:

When you create a new commit, does it:

Stay on the old commit (on deletion: do nothing)

Get copied to the new commit (on deletion: do nothing)

Move to the new commit (on deletion: move to parent)

Is a topic unique (not valid for the "copied to new commit" mode)

I think that the biggest problem with an approach like the one I just described will be conveying that to the user. With the things above, there are 5 different configurations you could create for a given topic. I can see potential value (with different use cases) for several of them. For example:

Unique, don't move: See FR: Convenient names for changes #3482 - This is useful to create aliases for given commits. It's also useful to associate with a gerrit commit, for example (crrev.com/c/123)

Non-unique, don't move: Arbitrary tags you could apply to commits. I've seen requests for this so that you could come up with a tag that you can exclude from the default revset, for example

Copy: See other people's comments in this PR

Move, unique: This is good for anyone who wants to replicate the design of git branches. This is precisely what the people I got feedback from wanted.

Move, not unique: Can't think of any use cases off the top of my head.

So supporting these use cases should be trivial if we allow arbitrary metadata on commits, which should probably be a separate feature from topics, which use a subset of the metadata to create "virtual branches".

icxc12 · Answer 6 · Tue Apr 16 2024 02:11:46 GMT+0800 (China Standard Time)

if we allow arbitrary metadata on commits

Was thinking about this as well because branches are currently a HashMap<String, RefTarget>, where RefTarget is effectively a CommitId. Had you given any thought as to where you might want to keep metadata (the commit struct in backend seems like an option, but saying this as someone who is still very new here)?

Anton Bulakh · Answer 7 · Tue Apr 16 2024 03:34:52 GMT+0800 (China Standard Time)

We have jj-only commit metadata storage for change ids and a list of predecessors, maybe other things I'm not remembering - seems obvious to just chuck a topics: Vec<String> field there

Also by the way operation objects actually do contain tags: HashMap<String, String> for arbitrary metadata. Currently those are only used to store command args to be shown in the oplog.

~~Although when I used them to mark snapshot operations Martin refactored that into a separate field - so I guess generic tags thing is not even needed as you could always just add a field directly.~~
Ok forget that, I think those could be useful for custom backends to do custom stuff without changing the upstream storage format.

Anyway my point is that actually implementing the "list of string (non git) tags on every commit" metadata thing is like super easy actually. And then have commands to CRUD them, revset functions to query them, and maybe something about indexing that I never looked into for "querying them" to be fast (that last part is prooobably the hardest? 🙃).
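
As a rough illustration of the "list of strings plus an index" idea, here is a Rust sketch. `CommitMeta` is a stand-in and not jj's real commit struct, and the inverted index is just one plausible way to make topic lookups fast. How jj's actual index would handle this is not something this thread settles.

```rust
use std::collections::{BTreeSet, HashMap};

/// Stand-in for per-commit metadata; only the suggested `topics` field is modeled.
struct CommitMeta {
    change_id: String,
    topics: Vec<String>,
}

/// Inverted index: topic name -> change ids carrying that topic.
fn build_topic_index(commits: &[CommitMeta]) -> HashMap<String, BTreeSet<String>> {
    let mut index: HashMap<String, BTreeSet<String>> = HashMap::new();
    for commit in commits {
        for topic in &commit.topics {
            index
                .entry(topic.clone())
                .or_default()
                .insert(commit.change_id.clone());
        }
    }
    index
}

/// Rough analogue of a revset function that selects commits by topic.
fn commits_with_topic(index: &HashMap<String, BTreeSet<String>>, topic: &str) -> Vec<String> {
    index
        .get(topic)
        .map(|ids| ids.iter().cloned().collect())
        .unwrap_or_default()
}
```

Building the index costs one pass over all commits, after which each topic query is a single map lookup rather than a scan.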

The harder part is arguing about the design here :)
Like I actually think a world where jj has no branches but topics (which are truly a jj concept as we've described above) map to one/multiple git branches with some rules is very interesting.

icxc12 · Answer 8 · Tue Apr 16 2024 04:06:32 GMT+0800 (China Standard Time)

Anyway my point is that actually implementing the "list of string (non git) tags on every commit" metadata thing is like super easy actually.

Thanks this is helpful. Also provides incentive to look into operations more thoroughly.

map to one/multiple git branches with some rules is very interesting

This is the part that still confuses me. Can you explain a bit more at the design level how git interop should work with ~~topics~~ “unconstrained” topics?

Edit: was specifically interested in interop with “unconstrained” topics.

Anton Bulakh · Answer 9 · Tue Apr 16 2024 05:53:23 GMT+0800 (China Standard Time)

The simplest thing would be to only export those topics that do follow the constraints, and for others log hints based on some heuristics or something.

If there's a config switch to flip those hints into hard errors - well that just made topics constrained :)

Another approach is this - given a set of commits that are marked by some topic, export every head (that is, a commit that's not a parent of any other commit in the set) as a separate branch. For topics that follow the rules this means a single commit will be marked with a branch, and for various disjointed/non-standard ones we could log hints and export multiple branches with some name pattern. Or, again, a config switch that just makes it so that if there's multiple heads we don't export anything or get a hard error - basically turning this into option 1/constrained.
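
The head-based export can be sketched as follows in Rust. The graph is a plain map from commit id to parent ids, and the `<topic>-<id>` naming for multi-headed topics is only an assumed pattern, borrowed from the `<topic>-<change_id>` idea in the issue description.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Heads of a topic: members that are not a parent of any other member.
fn topic_heads(
    members: &BTreeSet<String>,
    parents: &HashMap<String, Vec<String>>,
) -> BTreeSet<String> {
    let mut heads = members.clone();
    for member in members {
        for parent in parents.get(member).into_iter().flatten() {
            heads.remove(parent);
        }
    }
    heads
}

/// Branch name -> commit id. A single head exports under the bare topic name;
/// several heads use the assumed `<topic>-<id>` pattern.
fn export_branches(
    topic: &str,
    members: &BTreeSet<String>,
    parents: &HashMap<String, Vec<String>>,
) -> BTreeMap<String, String> {
    let heads = topic_heads(members, parents);
    if heads.len() == 1 {
        return heads
            .into_iter()
            .map(|head| (topic.to_string(), head))
            .collect();
    }
    heads
        .into_iter()
        .map(|head| (format!("{topic}-{head}"), head))
        .collect()
}
```

The "constrained" variant would replace the multi-head branch of `export_branches` with a hint or a hard error, which is the config switch described above.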

Both of these approaches have been mentioned in discussions here/on discord.

One thing the above does not mention is importing - say git has some branches (e.g. fetched from a remote) and we want branchless-jj-with-topics to see those as topics.

There are two similar approaches I see here:

Mark the pointed-to commit and its ancestors up to root with the topic, following the definition of a git branch, as for example github shows a list of branches that "include" a commit
Mark the pointed-to commit and its ancestors up to trunk, following what a lot of people think about when reasoning about feature branches
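
Both import strategies reduce to an ancestor walk, with the second subtracting trunk's ancestors. Here is a sketch in Rust. The function names are hypothetical and the graph is again a plain parent map rather than anything from jj.

```rust
use std::collections::{BTreeSet, HashMap};

/// All ancestors of `start`, including `start` itself.
fn ancestors(start: &str, parents: &HashMap<String, Vec<String>>) -> BTreeSet<String> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![start.to_string()];
    while let Some(rev) = stack.pop() {
        if seen.insert(rev.clone()) {
            if let Some(ps) = parents.get(&rev) {
                stack.extend(ps.iter().cloned());
            }
        }
    }
    seen
}

/// Revisions to mark for an imported branch. With `trunk = None` the walk
/// goes to the root; otherwise it stops at trunk, like `trunk()..<branch head>`.
fn revisions_to_mark(
    branch_head: &str,
    trunk: Option<&str>,
    parents: &HashMap<String, Vec<String>>,
) -> BTreeSet<String> {
    let mut marked = ancestors(branch_head, parents);
    if let Some(trunk) = trunk {
        for rev in ancestors(trunk, parents) {
            marked.remove(&rev);
        }
    }
    marked
}
```

On a history root <- main <- f1 <- f2, importing a branch at f2 marks all four commits under the first strategy and only f1 and f2 under the second.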

Or maybe we can mark just the single commit that the branch points to. That actually works too: with the above export method (exporting the heads specifically) it's kind of idempotent?
And then when you jj new that commit the topic gets expanded to the child effectively advancing the branch, which was the point.
And say some commits were added to the git branch on the remote and you fetch - if the topic already existed I guess you can mark all the commits "between" those that were already marked and the newly pointed-to one.

Joy Reynolds · Answer 10 · Tue Apr 16 2024 06:34:46 GMT+0800 (China Standard Time)

a potential jj feature called "topics". They are somewhat related to same concept in mercurial, but their exact implementation and behavior in jj have yet to be decided.

This is supposed to explain the problem to solve, but it doesn't. Can you expand on the problem definition without referring to a VCS?

Colton Donnelly · Answer 11 · Fri Apr 26 2024 14:04:29 GMT+0800 (China Standard Time)

i'm personally interested in topics, and think that there can be some really neat tooling that's compatible with stacked diffs via this behavior specifically:

Non-unique, don't move: Arbitrary tags you could apply to commits. I've seen requests for this so that you could come up with a tag that you can exclude from the default revset, for example

imagine a command jj github pr create <topic>. this command creates a git branch of the same name as the topic, duplicates each commit in the topic, and rebases/merges them to be on top of each other within the git branch. jj github pr update <topic> might then perform a 'restack' (inserting/updating any git commits in the branch as necessary) before resubmitting the branch to the remote.

maybe this is better as 3rd party tooling, but nonetheless such behavior is unblocked by topics - just sharing my 2 cents on how this might improve my own workflow

Waleed Khan · Answer 12 · Mon Apr 29 2024 02:36:41 GMT+0800 (China Standard Time)

Pasting my comment from #3505 (comment) as I think it's also relevant to this discussion (particularly that I think Git branches satisfy multiple disparate workflows — we should consider how topics address those workflows):

We could consider this from the perspective of how topics intuitively work (/should work), and port the behavior to branches somehow (or change the jj model, use topics natively, and import/export branches somehow).

Some number of commits belong to a topic, and abandoning one of those commits doesn't automatically abandon the whole topic.
- If there is a parent commit in the same topic, then the imaginarily-exported Git branch would probably be repointed to that parent commit as the new topic head commit.
- If not, then there are no more commits in the topic (probably?), so the topic/branch would be deleted.
- Empty commits being skipped behaves the same.
This also answers the jj split question: a split commit surely has both of its successors join the same topic (by default), so then the Git branch would point to the new topic head, which would be the child commit.
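
The abandon rule above can be sketched as a small Rust function. This is only one reading of the comment: it assumes the abandoned commit is the current topic head, and for merge commits it simply takes the first parent that is still in the topic. `branch_after_abandon` is a made-up name.

```rust
use std::collections::{BTreeSet, HashMap};

/// New position for a topic's exported branch after its head is abandoned.
/// `None` means no parent is in the topic, so the branch would be deleted.
fn branch_after_abandon(
    abandoned_head: &str,
    members: &BTreeSet<String>,
    parents: &HashMap<String, Vec<String>>,
) -> Option<String> {
    parents
        .get(abandoned_head)?
        .iter()
        .find(|parent| members.contains(*parent))
        .cloned()
}
```

Here `members` is the topic's membership including the abandoned commit itself. A parent outside the topic yields `None`, matching the "topic/branch would be deleted" case.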

The confusing cases from the implementation perspective are when multiple branches point to the same commit, which doesn't exactly have a topic analogue.

I would say those cases are the exception. In such cases, branches don't implement the "feature branching" model — they implement something else that we should consider entirely separately. I think there are two main cases:

When you create a new feature branch "off of" another branch, the Git implementation requires you to create the branch first, and only then commit to it. I think it's actually pretty strange that they didn't collapse it into a helper operation. Whom does the intermediate state benefit?
- Consider the git checkout -b command, which many people (including me) use — why not git commit -b?
- I experimented with this workflow in git-branchless (git record accepts --create) and I think it's perfectly fine. Maybe a little better because you have to think about less ambient state, but a little worse because the first commit to a branch is treated differently than the later commits (assuming that you don't pass the same branch name to each commit operation).
- In jj+topics, it would probably be even easier. If you make a commit in Git to the wrong branch, then you have to 1) rewind the old branch and 2) create the new branch. With topics and one-topic-per-commit, then you could actually change the current commit's topic, in one operation, which is the operation you were trying to logically do anyways.
- The sliding behavior would be essentially irrelevant here.
When you're using long-lived development branches (like stable + devel).
- This is quite different than feature branching. It handles the case of merging in changes, rather than branching out changes, and I think it makes perfect sense to use different workflows for the two.
- Git happens to rely on the same auto-moving behavior of branches to handle both. But these branches are a lot more like tags/pointers to a part of the commit graph than feature branches/topics.
- The sliding behavior is not really relevant for the merge operation itself. It becomes relevant for consumers when they want to consume the newly-merged changes (i.e. jj git sync). Then the sliding behavior kicks in, and works only if there are still some unmerged commits on that branch. Otherwise, the merged branch gets slid onto main and sticks around undesiredly.

When you consider the sliding behavior for the feature branch workflow only, it's clear that it doesn't really add value by itself; it's a hack to work around the lack of principled feature branch tracking available in Git.

Waleed Khan · Answer 13 · Mon Apr 29 2024 02:48:01 GMT+0800 (China Standard Time)

To motivate "topics" more, as @joyously points out that there's not much detail in the thread, here's a poll (@jvns 2024-01-06):

poll: how do you think about git branches? (I'll put an image in a reply with pictures for the 3 options)

as with all of these polls obviously all 3 are valid, I'm curious which one feels the most true to you

(59%) 1. just the commits that "branch" off

(22%) 2. the history of every previous commit

(16%) 3. just the commit at the end ("branch = pointer")

(3%) other / show results

· 1,966 people · Closed

Notably, a majority of people don't think of branches in terms of how they're actually implemented. This leads to impedance mismatches in some workflows when users try to rely on Git to infer the commits that "belong" to a branch, when it turns out that the concept is not always usefully defined.

For example, there is no way in stock Git to rebase only the commits in a single branch in a stack: with git rebase, you have to either explicitly define the start of the range to rebase (i.e. look up the "parent" branch manually and provide that) or use the implicit default (calculate the merge-base and use that as the start of the range).

Topics are a possible solution that actually matches the typical user's mental model and workflows.

I'll also suggest that having a "currently-checked-out branch" is one more piece of global contextual state that the user has to keep in mind. It might be that there's a pleasant solution to reduce that complexity (but I'm not sure if topics provide it or not).