py-mine / mcproto

Library providing easy low level interactions with minecraft protocol

Home Page:https://mcproto.readthedocs.io

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Decide on final versioning structure

ItsDrike opened this issue · comments

This issue currently block most of the API development of the project

Originally, I intended this project to have relatively simple versioning, by simply splitting the packets folder into several version subdirectories like v757, holding all of the packets for that specific version. This was the intuitive way to handle this, that didn't really require much thought.

However after actually designing this out, it turned out that this kind of versioning brings a few issues in:

Code duplication

Obviously, having version folders with the packets for corresponding minecraft versions will mean the content in these folders will unavoidably have to be duplicated. At the very least, this duplication would be structural, i.e. in the file structure, so each version would have say a folder for handshake state packets, login state, ... and each of those folder would then have files containing the actual packets. This alone is pretty annoying, but it's not the worst thing about it.

As to the actual code in these files, there's 2 choices we can take:

  • Copy-paste method

    The obvious thing that comes to mind is to simply copy-paste the packets into version folders. However this means a complete code duplication (and therefore probably also test duplication), which comes with the common code duplication downsides, that is, if the code needs to change for whatever reason (say there's a bug), it now needs to change everywhere.

    This leaves you having to make the same change in potentially dozen of files, which are mostly the same, but might have slight differences that you potentially need to pay attention to. This gets super annoying very quickly, it's hard to review, and can lead to a lot of issues where you don't pay attention to the slight differences between the versions and miss something.

  • Inheritance method

    This method reduces a lot of direct code duplication, by utilizing inheritance. Simply, we can just have all of the packet classes in a new version inherit from the packet classes from the previous version, and only override the methods that need changing (if any), to handle the differences between protocol versions. This would result in code like this:

    from mcproto.packets.v757.handshaking.handshake import Handshake as OldHandshake
    
    class Handshake(OldHandshake):
       ...

    On one hand, this avoids a lot of code duplication, but it brings in a whole set of other issues, related to this method quickly becoming a dependency hell:

    • Performance

      If all of the classes inherit from the previous version, the more versions we add, the less performant the new packets will be, as they'll have bigger and bigger MRO. I'm not sure just how big this performance decrease would be, but it's certainly there. It also becomes a pretty big import chain, as we're importing the class from previous version, which is doing the same from the previous, etc. Which means loading more code into memory. Overall, it's just not a great place to be in performance-wise.

    • Orientation in code

      Another big issue is that such a dependency chain would make it very hard to actually know what's going on in a particular packet class. It would mean that you'd have to manually follow the dependency chain down to the class where the function you're looking for was last overwritten, and check at how it works.

      The madness comes when that class is making calls to more inherited internal functions, or utilizing super to call the previous version of that method. At that point, you need to keep in mind what this function is doing, follow the chain to find the super call, and put that together in your mind, and worse off, that function can also be making a super call.

      It then becomes absolutely impossible to even just figure out what the code for that specific packet class would even look like when it's all put together, and very mentally straining to try and understand how the code works.

    • Typing issue: 1 `final` classes

      Using inheritance like this actually brings in another issue, this time, a typing related one. The issue is that we're marking the packet classes as @typing.final, which makes the type-checker understand that this class is ready to be used, and needs to contain implementations for any abstract classes (coming from the Packet class ABC). This is really nice, as it avoids us forgetting to implement some method, and immediately results in typing errors when the Packet ABC would gain another abstract method, etc. However the problem is that final classes aren't meant to be inherited from. This means that any time we use a final class as a base for another class, we'd need a type ignore.

    • Typing issue 2: Breaking LSP

      While the first typing issue isn't that huge, there's yet another typing issue, that's more fundamental. This one is about the "Liskov Substitution Principle" / LSP. Essentially, LSP talks about how a class that inherits from some other parent class should keep all of the parent's methods fully compatible, so if a function takes an instance of class X, it can be passed class Y that inherits from X without issues.

      However with this use-case, we'd be completely breaking this principle, since we can't guarantee that the child classes will be the same, due to potential unexpected protocol changes, that will need method overrides which might even have a different signature. What this comes down to is that the child packet classes won't actually be proper "childs", OOP-wise, and so this inheritance is fundamentally flawed. It would technically work, but it's really weird, and would likely need a lot of type-ignores.

    • Missleading isinstance checks

      Along with this comes the issue of users potentially wanting to run an isinstance check, verifying that some packet they received is actually packet of type X. Issue now is that if the packet they're checking would be say a Ping packet from version 758, and they'd be checking if it's an instance of a v757 packet, this check would pass, because our new packets inherit from older versions. This may, or may not cause a lot of issues to the users.

    Clearly then, the inheritance method isn't actually a very good choice either.

Versioning other things

Type classes

The minecraft protocol has some shared types between the packets, such as a type that holds NBT data, or a type holding Chat data. These, as anything in the protocol, are subject to change between different protocol versions. This means that we might need to introduce versioning not just for packets, but also for types. As an example, we can have mcproto.types.v757.chat.Chat class, representing the chat type.

  • Issues with inheritance based packets

    Unfortunately, versioning types causes even more typing trouble with inheritance based packet classes. The versioned packets would now need to use the appropriate versioned types, so a v757 packet should only use v757 type.

    This would however mean that since each packet inherits from the previous one, the type in the previous one would not be compatible anymore. This means we'd need to modify the signature of every function to change the type annotation to now only accept the new version of the type class. That is for all functions which relied on this type, which will at the very least include every __init__ for packets that used a custom type.

    This alone breaks LSP for every such packet class that had it's type overwritten, meaning we'll need yet more type ignores, and it introduces yet more inconsistency in our class structure, as these parent classes make no sense type-wise, they're just shorthands to avoid code repetition.

  • Copy-paste method packets would technically work well though

    If the packet classes each had their own rewrite, it would simply have to be adjusted everywhere, and it means these types will also have their own versioned rewrites for each version, meaning no typing issues at all.

Then again, these types probably shouldn't really change between the versions, and 1 implementation might even be sufficient, but the issue is that they might change, and if that does happen, we'd have to release a completely breaking change, suddenly introducing versioning for the types.

Connection classes??

What if something fundamentally changes between the protocol versions, and say writing varints becomes different, should we also version the connection classes? If so, it would mean having to maintain different, but very similar, codebases for all of the supported protocol versions, which I'm not a huge fan of.

While the way connection is handled is very unlikely to get changed, as it was like this for an incredibly long time, we can't actually be sure that it won't change...


Ok, so what now?

Well, a decision on how to proceed needs to be made, because before that, any further progress on implementing new packets is sort of blocked, as most of the new packets will already require some custom types, and without knowing whether to version them or not, a lot of extra work might end up being spent on something that'll need rewriting anyway, due to a different decision about how to handle this being made here.

Potential ways to handle this

  • Make the library only support a single (latest) version

    This would be the simplest and most elegant solution. No code duplication, no inheritance hell, and very simple, clean and readable codebase, that simply goes with the flow as minecraft protocol changes.

    However it isn't without issues:

    Library Versioning

    The library is currently using the semantic versioning model, which has 3 main version parts: major, micro, patch versions + optionally some extra optional info at the end. Semver standard states very specifically when should there be an update to which of these version components, that is, non-breaking bugfixes should be a patch version bump, new features should be a micro version bumps and only if there are some breaking changes should there be a major version bump.

    It's safe to assume that any protocol version bump would come with a major version bump, as even if the protocol versions were essentially compatible (which does happen), we're officially dropping support for the old version, which qualifies as a breaking change.

    However, what if we make a breaking change without a protocol version bump? Does that also mean a major version update? Semver says it should! However this kind of versioning could end up being pretty confusing to users, and it could be really hard to then understand what minecraft/protocol version is supported in say mcproto v8.4.2.

    The alternative solution would be to use the minecraft protocol versions as the major versions, and even on a breaking change, we'd just change the minor version. This isn't great, as many package managers, like say python-poetry or pipenv don't expect breaking changes from a micro version update, and so users depending on the library would need to specify the dependency in a very specific way, to make these tools recognize when it's safe to update mcproto.

    Another issue with choosing this versioning approach, with protocol version as the major component is that it would be very hard to move away from. Even now the protocol version is at v762, so if we wanted to move away, our next versions would need to have even bigger version numbers, which is super annoying, and could be very confusing as people might've gotten used to it being the protocol version.

    Maintaining older versions?

    If we do go this route, do we now need to consider whether to maintain older versions of mcproto, for previous protocol versions? i.e. for mcproto v757.5.2, we could fix a bug and release v757.5.3, while focusing feature development on the newever v758.X.Y versions. Of course, this assumes we'd move to the protocol version based project versioning.

    Without moving to this type of versioning, and staying at semantic versioning, we'd be facing a pretty big potential issue, that is, if mcproto v7.2.1 is for protocol version v757, and there's already mcproto v8.0.0 for v758, however a bug is found in the older version, that requires a breaking change, we can't just bump the major version as semver would tell us to, as 8.0.0 is already occupied.

  • Use git branches for different versions, each branch only having support for it's version
    • Essentially, this is just full code duplication, except this time it's not just code copied between files, it's between branches.
    • How to ship this?
  • Version the entire project, make mcproto.v757, mcproto.v758,... with very little being shared

    This would essentially be full code duplication of huge amount of code, with the obvious issues coming with code duplication.

  • Only version what needs to be versioned (packets, and perhaps types)
    • What if something more basic changes? Like in the connection?
    • Do we choose full code duplication in the individual versions, or inheritance (with typing issues)?
  • Something I'm not thinking of

My current opinion

I'm inclined towards just supporting a single version, and probably moving away from semantic versioning to a versioning scheme uses the packet version as the major component. We could probably even get away with versions consisting of 4 parts, protocol version + normal semver, not sure how well would tools like poetry play with this though, and it would mean there's pretty much no going back, as this would immediately make our major version a pretty big number, which would be very annoying if we go back and it could cause a lot of confusion.

As for maintaining older versions, I'd probably be open to bug fixes there, but I wouldn't carry features over, at least not in most cases. Also, these older versions wouldn't keep being supported forever, and there should be a time scheduler there.

The packet versions are mostly compatible, and so supporting only the latest version will generally not cause issues and will remain backwards compatible with the newer protocol versions. While there may be changes, it's usually purely to say, introduce a new packet, or a new type, however the general networking and structuring behind most packets shouldn't really change.

This means that the library should probably remain mostly compatible with the older protocol versions, even if it only supported a single (latest) version at a time, especially for simpler things, such as getting server status. The server could complain if we try to login from an unsupported (newer) version, however we can "trick" the server into believing that we're using the appropriate version, when really we're using a much newer one, and just continue anyway.

It's important to say that doing this could obviously cause issues and depending on how deep the interactions are, it certainly could cause problems. Like if this library would be used as a full bot account, that interacts with the world, as we might end up sending something that the server can't yet understand, even though it's already supported in the newer protocol spec.

Because of this, and because of the complexity of the other choices, I think the way forward will be to keep using semantic versioning normally, as protocol version updates usually aren't completely breaking updates and only consider doing something else if this approach would end up causing too many issues.

This isn't a particularly easy decision to make, but I really want this library to move forward with the development, and this issue is a major blocking point, which simply needs to be addressed somehow. As there wasn't much feedback from others on how to handle this (which isn't surprising as the library is still heavily WIP), I will just follow what I feel like will be the best, which for now is the above approach.

A suggestion was made by @LiteApplication to create a class decorator that could limit a type/packet to only a certain range of protocol versions.

With a proper game state class in place, it should be possible to implement a decorator to target a specific slice of protocols (500->700) and make it so that any code called from the game state class is the one that matches the game state class

— See the original message on Discord: here

This idea still needs some more bikeshedding, as it's not at all fully thought out, but I do like the sound of it. The way I see it, there's 2 options that I see for that kind of approach:

  1. Purely informative approach, with the decorator simply setting something like __mcproto_protocol_version_span__ dunder on the class, users can check it and make sure it matches the version they wish to be using.

    Perhaps mcproto could even contain some logic for this checking in places where the protocol version is known, such as in the client/server classes (once merged) and raising an exception in case of a mismatch.

    This will mean mcproto will still only focus on supporting the latest protocol version, however, it will be very clear which packets can be re-used for older versions, making it potentially easier for people to know what packets they might need to implement themselves if they need support for older stuff and just generally easier for maintainers to know and keep track of which versions our types support.

  2. Active approach, that somehow lets us maintain multiple versions of types/packets, marked for use on given protocol versions though this decorator.

    The previous suggested approach for this scenario (having the library support multiple protocol versions at once from a single mcproto release) was essentially just considering making directories for each protocol version, with full copies of the classes in each. This approach allows us to avoid a bunch of code repetition that would otherwise be necessary for the originally suggested approach and that's without the considered method of using base classes that include some currently shared logic which still wouldn't be guaranteed not to change though.

    Sufficed to say, I like this. It seems way more approachable and maintainable, but there are still things to resolve here:

    • Typing and runtime checking issues

      Like with the other approaches that I've mentioned, the issue with type-checkers appears here too. That's simply because we will still need to somehow maintain multiple versions of the same type/packet class, so if we have a method that expects a certain packet type, we'll need to annotate it somehow, do we just use a big union of all of these classes? But they might differ, maybe a lot.

      This is actually not only a typing issue, but even a packet reading/receiving issue, since isinstance checks are the general way that we sort out the packets that were received, but which class do we check for now, all of them?

    • Naming/File location issues:

      Currently, the packet/type classes have a simple naming scheme, that just reflects the name of that packet/type, as shown in wiki.vg. If we'll have multiple classes that handle the same packet, we will need to choose a different naming scheme to avoid running into conflicts. Will the protocol version simply be a part of that name? But it's a range of versions and it's likely to change (as new protocol versions come out, but don't change the existing packets/types, the supported range changes, does the name of these types change with that too? That'd make for a really bad UX). Should it just be the start protocol version of the range maybe, or simply something like v1, v2, depending on how many classes we have, regardless of the actual protocol versions?

      Additionally, where will these classes be defined? In the same files right next to the existing ones? But we often use files that contain a bunch of such classes, say 10, will they now need to also hold all versioned variants? Won't that quickly get out of hand? Should we instead make files for each type, or go even crazier, and make files for each version of each type? If these classes do exist in different files, it at least somewhat mitigates the naming issue, as they could now have the same name, but it's probably still not a good idea, as it'd be very annoying if someone needed to import both versions from the same place for whatever reason.

    • User experience

      We need to consider how this will affect the library users too, with multiple versions in place, the users will need to pick the proper class to send. Depending on how the naming is handled, this may not be that bad, but it's still far less convenient than the simple imports for these classes that we have now.

    • Version picking implementation

      I actually think that this is almost a resolved issue. Mcproto actually contains the logic to dynamically construct packet tables already, all we'd need is to make sure we collect all of the classes (so all versions). Then, while reading the packets (only reading, not writing, since with writing, the user actually initializes the class itself), we would need to also know the protocol version (another argument to the reading function) and then when picking the appropriate packet from the table to be returned, we just pick the type where the protocol version lies within the supported range.


I'd say even if we just did this with the 1st option, and kept it purely informative, it is better than nothing and it will lie within the decision I made to have mcproto only support the latest protocol version.

The thing is, while the 2nd option is certainly much more viable than the previously suggested options, it's still not great and clearly has its issues, not entirely unlike those I mentioned before with other approaches. Unless there's something I haven't thought of, this approach would inevitably worsen the experience of others using this library, so the question is, whether it's worth it. The mc protocol does change a fair bit with each new release and projects relying on mcproto might need that older version support, so maybe it is worth it, but it just comes with a lot of hassles for maintaining.

Regardless of which approach we will end up taking, the idea of the class decorator that defines the supported protocol versions is great, and should be implemented, even if only for informative purposes.

I'm dropping the priority of this issue from critical to high, since this was a resolved issue and it's no longer blocking pretty much all development. If option 2 is chosen, it will still mean significant changes, but probably not a complete restructuring.

The critical aspect of this issue was already addressed via creation of the system to dynamically collect all defined packets and use those for reading, which was the functionality that this issue was blocking almost any further development.