jamescourtney / FlatSharp

Fast, idiomatic C# implementation of Flatbuffers

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Mapping flatbuffer value type structs to existing C# types

vvuk opened this issue · comments

For (large) convenience, it would be nice if we could annotate that some types defined in the .fbs should map to already existing C# types. For example:

struct Vec3 (fs_valueStruct) {
    x: float;
    y: float;
    z: float;
}

would be nice to write:

struct Vec3 (fs_valueStruct, fs_existingType=System.Numerics.Vector3) {

and have the schema compiler just use System.Numerics.Vector3 and not emit Vec3 at all. This would really only work for ahead-of-time schema compilation (as Vector3 won't have attribute annotations), and where the type already has identically-named fields; so probably low number of types that match this, but really common and useful ones. (Vector/Math types, GUIDs, etc.)

This is an interesting idea -- thanks for this! I think FlatSharp would likely need to validate that the layout of the struct was the same as the layout declared. However, I'm not sure if .NET makes promises not to change the layout of structs between versions (or even patches), so there would need to be some serious runtime validation to ensure that the layout is actually correct.

Out of curiousity, what's wrong with this approach?

unsafe int Main()
{
	CustomStruct s = new CustomStruct { X = 1, Y = 2, Z = 3 };
	ref Vector3 v = ref Unsafe.As<CustomStruct, Vector3>(ref s);
	return (int)(v.X + v.Y + v.Z);
}

[StructLayout(LayoutKind.Explicit, Size = 12)]
public struct CustomStruct
{
	[FieldOffset(0)]
	public float X;
	[FieldOffset(4)]
	public float Y;
	[FieldOffset(8)]
	public float Z;
}

Unsafe.As would work -- it's just a lot of casting that would be nice to avoid, given how clean code using FlatSharp types normally looks. There also may be more complex types (e.g. NativeArray<Vector3>/Vector3[]) that are harder to cast.

That's fair commentary. I agree that Unsafe.As can be cumbersome. I think the following syntax would work in principle:

struct SomeStruct (fs_valueStruct, fs_existingType="System.Numerics.Vector3")
{
    X : float;
    Y : float;
    Z : float;
}

The challenge will be validating at runtime that Vector3 is equivalent to the definition we've laid out. It might even be enough to validate that Vector3 is the same size as SomeStruct.

The thing that concerns me about this idea is that Unsafe.As is, well, unsafe. Today, if you use Unsafe.As on a struct that FlatSharp gave you, and it goes wrong, then that's not FlatSharp's problem. But if we expose this in FlatSharp, then there needs to be lots of confidence that the feature won't corrupt data.

Please don't interpret this as me saying that I won't add it or that it's a bad idea. I think it's interesting. I just need to noodle on it a bit more.

I've given this a small amount of thought, and I think this is going to be a "use it at your own risk feature". A rough implementation might be:

  • Marking it as "extremely unsafe -- you should test this yourself for any use cases". Some sort of scary annotation like fs_unsafeValueStructMaping:"Vector3".

  • We'd introduce a static initialization check for these types that happens at runtime and simply does a fast fail if the size of the mapping struct we evaluate at runtime is different than the size of the FlatBuffer struct. But otherwise, I don't think we should try to do anything clever because there are cases where structs are defined with overlapping fields, which gets messy.

  • Excellent documentation about the risks and drawbacks/risks of this feature.

I'll pickup looking at this next @jamescourtney

@jamescourtney I was thinking about this issue. Would the target alias type need to be available at AOT compile time for the RoslynSerializerGenerator to reference? If so, I was thinking if this could mapping/aliasing could be done at another/final CodeWritingPass? Or perhaps during the SerializerAndRpcGeneration step as that doesn't seem to perhaps a C# compilation for validation?

Another Option 1:
Alternatively, I would need a way to specify additional reference assemblies to the FlatSharpCompiler?

Another Option 2 (I think I like this if it will work, no changes needed):
Could I "recreate" stub definitions of my target types in another referenced fbs file, but not compile/use the generated C# when compiling later? For example, Unity has a UnityEngine.Vector3 struct. I suppose I could define:

// UnityEngineStubs.fbs

namespace UnityEngine;

struct Vector3 {
    x:float;
    y:float;
}

And then reference this definition in another fbs file I am processing with FlatSharpCompiler. However, when I then take the generated C#, I would not include UnityEngineStubs.fbs and instead satisfy type usage via referencing prebuilt Unity assemblies.

Hmm, I tried the last option and for some reason my type from a different fbs file was included in the generated C# for an including fbs file. This doesn't seem to be the case with the Includes sample so perhaps I'm missing something.

Okay, I can confirm at 6.3.3 that each fbs file got their own individual generated C# file. On main types from all referenced/included fbs files gets written to a single FlatSharp.generated.cs file.

Was this change intentional as the old behavior would be useful for me now :-)

Yes, this is quite intentional and is one of the big things in V7. Let me explain the rationale a little bit. In 6.X and before, all generated serializers are self-contained, so you get something like:

public class TableA
{
    private class GeneratedSerializer { ... }
}


public class TableB
{
    private class GeneratedSerializer { ... }
}

Each of those GeneratedSerializer classes is completely self-contained for its entire graph of dependencies, which is where the problem comes from.

So, for schemas like this, we end up with lots of duplicate code:

table A (fs_serializer) { Common : C }
table B (fs_serializer) { Common : C }
table C {
   // tons of fields
}

The code duplication comes from having two equivalent-but-distinct serialize/parse paths for C. As you can imagine, the CPU cache and branch predictor are less than impressed with this design.

So, in V7, things are switched to only emit one Serializer, one Lazy Parser, one Greedy Parser, etc per type. Types each have their own namespace (computed as the SHA256 hash of the type name). As a result, the FlatSharp compiler is now much more greedy about the code it generates. Since different FBS files may end up generating code for the same included types, the FlatSharp compiler now accepts multiple FBS files and just aggregates them all together before dumping all the code into FlatSharp.generated.cs:

FlatSharp.Compiler.exe --input "a.fbs;b.fbs"

That's the end of the explainer. If you're looking for usage, check out FlatSharpEndToEndTests.csproj, which uses the compiler with some includes.

Can you let me know what isn't working? Happy to help how I can!

Can you let me know what isn't working?

Sorry, everything is working. However, the change eliminates the ability for "Option 2" above, where I was going to define a UnityTypes.fbs file that would be included by other fbs files, but then never include the UnityTypes.generated.cs in the compilation and instead reference existing Unity assemblies for those types.

Could we accomplish the same by having a fs_external attribute that tells the compiler to reference but not generate?

Could we accomplish the same by having a fs_external attribute that tells the compiler to reference but not generate?

This is really easy for me to add. Let me know if you think this would solve the problem and I can throw it together pretty quickly.

This is really easy for me to add. Let me know if you think this would solve the problem and I can throw it together pretty quickly.

How would the current code generation/compile loop happen? Would we pass extra assemblies to FlatSharpCompiler similar to what I did for the Unity native array PR? Or could we avoid the need for references altogether at FlatSharp compilation time?

And thanks, I'll obviously take any help you are offering! :-)

How would the current code generation/compile loop happen? Would we pass extra assemblies to FlatSharpCompiler similar to what I did for the Unity native array PR? Or could we avoid the need for references altogether at FlatSharp compilation time?

I realized I tagged you in that PR (#322) but never actually explained this.

Basically, the trick FlatSharp does is that it compiles in passes. The first pass just emits the table/struct/enum/union definitions with attributes, as if you were using Runtime mode. The second pass converts reference structs to value structs where necessary and makes a few other changes with the benefit of the type model we built in pass 1. The final pass emits serializers and RPC definitions. At each step, the assembly from the previous step is fed in, which is the trick. Why does it work like this? The short answer is that I built the runtime/reflection stuff first and realized that I could do AOT/FBS pretty easily by reusing that work.

So, for external types, when a type is marked "external", it means we'll emit it in these intermediate assemblies we create based on the definition in the .FBS file. However, in the final pass, we just don't output that type itself, though serializers for the type will be emitted. This means that FlatSharp can run with full knowledge of the schema and not need any additional assembly references.

It's a little convoluted. Let me know if this didn't make sense :)

I'm going to close this issue, but wanted to give a small update on what I mentioned about generics being tricky: They are way trickier than I thought. I spent about 10 hours over the last couple of days trying to get them working without any luck. The short version is that FlatSharp cannot reason about open generics, and due to the recursive nature of the FlatSharp compiler, you have to have the real type at each step of the process, so it's turtles all the way down.

There is one last hack I could do to support them, but it's entirely unpleasant: Detect generic types in extern definitions, substitute those for some "fake" name and namespace (Random GUIDs or the like), and after all the C# has been generated, do a find/replace for those Random GUIDs and map in the original class name. This allows FlatSharp to generate "non-generic" code and just swaps in the final type name at the last moment.

So, my current thought is that if you need a generic type, simply wrap it in a containing non-generic struct:

struct GenericWrapper
{
    public System.Numerics.Vector<byte> Vector;
}

And then reference GenericWrapper as your extern definition in FBS. It's not perfect, but I'm more confident in this than some hack at the moment.