Planned features and improvements for Ceras v5
rikimaru0345 opened this issue
This issue tracks ideas for new features and improvements for the next version of Ceras (v5).
Breaking Changes
The following are changes that can't be implemented in the current version (v4) because they change the binary format Ceras uses.
ReferenceFormatter (Done!)
Together with all the other changes, there is a chance to optimize the most common code paths taken by the ReferenceFormatter<>!
It turns out that back-references are relatively rare, so we can make them into a special case!
So we can switch from a VarInt to a fixed 1-byte prefix that tells us about the upcoming data.
- Simple cases:
  - Null
  - NewObject: followed directly by the object data
- Extended cases:
  - NewDerivedObject: followed by a Type and then the object data
  - InlineType: followed by a Type, which is used as the object itself
  - ExternalObject: followed by a fixed Int32 for the external ID
  - BackReference: followed by a fixed Int32 for the ID of the already encountered object
The most common cases by far (99%+) are Null, NewObject, and NewDerivedObject.
The important thing to note here is that all the common cases will profit from much faster read/write performance.
Surprisingly, all the uncommon cases will be faster as well: even though they need one additional byte, there are fewer branches in total (a VarInt alone already contains 4+ branches).
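To make the prefix idea concrete, here is a small sketch in Python (used only as neutral pseudocode; Ceras itself is C#). The tag byte values, the little-endian Int32 layout, and the names `write_ref`/`read_ref` are assumptions, not the actual Ceras format. The Type-carrying cases (NewDerivedObject, InlineType) and ExternalObject are omitted for brevity.

```python
import struct

# Hypothetical tag values; the byte values Ceras really uses may differ.
NULL, NEW_OBJECT, NEW_DERIVED_OBJECT, INLINE_TYPE, EXTERNAL_OBJECT, BACK_REFERENCE = range(6)


def write_ref(out: bytearray, obj, seen: dict, write_body) -> None:
    """Write one reference: a 1-byte tag, then whatever that case needs."""
    if obj is None:
        out.append(NULL)
        return
    if id(obj) in seen:
        # Rare case: a fixed 4-byte object ID instead of a VarInt.
        out.append(BACK_REFERENCE)
        out += struct.pack("<i", seen[id(obj)])
        return
    seen[id(obj)] = len(seen)  # IDs are assigned in first-seen order
    out.append(NEW_OBJECT)  # common case: the object data follows directly
    write_body(out, obj)


def read_ref(data: bytes, pos: int, objects: list, read_body):
    """Return (object, new_position) after a single dispatch on the tag byte."""
    tag = data[pos]
    pos += 1
    if tag == NULL:
        return None, pos
    if tag == BACK_REFERENCE:
        (obj_id,) = struct.unpack_from("<i", data, pos)
        return objects[obj_id], pos + 4
    if tag == NEW_OBJECT:
        slot = len(objects)
        objects.append(None)  # reserve the ID before reading nested objects
        obj, pos = read_body(data, pos)
        objects[slot] = obj
        return obj, pos
    raise ValueError(f"unsupported tag {tag}")
```

Every read starts with exactly one byte and one branch on it; only the rare BackReference case pays for four extra bytes.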
- Rework ReferenceFormatter using the scheme described above
Type Serialization (Done!)
- Type Codes (for framework types)
In the rare case that Ceras has to embed a Type into the binary, it is written using its full name, which is perfectly fine for user-defined types.
But for framework types (like List<>, Int32, ...) we could write something like a "builtin type-code" to save space.
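A minimal sketch of a type-code table, assuming a 1-byte code for a handful of framework types and a 0 marker followed by a length-prefixed full name for everything else. The table contents and the marker value are hypothetical. A real implementation would also have to encode generic arguments (e.g. the `int` in `List<int>`), which this sketch leaves out.

```python
# Hypothetical code table; which framework types get codes is an assumption.
BUILTIN_TYPE_CODES = {
    "System.Int32": 1,
    "System.String": 2,
    "System.Collections.Generic.List`1": 3,
}
CODE_TO_TYPE = {code: name for name, code in BUILTIN_TYPE_CODES.items()}
FULL_NAME = 0  # marker: a length-prefixed full type name follows


def write_type(out: bytearray, full_name: str) -> None:
    code = BUILTIN_TYPE_CODES.get(full_name)
    if code is not None:
        out.append(code)  # framework type: a single byte
        return
    encoded = full_name.encode("utf-8")
    if len(encoded) > 255:
        raise ValueError("type name too long for a 1-byte length prefix")
    out.append(FULL_NAME)
    out.append(len(encoded))
    out += encoded  # user-defined type: the full name, as before


def read_type(data: bytes, pos: int):
    """Return (type_name, new_position)."""
    code = data[pos]
    if code != FULL_NAME:
        return CODE_TO_TYPE[code], pos + 1
    length = data[pos + 1]
    start = pos + 2
    return data[start:start + length].decode("utf-8"), start + length
```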
AotGenerator (Done!)
- Instead of generating the content of the formatters using strings, the new AotGenerator should actually construct the expression tree that the "normal" Ceras uses (when not using VersionTolerance), and then convert that into a source-code string. That way we get some huge advantages:
  - Improvements to DynamicFormatter will automatically be available in generated formatters!
  - Drastically reduced potential for bugs, because the generator no longer has to essentially "rewrite" what the DynamicFormatter does.
  - Performance features like merge-blitting are automatically implemented in AOT code as well!
- Split the old AotGenerator.exe into a .dll and an .exe, so that usage in Unity is much easier. There could be a Unity script that automatically listens for changes and recompiles the
- Add an attribute to generated formatters. That way, when CerasAutoGenConfigAttribute is used, Ceras knows that it should ignore any old generated formatters while regenerating them.
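The core of the AotGenerator idea is that one description of a formatter drives both the runtime path and the generated source. The sketch below models that with a toy "tree" in Python: `execute` stands in for compiling the expression tree at runtime, and `to_source` renders the very same nodes as C#-like text. Node types, function names, and the method names in the emitted text are placeholders, not Ceras or System.Linq.Expressions APIs.

```python
import struct
from dataclasses import dataclass


@dataclass
class WriteInt32:
    member: str


@dataclass
class WriteString:
    member: str


def build_tree(members):
    """Build the formatter description once from (name, kind) pairs."""
    nodes = {"int": WriteInt32, "string": WriteString}
    return [nodes[kind](name) for name, kind in members]


def execute(tree, obj) -> bytes:
    """Backend 1: walk the tree at runtime (stands in for a compiled expression)."""
    out = bytearray()
    for node in tree:
        value = getattr(obj, node.member)
        if isinstance(node, WriteInt32):
            out += struct.pack("<i", value)
        else:
            encoded = value.encode("utf-8")
            out += struct.pack("<i", len(encoded)) + encoded
    return bytes(out)


def to_source(tree, type_name: str) -> str:
    """Backend 2: render the same tree as C#-like source text (placeholder method names)."""
    lines = [f"public void Serialize(ref byte[] buffer, ref int offset, {type_name} value)", "{"]
    for node in tree:
        method = "WriteInt32Fixed" if isinstance(node, WriteInt32) else "WriteString"
        lines.append(f"    {method}(ref buffer, ref offset, value.{node.member});")
    lines.append("}")
    return "\n".join(lines)
```

Because both backends walk the same nodes, a new node kind (say, a merged blit of adjacent fields) would show up in both at once, which is the advantage listed above.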
Encoding (Done!)
- Improved String Encoding
Currently we have to iterate over every string twice, because we must know how many bytes it will require (using GetByteCount). That takes time. Another approach is to guess the byte length, write, and then check whether the string has to be relocated (in other words, do it all again).
Every serializer I know of does one of those two things.
I would prefer Ceras to be more efficient by encoding strings with a smarter scheme. The idea is to write up to 254 bytes and then, if there are still characters left, encode the remaining ones in one big block.
The only blocking issue here is that String.Create is only available in .NET Standard 2.1, and without it we'd take a performance hit at deserialization time (having to allocate a char array, then creating a new string from it). However, the performance impact might be negligible (memcpy is much faster than the UTF-8 decoding step), the char array can be thread-local and recycled, and we can completely avoid the hit on netstandard2.1 later; whereas we'd have to live with the less efficient encoding forever if we don't make this change now.
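One possible single-pass layout is sketched below. It is an assumption about how the "254 bytes, then one big block" idea could look, not the format Ceras v5 actually uses: a prefix byte of 0 to 254 means the whole string is that many UTF-8 bytes; a prefix of 255 means exactly 254 bytes follow, then a fixed UInt32 byte count and the rest of the string. The chunk boundary may split a multi-byte character, which is harmless because the reader decodes chunk and block together. Python strings are sequences of code points rather than UTF-16, so this shows the layout, not the performance.

```python
import struct

CHUNK = 254  # bytes that fit behind the 1-byte prefix
MORE = 255   # hypothetical marker: "chunk is full, a big block follows"


def write_string(out: bytearray, s: str) -> None:
    start = len(out)
    out.append(0)  # placeholder prefix, patched once we know which case we're in
    used = 0
    for i, ch in enumerate(s):
        encoded = ch.encode("utf-8")
        if used + len(encoded) > CHUNK:
            # Chunk is full but characters remain: fill it to exactly 254 bytes
            # (possibly splitting this character), then write the rest as one block.
            split = CHUNK - used
            out += encoded[:split]
            rest = encoded[split:] + s[i + 1:].encode("utf-8")
            out[start] = MORE
            out += struct.pack("<I", len(rest))
            out += rest
            return
        out += encoded
        used += len(encoded)
    out[start] = used  # common case: one pass, 1-byte length prefix


def read_string(data: bytes, pos: int):
    """Return (string, new_position)."""
    prefix = data[pos]
    pos += 1
    if prefix != MORE:
        return data[pos:pos + prefix].decode("utf-8"), pos + prefix
    chunk = data[pos:pos + CHUNK]
    pos += CHUNK
    (rest_len,) = struct.unpack_from("<I", data, pos)
    pos += 4
    rest = data[pos:pos + rest_len]
    return (chunk + rest).decode("utf-8"), pos + rest_len
```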
Config
- Ability to configure the formatter per member!
- "Late initialization" to allow changes to the TypeConfig for as long as possible (until the first de/-serialization)
- Ensure that the declaring types of members that use a custom formatter are in fact handled by DynamicFormatter
- config.IntegerEncoding
  Allow users to decide when they want to use fixed encoding vs. variable encoding. For example, if you use Ceras for networking, you want as much compression as you can get; every CPU cycle spent on sending less data is worth it. So you could opt to encode all int, short, long, ... with variable encoding, making Ceras use WriteUInt32 instead of WriteUInt32Fixed.
  Or, if your aim is to save data to disk (save-game, settings, level data, game database...), you want things to go fast, so you can always use fixed encoding, which is larger (always 2/4/8 bytes) but much faster.
  - UseReinterpretFormatter is now superseded by IntegerEncoding
- PersistentName for types and members. Influences member order. Enables Ceras to work together with obfuscators.
  - Add this new setting to TypeConfig
  - Automatically set by [MemberConfig], [DataMember], or member.Name
  - Maybe have "config.OnGetPersistentName", so the user can do all sorts of trickery (maybe having encrypted type names even in the attributes, only decrypting them when Ceras needs them)
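For the IntegerEncoding idea, this sketch contrasts a fixed 4-byte encoding with a common LEB128-style varint (7 bits per byte, high bit meaning "more bytes follow"). The `IntegerEncoding` enum and the function names are hypothetical stand-ins, and Ceras's own VarInt layout may differ. Signed values would additionally need something like ZigZag encoding so that small negative numbers stay small.

```python
import struct
from enum import Enum


class IntegerEncoding(Enum):  # hypothetical stand-in for config.IntegerEncoding
    FIXED = "fixed"
    VARIABLE = "variable"


def write_uint32_fixed(out: bytearray, value: int) -> None:
    out += struct.pack("<I", value)  # always 4 bytes, no loop, no branches


def write_uint32_varint(out: bytearray, value: int) -> None:
    # 7 payload bits per byte; the high bit says "more bytes follow".
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)


def write_uint32(out: bytearray, value: int, encoding: IntegerEncoding) -> None:
    if encoding is IntegerEncoding.VARIABLE:
        write_uint32_varint(out, value)
    else:
        write_uint32_fixed(out, value)


def encoded_size(values, encoding: IntegerEncoding) -> int:
    out = bytearray()
    for v in values:
        write_uint32(out, v, encoding)
    return len(out)
```

Small values shrink from 4 bytes to 1, while values near the top of the range grow to 5 bytes and every write loops and branches, which is the size-vs-speed trade-off described above.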
Version Tolerance
Right now (in v4) when an application wants to read any binary data with Ceras, it must already have the correct Types (classes and structs). In other words, the format must be known.
This is fine for very high-performance use cases, but sometimes you want to trade a bit of performance for more leeway in terms of compatibility.
For example, a server-client network scenario where the client is slightly outdated...
Or another scenario would be an application wanting to inspect or even modify data when it doesn't know the specific format.
Formats like JSON, XML, or MsgPack embed additional information so they can be completely self-describing.
With the following improvements, the Ceras format can be self-describing as well!
That way Ceras could even handle simple cases (like changing an int field to a float or something) automatically, and provide an API for the user to handle more complicated cases.
- More information in the embedded Schema
  - Type.Name: allow types to change their name! Also makes Ceras compatible with obfuscators!
  - MemberType: each member also records its type, in addition to its name. Allows members to change their type, and even automatic conversion.
  - Formatter: embed an ID for each used formatter (reinterpret, array, list, varint, dynamic, user, ...). That allows us to be robust against changes in IntegerEncoding, or to warn the user when they're trying to read something but some formatter is missing (maybe the old data was written using a user-created formatter)!
- Ensure that only the Schema of types handled by DynamicFormatter/SchemaDynamicFormatter is actually written (but allow users to manually generate/write a Schema of any type)
- Make Schema public and add an OnSchemaRead callback that is called when Ceras has loaded a new Schema and produced some mappings and conversions in order to load the old data. That way users can see how the format changed and what Ceras did to resolve the differences. Also provide a way to save/load a Schema to/from byte[].
- (maybe) Inspect / .ToJson()
  With all the new information in Schema, it should be possible to let users inspect the data, and even to (one-way) convert it into a JSON string.
- config.VersionTolerance.PrefixSize setting to let the user select the prefix size of members (currently a fixed UInt32). It should be possible to select ushort, byte, and even varint. Especially interesting for networking purposes.
- config.EmbedSchema setting that you can disable, in which case you're responsible for somehow storing the Schema manually. Could be useful for network scenarios.
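A rough sketch of how an embedded schema with member types and size prefixes lets a reader cope with changed classes: members are matched by name, an `int32` member that became `float64` is converted automatically, members that no longer exist are skipped using their size prefix, and new members keep their defaults. The type names, the `CONVERSIONS` table, and the 2-byte (ushort) prefix are assumptions for illustration; writing the schema itself into the binary is left out, and the ushort prefix limits each member to 65535 bytes.

```python
import struct

# Hypothetical per-type codecs: (encode, decode).
CODECS = {
    "int32": (lambda v: struct.pack("<i", v), lambda b: struct.unpack("<i", b)[0]),
    "float64": (lambda v: struct.pack("<d", v), lambda b: struct.unpack("<d", b)[0]),
    "string": (lambda v: v.encode("utf-8"), lambda b: b.decode("utf-8")),
}
# "Simple cases" that could be converted automatically.
CONVERSIONS = {("int32", "float64"): float}


def write(obj: dict, schema: list) -> tuple:
    """Return (schema, payload); every member value is prefixed with its byte size."""
    payload = bytearray()
    for name, type_name in schema:
        encoded = CODECS[type_name][0](obj[name])
        payload += struct.pack("<H", len(encoded))  # ushort prefix, one PrefixSize option
        payload += encoded
    return list(schema), bytes(payload)


def read(written_schema, payload: bytes, current_schema, defaults: dict) -> dict:
    current = dict(current_schema)
    result = dict(defaults)  # members missing from the old data keep their defaults
    pos = 0
    for name, old_type in written_schema:
        (size,) = struct.unpack_from("<H", payload, pos)
        raw = payload[pos + 2:pos + 2 + size]
        pos += 2 + size
        if name not in current:
            continue  # member was removed: the size prefix lets us skip it
        value = CODECS[old_type][1](raw)
        new_type = current[name]
        if new_type != old_type:
            convert = CONVERSIONS.get((old_type, new_type))
            if convert is None:
                raise ValueError(f"cannot convert {name} from {old_type} to {new_type}")
            value = convert(value)
        result[name] = value
    return result
```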
Non-breaking changes
- Type encoding should use size-limited strings to prevent an attacker from overloading the serializer that way.
- (Maybe) Special handling for very large structs (>64 bytes). We could have an ISerializeByRef interface implemented by DynamicFormatter, ReinterpretFormatter and ArrayFormatter.
- Ensure all lookups of private methods actually work in .NET Core as well (e.g. "GetUninitializedObject", which is private there)
- Try to automatically select a constructor in more cases. Maybe filter out the ones we can't use or map, then use the one that takes the most arguments?
- When used in Unity: catch and rethrow MissingMethodException and tell the user what the problem actually is (IL2CPP either removed a method, or did not generate a generic instantiation for it). Explain how it can be fixed: add a link.xml for stripped methods; call generic methods in their closed form beforehand. Maybe we could even generate some code for the user to copy-paste in the latter case.
- Support open and half-open MethodInfos. (comment)
- Change the exception thrown when no constructor is found to tell people about [CerasConstructor]
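The constructor heuristic from the list above could look roughly like this, assuming parameters are matched to members by case-insensitive name (e.g. parameter `name` to member `Name`). `Ctor` is a hypothetical stand-in for a ConstructorInfo, and ties between constructors with the same parameter count are resolved by taking the first one, which a real implementation would probably want to report as ambiguous instead. The error message points at [CerasConstructor], as suggested in the last item.

```python
from dataclasses import dataclass


@dataclass
class Ctor:  # hypothetical stand-in for a ConstructorInfo
    parameters: list  # parameter names, in declaration order


def select_constructor(ctors, member_names):
    """Drop constructors whose parameters can't all be mapped to a member,
    then prefer the one that takes the most arguments.
    Returns (constructor, member name for each parameter)."""
    by_lower = {m.lower(): m for m in member_names}
    usable = [c for c in ctors if all(p.lower() in by_lower for p in c.parameters)]
    if not usable:
        raise LookupError(
            "no usable constructor found; consider marking one with [CerasConstructor]"
        )
    best = max(usable, key=lambda c: len(c.parameters))  # first one wins on ties
    return best, [by_lower[p.lower()] for p in best.parameters]
```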