Cista++ is a simple, open source (MIT license) C++17 compatible way of (de-)serializing C++ data structures.
Single header. No macros. No source code generation.
- Raw performance - use your native structs. Supports modification/resizing of deserialized data!
- Supports complex and cyclic data structures including cyclic references, recursive data structures, etc.
- Save 50% memory: serialize directly to the filesystem if needed, no intermediate buffer required.
- Compatible with Clang, GCC, and MSVC
The underlying reflection mechanism can be used in other ways, too!
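The rule against declaring constructors (see below) suggests that the reflection works on plain aggregates. As a rough illustration of how such reflection can work in C++17 without macros or code generation, here is a minimal standalone sketch. It counts the fields of a struct by trying brace initialization with a type that converts to anything, then binds the fields with structured bindings. This is not Cista++'s code: the names any_type, arity, to_tuple, and for_each_field are hypothetical, and the sketch only handles structs with one to three fields.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Converts to any type. Only used in unevaluated contexts, so the
// conversion operator is declared but never defined.
struct any_type {
  template <typename T>
  operator T() const;
};

// The first overload is viable only if T{Args...} is well-formed.
template <typename T, typename... Args>
auto test_brace(int)
    -> decltype(void(T{std::declval<Args>()...}), std::true_type{});
template <typename T, typename... Args>
std::false_type test_brace(...);

template <typename T, typename... Args>
constexpr bool is_brace_constructible_v =
    decltype(test_brace<T, Args...>(0))::value;

// Adds one any_type argument at a time until T{...} stops compiling.
template <typename T, typename... Args>
constexpr std::size_t arity() {
  if constexpr (is_brace_constructible_v<T, Args..., any_type>) {
    return arity<T, Args..., any_type>();
  } else {
    return sizeof...(Args);
  }
}

// Returns a tuple of references to the fields of t.
template <typename T>
auto to_tuple(T& t) {
  constexpr auto n = arity<T>();
  static_assert(n >= 1 && n <= 3, "sketch supports 1 to 3 fields");
  if constexpr (n == 1) {
    auto& [a] = t;
    return std::tie(a);
  } else if constexpr (n == 2) {
    auto& [a, b] = t;
    return std::tie(a, b);
  } else {
    auto& [a, b, c] = t;
    return std::tie(a, b, c);
  }
}

// Calls fn once for every field of t, in declaration order.
template <typename T, typename Fn>
void for_each_field(T& t, Fn&& fn) {
  std::apply([&](auto&... fields) { (fn(fields), ...); }, to_tuple(t));
}
```

For example, for_each_field can sum, print, or hash all members of a struct without listing them by hand. A complete implementation needs one structured-binding branch per supported field count.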
Example:
Download the latest release or try it online.
#include <cassert>
#include "cista.h"
int main() {
  namespace data = cista::offset;
  struct my_struct {  // Define your struct.
    int a_{0};
    struct inner {
      int b_{0};
      data::string d_;
    } j;
  };
  std::vector<unsigned char> buf;
  {  // Serialize an object of class my_struct.
    my_struct obj{1, {2, data::string{"test"}}};
    buf = cista::serialize(obj);
  }  // End of life for the "obj" value.
  // Deserialize and read.
  auto deserialized = data::deserialize<my_struct>(buf);
  assert(deserialized->j.d_ == data::string{"test"});
}
Have a look at the benchmark repository for more details.
Library | Serialize | Deserialize | Fast Deserialize | Traverse | Deserialize & Traverse | Size |
---|---|---|---|---|---|---|
Cap’n Proto | 105 ms | 0.002 ms | 0.0 ms | 356 ms | 353 ms | 50.5M |
cereal | 239 ms | 197.000 ms | - | 125 ms | 322 ms | 37.8M |
Cista++ offset | 72 ms | 0.053 ms | 0.0 ms | 132 ms | 132 ms | 25.3M |
Cista++ raw | 3555 ms | 68.900 ms | 21.5 ms | 112 ms | 133 ms | 176.4M |
Flatbuffers | 2349 ms | 15.400 ms | 0.0 ms | 136 ms | 133 ms | 378.0M |
Reader and writer should run on the same architecture (e.g. 64-bit big-endian). Example use cases:
- Asset loading for all kinds of applications (e.g. game assets, GIS data, large graphs, etc.)
- Transferring data over a network
- Shared memory applications
Currently, only C++17 software can read/write data. But it should be possible to generate accessors for other programming languages, too.
If you need to be compatible with other programming languages or require protocol evolution (downward compatibility), you should look for another solution.
To use Cista++:
- Declare the data structures you want to serialize as regular C++ structs (using scalar types, cista::raw/offset::string, cista::raw/offset::unique_ptr<T>, and cista::raw/offset::vector<T>; more types such as map/set/etc. may follow).
- Do NOT declare any constructors (reflection will not work otherwise).
- Always use data types with known sizes, such as int32_t or uint8_t, for compatibility across platforms (with the same architecture).
- To use pointers: store the object you want to reference as cista::raw/offset::unique_ptr<T> and use a pointer (cista::raw/offset::ptr<T>; for the raw format this is just a T*) to reference it.
- Optional: if you need deterministic buffer contents, fill the spare bytes in your structs (see the advanced example below).
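Some of the rules above can be checked at compile time. The following sketch applies standard <type_traits> checks to a hypothetical record struct; these static_asserts are a suggestion for catching mistakes early, not something Cista++ requires.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Follows the rules: no user-declared constructors, fixed-size
// integer types, and explicit fill bytes instead of implicit padding.
struct record {
  std::uint32_t id_{0};
  std::uint16_t kind_{0};
  std::uint16_t fill_{0};  // spare bytes, always zero
  std::int64_t value_{0};
};

// No constructors: the struct stays an aggregate.
static_assert(std::is_aggregate_v<record>);
// A byte-by-byte copy yields a valid object.
static_assert(std::is_trivially_copyable_v<record>);
// No implicit padding: every byte belongs to a member, so the
// serialized bytes are fully determined by the member values.
static_assert(std::has_unique_object_representations_v<record>);
```

std::has_unique_object_representations_v becomes false as soon as the compiler inserts padding, which is exactly the case the fill_ member avoids. It is also typically false for structs with floating-point members, so this particular check only fits integer-only structs like this one.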
Cista++ supports two serialization formats:
Offset Based Data Structures
- Pro: can be read without any deserialization step
- Pro: suitable for shared memory applications
- Con: slower at runtime (pointers need to be resolved using one more addition)
Raw Data Structures
- Con: the deserialization step takes time (but is still very fast, even for GBs of data)
- Con: the buffer containing the serialized data needs to be modified
- Pro: fast runtime access (raw access)
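Why can offset based data be read in place, at the cost of one addition per access? Here is a minimal sketch assuming a self-relative encoding, where the pointer stores the distance from its own address to its target. The types rel_ptr and blob are hypothetical and are not cista::offset::ptr, whose exact encoding (e.g. of nullptr) may differ. Because the offset is relative, copying the whole block to another address keeps the pointer valid. A raw T* copied the same way would still point to the old location and would have to be fixed up in a deserialization step.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical self-relative pointer: stores target address minus own address.
template <typename T>
struct rel_ptr {
  static constexpr std::intptr_t null_offset =
      std::numeric_limits<std::intptr_t>::min();

  void set(T const* target) {
    offset_ = target == nullptr
                  ? null_offset
                  : reinterpret_cast<std::intptr_t>(target) -
                        reinterpret_cast<std::intptr_t>(this);
  }

  // Resolving costs one extra addition compared to a raw T*.
  T* get() const {
    return offset_ == null_offset
               ? nullptr
               : reinterpret_cast<T*>(reinterpret_cast<std::intptr_t>(this) +
                                      offset_);
  }

  std::intptr_t offset_{null_offset};
};

// A block of memory that contains a pointer into itself.
struct blob {
  rel_ptr<int> ptr_;
  int value_{0};
};
```

Computing addresses through std::intptr_t assumes a flat address space, which holds on mainstream platforms.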
Both namespaces, cista::offset and cista::raw, have the same structure: they provide the same data structures and functions with the same behavior.
The following data structures exist in cista::offset and cista::raw:
- vector<T>: serializable version of std::vector<T>
- string: serializable version of std::string
- unique_ptr<T>: serializable version of std::unique_ptr<T>
- ptr<T>: serializable pointer. cista::raw::ptr<T> is just a T*, while cista::offset::ptr<T> is a specialized data structure that behaves mostly like a T* (overloaded ->, *, etc. operators).
Currently, vector, string, and unique_ptr do not provide exactly the same interface as their std:: equivalents. Standard compliance was not a goal, but this can change in future releases. It is possible to add more data structures to Cista++.
The following functions can be used to serialize either to a std::vector<uint8_t> (default) or to an arbitrary serialization target:
- std::vector<uint8_t> cista::serialize<T>(T const&) serializes an object of type T and returns a buffer containing the serialized object.
- void cista::serialize<Target, T>(Target&, T const&) serializes an object of type T to the specified target. Targets are either cista::buf or cista::sfile. Custom target structs should provide write functions as described here.
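Since the target is a template parameter, any type with suitable write functions could serve as a target. As a hedged sketch, the hypothetical size_counter below assumes that a target needs the same two write() overloads as the serialization context described further down (append with alignment, and overwrite at an existing position), here with std::size_t offsets. Check the cista::buf implementation for the exact required signatures before relying on this. Such a target would compute the serialized size without storing any bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical target that only measures: it computes the offsets a
// real buffer would use, but stores nothing.
struct size_counter {
  // Reserves `size` bytes at the end, aligned to `alignment`,
  // and returns the (aligned) start offset.
  std::size_t write(void const* /* ptr */, std::size_t const size,
                    std::size_t const alignment = 0) {
    auto start = size_;
    if (alignment > 1 && start % alignment != 0) {
      start += alignment - start % alignment;
    }
    size_ = start + size;
    return start;
  }

  // Overwriting an existing position does not change the size.
  template <typename T>
  void write(std::size_t const pos, T const& /* val */) {
    assert(pos + sizeof(T) <= size_);
  }

  std::size_t size_{0};
};
```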
The following functions exist in cista::offset and cista::raw:
- T* deserialize<T, Container>(Container&) deserializes an object from a std::vector<uint8_t> or similar data structure. This function throws a std::runtime_error if the data is not well-formed.
- T* deserialize<T>(uint8_t* from, uint8_t* to) deserializes an object from a pointer range. This function throws a std::runtime_error if the data is not well-formed.
- T* unchecked_deserialize<T, Container>(Container&) deserializes an object from a std::vector<uint8_t> or similar data structure. No checking is performed!
- T* unchecked_deserialize<T>(uint8_t* from, uint8_t* to) deserializes an object from a pointer range. No checking is performed!
Note: cista::offset::unchecked_deserialize performs just a pointer cast!
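To illustrate the difference, here is a sketch of the minimum a checked access to the root object has to verify, compared to the bare cast. The names checked_root and unchecked_root are hypothetical, and the alignment check is an assumption of this sketch. Cista++'s actual checks go further: every pointer and scalar inside the buffer is checked as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Checked: the range must hold at least sizeof(T) bytes, and `from`
// must be suitably aligned for T, before it is treated as a T.
template <typename T>
T* checked_root(std::uint8_t* from, std::uint8_t* to) {
  if (to < from || static_cast<std::size_t>(to - from) < sizeof(T)) {
    throw std::runtime_error{"buffer too small"};
  }
  if (reinterpret_cast<std::uintptr_t>(from) % alignof(T) != 0) {
    throw std::runtime_error{"misaligned buffer"};
  }
  return reinterpret_cast<T*>(from);
}

// Unchecked: just the pointer cast, no validation at all.
template <typename T>
T* unchecked_root(std::uint8_t* from, std::uint8_t* /* to */) {
  return reinterpret_cast<T*>(from);
}
```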
The following example shows serialization and deserialization of more complex data structures.
Try it online!
#include <cassert>
#include <cstdint>
#include <utility>
#include "cista.h"

// Use the raw serialization format.
namespace data = cista::raw;

// Forward declare `node` to be able to use it in `edge`.
struct node;

// Always use types that have a fixed size on every
// platform: (u)int16_t, (u)int32_t, ...
using node_id_t = uint32_t;

struct edge {
  data::ptr<node> from_;
  data::ptr<node> to_;
};

struct node {
  void add_edge(edge* e) { edges_.emplace_back(e); }
  node_id_t id() const { return id_; }

  node_id_t id_{0};
  node_id_t fill_{0};
  data::vector<data::ptr<edge>> edges_;
  data::string name_;
};

struct graph {
  node* make_node(data::string name) {
    return nodes_
        .emplace_back(data::make_unique<node>(
            node{next_node_id_++, 0, data::vector<data::ptr<edge>>{},
                 std::move(name)}))  // start with an empty edge list
        .get();
  }

  edge* make_edge(node_id_t const from, node_id_t const to) {
    return edges_
        .emplace_back(
            data::make_unique<edge>(edge{nodes_[from].get(), nodes_[to].get()}))
        .get();
  }

  // Use unique_ptr to enable pointers to these objects.
  data::vector<data::unique_ptr<node>> nodes_;
  data::vector<data::unique_ptr<edge>> edges_;
  node_id_t next_node_id_{0};
  node_id_t fill_{0};  // optional: zero out spare bytes for
                       // deterministic buffer contents
};

int main() {
  // Create cyclic graph with nodes and edges.
  {
    graph g;
    auto const n1 = g.make_node(data::string{"NODE A"});
    auto const n2 = g.make_node(data::string{"NODE B"});
    auto const n3 = g.make_node(data::string{"NODE C"});
    auto const e1 = g.make_edge(n1->id(), n2->id());
    auto const e2 = g.make_edge(n2->id(), n3->id());
    auto const e3 = g.make_edge(n3->id(), n1->id());
    n1->add_edge(e1);
    n2->add_edge(e2);
    n3->add_edge(e3);
    cista::sfile f{"test.bin", "wb"};
    cista::serialize(f, g);
  }  // EOL graph

  auto b = cista::file("test.bin", "r").content();
  auto const g = data::deserialize<graph>(b);

  // Follow the cycle A -> B -> C -> A in the deserialized graph.
  assert(g->nodes_.size() == 3);
  assert(g->nodes_[0]->edges_[0]->to_->edges_[0]->to_->edges_[0]->to_ ==
         g->nodes_[0].get());
}
Basically, the only thing the cista::serialize() call does is to copy everything into one coherent target (e.g. a file or memory buffer) byte-by-byte, recursively. Additionally, each pointer gets converted to an offset at serialization and, for the raw format, back to a real pointer at deserialization. Every data structure can be (de-)serialized using a custom (de-)serialization function (see below for more details). All this is done recursively.
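The pointer-to-offset conversion can be sketched for a tiny cyclic list. This minimal standalone example mimics the raw format. The names node, serialize_nodes, and deserialize_nodes are hypothetical, and the sentinel value for nullptr is an assumption of this sketch. It copies all nodes into one buffer, records where each original object ended up (a lookup table from original pointer to offset), replaces every pointer with an offset, and later adds the buffer's base address to turn the offsets back into pointers in place. Cista++ does this generically and recursively for arbitrary types; this sketch handles one hard-coded type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <map>
#include <vector>

struct node {
  std::int32_t value_{0};
  node* next_{nullptr};
};

// Hypothetical encoding of nullptr inside the serialized buffer.
constexpr auto null_offset = std::numeric_limits<std::uintptr_t>::max();
static_assert(sizeof(std::uintptr_t) == sizeof(node*));

// Copies every node byte-by-byte, then overwrites each next_ pointer with
// the buffer offset of its target. All targets must be part of `nodes`.
std::vector<std::uint8_t> serialize_nodes(std::vector<node*> const& nodes) {
  std::vector<std::uint8_t> buf(nodes.size() * sizeof(node));
  std::map<node const*, std::uintptr_t> offsets;  // original ptr -> offset
  for (std::size_t i = 0; i != nodes.size(); ++i) {
    offsets[nodes[i]] = i * sizeof(node);
    std::memcpy(buf.data() + i * sizeof(node), nodes[i], sizeof(node));
  }
  for (std::size_t i = 0; i != nodes.size(); ++i) {
    node const* const target = nodes[i]->next_;
    std::uintptr_t const off =
        target == nullptr ? null_offset : offsets.at(target);
    std::memcpy(buf.data() + i * sizeof(node) + offsetof(node, next_), &off,
                sizeof(off));
  }
  return buf;
}

// Raw-style deserialization: modifies the buffer in place, turning each
// stored offset back into a pointer by adding the base address.
node* deserialize_nodes(std::vector<std::uint8_t>& buf) {
  auto const base = reinterpret_cast<std::uintptr_t>(buf.data());
  for (std::size_t pos = 0; pos != buf.size(); pos += sizeof(node)) {
    auto* const n = reinterpret_cast<node*>(buf.data() + pos);
    std::uintptr_t off;
    std::memcpy(&off, buf.data() + pos + offsetof(node, next_), sizeof(off));
    n->next_ =
        off == null_offset ? nullptr : reinterpret_cast<node*>(base + off);
  }
  return reinterpret_cast<node*>(buf.data());
}
```

The cycle survives because pointers are translated through the offset table instead of being followed while copying, so each node is written exactly once.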
During deserialization (except with the unchecked_deserialize functions), all serialized data is checked so that every pointer T* is either a nullptr or points to a valid position within the buffer with enough bytes for T. Scalar values are checked to fit into the buffer at their position, too. Code can be found here.
Note that modifying serialized data may corrupt it in unexpected ways. Therefore, it is not safe to access modified deserialized data coming from untrusted sources. However, deserializing and reading data from untrusted sources (without modifying the data) is safe.
If you have 3rd-party structs, structs with constructors, or structs that manage memory, etc., you need to override the serialize and deserialize functions.
Note that usually you should not need to do this! Every combination of the cista::offset/raw data structures should work without custom (de-)serialization functions.
By default, every value gets copied raw byte-by-byte. For each type you would like to serialize with a custom function, you need to override the following function for your type.
This function will be called with:
- The serialization context (described below). It provides functions to write to the buffer and to translate pointers to offsets.
- A pointer to the original value (i.e. not the serialized one!) of your struct.
- The offset where the value of YourType has been copied to. You can use this information to adjust certain members of YourType. For example, ctx.write(pos + offsetof(cista::string, h_.ptr_), start) overrides the pointer contained in cista::string with the offset the bytes have been copied to by calling start = ctx.write(orig->data(), orig->size(), 1).
template <typename Ctx>
void serialize(Ctx&, YourType const*, cista::offset_t const);
The Ctx parameter is templated to support different serialization targets (e.g. file and buffer). Ctx provides the following members:
struct serialization_context {
  struct pending_offset {
    void* origin_ptr_;  // pointer to the original
    offset_t pos_;  // offset where it is serialized
  };

  /**
   * Writes the values at [ptr, ptr + size) to
   * the end of the serialization target buffer.
   * Adjusts for alignment if needed and returns
   * the new (aligned) offset the value was written to.
   *
   * Appends to the buffer (resize).
   *
   * \param ptr points to the data to write
   * \param size number of bytes to write
   * \param alignment the alignment to consider
   * \return the alignment adjusted offset
   */
  offset_t write(void const* ptr, offset_t const size,
                 offset_t alignment = 0);

  /**
   * Overrides the value at `pos` with value `val`.
   *
   * Note: does not append to the buffer.
   * The position `pos` needs to exist already
   * and provide enough space to write `val`.
   *
   * \param pos the position to write to
   * \param val the value to copy to position `pos`
   */
  template <typename T>
  void write(offset_t const pos, T const& val);

  /**
   * Lookup table from original pointer
   * to the offset the data was written to.
   */
  std::map<void*, offset_t> offsets_;

  /**
   * Pending pointers that could not yet get
   * resolved (i.e. the value they point to has not
   * been written yet but will be later).
   */
  std::vector<pending_offset> pending_;
};
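The pattern described above (write the payload, then overwrite the pointer member at its copied position) can be sketched standalone. Here mini_ctx is a hypothetical stand-in that provides only the two write() functions of the context, my_string is a hypothetical string-like type, and offset_t is assumed to be pointer-sized so the offset fits exactly into the pointer slot. With Cista++ you would only write the serialize() overload for your type and receive the real context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using offset_t = std::intptr_t;  // assumption: pointer-sized offsets

// Hypothetical stand-in for the serialization context.
struct mini_ctx {
  // Appends [ptr, ptr + size) at the next offset aligned to `alignment`.
  offset_t write(void const* ptr, offset_t const size,
                 offset_t const alignment = 0) {
    auto start = static_cast<offset_t>(buf_.size());
    if (alignment > 1 && start % alignment != 0) {
      start += alignment - start % alignment;
    }
    buf_.resize(static_cast<std::size_t>(start + size));
    if (size != 0) {
      std::memcpy(buf_.data() + start, ptr, static_cast<std::size_t>(size));
    }
    return start;
  }

  // Overwrites sizeof(T) bytes at an existing position.
  template <typename T>
  void write(offset_t const pos, T const& val) {
    assert(pos >= 0 &&
           static_cast<std::size_t>(pos) + sizeof(T) <= buf_.size());
    std::memcpy(buf_.data() + pos, &val, sizeof(T));
  }

  std::vector<std::uint8_t> buf_;
};

// Hypothetical string-like type that refers to external bytes.
struct my_string {
  char const* ptr_{nullptr};
  std::uint32_t size_{0};
};

// Custom serialize: `pos` is where the my_string struct itself was copied.
// Write the characters, then replace the copied pointer with their offset.
template <typename Ctx>
void serialize(Ctx& ctx, my_string const* orig, offset_t const pos) {
  offset_t const start = ctx.write(orig->ptr_, orig->size_, 1);
  ctx.write(pos + static_cast<offset_t>(offsetof(my_string, ptr_)), start);
}
```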
To enable custom deserialization, you need to create specialized functions for your type with the following signatures:
void deserialize(cista::deserialization_context const&, YourType*);
void unchecked_deserialize(cista::deserialization_context const&, YourType*);
With these functions you should:
- for the non-"unchecked" version: check that everything fits into the buffer and that pointers point to memory addresses that are located inside the buffer, using deserialization_context::check()
- convert offsets back to pointers using the deserialization_context::deserialize() member function.
Both functions (deserialize() and check()) are provided by the deserialization_context:
struct deserialization_context {
  /**
   * Converts a stored offset back to the original pointer
   * by adding the base address.
   *
   * \param ptr offset (given as a pointer)
   * \return offset converted to pointer
   */
  template <typename T, typename Ptr>
  T deserialize(Ptr* ptr) const;

  /**
   * Checks whether the pointer points to
   * a valid memory address within the buffer
   * where at least `size` bytes are available.
   *
   * \param el the memory address to check
   * \param size the size to check for
   * \throws if there are bytes outside the buffer
   */
  template <typename T>
  void check(T* el, size_t size) const;
};
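A matching deserialization sketch: mini_deser_ctx is a hypothetical stand-in that implements the two documented operations (turning an offset into a pointer by adding the base address, and a bounds check that throws), and deserialize() for the hypothetical my_string first converts the stored offset and then checks that the payload fits into the buffer. Unlike the real context, this sketch's deserialize() takes and returns a T* and does not special-case nullptr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical stand-in for the deserialization context of [from_, to_).
struct mini_deser_ctx {
  // The pointer slot holds an offset: add the buffer's base address.
  template <typename T>
  T* deserialize(T* stored) const {
    auto const offset = reinterpret_cast<std::uintptr_t>(stored);
    return reinterpret_cast<T*>(reinterpret_cast<std::uintptr_t>(from_) +
                                offset);
  }

  // Throws unless [el, el + size) lies inside [from_, to_).
  template <typename T>
  void check(T* el, std::size_t const size) const {
    auto const begin = reinterpret_cast<std::uintptr_t>(from_);
    auto const end = reinterpret_cast<std::uintptr_t>(to_);
    auto const p = reinterpret_cast<std::uintptr_t>(el);
    if (p < begin || p > end || size > end - p) {
      throw std::runtime_error{"pointer outside of buffer"};
    }
  }

  std::uint8_t* from_;
  std::uint8_t* to_;
};

// Hypothetical string-like type, with the layout it had when serialized.
struct my_string {
  char const* ptr_;
  std::uint32_t size_;
};

// Custom deserialize: convert the offset back, then check the payload.
void deserialize(mini_deser_ctx const& ctx, my_string* s) {
  s->ptr_ = ctx.deserialize(s->ptr_);
  ctx.check(s->ptr_, s->size_);
}
```

An unchecked_deserialize() overload would do the same conversion without the check() call.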
Feel free to contribute (bug reports, pull requests, etc.)!