sirixdb / sirix

SirixDB is an embeddable, bitemporal, append-only database system and event store, storing immutable lightweight snapshots. It keeps the full history of each resource. Every commit stores a space-efficient snapshot through structural sharing. It is log-structured and never overwrites data. SirixDB uses a novel page-level versioning approach.

Home Page: https://sirix.io

Documentation

JohannesLichtenberger opened this issue · comments

We'd have to write documentation about the overall architecture, the secondary indexes, and the path summary...

I'm interested in doing this task.

I think we should probably use this: https://sirix-docs.readthedocs.io/en/latest/

Currently, some documentation is linked here: https://sirix.io/documentation.html

However, I'm creating new diagrams in the sirix/images folder (using Excalidraw)

How can I help with the new docs?

You could, for instance, check if you can set up a SirixDB server and check if the documentation is correct.

Other than that maybe you can add the XQuery/JSONiq functions to the new docs...

BTW: What's your opinion on using readthedocs.io?

Sure, I'll be happy to help you with that. Please add me to this task! Regarding readthedocs.io, I have experience working with small libraries in Python/Django, such as the example you provided (example). In my opinion, these kinds of docs can sometimes be a bit tedious and difficult to read for beginners, but they are very useful and make it quick to generate the docs. Personally, I prefer documentation formats like those used in the Spring projects and other similar frameworks.

We could of course also stick to the sirix.io markdown files for instance. Maybe also a complete redesign of the website would be amazing, but yeah...

We also need Tutorials, HowTo Guides... https://youtu.be/t4vKPhjcMZg

Yes, I feel the lack of tutorials and good documentation too :) I will help you with this. I need some time to read the code and understand it. Can you please help me with that? How should I start, and which parts should I read?

You can, for instance, check the usage of JsonDocumentCreator and debug a little bit.

I think a top-down approach might be best.

In general there's a Database instance which encapsulates Resources, the equivalent to tables in a relational database system. These resources are either JSON or XML based (we store a binary encoding of a tree, think of it as a persistent DOM -- firstChild/lastChild/parent/leftSibling/rightSibling encoding).
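To make the firstChild/lastChild/parent/leftSibling/rightSibling encoding a bit more concrete, here is a minimal sketch in Python. It is not SirixDB code: the `Node` and `Tree` classes and all field names are made up for illustration, and the real system stores a binary encoding in pages rather than Python objects. The idea is only that every node refers to its neighbours by node key, so the tree can be navigated without nested containers.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class Node:
    """One tree node; all references are node keys, not object pointers."""
    key: int
    value: object = None
    parent: Optional[int] = None
    first_child: Optional[int] = None
    last_child: Optional[int] = None
    left_sibling: Optional[int] = None
    right_sibling: Optional[int] = None


class Tree:
    """Hypothetical in-memory stand-in for a resource's node storage."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self._next_key = 0
        self.root = self._new_node("root")

    def _new_node(self, value: object) -> int:
        key = self._next_key
        self._next_key += 1
        self.nodes[key] = Node(key=key, value=value)
        return key

    def append_child(self, parent_key: int, value: object) -> int:
        """Insert a new node as the last child of parent_key."""
        key = self._new_node(value)
        child = self.nodes[key]
        parent = self.nodes[parent_key]
        child.parent = parent_key
        if parent.last_child is None:
            # First child: the parent's first_child pointer must be set too.
            parent.first_child = key
        else:
            # Link the previous last child and the new node as siblings.
            self.nodes[parent.last_child].right_sibling = key
            child.left_sibling = parent.last_child
        parent.last_child = key
        return key

    def children(self, parent_key: int) -> Iterator[int]:
        """Walk the children left to right via the sibling pointers."""
        current = self.nodes[parent_key].first_child
        while current is not None:
            yield current
            current = self.nodes[current].right_sibling
```

With this layout, appending a child touches at most three nodes (the parent, the previous last child, and the new node), which is part of why a pointer-based encoding fits well with storing nodes in pages.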

Then, from the database instance, you can create a new resource or open a resource session to start N read-only trxs or a single read-write trx. Each JsonNodeTrx or JsonNodeReadOnlyTrx has a page reading trx dependency, which is essentially the storage engine (I think we could also rename the classes at some point ;-)). The page reading trx has a reader/writer dependency, which basically writes the pages to the storage device. Currently that means files, via "normal" FileChannel based I/O, memory mapped files, or io_uring. The latter is somehow currently slower than "normal" I/O, maybe due to the event loop used in the library we use, but I'm not sure... We're also currently working on a file based + async mechanism to store the pages in S3 buckets, for instance.
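The layering described above can be sketched roughly as follows. This is a toy model in Python, not the SirixDB API: `Database`, `ResourceSession`, `NodeTrx`, `PageTrx`, and `Storage` are hypothetical names chosen for this example, and the single-writer rule is enforced with a plain flag rather than whatever mechanism SirixDB actually uses.

```python
from typing import Dict, List


class Storage:
    """Stand-in for the reader/writer layer (files, memory mapped files, ...)."""

    def __init__(self) -> None:
        self.pages: List[bytes] = []

    def append(self, page: bytes) -> int:
        self.pages.append(page)
        return len(self.pages) - 1  # position of the page just written


class PageTrx:
    """Stand-in for the page-level trx, i.e. the storage engine layer."""

    def __init__(self, storage: Storage) -> None:
        self.storage = storage


class NodeTrx:
    """Stand-in for a node-level trx; it depends on a page-level trx."""

    def __init__(self, session: "ResourceSession", writable: bool) -> None:
        self.session = session
        self.writable = writable
        self.page_trx = PageTrx(session.storage)
        self.closed = False

    def close(self) -> None:
        if self.closed:
            return
        self.closed = True
        if self.writable:
            self.session.writer_open = False
        else:
            self.session.open_readers -= 1


class ResourceSession:
    """Hands out any number of read-only trxs but only one read-write trx."""

    def __init__(self, storage: Storage) -> None:
        self.storage = storage
        self.open_readers = 0
        self.writer_open = False

    def begin_read_only_trx(self) -> NodeTrx:
        self.open_readers += 1
        return NodeTrx(self, writable=False)

    def begin_read_write_trx(self) -> NodeTrx:
        if self.writer_open:
            raise RuntimeError("a read-write trx is already open")
        self.writer_open = True
        return NodeTrx(self, writable=True)


class Database:
    """Encapsulates resources, each with its own storage."""

    def __init__(self) -> None:
        self.resources: Dict[str, Storage] = {}

    def create_resource(self, name: str) -> None:
        self.resources[name] = Storage()

    def open_resource_session(self, name: str) -> ResourceSession:
        return ResourceSession(self.resources[name])
```

Reading the real code top-down follows the same chain: database, resource session, node trx, page trx, and finally the reader/writer that talks to the storage device.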

The architecture is a huge tree of tries basically and new revisions are always appended. The data / key/value pages of the tries store the actual nodes (of the JSON or XML trees) or they store secondary indexes...
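As a rough illustration of "new revisions are always appended", here is a small Python sketch of a trie of pages with copy-on-write commits. It is an assumption-heavy toy and not SirixDB's page layout or versioning algorithm: `FANOUT`, `LEVELS`, `put`, `get`, and `RevisionLog` are invented for this example, and it copies the full path from root to leaf on each change. The point is only that a commit creates a new root and reuses every page it did not touch.

```python
from typing import Dict, List, Optional, Tuple

FANOUT = 4   # slots per page (tiny, for readability)
LEVELS = 3   # trie height; addresses keys 0 .. FANOUT**LEVELS - 1

Page = Optional[Tuple[object, ...]]


def put(page: Page, key: int, value: object, level: int = LEVELS - 1) -> Page:
    """Return a new page with key set; untouched child pages are shared."""
    slots = list(page) if page is not None else [None] * FANOUT
    index = (key // FANOUT ** level) % FANOUT
    if level == 0:
        slots[index] = value                      # leaf page holds the record
    else:
        slots[index] = put(slots[index], key, value, level - 1)
    return tuple(slots)                           # pages are immutable


def get(page: Page, key: int, level: int = LEVELS - 1) -> object:
    """Look up key by walking one trie level per step."""
    if page is None:
        return None
    entry = page[(key // FANOUT ** level) % FANOUT]
    return entry if level == 0 else get(entry, key, level - 1)


class RevisionLog:
    """Append-only list of root pages, one per committed revision."""

    def __init__(self) -> None:
        self.roots: List[Page] = [None]           # revision 0 is empty

    def commit(self, changes: Dict[int, object]) -> int:
        root = self.roots[-1]
        for key, value in changes.items():
            root = put(root, key, value)
        self.roots.append(root)                   # old roots are never modified
        return len(self.roots) - 1

    def read(self, revision: int, key: int) -> object:
        return get(self.roots[revision], key)
```

For example, after `r1 = log.commit({0: "a", 17: "b"})` and `r2 = log.commit({17: "B"})`, reading key 17 in `r1` still yields the old value, while both revisions point at the same unchanged page for key 0.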

Thank you for your very helpful comment. I will start reading right now :)

Can you please assign this task to me?

Will do, once I'm back home. I cannot find the button on my phone :-D BTW: you can also check the existing documentation, and I hope that even the Excalidraw images might provide a bit of an architecture overview (for instance, how a JSON document is mapped to the tree structure), although I still want to work on a new technical document about the concepts and architecture using the new illustrations/images...

Sure, I will. Can we keep in touch via email? I will have questions about the code and architecture :))

We have a Discord channel that you can join. The link is in the README.