Documentation
JohannesLichtenberger opened this issue · comments
We'd have to write documentation about the overall architecture, the secondary indexes, and the path summary...
I'm interested in doing this task.
I think we should probably use this: https://sirix-docs.readthedocs.io/en/latest/
Currently, some documentation is linked here: https://sirix.io/documentation.html
However, I'm creating new diagrams in the sirix/images folder (using Excalidraw)
How can I help with the new docs?
You could, for instance, check if you can set up a SirixDB server and check if the documentation is correct.
Other than that maybe you can add the XQuery/JSONiq functions to the new docs...
BTW: What's your opinion on using readthedocs.io?
Sure, I'll be happy to help with that. Please add me to this task! Regarding readthedocs.io, I have experience with it from working with small Python/Django libraries, such as the example you provided. In my opinion, this kind of documentation can sometimes be a bit tedious and difficult for beginners to read, but it is very useful and quick to generate. Personally, I prefer documentation formats like those used by the Spring projects and similar frameworks.
We could of course also stick to the sirix.io markdown files for instance. Maybe also a complete redesign of the website would be amazing, but yeah...
We also need Tutorials, HowTo Guides... https://youtu.be/t4vKPhjcMZg
Yes, I feel the lack of tutorials and good documentation too :) I will help you with this. I need some time to read and understand the code. Can you please help me with that? Where should I start, and which parts should I read?
You can, for instance, check the usage of JsonDocumentCreator and debug a little bit.
I think a top-down approach might be best.
In general there's a Database instance which encapsulates Resources, the equivalent to tables in a relational database system. These resources are either JSON or XML based (we store a binary encoding of a tree, think of it as a persistent DOM with a firstChild/lastChild/parent/leftSibling/rightSibling encoding).
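To make the firstChild/lastChild/parent/leftSibling/rightSibling idea a bit more concrete, here is a minimal sketch in plain Java. It is a toy model only, not SirixDB's real node classes: the names PointerTreeSketch, Node, NULL_KEY, appendChild and childLabels are made up for illustration, and the nodes live in an in-memory map instead of being binary-encoded on pages.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration only: hypothetical names, not SirixDB's real node classes. */
public class PointerTreeSketch {
    public static final long NULL_KEY = -1;

    /** One node record; relatives are referenced by node key, not by object pointer. */
    static final class Node {
        final long key;
        final String label;
        long parent = NULL_KEY;
        long firstChild = NULL_KEY;
        long lastChild = NULL_KEY;
        long leftSibling = NULL_KEY;
        long rightSibling = NULL_KEY;

        Node(long key, String label) {
            this.key = key;
            this.label = label;
        }
    }

    private final Map<Long, Node> nodes = new HashMap<>();
    private long nextKey = 0;

    /** Appends a node as the last child of parentKey (or as a root if parentKey is NULL_KEY). */
    public long appendChild(long parentKey, String label) {
        Node node = new Node(nextKey++, label);
        nodes.put(node.key, node);
        if (parentKey != NULL_KEY) {
            Node parent = nodes.get(parentKey);
            node.parent = parentKey;
            if (parent.firstChild == NULL_KEY) {
                parent.firstChild = node.key;
            } else {
                // Link the new node to the right of the previous last child.
                Node previousLast = nodes.get(parent.lastChild);
                previousLast.rightSibling = node.key;
                node.leftSibling = previousLast.key;
            }
            parent.lastChild = node.key;
        }
        return node.key;
    }

    /** Follows firstChild once, then rightSibling repeatedly, to list the children in order. */
    public List<String> childLabels(long parentKey) {
        List<String> labels = new ArrayList<>();
        long key = nodes.get(parentKey).firstChild;
        while (key != NULL_KEY) {
            Node child = nodes.get(key);
            labels.add(child.label);
            key = child.rightSibling;
        }
        return labels;
    }

    public static void main(String[] args) {
        // Roughly models {"a": 1, "b": 2}: one object node with two field nodes.
        PointerTreeSketch tree = new PointerTreeSketch();
        long root = tree.appendChild(NULL_KEY, "object");
        tree.appendChild(root, "a");
        tree.appendChild(root, "b");
        System.out.println(tree.childLabels(root)); // prints [a, b]
    }
}
```

Because every relationship is stored as a key, a node record can be serialized on its own and navigated in any direction without loading the whole tree, which is the point of thinking of it as a persistent DOM.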
Then from the database instance you can create a new resource or open a resource session to start N read-only trxs or a single read-write trx. Each JsonNodeTrx or JsonNodeReadOnlyTrx has a page reading trx dependency, which is essentially the storage engine (I think we could also rename the classes at some point ;-)). The page reading trx has a reader/writer dependency, which basically writes the pages to the storage device. Currently that means files, via "normal" FileChannel based I/O, memory mapped files, or io_uring. The latter is somehow currently slower than "normal" I/O, maybe due to the event loop in the library we use, but I'm not sure... We are also currently working on a file based + async mechanism to store the pages in S3 buckets as well, for instance.
The architecture is basically a huge tree of tries, and new revisions are always appended. The data / key/value pages of the tries store either the actual nodes (of the JSON or XML trees) or secondary indexes...
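As a rough mental model of "N read-only trxs or a single read-write trx" plus "new revisions are always appended", here is a small self-contained Java sketch. It is not SirixDB's API: RevisionSketch, WriteTrx, beginWrite, readOnly and latestRevision are hypothetical names, and it copies a whole map per revision for simplicity, whereas the real storage engine works on pages in tries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy illustration only: hypothetical names, not SirixDB's session or trx classes. */
public class RevisionSketch {
    // Committed revisions are immutable snapshots; the list index is the revision number.
    private final List<Map<String, String>> revisions = new ArrayList<>();
    // Guards the single read-write trx slot.
    private final AtomicBoolean writerOpen = new AtomicBoolean(false);

    public RevisionSketch() {
        revisions.add(Map.of()); // revision 0 is the empty bootstrap revision
    }

    public synchronized int latestRevision() {
        return revisions.size() - 1;
    }

    /** Any number of readers may look at any committed revision. */
    public synchronized Map<String, String> readOnly(int revision) {
        return revisions.get(revision);
    }

    /** Only one read-write trx may be open at a time. */
    public WriteTrx beginWrite() {
        if (!writerOpen.compareAndSet(false, true)) {
            throw new IllegalStateException("a read-write trx is already open");
        }
        return new WriteTrx(readOnly(latestRevision()));
    }

    public final class WriteTrx {
        private final Map<String, String> working;
        private boolean open = true;

        private WriteTrx(Map<String, String> base) {
            this.working = new HashMap<>(base); // private working copy of the latest revision
        }

        public void put(String key, String value) {
            working.put(key, value);
        }

        /** Appends the working copy as a new revision and frees the writer slot. */
        public int commit() {
            if (!open) {
                throw new IllegalStateException("trx already committed");
            }
            open = false;
            int revision;
            synchronized (RevisionSketch.this) {
                revisions.add(Map.copyOf(working));
                revision = revisions.size() - 1;
            }
            writerOpen.set(false);
            return revision;
        }
    }

    public static void main(String[] args) {
        RevisionSketch resource = new RevisionSketch();
        Map<String, String> before = resource.readOnly(resource.latestRevision());
        WriteTrx wtx = resource.beginWrite();
        wtx.put("name", "sirix");
        int revision = wtx.commit();
        // The old snapshot is untouched; the new revision sits next to it.
        System.out.println(revision + " " + before + " " + resource.readOnly(revision));
    }
}
```

Old revisions are never modified here, so a reader that opened revision 0 keeps seeing revision 0 even after the writer commits revision 1.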
Thank you for your very helpful comment. I will start reading right now :)
Can you please assign this task to me?
Will do, once I'm back home. I can't find the button on my phone :-D BTW: you can also check the existing documentation, and I hope that the Excalidraw images might already provide a bit of an architecture overview (for instance how a JSON document is mapped to the tree structure), although I still want to work on a new technical document about the concepts and architecture using the new illustrations/images...
Sure, I will. Can we keep in touch via email? I will have questions about the code and architecture :))
We have a Discord channel you can join. The link is in the README.