mmn80 / muesli

A document-oriented database library for Haskell

Is the project active?

AleXoundOS opened this issue

Currently I just check that it builds on newer versions of GHC + Stackage.
Contributions are welcome.

I'm not at the level where I could contribute. I was just searching around for active document-oriented DB projects that support Haskell. The most interesting one, muesli, being the native solution, is for some reason the least active.
It's an interesting project, though it lacks documentation about usage scenarios, scalability, reliability, battle-testing, etc.

Sorry I did not respond sooner.

The goal of this project was to create a small, native, strongly typed, transactional DB engine / library, with expected usage scenarios somewhat similar to SQLite's, that is resilient (in terms of maintenance cost) to GHC changes by relying exclusively on Prelude for IO, with the rest of the dependencies being pure libraries (so no mmap, etc.). This part was a success, as it hasn't broken since I uploaded it in 2015.

There was no focus whatsoever on low-level optimization, just the high-level things that are a requirement for databases (declarative indexing, incremental GC, etc.: all atomic operations complete in logarithmic time).
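To make "declarative indexing" and the logarithmic bound concrete, here is a minimal, muesli-independent sketch (the types and names are purely illustrative, not muesli's actual API): a secondary index is simply declared next to the primary storage and kept in sync on every insert, and with `Data.Map` both insert and lookup stay O(log n).

```haskell
-- Illustrative only: not muesli's API, just the shape of the idea.
-- A declared secondary index (titleIx) lives next to the documents and
-- is kept in sync on every insert; Data.Map gives O(log n) operations.
import qualified Data.Map.Strict as M

data Note = Note { noteId :: Int, noteTitle :: String }
  deriving Show

data DB = DB
  { docs    :: M.Map Int Note    -- primary storage, keyed by id
  , titleIx :: M.Map String Int  -- secondary index: title -> id
  }

emptyDB :: DB
emptyDB = DB M.empty M.empty

insertNote :: Note -> DB -> DB   -- O(log n): two Map inserts
insertNote n (DB ds ix) =
  DB (M.insert (noteId n) n ds) (M.insert (noteTitle n) (noteId n) ix)

lookupByTitle :: String -> DB -> Maybe Note  -- O(log n): two Map lookups
lookupByTitle t (DB ds ix) = M.lookup t ix >>= \i -> M.lookup i ds

main :: IO ()
main = do
  let db = insertNote (Note 2 "gc") (insertNote (Note 1 "indexing") emptyDB)
  print (lookupByTitle "indexing" db)
```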

My original hope with this design was that muesli would continue to be useful for some years even with very little activity. For me this was a learning & research project, and if I were to do this again, I would probably do it in Idris (but Idris's typechecker is extremely slow & I'm kinda waiting for E. Brady to release Idris v2, which he solemnly promised will be faster). I'd also implement proper disk-based B-trees. All of these are completely incompatible changes.

I do not have much time for this any more (I have a job). Maybe I'll do a big review & update at some point, probably when a version of GHC containing some exciting new features is released. So for any people who may use muesli, I would very much appreciate your feedback (feel free to post questions or start general discussions) or any other kind of contribution. Otherwise, I can vouch for this: I'll deal with any issues posted here, and I'll ensure the library does not break with new deps.

@mmn80, thank you for the detailed answer!
I'm not an expert at all and I have nothing to back up my comments, but I have some thoughts on this, some of which you may or may not find useful.

As for things being useful: imho, a database is the kind of technology where the practical aspect has to come first; otherwise it is not that useful. And though most of the features in the README are indeed practical, something very important is still missing. Honestly, I haven't tried muesli in practice, and maybe one of the reasons is the lack of practical experience reports and success stories, so I don't know beforehand what volumes of data it can handle in practice, how well, etc.

And currently, demand for native Haskell databases is not so big that the community picks up such projects automatically, without related articles, experience reports, and other "promotional" activity.

It would be better if the README started with the thesis you've written in your comment, to make things clearer, rather than reading the way it does now.
The README starts with "A simple document-oriented database engine for Haskell.", followed by "Use cases". So I read it as a database ready for real-world tasks, and reading the README to the end makes me think it presents itself as a mature, though badly supported, project. There is no mention of:

this was a learning & research project

And this intention is not bad at all! It just implies different kinds of project development principles and support, and that isn't clear from the README. Maybe clearer goals that emphasize the things really important to the project's maturity, for community efforts to concentrate on, plus some articles, could fuel community interest.

As for Idris, I'm wondering whether its runtime performance is adequate for such a task. If so, it will be very cool!

You are quite right about the README being misleading. When I wrote it, my intention was to also write a series of blog posts explaining the main points of the project as a whole, so I kept just some code examples in the README, so that people could form a quick impression of what using it would look like, and some points to attract people disappointed with acid-state (like myself, this being the practical reason I started the project).

I have to add that at the same time I was actually working on an RSS/Atom feed reader using muesli, which was intended exactly as the kind of "experience usage report" you mentioned. In fact, development of muesli was driven by the practical demands of feed-reader. So you can expect it to work OK with <10000 documents or so. It's certainly not for big servers, but rather for storage in smaller, potentially distributed apps like the feed reader.

Again, I was about to write all this in blog posts, but then I got high. I mean, Type Theory happened to me and my interests quickly switched to that. Shamefully, I've postponed updating the docs for years now.

The thing with Idris is that it's relatively easy to make your own runtime (the current default one has almost no engineering put into it; it's educational). You could have a custom backend that includes a fast C implementation of the primitive DB ops, but keep using the typechecker + elaborator as a query optimizer. Other people hope to make a GC-less backend for the linearly typed parts of the code. Many interesting possibilities. Idris is intended for high-performance, highly secure embedded apps, not a one-size-fits-all big language + runtime with a single implementation like Haskell.

Anyway, thanks for reminding me of the README situation. Did you need such a document DB for something in particular?

I understand.

As for the need for such a DB: to be honest, at the time I checked the project I didn't know exactly which kind of database suited my needs best. In any case, < 10000 is fewer than the planned projects are expected to work with. Currently, the decision is to go with SQLite (possibly upgrading to PostgreSQL later) and its extended JSON functionality.
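For reference, here is a minimal sketch of the SQLite-with-JSON route from Haskell, assuming the sqlite-simple package and an SQLite build with the JSON1 extension; the table and column names are made up for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch: store JSON documents as TEXT and query fields with json_extract.
-- Assumes sqlite-simple and an SQLite built with the JSON1 extension.
import Database.SQLite.Simple

main :: IO ()
main = do
  conn <- open "docs.db"
  execute_ conn "CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, body TEXT)"
  execute conn "INSERT INTO docs (body) VALUES (?)"
    (Only ("{\"title\":\"hello\",\"tags\":[\"haskell\"]}" :: String))
  titles <- query_ conn "SELECT json_extract(body, '$.title') FROM docs"
              :: IO [Only String]
  mapM_ (\(Only t) -> putStrLn t) titles
  close conn
```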


The whole Idris part strikes me as something on a fantastic scale. Having both a more mature type system and a more performant runtime than Haskell would be impressive. We'll see how well it works out in production.

The < 10000 only applies to client apps, and only because the index is loaded entirely into memory, so the initial loading time is log-linear in the number of records. I tested it with feed-reader, which is a CLI app with various commands, including one that generates random but realistic records. So I generated 10000 feed items and measured the loading time at < 100ms, which I considered acceptable. Other than that you are only limited by RAM, but the index is actually very small (for the 10000 example it used only a few tens of MB of RAM!), and millions of records could be loaded easily even on Android devices.
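In case anyone wants to reproduce that measurement, a rough timing harness like the one below is enough (base plus the time package); the database-opening call it wraps is a hypothetical placeholder and should be replaced with whatever opens the handle and loads the index.

```haskell
-- Rough timing harness; the call being measured is a placeholder.
import Data.Time.Clock (diffUTCTime, getCurrentTime)

timeIt :: String -> IO a -> IO a
timeIt label act = do
  t0 <- getCurrentTime
  r  <- act
  t1 <- getCurrentTime
  putStrLn (label ++ ": " ++ show (diffUTCTime t1 t0))
  pure r

main :: IO ()
main =
  -- replace the dummy action with the real index-loading call,
  -- e.g. timeIt "index load" (openFeedDatabase "feeds.db")   -- hypothetical
  timeIt "dummy work" (print (sum [1 .. 1000000 :: Int]))
```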

For server deployments, where the boot time does not matter that much, there is no size limit. As I explained before, all ops are logarithmic, and this invariant was the most important property I focused on, since I completely agree with you that, when talking about databases, performance is not just a nicety, but the whole point of using a DB in the first place.

Where I suspect performance would degrade is with many concurrent users. I used MVars, which are considered reasonably fast; it's just that I never tested it with many threads. If you end up testing it, please report back your results. Thanks.
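As a starting point for such a test, here is a minimal sketch of the contention pattern itself, independent of muesli: many threads serializing through a single MVar, which is roughly what sharing one handle implies. It uses base only, and the thread and operation counts are arbitrary.

```haskell
-- Contention sketch: N threads hammer one MVar-protected counter.
-- Compile with -threaded to get real parallelism between workers.
import Control.Concurrent
import Control.Monad (forM_, replicateM_)

main :: IO ()
main = do
  lock <- newMVar (0 :: Int)           -- shared state behind a single MVar
  done <- newEmptyMVar                 -- completion signals from workers
  let threads = 100
      opsEach = 1000
  forM_ [1 .. threads] $ \_ -> forkIO $ do
    replicateM_ opsEach (modifyMVar_ lock (pure . (+ 1)))
    putMVar done ()
  replicateM_ threads (takeMVar done)  -- wait for all workers to finish
  readMVar lock >>= print              -- should print threads * opsEach
```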

I'll keep this thread open, if you don't mind, so that other people can find this info.

Thank you for the info about index RAM usage; it allows me to roughly estimate resources. As for concurrency, I'm not expecting such a significant load. I will report my results once I use muesli in practice.