Persistence/serialization: reduce to using one data format

Question

rht opened this issue 8 years ago

Currently, the complete logging data is stored as a combination of an SQLite file (slow because of synchronous writes to the filesystem) plus CSV files for the panel and aggregate data. It would be simpler to use just one format (one file) that can be readily ported to, and parsed by, other platforms.

The fastest option out there is ujson [1], though with JSON it is not possible to append or modify an entry without loading the entire file into memory.

[1] https://blog.hartleybrody.com/python-serialize/
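
To illustrate the append limitation mentioned above: a single JSON document has to be parsed and rewritten in full to add one entry, whereas a line-delimited layout (one JSON object per line) can be appended to in place. The following is a minimal sketch using the standard-library `json` module (ujson offers the same `dumps`/`loads` calls). The file names and record fields are made up for illustration, and the line-delimited layout is only a possible workaround, not something abce is known to do.

```python
import json
import os
import tempfile

def append_to_json_array(path, record):
    """Single JSON document: load everything, modify, rewrite everything."""
    if os.path.exists(path):
        with open(path) as f:
            records = json.load(f)
    else:
        records = []
    records.append(record)
    with open(path, "w") as f:
        json.dump(records, f)

def append_json_line(path, record):
    """Line-delimited JSON: append one line, nothing is loaded."""
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

def read_json_lines(path):
    """Parse a line-delimited file back into a list of records."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

# Hypothetical log entries, written with both strategies.
workdir = tempfile.mkdtemp()
array_path = os.path.join(workdir, "log.json")
lines_path = os.path.join(workdir, "log.jsonl")
for round_no in range(3):
    entry = {"round": round_no, "agent": "firm", "money": 10.0 * round_no}
    append_to_json_array(array_path, entry)
    append_json_line(lines_path, entry)
```

Both files end up holding the same records, but the first strategy rereads and rewrites the whole file on every append, so its cost grows with the size of the log.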

Davoud Taghawi-Nejad · Answer 1 · Sat Oct 29 2016 02:52:27 GMT+0800 (China Standard Time)

Good idea, as long as at the end there is a file that is easily importable into R and Excel.

rht · Answer 2 · Sat Oct 29 2016 03:24:36 GMT+0800 (China Standard Time)

Well, R reads JSON just fine (and so, surely, do other ABM frameworks). Such a key-value JSON file can easily be converted to xls as necessary.

Note that, after a quick measurement, I found that logging (with synchronous writes to the db) takes about 16% of the simulation time.

rht · Answer 3 · Sat Oct 29 2016 10:30:50 GMT+0800 (China Standard Time)

So, should the default serialization format be JSON, with a postprocessing to_xls() implemented on top?
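
A rough sketch of what such a postprocessing step could look like. It writes CSV rather than xls, because CSV opens in both R and Excel and needs only the standard library. The function name `json_log_to_csv`, the file layout (a JSON array of flat key-value records) and the field names are assumptions for illustration, not part of abce.

```python
import csv
import json
import os
import tempfile

def json_log_to_csv(json_path, csv_path):
    """Convert a JSON array of flat key-value records into a CSV file."""
    with open(json_path) as f:
        records = json.load(f)
    # Union of all keys, in first-seen order, so sparse records still fit.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(csv_path, "w", newline="") as f:
        # restval fills in columns that a given record does not have.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
    return fieldnames

# Hypothetical panel data; the second record has an extra "goods" field.
workdir = tempfile.mkdtemp()
json_path = os.path.join(workdir, "panel.json")
csv_path = os.path.join(workdir, "panel.csv")
with open(json_path, "w") as f:
    json.dump([{"round": 0, "money": 5.0},
               {"round": 1, "money": 7.5, "goods": 2}], f)
columns = json_log_to_csv(json_path, csv_path)
```

Records that lack a column get an empty cell, which R and Excel both read as a missing value.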

Davoud Taghawi-Nejad · Answer 4 · Tue Sep 12 2017 00:36:06 GMT+0800 (China Standard Time)

An additional constraint is that it must all be pypy3-compatible.

Davoud Taghawi-Nejad · Answer 5 · Fri Nov 17 2017 21:44:59 GMT+0800 (China Standard Time)

I think this is resolved. SQLite is not synchronized. SQLite has not been removed, as pandas is not an option. It is all unified and simplified.
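
For readers wondering what unsynchronized SQLite writes look like in practice, here is a minimal sketch with the standard-library `sqlite3` module. `PRAGMA synchronous = OFF` is a real SQLite setting that stops SQLite from waiting for the operating system to flush data to disk. Whether abce uses exactly this pragma is an assumption, and the table and column names are invented.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "log.db")
conn = sqlite3.connect(db_path)
# Do not wait for the OS to flush each write to disk (faster, less crash-safe).
conn.execute("PRAGMA synchronous = OFF")
conn.execute("CREATE TABLE panel (round INTEGER, agent TEXT, money REAL)")

# Batch the inserts in one transaction instead of committing per row.
rows = [(r, "firm", 10.0 * r) for r in range(1000)]
with conn:
    conn.executemany("INSERT INTO panel VALUES (?, ?, ?)", rows)

# Read back the setting (0 means OFF) and the number of stored rows.
synchronous_mode = conn.execute("PRAGMA synchronous").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM panel").fetchone()[0]
conn.close()
```

The trade-off is durability: with synchronization off, a power loss or OS crash during a run can corrupt the database, which may be acceptable for simulation logs that can be regenerated.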

Davoud Taghawi-Nejad · Answer 6 · Fri Nov 17 2017 21:52:39 GMT+0800 (China Standard Time)

Actually, the basic layout could be rewritten so that it gets the SQLite database from memory and does not read the CSV files. Generating the CSV files could then be made optional.
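
A sketch of that idea: keep the log in an in-memory SQLite database, let the consumer query it directly, and write CSV only on request. The function names `build_log` and `export_csv`, the `panel` table and its columns are hypothetical. `sqlite3.connect(":memory:")` is the standard way to open an in-memory database.

```python
import csv
import io
import sqlite3

def build_log():
    """Create an in-memory database holding a small example panel table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE panel (round INTEGER, agent TEXT, money REAL)")
    conn.executemany(
        "INSERT INTO panel VALUES (?, ?, ?)",
        [(0, "firm", 5.0), (1, "firm", 7.5), (1, "household", 3.0)],
    )
    conn.commit()
    return conn

def export_csv(conn, table, out):
    """Optional step: dump one table to a CSV stream (file or StringIO)."""
    # Table names cannot be bound as parameters, so only pass trusted names.
    cursor = conn.execute("SELECT * FROM {}".format(table))
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)

conn = build_log()
# A plotting layer could read straight from the connection, no CSV needed.
money_by_round = conn.execute(
    "SELECT round, SUM(money) FROM panel GROUP BY round ORDER BY round"
).fetchall()

buffer = io.StringIO()
export_csv(conn, "panel", buffer)  # only when a CSV file is actually wanted
```

Everything here is standard library, so it would also satisfy the pypy3 constraint mentioned earlier in the thread.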