TeselaGen / fsml.org

A BioMADE Collaboration Project

Home page: https://fsml.org

Find or generate a large dataset to test CLI tool exporting

tgadam opened this issue.

Find or generate a large dataset to test exporting with the CLI tool. See #74 (comment).

We might use the Phycus randomized (and extended) dataset for this.
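If a suitable public dataset is not available, a large test file can also be generated synthetically. The sketch below is a minimal, hypothetical generator: the column names (`sample_id`, `strain`, `timepoint_h`, `od600`, `titer_mg_l`) and value ranges are assumptions for illustration only and do not reflect the actual Phycus schema. It uses a seeded random generator so that test runs are reproducible.

```python
import csv
import io
import random

# Hypothetical column layout; the real Phycus dataset schema may differ.
COLUMNS = ["sample_id", "strain", "timepoint_h", "od600", "titer_mg_l"]


def generate_dataset(n_rows: int, seed: int = 0) -> str:
    """Return a CSV string with a header row followed by n_rows of random data."""
    rng = random.Random(seed)  # seeded so the same dataset is produced on every run
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLUMNS)
    for i in range(n_rows):
        writer.writerow([
            f"S{i:06d}",                                      # unique sample identifier
            rng.choice(["strain_A", "strain_B", "strain_C"]),  # categorical column
            rng.randint(0, 72),                                # integer column
            round(rng.uniform(0.0, 2.5), 3),                   # float column
            round(rng.uniform(0.0, 500.0), 2),                 # float column
        ])
    return buf.getvalue()
```

To produce a file comparable in size to the one tested below, write `generate_dataset(17_000)` to disk (opening the file with `newline=""`) and point the CLI manifest generator at it; increasing `n_rows` scales the test further.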

So far we've tested the CLI manifest generator with a 17,000-row data file. This experiment produced the following findings, depending on the output file format:

  • JSON:

    • the manifest generation is pretty fast
    • the manifest file is around 15x larger than the CSV data file, which is expected, since a CSV file carries much less information than the FSML manifest.
  • YAML:

    • the manifest generation is much slower than with JSON
    • the manifest file is around 8x larger than the CSV data file

In short, JSON is faster but produces larger files, whereas YAML is slower but produces smaller files.
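To repeat this comparison consistently on larger datasets, a small harness that records serialization time and the output-to-input size ratio could help. The sketch below is hypothetical: `measure_export` and `to_json` are illustrative names, and `to_json` is only a stand-in that dumps the CSV rows as indented JSON. It is not the FSML manifest format, so its ratios will not match the 15x/8x figures above. It measures in-memory serialization only, not the CLI tool itself.

```python
import csv
import io
import json
import time
from typing import Callable, Dict, List, Tuple

Rows = List[Dict[str, str]]


def measure_export(csv_text: str, serialize: Callable[[Rows], str]) -> Tuple[float, float]:
    """Serialize the CSV rows; return (elapsed seconds, output/input size ratio in bytes)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    start = time.perf_counter()
    output = serialize(rows)  # only the serialization step is timed, not CSV parsing
    elapsed = time.perf_counter() - start
    ratio = len(output.encode("utf-8")) / len(csv_text.encode("utf-8"))
    return elapsed, ratio


def to_json(rows: Rows) -> str:
    # Stand-in serializer: indented JSON of the raw rows, NOT an FSML manifest.
    return json.dumps({"rows": rows}, indent=2)
```

A YAML variant can be passed in the same way if PyYAML is installed, for example `measure_export(csv_text, lambda rows: yaml.safe_dump({"rows": rows}))`, which makes it easy to compare both formats on the same input.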

Bottom line: to achieve unlimited scalability, we might need to start exploring this other ticket: