NCEAS / recordr

Provenance tracking for R.



Ensure that searching through all recorded runs is scalable / performant

gothub opened this issue

Currently recordr stores information for each run in a separate directory, so methods such as
listRuns() must read the files of every run to find the ones to list. We should evaluate whether
storing indexing info in a database is necessary and how much work it would take to implement.
Only the items needed to search for a run would be stored in the db; no provenance info would be
stored there. Currently listRuns() can use these items to find matching runs:

  • start: start time of execution
  • end: end time of execution
  • tag: a descriptive string entered by the user to identify a particular run
  • error: whether an error occurred during this execution
  • seq: a sequence number automatically assigned to each run
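The proposal above amounts to a small index table that holds only the searchable fields, with listRuns()-style filtering pushed into SQL instead of a scan over every run directory. recordr is an R package, so this Python/sqlite3 version is only an illustration of the idea: the table name runs, the column names start_time/end_time, the ISO-8601 time format and the substring match on tag are assumptions, not recordr's actual schema or behavior.

```python
import sqlite3

def open_index(path=":memory:"):
    """Create (if needed) a hypothetical run-index table and return a connection."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS runs (
            seq        INTEGER PRIMARY KEY,        -- sequence number assigned to each run
            start_time TEXT NOT NULL,              -- ISO-8601, so text order == time order
            end_time   TEXT,
            tag        TEXT,                       -- descriptive string entered by the user
            error      INTEGER NOT NULL DEFAULT 0  -- 1 if the run raised an error
        );
        CREATE INDEX IF NOT EXISTS runs_start_idx ON runs(start_time);
    """)
    return conn

def list_runs(conn, start=None, end=None, tag=None, error=None, seq=None):
    """Return (seq, start_time, end_time, tag, error) rows matching every given filter."""
    clauses, params = [], []
    if start is not None:
        clauses.append("start_time >= ?")
        params.append(start)
    if end is not None:
        clauses.append("end_time <= ?")
        params.append(end)
    if tag is not None:
        clauses.append("tag LIKE ?")  # hypothetical: substring match on the tag
        params.append(f"%{tag}%")
    if error is not None:
        clauses.append("error = ?")
        params.append(int(error))
    if seq is not None:
        clauses.append("seq = ?")
        params.append(seq)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = "SELECT seq, start_time, end_time, tag, error FROM runs" + where + " ORDER BY seq"
    return conn.execute(sql, params).fetchall()

# Usage with made-up sample runs.
conn = open_index()
conn.executemany(
    "INSERT INTO runs (seq, start_time, end_time, tag, error) VALUES (?, ?, ?, ?, ?)",
    [(1, "2024-01-01T10:00:00", "2024-01-01T10:05:00", "first test", 0),
     (2, "2024-01-02T09:00:00", "2024-01-02T09:01:00", "model run", 1),
     (3, "2024-01-03T11:00:00", "2024-01-03T11:30:00", "model run v2", 0)])
print(list_runs(conn, tag="model", error=False))
# [(3, '2024-01-03T11:00:00', '2024-01-03T11:30:00', 'model run v2', 0)]
```

Storing times as ISO-8601 text keeps string comparison consistent with chronological order, so the start_time index can serve range filters. A leading-wildcard LIKE on tag cannot use an index and scans the table, which is typically still much cheaper than opening the files of every run.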

Execution metadata is now stored in an SQLite database, so listRuns(), viewRun() and deleteRuns()
now find runs by querying the database. All other data is still stored in the per-run directories:
each execution has a uniquely named directory that contains a serialized version of the
datapackage, etc.

This change was made in commit f61eff5.
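With searchable metadata in SQLite and everything else in a uniquely named directory per run, a deleteRuns()-style operation has to update both stores. The Python sketch below only illustrates that idea and is not recordr's R code: the runs table, its run_dir column linking a row to its directory, matching on an exact tag, and the run-N directory names are all hypothetical.

```python
import os
import shutil
import sqlite3
import tempfile

def delete_runs(conn, base_dir, tag):
    """Delete every run whose tag equals `tag`; return the deleted sequence numbers.

    Hypothetical layout: each row of the `runs` table stores the name of the
    run's directory (run_dir) under base_dir, where the serialized
    datapackage and other per-run files live.
    """
    # 1. Find matching runs with a query instead of scanning run directories.
    rows = conn.execute(
        "SELECT seq, run_dir FROM runs WHERE tag = ? ORDER BY seq", (tag,)
    ).fetchall()
    # 2. Drop the index rows first, in one transaction, so a listing never
    #    returns a run whose directory has already been removed.
    with conn:
        conn.executemany("DELETE FROM runs WHERE seq = ?", [(seq,) for seq, _ in rows])
    # 3. Remove the per-run directories; a leftover directory is harmless
    #    because the index no longer references it.
    for _, run_dir in rows:
        shutil.rmtree(os.path.join(base_dir, run_dir), ignore_errors=True)
    return [seq for seq, _ in rows]

# Usage with a throwaway directory and an in-memory database.
base = tempfile.mkdtemp()
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (seq INTEGER PRIMARY KEY, run_dir TEXT NOT NULL, tag TEXT)")
for seq, tag in [(1, "scratch"), (2, "keep"), (3, "scratch")]:
    run_dir = f"run-{seq}"  # hypothetical naming scheme
    os.mkdir(os.path.join(base, run_dir))
    conn.execute("INSERT INTO runs VALUES (?, ?, ?)", (seq, run_dir, tag))
conn.commit()
print(delete_runs(conn, base, "scratch"))  # [1, 3]
print(sorted(os.listdir(base)))            # ['run-2']
shutil.rmtree(base)
```

Deleting the rows before the directories is one possible ordering: if directory removal fails part-way, the worst case is an orphaned directory rather than an index entry that points at missing files.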