NCEAS / recordr

Provenance tracking for R.

Add ability to 'prune' a lineage trace

gothub opened this issue

When creating a lineage trace in the 'forward' direction, the trace can encounter multiple branches
that are repeated iterations of the same program. In the diagram below, plot.R has been called 3 times, so hourly-temp-clean.csv has been read by 3 separate executions.

I propose that plotRuns(), and the traceRuns() method that it calls, take an additional
argument that causes the tracing algorithm to follow only the most recent branch in this situation.
Branches are 'redundant' when separate executions running the same script read the same file. In the example graph below, only one invocation of plot.R
would appear.

The new argument could specify which branches are pruned, for example:

  • prune="older": would cause the latest branch to survive
  • prune="newer": would cause the oldest branch to survive
  • prune="none": would cause all branches to survive

The default would be prune="older"
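
The selection rule above could be sketched roughly as follows. This is only an illustration of the proposed semantics, not recordr code: pruneBranches(), the column names (executionId, script, inputFile, startTime) and the sample timestamps are all hypothetical, and it assumes the trace can be flattened into one row per execution/file read.

```r
# Hypothetical helper, NOT part of recordr: drop redundant branches from a
# table with one row per (execution, file read) pair.
pruneBranches <- function(runs, prune = c("older", "newer", "none")) {
  prune <- match.arg(prune)
  if (prune == "none" || nrow(runs) == 0) return(runs)

  # Executions are redundant when they ran the same script and read the same file.
  key <- paste(runs$script, runs$inputFile, sep = "|")

  # prune = "older" keeps the latest start time in each group,
  # prune = "newer" keeps the earliest.
  pick <- if (prune == "older") max else min
  started <- as.numeric(runs$startTime)
  survivor <- ave(started, key, FUN = pick)

  runs[started == survivor, , drop = FALSE]
}

# Made-up example data mirroring the three plot.R runs described in this issue.
runs <- data.frame(
  executionId = c("run1", "run2", "run3"),
  script      = "plot.R",
  inputFile   = "hourly-temp-clean.csv",
  startTime   = as.POSIXct(c("2015-06-01 10:00:00",
                             "2015-06-02 10:00:00",
                             "2015-06-03 10:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

pruneBranches(runs)$executionId                    # "run3"
pruneBranches(runs, prune = "newer")$executionId   # "run1"
nrow(pruneBranches(runs, prune = "none"))          # 3
```

Executions that share exactly the same start time would all survive in this sketch; a real implementation would need a tie-breaking rule.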

Note that pruning is not required if tracing is done only in the 'backward' direction.

[Image: rplot01, the example lineage graph]