tonybaloney / wily

A Python application for tracking, reporting on timing and complexity in Python code

Windows: aggregated graphs only show data for modified/added files

devdanzin opened this issue

When building, wily only calculates metrics for modified or added files. This works well for individual files, but gives erroneous results for aggregated metrics on directories, because each data point comes only from the files that changed in that revision. For LOC, for example, values fluctuate a lot because what's being counted is the LOC of changed files: commits touching many files give high values and small commits give low values.

To Reproduce

  1. Check out wily's repo
  2. wily build -n 200
  3. wily graph src/ --aggregate loc --all
  4. See a graph similar to this one below, where wild variations of a metric occur:
    aggregated_unpatched

Expected behavior
An aggregated metric should probably account for all files instead of only changed/added ones. That way, the values would reflect the history of the metric for the whole repository, instead of for the files changed in each commit.
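
To make the difference concrete, here is a toy sketch (not wily's code) comparing the two ways of aggregating. The file names, LOC values and both helper functions are made up for illustration; each snapshot is assumed to hold the LOC of every tracked file at that revision.

```python
# Hypothetical data: LOC of every tracked file at three consecutive revisions.
snapshots = [
    {"a.py": 100, "b.py": 50, "c.py": 30},
    {"a.py": 110, "b.py": 50, "c.py": 30},
    {"a.py": 110, "b.py": 55, "c.py": 35},
]

# Hypothetical data: files added or modified by each of those revisions.
changed = [
    {"a.py", "b.py", "c.py"},  # initial commit touches everything
    {"a.py"},                  # small commit
    {"b.py", "c.py"},
]


def aggregate_changed_only(snapshots, changed):
    """Sum the metric over the files touched by each revision only."""
    return [
        sum(snapshot[path] for path in files)
        for snapshot, files in zip(snapshots, changed)
    ]


def aggregate_all_files(snapshots):
    """Sum the metric over every tracked file at each revision."""
    return [sum(snapshot.values()) for snapshot in snapshots]


print(aggregate_changed_only(snapshots, changed))  # [180, 110, 90]
print(aggregate_all_files(snapshots))              # [180, 190, 200]
```

The first series drops even though the code base grows, which matches the kind of fluctuation in the graph above; the second follows the actual size of the directory.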

By patching wily to calculate metrics for all tracked files, we get a much more consistent result (with drawbacks pointed out below):
aggregated_patched

There are issues that make the simplest patch not ready to present. First, it makes building much slower (250s -> 450s to build 200 revisions of wily's repo), though hopefully some of that can be mitigated by caching (edit: indeed it can, but see below). Second, there are some artifacts I can't explain yet:

aggregated_patched_artifact

Desktop

  • OS: Windows
  • Browser: Firefox
  • Version: 1.24.2

Edited to add: caching brings time down to almost the same as only processing modified files (250s -> 280s), but it has to be applied to radon functions. We could either recreate part of radon's machinery with added caching (may be a lot of code) or try to monkey-patch radon with caching versions of analyze(), h_visit(), mi_parameters(), mi_visit() and cc_visit(). However, now I think it would be easier and cleaner to keep a running tally of metrics for all files, adding, removing and updating files as needed from changes in commits.
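
A minimal sketch of what such a running tally could look like, assuming an additive metric such as LOC. `RunningTally` and `apply_commit` are hypothetical names, not part of wily or radon, and the per-file values would in practice come from whatever operator computes the metric for the changed files.

```python
class RunningTally:
    """Keep per-file metric values and their sum across commits.

    Hypothetical helper: only files touched by a commit are re-measured,
    while untouched files keep contributing their last known value.
    """

    def __init__(self):
        self.per_file = {}  # path -> last known metric value
        self.total = 0      # sum over all currently tracked files

    def apply_commit(self, added_or_modified, removed=()):
        """Update the tally with one commit and return the new total."""
        for path, value in added_or_modified.items():
            # Replace the old contribution (0 for new files) with the new one.
            self.total += value - self.per_file.get(path, 0)
            self.per_file[path] = value
        for path in removed:
            # Deleted files stop contributing to the aggregate.
            self.total -= self.per_file.pop(path, 0)
        return self.total


tally = RunningTally()
history = [
    tally.apply_commit({"a.py": 100, "b.py": 50}),
    tally.apply_commit({"a.py": 110}),
    tally.apply_commit({"c.py": 30}, removed=["b.py"]),
]
print(history)  # [150, 160, 140]
```

This only works as-is for metrics that can be summed. Aggregates such as a mean, or a maintainability index, would need the per-file values (which the tally keeps in `per_file`) to be recombined differently.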