kboyd / Roc

Everything ROC and Precision-Recall curves.

Better name for core class

afbarnard opened this issue

I've been working on improving the documentation, and I think we need a better name for the core class, CurveData. A better name would communicate the class's purpose, which is to use a ranking to compute ROC and PR curves and statistics. It is not just data. It is not a curve. What do you think @kboyd, @finnkuusisto? Any ideas?

The best I can come up with so far is CurveRepresentation but I believe a much better name exists. What about showing our penchant for humor and calling it CurveDaddy?

I think it is important to take care of this prior to the first release, self-documentation and all.

What about CurveSource or CurveGenerator?

  • Captures the ideas that the object is not itself a curve and that it can generate curves.
  • Raises questions about multiple curves.

Maybe plain, old "Curve" is better?

  • Ignores the distinction between the actual points of a curve and the abstract concept of something that can calculate the points. This can be mitigated by context and by being explicit whenever we mean the points themselves.
  • Is much more clearly the center of attention.
  • Thinking about multiple curves/aggregates is more natural.

I had also thought of "Ranking" but @kboyd had a counterexample that I don't remember.

  • Expresses more precisely what the class is but requires more understanding (i.e. that curves are generated from a ranking).
  • Not clearly a central class in a library about curves.

@kboyd and I were talking about design the other day, and the way R handles linear models came up, which led us to consider modeling curves. So what about 'CurveModel' as the name for the core class?

'CurveModel' sounds good, certainly better than 'CurveData' for what it does. Also makes it more natural to subclass for different estimation/model types such as fitting a binormal ROC curve.
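To make the subclassing idea concrete, here is a minimal sketch in Python. It is not the library's code: `CurveModel`, `BinormalCurveModel`, `tpr_at`, and the parameters `a` and `b` are hypothetical names chosen for illustration. It assumes the standard binormal form TPR = Φ(a + b·Φ⁻¹(FPR)) with b > 0, and it takes the parameters as given rather than fitting them from data.

```python
from statistics import NormalDist


class CurveModel:
    """Hypothetical base class: anything that can report TPR at a given FPR."""

    def tpr_at(self, fpr: float) -> float:
        raise NotImplementedError


class BinormalCurveModel(CurveModel):
    """Hypothetical subclass for a binormal ROC curve.

    Assumes TPR = Phi(a + b * Phi^-1(FPR)) with b > 0. The parameters are
    supplied by the caller; estimating them from data is out of scope here.
    """

    def __init__(self, a: float, b: float):
        if b <= 0:
            raise ValueError("b must be positive")
        self.a = a
        self.b = b

    def tpr_at(self, fpr: float) -> float:
        # inv_cdf is undefined at 0 and 1, so return the limiting values there.
        if fpr <= 0.0:
            return 0.0
        if fpr >= 1.0:
            return 1.0
        std = NormalDist()  # standard normal: mean 0, standard deviation 1
        return std.cdf(self.a + self.b * std.inv_cdf(fpr))
```

With a = 0 and b = 1 this reduces to the chance diagonal, which is a quick sanity check on the formula.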

I am in favor of "Curve" until the complexity of things suggests otherwise. Will "CurveModel" help us understand the design/code at this point? We may eventually want both a "Curve" and a "CurveModel". At this point I just want to keep the API simple and make a release.

'Curve' makes it sound like each instance defines a general curve and nothing more. But there is more going on: the curves are generated from ranking data (not the typical function used to define a curve), and both ROC and PR curves can be obtained from an instance.
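For readers following along, the point about one instance yielding both curves can be sketched as below. This is an illustration in Python, not the library's API: `RankingCurves`, `roc_points`, and `pr_points` are made-up names. It assumes the input is a list of 0/1 labels already sorted from highest to lowest score, with at least one positive and one negative, and it ignores tied scores.

```python
from typing import Iterator, List, Tuple


class RankingCurves:
    """Hypothetical stand-in for the core class: holds a ranking, yields curves."""

    def __init__(self, ranked_labels: List[int]):
        # Labels ordered from highest to lowest score; 1 = positive, 0 = negative.
        self.labels = list(ranked_labels)
        self.n_pos = sum(self.labels)
        self.n_neg = len(self.labels) - self.n_pos
        if self.n_pos == 0 or self.n_neg == 0:
            raise ValueError("need at least one positive and one negative")

    def _counts(self) -> Iterator[Tuple[int, int]]:
        # Walk down the ranking, moving the threshold one item at a time,
        # and report (true positives, false positives) at each threshold.
        tp = fp = 0
        yield tp, fp
        for label in self.labels:
            if label == 1:
                tp += 1
            else:
                fp += 1
            yield tp, fp

    def roc_points(self) -> List[Tuple[float, float]]:
        # (false positive rate, true positive rate) at each threshold.
        return [(fp / self.n_neg, tp / self.n_pos) for tp, fp in self._counts()]

    def pr_points(self) -> List[Tuple[float, float]]:
        # (recall, precision); skip the empty threshold where precision is undefined.
        return [
            (tp / self.n_pos, tp / (tp + fp))
            for tp, fp in self._counts()
            if tp + fp > 0
        ]
```

The same confusion-matrix counts feed both methods, which is why neither "data" nor "curve" quite describes the object.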

But perhaps it is clear from the project context that any curves are ROC/PR curves and not general mathematical functions?

Probably true. I don't have a strong preference either way, so if you prefer 'Curve', let's use that.

OK, at least for now (version 0.1.0) we'll go with "Curve". We'll see if there are better names as we do design down the road. I don't mind changing names, especially if the names increase clarity/documentation. "CurveModel" is a good name so we'll keep it in our pocket for later.

BTW, I felt that this was a good and productive discussion. So thanks.

Resolved in commit 4915a58.

This has been bugging me again. "Curve" just is not descriptive enough. "Ranking"? "CurveModel"? "ConfusionMatrixRanking"? "ClassificationThresholds"? "ClassificationAnalysis"? Surely there's something better?

I agree "Curve" is not descriptive and is a bit misleading. I think it would be useful to have "Ranking" in the name. So I'd lean towards "Ranking" for simplicity, or else "ConfusionMatrixRanking" or "PredictionRanking" for something more specific.