pageboy-za / data-science-synonyms

Help wading through the jargon

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Because many data analysis methods developed relatively independently in statistics, machine learning, biology, physics, chemistry, economics, psychology, engineering fields, and more, there is a lot of overlapping terminology.

This is an an attempt to sort it out.

###NOTES:

  • When terms are "near synonyms," it seems like it is v. educational to describe the difference.
  • It could be interesting to scrape departmental course web pages to see what type of terminology is used in different disciplines.
  • Maybe instead of an "all-encompassing" thesaurus, could write blog posts that talk about specific interesting cases.
  • Writing specific blog posts might be a good strategy even if there is a central tool.
  • [LRS] Have been envisioning very similar things for a while. See bottom for more and sketch.

I don't know the best way to structure this, for now I'm just going to jot some notes here.

  • False Positive (ML) = Type 1 Error (Statistics)
  • False Negative (ML) = Type 2 Error (Statistics)
  • Recall (Information Science) = Sensitivity (Medicine/Biology) = True Positive Rate (ML)
  • Precision (Information Science) = Positive Predictive Value (ML)
  • True Negative Rate (ML) = Specificity (Medicine/Biology)
  • Gaussian Distribution (Physics) = Normal Distribution (Statistics) = Bell Curve (Colloquial)
  • Hypothesis (Statistics) = Model (ML)
  • Marginal Likelihood (Frequentist) = Evidence (Bayesian)

In the above, Precision is not the same as the contrast between Accuracy and Precision in measurement:

  • Precision (Engineering) = Variance (Statitics) = Reliability (Psychometrics) = Variable Error
  • Accuracy (Engineering) = Bias (Statistics) = Validity (Psychometrics) = Constant Error

Lots of good ones for stochastics optimization problems!

  • Rough Landscape (Stochastic Optimization) = NP Complete (Computer Science)
  • Cost Function (Economics) = Loss Function (Statistics) = Utility (Economics) = Objective Function (Operations Management) = Reward Function () = Energy (Physics/Chemistry) = Fitness (Evolutionary Biology)
  • Search Space () = Fitness Landscape (Biology)
  • Reinforcement Learning (ML) = Approximate Dynamic Programming (Operations Management)

####These are not exactly equivalent, but are extremely closely related. NOTE: How could we illustrate that?

  • Structural Equation Model (Statistics)

  • Latent Variable Model () = Hidden Variable Model (Physics) = Bayesian Network (ML)

  • Independent Variables (Statistics) = Regressors (Statistics) = Explanatory Variables (Statistics) = Exogenous Variables (stats/statsmodels) = design (statistics) = Exposure Variable (Reliability Engineering) = Risk Factor (Medicine) = Feature (ML) = Input Variable (Engineering)

  • Dependent Variable (Statistics) = Response Variable (Statistics) = Regressand (Statistics) = Endogenous Variable (stats/statsmodels) = Outcome Variable (Medicine) = Output Variable (Engineering)

  • Hidden Variable () = Latent Variable () = Confounding Variable () --- seems like it's confounding when it's not there on purpose :)

  • Categorical Variable (Statistics) = Enumerated Type (Computer Science)

    • Binary Variable () = Dichotomous Variable ()

chart: Latent Variable Model nice chart on Wikipedia is a superset of all these things:

  • Factor Analysis ~= Principal Component Analysis --- seems like the difference between Factor Analysis and PCA is the assumptions made
  • Latent Trait Analysis = Item Response Theory
  • Latent Profile Analysis = Mixture Model
  • Latent Class Analysis

Mexican Hats

  • Difference of Gaussians = Mexican Hat function (really old, awesome American physicists)
  • Ricker Wavelet = Mexican hat wavelet/function (anyone else who says Mexican Hat function)
    But seriously, people. Let's not bicker over slivers of differences. We aren't launching spacecraft, here.

credit: JustinWick - Originally uploaded to en.wikipedia. Generated using Mathematica 5.0.


###[LRS note] -- This is my related but ginormo-scale thing that's been rolling around in my head for a while. Since I am visual/spatial, I was imagining more of a clicky thing where you could wander around. Maybe feed it some scraped & analyzed stuff, but allow plenty of expert refining, etc. My critical part is that you could add a link between things, and set (or vote on?) the strength or weakness of the links. Links would have a type (or strength) and some text, so you could have a place to explain what the differences are.

Here's my doodle:

About

Help wading through the jargon

License:MIT License