xKDR / CRRao.jl

Design for Linear Discriminant Analysis (LDA)

sourish-cmi opened this issue · comments

I am thinking about how we should do the Linear Discriminant Analysis (LDA) in CRRao. I am thinking out loud. Please correct me if I am saying something wrong. The design that I am thinking of is as follows:

container = @fitmodel(formula, data, modelClass,ClassificationType,CovarianceType)

Example: For binary classification:

container = @fitmodel(Species ~ PetalLength + SepalLength, data, LinearDiscriminantAnalysis(), Binary)
container = @fitmodel(Species ~ PetalLength + SepalLength, data, LinearDiscriminantAnalysis(), Binary, ShrinkageCov)
container = @fitmodel(Species ~ PetalLength + SepalLength, data, LinearDiscriminantAnalysis(), Binary, PythonCov)

Example: For multi-class classification:

container = @fitmodel(Species ~ PetalLength + SepalLength, data, LinearDiscriminantAnalysis(), Multi)
container = @fitmodel(Species ~ PetalLength + SepalLength, data, LinearDiscriminantAnalysis(), Multi, ShrinkageCov)
container = @fitmodel(Species ~ PetalLength + SepalLength, data, LinearDiscriminantAnalysis(), Multi, PythonCov)

The default covariance type would be sample covariance.
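
One way the CovarianceType argument could be wired up is through multiple dispatch. The following is only a sketch: `CovarianceType`, `SampleCov`, `estimate_cov`, and the fixed shrinkage weight `lambda` are hypothetical names chosen for illustration, not existing CRRao.jl API, and the shrinkage rule shown (blend with a scaled identity) is a simple stand-in rather than a specific published estimator.

```julia
using Statistics      # cov
using LinearAlgebra   # I, tr

# Hypothetical type hierarchy for the CovarianceType argument.
abstract type CovarianceType end

struct SampleCov <: CovarianceType end

struct ShrinkageCov <: CovarianceType
    lambda::Float64   # shrinkage weight in [0, 1]; illustrative, not data-driven
end
ShrinkageCov() = ShrinkageCov(0.1)

# Plain sample covariance (rows of X are observations).
estimate_cov(::SampleCov, X::AbstractMatrix) = cov(X)

# Shrink the sample covariance towards a scaled identity matrix.
function estimate_cov(c::ShrinkageCov, X::AbstractMatrix)
    S = cov(X)
    p = size(S, 1)
    return (1 - c.lambda) * S + c.lambda * (tr(S) / p) * I(p)
end

# Default: fall back to the sample covariance when no type is given.
estimate_cov(X::AbstractMatrix) = estimate_cov(SampleCov(), X)
```

With this layout, adding a new estimator means adding one struct and one `estimate_cov` method; the LDA code itself would not need to change.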

What's PythonCov?

What's the state of Julia packages for robust covariance matrices? Should we let the user pass a function as an argument?

  • By PythonCov, I meant Python's covariance estimation process.
  • To my understanding, Julia uses the simple sample covariance matrix by default.
  • Regarding your question "Should we let the user pass a function as an argument?": I am offering CovarianceType as one of the options, and there can be a default setting.

I was thinking that there will be a general technology for covariance matrix estimation: simple, multiple robust methods, maybe something that works for sparse matrices in big data, etc. So there should be a default (simple), but the caller should be able to supply a function that computes the covariance matrix. Alternatively, there could be a function compute.cov(X, method), and the caller to LDA should be able to supply the method.
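
As a rough illustration of the "caller supplies a function" idea, the sketch below computes the pooled within-class covariance that LDA needs, with the per-class estimator passed in as a keyword argument. The name `pooled_cov` and the keyword `method` are hypothetical; the default is `Statistics.cov`, and the `(n_k - 1)` weighting is the usual convention for the sample covariance, which may or may not be the right weighting for a robust estimator.

```julia
using Statistics   # cov

# Hypothetical helper: pooled within-class covariance for LDA.
# `method` is any function that maps an observations-by-features matrix
# to a covariance matrix; it defaults to the ordinary sample covariance.
function pooled_cov(X::AbstractMatrix, y::AbstractVector; method = cov)
    classes = unique(y)
    n, p = size(X)
    S = zeros(p, p)
    for k in classes
        Xk = X[y .== k, :]              # rows belonging to class k
        nk = size(Xk, 1)
        S .+= (nk - 1) .* method(Xk)    # weight each class by its degrees of freedom
    end
    return S ./ (n - length(classes))
end

# Usage: default estimator, then a caller-supplied one.
X = [1.0 2.0; 2.0 1.0; 3.0 5.0; 4.0 3.0; 5.0 4.0; 6.0 8.0]
y = ["a", "a", "a", "b", "b", "b"]
S_default = pooled_cov(X, y)
S_custom  = pooled_cov(X, y; method = Z -> cov(Z; corrected = false))
```

Any estimator with the same matrix-in, matrix-out shape can be dropped in, so a robust or shrinkage method from another package would not require changes to the LDA code.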

Hmm, I like both ideas.

Idea 1) covariance matrix estimation: simple, multiple robust methods, shrinkage estimation methods etc.

Idea 2) a function compute.cov(X, method), the caller to LDA should be able to supply the method.

I like the second idea with a default robust method of R.

@ajaynshah @ayushpatnaikgit @codetalker7 @ShouvikGhosh2048

I am struggling to decide: should we use MultivariateStats.jl for LDA, and/or should we use Aman's from-scratch Julia code for LDA?

Ayush's point: if we rely on too many packages, then some people will never be able to use CRRao because some package will be broken.

On the other hand, why worry? We are going to rely on lazy loading anyway.

Requesting your comments. I now want to move on to LDA development for CRRao.

For now, I am thinking about developing the LDA with Aman's code, which is faster than R and Python's sklearn but slower than MultivariateStats.jl.

Once MultivariateStats.jl becomes stable, we can later adopt its LDA as a fast option.

@ajaynshah @ayushpatnaikgit

We will raise an issue with MultivariateStats.jl that predict is not working. If they provide a solution, we will go ahead and use it in CRRao.jl.

Otherwise, we will contribute a fix to MultivariateStats.jl.

Yes, great, let's put all our knowledge on LDA to work to make MultivariateStats.jl stronger. And then in CRRao we will just call that LDA. Let's do the usual hard work:

  • A textual narration of what's wrong with the existing code
  • Test cases that demonstrate the problem
  • A PR that solves it

so that it gets rapidly accepted into the main package.

I have created this issue with MultivariateStats.jl:

JuliaStats/MultivariateStats.jl#204

Basically, I said that predict for MulticlassLDA is not working.