unslothai/hyperlearn

2-2000x faster ML algos, 50% less memory usage, works on all hardware - new and old.

Home Page: https://hyperlearn.readthedocs.io/en/latest/index.html

**IMPORTANT: On Contributing to HyperLearn + Note to Contributors**

danielhanchen opened this issue

Hey Contributor!

Thanks for checking out HyperLearn!! Super appreciate it.

Since the package is new (it only started around August 27th), Issues are the best place to start helping out; you can also check out the Projects tab, where there's a whole list of stuff I envision completing.

Also, if you have a NEW idea: please open an issue and label it as an enhancement.

In terms of priorities, I want to start from the bottom up, so that all functions get faster; that means:

  1. Since Singular Value Decomposition is the backbone of nearly all linear algos (PCA, Linear and Ridge Regression, LDA, QDA, LSI, Partial Least Squares, etc.), we need to focus on making SVD faster! (Also linear solvers.) See the SVD sketch after this list.

  2. Once SVD optimization is OK, slowly creep into Least Squares / L1 solvers. These need to be done before the other algorithms so that the speedups become apparent.

  3. If Numba code is used, it needs to be PRE-COMPILED to save time, or else we wait a whopping 2-3 seconds of JIT compilation before the first call in every session... (see the Numba pre-compilation sketch after this list)

  4. Then, focus on Linear Algorithms, including but not limited to:

  • Linear Regression
  • Ridge Regression
  • SVD Solving, QR Solving, Cholesky Solving for backend
  • Linear Discriminant Analysis
  • Quadratic Discriminant Analysis
  • Partial SVD (Truncated SVD --> maybe use Facebook's PCA? / Gensim's LSI? --> try not to use ARPACK's svds...)
  • Full Principal Component Analysis (complete SVD decomp)
  • Partial PCA (Truncated SVD)
  • Canonical Correlation Analysis
  • Partial Least Squares
  • Spline Regression based on Least Squares (will talk more later on this)
  • Correlation Regression (will talk more later on this)
  • Outlier Tolerant Regression (will talk more later on this)
  • SGDRegressor / SGDClassifier easily through PyTorch (see the PyTorch sketch after this list)
  • Batch Least Squares? --> Closed Form soln + averaging
  • Others...
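
To make point 1 concrete, here is a minimal sketch of why SVD is the backbone (my own illustration, not HyperLearn's actual code; `svd_solve` is a hypothetical name): once you have a fast SVD, ordinary least squares is just the pseudo-inverse applied to y.

```python
import numpy as np

def svd_solve(X, y, rcond=1e-15):
    """Solve min ||Xb - y||_2 via the thin SVD pseudo-inverse."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)   # thin SVD: X = U S V^T
    # Zero out tiny singular values so rank-deficient X stays stable.
    S_inv = np.where(S > rcond * S.max(), 1.0 / S, 0.0)
    # b = V S^+ U^T y  (the Moore-Penrose least-squares solution)
    return Vt.T @ (S_inv * (U.T @ y))

X = np.random.randn(100, 5)
b_true = np.arange(1.0, 6.0)
y = X @ b_true + 0.01 * np.random.randn(100)
print(svd_solve(X, y))   # ~ [1, 2, 3, 4, 5]
```

The same `U, S, Vt` factors feed PCA, LDA, and Ridge with almost no extra work, which is why a faster SVD speeds up everything downstream.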
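And for point 3, a minimal sketch of what pre-compiling Numba code looks like (assuming Numba is installed; `matvec` is just a toy example): giving `@njit` an explicit signature forces compilation at import time, and `cache=True` writes the compiled machine code to disk so the compile cost is paid only once, not per session.

```python
import numpy as np
from numba import njit, float64

# The explicit signature makes Numba compile eagerly at import time instead
# of lazily on the first call; cache=True persists the compiled code to disk.
@njit(float64[:](float64[:, :], float64[:]), cache=True)
def matvec(A, x):
    m, n = A.shape
    out = np.zeros(m)
    for i in range(m):
        s = 0.0
        for j in range(n):
            s += A[i, j] * x[j]
        out[i] = s
    return out
```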
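And for the SGDRegressor / SGDClassifier bullet, a minimal sketch of how a regressor falls out of PyTorch almost for free (a hypothetical illustration, not HyperLearn's API): a linear layer, MSE loss, and `torch.optim.SGD`.

```python
import torch

# Toy data: y = X w + noise
X = torch.randn(256, 8)
w_true = torch.randn(8, 1)
y = X @ w_true + 0.01 * torch.randn(256, 1)

model = torch.nn.Linear(8, 1)                        # the "regressor"
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = torch.nn.MSELoss()

for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)                      # squared-error loss
    loss.backward()                                  # gradients via autograd
    opt.step()                                       # one SGD update
```

Swapping the MSE loss for a cross-entropy loss gives the SGDClassifier analogue with the same loop.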

Hey, I'm new here, I'll try to make a good "full PCA" and do a pull request :)

Hey @boscs !! Welcome!
On the note of PCA, great!!!
I didn't actually write any code for the past few days, as I was busy comparing algorithms!!!

In fact, after three tiresome days, I can finally showcase my findings in a moment!

Anyways welcome!

Hey. I'd love to help out too. I'll look into the code over the weekend, see where the best place for me to start is, and get back to you on Monday.
As a side note, have you looked at Ray when considering parallelization? I've been working with it, and it is quite good.

I'd also like to contribute! A project like this is new to me, but I think it's valuable experience.

@AdityaGudimella @Armannas Welcome guys! Sorry for the delay in replying. I'm glad HyperLearn has more contributors! In terms of what you can do: I am currently busy writing my mini-book listing all my findings, so I am postponing continuous dev until November 15.

@AdityaGudimella Ray is cool! I have had many requests to use Ray. It's good, but I won't use it for now, since it doesn't support Windows, and my goal is to make HyperLearn universally available (50%+ of users are on Windows). In terms of parallelization, I am using Numba --> so it's already super good :) (a minimal parallel Numba sketch is below)
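
For the curious, a minimal sketch of the kind of parallelization Numba gives (a toy `row_norms` illustration, not HyperLearn code): `parallel=True` plus `prange` splits the outer loop across all CPU cores.

```python
import numpy as np
from numba import njit, prange

# parallel=True + prange distributes the outer loop across CPU cores.
@njit(parallel=True, fastmath=True)
def row_norms(X):
    m, n = X.shape
    out = np.empty(m)
    for i in prange(m):
        s = 0.0
        for j in range(n):
            s += X[i, j] * X[i, j]
        out[i] = s ** 0.5
    return out
```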