unslothai/hyperlearn

2-2000x faster ML algos, 50% less memory usage, works on all hardware - new and old.

Home Page: https://hyperlearn.readthedocs.io/en/latest/index.html

**IMPORTANT: On Contributing to HyperLearn + Note to Contributors**

danielhanchen opened this issue

Hey Contributor!

Thanks for checking out HyperLearn!! Super appreciate it.

Since the package is new (it only started around August 27th), Issues are the best place to start helping out; you can also check out the Projects tab, where there's a whole list of stuff I envision completing.

Also, if you have a NEW idea: please open an issue and label it as an enhancement.

In terms of priorities, I want to start from the bottom up, so that all functions get faster; that means:

  1. Since Singular Value Decomposition is the backbone of nearly all linear algos (PCA, Linear and Ridge Regression, LDA, QDA, LSI, Partial Least Squares, etc.), we need to focus on making SVD faster! (Also linear solvers.) See the SVD sketch after this list.

  2. Once SVD optimization is OK, slowly creep into Least Squares / L1 solvers. These need to be done before the other algorithms so that the speedups become apparent.

  3. If Numba code is used, it needs to be PRE-COMPILED to save time, or else we wait a whopping 2-3 seconds of JIT compilation before the first call in every session... (see the Numba pre-compilation sketch after this list)

  4. Then, focus on Linear Algorithms, including but not limited to:

  • Linear Regression
  • Ridge Regression
  • SVD Solving, QR Solving, Cholesky Solving for backend
  • Linear Discriminant Analysis
  • Quadratic Discriminant Analysis
  • Partial SVD (Truncated SVD --> maybe use Facebook's PCA? / Gensim's LSI? --> try not to use ARPACK's svds...)
  • Full Principal Component Analysis (complete SVD decomp)
  • Partial PCA (Truncated SVD)
  • Canonical Correlation Analysis
  • Partial Least Squares
  • Spline Regression based on Least Squares (will talk more later on this)
  • Correlation Regression (will talk more later on this)
  • Outlier Tolerant Regression (will talk more later on this)
  • SGDRegressor / SGDClassifier easily through PyTorch (see the PyTorch sketch after this list)
  • Batch Least Squares? --> Closed Form soln + averaging
  • Others...
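
To make point 1 concrete, here is a minimal sketch of why SVD is the backbone (my own illustration, not HyperLearn's actual code; `svd_solve` is a hypothetical name): once you have a fast SVD, ordinary least squares is just the pseudo-inverse applied to y.

```python
import numpy as np

def svd_solve(X, y, rcond=1e-15):
    """Solve min ||Xb - y||_2 via the thin SVD pseudo-inverse."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)   # thin SVD: X = U S V^T
    # Zero out tiny singular values so rank-deficient X stays stable.
    S_inv = np.where(S > rcond * S.max(), 1.0 / S, 0.0)
    # b = V S^+ U^T y  (the Moore-Penrose least-squares solution)
    return Vt.T @ (S_inv * (U.T @ y))

X = np.random.randn(100, 5)
b_true = np.arange(1.0, 6.0)
y = X @ b_true + 0.01 * np.random.randn(100)
print(svd_solve(X, y))   # ~ [1, 2, 3, 4, 5]
```

The same `U, S, Vt` factors feed PCA, LDA, and Ridge with almost no extra work, which is why a faster SVD speeds up everything downstream.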
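And for point 3, a minimal sketch of what pre-compiling Numba code looks like (assuming Numba is installed; `matvec` is just a toy example): giving `@njit` an explicit signature forces compilation at import time, and `cache=True` writes the compiled machine code to disk so the compile cost is paid only once, not per session.

```python
import numpy as np
from numba import njit, float64

# The explicit signature makes Numba compile eagerly at import time instead
# of lazily on the first call; cache=True persists the compiled code to disk.
@njit(float64[:](float64[:, :], float64[:]), cache=True)
def matvec(A, x):
    m, n = A.shape
    out = np.zeros(m)
    for i in range(m):
        s = 0.0
        for j in range(n):
            s += A[i, j] * x[j]
        out[i] = s
    return out
```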
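And for the SGDRegressor / SGDClassifier bullet, a minimal sketch of how a regressor falls out of PyTorch almost for free (a hypothetical illustration, not HyperLearn's API): a linear layer, MSE loss, and `torch.optim.SGD`.

```python
import torch

# Toy data: y = X w + noise
X = torch.randn(256, 8)
w_true = torch.randn(8, 1)
y = X @ w_true + 0.01 * torch.randn(256, 1)

model = torch.nn.Linear(8, 1)                        # the "regressor"
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = torch.nn.MSELoss()

for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)                      # squared-error loss
    loss.backward()                                  # gradients via autograd
    opt.step()                                       # one SGD update
```

Swapping the MSE loss for a cross-entropy loss gives the SGDClassifier analogue with the same loop.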

Hey, I'm new here, I'll try to make a good "full PCA" and do a pull request :)

Hey @boscs !! Welcome!
On the note of PCA, great!!!
I didn't actually write any code for the past few days, as I was busy comparing algorithms!!!

In fact, after three tiresome days, I can finally showcase my findings in a moment!

Anyways welcome!

Hey. I'd love to help out too. I'll look into the code over the weekend, see where the best place for me to start is, and get back to you on Monday.
As a side note, have you looked at Ray when considering parallelization? I've been working with it, and it is quite good.

I'd also like to contribute! A project like this is new to me, but I think it's valuable experience.

@AdityaGudimella @Armannas Welcome guys! Sorry for the delay in replying. I'm glad HyperLearn has more contributors! In terms of what you can do: I am currently busy writing my mini-book listing all my findings, so I am postponing continuous dev until November 15.

@AdityaGudimella Ray is cool! I have had many requests to use Ray. It's good, but I won't use it for now, since it doesn't support Windows, and my goal is to make HyperLearn universally available (50%+ of users are on Windows). In terms of parallelization, I am using Numba --> so it's already super good :) (a minimal parallel Numba sketch is below)
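
For the curious, a minimal sketch of the kind of parallelization Numba gives (a toy `row_norms` illustration, not HyperLearn code): `parallel=True` plus `prange` splits the outer loop across all CPU cores.

```python
import numpy as np
from numba import njit, prange

# parallel=True + prange distributes the outer loop across CPU cores.
@njit(parallel=True, fastmath=True)
def row_norms(X):
    m, n = X.shape
    out = np.empty(m)
    for i in prange(m):
        s = 0.0
        for j in range(n):
            s += X[i, j] * X[i, j]
        out[i] = s ** 0.5
    return out
```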