Anish Acharya's repositories
NLP-CS388-UT
Common libraries developed in PyTorch for different NLP tasks: sentiment analysis, NER, LSTM-CRF, CRF, and semantic parsing.
DBMS-From-Scratch
Implementation of a working DBMS from the ground up.
Bandits-Online-Learning
Simple implementations of bandit algorithms in Python.
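The repository's contents aren't shown here; as an illustration of the kind of algorithm it covers, here is a minimal epsilon-greedy sketch on a Bernoulli bandit (the arm means, horizon, and epsilon are assumptions for the example, not taken from the repo):

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, horizon=2000, seed=0):
    """Epsilon-greedy on a Bernoulli bandit; returns empirical means and pull counts."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    values = [0.0] * k                                      # running empirical mean per arm
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                          # explore uniformly
        else:
            arm = max(range(k), key=lambda a: values[a])    # exploit current best estimate
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
    return values, counts

values, counts = epsilon_greedy([0.2, 0.5, 0.8])  # arm 2 is the best arm
```

With enough exploration, the best arm ends up with the most pulls and an empirical mean close to its true reward probability.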
BGMD-AISTATS-2022
Geometric median (GM) is a classical method in statistics for achieving a robust estimate of the uncorrupted data; under gross corruption, it achieves the optimal breakdown point of 0.5. However, its computational complexity makes it infeasible for robustifying stochastic gradient descent (SGD) in high-dimensional optimization problems. In this paper, we show that by applying GM to only a judiciously chosen block of coordinates at a time and using a memory mechanism, one can retain the breakdown point of 0.5 for smooth non-convex problems, with non-asymptotic convergence rates comparable to SGD with GM.
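The paper's block-coordinate scheme isn't reproduced here; as background, a minimal sketch of the geometric median itself, computed with the classical Weiszfeld fixed-point iteration (the point set and tolerances are assumptions for illustration):

```python
import numpy as np

def geometric_median(points, iters=100, tol=1e-8):
    """Weiszfeld fixed-point iteration for the geometric median."""
    y = points.mean(axis=0)
    for _ in range(iters):
        d = np.maximum(np.linalg.norm(points - y, axis=1), tol)  # avoid division by zero
        w = 1.0 / d
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

# Three inliers near (1, 1) and one gross outlier: the mean is dragged far
# away, while the geometric median stays with the inliers.
pts = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.2], [100.0, -100.0]])
gm = geometric_median(pts)
```

This robustness to a minority of arbitrarily corrupted points is exactly what makes GM attractive for aggregating stochastic gradients.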
Online-Embedding-Compression-AAAI-2019
Deep learning models have become state-of-the-art for natural language processing (NLP) tasks; however, deploying these models in production systems poses significant memory constraints. Existing compression methods are either lossy or introduce significant latency. We propose a compression method that leverages low-rank matrix factorization during training to compress the word embedding layer, which represents the size bottleneck for most NLP models. Our models are trained, compressed, and then further re-trained on the downstream task to recover accuracy while maintaining the reduced size. Empirically, we show that the proposed method can achieve 90% compression with minimal impact on accuracy for sentence classification tasks, and outperforms alternative methods like fixed-point quantization or offline word embedding compression. We also analyze the inference time and storage space for our method through FLOP calculations, showing that we can compress DNN models by a configurable ratio and regain the accuracy loss without introducing additional latency compared to fixed-point quantization. Finally, we introduce a novel learning rate schedule, the Cyclically Annealed Learning Rate (CALR), which we empirically demonstrate to outperform other popular adaptive learning rate algorithms on a sentence classification benchmark.
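A minimal sketch of the core idea, replacing the embedding matrix with a low-rank factorization. The sizes and rank below are assumptions, and a truncated SVD is used purely for illustration; the paper learns the factors during training rather than computing them offline:

```python
import numpy as np

V, D, r = 10_000, 300, 30          # vocab size, embedding dim, rank (assumed)
rng = np.random.default_rng(0)
E = rng.standard_normal((V, D))    # stand-in for a trained embedding matrix

# Truncated SVD gives the best rank-r approximation E ≈ A @ B.
U, s, Vt = np.linalg.svd(E, full_matrices=False)
A = U[:, :r] * s[:r]               # (V, r) factor
B = Vt[:r]                         # (r, D) factor

params_full = V * D
params_low_rank = V * r + r * D
ratio = params_low_rank / params_full   # fraction of parameters kept
```

Storing the two factors instead of the full matrix keeps roughly `(V*r + r*D) / (V*D)` of the parameters, which for these sizes is about 10%, i.e. ~90% compression.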
DeLiCoCo-IEEE-Transactions
In compressed decentralized optimization settings, there are benefits to having multiple gossip steps between subsequent gradient iterations, even when the cost of doing so is appropriately accounted for, e.g., by reducing the precision of the compressed information.
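As a sketch of why extra gossip steps help, the following example (uncompressed for simplicity; the paper's setting also compresses the exchanged information) shows node disagreement shrinking geometrically with each additional mixing step. The 4-node ring topology and mixing matrix are assumptions:

```python
import numpy as np

# Doubly stochastic mixing matrix for 4 nodes on a ring (assumed topology).
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))          # one parameter vector per node

def disagreement(Y):
    """Distance of the node iterates from their common average."""
    return np.linalg.norm(Y - Y.mean(axis=0))

Y1 = W @ X                               # one gossip step between gradient updates
Y3 = np.linalg.matrix_power(W, 3) @ X    # three gossip steps
```

Because `W` is doubly stochastic, gossip preserves the network average while contracting disagreement at a rate set by the second-largest eigenvalue of `W` (0.5 here), so three steps bring the nodes much closer to consensus than one.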
Image-Segmentation-fractional-filters
Software related to the CVPR 2015 paper on image segmentation using TRW belief-propagation-based learning.
Cracking-Coding-Interviews
Contains basic utility functions and common data-structure and algorithm questions.
double-descent
We investigate double descent more deeply and try to precisely characterize the phenomenon under different settings. Specifically, we focus on the impact of label noise and regularization on double descent. None of the existing works consider these aspects in detail, and we hypothesize that they play an integral role in double descent.
Expectation-Maximization
MATLAB package for generating synthetic data from a GMM and running EM clustering on it.
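The repo itself is MATLAB; as a language-neutral sketch of the same pipeline, the following generates synthetic 1-D data from a two-component mixture and fits it with EM (all mixture parameters below are assumed for the example):

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=50):
    """EM for a k-component 1-D Gaussian mixture, with spread-out deterministic init."""
    mu = np.linspace(x.min(), x.max(), k)
    sigma = np.ones(k)
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities r[n, j] = P(component j | x[n])
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
               / (sigma * np.sqrt(2 * np.pi))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and standard deviations
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        sigma = np.maximum(sigma, 1e-3)     # guard against component collapse
    return pi, mu, sigma

# Synthetic data: two Gaussians at -2 and 3, 500 samples each (assumed).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])
pi, mu, sigma = em_gmm_1d(x)
```

With well-separated components, the fitted means recover the generating means and the mixture weights come out near 0.5 each.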
Optimization-Mavericks
This repository provides a unified framework for running optimization experiments across stochastic, mini-batch, decentralized, and federated settings.
Search-Engine-From-Scratch
Implementation of a complete search engine for the ICS domain. It also includes a web interface that provides the user with a text box to enter queries and returns relevant results.
anishacharya
Try out git README
CS273A-UCI-Fall-2013
My First Graduate Machine Learning Course- 2013 Fall :)
deepcluster-contrastive
Deep Clustering for Unsupervised Learning of Visual Features
Federated-Learning-in-PyTorch
Handy PyTorch implementation of Federated Learning (for your painless research)
Learning-with-Audio-Data
Functions to easily create music datasets from raw audio signals, run learning algorithms, and classify genres.
lightly
A python library for self-supervised learning on images.
robust-diffusion
Elucidating the Design Space of Diffusion-Based Generative Models (EDM)
SNCLR
[ICLR 2023] Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning
vpu
A PyTorch implementation of the Variational approach for PU learning