Ajit Rajasekharan's repositories
unsupervised_NER
Self-supervised NER prototype, updated version (69 entity types, 17 broad entity groups). Uses pretrained BERT models with no fine-tuning. State-of-the-art performance on 3 biomedical datasets
bert_vector_clustering
Clustering learned BERT vectors for downstream tasks such as unsupervised NER and unsupervised sentence embeddings
codebook_comparisons
Comparison of codebook vectors of autoencoders (DALLE's dVAE vs VQGAN) that map any image to a fixed vocabulary of vectors
JPTDP_wrapper
An HTTP interface wrapper around Dat Quoc Nguyen's joint POS tagging and dependency parser.
multi_gpu_test
Scripts to set up an NVIDIA GPU machine (Ubuntu)
ner_bio_phi_for_phrases
This is a tweaked version of self-supervised NER for tagging phrases
simple_sbd
Splits a paragraph into sentences at period characters, without breaking on periods inside numeric sequences or abbreviations
huggingface_finetune_wrapper
Simple wrapper to fine-tune and test a BERT model for sentence classification
image_text_redaction
Prototype for image text detection, recognition, and redaction. The models used can detect horizontal print and handwritten text. They cannot detect slanted or curved text.
ajitrajasekharan.github.io
This is a log of what I learn and of work I have done that yielded usable results
bert_descriptors
BERT's MLM head model exposed as a service
cls_for_ood_detection
For supervised text classification tasks, uses the CLS token to represent a sentence in order to detect out-of-distribution (OOD) inputs relative to the training set. Sentence representations are harvested from a self-supervised model (e.g. BERT)
lapos_server
An existing C++ CRF-based POS tagger exposed as a service (suitable for fast POS tagging at scale)
simple_tense_detector
This is a simple present/past tense detector for a sentence, using a DEP-POS tagger