Satheesh K's starred repositories
Plastic-Bottles-Dataset
A dataset of 5,592 plastic bottles floating in rivers, plus some attempts to build a model on it.
trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
semantic-kernel
Integrate cutting-edge LLM technology quickly and easily into your apps
TransformerPrograms
[NeurIPS 2023] Learning Transformer Programs
chatnoir-resiliparse
A robust web archive analytics toolkit
NLP-progress
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
kgi-slot-filling
This is the code for our KILT leaderboard submissions (KGI + Re2G models).
Mr.-Ranedeer-AI-Tutor
A GPT-4 AI Tutor Prompt for customizable, personalized learning experiences.
RedPajama-Data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
chat-langchain
Quarto version of chat-langchain
haystack
LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
empirical-philosophy
A collection of empirical experiments using large language models and other neural network architectures to test the usefulness of metaphysical constructs.
HDC_TUBerlin_version_1
This is the submission of the TU Berlin Team to the Helsinki Deblur Challenge 2021.
unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
llama_index
LlamaIndex is a data framework for your LLM applications
tuning_playbook
A playbook for systematically maximizing the performance of deep learning models.
gen-invoice
Template-based invoice generator