tomaarsen

Hugging Face

Netherlands

https://tomaarsen.com

Organizations

embeddings-benchmark

Hugging-Face-Helping-Hand

huggingface

nltk

Tom Aarsen's repositories

attention_sinks

Extend existing LLMs way beyond the original training length with constant memory usage, without retraining

Language: Python | License: Apache-2.0 | 654 12 30

SpanMarkerNER

SpanMarker for Named Entity Recognition

Language: Jupyter Notebook | License: Apache-2.0 | 380 9 42

AnglE

Angle-optimized Text Embeddings | 🔥 SOTA on STS and MTEB Leaderboard

Language: Python | License: MIT | 2 10

ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max). A PyTorch LLM library that seamlessly integrates with llama.cpp, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, ModelScope, etc.

Language: Python | License: Apache-2.0 | 2 10

AIR-Bench

AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark

Language: Python | License: MIT | 100

bm25s

BM25S is an ultra-fast lexical search library that implements BM25 using scipy

Language: Python | License: MIT | 100

canopy

Retrieval Augmented Generation (RAG) framework and context engine powered by Pinecone

Language: Python | License: Apache-2.0 | 1 10

ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)

Language: Python | License: MIT | 1 10

deep-learning-pytorch-huggingface

Language: Jupyter Notebook | License: MIT | 100

EMO

[ICLR 2024] EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling (https://arxiv.org/abs/2310.04691)

Language: Python | 1 10

setfit

Efficient few-shot learning with Sentence Transformers

Language: Jupyter Notebook | License: Apache-2.0 | 1 10

accelerate

🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision

Language: Python | License: Apache-2.0 | 010

api-inference-community

Language: Python | License: Apache-2.0 | 010

blog

Public repo for HF blog posts

Language: Jupyter Notebook | 010

datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Language: Python | License: Apache-2.0 | 000

dspy

DSPy: The framework for programming—not prompting—foundation models

Language: Python | License: MIT | 010

GLiNER

Generalist model for NER (Extract any entity types from texts)

Language: Python | License: Apache-2.0 | 010

Hotel-ID-2022

7th place entry to the Hotel-ID 2022 Kaggle challenge

Language: Python | 010

huggingface.js

Utilities to use the Hugging Face hub API

Language: TypeScript | License: MIT | 000

huggingface_hub

All the open source things related to the Hugging Face Hub.

Language: Python | License: Apache-2.0 | 010

langchain

⚡ Building applications with LLMs through composability ⚡

Language: Python | License: MIT | 010

llama_index

LlamaIndex (GPT Index) is a data framework for your LLM applications

Language: Python | License: MIT | 010

optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools

Language: Python | License: Apache-2.0 | 010

postgresml

The GPU-powered AI application database. Get your app to market faster using the simplicity of SQL and the latest NLP, ML + LLM models.

Language: Rust | License: MIT | 000

sentence-transformers

Multilingual Sentence & Image Embeddings with BERT

Language: Python | License: Apache-2.0 | 010

stsb-multi-mt

Machine translated multilingual STS benchmark dataset.

Language: Python | License: NOASSERTION | 010

tomaarsen.com-backend

Backend for www.tomaarsen.com

Language: Python | License: MIT | 020

tomaarsen.com-frontend

Frontend for www.tomaarsen.com

Language: HTML | License: MIT | 020

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | 010

Verba

Retrieval Augmented Generation (RAG) chatbot powered by Weaviate

Language: Python | License: BSD-3-Clause | 000