MaveriQ

Haris Jabbar's repositories

minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Language: Python | License: NOASSERTION

agency-jekyll-theme

Agency Theme for Jekyll

Language: JavaScript | License: Apache-2.0

amber-data-prep

Data preparation code for Amber 7B LLM

Language: Python

goemotions

Language: Jupyter Notebook

MicroLlama

This is a 300M MicroLlama version of TinyLlama

Language: Python | License: Apache-2.0

benchmark

Language: Python

creative-jekyll-theme

License: Apache-2.0

dolma

Data and tools for generating and inspecting OLMo pre-training data.

License: Apache-2.0

flota

Language: Python | License: MIT

jekyll-theme-neumorphism

Neumorphism designed Jekyll theme for personal websites, portfolios and resumes.

License: MIT

langchain-chatbot-demo

Examples of chatbot implementations with Langchain and Streamlit

linggpt

Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.

Language: Python | License: Apache-2.0

LLaMA-Efficient-Tuning

Easy-to-use LLM fine-tuning framework (LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, ChatGLM2)

License: Apache-2.0

lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.

Language: Python | License: MIT

minbpe_spark_gcp

Implementing MinBPE training on GCP DataProc (serverless spark on GCP)

Language: Python | License: MIT

MobiLlama

MobiLlama: Small Language Model tailored for edge devices

License: Apache-2.0

OLMo

Modeling, training, eval, and inference code for OLMo

Language: Python | License: Apache-2.0

pandas-ai

PandasAI is the Python library that integrates Gen AI into pandas, making data analysis conversational

License: MIT

paralegal

Streamlit app with langchain and huggingface

Language: Python

promptbench

A unified evaluation framework for large language models

Language: Python | License: MIT

promptsource

Toolkit for creating, sharing and using natural language prompts.

Language: Python | License: Apache-2.0

python-package-template

A template repo for Python packages from AllenAI

License: Apache-2.0

sentence-transformers

Multilingual Sentence & Image Embeddings with BERT

Language: Python | License: Apache-2.0

spacyface

Align the token outputs from Spacy and Huggingface to help understand what language structures transformers see

Language: Python | License: Apache-2.0

sql-eval

Evaluate the accuracy of LLM generated outputs

License: Apache-2.0

Streamlit-Authenticator

A secure authentication module to validate user credentials in a Streamlit application.

License: Apache-2.0

tiktokenizer

Online playground for OpenAI tokenizers

License: MIT

useb

Heterogeneous, Task- and Domain-Specific Benchmark for Unsupervised Sentence Embeddings used in the TSDAE paper: https://arxiv.org/abs/2104.06979.

Language: Python | License: Apache-2.0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0

xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

License: NOASSERTION