Thomas Schranz's starred repositories
metal-flash-attention
Faster alternative to Metal Performance Shaders
tex-oberon
Make Project Oberon Pretty Again
llama-terminal-completion
A Python application that interacts with the llama.cpp library to provide virtual assistant capabilities through the command line. It lets you ask questions and receive responses, as well as generate Linux commands from your prompts.
selfextend
An implementation of Self-Extend, which expands the context window via grouped attention
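The core idea behind Self-Extend is to remap relative positions: tokens inside a local neighbor window keep their exact positions, while more distant tokens share grouped (floor-divided) positions, so the model never sees a relative distance larger than it was trained on. A simplified sketch of that mapping (the parameter names `neighbor_window` and `group_size` are illustrative, not taken from the repo):

```python
def self_extend_rel_pos(rel_pos: int, neighbor_window: int = 512, group_size: int = 8) -> int:
    """Map a raw relative position to a Self-Extend-style position.

    Positions within the neighbor window are kept exact; positions
    beyond it are compressed by integer division into groups, so the
    largest mapped distance grows ~group_size times more slowly.
    (Simplified illustration, not the repo's actual implementation.)
    """
    if rel_pos <= neighbor_window:
        return rel_pos
    return neighbor_window + (rel_pos - neighbor_window) // group_size
```

With `group_size=8`, a raw distance of 4096 maps to 512 + 3584 // 8 = 960, well inside a model trained on, say, 2048-token contexts.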
LLM-SLERP-Merge
Spherical-interpolation (SLERP) merging of PyTorch/HF-format language models with minimal feature loss.
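SLERP merging interpolates between two models' weight tensors along the arc of a sphere rather than along a straight line, which tends to preserve weight norms better than plain averaging. A minimal sketch of the interpolation itself, shown here on NumPy vectors (the repo operates on full PyTorch/HF checkpoints; this is only the underlying formula):

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight vectors.

    t=0 returns v0, t=1 returns v1; intermediate t values move along
    the great-circle arc between the two directions.
    """
    # Angle between the two vectors, from their normalized dot product.
    v0n = v0 / np.linalg.norm(v0)
    v1n = v1 / np.linalg.norm(v1)
    dot = float(np.clip(np.dot(v0n, v1n), -1.0, 1.0))
    theta = np.arccos(dot)
    if theta < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        return (1 - t) * v0 + t * v1
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * v0 + (np.sin(t * theta) / s) * v1
```

Applied per-tensor across two checkpoints, this yields a merged model whose parameters stay on the "sphere" between the parents instead of collapsing toward their midpoint.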
QuIP-for-all
QuIP quantization