Jan's repositories
cortex.cpp
Run and customize local LLMs.
cortex.tensorrt-llm
cortex.tensorrt-llm is a C++ inference library that can be loaded by any server at runtime. It includes NVIDIA's TensorRT-LLM as a submodule for GPU-accelerated inference on NVIDIA GPUs.
cortex.llamacpp
cortex.llamacpp is a high-efficiency C++ inference engine for edge computing, packaged as a dynamic library that any server can load at runtime (see the loading sketch after this list).
cortex.python
C++ code that embeds and runs Python.
infinity
The AI-native database built for LLM applications, providing incredibly fast vector and full-text search
llama.cpp-avx-vnni
Port of Facebook's LLaMA model in C/C++
openai_trtllm
OpenAI-compatible API for the TensorRT-LLM Triton backend (request shape sketched after this list).
pymaker
Make the py
tensorrtllm_backend
The Triton TensorRT-LLM Backend
trt-llm-as-openai-windows
A reference implementation that lets existing OpenAI-integrated apps run against local TRT-LLM inference on a GeForce GPU on Windows instead of the cloud.
winget-pkgs
The Microsoft community Windows Package Manager manifest repository
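The cortex engines above describe themselves as dynamic libraries that a server loads at runtime. Below is a minimal sketch of that loading pattern, assuming a POSIX dlopen-based host; the library path ./libengine.so and the exported create_engine symbol are illustrative assumptions, not the actual cortex interface.

```cpp
// Sketch: loading an inference engine shared library at runtime.
// Path and symbol name are hypothetical, not the cortex API.
#include <dlfcn.h>
#include <cstdio>

int main() {
  // Open the engine shared object at runtime (path is illustrative).
  void* handle = dlopen("./libengine.so", RTLD_NOW | RTLD_LOCAL);
  if (!handle) {
    std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return 1;
  }

  // Look up an exported factory symbol (name is illustrative).
  using CreateFn = void* (*)();
  auto create = reinterpret_cast<CreateFn>(dlsym(handle, "create_engine"));
  if (!create) {
    std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
    dlclose(handle);
    return 1;
  }

  void* engine = create();  // a server would now route requests to the engine
  (void)engine;

  dlclose(handle);
  return 0;
}
```

Loading engines this way keeps the server binary engine-agnostic: swapping llama.cpp for TensorRT-LLM means shipping a different shared library, not rebuilding the server.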
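Several repositories above (openai_trtllm, trt-llm-as-openai-windows) exist to put an OpenAI-compatible API in front of local TRT-LLM inference. A minimal sketch of what that compatibility means on the client side, assuming a server at localhost:3000 and a model named "ensemble" (both illustrative assumptions): the request is the same /v1/chat/completions JSON that OpenAI-integrated apps already send.

```cpp
// Sketch: POSTing an OpenAI-style chat completion request with libcurl.
// URL, port, and model name are assumptions, not values from the repos.
#include <curl/curl.h>
#include <cstdio>
#include <string>

// libcurl write callback: append response bytes to a std::string.
static size_t write_cb(char* data, size_t size, size_t nmemb, void* userp) {
  static_cast<std::string*>(userp)->append(data, size * nmemb);
  return size * nmemb;
}

int main() {
  CURL* curl = curl_easy_init();
  if (!curl) return 1;

  // Same request shape as OpenAI's /v1/chat/completions.
  const char* body =
      R"({"model": "ensemble", "messages": [{"role": "user", "content": "Hello"}]})";

  std::string response;
  struct curl_slist* headers = nullptr;
  headers = curl_slist_append(headers, "Content-Type: application/json");

  curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:3000/v1/chat/completions");
  curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
  curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);

  CURLcode rc = curl_easy_perform(curl);
  if (rc == CURLE_OK) std::printf("%s\n", response.c_str());

  curl_slist_free_all(headers);
  curl_easy_cleanup(curl);
  return rc == CURLE_OK ? 0 : 1;
}
```

Because the endpoint and payload match OpenAI's, existing clients only need their base URL pointed at the local server.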