Michalis Papadimitriou's repositories
llama-shepherd-cli
A CLI to manage, install, and configure llama inference implementations in multiple languages
llama2.tornadovm.java
An extension to Llama2.java implementation accelerated with GPUs, using TornadoVM
collage-non-tvm-fork
Non-forked version of Collage for a proof of concept
commitgpt
Automatically generate commit messages using ChatGPT
fzf
:cherry_blossom: A command-line fuzzy finder
gpt-engineer
Specify what you want it to build, the AI asks for clarification, and then builds it.
java
Java bindings for TensorFlow
Jlama
Jlama is a pure Java implementation of a LLM inference engine.
jvm_allocation_ref
A toy application comparing primitive array allocation on heap with Panama off-heap memory segment allocation
kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
llama2.c
Inference Llama 2 in one file of pure C
llama2.java
Inference Llama 2 in one file of pure Java
llama3.java
Practical Llama 3 inference in Java
llamafile
Distribute and run LLMs with a single file.
llm-apps-java-spring-ai
Samples showing how to build Java applications powered by Generative AI and LLMs using Spring AI and Spring Boot.
mikepapadim
Custom GitHub profile
rjvm
A tiny JVM written in Rust (learning project)
sd4j
Stable diffusion pipeline in Java using ONNX Runtime
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
TornadoVM
Tornado: A practical and efficient heterogeneous programming framework for managed languages
tutorials
Tutorials for creating and using ONNX models
wasmtime
A fast and secure runtime for WebAssembly