EmbeddedLLM

Organization data from GitHub: https://github.com/EmbeddedLLM

EmbeddedLLM is the creator of JamAI Base, a platform for orchestrating AI with spreadsheet-like simplicity.

Location: Singapore

Home Page: https://embeddedllm.com

GitHub: @EmbeddedLLM

Twitter: @EmbeddedLLM

EmbeddedLLM's repositories

JamAIBase

The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.

Language: Python · License: Apache-2.0 · Stargazers: 1076 · Issues: 5 · Issues: 9
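
As a sketch of the spreadsheet-style workflow, the snippet below adds a row to an Action Table with the jamaibase Python SDK. The client construction and method names follow one published version of the SDK and may differ across releases, and the project/token values are placeholders, so treat this as illustrative rather than authoritative.

    from jamaibase import JamAI, protocol as p

    # Placeholder credentials; real values come from your JamAI Base project.
    client = JamAI(project_id="proj_...", token="jamai_sk_...")

    # Adding a row to an Action Table triggers its LLM-backed output columns,
    # much like filling a cell and letting dependent cells recompute.
    response = client.table.add_table_rows(
        "action",
        p.RowAddRequest(
            table_id="my-pipeline",
            data=[{"question": "What is retrieval-augmented generation?"}],
            stream=False,
        ),
    )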

vllm

vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 93 · Issues: 2 · Issues: 56
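
A minimal offline-inference example using vLLM's Python API, following the standard quickstart pattern (the model choice here is arbitrary):

    from vllm import LLM, SamplingParams

    # Load a small model and generate with nucleus sampling.
    llm = LLM(model="facebook/opt-125m")
    sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    outputs = llm.generate(["The key to high-throughput LLM serving is"], sampling)
    for out in outputs:
        print(out.outputs[0].text)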

SageAttention-rocm

A ROCm port of quantized attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without degrading end-to-end metrics across various models.

Language: Cuda · License: Apache-2.0 · Stargazers: 4 · Issues: 0 · Issues: 0
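
Assuming the ROCm port keeps the upstream sageattention API, it is intended as a drop-in replacement for a fused attention call; a sketch:

    import torch
    from sageattention import sageattn  # assumes the upstream API is preserved in this fork

    # (batch, heads, seq_len, head_dim) layout; fp16 inputs on the GPU.
    q = torch.randn(1, 32, 1024, 128, dtype=torch.float16, device="cuda")
    k = torch.randn(1, 32, 1024, 128, dtype=torch.float16, device="cuda")
    v = torch.randn(1, 32, 1024, 128, dtype=torch.float16, device="cuda")

    # Quantized attention in place of FlashAttention2/xformers.
    out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)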

vllmtests

A repository of tools for testing vLLM correctness and catching performance regressions.

Language: Python · License: Apache-2.0 · Stargazers: 3 · Issues: 3 · Issues: 0

vllmWorkshop

vLLM Workshop Content

License: Apache-2.0 · Stargazers: 2 · Issues: 3 · Issues: 0

flash-attention-docker

A repository with a CI/CD pipeline that builds Docker images with FlashAttention pre-compiled, to speed up development and deployment of other frameworks.

Language: Shell · License: Apache-2.0 · Stargazers: 1 · Issues: 2 · Issues: 0

kvpress

LLM KV cache compression made easy

Language: Python · License: Apache-2.0 · Stargazers: 1 · Issues: 0 · Issues: 0
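
kvpress is typically driven through a Hugging Face pipeline plus a "press" object that decides which KV-cache entries to drop. The sketch below follows the upstream kvpress README pattern; the press class and pipeline task name come from upstream rather than being verified against this mirror.

    from transformers import pipeline
    from kvpress import ExpectedAttentionPress  # one of several presses upstream kvpress provides

    pipe = pipeline(
        "kv-press-text-generation",            # custom task registered by kvpress
        model="meta-llama/Llama-3.1-8B-Instruct",
        device="cuda",
    )
    press = ExpectedAttentionPress(compression_ratio=0.5)  # keep ~50% of the KV cache

    answer = pipe("<long context here>", question="What does the context say?", press=press)["answer"]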

vllm-rocm (fork)

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 1 · Issues: 0 · Issues: 0

jamaibase-ts-docs

TypeScript documentation for the JamAI SDK

Language: HTML · Stargazers: 0 · Issues: 2 · Issues: 0

aiter

AI Tensor Engine for ROCm

Language: Python · License: MIT · Stargazers: 0 · Issues: 0 · Issues: 0

aiter-api-watcher

A repository that monitors the fast-changing ROCm/aiter repository and alerts users when AITER functions of interest (e.g., those used in vLLM or SGLang) have changed at a given commit.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 1 · Issues: 81

axolotl-amd

Go ahead and axolotl questions

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0 · Issues: 0

composable_kernel

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

Language: C++ · License: NOASSERTION · Stargazers: 0 · Issues: 0 · Issues: 0

etalon

LLM Serving Performance Evaluation Harness

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0 · Issues: 0

infinity-executable

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.

Language: Python · License: MIT · Stargazers: 0 · Issues: 0 · Issues: 0
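
Infinity exposes an OpenAI-compatible REST endpoint. Assuming a server started on the default port with a bge model, a request looks roughly like this (port and model id must match your deployment):

    import requests

    # Default Infinity port is 7997; the model id must match what the server was launched with.
    resp = requests.post(
        "http://localhost:7997/embeddings",
        json={"model": "BAAI/bge-small-en-v1.5", "input": ["high-throughput embeddings"]},
    )
    embedding = resp.json()["data"][0]["embedding"]
    print(len(embedding))  # dimensionality of the returned vector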

Liger-Kernel

Efficient Triton Kernels for LLM Training

Language: Python · License: BSD-2-Clause · Stargazers: 0 · Issues: 0 · Issues: 0
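
Liger-Kernel works by patching Hugging Face model classes with fused Triton kernels; the usual pattern is to apply the patch before loading the model (function name taken from upstream Liger-Kernel):

    from liger_kernel.transformers import apply_liger_kernel_to_llama
    from transformers import AutoModelForCausalLM

    # Monkey-patch the HF Llama implementation with Liger's fused Triton kernels
    # (RMSNorm, RoPE, SwiGLU, fused cross-entropy) before instantiating the model.
    apply_liger_kernel_to_llama()
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")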

litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]

Language: Python · License: NOASSERTION · Stargazers: 0 · Issues: 0 · Issues: 0
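
litellm normalizes many providers behind the OpenAI chat format; a minimal call (provider credentials are read from environment variables such as OPENAI_API_KEY):

    from litellm import completion

    # The call shape stays the same regardless of the backing provider.
    response = completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(response.choices[0].message.content)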

LLM_Sizing_Guide

A calculator to estimate the memory footprint, capacity, and latency of LLMs on NVIDIA, AMD, and Intel hardware.

Language: Python · Stargazers: 0 · Issues: 0 · Issues: 0
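
As a back-of-envelope illustration of the kind of arithmetic such a calculator performs (an independent sketch, not the repo's own code), weight and KV-cache memory can be estimated directly from a model configuration:

    # Hypothetical worked example: fp16 70B model with a Llama-3.1-70B-style GQA config.
    params_b = 70                      # parameters, in billions
    bytes_per_param = 2                # fp16/bf16 weights
    weights_gb = params_b * bytes_per_param           # ~140 GB of weights

    layers, kv_heads, head_dim = 80, 8, 128
    kv_bytes_per_token = 2 * layers * kv_heads * head_dim * 2   # K and V, fp16
    kv_gb = kv_bytes_per_token * 128_000 / 1e9        # ~41.9 GB for one 128k-token sequence

    print(weights_gb, round(kv_gb, 1))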

LMCache

ROCm support for ultra-fast and cheaper long-context LLM inference.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0 · Issues: 1

lmcache-vllm

The driver that runs the LMCache core inside vLLM.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0 · Issues: 0

Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0 · Issues: 0

roxl

NVIDIA Inference Xfer Library (NIXL)

License: Apache-2.0 · Stargazers: 0 · Issues: 0 · Issues: 0

skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0 · Issues: 0
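
SkyPilot tasks can be declared from Python as well as YAML; a minimal sketch using its Python API (the resource spec and cluster name here are arbitrary):

    import sky

    # Declare what to run and what hardware it needs; SkyPilot provisions the
    # cheapest available location that satisfies the request.
    task = sky.Task(run="python train.py")
    task.set_resources(sky.Resources(accelerators="A100:1"))

    sky.launch(task, cluster_name="demo-cluster")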

Star-Attention

Efficient LLM Inference over Long Sequences

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0 · Issues: 0

torchac_rocm

ROCm Implementation of torchac_cuda from LMCache

Language: Cuda · Stargazers: 0 · Issues: 0 · Issues: 0