Mr-Nineteen

Location: Shanghai

Mr-Nineteen's repositories

RecSysPapers

A collection of industry classics and cutting-edge papers in the fields of recommendation, advertising, and search.

Language: Python | License: BSD-2-Clause | Stargazers: 1 | Issues: 0

baichuan-Dynamic-NTK-ALiBi

A code implementation of Dynamic NTK-ALiBi for Baichuan: inference on longer contexts without any fine-tuning.

Language: Python | Stargazers: 0 | Issues: 0
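The description promises longer-context inference without fine-tuning. As a rough, hypothetical illustration of that idea only (not this repository's actual code), the sketch below scales the standard ALiBi slopes down once the runtime sequence length exceeds an assumed training window, so distant tokens are not over-penalized; the scaling rule and the 4096-token window are assumptions.

    # Illustrative sketch, not the repo's implementation.
    import torch

    def alibi_slopes(n_heads: int) -> torch.Tensor:
        # Standard ALiBi slopes for a power-of-two head count: 2^(-8i/n).
        return torch.tensor([2 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)])

    def dynamic_alibi_bias(n_heads: int, seq_len: int, train_len: int = 4096) -> torch.Tensor:
        # Hypothetical "dynamic" rule: shrink the slopes by train_len / seq_len once
        # the sequence exceeds the training window (check the repo for the exact rule).
        scale = min(1.0, train_len / seq_len)
        slopes = alibi_slopes(n_heads) * scale
        # distances[i, j] = j - i  (<= 0 for past tokens under a causal mask)
        distances = torch.arange(seq_len)[None, :] - torch.arange(seq_len)[:, None]
        # Bias shape (n_heads, seq_len, seq_len), added to the attention scores.
        return slopes[:, None, None] * distances[None, :, :].clamp(max=0)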

ChatGLM2-6B

ChatGLM2-6B: an open-source bilingual chat LLM.

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0
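ChatGLM2-6B is usually loaded through Hugging Face Transformers; a minimal sketch of the commonly documented chat call follows (model ID and prompt are illustrative, so verify against the repo's README).

    # Minimal sketch of loading ChatGLM2-6B via Hugging Face Transformers.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
    model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
    model = model.eval()

    # model.chat returns the reply plus the updated dialogue history.
    response, history = model.chat(tokenizer, "Hello, what can you do?", history=[])
    print(response)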

Chinese-LlaMA2

Repo for adapting Meta's LLaMA 2 to Chinese: a Chinese-language adaptation of Meta's newly released LLaMA 2 (fully open source and available for commercial use).

Language: Python | Stargazers: 0 | Issues: 0

CUDA-Programming

Sample codes for my CUDA programming book

Language: Cuda | License: GPL-3.0 | Stargazers: 0 | Issues: 0

cuda-samples

Samples for CUDA developers that demonstrate features of the CUDA Toolkit.

Language: C | License: NOASSERTION | Stargazers: 0 | Issues: 0

CUDALibrarySamples

CUDA Library Samples

Language: Cuda | License: NOASSERTION | Stargazers: 0 | Issues: 0

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0
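A minimal sketch of the diffusers pipeline API; the checkpoint name and prompt are examples, and available options vary by model (see the library docs).

    # Load a text-to-image pipeline and generate one image.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")

    image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
    image.save("lighthouse.png")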

flash-attention

Fast and memory-efficient exact attention

Language: Python | License: BSD-3-Clause | Stargazers: 0 | Issues: 0
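A minimal sketch of calling the FlashAttention kernel directly; tensors must be fp16/bf16 on a CUDA device and shaped (batch, seqlen, nheads, headdim). Shapes here are illustrative.

    import torch
    from flash_attn import flash_attn_func

    batch, seqlen, nheads, headdim = 2, 1024, 8, 64
    q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
    k = torch.randn_like(q)
    v = torch.randn_like(q)

    # Exact attention computed without materializing the full seqlen x seqlen matrix.
    out = flash_attn_func(q, k, v, causal=True)
    print(out.shape)  # (batch, seqlen, nheads, headdim)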

generative-recommenders

Repository hosting code used to reproduce results in "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

grok-1

Grok open release

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

HierarchicalKV

HierarchicalKV is part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. Its key capability is storing key-value feature embeddings in GPU high-bandwidth memory (HBM) and in host memory. It can also be used as a generic key-value store.

Language: Cuda | License: Apache-2.0 | Stargazers: 0 | Issues: 0

how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA.

Language: Cuda | Stargazers: 0 | Issues: 0

iTransformer

Official implementation for "iTransformer: Inverted Transformers Are Effective for Time Series Forecasting" (ICLR 2024 Spotlight), https://openreview.net/forum?id=JePfAI8fah

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

llama

Inference code for LLaMA models

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0
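A sketch along the lines of the repo's text-completion example; paths are placeholders and must point to downloaded LLaMA weights, and the script is normally launched with torchrun because the model initializes a distributed process group.

    from llama import Llama

    generator = Llama.build(
        ckpt_dir="llama-2-7b/",            # placeholder checkpoint directory
        tokenizer_path="tokenizer.model",  # placeholder tokenizer path
        max_seq_len=512,
        max_batch_size=4,
    )

    results = generator.text_completion(
        ["The capital of France is"],
        max_gen_len=32,
        temperature=0.6,
        top_p=0.9,
    )
    print(results[0]["generation"])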

LLaMA-Efficient-Tuning

Fine-tuning LLaMA with PEFT (PT+SFT+RLHF with QLoRA)

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0
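Not this repo's training CLI; just a generic sketch of what a QLoRA-style setup with the peft and bitsandbytes libraries looks like. The model name, target modules, and hyperparameters are illustrative.

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    # Load the base model in 4-bit (QLoRA-style quantization).
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf", quantization_config=bnb_config, device_map="auto"
    )

    # Attach low-rank adapters to the attention projections.
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # typical LLaMA attention projections
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the adapters are trainable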

llama-recipes

Examples and recipes for the Llama 2 model.

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0

llm-action

This project aims to share the technical principles behind large language models as well as hands-on engineering experience.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

nvcomp

Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.

Language: C++ | License: NOASSERTION | Stargazers: 0 | Issues: 0

onnx

Open standard for machine learning interoperability

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0
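A minimal sketch of producing and validating an ONNX graph from PyTorch; the toy model, file name, and axis names are illustrative.

    import torch
    import onnx

    # A small model to export.
    model = torch.nn.Sequential(
        torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4)
    )
    dummy_input = torch.randn(1, 16)

    torch.onnx.export(
        model, dummy_input, "model.onnx",
        input_names=["input"], output_names=["output"],
        dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    )

    onnx_model = onnx.load("model.onnx")
    onnx.checker.check_model(onnx_model)  # raises if the graph violates the ONNX spec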

onnx-simplifier

Simplify your ONNX model.

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 0
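A minimal sketch of the onnxsim Python API, applied to the model file exported above (file names are placeholders); the tool can also be run from the command line.

    import onnx
    from onnxsim import simplify

    model = onnx.load("model.onnx")
    # simplify() folds constants and removes redundant nodes, then checks that
    # the simplified graph produces the same outputs.
    model_simplified, check = simplify(model)
    assert check, "Simplified model failed the equivalence check"
    onnx.save(model_simplified, "model_simplified.onnx")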

onnxruntime-inference-examples

Examples for using ONNX Runtime for machine learning inference.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0
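A minimal sketch of running an exported ONNX model with ONNX Runtime; the file name and input name must match whatever was used at export time.

    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
    inputs = {"input": np.random.randn(1, 16).astype(np.float32)}
    outputs = session.run(None, inputs)  # None -> return all model outputs
    print(outputs[0].shape)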

tensorrtx

Implementation of popular deep learning networks with the TensorRT network definition API.

Language: C++ | License: MIT | Stargazers: 0 | Issues: 0

Time-Series-Library

A Library for Advanced Deep Time Series Models.

Language: Shell | License: MIT | Stargazers: 0 | Issues: 0

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0
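A minimal sketch of the transformers pipeline API; the model name and prompt are examples.

    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    result = generator("Hello, my name is", max_new_tokens=20)
    print(result[0]["generated_text"])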

transformers-stream-generator

A text generation method, built on Hugging Face Transformers, that returns a generator and streams out each token in real time during inference.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0
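Not this repo's API; the sketch below shows the same streaming idea using the TextIteratorStreamer that ships with Hugging Face Transformers, with gpt2 as a stand-in model.

    from threading import Thread
    from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("Streaming generation lets you", return_tensors="pt")
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

    # generate() blocks, so run it in a thread and consume tokens as they arrive.
    thread = Thread(
        target=model.generate,
        kwargs=dict(**inputs, streamer=streamer, max_new_tokens=30),
    )
    thread.start()
    for token_text in streamer:
        print(token_text, end="", flush=True)
    thread.join()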

Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0
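A minimal sketch of offline batch inference with vLLM; the model name, prompt, and sampling settings are examples.

    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")
    sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    outputs = llm.generate(["The key idea behind paged attention is"], sampling)
    for out in outputs:
        print(out.outputs[0].text)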