Raphael Jin (Raphael-Jin)

Company: University of Southern California

Location: Sunnyvale

Raphael Jin's starred repositories

ollama

Get up and running with Llama 3.1, Mistral, Gemma 2, and other large language models.

horovod

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Language: Python | License: NOASSERTION | Stargazers: 14169 | Issues: 334 | Issues: 2241

flannel

flannel is a network fabric for containers, designed for Kubernetes

Language: Go | License: Apache-2.0 | Stargazers: 8729 | Issues: 221 | Issues: 1112

dropwizard

A damn simple library for building production-ready RESTful web services.

Language: Java | License: Apache-2.0 | Stargazers: 8498 | Issues: 372 | Issues: 1575

skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.

Language: Python | License: Apache-2.0 | Stargazers: 6557 | Issues: 71 | Issues: 1718

karmada

Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration

Language: Go | License: Apache-2.0 | Stargazers: 4401 | Issues: 72 | Issues: 1648

x-deeplearning

An industrial deep learning framework for high-dimension sparse data

Language: PureBasic | License: Apache-2.0 | Stargazers: 4252 | Issues: 231 | Issues: 346

fun-rec

An introductory tutorial on recommender systems. Read online at https://datawhalechina.github.io/fun-rec/

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 4179 | Issues: 35 | Issues: 56

virtual-kubelet

Virtual Kubelet is an open source Kubernetes kubelet implementation.

Language: Go | License: Apache-2.0 | Stargazers: 4179 | Issues: 104 | Issues: 376

volcano

A Cloud Native Batch System (Project under CNCF)

Language: Go | License: Apache-2.0 | Stargazers: 4084 | Issues: 89 | Issues: 1544

pyinfra

pyinfra turns Python code into shell commands and runs them on your servers. Execute ad-hoc commands and write declarative operations. Target SSH servers, local machine and Docker containers. Fast and scales from one server to thousands.

Language: Python | License: MIT | Stargazers: 3852 | Issues: 37 | Issues: 766

Alink

Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.

Language: Java | License: Apache-2.0 | Stargazers: 3566 | Issues: 138 | Issues: 212

kserve

Standardized Serverless ML Inference Platform on Kubernetes

Language: Python | License: Apache-2.0 | Stargazers: 3476 | Issues: 63 | Issues: 1826

submariner

Networking component for interconnecting Pods and Services across Kubernetes clusters.

Language: Go | License: Apache-2.0 | Stargazers: 2406 | Issues: 57 | Issues: 732

AI-RecommenderSystem

This repository attempts to collect and organize some classic algorithms and models from the recommender systems field.

Language: Jupyter Notebook | Stargazers: 1678 | Issues: 14 | Issues: 10

fluid

Fluid, elastic data abstraction and acceleration for BigData/AI applications in cloud. (Project under CNCF)

Language: Go | License: Apache-2.0 | Stargazers: 1633 | Issues: 31 | Issues: 1112

training-operator

Distributed ML Training and Fine-Tuning on Kubernetes

Language: Go | License: Apache-2.0 | Stargazers: 1571 | Issues: 84 | Issues: 974

RecSysPapers

A collection of industry classics and cutting-edge papers in the field of recommendation/advertising/search.

Language: Python | License: BSD-2-Clause | Stargazers: 1235 | Issues: 51 | Issues: 0

NVTabular

NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

Language: Python | License: Apache-2.0 | Stargazers: 1037 | Issues: 34 | Issues: 785

kubeadmiral

Multi-Cluster Kubernetes Orchestration

Language: Go | License: Apache-2.0 | Stargazers: 790 | Issues: 19 | Issues: 35

godel-scheduler

A unified scheduler for online and offline tasks

Language: Go | License: Apache-2.0 | Stargazers: 430 | Issues: 12 | Issues: 18

embedx

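embedx is a fully self-developed distributed embedding training and inference framework written in C++. It currently supports graph models, deep ranking models, retrieval models, and joint training of graph-and-ranking and graph-and-retrieval models, among others.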

Language: C++ | License: NOASSERTION | Stargazers: 297 | Issues: 14 | Issues: 6

hai-platform

A high-performance deep learning training platform with task-level time-sharing scheduling of GPU compute.

Language: Python | License: LGPL-3.0 | Stargazers: 293 | Issues: 8 | Issues: 15

gpu-feature-discovery

GPU plugin to the node feature discovery for Kubernetes

Language: Go | License: Apache-2.0 | Stargazers: 287 | Issues: 13 | Issues: 21

EasyParallelLibrary

Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.

Language: Python | License: Apache-2.0 | Stargazers: 261 | Issues: 13 | Issues: 10
Language: Python | License: Apache-2.0 | Stargazers: 205 | Issues: 9 | Issues: 6
Language: Java | License: Apache-2.0 | Stargazers: 195 | Issues: 5 | Issues: 7

GeoMX

GeoMX: A fast and unified system for distributed machine learning over geo-distributed data centers.

Language: C++ | License: Apache-2.0 | Stargazers: 115 | Issues: 3 | Issues: 1

et-operator

Kubernetes Operator for AI and Bigdata Elastic Training