yashkant

University of Toronto

Toronto, Ontario

yashkant.github.io

Organizations

ArIESIITRoorkee

batra-mlp-lab

counselling-cell-iitr

Yash Kant's starred repositories

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python | Apache-2.0 | 23932 191 3760

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | Apache-2.0 | 20243 176 353

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language: Python | Apache-2.0 | 14926 103 948

colmap

COLMAP - Structure-from-Motion and Multi-View Stereo

Language: C++ | NOASSERTION | 7110 172 1939

lora

Using Low-rank adaptation to quickly fine-tune diffusion models.

Language: Jupyter Notebook | Apache-2.0 | 6780 59 137

mergekit

Tools for merging pretrained large language models.

Language: Python | LGPL-3.0 | 4036 47 249

sd-forge-layerdiffuse

[WIP] Layer Diffusion for WebUI (via Forge)

Language: Python | Apache-2.0 | 3642 35 90

VGen

Official repo for VGen: a holistic video generation ecosystem for video generation building on diffusion models

Language: Python | 2749 29 117

DynamiCrafter

[ECCV 2024] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

Language: Python | Apache-2.0 | 2059 25 100

DPT

Dense Prediction Transformers

Language: Python | MIT | 1901 42 80

Latte

Latte: Latent Diffusion Transformer for Video Generation.

Language: Python | Apache-2.0 | 1443 28 81

InstaFlow

:zap: InstaFlow! One-Step Stable Diffusion with Rectified Flow (ICLR 2024)

Language: Python | MIT | 1063 44 26

rcg

PyTorch implementation of RCG https://arxiv.org/abs/2312.03701

Language: Python | MIT | 772 7 33

momask-codes

Official implementation of "MoMask: Generative Masked Modeling of 3D Human Motions (CVPR2024)"

Language: Python | MIT | 691 28 54

ziplora-pytorch

Implementation of "ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs"

Language: Python | MIT | 476 11 19

RayDiffusion

Code for "Cameras as Rays"

Language: Python | MIT | 454 12 22

Panda-70M

[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Language: Python | 438 11 41

conceptual-12m

Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.

NOASSERTION | 334 13 6

bgpt

Beyond Language Models: Byte Models are Digital World Simulators

Language: Python | MIT | 294 4 1

Dataset

News: the 7k dataset is ready for download.

Language: HTML | NOASSERTION | 252 13 22

T2I-CompBench

[Neurips 2023] T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Language: Python | MIT | 159 2 18

NaViT

My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

Language: Python | MIT | 143 7 3

spad

Code for SPAD : Spatially Aware Multiview Diffusers, CVPR 2024

Language: Python | 110 9 8

calico

code for: Calibration of Asynchronous Camera Networks: CALICO

Language: C++ | MIT | 74 7 4

geneval

GenEval: An object-focused framework for evaluating text-to-image alignment

Language: HTML | MIT | 50 1 5

housekeep

Official code for the paper "Housekeep: Tidying Virtual Households using Commonsense Reasoning" published at ECCV, 2022

Language: Python | MIT | 45 6 6

FusionVision

Official implementation of the paper "FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything"

Language: Jupyter Notebook | Unlicense | 29 2 2

mats

Language: Python | 20 10

FineDiffusion

Language: Python | NOASSERTION | 17 1 1

perspective-enhanced-diffusion

Enhancing Diffusion Models with 3D Perspective Geometry Constraints (SIGGRAPH Asia 2023)

Language: Jupyter Notebook | MIT | 200