Ferenas


Company: Shanghai Jiao Tong University


Ferenas's starred repositories

segment-anything-2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 11129 · Watchers: 64 · Issues: 259
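
For orientation, a minimal image-prediction sketch in the spirit of the repo's README; the checkpoint and config names follow the README's examples, and the image and point prompt below are stand-in values:

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Checkpoint/config names as documented in the repo README (assumed downloaded locally).
predictor = SAM2ImagePredictor(
    build_sam2("sam2_hiera_l.yaml", "./checkpoints/sam2_hiera_large.pt")
)

image = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in for a real HxWx3 image
with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[320, 240]]),      # one example click
        point_labels=np.array([1]),               # 1 = foreground, 0 = background
    )
```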

ImageBind

ImageBind: One Embedding Space to Bind Them All

Language: Python · License: NOASSERTION · Stargazers: 8251 · Watchers: 99 · Issues: 89
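
A hedged sketch of the shared-embedding usage pattern shown in the ImageBind README; the media file paths are placeholders:

```python
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"
model = imagebind_model.imagebind_huge(pretrained=True).eval().to(device)

# Placeholder inputs; any subset of modalities can be embedded into the same space.
inputs = {
    ModalityType.TEXT: data.load_and_transform_text(["a dog barking"], device),
    ModalityType.VISION: data.load_and_transform_vision_data(["dog.jpg"], device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(["bark.wav"], device),
}
with torch.no_grad():
    embeddings = model(inputs)  # dict: modality -> [batch, embed_dim] tensors

# Cross-modal similarity is just a dot product in the joint space.
sim = embeddings[ModalityType.VISION] @ embeddings[ModalityType.AUDIO].T
```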

open_llama

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
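
Because OpenLLaMA keeps the LLaMA weight format, it loads with the stock Hugging Face classes; the model ID below is the published 7B variant (an assumption worth verifying against the repo's model links):

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_id = "openlm-research/open_llama_7b"  # assumed Hugging Face Hub ID from the project
tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Q: What is the largest animal?\nA:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0]))
```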

mae

PyTorch implementation of MAE: https://arxiv.org/abs/2111.06377

Language: Python · License: NOASSERTION · Stargazers: 7203 · Watchers: 56 · Issues: 191
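
MAE's key trick is masking roughly 75% of patches and encoding only the visible ones; a self-contained sketch of the shuffle-and-keep masking from the paper (mirroring, but not copied from, the repo's code):

```python
import torch

def random_masking(x: torch.Tensor, mask_ratio: float = 0.75):
    """Per-sample random masking of patch tokens.

    x: [batch, num_patches, dim] patch embeddings.
    Returns kept tokens, a binary mask (1 = masked), and restore indices.
    """
    B, N, D = x.shape
    len_keep = int(N * (1 - mask_ratio))

    noise = torch.rand(B, N, device=x.device)   # random score per patch
    ids_shuffle = torch.argsort(noise, dim=1)   # low scores are kept
    ids_restore = torch.argsort(ids_shuffle, dim=1)

    ids_keep = ids_shuffle[:, :len_keep]
    x_kept = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))

    mask = torch.ones(B, N, device=x.device)
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)   # back to original patch order
    return x_kept, mask, ids_restore
```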

AnyText

Official implementation of the paper "AnyText: Multilingual Visual Text Generation And Editing"

Language: Python · License: Apache-2.0 · Stargazers: 4245 · Watchers: 52 · Issues: 124

NExT-GPT

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

Language: Python · License: BSD-3-Clause · Stargazers: 3227 · Watchers: 57 · Issues: 98

VILA

VILA - a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops)

Language: Python · License: Apache-2.0 · Stargazers: 1849 · Watchers: 27 · Issues: 121

self-rag

The original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through Self-Reflection, by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.

Language: Python · License: MIT · Stargazers: 1767 · Watchers: 18 · Issues: 80
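
The project demonstrates inference with vLLM, with reflection tokens such as [Retrieval] emitted inline; a hedged sketch (model ID from the project's release, prompt format approximate):

```python
from vllm import LLM, SamplingParams

model = LLM("selfrag/selfrag_llama2_7b", dtype="half")  # assumed HF Hub ID
params = SamplingParams(temperature=0.0, top_p=1.0, max_tokens=100,
                        skip_special_tokens=False)      # keep reflection tokens visible

def format_prompt(query: str, paragraph: str | None = None) -> str:
    prompt = f"### Instruction:\n{query}\n\n### Response:\n"
    if paragraph is not None:  # optionally ground the answer in retrieved text
        prompt += f"[Retrieval]<paragraph>{paragraph}</paragraph>"
    return prompt

preds = model.generate([format_prompt("What is the capital of France?")], params)
print(preds[0].outputs[0].text)  # may contain [Retrieval]/[Relevant]/[Utility] tokens
```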

ml-4m

4M: Massively Multimodal Masked Modeling

Language: Python · License: Apache-2.0 · Stargazers: 1568 · Watchers: 30 · Issues: 21

AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

Language: C++ · License: Apache-2.0 · Stargazers: 1375 · Watchers: 33 · Issues: 166

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language: Python · License: MIT · Stargazers: 1217 · Watchers: 21 · Issues: 55

Qwen2-Audio

The official repository of Qwen2-Audio, the chat and pretrained large audio-language model proposed by Alibaba Cloud.

anole

Anole: An Open, Autoregressive, and Native Multimodal Model for Interleaved Image-Text Generation

Curve-Text-Detector

This repository provides training and testing code, the dataset, detection and recognition annotations, an evaluation script, an annotation tool, and a ranking.

Language: Jupyter Notebook · Stargazers: 636 · Watchers: 31 · Issues: 58

OneLLM

[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language

Language: Python · License: NOASSERTION · Stargazers: 564 · Watchers: 11 · Issues: 24

awesome-large-audio-models

A collection of resources on the applications of Large Language Models (LLMs) in Audio AI.

Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

Anything2Image

Generate images from anything with ImageBind and Stable Diffusion

Language: Jupyter Notebook · Stargazers: 192 · Watchers: 7 · Issues: 14

ICV

Code for the paper "In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering"

Language: Python · License: MIT · Stargazers: 132 · Watchers: 6 · Issues: 9
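
The paper's core move is to distill demonstrations into a direction in hidden-state space and add it during inference; a generic PyTorch forward-hook illustration of that steering step (not the repo's actual API):

```python
import torch
from torch import nn

def add_steering_hook(layer: nn.Module, vector: torch.Tensor, alpha: float = 0.1):
    """Shift a transformer layer's hidden states along a precomputed in-context vector."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * vector   # vector: [hidden_dim], broadcast over tokens
        return (steered,) + output[1:] if isinstance(output, tuple) else steered
    return layer.register_forward_hook(hook)

# Hypothetical usage: `vector` would be the averaged difference between hidden
# states computed with and without the in-context demonstrations.
# handle = add_steering_hook(model.model.layers[15], vector)
# ... generate ...
# handle.remove()
```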

Seeing-and-Hearing

[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Language: Python · License: NOASSERTION · Stargazers: 115 · Watchers: 12 · Issues: 10

EasyComDataset

The Easy Communications (EasyCom) dataset is a world-first dataset designed to help mitigate the *cocktail party effect* from an augmented-reality (AR)-motivated, multi-sensor, egocentric world view.

VQ-VAE

PyTorch implementation of "Neural Discrete Representation Learning"

Language: Jupyter Notebook · Stargazers: 79 · Watchers: 2 · Issues: 3
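
The paper's central operation is a nearest-neighbor codebook lookup trained with a straight-through gradient; a compact sketch independent of this repo's code:

```python
import torch
from torch import nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes: int = 512, dim: int = 64, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1 / num_codes, 1 / num_codes)
        self.beta = beta  # commitment cost from the paper

    def forward(self, z: torch.Tensor):        # z: [..., dim] encoder outputs
        flat = z.reshape(-1, z.shape[-1])
        # Squared L2 distance from each vector to every codebook entry.
        dists = (flat.pow(2).sum(1, keepdim=True)
                 - 2 * flat @ self.codebook.weight.T
                 + self.codebook.weight.pow(2).sum(1))
        idx = dists.argmin(1)
        z_q = self.codebook(idx).view_as(z)

        # Codebook loss + commitment loss, then straight-through estimator.
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        z_q = z + (z_q - z).detach()
        return z_q, idx.view(z.shape[:-1]), loss
```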

Localizing-Visual-Sounds-the-Hard-Way

Localizing Visual Sounds the Hard Way

Language: Python · License: Apache-2.0 · Stargazers: 76 · Watchers: 6 · Issues: 14

E2STR

The official code for the CVPR 2024 paper "Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer"

Language: Python · License: Apache-2.0 · Stargazers: 41 · Watchers: 6 · Issues: 4

VL-ICL

Code for the paper "VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning"

Awesome-Autoregressive-Visual-Generation

A repository tracking the latest autoregressive visual generation papers.