Ferenas


Company: Shanghai Jiao Tong University


Ferenas's starred repositories

segment-anything-2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 11129 · Watchers: 64 · Issues: 259
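
For orientation, a minimal image-prediction sketch in the spirit of the repo's README; the checkpoint and config names follow the README's examples, and the image and point prompt below are stand-in values:

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Checkpoint/config names as documented in the repo README (assumed downloaded locally).
predictor = SAM2ImagePredictor(
    build_sam2("sam2_hiera_l.yaml", "./checkpoints/sam2_hiera_large.pt")
)

image = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in for a real HxWx3 image
with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[320, 240]]),      # one example click
        point_labels=np.array([1]),               # 1 = foreground, 0 = background
    )
```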

ImageBind

ImageBind: One Embedding Space to Bind Them All

Language: Python · License: NOASSERTION · Stargazers: 8251 · Watchers: 99 · Issues: 89
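
A hedged sketch of the shared-embedding usage pattern shown in the ImageBind README; the media file paths are placeholders:

```python
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"
model = imagebind_model.imagebind_huge(pretrained=True).eval().to(device)

# Placeholder inputs; any subset of modalities can be embedded into the same space.
inputs = {
    ModalityType.TEXT: data.load_and_transform_text(["a dog barking"], device),
    ModalityType.VISION: data.load_and_transform_vision_data(["dog.jpg"], device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(["bark.wav"], device),
}
with torch.no_grad():
    embeddings = model(inputs)  # dict: modality -> [batch, embed_dim] tensors

# Cross-modal similarity is just a dot product in the joint space.
sim = embeddings[ModalityType.VISION] @ embeddings[ModalityType.AUDIO].T
```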

open_llama

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
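
Because OpenLLaMA keeps the LLaMA weight format, it loads with the stock Hugging Face classes; the model ID below is the published 7B variant (an assumption worth verifying against the repo's model links):

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_id = "openlm-research/open_llama_7b"  # assumed Hugging Face Hub ID from the project
tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Q: What is the largest animal?\nA:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0]))
```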

mae

PyTorch implementation of MAE: https://arxiv.org/abs/2111.06377

Language: Python · License: NOASSERTION · Stargazers: 7203 · Watchers: 56 · Issues: 191
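
MAE's key trick is masking roughly 75% of patches and encoding only the visible ones; a self-contained sketch of the shuffle-and-keep masking from the paper (mirroring, but not copied from, the repo's code):

```python
import torch

def random_masking(x: torch.Tensor, mask_ratio: float = 0.75):
    """Per-sample random masking of patch tokens.

    x: [batch, num_patches, dim] patch embeddings.
    Returns kept tokens, a binary mask (1 = masked), and restore indices.
    """
    B, N, D = x.shape
    len_keep = int(N * (1 - mask_ratio))

    noise = torch.rand(B, N, device=x.device)   # random score per patch
    ids_shuffle = torch.argsort(noise, dim=1)   # low scores are kept
    ids_restore = torch.argsort(ids_shuffle, dim=1)

    ids_keep = ids_shuffle[:, :len_keep]
    x_kept = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))

    mask = torch.ones(B, N, device=x.device)
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)   # back to original patch order
    return x_kept, mask, ids_restore
```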

AnyText

Official implementation of the paper "AnyText: Multilingual Visual Text Generation And Editing"

Language: Python · License: Apache-2.0 · Stargazers: 4245 · Watchers: 52 · Issues: 124

NExT-GPT

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

Language: Python · License: BSD-3-Clause · Stargazers: 3227 · Watchers: 57 · Issues: 98

VILA

VILA - a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops)

Language: Python · License: Apache-2.0 · Stargazers: 1849 · Watchers: 27 · Issues: 121

self-rag

The original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through Self-Reflection, by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.

Language: Python · License: MIT · Stargazers: 1767 · Watchers: 18 · Issues: 80
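
The project demonstrates inference with vLLM, with reflection tokens such as [Retrieval] emitted inline; a hedged sketch (model ID from the project's release, prompt format approximate):

```python
from vllm import LLM, SamplingParams

model = LLM("selfrag/selfrag_llama2_7b", dtype="half")  # assumed HF Hub ID
params = SamplingParams(temperature=0.0, top_p=1.0, max_tokens=100,
                        skip_special_tokens=False)      # keep reflection tokens visible

def format_prompt(query: str, paragraph: str | None = None) -> str:
    prompt = f"### Instruction:\n{query}\n\n### Response:\n"
    if paragraph is not None:  # optionally ground the answer in retrieved text
        prompt += f"[Retrieval]<paragraph>{paragraph}</paragraph>"
    return prompt

preds = model.generate([format_prompt("What is the capital of France?")], params)
print(preds[0].outputs[0].text)  # may contain [Retrieval]/[Relevant]/[Utility] tokens
```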

ml-4m

4M: Massively Multimodal Masked Modeling

Language: Python · License: Apache-2.0 · Stargazers: 1568 · Watchers: 30 · Issues: 21

AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

Language: C++ · License: Apache-2.0 · Stargazers: 1375 · Watchers: 33 · Issues: 166

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language: Python · License: MIT · Stargazers: 1217 · Watchers: 21 · Issues: 55

Qwen2-Audio

The official repository of Qwen2-Audio, the chat and pretrained large audio-language model proposed by Alibaba Cloud.

anole

Anole: An Open, Autoregressive, and Native Multimodal Model for Interleaved Image-Text Generation

Curve-Text-Detector

This repository provides training and testing code, the dataset, detection and recognition annotations, an evaluation script, an annotation tool, and a ranking.

Language: Jupyter Notebook · Stargazers: 636 · Watchers: 31 · Issues: 58

OneLLM

[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language

Language: Python · License: NOASSERTION · Stargazers: 564 · Watchers: 11 · Issues: 24

awesome-large-audio-models

A collection of resources on the applications of Large Language Models (LLMs) in Audio AI.

Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

Anything2Image

Generate images from anything with ImageBind and Stable Diffusion

Language: Jupyter Notebook · Stargazers: 192 · Watchers: 7 · Issues: 14

ICV

Code for the paper "In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering"

Language: Python · License: MIT · Stargazers: 132 · Watchers: 6 · Issues: 9
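
The paper's core move is to distill demonstrations into a direction in hidden-state space and add it during inference; a generic PyTorch forward-hook illustration of that steering step (not the repo's actual API):

```python
import torch
from torch import nn

def add_steering_hook(layer: nn.Module, vector: torch.Tensor, alpha: float = 0.1):
    """Shift a transformer layer's hidden states along a precomputed in-context vector."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * vector   # vector: [hidden_dim], broadcast over tokens
        return (steered,) + output[1:] if isinstance(output, tuple) else steered
    return layer.register_forward_hook(hook)

# Hypothetical usage: `vector` would be the averaged difference between hidden
# states computed with and without the in-context demonstrations.
# handle = add_steering_hook(model.model.layers[15], vector)
# ... generate ...
# handle.remove()
```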

Seeing-and-Hearing

[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Language: Python · License: NOASSERTION · Stargazers: 115 · Watchers: 12 · Issues: 10

EasyComDataset

The Easy Communications (EasyCom) dataset is a world-first dataset designed to help mitigate the *cocktail party effect* from an augmented-reality (AR)-motivated, multi-sensor, egocentric world view.

VQ-VAE

PyTorch implementation of "Neural Discrete Representation Learning"

Language: Jupyter Notebook · Stargazers: 79 · Watchers: 2 · Issues: 3
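
The paper's central operation is a nearest-neighbor codebook lookup trained with a straight-through gradient; a compact sketch independent of this repo's code:

```python
import torch
from torch import nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes: int = 512, dim: int = 64, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1 / num_codes, 1 / num_codes)
        self.beta = beta  # commitment cost from the paper

    def forward(self, z: torch.Tensor):        # z: [..., dim] encoder outputs
        flat = z.reshape(-1, z.shape[-1])
        # Squared L2 distance from each vector to every codebook entry.
        dists = (flat.pow(2).sum(1, keepdim=True)
                 - 2 * flat @ self.codebook.weight.T
                 + self.codebook.weight.pow(2).sum(1))
        idx = dists.argmin(1)
        z_q = self.codebook(idx).view_as(z)

        # Codebook loss + commitment loss, then straight-through estimator.
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        z_q = z + (z_q - z).detach()
        return z_q, idx.view(z.shape[:-1]), loss
```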

Localizing-Visual-Sounds-the-Hard-Way

Localizing Visual Sounds the Hard Way

Language: Python · License: Apache-2.0 · Stargazers: 76 · Watchers: 6 · Issues: 14

E2STR

The official code for the CVPR 2024 paper "Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer"

Language: Python · License: Apache-2.0 · Stargazers: 41 · Watchers: 6 · Issues: 4

VL-ICL

Code for the paper "VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning"

Awesome-Autoregressive-Visual-Generation

A repository tracking the latest autoregressive visual generation papers.