Ferenas's starred repositories
segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
open_llama
OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Qwen2-Audio
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Curve-Text-Detector
This repository provides train&test code, dataset, det.&rec. annotation, evaluation script, annotation tool, and ranking.
awesome-large-audio-models
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Anything2Image
Generate image from anything with ImageBind and Stable Diffusion
Seeing-and-Hearing
[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
EasyComDataset
The Easy Communications (EasyCom) dataset is a world-first dataset designed to help mitigate the *cocktail party effect* from an augmented-reality (AR) -motivated multi-sensor egocentric world view.
Localizing-Visual-Sounds-the-Hard-Way
Localizing Visual Sounds the Hard Way
Awesome-Autoregressive-Visual-Generation
This is a repo to track the latest autoregressive visual generation papers.