唐国梁Tommy's starred repositories

StructEval

This is the official repository for the ACL 2024 paper "StructEval: Deepen and Broaden Large Language Assessment via Structured Evaluation"

Language: Python | License: Apache-2.0 | Stargazers: 3 | Issues: 0
Language: Python | Stargazers: 31 | Issues: 0

omages

We present Object Images (Omages): An homage to the classic Geometry Images.

Stargazers: 68 | Issues: 0

MMIU

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Language: Python | Stargazers: 17 | Issues: 0

ExoViP

[COLM 2024] ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning

Language: Python | License: MIT | Stargazers: 4 | Issues: 0

fantastic-data-engineering

Fantastic Data Engineering for Large Language Models

License: Apache-2.0 | Stargazers: 15 | Issues: 0
Language: Python | Stargazers: 276 | Issues: 0

Hallu-PI

The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs".

License: MIT | Stargazers: 4 | Issues: 0
Language: Python | License: NOASSERTION | Stargazers: 669 | Issues: 0

flux

Official inference repo for FLUX.1 models

Language: Python | License: Apache-2.0 | Stargazers: 5417 | Issues: 0

nano-llama31

nanoGPT style version of Llama 3.1

Language: Python | Stargazers: 817 | Issues: 0

redel

ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems.

Language: Python | License: MIT | Stargazers: 5 | Issues: 0

UnifiedMLLM

UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model

License: Apache-2.0 | Stargazers: 9 | Issues: 0

RAGFoundry

Framework for specializing LLMs for retrieval-augmented-generation tasks using fine-tuning.

Language: Python | License: Apache-2.0 | Stargazers: 138 | Issues: 0

POA

Official implementation of ECCV24 paper: POA

License: Apache-2.0 | Stargazers: 18 | Issues: 0

RagLLaVA

Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training.

Language: Python | License: MIT | Stargazers: 8 | Issues: 0

TEVAD

Official implementation for paper TEVAD: Improved video anomaly detection with captions

Language: Python | Stargazers: 19 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 155 | Issues: 0

SwinBERT

Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"

Language: Python | License: MIT | Stargazers: 235 | Issues: 0

CoCap

[ICCV 2023] Accurate and Fast Compressed Video Captioning

Language: Python | License: MIT | Stargazers: 32 | Issues: 0

segment-anything-2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 8856 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 8 | Issues: 0
Language: Python | Stargazers: 2 | Issues: 0

MovieSeq

[ECCV2024] Learning Video Context as Interleaved Multimodal Sequences

Stargazers: 7 | Issues: 0

DQU-CIR

[SIGIR 2024] - Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval

Language: Python | License: Apache-2.0 | Stargazers: 16 | Issues: 0
Language: Python | Stargazers: 22 | Issues: 0
Language: Python | License: NOASSERTION | Stargazers: 86 | Issues: 0

KIVI

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Language: Python | License: MIT | Stargazers: 182 | Issues: 0

XHand

Official PyTorch implementation of "XHand: Real-time Expressive Hand Avatar"

Language: Python | License: Apache-2.0 | Stargazers: 54 | Issues: 0

AdaCLIP

This repository contains the code for AdaCLIP, a computation and latency-aware system for pragmatic multimodal video retrieval.

Language: Python | License: NOASSERTION | Stargazers: 8 | Issues: 0