linzhiqiu's repositories
cross_modal_adaptation
Cross-modal few-shot adaptation with CLIP
t2v_metrics
Evaluating text-to-image/video/3D models with VQAScore
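For reference, a minimal usage sketch of VQAScore via this package. The `clip-flant5-xxl` model name and the call signature follow the repo's README-style API as I recall it, and should be treated as assumptions rather than a definitive interface:

```python
import t2v_metrics

# Load a VQAScore model; 'clip-flant5-xxl' is assumed to be an available checkpoint
clip_flant5_score = t2v_metrics.VQAScore(model='clip-flant5-xxl')

# Score image-text alignment (higher = the image better matches the text)
scores = clip_flant5_score(images=['generated_image.png'],
                           texts=['a photo of a dog chasing a ball'])
print(scores)
```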
visual_gpt_score
VisualGPTScore for visio-linguistic reasoning
CLIP-FlanT5
Training code for CLIP-FlanT5
vl_finetuning
Few-shot finetuning of CLIP
debiased-pseudo-labeling
[CVPR 2022] Debiased Learning from Naturally Imbalanced Pseudo-Labels
HRNet-Semantic-Segmentation
The OCR approach has been rephrased as "Segmentation Transformer" (https://arxiv.org/abs/1909.11065). This is an official implementation of semantic segmentation based on HRNet (https://arxiv.org/abs/1908.07919).
HTML4Vision
A simple HTML visualization tool for computer vision research :hammer_and_wrench:
linzhiqiu.github.io
Zhiqiu Lin's site
lmms-eval
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
PerceptualSimilarity
LPIPS metric. pip install lpips
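A minimal sketch of the documented LPIPS usage after `pip install lpips`; the `lpips.LPIPS` class, the `net='alex'` backbone, and the `[-1, 1]` input convention follow the repo's README:

```python
import torch
import lpips

# LPIPS perceptual distance with the AlexNet backbone (the repo's default)
loss_fn = lpips.LPIPS(net='alex')

# Two dummy RGB image batches of shape (N, 3, H, W), normalized to [-1, 1]
img0 = torch.zeros(1, 3, 64, 64)
img1 = torch.rand(1, 3, 64, 64) * 2 - 1

dist = loss_fn(img0, img1)  # lower = more perceptually similar
print(dist.item())
```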
pytorchvideo
A deep learning library for video understanding research.
streamlit-feedback-video
Collect user feedback from within your Streamlit app
streamlit-video-captioning
A Streamlit LLM app for video captioning
video_annotation
Video Annotation Format
vision-language-models-are-bows
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" (Oral @ ICLR 2023)
why-winoground-hard
Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022