wanboyang's repositories

Anomaly_AR_Net_ICME_2020

This repository contains the code for Weakly Supervised Video Anomaly Detection via Center-Guided Discriminative Learning (ICME 2020). The original paper is available on IEEE Xplore (https://ieeexplore.ieee.org/document/9102722) or arXiv (https://arxiv.org/abs/2104.07268).

Language: Python | License: MIT | Stargazers: 52 | Issues: 3 | Issues: 8

Awesome-Multimodal-Large-Language-Models

Latest Papers and Datasets on Multimodal Large Language Models

Stargazers: 2 | Issues: 0 | Issues: 0

Protein-Localization-Transformer

Code for CELL-E: Biological Zero-Shot Text-to-Image Synthesis for Protein Localization Prediction

Language: Python | License: MIT | Stargazers: 1 | Issues: 0 | Issues: 0
License: NOASSERTION | Stargazers: 0 | Issues: 2 | Issues: 0

UCF_2018_CVPR

Code reproducing Real-world Anomaly Detection in Surveillance Videos

Language: Python | Stargazers: 0 | Issues: 2 | Issues: 0

awesome-industrial-anomaly-detection

Paper list and datasets for industrial image anomaly/defect detection (continuously updated).

Stargazers: 0 | Issues: 0 | Issues: 0

CAA

Channelized Axial Attention for Semantic Segmentation (AAAI-2022)

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

camel

CaMEL: Mean Teacher Learning for Image Captioning. arXiv 2022.

Language: Python | License: BSD-3-Clause | Stargazers: 0 | Issues: 1 | Issues: 0
Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

Chinese-STD-GB-T-7714-related-csl

CSL styles related to GB/T 7714, plus Zotero tips and tutorials.

License: GPL-3.0 | Stargazers: 0 | Issues: 1 | Issues: 0

DAT

Repository of Vision Transformer with Deformable Attention

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

davit

Code for paper "DaViT: Dual Attention Vision Transformer"

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0
Language: Python | License: BSD-3-Clause | Stargazers: 0 | Issues: 1 | Issues: 0
Stargazers: 0 | Issues: 0 | Issues: 0

grit

GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

GroundingDINO

Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

Language: Shell | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

LLaMA-Adapter

Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

License: GPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

LLMsPracticalGuide

A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)

Stargazers: 0 | Issues: 0 | Issues: 0

LLMVA-GEBC

Winning solution to the Generic Event Boundary Captioning task in the LOVEU Challenge (CVPR 2023 workshop)

Language: Python | License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

Neighborhood-Attention-Transformer

[Preprint] Neighborhood Attention Transformer

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

pykaldi

A Python wrapper for Kaldi

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

stable-diffusion

A latent text-to-image diffusion model

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

Textual-Visual-Semantic-Dataset

Visual Semantic Relatedness Dataset for Image Captioning. https://arxiv.org/abs/2301.08784

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Language: Python | License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

wanboyang.github.io

AcadHomepage: A Modern and Responsive Academic Personal Homepage

Language: SCSS | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

Xmodal-Ctx

Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0