Bowen Jiang (Lauren)'s repositories
ControlText
ControlText: Unlocking Controllable Fonts in Multilingual Text Rendering without Font Annotations
scene_graph_commonsense
[WACV 2025] Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge
Agent_Rationality
[NAACL 2025] Towards Rationality in Language and Multimodal Agents: A Survey
llm_token_bias
[EMNLP 2024] A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners
Multi-Agent-VQA
[CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering
AnyText
Official implementation of the paper "AnyText: Multilingual Visual Text Generation And Editing"
Awesome-LLM-Reasoning
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought and OpenAI o1 🍓
CCD
[ICCV 2023] Self-supervised Character-to-Character Distillation for Text Recognition
CFR_VQA
Coarse-to-Fine Reasoning for Visual Question Answering (CVPRW'22)
Image-Generation-CoT
Investigating CoT Reasoning in Autoregressive Image Generation
Rethinking-Text-Segmentation
[CVPR 2021] Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach
SeeAct
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
VLSAT
[CVPR 2023] VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud
verl
verl: Volcano Engine Reinforcement Learning for LLMs