bowen-upenn

Bowen Jiang (Lauren)'s repositories

ControlText

ControlText: Unlocking Controllable Fonts in Multilingual Text Rendering without Font Annotations

Language: Python | License: Apache-2.0 | 31 4 2

scene_graph_commonsense

[WACV 2025] Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge

Language: Python | License: MIT | 27 1 5

Agent_Rationality

[NAACL 2025] Towards Rationality in Language and Multimodal Agents: A Survey

License: MIT | 26 10

llm_token_bias

[EMNLP 2024] A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners

Language: Python | License: MIT | 19 20

Multi-Agent-VQA

[CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering

Language: Python | License: MIT | 11 20

AnyText

Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>

Language: Python | License: Apache-2.0 | 000

Awesome-LLM-Reasoning

Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought and OpenAI o1 🍓

License: MIT | 000

CCD

[ICCV2023] Self-supervised Character-to-Character Distillation for Text Recognition

Language: Python | 000

CFR_VQA

Coarse-to-Fine Reasoning for Visual Question Answering (CVPRW'22)

Language: Python | License: MIT | 000

Image-Generation-CoT

Investigating CoT Reasoning in Autoregressive Image Generation

Language: Python | 000

Rethinking-Text-Segmentation

[CVPR 2021] Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach

Language: Python | 000

SeeAct

[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).

Language: Python | License: NOASSERTION | 000

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python | License: MIT | 000

VLSAT

[CVPR 2023] VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud

Language: Python | 000

verl

verl: Volcano Engine Reinforcement Learning for LLMs

License: Apache-2.0 | 000