kliu128 / figuring-out-figures

Multimodal image + text captioning for 416k figures from arXiv. Uses CLIP + SciBERT + GPT-2 in an encoder-decoder architecture. CS224N final project.

This repository is not active

Languages

Jupyter Notebook 96.4%
Python 3.5%
Shell 0.0%