zhiqwang / sightseq

Computer vision tools for fairseq, containing PyTorch implementation of text recognition and object detection

🔭sightseq

Now, let's go sightseeing around the deep learning world, through multimodal models of vision and sequence (language).

What's New:

  • July 30, 2019: Added Faster R-CNN models. I also renamed this repo from image-captioning to sightseq; this is the last time I rename this repo, I promise.
  • June 11, 2019: Rewrote the text recognition part on top of fairseq. For a stable version, see the crnn branch, which provides pre-trained model checkpoints. The current branch is a work in progress. Suggestions and cooperation on the fairseq text recognition project are very welcome.

Features:

sightseq provides reference implementations of various deep learning tasks, including text recognition and object detection.

Additionally:

  • All features of fairseq
  • Convolutional and recurrent layers in CRNN can be flexibly enabled
  • Positional encoding of images
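
The list above does not say which positional encoding scheme is used for images. As a rough illustration only, the sketch below assumes the sinusoidal encoding from the Transformer, applied once along the rows and once along the columns of a feature map. The function names `sinusoid_1d` and `sinusoid_2d` and the half-and-half channel split are hypothetical and are not taken from sightseq.

```python
import math


def sinusoid_1d(length, dim):
    """Return a length x dim table of sinusoidal position encodings."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    table = []
    for pos in range(length):
        row = []
        # Each (sin, cos) pair shares one frequency, as in the Transformer.
        for i in range(0, dim, 2):
            angle = pos / (10000 ** (i / dim))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table


def sinusoid_2d(height, width, dim):
    """Return a height x width x dim encoding for an image feature map.

    Assumption for illustration: the first half of the channels encodes
    the row index and the second half encodes the column index.
    """
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    half = dim // 2
    rows = sinusoid_1d(height, half)
    cols = sinusoid_1d(width, half)
    # List concatenation joins the row code and the column code per pixel.
    return [[rows[y] + cols[x] for x in range(width)] for y in range(height)]


if __name__ == "__main__":
    pe = sinusoid_2d(height=2, width=3, dim=8)
    print(len(pe), len(pe[0]), len(pe[0][0]))
```

In a real model the resulting table would typically be added to the convolutional feature map before it is passed to the sequence layers; whether sightseq does exactly this is not stated here.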

General Requirements and Installation

  • PyTorch (nn.CTCLoss has a bug that is fixed in the nightly build)
  • Python version >= 3.5
  • Fairseq version >= 0.7.1
  • torchvision version >= 0.3.0
  • For training new models, you'll also need an NVIDIA GPU and NCCL
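
The version requirements above can be checked before training with a small script. This is a minimal sketch using only the standard library; the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and it assumes each package exposes a `__version__` attribute. PyTorch is left out of the table because the list gives no minimum version for it.

```python
import importlib
import re
import sys

# Minimum versions taken from the requirements list above.
MINIMUMS = {"fairseq": "0.7.1", "torchvision": "0.3.0"}


def parse_version(text):
    """Turn '0.7.1' or '1.2.0.dev20190730' into a tuple of leading integers."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("no numeric version in {!r}".format(text))
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, required):
    """Compare versions numerically, so that 0.10.0 counts as newer than 0.7.1."""
    return parse_version(found) >= parse_version(required)


def check_environment(minimums=MINIMUMS):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < (3, 5):
        problems.append("Python >= 3.5 is required")
    for name, required in sorted(minimums.items()):
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append("{} is not installed".format(name))
            continue
        # Assumption: the package exposes its version as __version__.
        found = getattr(module, "__version__", None)
        if found is None:
            problems.append("{} version is unknown".format(name))
        elif not meets_minimum(found, required):
            problems.append("{} {} found, >= {} required".format(name, found, required))
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

The script does not check for a GPU or NCCL; those are only needed for training new models.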

Pre-trained models and examples

License

sightseq is MIT-licensed. The license applies to the pre-trained models as well.
