sebastianbujwid / zsl_text_imagenet

ImageNet-Wiki: matching, processing, etc.

zsl_text_imagenet

ImageNet-Wiki: matching, processing, etc. The project contains:

  • ImageNet-Wiki automatic matching
  • Extraction and parsing of Wikipedia articles
  • Feature extraction from Wikipedia articles' text

Download encoded text from Wikipedia articles

Available for download: encoded Wikipedia articles (extracted features). The download contains the Wikipedia articles corresponding to ImageNet classes, encoded with:

  • GloVe features
  • Word2Vec features
  • ALBERT (xxlarge & base)
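
As a rough illustration of what "encoding an article" with static word vectors such as GloVe or Word2Vec can look like, the sketch below averages the vectors of the tokens found in a vocabulary. This is one common approach and not necessarily the one used in this repository; the function name `mean_pool` and the toy two-dimensional vectors are made up for the example.

```python
from typing import Dict, List, Sequence


def mean_pool(tokens: Sequence[str],
              vectors: Dict[str, List[float]],
              dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens.

    Returns a zero vector of length `dim` if no token is in the vocabulary.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy 2-dimensional vectors, invented for illustration only
# (real GloVe / Word2Vec vectors have hundreds of dimensions).
toy_vectors = {"tiger": [1.0, 0.0], "cat": [0.0, 1.0]}

# Naive whitespace tokenization; an assumption, not the repository's tokenizer.
article_tokens = "the tiger is a large cat".split()
features = mean_pool(article_tokens, toy_vectors, dim=2)
```

Out-of-vocabulary tokens are simply skipped here, which keeps the sketch short; a real pipeline would also need to decide on lowercasing, punctuation handling, and how to treat very long articles.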

Wikipedia dump used

We used the enwiki-20200120 (20 Jan 2020) dump of English Wikipedia, downloaded from the Wikimedia Downloads page. Because of its large size, the original dump we used is available on request only.

For the ImageNet-Wikipedia article correspondences and the original extracted text, refer to the ImageNet-Wiki Dataset repository.

Conda environment

conda.yml contains a Conda environment used for the project. Note that it contains more dependencies than this project requires!

Issue with downloading ALBERT weights

Due to changes in newer versions of the transformers library and its models, you probably won't be able to download the ALBERT weights correctly. You will most likely see log messages about this, so make sure not to miss them. For more details, see the corresponding GitHub issue.

To make it work, you can try to manually load the cached models we used: Older, cached ALBERT models from the transformers library
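
A minimal sketch of loading ALBERT from a local cache directory instead of downloading it is shown below. It assumes the cached models have been unpacked into a local directory that the installed transformers version can read as a cache; the layout of the archive is not described here, so this may need adjusting. `resolve_cache_dir` and `load_albert` are hypothetical helper names, while `AlbertTokenizer.from_pretrained` and `AlbertModel.from_pretrained` with the `cache_dir` argument are part of the transformers API.

```python
from pathlib import Path
from typing import Optional


def resolve_cache_dir(path: str) -> Optional[str]:
    """Return the absolute path if `path` is an existing, non-empty directory.

    Returns None otherwise, so a missing cache is noticed before any
    download is attempted.
    """
    p = Path(path).expanduser()
    if p.is_dir() and any(p.iterdir()):
        return str(p.resolve())
    return None


def load_albert(name: str, cache_dir: str):
    """Load an ALBERT tokenizer and model, reading files from `cache_dir`."""
    resolved = resolve_cache_dir(cache_dir)
    if resolved is None:
        raise FileNotFoundError(f"No cached model files found in {cache_dir!r}")

    # Imported lazily so resolve_cache_dir works without transformers installed.
    from transformers import AlbertModel, AlbertTokenizer

    tokenizer = AlbertTokenizer.from_pretrained(name, cache_dir=resolved)
    model = AlbertModel.from_pretrained(name, cache_dir=resolved)
    return tokenizer, model
```

A call might look like `tokenizer, model = load_albert("albert-base-v2", "./cached_albert")`, where both the model identifier and the directory name are assumptions. It is still worth checking the log output after loading, since warnings about missing or newly initialized weights indicate that the cached files were not picked up.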

Project

The code from this repository was used in our work; see our project page.

License:GNU General Public License v3.0