hudeven / text

Data loaders and abstractions for text and NLP


This is a temporary repo for hack week: Data APIs for NLP.

Get started

  • install HuggingFace datasets. We copied it here to jump-start the work; eventually, we will build our own.

pip install -e stl_text/dataframes/datasets

  • install the PyTorch and torchtext nightlies, as some of the tasks depend on prototype work in the torchtext library.

To install the CPU version on Linux:

pip install --pre torch torchtext -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
pip install --upgrade --pre torch torchtext -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html

To install the CUDA 10.1 version on Linux:

pip install --pre torch torchtext -f https://download.pytorch.org/whl/nightly/cu101/torch_nightly.html;

More detailed instructions are available here.

  • install this package

pip install -e .

  • run an example

python examples/hf_dataset_quick_tour.py
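
After the install steps above, it can help to confirm that the expected packages are actually visible to the current Python environment before running the examples. The following is a minimal sketch, not part of this repo: the helper name `installed_versions` is hypothetical, and the distribution names `torch`, `torchtext`, and `datasets` are assumptions taken from the pip commands above (the copied HuggingFace package may register under a different name).

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed distribution names; adjust if the local install uses others.
    report = installed_versions(["torch", "torchtext", "datasets"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

A nightly build of torch or torchtext typically shows a version string containing `dev`, which is a quick way to tell it apart from a stable release.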


