nateraw / huggingface-datasets-converter

Scripts to convert datasets from various sources to Hugging Face Datasets.

Hugging Face Datasets Converter

Scripts to convert datasets from various sources to Hugging Face datasets.

Demo

Convert Any Kaggle Dataset To a Hugging Face Dataset: Open In Colab

Usage

Setup

git clone https://github.com/nateraw/huggingface-datasets-converter.git
cd huggingface-datasets-converter
pip install -r requirements.txt

Make sure to authenticate with the Hugging Face Hub:

huggingface-cli login

Convert Kaggle Dataset

Make sure your kaggle.json file is in ~/.kaggle. Then provide the Kaggle dataset ID and the Hugging Face Hub repo ID that you'd like to upload to (the repo will be created if it doesn't exist):

python run_kaggle.py --kaggle_id evangower/airbnb-stock-price --repo_id nateraw/airbnb-stock-price
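
If the command fails on authentication, it can help to confirm that the Kaggle token is where it is expected to be. The following is a minimal sketch, not part of this repo: the helper name `load_kaggle_credentials` is hypothetical, and it assumes the standard token layout, a JSON object with `username` and `key` fields.

```python
import json
from pathlib import Path


def load_kaggle_credentials(config_dir=None):
    """Return the parsed kaggle.json, or raise if it is missing or incomplete.

    Hypothetical helper for illustration; run_kaggle.py may check this differently.
    """
    # Default to ~/.kaggle, the location named in the instructions above.
    config_dir = Path(config_dir) if config_dir is not None else Path.home() / ".kaggle"
    path = config_dir / "kaggle.json"
    if not path.is_file():
        raise FileNotFoundError(f"Expected Kaggle API token at {path}")

    creds = json.loads(path.read_text())
    # Assumption: the token file carries non-empty "username" and "key" fields.
    missing = [field for field in ("username", "key") if not creds.get(field)]
    if missing:
        raise ValueError(f"{path} is missing fields: {', '.join(missing)}")
    return creds
```

Calling `load_kaggle_credentials()` with no arguments checks ~/.kaggle; passing a directory is mainly useful for testing.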

Convert Zenodo Dataset

Provide the Zenodo record ID and the ID of the Hugging Face Hub repo you'd like to upload to (it will be created if it doesn't exist):

python run_zenodo.py --zenodo_record 6606485 --repo_id nateraw/espeni
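
To see which files a record ID refers to before converting it, the record can be looked up through Zenodo's public REST API. This is a rough sketch under stated assumptions, not how run_zenodo.py necessarily works: the function names are hypothetical, and the response is assumed to contain a `files` list whose entries have a `key` (the filename) and a `links.self` download URL.

```python
import json
from urllib.request import urlopen

ZENODO_API = "https://zenodo.org/api/records"


def record_url(record_id):
    """Build the API URL for a Zenodo record ID such as 6606485."""
    return f"{ZENODO_API}/{record_id}"


def fetch_record(record_id):
    """Download the record metadata as a dict (requires network access)."""
    with urlopen(record_url(record_id)) as response:
        return json.load(response)


def file_links(record):
    """Map each filename to its download URL.

    Assumption: entries in record["files"] look like
    {"key": "<filename>", "links": {"self": "<url>"}}.
    """
    return {entry["key"]: entry["links"]["self"] for entry in record.get("files", [])}
```

For example, `file_links(fetch_record(6606485))` would list the files behind the record used in the command above, provided the response matches the assumed layout.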

For Zenodo, you can also pass the --workers flag to run this with multiprocessing:

python run_zenodo.py --zenodo_record 6606485 --repo_id nateraw/espeni --workers 2
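
As a rough illustration of what a worker count usually controls, the sketch below spreads per-file work across a process pool. It is an assumption about the general pattern, not the script's actual implementation: `describe_file` and `process_files` are hypothetical names, and the per-file work is a stand-in.

```python
from multiprocessing import Pool


def describe_file(filename):
    """Stand-in for per-file work (hypothetical); returns a label for the file."""
    return f"processed {filename}"


def process_files(filenames, workers=1):
    """Apply describe_file to every filename, in parallel when workers > 1."""
    if workers <= 1:
        # Serial path: no pool overhead for a single worker.
        return [describe_file(name) for name in filenames]
    # Pool.map returns results in the same order as the input list.
    with Pool(processes=workers) as pool:
        return pool.map(describe_file, filenames)
```

When calling `process_files(..., workers=2)` from a script, do so under an `if __name__ == "__main__":` guard, since platforms that spawn worker processes re-import the main module.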

Languages

Python 72.8%, Jupyter Notebook 27.2%