alexandrainst / tts_text

Code for collection/generation of text for tts data collection

Developers:

Quick Start

The quickest way to build the dataset is with Docker. With Docker installed, run make docker; the final dataset is built in the data/processed directory, and the individual datasets are stored in data/raw.
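To check what a build produced, the two output directories named above can be listed with a few lines of Python. This is only a sketch: the helper name list_built_files is hypothetical and not part of the repository, and it assumes it is run from the project root (or given its path). It makes no assumption about the file names or formats the build writes.

```python
from __future__ import annotations

from pathlib import Path


def list_built_files(project_root: str | Path = ".") -> dict[str, list[str]]:
    """List the files under data/raw and data/processed (hypothetical helper)."""
    root = Path(project_root)
    found: dict[str, list[str]] = {}
    for subdir in ("raw", "processed"):
        directory = root / "data" / subdir
        if not directory.is_dir():
            # Nothing has been built yet, or the path is not the project root.
            found[subdir] = []
            continue
        # Skip the .gitkeep placeholders that keep the empty directories in Git.
        found[subdir] = sorted(
            str(path.relative_to(directory))
            for path in directory.rglob("*")
            if path.is_file() and path.name != ".gitkeep"
        )
    return found
```

Calling list_built_files() from the project root returns a dictionary with the keys "raw" and "processed", each mapping to a sorted list of relative file paths; an empty list means nothing has been written there yet.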

Development Setup

To install the project for further development, run the following steps:

  1. Run make install, which installs Poetry (if it isn't already installed), sets up a virtual environment, and installs all Python dependencies in it.
  2. Run source .venv/bin/activate to activate the virtual environment.

With the project installed, you can build the dataset by running:

python src/scripts/build_tts_dataset.py

NB: Running the above script on macOS may raise a urllib.error.URLError exception, in which case follow the steps described here.
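Before applying a fix, it can help to see what the URLError actually wraps. The sketch below is not taken from the repository; the function names describe_url_error and check_url are hypothetical. It assumes that a failed SSL certificate verification is one possible cause, which is common with python.org installations on macOS, but the README does not state the cause, so treat that as an assumption.

```python
import ssl
import urllib.error
import urllib.request


def describe_url_error(error: urllib.error.URLError) -> str:
    """Classify a URLError by the underlying reason it wraps."""
    reason = error.reason
    # Check the more specific certificate error before the general SSL error.
    if isinstance(reason, ssl.SSLCertVerificationError):
        return f"SSL certificate verification failed: {reason}"
    if isinstance(reason, ssl.SSLError):
        return f"Other SSL error: {reason}"
    return f"Other URL error: {reason}"


def check_url(url: str, timeout: float = 10.0) -> str:
    """Try to open a URL and report either 'ok' or a description of the failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return "ok"
    except urllib.error.URLError as error:
        return describe_url_error(error)
```

For example, check_url("https://example.com") run in the activated virtual environment shows whether HTTPS requests work at all from that Python installation. If the result mentions certificate verification, the problem lies with the local certificate setup rather than with the build script.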

Project structure

.
├── .devcontainer
│   └── devcontainer.json
├── .github
│   └── workflows
│       ├── ci.yaml
│       └── docs.yaml
├── .gitignore
├── .pre-commit-config.yaml
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── Dockerfile
├── LICENSE
├── README.md
├── config
│   ├── __init__.py
│   ├── config.yaml
│   └── hydra
│       └── job_logging
│           └── custom.yaml
├── data
│   ├── final
│   │   └── .gitkeep
│   ├── processed
│   │   └── .gitkeep
│   └── raw
│       └── .gitkeep
├── docs
│   └── .gitkeep
├── gfx
│   ├── .gitkeep
│   └── alexandra_logo.png
├── makefile
├── models
│   └── .gitkeep
├── notebooks
│   └── .gitkeep
├── poetry.lock
├── poetry.toml
├── pyproject.toml
├── src
│   ├── scripts
│   │   ├── build_tts_dataset.py
│   │   └── fix_dot_env_file.py
│   └── tts_text
│       ├── __init__.py
│       ├── __pycache__
│       ├── bus_stops_and_stations.py
│       ├── dates.py
│       ├── times.py
│       └── utils.py
└── tests
    ├── __init__.py
    ├── __pycache__
    └── test_dummy.py

About

License: MIT License


Languages

Python 90.3%, Makefile 9.1%, Dockerfile 0.6%