khasbilegt / sanitizer

Python library for dealing with duplicated training data.

Sanitizer

Sanitizer is a Python library for dealing with duplicated training data. It uses concurrent.futures, a standard-library module added in Python 3.2, to minimize the time the overall process takes.
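To illustrate the general idea, here is a minimal sketch of how concurrent.futures can speed up duplicate detection by hashing files in parallel. It is not Sanitizer's actual implementation: the function names file_digest and find_duplicates, the choice of SHA-256, and the use of a thread pool are all assumptions made for this example.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents (hypothetical helper)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_duplicates(paths, max_workers=4):
    """Group files by content hash and return the groups with more than one file.

    Hypothetical helper: the hashing is spread over a thread pool, which
    helps because reading files is mostly I/O-bound.
    """
    paths = [Path(p) for p in paths]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so digests line up with paths.
        digests = list(pool.map(file_digest, paths))

    groups = defaultdict(list)
    for path, digest in zip(paths, digests):
        groups[digest].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

Each returned group lists files with identical contents, so a caller could keep the first path in a group and discard the rest.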

Usage

First, edit the labels.json and config.json files to suit your needs.

labels.json - Maps each input folder name (key) to its related id and ASCII symbol (value), in JSON format.

config.json - Maps each symbol value from labels.json (key) to a result folder name (value), in JSON format.
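The sketch below shows how the two files could relate to each other. The exact schema is not documented here, so the shape of each labels.json value (an object with "id" and "symbol" fields), the folder names, and the helper resolve_output_folders are all assumptions for illustration only.

```python
import json

# Hypothetical contents of labels.json: input folder name -> id and ASCII symbol.
LABELS_JSON = '{"cats_raw": {"id": 0, "symbol": "c"}, "dogs_raw": {"id": 1, "symbol": "d"}}'

# Hypothetical contents of config.json: symbol -> result folder name.
CONFIG_JSON = '{"c": "cats", "d": "dogs"}'


def resolve_output_folders(labels, config):
    """Map each input folder to its result folder via the shared symbol."""
    return {folder: config[entry["symbol"]] for folder, entry in labels.items()}


labels = json.loads(LABELS_JSON)
config = json.loads(CONFIG_JSON)
print(resolve_output_folders(labels, config))
```

Under these assumptions, every symbol used in labels.json must also appear as a key in config.json; otherwise the lookup raises a KeyError.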

Then run the tool with Poetry:

poetry run sanitizer

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT
