There is 1 repository under the dataset-filtering topic.
Trainable categorization tool
Multi-Language Dataset Cleaner/Creator for Mozilla's DeepSpeech Framework
Cleaning Discord data for NLP
Command-line filter for GitHub repositories that contain "samples" instead of a real project, framework, or library
:rocket: Whenever you need to look through a huge pile of images and cannot rely on a file explorer, or you work on a remote headless machine, you can use this tool. It can also move files from one folder to another, creating the destination if it does not exist. Work in progress.
Face recognition approach by exploring information jointly in space, scale and orientation.
A simple library that wraps common data processing tasks into an easy-to-use preprocessing engine. The library currently supports transformation of CSV files loaded into pandas DataFrames.
Compare pictures, keep 2
Fast Spark Expression - Write column expressions quickly and easily like a string
Condense datasets with millions of conversations down to only a handful of the most unique ones.
Data Cleaning - A project that takes all colleges in the US and narrows them down to suitable ones by slicing, dicing, and concatenating startup activity data and crime statistics.
A set of tools to generate and label datasets from academic papers