onesuper / HuggingFace-Datasets-Text-Quality-Analysis

Retrieves Parquet files from Hugging Face and uses pandas to identify and quantify junk data, duplication, contamination, and biased content in a dataset.
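As a rough illustration of the kind of check described above, the following minimal sketch measures exact duplication and a simple notion of "junk" rows with pandas. The function name `text_quality_report`, the `text` column name, the `min_chars` threshold, and the junk rule itself are assumptions made for this example and are not taken from the repository. It runs on an in-memory DataFrame; a Parquet file downloaded from Hugging Face could be loaded with `pd.read_parquet` first.

```python
import pandas as pd


def text_quality_report(df: pd.DataFrame, text_col: str = "text", min_chars: int = 5) -> dict:
    """Return simple quality ratios for one text column (illustrative heuristics only)."""
    # Normalize: treat missing values as empty strings and trim whitespace.
    texts = df[text_col].fillna("").astype(str).str.strip()
    total = len(texts)
    if total == 0:
        return {"rows": 0, "duplicate_ratio": 0.0, "junk_ratio": 0.0}

    # Rows whose text exactly repeats an earlier row.
    duplicate_mask = texts.duplicated(keep="first")

    # Hypothetical "junk" rule: very short text, or no ASCII letters or digits.
    has_alnum = texts.str.contains(r"[A-Za-z0-9]", regex=True)
    junk_mask = (texts.str.len() < min_chars) | ~has_alnum

    return {
        "rows": total,
        "duplicate_ratio": float(duplicate_mask.mean()),
        "junk_ratio": float(junk_mask.mean()),
    }


# Tiny in-memory sample standing in for a dataset split.
sample = pd.DataFrame({"text": ["A clean sentence.", "A clean sentence.", "???", None]})
report = text_quality_report(sample)
print(report)
```

Exact-match duplication and a length-based junk rule are deliberately crude. Contamination and bias checks would need reference data or word lists, which this sketch does not attempt.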

Home Page: https://huggingface.co/spaces/Dreamsome/HuggingFace-Datasets-Text-Quality-Analysis
