huggingface / dataset-viewer

Lightweight web API for visualizing and exploring any dataset - computer vision, speech, text, and tabular - stored on the Hugging Face Hub

Home Page: https://huggingface.co/docs/datasets-server

Presidio scan

lhoestq opened this issue

A Presidio scan can help detect PII (personally identifiable information) and warn dataset authors and users.

I'm working on something similar to the opt-out URLs scan, using the default Presidio config.

Note that it produces false positives, so we might have to adapt the messages shown on the Hub (or add some sort of additional filtering).
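
For illustration only, here is a minimal sketch of what such a scan plus an extra filtering step could look like. It is not the dataset-viewer implementation. `AnalyzerEngine` and its `analyze(text=..., language=...)` method come from the `presidio-analyzer` package; the `Finding` class, the `scan_text` and `filter_findings` helpers, and the `0.5` score threshold are hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Finding:
    """Hypothetical container for one detected PII span."""
    entity_type: str
    start: int
    end: int
    score: float


def scan_text(text: str) -> List[Finding]:
    """Run Presidio with its default configuration on a single string."""
    # Imported lazily so filter_findings() can be used without Presidio installed.
    from presidio_analyzer import AnalyzerEngine

    # Default config; requires a spaCy English model to be available.
    # In a real job the engine would be built once and reused across rows.
    analyzer = AnalyzerEngine()
    results = analyzer.analyze(text=text, language="en")
    return [Finding(r.entity_type, r.start, r.end, r.score) for r in results]


def filter_findings(
    findings: Iterable[Finding],
    min_score: float = 0.5,  # assumed threshold, not a Presidio default
    ignored_types: FrozenSet[str] = frozenset(),
) -> List[Finding]:
    """Drop low-confidence or ignored entity types to reduce false positives."""
    return [
        f
        for f in findings
        if f.score >= min_score and f.entity_type not in ignored_types
    ]
```

Something like `filter_findings(scan_text(cell), ignored_types=frozenset({"DATE_TIME"}))` could then be applied to each text cell before deciding whether to show a warning; which entity types and threshold make sense would need to be tuned on real datasets.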