huggingface / datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.



Log progress

jordane95 opened this issue

Can we add a logging function that prints how many files have been processed so far? That would let us estimate the total runtime of a job dynamically.
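A minimal sketch of the requested behaviour, assuming the total number of input files is known up front and that per-file cost is roughly uniform. `FileProgressLogger` and its `log_every` and `clock` parameters are hypothetical and not part of datatrove's API; the sketch uses only the Python standard library and estimates the remaining time by linear extrapolation (average seconds per file so far × files left).

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("progress")


class FileProgressLogger:
    """Hypothetical helper: counts processed files and logs an ETA."""

    def __init__(self, total_files: int, log_every: int = 10,
                 clock: Callable[[], float] = time.monotonic):
        if total_files <= 0:
            raise ValueError("total_files must be positive")
        self.total_files = total_files
        self.log_every = log_every
        self._clock = clock          # injectable so the timing can be tested
        self._start = clock()
        self._last_logged = 0
        self.processed = 0

    def eta_seconds(self) -> Optional[float]:
        """Remaining files times the average seconds per file so far."""
        if self.processed == 0:
            return None              # no data yet to extrapolate from
        elapsed = self._clock() - self._start
        per_file = elapsed / self.processed
        return per_file * max(self.total_files - self.processed, 0)

    def update(self, n: int = 1) -> None:
        """Record n more processed files; log every `log_every` files and at the end."""
        self.processed += n
        done = self.processed >= self.total_files
        if done or self.processed - self._last_logged >= self.log_every:
            self._last_logged = self.processed
            logger.info("Processed %d/%d files (%.1f%%), ETA %.0fs",
                        self.processed, self.total_files,
                        100.0 * self.processed / self.total_files,
                        self.eta_seconds())


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    files = [f"shard_{i:03d}.jsonl.gz" for i in range(50)]  # hypothetical inputs
    progress = FileProgressLogger(total_files=len(files), log_every=10)
    for path in files:
        time.sleep(0.01)  # stand-in for the real per-file work
        progress.update()
```

The ETA is only as good as the uniform-cost assumption: if file sizes vary a lot, counting bytes instead of files would give a steadier estimate.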