Sample data from the ROOTS corpus
[1] Go to the Datasets section of https://huggingface.co/bigscience-data
[2] Open each data split and accept the BigScience Ethical Charter. Otherwise you won't be able to download the data.
[3] Open the notebook, then load and sample the data as needed. Note that the notebook supports sampling from a multinomial distribution.
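
To illustrate what multinomial sampling across data splits can look like, here is a minimal sketch. It is not the notebook's actual code: the function names (`multinomial_counts`, `sample_sources`), the source names, the toy records, and the weights are all hypothetical and stand in for splits you have already downloaded.

```python
import random
from collections import Counter


def multinomial_counts(weights, total, seed=0):
    """Draw how many examples to take from each source.

    This is one draw from Multinomial(total, p), where p is `weights`
    normalized to sum to 1 (random.choices normalizes for us).
    """
    rng = random.Random(seed)
    names = list(weights)
    draws = rng.choices(names, weights=[weights[n] for n in names], k=total)
    counted = Counter(draws)
    # Keep every source in the result, even if it received zero draws.
    return {name: counted.get(name, 0) for name in names}


def sample_sources(sources, weights, total, seed=0):
    """Return `total` (source_name, record) pairs drawn across sources."""
    rng = random.Random(seed)
    counts = multinomial_counts(weights, total, seed=seed)
    sample = []
    for name, k in counts.items():
        records = sources[name]
        # Sample with replacement so a small source cannot be exhausted.
        sample.extend((name, rng.choice(records)) for _ in range(k))
    return sample


# Hypothetical toy data standing in for downloaded splits.
sources = {
    "source_a": ["a0", "a1", "a2"],
    "source_b": ["b0", "b1"],
    "source_c": ["c0"],
}
# Hypothetical mixing weights; they do not need to sum to exactly 1.
weights = {"source_a": 0.6, "source_b": 0.3, "source_c": 0.1}

counts = multinomial_counts(weights, total=1000, seed=42)
sample = sample_sources(sources, weights, total=10, seed=42)
print(counts)
print(sample)
```

The per-source counts always add up to the requested total, and sources with larger weights tend to contribute more examples. Fixing the seed makes a given sample reproducible.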