Daniel van Strien's repositories
awesome-synthetic-datasets
awesome synthetic (text) datasets
huggingface-tldr
Experimental tl;dr summaries for datasets on the Hugging Face Hub!
auto_dataset_card
Wouldn't it be nice to generate parts of our dataset card automagically?
Python-introduction-for-digital-collections
Workshop materials on Python as part of a series of Library Carpentry workshops at the British Library
LLM-pubmed-query-generation-evaluation
LLM PubMed Query Generation Evaluation
argilla
Argilla is a collaboration platform for AI engineers and domain experts who require high-quality outputs, full data ownership, and overall efficiency.
awesome-list
Awesome AI in Libraries
Computer-Vision-for-the-Humanities-workshop
Computer Vision for the Humanities workshop
data-preparation
Code used for sourcing and cleaning the BigScience ROOTS corpus
distilabel
⚗️ AI Feedback framework for scalable LLM alignment
gahd
GAHD: A German Adversarial Hate speech Dataset
huggingface_hub
All the open source things related to the Hugging Face Hub.
iiif2annos
OCR the IIIF images in a manifest and generate annotations
monitor-prompts-hf
A Gradio app to monitor the annotation effort of users annotating the prompt dataset in the Argilla HF Space
Website-Classification
Trying to classify web archives using metadata...