chemnlp_datasets

Extract text and metadata from Biorxiv & Medrxiv papers.

Final datasets: Biorxiv, Medrxiv

Docker image for OCR text extraction