Pedro Ortiz Suarez's repositories
cc-downloader
A polite and user-friendly downloader for Common Crawl data
oscar-utils
A new set of utilities to work with the OSCAR Corpus
portizs-en
Pedro's Personal Website in English
advent-of-code-2023
My bad solutions to Advent of Code 2023
CamemBERT-site
The website of CamemBERT
cc_net
Tools to download and clean up Common Crawl data
CommonCrawler
🕸 A simple way to extract data from Common Crawl
ctclib
A collection of utilities related to CTC
datasets
🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools
dispel
Easily apply transformer models to downstream NLP tasks
isogloss
ISO 639 and IETF Language Code Lookup Tool
latex-mimosis
A minimal & modern LaTeX template for your (bachelor's | master's | doctoral) thesis
LEM17
Data and models for lemmatising and POS-tagging modern French (16-18th c.)
parquet2text
Parquet2text
portizs-de
Pedro's Personal Website in German
portizs-es
Pedro's Personal Website in Spanish
portizs-fr
Pedro's Personal Website in French
presto-parser
A parser for the Presto corpus
rust-html2text
Rust library to render HTML as text.
scdx
A simple tool for querying the Common Crawl CDX
wowchemy-hugo-themes
🔥 Hugo website builder, Hugo themes & Hugo CMS. No code, build with widgets! Create online courses, academic résumés, or startup websites.