Pedro Ortiz Suarez (pjox)



Company:@commoncrawl

Location:Paris

Home Page:https://portizs.eu

Twitter:@pjox13



Organizations
bigscience-workshop
commoncrawl
oscar-project

Pedro Ortiz Suarez's repositories

cc-downloader

A polite and user-friendly downloader for Common Crawl data

Language:Rust | License:Apache-2.0 | Stargazers:5 | Issues:2 | Issues:0

thesis

My Ph.D. Thesis

Language:TeX | License:NOASSERTION | Stargazers:3 | Issues:2 | Issues:0

oscar-utils

A new set of utilities to work with the OSCAR Corpus

Language:Rust | License:Apache-2.0 | Stargazers:2 | Issues:2 | Issues:0

portizs

My personal website

Language:TeX | License:Apache-2.0 | Stargazers:2 | Issues:1 | Issues:3

portizs-en

Pedro's Personal Website in English

Language:HTML | Stargazers:1 | Issues:1 | Issues:0

advent-of-code-2023

My bad solutions to Advent of Code-2023

Language:Rust | Stargazers:0 | Issues:1 | Issues:0

alephn

The Alephn Site

Language:CSS | Stargazers:0 | Issues:2 | Issues:0

CamemBERT-site

The website of CamemBERT

Language:TeX | Stargazers:0 | Issues:4 | Issues:0

cc_net

Tools to download and cleanup Common Crawl data

Language:Python | License:NOASSERTION | Stargazers:0 | Issues:0 | Issues:0

CommonCrawler

🕸 A simple way to extract data from Common Crawl

Language:Go | License:MIT | Stargazers:0 | Issues:0 | Issues:0

ctclib

A collection of utilities related to CTC

Language:Rust | License:MIT | Stargazers:0 | Issues:0 | Issues:0

datasets

🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Language:Python | License:Apache-2.0 | Stargazers:0 | Issues:0 | Issues:0

dispel

Easily apply transformer models to downstream NLP tasks

Language:Python | Stargazers:0 | Issues:0 | Issues:0

hplt2wet

HPLT to WET conversion

Language:Rust | License:Apache-2.0 | Stargazers:0 | Issues:1 | Issues:0

isogloss

ISO 639 and IETF Language Code Lookup Tool

Language:Python | License:MIT | Stargazers:0 | Issues:0 | Issues:0

latex-mimosis

A minimal & modern LaTeX template for your (bachelor's | master's | doctoral) thesis

Language:TeX | License:MIT | Stargazers:0 | Issues:0 | Issues:0

LEM17

Data and models for lemmatising and POS-tagging modern French (16-18th c.)

Language:Shell | Stargazers:0 | Issues:0 | Issues:0
Language:Python | License:Apache-2.0 | Stargazers:0 | Issues:0 | Issues:0
Language:Python | License:NOASSERTION | Stargazers:0 | Issues:0 | Issues:0

parquet2text

Parquet2text

Language:Rust | Stargazers:0 | Issues:1 | Issues:0
Stargazers:0 | Issues:1 | Issues:0

portizs-de

Pedro's Personal Website in German

Language:HTML | Stargazers:0 | Issues:0 | Issues:0

portizs-es

Pedro's Personal Website in Spanish

Language:HTML | Stargazers:0 | Issues:1 | Issues:0

portizs-fr

Pedro's Personal Website in French

Language:HTML | Stargazers:0 | Issues:1 | Issues:0

presto-parser

A parser for the Presto corpus

Language:Rust | License:Apache-2.0 | Stargazers:0 | Issues:1 | Issues:0

rust-html2text

Rust library to render HTML as text.

Language:Rust | License:MIT | Stargazers:0 | Issues:0 | Issues:0

scdx

A simple tool for querying the Common Crawl CDX

License:MIT | Stargazers:0 | Issues:0 | Issues:0

wowchemy-hugo-themes

🔥 Hugo website builder, Hugo themes & Hugo CMS. No code, build with widgets! Create online courses, academic resumes, or startup websites.

Language:SCSS | License:MIT | Stargazers:0 | Issues:0 | Issues:0