Plan de Tecnologías del Lenguaje - Gobierno de España's repositories
lm-spanish
Official source for Spanish language models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).
lm-legal-es
Spanish language models for the legal domain made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).
lm-biomedical-clinical-es
Official source for Spanish pretrained biomedical and clinical language models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).
Biomedical-Word-Embeddings-for-Spanish
Biomedical word embeddings generated from Spanish biomedical corpora.
SPACCC_MEDDOCAN
MEDDOCAN: Corpus, guidelines, IAA and scripts.
corpus-cleaner
Generic toolkit for corpus cleaning
AbreMES-DB
[PlanTL/medicine/lexical/terminological resource] A Spanish Medical Abbreviation DataBase.
Medical-Translator
[PlanTL/medicine/neural machine translation/translation models] Files needed to use the Neural Machine Translation system for the Biomedical Domain.
covid-predictive-model
An RNN model for COVID-19 mortality prediction.
EHR-normalizer
[PlanTL/medicine/document/NLP preprocessing] Software to convert PDF files into HTML, TXT or XML files and to normalize EHRs.
BVS-Corpus
Biblioteca Virtual en Salud - Parallel Corpus
controversy-detection-model
Code for the paper "Anticipating the Debate: Predicting Controversy in News with Transformer-based NLP".
MEDDOCAN-Format-Converter-Script
Script to convert files between MEDDOCAN-Brat, MEDDOCAN-XML, and i2b2 formats.
shared-task-resource-example
Example README file for Shared Task submissions
SPACCC_POS-TAGGER
[PlanTL/medicine/document annotation/NLP preprocessing/part-of-speech] Part-of-speech tagger for Spanish medical-domain corpora, based on FreeLing.
atc7-es-en
ATC7 (Sistema de Clasificación Anatómica 7) Spanish-English translations.
MEDDOCAN-Evaluation-Script
Official evaluation script of the Medical Document Anonymization (MEDDOCAN) task.
SPACCC_Sentence-Splitter
[PlanTL/medicine/document annotation/NLP preprocessing/sentence splitter] Sentence splitting model created using the Apache OpenNLP machine learning toolkit.
SPACCC_Tokenizer
[PlanTL/medicine/document annotation/NLP preprocessing/tokenization] Tokenization model created using the Apache OpenNLP machine learning toolkit.
spanish-benchmark
Spanish Benchmark website