Ignacio Arroyo's repositories
dummy_fraud_detection
Fraud detection in credit card payments and auto insurance claims using PySpark
sentence_embedding
A sentence embedding method based on weighted series
nlp-pipeline
A series of NLP scripts: PMI, TF-IDF and neural co-occurrence vectorization; distributed querying and population of a vector (TF-IDF and PMI) database with Hadoop; deep learning and kernel learning with sklearn.
describe_corpus
A dataset in which each file is associated with a term and contains definitions of that term. All text snippets are embedded as doc2vec vector representations.
expconditions
A learning machine trained to extract experimental conditions from biomedical scientific literature
open-ncd-kbc
This repo contains software and results derived from the PRODEP project entitled "Reinforcement learning in the automatic acquisition of knowledge in noncommunicable diseases"
seismic_embeddings
This project aims to represent seismic data samples in an embedding space in order to observe similarities among them. The data samples were provided by the Mexican National Seismological Service (Servicio Sismológico Nacional) and include intensity measurements from 1900 to 2018.
2nd_half_wiki_generator
This is a document-to-document prediction model. Given the first half of a Wikipedia article, it first predicts the probable topics of the second half and then tries to generate that second half.
address_duplix
Address Duplication problem with supervised learning
contexto_nlp
Automated Q&A for assessing lexicon acquisition
csk4open-agro-reasoning
Common Sense Knowledge for Open Vocabulary Reasoning in Agroecology
cultural_nlp
Natural Language Applications to Cultural Heritage
discrimative_attributes
Implementation of the unsupervised model for semantic discriminative attributes using neural word embeddings. Participating system in SemEval 2018 Task 10: Capturing Discriminative Attributes
elastic_pytorch_loader
Python class that loads a page of es_page_size documents from ElasticSearch. The page is consumed in batches of batch_size documents by a PyTorch data loader. A new page is loaded before the torch model consumes the last batch of the current page at training time.
iarroyof.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
KonwChainApi
An API for automatic extraction of biomedical knowledge, based on Artificial Intelligence
mlprl_orderbook
Baseline estimator for profit maximization in orderbook RL environments
ov-llm-reasoning
Open Vocabulary LLM Reasoning
rl4kbc-csr
RL4KBC&CSR is a self-attention-based Neural Language Model trained with different Knowledge Bases. Its main application is supporting biomedical research on NonCommunicable Diseases. The goal of the trained NLM is to reconstruct or generate missing parts of semantic structures.
semanticrl
Semantic Reinforcement Learning. This preprint provides first insights: Arroyo-Fernández, I., Carrasco-Ruíz, M., & Arias-Aguilar, J. A. (2019). On the Possibility of Rewarding Structure Learning Agents: Mutual Information on Linguistic Random Sets. arXiv preprint arXiv:1910.04023.
topic_prediction
Latest version of topic predictor using multiple SVMs as a generative model
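The paging behaviour described for elastic_pytorch_loader above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the repository's actual class: fetch_page is a hypothetical stand-in for the ElasticSearch query, paged_batches is a hypothetical name, and the PyTorch data loader is replaced by a simple generator. Only es_page_size and batch_size come from the description. The sketch also assumes that batches never straddle two pages.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List


def paged_batches(
    fetch_page: Callable[[int, int], List[dict]],
    es_page_size: int,
    batch_size: int,
) -> Iterator[List[dict]]:
    """Yield batches of up to batch_size documents, prefetching pages.

    fetch_page(offset, size) is a hypothetical stand-in for the
    ElasticSearch query: it returns at most `size` documents starting
    at `offset`, and an empty list once the index is exhausted.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        offset = 0
        pending = pool.submit(fetch_page, offset, es_page_size)
        while True:
            page = pending.result()
            if not page:
                return
            offset += len(page)
            # Request the next page now, so it loads in the background
            # while the current page is consumed batch by batch.
            pending = pool.submit(fetch_page, offset, es_page_size)
            for start in range(0, len(page), batch_size):
                yield page[start:start + batch_size]


# Usage with an in-memory list standing in for the index (assumption).
docs = [{"id": i} for i in range(10)]


def fake_fetch(offset: int, size: int) -> List[dict]:
    return docs[offset:offset + size]


batches = list(paged_batches(fake_fetch, es_page_size=4, batch_size=3))
```

Submitting the next request before iterating over the current page is what lets the load overlap with training; a real implementation would likely use a scroll or search_after cursor instead of a numeric offset.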