ArneD's repositories
BERT_doc_classification
Document classification with BERT
bert_document_classification
Architectures and pre-trained models for long document classification.
BERT_NER
NER with BERT
cache-conda-envs
Speed up your builds by caching Anaconda environments on GitHub Actions
CVDD-PyTorch
A PyTorch implementation of Context Vector Data Description (CVDD), a method for Anomaly Detection on text.
Demo
Demo repo for tutorial articles on Opensource.com
diffgram
Training Data (Data Labeling, Annotation, Catalog, Workflow) for all Data Types (Image, Video, 3D, Text, Geo, Audio, more) at scale.
dkpro-cassis
UIMA CAS processing library written in Python
DPR
Dense Passage Retriever (DPR) is a set of tools and models for open-domain Q&A tasks.
fake_news_semantics
Code for the paper "Do Sentence Interactions Matter? Leveraging Sentence Level Representations for Fake News Classification"
FakeNewsCorpusSpanish
The Spanish Fake News Corpus contains a collection of 971 news items, divided into 491 real and 480 fake. The corpus covers news from 9 different topics: Science, Sport, Economy, Education, Entertainment, Politics, Health, Security, and Society.
files2rouge
Calculating ROUGE score between two files (line-by-line)
ganbert-pytorch
Enhancing BERT training with semi-supervised generative adversarial networks in PyTorch/HuggingFace
Legal-Docs-Large-MLTC
Multi-label text classification for legal documents. Works on monolingual and multilingual parallel data.
lmtc-eurlex57k
Large-Scale Multi-Label Text Classification on EU Legislation
mlm-scoring
Python library & examples for Masked Language Model Scoring (ACL 2020)
multi-eurlex
MultiEURLEX - A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer
multilingual-fake-news
The code related to the paper
Multimodal-Toolkit
Multimodal model for text and tabular data, with HuggingFace transformers as the building block for text data
neural-document-aligner
Document aligner which uses neural technologies to search matches across bilingual documents
question_generator
An NLP system for generating reading comprehension questions
spatialdata
An open and universal framework for processing spatial omics data
TopicalChange
Code accompanying the submission "Structural Text Segmentation of Legal Documents" by Aumiller et al.
trafilatura
Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)
Voice-Privacy-Challenge-2020
Baseline Recipe for VoicePrivacy Challenge 2020: https://www.voiceprivacychallenge.org/docs/VoicePrivacy_2020_Eval_Plan_v1_3.pdf
word2word
Easy-to-use word-to-word translations for 3,564 language pairs.
wordfreq
Access a database of word frequencies, in various natural languages.