Jelke Bloem's repositories
2023-coding-the-humanities
Material for UvA course Coding the Humanities 2023
AUC_TM_2023
Materials for "Text Mining", a course by the Amsterdam University College.
5verbclusters
This is the repository accompanying the paper "Een corpus waar alle constructies in gevonden zouden moeten kunnen worden?".
AUC_TM_2024
Materials for "Text Mining", a course by the Amsterdam University College.
Wiki-rnd
Wiki-rnd dataset for evaluating distributional semantic models on small data (without a gold standard), used in the paper "Evaluating the consistency of word embeddings from small data". The dataset contains terms automatically extracted from the index of Quine's Word & Object, and non-overlapping random samples of sentences containing those terms, drawn from a 140M-word preprocessed Wikipedia snapshot. It is split by term into a training set and a test set. The sentence files contain one sentence per line, and each line has the format: target term, \t, sentence, \n. Within the sentence, the target term is marked with __xxNN, where NN is the number of the sample. For each target term there are five samples, containing between N/5 and 10 non-overlapping random sentences, where N is the total number of sentences containing the target term.
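The line format described above can be read with a few lines of Python. This is only a minimal sketch, not code from the repository: it assumes the marker is the literal string __xx followed directly by the digits of the sample number, and the function names and example lines are made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: the marker is "__xx" followed by one or more digits (the sample number).
MARKER = re.compile(r"__xx(\d+)")


def parse_line(line):
    """Split one 'term<TAB>sentence' line into (term, sample_number, sentence)."""
    term, sentence = line.rstrip("\n").split("\t", 1)
    match = MARKER.search(sentence)
    # sample_number is None when no marker is found in the sentence.
    sample_number = int(match.group(1)) if match else None
    return term, sample_number, sentence


def group_by_sample(lines):
    """Collect sentences into a dict keyed by (term, sample_number)."""
    samples = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        term, sample_number, sentence = parse_line(line)
        samples[(term, sample_number)].append(sentence)
    return dict(samples)


# Made-up lines in the described format, not taken from the dataset.
example_lines = [
    "stimulus\tthe stimulus__xx1 was presented twice\n",
    "stimulus\teach stimulus__xx1 lasted one second\n",
    "stimulus\ta visual stimulus__xx2 was used\n",
]

grouped = group_by_sample(example_lines)
for (term, sample_number), sentences in sorted(grouped.items()):
    print(term, sample_number, len(sentences))
```

To read an actual sentences file, pass an open file object to group_by_sample instead of the example list; the file name depends on the repository layout and is not assumed here.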
2022-coding-the-humanities
Material for UvA course Coding the Humanities 2022
2024-coding-the-humanities
Material for UvA course Coding the Humanities 2024
AUC_TMCI_2022
Materials for "Text Mining", a course by the Amsterdam University College.
clam
Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command line application, its input, output and parameters, and CLAM wraps around your application to form a fully fledged RESTful webservice.
foliatools
A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.
nonce2vec
This is the repo accompanying the paper "High-risk learning: acquiring new word vectors from tiny data" (Herbelot & Baroni, 2017)
truepseudodenominals
This is the repo accompanying the paper "The Distinction Between True and Pseudo Denominals? It's an Illusion!"