Unify the different date formats from Google Docs.
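A minimal sketch of how date unification might look, assuming the dates arrive as strings in a handful of known formats. The formats listed in `CANDIDATE_FORMATS` and the function name `unify_date` are hypothetical; the formats actually present in the data are not stated here.

```python
from datetime import datetime

# Hypothetical list of input formats; replace with the ones found in the data.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")


def unify_date(raw):
    """Return the date as an ISO 8601 string, or None if no format matches."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            # This format did not match; try the next one.
            continue
    return None
```

Returning None for unparsable values, rather than raising, makes it easy to list the rows that still need manual correction.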
Convert a Google Docs CSV file to JSON for passim.
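The conversion could be sketched roughly as follows. To the best of our understanding, passim reads one JSON record per line with `id`, `series` and `text` fields; the CSV column names used as defaults below (`id`, `title`, `text`) and the function name are assumptions for illustration, not the names used in the actual script.

```python
import csv
import io
import json


def csv_to_passim_lines(csv_text, id_col="id", series_col="title", text_col="text"):
    """Yield one JSON string per CSV row with id, series and text fields.

    The column names are hypothetical defaults; adjust them to the real export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        record = {
            "id": row[id_col],
            "series": row[series_col],
            "text": row[text_col],
        }
        # ensure_ascii=False keeps non-ASCII characters readable in the output.
        yield json.dumps(record, ensure_ascii=False)
```

Writing each yielded string on its own line produces a JSON Lines file.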
Convert a CSV dataset to a plain text corpus for MALLET. Split the corpus into subcorpora by language.
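As an illustration of the splitting step, the sketch below groups texts by language and writes one plain text file per document into a subfolder named after the language. The column names `language` and `text`, the function names, and the one-file-per-document layout are assumptions, not a description of the actual script.

```python
import csv
import io
from collections import defaultdict
from pathlib import Path


def split_by_language(csv_text, lang_col="language", text_col="text"):
    """Group the texts of a CSV dataset by the value of the language column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    subcorpora = defaultdict(list)
    for row in reader:
        subcorpora[row[lang_col]].append(row[text_col])
    return dict(subcorpora)


def write_subcorpora(subcorpora, out_dir):
    """Write each text to out_dir/<language>/<n>.txt (hypothetical layout)."""
    for language, texts in subcorpora.items():
        folder = Path(out_dir) / language
        folder.mkdir(parents=True, exist_ok=True)
        for i, text in enumerate(texts):
            (folder / f"{i}.txt").write_text(text, encoding="utf-8")
```

With this layout, a language name such as Finnish doubles as the name of the subfolder that holds that subcorpus.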
R scripts that run the ldatuning library on our dataset to estimate the number of topics (k) for MALLET.
Run MALLET on a subcorpus. Set the path before running. Select a subcorpus by passing its language as an argument (the name of a subfolder in the corpus directory).
Usage: run_mallet.sh Finnish
Read the topic participation matrix produced by MALLET back into a pandas DataFrame. See the 'titles' data table for a combined table of the topic participation of different newspaper titles.
Usage: e.g., make_matrix.py Finnish
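A rough sketch of reading the matrix, assuming MALLET wrote it as a tab-separated file with no header line, where each row holds a document index, a document name, and then one proportion per topic (the layout of recent MALLET versions; older versions use a different layout). The function name `read_doc_topics` and the `topic_<k>` column names are hypothetical.

```python
import io

import pandas as pd


def read_doc_topics(source):
    """Read a MALLET doc-topics file into a DataFrame indexed by document name.

    Assumes tab-separated rows: doc index, doc name, one proportion per topic.
    """
    df = pd.read_csv(source, sep="\t", header=None)
    n_topics = df.shape[1] - 2
    df.columns = ["doc_index", "doc_name"] + [f"topic_{k}" for k in range(n_topics)]
    return df.set_index("doc_name").drop(columns="doc_index")
```

`source` can be a file path or any file-like object, which makes the function easy to try on a small in-memory sample before pointing it at real MALLET output.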