Nicola Tonellotto's repositories
bigdata
Code samples, summaries, cheatsheets and other study material for Hadoop MapReduce and Apache Spark
datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++
docker-cheat-sheet
Docker Cheat Sheet
docker-hadoop-build
Docker container to build Apache Hadoop, based on https://github.com/sequenceiq/docker-hadoop-build
freebase-triples
A methodology to process triples data from the Freebase data dumps.
generalized-kmeans-clustering
This project generalizes the Spark MLLIB Batch and Streaming K-Means clusterers in every practical way.
graph-bisection
Dhulipala, Laxman, et al. "Compressing Graphs and Indexes with Recursive Graph Bisection." arXiv preprint arXiv:1602.08820 (2016).
hadoop-docker
Hadoop docker image
homebrewery
Create authentic looking D&D homebrews using only markdown
java8-the-missing-tutorial
Java 8 for all of us
JustEnoughScalaForSpark
A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.
LightGBM
A fast, distributed, high-performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks. It is under the umbrella of the DMTK (http://github.com/microsoft/dmtk) project of Microsoft.
LuceneTutorial
A simple tutorial of Lucene for CS 5604 students at Virginia Tech (Fall 2018).
modern-cpp-features
A cheatsheet of modern C++ language and library features.
python-lecture
Lecture slides for Python
RankingComplexLayouts
Repository for SIGIR'18 paper: "Ranking for Relevance and Display Preferences in Complex Presentation Layouts"
RL
A set of reinforcement learning (RL) experiments, currently including: (1) the MDP rank experiment, based on a policy gradient algorithm
SparkMaxFlow
Spark implementation of the Ford-Fulkerson algorithm
tagme
Entity linking system by the A3 Lab
TextSegmenter
A text segmenter in Java based on unigram/bigram statistics, inspired by Peter Norvig's segmenter
ThinkPythonItalian
LaTeX source for the Italian translation of Think Python.