Antonio Mallia's repositories
Catch2
A modern, C++-native, header-only, test framework for unit-tests, TDD and BDD - using C++11, C++14, C++17 and later (or C++03 on the Catch1.x branch)
contextual-search-features-for-large-datasets-in-spark
This repository contains a simple implementation for calculating contextual search features in Spark, applied to the 1 billion ClueWeb12 webpages
early_irene_experiments
super deprecated, see: https://github.com/jjfiv/irene
euclidesdb
A multi-model machine learning feature embedding database
goturbopfor
Teaching implementation of the TurboPFor integer compression algorithm
indexing-cw-with-terrier
Files for indexing the ClueWeb09 and ClueWeb12 collections with Terrier 4.2
LocustDB
Massively parallel, high performance analytics database that will rapidly devour all of your data.
m2cgen
Transform ML models into native code (Java, C, Python, etc.) with zero dependencies
mio
Cross-platform header-only C++11 library for memory mapped file IO
notebook
Jupyter Interactive Notebook
PlotNeuralNet
LaTeX code for making neural network diagrams
pmu-tools
Intel PMU profiling tools
posterdown
Use RMarkdown to generate PDF Conference Posters via HTML or LaTeX
PRPN
Parsing Reading Predict Network
pydoop
A Python MapReduce and HDFS API for Hadoop
rax
A radix tree implementation in ANSI C
robinhood-to-csv
Python script to export Robinhood trades to a CSV file
selective-search
Selective search partitions a large-scale dataset into subsets (shards) such that only a few shards need to be searched for a query, thus improving search efficiency and effectiveness
sustain
🎹 Personal blog powered by Jekyll
terrier
You know what this is...
TikaLuceneWarc
A tool to create D2SI format collections from the CC-NEWS crawl using Apache Tika and Lucene
trec-data
Scripts to download and standardize TREC query and document sets
zfp
Library for compressed numerical arrays that support high-throughput read and write random access