deep-programmer / interesting-papers

Interesting papers I'd like to implement (or at least have implementations of)

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

This is a list of papers I would like to implement, or would like to have an
implementation of.  This list is likely to change as my interests change,
including deletions.  Do not expect this list to remain static.

These are in no particular order:

SONIK:  Efficient In-situ All Item Rank Generation using Bit Operations
    - https://arxiv.org/abs/1605.06992

CAMP: A Cost Adaptive Multi-Queue Eviction Policy for Key-Value Stores
    - http://dblab.usc.edu/users/papers/CAMPTR.pdf

SimString: A fast and simple algorithm for approximate string matching/retrieval
http://www.chokkan.org/software/simstring/

Simpira: cryptographic permutations designed to be fast on modern 64-bit
processors, yet provide a comfortable security margin against all
currently-known attacks.
    - http://mouha.be/simpira/

Autoscaling Bloom Filter: Controlling Trade-off Between True and False Positives
    - https://arxiv.org/abs/1705.03934

Adaptive Cuckoo-Filters
    - https://arxiv.org/abs/1704.06818

Continuous Top-k Queries over Real-Time Web Streams
    - https://arxiv.org/abs/1610.06500

A practical index for approximate dictionary matching with few mismatches
    - https://arxiv.org/abs/1501.04948

Robust benchmarking in noisy environments
    - https://arxiv.org/abs/1608.04295

Fast intersection of sorted lists with SSE:
    - https://highlyscalable.wordpress.com/2012/06/05/fast-intersection-sorted-lists-sse/
    - Also, https://arxiv.org/abs/1401.6399

PAD: Performance Anomaly Detection in Multi-Server Distributed Systems
    https://www.microsoft.com/en-us/research/wp-content/uploads/2014/06/PAD-Performance-Anomaly-Detection-in-Multi-Server-Distributed-Systems.pdf

Detecting Abnormal Machine Characteristics in Cloud Infrastructures
    - https://ti.arc.nasa.gov/publications/4268/download/

PerfAugur: Robust Diagnostics for Performance Anomalies in Cloud Services
    - https://www.microsoft.com/en-us/research/publication/perfaugur-robust-diagnostics-for-performance-anomalies-in-cloud-services/

Statistical Techniques for Online Anomaly Detection in Data Centers
    - http://www.hpl.hp.com/techreports/2011/HPL-2011-8.pdf

Fast table-driven base64 encoding/decoding:
    - https://github.com/powturbo/TurboBase64/blob/master/turbob64d.c

Assembly versions of hash functions / cryptographic algorithms:
    - t1ha (Go version: https://github.com/dgryski/go-t1ha )
    - rc5 / rc6 (Go version: https://github.com/dgryski/go-rc5 / https://github.com/dgryski/go-rc6 )

In-memory data layout for Netflix's Hollow:
    - http://hollow.how/advanced-topics/#in-memory-data-layout

Omnisearch Index Formats
    - https://blog.twitter.com/2016/omnisearch-index-formats

NORX8 and NORX16: Authenticated Encryption for Low-End Systems
    - https://eprint.iacr.org/2015/1154

LightMAC: A MAC Mode for Lightweight Block Ciphers:
    - https://eprint.iacr.org/2016/190.pdf

Fast Deterministic Selection (adaptive QuickSelect)
    - https://arxiv.org/abs/1606.00484

A Bloom filter based semi-index on q-grams
    - https://arxiv.org/abs/1507.02989

Faster Population Counts using AVX2 Instructions
    - https://arxiv.org/abs/1611.07612

Quasi-Succinct Indices (compressed inverted indexes):
    - http://vigna.di.unimi.it/ftp/papers/QuasiSuccinctIndices.pdf

Efficient Summing over Sliding Windows (stream statistics)
    - http://arxiv.org/pdf/1604.02450v1.pdf

A Novel Technique for Long-Term Anomaly Detection in the Cloud
    - https://www.usenix.org/system/files/conference/hotcloud14/hotcloud14-vallis.pdf
    - Twitter's anomaly detection algorithm
    - related, http://www.ebaytechblog.com/2015/08/19/statistical-anomaly-detection/
    - related, http://nerds.airbnb.com/anomaly-detection/

TinySet - An Access Efficient Self Adjusting Bloom Filter Construction
    - http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2015/CS/CS-2015-03.pdf

Detecting Change in Data Streams:
    - https://cs.uwaterloo.ca/~shai/vldb04.pdf

Hierarchical Delta Debugging:
    - https://blog.acolyer.org/2015/11/17/hierarchical-delta-debugging/
    - (to go with https://github.com/dgryski/go-ddmin )

FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space
    - http://cs.fit.edu/~pkc/papers/tdm04.pdf
    - many implementations to use as base, for example https://github.com/slaypni/fastdtw/blob/master/fastdtw.py

Mining frequent items in the time fading model
    - http://arxiv.org/pdf/1601.03892v1.pdf

Hierarchical Agglomerative Clustering:
    - http://nlp.stanford.edu/IR-book/html/htmledition/hierarchical-agglomerative-clustering-1.html
    - needed for https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/rebucket-icse2012.pdf
    - preliminary implementation of rebucket:  https://github.com/dgryski/go-rebucket

Balanced Allocation: Patience is not a Virtue (FirstDiff load balancing):
    - http://arxiv.org/abs/1602.08298

Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream
    - http://www.cs.ubc.ca/~xujian/paper/quant.pdf

The Eternal Sunshine of the Sketch Data Structure
    - http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.146.2889&rep=rep1&type=pdf

Copysets and Chainsets: A Better Way to Replicate
    http://hackingdistributed.com/2014/02/14/chainsets/

A Fast Algorithm for Approximate Quantiles in High Speed Data Streams
    - http://web.cs.ucla.edu/~weiwang/paper/SSDBM07_2.pdf
    - this algorithm has haunted me for ages, I could never get my code working
    - unresponsive authors, details missing from papers, etc
    - there now appear to be more implementations that could be used as a base

About

Interesting papers I'd like to implement (or at least have implementations of)