mohaps / select-k

A C++20 single header-only library for doing Top K / Bottom K kinda selection from a stream of candidates (with custom scoring functions and score types)

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Select-K

Single file header-only C++20 library to implement k-nearest / k-best type logic

Sample Usage:

Just include select_k/select_k.h and do

    auto scoringFunction = [](const Candidate& c) { 
        /* return an int score */ 
    };
    k::Top<Candidate, int> selector(k, scoringFunction);
    
    for (auto c : candidates) {
        selector.offer(c);
    }
    
    // now selector has min(k, canddiates.size()) best candidates 
    // (top scores from scoring function)

Doing the same with k::Bottom will get you the bottom K from the candidates

One Shot Helpers

Helper functions kTop::compute() and kBottom::compute() are provided for one shot calls where the entire list of candidates is available e.g. in a std::vector

This is the sample code for computin k-nearest points to origin.

std::vector<Point> results;
k::Bottom<Point, int>::compute(
    std::back_inserter(results), 
    4, 
    inputs.begin(), inputs.end(),
    sqEuclidian
);

Scoring Function and ScoreType

The scoring function returns a ScoreType for a Candidate

  • k::Top preserves the K top score candidates
  • k::Bottom preserves the K lowest score candidates

Custom scoring functions can be passed in the constructor

auto sqEuclidian = [](const Point& p){
    // custom scoring function (square of euclidean distance from origin)
    return p.first * p.first + p.second * p.second;
};
k::Bottom<Point, int> nearest(4, sqEuclidian);

See src/select_k_sample.cpp for usage details

Build and Run

Sample Makefile and code (src/select_k_usage) is included.

$ make run
rm -rf bin/select_k_sample
g++ -std=c++20 -Iinclude -O2 -o bin/select_k_sample src/select_k_sample.cpp
bin/select_k_sample
**** TESTING INTS ... 
Inputs : [1, 4, 2, 30, 5, 6, 11, 10, 9, 100]
Top =>
  => 100
  => 30
  => 11
Bottom =>
  => 1
  => 2
  => 4
**** TESTING POINTS ...
finding k-nearest points to origin k=4 ...
Trying iterative/streaming compute ...
 => 1,1
 => 2,1
 => 1,2
 => 2,2
Trying one-shot compute ...
 => 1,1
 => 2,1
 => 1,2
 => 2,2

Complexity

For N candidates and selection of K samples:

  • Runtime Complexity: O(N * Log(K))
  • Space Complexity: O(K)

Credits

About

A C++20 single header-only library for doing Top K / Bottom K kinda selection from a stream of candidates (with custom scoring functions and score types)

License:Apache License 2.0


Languages

Language:C++ 98.1%Language:Makefile 1.9%