NVIDIA / cuCollections

[ENHANCEMENT]: Perf guide

sleeepyjack opened this issue

Is your feature request related to a problem? Please describe.

cuCollections exposes a set of knobs that allow optimizing a hashing data structure for a specific use case.

For example:

  • which probing scheme should I use?
  • what's the best CG (cooperative group) size?
  • how does the input data type affect performance?
  • can I use particular operations concurrently? How does that impact performance?

The interaction between those choices is also non-trivial.
Finding out which combination works best for an application is a time-consuming task.

Describe the solution you'd like

Write a perf guide. Could be as simple as a Markdown file.

Describe alternatives you've considered

No response

Additional context

No response

We do provide performance guidance in the probing sequence doc, e.g.:

  • Linear probing is efficient when few collisions are present. Performance hints:
      - Use linear probing when collisions are rare, e.g. low occupancy or low multiplicity.
      - `CGSize` = 1 or 2 when the hash map is small (10'000'000 or less), 4 or 8 otherwise.
  • Default probe sequence for `cuco::static_multimap`. Double hashing shows superior performance when dealing with high multiplicity and/or high occupancy use cases. Performance hints:
      - `CGSize` = 1 or 2 when the hash map is small (10'000'000 or less), 4 or 8 otherwise.
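
As a rough illustration, the hints quoted above could be written down as a small decision helper. This is only a sketch: `ProbingScheme`, `TuningHint`, `suggest_tuning` and `kSmallMapThreshold` are hypothetical names and not part of the cuCollections API. It also assumes that "small" refers to the map's capacity in slots, and it leaves the judgment of whether collisions are rare to the caller, since the hints give no numeric cutoff for occupancy or multiplicity.

```cpp
#include <cstddef>

// All names below are hypothetical helpers for illustration;
// none of them are part of the cuCollections API.
enum class ProbingScheme { Linear, DoubleHashing };

struct TuningHint {
  ProbingScheme scheme;  // suggested probing scheme
  int cg_size_min;       // smaller of the two suggested CGSize values
  int cg_size_max;       // larger of the two suggested CGSize values
};

// Threshold from the quoted hint: "small" means 10'000'000 or less.
// Assumption: the number is compared against the map's capacity in slots.
constexpr std::size_t kSmallMapThreshold = 10'000'000;

// collisions_rare is the caller's own judgment (low occupancy or low
// multiplicity); the quoted hints give no numeric cutoff, so none is used here.
constexpr TuningHint suggest_tuning(std::size_t capacity, bool collisions_rare) {
  const ProbingScheme scheme =
      collisions_rare ? ProbingScheme::Linear : ProbingScheme::DoubleHashing;
  return capacity <= kSmallMapThreshold ? TuningHint{scheme, 1, 2}
                                        : TuningHint{scheme, 4, 8};
}
```

For example, `suggest_tuning(50'000'000, false)` suggests double hashing with a `CGSize` of 4 or 8. The result is a starting point for benchmarking, not a replacement for it.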

Having a performance tuning section in the README doesn't seem right.

Right. This would be too much information for a README. I would put it in a separate file and link to it from the README.