Improve GPU offloading mechanisms

Question

achimnol opened this issue 9 years ago · comments

The current GPU offloading implementation has several limitations:

- For subsequent offloadable elements, we should skip the "aggregation" phase for the previous "batch of batches" that was already offloaded.
- The aggregation phase and/or the load balancer should perform "adaptive batching" -- when there are only a few packets/batches, we should stay on the CPUs instead of using the GPUs.
  - Currently there is no way to compare the queue lengths of CPUs and GPUs, because CPUs do not have processing input queues at all! However, we could decide whether to use CPUs or GPUs by inspecting the packet aggregation array in the RX phase, as SSLShader and Kargus do.
  - We need to combine "opportunistic/dynamic" offloading with our adaptive load balancing algorithm.
- The current aggregation phase only counts the number of batches; we need to be smarter -- for example, use the total payload size for variable-length datablocks and the number of packets for fixed-length datablocks. (Of course, such differentiation must be lightweight.)
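
To make the last two points concrete, here is a minimal sketch in Python (not the project's actual code) of a load metric that differentiates datablock types, plus a threshold-based CPU/GPU decision. All names here (`Batch`, `batch_load`, `aggregated_load`, `choose_device`, `gpu_threshold`) are hypothetical, and the fixed threshold is only a stand-in: in practice it would presumably be tuned per element or driven by the adaptive load balancer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Batch:
    """Hypothetical stand-in for a packet batch: one payload length per packet."""
    packet_lengths: List[int]


def batch_load(batch: Batch, variable_length: bool) -> int:
    """Estimate the work in one batch.

    Variable-length datablocks are weighted by total payload bytes;
    fixed-length datablocks are weighted by packet count.
    """
    if variable_length:
        return sum(batch.packet_lengths)
    return len(batch.packet_lengths)


def aggregated_load(batches: List[Batch], variable_length: bool) -> int:
    """Sum the load estimate over an aggregated "batch of batches"."""
    return sum(batch_load(b, variable_length) for b in batches)


def choose_device(batches: List[Batch], variable_length: bool,
                  gpu_threshold: int) -> str:
    """Stay on the CPU when little work is pending; offload otherwise.

    gpu_threshold is an assumed tuning knob, expressed in bytes for
    variable-length datablocks and in packets for fixed-length ones.
    """
    if aggregated_load(batches, variable_length) >= gpu_threshold:
        return "gpu"
    return "cpu"


if __name__ == "__main__":
    pending = [Batch([64, 64, 1500]), Batch([128])]
    print(choose_device(pending, variable_length=True, gpu_threshold=1024))
    print(choose_device(pending, variable_length=False, gpu_threshold=64))
```

To keep this lightweight, the per-batch packet count and byte total could be accumulated while packets are appended in the RX phase, so the decision costs only one addition per batch rather than a pass over every packet.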