Optimization for batch traversal
leeopop opened this issue
In 591fe18, Joongi implemented swap-based continuous batches to eliminate branching overheads in per-packet iterations.
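For illustration, here is a minimal sketch of how swap-based compaction can remove the per-packet "is this packet excluded?" branch, and why it reorders packets. It is not the code from 591fe18; `SwapBatch` and `exclude()` are hypothetical names, and integer packet IDs stand in for packet pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of swap-based compaction (not the actual 591fe18 code).
// Active packets always occupy slots [0, count), so loops can run over a
// contiguous range without checking an exclusion flag per packet.
struct SwapBatch {
    std::vector<int> pkts;  // packet IDs stand in for packet pointers
    std::size_t count;      // number of active packets

    explicit SwapBatch(std::vector<int> p)
        : pkts(std::move(p)), count(pkts.size()) {}

    // Exclude the packet in slot i in O(1) by swapping it with the last
    // active packet. The cost: the former last packet jumps forward to slot i.
    void exclude(std::size_t i) {
        assert(i < count);
        --count;
        std::swap(pkts[i], pkts[count]);
    }

    // Copy of the active range, in slot order.
    std::vector<int> active() const {
        return std::vector<int>(
            pkts.begin(), pkts.begin() + static_cast<std::ptrdiff_t>(count));
    }
};
```

For example, excluding slot 1 of a batch holding packets {0, 1, 2, 3, 4} leaves the active range {0, 4, 2, 3}: packet 4 now precedes packets 2 and 3.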
Since this is a packet processing framework, it would be nice to avoid packet reordering even inside batches.
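I think ffs (find first set bit) and ffz (find first zero bit) intrinsics may be a great solution for this, since they let a loop jump directly to the next active packet in a bitmask while keeping packets in their original order.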
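A minimal sketch of ffs/ffz-style bit scanning, assuming a batch of at most 64 packets tracked in a single `uint64_t` mask. `set_bit_indices()` and `zero_bit_indices()` are hypothetical helpers; `__builtin_ctzll` is a GCC/Clang builtin that returns the 0-based index of the lowest set bit (POSIX `ffs()` returns the same index plus one, and 0 for a zero argument).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// ffs-style scan: collect the indices of set bits in ascending order.
// Assumes one bit per packet slot, so a batch holds at most 64 packets.
std::vector<int> set_bit_indices(std::uint64_t mask) {
    std::vector<int> out;
    while (mask != 0) {
        int i = __builtin_ctzll(mask);  // index of the lowest set bit
        out.push_back(i);
        mask &= mask - 1;               // clear that bit and continue
    }
    return out;
}

// ffz-style scan: collect the indices of zero bits among the first n bits,
// e.g. when a set bit marks an *excluded* packet. The first zero bit of x
// is the first set bit of ~x, so this reuses the ffs-style scan.
std::vector<int> zero_bit_indices(std::uint64_t excluded, int n) {
    assert(n >= 0 && n <= 64);
    std::uint64_t valid =
        (n == 64) ? ~std::uint64_t{0} : ((std::uint64_t{1} << n) - 1);
    return set_bit_indices(~excluded & valid);
}
```

Because bits are visited from lowest to highest, packets come out in slot order, and excluded slots are skipped without being tested one by one. Larger batches would need an array of 64-bit words scanned the same way.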
- Rewrite all iteration loops to use a common iterator definition (using macros or functions).
- Ensure that the current scheme works with the new iteration loop definition.
- Implement an ffs/ffz-based iteration loop and remove `collect_excluded_packets()` from `PacketBatch`.
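The task list could be sketched as follows, assuming batches of at most 64 packets. `MaskedBatch` is a hypothetical stand-in for `PacketBatch`, and `for_each_active()` is a hypothetical function-based version of the common iterator. In this sketch, excluding a packet only sets a bit and slots never move, so packet order is preserved and no separate pass to collect excluded packets is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for PacketBatch: one exclusion bit per slot.
struct MaskedBatch {
    std::vector<int> pkts;       // packet IDs stand in for packet pointers
    std::uint64_t excluded = 0;  // bit i set => pkts[i] is excluded

    explicit MaskedBatch(std::vector<int> p) : pkts(std::move(p)) {
        assert(pkts.size() <= 64);
    }

    // Mark slot i as excluded; the packet stays where it is.
    void exclude(std::size_t i) {
        assert(i < pkts.size());
        excluded |= (std::uint64_t{1} << i);
    }

    // Bits of slots that hold a packet and are not excluded.
    std::uint64_t active_mask() const {
        std::size_t n = pkts.size();
        std::uint64_t valid =
            (n == 64) ? ~std::uint64_t{0} : ((std::uint64_t{1} << n) - 1);
        return ~excluded & valid;
    }
};

// Common iteration helper: calls f(index, packet) for every active packet
// in slot order, skipping excluded slots via bit scanning (GCC/Clang builtin).
template <typename F>
void for_each_active(MaskedBatch& batch, F&& f) {
    std::uint64_t m = batch.active_mask();  // snapshot, so f may exclude packets
    while (m != 0) {
        std::size_t i = static_cast<std::size_t>(__builtin_ctzll(m));
        f(i, batch.pkts[i]);
        m &= m - 1;  // clear the visited bit
    }
}
```

For example, calling `for_each_active(b, [&](std::size_t i, int pkt) { if (pkt % 2 != 0) b.exclude(i); });` on a batch holding {10, 11, 12, 13, 14} excludes 11 and 13, and a second pass then visits 10, 12, 14 in their original order. A macro wrapping the same loop would be an equally valid way to define the common iterator.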
Done with issue #5 and a feature branch.
Now it is time to run tests with branching workloads.