Doc improvement: Numpy is better than wsaccel?
Arcitec opened this issue · comments
Hello! :-)
I was reading the docs at https://websocket-client.readthedocs.io/en/latest/faq.html#why-is-this-library-slow, and it explains that "please install both numpy and wsaccel, but note that wsaccel can sometimes cause other issues".
This message is too vague; it both scared and intrigued me. So I checked the source code:
- I see that wsaccel is used in https://github.com/websocket-client/websocket-client/blob/master/websocket/_utils.py and it's used for accelerating UTF8 correctness validation.
- I see that both numpy and wsaccel are used in the ABNF library https://github.com/websocket-client/websocket-client/blob/master/websocket/_abnf.py, and that the code prefers numpy and ignores wsaccel if numpy was found.
- As I understand it, the most "heavy" operation is the "mask" XOR stuff that WebSockets do, and which is very slow for big binary data. That's what numpy is the preferred accelerator for. So if I read this correctly, the most important speedup is to have numpy.
- How important is wsaccel's acceleration of UTF-8 validation?
- What are the "other issues" that wsaccel can cause? I have read every issue that mentions wsaccel, and all the docs. There's 1 mention in the docs about "other issues" and 1 mention in a ticket about "other issues" but never what "other issues" means.
I hope this discussion can help the community figure out what libraries to use. For now, I am going to assume that numpy is the best choice and will only install that, due to the vague "other issues" warning about wsaccel. ;-)
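For readers unfamiliar with the "mask" XOR step discussed above: RFC 6455 masks each payload byte by XORing it with one byte of a 4-byte masking key, cycling through the key. The sketch below is only an illustration, not the library's actual code; the function names mask_per_byte and mask_bulk are made up for this example. It shows a straightforward per-byte version and a bulk version that XORs two large integers, which is one way a pure Python implementation can avoid a Python-level loop.

```python
def mask_per_byte(mask_key: bytes, data: bytes) -> bytes:
    """XOR every payload byte with mask_key[i % 4] (the RFC 6455 rule)."""
    return bytes(b ^ mask_key[i % 4] for i, b in enumerate(data))


def mask_bulk(mask_key: bytes, data: bytes) -> bytes:
    """Same result, computed as a single XOR of two big integers."""
    n = len(data)
    # Repeat the 4-byte key until it covers the payload, then trim to length.
    key = (mask_key * (n // 4 + 1))[:n]
    masked = int.from_bytes(data, "big") ^ int.from_bytes(key, "big")
    return masked.to_bytes(n, "big")


if __name__ == "__main__":
    key = b"\x08\x03\x22\x14"
    print(mask_per_byte(key, b"I am"))  # b'A#Cy'
    # Masking is its own inverse: applying it twice restores the payload.
    print(mask_bulk(key, mask_bulk(key, b"I am")))  # b'I am'
```

Because XOR is symmetric, the same function both masks and unmasks a payload.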
Thanks for the analysis. I don't usually use websocket-client with a very large number of connections, so I hope others with experience in this arena can chime in if they have noticed different performance results with wsaccel or numpy.
I do think #433 is worth implementing, but I don't know when I will get around to it. PRs welcome!
Can a helpful reader try running the code below and check my result?
I investigated this and got a result I was not expecting. I expected that installing numpy would make the mask() function faster, but I found that running the mask() function with numpy installed is about 4 times slower than with just wsaccel. This means numpy should either be completely removed from this project (if pure Python masking is faster than numpy) or, at minimum, wsaccel should be prioritized over numpy (if pure Python masking is slower than numpy).
- Release 1.0.1 without numpy or wsaccel: 3.4 seconds
- Master with commit 287970e without numpy or wsaccel: 0.6 seconds
- Release 1.0.1 with numpy and wsaccel (numpy is prioritized): 2.25 seconds
- Release 1.0.1 with wsaccel (no numpy): 0.5 seconds
I ran the testmask.py code below using time python3 testmask.py, 3 times for each numpy/wsaccel combination mentioned above:
from websocket._abnf import ABNF
test_abnf = ABNF(0,0,0,0, opcode=ABNF.OPCODE_PING, mask=1, data=b"\x49\x20\x61\x6d\x20\x72\x75\x6e\x6e\x69\x6e\x67\x20\x61\x20\x71\x75\x69\x63\x6b\x20\x74\x65\x73\x74\x20\x6f\x6e\x20\x55\x54\x46\x38\x20\x64\x65\x63\x6f\x64\x69\x6e\x67\x20\x61\x6e\x64\x20\x6e\x65\x65\x64\x20\x73\x61\x6d\x70\x6c\x65\x20\x64\x61\x74\x61\x20\x66\x6f\x72\x20\x74\x65\x73\x74\x69\x6e\x67\x20")
for i in range(300000):
    a = test_abnf._get_masked(b"\x08\x03\x22\x14")
print(a)
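A note for anyone repeating this measurement: time python3 testmask.py also counts interpreter startup and import time. The sketch below shows one way to time only the call itself with the standard library's timeit module. It is a sketch under assumptions: best_time_per_call and xor_mask are hypothetical names, and xor_mask is a pure Python stand-in workload so the example runs without websocket-client installed. To measure the real thing, pass test_abnf._get_masked and the mask key instead.

```python
import timeit


def xor_mask(mask_key: bytes, data: bytes) -> bytes:
    """Stand-in workload: XOR each byte with a repeating 4-byte key."""
    return bytes(b ^ mask_key[i % 4] for i, b in enumerate(data))


def best_time_per_call(func, *args, number: int = 10_000, repeat: int = 5) -> float:
    """Return the fastest observed time per call, in seconds.

    Taking the minimum over several repeats reduces noise from other
    processes running on the machine.
    """
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    # Same text as the hex-escaped payload in testmask.py.
    payload = b"I am running a quick test on UTF8 decoding and need sample data for testing "
    seconds = best_time_per_call(xor_mask, b"\x08\x03\x22\x14", payload)
    print(f"{seconds * 1e6:.2f} microseconds per call")
```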
If I don't hear any input in the next week or so, I plan to remove all numpy use from websocket-client before the next release. If numpy is only used for masking and it only slows things down, there's not much point in keeping it.
Also, the FAQ that previously mentioned "other issues" has had that phrase removed, and it now clarifies where wsaccel and numpy provide performance boosts.
My results:
1) b'\x08\x03"\x14A#Cy(qWzfjLs(b\x02e}jA\x7f(wGg|#Mz(VvR0#FqklF}fd\x02ufg\x02zmfF4{bOddf\x02piwC4nlP4|fQ`amE4'
real 0m4.135s
user 0m4.070s
sys 0m0.033s
2) b'\x08\x03"\x14A#Cy(qWzfjLs(b\x02e}jA\x7f(wGg|#Mz(VvR0#FqklF}fd\x02ufg\x02zmfF4{bOddf\x02piwC4nlP4|fQ`amE4'
real 0m0.769s
user 0m0.740s
sys 0m0.020s
3) b'\x08\x03"\x14A#Cy(qWzfjLs(b\x02e}jA\x7f(wGg|#Mz(VvR0#FqklF}fd\x02ufg\x02zmfF4{bOddf\x02piwC4nlP4|fQ`amE4'
real 0m2.163s
user 0m2.344s
sys 0m0.157s
4) b'\x08\x03"\x14A#Cy(qWzfjLs(b\x02e}jA\x7f(wGg|#Mz(VvR0#FqklF}fd\x02ufg\x02zmfF4{bOddf\x02piwC4nlP4|fQ`amE4'
real 0m0.862s
user 0m0.576s
sys 0m0.025s
I think the original idea of numpy being the best option goes back to the very early days of Python 2.7+.
I am not sure if wsaccel is even worth it at this point. Maybe we can go with a pure Python implementation.
Thanks @wildraid for checking. With commit a462d45, numpy has been removed from this project. While the wsaccel performance boost for masking is minimal (around 10% improvement), I left support for wsaccel as is because of the roughly 1.6x performance boost on UTF8 validation (4.9 seconds down to 3.1 seconds in the test below). Here is the code I used for UTF8 testing and my results in case someone wants to test this in the future:
UTF8 test without wsaccel: real 4.943s (user 4.93s, sys 0.01s)
UTF8 test with wsaccel: real 3.124s (user 3.12s, sys 0.00s)
from websocket import _utils
for i in range(600000):
    _utils.validate_utf8(b"\x72\x75\x6e\x6e\x69\x6e\x67\x20\x61\x20\x74\x65\x73\x74\x20\x6f\x6e\x20\x55\x54\x46\x38\x20\x64\x65\x63\x6f\x64\x69\x6e\x67")
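For anyone curious how an optional accelerator like wsaccel can be used without making it a hard dependency, here is a minimal sketch of the try/except import pattern. It assumes wsaccel exposes Utf8Validator in wsaccel.utf8validator and that validate() returns a tuple whose first element is the validity flag; please verify both against the wsaccel version you have installed. The fallback shown here simply uses Python's strict UTF-8 decoder as a stand-in and is not necessarily what websocket-client itself does.

```python
try:
    # Optional C extension; assumed import path, see the note above.
    from wsaccel.utf8validator import Utf8Validator

    def validate_utf8(data: bytes) -> bool:
        # Assumption: the first element of the returned tuple is the validity flag.
        return bool(Utf8Validator().validate(data)[0])

except ImportError:

    def validate_utf8(data: bytes) -> bool:
        # Pure Python stand-in: the strict decoder rejects malformed input.
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            return False
        return True


if __name__ == "__main__":
    # Same text as the hex-escaped payload in the benchmark above.
    print(validate_utf8(b"running a test on UTF8 decoding"))
    print(validate_utf8(b"\xff\xfe"))  # 0xFF never appears in valid UTF-8
```

With this layout the rest of the code calls validate_utf8() the same way whether or not wsaccel is installed.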
I am closing this issue because both the documentation and code around wsaccel and numpy usage have been updated.