Switch to threadfin?
sigaloid opened this issue
https://github.com/sagebind/threadfin
Provides automatic pool resizing.
It's a somewhat bigger dependency, but it avoids my (admittedly amateurish) resizing of the thread pool. I don't see a super compelling argument for feature-gating it, as it's the thread pool library, not something the end user typically wants to have to configure.
I've actually been playing with the concept of removing threading altogether and using a state machine. When I was reading up on Slowloris, one of the things that was pointed out was that NGINX holds up against the attack because it doesn't tie up a thread per connection; rather, it runs one worker per CPU and uses an event-driven state machine to handle requests.
I set up a basic implementation and ran drill against it. I got 1500 requests per second with 80 simultaneous requests being handled by the server, all on a single thread.
I'd love to see this implemented; gotta do some reading on state machines first 😅. Does this affect the speed of request parsing as well? If so, we can easily benchmark it to quantify the change in speed.
https://github.com/JEBailey/vial/tree/nonblocking
So the idea behind it is to break down the processing into steps and keep track of where you are in the steps.
The main loop does the following:
- Check for new incoming connections; if there is one, add it to the pool.
- Iterate through the pool. For each connection:
  - Read some data.
  - Are we done reading data? Parse it.
  - Are we done parsing? Build a request.
  - Do we have a request? Get a response.
  - Write the response out.
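For anyone following along, the steps above can be sketched roughly as below. This is a minimal illustration, not the code on the nonblocking branch: `State`, `step`, `respond`, and `poll_pool` are hypothetical names, the "parser" only looks for the blank line that ends the headers, and the stream is assumed to be in non-blocking mode so that `read` and `write` return `WouldBlock` instead of waiting.

```rust
use std::io::{self, ErrorKind, Read, Write};

/// Where one connection currently is (hypothetical names, not vial's API).
enum State {
    Reading(Vec<u8>),
    Writing { response: Vec<u8>, written: usize },
    Done,
}

/// Stand-in for parsing and routing: echo the requested path back.
fn respond(request: &[u8]) -> Vec<u8> {
    let text = String::from_utf8_lossy(request);
    let path = text.split_whitespace().nth(1).unwrap_or("/");
    let body = format!("you asked for {}", path);
    format!("HTTP/1.1 200 OK\r\nContent-Length: {}\r\n\r\n{}", body.len(), body).into_bytes()
}

/// Advance one connection by a single step; never waits on the socket.
fn step<S: Read + Write>(stream: &mut S, state: State) -> io::Result<State> {
    match state {
        State::Reading(mut buf) => {
            let mut chunk = [0u8; 1024];
            match stream.read(&mut chunk) {
                Ok(0) => Ok(State::Done), // peer closed before finishing
                Ok(n) => {
                    buf.extend_from_slice(&chunk[..n]);
                    if buf.windows(4).any(|w| w == b"\r\n\r\n") {
                        // Headers are complete: parse and prepare the response.
                        Ok(State::Writing { response: respond(&buf), written: 0 })
                    } else {
                        Ok(State::Reading(buf)) // need more data, come back later
                    }
                }
                Err(e) if e.kind() == ErrorKind::WouldBlock => Ok(State::Reading(buf)),
                Err(e) => Err(e),
            }
        }
        State::Writing { response, written } => match stream.write(&response[written..]) {
            Ok(0) => Ok(State::Done), // cannot make progress, give up
            Ok(n) if written + n == response.len() => Ok(State::Done),
            Ok(n) => Ok(State::Writing { response, written: written + n }),
            Err(e) if e.kind() == ErrorKind::WouldBlock => Ok(State::Writing { response, written }),
            Err(e) => Err(e),
        },
        State::Done => Ok(State::Done),
    }
}

/// One pass of the main loop over every connection in the pool.
fn poll_pool<S: Read + Write>(pool: &mut Vec<(S, State)>) {
    let mut i = 0;
    while i < pool.len() {
        let (stream, state) = &mut pool[i];
        let current = std::mem::replace(state, State::Done);
        match step(stream, current) {
            // Finished or failed: remove it, which drops (closes) the stream.
            Ok(State::Done) | Err(_) => {
                pool.swap_remove(i);
            }
            Ok(next) => {
                *state = next;
                i += 1;
            }
        }
    }
}
```

Because `step` is generic over `Read + Write`, it can be exercised with an in-memory `std::io::Cursor<Vec<u8>>` as well as a `TcpStream`. Keeping each state's data inside the enum variant is one way to avoid reaching for `RefCell` or `Rc`, since each connection owns its own buffer.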
My current implementation is very rough. My initial attempt was much more granular, but I got to a point where I was reading up on RefCell and Rc, and I'm not up to speed on those yet, so I kept it simple.
Things I discovered: I had to write my own read timeout, because setting one on a non-blocking stream has no effect. I also needed it because certain browsers will open the TCP connection when you hover over a link, and then never send a request if you don't click it.
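A hand-rolled read timeout along those lines might look something like this. It is only a sketch under assumptions: `IdleTimer` and `read_some` are hypothetical names, and the current time is passed in explicitly so the logic can be checked without sleeping. The idea is that a non-blocking `read` returns `WouldBlock` right away, so the caller has to remember when the connection last produced data and give up once that was too long ago.

```rust
use std::io::{self, ErrorKind, Read};
use std::time::{Duration, Instant};

/// Tracks how long a connection has been silent (hypothetical helper).
struct IdleTimer {
    last_activity: Instant,
    limit: Duration,
}

impl IdleTimer {
    fn new(now: Instant, limit: Duration) -> Self {
        IdleTimer { last_activity: now, limit }
    }

    /// Record that the connection just produced data.
    fn touch(&mut self, now: Instant) {
        self.last_activity = now;
    }

    /// True once the connection has been silent for at least `limit`.
    fn expired(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_activity) >= self.limit
    }
}

/// Non-blocking read that turns prolonged silence into a `TimedOut` error.
/// A plain `WouldBlock` means "nothing yet, try again on the next pass".
fn read_some<S: Read>(
    stream: &mut S,
    buf: &mut [u8],
    idle: &mut IdleTimer,
    now: Instant,
) -> io::Result<usize> {
    match stream.read(buf) {
        Ok(n) => {
            if n > 0 {
                idle.touch(now); // data arrived, restart the clock
            }
            Ok(n)
        }
        Err(e) if e.kind() == ErrorKind::WouldBlock && idle.expired(now) => {
            Err(io::Error::new(ErrorKind::TimedOut, "client idle for too long"))
        }
        Err(e) => Err(e),
    }
}
```

In the main loop the caller would pass `Instant::now()`, keep the connection on `WouldBlock`, and drop it on `TimedOut`. That covers the hover-preconnect case, where the socket is opened but no bytes ever arrive.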
Scaling is governed by the size of the Vec, so I can set a max size and just drop connections if I exceed it. I haven't been able to max it out yet.
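Capping the pool could be as small as the sketch below. The names `admit` and `accept_pending` are hypothetical, and the listener is assumed to have already been put in non-blocking mode with `set_nonblocking(true)`; the branch may well do this differently.

```rust
use std::io::{self, ErrorKind};
use std::net::{TcpListener, TcpStream};

/// Add `conn` to the pool unless the pool is already full.
/// Returns false when the connection was dropped instead.
fn admit<C>(pool: &mut Vec<C>, conn: C, max_connections: usize) -> bool {
    if pool.len() >= max_connections {
        drop(conn); // for a TcpStream, dropping it closes the socket
        false
    } else {
        pool.push(conn);
        true
    }
}

/// Accept everything currently waiting on a non-blocking listener.
/// Assumes the caller has called `listener.set_nonblocking(true)`.
fn accept_pending(
    listener: &TcpListener,
    pool: &mut Vec<TcpStream>,
    max_connections: usize,
) -> io::Result<()> {
    loop {
        match listener.accept() {
            Ok((stream, _addr)) => {
                stream.set_nonblocking(true)?;
                admit(pool, stream, max_connections);
            }
            // Nothing more to accept right now; return to the main loop.
            Err(e) if e.kind() == ErrorKind::WouldBlock => return Ok(()),
            Err(e) => return Err(e),
        }
    }
}
```

Dropping at accept time keeps memory use bounded by `max_connections`, at the cost of refusing clients outright during a burst rather than queueing them.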
Very impressive, I like it! Is the performance better than the current implementation? It seems like it would be. Unfortunately, the existing Criterion benchmarks I have only cover request parsing, so I'm not sure they would measure a difference in the case of a state machine.
Right now the results are very similar. I take two things away from that. First, I haven't taxed the server enough to reach the point where the two differ. Second, this is very favorable to the state machine model, as a naive implementation is able to hold its own against a thread pool.