Site-wide denial of service in search box on browse page
DataKinds opened this issue
Hi all. This issue details a denial of service that a user can easily trigger by accident on a live Flight Review instance.
High level description
Making a request to /browse_data_retrieval seems to block requests to the rest of the app. With thousands of logs uploaded, a single request can block for several seconds or more. The request is issued on every input to the search box on /browse, so a user can accidentally bring down the Flight Review server for an extended time just by typing into the search box and leaving the page open.
This can block long enough to produce 504 timeout errors from Nginx or cause the Bokeh JS wrapper to fail to connect.
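A minimal sketch of the suspected failure mode, assuming the /browse_data_retrieval handler does its search synchronously on the server's single-threaded event loop (this has not been confirmed against the Flight Review source). It uses only asyncio, which Tornado 5 and later run on. The names slow_query, blocking_handler and offloaded_handler are hypothetical stand-ins, not Flight Review code.

```python
import asyncio
import time


def slow_query(seconds: float) -> str:
    """Hypothetical stand-in for a synchronous search over many logs."""
    time.sleep(seconds)
    return "rows"


async def blocking_handler(seconds: float) -> str:
    # Runs the query directly on the event loop: no other request
    # can be served until it returns.
    return slow_query(seconds)


async def offloaded_handler(seconds: float) -> str:
    # Hands the query to the default thread pool, so the event loop
    # stays free to serve other requests in the meantime.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, slow_query, seconds)


async def other_request_latency(handler, seconds: float) -> float:
    """Start `handler`, then time a trivial 10 ms 'other request'."""
    start = time.perf_counter()
    task = asyncio.create_task(handler(seconds))
    await asyncio.sleep(0.01)  # the unrelated request
    waited = time.perf_counter() - start
    await task
    return waited


def measure(handler, seconds: float = 0.3) -> float:
    """Seconds the unrelated request waited while `handler` was running."""
    return asyncio.run(other_request_latency(handler, seconds))
```

With the blocking variant, the unrelated request waits for the whole query. With the offloaded variant it returns after roughly its own 10 ms. If the real handler turns out to behave like the blocking one, moving the search off the event loop is one possible direction for a fix.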
Steps to reproduce
- Open a log (in the /plot_app endpoint) in a new tab.
- Open the /browse endpoint, and open your browser's devtools to the network request tab.
- Begin typing in the search box until your devtools are full of network requests to /browse_data_retrieval.
- Switch back to the /plot_app tab and refresh.
- The refresh will block until the last of the /browse_data_retrieval requests are served.
(If you're unlucky and your local instance shows the same behavior described under Errata below, the first request will block forever and the /plot_app tab will never refresh. The five steps above do seem to reproduce this behavior on the live http://review.px4.io/ instance, though.)
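The steps above can also be approximated without a browser. The following is a rough sketch using only the Python standard library. The helper names fetch and probe_under_load are hypothetical, and the search URL must be supplied by the caller, since the query parameters the search box sends are not listed here (copy the full request URL from devtools).

```python
import http.client
import threading
import time
import urllib.request


def fetch(url: str, timeout: float = 60.0) -> float:
    """Return the seconds spent on one GET of `url`, whether or not it succeeds."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except (OSError, http.client.HTTPException):
        # Timeouts and HTTP errors (e.g. a 504) still show how long we waited.
        pass
    return time.perf_counter() - start


def probe_under_load(search_url: str, probe_url: str, n_requests: int = 20) -> float:
    """Fire n_requests concurrent GETs at search_url, then time one GET of probe_url."""
    workers = [
        threading.Thread(target=fetch, args=(search_url,))
        for _ in range(n_requests)
    ]
    for worker in workers:
        worker.start()
    time.sleep(0.1)  # give the flood of search requests a head start
    latency = fetch(probe_url)
    for worker in workers:
        worker.join()
    return latency
```

Comparing fetch(probe_url) on an idle server with probe_under_load(search_url, probe_url), where probe_url points at a /plot_app page, should show whether the search requests delay unrelated pages. Run this against a local test instance only, since it deliberately triggers the problem.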
Errata
Sometimes the search errors out on the /browse page with an AJAX error. This seems to immediately return the server to a working state (it probably kills the pending network connections, but I haven't been able to reproduce it in the last hour, so I can't check).
On our local instance, despite having parity with PX4/flight_review, the requests to /browse_data_retrieval seem to block forever. I'm not sure whether this comes from a difference in DB setup, browser configuration, or deploy environment configuration, but it has forced us to remote in and hard reboot the Flight Review server on multiple occasions when we couldn't locate the faulty connected client.
Hi, thanks for reporting. There is something off indeed. Do you have time to look into this a bit further?
I'll likely be looking into this over the next week in order to patch it in our in-house instance. I'd be happy to submit a PR upstream once that work is done.
Cool, thanks. Changing the tornado version might already help.
Hi @DataKinds, did you find anything?