Site-wide denial of service in search box on browse page

DataKinds opened this issue a year ago

Hi all. This issue details a DoS attack that is easy for a user to perform accidentally on a live Flight Review instance.

High level description

Making a request to /browse_data_retrieval seems to block requests to the rest of the app. With thousands of logs uploaded, a single request can block for several seconds. Because a request is issued on every input to the search box on /browse, it is easy for a user to accidentally bring down the Flight Review server for an extended time just by typing into the search box and leaving the page open.

This can block long enough to produce 504 timeout errors from Nginx or cause the Bokeh JS wrapper to fail to connect.
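The symptoms above are what one would expect when a handler does synchronous work on a single-threaded event loop. The following is a minimal sketch of that effect using only Python's asyncio. It is not Flight Review's code and not the eventual fix; `slow_query`, `blocking_handler`, `offloaded_handler`, and the 0.2 s delay are all hypothetical stand-ins.

```python
import asyncio
import time

QUERY_SECONDS = 0.2  # assumed stand-in for a slow search over thousands of logs


def slow_query() -> str:
    """Hypothetical synchronous work, e.g. a database scan."""
    time.sleep(QUERY_SECONDS)
    return "rows"


async def blocking_handler() -> str:
    # Runs the query on the event-loop thread: nothing else is served meanwhile.
    return slow_query()


async def offloaded_handler() -> str:
    # Runs the query in a worker thread so the event loop stays responsive.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, slow_query)


async def fast_handler() -> float:
    # Stand-in for an unrelated request; returns the time at which it completed.
    await asyncio.sleep(0)
    return time.perf_counter()


async def latency_of_fast_request(search_handler, n_searches: int = 3) -> float:
    """Queue n_searches search requests, then time one unrelated request."""
    start = time.perf_counter()
    searches = [asyncio.create_task(search_handler()) for _ in range(n_searches)]
    fast = asyncio.create_task(fast_handler())
    finished_at = await fast
    await asyncio.gather(*searches)
    return finished_at - start


def measure(search_handler, n_searches: int = 3) -> float:
    """Return the seconds the unrelated request waited behind the searches."""
    return asyncio.run(latency_of_fast_request(search_handler, n_searches))
```

Under these assumptions, `measure(blocking_handler)` grows with the number of queued searches, while `measure(offloaded_handler)` stays small because the searches wait in worker threads rather than on the loop.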

Steps to reproduce

1. Open a log (in the /plot_app endpoint) in a new tab.
2. Open the /browse endpoint, and open your browser's devtools to the network request tab.
3. Begin typing in the search box until your devtools are full of network requests to /browse_data_retrieval.
4. Switch back to the /plot_app tab and refresh.
5. The refresh will block until the last of the /browse_data_retrieval requests are served.

(If you're unlucky and your local instance shows the behavior described under Errata below, the first request will block forever and /plot_app will never refresh. The five steps above do seem to reproduce this behavior on the live http://review.px4.io/ instance, though.)
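The manual steps can also be approximated with a small script, which may make it easier to measure the delay on a local test instance. This is only a sketch: `fetch_seconds` and `probe_while_searching` are hypothetical helpers, and both URLs are parameters because the exact query strings (the search parameters for /browse_data_retrieval and the log ID for /plot_app) are not given here and should be copied from your browser's devtools.

```python
import threading
import time
import urllib.request


def fetch_seconds(url: str, timeout: float = 60.0) -> float:
    """GET url, read the whole body, and return the elapsed seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start


def probe_while_searching(
    search_url: str,
    probe_url: str,
    n_searches: int = 20,
    head_start: float = 0.1,
) -> float:
    """Fire n_searches concurrent requests at search_url, then time probe_url.

    Mirrors steps 3 to 5: many search requests in flight, followed by a
    refresh of another page. Returns the seconds the probe request took.
    """

    def search() -> None:
        try:
            fetch_seconds(search_url)
        except OSError:
            # Timeouts and connection errors are expected if the server stalls.
            pass

    # Daemon threads: a stalled search request will not keep the script alive.
    threads = [
        threading.Thread(target=search, daemon=True) for _ in range(n_searches)
    ]
    for thread in threads:
        thread.start()
    time.sleep(head_start)  # let the search requests reach the server first
    return fetch_seconds(probe_url)
```

Comparing `fetch_seconds(probe_url)` on an idle server with the value returned by `probe_while_searching(search_url, probe_url)` gives a rough measure of how long the search requests hold up the rest of the app. Point it only at an instance you control.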

Errata

Sometimes the search on the /browse page fails with an AJAX error. This seems to return the server to a working state immediately. It probably kills the pending network connections, but I haven't been able to reproduce it in the last hour, so I can't check.

On our local instance, despite being at parity with PX4/flight_review, the requests to /browse_data_retrieval seem to block forever. I'm not sure whether this comes from a difference in DB setup, browser configuration, or deploy environment configuration, but it has forced us to remote in and hard-reboot the Flight Review server on multiple occasions when we couldn't locate the faulty connected client.

Beat Küng · Answer 1 · Fri Mar 24 2023 14:33:49 GMT+0800 (China Standard Time)

Hi, thanks for reporting. There is something off indeed. Do you have time to look into this a bit further?

Tyler · Answer 2 · Mon Mar 27 2023 05:59:04 GMT+0800 (China Standard Time)

I'll likely be looking into this over the next week in order to patch it in our in-house instance. I'd be happy to submit a PR upstream once that work is done.

Beat Küng · Answer 3 · Mon Mar 27 2023 13:25:52 GMT+0800 (China Standard Time)

Cool, thanks. Changing the Tornado version might already help.

Beat Küng · Answer 4 · Tue Apr 11 2023 15:44:26 GMT+0800 (China Standard Time)

Hi @DataKinds, did you find anything?

Beat Küng · Answer 5 · Mon Jun 19 2023 21:33:50 GMT+0800 (China Standard Time)

Fixed in 5ced4b0