python-restx / flask-restx

Fork of Flask-RESTPlus: Fully featured framework for fast, easy and documented API development with Flask

Home Page: https://flask-restx.readthedocs.io/en/latest/

Swagger schema creation can crash if multiple requests arrive quickly on startup [theory]

peterhorsley opened this issue

Hello flask-restx team!

This is a bit of a nasty one, sorry! We have recently observed a crash twice (call stack below) inside the Swagger() constructor on application startup, when the application receives its first request. The exception being thrown ("dictionary changed size during iteration") points to a threading issue, with multiple threads concurrently constructing a Swagger() object. The result is cached on the Api instance (the call stack shows it being stored in self._schema) when the first request that requires validation arrives, or when the swagger-ui URL is loaded. As flask-restx uses no locks around this step, the Swagger() constructor appears not to be thread-safe: if multiple requests arrive very quickly at application startup (and Flask is running with threaded=True), data corruption and crashes can happen during schema rendering. Please note this is just my theory on the root cause, and I'm submitting this issue to hear from anyone else in case I've assumed wrong. The crash happens intermittently (we've seen it twice in the last week), and despite trying, I have unfortunately not yet found a way to reproduce it.
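To make the suspected race concrete, here is a minimal stdlib-only sketch of the check-then-act pattern the theory describes. UnsafeLazySchema and demo_race are hypothetical names, not flask-restx code; the builder stands in for Swagger(self).as_dict(), and a threading.Barrier is used only to force the bad interleaving so that the demo is deterministic.

```python
import threading


class UnsafeLazySchema:
    """Hypothetical stand-in for a lazily cached schema with no locking."""

    def __init__(self, builder):
        self._builder = builder
        self._schema = None

    @property
    def schema(self):
        # Check-then-act without a lock: every thread that sees None builds its own copy
        if self._schema is None:
            self._schema = self._builder()
        return self._schema


def demo_race(n_threads=2):
    builds = []
    # All threads must be inside the builder at once before any may leave it,
    # which forces the racy interleaving so the demo is deterministic
    inside_builder = threading.Barrier(n_threads, timeout=5)

    def builder():
        builds.append(threading.get_ident())
        inside_builder.wait()
        return {"definitions": {}}

    lazy = UnsafeLazySchema(builder)
    threads = [threading.Thread(target=lambda: lazy.schema) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(builds)


print(demo_race())  # 2: both threads ran the builder
```

With no lock, every thread that observes the empty cache runs the builder, so several threads can end up walking and copying the same shared model objects at the same time.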

As for a fix, it would seem that a lock should be used to guarantee thread-safety of the Swagger() constructor. I would be happy to work on a PR for that if advised by flask-restx maintainers.
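One possible shape for such a fix, sketched with the standard library only, is double-checked locking around the lazy build. LockedLazySchema and demo_locked are hypothetical names; in flask-restx the equivalent change would presumably sit where Api.__schema__ assigns self._schema = Swagger(self).as_dict() (api.py line 573 in the call stack below), but that is an assumption about where a PR would go, not the library's actual code.

```python
import threading
import time


class LockedLazySchema:
    """Hypothetical sketch: build the schema exactly once, even under concurrent first access."""

    def __init__(self, builder):
        self._builder = builder
        self._schema = None
        self._lock = threading.Lock()

    @property
    def schema(self):
        # Fast path: once built, readers never touch the lock
        schema = self._schema
        if schema is not None:
            return schema
        with self._lock:
            # Re-check inside the lock: another thread may have finished
            # building while this one was waiting
            if self._schema is None:
                self._schema = self._builder()
            return self._schema


def demo_locked(n_threads=8):
    builds = []
    start = threading.Barrier(n_threads)

    def builder():
        builds.append(1)
        time.sleep(0.05)  # widen the window in which other threads could race
        return {"definitions": {}}

    lazy = LockedLazySchema(builder)
    results = []

    def worker():
        start.wait()  # release all threads at roughly the same moment
        results.append(lazy.schema)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Every thread must see the very same cached object
    assert all(r is results[0] for r in results)
    return len(builds)


print(demo_locked())  # 1
```

The re-check inside the lock is what makes this work: threads that queued on the lock while the first one was building return the cached object instead of building again. If the builder raises, self._schema stays None and the next caller retries.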

Code

Happy to provide code, in particular our model definitions, if it helps. However, as this is a largish application and the call stack indicates a non-reproducible threading condition, my thought is that the root cause is not directly related to our model definitions, so I initially wanted to seek advice on a course of action based on the call stack and my interpretation. We do have Nested fields, but only a single level of nesting.

Repro Steps (if applicable)

Sorry, not known.

Expected Behavior

If multiple requests reach the server quickly on startup, schema creation should be synchronized to ensure it is created before any request is processed.

Actual Behavior

If schema creation fails, the application continues to run, but requests that rely on validation can crash when the schema is referenced during validation, indicating a corrupt or incomplete schema. For example, we see this:

Traceback (most recent call last):
File "/home/app/.local/lib/python3.8/site-packages/jsonschema/validators.py", line 966, in resolve_fragment
document = document[part]
KeyError: 'definitions'
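A KeyError on 'definitions' is what a JSON-pointer lookup such as #/definitions/... produces when the document it walks has no top-level definitions key. The sketch below uses a simplified, hypothetical resolver (not jsonschema's implementation, which also handles escaping and array indices). The incomplete document reuses the "Unable to render schema" message from the log below as an assumed example of what a failed render might leave behind, and the Device model name is hypothetical.

```python
def resolve_pointer(document, fragment):
    """Minimal JSON-pointer walk (hypothetical helper, similar in spirit to
    jsonschema's resolve_fragment, without escaping or list support)."""
    parts = [p for p in fragment.lstrip("/").split("/") if p]
    for part in parts:
        document = document[part]  # raises KeyError if a segment is missing
    return document


complete = {"definitions": {"Device": {"type": "object"}}}
incomplete = {"error": "Unable to render schema"}

print(resolve_pointer(complete, "/definitions/Device"))  # {'type': 'object'}
try:
    resolve_pointer(incomplete, "/definitions/Device")
except KeyError as exc:
    print(repr(exc))  # KeyError('definitions')
```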

Error Messages/Stack Trace

2023-05-29 11:52:47,766 ERROR T140221658154752 [api.schema] Unable to render schema
Traceback (most recent call last):
File "/home/app/.local/lib/python3.8/site-packages/flask_restx/api.py", line 573, in schema
self._schema = Swagger(self).as_dict()
File "/home/app/.local/lib/python3.8/site-packages/flask_restx/swagger.py", line 275, in as_dict
serialized = self.serialize_resource(
File "/home/app/.local/lib/python3.8/site-packages/flask_restx/swagger.py", line 482, in serialize_resource
path[method] = self.serialize_operation(doc, method)
File "/home/app/.local/lib/python3.8/site-packages/flask_restx/swagger.py", line 488, in serialize_operation
"responses": self.responses_for(doc, method) or None,
File "/home/app/.local/lib/python3.8/site-packages/flask_restx/swagger.py", line 622, in responses_for
responses[code]["schema"] = self.serialize_schema(d["model"])
File "/home/app/.local/lib/python3.8/site-packages/flask_restx/swagger.py", line 672, in serialize_schema
self.register_model(model)
File "/home/app/.local/lib/python3.8/site-packages/flask_restx/swagger.py", line 703, in register_model
self.register_field(field)
File "/home/app/.local/lib/python3.8/site-packages/flask_restx/swagger.py", line 713, in register_field
self.register_field(field.container)
File "/home/app/.local/lib/python3.8/site-packages/flask_restx/swagger.py", line 711, in register_field
self.register_model(field.nested)
File "/home/app/.local/lib/python3.8/site-packages/flask_restx/fields.py", line 261, in nested
return getattr(self.model, "resolved", self.model)
File "/home/app/.local/lib/python3.8/site-packages/werkzeug/utils.py", line 109, in get
value = self.fget(obj) # type: ignore
File "/home/app/.local/lib/python3.8/site-packages/flask_restx/model.py", line 176, in resolved
resolved = copy.deepcopy(self)
File "/usr/local/lib/python3.8/copy.py", line 153, in deepcopy
y = copier(memo)
File "/home/app/.local/lib/python3.8/site-packages/flask_restx/model.py", line 236, in deepcopy
[(key, copy.deepcopy(value, memo)) for key, value in self.items()],
File "/home/app/.local/lib/python3.8/site-packages/flask_restx/model.py", line 236, in <listcomp>
[(key, copy.deepcopy(value, memo)) for key, value in self.items()],
File "/usr/local/lib/python3.8/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/local/lib/python3.8/copy.py", line 270, in _reconstruct
state = deepcopy(state, memo)
File "/usr/local/lib/python3.8/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/local/lib/python3.8/copy.py", line 229, in _deepcopy_dict
for key, value in x.items():
RuntimeError: dictionary changed size during iteration
2023-05-29 11:52:47,866 DEBUG T140221658154752 PUT to /api/v1/devices/100 processed in 169ms code 200
2023-05-29 11:52:47,880 DEBUG T140221972719360 PUT to /api/v1/devices/101 processed in 178ms code 200
2023-05-29 11:52:47,886 DEBUG T140221658154752 POST to /api/v1/devices/query processed in 17ms code 200
2023-05-29 11:52:47,888 DEBUG T140221689624320 PUT to /api/v1/devices/102 processed in 188ms code 200
2023-05-29 11:52:47,909 DEBUG T140221972719360 POST to /api/v1/devices/query processed in 4ms code 200

^^^ Note the multiple requests arriving on different threads within the same second as the crash, logged after the call stack ^^^
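The final RuntimeError is raised by copy._deepcopy_dict while it iterates x.items() (copy.py line 229 above) and the dict changes size underneath it. The stdlib-only sketch below reproduces the same message deterministically in a single thread: a hypothetical value's __deepcopy__ inserts a key into the dict being copied. This is only an analogy for a second thread writing to shared model state while Model.resolved runs copy.deepcopy(self); it does not prove that is what happened here.

```python
import copy


class MutatesOwner:
    """Hypothetical value whose copy hook writes into the dict being copied,
    standing in for another thread mutating shared state mid-deepcopy."""

    def __init__(self, owner):
        self.owner = owner

    def __deepcopy__(self, memo):
        # Grow the dict that copy._deepcopy_dict is currently iterating over
        self.owner["added_by_other_thread"] = True
        return MutatesOwner(None)


def reproduce():
    shared = {}
    shared["field"] = MutatesOwner(shared)
    try:
        copy.deepcopy(shared)
    except RuntimeError as exc:
        return str(exc)
    return None


print(reproduce())  # dictionary changed size during iteration
```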

Environment

  • Python version 3.8.10
  • Flask version 2.0.3
  • Flask-RESTX version 1.0.6
  • Other installed Flask extensions (none)

Thanks for your time.

@peterhorsley How is this application being deployed? I suspect you are probably correct that flask-restx is not designed to be thread-safe! However, I have a production application deployed on AWS EB with gunicorn and have never seen this issue when scaling, so I'm wondering whether it is related to the Flask development server.

@peter-doggart We can reproduce it in both production and dev environments. Our production environment is deployed in Docker containers on AWS k8s using Apache, specifically with the Python mod_wsgi package. We can also reproduce it with the Flask dev server by using Locust to hammer the server with requests on startup. For now we have implemented a workaround: a global lock in Flask's @app.before_request handler that forces generation of the schema by accessing the internal __schema__ attribute, like this:

import json
import logging

@app.before_request
def before_request():
    # Assumes MyApp.request_lock is a threading.Lock and MyApp.schema_generated starts as False
    with MyApp.request_lock:
        if not MyApp.schema_generated:
            logging.info('Generating swagger spec for api')
            json.dumps(MyApp.api.__schema__)  # <-- Force flask-restx schema to be generated
            MyApp.schema_generated = True

But of course it would be better to fix this in flask-restx so that the workaround is not needed.
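If the workaround needs to be more defensive, for example when the first schema render itself fails, one option is to retry and only mark the schema as generated once a render succeeds. This is a stdlib-only sketch with hypothetical names: get_schema stands in for lambda: MyApp.api.__schema__. It assumes a failed render returns a dict with an "error" key instead of raising, which is consistent with the "Unable to render schema" log line and with the application continuing to run, but this should be checked against your flask-restx version.

```python
import threading

_schema_lock = threading.Lock()
_schema_ready = False


def ensure_schema(get_schema, attempts=3):
    """Hypothetical hardened warm-up, meant to be called from before_request.

    Assumes get_schema() returns a dict with an "error" key when rendering fails.
    """
    global _schema_ready
    if _schema_ready:  # fast path once warmed up
        return True
    with _schema_lock:
        if _schema_ready:  # another thread finished while we waited
            return True
        for _ in range(attempts):
            schema = get_schema()
            if "error" not in schema:
                _schema_ready = True
                return True
        return False


# Simulated renderer that fails once, then succeeds
calls = []


def flaky_schema():
    calls.append(1)
    if len(calls) == 1:
        return {"error": "Unable to render schema"}
    return {"swagger": "2.0", "definitions": {}}


print(ensure_schema(flaky_schema), len(calls))  # True 2
```

Holding the lock for the whole warm-up means every early request waits until a complete schema exists, which is the synchronization the issue asks for; returning False lets the caller decide whether to fail the request or keep serving.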

I'm facing this problem too. My schema is only partially generated, so I don't know whether I can use the same workaround mentioned by @peterhorsley.