python-fitparse is not thread-safe
AartGoossens opened this issue · comments
I run python-fitparse in multiple threads (using Apache Beam/Google Cloud Dataflow) and was hitting errors on FIT files that contain developer data, such as fitparse.utils.FitParseError: No such field 8 for dev_data_index 0.
At first I thought the issue was related to #124, but although it looks similar, that was not the case. I discovered that the global variable DEV_TYPES in fitparse/records.py was causing the issue.
The exact issue is that if multiple files are processed concurrently, the initialization of a developer_data_id in one thread can "reset" the developer_data_id for all the other files that are being processed, effectively erasing the field_description messages that were added to DEV_TYPES[developer_data_index] in another thread.
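To make the mechanism concrete, here is a minimal toy model of the shared registry. It is a sketch, not fitparse's actual code: the function names only mirror the ones in the traceback, ToyParseError and simulate_interleaving are made up for illustration, and the interleaving of two parsers is simulated sequentially so the result is deterministic.

```python
# Toy model of a module-level registry shared by every parser.
# Assumption: this only imitates the behaviour described in this issue;
# it is not copied from fitparse/records.py.
DEV_TYPES = {}


class ToyParseError(Exception):
    """Hypothetical stand-in for fitparse.utils.FitParseError."""


def add_dev_data_id(dev_data_index):
    # Handling a developer_data_id message (re)creates the entry,
    # which drops any fields already registered under that index.
    DEV_TYPES[dev_data_index] = {"fields": {}}


def add_dev_field_description(dev_data_index, field_def_num, name):
    # Handling a field_description message registers one developer field.
    DEV_TYPES[dev_data_index]["fields"][field_def_num] = name


def get_dev_type(dev_data_index, field_def_num):
    try:
        return DEV_TYPES[dev_data_index]["fields"][field_def_num]
    except KeyError:
        raise ToyParseError(
            "No such field %s for dev_data_index %s" % (field_def_num, dev_data_index)
        ) from None


def simulate_interleaving():
    """Replay the problematic ordering of two parsers, A and B."""
    DEV_TYPES.clear()
    add_dev_data_id(0)                          # A: developer_data_id
    add_dev_field_description(0, 8, "field_a")  # A: field_description
    add_dev_data_id(0)                          # B: developer_data_id resets index 0
    try:
        get_dev_type(0, 8)                      # A: looks up its own field
    except ToyParseError as exc:
        return str(exc)
    return None


error_message = simulate_interleaving()
```

In this toy model, parser A loses its field as soon as parser B registers the same developer data index, which matches the error message reported above.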
This script reproduces the issue. It can be run in the root of the repo.
import copy
import threading
import time
from io import BytesIO

import fitparse

with open("tests/files/developer-types-sample.fit", "rb") as f:
    buf = BytesIO(f.read())

# Count the field description messages to know which is the last one
buf_copy = copy.deepcopy(buf)
fit_file = fitparse.FitFile(buf_copy)
FIELD_DESCRIPTION_COUNT = 0
for message in fit_file.get_messages():
    if message.mesg_type.name == "field_description":
        FIELD_DESCRIPTION_COUNT += 1


def thread_function_sleeps_after_last_field_description(buf):
    fit_file = fitparse.FitFile(buf)
    field_description_count = 0
    for message in fit_file.get_messages():
        if message.mesg_type.name == "field_description":
            field_description_count += 1
            if field_description_count >= FIELD_DESCRIPTION_COUNT:
                # Sleep for a bit to wait for the other thread to initialize the developer_data_id
                time.sleep(1)


def thread_function_break_after_developer_data_id(buf):
    fit_file = fitparse.FitFile(buf)
    for message in fit_file.get_messages():
        if message.mesg_type.name == "developer_data_id":
            break


buf_copy = copy.deepcopy(buf)
thread_1 = threading.Thread(target=thread_function_sleeps_after_last_field_description, args=(buf_copy,))
buf_copy = copy.deepcopy(buf)
thread_2 = threading.Thread(target=thread_function_break_after_developer_data_id, args=(buf_copy,))
thread_1.start()
thread_2.start()
The full stacktrace of the exception this script raises is:
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/home/aart/projects/python-fitparse/multithreading.py", line 28, in thread_function_sleeps_after_field_description
    for message in fit_file.get_messages():
  File "/home/aart/projects/python-fitparse/fitparse/base.py", line 470, in get_messages
    for message in super(CacheMixin, self).get_messages(names, with_definitions, as_dict):
  File "/home/aart/projects/python-fitparse/fitparse/base.py", line 440, in get_messages
    message = self._parse_message()
  File "/home/aart/projects/python-fitparse/fitparse/base.py", line 456, in _parse_message
    self._messages.append(super(CacheMixin, self)._parse_message())
  File "/home/aart/projects/python-fitparse/fitparse/base.py", line 154, in _parse_message
    message = self._parse_definition_message(header)
  File "/home/aart/projects/python-fitparse/fitparse/base.py", line 224, in _parse_definition_message
    field = get_dev_type(dev_data_index, field_def_num)
  File "/home/aart/projects/python-fitparse/fitparse/records.py", line 476, in get_dev_type
    raise FitParseError("No such field %s for dev_data_index %s" % (field_def_num, dev_data_index))
fitparse.utils.FitParseError: No such field 8 for dev_data_index 0
I have been working on a solution in my fork. Relevant commit here: AartGoossens@4c837b8
I really do not want to put any pressure on merging my fix into this repo, and I am open to working on an alternative solution or to creating a PR with the existing fix.
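For readers who have not opened the commit: it is not reproduced here, and the sketch below does not claim to match it. One common way to remove this kind of shared state is to keep the developer-field registry on each parser instance instead of in a module-level global. ToyParser and its methods are hypothetical names used only for illustration.

```python
class ToyParser:
    """Hypothetical parser that owns its developer-field registry.

    Assumption: this is an illustration of per-instance state,
    not the fitparse API and not the fix from the linked commit.
    """

    def __init__(self):
        # Each parser gets its own registry, so parsers cannot reset each other.
        self._dev_types = {}

    def add_dev_data_id(self, dev_data_index):
        self._dev_types[dev_data_index] = {"fields": {}}

    def add_dev_field_description(self, dev_data_index, field_def_num, name):
        self._dev_types[dev_data_index]["fields"][field_def_num] = name

    def get_dev_type(self, dev_data_index, field_def_num):
        return self._dev_types[dev_data_index]["fields"][field_def_num]


# Same ordering that fails with a shared global registry.
parser_a = ToyParser()
parser_b = ToyParser()
parser_a.add_dev_data_id(0)
parser_a.add_dev_field_description(0, 8, "field_a")
parser_b.add_dev_data_id(0)  # only touches parser_b's registry
field = parser_a.get_dev_type(0, 8)
```

With per-instance state, each file parsed in its own thread is independent as long as a single parser object is not itself shared between threads.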
Existing fix looks good, if you want to submit a PR