dtcooper / python-fitparse

Python library to parse ANT/Garmin .FIT files

Home Page: http://pythonhosted.org/fitparse/

python-fitparse is not thread-safe

AartGoossens opened this issue · comments

I run python-fitparse in multiple threads (using Apache Beam/Google Cloud Dataflow) and was hitting errors with FIT files that contain developer data, such as: fitparse.utils.FitParseError: No such field 8 for dev_data_index 0.

At first I thought the issue was related to #124, but although it looks similar, that was not the case. I discovered that the global variable DEV_TYPES in fitparse/records.py was causing the issue.

The exact issue is that when multiple files are processed concurrently, parsing a developer_data_id message can "reset" the registry entry for that dev_data_index across all files being processed, effectively erasing the field_description messages that another thread had already added to DEV_TYPES[dev_data_index].
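The race can be demonstrated in isolation with a shared module-level dict standing in for fitparse's DEV_TYPES registry. This is a minimal sketch (the names DEV_TYPES, parser_a, and parser_b here are illustrative, not fitparse's actual internals); threading events are used instead of sleeps to make the interleaving deterministic:

```python
import threading

# Hypothetical stand-in for fitparse's module-level DEV_TYPES registry.
DEV_TYPES = {}

registered = threading.Event()
reset_done = threading.Event()

def parser_a(result):
    # Thread A: registers a field description for dev_data_index 0 ...
    DEV_TYPES[0] = {"fields": {8: "heart_rate"}}
    registered.set()
    reset_done.wait()  # ... then resumes parsing later.
    # The lookup now fails: thread B reset the shared entry in the meantime.
    result["field"] = DEV_TYPES[0]["fields"].get(8)

def parser_b():
    registered.wait()
    # Thread B: parsing a developer_data_id message re-initializes the
    # registry entry, clobbering thread A's field descriptions.
    DEV_TYPES[0] = {"fields": {}}
    reset_done.set()

result = {}
a = threading.Thread(target=parser_a, args=(result,))
b = threading.Thread(target=parser_b)
a.start(); b.start(); a.join(); b.join()
print(result["field"])  # None: field 8 vanished, mirroring the FitParseError
```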

This script reproduces the issue. It can be run in the root of the repo.

import copy
import threading
import time
from io import BytesIO

import fitparse


with open("tests/files/developer-types-sample.fit", "rb") as f:
    buf = BytesIO(f.read())


# Count the field description messages so we know which one is the last
buf_copy = copy.deepcopy(buf)
fit_file = fitparse.FitFile(buf_copy)
FIELD_DESCRIPTION_COUNT = 0
for message in fit_file.get_messages():
    if message.mesg_type.name == "field_description":
        FIELD_DESCRIPTION_COUNT += 1


def thread_function_sleeps_after_last_field_description(buf):
    fit_file = fitparse.FitFile(buf)
    field_description_count = 0
    for message in fit_file.get_messages():
        if message.mesg_type.name == "field_description":
            field_description_count += 1
            if field_description_count >= FIELD_DESCRIPTION_COUNT:
                # Sleep for a bit to wait for the other thread to initialize the developer_data_id
                time.sleep(1)


def thread_function_break_after_developer_data_id(buf):
    fit_file = fitparse.FitFile(buf)
    for message in fit_file.get_messages():
        if message.mesg_type.name == "developer_data_id":
            break


buf_copy = copy.deepcopy(buf)
thread_1 = threading.Thread(target=thread_function_sleeps_after_last_field_description, args=(buf_copy,))

buf_copy = copy.deepcopy(buf)
thread_2 = threading.Thread(target=thread_function_break_after_developer_data_id, args=(buf_copy,))

thread_1.start()
thread_2.start()

thread_1.join()
thread_2.join()

The full stacktrace of the exception this script raises is:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/home/aart/projects/python-fitparse/multithreading.py", line 28, in thread_function_sleeps_after_last_field_description
    for message in fit_file.get_messages():
  File "/home/aart/projects/python-fitparse/fitparse/base.py", line 470, in get_messages
    for message in super(CacheMixin, self).get_messages(names, with_definitions, as_dict):
  File "/home/aart/projects/python-fitparse/fitparse/base.py", line 440, in get_messages
    message = self._parse_message()
  File "/home/aart/projects/python-fitparse/fitparse/base.py", line 456, in _parse_message
    self._messages.append(super(CacheMixin, self)._parse_message())
  File "/home/aart/projects/python-fitparse/fitparse/base.py", line 154, in _parse_message
    message = self._parse_definition_message(header)
  File "/home/aart/projects/python-fitparse/fitparse/base.py", line 224, in _parse_definition_message
    field = get_dev_type(dev_data_index, field_def_num)
  File "/home/aart/projects/python-fitparse/fitparse/records.py", line 476, in get_dev_type
    raise FitParseError("No such field %s for dev_data_index %s" % (field_def_num, dev_data_index))
fitparse.utils.FitParseError: No such field 8 for dev_data_index 0

I have been working on a solution in my fork. Relevant commit here: AartGoossens@4c837b8
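The general shape of such a fix is to move the registry from module-level state into per-parser state, so concurrent FitFile instances cannot clobber each other. This is only a sketch of that idea under assumed names (FitFileLike and its methods are illustrative, not the code in the linked commit):

```python
# Sketch: key the developer-type registry per parser instance instead of
# per process, so concurrent parsers no longer share mutable state.
class FitFileLike:
    def __init__(self):
        self.dev_types = {}  # per-instance, not module-level

    def add_dev_data_id(self, dev_data_index):
        # Re-initializing here only affects this parser instance.
        self.dev_types[dev_data_index] = {"fields": {}}

    def add_dev_field_description(self, dev_data_index, field_def_num, name):
        self.dev_types[dev_data_index]["fields"][field_def_num] = name

    def get_dev_type(self, dev_data_index, field_def_num):
        return self.dev_types[dev_data_index]["fields"][field_def_num]

a, b = FitFileLike(), FitFileLike()
a.add_dev_data_id(0)
a.add_dev_field_description(0, 8, "heart_rate")
b.add_dev_data_id(0)  # parser b's re-initialization no longer erases a's fields
print(a.get_dev_type(0, 8))  # heart_rate
```

An alternative would be a lock around the shared global, but per-instance state avoids the contention entirely and matches the fact that developer field definitions are scoped to a single FIT file.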

I really do not want to put any pressure on merging my fix into this repo, and I am open to working on an alternative solution or creating a PR with the existing fix.

Existing fix looks good, if you want to submit a PR