not able to read heavily fragmented TDMS (v1.6.0)

philippbaumm opened this issue 2 years ago · comments

Hey everyone!
I wanted to try out the new defragment method on my files, but it seems the module is not able to read them in the first place:

OS: Win10
Python 3.10.4

Traceback (most recent call last):
File "", line 474, in
tdms_file = TdmsFile.read(fpath)  # same error with tdms_file = TdmsFile.open(fpath)
File "\nptdms\tdms.py", line 131, in __init__
self._read_file(
File "\nptdms\tdms.py", line 293, in _read_file
self._read_data(tdms_reader)
File "\nptdms\tdms.py", line 305, in _read_data
for chunk in tdms_reader.read_raw_data():
File "\nptdms\reader.py", line 144, in read_raw_data
self._verify_segment_start(segment)
File "\nptdms\reader.py", line 344, in _verify_segment_start
raise ValueError(
ValueError: Attempted to read data segment at position 380796024 but did not find segment start header. Check that the tdms_index file matches the tdms data file.

I can provide the following files via email if desired:

fragmented.tdms (363 MB)
fragmented.tdms_index (207 MB)
defragmented_by_labview.tdms (155 MB)
defragmented_by_labview.tdms_index (38.8 KB)
----> 7zipped 77.4 MB

The tool I currently use for defragmentation is a GUI application, so sadly it does not fit into my planned automation process.

Regards, Philipp

Marcel Eifert · Answer 1 · Mon Sep 05 2022 15:16:36 GMT+0800 (China Standard Time)

Hello Philipp,

It seems to be a problem with trying to read the index file while reading the data. Can you read the data file with nptdms.TdmsFile(filepath) or do you get the same error with that? Does the error also occur when there is no .tdms_index present?

The index file is not needed for defragmentation. All the function does is take the data file, read all the data and move it into single segments. If an index file is requested, it takes the resulting byte string, replaces b"TDSm" with b"TDSh" and throws away the raw data. If defragmentation is always carried out right after the data is created, it may be preferable to delete the index file beforehand, or not to create one in the first place.

With the following code snippet you can defragment a file on the filesystem and store the resulting data and index files with the suffix _defragmented.

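import nptdms
from io import BytesIO

# The path must end in ".tdms", otherwise the replace() below changes nothing
# and the output would overwrite the input file.
filepath = "path/to/your/file.tdms"

# Defragment into in-memory buffers, leaving the source file untouched.
with open(filepath, "rb") as f:
    data_stream = BytesIO()
    index_stream = BytesIO()
    nptdms.TdmsWriter.defragment(f, data_stream, index_file=index_stream)

# Store the defragmented data next to the original file.
defragmented_path = filepath.replace(".tdms", "_defragmented.tdms")
with open(defragmented_path, "wb") as f:
    f.write(data_stream.getvalue())

# The index file is named like the data file plus "_index".
with open(defragmented_path + "_index", "wb") as f:
    f.write(index_stream.getvalue())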
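
As a side note to the explanation above, the two lead-in tags can also be checked by hand. The following is only a minimal sketch and not part of npTDMS: the helper names read_tag and has_segment_start are made up for illustration, and the only assumption is the one stated above, namely that segments in a data file start with the bytes TDSm and segments in an index file with TDSh. It reads four bytes at a given offset, so it can also be pointed at the position reported in the ValueError.

```python
DATA_TAG = b"TDSm"   # lead-in tag of a segment in a .tdms data file
INDEX_TAG = b"TDSh"  # lead-in tag of a segment in a .tdms_index file


def read_tag(path, position=0):
    """Return the 4 bytes found at byte offset `position` of the file."""
    with open(path, "rb") as f:
        f.seek(position)
        return f.read(4)


def has_segment_start(path, position=0, expected=DATA_TAG):
    """Check whether the expected lead-in tag sits at `position`."""
    return read_tag(path, position) == expected
```

For example, has_segment_start("fragmented.tdms", 380796024) would tell whether the data file has a segment header at the position from the traceback, and has_segment_start("fragmented.tdms_index", 0, expected=INDEX_TAG) would check the start of the index file.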

Philipp Baumm · Answer 2 · Mon Sep 05 2022 16:24:15 GMT+0800 (China Standard Time)

Hello Eifi1,
TdmsFile(), .read() and .open() do not work if an index file is present, but they all work once I remove it.
The defragment method then works too, and so does the code you provided!

https://www.ni.com/docs/en-US/bundle/labview/page/glang/tdms_file_open.html
The .tdms_index file is optional in TDMS applications. When you distribute a TDMS application or .tdms file to another computer, you do not need to include the corresponding .tdms_index file. You can use this function to create a new .tdms_index file for your TDMS application if necessary.
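
Since the index file is optional, one way to work around a mismatching index is to move it out of the way before reading. This is a minimal sketch, assuming the index file sits next to the data file under the same name plus _index (as with fragmented.tdms and fragmented.tdms_index above). The helper name set_index_aside and the .bak suffix are hypothetical and not part of npTDMS.

```python
from pathlib import Path


def set_index_aside(tdms_path, suffix=".bak"):
    """Rename '<file>.tdms_index' to '<file>.tdms_index.bak' if it exists.

    Returns the new path of the index file, or None if there was none.
    The data file itself is never touched.
    """
    index_path = Path(str(tdms_path) + "_index")
    if not index_path.exists():
        return None
    backup_path = index_path.with_name(index_path.name + suffix)
    index_path.replace(backup_path)  # overwrites an older backup, if any
    return backup_path


# Intended use (requires npTDMS, so it is left commented out here):
# from nptdms import TdmsFile
# set_index_aside("fragmented.tdms")
# tdms_file = TdmsFile.read("fragmented.tdms")
```

Renaming instead of deleting keeps the original index around in case it is still needed for comparison.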

Thank you for clarifying this and for the super fast help!!