Serial-ATA / lofty-rs

Reproducer

Parsing fails with BadFrameId errors.

Summary

According to http://id3.org/id3v2.4.0-frames, ID3v2.4 uses the null character to separate multiple values, which are allowed in all text information frames (frame IDs beginning with T, such as TPE1 and TCOM).

Expected behavior

t.b.d.

Might require API changes to handle those multiple values properly.

A lossy implementation that is compatible with the current API could only read the first, non-empty string and silently discard all subsequent strings. While this might be suitable for an application (Example: Mixxx), it is inappropriate for a general purpose library.
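To make the difference concrete, here is a minimal sketch of both approaches on already-decoded frame content. The helper names `all_values` and `first_value` are made up for illustration and are not part of lofty's API:

```rust
/// Return every non-empty value of a null-separated text frame.
/// Hypothetical helper for illustration; not part of lofty's API.
fn all_values(content: &str) -> Vec<&str> {
    content.split('\0').filter(|v| !v.is_empty()).collect()
}

/// The lossy variant: keep only the first non-empty value and
/// silently drop everything after it.
fn first_value(content: &str) -> Option<&str> {
    content.split('\0').find(|v| !v.is_empty())
}
```

For content such as "Artist A\0Artist B", the lossy variant would report only "Artist A", while the first helper keeps both values.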

Assets

ID3v2.4 example:
txxx_utf16_multi_value_id3v24.zip

@Serial-ATA v0.16.1 could be released before fixing this bug. It is a known issue that affects all previous versions.

I've looked into it, and the issue is that we simply don't parse UTF-16 values correctly.

The first issue is that it stops on a null terminator no matter what:

lofty-rs/src/util/text.rs, line 181 in b87afe4:

[0, 0] => None,
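As a rough illustration of why that match arm truncates multi-valued frames, the sketch below mimics a reader that stops at the first `[0, 0]` pair. This is not the actual lofty code; `read_until_null` is a hypothetical name and little-endian byte order is assumed:

```rust
/// Collect UTF-16 code units until the first [0, 0] pair.
/// Hypothetical reader that mimics the "stop at the terminator" behaviour;
/// little-endian byte order is assumed for simplicity.
fn read_until_null(bytes: &[u8]) -> Vec<u16> {
    bytes
        .chunks_exact(2)
        .map(|pair| [pair[0], pair[1]])
        // Everything after the first null code unit is never looked at.
        .take_while(|pair| *pair != [0, 0])
        .map(u16::from_le_bytes)
        .collect()
}
```

Given the bytes for "A", a null separator, and "B", such a reader returns only the code units of "A", so the second value is lost before any splitting can happen.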

And secondly, the tag you had actually encoded the strings properly, which I've never seen before. Normally a UTF-16 encoded frame has its BOM specified in only one of the values and the rest are just meant to be inferred. Your tag actually has a BOM for every value, which simply isn't handled.

When handling multiple values, we retain all of the null separators, treating the frame content as one big string, and simply splitting/replacing the separators in the background. This means that there shouldn't have to be any API changes, rather we just have to strip the BOM(s) and (of course) stop halting the reader at the null terminator.
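A minimal sketch of what that could look like, assuming the frame content arrives as raw bytes. `decode_utf16_multi` and `Endian` are hypothetical names, not lofty's actual types, and the little-endian default for a missing leading BOM is my assumption:

```rust
/// Byte order of a UTF-16 value.
#[derive(Clone, Copy)]
enum Endian {
    Little,
    Big,
}

/// Decode a null-separated list of UTF-16 strings.
/// Hypothetical helper for illustration; not part of lofty's API.
///
/// Each value may start with its own BOM. A value without a BOM inherits
/// the byte order of the previous value (little-endian is assumed if the
/// very first value has none).
fn decode_utf16_multi(bytes: &[u8]) -> Result<Vec<String>, std::string::FromUtf16Error> {
    let mut endian = Endian::Little;
    let mut values = Vec::new();
    let mut units: Vec<u16> = Vec::new();
    let mut at_value_start = true;

    for pair in bytes.chunks_exact(2) {
        let raw = [pair[0], pair[1]];
        if at_value_start {
            at_value_start = false;
            // A BOM only selects the byte order; it is not part of the text.
            match raw {
                [0xFF, 0xFE] => {
                    endian = Endian::Little;
                    continue;
                }
                [0xFE, 0xFF] => {
                    endian = Endian::Big;
                    continue;
                }
                _ => {}
            }
        }
        let unit = match endian {
            Endian::Little => u16::from_le_bytes(raw),
            Endian::Big => u16::from_be_bytes(raw),
        };
        if unit == 0 {
            // A null code unit separates values; it does not end the frame.
            values.push(String::from_utf16(&units)?);
            units.clear();
            at_value_start = true;
        } else {
            units.push(unit);
        }
    }
    // Keep the last value when the frame has no trailing terminator.
    if !units.is_empty() {
        values.push(String::from_utf16(&units)?);
    }
    Ok(values)
}
```

This handles both layouts mentioned above: a single BOM on the first value with the rest inferred, and a BOM on every value. A trailing odd byte is ignored by `chunks_exact`; a real implementation would probably want to report that as an error instead.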

And secondly, the tag you had actually encoded the strings properly, which I've never seen before. Normally a UTF-16 encoded frame has its BOM specified in only one of the values and the rest are just meant to be inferred. Your tag actually has a BOM for every value, which simply isn't handled.

The repeated BOM at the start of each substring in the attached example is indeed uncommon and could be considered an error. But some application must have created it. It is not unlikely that others will stumble over it if lofty is adopted more widely. Unfortunately, I am not aware of the actual origin of this file.

ID3v2.4: Parsing multi-valued UTF-16 text fields fails
