Sorting variable %fn is not UTF8 normalized
mgutt opened this issue · comments
SABnzbd version
4.3.1
Operating system
Arch Linux (binhex Docker Container)
Using Docker image
Other
Description
How the problem occurred
After pasting a name into the name field of the NZB upload form, the downloaded file was renamed as follows (special characters shown as hex bytes):
/ M C3 B6 v i e ( 1 9 5 0 ) / M 6F CC 88 v i e . 1 9 5 0 . 4 8 0 p . m k v
The renaming is based on this SABnzbd sorting rule:
%title (%y)/%fn.%ext
This means "%title" uses the precomposed (NFC) form of "ö" (C3 B6), while "%fn" uses the decomposed (NFD) form: "o" followed by a combining diaeresis (6F CC 88). I don't know whether the string was already decomposed when I pasted it or whether something inside SABnzbd caused it. More information about the two representations:
https://stackoverflow.com/questions/12147410/different-utf-8-signature-for-same-diacritics-umlauts-2-binary-ways-to-write
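To make the difference concrete, here is a minimal Python sketch that reproduces the two byte sequences above using only the standard library. The variable names are just for illustration; nothing here is taken from SABnzbd's code.

```python
import unicodedata

title = "M\u00f6vie"   # precomposed "ö" (U+00F6), the NFC form
fn = "Mo\u0308vie"     # "o" followed by combining diaeresis (U+0308), the NFD form

# Both strings look identical on screen but encode to different bytes.
print(title.encode("utf-8").hex(" "))  # 4d c3 b6 76 69 65
print(fn.encode("utf-8").hex(" "))     # 4d 6f cc 88 76 69 65

# A plain comparison treats them as different names.
print(title == fn)  # False

# Normalizing the decomposed string to NFC makes them equal.
print(unicodedata.normalize("NFC", fn) == title)  # True
```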
I renamed this file manually as follows:
cd "/Mövie (1950)"
find . -type f -exec sh -c 'mv "$1" "$(printf "%s" "$1" | uconv -x any-nfc)"' _ {} \;
Why was this a problem?
Nextcloud does not accept the non-normalized (decomposed) UTF-8 representation:
docker exec --user 99 nextcloud php occ files:scan --path="/jogi/files/Movie"
Starting scan for user 1 out of 1 (jogi)
Entry "Mövie (1950)/Mövie.1950.480p.mkv" will not be accessible due to incompatible encoding
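To find affected entries before running a scan, a small checker along these lines might help. It is only a sketch: `find_non_nfc` is a hypothetical helper name, and it assumes that "incompatible encoding" here means "not in NFC form", which matches this case but may not cover every reason Nextcloud rejects a name. It needs Python 3.8 or later for `unicodedata.is_normalized`.

```python
import os
import unicodedata


def find_non_nfc(root):
    """Yield paths under root whose file or directory name is not in NFC form."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # is_normalized() only checks; it does not change anything on disk.
            if not unicodedata.is_normalized("NFC", name):
                yield os.path.join(dirpath, name)
```

For example, `for path in find_non_nfc("/path/to/files"): print(path)` would list the candidates without renaming anything.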
So it's not really a bug. It only has nasty side effects in special scenarios.
Question
Is it possible to influence the "%fn" variable with a pre- or post-processing script? Maybe something like this?
export SAB_FILENAME=$(echo "$SAB_FILENAME" | uconv -x any-nfc)
EDIT: This does not work, because the non-normalized filename seems to be the one stored inside the archive.
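Since changing the variable has no effect, one possible workaround is to rename the files on disk after the job has finished. The following is a rough sketch of a post-processing script that does this. It assumes SABnzbd passes the final job directory as the first command-line argument and that the script runs after sorting has been applied; both should be checked against the SABnzbd documentation for the version in use. `normalize_tree` is a hypothetical helper name, not part of SABnzbd.

```python
#!/usr/bin/env python3
import os
import sys
import unicodedata


def normalize_tree(root):
    """Rename every entry below root to its NFC form.

    Returns a list of (old_path, new_path) pairs. The root directory
    itself is not renamed.
    """
    renamed = []
    # Walk bottom-up so children are renamed before their parent directory.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            nfc = unicodedata.normalize("NFC", name)
            if nfc == name:
                continue  # already normalized
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, nfc)
            if os.path.exists(dst):
                continue  # never overwrite an existing entry
            os.rename(src, dst)
            renamed.append((src, dst))
    return renamed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumption: argv[1] is the final job directory.
    for old, new in normalize_tree(sys.argv[1]):
        print(f"renamed {old!r} -> {new!r}")
```

Skipping entries whose NFC name already exists avoids clobbering a file when both forms are present in the same folder; those cases would need manual attention.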
It sounds like you're not setting the locale correctly for your environment.
Run locale -a
and see what you're using. If it's not UTF-8, you probably need to set the locale environment variables.
You can also try to guard against some of this by enabling:
Config > Switches > Make Windows compatible