sabnzbd / sabnzbd

SABnzbd - The automated Usenet download tool

Home Page: http://sabnzbd.org

Sorting variable %fn is not UTF8 normalized

mgutt opened this issue · comments

SABnzbd version

4.3.1

Operating system

Arch Linux (binhex Docker Container)

Using Docker image

Other

Description

How the problem occurred
After I copied and pasted a name into the name field of the NZB upload form, the file was renamed after download as follows (special characters shown in hex):

/ M C3 B6 v i e  ( 1 9 5 0 ) / M 6F CC 88 v i e . 1 9 5 0 . 4 8 0 p . m k v

The renaming is based on this Sabnzbd sorter rule:

%title (%y)/%fn.%ext

This means "%title" uses the normalized (precomposed, NFC) UTF-8 representation of "ö" (C3 B6), while "%fn" uses the non-normalized (decomposed) one: "o" followed by a combining diaeresis (6F CC 88). I don't know whether the string was already non-normalized when I pasted it or whether something inside SABnzbd caused it. More information about the two representations:
https://stackoverflow.com/questions/12147410/different-utf-8-signature-for-same-diacritics-umlauts-2-binary-ways-to-write
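To make the two byte sequences above concrete, here is a minimal Python sketch using only the standard library's unicodedata module. The helper name hex_bytes is made up for illustration and is not part of SABnzbd.

```python
import unicodedata

# "Mövie" written with the precomposed character ö (U+00F6)
name = "M\u00f6vie"

nfc = unicodedata.normalize("NFC", name)  # composed form
nfd = unicodedata.normalize("NFD", name)  # decomposed form: "o" + U+0308


def hex_bytes(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


print(hex_bytes(nfc))  # 4D C3 B6 76 69 65
print(hex_bytes(nfd))  # 4D 6F CC 88 76 69 65
print(nfc == nfd)      # False: the two strings look the same but compare unequal
```

Both strings render as "Mövie", but any byte-wise comparison (or a tool that insists on NFC) treats them as different names.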

I renamed this file manually as follows:

cd "/Mövie (1950)"
find . -type f -exec sh -c 'mv -- "$1" "$(printf "%s" "$1" | uconv -x any-nfc)"' sh {} \;
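For systems without uconv, the same manual cleanup could be sketched in Python. This is only an illustration under assumptions: the function name normalize_tree is hypothetical, it renames regular files only (like the find command above), and it skips a file if the NFC name already exists rather than overwriting it.

```python
import os
import unicodedata


def normalize_tree(root: str, form: str = "NFC") -> list[tuple[str, str]]:
    """Rename every file under root whose name is not in the given
    Unicode normalization form. Returns the (old, new) path pairs."""
    renamed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            target = unicodedata.normalize(form, name)
            if target == name:
                continue  # already normalized, nothing to do
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, target)
            if os.path.lexists(dst):
                continue  # never overwrite an existing file
            os.rename(src, dst)
            renamed.append((src, dst))
    return renamed


if __name__ == "__main__":
    # Normalize the current directory, mirroring "find . -type f ..."
    for old, new in normalize_tree("."):
        print(f"{old!r} -> {new!r}")
```

Directory names are left alone here; in the case above the directory came from "%title" and was already normalized.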

Why was this a problem?
Nextcloud does not support the non-normalized UTF8 representation:

docker exec --user 99 nextcloud php occ files:scan --path="/jogi/files/Movie"
Starting scan for user 1 out of 1 (jogi)
        Entry "Mövie (1950)/Mövie.1950.480p.mkv" will not be accessible due to incompatible encoding
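To find affected entries before running a scan, a small check like the following could help. It is a sketch that assumes the "incompatible encoding" message refers to names that are not NFC-normalized, as the description above suggests; the function name find_non_nfc is made up, and unicodedata.is_normalized requires Python 3.8 or later.

```python
import os
import unicodedata


def find_non_nfc(root: str) -> list[str]:
    """Return paths under root whose file or directory name is not NFC."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not unicodedata.is_normalized("NFC", name):
                hits.append(os.path.join(dirpath, name))
    return hits


if __name__ == "__main__":
    for path in find_non_nfc("."):
        # repr() makes the combining characters visible as escapes
        print(repr(path))
```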

So it's not really a bug. It only has nasty side effects in special scenarios.

Question
Is it possible to influence the "%fn" variable with a pre- or post-processing script? Maybe something like this?

export SAB_FILENAME=$(echo "$SAB_FILENAME" | uconv -x any-nfc)

EDIT: This does not work, because the non-normalized filename seems to be the one inside the archive.

Sounds like you're not setting the locale correctly for your environment?

Run locale -a and see what you're using. If it's not UTF-8, you probably need to set the locale environment variables.

Also you can try and guard against some stuff by using:
Config > switches > Make Windows compatible

I can imagine there's a difference, because %fn is also based on the disk listing, while %title is based on parsing.
We'd really have to do a deep dive to see where to handle this.

We also have related problems in other parts: #1633