[Feature Request] Hashing prioritisation

Reinachan opened this issue a year ago · comments

Current behaviour seems random. However, people usually watch anime in sequential order, so it would make sense to prioritise hashing files sequentially by episode number when possible.

I suggest that for files with similar names where the only difference is a number, ShokoAnime should prioritise files with a lower number before those with higher numbers. If it's unable to determine the episode number, it should do things the way it's currently doing it.
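The rule proposed above can be sketched roughly as follows. This is only an illustration in Python, not Shoko code: the function name `episode_order` and the idea of masking digit runs to decide whether names are "similar" are assumptions made for the example.

```python
import re
from typing import List

_DIGITS = re.compile(r"\d+")


def episode_order(names: List[str]) -> List[str]:
    """Sort names by their single varying number, else keep the given order."""
    if len(names) < 2:
        return list(names)
    # Names count as "similar" when they are identical once digit runs are masked.
    templates = {_DIGITS.sub("#", name) for name in names}
    if len(templates) != 1:
        return list(names)  # fallback: keep the current (discovery) order
    # One row of numbers per name; all rows have the same length here.
    rows = [[int(d) for d in _DIGITS.findall(name)] for name in names]
    varying = [i for i in range(len(rows[0])) if len({row[i] for row in rows}) > 1]
    if len(varying) != 1:
        return list(names)  # zero or several varying numbers: cannot decide
    column = varying[0]
    order = sorted(range(len(names)), key=lambda i: rows[i][column])
    return [names[i] for i in order]
```

With names such as "Show - 03 [1080p].mkv", "Show - 01 [1080p].mkv" and "Show - 10 [1080p].mkv", only the episode number varies, so the list comes back as 01, 03, 10. Names that differ in anything other than one number (for example a per-file checksum tag) are returned in their original order.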

Mikal S. · Answer 1 · Sat Apr 15 2023 00:25:10 GMT+0800 (China Standard Time)

Context;

Shoko doesn't care about filenames at all.
The files are processed in the order they are discovered in (with some exceptions).

Mikal S. · Answer 2 · Sat Apr 15 2023 00:29:42 GMT+0800 (China Standard Time)

I'm not against adding a bit more "predictability" to the process, but I also don't see the benefit of adding this behaviour. Others on the team might see it differently though.

Nina Louise · Answer 3 · Sat Apr 15 2023 00:30:48 GMT+0800 (China Standard Time)

Context;

Shoko doesn't care about filenames at all.

The files are processed in the order they are discovered in (with some exceptions).

That's what I assumed. I had that issue with my fileserver when reconstructing chunked uploads, and ended up fetching the filenames first and then initialising the process of reconstructing the file.

I'd suggest doing something similar for Shoko. First grab the filenames, check for prioritisation, then run the hasher.

BigRetroMike · Answer 4 · Sat Apr 15 2023 00:33:43 GMT+0800 (China Standard Time)

The files are processed in the order they are discovered in (with some exceptions).

If that were true, wouldn't files from the same series be hashed in order, since they should be in the same directory (assuming they are)?

Mikal S. · Answer 5 · Sat Apr 15 2023 00:41:09 GMT+0800 (China Standard Time)

The files are processed in the order they are discovered in (with some exceptions).

If that were true, wouldn't files from the same series be hashed in order, since they should be in the same directory (assuming they are)?

Only if they are discovered in sequential order.

Nina Louise · Answer 6 · Sat Apr 15 2023 01:05:48 GMT+0800 (China Standard Time)

If that were true, wouldn't files from the same series be hashed in order, since they should be in the same directory (assuming they are)?
@bigretromike

Assuming C# (or whatever library) is using the same APIs under the hood as Rust does, that's not the case, no.

This function currently corresponds to the opendir function on Unix and the FindFirstFile function on Windows. Advancing the iterator currently corresponds to readdir on Unix and FindNextFile on Windows. [...]

The order in which this iterator returns entries is platform and filesystem dependent.
(source)

That said, this is only an issue when the server is receiving a directory, not when it receives individual files (like if you're downloading the episodes separately). Idk if those are distinguishable events for the server or not.
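The same caveat applies outside Rust. As an illustration in Python (not Shoko's C# code), `os.scandir` also yields entries in arbitrary order, so any ordering has to be imposed after listing. The helper name `list_files_sorted` is made up for this sketch.

```python
import os
from typing import List


def list_files_sorted(directory: str) -> List[str]:
    """List the regular files in a directory, then impose an explicit order."""
    with os.scandir(directory) as entries:
        # The order of entries here depends on the platform and filesystem.
        names = [entry.name for entry in entries if entry.is_file()]
    # Plain lexicographic sort: correct for zero-padded episode numbers only.
    return sorted(names)
```

Note that a plain sort puts "ep10" before "ep2", so it only matches episode order when the numbers are zero-padded.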

Cazzar · Answer 7 · Sat Apr 15 2023 01:29:16 GMT+0800 (China Standard Time)

Ultimately, I don't feel this is that viable without making file discovery stupidly slow. There is also the difference between the full file tree scan and the filesystem watcher. Once the commands are in the queue, they are typically processed in order of priority, then last updated, but that could change.

We don't have any sorting currently. To do that, we would need to load the entire import folder tree into memory before sorting, which quickly leads to poor performance in larger collections, and we already have a large memory footprint.
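For comparison, sorting does not have to be global. The sketch below (Python, purely illustrative, not how Shoko scans) sorts within each directory during a tree walk, so roughly one directory listing is sorted at a time instead of the whole tree. It gives sequential order inside a folder but no ordering across folders beyond the walk itself; `walk_sorted` is a hypothetical name.

```python
import os
from typing import Iterator


def walk_sorted(root: str) -> Iterator[str]:
    """Yield file paths under root, sorted within each directory."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting dirnames in place steers the traversal order of os.walk
        # (this works because topdown=True is the default).
        dirnames.sort()
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
```

Because this is a generator, a consumer can start hashing the first files before the rest of the tree has been visited.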

Maximo Piva · Answer 8 · Sat Apr 15 2023 02:19:28 GMT+0800 (China Standard Time)

Do you mean for the initial import or a forced rescan? After that, the system ingests files from file system watcher events. When the file system watcher detects new files, the import order is usually the order in which you store, copy, or move your files there. We cannot sort something that is not in the directory yet. The only exception is when you move an entire directory from the same physical location, which is almost immediate. Otherwise the system copies one file at a time, and every new file triggers the event and the import.

Nina Louise · Answer 9 · Sun Apr 16 2023 23:38:36 GMT+0800 (China Standard Time)

I primarily mean on filewatcher events when you put a directory into the import folder. The way I have things set up is that once an anime is fully downloaded, it'll hardlink the containing folder into the Shoko import folder.

I don't think this should be done on an initial import, nor on individual files placed into the import folder. Only when a directory with multiple anime in it is placed into the import folder. You could also make it optional.

Basically: on a filesystem directory event, read the entries in the directory, determine the sort order, and process them in that order.
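Those three steps could look something like the sketch below. It is written in Python for illustration only. `HashQueue`, `on_directory_added`, and `natural_key` are hypothetical names, and the queue (priority first, then insertion order) is a simplified stand-in rather than Shoko's actual command queue.

```python
import heapq
import itertools
import os
import re
from typing import List, Tuple


def natural_key(name: str) -> List[object]:
    """Sort key that compares digit runs as numbers, so ep2 comes before ep10."""
    parts = re.split(r"(\d+)", name)
    # re.split with a capture group puts the digit runs at odd indices.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


class HashQueue:
    """Minimal queue: lowest priority value first, then insertion order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()

    def push(self, path: str, priority: int = 0) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), path))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def on_directory_added(directory: str, queue: HashQueue) -> None:
    """Read the entries, determine the order, then enqueue in that order."""
    with os.scandir(directory) as entries:
        names = [entry.name for entry in entries if entry.is_file()]
    for name in sorted(names, key=natural_key):
        queue.push(os.path.join(directory, name))
```

This only helps when the files are already present at the time the directory event fires, as with a hardlinked or moved folder.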

As for memory footprint, I personally don't mind short spikes of increased memory. You can mark the setting as "potentially memory intensive during imports" if it turns out to be a problem.

da3dsoul · Answer 10 · Sun Apr 16 2023 23:54:19 GMT+0800 (China Standard Time)

That could be done, since a directory detection is distinct from a file detection.

Maximo Piva · Answer 11 · Mon Apr 17 2023 00:05:12 GMT+0800 (China Standard Time)

It could handle directory events. Your use case expects the directory to appear instantly, with its files, in the import location, which holds for a hard link or a directory move within the same physical location. That could be done. But if the user copies a directory into the import location, the files are copied one by one, and the import order will be the order in which the system copies them. If you're fine with that, I think it could be done.
