ShokoAnime / ShokoServer

Repository for Shoko Server.

Home Page: http://shokoanime.com/shoko-server/

[Feature Request] Hashing prioritisation

Reinachan opened this issue

The current behaviour seems random. However, people usually watch anime in sequential order, so it would make sense to prioritise hashing files sequentially by episode number where possible.

I suggest that, for files with similar names where the only difference is a number, ShokoAnime should prioritise files with lower numbers before those with higher numbers. If it can't determine the episode number, it should fall back to the current behaviour.
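A minimal sketch of the suggested rule, written in Python purely for illustration (Shoko is not written in Python, and split_name and prioritise are hypothetical names): split each filename into its text and its numeric runs, group files whose text is identical, and sort a group by the one number that varies between its members. Groups where no number varies, or where more than one does, keep their discovery order, which is the fallback the request describes.

```python
import re

# Capturing group so re.split keeps the digit runs in its result.
_DIGITS = re.compile(r"(\d+)")


def split_name(name: str) -> tuple[tuple[str, ...], tuple[int, ...]]:
    """Split a filename into its text parts and its numeric parts.

    "Show - 03 [1080p].mkv" -> (("Show - ", " [", "p].mkv"), (3, 1080))
    """
    parts = _DIGITS.split(name)
    return tuple(parts[0::2]), tuple(int(p) for p in parts[1::2])


def prioritise(discovered: list[str]) -> list[str]:
    """Hypothetical helper: lowest episode number first within each group.

    Files are grouped by their non-numeric text. If exactly one numeric
    position varies inside a group, it is treated as the episode number;
    otherwise the group keeps its discovery order (the fallback).
    """
    groups: dict[tuple[str, ...], list[tuple[tuple[int, ...], str]]] = {}
    for name in discovered:
        text, numbers = split_name(name)
        # dicts keep insertion order, so groups come out in order of first discovery
        groups.setdefault(text, []).append((numbers, name))

    ordered: list[str] = []
    for members in groups.values():
        width = len(members[0][0])  # identical text parts imply the same count of numbers
        varying = [i for i in range(width)
                   if len({numbers[i] for numbers, _ in members}) > 1]
        if len(varying) == 1:
            episode_pos = varying[0]
            # sorted() is stable, so files with equal numbers keep discovery order
            members = sorted(members, key=lambda m: m[0][episode_pos])
        ordered.extend(name for _, name in members)
    return ordered
```

One side effect of grouping is that an unrelated file discovered between two episodes is emitted after the whole group rather than in its original position; whether that matters is a design choice.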

Context:

  1. Shoko doesn't care about filenames at all.
  2. The files are processed in the order they are discovered in (with some exceptions).

I'm not against adding a bit more "predictability" to the process, but I also don't see the benefit of adding this behaviour. Others on the team might see it differently, though.

That's what I assumed. I ran into the same issue on my file server when reconstructing chunked uploads, and ended up fetching the filenames first and only then starting the reconstruction of the file.

I'd suggest doing something similar for Shoko: first grab the filenames, then work out the priority order, then run the hasher.
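The two-phase approach can be sketched like this, with hypothetical helpers rather than Shoko's code: list the names first without reading any file contents, decide the order, then hash. SHA-1 from hashlib is only a stand-in for whatever hashes Shoko actually computes, and prioritise defaults to a plain name sort so any ordering rule can be plugged in.

```python
import hashlib
import os
from typing import Callable

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large video files never sit fully in memory


def hash_file(path: str) -> str:
    """Stand-in hasher: SHA-1 of the file contents (not Shoko's actual hash set)."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_directory(directory: str,
                   prioritise: Callable[[list[str]], list[str]] = sorted,
                   ) -> list[tuple[str, str]]:
    # Phase 1: collect names only; no file contents are read yet.
    with os.scandir(directory) as entries:
        names = [entry.name for entry in entries if entry.is_file()]
    # Phase 2: decide the processing order up front.
    ordered = prioritise(names)
    # Phase 3: run the (expensive) hasher in that order.
    return [(name, hash_file(os.path.join(directory, name))) for name in ordered]
```

Listing names is cheap compared to hashing, so the extra pass costs little for a single downloaded folder.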

  2. The files are processed in the order they are discovered in (with some exceptions).

If that were true, wouldn't episodes of the same series be hashed in order, since they should be in the same directory (assuming they are)?

Only if they are discovered in sequential order.

@bigretromike

Assuming C# (or whichever library is involved) uses the same APIs under the hood as Rust does, that's not the case, no.

This function currently corresponds to the opendir function on Unix and the FindFirstFile function on Windows. Advancing the iterator currently corresponds to readdir on Unix and FindNextFile on Windows. [...]

The order in which this iterator returns entries is platform and filesystem dependent.
(source: Rust std::fs::read_dir documentation)

That said, this is only an issue when the server is receiving a directory, not when it receives individual files (for example, if you're downloading the episodes separately). I don't know whether those are distinguishable events for the server.
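Python makes the same non-promise as Rust here: the documentation for os.scandir and os.listdir says entries come back in arbitrary order, so any ordering has to be imposed after listing. A plain string sort is not enough on its own either, because "Ep 10" sorts before "Ep 9" unless numbers are zero-padded. This sketch uses a common natural-sort key; natural_key and list_in_stable_order are illustrative names.

```python
import os
import re


def natural_key(name: str) -> list:
    """Sort key that compares digit runs as numbers, so "Ep 9" < "Ep 10"."""
    # re.split with a capturing group alternates text, digits, text, ...,
    # so even indices are text and odd indices are digit runs.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]


def list_in_stable_order(directory: str) -> list[str]:
    # os.scandir, like Rust's read_dir, yields entries in arbitrary order,
    # so the order is imposed explicitly after listing.
    with os.scandir(directory) as entries:
        names = [entry.name for entry in entries]
    return sorted(names, key=natural_key)
```

Because text and number chunks always sit at the same positions, the key never compares a string against an integer.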

Ultimately, I don’t feel this is viable without making file discovery stupidly slow. There is also the difference between the full file tree scan and the filesystem watcher. Once the commands are in the queue, they are typically processed in order of priority, then last updated, but that could change.

We don’t currently do any sorting, because to do that we would need to load the entire import folder tree into memory first. That would quickly lead to poor performance in larger collections, and we already have a large memory footprint.
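Taking the description above at face value (priority first, then last-updated time), such a queue can be sketched with heapq. The field names and the convention that a lower priority value runs first are assumptions for illustration, not Shoko's actual schema. The sketch also shows why discovery order alone does not decide hashing order: a command queued later can still jump ahead if its priority is better.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(order=True)
class QueuedCommand:
    priority: int            # assumption: a lower value is processed sooner
    last_updated: datetime   # tie-breaker: older commands first
    seq: int                 # insertion counter so full ties keep enqueue order
    path: str = field(compare=False)


class CommandQueue:
    """Hypothetical queue ordered by (priority, last_updated)."""

    def __init__(self) -> None:
        self._heap: list[QueuedCommand] = []
        self._counter = itertools.count()

    def push(self, path: str, priority: int, last_updated: datetime) -> None:
        heapq.heappush(self._heap,
                       QueuedCommand(priority, last_updated, next(self._counter), path))

    def pop(self) -> str:
        return heapq.heappop(self._heap).path

    def __len__(self) -> int:
        return len(self._heap)
```

If every file from one directory were enqueued with the same priority and increasing timestamps in episode order, a queue like this would hand them back in that order.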

I primarily mean file watcher events triggered when you put a directory into the import folder. The way I have things set up, once an anime is fully downloaded, the containing folder gets hardlinked into the Shoko import folder.

I don't think this should be done on an initial import, nor on individual files placed into the import folder. Only when a directory with multiple anime in it is placed into the import folder. You could also make it optional.

Basically: on a directory filesystem event, read the entries in the directory, determine the sort order, and process them in that order.

As for memory footprint, I personally don't mind short spikes of increased memory. You can mark the setting as "potentially memory intensive during imports" if it turns out to be a problem.
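A rough sketch of the opt-in behaviour described above, with hypothetical names throughout (ImportWatcher, on_directory_created and sort_directory_events are not part of Shoko): a directory event lists only that directory's files, optionally sorts them, and enqueues them in that order, while single-file events are queued immediately as before. Memory use is bounded by one directory listing rather than the whole import folder tree. It only looks at the top level of the new directory, and the default plain string sort assumes zero-padded episode numbers such as 01 through 10.

```python
import os
from collections import deque
from typing import Callable, Optional


class ImportWatcher:
    """Hypothetical watcher callbacks; sorting only applies to whole-directory events."""

    def __init__(self, sort_directory_events: bool = True,
                 sort_key: Optional[Callable[[str], object]] = None) -> None:
        self.sort_directory_events = sort_directory_events  # the opt-in setting
        self.sort_key = sort_key  # None means a plain string sort of the paths
        self.queue = deque()  # paths waiting to be hashed, in processing order

    def on_file_created(self, path: str) -> None:
        # Individual files keep today's behaviour: queued as soon as they are seen.
        self.queue.append(path)

    def on_directory_created(self, path: str) -> None:
        # Only this directory's top-level listing is held in memory,
        # not the whole import folder tree.
        with os.scandir(path) as entries:
            files = [entry.path for entry in entries if entry.is_file()]
        if self.sort_directory_events:
            files.sort(key=self.sort_key)
        self.queue.extend(files)
```

With sort_directory_events turned off, a directory event behaves like a burst of file events in whatever order the filesystem returns them.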

That could be done, since a directory detection is distinct from a file detection.