drewnoakes / metadata-extractor-dotnet

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Add async support

lejsekt opened this issue · comments

What do you think about adding async method overloads? There is a lot of operations on streams that could benefit from that.

I've pondered this. It would require some consideration, and potentially heavy use of ValueTask<> to avoid excessive GC overhead.

@kwhopper have you given this any thought as part of your IO work in #250?

Not specifically, but it's certainly easier to deal with in a RAS structure; fewer classes and situations to deal with. I'd thought about adding interfaces to the new classes and only using those in the core library. Then the implementation could be swapped out or dynamically loaded, which could potentially play into this.

I have done a few things with async in web applications but am no expert. A caller would need to know very early whether to use async-specific methods. So for the library in general, wouldn't it need to be async (or not) all the way down? It seems a bit difficult to be one way or the other in compiled code.

@kwhopper I would like to be able to understand more what it explains about "a library must be complete asynchronous or not", because I have a lot of interest in contributing in this matter, this library was perfect in a project that I maintain, so the only thing that made it a bit difficult for me is having to work a synchronous method in an environment where I do a lot of I/O operations, a synchronous method is a super plus.

@oliver021 All I mean is a truly "asynchronous" pipeline has no parts that are synchronous in it...

https://docs.microsoft.com/en-us/archive/msdn-magazine/2013/march/async-await-best-practices-in-asynchronous-programming#async-all-the-way

I'm totally fine with exploring an asynchronous version of MetadataExtractor. The trick is making it async from start to finish all in one swoop, at least for the IO-intensive or blocking operations that matter. There can't be blocking, synchronous calls sprinkled in here or there or we're asking for random trouble. As the article indicates, there are real possibilities for deadlocks when the two are mixed inappropriately.

@kwhopper I completely agree with you, so doing an asynchronous method with Task.Run implies a false implementation of asynchronicity.

The fundamental idea is to take turns using the
threadpool I / O operations and non-I / O operations
that is, those that process or work with the result that the I / O operation brought,
to explain it better, let's imagine that we are reading a file sequentially
and extracting bytes from the reading which implies that when you already have the array
of bytes (buffer), whatever we do with that delay array
at the next reading (I / O operation), with an asynchronous flow we keep busy with the results of reading the file and another thread takes care to deliver the next result of reading the file,
but this does not mean that our operations are faster because they are asynchronous although what is streamlined by having the operations reading on another thread is slowed down by synchronization operations between threads
and the abstractions implied by the System.Threading.Tasks.Task, any benchmark
in .Net reveals that a single asynchronous operation with respect to its synchronous version
it is slower, asynchronicity is not speed, it is resistance and scalability.
Now using Task.Run or ThreadPool.QueueUserWorkItem we can fall into the
problem that you are pointing out because the thread that calls the method
asynchronous implemented with the two previous forms you will perceive that
the implementation is asynchronous, for the thread that the threadpool designates to execute
synchronous implementation runs a serious risk of deadlocking or threadpool will be forced to generate many threads for dozens of calls with this type of implementation or queuing too many calls causing disasters in performance without achieving scalability, in short the asynchronous code is business, we invest (that is, we lose speed) to achieve scalability