balsoft / lambda-launcher

Application launcher in Haskell. Mostly Just For Fun.

Get the results in parallel using Streaming

balsoft opened this issue

What is unsafeInterleaveIO even supposed to do in the code? From what I could determine by just reading the code, it should not make any difference if you just remove it. In updateResults you delay the IO action with unsafeInterleaveIO and then immediately force it with a case expression:

updateResults :: Text -> [IO [Result]] -> IO (Maybe Event)
updateResults _ [] = return Nothing
updateResults q (result:results) = do
  first <- unsafeInterleaveIO $ optional $ result
  case first of
    Nothing -> updateResults q results
    Just f -> return $ Just $ ResultAdded q f results

If you want to delay the partial computation of each element in the list, you would have to call unsafeInterleaveIO repeatedly, once for each element you generate. See, for example, the source code of readFile, which recursively calls lazyRead; each call starts with an unsafeInterleaveIO, one per chunk of data it returns.
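To make that concrete, here is a minimal sketch (not code from this repo; `lazySequence` is a made-up name) of what "calling unsafeInterleaveIO repeatedly" looks like: a lazy variant of sequence where each action runs only when its element is demanded, mirroring the lazyRead pattern in readFile.

```haskell
import System.IO.Unsafe (unsafeInterleaveIO)

-- Lazily sequence a list of IO actions: wrapping every cons cell in
-- unsafeInterleaveIO means an action runs only when that element of
-- the result list is forced, not when lazySequence itself is run.
lazySequence :: [IO a] -> IO [a]
lazySequence []         = return []
lazySequence (act:acts) = unsafeInterleaveIO $ do
  x  <- act                 -- runs when this cons cell is demanded
  xs <- lazySequence acts   -- recursion is itself delayed again
  return (x : xs)
```

Forcing only the head of the returned list runs only the first action; the rest stay suspended until consumed.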

So, for now I suggest either simply removing it or changing it so it actually does something.

@anka-213 I removed unsafeInterleaveIO for now.

So what exactly do you want it to do? Run each plugin in parallel? Delay getting results from a plugin until they are needed? If it is the first, streaming/pipes/conduit alone is not sufficient; you will need to run the plugins in separate threads.

I suggest using async which has some nice primitives for concurrency and waiting for multiple results, especially mapConcurrently. Otherwise, it is fairly easy to implement yourself using forkIO and for example MVars.
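A hedged sketch of the mapConcurrently approach (`queryAll` and `demo` are hypothetical names, not from the repo), assuming the async package is available:

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (mapConcurrently)

-- Run every action in its own thread and wait for all of them.
-- mapConcurrently returns the results in the input list's order,
-- regardless of which thread finishes first.
queryAll :: [IO a] -> IO [a]
queryAll = mapConcurrently id

demo :: IO ()
demo = do
  rs <- queryAll [n <$ threadDelay (n * 100000) | n <- [3, 2, 1 :: Int]]
  print rs   -- input order preserved: [3,2,1]
```

Note that this waits for the slowest action before returning anything, so it parallelizes but does not stream results as they arrive.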

The problem is that I'm not aware of any out-of-the-box solution of type [IO a] -> IO [a] that returns the list "sorted" by the time at which each IO action finished, so that evaluating the list element by element (the way I'm doing it right now) makes search results appear one by one as they arrive.

If you still want the type signature [IO a] -> IO [a], you would indeed need to use unsafeInterleaveIO together with some method for racing the threads, for example a queue. Otherwise you could return a Stream or something similar, or simply hand back the queue together with the values.

I think the easiest solution for the racing would be to spawn all the threads in parallel (and, if needed, limit the amount of parallelism with a semaphore) and then, when a thread is done, have it add its result to a concurrent queue (e.g. TQueue). Then you can consume the queue at whatever pace you want. You'll also need some kind of counter to know when all threads are done. Also, don't forget to fully evaluate each value before putting it into the queue.
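The queue design above can be sketched as follows (a minimal illustration, not the repo's code; `raceToQueue` is a made-up name, and it assumes the stm and deepseq packages). Here the fixed read count `length acts` plays the role of the completion counter:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM (atomically, newTQueueIO, readTQueue, writeTQueue)
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import Control.Monad (forM_, replicateM)

-- Fork one thread per action; each thread fully evaluates its result
-- and pushes it onto a shared TQueue. Reading exactly `length acts`
-- items yields the results in completion order.
raceToQueue :: NFData a => [IO a] -> IO [a]
raceToQueue acts = do
  queue <- newTQueueIO
  forM_ acts $ \act -> forkIO $ do
    r  <- act
    r' <- evaluate (force r)            -- fully evaluate before queueing
    atomically (writeTQueue queue r')
  replicateM (length acts) (atomically (readTQueue queue))
```

This version blocks until it has collected everything; combining it with unsafeInterleaveIO, as described above, would let a consumer pull results one by one as they arrive.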

Does this seem sensible to you? If so, I might try implementing something like this if I get bored.

This seems absolutely sensible, and it is indeed what I intended to do with the unsafeInterleaveIO call. I just didn't think it through, because the main part of this was written in a couple of hours as an experiment with gi-gtk-declarative.

If you wish to implement that, please do! It might also be nice to have it as a separate library so that others can use it.

@anka-213 Hi! Would you be willing to check out the latest commit with "Honest parallelism" and tell me if it seems alright? I implemented parallel execution with forkIO and Chan. Seems to work, but maybe it doesn't.

I looked at the code and I think it looks good. I left a minor comment in the commit.

I haven't tried running it, but if you want to test that the parallelism works you can add an artificial delay (with threadDelay) in several of the plugins and check that it doesn't take longer to complete than the slowest plugin.
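The timing check described above might look like this (a sketch under assumptions: `timingCheck` is a made-up name, and mapConcurrently_ from async stands in for the launcher's own forkIO/Chan code). Three fake plugins each sleep one second; if they really run in parallel, the whole batch takes about one second rather than three:

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (mapConcurrently_)
import Data.Time.Clock (diffUTCTime, getCurrentTime)

-- Run three one-second "plugins" concurrently and return the elapsed
-- wall-clock time in seconds.
timingCheck :: IO Double
timingCheck = do
  start <- getCurrentTime
  mapConcurrently_ (\_ -> threadDelay 1000000) [1, 2, 3 :: Int]
  end <- getCurrentTime
  return (realToFrac (diffUTCTime end start))
```

An elapsed time close to one second confirms the plugins are not running sequentially.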

Parallelism seems to work, I was just worried if there's some terrible leak or something. I'll rewrite it with mapM_, thanks.

Nope, nothing terrible. It should be perfectly fine now. :) (And even without the mapM_ it would have been perfectly fine, just very very marginally less efficient)

A bit late, but I recently discovered that the library streamly does exactly what we want here. It has a really nice interface, similar to Streaming. For example, this program performs a number of actions asynchronously and forwards the results as soon as they arrive:

import Streamly
import qualified Streamly.Prelude as S
import Control.Concurrent (threadDelay)
import Data.Function ((&))

main = S.fromList [5,4..1] 
     & S.mapM (\n -> n <$ threadDelay (n * 1000000))
     & asyncly
     & S.mapM_ print

There is a tutorial here: https://hackage.haskell.org/package/streamly-0.7.2/docs/Streamly-Tutorial.html