rdicosmo / parmap

Parmap is a minimalistic library that lets OCaml programs exploit multicore architectures with minimal modifications.

Home Page: http://rdicosmo.github.io/parmap/

do you want parallel file operations?

UnixJunkie opened this issue

I have this one currently:

let parmap_on_file (ncores: int) (fn: string) (f: 'a -> 'b) (read_one: in_channel -> 'a): 'b list = ...

Or, I wonder if I should create a separate library depending on parmap ...

I will create a separate library if I gather enough interesting primitives.

I think such a function is quite useful. I'd like to contribute it to parmap.
Here is the current signature:

let parmap_on_file
    (ncores: int)
    (fn: filename)
    (f: 'a -> 'b)
    (read_one: in_channel -> 'a): 'b list
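For illustration only, here is a minimal sketch of how a function with this shape could behave. It is not the proposed implementation. It assumes `read_one` raises `End_of_file` when the input is exhausted, and it takes the mapper as a hypothetical `?pmap` argument (defaulting to the sequential `List.map`) instead of `ncores`, so that the block needs nothing beyond the standard library. The names `read_all` and `map_on_file` are made up for this sketch.

```ocaml
(* Hypothetical helper: read items until [read_one] raises End_of_file.
   Items are returned in file order. *)
let read_all (read_one: in_channel -> 'a) (ic: in_channel): 'a list =
  let rec loop acc =
    match read_one ic with
    | x -> loop (x :: acc)
    | exception End_of_file -> List.rev acc
  in
  loop []

(* Sketch of the proposed shape: read every item from file [fn], then map
   [f] over the items with [pmap]. [pmap] is an assumption of this sketch;
   it defaults to the sequential List.map. *)
let map_on_file
    ?(pmap = List.map)
    (fn: string)
    (f: 'a -> 'b)
    (read_one: in_channel -> 'a): 'b list =
  let ic = open_in fn in
  let items =
    (* Close the channel even if [read_one] fails with another exception. *)
    Fun.protect ~finally:(fun () -> close_in_noerr ic)
      (fun () -> read_all read_one ic)
  in
  pmap f items
```

With parmap installed, passing something like `~pmap:(fun f l -> Parmap.parmap ~ncores f (Parmap.L l))` should make the map step parallel. Note that this sketch loads every item into memory before mapping.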

If deemed useful, we can probably add companion functions later, such as pariter_on_file,
parmap_fold_on_file, etc.

Let me know if you have a better interface to propose.

This is the second time I have needed such functionality in a project, so I guess it could be quite
useful to other parmap users as well.
I do chemoinformatics, but I guess bioinformatics people might have similar needs.

Regards,
Francois.

I haven't used parmap in a while, so my opinion is not useful at this time.

@UnixJunkie that sounds useful when we want to have only ncores items at once in memory.
A more general version would use any stream-like input: unit -> 'a option.
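A rough sketch of that stream-based variant, under stated assumptions: `None` marks the end of the stream, inputs are pulled in batches so that at most `batch` of them are held at once, and `?pmap` is a hypothetical stand-in for a parallel map (sequential `List.map` by default). All names here are invented for the example.

```ocaml
(* Turn a channel reader into a stream: None marks the end of input.
   Assumes [read_one] raises End_of_file when nothing is left. *)
let stream_of_channel (read_one: in_channel -> 'a) (ic: in_channel)
  : unit -> 'a option =
  fun () ->
    match read_one ic with
    | x -> Some x
    | exception End_of_file -> None

(* Pull at most [n] items from the stream, preserving order. *)
let take (n: int) (next: unit -> 'a option): 'a list =
  let rec loop k acc =
    if k = 0 then List.rev acc
    else
      match next () with
      | None -> List.rev acc
      | Some x -> loop (k - 1) (x :: acc)
  in
  loop n []

(* Map [f] over the stream one batch at a time. Only the inputs of the
   current batch are held; the outputs still accumulate in a list. *)
let map_stream
    ?(pmap = List.map)
    ~(batch: int)
    (f: 'a -> 'b)
    (next: unit -> 'a option): 'b list =
  let rec loop acc =
    match take batch next with
    | [] -> List.rev acc
    | items -> loop (List.rev_append (pmap f items) acc)
  in
  loop []
```

For a file, `map_stream ~batch:ncores f (stream_of_channel read_one ic)` would give roughly the behaviour described above, with the caveat that results are still collected into one list.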

PS: I haven't done any "analysis-level" bioinformatics in a long while though :)

@smondet Is the option just used to send the end of file info via a None?

Maybe the most generic construct is:
let parallelize
    (ncores: int)
    (demux: unit -> 'a)
    (work: 'a -> 'b)
    (mux: 'b -> unit): unit
but then that's so generic that it should reside out of parmap.
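To make the intended roles concrete, here is a sequential reference sketch of that shape. It only pins down the semantics; it does no parallel work. Two assumptions not stated in the thread: `demux` signals the end of input by raising `End_of_file` (the signature has no option type to carry that), and the core count is accepted but ignored.

```ocaml
(* Sequential reference for the demux/work/mux shape.
   Assumption: [demux] raises End_of_file when the input is exhausted.
   [_ncores] is ignored here; a parallel version would run [work] in that
   many workers while keeping [demux] and [mux] in a single process. *)
let parallelize
    (_ncores: int)
    (demux: unit -> 'a)
    (work: 'a -> 'b)
    (mux: 'b -> unit): unit =
  let rec loop () =
    match demux () with
    | x -> mux (work x); loop ()
    | exception End_of_file -> ()
  in
  loop ()
```

Only an `End_of_file` raised by `demux` itself stops the loop; one raised inside `work` or `mux` propagates to the caller.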

@UnixJunkie Yes, "End of Stream" actually 👍

parany can be used for that