do you want parallel file operations?
UnixJunkie opened this issue
I have this one currently:
let parmap_on_file (ncores: int) (fn: string) (f: 'a -> 'b) (read_one: in_channel -> 'a): 'b list = ...
Or, I wonder if I should create a separate library depending on parmap ...
I will create a separate library if I gather enough interesting primitives.
I think such a function is quite useful. I'd like to contribute it to parmap.
Here is the current signature:
let parmap_on_file
    (ncores: int)
    (fn: filename)
    (f: 'a -> 'b)
    (read_one: in_channel -> 'a): 'b list
If this is deemed useful, we could later add companion functions such as pariter_on_file,
parmap_fold_on_file, etc.
Let me know if you have a better interface to propose.
This is the second time I have needed this functionality in a project, so I guess it could be
quite useful to other parmap users as well.
I do chemoinformatics, but I guess bioinformatics people might have such needs as well.
Regards,
Francois.
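For readers of this thread, here is a minimal sketch of what a function with the signature above might look like. It is not the code from the issue. The names `read_batch`, `map_on_file` and the `~pmap` argument are hypothetical, and the sketch assumes that `read_one` raises `End_of_file` once the file is exhausted (as `input_line` does). It reads at most `ncores` records at a time, so only one batch of inputs is held in memory while it is being mapped.

```ocaml
(* Read up to [n] records from [ic]; stop early at end of file.
   Assumption: [read_one] raises End_of_file when no record is left. *)
let read_batch (n: int) (read_one: in_channel -> 'a) (ic: in_channel): 'a list =
  let rec loop acc k =
    if k <= 0 then List.rev acc
    else
      match read_one ic with
      | x -> loop (x :: acc) (k - 1)
      | exception End_of_file -> List.rev acc
  in
  loop [] n

(* Hypothetical sketch: map [f] over the records of file [fn], one batch of
   [ncores] records at a time. [pmap] is the map applied to each batch:
   List.map gives sequential behaviour, a parallel map gives parallelism. *)
let map_on_file
    ~(pmap: ('a -> 'b) -> 'a list -> 'b list)
    (ncores: int)
    (fn: string)
    (f: 'a -> 'b)
    (read_one: in_channel -> 'a): 'b list =
  let ic = open_in fn in
  Fun.protect ~finally:(fun () -> close_in_noerr ic) (fun () ->
      let rec loop acc =
        match read_batch ncores read_one ic with
        | [] -> List.rev acc (* nothing left to read *)
        | batch -> loop (List.rev_append (pmap f batch) acc)
      in
      loop [])
```

With `~pmap:List.map` this runs sequentially, which is handy for checking the logic. With parmap installed, one would presumably pass something like `~pmap:(fun f l -> Parmap.parmap ~ncores f (Parmap.L l))`; that combination is untested here.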
I haven't been using parmap in a while, so my opinion is not useful at this time.
@UnixJunkie that sounds useful when we want to have only ncores
items at once in memory.
A more general version would use any stream-like input: unit -> 'a option.
PS: I haven't done any "analysis-level" bioinformatics in a long while though :)
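To make the `unit -> 'a option` idea concrete, here is a small sketch in which `None` marks the end of the stream. All three function names are hypothetical, the consumer is sequential on purpose, and `stream_of_channel` assumes that `read_one` raises `End_of_file` at the end of the input.

```ocaml
(* Wrap a channel reader as a pull-based stream; None means end of stream.
   Assumption: [read_one] raises End_of_file when the input is exhausted. *)
let stream_of_channel (read_one: in_channel -> 'a) (ic: in_channel)
  : unit -> 'a option =
  fun () ->
    match read_one ic with
    | x -> Some x
    | exception End_of_file -> None

(* The same interface over an in-memory list, to show that the consumer
   does not care where the items come from. *)
let stream_of_list (l: 'a list): unit -> 'a option =
  let rest = ref l in
  fun () ->
    match !rest with
    | [] -> None
    | x :: tl -> rest := tl; Some x

(* Sequential consumer: pull items until None and collect the results. *)
let map_stream (f: 'a -> 'b) (next: unit -> 'a option): 'b list =
  let rec loop acc =
    match next () with
    | None -> List.rev acc
    | Some x -> loop (f x :: acc)
  in
  loop []
```

A file-based function then becomes one particular way of building the stream, for example `map_stream f (stream_of_channel input_line ic)`.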
@smondet Is the option just used to send the end of file info via a None?
Maybe the most generic construct is:
let parallelize
    (ncores: int)
    (demux: unit -> 'a)
    (work: 'a -> 'b)
    (mux: 'b -> unit): unit
but then that's so generic that it should reside out of parmap.
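As a rough illustration of the contract behind the demux/work/mux shape, here is a sequential reference sketch. It is not a parallel implementation: a real one would run `work` in several worker processes, while `demux` and `mux` stay in the parent. The exception `End_of_input` and the name `parallelize_seq` are hypothetical. Since `demux` returns a plain `'a`, the sketch assumes it signals the end of the input by raising that exception.

```ocaml
(* Hypothetical end-of-input signal raised by [demux]. *)
exception End_of_input

(* Sequential reference: pull one item with [demux], transform it with [work],
   hand the result to [mux], and repeat until [demux] raises End_of_input.
   [_ncores] is ignored here; it is kept only to mirror the proposed interface. *)
let parallelize_seq
    (_ncores: int)
    (demux: unit -> 'a)
    (work: 'a -> 'b)
    (mux: 'b -> unit): unit =
  let rec loop () =
    match demux () with
    | x -> mux (work x); loop ()
    | exception End_of_input -> ()
  in
  loop ()
```

For example, `demux` can pop items from a list reference and `mux` can push results onto another one, which is enough to exercise the interface before putting a parallel backend behind it.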
@UnixJunkie Yes, "End of Stream" actually 👍
parany can be used for that