mp_shared_memory_apply

A way to run a parallel apply with Multiprocessing without copying the data across processes.

It has 5 very nice features:

Can apply any picklable function.
Can use any numpy.ndarray tensor (the original intention), and should work with any indexable data structure.
The target data will not be copied across processes (unless your function returns the data).
No extra third-party libraries required (unless your function needs them, obviously).
Can switch between the Multiprocess and Multiprocessing backends, so it works in both Jupyter and the console.
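
As an illustration of the backend switch, here is a minimal sketch of one way to pick between the two packages. The helper name `load_backend` and its `prefer_multiprocess` flag are hypothetical and not part of the boilerplate. It assumes `multiprocess` (a fork of `multiprocessing` that serializes with `dill`) may or may not be installed, and falls back to the standard library when it is missing.

```python
import importlib


def load_backend(prefer_multiprocess=False):
    """Return the multiprocess package if requested and installed,
    otherwise the standard-library multiprocessing module.
    (Hypothetical helper, for illustration only.)"""
    if prefer_multiprocess:
        try:
            return importlib.import_module("multiprocess")
        except ImportError:
            pass  # not installed: fall back to the standard library
    return importlib.import_module("multiprocessing")


# Both packages expose the same API, e.g. mp.Pool and mp.Process.
mp = load_backend(prefer_multiprocess=True)
print(mp.__name__)
```

Because both packages share the same interface, the rest of the code can use `mp.Pool` without caring which backend was loaded.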

And 3 annoying limitations:

The function to be applied has to be picklable (a limitation of Multiprocessing; I currently see no way to work around it).
The function definition has to live in a Python module file so that it can be imported.
Multiprocessing cannot return any object larger than 2 GB (a limitation or bug in Multiprocessing).
- See uqfoundation/pathos#217
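
The first two limitations come from how the standard `pickle` module handles functions: it stores only a reference (module name plus function name), so the function must be importable by name in the worker process. The short check below is a sketch of that behaviour. The helper `is_picklable` is hypothetical and only here for illustration.

```python
import math
import pickle


def is_picklable(func):
    """Return True if the standard pickle module can serialize func."""
    try:
        pickle.dumps(func)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A lambda has no importable name, so pickle cannot reference it.
square = lambda x: x * x

print(is_picklable(math.sqrt))  # True: importable as math.sqrt
print(is_picklable(square))     # False: "<lambda>" cannot be looked up
```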

Define your picklable function as a Python module and store it in the same folder as the boilerplate.
Call parallel_tensor_apply( func_module_name, data_tensor, index_set, max_processes = 99, \**kwargs ) where:

func_module_name is your function module file.
data_tensor is the indexable data you want the function applied to.
index_set is the set of indices the function should be applied to. The parallel backend distributes the tasks according to this set.
**kwargs will be passed on as keyword arguments to your function. They have to be the same for all indices, so they are constant parameters rather than some "extra indices".
max_processes is the number of processes you want to use.
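
To make the roles of index_set, **kwargs and max_processes concrete, here is a serial sketch of the idea. It is not the boilerplate's implementation: `row_sum`, `apply_over_indices`, the `(data, index, **kwargs)` calling convention and the round-robin split are all assumptions made for illustration, and a plain loop stands in for the process pool.

```python
from functools import partial


def row_sum(data, index, scale=1.0):
    """Hypothetical user function: one index in, one result out."""
    return scale * sum(data[index])


def apply_over_indices(func, data, index_set, max_processes=4, **kwargs):
    """Apply func once per index, with the same kwargs for every index."""
    bound = partial(func, data, **kwargs)  # kwargs are constant parameters
    indices = sorted(index_set)
    n_chunks = max(1, min(max_processes, len(indices)))
    # Round-robin split: each chunk would be handed to one worker process.
    chunks = [indices[k::n_chunks] for k in range(n_chunks)]
    results = {}
    for chunk in chunks:  # serial stand-in for the parallel backend
        for i in chunk:
            results[i] = bound(i)
    return results


data = [[1, 2], [3, 4], [5, 6]]
out = apply_over_indices(row_sum, data, {0, 2}, max_processes=99, scale=2.0)
print(out)  # {0: 6.0, 2: 22.0}
```

Only the rows named in the index set are touched, and asking for more processes than there are indices simply leaves the extra workers unused.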

The data will then be automatically copied to shared memory and the function will be applied to it in parallel.
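
How the boilerplate does this internally is not shown here. As a rough sketch of the general technique, the example below uses the standard-library `multiprocessing.shared_memory` module (Python 3.8+): the data is copied once into a named block, and a worker attaches to the block by name instead of receiving a pickled copy. The helpers `create_shared` and `sum_at` are hypothetical. To stay dependency-free it uses a `memoryview` of doubles; with NumPy the same buffer could be wrapped as `np.ndarray(shape, dtype=..., buffer=shm.buf)`. The worker function is called in the same process here to keep the example short.

```python
from multiprocessing import shared_memory

ITEM = "d"      # C double
ITEM_SIZE = 8   # bytes per double


def create_shared(values):
    """Copy values once into a new shared-memory block and return it."""
    nbytes = len(values) * ITEM_SIZE
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    view = shm.buf[:nbytes].cast(ITEM)
    for i, v in enumerate(values):
        view[i] = v
    view.release()  # views must be released before the block is closed
    return shm


def sum_at(shm_name, length, indices):
    """What a worker would do: attach by name, read in place, detach."""
    shm = shared_memory.SharedMemory(name=shm_name)
    view = shm.buf[:length * ITEM_SIZE].cast(ITEM)
    try:
        return sum(view[i] for i in indices)
    finally:
        view.release()
        shm.close()


data = [float(i) for i in range(10)]
shm = create_shared(data)
try:
    # Only the block name, the length and the indices would be sent
    # to a worker, never the data itself.
    total = sum_at(shm.name, len(data), [1, 3, 5])
finally:
    shm.close()
    shm.unlink()  # free the block once every process is done with it

print(total)  # 9.0
```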