joblib / joblib

Computing with Python functions.

Home Page: http://joblib.readthedocs.org

n_jobs 0 should be all cores and -1 should be all but one

Liquidmasl opened this issue

The n_jobs parameter currently works like this:

The maximum number of concurrently running jobs, 
such as the number of Python worker processes when backend="multiprocessing" 
or the size of the thread-pool when backend="threading". 
If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, 
and the behavior amounts to a simple python for loop. 
This mode is not compatible with timeout. 
For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. 
Thus for n_jobs = -2, all CPUs but one are used. 
None is a marker for ‘unset’ that will be interpreted as n_jobs=1 unless the call is performed under a parallel_config() context manager that sets another value for n_jobs.
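
To make the quoted rule concrete, here is a minimal sketch in plain Python. The helper name resolve_n_jobs and its n_cpus argument are made up for this illustration and are not joblib's API. Rejecting 0 and clamping the result at 1 are assumptions of the sketch, since the quoted text does not cover those cases.

```python
def resolve_n_jobs(n_jobs, n_cpus):
    """Hypothetical helper mirroring the documented rule (not joblib's code)."""
    if n_jobs is None:
        # 'unset' is interpreted as 1 outside a parallel_config() context
        return 1
    if n_jobs == 0:
        # Assumption of this sketch: the quoted text does not define 0
        raise ValueError("n_jobs == 0 is not covered by the documented rule")
    if n_jobs < 0:
        # Documented formula: n_cpus + 1 + n_jobs (so -1 gives n_cpus).
        # Clamping at 1 is an assumption of this sketch.
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


for value in (4, -1, -2, -5, None):
    print(value, "->", resolve_n_jobs(value, n_cpus=8))
# 4 -> 4
# -1 -> 8
# -2 -> 7
# -5 -> 4
# None -> 1
```

On an 8-core machine this gives 7 workers for -2 and 4 workers for -5, which is the off-by-one this issue is about.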

This seems unnecessarily obscure.

Currently -2 means all cores but 1
and -5 means all but 4,

instead of
-1 meaning all but 1, and so on.

What is the reason it's not

+n -> n cores
0  -> all cores
-n -> available cores - n

Especially when None is supported anyway.
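
For comparison, the mapping proposed here could be sketched like this. Both function names are hypothetical and neither is part of joblib. The clamp to at least one worker is an assumption, and documented_n_jobs only covers the negative case from the quoted text.

```python
def proposed_n_jobs(n_jobs, n_cpus):
    """Hypothetical: the mapping suggested in this issue, NOT joblib behavior."""
    if n_jobs > 0:
        return n_jobs  # +n -> n cores
    # 0 -> all cores, -n -> n_cpus - n (clamping at 1 is an assumption)
    return max(n_cpus + n_jobs, 1)


def documented_n_jobs(n_jobs, n_cpus):
    """Hypothetical: the documented formula for negative n_jobs only."""
    return max(n_cpus + 1 + n_jobs, 1)


n_cpus = 8
for spare in (0, 1, 4):
    # To leave `spare` cores free, the proposal uses -spare,
    # while the documented rule needs -(spare + 1).
    assert proposed_n_jobs(-spare, n_cpus) == documented_n_jobs(-(spare + 1), n_cpus)
    print(f"leave {spare} free: proposed n_jobs={-spare}, documented n_jobs={-(spare + 1)}")
```

The two schemes give the same worker counts, with every non-positive value shifted by one.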

What does n_jobs = 0 currently do?

I understand that changing this in a patch is probably problematic for backwards compatibility, especially since it's "just" for readability.

I'm closing this issue because we have many others that require attention and I want to focus our tracker on blockers.

Thanks for the discussion.

joblib has existed for almost 15 years and is massively used. We cannot make such a change, which would impact millions of people worldwide.

Sadly, I totally get it, and I feared so.
In the meantime I found that a bunch of other libraries do it the same way,
so for the sake of uniformity we now do it like that in our library too, but it pains me, haha.
That's how confusing functionality stays alive!

Thanks for your quick reply anyway