imap - only a single engine is still used after a few iterations
tommedema opened this issue · comments
I've set up an ipyparallel cluster, which resulted in a decent speedup from 15 it/s to about 80-120 it/s:
Sample code:
import ipyparallel as ipp
import os

clusterProcessesCount = os.cpu_count()  # == 16

cluster = ipp.Cluster(n=clusterProcessesCount)
cluster.start_cluster_sync()
rc = cluster.connect_client_sync()
rc.wait_for_engines(clusterProcessesCount)
lview = rc.load_balanced_view()
dview = rc[:]

# ...

# push settings to child processes
dview.push(dict(
    cvTrainingFoldMinMatchCount=cvTrainingFoldMinMatchCount,
    cvTrainingFoldMaxMatchCount=cvTrainingFoldMaxMatchCount,
    cvEarlyAbandonMinPromisingSuccessRate=cvEarlyAbandonMinPromisingSuccessRate,
))

# ...

for result in lview.imap(parallelTrainTestQuery, cvIndicesAndQueries, ordered=False, max_outstanding='auto'):
    print(result)
Note that since I am using this for cross-validation, I am referring to the number of folds that were processed until the engines stop being used. After 2 folds (iterations of iterations), only 1 engine out of 16 is being used:
There are no error messages that I can see. During the first fold, all 16 engines are fully used at 100% CPU. I did notice various warning messages when I first boot the cluster, but none of them seem to stop the cluster from working.
Is there anything I can do to help resolve this?
Is there a chance the 100% CPU process is not an engine at all, but rather the notebook kernel or perhaps a scheduler? Can you tell which process that is? You can check the command line of the process with e.g. psutil.Process(pid).cmdline() or ps ax.
@minrk thanks for the response!
ps ax gave me:
1264 ?? Rs 1130:11.85 /Users/tommedema/opt/anaconda3/bin/python -m ipykern
also here are some screenshots from activity monitor:
That means it's your kernel (the client), not any engines, that is stuck doing work, perhaps processing incoming results. If you interrupt your notebook when this happens, do you get a traceback? How big are the result objects of your individual tasks? 300k is quite a few tasks. Depending on how many you have, you might want to add e.g. chunksize=10 to bundle 10 function calls per IPython Parallel message.
@minrk interesting, because processing a result is as simple as adding it to a pandas DataFrame:
with tqdm(total=cvQueriesLength) as pbar:
    for result in lview.imap(parallelTrainTestQuery, cvIndicesAndQueries, ordered=False, max_outstanding='auto'):
        pbar.update(1)
        queryIndex = int(result[0])
        successRate = result[1]
        matchCount = int(result[2])
        maxDistance = result[3]
        cvResults.loc[cvResults.shape[0]] = {
            'query_index': queryIndex,
            'success_rate': successRate,
            'match_count': matchCount,
            'max_distance': maxDistance
        }
The result object is quite small; it's a numpy array with 4 floats:
return np.array([queryIndex, successRate, matchCount, trainingMaxDistance], dtype=np.float32)
I indeed have 300k tasks for each fold.
@minrk chunksize seems interesting, but from the docs it seems applicable only to map, not imap? I am using imap with max_outstanding='auto'.
I just tried setting chunksize and got: TypeError: imap() got an unexpected keyword argument 'chunksize'
I also interrupted it while this was happening, and this is the traceback:
https://gist.github.com/tommedema/b7107e66f4f70d1b2fa669927b2a4cff
Sorry, you're right - imap doesn't support chunksize yet.
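(One workaround, sketched here as an assumption rather than anything ipyparallel provides: batch the inputs yourself, so each task processes a list of items and one message covers many function calls. parallelTrainTestQuery, lview, and cvIndicesAndQueries are the names from the code above.)

```python
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def process_chunk(items):
    # runs on an engine: one task handles a whole list of work items,
    # calling the per-item worker from the thread for each one
    return [parallelTrainTestQuery(item) for item in items]

# client side: one task per chunk instead of one task per item,
# then flatten the per-chunk result lists as they arrive
# for results in lview.imap(process_chunk, chunked(cvIndicesAndQueries, 10),
#                           ordered=False, max_outstanding='auto'):
#     for result in results:
#         ...
```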
That traceback shows it was waiting in your pandas append, not in an IPP call. This could be a coincidence, but I suspect it's because appending a row to a pandas DataFrame makes a copy of the whole DataFrame. That gets expensive when you have 300k rows and are building a whole new ~250k-row DataFrame as each new row comes in.
Pre-allocating the whole DataFrame should be loads faster and less memory intensive:
N = len(cvIndicesAndQueries)
cvResults = pd.DataFrame(
    columns=["query_index", "success_rate", "match_count", "max_distance"],
    # defining the index ensures all the rows are defined
    index=np.arange(0, N),
)
...
for i, result in enumerate(lview.imap(...)):
    ...
    # addressing an _existing_ row doesn't create a new DataFrame
    # or maybe the index should be query_index?
    cvResults.iloc[i] = {...}
    ...
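(For reference, a self-contained sketch of the pre-allocation pattern above. The fake_results list is a fabricated stand-in for the real task output, matching the 4-float arrays returned by the worker; since the index is 0..N-1, loc[i] addresses the same row iloc[i] would.)

```python
import numpy as np
import pandas as pd

# placeholder results: (query_index, success_rate, match_count, max_distance)
fake_results = [np.array([i, 0.5, 100, 1.25], dtype=np.float32) for i in range(5)]

N = len(fake_results)
# pre-allocate every row up front so assignment never copies the frame
cvResults = pd.DataFrame(
    columns=["query_index", "success_rate", "match_count", "max_distance"],
    index=np.arange(0, N),
)

for i, result in enumerate(fake_results):
    # writing into an existing row mutates in place instead of
    # rebuilding the whole DataFrame on every append
    cvResults.loc[i] = {
        "query_index": int(result[0]),
        "success_rate": float(result[1]),
        "match_count": int(result[2]),
        "max_distance": float(result[3]),
    }
```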
@minrk wow, very sharp! This helped tremendously. Thank you so much.