ipython / ipyparallel

IPython Parallel: Interactive Parallel Computing in Python

Home Page: https://ipyparallel.readthedocs.io/


imap - only a single engine is still used after a few iterations

tommedema opened this issue

I've set up an ipyparallel cluster, which resulted in a decent speedup from 15 it/s to about 80-120 it/s:
[Screenshot: tqdm progress bar showing ~80-120 it/s]

Sample code:

import ipyparallel as ipp
import os
    
clusterProcessesCount = os.cpu_count() # == 16

cluster = ipp.Cluster(n = clusterProcessesCount)
cluster.start_cluster_sync()
rc = cluster.connect_client_sync()
rc.wait_for_engines(clusterProcessesCount)
lview = rc.load_balanced_view()
dview = rc[:]

# ...

# push settings to child processes
dview.push(dict(
    cvTrainingFoldMinMatchCount = cvTrainingFoldMinMatchCount,
    cvTrainingFoldMaxMatchCount = cvTrainingFoldMaxMatchCount,
    cvEarlyAbandonMinPromisingSuccessRate = cvEarlyAbandonMinPromisingSuccessRate
))

# ....

for result in lview.imap(parallelTrainTestQuery, cvIndicesAndQueries, ordered = False, max_outstanding = 'auto'):
    print(result)

Note that since I am using this for cross-validation, I am referring to the number of folds that were processed before the engines stop being used. After 2 folds (each fold being a full pass of imap over all tasks), it is only using 1 engine out of 16:
[Screenshot: CPU usage showing only one busy Python process]

There are no error messages that I can see. At the first fold all 16 engines are fully used at 100% CPU. I did notice that when I first boot the cluster there are various warning messages, but none seem to stop the cluster from working.

Is there anything I can do to help resolve this?

Is there a chance the 100% CPU process is not an engine at all, but rather the notebook kernel or perhaps a scheduler? Can you tell which process that is (you can check the command line of the process with e.g. psutil.Process(pid).cmdline() or ps ax)?
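For reference, a minimal sketch of that check; the pid value below is a placeholder for whichever process is pegged at 100% CPU (a kernel's command line typically mentions ipykernel, while an engine's mentions ipyparallel or ipengine):

import psutil

pid = 12345  # placeholder: PID of the process stuck at 100% CPU

# cmdline() returns the process's full command-line argument list,
# which is usually enough to tell a kernel from an engine or scheduler
print(psutil.Process(pid).cmdline())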

@minrk thanks for the response!

ps ax gave me:

1264 ?? Rs 1130:11.85 /Users/tommedema/opt/anaconda3/bin/python -m ipykern

Also, here are some screenshots from Activity Monitor:

[Screenshots: Activity Monitor process list showing the busy Python process]

That means it's your kernel (the client), not any of the engines, that is stuck doing work, perhaps processing incoming results. If you interrupt your notebook when this happens, do you get a traceback? How big are the result objects of your individual tasks? 300k is quite a few tasks. Depending on how many you have, you might want to add e.g. chunksize=10 to bundle 10 function calls per IPython Parallel message.
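As a rough illustration (not from the thread), here is what that could look like using the non-blocking map call rather than imap, assuming map's documented chunksize and ordered options; the function and sequence names are taken from the snippets above:

# Hedged sketch: bundle 10 calls per ipyparallel message via map's chunksize
# (as noted further down in the thread, imap itself does not accept chunksize)
amr = lview.map(
    parallelTrainTestQuery,
    cvIndicesAndQueries,
    block=False,
    ordered=False,
    chunksize=10,
)
for result in amr:
    print(result)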

@minrk interesting, because processing a result is as simple as adding it to a pandas DataFrame:

    with tqdm(total=cvQueriesLength) as pbar:
        for result in lview.imap(parallelTrainTestQuery, cvIndicesAndQueries, ordered = False, max_outstanding = 'auto'):
            pbar.update(1)

            queryIndex = int(result[0])
            successRate = result[1]
            matchCount = int(result[2])
            maxDistance = result[3]
            
            cvResults.loc[cvResults.shape[0]] = {
                'query_index': queryIndex,
                'success_rate': successRate,
                'match_count': matchCount,
                'max_distance': maxDistance
            }

The result object is quite small; it's a numpy array with 4 floats:

return np.array([queryIndex, successRate, matchCount, trainingMaxDistance], dtype = np.float32)

I indeed have 300k tasks for each fold.

@minrk chunksize seems interesting, but from the docs it appears to apply only to map and not imap? I am using imap with max_outstanding = 'auto'.

I did just try setting chunksize and got: TypeError: imap() got an unexpected keyword argument 'chunksize'

I did just interrupt it while this happened, and this is the traceback:

https://gist.github.com/tommedema/b7107e66f4f70d1b2fa669927b2a4cff

Sorry, you're right - imap doesn't support chunksize yet.

That traceback shows it was waiting in your pandas append, not an IPP call. This could be a coincidence, but I suspect it's because appending a row to a pandas DataFrame makes a copy of the whole DataFrame. That gets expensive when you have 300k rows: each incoming result means building a brand-new DataFrame that is already hundreds of thousands of rows long.

Pre-allocating the whole DataFrame should be loads faster and less memory intensive:

N = len(cvIndicesAndQueries)

cvResults = pd.DataFrame(
    columns=["query_index", "success_rate", "match_count", "max_distance"],
    # defining the index ensures all the rows are defined
    index=np.arange(0, N),
)
...
for i, result in enumerate(lview.imap(...)):
    ...
    # addressing an _existing_ row doesn't create a new DataFrame
    # or maybe the index should be query_index?
    cvResults.iloc[i] = {...}
...
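For anyone who wants to try the pattern in isolation, here is a small self-contained sketch of the same idea with synthetic data (none of these names come from the thread): pre-allocate the DataFrame once, then assign into existing rows instead of appending.

import numpy as np
import pandas as pd

N = 1000  # synthetic task count, for illustration only
columns = ["query_index", "success_rate", "match_count", "max_distance"]

# pre-allocate all N rows up front (filled with NaN until assigned)
results = pd.DataFrame(columns=columns, index=np.arange(N), dtype=float)

for i in range(N):
    # stand-in for one imap result: a 4-element float32 array
    row = np.array([i, 0.5, 10, 1.25], dtype=np.float32)
    # assigning into an existing row avoids copying the whole DataFrame
    results.iloc[i] = row

print(results.head())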

@minrk wow, very sharp! This helped tremendously. Thank you so much.