How can I parallelize LASER embeddings?
prasunshrestha opened this issue · comments
I am using this code:

```python
from laser_encoders import LaserEncoderPipeline

encoder = LaserEncoderPipeline(laser="laser2")
embeddings = encoder.encode_sentences(text)
```
`text` is a numpy array of texts, and `laser2` suffices for my use case. How can I parallelize the embeddings, if at all? I am running a notebook instance with limited resources and no GPU support (`ml.t3.xlarge`). I have tried the following two approaches (`ThreadPoolExecutor` and `multiprocessing`) to parallelize, but the gains have been marginal:
```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from laser_encoders import LaserEncoderPipeline

# a function to encode a chunk of sentences
def encode_chunk(chunk):
    # import the model locally for each thread
    encoder = LaserEncoderPipeline(laser="laser2")
    return encoder.encode_sentences(chunk)

# split the numpy array into five chunks
text_chunks = np.array_split(text, 5)

# create a ThreadPoolExecutor with 5 threads
with ThreadPoolExecutor(max_workers=5) as executor:
    # map the chunks to the encode_chunk function
    embeddings_chunks = list(executor.map(encode_chunk, text_chunks))

# concatenate the embeddings from the chunks
embeddings = np.concatenate(embeddings_chunks)
```
```python
import numpy as np
from multiprocessing import Pool
from laser_encoders import LaserEncoderPipeline

# a function to encode a chunk of sentences
def encode_chunk(chunk):
    encoder = LaserEncoderPipeline(laser="laser2")
    return encoder.encode_sentences(chunk)

# split the numpy array into five chunks
text_chunks = np.array_split(text, 5)

# create a multiprocessing Pool with 5 processes
with Pool(processes=5) as pool:
    # map the chunks to the encode_chunk function
    embeddings_chunks = pool.map(encode_chunk, text_chunks)

# concatenate the embeddings from the chunks
embeddings = np.concatenate(embeddings_chunks)
```
I would very much appreciate any help on this. Thank you in advance!
I would suggest you first check the CPU utilization: when embedding on CPU, there is a high chance that your cores are already well utilized (PyTorch usually takes good care of this; please take a look here), and more parallelization won't help. This would explain the small gains from parallelization.
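As a quick sanity check (a minimal sketch; the `torch` part is optional and assumes PyTorch is installed alongside LASER), you can compare the number of logical cores with the number of intra-op threads PyTorch is configured to use. If they already match, extra Python-level parallelism is unlikely to help:

```python
import os

# logical cores visible to this process
cores = os.cpu_count()
print(f"logical cores: {cores}")

# if PyTorch is available, see how many intra-op threads it uses by default
try:
    import torch
    print(f"torch intra-op threads: {torch.get_num_threads()}")
except ImportError:
    print("torch not installed")
```

Watching `top` (or a similar monitor) while `encode_sentences` runs will also show directly whether the cores are saturated.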
Another issue with your code is that for relatively small chunks, initializing the model (`encoder = LaserEncoderPipeline(laser="laser2")`) might take even more time than the actual embedding of the texts (`encoder.encode_sentences(chunk)`). So I wouldn't be surprised if the multiprocessing version is actually even slower than the default one.
Ultimately, to speed up the computation, I recommend getting access to a GPU (and making sure that you specify a batch size that leads to maximal utilization of the GPU without causing out-of-memory errors).
If this is infeasible, you may want to consider another sentence encoder that is less resource-consuming (which one to choose depends on your use case). And by the way, please check out SONAR: it is a newer and better generation of LASER models (though not necessarily faster).
This is helpful; thank you for your reply, @avidale! I have had instances where the multiprocessing version was slower than the baseline, so you might be right. I have a hard resource constraint, so a GPU might not be feasible for me, unfortunately. I didn't know about SONAR before, but it looks promising. Unlike LASER's BiLSTM, it is also transformer-based, so the runtime might be better.