How can I parallelize LASER embeddings?
prasunshrestha opened this issue · comments
I am using this code:

```python
from laser_encoders import LaserEncoderPipeline

encoder = LaserEncoderPipeline(laser="laser2")
embeddings = encoder.encode_sentences(text)
```
`text` is a numpy array of texts, and `laser2` suffices for my use case. How can I parallelize the embeddings, if at all? I am running a notebook instance with limited resources and no GPU support (`ml.t3.xlarge`). I have tried the following two approaches (`ThreadPoolExecutor` and `multiprocessing`) to parallelize, but the gains have been marginal:
```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from laser_encoders import LaserEncoderPipeline

# a function to encode a chunk of sentences
def encode_chunk(chunk):
    # import the model locally for each thread
    encoder = LaserEncoderPipeline(laser="laser2")
    return encoder.encode_sentences(chunk)

# split the numpy array into five chunks
text_chunks = np.array_split(text, 5)

# create a ThreadPoolExecutor with 5 threads
with ThreadPoolExecutor(max_workers=5) as executor:
    # map the chunks to the encode_chunk function
    embeddings_chunks = list(executor.map(encode_chunk, text_chunks))

# concatenate the embeddings from the chunks
embeddings = np.concatenate(embeddings_chunks)
```
```python
import numpy as np
from multiprocessing import Pool
from laser_encoders import LaserEncoderPipeline

# a function to encode a chunk of sentences
def encode_chunk(chunk):
    encoder = LaserEncoderPipeline(laser="laser2")
    return encoder.encode_sentences(chunk)

# split the numpy array into five chunks
text_chunks = np.array_split(text, 5)

# create a multiprocessing Pool with 5 processes
with Pool(processes=5) as pool:
    # map the chunks to the encode_chunk function
    embeddings_chunks = pool.map(encode_chunk, text_chunks)

# concatenate the embeddings from the chunks
embeddings = np.concatenate(embeddings_chunks)
```
I would very much appreciate any help on this. Thank you in advance!
I would suggest you first check the CPU utilization: when embedding on CPU, there is a high chance that your cores are already well utilized (PyTorch usually takes good care of this; please take a look here), and more parallelization won't help. This would explain the small gains from parallelization.
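As a quick sanity check (a minimal sketch; the `torch` part is optional and assumes PyTorch is installed alongside LASER), you can compare the number of logical cores with the number of intra-op threads PyTorch is configured to use. If they already match, extra Python-level parallelism is unlikely to help:

```python
import os

# logical cores visible to this process
cores = os.cpu_count()
print(f"logical cores: {cores}")

# if PyTorch is available, see how many intra-op threads it uses by default
try:
    import torch
    print(f"torch intra-op threads: {torch.get_num_threads()}")
except ImportError:
    print("torch not installed")
```

Watching `top` (or a similar monitor) while `encode_sentences` runs will also show directly whether the cores are saturated.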
Another issue with your code is that for relatively small chunks, initializing the model (`encoder = LaserEncoderPipeline(laser="laser2")`) might take even more time than the actual embedding of the texts (`encoder.encode_sentences(chunk)`). So I wouldn't be surprised if the multiprocessing version is actually even slower than the default one.
Ultimately, to speed up the computation, I recommend getting access to a GPU (and making sure that you specify a batch size that leads to maximal utilization of the GPU without causing out-of-memory errors).
If this is infeasible, you may want to consider another sentence encoder that is less resource-consuming (which one to choose depends on your use case). And by the way, please check out SONAR: it is a newer and better generation of LASER models (though not necessarily faster).
This is helpful; thank you for your reply, @avidale! I have had instances where the multiprocessing version was slower than the baseline, so you might be right. I have a hard resource constraint, so a GPU might not be feasible for me, unfortunately. I didn't know about SONAR before, but it looks promising. Unlike LASER's BiLSTM, it is also transformer-based, so the runtime might be better.