PicklingError: Can't pickle <function Embedding.forward at XXXXXXX> it's not the same object as torch.nn.modules.sparse.Embedding.forward
arpit2665 opened this issue
System Info
I am trying to share LLMs between multiple forked processes at inference time using torch's ForkingPickler class. This works when the model is loaded in FP16 (without any quantization), but I can't share a model quantized with bitsandbytes with the other forked processes. Could you please help with this issue?
Below is the list of installed libraries:
python==3.11.7
torch==2.0.1
transformers==4.37.1
bitsandbytes==0.42.0
accelerate==0.23.0
cuda version = 11.0
Reproduction
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from torch.multiprocessing.reductions import ForkingPickler
base_model_name = 'mistral-7b-instruct'
# Load the LLM in FP16 (without any quantization)
base_model_wo_quant = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype=torch.float16, device_map={"":0}, use_safetensors=True)
# This model can be shared with the forked processes
_ = base_model_wo_quant.share_memory()
ForkingPickler.dumps(base_model_wo_quant)
# Prepare the config for loading the model with 4-bit quantization
compute_dtype = torch.float16
# device_map is passed to from_pretrained below, not to BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type='nf4', bnb_4bit_compute_dtype=compute_dtype, bnb_4bit_use_double_quant=True)
# Load the LLM with 4-bit quantization
base_model_with_quant = AutoModelForCausalLM.from_pretrained(base_model_name, quantization_config=bnb_config, device_map={"":0}, use_safetensors=True)
_ = base_model_with_quant.share_memory()
# Sharing the quantized model with the forked processes fails with the error below
ForkingPickler.dumps(base_model_with_quant)
PicklingError: Can't pickle <function Embedding.forward at XXXXXXX> it's not the same object as torch.nn.modules.sparse.Embedding.forward
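For context, here is a minimal sketch in plain Python of how this kind of message can arise. It is not a confirmed diagnosis of the bitsandbytes case. It assumes that something replaces forward on a module instance with a wrapper function that keeps the original function's name. Layer, attach_wrapper and try_pickle are hypothetical names used only for illustration. pickle stores functions by their qualified name, so when the name on the wrapper resolves to a different object (the original method on the class), pickling is refused.

```python
import functools
import pickle


class Layer:
    """Hypothetical stand-in for a module class such as an embedding layer."""

    def forward(self, x):
        return x + 1


def attach_wrapper(module):
    """Replace module.forward on the instance with a wrapper function."""
    old_forward = module.forward

    # functools.wraps copies __module__ and __qualname__ onto the wrapper,
    # so the wrapper claims to be Layer.forward without being that object.
    @functools.wraps(old_forward)
    def new_forward(*args, **kwargs):
        return old_forward(*args, **kwargs)

    module.forward = new_forward  # stored in the instance __dict__
    return module


def try_pickle(obj):
    """Return None if obj pickles, otherwise the PicklingError message."""
    try:
        pickle.dumps(obj)
    except pickle.PicklingError as exc:
        return str(exc)
    return None


plain_error = try_pickle(Layer())
wrapped_error = try_pickle(attach_wrapper(Layer()))
print("plain instance:", plain_error)
print("wrapped instance:", wrapped_error)
```

The wrapped instance still computes the same result when called. Only its pickling behaviour changes, because the wrapper function now sits in the instance's attributes and is pickled along with it.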
Expected behavior
I expect the quantized model to be shareable with the forked processes, just as the FP16 model is.
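To narrow down where the two models differ, a small diagnostic along these lines might help. This is a sketch under the assumption that the failure comes from a forward attribute set directly on a module instance. find_instance_forward_overrides is a hypothetical helper, not part of torch, transformers or bitsandbytes. It relies only on the model exposing named_modules(), as torch.nn.Module does.

```python
def find_instance_forward_overrides(model):
    """Return names of submodules whose `forward` is set on the instance itself.

    Assumes `model` provides named_modules() yielding (name, module) pairs,
    as torch.nn.Module does.
    """
    patched = []
    for name, module in model.named_modules():
        # A forward defined only on the class never appears in the
        # instance __dict__; an instance-level replacement does.
        if "forward" in vars(module):
            patched.append(name or "<root>")
    return patched


# Hypothetical usage with the models from the reproduction:
# print(find_instance_forward_overrides(base_model_wo_quant))
# print(find_instance_forward_overrides(base_model_with_quant))
```

If the quantized model lists submodules that the FP16 model does not, that would point to the instance-level forward replacement as the part ForkingPickler cannot serialize.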