PicklingError: Can't pickle <function Embedding.forward at XXXXXXX> it's not the same object as torch.nn.modules.sparse.Embedding.forward

Question

arpit2665 opened this issue 2 months ago

System Info

I am trying to share an LLM between multiple forked processes at inference time using torch's ForkingPickler class. This works when the model is loaded in FP16 (without any quantization), but it fails when the model is quantized with bitsandbytes. Could you please help with this issue?

Below is the list of installed libraries:

python==3.11.7
torch==2.0.1
transformers==4.37.1
bitsandbytes==0.42.0
accelerate==0.23.0
cuda version = 11.0

Reproduction

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from torch.multiprocessing.reductions import ForkingPickler

base_model_name = 'mistral-7b-instruct'

# Load the LLM in FP16 (without any quantization)
base_model_wo_quant = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype=torch.float16, device_map={"":0}, use_safetensors=True)

# Sharing with the forked processes works
_ = base_model_wo_quant.share_memory()
ForkingPickler.dumps(base_model_wo_quant)

# Prepare the 4-bit quantization config
compute_dtype = torch.float16
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type='nf4', bnb_4bit_compute_dtype=compute_dtype, bnb_4bit_use_double_quant=True, device_map={"":0})

# Load the LLM with 4-bit quantization
base_model_with_quant = AutoModelForCausalLM.from_pretrained(base_model_name, quantization_config=bnb_config, device_map={"":0}, use_safetensors=True)

_ = base_model_with_quant.share_memory()
# Sharing with the forked processes fails with the error below
ForkingPickler.dumps(base_model_with_quant)

PicklingError: Can't pickle <function Embedding.forward at XXXXXXX> it's not the same object as torch.nn.modules.sparse.Embedding.forward
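
For context, here is a minimal sketch of one way this kind of message can arise in plain Python, without torch, transformers, or bitsandbytes. pickle serializes a function by its qualified name. If a wrapper that copies the original method's name (via functools.wraps) is stored on the instance, the lookup by name finds the class's original function instead of the wrapper, and pickling is refused. Whether this is what happens to Embedding.forward in the quantized model is an assumption, not something confirmed here. Layer, wrap_forward, and try_pickle are hypothetical names used only for illustration.

```python
import functools
import pickle


class Layer:
    """Hypothetical stand-in for a module with a forward method."""

    def forward(self, x):
        return x


def wrap_forward(module):
    """Replace module.forward on the instance with a wrapper that keeps the
    original name and qualified name (illustrative only)."""
    old_forward = module.forward

    @functools.wraps(old_forward)
    def new_forward(*args, **kwargs):
        return old_forward(*args, **kwargs)

    # Stored in the instance __dict__, so pickle will try to serialize it.
    module.forward = new_forward
    return module


def try_pickle(obj):
    """Return None if obj pickles, otherwise the PicklingError message."""
    try:
        pickle.dumps(obj)
        return None
    except pickle.PicklingError as exc:
        return str(exc)


if __name__ == "__main__":
    print("plain instance:  ", try_pickle(Layer()))
    print("wrapped instance:", try_pickle(wrap_forward(Layer())))
```

Run as a script, the plain instance should pickle and the wrapped one should report a PicklingError that names Layer.forward.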

Expected behavior

I expect the quantized model to be shareable with the forked processes, just like the FP16 model.
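
A possible way to narrow this down, assuming (unconfirmed) that the failure comes from a forward attribute that has been replaced on individual module instances: list the submodules whose instance __dict__ contains its own forward entry. The helper below is a hypothetical sketch. It only needs (name, module) pairs, such as those yielded by named_modules() on a torch model, and does not import torch itself.

```python
def find_forward_overrides(named_modules):
    """Return the names of modules that carry an instance-level 'forward'.

    named_modules: iterable of (name, module) pairs, e.g. the result of
    model.named_modules(). A module whose forward is only defined on its
    class does not have 'forward' in its instance __dict__.
    """
    return [name for name, module in named_modules if "forward" in vars(module)]


# Intended use with the models from the reproduction (not executed here):
#   find_forward_overrides(base_model_wo_quant.named_modules())
#   find_forward_overrides(base_model_with_quant.named_modules())
```

Comparing the two lists would show whether the quantized model has instance-level forward replacements that the FP16 model lacks.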