OpenGVLab / LLaMA-Adapter

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Can't find forward_only function

Nieysh opened this issue

I'm trying to use alpaca_finetuning_v1/llama to autoregressively generate text for validation during finetuning. However, alpaca_finetuning_v1/llama/generation.py calls, at line 42:
logits = self.model.forward_only(tokens[:, prev_pos:cur_pos], prev_pos)
and I can't find a forward_only function in alpaca_finetuning_v1/llama/model.py.
Could you please release the forward_only function? Or is there another way to use alpaca_finetuning_v1/llama/model.py to generate text for validation? Loading the trained model through the root-level llama/model.py during training is not convenient.

Thanks for pointing out our bug! Actually, alpaca_finetuning_v1 only supports training on the Alpaca dataset. For inference, you can use the model in our main directory: https://github.com/OpenGVLab/LLaMA-Adapter/blob/main/llama/model.py, in which the forward function is used for inference.

Thanks! But I found that the model.py at https://github.com/OpenGVLab/LLaMA-Adapter/blob/main/llama/model.py is different from the one under alpaca_finetuning_v1/llama/model.py. I tried to use its forward function as the forward_only function so that I could run inference while training, but it failed because alpaca_finetuning_v1/llama/model.py does not seem to handle the situation "when seq_len > 1 and mask is None".

You can add the following function to alpaca_finetuning_v1/llama/model.py:

@torch.inference_mode()
def forward_inference(self, tokens: torch.Tensor, start_pos: int):
    _bsz, seqlen = tokens.shape
    h = self.tok_embeddings(tokens)
    # Slice the rotary-embedding frequencies for the current segment.
    self.freqs_cis = self.freqs_cis.to(h.device)
    freqs_cis = self.freqs_cis[start_pos : start_pos + seqlen]
    # Recover the per-layer adapter prompts from the shared query embedding.
    if self.adapter_len * self.adapter_layer > 0:
        adapter = self.adapter_query.weight.reshape(-1, self.adapter_len, self.params.dim).unsqueeze(1)
    # A causal mask is only needed when more than one token is processed;
    # during incremental decoding (seqlen == 1) no mask is used.
    if seqlen == 1:
        mask = None
    elif start_pos == 0:
        mask = torch.full((1, 1, seqlen, seqlen), float("-inf"), device=tokens.device)
        mask = torch.triu(mask, diagonal=1).type_as(h)
    else:
        raise NotImplementedError()
    for i, layer in enumerate(self.layers):
        # Adapters are attached only to the last self.adapter_layer layers.
        adapter_index = i - (len(self.layers) - self.adapter_layer)
        h = layer(h, start_pos, freqs_cis, mask, adapter[adapter_index].half() if adapter_index >= 0 else None)
    h = self.norm(h)
    # Only the logits of the last position are needed for next-token prediction.
    output = self.output(h[:, -1, :])
    return output.float()

def enable_cache(self):
    for layer in self.layers:
        layer.attention.enable_cache()

def disable_cache(self):
    for layer in self.layers:
        layer.attention.disable_cache()
We will then update the code.
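Note that alpaca_finetuning_v1/llama/generation.py (line 42) calls self.model.forward_only, while the snippet above names the method forward_inference. A minimal sketch to bridge the two, assuming you keep the name above, is a class-level alias so the existing call resolves without renaming anything:

# Sketch only: place this inside the same Transformer class in
# alpaca_finetuning_v1/llama/model.py, right after the definition of
# forward_inference above, so generation.py's self.model.forward_only
# call keeps working.
forward_only = forward_inference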

Thanks! I added it but still get the following error. It seems that the attention module can't handle mask == None:
Traceback (most recent call last):
  File "example_test_infer.py", line 114, in <module>
    fire.Fire(main)
  File "/data1/nieyunshuang/nys_new/miniconda3/envs/navllmsig/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/data1/nieyunshuang/nys_new/miniconda3/envs/navllmsig/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/data1/nieyunshuang/nys_new/miniconda3/envs/navllmsig/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "example_test_infer.py", line 106, in main
    results = generator.generate(prompts, max_gen_len=512, temperature=temperature, top_p=top_p)
  File "/data1/nieyunshuang/nys_new/LLaMA-Adapter/alpaca_finetuning_v1/llama/generation.py", line 42, in generate
    logits = self.model.forward_only(tokens[:, prev_pos:cur_pos], prev_pos)
  File "/data1/nieyunshuang/nys_new/LLaMA-Adapter/alpaca_finetuning_v1/llama/model.py", line 241, in forward_only
    h = layer(h, start_pos, freqs_cis, mask, adapter[adapter_index].half() if adapter_index >= 0 else None)
  File "/data1/nieyunshuang/nys_new/miniconda3/envs/navllmsig/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data1/nieyunshuang/nys_new/LLaMA-Adapter/alpaca_finetuning_v1/llama/model.py", line 167, in forward
    h = x + self.attention.forward(self.attention_norm(x), start_pos, freqs_cis, mask, adapter)
  File "/data1/nieyunshuang/nys_new/LLaMA-Adapter/alpaca_finetuning_v1/llama/model.py", line 106, in forward
    mask = torch.cat([extra_mask, mask], dim=-1)
TypeError: expected Tensor as element 1 in argument 0, but got NoneType
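
For reference, the TypeError is raised in the cache branch of Attention.forward (alpaca_finetuning_v1/llama/model.py, line 106), where a mask for the cached prefix is concatenated without checking whether mask is None (it is None whenever seqlen == 1). One possible guard, sketched under the assumption that extra_mask covers the cached positions and that the later score computation skips the mask when it is None (as the original LLaMA attention does), is:

# Sketch of a None-safe concatenation around line 106 of
# alpaca_finetuning_v1/llama/model.py; only extra_mask and mask are taken
# from the traceback above, the rest is an assumption about that file.
if mask is not None:
    mask = torch.cat([extra_mask, mask], dim=-1)
# With seqlen == 1 the single query token may attend to every cached key,
# so no causal mask is needed; any later "scores = scores + mask" must
# then also be guarded with "if mask is not None".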