BatsResearch / bonito

A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.


What is the bare-bones template for Bonito?

pacozaa opened this issue · comments

I would like to use this model with Ollama or llama.cpp, but first I would like a bare-bones explanation of Bonito's prompt template. Would you mind giving a short explanation?

Ah ha, your paper explains it!

<|tasktype|>
Yes-no question answering
<|context|>
Zinedine Zidane -- After retiring as a player, Zidane
transitioned into coaching, becoming assistant coach at
Real Madrid… after the victory, he resigned as Real
Madrid coach.
<|task|>
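So if I read that right, the prompt is just three tagged sections concatenated, and the model's completion after <|task|> is the generated task. Here is a minimal sketch of how I would build the prompt by hand (build_bonito_prompt is my own hypothetical helper, nothing from the library, just plain string formatting):

def build_bonito_prompt(task_type: str, context: str) -> str:
    # Concatenate the three tagged sections in order:
    # task type, context passage, then the open <|task|> marker.
    return (
        "<|tasktype|>\n" + task_type.strip()
        + "\n<|context|>\n" + context.strip()
        + "\n<|task|>\n"
    )

print(build_bonito_prompt(
    "yes-no question answering",
    "Zinedine Zidane -- After retiring as a player, Zidane "
    "transitioned into coaching...",
))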

I still wouldn't mind more explanation and examples, though.

Cheers!

You are right. We have included the template in the paper. We have also included the preprocessing step in abstract.py. Please look at the following lines of code.

def _prepare_bonito_input(
    self, context_dataset: Dataset, task_type: str, context_col: str, **kwargs
) -> Dataset:
    """
    Prepares the input for the Bonito model.

    This method takes a context dataset, a task type, and a context
    column name, and prepares the dataset for the Bonito model.
    If the task type is not recognized, it raises a ValueError.

    Args:
        context_dataset (Dataset): The dataset that provides the
            context for the task.
        task_type (str): The type of the task. This can be a
            short form or a full form. If the task type is not
            recognized, a ValueError is raised.
        context_col (str): The name of the column in the dataset
            that provides the context for the task.
        **kwargs: Additional keyword arguments.

    Returns:
        Dataset: The prepared dataset for the Bonito model.
    """
    # get the task type name
    if task_type in SHORTFORM_TO_FULL_TASK_TYPES.values():
        full_task_type = task_type
    elif task_type in SHORTFORM_TO_FULL_TASK_TYPES:
        full_task_type = SHORTFORM_TO_FULL_TASK_TYPES[task_type]
    else:
        raise ValueError(f"Task type {task_type} not recognized")

    def process(example):
        input_text = "<|tasktype|>\n" + full_task_type.strip()
        input_text += (
            "\n<|context|>\n" + example[context_col].strip() + "\n<|task|>\n"
        )
        return {
            "input": input_text,
        }

    return context_dataset.map(
        process,
        remove_columns=context_dataset.column_names,
        num_proc=kwargs.get("num_proc", 1),
    )
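On the output side, the model generates the task input and the task output separated by the <|pipe|> marker, and the postprocessing splits that into an (input, output) pair. A rough sketch of that step (split_bonito_generation is a simplified stand-in, not the library's actual API; see the postprocessing in abstract.py for the real thing):

def split_bonito_generation(generation: str):
    # The generated task comes back as "<input><|pipe|><output>";
    # generations without exactly one marker are treated as malformed.
    pair = generation.split("<|pipe|>")
    if len(pair) != 2:
        return None
    return {"input": pair[0].strip(), "output": pair[1].strip()}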

Hope this helps! 😄

In case anyone stumbles on this, here is an Ollama model you can run: https://ollama.com/pacozaa/bonito

And here is an article on quantizing the PyTorch model and converting it to GGUF so it runs on Ollama:
https://medium.com/@sarinsuriyakoon/convert-pytorch-model-to-quantize-gguf-to-run-on-ollama-5c5dbc458208
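If you go the Ollama route, you can send the same prompt to the local REST API. A minimal sketch, assuming Ollama is running on its default port (11434) and you have pulled pacozaa/bonito; the request shape is Ollama's standard /api/generate endpoint, nothing Bonito-specific:

import requests

prompt = (
    "<|tasktype|>\nyes-no question answering\n"
    "<|context|>\nZinedine Zidane -- After retiring as a player, Zidane "
    "transitioned into coaching, becoming assistant coach at Real Madrid...\n"
    "<|task|>\n"
)

# stream=False makes Ollama return a single JSON object instead of a stream.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "pacozaa/bonito", "prompt": prompt, "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])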