st-tech / zr-obp

Open Bandit Pipeline: a Python library for bandit algorithms and off-policy evaluation

alpha_ and lambda_ are not necessary for contextual linear bandit algorithms

Kurorororo opened this issue

Currently, contextual linear and logistic bandit algorithms share the same superclass BaseContextualPolicy.
The constructor of BaseContextualPolicy takes alpha_ and lambda_ as arguments:

zr-obp/obp/policy/base.py

Lines 93 to 129 in c9ad20c

@dataclass
class BaseContextualPolicy(metaclass=ABCMeta):
    """Base class for contextual bandit policies.

    Parameters
    ----------
    dim: int
        Number of dimensions of context vectors.

    n_actions: int
        Number of actions.

    len_list: int, default=1
        Length of a list of actions recommended in each impression.
        When Open Bandit Dataset is used, 3 should be set.

    batch_size: int, default=1
        Number of samples used in a batch parameter update.

    alpha_: float, default=1.
        Prior parameter for the online logistic regression.

    lambda_: float, default=1.
        Regularization hyperparameter for the online logistic regression.

    random_state: int, default=None
        Controls the random seed in sampling actions.
    """

    dim: int
    n_actions: int
    len_list: int = 1
    batch_size: int = 1
    alpha_: float = 1.0
    lambda_: float = 1.0
    random_state: Optional[int] = None

These arguments initialize self.alpha_list and self.lambda_list, which LogisticEpsilonGreedy, LogisticTS, and LogisticUCB use but LinearEpsilonGreedy, LinTS, and LinUCB do not.
I suggest moving alpha_, lambda_, self.alpha_list, and self.lambda_list to a separate class (for example, BaseLogisticPolicy) and making the logistic policies inherit from this new class.
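For concreteness, here is a minimal sketch of that refactoring. The class name BaseLogisticPolicy comes from the suggestion above, but the __post_init__ bodies and the np.ones-based list construction are illustrative assumptions, not the library's exact code:

from abc import ABCMeta
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class BaseContextualPolicy(metaclass=ABCMeta):
    """Base class shared by linear and logistic contextual policies."""

    dim: int
    n_actions: int
    len_list: int = 1
    batch_size: int = 1
    random_state: Optional[int] = None

    def __post_init__(self) -> None:
        # Common initialization only; no logistic-specific hyperparameters.
        self.random_ = np.random.RandomState(self.random_state)


@dataclass
class BaseLogisticPolicy(BaseContextualPolicy):
    """Adds hyperparameters for the online logistic regression (sketch)."""

    alpha_: float = 1.0
    lambda_: float = 1.0

    def __post_init__(self) -> None:
        super().__post_init__()
        # Per-action hyperparameter lists, as described above
        # (assumed construction; the real code may differ).
        self.alpha_list = self.alpha_ * np.ones(self.n_actions)
        self.lambda_list = self.lambda_ * np.ones(self.n_actions)

With this layout, LogisticEpsilonGreedy, LogisticTS, and LogisticUCB would inherit from BaseLogisticPolicy, while LinearEpsilonGreedy, LinTS, and LinUCB would keep inheriting from BaseContextualPolicy and no longer receive arguments they never use.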

@Kurorororo Can you fix this issue yourself after finishing #78?

Sure!