st-tech / zr-obp

Open Bandit Pipeline: a Python library for bandit algorithms and off-policy evaluation

alpha_ and lambda_ are not necessary for contextual linear bandit algorithms

Kurorororo opened this issue

Currently, contextual linear and logistic bandit algorithms share the same superclass BaseContextualPolicy.
The constructor of BaseContextualPolicy takes alpha_ and lambda_ as arguments:

zr-obp/obp/policy/base.py

Lines 93 to 129 in c9ad20c

@dataclass
class BaseContextualPolicy(metaclass=ABCMeta):
    """Base class for contextual bandit policies.

    Parameters
    ----------
    dim: int
        Number of dimensions of context vectors.

    n_actions: int
        Number of actions.

    len_list: int, default=1
        Length of a list of actions recommended in each impression.
        When Open Bandit Dataset is used, 3 should be set.

    batch_size: int, default=1
        Number of samples used in a batch parameter update.

    alpha_: float, default=1.
        Prior parameter for the online logistic regression.

    lambda_: float, default=1.
        Regularization hyperparameter for the online logistic regression.

    random_state: int, default=None
        Controls the random seed in sampling actions.
    """

    dim: int
    n_actions: int
    len_list: int = 1
    batch_size: int = 1
    alpha_: float = 1.0
    lambda_: float = 1.0
    random_state: Optional[int] = None

These arguments initialize self.alpha_list and self.lambda_list, which LogisticEpsilonGreedy, LogisticTS, and LogisticUCB use but LinearEpsilonGreedy, LinTS, and LinUCB do not.
I suggest moving alpha_, lambda_, self.alpha_list, and self.lambda_list to a separate class (for example, BaseLogisticPolicy) and making the logistic policies inherit from this new class.
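For concreteness, here is a minimal sketch of that refactoring. The class name BaseLogisticPolicy comes from the suggestion above, but the __post_init__ bodies and the np.ones-based list construction are illustrative assumptions, not the library's exact code:

from abc import ABCMeta
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class BaseContextualPolicy(metaclass=ABCMeta):
    """Base class shared by linear and logistic contextual policies."""

    dim: int
    n_actions: int
    len_list: int = 1
    batch_size: int = 1
    random_state: Optional[int] = None

    def __post_init__(self) -> None:
        # Common initialization only; no logistic-specific hyperparameters.
        self.random_ = np.random.RandomState(self.random_state)


@dataclass
class BaseLogisticPolicy(BaseContextualPolicy):
    """Adds hyperparameters for the online logistic regression (sketch)."""

    alpha_: float = 1.0
    lambda_: float = 1.0

    def __post_init__(self) -> None:
        super().__post_init__()
        # Per-action hyperparameter lists, as described above
        # (assumed construction; the real code may differ).
        self.alpha_list = self.alpha_ * np.ones(self.n_actions)
        self.lambda_list = self.lambda_ * np.ones(self.n_actions)

With this layout, LogisticEpsilonGreedy, LogisticTS, and LogisticUCB would inherit from BaseLogisticPolicy, while LinearEpsilonGreedy, LinTS, and LinUCB would keep inheriting from BaseContextualPolicy and no longer receive arguments they never use.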

@Kurorororo Can you fix this issue yourself after finishing #78?

Sure!