jma127 / pyltr

Python learning to rank (LTR) toolkit

OverflowError: math range error

shivaraj1994 opened this issue

I am trying to train using the MQ2007-list dataset.

import pyltr

with open('/home/shivaraj/Downloads/MQ2007-list/Fold1/train.txt') as trainfile, \
        open('/home/shivaraj/Downloads/MQ2007-list/Fold1/vali.txt') as valifile, \
        open('/home/shivaraj/Downloads/MQ2007-list/Fold1/test.txt') as evalfile:
    TX, Ty, Tqids, T_ = pyltr.data.letor.read_dataset(trainfile)
    VX, Vy, Vqids, V_ = pyltr.data.letor.read_dataset(valifile)
    EX, Ey, Eqids, E_ = pyltr.data.letor.read_dataset(evalfile)

metric = pyltr.metrics.NDCG(k=10)

# Only needed if you want to perform validation (early stopping & trimming)
monitor = pyltr.models.monitors.ValidationMonitor(
    VX, Vy, Vqids, metric=metric, stop_after=250)

model = pyltr.models.LambdaMART(
    metric=metric,
    n_estimators=1000,
    learning_rate=0.02,
    max_features=0.5,
    query_subsample=0.5,
    max_leaf_nodes=10,
    min_samples_leaf=64,
    verbose=1,
)

model.fit(TX, Ty, Tqids, monitor=monitor)

This is the error log:

OverflowError Traceback (most recent call last)
in
16 )
17
---> 18 model.fit(TX, Ty, Tqids, monitor=monitor)

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in fit(self, X, y, qids, monitor)
199
200 n_stages = self._fit_stages(X, y, qids, y_pred,
--> 201 random_state, begin_at_stage, monitor)
202
203 if n_stages < self.estimators_.shape[0]:

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in _fit_stages(self, X, y, qids, y_pred, random_state, begin_at_stage, monitor)
406 y_pred = self._fit_stage(i, X, y, qids, y_pred, sample_weight,
407 sample_mask, query_groups_to_use,
--> 408 random_state)
409
410 train_total_score, oob_total_score = 0.0, 0.0

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in _fit_stage(self, i, X, y, qids, y_pred, sample_weight, sample_mask, query_groups, random_state)
332 for qid, a, b, _ in query_groups:
333 lambdas, deltas = self._calc_lambdas_deltas(qid, y[a:b],
--> 334 y_pred[a:b])
335 all_lambdas[a:b] = lambdas
336 all_deltas[a:b] = deltas

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in _calc_lambdas_deltas(self, qid, y, y_pred)
267 actual = y[positions]
268
--> 269 swap_deltas = self.metric.calc_swap_deltas(qid, actual)
270 max_k = self.metric.max_k()
271 if max_k is None or ns < max_k:

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/metrics/dcg.py in calc_swap_deltas(self, qid, targets, coeff)
33 for j in range(i + 1, n_targets):
34 deltas[i, j] = coeff * \
---> 35 (self._gain_fn(targets[i]) - self._gain_fn(targets[j])) * \
36 (self._get_discount(j) - self._get_discount(i))
37

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/metrics/gains.py in _exp2_gain(x)
16
17 def _exp2_gain(x):
---> 18 return math.exp(x * _LOG2) - 1.0
19
20

OverflowError: math range error

Did you find a workaround for this? I am having the same problem as well ...

Hello,

I don't have the full picture on what scores are included in the dataset; however, I'm guessing one of two possibilities:

  • Missing values in the target (solution: drop querysets with missing target values)
  • Large (1000+) values for the target, which overflow the exponential gain 2^x - 1 seen in the traceback (solution: pass gain_type='identity' to the NDCG constructor)
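
To tell which of the two possibilities above applies, one could inspect the targets before fitting. The following is only a rough sketch: `summarize_targets` and `drop_queries_with_missing` are hypothetical helpers (not part of pyltr), the 1000 threshold is an assumed cutoff taken from the "1000+" estimate, and plain Python lists stand in for the arrays that `read_dataset` returns so the snippet has no dependencies. `exp2_gain` copies the formula shown in the traceback.

```python
import math

_LOG2 = math.log(2.0)


def exp2_gain(x):
    # Same formula as _exp2_gain in the traceback: 2**x - 1.
    return math.exp(x * _LOG2) - 1.0


def summarize_targets(y, large=1000.0):
    # Hypothetical helper: count missing (NaN) and very large targets.
    # `large` is an assumed cutoff, not a pyltr constant.
    missing = sum(1 for v in y if math.isnan(v))
    too_large = sum(1 for v in y if not math.isnan(v) and v >= large)
    return missing, too_large


def drop_queries_with_missing(X, y, qids):
    # Hypothetical helper: remove every query that has at least one
    # missing target, keeping rows, targets and qids aligned.
    bad = {q for q, v in zip(qids, y) if math.isnan(v)}
    keep = [i for i, q in enumerate(qids) if q not in bad]
    return ([X[i] for i in keep],
            [y[i] for i in keep],
            [qids[i] for i in keep])


# Toy data: query q1 has a missing target, query q2 has a huge one.
X = [[0.1], [0.2], [0.3], [0.4]]
y = [1.0, float("nan"), 2.0, 1500.0]
qids = ["q1", "q1", "q2", "q2"]

print(summarize_targets(y))
X2, y2, qids2 = drop_queries_with_missing(X, y, qids)
print(qids2)

try:
    exp2_gain(1500.0)
except OverflowError as err:
    # A target this large reproduces the error from the issue.
    print("OverflowError:", err)
```

If the count of very large targets is non-zero, the suggestion in the reply amounts to constructing the metric as `pyltr.metrics.NDCG(k=10, gain_type='identity')`, which avoids the exponential altogether.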

One could restructure the problem to be in log space, avoiding the exponential. This is how many numerical algorithms deal with exponentials (and potential overflow).

That's indeed what the above bit of code does :)
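
For readers who want to see the log-space idea spelled out, here is a minimal standalone sketch. It is not pyltr code: `log_exp2_gain`, `log_dcg` and `ndcg` are hypothetical names, the 1/log2(rank + 1) discount is an assumption, and truncation at `k` is left out. It computes log(2^x - 1) without ever forming 2^x, then combines the terms with the log-sum-exp trick so that only the final ratio DCG/IDCG is exponentiated.

```python
import math

_LOG2 = math.log(2.0)
_NEG_INF = float("-inf")


def log_exp2_gain(x):
    # log(2**x - 1) without forming 2**x; a gain of zero maps to -inf.
    if x <= 0:
        return _NEG_INF
    # 2**x - 1 = 2**x * (1 - 2**-x), so
    # log(2**x - 1) = x*ln(2) + log(1 - 2**-x).
    return x * _LOG2 + math.log(-math.expm1(-x * _LOG2))


def log_dcg(targets):
    # log of sum_i (2**t_i - 1) / log2(i + 2), using log-sum-exp.
    terms = [log_exp2_gain(t) - math.log(math.log2(i + 2))
             for i, t in enumerate(targets)]
    peak = max(terms)
    if peak == _NEG_INF:
        return _NEG_INF
    # Subtracting the peak keeps every exponent at or below zero.
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def ndcg(targets):
    # NDCG of `targets` in their given (predicted) order, no cutoff.
    ideal = log_dcg(sorted(targets, reverse=True))
    if ideal == _NEG_INF:
        return 0.0
    return math.exp(log_dcg(targets) - ideal)


print(ndcg([1.0, 3.0, 2.0]))    # small labels
print(ndcg([1200.0, 1500.0]))   # labels that would overflow 2**x
```

This only covers the metric value itself. The swap deltas in `calc_swap_deltas` involve differences of gains, which would need their own rescaling (for example, dividing by the ideal DCG in log space before taking differences).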