OverflowError: math range error

Question

shivaraj1994 opened this issue 4 years ago

I am trying to train a model on the MQ2007-list dataset.

import pyltr

with open('/home/shivaraj/Downloads/MQ2007-list/Fold1/train.txt') as trainfile, \
        open('/home/shivaraj/Downloads/MQ2007-list/Fold1/vali.txt') as valifile, \
        open('/home/shivaraj/Downloads/MQ2007-list/Fold1/test.txt') as evalfile:
    TX, Ty, Tqids, T_ = pyltr.data.letor.read_dataset(trainfile)
    VX, Vy, Vqids, V_ = pyltr.data.letor.read_dataset(valifile)
    EX, Ey, Eqids, E_ = pyltr.data.letor.read_dataset(evalfile)

metric = pyltr.metrics.NDCG(k=10)

# Only needed if you want to perform validation (early stopping & trimming)
monitor = pyltr.models.monitors.ValidationMonitor(
    VX, Vy, Vqids, metric=metric, stop_after=250)

model = pyltr.models.LambdaMART(
    metric=metric,
    n_estimators=1000,
    learning_rate=0.02,
    max_features=0.5,
    query_subsample=0.5,
    max_leaf_nodes=10,
    min_samples_leaf=64,
    verbose=1,
)

model.fit(TX, Ty, Tqids, monitor=monitor)

This is the error log:

OverflowError Traceback (most recent call last)
in
16 )
17
---> 18 model.fit(TX, Ty, Tqids, monitor=monitor)

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in fit(self, X, y, qids, monitor)
199
200 n_stages = self._fit_stages(X, y, qids, y_pred,
--> 201 random_state, begin_at_stage, monitor)
202
203 if n_stages < self.estimators.shape[0]:

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in _fit_stages(self, X, y, qids, y_pred, random_state, begin_at_stage, monitor)
406 y_pred = self._fit_stage(i, X, y, qids, y_pred, sample_weight,
407 sample_mask, query_groups_to_use,
--> 408 random_state)
409
410 train_total_score, oob_total_score = 0.0, 0.0

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in _fit_stage(self, i, X, y, qids, y_pred, sample_weight, sample_mask, query_groups, random_state)
332 for qid, a, b, _ in query_groups:
333 lambdas, deltas = self._calc_lambdas_deltas(qid, y[a:b],
--> 334 y_pred[a:b])
335 all_lambdas[a:b] = lambdas
336 all_deltas[a:b] = deltas

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in _calc_lambdas_deltas(self, qid, y, y_pred)
267 actual = y[positions]
268
--> 269 swap_deltas = self.metric.calc_swap_deltas(qid, actual)
270 max_k = self.metric.max_k()
271 if max_k is None or ns < max_k:

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/metrics/dcg.py in calc_swap_deltas(self, qid, targets, coeff)
33 for j in range(i + 1, n_targets):
34 deltas[i, j] = coeff * \
---> 35 (self._gain_fn(targets[i]) - self._gain_fn(targets[j])) * \
36 (self._get_discount(j) - self._get_discount(i))
37

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/metrics/gains.py in _exp2_gain(x)
16
17 def _exp2_gain(x):
---> 18 return math.exp(x * _LOG2) - 1.0
19
20

OverflowError: math range error

Jonathan Shore · Answer 1 · Wed Aug 11 2021 20:37:59 GMT+0800 (China Standard Time)

Did you find a workaround for this? I am having the same problem as well ...

Jerry Ma · Answer 2 · Wed Aug 11 2021 20:52:42 GMT+0800 (China Standard Time)

Hello,

I don't have the full picture of which scores are included in the dataset; however, I'm guessing it is one of two possibilities:

1. Missing values in the target (solution: drop query sets with missing target values)
2. Large (1000+) values for the target (solution: pass gain_type='identity' to the NDCG constructor)
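
To make the second possibility concrete: the traceback ends in math.exp(x * _LOG2) - 1.0, which is 2^x - 1, and a Python float tops out near 1.8e308, so the call raises once x climbs somewhat above 1000. The sketch below reproduces this using only the standard library. exp2_gain copies the expression from the traceback; overflows and diagnose_targets are hypothetical helpers written for illustration and are not part of pyltr.

```python
import math

_LOG2 = math.log(2.0)


def exp2_gain(x):
    # Same expression as in the traceback: 2**x - 1, computed via math.exp.
    return math.exp(x * _LOG2) - 1.0


def overflows(x):
    """Hypothetical helper: True if exp2_gain(x) raises OverflowError."""
    try:
        exp2_gain(x)
    except OverflowError:
        return True
    return False


def diagnose_targets(y, limit=1000):
    """Hypothetical helper: count missing (NaN) and very large targets.

    Assumes y is a flat sequence of numeric relevance labels.
    """
    values = [float(v) for v in y]
    n_missing = sum(1 for v in values if math.isnan(v))
    n_large = sum(1 for v in values if not math.isnan(v) and v >= limit)
    return n_missing, n_large


if __name__ == "__main__":
    print(overflows(1000))   # a label of 1000 still fits in a float
    print(overflows(1100))   # a label of 1100 does not
    print(diagnose_targets([0.0, 2.0, float("nan"), 1500.0]))
```

On the arrays from the question, diagnose_targets(Ty) would indicate which of the two cases applies. If large labels turn out to be the cause, the fix suggested above amounts to building the metric as pyltr.metrics.NDCG(k=10, gain_type='identity').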

Jonathan Shore · Answer 3 · Fri Aug 13 2021 18:04:43 GMT+0800 (China Standard Time)

One could restructure the problem to be in log space, avoiding the exponential. This is how many numerical algorithms deal with exponentials (and potential overflow).

Jerry Ma · Answer 4 · Fri Aug 13 2021 19:02:02 GMT+0800 (China Standard Time)

That's indeed what the above bit of code does :)
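
As a rough illustration of the log-space idea raised in Answer 3 (a sketch, not a description of how pyltr computes gains): log(2^x - 1) can be written as x * ln 2 + log(1 - 2^-x), which never forms 2^x itself and so cannot overflow for large targets. The name log_exp2_gain is hypothetical, and the sketch assumes non-negative targets.

```python
import math

_LOG2 = math.log(2.0)


def log_exp2_gain(x):
    """Return log(2**x - 1) without ever computing 2**x (assumes x >= 0)."""
    if x < 0:
        raise ValueError("targets are assumed to be non-negative")
    if x == 0:
        return float("-inf")  # the gain is exactly 0, so its log is -inf
    # 2**x - 1 == 2**x * (1 - 2**-x), and -expm1(-t) equals 1 - exp(-t).
    t = x * _LOG2
    return t + math.log(-math.expm1(-t))


if __name__ == "__main__":
    print(log_exp2_gain(3))      # log of the gain for a small label
    print(log_exp2_gain(5000))   # stays finite where math.exp would overflow
```

This only covers the gain itself. The swap-delta formula in the traceback subtracts one gain from another, so a full log-space version would also need to handle those differences; that part is not shown here.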