lalalune / arcprize


data leakage?

srikanthsrnvs opened this issue · comments

src = src.to(device)
output = model(src)
target = src[:, model.num_context_tokens:].reshape(-1)
loss = criterion(output.view(-1, num_tokens + 1), target.view(-1))

Your model looks at the entire source?
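For reference, a leak-free next-token setup shifts inputs and targets by one position, so the model is never conditioned on the very token it is scored against. A minimal sketch (the shapes and vocab size here are made up for illustration):

```python
import torch

batch, seq_len, vocab = 2, 8, 11
src = torch.randint(0, vocab, (batch, seq_len))

# Standard next-token prediction: input and target are the same sequence
# offset by one, so position t is only ever scored against token t+1.
inp = src[:, :-1]      # positions 0 .. T-2 fed to the model
target = src[:, 1:]    # positions 1 .. T-1 used as labels
```

With a causal mask on `inp`, position t can only see tokens 0..t, so nothing it must predict is visible to it.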

even so, I think you have the right idea here - I want to try this challenge now!

Splitting the prize evenly with anyone who fixes it and gets us there, if you want to join :) I'm spatialweeb on twitter

Yeah, I DMed you - I'm gonna take a crack at it tonight.

I tried fixing this. I am training another model.

The attention mask is a simple upper-triangular one, but we also had to have a correct padding mask since we have so many padding tokens. I might have gotten these mixed up a bit.
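The two masks go to different arguments, which makes them easy to mix up. A minimal sketch using PyTorch's `nn.TransformerEncoderLayer` (the `PAD` id, `d_model`, and shapes here are hypothetical):

```python
import torch
import torch.nn as nn

PAD = 0                                     # hypothetical padding token id
seq = torch.tensor([[5, 3, 7, PAD, PAD]])   # (batch=1, T=5)
T = seq.size(1)

# Causal mask: True above the diagonal means "position j > i is blocked".
causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
# Padding mask: True wherever the token is padding.
padding = seq.eq(PAD)                       # (batch, T)

# Separate keyword arguments: src_mask is (T, T), src_key_padding_mask is (B, T).
layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
emb = torch.randn(1, T, 16)
out = layer(emb, src_mask=causal, src_key_padding_mask=padding)
```

Swapping the two silently changes which positions can attend to which, without raising a shape error in the 1-batch case, so it's worth asserting the shapes explicitly.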

I think most of the bugs are in the eval portion now, i.e. if there is data leakage it's because the eval data is on the GPU and isn't forward-masked. Someone also suggested it could be teacher forcing: every wrong prediction is replaced by the ground-truth token for the next step, so outputs can't go wildly off. That seems fine for training but not true to the spirit of the challenge.
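The fix for the teacher-forcing concern at eval time is free-running generation: feed the model's own prediction back in rather than the ground truth. A runnable sketch with a toy stand-in for the model (a real run would use the transformer, not `toy_model`):

```python
import torch

vocab = 10

# Toy "model": deterministically predicts (last token + 1) mod vocab,
# just so the decoding loop below is runnable end to end.
def toy_model(tokens):                      # tokens: (T,)
    logits = torch.zeros(vocab)
    logits[(tokens[-1].item() + 1) % vocab] = 1.0
    return logits

context = torch.tensor([1, 2, 3])
generated = context.clone()
for _ in range(4):
    logits = toy_model(generated)
    next_tok = int(logits.argmax())         # feed the model's OWN prediction back
    generated = torch.cat([generated, torch.tensor([next_tok])])
# generated is now [1, 2, 3, 4, 5, 6, 7]
```

Under free-running decoding an early mistake propagates, which is exactly the behavior the challenge wants to measure.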

interesting approach, probably a simple/dumb q but any plans to add cross-validation/overlap checking?

Okay, I fixed this; now we're not converging.

Here's where I'm struggling conceptually.

We have an autoregressive transformer with a causal attention mask that sets all forward (future) positions to -infinity. I asked ChatGPT and it said I was doing the masking wrong and suggested a simpler fix that looked right: a plain upper-triangular mask.
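The upper-triangular -infinity mask ChatGPT suggested can be built in a couple of lines; this sketch also checks that softmax actually zeroes out the masked positions:

```python
import torch

T = 4
# Strictly-future entries (column j > row i) are set to -inf, so after
# softmax they receive exactly zero attention weight.
mask = torch.triu(torch.full((T, T), float('-inf')), diagonal=1)

scores = torch.zeros(T, T) + mask          # pretend raw attention scores are all zero
weights = torch.softmax(scores, dim=-1)
# Row 0 attends only to itself; row 3 attends uniformly to all 4 positions.
```

If this mask is applied during eval as well as training, the "eval data on the GPU" worry goes away: future tokens can sit in memory but contribute nothing to any prediction.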