lalalune / arcprize


data leakage?

srikanthsrnvs opened this issue · comments

src = src.to(device)
output = model(src)
target = src[:, model.num_context_tokens:].reshape(-1)
loss = criterion(output.view(-1, num_tokens + 1), target.view(-1))

Your model looks at the entire source?
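For reference, a leak-free next-token setup shifts inputs and targets by one position, so the model is never conditioned on the very token it is scored against. A minimal sketch (the shapes and vocab size here are made up for illustration):

```python
import torch

batch, seq_len, vocab = 2, 8, 11
src = torch.randint(0, vocab, (batch, seq_len))

# Standard next-token prediction: input and target are the same sequence
# offset by one, so position t is only ever scored against token t+1.
inp = src[:, :-1]      # positions 0 .. T-2 fed to the model
target = src[:, 1:]    # positions 1 .. T-1 used as labels
```

With a causal mask on `inp`, position t can only see tokens 0..t, so nothing it must predict is visible to it.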

even so, I think you have the right idea here - I want to try this challenge now!

Splitting the prize evenly with anyone who fixes it and gets us there, if you want to join :) I'm spatialweeb on twitter

Yeah, I DMed you - I'm gonna take a crack at it tonight.

I tried fixing this. I am training another model.

The attention mask is a simple upper-triangular one, but we also had to have a correct padding mask since we have so many padding tokens. I might have gotten these mixed up a bit.
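The two masks go to different arguments, which makes them easy to mix up. A minimal sketch using PyTorch's `nn.TransformerEncoderLayer` (the `PAD` id, `d_model`, and shapes here are hypothetical):

```python
import torch
import torch.nn as nn

PAD = 0                                     # hypothetical padding token id
seq = torch.tensor([[5, 3, 7, PAD, PAD]])   # (batch=1, T=5)
T = seq.size(1)

# Causal mask: True above the diagonal means "position j > i is blocked".
causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
# Padding mask: True wherever the token is padding.
padding = seq.eq(PAD)                       # (batch, T)

# Separate keyword arguments: src_mask is (T, T), src_key_padding_mask is (B, T).
layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
emb = torch.randn(1, T, 16)
out = layer(emb, src_mask=causal, src_key_padding_mask=padding)
```

Swapping the two silently changes which positions can attend to which, without raising a shape error in the 1-batch case, so it's worth asserting the shapes explicitly.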

I think most of the bugs are in the eval portion now, i.e. if there is data leakage it's because the eval data is on the GPU and isn't forward-masked. Someone also suggested it could be teacher forcing: every wrong prediction is replaced by the ground-truth token for the next step, so outputs can't go wildly off. That seems fine for training but not true to the spirit of the challenge.
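The fix for the teacher-forcing concern at eval time is free-running generation: feed the model's own prediction back in rather than the ground truth. A runnable sketch with a toy stand-in for the model (a real run would use the transformer, not `toy_model`):

```python
import torch

vocab = 10

# Toy "model": deterministically predicts (last token + 1) mod vocab,
# just so the decoding loop below is runnable end to end.
def toy_model(tokens):                      # tokens: (T,)
    logits = torch.zeros(vocab)
    logits[(tokens[-1].item() + 1) % vocab] = 1.0
    return logits

context = torch.tensor([1, 2, 3])
generated = context.clone()
for _ in range(4):
    logits = toy_model(generated)
    next_tok = int(logits.argmax())         # feed the model's OWN prediction back
    generated = torch.cat([generated, torch.tensor([next_tok])])
# generated is now [1, 2, 3, 4, 5, 6, 7]
```

Under free-running decoding an early mistake propagates, which is exactly the behavior the challenge wants to measure.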

interesting approach, probably a simple/dumb q but any plans to add cross-validation/overlap checking?

Okay, I fixed this; now we're not converging.

Here's where I'm struggling conceptually.

We have an autoregressive transformer with a causal attention mask that sets all forward (future) positions to -infinity. I asked ChatGPT and it said I was doing the masking wrong and suggested a simpler fix that looked right: a plain upper-triangular mask.
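The upper-triangular -infinity mask ChatGPT suggested can be built in a couple of lines; this sketch also checks that softmax actually zeroes out the masked positions:

```python
import torch

T = 4
# Strictly-future entries (column j > row i) are set to -inf, so after
# softmax they receive exactly zero attention weight.
mask = torch.triu(torch.full((T, T), float('-inf')), diagonal=1)

scores = torch.zeros(T, T) + mask          # pretend raw attention scores are all zero
weights = torch.softmax(scores, dim=-1)
# Row 0 attends only to itself; row 3 attends uniformly to all 4 positions.
```

If this mask is applied during eval as well as training, the "eval data on the GPU" worry goes away: future tokens can sit in memory but contribute nothing to any prediction.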