lucidrains / x-transformers

A simple but complete full-attention transformer with a set of promising experimental features from various papers

[Feature request] support num_memory_tokens in ContinuousTransformerWrapper

pfeatherstone opened this issue

Can we support either num_memory_tokens or null key/value in ContinuousTransformerWrapper please?

@pfeatherstone @hugofloresgarcia you can already use null key values by setting attn_num_mem_kv = {num null k/v} on either the Encoder or Decoder
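For reference, the null key/value route looks roughly like this (a minimal sketch; the dims and the attn_num_mem_kv count are illustrative, not taken from this thread):

import torch
from x_transformers import ContinuousTransformerWrapper, Decoder

model = ContinuousTransformerWrapper(
    dim_in      = 4,
    dim_out     = 4,
    max_seq_len = 1024,
    attn_layers = Decoder(
        dim             = 512,
        depth           = 4,
        heads           = 4,
        attn_num_mem_kv = 2     # learned memory (null) key/value pairs added to every attention layer
    )
)

x   = torch.randn(1, 1024, 4)   # (batch, seq, dim_in)
out = model(x)                  # (1, 1024, 4)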

yup i can add it

wow, the continuous wrapper is very popular! had no idea

I think there is a bug. I'll knock up a quick repro:

import torch
from x_transformers import ContinuousTransformerWrapper, Decoder

lm = ContinuousTransformerWrapper(
    dim_in              = 4,
    dim_out             = 256 + 3,
    max_seq_len         = 0,    # rotary positions are used, so no absolute positional embedding is needed
    num_memory_tokens   = 20,   # the option this issue asks for; triggers the bug below
    attn_layers = Decoder(
        dim = 512,
        depth = 4,
        heads = 4,
        rotary_pos_emb  = True,
        attn_flash      = True,
        use_scalenorm   = True,
        attn_onnxable   = True,
        shift_tokens    = 1
    )
)

x   = torch.randn(2, 1024, 4)                                   # (batch, seq, dim_in)
l   = torch.randint(100, x.shape[1], size=(x.shape[0],))        # random valid length per sample
m   = torch.arange(x.shape[1]).unsqueeze(0) < l.unsqueeze(-1)   # boolean padding mask, (batch, seq)
out = lm(x, mask = m)

I'll file a new bug

@pfeatherstone oh oops, yup, should be fixed in 1.23.4