lucidrains / x-transformers

A concise but complete full-attention transformer with a set of promising experimental features from various papers


`layer_mem` is unbound (when called from `ContinuousTransformerWrapper`)

amitkparekh opened this issue

  File "/opt/conda/lib/python3.11/site-packages/x_transformers/continuous.py", line 129, in forward
    x, intermediates = self.attn_layers(x, mask = mask, mems = mems, mem_masks = mem_masks, return_hiddens = True, **kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/x_transformers/x_transformers.py", line 1330, in forward
    if exists(layer_mem):
              ^^^^^^^^^
UnboundLocalError: cannot access local variable 'layer_mem' where it is not associated with a value

I got this error when trying to run a transformer with a custom ordering of attn_layers through the ContinuousTransformerWrapper.
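
For reference, a minimal setup along these lines looks roughly like the sketch below. The dimensions are arbitrary, and the `custom_layers` codes are assumptions about the undocumented feature, not documented values:

```python
import torch
from x_transformers import ContinuousTransformerWrapper, Encoder

# Rough sketch only: dims are arbitrary, and the custom_layers codes
# ('a' = self-attention, 'f' = feedforward) are assumptions about the
# undocumented feature.
model = ContinuousTransformerWrapper(
    dim_in = 32,
    dim_out = 32,
    max_seq_len = 256,
    attn_layers = Encoder(
        dim = 512,
        depth = 4,
        heads = 8,
        custom_layers = ('f', 'a', 'f', 'a', 'f', 'a', 'f', 'a')
    )
)

x = torch.randn(1, 256, 32)
mask = torch.ones(1, 256).bool()
out = model(x, mask = mask)  # with the buggy version, this call raised the UnboundLocalError above
```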

I'm not using `layer_mem`, so for my purposes I just commented out L1330-1331 in x_transformers.py. But since I'm not entirely sure what it's used for, I didn't want to submit a PR for that alone; I presume you have better insight and know a better solution.
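
For anyone else hitting this, here is a stripped-down illustration of the failure mode (not the library's actual code) and the usual kind of fix:

```python
# Illustration only: the name is bound inside the attention branch only, so a
# layer ordering that reaches the later check before any 'a' layer has run
# raises UnboundLocalError.
def run_layers(layer_types, mems = ()):
    mems = list(mems)
    for layer_type in layer_types:
        if layer_type == 'a':
            layer_mem = mems.pop(0) if len(mems) > 0 else None
        # other layer types ('f', 'c', ...) never assign layer_mem

        if layer_mem is not None:  # UnboundLocalError if no 'a' layer has run yet
            pass

run_layers(('f', 'a', 'f'))  # raises UnboundLocalError

# The usual fix is to bind the name unconditionally, e.g. `layer_mem = None`
# at the top of each loop iteration, before the branch that may overwrite it.
```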

@amitkparekh hey Amit! yes you are right and i put in a fix

that's really awesome you are using an undocumented feature! are you doing a weight tied transformer?

Thanks for the quick fix!

I'll be honest, I implemented this part of my research a while back, and I'm struggling to remember (or find my notes on) why I'm using the ContinuousTransformerWrapper over the other wrappers. I implemented this before 1.23.5.

I think it's primarily because I've embedded things outside the model, and I needed to pass several inputs through to the transformer, which is instantiated with custom_layers within the attn_layers.
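
Something roughly like this hypothetical sketch (the cross-attention `context`, the layer codes, and all dims here are just illustrative assumptions, not the actual setup):

```python
import torch
from x_transformers import ContinuousTransformerWrapper, Encoder

# Hypothetical sketch: inputs are embedded outside the wrapper and fed in as
# continuous features, and extra kwargs (here a cross-attention `context`,
# assuming `cross_attend = True`) are forwarded through to the attn_layers.
model = ContinuousTransformerWrapper(
    dim_in = 64,
    dim_out = 64,
    max_seq_len = 256,
    attn_layers = Encoder(
        dim = 512,
        depth = 4,
        heads = 8,
        cross_attend = True,
        custom_layers = ('a', 'c', 'f') * 4  # assumed codes: self-attn, cross-attn, feedforward
    )
)

embeddings = torch.randn(1, 256, 64)        # computed by a separate embedder, outside the wrapper
context = torch.randn(1, 32, 512)           # e.g. features from another encoder
out = model(embeddings, context = context)  # extra kwargs flow through to attn_layers
```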

@amitkparekh no problem!

oh i see, you are using a custom ordering of the layers, not the weight tying feature

did you hack the code so that it accepts custom modules from outside the repo? i've been meaning to get around to that

Only a little for my needs (through wrappers and such). But being able to create modules and stacks, and to combine the various possible features into some Frankenstein model through compositionality, would be brilliant, and it's something I have personally wished for! I get that refactoring this repo into that shape would probably be a massive undertaking, though.

If you're ever open to going that far, I'd be happy to help contribute!

@amitkparekh no not too hard, i've been thinking about it for a while, and it may take less than 100 loc to execute for the 80% use case

will keep you updated!
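
For a rough sense of the idea, here is a purely hypothetical sketch of a stack that accepts externally defined modules; none of these classes or signatures exist in the repo:

```python
import torch
from torch import nn

# Purely hypothetical: the only contract is that each user-supplied block maps
# (batch, seq, dim) -> (batch, seq, dim).
class MyBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim * 4),
            nn.GELU(),
            nn.Linear(dim * 4, dim)
        )

    def forward(self, x):
        return x + self.net(x)  # residual, like the built-in blocks

class SimpleStack(nn.Module):
    """Toy stand-in for an attn_layers-style stack that accepts external modules."""
    def __init__(self, layers):
        super().__init__()
        self.layers = nn.ModuleList(layers)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

stack = SimpleStack([MyBlock(512) for _ in range(6)])
out = stack(torch.randn(1, 128, 512))  # (1, 128, 512)
```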

Let me know if you need a second pair of hands! I feel like this repo is like AllenNLP but for transformer architectures, so anything that makes it easier to build custom but robust models would be amazing 🤩