kimiyoung/transformer-xl Issues

Why do you pass query, key, and value through the same fc_layer in transformer_xl model?
Updated 6 months ago
Why is pos_seq in descending order as the input to the positional embedding?
Updated 7 months ago (2 comments)
Wrong argument order in the _update_mems function!
Updated 7 months ago (1 comment)
About Using
Updated 8 months ago
StopIteration: Caught StopIteration in replica 0 on device 0.
Updated 8 months ago (6 comments)
[W C:\w\b\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:963] Warning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (function masked_fill__cuda)
Updated 2 years ago
How to obtain the data?
Closed 2 years ago
enwik8 18-layer model .sh file
Updated 2 years ago
RelPartialLearnableDecoder vs RelLearnableDecoder
Closed 2 years ago (1 comment)
Differences in DecoderLayer and RelDecoderLayers/RelPartialDecoderLayers
Closed 2 years ago (1 comment)
Why is i-j always > 0?
Updated 3 years ago
Linux or Windows?
Updated 3 years ago (1 comment)
Relative Positional Encoding
Updated 3 years ago (1 comment)
CUBLAS_STATUS_EXECUTION_FAILED and Blas GEMM launch failed
Updated 3 years ago
Can't get it to run
Updated 3 years ago
error
Updated 3 years ago
Can you provide an example program that runs as a Python script?
Updated 3 years ago (1 comment)
Question: why is relative positional encoding computed with length M vs. L+M in the paper?
Updated 3 years ago
Possibly Incorrect Calculation of Perplexity in Pytorch Implementation
Updated 3 years ago
PyTorch programs are killed unexpectedly
Updated 3 years ago
The output of _rel_shift(...) does not conform to the paper?
Closed 3 years ago (1 comment)
Difference between ppl and bpc
Updated 3 years ago
Copyright missing
Updated 3 years ago
Why use memory with LMShuffledIterator?
Updated 3 years ago
Sin/Cos concatenation in Positional Embeddings
Updated 3 years ago (1 comment)
Different training steps in TF and PyTorch
Closed 3 years ago (3 comments)
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
Updated 3 years ago
Can someone please tell me what dataset Transformer-XL was pre-trained on?
Updated 4 years ago
TF 2.x and Python 3.x
Updated 4 years ago (1 comment)
TF base model memory requirements
Updated 4 years ago
Cannot reproduce SOTA WikiText-103 results
Closed 4 years ago (4 comments)
Pytorch questions!
Updated 4 years ago (1 comment)
What is the meaning of 'bsz' in mem_transformer.py?
Closed 4 years ago
Fine-tuning for text classification?
Updated 4 years ago
Perplexity does not change with tgt_len
Updated 4 years ago
PositionalEmbedding error
Closed 4 years ago (2 comments)
Bounty: PTB Transformer-xl
Closed 4 years ago (2 comments)
I don't quite understand Figure 1 in the paper. Could an expert explain it?
Closed 4 years ago
question on TRAIN_BSZ used in tf/scripts/text8_large_tpu.sh
Updated 4 years ago
Best settings to train Transformer-XL from scratch
Closed 4 years ago
Result of wt103_base
Updated 4 years ago (3 comments)
qkv computation
Updated 4 years ago
what if mems is None?
Updated 4 years ago
Clarify why evaluation will be much faster?
Updated 4 years ago (2 comments)
How mem_len affects 1-billion lm experiment result
Updated 5 years ago
Possible bug in a call?
Updated 5 years ago (3 comments)
Extending the model for sentiment analysis
Updated 5 years ago
Short question for the critical idea in transformer-xl
Closed 5 years ago
Computing just logits/log prob without getting loss from Adaptive Softmax
Updated 5 years ago
How to speed up the inference
Closed 5 years ago (1 comment)
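Several of the issues above ("Difference between ppl and bpc", "Possibly Incorrect Calculation of Perplexity in Pytorch Implementation") concern the two evaluation metrics. Word-level benchmarks such as WikiText-103 are reported in perplexity (ppl), and character-level benchmarks such as enwik8 and text8 are reported in bits per character (bpc). Both are derived from the same quantity, the mean cross-entropy loss in nats. The following is a minimal sketch of the standard definitions, not code taken from the repository. The function names and the example loss value are hypothetical.

```python
import math


def perplexity(mean_nll_nats: float) -> float:
    """Perplexity from the mean per-token negative log-likelihood (nats)."""
    return math.exp(mean_nll_nats)


def bits_per_character(mean_nll_nats: float) -> float:
    """Bits per character from the mean per-character NLL (nats).

    Dividing by ln(2) converts nats to bits.
    """
    return mean_nll_nats / math.log(2)


def bpc_to_char_perplexity(bpc: float) -> float:
    """Per-character perplexity implied by a bpc value (2 ** bpc)."""
    return 2.0 ** bpc


if __name__ == "__main__":
    loss = 1.3  # hypothetical mean cross-entropy in nats
    print(f"ppl = {perplexity(loss):.3f}")
    print(f"bpc = {bits_per_character(loss):.3f}")
    # Both metrics describe the same loss: 2 ** bpc equals exp(loss).
    print(f"2 ** bpc = {bpc_to_char_perplexity(bits_per_character(loss)):.3f}")
```

Because both metrics are monotonic in the same loss, a ppl figure and a bpc figure are only comparable when they are measured over the same unit (tokens vs. characters).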