Stargazers: 3564
Watchers: 83
Issues: 132
Forks: 760
kimiyoung/transformer-xl Issues
Why do you pass query, key, and value through the same fc_layer in transformer_xl model?
Updated
6 months ago
Why pos_seq is in descending order as the input of positional embedding?
Updated
7 months ago
Comments count
2
wrong argument order of _update_mems function!
Updated
7 months ago
Comments count
1
About Using
Updated
8 months ago
StopIteration: Caught StopIteration in replica 0 on device 0.
Updated
8 months ago
Comments count
6
[W C:\w\b\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:963] Warning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (function masked_fill__cuda)
Updated
2 years ago
How to obtain the data?
Closed
2 years ago
enwiki8 18 layer model .sh file
Updated
2 years ago
RelPartialLearnableDecoder vs RelLearnableDecoder
Closed
2 years ago
Comments count
1
Differences in DecoderLayer and RelDecoderLayers/RelPartialDecoderLayers
Closed
2 years ago
Comments count
1
Why is i-j always > 0?
Updated
3 years ago
linux or windows?
Updated
3 years ago
Comments count
1
Relative Positional Encoding
Updated
3 years ago
Comments count
1
CUBLAS_STATUS_EXECUTION_FAILED and Blas GEMM launch failed
Updated
3 years ago
Cannot get it to run
Updated
3 years ago
error
Updated
3 years ago
can you provide an example program running with Python script?
Updated
3 years ago
Comments count
1
Question: why is relative positional encoding computed with length M vs. L+M in the paper ?
Updated
3 years ago
Possibly Incorrect Calculation of Perplexity in Pytorch Implementation
Updated
3 years ago
PyTorch programs have been killed unexpectedly
Updated
3 years ago
The output of _rel_shift(...) does not conform to paper ?
Closed
3 years ago
Comments count
1
Difference between ppl and bpc
Updated
3 years ago
Copyright missing
Updated
3 years ago
Why use memory with LMShuffledIterator
Updated
3 years ago
Sin/Cos concatenation in Positional Embeddings
Updated
3 years ago
Comments count
1
Different training steps in tf and pytorch
Closed
3 years ago
Comments count
3
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
Updated
3 years ago
Can someone please tell me what dataset Transformer-XL was pre-trained on?
Updated
4 years ago
tf 2.x and python 3.x
Updated
4 years ago
Comments count
1
TF base model memory requirements
Updated
4 years ago
can not reproduce sota wikitext103 results
Closed
4 years ago
Comments count
4
Pytorch questions!
Updated
4 years ago
Comments count
1
What is the meaning of 'bsz' in mem_transformer.py?
Closed
4 years ago
fine-tune text classification?
Updated
4 years ago
Perplexity does not change with tgt_len
Updated
4 years ago
PositionalEmbedding error
Closed
4 years ago
Comments count
2
Bounty: PTB Transformer-xl
Closed
4 years ago
Comments count
2
I don't quite understand Figure 1 in the paper; could an expert explain it?
Closed
4 years ago
question on TRAIN_BSZ used in tf/scripts/text8_large_tpu.sh
Updated
4 years ago
Best settings to train Transformer-XL from scratch
Closed
4 years ago
Result of wt103_base
Updated
4 years ago
Comments count
3
qkv computation
Updated
4 years ago
what if mems is None?
Updated
4 years ago
Clarify why evaluation will be much faster?
Updated
4 years ago
Comments count
2
How mem_len affects 1-billion lm experiment result
Updated
5 years ago
Possible bug in a call?
Updated
5 years ago
Comments count
3
Extending the model for sentiment analysis
Updated
5 years ago
Short question for the critical idea in transformer-xl
Closed
5 years ago
Computing just logits/log prob without getting loss from Adaptive Softmax
Updated
5 years ago
How to speed up the inference
Closed
5 years ago
Comments count
1