facebookresearch/adaptive-span: Transformer training code for sequential tasks
Stargazers: 605 · Watchers: 17 · Issues: 21 · Forks: 60
facebookresearch/adaptive-span Issues
The way you preprocess data is different from that of Transformer-XL · Closed 5 years ago · 5 comments
A question about parameter z_t · Closed 2 years ago · 9 comments
Understanding adaptive-span loss · Closed 2 years ago · 7 comments (see the sketch below this list)
Generate text · Closed 2 years ago · 1 comment
What does batch-size mean when using distributed training? · Closed 2 years ago · 1 comment
Accept a mask to remove padding in batch · Closed 2 years ago · 1 comment
Confuse · Closed 2 years ago · 1 comment
What does cache_size mean? · Closed 2 years ago · 1 comment
Where to find the pretrained checkpoint? · Closed 2 years ago · 1 comment
Why does the hyper-parameter --batch-sz affect the bpc during evaluation? · Closed 2 years ago · 3 comments
Please convert to a permissive license · Updated 4 years ago
Understanding graphs from papers · Updated 4 years ago
BPC · Closed 4 years ago · 6 comments
Warning with PyTorch 1.4 · Closed 4 years ago · 4 comments
Queries about adaptive span · Closed 4 years ago · 1 comment
Compute attention span of individual attention heads · Closed 4 years ago · 1 comment
Will adaptive-span have faster predictive speeds than GPT-2? · Closed 4 years ago · 2 comments
Why not compare other local attention methods? · Closed 5 years ago · 2 comments
Did you try to start with the maximum possible cache size? · Closed 5 years ago · 2 comments
Question: How to reduce the memory in this project · Closed 5 years ago · 7 comments
Can using a mask reduce FLOPs? · Closed 5 years ago · 2 comments
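Several of the issues above (A question about parameter z_t, Understanding adaptive-span loss, Compute attention span of individual attention heads) circle around the soft span mask from the Adaptive Attention Span paper that this repository implements. Below is a minimal PyTorch sketch of that mask, m_z(x) = clamp((R + z - x) / R, 0, 1), applied to post-softmax attention weights and renormalized; the class and parameter names (SoftSpanMask, max_span, ramp, lambda_span) are illustrative assumptions, not the repository's actual API.

import torch
import torch.nn as nn

class SoftSpanMask(nn.Module):
    """Soft span mask m_z(x) = clamp((R + z - x) / R, 0, 1).

    Illustrative sketch of the technique from the paper; names are
    assumptions, not the adaptive-span repository's actual API.
    """

    def __init__(self, max_span: int, ramp: int = 32):
        super().__init__()
        self.max_span = max_span  # S: hard upper limit on the span
        self.ramp = ramp          # R: width of the soft ramp from 1 down to 0
        # z: learnable span; the full model learns one per attention head
        self.z = nn.Parameter(torch.zeros(1))

    def forward(self, attn: torch.Tensor) -> torch.Tensor:
        # attn: (..., span) post-softmax weights over the last `span` keys,
        # ordered oldest to newest, so the current position has distance 0
        span = attn.size(-1)
        x = torch.arange(span - 1, -1, -1, dtype=attn.dtype, device=attn.device)
        z = self.z.clamp(0, self.max_span)
        mask = ((self.ramp + z - x) / self.ramp).clamp(0, 1)  # m_z(x)
        attn = attn * mask
        return attn / (attn.sum(dim=-1, keepdim=True) + 1e-8)  # renormalize

    def span_penalty(self) -> torch.Tensor:
        # L1 term on the learned span, added to the training loss
        return self.z.clamp(0, self.max_span).mean()

Per the paper, the adaptive-span loss asked about above is just this L1 penalty on the learned spans, added to the task loss as lambda_span * mask.span_penalty() with a small coefficient, so each head keeps only as much context as it actually needs.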