google/maxtext
A simple, performant and scalable Jax LLM!
Stargazers: 1299 | Watchers: 23 | Issues: 60 | Forks: 226
google/maxtext Issues
Question: Gradient Accumulation (Updated 16 days ago, 4 comments)
How to convert a model to parameter-only checkpoints (unscanned) on a CPU VM (Closed 17 days ago, 2 comments)
Converting checkpoints (Updated 18 days ago, 20 comments)
Supported features (Updated 19 days ago, 4 comments)
DEFAULT_MASK_VALUE causes gradient explosion and NaN loss on deep models (Updated 25 days ago, 1 comment)
Document use of Mistral (Closed a month ago, 6 comments)
Reproducing pure computation TFLOPs (Closed a month ago, 4 comments)
Assignment (Closed a month ago, 1 comment)
Clarification: how does Llama-2-7b fit on a v4-8 when using Adam? (Closed a month ago, 3 comments)
Consolidate inference-related logic under jetstream-maxtext (Closed a month ago, 1 comment)
Support LoRA training (Updated a month ago)
Support for RecurrentGemma (Updated a month ago)
Cannot do inference in float32 (Updated a month ago, 2 comments)
Support beam search (Updated a month ago)
Support for T5 (Updated a month ago, 4 comments)
Support Qwen1.5 (Closed a month ago, 1 comment)
Gemma instructions were deleted in commit (Closed 2 months ago, 2 comments)
Issues running test_llama2_7b.sh on TPU VM v3-8 (Closed 2 months ago, 1 comment)
`attend_dtype` not used (Updated 2 months ago, 1 comment)
Create a user-friendly inference demo (Updated 2 months ago)
TFDS Data Processing Pipeline (Closed 2 months ago, 6 comments)
Convert Gemma weights with scan layers (Closed 2 months ago, 2 comments)
Grain vs. `tf.data` Input Pipeline (Closed 2 months ago, 3 comments)
Convert Gemma weights (Closed 2 months ago, 3 comments)
[Bug] adam_pax has reuse donated buffer warning (Closed 2 months ago, 7 comments)
Compatibility issue with tensorflow>=2.15.1 on GPU (Updated 2 months ago)
Pin `aqtp==0.5.0` when building on top of base image with python<3.11 (Closed 2 months ago, 2 comments)
[Question] are there some train replication results? (Closed 2 months ago, 5 comments)
Consider installing local CUDA variant when building GPU image (Updated 2 months ago)
sharding options with grain (Closed 3 months ago, 6 comments)
How to use GPT2 tokenizer (Closed 3 months ago, 1 comment)
setup.sh runs `rm ~/jax` (Closed 3 months ago, 5 comments)
Can AQT be used to calculate qk score? (Closed 3 months ago, 1 comment)
[Question] Loading in a HF Dataset (Closed 3 months ago)
[Question] convert HF tokenizer to maxtext tokenizer? (Closed 3 months ago, 3 comments)
A pip error occurs when running setup.sh (Closed 3 months ago, 1 comment)
[request] bloom (alibi) model implementation (Closed 3 months ago, 1 comment)
Problems with a parameter checkpoint after training llama2-7b (Closed 3 months ago, 1 comment)
Issues running decode example from readme (Closed 3 months ago, 1 comment)
Issues running end_to_end/test_mistral.sh (Closed 3 months ago, 7 comments)
Should non-pod multihost be possible on TPU v2s/v3s? (Closed 4 months ago, 3 comments)
Long sequences are dropped rather than trimmed (Closed 4 months ago, 2 comments)
`nextrng` not checkpointed, consider using `fold_in(config.seed, step)` (Closed 5 months ago, 2 comments)
XlaRuntimeError when training with bfloat16 activations on TPU v3-8 (Closed 6 months ago, 3 comments)
Local development instructions don't work (Closed 7 months ago, 5 comments)
Do the Attentions / MLPs run in parallel? (Closed 7 months ago, 1 comment)
Jobs in kubernetes exceed the limit of 40 characters (Closed 7 months ago, 4 comments)
TPUv2-8 multislice (Closed 7 months ago, 2 comments)
maxtext on Colab TPU (Closed 9 months ago, 1 comment)
You don't have to (Closed 9 months ago, 1 comment)