google/maxtext
A simple, performant and scalable Jax LLM!
Stargazers: 1299 | Watchers: 23 | Issues: 60 | Forks: 226
google/maxtext Issues
Question: Gradient Accumulation (Updated 16 days ago, 4 comments)
How to convert a model to parameter-only checkpoints (unscanned) on a CPU VM (Closed 17 days ago, 2 comments)
Converting checkpoints (Updated 18 days ago, 20 comments)
Supported features (Updated 19 days ago, 4 comments)
DEFAULT_MASK_VALUE causes gradient explosion and NaN loss on deep models (Updated 25 days ago, 1 comment)
Document use of Mistral (Closed a month ago, 6 comments)
Reproducing pure computation TFLOPs (Closed a month ago, 4 comments)
Assignment (Closed a month ago, 1 comment)
Clarification: how does Llama-2-7b fit on a v4-8 when using Adam? (Closed a month ago, 3 comments)
Consolidate inference-related logic under jetstream-maxtext (Closed a month ago, 1 comment)
Support LoRA training (Updated a month ago)
Support for RecurrentGemma (Updated a month ago)
Cannot do inference in float32 (Updated a month ago, 2 comments)
Support beam search (Updated a month ago)
Support for T5 (Updated a month ago, 4 comments)
Support Qwen1.5 (Closed a month ago, 1 comment)
Gemma instructions were deleted in commit (Closed 2 months ago, 2 comments)
Issues running test_llama2_7b.sh on TPU VM v3-8 (Closed 2 months ago, 1 comment)
`attend_dtype` not used (Updated 2 months ago, 1 comment)
Create a user-friendly inference demo (Updated 2 months ago)
TFDS Data Processing Pipeline (Closed 2 months ago, 6 comments)
Convert Gemma weights with scan layers (Closed 2 months ago, 2 comments)
Grain vs. `tf.data` Input Pipeline (Closed 2 months ago, 3 comments)
Convert Gemma weights (Closed 2 months ago, 3 comments)
[Bug] adam_pax has reuse donated buffer warning (Closed 2 months ago, 7 comments)
Compatibility issue with tensorflow>=2.15.1 on GPU (Updated 2 months ago)
Pin `aqtp==0.5.0` when building on top of base image with python<3.11 (Closed 2 months ago, 2 comments)
[Question] are there some train replication results? (Closed 2 months ago, 5 comments)
Consider installing local CUDA variant when building GPU image (Updated 2 months ago)
sharding options with grain (Closed 3 months ago, 6 comments)
How to use GPT2 tokenizer (Closed 3 months ago, 1 comment)
setup.sh runs `rm ~/jax` (Closed 3 months ago, 5 comments)
Can AQT be used to calculate qk score? (Closed 3 months ago, 1 comment)
[Question] Loading in a HF Dataset (Closed 3 months ago)
[Question] convert HF tokenizer to maxtext tokenizer? (Closed 3 months ago, 3 comments)
A pip error occurs when running setup.sh (Closed 3 months ago, 1 comment)
[request] bloom (alibi) model implementation (Closed 3 months ago, 1 comment)
Problems with a parameter checkpoint after training llama2-7b (Closed 3 months ago, 1 comment)
Issues running decode example from readme (Closed 3 months ago, 1 comment)
Issues running end_to_end/test_mistral.sh (Closed 3 months ago, 7 comments)
Should non-pod multihost be possible on TPU v2s/v3s? (Closed 4 months ago, 3 comments)
Long sequences are dropped rather than trimmed (Closed 4 months ago, 2 comments)
`nextrng` not checkpointed, consider using `fold_in(config.seed, step)` (Closed 5 months ago, 2 comments)
XlaRuntimeError when training with bfloat16 activations on TPU v3-8 (Closed 6 months ago, 3 comments)
Local development instructions don't work (Closed 7 months ago, 5 comments)
Do the Attentions / MLPs run in parallel? (Closed 7 months ago, 1 comment)
Jobs in kubernetes exceed the limit of 40 characters (Closed 7 months ago, 4 comments)
TPUv2-8 multislice (Closed 7 months ago, 2 comments)
maxtext on Colab TPU (Closed 9 months ago, 1 comment)
You don't have to (Closed 9 months ago, 1 comment)