google / maxtext

A simple, performant and scalable Jax LLM!


Local development instructions don't work

finbarrtimbers opened this issue

Hello! I'm not able to use the local development instructions. Using a v4-8, I have cloned the repo, created a new conda env, and run the setup script inside the new env:

$ git clone git@github.com:google/maxtext.git
$ conda create --name maxtext python=3.10
$ conda activate maxtext
$ cd maxtext
$ bash setup.sh

However, when I try to run decoding, it fails:

$ python3 MaxText/decode.py MaxText/configs/base.yml run_name=test
... (long traceback)
AssertionError: Failed to construct dataset c4: Dataset c4 cannot be loaded at version 3.0.1, only: 2.3.0, 2.2.1, 2.2.0.

Do you have a workaround for this? Are there perhaps instructions for a Docker-based install?

If you change the dataset version in MaxText/configs/base.yml to read

dataset_name: 'c4/en:2.3.0'
eval_dataset_name: 'c4/en:2.3.0'

then you get a new error:

AssertionError: Dataset c4: could not find data in /home/finbarrtimbers/tensorflow_datasets. Please make sure to call dataset_builder.download_and_prepare(), or pass download=True to tfds.load() before trying to access the tf.data.Dataset object.
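(For reference, the call that error message asks for looks roughly like the sketch below; treat it as illustrative only, since building c4 locally is a large job that typically requires an Apache Beam pipeline. The version and data_dir are taken from the messages above.)

import tensorflow_datasets as tfds

# Sketch of the download_and_prepare() call named in the error above.
builder = tfds.builder("c4/en:2.3.0", data_dir="/home/finbarrtimbers/tensorflow_datasets")
builder.download_and_prepare()
# Equivalently, per the error message: tfds.load("c4/en:2.3.0", download=True)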

Is it possible to run decoding without having run training first on the same machine?

Have you created a dataset? Annoyingly, we need to make a dataset before running decode.py.
https://github.com/google/maxtext#getting-started-download-dataset-and-configure

I believe passing dataset_type=synthetic as a flag might also fix this. (We're also in some chat rooms, so feel free to hop into those!)
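If that flag works as expected, the invocation would look something like this (dataset_type=synthetic is the suggestion above; the rest mirrors the earlier command):

$ python3 MaxText/decode.py MaxText/configs/base.yml run_name=test dataset_type=synthetic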

P.S. -- MaxText is highly optimized for training. But inference is still very primitive!

Have you created a dataset? Annoyingly, we need to make a dataset before running decode.py.
https://github.com/google/maxtext#getting-started-download-dataset-and-configure

Will do -- thanks!

We're also in some chat rooms, so feel free to hop into those!

Ah, thanks -- I wasn't sure of the proper etiquette :)

P.S. -- MaxText is highly optimized for training. But inference is still very primitive!

I'm just looking for a Jax KV cache implementation to draw inspiration from, so this is good.
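(For others landing here with the same goal: the usual JAX pattern is a preallocated key/value buffer updated with a dynamic slice each decode step. Below is a minimal sketch under that assumption; the helper names are made up for illustration, and this is not MaxText's actual implementation.)

import jax
import jax.numpy as jnp

# Illustrative KV cache: preallocate (batch, max_len, heads, head_dim)
# buffers, then write each decode step's keys/values at position `pos`.
def init_cache(batch, max_len, num_heads, head_dim, dtype=jnp.bfloat16):
    shape = (batch, max_len, num_heads, head_dim)
    return jnp.zeros(shape, dtype), jnp.zeros(shape, dtype)

def update_cache(k_cache, v_cache, k_new, v_new, pos):
    # k_new / v_new have shape (batch, 1, num_heads, head_dim) and must
    # match the cache dtype.
    k_cache = jax.lax.dynamic_update_slice(k_cache, k_new, (0, pos, 0, 0))
    v_cache = jax.lax.dynamic_update_slice(v_cache, v_new, (0, pos, 0, 0))
    return k_cache, v_cache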

Glad to help!