Kaggle TPU loading/initialization fails

Question

mnansary opened this issue 2 years ago · comments

MD. Nazmuddoha Ansary commented 2 years ago

Really awesome work.
I was able to use the TF Hub layer on a Kaggle TPU with the following code:

    import tensorflow as tf
    import tensorflow_hub as hub
    # Load the SavedModel through the local host, which is needed on TPU.
    load_locally = tf.saved_model.LoadOptions(experimental_io_device='/job:localhost')
    pretrained_layer = hub.KerasLayer("https://tfhub.dev/vasudevgupta7/wav2vec2/1", load_options=load_locally, trainable=True)
    # cfg is defined elsewhere in the notebook (audio_shape, vocab_len).
    inputs = tf.keras.Input(shape=cfg.audio_shape)
    states = pretrained_layer(inputs)
    logits = tf.keras.layers.Dense(cfg.vocab_len)(states)
    model = tf.keras.Model(inputs=inputs, outputs=logits)

However, I need to load a custom model, which converts fine with your provided code (see https://www.kaggle.com/code/nazmuddhohaansary/test-conversion). On TPU, the following fails:

    tf_model = Wav2Vec2Model(config)
    tf_model.summary()

This specific section worked on GPU after conversion in this script: https://www.kaggle.com/code/nazmuddhohaansary/test-conversion
but it fails in this Kaggle notebook: https://www.kaggle.com/code/nazmuddhohaansary/tpu-loading-test?scriptVersionId=100483620

The error says that jit_compile is not a recognized parameter of @tf.function on the Kaggle TPU.
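One possible cause, which is an assumption and not something confirmed in this thread: TensorFlow releases before 2.5 named this argument experimental_compile, so an older TensorFlow on the TPU image would reject jit_compile. A minimal sketch of a version-tolerant helper follows; compile_kwargs is a hypothetical name, not part of TensorFlow or the wav2vec2 package.

```python
import inspect


def compile_kwargs(function_decorator, enabled=True):
    """Return the XLA keyword argument that `function_decorator` accepts.

    Hypothetical helper: it looks at the decorator's signature and prefers
    the newer `jit_compile` name, falling back to `experimental_compile`.
    """
    params = inspect.signature(function_decorator).parameters
    for name in ("jit_compile", "experimental_compile"):
        if name in params:
            return {name: enabled}
    # Neither keyword is available, so pass no XLA flag at all.
    return {}


# Intended usage (assumes TensorFlow is installed and imported as tf):
#
#     @tf.function(**compile_kwargs(tf.function))
#     def forward(batch):
#         ...
```

This relies on inspect.signature being able to see the parameters of tf.function in the installed version; if it cannot, the helper returns an empty dict and the function is traced without the XLA flag.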

Please help. Any guidance is much appreciated. Thanks in advance.

Vasudev Gupta · Answer 1 · Tue Jul 12 2022 10:45:17 GMT+0800 (China Standard Time)

Hello @mnansary,

Ideally, initialization on TPUs should call https://github.com/vasudevgupta7/gsoc-wav2vec2/blob/3d023d39f36a63c2e5b6fdb85219dcd3f6f35e76/src/wav2vec2/modeling.py#L99 but for some reason it falls through to the except branch instead. Unfortunately, I won't be able to find time to debug that at the moment :(

But one simple workaround would be to pass input_shape=None when initializing the model and then build the model yourself in the regular way.

    tf_model = Wav2Vec2Model(config, input_shape=None)

    # but now you need to build model weights by yourself before calling `.summary`
    # just like any other regular TensorFlow model
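For readers unfamiliar with building weights by hand, here is a minimal sketch of the general Keras pattern. TinyModel is a hypothetical stand-in for a subclassed model such as Wav2Vec2Model, and the dummy input shape is chosen only for illustration; the shape the real model expects is not stated in this thread.

```python
import tensorflow as tf


class TinyModel(tf.keras.Model):
    """Hypothetical stand-in for a subclassed model that builds lazily."""

    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(4)

    def call(self, inputs):
        return self.dense(inputs)


model = TinyModel()

# No weights exist yet. Calling the model once on a dummy batch creates them.
dummy_batch = tf.zeros((1, 16))  # (batch, features), illustrative shape only
_ = model(dummy_batch)

# Now that the weights exist, summary() can report them.
model.summary()
```

On a TPU, the model would normally be created and built inside the distribution strategy's scope; that part is omitted here because it needs TPU hardware to run.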

Let me know if that works or if you still get an error.

Vasudev Gupta · Answer 2 · Tue Jul 12 2022 11:19:34 GMT+0800 (China Standard Time)

Also, you can refer to this script: https://github.com/vasudevgupta7/gsoc-wav2vec2/blob/main/src/main.py

It works end to end on TPU nodes (at least, it has been tested on Cloud TPU).

MD. Nazmuddoha Ansary · Answer 3 · Wed Jul 13 2022 04:10:03 GMT+0800 (China Standard Time)

Hi, thanks a lot for your instructive replies. Really appreciated.
I couldn't get it to work with input_shape=None on either CPU or TPU. It behaved as you expected on GPU.
I ended up creating a wav2vec2 layer instead of a model class (https://github.com/mnansary/gsoc-wav2vec2/blob/main/src/wav2vec2/layer.py), and now it works the way I wanted. I don't think what I did is good practice, but it serves my purpose, so thank you. Closing this issue. Again, really awesome work.