Hugging Face Transformers BERT model not working on Mac M1
bksaini078 opened this issue · comments
While fine-tuning the Transformers model, i.e. transformers.TFDistilBertModel.from_pretrained(pretrained_weights),
I got this error message.
Can someone please help resolve this issue? Or has anyone been able to run the Transformer BERT models on a Mac M1?
Reference code:
import tensorflow as tf
import transformers
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

def BERT_model(max_len, pretrained_weights):
    '''BERT model creation with pretrained weights.
    max_len: input sequence length'''
    # parameter declaration
    learning_rate = 2e-5
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    bert = transformers.TFDistilBertModel.from_pretrained(pretrained_weights)
    # declaring inputs; BERT takes input_ids and attention_mask as input
    input_ids = Input(shape=(max_len,), dtype=tf.int32, name='input_ids')
    attention_mask = Input(shape=(max_len,), dtype=tf.int32, name='attention_mask')
    distillbert = bert(input_ids, attention_mask=attention_mask)
    # take the hidden state of the first ([CLS]) token
    x = distillbert[0][:, 0, :]
    x = tf.keras.layers.Dropout(0.2)(x)
    x = tf.keras.layers.Dense(64)(x)
    x = tf.keras.layers.Dense(32)(x)
    output = tf.keras.layers.Dense(2, activation='sigmoid')(x)
    model = Model(inputs=[input_ids, attention_mask], outputs=[output])
    # compiling model
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model

model = BERT_model(max_len, pretrained_weights)
# x_train must supply both input_ids and attention_mask
model.fit(x_train, y_train, batch_size=8, epochs=3, validation_split=0.2, verbose=1)
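As an aside, the distillbert[0][:, 0, :] line pulls the hidden state of the first ([CLS]) token out of DistilBERT's last hidden state, which has shape (batch_size, sequence_length, hidden_size). A minimal NumPy sketch of that slicing, with made-up small dimensions rather than DistilBERT's real 768-dim output:

```python
import numpy as np

# Stand-in for DistilBERT's last hidden state:
# shape (batch_size, sequence_length, hidden_size)
batch_size, seq_len, hidden = 2, 4, 3
last_hidden_state = np.arange(batch_size * seq_len * hidden, dtype=np.float32).reshape(
    batch_size, seq_len, hidden
)

# Same slice as distillbert[0][:, 0, :] in the model:
# every batch item, token position 0 (the [CLS] token), all hidden dims.
cls_embedding = last_hidden_state[:, 0, :]

print(cls_embedding.shape)  # (2, 3)
```

The result is one fixed-size vector per example, which is why it can feed straight into the Dense classification head.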
Change the final layers to:

x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.Dense(64)(x)
x = tf.keras.layers.Dense(32)(x)
x = tf.keras.layers.Dense(2, activation='sigmoid')(x)
output = tf.keras.layers.Dropout(0)(x)

There seems to be a problem when the last layer has an activation function, so I add a Dropout layer that does nothing (rate 0) after it, so the activation is no longer on the final layer.
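For what it's worth, a Dropout layer with rate 0 is the identity function, so this workaround should not change the model's predictions; it only moves the activation off the final layer. A quick NumPy check of that claim, using a hand-rolled sigmoid and inverted dropout purely for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dropout(x, rate, rng):
    # Inverted dropout as applied at training time; with rate=0
    # the mask keeps every unit and the 1/(1-rate) scale is 1.
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
logits = np.array([[0.5, -1.0], [2.0, 0.0]])
activated = sigmoid(logits)

# rate=0 dropout leaves the sigmoid outputs unchanged.
assert np.allclose(dropout(activated, 0.0, rng), activated)
```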
Thank you for your reply.
I tried the proposed approach. Unfortunately, it shows the same error message.
Did you run the BERT model successfully on your end?
Hey there @bksaini078, were you able to load BERT from tensorflow-hub? If so, would you mind showing how you did that? I'm unable to load the BERT model using hub.KerasLayer (see #276).