Lab 2, Part 1, Section 1.4: Missing `from_logits=True` argument?
BellaCoola opened this issue
Bella Coola commented
Hello, I am looking at "Lab 2, Part 1: MNIST Digit Classification". In section "1.4 Training the model 2.0", there is the following code block:
```python
# Rebuild the CNN model
cnn_model = build_cnn_model()

batch_size = 12
loss_history = mdl.util.LossHistory(smoothing_factor=0.95) # to record the evolution of the loss
plotter = mdl.util.PeriodicPlotter(sec=2, xlabel='Iterations', ylabel='Loss', scale='semilogy')
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-2) # define our optimizer

if hasattr(tqdm, '_instances'): tqdm._instances.clear() # clear if it exists

for idx in tqdm(range(0, train_images.shape[0], batch_size)):
  # First grab a batch of training data and convert the input images to tensors
  (images, labels) = (train_images[idx:idx+batch_size], train_labels[idx:idx+batch_size])
  images = tf.convert_to_tensor(images, dtype=tf.float32)

  # GradientTape to record differentiation operations
  with tf.GradientTape() as tape:
    #'''TODO: feed the images into the model and obtain the predictions'''
    logits = cnn_model(images)
    # logits = # TODO

    #'''TODO: compute the categorical cross entropy loss'''
    loss_value = tf.keras.backend.sparse_categorical_crossentropy(labels, logits)
    # loss_value = tf.keras.backend.sparse_categorical_crossentropy('''TODO''', '''TODO''') # TODO

  loss_history.append(loss_value.numpy().mean()) # append the loss to the loss_history record
  plotter.plot(loss_history.get())

  # Backpropagation
  '''TODO: Use the tape to compute the gradient against all parameters in the CNN model.
     Use cnn_model.trainable_variables to access these parameters.'''
  grads = tape.gradient(loss_value, cnn_model.trainable_variables)
  # grads = # TODO
  optimizer.apply_gradients(zip(grads, cnn_model.trainable_variables))
```
Shouldn't the `tf.keras.backend.sparse_categorical_crossentropy()` call also set the `from_logits` parameter to `True`? (By default it is `False`.) If not, why not?
TonySu commented
If you look at the initialization of `cnn_model`, you can see that the final dense layer uses a softmax activation function. As a result, the output of `cnn_model` is already a tensor of probabilities produced by the softmax, which is exactly what `sparse_categorical_crossentropy` expects by default (`from_logits=False`). You would only set `from_logits=True` if the final layer had no activation and returned raw scores.
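The equivalence can be sketched without TensorFlow. Below is a minimal NumPy illustration (the helper names and sample values are made up for this example, not taken from the lab): computing cross entropy on softmax outputs directly matches computing it on raw logits with the softmax applied internally, which is what `from_logits=True` does.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax along the class axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sparse_ce_from_probs(labels, probs):
    # from_logits=False convention: inputs are already probabilities
    return -np.log(probs[np.arange(len(labels)), labels])

def sparse_ce_from_logits(labels, logits):
    # from_logits=True convention: softmax is applied internally first
    return sparse_ce_from_probs(labels, softmax(logits))

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 1.2,  0.3]])  # raw, unnormalized scores
labels = np.array([0, 1])              # true class indices
probs = softmax(logits)                # what a softmax-activated final layer emits

# probabilities with from_logits=False ...
loss_a = sparse_ce_from_probs(labels, probs)
# ... match raw logits with from_logits=True
loss_b = sparse_ce_from_logits(labels, logits)
print(np.allclose(loss_a, loss_b))  # True
```

Since the lab's model already ends in a softmax layer, passing its output with the default `from_logits=False` is the correct pairing; setting `from_logits=True` there would apply softmax twice.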
Bella Coola commented
Thank you very much :)