This notebook presents a Convolutional Neural Network applied to the CIFAR-10 dataset.
import numpy as np
import matplotlib.pyplot as plt
Limit TensorFlow GPU memory usage
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
with tf.Session(config=config):
    pass  # init session with allow_growth
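On TensorFlow 2.x, where tf.ConfigProto and tf.Session are no longer available at the top level, the equivalent setting is per-GPU memory growth. A minimal sketch, assuming a TF 2.x runtime (the 1.x code above is what this notebook actually uses):
# TF 2.x equivalent of allow_growth (assumption: TensorFlow 2.x runtime)
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)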
Load dataset and show example images
(x_train_raw, y_train_raw), (x_test_raw, y_test_raw) = tf.keras.datasets.cifar10.load_data()
class2txt = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
Show example images
fig, axes = plt.subplots(nrows=1, ncols=6, figsize=[16, 9])
for i in range(len(axes)):
    axes[i].set_title(class2txt[y_train_raw[i, 0]])
    axes[i].imshow(x_train_raw[i])
Normalize features. Note that the test set is standardized using the training set's mean and std, to avoid leaking test-set statistics.
x_train = (x_train_raw - x_train_raw.mean()) / x_train_raw.std()
x_test = (x_test_raw - x_train_raw.mean()) / x_train_raw.std()
print('x_train.shape', x_train.shape)
print('x_test.shape', x_test.shape)
One-hot encode labels
y_train = tf.keras.utils.to_categorical(y_train_raw, num_classes=10)
y_test = tf.keras.utils.to_categorical(y_test_raw, num_classes=10)
print('y_train.shape', y_train.shape)
print(y_train[:3])
from tensorflow.keras.layers import Input, InputLayer, Conv2D, MaxPooling2D, Activation, Flatten, Dense, Dropout
Option #1: define ConvNet model using Keras Sequential API
# model = tf.keras.Sequential()
# model.add(InputLayer(input_shape=[32, 32, 3]))
# model.add(Conv2D(filters=16, kernel_size=3, padding='same', activation='elu'))
# model.add(MaxPooling2D(pool_size=[2,2], strides=[2, 2], padding='same'))
# model.add(Conv2D(filters=32, kernel_size=3, padding='same', activation='elu'))
# model.add(MaxPooling2D(pool_size=[2,2], strides=[2, 2], padding='same'))
# model.add(Conv2D(filters=64, kernel_size=3, padding='same', activation='elu'))
# model.add(MaxPooling2D(pool_size=[2,2], strides=[2, 2], padding='same'))
# model.add(Flatten())
# model.add(Dropout(0.2))
# model.add(Dense(512, activation='elu'))
# model.add(Dropout(0.2))
# model.add(Dense(10, activation='softmax'))
Option #2: define ConvNet using Keras Functional API (both options produce identical models)
X_input = Input(shape=[32, 32, 3])
X = Conv2D(filters=16, kernel_size=3, padding='same', activation='elu')(X_input)
X = MaxPooling2D(pool_size=[2,2], strides=[2, 2], padding='same')(X)
X = Conv2D(filters=32, kernel_size=3, padding='same', activation='elu')(X)
X = MaxPooling2D(pool_size=[2,2], strides=[2, 2], padding='same')(X)
X = Conv2D(filters=64, kernel_size=3, padding='same', activation='elu')(X)
X = MaxPooling2D(pool_size=[2,2], strides=[2, 2], padding='same')(X)
X = Flatten()(X)
X = Dropout(0.2)(X)
X = Dense(512, activation='elu')(X)
X = Dropout(0.2)(X)
X = Dense(10, activation='softmax')(X)
model = tf.keras.Model(inputs=X_input, outputs=X)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
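Since both options are meant to produce identical models, a quick sanity check is to compare parameter counts. A minimal sketch, assuming the Sequential version above is uncommented and bound to a hypothetical seq_model instead of model:
# Hypothetical check (assumes the Sequential model above was built as seq_model):
# both APIs should yield the same architecture, so parameter counts should match.
assert seq_model.count_params() == model.count_params()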
Train model
hist = model.fit(x=x_train, y=y_train, batch_size=250, epochs=20,
                 validation_data=(x_test, y_test), verbose=2)
Final Results
NOTE: Keras calculates training loss differently than validation loss. From the Keras documentation (source):
Why is the training loss much higher than the testing loss?
A Keras model has two modes: training and testing. Regularization mechanisms, such as Dropout and L1/L2 weight regularization, are turned off at testing time.
Besides, the training loss is the average of the losses over each batch of training data. Because your model is changing over time, the loss over the first batches of an epoch is generally higher than over the last batches. On the other hand, the testing loss for an epoch is computed using the model as it is at the end of the epoch, resulting in a lower loss.
This is why the train loss/accuracy below are much better than those calculated during training. It is also why, early in training, the reported training loss can be higher than the validation loss.
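To see this effect here, compare the training loss reported for the last epoch with a fresh evaluation of the final model on the same training data. A minimal sketch, using the hist object returned by model.fit above:
# The per-epoch training loss in hist.history is averaged over batches while
# the weights were still changing (and with dropout active), so it is
# typically higher than a fresh evaluation of the final model.
last_reported = hist.history['loss'][-1]
final_loss, _ = model.evaluate(x_train, y_train, batch_size=250, verbose=0)
print(f'Train loss reported during training: {last_reported:.4f}')
print(f'Train loss re-evaluated after training: {final_loss:.4f}')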
loss, acc = model.evaluate(x_train, y_train, batch_size=250, verbose=0)
print(f'Accuracy on train set: {acc:.3f}')
loss, acc = model.evaluate(x_test, y_test, batch_size=250, verbose=0)
print(f'Accuracy on test set: {acc:.3f}')
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=[16, 6])
axes[0].plot(hist.history['loss'], label='train_loss')
axes[0].plot(hist.history['val_loss'], label='val_loss')
axes[0].set_title('Loss')
axes[0].legend()
axes[0].grid()
axes[1].plot(hist.history['acc'], label='train_acc')
axes[1].plot(hist.history['val_acc'], label='val_acc')
axes[1].set_title('Accuracy')
axes[1].legend()
axes[1].grid()
Looks like we have a bit of an overfitting issue.
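One common remedy, not implemented in this notebook, is data augmentation. A minimal sketch using tf.keras.preprocessing.image.ImageDataGenerator, reusing the model, x_train, y_train, x_test, and y_test defined above:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Randomly shift and flip training images each epoch so the model
# rarely sees exactly the same input twice.
datagen = ImageDataGenerator(width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)
hist = model.fit_generator(datagen.flow(x_train, y_train, batch_size=250),
                           steps_per_epoch=len(x_train) // 250,
                           epochs=20,
                           validation_data=(x_test, y_test),
                           verbose=2)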