JovianML - ZerotoGAN - Assignment 3
Classifying images of everyday objects using a neural network
The ability to try many different neural network architectures to address a problem is what makes deep learning really powerful, especially compared to shallow learning techniques like linear regression, logistic regression etc.
In this assignment, you will:
- Explore the CIFAR10 dataset: https://www.cs.toronto.edu/~kriz/cifar.html
- Set up a training pipeline to train a neural network on a GPU
- Experiment with different network architectures & hyperparameters
As you go through this notebook, you will find a ??? in certain places. Your job is to replace the ??? with appropriate code or values, to ensure that the notebook runs properly end-to-end. Try to experiment with different network structures and hyperparameters to get the lowest loss.
You might find these notebooks useful for reference, as you work through this notebook:
# Uncomment and run the commands below if imports fail
# !conda install numpy pandas pytorch torchvision cpuonly -c pytorch -y
# !pip install matplotlib --upgrade --quiet
import torch
import torchvision
import numpy as np
import matplotlib.pyplot as plt
import torch.nn as nn
import torch.nn.functional as F
from torchvision.datasets import CIFAR10
from torchvision.transforms import ToTensor
from torchvision.utils import make_grid
from torch.utils.data.dataloader import DataLoader
from torch.utils.data import random_split
%matplotlib inline
# Project name used for jovian.commit
project_name = '03-cifar10-feedforward'
Exploring the CIFAR10 dataset
dataset = CIFAR10(root='data/', download=True, transform=ToTensor())
test_dataset = CIFAR10(root='data/', train=False, transform=ToTensor())
Files already downloaded and verified
Q: How many images does the training dataset contain?
dataset_size = len(dataset)
dataset_size
50000
Q: How many images does the testing dataset contain?
test_dataset_size = len(test_dataset)
test_dataset_size
10000
Q: How many output classes does the dataset contain? Can you list them?
Hint: Use dataset.classes
classes = dataset.classes
classes
['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
num_classes
10
Q: What is the shape of an image tensor from the dataset?
img, label = dataset[0]
img_shape = img.shape
img_shape
torch.Size([3, 32, 32])
Note that this dataset consists of 3-channel color images (RGB). Let us look at a sample image from the dataset. matplotlib expects channels to be the last dimension of the image tensors (whereas in PyTorch they are the first dimension), so we'll use the .permute tensor method to shift channels to the last dimension. Let's also print the label for the image.
img, label = dataset[0]
plt.imshow(img.permute((1, 2, 0)))
print('Label (numeric):', label)
print('Label (textual):', classes[label])
Label (numeric): 6
Label (textual): frog
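To see what .permute does to the dimensions, here is a minimal sketch that uses a zero-filled dummy tensor in place of a real CIFAR10 image (the dummy tensor and the names chw and hwc are assumptions for illustration only):

```python
import torch

# Dummy stand-in for a CIFAR10 image: (channels, height, width)
chw = torch.zeros(3, 32, 32)

# Reorder the dimensions to (height, width, channels), as matplotlib expects
hwc = chw.permute(1, 2, 0)

print(chw.shape)  # torch.Size([3, 32, 32])
print(hwc.shape)  # torch.Size([32, 32, 3])
```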
(Optional) Q: Can you determine the number of images belonging to each class?
Hint: Loop through the dataset.
label_dict = {}
for img, label in dataset:
    if label in label_dict:
        label_dict[label] += 1
    else:
        label_dict[label] = 1
label_dict
{0: 5000, 1: 5000, 2: 5000, 3: 5000, 4: 5000, 5: 5000, 6: 5000, 7: 5000, 8: 5000, 9: 5000}
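The same count can also be written with collections.Counter from the standard library. The sketch below runs on a small hypothetical list of (image, label) pairs rather than the real dataset, so the numbers are illustrative only:

```python
from collections import Counter

# Hypothetical stand-in for the dataset: (image, label) pairs
toy_dataset = [(None, 6), (None, 9), (None, 6), (None, 4), (None, 9), (None, 6)]

# Count how often each label occurs
label_counts = Counter(label for _, label in toy_dataset)

print(dict(label_counts))  # {6: 3, 9: 2, 4: 1}
```

Replacing toy_dataset with dataset should give the same per-class counts as the loop above.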
Let's save our work to Jovian, before continuing.
# !pip install jovian --upgrade --quiet
# import jovian
# jovian.commit(project=project_name, environment=None)
Preparing the data for training
We'll use a validation set with 5000 images (10% of the dataset). To ensure we get the same validation set each time, we'll set PyTorch's random number generator to a seed value of 43.
torch.manual_seed(43)
val_size = 5000
train_size = len(dataset) - val_size
Let's use the random_split function to create the training & validation sets.
train_ds, val_ds = random_split(dataset, [train_size, val_size])
len(train_ds), len(val_ds)
(45000, 5000)
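As a quick check that seeding makes the split repeatable, the sketch below splits a toy range of 10 items twice with the same seed. It passes a dedicated torch.Generator to random_split instead of calling torch.manual_seed; the helper name split_indices and the toy sizes are assumptions for illustration:

```python
import torch
from torch.utils.data import random_split

def split_indices(seed):
    # A dedicated generator keeps the split independent of other random calls
    generator = torch.Generator().manual_seed(seed)
    train_part, val_part = random_split(range(10), [8, 2], generator=generator)
    return list(train_part.indices), list(val_part.indices)

first = split_indices(43)
second = split_indices(43)

print(first == second)  # True
```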
We can now create data loaders to load the data in batches.
batch_size=64
train_loader = DataLoader(train_ds, batch_size, shuffle=True, num_workers=4, pin_memory=True)
val_loader = DataLoader(val_ds, batch_size*2, num_workers=4, pin_memory=True)
test_loader = DataLoader(test_dataset, batch_size*2, num_workers=4, pin_memory=True)
Let's visualize a batch of data using the make_grid helper function from Torchvision.
for images, _ in train_loader:
    print('images.shape:', images.shape)
    plt.figure(figsize=(16,8))
    plt.axis('off')
    plt.imshow(make_grid(images, nrow=16).permute((1, 2, 0)))
    break
images.shape: torch.Size([64, 3, 32, 32])
Can you label all the images by looking at them? Trying to label a random sample of the data manually is a good way to estimate the difficulty of the problem, and identify errors in labeling, if any.
Base Model class & Training on GPU
Let's create a base model class, which contains everything except the model architecture, i.e. it will not contain the __init__ and forward methods. We will later extend this class to try out different architectures. In fact, you can extend this model to solve any image classification problem.
def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                   # Generate predictions
        loss = F.cross_entropy(out, labels)  # Calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)                   # Generate predictions
        loss = F.cross_entropy(out, labels)  # Calculate loss
        acc = accuracy(out, labels)          # Calculate accuracy
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()  # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()     # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(epoch, result['val_loss'], result['val_acc']))
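To make the accuracy helper concrete, here is a small worked example. The function is repeated so the snippet runs on its own; the scores and labels are made-up values for 4 images and 3 classes, not outputs of the model:

```python
import torch

def accuracy(outputs, labels):
    # The predicted class is the index of the largest score in each row
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

# Hypothetical scores for 4 images and 3 classes
outputs = torch.tensor([[2.0, 1.0, 0.1],
                        [0.1, 0.2, 3.0],
                        [1.0, 5.0, 0.2],
                        [0.3, 0.2, 0.1]])
labels = torch.tensor([0, 2, 0, 0])

acc = accuracy(outputs, labels)
print(acc.item())  # 0.75
```

The predicted classes are 0, 2, 1 and 0, so three of the four predictions match the labels.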
We can also use the exact same training loop as before. I hope you're starting to see the benefits of refactoring our code into reusable functions.
def evaluate(model, val_loader):
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training Phase
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase
        result = evaluate(model, val_loader)
        model.epoch_end(epoch, result)
        history.append(result)
    return history
Finally, let's also define some utilities for moving our data & labels to the GPU, if one is available.
torch.cuda.is_available()
True
def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')
device = get_default_device()
device
device(type='cuda')
def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list,tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)
Let us also define a couple of helper functions for plotting the losses & accuracies.
def plot_losses(history):
    losses = [x['val_loss'] for x in history]
    plt.plot(losses, '-x')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.title('Loss vs. No. of epochs');

def plot_accuracies(history):
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs');
Let's move our data loaders to the appropriate device.
train_loader = DeviceDataLoader(train_loader, device)
val_loader = DeviceDataLoader(val_loader, device)
test_loader = DeviceDataLoader(test_loader, device)
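The sketch below shows the wrapper at work on a tiny synthetic dataset, so it runs with or without a GPU. The helpers are repeated to keep the snippet self-contained; toy_ds and its sizes are assumptions for illustration only:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        return len(self.dl)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Synthetic stand-in for CIFAR10: 10 blank images with dummy labels
toy_ds = TensorDataset(torch.zeros(10, 3, 32, 32),
                       torch.zeros(10, dtype=torch.long))
loader = DeviceDataLoader(DataLoader(toy_ds, batch_size=4), device)

for images, labels in loader:
    # Each batch arrives already placed on the chosen device
    print(images.shape, images.device.type == device.type)
    break

print(len(loader))  # 3 batches: 4 + 4 + 2 images
```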
Training the model
We will make several attempts at training the model. Each time, try a different architecture and a different set of learning rates. Here are some ideas to try:
- Increase or decrease the number of hidden layers
- Increase or decrease the size of each hidden layer
- Try different activation functions
- Try training for a different number of epochs
- Try different learning rates in every epoch
What's the highest validation accuracy you can get to? Can you get to 50% accuracy? What about 60%?
input_size = 3*32*32
output_size = 10
Q: Extend the ImageClassificationBase class to complete the model definition.
Hint: Define the __init__ and forward methods.
class CIFAR10Model(ImageClassificationBase):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(input_size, 1024)
        self.linear2 = nn.Linear(1024, 256)
        self.linear3 = nn.Linear(256, 64)
        self.linear4 = nn.Linear(64, 10)

    def forward(self, xb):
        # Flatten images into vectors
        out = xb.view(xb.size(0), -1)
        # Apply layers & activation functions
        out = self.linear1(out)
        out = F.relu(out)
        out = self.linear2(out)
        out = F.relu(out)
        out = self.linear3(out)
        out = F.relu(out)
        out = self.linear4(out)
        # Note: this last ReLU clamps negative class scores to zero;
        # F.cross_entropy is usually given the raw outputs of the final layer
        out = F.relu(out)
        return out
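For a sense of the model's size, the sketch below counts its trainable parameters by hand from the layer widths used in CIFAR10Model. The list layer_sizes is an illustrative helper, not part of the notebook:

```python
# Layer widths taken from CIFAR10Model: input, three hidden layers, output
layer_sizes = [3 * 32 * 32, 1024, 256, 64, 10]

total_params = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    # Each nn.Linear layer holds an (n_out x n_in) weight matrix plus n_out biases
    total_params += n_in * n_out + n_out

print(total_params)  # 3426250
```

Almost all of these parameters (3,146,752 of them) sit in the first layer, because each flattened image has 3072 inputs.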
You can now instantiate the model, and move it to the appropriate device.
model = to_device(CIFAR10Model(), device)
Before you train the model, it's a good idea to check the validation loss & accuracy with the initial set of weights.
history = [evaluate(model, val_loader)]
history
[{'val_acc': 0.08515624701976776, 'val_loss': 2.304067373275757}]
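These starting values are close to what pure guessing would give. As a rough sanity check (assuming 10 equally likely classes and a model that assigns them equal probability), the expected cross-entropy and accuracy work out as follows:

```python
import math

num_classes = 10

# Cross-entropy of a uniform prediction: -log(1 / num_classes)
uniform_loss = -math.log(1 / num_classes)
# Accuracy expected from guessing among equally frequent classes
chance_accuracy = 1 / num_classes

print(round(uniform_loss, 4))  # 2.3026
print(chance_accuracy)         # 0.1
```

An initial val_loss near 2.30 and a val_acc near 0.1 therefore suggest that the untrained model has not yet learned anything about the classes.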
Q: Train the model using the fit function to reduce the validation loss & improve accuracy.
Leverage the interactive nature of Jupyter to train the model in multiple phases, adjusting the no. of epochs & learning rate each time based on the result of the previous training phase.
history += fit(60, 0.02, model, train_loader, val_loader)
Epoch [0], val_loss: 2.1448, val_acc: 0.2428
Epoch [1], val_loss: 1.8800, val_acc: 0.3271
Epoch [2], val_loss: 1.7938, val_acc: 0.3658
Epoch [3], val_loss: 1.8431, val_acc: 0.3375
Epoch [4], val_loss: 1.8325, val_acc: 0.3348
Epoch [5], val_loss: 1.6583, val_acc: 0.4047
Epoch [6], val_loss: 1.7106, val_acc: 0.3762
Epoch [7], val_loss: 2.0561, val_acc: 0.3441
Epoch [8], val_loss: 1.7518, val_acc: 0.3824
Epoch [9], val_loss: 1.6794, val_acc: 0.4074
Epoch [10], val_loss: 2.5882, val_acc: 0.2203
Epoch [11], val_loss: 2.2880, val_acc: 0.2793
Epoch [12], val_loss: 1.6375, val_acc: 0.4135
Epoch [13], val_loss: 1.7419, val_acc: 0.4084
Epoch [14], val_loss: 1.6735, val_acc: 0.4018
Epoch [15], val_loss: 1.6194, val_acc: 0.4350
Epoch [16], val_loss: 1.6924, val_acc: 0.4156
Epoch [17], val_loss: 2.0971, val_acc: 0.3502
Epoch [18], val_loss: 2.4679, val_acc: 0.3209
Epoch [19], val_loss: 1.5562, val_acc: 0.4354
Epoch [20], val_loss: 1.5954, val_acc: 0.4572
Epoch [21], val_loss: 1.9976, val_acc: 0.3449
Epoch [22], val_loss: 1.7986, val_acc: 0.4445
Epoch [23], val_loss: 1.7071, val_acc: 0.4070
Epoch [24], val_loss: 1.7218, val_acc: 0.4117
Epoch [25], val_loss: 2.0925, val_acc: 0.3662
Epoch [26], val_loss: 2.1157, val_acc: 0.3715
Epoch [27], val_loss: 2.2680, val_acc: 0.3611
Epoch [28], val_loss: 2.3626, val_acc: 0.3305
Epoch [29], val_loss: 1.5499, val_acc: 0.4576
Epoch [30], val_loss: 2.0563, val_acc: 0.3492
Epoch [31], val_loss: 1.4338, val_acc: 0.5041
Epoch [32], val_loss: 1.3924, val_acc: 0.5338
Epoch [33], val_loss: 1.8142, val_acc: 0.4365
Epoch [34], val_loss: 1.6720, val_acc: 0.4625
Epoch [35], val_loss: 2.0500, val_acc: 0.4355
Epoch [36], val_loss: 1.5103, val_acc: 0.5080
Epoch [37], val_loss: 1.6159, val_acc: 0.4799
Epoch [38], val_loss: 2.1901, val_acc: 0.3707
Epoch [39], val_loss: 1.6890, val_acc: 0.4766
Epoch [40], val_loss: 2.0853, val_acc: 0.4020
Epoch [41], val_loss: 2.0444, val_acc: 0.4209
Epoch [42], val_loss: 2.2192, val_acc: 0.3928
Epoch [43], val_loss: 2.1474, val_acc: 0.3975
Epoch [44], val_loss: 2.4251, val_acc: 0.4068
Epoch [45], val_loss: 1.7933, val_acc: 0.4520
Epoch [46], val_loss: 2.5047, val_acc: 0.3846
Epoch [47], val_loss: 2.4357, val_acc: 0.3744
Epoch [48], val_loss: 1.9839, val_acc: 0.4564
Epoch [49], val_loss: 2.0247, val_acc: 0.4459
Epoch [50], val_loss: 1.9958, val_acc: 0.4361
Epoch [51], val_loss: 3.0626, val_acc: 0.3533
Epoch [52], val_loss: 2.0561, val_acc: 0.4496
Epoch [53], val_loss: 3.2849, val_acc: 0.3752
Epoch [54], val_loss: 1.8146, val_acc: 0.5195
Epoch [55], val_loss: 2.0487, val_acc: 0.4654
Epoch [56], val_loss: 2.7021, val_acc: 0.4031
Epoch [57], val_loss: 2.3594, val_acc: 0.4463
Epoch [58], val_loss: 3.9820, val_acc: 0.3424
Epoch [59], val_loss: 2.1061, val_acc: 0.4496
history += fit(30, 0.01, model, train_loader, val_loader)
Epoch [0], val_loss: 1.6622, val_acc: 0.5520
Epoch [1], val_loss: 1.8281, val_acc: 0.5355
Epoch [2], val_loss: 1.7503, val_acc: 0.5508
Epoch [3], val_loss: 2.0752, val_acc: 0.5174
Epoch [4], val_loss: 1.9235, val_acc: 0.5420
Epoch [5], val_loss: 2.7873, val_acc: 0.4779
Epoch [6], val_loss: 2.8117, val_acc: 0.4777
Epoch [7], val_loss: 1.8542, val_acc: 0.5500
Epoch [8], val_loss: 1.9944, val_acc: 0.5504
Epoch [9], val_loss: 2.1744, val_acc: 0.5381
Epoch [10], val_loss: 2.8530, val_acc: 0.5143
Epoch [11], val_loss: 2.0436, val_acc: 0.5535
Epoch [12], val_loss: 2.0245, val_acc: 0.5508
Epoch [13], val_loss: 2.5298, val_acc: 0.5109
Epoch [14], val_loss: 2.3207, val_acc: 0.5227
Epoch [15], val_loss: 3.0861, val_acc: 0.4658
Epoch [16], val_loss: 2.1186, val_acc: 0.5445
Epoch [17], val_loss: 2.4421, val_acc: 0.5301
Epoch [18], val_loss: 2.2425, val_acc: 0.5572
Epoch [19], val_loss: 2.5142, val_acc: 0.5305
Epoch [20], val_loss: 2.3540, val_acc: 0.5400
Epoch [21], val_loss: 2.6571, val_acc: 0.5201
Epoch [22], val_loss: 2.3879, val_acc: 0.5480
Epoch [23], val_loss: 2.3497, val_acc: 0.5494
Epoch [24], val_loss: 2.4331, val_acc: 0.5477
Epoch [25], val_loss: 2.3498, val_acc: 0.5512
Epoch [26], val_loss: 2.5937, val_acc: 0.5342
Epoch [27], val_loss: 2.5990, val_acc: 0.5326
Epoch [28], val_loss: 2.5532, val_acc: 0.5461
Epoch [29], val_loss: 2.6709, val_acc: 0.5441
history += fit(15, 0.002, model, train_loader, val_loader)
Epoch [0], val_loss: 2.4683, val_acc: 0.5561
Epoch [1], val_loss: 2.4832, val_acc: 0.5566
Epoch [2], val_loss: 2.4999, val_acc: 0.5549
Epoch [3], val_loss: 2.4983, val_acc: 0.5551
Epoch [4], val_loss: 2.5258, val_acc: 0.5555
Epoch [5], val_loss: 2.5252, val_acc: 0.5535
Epoch [6], val_loss: 2.5271, val_acc: 0.5559
Epoch [7], val_loss: 2.5360, val_acc: 0.5537
Epoch [8], val_loss: 2.5539, val_acc: 0.5545
Epoch [9], val_loss: 2.5515, val_acc: 0.5549
Epoch [10], val_loss: 2.5604, val_acc: 0.5518
Epoch [11], val_loss: 2.5615, val_acc: 0.5562
Epoch [12], val_loss: 2.5760, val_acc: 0.5539
Epoch [13], val_loss: 2.5745, val_acc: 0.5533
Epoch [14], val_loss: 2.5912, val_acc: 0.5547
history += fit(10, 0.001, model, train_loader, val_loader)
Epoch [0], val_loss: 2.5911, val_acc: 0.5551
Epoch [1], val_loss: 2.5921, val_acc: 0.5543
Epoch [2], val_loss: 2.5933, val_acc: 0.5553
Epoch [3], val_loss: 2.5947, val_acc: 0.5547
Epoch [4], val_loss: 2.5965, val_acc: 0.5547
Epoch [5], val_loss: 2.6009, val_acc: 0.5541
Epoch [6], val_loss: 2.6016, val_acc: 0.5545
Epoch [7], val_loss: 2.6080, val_acc: 0.5555
Epoch [8], val_loss: 2.6125, val_acc: 0.5541
Epoch [9], val_loss: 2.6202, val_acc: 0.5535
Plot the losses and the accuracies to check if you're starting to hit the limits of how well your model can perform on this dataset. You can train some more if you can see the scope for further improvement.
plot_losses(history)
plot_accuracies(history)
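In the log above, the validation loss bottoms out around 1.39 and then climbs to about 2.6 while accuracy stalls near 0.55, which may indicate overfitting. Alongside the plots, a small helper can locate the record with the lowest validation loss. This is a sketch only: best_epoch is a hypothetical helper, and sample_history holds a few rounded records picked from the log rather than the full history:

```python
def best_epoch(history):
    """Return (index, record) of the entry with the lowest validation loss."""
    index = min(range(len(history)), key=lambda i: history[i]['val_loss'])
    return index, history[index]

# A few records picked from the training log (rounded, not the full history)
sample_history = [
    {'val_loss': 2.3041, 'val_acc': 0.0852},
    {'val_loss': 1.4338, 'val_acc': 0.5041},
    {'val_loss': 1.3924, 'val_acc': 0.5338},
    {'val_loss': 2.6202, 'val_acc': 0.5535},
]

index, record = best_epoch(sample_history)
print(index, record['val_loss'])  # 2 1.3924
```

The returned index is a position in the list passed in, so with the notebook's history it counts the initial evaluation as position 0.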
Finally, evaluate the model on the test dataset and report its final performance.
evaluate(model, test_loader)
{'val_acc': 0.5659612417221069, 'val_loss': 2.5425965785980225}
Are you happy with the accuracy? Record your results by completing the section below, then you can come back and try a different architecture & hyperparameters.
Recording your results
As you perform multiple experiments, it's important to record the results in a systematic fashion, so that you can review them later and identify the best approaches that you might want to reproduce or build upon.
Q: Describe the model's architecture with a short summary.
E.g. "3 layers (16,32,10)"
(16, 32 and 10 represent output sizes of each layer)
# arch = "4 layers (1024,256,64,10)"
Q: Provide the list of learning rates used while training.
# lrs = [0.02, 0.01, 0.002, 0.001]
Q: Provide the list of no. of epochs used while training.
# epochs = [60, 30, 15, 10]
Q: What were the final test accuracy & test loss?
# test_acc = 0.56
# test_loss = 2.54
Finally, let's save the trained model weights to disk, so we can use this model later.
torch.save(model.state_dict(), 'cifar10-feedforward.pth')
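To use the saved weights later, the usual pattern is to re-create the same architecture and load the state dict into it. The sketch below demonstrates the round trip with a small stand-in nn.Linear model and a temporary file; the file name toy-model.pth and the stand-in model are assumptions for illustration:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Small stand-in model; the notebook would use CIFAR10Model() here instead
model = nn.Linear(4, 2)

path = os.path.join(tempfile.mkdtemp(), 'toy-model.pth')  # hypothetical file name
torch.save(model.state_dict(), path)

# Re-create the same architecture, then load the saved weights into it
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path, map_location='cpu'))

print(torch.equal(model.weight, restored.weight))  # True
```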
The jovian library provides some utility functions to keep your work organized. With every version of your notebook, you can attach some hyperparameters and metrics from your experiment.
# Clear previously recorded hyperparams & metrics
# jovian.reset()
# jovian.log_hyperparams(arch=arch,
# lrs=lrs,
# epochs=epochs)
# jovian.log_metrics(test_loss=test_loss, test_acc=test_acc)
Finally, we can commit the notebook to Jovian, attaching the hyperparameters, metrics and the trained model weights.
# jovian.commit(project=project_name, outputs=['cifar10-feedforward.pth'], environment=None)
Once committed, you can find the recorded metrics & hyperparameters in the "Records" tab on Jovian. You can find the saved model weights in the "Files" tab.
Continued experimentation
Now go back up to the "Training the model" section, and try another network architecture with a different set of hyperparameters. As you try different experiments, you will start to build an understanding of how the different architectures & hyperparameters affect the final result. Don't worry if you can't get to very high accuracy; we'll make some fundamental changes to our model in the next lecture.
Once you have tried multiple experiments, you can compare your results using the "Compare" button on Jovian.
(Optional) Write a blog post
Writing a blog post is the best way to further improve your understanding of deep learning & model training, because it forces you to articulate your thoughts clearly. Here are some ideas for a blog post:
- Report the results given by different architectures on the CIFAR10 dataset
- Apply this training pipeline to a different dataset (it doesn't have to be images, or a classification problem)
- Improve upon your model from Assignment 2 using a feedforward neural network, and write a sequel to your previous blog post
- Share some strategies for picking good hyperparameters for deep learning
- Present a summary of the different steps involved in training a deep learning model with PyTorch
- Implement the same model using a different deep learning library e.g. Keras ( https://keras.io/ ), and present a comparison.