JovianML - ZerotoGAN - Assignment 3

Open In Colab

Classifying images of everyday objects using a neural network

The ability to try many different neural network architectures to address a problem is what makes deep learning really powerful, especially compared to shallow learning techniques like linear regression, logistic regression etc.

In this assignment, you will:

  1. Explore the CIFAR10 dataset: https://www.cs.toronto.edu/~kriz/cifar.html
  2. Set up a training pipeline to train a neural network on a GPU
  3. Experiment with different network architectures & hyperparameters

As you go through this notebook, you will find a ??? in certain places. Your job is to replace the ??? with appropriate code or values, to ensure that the notebook runs properly end-to-end. Try to experiment with different network structures and hypeparameters to get the lowest loss.

You might find these notebooks useful for reference, as you work through this notebook:

In [0]:
# Uncomment and run the commands below if imports fail
# !conda install numpy pandas pytorch torchvision cpuonly -c pytorch -y
# !pip install matplotlib --upgrade --quiet
In [0]:
import torch
import torchvision
import numpy as np
import matplotlib.pyplot as plt
import torch.nn as nn
import torch.nn.functional as F
from torchvision.datasets import CIFAR10
from torchvision.transforms import ToTensor
from torchvision.utils import make_grid
from torch.utils.data.dataloader import DataLoader
from torch.utils.data import random_split
%matplotlib inline
In [0]:
# Project name used for jovian.commit
project_name = '03-cifar10-feedforward'

Exploring the CIFAR10 dataset

In [4]:
dataset = CIFAR10(root='data/', download=True, transform=ToTensor())
test_dataset = CIFAR10(root='data/', train=False, transform=ToTensor())
Files already downloaded and verified

Q: How many images does the training dataset contain?

In [5]:
dataset_size = len(dataset)
dataset_size
Out[5]:
50000

Q: How many images does the testing dataset contain?

In [6]:
test_dataset_size = len(test_dataset)
test_dataset_size
Out[6]:
10000

Q: How many output classes does the dataset contain? Can you list them?

Hint: Use dataset.classes

In [7]:
classes = dataset.classes
classes
Out[7]:
['airplane',
 'automobile',
 'bird',
 'cat',
 'deer',
 'dog',
 'frog',
 'horse',
 'ship',
 'truck']
In [8]:
num_classes = len(classes)
num_classes
Out[8]:
10

Q: What is the shape of an image tensor from the dataset?

In [9]:
img, label = dataset[0]
img_shape = img.shape
img_shape
Out[9]:
torch.Size([3, 32, 32])

Note that this dataset consists of 3-channel color images (RGB). Let us look at a sample image from the dataset. matplotlib expects channels to be the last dimension of the image tensors (whereas in PyTorch they are the first dimension), so we'll the .permute tensor method to shift channels to the last dimension. Let's also print the label for the image.

In [10]:
img, label = dataset[0]
plt.imshow(img.permute((1, 2, 0)))
print('Label (numeric):', label)
print('Label (textual):', classes[label])
Label (numeric): 6
Label (textual): frog

(Optional) Q: Can you determine the number of images belonging to each class?

Hint: Loop through the dataset.

In [11]:
label_dict = {}
for img, label in dataset:
    if label in label_dict:
        label_dict[label] += 1
    else:
        label_dict[label] = 1
        
label_dict
Out[11]:
{0: 5000,
 1: 5000,
 2: 5000,
 3: 5000,
 4: 5000,
 5: 5000,
 6: 5000,
 7: 5000,
 8: 5000,
 9: 5000}

Let's save our work to Jovian, before continuing.

In [0]:
# !pip install jovian --upgrade --quiet
In [0]:
# import jovian
In [0]:
# jovian.commit(project=project_name, environment=None)

Preparing the data for training

We'll use a validation set with 5000 images (10% of the dataset). To ensure we get the same validation set each time, we'll set PyTorch's random number generator to a seed value of 43.

In [0]:
torch.manual_seed(43)
val_size = 5000
train_size = len(dataset) - val_size

Let's use the random_split method to create the training & validation sets

In [16]:
train_ds, val_ds = random_split(dataset, [train_size, val_size])
len(train_ds), len(val_ds)
Out[16]:
(45000, 5000)

We can now create data loaders to load the data in batches.

In [0]:
batch_size=64
In [0]:
train_loader = DataLoader(train_ds, batch_size, shuffle=True, num_workers=4, pin_memory=True)
val_loader = DataLoader(val_ds, batch_size*2, num_workers=4, pin_memory=True)
test_loader = DataLoader(test_dataset, batch_size*2, num_workers=4, pin_memory=True)

Let's visualize a batch of data using the make_grid helper function from Torchvision.

In [19]:
for images, _ in train_loader:
    print('images.shape:', images.shape)
    plt.figure(figsize=(16,8))
    plt.axis('off')
    plt.imshow(make_grid(images, nrow=16).permute((1, 2, 0)))
    break
images.shape: torch.Size([64, 3, 32, 32])

Can you label all the images by looking at them? Trying to label a random sample of the data manually is a good way to estimate the difficulty of the problem, and identify errors in labeling, if any.

Base Model class & Training on GPU

Let's create a base model class, which contains everything except the model architecture i.e. it wil not contain the __init__ and __forward__ methods. We will later extend this class to try out different architectures. In fact, you can extend this model to solve any image classification problem.

In [0]:
def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))
In [0]:
class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch 
        out = self(images)                  # Generate predictions
        loss = F.cross_entropy(out, labels) # Calculate loss
        return loss
    
    def validation_step(self, batch):
        images, labels = batch 
        out = self(images)                    # Generate predictions
        loss = F.cross_entropy(out, labels)   # Calculate loss
        acc = accuracy(out, labels)           # Calculate accuracy
        return {'val_loss': loss.detach(), 'val_acc': acc}
        
    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()      # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}
    
    def epoch_end(self, epoch, result):
        print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(epoch, result['val_loss'], result['val_acc']))

We can also use the exact same training loop as before. I hope you're starting to see the benefits of refactoring our code into reusable functions.

In [0]:
def evaluate(model, val_loader):
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training Phase 
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase
        result = evaluate(model, val_loader)
        model.epoch_end(epoch, result)
        history.append(result)
    return history

Finally, let's also define some utilities for moving out data & labels to the GPU, if one is available.

In [23]:
torch.cuda.is_available()
Out[23]:
True
In [0]:
def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')
In [25]:
device = get_default_device()
device
Out[25]:
device(type='cuda')
In [0]:
def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list,tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device
        
    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl: 
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)

Let us also define a couple of helper functions for plotting the losses & accuracies.

In [0]:
def plot_losses(history):
    losses = [x['val_loss'] for x in history]
    plt.plot(losses, '-x')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.title('Loss vs. No. of epochs');
In [0]:
def plot_accuracies(history):
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs');

Let's move our data loaders to the appropriate device.

In [0]:
train_loader = DeviceDataLoader(train_loader, device)
val_loader = DeviceDataLoader(val_loader, device)
test_loader = DeviceDataLoader(test_loader, device)

Training the model

We will make several attempts at training the model. Each time, try a different architecture and a different set of learning rates. Here are some ideas to try:

  • Increase or decrease the number of hidden layers
  • Increase of decrease the size of each hidden layer
  • Try different activation functions
  • Try training for different number of epochs
  • Try different learning rates in every epoch

What's the highest validation accuracy you can get to? Can you get to 50% accuracy? What about 60%?

In [0]:
input_size = 3*32*32
output_size = 10

Q: Extend the ImageClassificationBase class to complete the model definition.

Hint: Define the __init__ and forward methods.

In [0]:
class CIFAR10Model(ImageClassificationBase):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(input_size, 1024)
        self.linear2 = nn.Linear(1024, 256)
        self.linear3 = nn.Linear(256, 64)
        self.linear4 = nn.Linear(64, 10)


        
    def forward(self, xb):
        # Flatten images into vectors
        out = xb.view(xb.size(0), -1)
        # Apply layers & activation functions
        out = self.linear1(out)
        out = F.relu(out)
        out = self.linear2(out)
        out = F.relu(out)
        out = self.linear3(out)
        out = F.relu(out)
        out = self.linear4(out)
        out = F.relu(out)
        return out

You can now instantiate the model, and move it the appropriate device.

In [0]:
model = to_device(CIFAR10Model(), device)

Before you train the model, it's a good idea to check the validation loss & accuracy with the initial set of weights.

In [33]:
history = [evaluate(model, val_loader)]
history
Out[33]:
[{'val_acc': 0.08515624701976776, 'val_loss': 2.304067373275757}]

Q: Train the model using the fit function to reduce the validation loss & improve accuracy.

Leverage the interactive nature of Jupyter to train the model in multiple phases, adjusting the no. of epochs & learning rate each time based on the result of the previous training phase.

In [34]:
history += fit(60, 0.02, model, train_loader, val_loader)
Epoch [0], val_loss: 2.1448, val_acc: 0.2428
Epoch [1], val_loss: 1.8800, val_acc: 0.3271
Epoch [2], val_loss: 1.7938, val_acc: 0.3658
Epoch [3], val_loss: 1.8431, val_acc: 0.3375
Epoch [4], val_loss: 1.8325, val_acc: 0.3348
Epoch [5], val_loss: 1.6583, val_acc: 0.4047
Epoch [6], val_loss: 1.7106, val_acc: 0.3762
Epoch [7], val_loss: 2.0561, val_acc: 0.3441
Epoch [8], val_loss: 1.7518, val_acc: 0.3824
Epoch [9], val_loss: 1.6794, val_acc: 0.4074
Epoch [10], val_loss: 2.5882, val_acc: 0.2203
Epoch [11], val_loss: 2.2880, val_acc: 0.2793
Epoch [12], val_loss: 1.6375, val_acc: 0.4135
Epoch [13], val_loss: 1.7419, val_acc: 0.4084
Epoch [14], val_loss: 1.6735, val_acc: 0.4018
Epoch [15], val_loss: 1.6194, val_acc: 0.4350
Epoch [16], val_loss: 1.6924, val_acc: 0.4156
Epoch [17], val_loss: 2.0971, val_acc: 0.3502
Epoch [18], val_loss: 2.4679, val_acc: 0.3209
Epoch [19], val_loss: 1.5562, val_acc: 0.4354
Epoch [20], val_loss: 1.5954, val_acc: 0.4572
Epoch [21], val_loss: 1.9976, val_acc: 0.3449
Epoch [22], val_loss: 1.7986, val_acc: 0.4445
Epoch [23], val_loss: 1.7071, val_acc: 0.4070
Epoch [24], val_loss: 1.7218, val_acc: 0.4117
Epoch [25], val_loss: 2.0925, val_acc: 0.3662
Epoch [26], val_loss: 2.1157, val_acc: 0.3715
Epoch [27], val_loss: 2.2680, val_acc: 0.3611
Epoch [28], val_loss: 2.3626, val_acc: 0.3305
Epoch [29], val_loss: 1.5499, val_acc: 0.4576
Epoch [30], val_loss: 2.0563, val_acc: 0.3492
Epoch [31], val_loss: 1.4338, val_acc: 0.5041
Epoch [32], val_loss: 1.3924, val_acc: 0.5338
Epoch [33], val_loss: 1.8142, val_acc: 0.4365
Epoch [34], val_loss: 1.6720, val_acc: 0.4625
Epoch [35], val_loss: 2.0500, val_acc: 0.4355
Epoch [36], val_loss: 1.5103, val_acc: 0.5080
Epoch [37], val_loss: 1.6159, val_acc: 0.4799
Epoch [38], val_loss: 2.1901, val_acc: 0.3707
Epoch [39], val_loss: 1.6890, val_acc: 0.4766
Epoch [40], val_loss: 2.0853, val_acc: 0.4020
Epoch [41], val_loss: 2.0444, val_acc: 0.4209
Epoch [42], val_loss: 2.2192, val_acc: 0.3928
Epoch [43], val_loss: 2.1474, val_acc: 0.3975
Epoch [44], val_loss: 2.4251, val_acc: 0.4068
Epoch [45], val_loss: 1.7933, val_acc: 0.4520
Epoch [46], val_loss: 2.5047, val_acc: 0.3846
Epoch [47], val_loss: 2.4357, val_acc: 0.3744
Epoch [48], val_loss: 1.9839, val_acc: 0.4564
Epoch [49], val_loss: 2.0247, val_acc: 0.4459
Epoch [50], val_loss: 1.9958, val_acc: 0.4361
Epoch [51], val_loss: 3.0626, val_acc: 0.3533
Epoch [52], val_loss: 2.0561, val_acc: 0.4496
Epoch [53], val_loss: 3.2849, val_acc: 0.3752
Epoch [54], val_loss: 1.8146, val_acc: 0.5195
Epoch [55], val_loss: 2.0487, val_acc: 0.4654
Epoch [56], val_loss: 2.7021, val_acc: 0.4031
Epoch [57], val_loss: 2.3594, val_acc: 0.4463
Epoch [58], val_loss: 3.9820, val_acc: 0.3424
Epoch [59], val_loss: 2.1061, val_acc: 0.4496
In [35]:
history += fit(30, 0.01, model, train_loader, val_loader)
Epoch [0], val_loss: 1.6622, val_acc: 0.5520
Epoch [1], val_loss: 1.8281, val_acc: 0.5355
Epoch [2], val_loss: 1.7503, val_acc: 0.5508
Epoch [3], val_loss: 2.0752, val_acc: 0.5174
Epoch [4], val_loss: 1.9235, val_acc: 0.5420
Epoch [5], val_loss: 2.7873, val_acc: 0.4779
Epoch [6], val_loss: 2.8117, val_acc: 0.4777
Epoch [7], val_loss: 1.8542, val_acc: 0.5500
Epoch [8], val_loss: 1.9944, val_acc: 0.5504
Epoch [9], val_loss: 2.1744, val_acc: 0.5381
Epoch [10], val_loss: 2.8530, val_acc: 0.5143
Epoch [11], val_loss: 2.0436, val_acc: 0.5535
Epoch [12], val_loss: 2.0245, val_acc: 0.5508
Epoch [13], val_loss: 2.5298, val_acc: 0.5109
Epoch [14], val_loss: 2.3207, val_acc: 0.5227
Epoch [15], val_loss: 3.0861, val_acc: 0.4658
Epoch [16], val_loss: 2.1186, val_acc: 0.5445
Epoch [17], val_loss: 2.4421, val_acc: 0.5301
Epoch [18], val_loss: 2.2425, val_acc: 0.5572
Epoch [19], val_loss: 2.5142, val_acc: 0.5305
Epoch [20], val_loss: 2.3540, val_acc: 0.5400
Epoch [21], val_loss: 2.6571, val_acc: 0.5201
Epoch [22], val_loss: 2.3879, val_acc: 0.5480
Epoch [23], val_loss: 2.3497, val_acc: 0.5494
Epoch [24], val_loss: 2.4331, val_acc: 0.5477
Epoch [25], val_loss: 2.3498, val_acc: 0.5512
Epoch [26], val_loss: 2.5937, val_acc: 0.5342
Epoch [27], val_loss: 2.5990, val_acc: 0.5326
Epoch [28], val_loss: 2.5532, val_acc: 0.5461
Epoch [29], val_loss: 2.6709, val_acc: 0.5441
In [36]:
history += fit(15, 0.002, model, train_loader, val_loader)
Epoch [0], val_loss: 2.4683, val_acc: 0.5561
Epoch [1], val_loss: 2.4832, val_acc: 0.5566
Epoch [2], val_loss: 2.4999, val_acc: 0.5549
Epoch [3], val_loss: 2.4983, val_acc: 0.5551
Epoch [4], val_loss: 2.5258, val_acc: 0.5555
Epoch [5], val_loss: 2.5252, val_acc: 0.5535
Epoch [6], val_loss: 2.5271, val_acc: 0.5559
Epoch [7], val_loss: 2.5360, val_acc: 0.5537
Epoch [8], val_loss: 2.5539, val_acc: 0.5545
Epoch [9], val_loss: 2.5515, val_acc: 0.5549
Epoch [10], val_loss: 2.5604, val_acc: 0.5518
Epoch [11], val_loss: 2.5615, val_acc: 0.5562
Epoch [12], val_loss: 2.5760, val_acc: 0.5539
Epoch [13], val_loss: 2.5745, val_acc: 0.5533
Epoch [14], val_loss: 2.5912, val_acc: 0.5547
In [37]:
history += fit(10, 0.001, model, train_loader, val_loader)
Epoch [0], val_loss: 2.5911, val_acc: 0.5551
Epoch [1], val_loss: 2.5921, val_acc: 0.5543
Epoch [2], val_loss: 2.5933, val_acc: 0.5553
Epoch [3], val_loss: 2.5947, val_acc: 0.5547
Epoch [4], val_loss: 2.5965, val_acc: 0.5547
Epoch [5], val_loss: 2.6009, val_acc: 0.5541
Epoch [6], val_loss: 2.6016, val_acc: 0.5545
Epoch [7], val_loss: 2.6080, val_acc: 0.5555
Epoch [8], val_loss: 2.6125, val_acc: 0.5541
Epoch [9], val_loss: 2.6202, val_acc: 0.5535

Plot the losses and the accuracies to check if you're starting to hit the limits of how well your model can perform on this dataset. You can train some more if you can see the scope for further improvement.

In [38]:
plot_losses(history)
In [39]:
plot_accuracies(history)

Finally, evaluate the model on the test dataset report its final performance.

In [40]:
evaluate(model, test_loader)
Out[40]:
{'val_acc': 0.5659612417221069, 'val_loss': 2.5425965785980225}

Are you happy with the accuracy? Record your results by completing the section below, then you can come back and try a different architecture & hyperparameters.

Recoding your results

As your perform multiple experiments, it's important to record the results in a systematic fashion, so that you can review them later and identify the best approaches that you might want to reproduce or build upon later.

Q: Describe the model's architecture with a short summary.

E.g. "3 layers (16,32,10)" (16, 32 and 10 represent output sizes of each layer)

In [0]:
# arch = "3 layers (1024,512,10)"

Q: Provide the list of learning rates used while training.

In [0]:
# lrs = [0.1, 0.01, 0.001, 0.0001]

Q: Provide the list of no. of epochs used while training.

In [0]:
# epochs = [60, 30, 15, 10]

Q: What were the final test accuracy & test loss?

In [0]:
# test_acc = 0.56
# test_loss = 1.71

Finally, let's save the trained model weights to disk, so we can use this model later.

In [0]:
torch.save(model.state_dict(), 'cifar10-feedforward.pth')

The jovian library provides some utility functions to keep your work organized. With every version of your notebok, you can attach some hyperparameters and metrics from your experiment.

In [0]:
# Clear previously recorded hyperparams & metrics
# jovian.reset()
In [0]:
# jovian.log_hyperparams(arch=arch, 
#                        lrs=lrs, 
#                        epochs=epochs)
In [0]:
# jovian.log_metrics(test_loss=test_loss, test_acc=test_acc)

Finally, we can commit the notebook to Jovian, attaching the hypeparameters, metrics and the trained model weights.

In [0]:
# jovian.commit(project=project_name, outputs=['cifar10-feedforward.pth'], environment=None)

Once committed, you can find the recorded metrics & hyperprameters in the "Records" tab on Jovian. You can find the saved model weights in the "Files" tab.

Continued experimentation

Now go back up to the "Training the model" section, and try another network architecture with a different set of hyperparameters. As you try different experiments, you will start to build an undestanding of how the different architectures & hyperparameters affect the final result. Don't worry if you can't get to very high accuracy, we'll make some fundamental changes to our model in the next lecture.

Once you have tried multiple experiments, you can compare your results using the "Compare" button on Jovian.

compare-example

(Optional) Write a blog post

Writing a blog post is the best way to further improve your understanding of deep learning & model training, because it forces you to articulate your thoughts clearly. Here'are some ideas for a blog post:

  • Report the results given by different architectures on the CIFAR10 dataset
  • Apply this training pipeline to a different dataset (it doesn't have to be images, or a classification problem)
  • Improve upon your model from Assignment 2 using a feedfoward neural network, and write a sequel to your previous blog post
  • Share some Strategies for picking good hyperparameters for deep learning
  • Present a summary of the different steps involved in training a deep learning model with PyTorch
  • Implement the same model using a different deep learning library e.g. Keras ( https://keras.io/ ), and present a comparision.