# **CIS 4190/5190 Fall 2023 - Homework 5**

**Before starting, you must save your own copy of this notebook: click the "Copy To Drive" option in the top bar, or go to File --> Save a Copy in Drive. This is the master notebook, so <u>you will not be able to save your changes without copying it</u>! Once you have made the copy, make sure you are working in that version of the notebook so that your work is saved.**

In [None]:
# Restart the runtime after running this cell every time you open the notebook
!pip install dill

In [None]:
import random
import numpy as np
import pandas as pd
import os
import sys
import matplotlib.pyplot as plt
from numpy.linalg import *
from sklearn import preprocessing
np.random.seed(42)  # don't change this line

import base64

# **PennGrader Setup**

First, you'll need to set up the PennGrader, the autograder we are going to use throughout the semester. The PennGrader will automatically grade your answers and provide instant feedback. Unless otherwise stated, you can resubmit up to a reasonable number of attempts (e.g. 100 attempts per day). **We will only record your latest score in our backend database**.

After finishing each homework assignment, you must submit your iPython notebook to Gradescope before the homework deadline. Gradescope will then retrieve and display your scores from our backend database.

In [None]:
%%capture
!pip3 install penngrader --upgrade

In [None]:
from penngrader.grader import *

In [None]:
#PLEASE ENSURE YOUR PENN-ID IS ENTERED CORRECTLY. IF NOT, THE AUTOGRADER WON'T KNOW WHO
#TO ASSIGN YOUR POINTS TO IN OUR BACKEND
STUDENT_ID = ...          # YOUR PENN-ID GOES HERE AS AN INTEGER

Run the following cell to initialize the autograder. This autograder will let you submit your code directly from this notebook and immediately get a score.

**NOTE:** Remember that we store your submissions and check them against other students' submissions... so, not that you would, but no cheating.

In [None]:
grader = PennGrader(homework_id = 'CIS5190_F23_HW5', student_id = STUDENT_ID)

#### **NOTE 1. Results of sections marked as "manually graded" should be submitted along with the written homework solutions.**

#### **NOTE 2. If you are running into a `__builtins__` error, it's likely because you're using a method call of the form numpy.ndarray.mean(), like a.mean(). Unfortunately, this does not play nicely with PennGrader. Please use the function call numpy.mean(a) instead.**

# **1. [20 pts] Image Classification using CNN**

#### **Import libraries**

In [None]:
import os
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import transforms
from torchvision.transforms import ToTensor
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import matplotlib.pyplot as plt

#### **Set the random seed**

In [None]:
np.random.seed(0)
torch.manual_seed(0)

#### **Set GPU**

In [None]:
# Make sure you're using cuda (GPU) by checking the hardware accelerator under Runtime -> Change runtime type
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("We're using:", device)

#### **Download and extract the data**

In [None]:
%%capture
!pip install -U gdown

In [None]:
!gdown 1vacRphjb47LXifcd3F2xlsOTKR0R_LiF

In [None]:
%%capture
!yes N | unzip "/content/supertuxkart_data.zip" -d "/content"

## **1.1. Dataset class implementation**

In this section, you will be training, validating, and testing a CNN model to classify images of objects from a car racing video game called SuperTuxKart. There are 6 classes: kart is 1, pickup is 2, nitro is 3, bomb is 4, and projectile is 5; the background class (all other images) is assigned the label 0. First, you need to load the data in a way that PyTorch can deal with easily. We will lean on PyTorch's `Dataset` class to do this.

Complete the `STKDataset` class that inherits from `Dataset`.

1. `__init__` is a constructor, and would be the natural place to perform operations common to the full dataset, such as parsing the labels and image paths.
2. The `__len__` function should return the size of the dataset, i.e., the number of samples.
3. The `__getitem__` function should return a Python tuple of (image, label). The image should be a `torch.Tensor` of size (3, 64, 64) and the label should be an `int`.

The labels of the images under a particular folder (`train/` or `val/`) are stored in that same folder as `labels.csv`. Read the `labels.csv` file using `pandas` to understand what it looks like before proceeding. There is also a `labels.csv` in the `test/` folder, but it contains only the file names of the test samples.
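
As a reminder of what a map-style dataset looks like, here is a minimal sketch of the two-method protocol in plain Python. It uses neither torch nor the SuperTuxKart files; the class name `ToyRecords`, the records, and the `FRUIT_TO_ENCODING` mapping are all hypothetical. A `torch.utils.data.Dataset` subclass exposes the same `__len__` and `__getitem__` methods, which is what `DataLoader` relies on.

```python
# Hypothetical label mapping, unrelated to the homework classes.
FRUIT_TO_ENCODING = {"apple": 0, "pear": 1}


class ToyRecords:
    """Plain-Python illustration of the map-style dataset protocol."""

    def __init__(self, records):
        # Work shared by the whole dataset (parsing, indexing) happens once here.
        self.records = list(records)

    def __len__(self):
        # Number of samples.
        return len(self.records)

    def __getitem__(self, idx):
        # Return one (features, integer label) pair for the given index.
        features, label_name = self.records[idx]
        return features, FRUIT_TO_ENCODING[label_name]


toy = ToyRecords([([0.1, 0.2], "apple"), ([0.7, 0.9], "pear")])
print(len(toy))   # number of samples
print(toy[1])     # features and encoded label of the second sample
```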

In [None]:
ENCODING_TO_LABELS = {0: "background",
                    1: "kart",
                    2: "pickup",
                    3: "nitro",
                    4: "bomb",
                    5: "projectile"}

LABELS_TO_ENCODING = {"background": 0,
                    "kart": 1,
                    "pickup": 2,
                    "nitro": 3,
                    "bomb": 4,
                    "projectile": 5}

In [None]:
class STKDataset(Dataset):

    def __init__(self, image_path, transform=None):
        self.image_path = image_path
        self.labels = pd.read_csv(image_path + "/labels.csv")
        self.transform = transform

    def __len__(self):

        # STUDENT TODO START: Return the number of samples in the dataset
        ...
        # STUDENT TODO END

    def __getitem__(self, idx):

        if torch.is_tensor(idx):
            idx = idx.tolist()

        # STUDENT TODO START: Create the path to each image by joining the root path with the name of the file as found in labels.csv
        img_name = ...
        # STUDENT TODO END

        # Read the image from the file path
        image = Image.open(img_name)
        # Transform the image using self.transform
        if self.transform:
            image = self.transform(image)

        if "label" in self.labels.columns:
            # STUDENT TODO START: Extract label name and encode it using the LABELS_TO_ENCODING dictionary
            label = ...
            # STUDENT TODO END
            sample = (image, label)
        else:
            sample = (image)
        return sample

In [None]:
# STUDENT TODO START: Use transforms.Compose to transform the image such that every pixel takes on a value between -1 and 1
# Hint: Refer to transforms.ToTensor() and transforms.Normalize()
transform = ...
# STUDENT TODO END

train_dataset = STKDataset(image_path="train", transform=transform)
val_dataset = STKDataset(image_path="val", transform=transform)
test_dataset = STKDataset(image_path="test", transform=transform)

#### **Visualization**

The following cell visualizes the data as a sanity check for your implementation of the `STKDataset` class.

In [None]:
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
torch.manual_seed(0)
for i in range(1, cols * rows + 1):
    sample_idx = torch.randint(len(train_dataset), size=(1,)).item()
    img, label = train_dataset[sample_idx]
    figure.add_subplot(rows, cols, i)
    plt.title(ENCODING_TO_LABELS[label])
    plt.axis("off")
    plt.imshow(img.permute(1, 2, 0)*0.5 + 0.5)
plt.show()
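
The `* 0.5 + 0.5` in the cell above undoes the normalization before plotting, since `plt.imshow` expects float images in the range [0, 1]. Below is a minimal sketch of that arithmetic on single pixel values, assuming a per-channel mean and standard deviation of 0.5 (an assumption inferred from the visualization code, not stated elsewhere in this notebook). The helper names are illustrative only.

```python
def normalize(x, mean=0.5, std=0.5):
    # Same arithmetic that transforms.Normalize applies to each value.
    return (x - mean) / std


def unnormalize(y, mean=0.5, std=0.5):
    # Inverse mapping; for these defaults it is y * 0.5 + 0.5.
    return y * std + mean


for pixel in (0, 128, 255):
    scaled = pixel / 255            # what ToTensor does to an 8-bit value
    y = normalize(scaled)           # lands in [-1, 1]
    print(pixel, round(y, 3), round(unnormalize(y), 3))
```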

#### **Data loaders**

In [None]:
# STUDENT TODO START: Create data loaders for training, validation, and test sets each having a batch size of 64.
# Set shuffle to be True for training and validation data loaders, False for test data loader.
train_dataloader = ...
val_dataloader = ...
test_dataloader = ...
# STUDENT TODO END

## **1.2. CNN architecture**

Your goal is to devise a CNN that passes the threshold accuracy (80%) on the test set. You get full score (20 pts) if you get at least 80% test set accuracy and 0 if you get 30% or below. The score varies linearly between 0 and 20 for accuracies between 30% and 80%.
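
To make the grading rule concrete, here is a small sketch of the linear interpolation described above. The function name `cnn_score` is hypothetical, and any rounding the autograder applies is not specified here, so treat the output as an estimate.

```python
def cnn_score(test_accuracy, low=0.30, high=0.80, max_points=20.0):
    # Fraction of the way from the 30% floor to the 80% threshold, clipped to [0, 1].
    fraction = (test_accuracy - low) / (high - low)
    fraction = min(1.0, max(0.0, fraction))
    return max_points * fraction


for acc in (0.25, 0.30, 0.55, 0.80, 0.95):
    print(f"{acc:.2f} -> {cnn_score(acc):.1f} pts")
```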

There are several decisions to make in building your CNN, including but not limited to:

- the number of convolutional layers
- the kernel size, stride, padding and number of out channels for each convolutional layer
- number of fully connected layers
- number of nodes in each fully connected layer

You are free to decide the architecture. To make your search easier, we recommend that you use no more than four convolutional layers and four fully connected layers. We also suggest that you use the ReLU activation function between the layers.
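
When choosing kernel sizes, strides, and padding, it helps to track the spatial size of the feature map so that the first fully connected layer gets the right number of input features. The sketch below uses the standard output-size formula for convolution and pooling layers without dilation. The layer sequence and the channel count are hypothetical, chosen only to illustrate the bookkeeping, and are not a recommended architecture.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    # floor((size + 2 * padding - kernel_size) / stride) + 1
    return (size + 2 * padding - kernel_size) // stride + 1


size = 64                                                  # input images are 64 x 64
size = conv_output_size(size, kernel_size=3, padding=1)    # 3x3 conv, padding 1 -> 64
size = conv_output_size(size, kernel_size=2, stride=2)     # 2x2 max pool       -> 32
size = conv_output_size(size, kernel_size=3, padding=1)    # 3x3 conv, padding 1 -> 32
size = conv_output_size(size, kernel_size=2, stride=2)     # 2x2 max pool       -> 16

out_channels = 32                                          # hypothetical channel count
# Spatial size and number of flattened features entering the first linear layer.
print(size, out_channels * size * size)
```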

In [None]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # STUDENT TODO START: Create the layers of your CNN here
        ...
        # STUDENT TODO END

    def forward(self, x):
        # STUDENT TODO START: Perform the forward pass through the layers
        ...
        # STUDENT TODO END

# STUDENT TODO START: Create an instance of Net and move it to the GPU
model = ...
# STUDENT TODO END

## **1.3. Training, validation, and testing**

In [None]:
# STUDENT TODO START:
# 1. Set the criterion to be cross entropy loss
criterion = ...

# 2. Experiment with different optimizers
optimizer = ...
# STUDENT TODO END
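
For reference, the cross entropy of a single sample is the negative log of the softmax probability assigned to the true class. PyTorch's `nn.CrossEntropyLoss` takes raw, unnormalized scores (logits) and integer class labels, and by default averages this quantity over the batch. The plain-Python sketch below computes it for one sample; the function name and the example scores are made up for illustration.

```python
import math


def cross_entropy(logits, label):
    # Numerically stable log-sum-exp: subtract the max before exponentiating.
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    # Negative log-probability assigned to the true class.
    return log_sum_exp - logits[label]


uniform = [0.0] * 6                   # equal scores for all 6 classes
print(cross_entropy(uniform, 2))      # equals log(6) for a uniform prediction

confident = [0.0, 0.0, 8.0, 0.0, 0.0, 0.0]
print(cross_entropy(confident, 2))    # small loss: the true class dominates
print(cross_entropy(confident, 0))    # large loss: the model favors the wrong class
```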

In [None]:
train_loss, validation_loss = [], []
train_acc, validation_acc = [], []

# STUDENT TODO START:
# Note that we have set the number of epochs to be 10. You can choose to increase or decrease the number of epochs.
num_epochs = 10
for epoch in range(num_epochs):

    model.train()
    running_loss = 0.
    correct, total = 0, 0

    for i, data in enumerate(train_dataloader, 0):

        inputs, labels = data
        # 1. Store the inputs and labels in the GPU
        inputs = ...
        labels = ...

        # 2. Get the model predictions
        predictions = ...

        # 3. Zero the gradients out
        ...

        # 4. Get the loss
        loss = ...

        # 5. Calculate the gradients
        ...

        # 6. Update the weights
        ...

        running_loss += loss.item()

        _, predicted = torch.max(predictions, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    train_loss.append(running_loss / len(train_dataloader))
    train_acc.append(correct/total)

    model.eval()
    running_loss = 0.
    correct, total = 0, 0

    for i, data in enumerate(val_dataloader, 0):

        inputs, labels = data
        # 1. Store the inputs and labels in the GPU
        inputs = ...
        labels = ...

        # 2. Get the model predictions
        predictions = ...

        # 3. Get the loss
        loss = ...

        running_loss += loss.item()

        _, predicted = torch.max(predictions, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    validation_loss.append(running_loss / len(val_dataloader))
    validation_acc.append(correct/total)

    print(f"Epoch {epoch+1}:")

    print("Training Loss:", round(train_loss[epoch], 3))
    print("Validation Loss:", round(validation_loss[epoch], 3))

    print("Training Accuracy:", round(train_acc[epoch], 3))
    print("Validation Accuracy:", round(validation_acc[epoch], 3))

    print("------------------------------")
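
A note on the accuracy bookkeeping in the loop above: `torch.max(predictions, 1)` returns a pair (maximum values, their indices) along dimension 1, so `predicted` holds the index of the highest-scoring class for each sample in the batch. The same logic in plain Python, with made-up scores for a hypothetical 3-class problem:

```python
def batch_accuracy_counts(scores, labels):
    # Predicted class = index of the largest score in each row.
    predicted = [row.index(max(row)) for row in scores]
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return correct, len(labels)


scores = [[2.0, 0.1, -1.0], [0.3, 0.2, 1.5], [0.0, 4.0, 1.0]]   # made-up scores
labels = [0, 1, 1]
correct, total = batch_accuracy_counts(scores, labels)
print(correct / total)   # fraction of samples whose top-scoring class matches the label
```

Accumulating `correct` and `total` across batches, rather than averaging per-batch accuracies, keeps the result exact even when the last batch is smaller than the others.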

In [None]:
model.eval()

test_predictions = np.array([])

for i, data in enumerate(test_dataloader, 0):

    inputs = data
    # STUDENT TODO START:
    # 1. Store the inputs in the GPU
    inputs = ...

    # 2. Get the model predictions
    predictions = ...
    # STUDENT TODO END

    _, predicted = torch.max(predictions, 1)

    test_predictions = np.concatenate((test_predictions, predicted.detach().cpu().numpy()))

In [None]:
# PennGrader Grading Cell
grader.grade(test_case_id = 'test_cnn_predictions', answer = test_predictions)

Download the .ipynb notebook and submit it on Gradescope.