Introduction

PyTorch has revolutionized the approach to computer vision and NLP problems. It's a dynamic deep-learning framework, which makes it easy to learn and use.

In this guide, we will build an image classification model from start to finish, beginning with exploratory data analysis (EDA), which will help you understand the shape of an image and the distribution of classes. You'll learn to prepare data for optimum modeling results and then build a convolutional neural network (CNN) that will classify images according to whether they contain a cactus or not.

Click here to download the aerial cactus dataset from an ongoing Kaggle competition. Unlike the black-and-white MNIST images, this dataset contains RGB images with three color channels, which makes it a good fit for beginners who want to explore and experiment with CNNs. It's also a chance to classify something other than cats and dogs.

To begin, import the `torch` and `torchvision` frameworks and their libraries with `numpy`, `pandas`, and `sklearn`. Libraries and functions used in the code below include:

- `transforms`, for basic image transformations
- `torch.nn.functional`, which contains useful activation functions
- `Dataset` and `DataLoader`, PyTorch's data loading utilities

```python
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms

from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split

# Jupyter magic: render plots inline in the notebook.
%matplotlib inline
```

```python
import os

os.getcwd()

# Place the files in your IDE working directory.
labels = pd.read_csv(r'/aerialcactus/train.csv')
submission = pd.read_csv(r'/aerialcactus/sample_submission.csv')

train_path = r'/aerialcactus/train/train/'
test_path = r'/aerialcactus/test/test/'
```

`labels.head()`

`labels.tail()`

`labels['has_cactus'].value_counts()`

```python
# groupby sorts the classes as 0, 1, so the labels must follow the same order.
label = 'No Cactus', 'Has Cactus'
plt.figure(figsize=(8, 8))
plt.pie(labels.groupby('has_cactus').size(), labels=label, autopct='%1.1f%%', shadow=True, startangle=90)
plt.show()
```

As the pie chart shows, the data is biased towards one class. Imbalanced data can affect the final results, but this dataset already has enough examples for a CNN to produce good results, so no resampling or augmentation is applied here.

Images in a dataset do not usually have the same pixel intensity and dimensions. In this section, you will pre-process the dataset by standardizing the pixel values. The next required process is transforming raw images into tensors so that the algorithm can process them.

```python
import matplotlib.image as img

# Show the last five training images that contain a cactus.
fig, ax = plt.subplots(1, 5, figsize=(15, 3))

for i, idx in enumerate(labels[labels['has_cactus'] == 1]['id'][-5:]):
    path = os.path.join(train_path, idx)
    ax[i].imshow(img.imread(path))
```

```python
# Show the first five training images without a cactus.
fig, ax = plt.subplots(1, 5, figsize=(15, 3))
for i, idx in enumerate(labels[labels['has_cactus'] == 0]['id'][:5]):
    path = os.path.join(train_path, idx)
    ax[i].imshow(img.imread(path))
```

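Raw image data will not give the desired results, so the images are standardized with a defined mean and standard deviation. The helper below displays a standardized image tensor; with `normalize=True`, it reverses the standardization so the image renders with natural colors.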

```python
import numpy as np
import matplotlib.pyplot as plt

def imshow(image, ax=None, title=None, normalize=True):
    if ax is None:
        fig, ax = plt.subplots()
    # Convert from (channels, height, width) to (height, width, channels).
    image = image.numpy().transpose((1, 2, 0))

    if normalize:
        # Undo the standardization so the colors look natural.
        mean = np.array([0.485, 0.456, 0.406])
        std = np.array([0.229, 0.224, 0.225])
        image = std * image + mean
        image = np.clip(image, 0, 1)

    ax.imshow(image)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_visible(False)
    ax.spines['bottom'].set_visible(False)
    ax.tick_params(axis='both', length=0)
    ax.set_xticklabels('')
    ax.set_yticklabels('')

    return ax
```

```python
class CactiDataset(Dataset):
    def __init__(self, data, path, transform=None):
        super().__init__()
        self.data = data.values  # rows of (image file name, label)
        self.path = path
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        img_name, label = self.data[index]
        img_path = os.path.join(self.path, img_name)
        image = img.imread(img_path)
        if self.transform is not None:
            image = self.transform(image)
        return image, label
```

You can stack multiple image transformations in `transforms.Compose`. Normalizing an image is an important step that makes model training stable and fast. The `transforms.Normalize()` class takes a list of per-channel means and a list of per-channel standard deviations. It uses this formula: `output[channel] = (input[channel] - mean[channel]) / std[channel]`

```python
# Per-channel mean and standard deviation
# (assumed: the ImageNet values also used in the imshow helper above).
means = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

train_transform = transforms.Compose([transforms.ToPILImage(),
                                      transforms.ToTensor(),
                                      transforms.Normalize(means, std)])

test_transform = transforms.Compose([transforms.ToPILImage(),
                                     transforms.ToTensor(),
                                     transforms.Normalize(means, std)])

valid_transform = transforms.Compose([transforms.ToPILImage(),
                                      transforms.ToTensor(),
                                      transforms.Normalize(means, std)])
```

How well the model can learn depends on the variety and volume of the data. We need to divide our data into a training set and a validation set using `train_test_split`.

**Training dataset**: The model learns from this dataset's examples. It is used to fit the parameters of the classifier.

**Validation dataset**: The examples in the validation dataset are used to tune the hyperparameters, such as learning rate and epochs. The aim of creating a validation set is to avoid overfitting the model. It is a checkpoint to know whether the model fits the training dataset well.

**Test dataset**: This dataset tests the final evaluation of the model, measuring how well it has learned and predicted the desired output. It contains unseen, real-life data.

`train, valid_data = train_test_split(labels, stratify=labels.has_cactus, test_size=0.2)`
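To see what the `stratify` argument does, here is a minimal sketch on a made-up label table. The column names mirror the ones used in this guide, but the data, the 75/25 class ratio, and the `random_state` value are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy label table (hypothetical data): 75 positives and 25 negatives.
toy = pd.DataFrame({
    "id": [f"img_{i}.jpg" for i in range(100)],
    "has_cactus": [1] * 75 + [0] * 25,
})

train_part, valid_part = train_test_split(
    toy, stratify=toy.has_cactus, test_size=0.2, random_state=0
)

# With stratification, both splits keep the original class ratio.
train_ratio = train_part.has_cactus.mean()
valid_ratio = valid_part.has_cactus.mean()
print(len(train_part), len(valid_part), train_ratio, valid_ratio)
```

Stratifying matters for imbalanced data like this, because a purely random split could leave the validation set with too few examples of the minority class.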

```python
train_data = CactiDataset(train, train_path, train_transform)
valid_data = CactiDataset(valid_data, train_path, valid_transform)
test_data = CactiDataset(submission, test_path, test_transform)
```

Define the values of hyperparameters.

```python
# Hyperparameters

num_epochs = 35
num_classes = 2
batch_size = 25
learning_rate = 0.001
```

Whenever you initialize a batch of images, it is placed on the CPU for computation by default. The function `torch.cuda.is_available()` checks whether a GPU is present. If CUDA is available, `torch.device("cuda")` selects the GPU, and `.to(device)` routes tensors to it for computation.

```python
# CPU or GPU

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
device
```

The device will use CUDA with a single GPU processor, which makes the calculations faster. If you only have a CPU in your system, no problem. You can use Google Colab, which provides a free GPU.
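The sketch below shows the device handling in isolation: a new tensor starts on the CPU, and `.to(device)` returns a copy on the selected device. The batch shape is a hypothetical placeholder.

```python
import torch

# Pick the GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# A new tensor lives on the CPU unless a device is given explicitly.
batch = torch.zeros(25, 3, 32, 32)  # hypothetical batch of 25 RGB 32x32 images
initial_device = batch.device.type

# .to(device) returns a tensor on the target device.
batch = batch.to(device)
print(initial_device, "->", batch.device.type)
```

The model and every batch must end up on the same device, otherwise PyTorch raises an error during the forward pass.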

In the code below, `DataLoader` combines a dataset and a sampler and provides an iterable over the given dataset. The `dataset` argument indicates which dataset to load from the available data. For details, read the PyTorch `DataLoader` documentation.

```python
train_loader = DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True, num_workers=0)
valid_loader = DataLoader(dataset=valid_data, batch_size=batch_size, shuffle=False, num_workers=0)
test_loader = DataLoader(dataset=test_data, batch_size=batch_size, shuffle=False, num_workers=0)
```
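To illustrate how a `DataLoader` splits a dataset into batches, here is a minimal sketch that uses `TensorDataset` with random stand-in data instead of the cactus images. The dataset size of 60 and the image shape are assumptions chosen for the example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 60 random "images" with binary labels.
images = torch.rand(60, 3, 32, 32)
targets = torch.randint(0, 2, (60,))
toy_dataset = TensorDataset(images, targets)

toy_loader = DataLoader(toy_dataset, batch_size=25, shuffle=True, num_workers=0)

# Each iteration yields one batch; the last batch holds the leftover samples.
batch_sizes = [batch_images.shape[0] for batch_images, _ in toy_loader]
print(batch_sizes)
```

Because 60 is not a multiple of 25, the final batch is smaller than the others; pass `drop_last=True` if every batch must have the same size.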

```python
trainimages, trainlabels = next(iter(train_loader))

fig, axes = plt.subplots(figsize=(12, 12), ncols=5)
print('training images')
for i in range(5):
    axe1 = axes[i]
    imshow(trainimages[i], ax=axe1, normalize=False)

print(trainimages[0].size())
```

The next step is to make a CNN model that learns from the transformed training dataset.

When you try to recognize objects in a given image, you notice features like color, shape, and size that help you identify them. A CNN uses the same technique. The two main parts of a CNN are the convolution and pooling layers, where the model makes a note of the features in the image, and the fully connected (FC) layer, where classification takes place.

Image Source: https://commons.wikimedia.org/wiki/File:Typical_cnn.png

Mathematically, convolution is an operation performed on two functions to produce a third function. Convolution is used in speech processing (1 dimension), image processing (2 dimensions), and video processing (3 dimensions). The convolution layer slides a filter across the image, and that filter is as deep as the number of input channels.

The convolutional layer’s output shape is affected by the choice of kernel size, input dimensions, padding, and strides (number of pixels by which the window moves).
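The relationship between these factors can be written as `output = floor((input + 2 * padding - kernel) / stride) + 1`. The sketch below applies it to the layer sequence used by the model defined later in this guide (two 3x3 convolutions, each followed by 2x2 max pooling). The 32x32 input size is an assumption about the dataset's images.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution or pooling window along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32  # assumed input height and width
size = conv_output_size(size, kernel=3)            # first 3x3 convolution
size = conv_output_size(size, kernel=2, stride=2)  # 2x2 max pooling
size = conv_output_size(size, kernel=3)            # second 3x3 convolution
size = conv_output_size(size, kernel=2, stride=2)  # 2x2 max pooling

# 20 output channels from the second convolution, flattened for the FC layer.
flattened = 20 * size * size
print(size, flattened)
```

Under these assumptions the flattened size works out to 720, which matches the input size of the first fully connected layer in the model.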

In this model, a 3x3 kernel size is used. Because each filter spans the 3 input channels, it has 3 x 3 x 3 = 27 weights and 1 bias.

This is what happens behind the CNN.

Similarly, carry out the calculation of layer 2.
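One way to check the calculation for both layers is to count their parameters directly. This is a minimal sketch that assumes the layer sizes used by the model in this guide (3 to 10 channels, then 10 to 20 channels, both with 3x3 kernels); `count_parameters` is a hypothetical helper, not part of PyTorch.

```python
import torch.nn as nn

conv1 = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3)
conv2 = nn.Conv2d(in_channels=10, out_channels=20, kernel_size=3)

def count_parameters(layer):
    """Total number of trainable values (weights and biases) in a layer."""
    return sum(p.numel() for p in layer.parameters())

# Layer 1: 10 filters x (3 x 3 x 3 weights + 1 bias)
# Layer 2: 20 filters x (3 x 3 x 10 weights + 1 bias)
conv1_params = count_parameters(conv1)
conv2_params = count_parameters(conv2)
print(conv1_params, conv2_params)
```

The second layer has far more parameters than the first because each of its filters has to span 10 input channels instead of 3.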

A drawback of a convolution feature map is that it records the exact position of features. Even the smallest shift of a feature in the input will produce a different feature map. This problem is solved by downsampling the feature map, which yields a lower-resolution version with the important features intact. In this model, max pooling is used. It calculates the maximum value of each patch of the feature map.
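As a small illustration of max pooling, the sketch below runs `F.max_pool2d` with a 2x2 window over a hand-written 4x4 feature map. The values are made up for the example.

```python
import torch
import torch.nn.functional as F

# One 4x4 feature map, shaped (batch, channels, height, width).
feature_map = torch.tensor([[[[1., 3., 2., 0.],
                              [4., 6., 1., 2.],
                              [7., 2., 9., 5.],
                              [0., 1., 3., 8.]]]])

# Each non-overlapping 2x2 patch is replaced by its largest value.
pooled = F.max_pool2d(feature_map, kernel_size=2)
result = pooled.squeeze().tolist()
print(result)
```

The output is a 2x2 map that keeps the strongest response from each patch, so a feature that shifts by one pixel inside a patch still produces the same pooled value.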

Some brief notes about the important parts of the model's `__init__` and `forward` methods are stated below:

During forward propagation, an *activation function* is applied to each layer's output. The activation function introduces a *non-linear transformation*. A neural network without an activation function is just a linear regression model, so it cannot be left out. Below is a list of activation functions.
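A minimal sketch of three common activation functions (ReLU, sigmoid, and tanh) applied to the same input follows. The selection and the input values are illustrative; the model in this guide uses ReLU.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, 0.0, 2.0])

relu_out = F.relu(x)            # negative inputs become 0
sigmoid_out = torch.sigmoid(x)  # squashes values into the range (0, 1)
tanh_out = torch.tanh(x)        # squashes values into the range (-1, 1)

print(relu_out, sigmoid_out, tanh_out)
```

All three bend the input in a non-linear way, which is what lets stacked layers represent more than a single linear mapping.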

```python
# Same hyperparameter values as defined earlier; num_epochs is the name the training loop uses.
num_epochs = 35
batch_size = 25
learning_rate = 0.001
```

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # Two 3x3 convolutions: 3 -> 10 -> 20 feature maps.
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=3)
        self.conv2_drop = nn.Dropout2d()
        # 720 = 20 channels x 6 x 6 (assuming 32x32 input images).
        self.fc1 = nn.Linear(720, 1024)
        self.fc2 = nn.Linear(1024, 2)

    def forward(self, x):
        # Convolution -> 2x2 max pooling -> ReLU, applied twice.
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        # Flatten everything except the batch dimension.
        x = x.view(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        # Return raw class scores; nn.CrossEntropyLoss applies log-softmax internally.
        x = self.fc2(x)
        return x
```

Create a complete CNN.

```python
model = CNN()
print(model)
```

There are different types of losses implemented in machine learning. In this guide, *cross-entropy* loss is used. In this context, it is also known as *log loss*. Notice it has the same formula as the likelihood, but with a logarithm applied and the sign flipped (the negative log-likelihood).

The best thing about this function is that if the true label is 0, the first half goes away, and if the true label is 1, the second half drops. With this, you can get an estimate of where your model can go wrong while predicting the label. Changes are made during training to minimize the loss.
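For a single example with true label `y` and predicted probability `p`, the binary form of this loss can be written as `-(y * log(p) + (1 - y) * log(1 - p))`. The sketch below evaluates it by hand and then checks that `F.cross_entropy` matches a manual negative log-softmax. The function name `binary_log_loss`, the probabilities, and the scores are hypothetical values chosen for illustration.

```python
import math
import torch
import torch.nn.functional as F

def binary_log_loss(y, p):
    """Log loss for one example: y is the true label (0 or 1), p the predicted P(label = 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A confident correct prediction costs little; a confident wrong one costs a lot.
confident_right = binary_log_loss(1, 0.9)
confident_wrong = binary_log_loss(1, 0.1)

# F.cross_entropy applies log-softmax to raw scores and then takes
# the negative log-probability of the true class.
logits = torch.tensor([[0.5, 2.0]])
target = torch.tensor([1])
library_loss = F.cross_entropy(logits, target).item()
manual_loss = -torch.log_softmax(logits, dim=1)[0, 1].item()

print(confident_right, confident_wrong, library_loss, manual_loss)
```

Because the loss works on raw scores, the model's last layer should not apply softmax itself when `nn.CrossEntropyLoss` is used.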

Select any one of the optimizer algorithms available in the `torch.optim` package. The optimizers build on gradient descent. The model is optimized by changing its parameters, such as weights and biases. The learning rate decides how big the steps should be when changing the parameters.

1. Calculate what a small change in each weight would do to the loss function (selecting the direction to reach minima).
2. Adjust each weight based on its gradient (i.e., take a small step in the determined direction).
3. Keep doing steps 1 and 2 until the loss function gets as low as possible.
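The steps above can be sketched with plain gradient descent on a single weight. The quadratic toy loss, the learning rate, and the number of iterations are assumptions chosen so the result is easy to verify; this is not the Adam optimizer used for the model.

```python
import torch

# Toy loss with its minimum at w = 3 (hypothetical example).
w = torch.tensor(0.0, requires_grad=True)
learning_rate = 0.1

for step in range(100):
    loss = (w - 3.0) ** 2
    # Step 1: compute how the loss changes with respect to w.
    loss.backward()
    # Step 2: move w a small step against the gradient.
    with torch.no_grad():
        w -= learning_rate * w.grad
    # Reset the gradient so it does not accumulate across iterations.
    w.grad.zero_()

final_w = w.item()
print(final_w)
```

In the training loop of this guide, `optimizer.zero_grad()`, `loss.backward()`, and `optimizer.step()` play the same roles for every parameter of the network at once.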

Here, adaptive moment estimation (Adam) is used as an optimizer. It blends ideas from RMSprop and stochastic gradient descent with momentum.

The loss function and optimization go hand-in-hand. The loss function checks whether the model is moving in the correct direction and making progress, whereas optimization improves the model to deliver accurate results.

```python
model = CNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
```

```python
%%time
# Keep track of the average loss per epoch.
train_losses = []
valid_losses = []

for epoch in range(1, num_epochs + 1):
    # Running totals for this epoch.
    train_loss = 0.0
    valid_loss = 0.0

    # Train the model.
    model.train()
    for data, target in train_loader:
        # Move tensors to the selected device (GPU if available).
        data = data.to(device)
        target = target.to(device)

        # Clear the gradients of all optimized variables.
        optimizer.zero_grad()
        # Forward pass: compute predicted outputs by passing inputs to the model.
        output = model(data)
        # Calculate the batch loss.
        loss = criterion(output, target)
        # Backward pass: compute the gradient of the loss with respect to the model parameters.
        loss.backward()
        # Perform a single optimization step (parameter update).
        optimizer.step()
        # Accumulate the training loss, weighted by batch size.
        train_loss += loss.item() * data.size(0)

    # Validate the model; gradients are not needed here.
    model.eval()
    with torch.no_grad():
        for data, target in valid_loader:
            data = data.to(device)
            target = target.to(device)

            output = model(data)
            loss = criterion(output, target)

            # Accumulate the validation loss, weighted by batch size.
            valid_loss += loss.item() * data.size(0)

    # Calculate the average losses over all samples.
    train_loss = train_loss / len(train_loader.sampler)
    valid_loss = valid_loss / len(valid_loader.sampler)
    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    # Print training and validation statistics.
    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
        epoch, train_loss, valid_loss))
```

```python
# Evaluate the model on the validation set.
model.eval()  # disables dropout
with torch.no_grad():
    correct = 0
    total = 0
    # `targets` avoids shadowing the `labels` DataFrame loaded earlier.
    for images, targets in valid_loader:
        images = images.to(device)
        targets = targets.to(device)
        outputs = model(images)
        # The predicted class is the index of the highest score.
        _, predicted = torch.max(outputs, 1)
        total += targets.size(0)
        correct += (predicted == targets).sum().item()

    print('Validation accuracy of the model: {} %'.format(100 * correct / total))

# Save the trained weights.
torch.save(model.state_dict(), 'model.ckpt')
```

```python
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

plt.plot(train_losses, label='Training loss')
plt.plot(valid_losses, label='Validation loss')
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend(frameon=False)
```

Take a deep breath! A CNN-based image classifier is ready, and it gives 98.9% accuracy on the validation set. As per the graph above, training and validation loss decrease as the epochs increase. The losses are in line with each other, which suggests that the model is reliable and is neither underfitting nor overfitting.

Data preparation is the most important and time-intensive process in data science. Knowing how to explore data in the initial stage is a great skill, and getting to know your data is what makes a good data scientist. This guide is not a complete one-stop reference for pre-processing, but it gives a brief overview.

You also learned about the layers involved in designing the CNN model and the roles of the loss function and the optimizer.

Building your own neural network is a cumbersome task, and that's why *transfer learning* (taking knowledge from one situation and applying it to another) is used a lot these days. Nevertheless, it is always good to have foundational knowledge.