Working knowledge of neural networks, TensorFlow, and image classification is an essential part of any data scientist's toolkit, even for those whose area of application lies outside computer vision. Indeed, Convolutional Neural Networks (CNNs) have found applications in areas ranging from speech recognition to malware detection and even to understanding climate. This guide will get you up to speed quickly by walking you through a fun starter project, classifying cats and dogs, and then point the way to the next steps toward becoming an expert in image classification.
Code for this guide is available on GitHub.
If you haven't registered for Kaggle yet, head over to Kaggle and create an account. Then open your account settings and click Create New API Token:
This will download a Kaggle API JSON file, which you'll want to place at ~/.kaggle/kaggle.json (or, for a typical Windows setup, at C:\Users\<Windows-username>\.kaggle\kaggle.json).
The Kaggle API is a convenient way to access datasets. For more information, see the Kaggle API GitHub page.
If your environment lacks the Kaggle pip library, install it by running:
pip install kaggle
Now, you can use the Kaggle APIs. In your environment, run:
kaggle competitions download -c dogs-vs-cats
This will download the “Dogs vs. Cats” dataset.
Next, you will unzip the dataset and, for clarity, remove unneeded data.
!unzip train.zip
!mv train data
!rm test1.zip sampleSubmission.csv train.zip
You now have a dataset consisting of cat and dog images.
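If you'd like to sanity-check the extraction, you can count the files in the directory (an optional snippet; the training set should contain 25,000 images):

import os

# Optional check: the Dogs vs. Cats training set has 25,000 images (12,500 cats + 12,500 dogs)
print(len(os.listdir("data")))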
Next, you’ll perform some data exploration. Set a variable pointing to the dataset’s location.
DATASET_LOCATION = "data"
Collect the labels and filenames of the dataset.
import os

# Derive labels from the "cat.<id>.jpg" / "dog.<id>.jpg" naming scheme
filenames = os.listdir(DATASET_LOCATION)
classes = []
for filename in filenames:
    image_class = filename.split(".")[0]
    if image_class == "dog":
        classes.append(1)
    else:
        classes.append(0)
Read the dataset into a pandas dataframe for convenient access.
import pandas as pd

df = pd.DataFrame({"filename": filenames, "category": classes})
df["category"] = df["category"].replace({0: "cat", 1: "dog"})
You can see you now have labels for the files:
df.head()
In addition, you can see that the dataset is balanced, consisting of 12,500 images each of cats and dogs:
df.category.value_counts()
The following code block will display a random image:
import random
from keras.preprocessing.image import load_img
import matplotlib.pyplot as plt

sample1 = random.choice(filenames)
image1 = load_img(DATASET_LOCATION + "/" + sample1)
plt.imshow(image1)
Here is code to preview another datapoint:
sample2 = random.choice(filenames)
image2 = load_img(DATASET_LOCATION + "/" + sample2)
plt.imshow(image2)
The sizes of the images are not uniform.
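For instance, you can print the dimensions of the two sampled images to confirm this (an optional check; load_img returns a PIL image whose size attribute is (width, height)):

print(image1.size)
print(image2.size)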
This is important because for a standard neural network classifier, the sizes of the inputs must be identical. Fortunately, this is not a problem, as you will rescale all images to the same size. Specify the desired size:
IMAGE_WIDTH = 64
IMAGE_HEIGHT = 64
IMAGE_SIZE = (IMAGE_WIDTH, IMAGE_HEIGHT)
# A single channel, since the images will be loaded in grayscale
INPUT_SHAPE = (IMAGE_WIDTH, IMAGE_HEIGHT, 1)
You now have a good sense of what the dataset consists of.
Next, you will specify the architecture of a neural network that you will use to classify the images. The architecture you will use is a simple, standard CNN meant to serve as a starting point. The library that implements the CNN is called Keras, and is a high-level API that can use lower-level neural network libraries, such as TensorFlow, under the hood.
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D

model = Sequential()
# Two convolutional layers extract spatial features from the grayscale input
model.add(Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=INPUT_SHAPE))
model.add(Conv2D(64, (3, 3), activation="relu"))
# Max pooling downsamples the feature maps; dropout reduces overfitting
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# Flatten the feature maps and classify with dense layers
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(2, activation="softmax"))

model.compile(
    loss=keras.losses.categorical_crossentropy,
    optimizer=keras.optimizers.Adadelta(),
    metrics=["accuracy"],
)
The model consists of convolutional layers, max pooling layers, dense layers and dropout layers. It utilizes categorical cross-entropy as a loss function, and Adadelta as its optimizer.
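If you want to inspect the resulting architecture, including output shapes and parameter counts, you can print a summary (optional):

model.summary()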
To train the model on the data, and to be able to assess its performance as it trains, split the dataset into a training and testing set:
from sklearn.model_selection import train_test_split

train_df, test_df = train_test_split(df, test_size=0.20, random_state=42)
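As a quick check, you can confirm that the split produced roughly 20,000 training and 5,000 testing rows (optional):

print(train_df.shape[0], test_df.shape[0])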
Keras also has very convenient methods for performing data augmentation and for reading images from directories. Data augmentation is a procedure in which existing data is used to generate new data. For example, you can take an existing image and flip it to create another data point. That is what an ImageDataGenerator allows you to do.
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rotation_range=15,
    rescale=1.0 / 255,
    shear_range=0.1,
    zoom_range=0.2,
    horizontal_flip=True,
    width_shift_range=0.1,
    height_shift_range=0.1,
)
The flow_from_dataframe method efficiently reads and preprocesses images from a directory, using the filenames and labels stored in a dataframe.
BATCH_SIZE = 16
train_generator = train_datagen.flow_from_dataframe(
    train_df,
    DATASET_LOCATION,
    x_col="filename",
    y_col="category",
    target_size=IMAGE_SIZE,
    class_mode="categorical",
    batch_size=BATCH_SIZE,
    color_mode="grayscale",
)
Create a similar generator for the test images, which will serve as validation data during training.
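Here is a minimal sketch of such a generator, assuming the test images are only rescaled (no augmentation) and otherwise mirror the training settings:

# Test images are only rescaled; no augmentation is applied
test_datagen = ImageDataGenerator(rescale=1.0 / 255)
test_generator = test_datagen.flow_from_dataframe(
    test_df,
    DATASET_LOCATION,
    x_col="filename",
    y_col="category",
    target_size=IMAGE_SIZE,
    class_mode="categorical",
    batch_size=BATCH_SIZE,
    color_mode="grayscale",
)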
For illustrative purposes, create a generator for a single image and display the corresponding augmented data.
example_df = train_df.sample(n=1)
example_generator = train_datagen.flow_from_dataframe(
    example_df,
    DATASET_LOCATION,
    x_col="filename",
    y_col="category",
    target_size=IMAGE_SIZE,
    class_mode="categorical",
    color_mode="grayscale",
)
Plot the augmented data.
plt.figure(figsize=(12, 12))
for i in range(0, 15):
    plt.subplot(5, 3, i + 1)
    for X_batch, Y_batch in example_generator:
        image = X_batch[0]
        image = image.reshape(IMAGE_SIZE)
        plt.imshow(image)
        break
plt.tight_layout()
plt.show()
Note that throughout, you have grayscaled the data. The purpose of this is to reduce the size of the data and, therefore, the corresponding computational burden. You have now set up the data preprocessing part of your training pipeline.
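As an optional check that the pipeline produces grayscale tensors of the expected shape, you can pull a single batch from the training generator:

X_batch, y_batch = next(train_generator)
# With the settings above, this should print (16, 64, 64, 1): batch, height, width, one grayscale channel
print(X_batch.shape)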
If you have access to a GPU, this is the time to enable it. A GPU accelerates most Deep Learning computations (i.e., computations for neural networks with many layers). If you are using a Google Colab notebook, click on Runtime and then on Change runtime type. Under Hardware accelerator, select GPU and then SAVE.
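You can also confirm from inside the notebook that TensorFlow sees the GPU (an optional check; the exact call can vary with your TensorFlow version):

import tensorflow as tf

# An empty list means no GPU is visible to TensorFlow
print(tf.config.list_physical_devices("GPU"))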
You are now running your notebook with a GPU enabled. Go ahead and train your model.
EPOCHS = 10
history = model.fit_generator(
    train_generator,
    epochs=EPOCHS,
    validation_data=test_generator,
    validation_steps=test_df.shape[0] // BATCH_SIZE,
    steps_per_epoch=train_df.shape[0] // BATCH_SIZE,
)
This might take a bit of time, but with the simplifications you have made, such as small grayscale images, it shouldn't take much longer than 10 minutes. Once finished, you should see something like this:
Looking at val_accuracy, you can see that the classifier attains about 78% accuracy on the testing set, which is a good starting point for further improvements. Looking at accuracy, which is the training accuracy, you can see that it stays close to val_accuracy. The implication is that your network is not overfitting: it is not learning patterns specific to the training set that fail to generalize. You will see suggestions for improvement in a later section.
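To visualize the training progress yourself, you can plot the accuracy curves stored in the history object returned by training (a small optional addition; the metric key names may differ slightly across Keras versions, e.g. acc versus accuracy):

# Plot training vs. validation accuracy per epoch
plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()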
It's nice to see the classifier in action. Take six samples to observe:
NUM_SAMPLES = 6
sample_test_df = test_df.head(NUM_SAMPLES).reset_index(drop=True)
sample_test_datagen = ImageDataGenerator(rescale=1.0 / 255)
sample_test_generator = sample_test_datagen.flow_from_dataframe(
    sample_test_df,
    DATASET_LOCATION,
    x_col="filename",
    y_col="category",
    target_size=IMAGE_SIZE,
    class_mode="categorical",
    batch_size=BATCH_SIZE,
    color_mode="grayscale",
    shuffle=False,  # keep the original order so predictions line up with the dataframe rows
)
Predict on these samples:
import numpy as np

predict = model.predict_generator(sample_test_generator)
predictions = np.argmax(predict, axis=-1)
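The predictions are class indices (0 or 1). If you would rather display readable labels, you can invert the class_indices mapping stored on the training generator (an optional step):

# class_indices maps label names to indices, e.g. {"cat": 0, "dog": 1}
label_map = {v: k for k, v in train_generator.class_indices.items()}
predicted_labels = [label_map[p] for p in predictions]

You could then pass predicted_labels[index] instead of the raw index to plt.xlabel in the plotting code below.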
And display the results:
plt.figure(figsize=(12, 24))
for index, row in sample_test_df.iterrows():
    filename = row["filename"]
    prediction = predictions[index]
    img = load_img(DATASET_LOCATION + "/" + filename)
    plt.subplot(6, 3, index + 1)
    plt.imshow(img)
    plt.xlabel(prediction)
plt.tight_layout()
plt.show()
As you can see, the classifier makes some mistakes. But that's okay for a first prototype.
You have accomplished much in this guide, taking a set of images and constructing a classifier that can recognize them as images of cats or dogs. There is also a lot of exciting content still to learn, from transfer learning to segmentation, on your path to expertise in image classification. Following the directions below, you will reach your destination in no time.
The following steps will yield improvements to your classifier on this problem:
Working on different problems like these will expand the variety of tasks you can solve:
Happy learning!