Deep learning is one of the hottest topics in machine learning and artificial intelligence. This guide will introduce you to PyTorch, a popular deep learning library from Facebook, positioned alongside Google's TensorFlow. Both libraries share a common goal: training machine learning models built on neural networks. But PyTorch offers a Pythonic interface to deep learning, whereas TensorFlow's traditional API is very low-level, requiring the user to know a lot about the internals of neural networks. Since the Keras project became part of TensorFlow, some of the conveniences found in PyTorch have become available to TensorFlow users as well. However, Keras is higher level than even PyTorch. For many users, PyTorch may be the ideal compromise between flexibility and rapid development when training machine learning models.
A neural network in PyTorch is a class that inherits from `torch.nn.Module`. The layers of the network are declared in the class initializer.
```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)
```
This is from the PyTorch examples and defines a simple network for the MNIST sample data set. Notice that the layers are only created and configured in the initializer. The connections between them are left for the `forward` method. This method takes the input (the image data), pushes it forward through the network, and returns a prediction.
```python
def forward(self, x):
    x = self.conv1(x)
    x = F.relu(x)
    # ...
    x = F.max_pool2d(x, 2)
    x = self.dropout1(x)
    # ...
    x = self.fc2(x)
    output = F.log_softmax(x, dim=1)
    return output
```
Redundant lines were omitted for brevity. But you can see that the variable `x` always holds the current state of the prediction as it moves through the network. This is also how the activation functions are introduced.
Those familiar with Keras might be shocked at the amount of code needed to accomplish this. Many single-line calls in Keras require multiple lines of code in PyTorch. But again, PyTorch gives you a level of control that Keras does not. On the other hand, PyTorch requires less code than the same task would with the lower-level TensorFlow API.
Before training a PyTorch model, you must load the dataset through a `DataLoader` from the `torch.utils.data` module.
```python
train_loader = torch.utils.data.DataLoader(torchvision.datasets.MNIST(...))
```
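The elided arguments configure the dataset and the loader. For reference, a fuller version might look like this sketch; the download location, batch size, and normalization constants are illustrative choices (the mean and standard deviation shown are the values commonly used for MNIST):

```python
import torch
import torchvision
from torchvision import transforms

# Convert images to tensors and normalize them; the constants are
# the commonly used MNIST mean and standard deviation (illustrative).
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])

train_loader = torch.utils.data.DataLoader(
    torchvision.datasets.MNIST("data", train=True, download=True,
                               transform=transform),
    batch_size=64,  # illustrative batch size
    shuffle=True,   # reshuffle the samples each epoch
)
```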
Often it is helpful to leverage a GPU when training a neural network. A GPU must be selected explicitly in PyTorch.
```python
use_cuda = torch.cuda.is_available()  # only use the GPU if one is present
device = torch.device("cuda" if use_cuda else "cpu")
```
Then use the `device` when creating a new network.
```python
model = Net().to(device)
```
The actual training requires an optimizer.
```python
optimizer = torch.optim.Adadelta(...)
```
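The elided arguments must include, at minimum, the parameters the optimizer should update. A typical call might look like this (the learning rate shown happens to be Adadelta's default and is only illustrative):

```python
# The optimizer is told which parameters to update;
# lr=1.0 is Adadelta's default and only shown for illustration.
optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0)
```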
Then iterate over the `DataLoader` and train the model.
```python
for data, target in train_loader:
```
The `data` and `target` must be transferred to the GPU `device`.
```python
    data, target = data.to(device), target.to(device)
```
For each pass, the `optimizer` gradients are zeroed out.
```python
    optimizer.zero_grad()
```
A prediction is received from the `model` for the `data`. This is where the `forward` method is called.
```python
    output = model(data)
```
The `functional` module provides implementations of loss functions. The loss function will compare the predicted `output` to the expected `target` value.
```python
    loss = F.nll_loss(output, target)
```
Next comes the backpropagation step.
```python
    loss.backward()
```
And finally, the optimizer updates the model.
```python
    optimizer.step()
```
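Putting the pieces together, one pass over the training data amounts to a loop like this sketch:

```python
# One epoch of training, assembled from the steps above.
for data, target in train_loader:
    data, target = data.to(device), target.to(device)  # move batch to the device
    optimizer.zero_grad()              # reset gradients from the previous step
    output = model(data)               # forward pass
    loss = F.nll_loss(output, target)  # compare prediction to expected value
    loss.backward()                    # backpropagation
    optimizer.step()                   # update the model's weights
```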
After multiple passes over the training data, the model can be tested with a similar loop, but without computing the gradients; a sketch of that appears after the next snippet. Once the model is accurate enough, saving it is simple.
```python
torch.save(model.state_dict(), 'mnist.pt')
```
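For completeness, here is a minimal sketch of the testing loop mentioned above. It assumes a `test_loader` built the same way as `train_loader`; `torch.no_grad()` turns off gradient tracking since nothing is being trained:

```python
# Evaluate accuracy on held-out data; assumes a test_loader exists.
model.eval()              # switch layers such as dropout to evaluation mode
correct = 0
total = 0
with torch.no_grad():     # gradients are not needed during testing
    for data, target in test_loader:
        data, target = data.to(device), target.to(device)
        output = model(data)
        pred = output.argmax(dim=1)  # class with the highest score
        correct += (pred == target).sum().item()
        total += target.size(0)
print(f"Accuracy: {correct / total:.4f}")
```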
Obviously, this is much more code than Keras requires; there, the entire training loop can be a single call to fit. But there are other advantages to using PyTorch.
Data in PyTorch is stored in a `Tensor`.
```python
x = torch.tensor([[2, 3], [5, 7]])
```
Conceptually, a `tensor` is a multidimensional array that knows a few new tricks. In the data science community, such arrays are usually handled with `numpy`. An advantage that PyTorch has over TensorFlow is the ability to move seamlessly between a `tensor` and a `numpy.ndarray`.
```python
np_array_x = x.numpy()
```
And you can also easily create a `tensor` from a `numpy.ndarray`.
```python
import numpy as np

y = np.random.randint(0, 10, size=(2, 3))
tensor_y = torch.from_numpy(y)
```
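One detail worth knowing: for CPU tensors, both conversions share the underlying memory rather than copying it, so a change on one side is visible from the other:

```python
# torch.from_numpy and Tensor.numpy share the buffer for CPU tensors,
# so mutating the numpy array is visible from the tensor.
y[0, 0] = 42
print(tensor_y[0, 0])  # tensor(42)
```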
In deep learning, a computational graph is similar to a flow chart. The nodes of the chart can represent operations, such as mathematical functions, or variables. For example, the expression (a + b) / x corresponds to a graph with nodes for the variables a, b, and x, one node for the addition, and one for the division. In PyTorch, the computational graph is created during training. This way the graph can adapt to the training data. Static computational graphs assume that all data has the same size and structure. Traditionally, TensorFlow has used static computational graphs. TensorFlow 2.0 added dynamic graph features, but older code still uses static graphs.
The computational graph is built by the automatic differentiation machinery implemented in PyTorch's `autograd` module. During the forward pass of the network, the computational graph is created. This makes the backpropagation step a simple method call. Operations on a `Tensor` can be tracked by calling its `requires_grad_` method and passing `True` to start tracking. When the `backward` method is called during training, the gradients are calculated for each tracked operation. Tracking can be turned off for an entire block of code with the `torch.no_grad` context manager to speed up execution, for example, during testing of the model.
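A minimal sketch ties these ideas together, reusing the (a + b) / x expression from above; the values are arbitrary:

```python
import torch

# Leaf tensors with gradient tracking enabled.
a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
x = torch.tensor(5.0, requires_grad=True)

# The forward pass builds the graph: one node for the addition,
# one for the division.
result = (a + b) / x

# Backpropagation walks the graph and fills in .grad for each leaf.
result.backward()
print(a.grad)  # d(result)/da = 1/x        -> tensor(0.2000)
print(x.grad)  # d(result)/dx = -(a+b)/x^2 -> tensor(-0.2000)
```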
ONNX is a standard for persisting machine learning models. PyTorch supports exporting models to the ONNX format. Many other deep learning libraries, including TensorFlow, can import ONNX models. This way, you can take advantage of the features for training models found in PyTorch, but use the models in projects leveraging other libraries. This is especially important for transfer learning.
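For example, exporting the MNIST model above could look like this sketch. The dummy input only supplies the expected input shape (a batch of one 1-channel 28x28 image), and the file name is arbitrary:

```python
# Export the trained model to ONNX. The dummy input defines the
# input shape; its values do not matter. The file name is arbitrary.
dummy_input = torch.randn(1, 1, 28, 28, device=device)
torch.onnx.export(model, dummy_input, "mnist.onnx")
```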
Interpreting the outcome of experiments is often faster when the data is visualized. TensorFlow includes a tool called TensorBoard for visualizing machine learning models. It can track metrics during training, render a graph of the model, and provide friendly representations of data, such as images. It also includes profiling helpers. PyTorch models can also be used with TensorBoard.
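The integration lives in the `torch.utils.tensorboard` module. Here is a minimal sketch of logging the training loss, assuming the `loss` from the training loop and a loop counter `epoch` (the tag name is arbitrary):

```python
from torch.utils.tensorboard import SummaryWriter

# Writes event files under ./runs by default; the tag name is arbitrary.
writer = SummaryWriter()
writer.add_scalar("Loss/train", loss.item(), epoch)
writer.close()
```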
PyTorch is a good fit for projects that don't need the complexity of TensorFlow, but need more control than Keras. This doesn't mean Keras should be avoided all the time. Keras is used by professionals in both research and industry. But Keras makes assumptions that don't apply to every situation. PyTorch lets you customize neural networks to meet the requirements of your project while still taking advantage of Python language features. Thanks for reading!