Creating tensors

A tensor is a multidimensional array that we can create and upload to the GPU. There are various types of tensors, in particular FloatTensor (32-bit float), ByteTensor (8-bit integer) and LongTensor (64-bit integer). You can also create NumPy arrays and convert them into tensors.

import torch
import numpy as np

# Create a float tensor
a = torch.FloatTensor([2,3])
print(a)

# Fill a with zeros in place
a.zero_()

# Create an array in NumPy and put it into a tensor
n = np.zeros((3, 2))
b = torch.tensor(n)

Tensors on the GPU

It is fairly straightforward: just create the tensors and send them to the GPU device. Use the .to(device) method to make a copy on the target device. If you are sure you are going to send them to CUDA, you can use .cuda() instead.
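
A minimal sketch of the idea, assuming a CUDA-capable device may or may not be present (the code falls back to the CPU otherwise):

import torch

# Pick the GPU if CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor on the CPU and copy it to the chosen device
a = torch.FloatTensor([2, 3])
a_gpu = a.to(device)
print(a_gpu.device)

# Shortcut when you are sure CUDA is available
# a_gpu = a.cuda()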

Adding gradients

Each created tensor has several attributes related to gradients:

  • grad: Holds a tensor of the same shape containing the computed gradients.
  • is_leaf: True if the tensor was constructed by the user, False if it is the result of an operation.
  • requires_grad: Indicates whether gradients should be computed for this tensor. The constructor sets it to False by default.

To make this clear, let’s look at the following example of a simple computation graph. To calculate all the gradients in the graph, you call the .backward() method on the result tensor.

import torch

# Define the tensors
v1 = torch.tensor([1.0, 1.0], requires_grad=True)
v2 = torch.tensor([2.0, 2.0])

# Create the graph
v_sum = v1 + v2
v_res = (v_sum * 2).sum()

# Should be True: requires_grad propagates from v1 through the operations
print(v_res.requires_grad)
# Compute the gradients
v_res.backward()
# Prints tensor([2., 2.]): d(v_res)/d(v1) is 2 for each element
print(v1.grad)

Creating NNs in PyTorch

The torch.nn package provides a ton of predefined classes covering the basic functionality of neural networks. In the code below we use nn.Linear(2, 5) to construct a layer with 2 inputs and 5 outputs, with all the weights properly initialized. Some other useful methods, a few of which are demonstrated after the code block below, include:

  • .parameters(): Returns an iterator over the module's learnable parameters (weights and biases).
  • .zero_grad(): Resets the gradients of all parameters to zero.
  • .to(device): Moves the network's parameters and buffers to the given device (for example, CUDA).
  • .state_dict(): Retrieves the state dictionary of the model.
  • .load_state_dict(): Loads a previously saved state dictionary; together with .state_dict(), it is useful for saving and restoring different neural network states.

import torch.nn as nn
import torch

# Create a sample tensor
v = torch.FloatTensor([1, 2])

# Create a NN with one layer of 2 inputs and 5 outputs
layer = nn.Linear(2, 5)

# Pass the tensor to the layer and get the output
print(layer(v))
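
A short sketch of the other methods listed above, applied to the same layer; for nn.Linear, .parameters() yields the weight matrix and the bias vector, and the state dictionary exposes the keys 'weight' and 'bias'.

# Inspect the layer's learnable parameters (a 5x2 weight matrix and a 5-element bias)
for p in layer.parameters():
    print(p.shape)

# Reset any accumulated gradients to zero
layer.zero_grad()

# The state dictionary maps parameter names to tensors ('weight' and 'bias' here)
print(layer.state_dict().keys())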

The Sequential class

It allows combining several layers into a single callable module. Here we create a three-layer neural network with ReLU activations, dropout, and a softmax output.

s = nn.Sequential(
    nn.Linear(2, 5),
    nn.ReLU(),
    nn.Linear(5, 20),
    nn.ReLU(),
    nn.Linear(20, 10),
    nn.ReLU(),
    nn.Dropout(p=0.3),
    nn.Softmax(dim=1)
)
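
A quick usage example: the softmax's dim=1 assumes the input is a batch, so we pass a batch containing one two-feature sample.

# Pass a batch containing one 2-feature sample through the network
out = s(torch.FloatTensor([[1, 2]]))
# The result is a 10-element probability vector that sums to 1
print(out)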

Loss Functions

Loss functions define our training objective. They evaluate how well or poorly our model is performing, or in simple terms, how close the network’s prediction is to the desired result. We have a variety of loss functions to choose from, all included in the nn module (a short usage sketch follows the list):

  • nn.MSELoss: Mean squared error.
  • nn.BCELoss: Binary cross-entropy, used in classification.
  • nn.CrossEntropyLoss: The widely used maximum-likelihood criterion for multi-class classification (it combines LogSoftmax and NLLLoss).
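
A minimal usage sketch: instantiate the loss and call it with a prediction and a target (the values below are made up for illustration).

import torch
import torch.nn as nn

# Predicted probabilities and binary targets for two samples
prediction = torch.tensor([0.9, 0.2])
target = torch.tensor([1.0, 0.0])

# Binary cross-entropy between the prediction and the target
bce = nn.BCELoss()
print(bce(prediction, target))

# Mean squared error on the same tensors
mse = nn.MSELoss()
print(mse(prediction, target))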

Optimizers

Optimizers adjust the model parameters using the gradients of the loss with respect to those parameters, in order to minimize it. They are all part of the torch.optim package (a short construction sketch follows the list). The main optimizers include:

  • SGD: Stochastic gradient descent.
  • RMSprop
  • Adagrad: An adaptive optimizer.
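
A brief construction sketch: each optimizer receives the parameters it should update and a learning rate (the lr values are arbitrary choices for illustration).

import torch.nn as nn
import torch.optim as optim

net = nn.Linear(2, 5)

# Each optimizer is built from the model parameters and a learning rate
sgd = optim.SGD(net.parameters(), lr=0.01)
rmsprop = optim.RMSprop(net.parameters(), lr=0.001)
adagrad = optim.Adagrad(net.parameters(), lr=0.01)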

A sample training loop

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

# Define the dataset
def generate_data(num_samples=100, num_features=2):
    # Randomly generate input data and labels
    X = np.random.rand(num_samples, num_features)
    y = (np.sum(X, axis=1) > 1).astype(np.float32)  # Binary classification: sum > 1
    return torch.tensor(X, dtype=torch.float32), torch.tensor(y, dtype=torch.float32)

# Define the neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.pipe = nn.Sequential(
            nn.Linear(2, 5),
            nn.ReLU(),
            nn.Linear(5, 1),
            nn.Sigmoid()  # Output layer for binary classification
        )

    def forward(self, x):
        return self.pipe(x)

# Initialize data, model, loss function, and optimizer
X, y = generate_data()
# Create the NN from the class from above
model = SimpleNN()
# Binary Cross-Entropy Loss
loss_function = nn.BCELoss()  

# Create the optimizer 
optimizer = optim.SGD(model.parameters(), lr=0.01)  

# Training loop
num_epochs = 20
batch_size = 10
num_batches = len(X) // batch_size

for epoch in range(num_epochs):
    epoch_loss = 0.0
    for i in range(num_batches):
        # Get batch data
        batch_start = i * batch_size
        batch_end = batch_start + batch_size
        batch_X = X[batch_start:batch_end]
        batch_y = y[batch_start:batch_end]

        # Forward pass
        outputs = model(batch_X)
        loss = loss_function(outputs.squeeze(), batch_y)

        # Backward pass
        optimizer.zero_grad()  # Reset gradients
        loss.backward()  # Compute gradients
        optimizer.step()  # Update parameters

        # Accumulate batch loss
        epoch_loss += loss.item()

    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {epoch_loss/num_batches:.4f}")

# Save the model
torch.save(model.state_dict(), "simple_nn_model.pth")

# Load the model (optional)
model.load_state_dict(torch.load("simple_nn_model.pth"))
model.eval()  # Switch to evaluation mode
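
As a quick check after loading, you can run the model on fresh inputs in evaluation mode; this sketch reuses the generate_data() helper defined above.

# Predict labels for a few new samples without tracking gradients
X_new, y_new = generate_data(num_samples=5)
with torch.no_grad():
    probs = model(X_new).squeeze()
    predictions = (probs > 0.5).float()
print(predictions)
print(y_new)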