PyTorch

PyTorch is a scalable, multiplatform programming interface for implementing and running machine learning algorithms. It is primarily developed at Meta AI (formerly the Facebook AI Research lab). Many machine learning researchers and practitioners from academia and industry have adopted PyTorch to build deep learning solutions such as Tesla Autopilot, Uber’s Pyro, and Hugging Face’s Transformers.

Tensors

Tensors are the fundamental data structures used in machine learning. In the context of data science, tensors are multi-dimensional arrays of numbers that represent complex data.

Figure: dimensions of tensors in PyTorch (image from learnopencv).

1D Tensor

import torch
import numpy as np

# Create a 1D tensor (vector)
tensor_1d = torch.tensor([1, 2, 3, 4, 5])
print(tensor_1d)

tensor([1, 2, 3, 4, 5])

2D Tensor

# Create a 2D tensor (matrix)
tensor_2d = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(tensor_2d)

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

3D Tensor

# Create a 3D tensor
tensor_3d = torch.tensor([[[1, 2, 3], [4, 5, 6]],
                          [[7, 8, 9], [10, 11, 12]]])
print(tensor_3d)

tensor([[[ 1,  2,  3],
         [ 4,  5,  6]],

        [[ 7,  8,  9],
         [10, 11, 12]]])
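
Every tensor exposes metadata such as its shape, number of dimensions, and element type; checking these is often the fastest way to diagnose shape mismatches. A quick look using the tensor created above:


print(tensor_3d.shape)  # torch.Size([2, 2, 3])
print(tensor_3d.ndim)   # 3
print(tensor_3d.dtype)  # torch.int64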

Linear Regression Model

At its core, linear regression seeks to establish a linear relationship between a dependent variable (often denoted \(y\)) and one or more independent variables (denoted \(x\)). The simplest form, simple linear regression, involves a single independent variable and is represented by the equation:

\[y = \beta_0 + \beta_1 x + \epsilon\]

where:

  • \(y\) is the dependent variable.
  • \(x\) is the independent variable.
  • \(\beta_0\) is the y-intercept of the regression line.
  • \(\beta_1\) is the slope of the regression line.
  • \(\epsilon\) is the error term, accounting for the variability in \(y\) that \(x\) cannot explain.
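
As a quick numeric sketch (the coefficients below are made up purely for illustration), if \(\beta_0 = 1\) and \(\beta_1 = 2\), the model predicts \(\hat{y} = 1 + 2x\), ignoring the error term:


# Hypothetical coefficients, chosen only for illustration
beta_0, beta_1 = 1.0, 2.0
x = 3.0
y_hat = beta_0 + beta_1 * x
print(y_hat)

7.0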

In multiple linear regression, the model extends to include multiple independent variables, and the equation becomes:

\[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon\]

where:

  • \(y\) is the dependent variable.
  • \(x_1, x_2, \ldots, x_n\) are the independent variables.
  • \(\beta_0\) is the y-intercept of the regression plane.
  • \(\beta_1, \beta_2, \ldots, \beta_n\) are the coefficients representing the impact of each independent variable.
  • \(\epsilon\) is the error term.

The goal of linear regression is to determine the optimal values of \(\beta_0, \beta_1, \ldots, \beta_n\) that minimize the sum of squared differences between the observed values and the values predicted by the model, known as the residual sum of squares (RSS). This optimization is typically performed using methods such as Ordinary Least Squares (OLS).
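
For reference, OLS has a closed-form solution via the normal equations, \(\hat{\beta} = (X^\top X)^{-1} X^\top y\), where the design matrix \(X\) includes a column of ones for the intercept. A minimal NumPy sketch on made-up data:


import numpy as np

# Toy data, made up for illustration
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution (numerically safer than inverting X^T X explicitly)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # [beta_0, beta_1]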

Linear regression provides a straightforward yet powerful approach to predictive modeling, making it a cornerstone technique in the field of data science.


import torch
import numpy as np
import matplotlib.pyplot as plt

X_train = np.arange(10, dtype="float32").reshape((10, 1))
y_train = np.array([1.0, 1.5, 3.5, 8.0, 5.1,
                    6.3, 6.6, 7.4, 8.7, 9.0],
                   dtype="float32")

plt.plot(X_train, y_train, '*', markersize=8)
plt.xlabel("x")
plt.ylabel("y")
plt.show()

Figure: scatter plot of the training data.

Next, we standardize the features (mean-centering and dividing by the standard deviation):


from torch.utils.data import TensorDataset
from torch.utils.data import DataLoader

# Standardize features, wrap tensors in a TensorDataset, and batch with a DataLoader
X_train_norm = (X_train - np.mean(X_train)) / np.std(X_train)
X_train_norm = torch.from_numpy(X_train_norm)
y_train = torch.from_numpy(y_train)
train_ds = TensorDataset(X_train_norm, y_train)
batch_size = 1
train_dl = DataLoader(train_ds, batch_size, shuffle=True)
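
With batch_size = 1, the DataLoader yields one (feature, target) pair per iteration. A quick sanity check (not part of the original pipeline) to confirm what a batch carries:


# Peek at a single batch to verify shapes and dtypes
x_batch, y_batch = next(iter(train_dl))
print(x_batch.shape, y_batch.shape)  # torch.Size([1, 1]) torch.Size([1])
print(x_batch.dtype, y_batch.dtype)  # torch.float32 torch.float32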

Now we can define our model for linear regression as \(z = wx + b\). Here we will use the torch.nn module, as it provides predefined layers for building complex NN models, and stochastic gradient descent as the optimizer from the torch.optim module.


torch.manual_seed(1)
weight = torch.randn(1)
weight.requires_grad_()
bias = torch.zeros(1, requires_grad=True)

Here we initialized the weight and bias tensors manually, to illustrate what trainable parameters look like; the nn.Linear layer used below manages its own parameters in the same way. After defining the model, we need to define the loss function we want to minimize in order to find the optimal weights. We will choose mean squared error (MSE) as our loss function.
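
For intuition, MSE is simply the mean of squared differences, \(\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2\). A minimal sketch (with made-up values) showing that nn.MSELoss matches the hand-computed mean of squared errors:


import torch
import torch.nn as nn

# Made-up predictions and targets, for illustration only
pred = torch.tensor([2.5, 0.0, 2.0])
target = torch.tensor([3.0, -0.5, 2.0])

mse = nn.MSELoss(reduction='mean')
print(mse(pred, target))              # tensor(0.1667)
print(((pred - target) ** 2).mean())  # same value, computed by hand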


import torch.nn as nn

loss_fn = nn.MSELoss(reduction='mean')
input_size, output_size = 1, 1
learning_rate = 0.01
num_epochs = 200

model = nn.Linear(input_size, output_size)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

Note that we used the torch.nn.Linear class for the linear layer, which performs the \(z = wx + b\) computation. Now we can train the model by iterating over the batched dataset we created above: for each batch we compute predictions and the loss, backpropagate, and call the optimizer's step() method to update the parameters.


log_epochs = 10  # log progress every 10 epochs
for epoch in range(num_epochs):
    for x_batch, y_batch in train_dl:
        # 1. Generate predictions
        pred = model(x_batch)[:, 0]

        # 2. Calculate loss
        loss = loss_fn(pred, y_batch)

        # 3. Compute gradients
        loss.backward()

        # 4. Update parameters using gradients
        optimizer.step()

        # 5. Reset the gradients to zero
        optimizer.zero_grad()

    if epoch % log_epochs == 0:
        print(f'Epoch {epoch} Loss {loss.item():.4f}')

After the model is trained, we can inspect the learned weight and bias parameters:


print("Final Paremeters:", model.weight.item(), model.bias.item())

Final Parameters: 2.419365882873535 5.718400955200195

For the test data, we create a NumPy array of 100 values evenly spaced between 0 and 9 and apply the same standardization as for the training dataset.


# Test data
X_test = np.linspace(0, 9, num=100, dtype='float32').reshape(-1, 1)
X_test_norm = (X_test - np.mean(X_train)) / np.std(X_train)
X_test_norm = torch.from_numpy(X_test_norm)

# Prediction
y_pred = model(X_test_norm).detach().numpy()

# Plot
fig = plt.figure(figsize=(13, 5))
ax = fig.add_subplot(1, 2, 1)
plt.plot(X_train_norm, y_train, 'o', markersize=8)
plt.plot(X_test_norm, y_pred, '--', lw=3)
plt.legend(['Training examples', 'Linear reg.'])
ax.set_xlabel('x', size=15)
ax.set_ylabel('y', size=15)
ax.tick_params(axis='both', which='major')
plt.show()

Figure: the linear regression model fits the training data.