PyTorch Gradient Descent

Question

I am trying to manually implement gradient descent in PyTorch as a learning exercise. I have the following to create my synthetic dataset:

import torch
torch.manual_seed(0)
N = 100
x = torch.rand(N,1)*5
# Let the following command be the true function
y = 2.3 + 5.1*x
# Get some noisy observations
y_obs = y + 2*torch.randn(N,1)

Then I create my predictive function (y_pred) as shown below.

w = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)
y_pred = w*x+b
mse = torch.mean((y_pred-y_obs)**2)

This uses the MSE as the loss for fitting the weights w and b. I use the block below to update the values according to the gradient.

gamma = 1e-2
for i in range(100):
  w = w - gamma *w.grad
  b = b - gamma *b.grad
  mse.backward()

However, the loop only works in the first iteration. From the second iteration onwards, w.grad is None. I am fairly sure this happens because I am setting w as a function of itself (I might be wrong).

The question is how do I update the weights properly with the gradient information?

Robert · Accepted Answer · 2019-11-12 18:49:20Z

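You should call the backward method before you apply the gradient descent update.
You need to recompute the loss with the new weights in every iteration.
Perform the update without gradient tracking (under torch.no_grad()). This creates new tensors every iteration, so re-enable requires_grad on them afterwards.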

The following code works on my computer and gives w=5.1 and b=2.2 after 500 training iterations.

Code:

import torch
torch.manual_seed(0)
N = 100
x = torch.rand(N,1)*5
# Let the following command be the true function
y = 2.3 + 5.1*x
# Get some noisy observations
y_obs = y + 0.2*torch.randn(N,1)

w = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)


gamma = 0.01
for i in range(500):
    print(i)
    # use new weight to calculate loss
    y_pred = w * x + b
    mse = torch.mean((y_pred - y_obs) ** 2)

    # backward
    mse.backward()
    print('w:', w)
    print('b:', b)
    print('w.grad:', w.grad)
    print('b.grad:', b.grad)

    # gradient descent, don't track
    with torch.no_grad():
        w = w - gamma * w.grad
        b = b - gamma * b.grad
    w.requires_grad = True
    b.requires_grad = True

Output:

499
w: tensor([5.1095], requires_grad=True)
b: tensor([2.2474], requires_grad=True)
w.grad: tensor([0.0179])
b.grad: tensor([-0.0576])
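
To illustrate why reassigning w in the original loop loses the gradient, here is a minimal sketch. The names w_tracked and w_new and the small dataset are made up for illustration. It assumes the usual autograd behaviour: only leaf tensors have .grad populated by backward(), and a tensor computed from w with tracking on is not a leaf.

```python
import torch

torch.manual_seed(0)
x = torch.rand(10, 1)
y_obs = 2.3 + 5.1 * x

w = torch.randn(1, requires_grad=True)
loss = torch.mean((w * x - y_obs) ** 2)
loss.backward()

# w was created directly by the user, so it is a leaf and has a gradient.
print(w.is_leaf, w.grad is not None)

# With tracking on, the result is a non-leaf node in the graph.
# backward() does not populate .grad for non-leaf tensors.
w_tracked = w - 0.01 * w.grad
print(w_tracked.is_leaf, w_tracked.requires_grad)

# Under no_grad the result is a fresh leaf, but with requires_grad=False,
# which is why the answer sets requires_grad = True again.
with torch.no_grad():
    w_new = w - 0.01 * w.grad
print(w_new.is_leaf, w_new.requires_grad)
```

The second case corresponds to the loop in the question. The third corresponds to the loop in the answer before requires_grad is switched back on.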

Interesting. So after the no_grad part we need to set requires_grad=True again.
You are right, because the default value of requires_grad is False.
Actually, we create a new tensor w and assign it the value of the original w - lr*grad.
@sachinruk You could replace w = w - gamma * w.grad with w -= gamma * w.grad. This updates w in place, so you can drop the w.requires_grad = True at every iteration.
@sachinruk But then you have to set the gradients to zero after each iteration: w.grad.zero_()
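
The in-place variant suggested in the last two comments could look roughly like the sketch below. It reuses the data setup from the answer and is an illustration, not the answerer's code. The in-place update is kept inside torch.no_grad(), because PyTorch does not allow in-place modification of a leaf tensor that requires grad while tracking is on.

```python
import torch

torch.manual_seed(0)
N = 100
x = torch.rand(N, 1) * 5
y_obs = 2.3 + 5.1 * x + 0.2 * torch.randn(N, 1)

w = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)

gamma = 0.01
for i in range(500):
    # Recompute the loss with the current weights.
    y_pred = w * x + b
    mse = torch.mean((y_pred - y_obs) ** 2)
    mse.backward()

    # In-place update: w and b stay the same leaf tensors,
    # so requires_grad does not need to be set again.
    with torch.no_grad():
        w -= gamma * w.grad
        b -= gamma * b.grad

    # backward() accumulates into .grad, so clear it before the next pass.
    w.grad.zero_()
    b.grad.zero_()

print('w:', w.item(), 'b:', b.item())
```

Without the two zero_() calls, each backward() would add to the gradients from the previous iterations, and the updates would no longer follow plain gradient descent.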
