
Can someone please explain to me the following behavior?

import torch
import numpy as np

z = torch.tensor(np.array([1., 1.]), requires_grad=True).float()

def pre_main(z):
  return z * 3.0

x = pre_main(z)
x.backward(torch.tensor([1., 1.]))
print(z.grad)

prints:

None

Meanwhile:

import torch
import numpy as np

z = torch.tensor([1., 1.], requires_grad=True).float()

def pre_main(z):
  return z * 3.0

x = pre_main(z)
x.backward(torch.tensor([1., 1.]))
print(z.grad)

prints:

tensor([3., 3.])

Why are my gradients being destroyed when constructing from a numpy array? How do I fix this?

2 Answers


Your gradient is not destroyed: z.grad returns None because the gradient was never saved to the .grad attribute in the first place. Non-leaf tensors don't have their gradients stored during backpropagation, hence the warning message you received when running your first snippet:

UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward().

This is the case for your z tensor when it is defined as:

>>> z = torch.tensor(np.array([1., 1.]), requires_grad=True).float()
>>> z.is_leaf
False

Compared to:

>>> z = torch.tensor([1., 1.], requires_grad=True).float()
>>> z.is_leaf
True

which means the latter will have its gradient value in z.grad.

But notice that:

>>> z = torch.tensor(np.array([1., 1.]), requires_grad=True)
>>> z.is_leaf
True

To further explain this: when a tensor is first created it is a leaf node (.is_leaf returns True). As soon as you apply an operation to it, the result has parents in the computational graph and is therefore not a leaf. Note that .float() is not an in-place operator: it is shorthand for .to(torch.float32), which builds a new (non-leaf) tensor when a cast is actually needed, but simply returns the original tensor when the dtype already matches. Your numpy array is float64, so .float() performs a real cast and z ends up as a non-leaf copy; the plain torch.tensor([1., 1.]) is already float32, so .float() is a no-op and z stays a leaf.
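You can check this directly; a minimal sketch (assuming the default dtype of torch.float32, so the cast only happens in the numpy case):

import numpy as np
import torch

a = torch.tensor(np.array([1., 1.]), requires_grad=True)  # float64 leaf
print(a.float().is_leaf)  # False: casting to float32 creates a new node in the graph

b = torch.tensor([1., 1.], requires_grad=True)  # float32 leaf (default dtype)
print(b.float() is b)  # True: dtype already matches, so the same leaf is returned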

So really, there's nothing to fix. What you can do, though, is make sure the gradient is saved to z.grad when the backward pass is called. Your second question then comes down to: how do you store/access the gradient on a non-leaf node?


If you would like the gradient to be stored on the .backward() call, you can use retain_grad(), as the full warning message suggests:

z = torch.tensor(np.array([1., 1.]), requires_grad=True).float()
z.retain_grad()
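
Running the rest of the question's snippet on top of that should then populate z.grad:

x = pre_main(z)
x.backward(torch.tensor([1., 1.]))
print(z.grad)  # tensor([3., 3.])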

Or, since you expected z to be a leaf node, fix it by using torch.FloatTensor to convert the numpy array to a torch.Tensor, and only then set requires_grad:

z = torch.FloatTensor(np.array([1., 1.]))
z.requires_grad=True

Alternatively, you could stick with torch.tensor and supply a dtype:

z = torch.tensor(np.array([1., 1.]), dtype=torch.float64, requires_grad=True)
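
If you go this route, z and x are float64, so it is safest to pass a gradient of the same dtype to backward(); a hedged sketch of the full flow (same pre_main as in the question):

z = torch.tensor(np.array([1., 1.]), dtype=torch.float64, requires_grad=True)
x = pre_main(z)
x.backward(torch.tensor([1., 1.], dtype=torch.float64))  # gradient dtype matches x
print(z.grad)  # tensor([3., 3.], dtype=torch.float64)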



You can first convert the numpy array to a tensor, and then specify that you need the gradient:

z = torch.from_numpy(np.array([1., 1.])).float()
z = torch.tensor(z, requires_grad=True)  # wrap the converted tensor as a new leaf

x = pre_main(z)
x.backward(torch.tensor([1., 1.]))
print(z.grad)  # tensor([3., 3.])
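
Note that re-wrapping an existing tensor with torch.tensor() emits a copy-construct UserWarning on recent PyTorch versions. A slightly more idiomatic sketch of the same idea (again reusing the question's pre_main): since .float() already copies the float64 data into a fresh float32 tensor, you can simply flip requires_grad on it in place:

z = torch.from_numpy(np.array([1., 1.])).float()  # .float() copies float64 -> float32
z.requires_grad_(True)                            # make the fresh leaf track gradients

x = pre_main(z)
x.backward(torch.tensor([1., 1.]))
print(z.grad)  # tensor([3., 3.])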

A little more digging shows that if you remove the .float() conversion after creating your initial tensor, the gradient calculation works. You can accomplish this by specifying the dtype in the numpy array itself, if feasible:

z = torch.tensor(np.array([1., 1.], dtype=np.float32), requires_grad=True)
x = pre_main(z)
x.backward(torch.tensor([1., 1.]))
print(z.grad)

