
I am trying to trace the gradient of a variable using PyTorch. I pass the variable to a first function that searches for the minimum of some other quantity, feed that function's output into a second function, and repeat the whole process multiple times.

Here is my code:

import torch

def myFirstFunction(parameter_current_here):
    optimalValue = 100000000000000
    Optimal = 100000000000000
    for j in range(2, 10):
        i = torch.ones(1, requires_grad=True)*j
        with torch.enable_grad():
            optimalValueNow = i*parameter_current_here.sum()
        if (optimalValueNow < optimalValue):
            optimalValue = optimalValueNow
            Optimal = i
    return optimalValue, Optimal

def mySecondFunction(Current):
    with torch.enable_grad():
        y = (20*Current)/2 + (Current**2)/10
    return y

counter = 0
while counter < 5:
    parameter_current = torch.randn(2, 2, requires_grad=True)

    outputMyFirstFunction = myFirstFunction(parameter_current)
    outputmySecondFunction = mySecondFunction(outputMyFirstFunction[1])
    outputmySecondFunction.backward()

    print("outputMyFirstFunction after backward:",
               outputMyFirstFunction)
    print("outputmySecondFunction after backward:",
               outputmySecondFunction)
    print("parameter_current Gradient after backward:",
               parameter_current.grad)

    counter = counter + 1

parameter_current.grad is None for every iteration, when it obviously shouldn't be. What am I doing wrong, and how can I fix it?

Your help on this would be highly appreciated. Thanks a lot!

Aly

3 Answers


I had a similar experience with this. Reference: https://pytorch.org/docs/stable/tensors.html

  • Tensors with requires_grad set to True are leaf Tensors if they were created by the user. This means they are not the result of an operation, so their grad_fn is None.
  • Only leaf Tensors have their grad populated during a call to backward(). To get grad populated for non-leaf Tensors, you can call retain_grad(). Example:
    >>> a = torch.tensor([[1,1],[2,2]], dtype=torch.float, requires_grad=True)
    >>> a.is_leaf
    True
    >>> b = a * a
    >>> b.is_leaf
    False
    >>> c = b.mean()
    >>> c.backward()
    >>> print(c.grad)
    None
    >>> print(b.grad)
    None
    >>> print(a.grad)
    tensor([[0.5000, 0.5000],
            [1.0000, 1.0000]])
    >>> b = a * a
    >>> c = b.mean()
    >>> b.retain_grad()
    >>> c.retain_grad()
    >>> c.backward()
    >>> print(a.grad)
    tensor([[1., 1.],
            [2., 2.]])
    >>> print(b.grad)
    tensor([[0.2500, 0.2500],
            [0.2500, 0.2500]])
    >>> print(c.grad)
    tensor(1.)


I'm guessing the problem is the with torch.enable_grad(): statements. After you exit the with statement, torch.enable_grad() no longer applies, and torch will clear the grads after the functions run.

1 Comment

No, this is not the behaviour of this statement. torch.enable_grad() has no effect in the given context above. Please check the docs: "Enables gradient calculation inside a no_grad context. This has no effect outside of no_grad." pytorch.org/docs/stable/…
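The docs' point can be checked directly: torch.enable_grad() only matters inside a no_grad (or other grad-disabling) context, and gradient tracking is already on by default. A minimal sketch:

```python
import torch

x = torch.ones(1, requires_grad=True)

y = x * 2
print(y.requires_grad)        # True: grad mode is on by default, no context manager needed

with torch.no_grad():
    a = x * 2                 # tracking disabled here
    with torch.enable_grad():
        b = x * 2             # re-enabled only inside this inner block

print(a.requires_grad)        # False
print(b.requires_grad)        # True
```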

Since it is not entirely clear to me what you actually want to achieve, beyond computing gradients for parameter_current, I will just focus on describing why it doesn't work and what you can do to actually compute gradients.

I've added some comments in the code to make it more clear what the problem is.

But in short, the problem is that parameter_current is not part of the computation of your loss, i.e. the tensor you call backward() on, which is outputmySecondFunction.

So currently you are only computing gradients for i, since you set requires_grad=True for it.
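You can verify which tensors are connected to the graph before calling backward(). Here is a minimal sketch of the disconnection (the variable names p and i are illustrative stand-ins for parameter_current and i):

```python
import torch

p = torch.randn(2, 2, requires_grad=True)     # plays the role of parameter_current
i = torch.ones(1, requires_grad=True) * 9     # plays the role of i

connected = i * p.sum()      # this graph contains both i and p
detached = (20 * i) / 2      # this graph depends only on i

detached.backward()
print(p.grad)                # None: p never entered the graph we backpropagated through
print(connected.grad_fn)     # a backward node: p would get a gradient via this tensor
```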

Please check the comments for details:

import torch

def myFirstFunction(parameter_current_here):
    # I removed some stuff to reduce it to the core features
    # removed torch.enable_grad(), since it is enabled by default
    # removed Optimal=100000000000000 and Optimal=i, they are not used
    optimalValue=100000000000000
    for j in range(2,10):
        # Are you sure you want to compute gradients for this tensor i?
        # Because this is actually what requires_grad=True does.
        # Just as a side note, this isn't your problem, but affects performance of the model.
        i= torch.ones(1,requires_grad=True)*j
        optimalValueNow=i*parameter_current_here.sum()
        if (optimalValueNow<optimalValue):
            optimalValue=optimalValueNow

    # Part Problem 1:
    # optimalValueNow is multiplied with your parameter_current
    # i is just your parameter i, nothing else
        # let's jump now to the output below in the loop: outputMyFirstFunction
    return optimalValueNow,i

def mySecondFunction(Current):
    y=(20*Current)/2 + (Current**2)/10
    return y

counter=0
while counter<5:
    parameter_current = torch.randn(2, 2,requires_grad=True)

    # Part Problem 2:
    # this is a tuple (optimalValueNow,i) like described above
    outputMyFirstFunction=myFirstFunction(parameter_current)
    # now you are taking i as an input
    # and i is just torch.ones(1,requires_grad=True)*j
    # it has no connection to parameter_current
    # thus nothing is optimized
    outputmySecondFunction=mySecondFunction(outputMyFirstFunction[1])

    # calculating gradients, since parameter_current is not part of the computation 
    # no gradients will be computed, you only get gradients for i
    # Btw. if you had not set requires_grad=True for i, you would actually get an error
    # message for calling backward on this
    outputmySecondFunction.backward()

    print("outputMyFirstFunction after backward:",outputMyFirstFunction)
    print("outputmySecondFunction after backward:",outputmySecondFunction)
    print("parameter_current Gradient after backward:",parameter_current.grad)

    counter=counter+1

So if you want to compute gradients for parameter_current, you simply have to make sure it is part of the computation of the tensor you call backward() on. You can do so, for example, by changing:

outputmySecondFunction=mySecondFunction(outputMyFirstFunction[1])

to:

outputmySecondFunction=mySecondFunction(outputMyFirstFunction[0])

As soon as you change this, you will get gradients for parameter_current!

I hope it helps!



Full working code:

import torch

def myFirstFunction(parameter_current_here):
    optimalValue=100000000000000
    for j in range(2,10):
        i= torch.ones(1,requires_grad=True)*j
        optimalValueNow=i*parameter_current_here.sum()
        if (optimalValueNow<optimalValue):
            optimalValue=optimalValueNow

    return optimalValueNow,i

def mySecondFunction(Current):
    y=(20*Current)/2 + (Current**2)/10
    return y

counter=0
while counter<5:
    parameter_current = torch.randn(2, 2,requires_grad=True)
    outputMyFirstFunction=myFirstFunction(parameter_current)
    outputmySecondFunction=mySecondFunction(outputMyFirstFunction[0]) # changed line
    outputmySecondFunction.backward()

    print("outputMyFirstFunction after backward:",outputMyFirstFunction)
    print("outputmySecondFunction after backward:",outputmySecondFunction)
    print("parameter_current Gradient after backward:",parameter_current.grad)

    counter=counter+1

Output:

outputMyFirstFunction after backward: (tensor([ 1.0394]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([ 10.5021])
parameter_current Gradient after backward: tensor([[ 91.8709,  91.8709],
        [ 91.8709,  91.8709]])
outputMyFirstFunction after backward: (tensor([ 13.1481]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([ 148.7688])
parameter_current Gradient after backward: tensor([[ 113.6667,  113.6667],
        [ 113.6667,  113.6667]])
outputMyFirstFunction after backward: (tensor([ 5.7205]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([ 60.4772])
parameter_current Gradient after backward: tensor([[ 100.2969,  100.2969],
        [ 100.2969,  100.2969]])
outputMyFirstFunction after backward: (tensor([-13.9846]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([-120.2888])
parameter_current Gradient after backward: tensor([[ 64.8278,  64.8278],
        [ 64.8278,  64.8278]])
outputMyFirstFunction after backward: (tensor([-10.5533]), tensor([ 9.]))
outputmySecondFunction after backward: tensor([-94.3959])
parameter_current Gradient after backward: tensor([[ 71.0040,  71.0040],
        [ 71.0040,  71.0040]])
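As a sanity check, these gradients match the chain rule. myFirstFunction effectively returns optimalValueNow = 9 * parameter_current.sum() (i ends the loop at 9), and mySecondFunction gives dy/dCurrent = 10 + Current/5, so every entry of parameter_current.grad should equal (10 + Current/5) * 9. A quick verification sketch:

```python
import torch

p = torch.randn(2, 2, requires_grad=True)
current = 9 * p.sum()                          # what myFirstFunction effectively returns
y = (20 * current) / 2 + (current ** 2) / 10   # mySecondFunction
y.backward()

# chain rule: dy/dcurrent = 10 + current/5, and dcurrent/dp = 9 for each entry of p
expected = (10 + current.item() / 5) * 9
print(p.grad)                                  # all four entries equal `expected`
```

For the first run above, Current = 1.0394, and (10 + 1.0394/5) * 9 = 91.87, matching the printed gradient.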

7 Comments

Thank you very much for this. The issue, though, is that you changed the 1 in this line to 0: outputmySecondFunction=mySecondFunction(outputMyFirstFunction[1]), while by my logic it should be 1 (the input to the second function is the 2nd output of the first function, not the 1st one). Also, i is implicitly tied to parameter_current, since optimalValueNow is calculated explicitly from parameter_current and i is then updated based on optimalValueNow via an if condition (the part you omitted from my first function). Could you update your code/explanation accordingly, please?
Oh, and actually, I had a mistake in the return of the first function that I have now corrected. The correct output of the function should be optimalValue, Optimal, not what I had earlier. So, as you can see, we are updating i, and Optimal (which is what we need to pass to the second function) is the updated i that implicitly depends on parameter_current, as I explained in the comment above. Again, your further input would be highly appreciated!
@Aly why do you set requires_grad=True for i? This is a local variable, a new i is created every time you call the function. So in this given setup you cannot optimize it.
Your updated program has the same problem: Optimal = i = torch.ones(1,requires_grad=True)*j; you are doing some further calculation on it and finally calling the loss on the result. There is NO connection to parameter_current in this calculation.
"i is implicitly tied to parameter_current" this is not the case because you call your backward on the branch of the graph related to optimal. In the way do are doing it now there it is no bug. parameter_current is just not involved in the part of the graph you call backward on. Changing parameter_current has just no effect on the result of outputmySecondFunction please check my comments on the code again. You have to go through it. You have two options, either you change it from 1 to 0 like I suggested, or alternatively you change your computation so that: -> next comment
