
I am building a neural network and I'm running into a problem when computing gradients. I take the dot product of two tensors, u @ h, and normalize both of them. It is important that no gradients are computed for h, so I call detach() on it. In addition, the normalization should not be taken into account when the gradients are computed (I do not know how to do this).

import torch
from torch import nn


class Nn(nn.Module):
    def __init__(self):
        super(Nn, self).__init__()
        self.ln = nn.Linear(5, 5)

    def forward(self, x):
        v = self.ln(x)

        u = v.clone()
        h = v.clone()

        u /= u.norm()
        h = h.detach()
        h /= h.norm()

        res = torch.stack([torch.stack([u @ h, u @ h])])

        return res


def patches_generator():
    while True:
        decoder = torch.rand((5, ))
        target = torch.randint(2, (1,))
        yield decoder, target


net = Nn()

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters())

net.train()
torch.autograd.set_detect_anomaly(True)
for decoder, targets in patches_generator():
    optimizer.zero_grad()
    outputs = net(decoder)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()

As a result, I get the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [9, 512, 1, 1]], which is output 0 of ReluBackward1, is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

  • Is this the entire code? I don't see the Relu layer the error references. Commented Aug 19, 2019 at 8:39
  • The original code has more to it; I reduced it to this example. Commented Aug 19, 2019 at 9:08
  • where does ReluBackward1 come from? Commented Aug 19, 2019 at 9:15
  • nvm, it's not relevant, you just probably copied the error message from your original code Commented Aug 19, 2019 at 9:18

1 Answer


The problem is the in-place division operator applied to u in this line:

u /= u.norm()

Changing it to

u = u / u.norm()

makes the code run. The reason is that the in-place operator overwrites the intermediate result from this line

u = v.clone()

which PyTorch still needs in order to compute the gradient: the backward pass of u.norm() uses the original value of u, and the in-place division invalidates it.
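For reference, here is a sketch of the forward method from the question with only that one line changed (everything else is as posted):

    def forward(self, x):
        v = self.ln(x)

        u = v.clone()
        h = v.clone()

        # Out-of-place division: builds a new tensor instead of
        # overwriting the clone that autograd saved for norm()'s backward
        u = u / u.norm()

        # h is detached, so no gradient flows through it and the
        # in-place division on it does not disturb the graph
        h = h.detach()
        h /= h.norm()

        res = torch.stack([torch.stack([u @ h, u @ h])])

        return res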

(The error message in the question contains a reference to a ReluBackward1 node which is not in the reduced code example. PyTorch ReLU layers have an optional inplace argument which makes the operation in place while still supporting backprop. This often works, because in a sequential network there is no need to distinguish between the output of the ReLU activation and the output of the weights when computing the gradient, but in more complex architectures it might be necessary to retain the tensor that the in-place ReLU would overwrite.)
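As an illustration (not taken from the original post), here is a minimal sketch of how an in-place ReLU can trigger the same version-counter error. Sigmoid is used as the preceding operation because its backward pass needs its own output, which the in-place ReLU then overwrites:

    import torch
    from torch import nn

    x = torch.randn(5, requires_grad=True)
    y = torch.sigmoid(x)           # SigmoidBackward saves its output y for the backward pass
    z = nn.ReLU(inplace=True)(y)   # overwrites y in place and bumps its version counter
    z.sum().backward()             # RuntimeError: one of the variables needed for gradient
                                   # computation has been modified by an inplace operation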
