
I am building a neural network and I'm running into a problem when computing gradients. I take the dot product of two tensors, u @ h, and normalize both of them. It is important that no gradients are computed for h, so I call detach() on it. In addition, the normalization should not be taken into account when the gradients are computed (I do not know how to do this).

import torch
from torch import nn


class Nn(nn.Module):
    def __init__(self):
        super(Nn, self).__init__()
        self.ln = nn.Linear(5, 5)

    def forward(self, x):
        v = self.ln(x)

        u = v.clone()
        h = v.clone()

        u /= u.norm()
        h = h.detach()
        h /= h.norm()

        res = torch.stack([torch.stack([u @ h, u @ h])])

        return res


def patches_generator():
    while True:
        decoder = torch.rand((5, ))
        target = torch.randint(2, (1,))
        yield decoder, target


net = Nn()

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters())

net.train()
torch.autograd.set_detect_anomaly(True)
for decoder, targets in patches_generator():
    optimizer.zero_grad()
    outputs = net(decoder)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()

As a result, I get the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [9, 512, 1, 1]], which is output 0 of ReluBackward1, is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

  • Is this the entire code? I don't see the Relu layer the error references. Commented Aug 19, 2019 at 8:39
  • The original code has more to it; I reduced it to this example. Commented Aug 19, 2019 at 9:08
  • where does ReluBackward1 come from? Commented Aug 19, 2019 at 9:15
  • nvm, it's not relevant, you just probably copied the error message from your original code Commented Aug 19, 2019 at 9:18

1 Answer


The problem is the in-place division operator applied to u in this line:

u /= u.norm()

Changing it to

u = u / u.norm()

makes the code run. The reason is that the in-place operator overwrites the intermediate result from this line

u = v.clone()

which PyTorch still needs in order to compute the gradient: the backward pass of u.norm() uses the original value of u, and the in-place division invalidates it.
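For reference, here is a sketch of the forward method from the question with only that one line changed (everything else is as posted):

    def forward(self, x):
        v = self.ln(x)

        u = v.clone()
        h = v.clone()

        # Out-of-place division: builds a new tensor instead of
        # overwriting the clone that autograd saved for norm()'s backward
        u = u / u.norm()

        # h is detached, so no gradient flows through it and the
        # in-place division on it does not disturb the graph
        h = h.detach()
        h /= h.norm()

        res = torch.stack([torch.stack([u @ h, u @ h])])

        return res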

(The error message in the question contains a reference to a ReluBackward1 node which is not in the reduced code example. PyTorch ReLU layers have an optional inplace argument which makes the operation in place while still supporting backprop. This often works, because in a sequential network there is no need to distinguish between the output of the ReLU activation and the output of the weights when computing the gradient, but in more complex architectures it might be necessary to retain the tensor that the in-place ReLU would overwrite.)
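As an illustration (not taken from the original post), here is a minimal sketch of how an in-place ReLU can trigger the same version-counter error. Sigmoid is used as the preceding operation because its backward pass needs its own output, which the in-place ReLU then overwrites:

    import torch
    from torch import nn

    x = torch.randn(5, requires_grad=True)
    y = torch.sigmoid(x)           # SigmoidBackward saves its output y for the backward pass
    z = nn.ReLU(inplace=True)(y)   # overwrites y in place and bumps its version counter
    z.sum().backward()             # RuntimeError: one of the variables needed for gradient
                                   # computation has been modified by an inplace operation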
