I am building a neural network and running into a problem with gradient computation. I take the scalar (dot) product of two tensors, u @ h, normalizing each of them first. It is important that no gradients are computed for h, so I call detach() on it. In addition, the normalization itself should not be taken into account during the backward pass (I do not know how to do this; a sketch of what I mean follows).
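To clarify what I mean by keeping the normalization out of the backward pass: I believe it would amount to dividing by a detached norm, so that autograd treats the norm as a constant, but I am not sure this is the right way to write it:

u = u / u.norm().detach()  # the norm is detached, so backward treats it as a constant

Here is a minimal example of what I have now: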
import torch
from torch import nn


class Nn(nn.Module):
    def __init__(self):
        super(Nn, self).__init__()
        self.ln = nn.Linear(5, 5)

    def forward(self, x):
        v = self.ln(x)
        u = v.clone()
        h = v.clone()
        u /= u.norm()    # in-place normalization of u
        h = h.detach()   # h must not receive gradients
        h /= h.norm()    # in-place normalization of the detached h
        res = torch.stack([torch.stack([u @ h, u @ h])])  # shape (1, 2)
        return res
def patches_generator():
    while True:
        decoder = torch.rand((5,))
        target = torch.randint(2, (1,))
        yield decoder, target


net = Nn()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters())
net.train()
torch.autograd.set_detect_anomaly(True)

for decoder, targets in patches_generator():
    optimizer.zero_grad()
    outputs = net(decoder)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()
As a result, I get the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [9, 512, 1, 1]], which is output 0 of ReluBackward1, is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Where does ReluBackward1 come from? There is no ReLU in the example above.
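For reference, a tiny standalone snippet with the same pattern (clone, then in-place division by the tensor's own norm) seems to trigger the same class of error, which makes me suspect the u /= u.norm() and h /= h.norm() lines:

import torch

x = torch.randn(5, requires_grad=True)
u = x.clone()
u /= u.norm()       # in-place division; norm's backward still needs the old u
u.sum().backward()  # raises the same "modified by an inplace operation" error

Replacing the in-place division with an out-of-place one (u = u / u.norm()) makes this snippet run, but I still do not understand the ReluBackward1 reference.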