
I'm relatively new to PyTorch and am trying to reproduce an algorithm from an academic paper that approximates a term using the Hessian matrix. I've set up a toy problem so that I can compare the results of the full Hessian with the approximation. I found this gist and have been playing with it to compute the full Hessian part of the algorithm.

I am getting the error: "RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation."

I've scoured through the simple example code, documentation, and many, many forum posts about this issue and cannot find any in-place operations. Any help would be greatly appreciated!

Here is my code:

import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

torch.set_printoptions(precision=20, linewidth=180)

def jacobian(y, x, create_graph=False):
    jac = []
    flat_y = y.reshape(-1)     
    grad_y = torch.zeros_like(flat_y)     

    for i in range(len(flat_y)):         
        grad_y[i] = 1.
        grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
        jac.append(grad_x.reshape(x.shape))
        grad_y[i] = 0.
    return torch.stack(jac).reshape(y.shape + x.shape)           

def hessian(y, x):
    return jacobian(jacobian(y, x, create_graph=True), x)                                             

def f(x):                                                                                             
    return x * x

np.random.seed(435537698)

num_dims = 2
num_samples = 3

X = [np.random.uniform(size=num_dims) for i in range(num_samples)]
print('X: \n{}\n\n'.format(X))

mean = torch.Tensor(np.mean(X, axis=0))
mean.requires_grad = True
print('mean: \n{}\n\n'.format(mean))

cov = torch.Tensor(np.cov(X, rowvar=False))
print('cov: \n{}\n\n'.format(cov))

with autograd.detect_anomaly():
    hessian_matrices = hessian(f(mean), mean)
    print('hessian: \n{}\n\n'.format(hessian_matrices))

And here is the output with the stack trace:

X: 
[array([0.81700949, 0.17141617]), array([0.53579366, 0.31141496]), array([0.49756485, 0.97495776])]


mean: 
tensor([0.61678934097290039062, 0.48592963814735412598], requires_grad=True)


cov: 
tensor([[ 0.03043144382536411285, -0.05357056483626365662],
        [-0.05357056483626365662,  0.18426130712032318115]])


---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-5a1c492d2873> in <module>()
     42 
     43 with autograd.detect_anomaly():
---> 44     hessian_matrices = hessian(f(mean), mean)
     45     print('hessian: \n{}\n\n'.format(hessian_matrices))

2 frames
<ipython-input-3-5a1c492d2873> in hessian(y, x)
     21 
     22 def hessian(y, x):
---> 23     return jacobian(jacobian(y, x, create_graph=True), x)
     24 
     25 def f(x):

<ipython-input-3-5a1c492d2873> in jacobian(y, x, create_graph)
     15     for i in range(len(flat_y)):
     16         grad_y[i] = 1.
---> 17         grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
     18         jac.append(grad_x.reshape(x.shape))
     19         grad_y[i] = 0.

/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in grad(outputs, inputs, grad_outputs, retain_graph, create_graph, only_inputs, allow_unused)
    155     return Variable._execution_engine.run_backward(
    156         outputs, grad_outputs, retain_graph, create_graph,
--> 157         inputs, allow_unused)
    158 
    159 

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2]] is at version 4; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
  • Seems like some magic stuff happens in the C code for torch.autograd.grad... Changing the definition of f(x) from x*x to x*x*torch.ones_like(x) solves the problem. I have no idea why... Seems like a bug in PyTorch to me... Commented Apr 19, 2020 at 18:03
  • That does seem to have made it magically work. It would be nice for someone to add an explanation as to why. Commented Apr 20, 2020 at 18:41

2 Answers


I sincerely thought this was a bug in PyTorch, but after filing an issue I got a good answer from albanD: https://github.com/pytorch/pytorch/issues/36903#issuecomment-616671247. He also pointed out that https://discuss.pytorch.org/ is available for asking questions.

The problem arises because we traverse the computation graph several times over while reusing the same grad_y tensor. Exactly what is going on under the hood is beyond me, though...
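
As an aside, here is a minimal illustration of that failure mode (my own toy example, not something from albanD's reply): autograd saves the tensors it will need during the backward pass and records their version counters, and editing such a saved tensor in place afterwards trips exactly this error.

import torch

x = torch.ones(3, requires_grad=True)
v = torch.ones(3)   # does not require grad, so in-place edits on it are legal
y = x * v           # mul saves v, because d(x*v)/dx = v is needed in backward
v.add_(1.)          # the in-place edit bumps v's version counter
y.sum().backward()  # RuntimeError: ... modified by an inplace operation

In jacobian above, grad_y appears to play the role of v: with create_graph=True it gets saved inside the double-backward graph, and grad_y[i] = 0. then modifies it in place.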

The in-place edits that your error message refers to are the obvious ones: grad_y[i] = 1. and grad_y[i] = 0.. Reusing the same grad_y over and over again in the computation is what causes the trouble. Redefining jacobian(...) as below works for me.

def jacobian(y, x, create_graph=False):
    jac = []
    flat_y = y.reshape(-1)
    for i in range(len(flat_y)):
        grad_y = torch.zeros_like(flat_y)  # allocate a fresh grad_y every iteration instead of reusing one
        grad_y[i] = 1.
        grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
        jac.append(grad_x.reshape(x.shape))
    return torch.stack(jac).reshape(y.shape + x.shape)

An alternative, which is more like black magic to me, is to leave jacobian(...) as it is and instead redefine f(x) as

def f(x):
    return x * x * 1

That works too.
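
If it helps, here is a quick sanity check I would add (my own addition, reusing num_dims, mean, f and the fixed hessian from the code above): for the elementwise f(x) = x * x we have y_i = x_i**2, so the stacked second derivatives d2y_i/dx_j dx_k should be 2 exactly when i == j == k and 0 otherwise.

expected = torch.zeros(num_dims, num_dims, num_dims)
for i in range(num_dims):
    expected[i, i, i] = 2.0   # d2(x_i**2)/dx_i dx_i = 2; all mixed terms vanish

hessian_matrices = hessian(f(mean), mean)            # with either fix applied
print(torch.allclose(hessian_matrices, expected))    # expect: True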




For future readers: the RuntimeError mentioned in the title can arise in settings more general than the original author's, for instance when moving tensor slices around and/or manipulating tensors built from list comprehensions. That was the context that led me here (this was the first result my search engine returned for the RuntimeError).

To prevent this RuntimeError and keep the gradient flowing, the tip that helped me most is mentioned in the link above (but is missing from the solution message): use the .clone() method of torch.Tensor when moving tensors (or slices of them) around.

For instance:

some_container[slice_indices] = original_tensor[slice_indices].clone()

where only original_tensor has requires_grad=True, and subsequent (potentially batched) operations are performed on the tensor some_container.

Or:

some_container = [
    tensor.clone() 
    for tensor in some_tensor_list if some_condition_fn(tensor)
]
new_composed_tensor = torch.cat(some_container, dim=0)
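
To make this concrete, here is a minimal, self-contained sketch of both patterns (the shapes, the slice and the filtering condition are invented purely for illustration); .clone() is differentiable, so gradients still reach original_tensor:

import torch

original_tensor = torch.randn(6, 3, requires_grad=True)

# Pattern 1: write a cloned slice into a pre-allocated container.
slice_indices = slice(0, 3)
some_container = torch.zeros(6, 3)
some_container[slice_indices] = original_tensor[slice_indices].clone()

# Pattern 2: filter tensors out of a list and reassemble them.
def some_condition_fn(t):                                  # stand-in condition
    return t.abs().sum() > 0

some_tensor_list = list(original_tensor.unbind(dim=0))     # rows of shape (3,)
cloned = [t.clone() for t in some_tensor_list if some_condition_fn(t)]
new_composed_tensor = torch.cat(cloned, dim=0)

# Gradients flow back to original_tensor through the clones.
(some_container.sum() + new_composed_tensor.sum()).backward()
print(original_tensor.grad)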

2 Comments

Thank you for your guidance @Yunnosch, I have tried to make a more thorough explanation and to highlight in what way I meant to add on to the previous solution-message, I hope it is suitable? Please tell me if I am still missing a point...
It now looks much more like an answer. I am not entirely happy with "widening" the scope of the actual question. On the other hand, people coming here for the title might find help in this post. Not my idea of how a Q/A pair should be, but I accept this as an answer now. (Also, this is not my technical area, strictly speaking, so I refrain from judging, because I might be too far off...) Have fun.
