[RLlib] Policy weights overwritten in self-play #16718
Comments
Hey @richardliaw, I saw that you changed the title, so just to make sure we are on the same page: the problem is that in the code above the weights are overwritten without `local_mode=True`, and with `local_mode=True` the code works correctly.
@george-skal thanks for pointing this out -- fixed it! @michaelzhiluo, can you take a quick look at this issue?


Hi all!
I am trying a self-play scheme in the waterworld environment, where two agents have a policy that is being trained ("shared_policy_1"), and the other three agents sample a policy from a menagerie (a set of previous policies of the first two agents, "shared_policy_2").
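For reference, this agent-to-policy assignment can be sketched with a policy mapping function. The policy names "shared_policy_1"/"shared_policy_2" come from the description above; the waterworld agent ids (`pursuer_0`, ...) are an assumption based on the PettingZoo naming convention:

```python
# Hypothetical sketch of the agent-to-policy mapping described above.
# Agent ids "pursuer_0" .. "pursuer_4" are assumed, not taken from the issue.
TRAINED_AGENTS = {"pursuer_0", "pursuer_1"}

def policy_mapping_fn(agent_id):
    # The first two agents train the shared policy; the other three
    # play a policy sampled from the menagerie of past snapshots.
    if agent_id in TRAINED_AGENTS:
        return "shared_policy_1"
    return "shared_policy_2"
```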
My problem is that I see that the weights in the menagerie are overwritten in every iteration by the current weights. The problem is not happening with ray.init(local_mode=True), but happens without local mode.
The cause seems to be that `get_weights` returns a dictionary of numpy arrays, with one entry per parameter in the model. With `local_mode=True` the arrays in the weight dictionary are distinct objects, but with `local_mode=False` the arrays returned by successive calls reference the same underlying numpy buffers, whose values change during the `learn_on_batch` steps. You therefore end up with a list of dictionaries that all reference the same numpy objects, so when the live weights are updated, every stored dictionary is updated as well.
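The aliasing effect, and the deep-copy workaround discussed in the linked thread, can be reproduced without RLlib. This is a minimal sketch; the `get_weights` stand-in below is hypothetical and only mimics returning arrays that alias the live parameters:

```python
import copy
import numpy as np

# Live parameter array, mutated in place by training; stands in for the
# buffers backing policy.get_weights() when local_mode=False.
live_w = np.zeros(3)

def get_weights():
    # Hypothetical stand-in for policy.get_weights(): returns a dict of
    # numpy arrays that alias the live parameters rather than copies.
    return {"w": live_w}

menagerie = []
menagerie.append(get_weights())   # stores references, not a snapshot
live_w += 1.0                     # simulates a learn_on_batch update
print(menagerie[0]["w"])          # the stored "snapshot" changed too

snapshot = copy.deepcopy(get_weights())  # decouples the stored weights
live_w += 1.0
print(snapshot["w"])              # unaffected by later updates
```

Deep-copying the weight dictionary at snapshot time (rather than storing the returned dictionary directly) is what keeps each menagerie entry frozen at its own values.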
Please check here https://discuss.ray.io/t/policy-weights-overwritten-in-self-play/2520 for the full discussion, the weights printed and a proposed solution.
The code is:
Thanks,
George