I think I have the right data,
but at part 6, train_model failed.
I tried to find the problem.
It seems that at this point "features" contains no NaN values,
but after "mlp_extractor()", "latent_pi" is a NaN tensor.
Does anybody know why this happens and how to deal with it?
It seems your model has diverged, so taking the log while extracting the policy produces NaN values. You can try hyperparameter optimization; the current hyperparameters may be suboptimal and causing the model to diverge. The tutorial for that is here
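One way to confirm divergence: if the input features are finite but the output is NaN, the network weights themselves have probably become NaN or Inf. Below is a minimal PyTorch sketch of that check. The helper `find_non_finite_parameters` and the small `net` are hypothetical stand-ins, not code from the tutorial, and the divergence is simulated by writing a NaN into one weight.

```python
from typing import List

import torch
import torch.nn as nn


def find_non_finite_parameters(module: nn.Module) -> List[str]:
    """Return the names of parameters that contain NaN or Inf values."""
    return [
        name
        for name, param in module.named_parameters()
        if not torch.isfinite(param).all()
    ]


# Hypothetical stand-in for the policy's mlp_extractor.
net = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 2))
features = torch.randn(3, 4)  # finite inputs, like "features" in the question

# Simulate a diverged model by corrupting a single weight.
with torch.no_grad():
    net[0].weight[0, 0] = float("nan")

latent = net(features)
bad_params = find_non_finite_parameters(net)

print("output contains NaN:", torch.isnan(latent).any().item())
print("non-finite parameters:", bad_params)
```

If the model is a Stable-Baselines3 one (an assumption based on the names `mlp_extractor` and `latent_pi`), the same helper could be called on `model.policy` after a failed training run. A non-empty result points to diverged weights rather than bad input data, and a lower learning rate or gradient clipping would then be the usual things to try.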