drl-algorithms

Hi,
I am currently using your FinRL_PortfolioAllocation_NeurIPS_2020 code and I'm seeing some strange behavior at the beginning of training. Sometimes the first episode reward mean value is very high and then it drops during training, as shown in the TensorBoard plot. This high value is never reached again. Any idea why this is happening?

Edit: I'm training a PPO agent from stable-baselines3 wit