6th November 2023
nanoPPO v0.15 has been released, bringing significant enhancements to its implementation of the Proximal Policy Optimization (PPO) algorithm for reinforcement learning tasks.
What's New in v0.15?
- Actor/Critic Causal Attention Policy: A new policy framework to enhance decision-making processes.
- Custom Learning Rate Scheduler: Introducing a version number and a custom scheduler for fine-tuning the learning rate during agent training.
- Gradient and Weight Inf/Nan Checks: Added safeguards against infinite and NaN values in gradients and weights to improve stability.
- Enhanced Training Mechanism: The training script now uses average rewards and includes a new cosine learning rate scheduler that adjusts the learning rate as training progresses.
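To illustrate what a cosine learning rate schedule does, here is a minimal sketch in plain Python. It is not nanoPPO's scheduler: the function name cosine_lr, its parameters, and the example values are all assumptions chosen for illustration. The learning rate starts at base_lr and decays along a half cosine curve to min_lr over total_steps.

```python
import math


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine-annealed learning rate at a given step (hypothetical helper)."""
    # Clamp the step so values outside [0, total_steps] stay at the endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    # Half cosine: 1.0 at progress=0, 0.0 at progress=1.
    scale = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * scale


# Illustrative values only; they are not taken from the nanoPPO release.
schedule = [cosine_lr(s, total_steps=100, base_lr=3e-4, min_lr=3e-5) for s in range(101)]
```

The schedule changes slowly at the start and end of training and fastest in the middle, which is the usual reason for preferring it over a linear decay.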
Additional Improvements:
- Debug flag for NaN detection in model parameters.
- Use of torch.nn.utils.clip_grad_norm_ for gradient clipping.
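The stability items above (Inf/NaN checks, a debug flag, and gradient clipping) can be sketched together in a single PyTorch update step. This is an illustrative sketch, not nanoPPO's code: the helper names has_non_finite and training_step, the debug and max_grad_norm parameters, and the toy model are assumptions. Only torch.isfinite and torch.nn.utils.clip_grad_norm_ are real PyTorch calls.

```python
import torch
import torch.nn as nn


def has_non_finite(model, check_grads=True):
    """Return labels of parameters whose weights or gradients hold Inf/NaN."""
    bad = []
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            bad.append(f"{name} (weight)")
        if check_grads and param.grad is not None:
            if not torch.isfinite(param.grad).all():
                bad.append(f"{name} (grad)")
    return bad


def training_step(model, optimizer, loss, max_grad_norm=0.5, debug=False):
    """One optimizer update with an optional Inf/NaN check and gradient clipping."""
    optimizer.zero_grad()
    loss.backward()
    if debug:
        bad = has_non_finite(model)
        if bad:
            raise FloatingPointError(f"Non-finite values in: {bad}")
    # Rescales gradients in place so their global norm is at most max_grad_norm;
    # the return value is the total norm measured before clipping.
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return float(total_norm)


# Toy model and loss, used only to exercise the step above.
torch.manual_seed(0)
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
loss = model(torch.randn(8, 4)).pow(2).mean()
grad_norm = training_step(model, optimizer, loss, debug=True)
```

Running the check before the optimizer step means a bad gradient is reported before it can corrupt the weights. Keeping it behind a flag avoids scanning every parameter on each update during normal training.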
Documentation:
For a full overview of the new features and improvements, please refer to the GitHub README and the detailed Changelog.