Reinforcement Learning Battlesnake

Overview

During my internship at RBC, I collaborated with another student to develop an autonomous agent for the RBC Battlesnake competition. Ours stood out as the only entry to use deep reinforcement learning, built around OpenAI's Proximal Policy Optimization (PPO) algorithm.

The Project

Battlesnake is a competitive programming challenge in which developers build autonomous agents to play a multiplayer version of the classic game Snake. Each snake must navigate the board to collect food while avoiding collisions with walls, other snakes, and its own body.
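
Concretely, every Battlesnake entry runs as a small web server: each turn the game engine sends the full board state as JSON and expects a single move ("up", "down", "left", or "right") in response. Below is a minimal sketch of that request/response loop, assuming Flask and a hypothetical choose_move helper; neither is necessarily what our project used.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def choose_move(state):
    # Placeholder policy: always move up. A real agent would inspect
    # state["board"] (food, snakes, hazards) and state["you"] before deciding.
    return "up"

@app.route("/move", methods=["POST"])
def move():
    state = request.get_json()  # board layout, all snakes, food, and our snake ("you")
    return jsonify({"move": choose_move(state)})

if __name__ == "__main__":
    app.run(port=8000)
```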

Technical Implementation

Reinforcement Learning Approach

  • Implemented the Proximal Policy Optimization (PPO) algorithm to train the agent
  • Designed a custom reward function combining survival time, food collection, and spatial efficiency (a sketch follows below)
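
The sketch below shows one way such a shaped reward could be computed from consecutive Battlesnake game states. The specific weights, field names, and structure are illustrative assumptions, not the project's actual reward function.

```python
def compute_reward(prev_state, state, done, won):
    """Hypothetical shaped reward; the project's exact terms and weights may differ."""
    reward = 0.01  # survival: a small bonus for every turn the snake stays alive

    # Food collection: in Battlesnake, eating resets health to 100,
    # so a health increase signals that food was just consumed.
    if state["you"]["health"] > prev_state["you"]["health"]:
        reward += 1.0

    # Spatial efficiency: reward moving closer to the nearest food (Manhattan distance).
    def nearest_food_dist(s):
        head = s["you"]["head"]
        return min(
            (abs(f["x"] - head["x"]) + abs(f["y"] - head["y"]) for f in s["board"]["food"]),
            default=0,
        )
    reward += 0.05 * (nearest_food_dist(prev_state) - nearest_food_dist(state))

    # Terminal signal: large reward for winning the game, penalty for dying.
    if done:
        reward += 5.0 if won else -5.0
    return reward
```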

Hybrid Control System

  • Developed a supervisor system using the Minimax algorithm to:
    • Validate moves proposed by the RL model
    • Identify immediate winning conditions
    • Prevent obvious fatal mistakes
  • Implemented state-space pruning to speed up decision-making (a sketch of the supervisor follows this list)
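
The sketch below illustrates the supervisor idea on an abstract game tree: the RL policy proposes a move, and a depth-limited minimax search overrides it only when it finds a forced win or a forced loss. It assumes alpha-beta pruning as the state-space pruning technique and uses a toy nested-dict tree in place of a real Battlesnake simulator, so it runs as-is; all names are illustrative, not the project's actual API.

```python
WIN, LOSS = 1_000, -1_000

def minimax(node, depth, alpha, beta, maximizing):
    """Depth-limited minimax with alpha-beta pruning.

    A node is either a numeric leaf score or a dict {move: child_node}.
    """
    if depth == 0 or not isinstance(node, dict):
        return node if not isinstance(node, dict) else 0  # 0 = unevaluated position
    best = LOSS if maximizing else WIN
    for child in node.values():
        score = minimax(child, depth - 1, alpha, beta, not maximizing)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if beta <= alpha:  # prune: this branch cannot change the final decision
            break
    return best

def supervise(tree, proposed_move, depth=3):
    """Return the RL model's move unless search finds a forced win or a forced loss."""
    scores = {move: minimax(child, depth - 1, LOSS, WIN, maximizing=False)
              for move, child in tree.items()}
    winning = [m for m, s in scores.items() if s >= WIN]  # immediate winning condition
    if winning:
        return winning[0]
    if scores[proposed_move] <= LOSS:                      # obvious fatal mistake
        return max(scores, key=scores.get)                 # least-bad alternative
    return proposed_move

# Toy example: "up" leads to a forced loss, "left" is safe, "right" wins outright.
toy_tree = {
    "up":    {"opp_a": LOSS, "opp_b": 5},
    "left":  {"opp_a": 2,    "opp_b": 3},
    "right": {"opp_a": WIN,  "opp_b": WIN},
}
print(supervise(toy_tree, proposed_move="up"))  # -> "right"
```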

Results

  • The only RL-based agent in the RBC competition
  • Achieved 8th place on the international leaderboard within one month of deployment
  • Demonstrated the viability of reinforcement learning in a competitive programming setting

Code

The project is open source and available on GitHub: bs-lindor

Key Learnings

  • Practical implementation of PPO in a competitive environment
  • Importance of hybrid approaches combining learning-based and traditional algorithms
  • Techniques for optimizing real-time decision-making in game environments