Project Team


Students

Zhongrui Sun
Computer Science
Penn State Abington






Faculty Mentors

Nathan Wagenhoffer
Penn State Abington
Computer Science


Mariantonieta Gutierrez Soto
Penn State University Park
School of Engineering Design and Innovation








Project Abstract


In this study, we extend a finite-dipole self-propelled-swimmer fluid simulation environment to accommodate multiple agents, increasing both the decision-making demands on each agent and the overall complexity of the environment. Previously limited to a single agent, the environment now simulates the interactions of multiple self-propelled dipoles in an inviscid fluid. The transformation involved rewriting the existing simulation functions to operate on arrays, so that multiple entities can be included and can interact simultaneously. This change makes it possible to investigate multi-agent dynamics within the fluid simulation environment.

Once the multi-agent environment was established, we integrated two reinforcement learning algorithms, Deep Q-Networks (DQN) and Advantage Actor-Critic (A2C), to guide the behavior of the simulated swimmers. Both were chosen for their well-established ability to handle large state spaces and complex behaviors, qualities that are essential given the added complexity of a multi-agent environment. DQN uses a neural network to approximate the Q-function, enabling an agent to generalize to unseen states and navigate larger state-action spaces; this proved particularly valuable in our multi-agent scenario, where interactions between swimmers enlarge the effective state space. A2C, in contrast, uses a dual architecture of actor and critic networks: the actor selects actions, while the critic estimates state values to guide the actor's training. This on-policy algorithm helped manage the complex interactions within the simulation, refining the policy to optimize the multi-agent dynamics.

The findings from this research could directly inform the design of more efficient path-planning algorithms for autonomous underwater vehicles (AUVs). Improved path planning could reduce energy usage and increase the viability of AUVs for tasks such as oceanographic data collection, marine life monitoring, and undersea search missions.
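To illustrate the array-based rewrite described above, the sketch below advances N self-propelled agents at once using NumPy broadcasting. It is a hypothetical simplification, not the project's actual dipole model: the pairwise term is a generic vortex-like induced velocity decaying with distance, standing in for the finite-dipole far-field interaction, and heading dynamics are omitted.

```python
import numpy as np

def step(pos, theta, v0=1.0, kappa=0.1, dt=0.01):
    """Advance N agents one time step (illustrative multi-agent sketch).

    pos:   (N, 2) array of agent positions
    theta: (N,)   array of agent headings
    Each agent moves at speed v0 along its heading and is advected by a
    pairwise induced velocity from every other agent (a stand-in for the
    dipole far-field interaction, decaying as 1/r^2).
    """
    # self-propulsion along each agent's heading
    vel = v0 * np.stack([np.cos(theta), np.sin(theta)], axis=1)

    # all pairwise separation vectors at once: diff[i, j] = pos[j] - pos[i]
    diff = pos[None, :, :] - pos[:, None, :]           # (N, N, 2)
    r2 = np.einsum("ijk,ijk->ij", diff, diff)          # squared distances
    np.fill_diagonal(r2, np.inf)                       # no self-interaction

    # perpendicular (vortex-like) induced flow, summed over all neighbors
    perp = np.stack([-diff[..., 1], diff[..., 0]], axis=-1)
    vel += kappa * (perp / r2[..., None]).sum(axis=1)

    return pos + dt * vel, theta
```

Because every per-agent quantity is an array axis, adding more swimmers requires no change to the update code, which is the point of the array conversion described in the abstract.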
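The core of the DQN approach is the temporal-difference update on an approximate Q-function. As a minimal, dependency-free illustration, the sketch below uses a linear function approximator in place of the deep network (and omits the replay buffer and target network that a full DQN adds); the epsilon-greedy action selection and TD update are the same core idea. All class and parameter names here are hypothetical.

```python
import numpy as np

class LinearQ:
    """Q-learning with linear function approximation (DQN sketch).

    In the project's DQN, a neural network replaces the weight matrix
    below; the TD update toward r + gamma * max_a' Q(s', a') is the same.
    """
    def __init__(self, n_features, n_actions, lr=0.1, gamma=0.99, eps=0.1):
        self.w = np.zeros((n_actions, n_features))
        self.lr, self.gamma, self.eps = lr, gamma, eps
        self.rng = np.random.default_rng(0)

    def q(self, s):
        return self.w @ s                      # Q(s, .) for all actions

    def act(self, s):
        if self.rng.random() < self.eps:       # epsilon-greedy exploration
            return int(self.rng.integers(len(self.w)))
        return int(np.argmax(self.q(s)))

    def update(self, s, a, r, s_next, done):
        # bootstrap target: reward plus discounted best next-state value
        target = r + (0.0 if done else self.gamma * np.max(self.q(s_next)))
        td_error = target - self.q(s)[a]
        self.w[a] += self.lr * td_error * s    # gradient step on squared TD error
        return td_error
```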
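The actor-critic split described for A2C can likewise be sketched with linear features: a softmax actor proposes actions, a linear critic scores states, and the advantage r + gamma*V(s') - V(s) scales both updates. This is an illustrative single-step sketch under assumed hyperparameters, not the project's network architecture or training loop.

```python
import numpy as np

class LinearA2C:
    """Minimal actor-critic sketch (illustrative, linear features).

    Actor: softmax policy over theta @ s.  Critic: linear value v @ s.
    The advantage estimate weights the policy-gradient step, so the
    critic's value estimates guide the actor's training.
    """
    def __init__(self, n_features, n_actions, lr=0.05, gamma=0.99):
        self.theta = np.zeros((n_actions, n_features))  # actor weights
        self.v = np.zeros(n_features)                   # critic weights
        self.lr, self.gamma = lr, gamma
        self.rng = np.random.default_rng(0)

    def policy(self, s):
        z = self.theta @ s
        z -= z.max()                       # numerical stability
        p = np.exp(z)
        return p / p.sum()

    def act(self, s):
        return int(self.rng.choice(len(self.theta), p=self.policy(s)))

    def update(self, s, a, r, s_next, done):
        # advantage: one-step TD error of the critic
        v_next = 0.0 if done else self.v @ s_next
        advantage = r + self.gamma * v_next - self.v @ s
        self.v += self.lr * advantage * s               # critic TD step
        # actor step: grad of log softmax-policy, scaled by the advantage
        p = self.policy(s)
        grad = -np.outer(p, s)
        grad[a] += s
        self.theta += self.lr * advantage * grad
        return advantage
```

Unlike the value-only DQN update, the critic here never selects actions; it only supplies the advantage signal that tells the actor which actions to reinforce, which is the on-policy refinement the abstract describes.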



