CoLight: Learning Network-level Cooperation for Traffic Signal Control

So, for anyone following this story, we are finally here for the network-level control of traffic signals!

1. One intersection, a series of intersections, a network of intersections.

Before controlling a network of signals, we investigated individual intersections (IntelliLight, LIT, FRAP) and intersections along an arterial (PressLight).

In IntelliLight, we formulate the RL problem and look into the sample-imbalance problem. In LIT, we investigate which parts of the RL formulation influence the control and try to build connections between traditional transportation methods and RL methods. FRAP looks into the sample-efficiency problem in RL-based traffic signal control, since it is a real-world control problem and we cannot afford that much trial and error.

In PressLight, we move forward from a single intersection to multiple intersections and try to simplify the reward and state design as well as achieve coordination between intersections. The idea of pressure is a really good example of combining transportation ideas with RL design. However, PressLight still relies on individual RL agents: it is only the state and reward design that lets each agent implicitly coordinate with the others.
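To make the pressure idea concrete, here is a minimal sketch. It assumes a simplified definition in which the pressure of an intersection is the absolute net difference between vehicles on incoming lanes and vehicles on the matching outgoing lanes, with the reward being the negative pressure. The function names and the input layout are hypothetical, and the exact formulation in the PressLight paper (for example, normalizing by lane capacity) may differ.

```python
def intersection_pressure(movements):
    """Pressure of one intersection (simplified, assumed definition).

    movements: list of (incoming_count, outgoing_count) pairs, one pair per
    traffic movement through the intersection.
    """
    # Net imbalance: vehicles waiting to enter minus vehicles already downstream.
    net = sum(incoming - outgoing for incoming, outgoing in movements)
    return abs(net)


def pressure_reward(movements):
    """Reward for an agent: lower pressure is better, so negate it."""
    return -intersection_pressure(movements)


# Four movements with hypothetical vehicle counts.
example = [(5, 1), (3, 1), (2, 4), (0, 0)]
reward = pressure_reward(example)
```

Because the reward depends on the lanes leaving the intersection as well as those entering it, an agent that minimizes pressure is nudged to avoid flooding its downstream neighbors, which is the implicit coordination described above.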

In CoLight, we move forward again to network-level control, going from no explicit coordination to direct communication between agents.

2. Communication for cooperation

Coordination can benefit signal control in multi-intersection scenarios. Since recent advances in RL have improved performance on isolated traffic signal control, efforts have been made to design strategies in which multi-agent reinforcement learning (MARL) agents cooperate. Usually, there are several ways to achieve coordination:

(from A Survey on Traffic Signal Control Methods)

Our method falls into the category of independent RL with communication. So, here are two questions: 1. Why do we choose this way? 2. What's new here?

3. Why independent RL with communication?

3.1 Centralized Control

3.2 Individual RL without any coordination

3.3 Individual RL by concatenating neighborhood information into local observation

4. New stuff in CoLight

In CoLight, we have several new things that distinguish our work:

Dynamic communication and index-free model learning. This is done through a graph attention network. For people who are not familiar with this, we refer to this paper. Here we point out why dynamic communication and index-free model learning are really important.
- Dynamic communication: Existing studies tend to select the traffic conditions from adjacent intersections and directly concatenate them with the traffic condition of the target intersection, neglecting the fact that traffic changes both temporally and spatially.
- Index-free model learning with parameter sharing: Parameter sharing is a commonly used technique for multi-agent scenarios, especially when the number of agents is large, because it greatly reduces the number of parameters to learn. But in the traffic signal control scenario, sharing one model among different intersections could be wrong. If we simply take neighboring information as part of the state for communication, the index of each neighbor in the concatenated observation matters. Say intersections A and B both have four neighbors (East, West, South, North), and their neighbors' features are concatenated in the order EWSN. If A is influenced mostly by its East neighbor and B is seldom influenced by its East neighbor, what would happen? The model shared by A and B cannot differentiate the influence of the East neighbor, right? Therefore, in this paper, we propose that rather than modeling the influence of each neighbor by its index, we model the overall influence of all neighbors. This is somewhat like the mean-field idea.
Large-scale network experiment. While we were surveying existing methods for multi-intersection control, we were surprised that NONE of the existing methods (with coordination) had conducted experiments on a network with more than 100 intersections. This suggests that traditional methods do have scalability issues. In this paper, we conduct experiments on a large network that contains 196 intersections with real-world data (previously we used 48 intersections with real-world data). For a detailed comparison of the methods and their scales, please refer to this survey.
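The index-free idea from the first point can be illustrated with a toy attention step. This is only a sketch under simplifying assumptions: it uses a plain dot product for the importance score and skips the learned embeddings and multiple attention heads that a real graph attention network would have. The function names are hypothetical and are not taken from the CoLight code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(target, neighbors):
    """Aggregate neighbor features, weighted by similarity to the target.

    target: feature vector of the target intersection.
    neighbors: feature vectors of intersections in its neighborhood
               (this list may include the target itself).
    Returns (weights, aggregated_vector).
    """
    # Importance score of each neighbor: dot product with the target
    # (an assumption; a learned scoring function would be used in practice).
    scores = [sum(t * n for t, n in zip(target, nb)) for nb in neighbors]
    weights = softmax(scores)
    # Weighted sum over neighbors: the result does not depend on their order.
    aggregated = [
        sum(w * nb[k] for w, nb in zip(weights, neighbors))
        for k in range(len(target))
    ]
    return weights, aggregated


target = [1.0, 0.0]
ewsn = [[2.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.1, 0.2]]  # East, West, South, North
weights, summary = attend(target, ewsn)
_, summary_shuffled = attend(target, list(reversed(ewsn)))  # same result up to rounding
```

Since the weights are recomputed from the current features, the most influential neighbor can change as traffic changes (dynamic communication), and since the summary is a weighted sum rather than a concatenation, a shared model never needs to know which slot holds the East neighbor (index-free).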

5. Experiments and interesting findings

Here are the experiments for synthetic and real-world data. New York data has 196 intersections in the network, Hangzhou has 16, and Jinan has 12. All of the real-world data has dynamic traffic.

We also have some interesting findings, which basically tell us which intersections are important spatially. In the following example, we have three intersections along an arterial.

We wonder: which intersection influences intersection #0 most? Which influences #1 most? And which influences #2 most?

The answer given by the learned attention is pretty intuitive: all the intersections are mostly influenced by themselves. Of course the policy of one intersection is mostly influenced by its own traffic condition! We also observed that intersection #1 is influenced more by intersection #0 than by intersection #2. This is because intersection #0 is upstream, and the traffic from #0 will influence #1 in most cases.

We also have some findings on spatial attention in the real-world scenario and temporal attention. If you are interested, please refer to our paper:

CoLight: Learning Network-level Cooperation for Traffic Signal Control

Hua Wei

CoLight (CIKM 2019)