PressLight (KDD 2019)

PressLight: Learning Max Pressure Control to Coordinate Traffic Signals in Arterial Network


[paper] [code]

——–

The following part of this post is a note detailing how we moved from IntelliLight to PressLight. It might be a little verbose.

Coordination in traffic signals

Conventional multi-intersection control in transportation usually achieves coordination by setting a fixed offset (i.e., the time interval between the beginnings of green lights) among all intersections. Known as the Green Wave, this lets traffic traverse the intersections along an arterial without stopping. There are even people trying to catch the Green Wave, like surfing (sounds cool!). See the video below for a better idea of the green wave.

(video source: YouTube)

Challenges:

In fact, it is not an easy task to provide coordination along an arterial, given that traffic in opposite directions usually cannot be facilitated simultaneously. To solve this problem, optimization-based methods have been developed to minimize vehicle travel time and/or the number of stops at multiple intersections. Systems like SCATS and SCOOT also have internal selection or optimization processes to modify cycle lengths, phase splits, and offsets. Instead of optimizing offsets, max pressure aims to maximize the throughput of the network so as to minimize travel time. However, these approaches still rely on assumptions that simplify traffic conditions, and they do not guarantee optimal results in the real world (see [1] for a survey of these methods).

Possible Solution with RL

Since recent advances in RL have improved performance on isolated traffic signal control, efforts have been made to design strategies that coordinate multiple intersections. One way is to use individual RL agents to control the traffic signals in a multi-intersection system. These methods are more scalable, since each agent makes its own decision based on information from itself and neighboring intersections, without explicit coordination. Our proposed method also follows this direction.

Lost in RL

However, when we were trying to optimize for coordination, one question popped up: what should we use as the reward and state? (Actually, this is a problem we already faced during the design of IntelliLight.)

Looking through the literature, whether on traffic signal control or recommendation, we found that using a state that correlates with the reward is usually good enough. However, since our ultimate goal is to optimize the total travel time of all vehicles, and this goal cannot be optimized directly, most of the literature uses surrogate reward signals: queue length, delay, waiting time, and so on. Plenty of them use a weighted sum of these signals. Then here comes the tricky part: how do we define the weights of these signals?
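To make the issue concrete, a typical surrogate reward in this line of work looks roughly like a weighted combination such as

$r = -(w_1 \cdot \text{queue length} + w_2 \cdot \text{delay} + w_3 \cdot \text{waiting time} + \dots)$,

where the weights $w_1, w_2, w_3, \dots$ have to be chosen by hand (the terms above are illustrative, not the exact IntelliLight definition).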

The question kept haunting us while we were preparing IntelliLight, and we kept tuning the weights carefully, from submitting the paper all the way to wrapping up the camera-ready. What came to mind next was: if the travel time cannot be optimized directly, can we optimize just one surrogate signal?

With this in mind, we turned to transportation experts for such a signal. At first, we found that there is already a mature theory, developed about half a century ago by Webster, known as Webster’s delay formula.

Webster’s Formula

Webster’s formula is a classic result in the transportation area. When we found this formula after submitting the camera-ready, I felt really anxious about whether IntelliLight could outperform it.
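For readers who have not seen it, the best-known control rule derived from Webster’s delay analysis is the optimal cycle length

$C_o = \dfrac{1.5L + 5}{1 - Y}$,

where $L$ is the total lost time per cycle and $Y$ is the sum of the critical flow ratios (flow divided by saturation flow) over the phases; green time is then split among the phases in proportion to those ratios. This is the textbook form; see the survey in [1] for details.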

We implemented this formula immediately and, luckily, found that it is highly sensitive to its parameters, just like our method, which gave us a good starting point to investigate.

Noticing that there IS sensitivity to parameters, we then drafted a paper critiquing our earlier work and started looking into what state and reward designs are suitable for controlling a single intersection. Readers can refer to that paper [2] if interested.

 

Going Beyond IntelliLight

While Guanjie started looking into the single-intersection scenario, I decided to focus on controlling multiple intersections. Where should I start, then?

Controlling arterials seemed to be a promising direction. One reason is that we had the opportunity to control arterials in the Hangzhou City Brain research; another is that domain experts informed us of another classic transportation method used to control arterials: GreenWave.

GreenWave seems to be a straightforward extension of Webster’s formula, and it has a closed-form solution:
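In its textbook form (an illustration, not necessarily the exact variant we compared against), the solution is simple: for two adjacent intersections a distance $d$ apart along the arterial, with expected travel speed $v$, set the offset between the starts of their green phases to

$\Delta t_{\text{offset}} = d / v$,

so that a platoon released by the upstream green arrives just as the downstream signal turns green.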

We tried hard to design our method to achieve the green wave. Since we just wanted to extend our single-intersection method to the multi-intersection scenario, we started with individual RL agents using the same state and reward design as IntelliLight. Then we found that it is really difficult for our method to achieve coordination using only the queue length on the incoming lanes – this is total isolation among agents, and the agents definitely cannot learn to give green lights to incoming vehicles because they DO NOT know when the vehicles will arrive at the intersection.

Towards a global objective

We acknowledged the limitations of independent RL for traffic signal control; the reward we were using at that time was the queue length. We proved in this paper that optimizing queue length independently is equivalent to optimizing global travel time if there is no queue spillback, in which case we can treat each intersection independently. However, this assumption is rather strong, and it motivated us to look for a better surrogate for the global objective. Then one paper stood out – MaxPressure.

The idea behind max pressure is rather straightforward: if the downstream intersection is already full of cars, the upstream should stop sending it more cars, otherwise the queue will only spill back and block the upstream itself. With this in mind, we tried to figure out how to combine it with RL.
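To make this concrete, here is a minimal Python sketch of the pressure idea and the reward derived from it. It is a simplification: the paper defines pressure per traffic movement and normalizes by lane capacity, both of which are omitted here.

# Minimal sketch of the pressure idea (simplified; not the paper's exact definition).

def intersection_pressure(incoming_queues, outgoing_queues):
    # Pressure of one intersection: vehicles waiting on its incoming lanes
    # minus vehicles already queued on its outgoing (downstream) lanes.
    return sum(incoming_queues) - sum(outgoing_queues)

def pressure_reward(incoming_queues, outgoing_queues):
    # The RL agent is rewarded for keeping the absolute pressure small,
    # i.e., for balancing upstream demand against downstream capacity.
    return -abs(intersection_pressure(incoming_queues, outgoing_queues))

# Example: heavy upstream demand, nearly empty downstream lanes
# -> large pressure -> strongly negative reward.
print(pressure_reward([8, 5, 3], [1, 0, 2]))   # prints -13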

Max pressure has a nice theoretical foundation, which is exactly what we need, so we tried to build a connection between MP and RL.

Max pressure optimizes through greedy selection, while DQN is also greedy in a sense, but with respect to predicted future outcomes. If we simply use pressure as our reward, then the RL agent is optimizing the expected future reward (the discounted rewards over several timesteps).
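To spell out the connection: if the reward at time $t$ is $r_t = -P_t$ (the negative pressure at that step), then a DQN-style agent maximizes the expected return $\mathbb{E}[\sum_{k \ge 0} \gamma^k r_{t+k}]$, i.e., it minimizes the discounted cumulative pressure over future timesteps, while classic max pressure greedily minimizes only the current $P_t$ at each decision point – roughly the $\gamma = 0$ special case.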

Experiments

The experimental results show that the pressure reward works quite well together with our state design. We tested this method on both synthetic and real-world data, and it shows superior performance.

What’s more, it can also learn the green wave automatically under the same setting where GreenWave is the optimal solution.

For more interesting demos and discussions, please refer to our paper and poster presentation at KDD 2019 on August 6th.

 

References:

[1] A Survey on Traffic Signal Control Methods

[2] Diagnosing Reinforcement Learning for Traffic Signal Control