When human beings see something unexpected, we do a double take. It’s a common phrase with real cognitive implications, and the fact that today’s neural networks can’t take a second look helps explain why they fail when scenes get weird.
Today’s best neural networks for object detection work in a “feed-forward” manner. This means that information flows through them in only one direction. They start with an input of fine-grained pixels, then move to curves, shapes, and scenes, with the network making its best guess about what it’s seeing at each step along the way. As a consequence, errant observations early in the pipeline contaminate the final step, when the neural network pools together everything it thinks it knows in order to make a guess about what it’s looking at.
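The one-way flow described above can be sketched as a chain of stages, each of which sees only the previous stage’s output. This is an illustrative toy, not a real detector; the stage names and the toy threshold are hypothetical, and the point is only that an errant early observation propagates unchecked to the final guess.

```python
# Toy feed-forward pipeline: pixels -> "edges" -> "shapes" -> label.
# All stage names and thresholds are made up for illustration.

def extract_edges(pixels):
    # Stage 1: crude "edge" features from adjacent pixel differences.
    return [abs(a - b) for a, b in zip(pixels, pixels[1:])]

def group_shapes(edges):
    # Stage 2: pool neighbouring edges into coarse "shape" scores.
    return [sum(edges[i:i + 2]) for i in range(0, len(edges), 2)]

def classify_scene(shapes):
    # Stage 3: final guess, based only on the pooled shape scores.
    return "object" if sum(shapes) > 10 else "background"

def detect(pixels):
    # Information flows in one direction only; no stage can revisit
    # the raw input once its output has been passed along.
    return classify_scene(group_shapes(extract_edges(pixels)))

clean = [0, 0, 9, 9, 9, 0, 0, 0]   # a bright blob the pipeline can find
noisy = [0, 0, 0, 0, 0, 0, 0, 0]   # an errant early observation: the blob is lost
print(detect(clean))   # "object"
print(detect(noisy))   # "background" -- the early error contaminates the end
```

Because nothing flows backward, no later stage can notice that Stage 1’s features were wrong and ask for a second look, which is exactly the “double take” humans perform.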
Read more:
Hartnett, K. (2018, September 20). Machine Learning Confronts the Elephant in the Room. Quanta Magazine. https://www.quantamagazine.org/machine-learning-confronts-the-elephant-in-the-room-20180920/