why we rule the land of the blind robots
We might not be able to beat a natural language search engine or a supercomputer able to crunch every last possible move in a game of chess, but there’s one area where we easily leave just about any machine in the dust. That area? Visual recognition, of course. As we saw once before, no computer vision system seems to be able to match what we can do, and even rudimentary simulations of our visual cortices are more than a match for the top performers in digital image recognition. Why? Well, to put it simply, the brain cheats. Unlike computers, we’ve evolved a myriad of pathways in our brains to filter and process information, and over eons we’ve developed a very good structure for pattern recognition and template matching based on cues ranging from estimating an object’s distance from us to complex feature interpolation and extrapolation. We can see things that machines can’t, whether the problem is the technology or the data format, and a study on whether the brain somehow compresses visual data sheds light on what our brains actually do to match what our eyes see to an abstract or specific object in our minds, highlighting the role of one of our neural pathways.
Whatever you see gets transmitted to the occipital lobe in the back of your brain and analyzed by neurons in an area commonly known as V1. Your V1 neurons don’t actually see any detail or complex features. Their task is to detect contrasts, identify whether objects are moving or static, and stimulate the next visual cortex, V2, which does further filtering and passes more complex patterns identified in the visual stream on to V3. When visual data makes it to the V4 cortex, things get very interesting because neurons in V4 seem to have a strong response to curves and angles. Basically, one could say that V4 is doing a form of feature extraction before passing off the refined image to cortices that do higher level template and object matching. And it’s that focus on angles and curves that attracted the attention of a neuroscience lab which simulated the behavior of V4 neurons with a computer model. Interestingly, the fewer neurons that were trained to respond to the images in the training set, the more strongly they lit up when curves and acute angles appeared in the pictures. The more neurons were stimulated, the more responsive the digital V4 was to flat and shallow outlines. Our V4s are actually compressing incoming visual data, concluded the study. And from what I can tell, it seems that this compression is actually helping V4 neurons perform key feature extraction and enables high level visual data processing in the next visual cortices.
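To make that compression idea a little more concrete, here’s a rough sketch in Python. It’s not the study’s model, and the square outline and the turning-angle measure are just my own illustration: trace a simple shape as a dense polyline and see how much of its total “turning” is packed into a handful of sharp points. A population of cells that mostly fires at curves and corners only needs to encode a few locations to capture the whole shape.

```python
import numpy as np

# A toy outline: a unit square traced as a dense, closed polyline.
corners = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
points = []
for a, b in zip(corners, np.roll(corners, -1, axis=0)):
    # 100 evenly spaced samples along each side, each corner included once.
    points.append(a + np.linspace(0, 1, 100, endpoint=False)[:, None] * (b - a))
outline = np.vstack(points)  # 400 points tracing the square

# Discrete "curvature": the turning angle between consecutive edge segments.
edges = np.roll(outline, -1, axis=0) - outline   # edge vectors
headings = np.arctan2(edges[:, 1], edges[:, 0])  # direction of each edge
turns = np.abs((np.diff(headings, append=headings[:1]) + np.pi) % (2 * np.pi) - np.pi)

# Nearly all of the shape information sits at a handful of points: the corners.
sharp = turns > 0.1  # radians; everything along the flat sides is essentially 0
print(f"{sharp.sum()} of {len(turns)} points carry "
      f"{turns[sharp].sum() / turns.sum():.0%} of the total turning")
```

For the square, four points out of four hundred account for all of the turning, which is the intuition behind treating a strong response to curves and angles as a compressed description of what the eyes are seeing.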
Here’s why I’m making that conclusion. One of the standard approaches to image recognition with 2D data is to employ outline extraction algorithms. I’ve mentioned them before, and when given a good quality image, they’re very effective at finding usable object shapes. Their results are then used to build algorithms that identify key features, or that match the outlines against masks quantifying proportions and dimensions in their databases. Today, we generally deploy genetic algorithms to build those associations, while in the past, expert systems were the more common approach. Our brains don’t necessarily train like that, but they do base object identification on outlines and basic shapes. Remove all the detail from a human figure and you still know that you’re looking at a representation of a human because it has the right proportions, and its features can still be picked out by their position and curvature against a background with enough contrast for the outline to be identifiable. So when your V4 lights up as acute angles and curves start to show up, it can stimulate neurons which respond to objects with those angles and curves, narrowing down the possible identifications for the objects you’re seeing. This means that feature extraction algorithms are actually on the right path; it’s just that their task is made more difficult by working with flat images. But that’s still an ongoing area of research and gets really technical and system-specific, so I’m going to leave it at that for now.
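Just to give a flavor of what that first step looks like in practice, here’s a minimal sketch of outline extraction using OpenCV. It assumes OpenCV 4 and a placeholder grayscale image called object.png, the thresholds are arbitrary, and the “matching” is reduced to printing a couple of numbers that a real system would compare against stored masks.

```python
import cv2

# Placeholder file name; any reasonably clean photo of an object on a plain
# background will do. Loaded as grayscale so it can be thresholded directly.
img = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)

# Separate figure from background, then pull out the outer outlines.
_, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Reduce each sizable outline to a few numbers (corner count, aspect ratio),
# the sort of values a matching stage could compare against masks in a database.
for c in contours:
    if cv2.contourArea(c) < 500:  # arbitrary threshold to skip specks of noise
        continue
    x, y, w, h = cv2.boundingRect(c)
    corners = cv2.approxPolyDP(c, 0.01 * cv2.arcLength(c, True), True)
    print(f"outline with {len(corners)} corner points, aspect ratio {w / h:.2f}")
```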
So does this model of V4’s visual information handling take us one step closer to giving a future computer system the ability to see like we do? Sadly, no. It just elaborates on a piece of the puzzle we found a long time ago. The big problem is that brains use a lot of neurons to process and filter data, taking advantage of countless evolutionary refinements over countless generations. We’re trying to do that in systems which weren’t built to work with information the way neurons do, or to respond like they do, because exactly how neurons store information and how they retrieve it when stimulated is still rather fuzzy to us. So when building a visual recognition algorithm, we’re actually trying to replicate processes we don’t understand at the level of detail required to truly replicate them, and many of the advances we make in this area come from applying statistics and using the computer’s brute computational force to arrive at an answer. And if you flip your training images on their sides or look at them from a different angle, you have to do that computation all over again to get a proper response from your system. As noted above, when we go up against machines in the realm of visual recognition, we’re cheating, putting eons of ongoing evolution and hundreds of millions of neurons against decades of so far incomplete research and probabilistic trial and error…
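And just to show how brittle that brute force approach can be, here’s a hypothetical little test, with scene.png standing in for any grayscale image larger than a couple of hundred pixels on a side: cut a patch out of a picture, use it as a template, and see what happens to the match score once the scene is rotated.

```python
import cv2

# Placeholder image; assumed to be at least a few hundred pixels on each side.
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
template = scene[50:150, 50:150]  # an arbitrary 100x100 patch of the scene

# Brute force template matching against the original and a rotated copy.
upright = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED).max()
turned = cv2.matchTemplate(cv2.rotate(scene, cv2.ROTATE_90_CLOCKWISE),
                           template, cv2.TM_CCOEFF_NORMED).max()

# The upright score is a perfect 1.0 by construction; the rotated score
# usually drops sharply even though a person sees the exact same picture.
print(f"upright match: {upright:.2f}, rotated match: {turned:.2f}")
```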
See: Carlson, E., Rasquinha, R., Zhang, K., & Connor, C. (2011). A Sparse Object Coding Scheme in Area V4. Current Biology. DOI: 10.1016/j.cub.2011.01.013