why we rule the land of the blind robots

February 18, 2011 — 3 Comments

We might not be able to beat a natural language search engine or a supercomputer able to crunch every last possible move in a game of chess, but there’s one area where we easily leave just about any machine in the dust. That area? Visual recognition, of course. As we saw once before, no computer vision system seems able to match what we can do, and even rudimentary simulations of our visual cortices are more than a match for the top performers in digital image recognition. Why? Well, to put it simply, the brain cheats. Unlike computers, we’ve evolved a myriad of pathways in our brains to filter and process information, and over eons we’ve developed a very good structure for pattern recognition and template matching, based on cues ranging from estimating an object’s distance from us to complex feature interpolation and extrapolation. We can see things that machines can’t, whether the problem is the technology or the data format, and a study on whether the brain somehow compresses visual data sheds light on what our brains actually do to match what our eyes see to an abstract or specific object in our minds, highlighting the role of one of our neural pathways.

Whatever you see gets transmitted to the occipital lobe in the back of your brain and analyzed by neurons in an area commonly known as V1. Your V1 neurons don’t actually see any detail or complex features. Their task is to detect contrasts, identify whether objects are moving or static, and stimulate the next visual cortex, V2, which does further filtering and passes more complex patterns identified in the visual stream on to V3. When visual data makes it to the V4 cortex, things get very interesting, because neurons in V4 seem to respond strongly to curves and angles. Basically, one could say that V4 is doing a form of feature extraction before passing off the refined image to cortices that do higher-level template and object matching. And it’s that focus on angles and curves that attracted the attention of a neuroscience lab, which simulated the behavior of V4 neurons with a computer model. Interestingly, they saw that the fewer neurons were trained to respond to the images in the training set, the more strongly those neurons lit up when curves and acute angles appeared in the pictures. The more neurons were stimulated, the more responsive the digital V4 was to flat and shallow outlines. Our V4s are actually compressing incoming visual data, the study concluded. And from what I can tell, this compression seems to be what lets V4 neurons perform key feature extraction and enables higher-level visual data processing in the next visual cortices.
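To make that compression idea a bit more concrete, here’s a minimal Python sketch of the kind of sparse coding the study describes: an input gets represented by a handful of strongly responding units rather than by many weakly responding ones. The dictionary of units, the image patch, and the sparsity level below are all made up for illustration and aren’t taken from the paper’s actual model.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "V4-like" dictionary: each row is a unit tuned to some pattern.
n_units, patch_size = 64, 16 * 16
units = rng.standard_normal((n_units, patch_size))
units /= np.linalg.norm(units, axis=1, keepdims=True)

def sparse_response(patch, k=5):
    # Keep only the k most strongly responding units; silence the rest.
    responses = units @ patch                    # how well each unit matches the patch
    keep = np.argsort(np.abs(responses))[-k:]    # indices of the top-k responders
    coded = np.zeros_like(responses)
    coded[keep] = responses[keep]
    return coded

patch = rng.standard_normal(patch_size)          # stand-in for an image patch
code = sparse_response(patch)
print(np.count_nonzero(code), "of", n_units, "units active")

The whole patch still gets reconstructed downstream from just those few active units, which is the sense in which a sparse code acts like compression.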

Here’s why I’m drawing that conclusion. One of the standard approaches to image recognition with 2D data is to employ outline extraction algorithms. I’ve mentioned them before, and when given a good quality image, they’re very effective at finding usable object shapes. Their results are then fed into algorithms that identify key features, or that match the outlines against masks quantifying proportions and dimensions stored in a database. Today we generally deploy genetic algorithms to build those associations, while in the past, expert systems were the more common approach. Our brains don’t necessarily train like that, but they do base object identification on outlines and basic shapes. Strip all the detail from a human figure and you still know you’re looking at a representation of a human, because it has the right proportions, an outline you can pick out by its position and curvature against a background with enough contrast to make it identifiable. So when your V4 lights up as acute angles and curves start to show up, it can stimulate neurons that respond to objects with those angles and curves, narrowing down the possible identities of the objects you’re seeing. This means feature extraction algorithms are actually on the right path; their task is just made harder by working with flat images. But that’s still an ongoing area of research and gets really technical and system-specific, so I’m going to leave it at that for now.
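For what it’s worth, here’s a rough sketch of that outline-matching pipeline in Python with OpenCV: pull an object’s outline out of a flat image, then score it against a stored template outline. The file names and Canny thresholds are placeholders I made up, and it assumes OpenCV 4.x, so treat it as an illustration rather than a working recognizer.

import cv2

def largest_outline(path):
    # Return the biggest external contour found in a grayscale image.
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(img, 100, 200)             # contrast-based edge map
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)

scene = largest_outline("scene.png")             # hypothetical input image
template = largest_outline("human_mask.png")     # hypothetical stored outline mask

# Lower score means the two outlines are closer in shape.
score = cv2.matchShapes(scene, template, cv2.CONTOURS_MATCH_I1, 0.0)
print("shape distance:", score)

Real systems then layer the feature identification and proportion checks mentioned above on top of a score like this; the sketch only covers the outline extraction and matching step.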

So does this model of V4’s visual information handling take us one step closer to giving a future computer system the ability to see like we do? Sadly, no. It just elaborates on a piece of the puzzle we found a long time ago. The big problem is that brains use a lot of neurons to process and filter data, taking advantage of countless evolutionary refinements over countless generations. We’re trying to do the same in systems that weren’t built to handle information the way neurons do, or really to respond like they do, because exactly how neurons store information and retrieve it when stimulated is still rather fuzzy to us. So when building a visual recognition algorithm, we’re trying to replicate processes we don’t understand at the level of detail needed to truly replicate them, and many of the advances we make in this area come from applying statistics and the computer’s brute computational force to arrive at an answer. And if you flip your training images on their sides or look at them from a different angle, you have to do that computation all over again to get a proper response from your system. As noted above, when we go up against machines in the realm of visual recognition, we’re cheating, putting eons of ongoing evolution and hundreds of millions of neurons against decades of so far incomplete research and probabilistic trial and error…
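As a toy illustration of that last point, here are a few lines of Python showing how a naive pixel-by-pixel match that scores perfectly against an image falls apart when the very same image is simply turned on its side. The data here is random noise standing in for whatever your training images happen to be.

import numpy as np

rng = np.random.default_rng(1)
template = rng.random((32, 32))          # stand-in for a learned template
rotated = np.rot90(template)             # the "same" picture, on its side

def pixel_match(a, b):
    # Normalized correlation between two equally sized images.
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

print("match with itself:  ", pixel_match(template, template))   # ~1.0
print("match with rotation:", pixel_match(template, rotated))    # ~0.0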

See: Carlson, E., Rasquinha, R., Zhang, K., & Connor, C. (2011). A sparse object coding scheme in area V4. Current Biology. DOI: 10.1016/j.cub.2011.01.013

  • ajollynerd

    I see the point you’re making, but just because we can’t get computers to see exactly like we do doesn’t mean we can’t get them to see in ways that are analogous. There is more than one way to separate a feline from its pelt.

    For a completely biological example, look at the different ways in which humans have acclimatized to living at extreme (over 13,000 ft) altitude. In Tibet, people carry a gene variant that improves the absorption and metabolism of oxygen. In South America, people’s bodies simply produce more red blood cells. Two different methods for achieving the same result.

    I think we can use algorithms and raw computational power to do something analogous to what our brains do with neurones (though I can’t say exactly how, as I’m not a computer scientist), and that the only real barriers at this point are time and money.

  • Greg Fish

    “just because we can’t get computers to see exactly like we do doesn’t mean we can’t get them to see in ways that are analogous.”

    No, it doesn’t, which is why this wasn’t the argument I was making. My point was that a human brain is much better equipped for image recognition than a computer, and that our brains have been fine-tuned by evolution to do things we’re just now starting to understand. And to do even something analogous, we still need at least some idea of how we do it ourselves. Like I said, how we can train computers to see tends to be a very, very technical and elaborate debate, especially when talking about real-time, frame-by-frame visual processing.

  • MutantBuzzard

    won’t last, eventually technology will prove to be superior in all ways, humans are making themselves obsolete