darpa: we need a formula for taming chaos

DARPA wants machines that can look at video and not just see what's in the frame, but understand it.
[ illustration: EVA with a Rubik's cube ]

When the military’s most ambitious research and development arm, DARPA, asks scientists and researchers to come up with something, you just know it’s bound to be a brow-raising project. Its latest solicitation for an upcoming data management system seems simple enough at first glance, looking for ways to identify what deserves human attention in a vast stream of video feeds and sensor data, organizing and reducing a big pipeline of noise into a small stream of potentially important information. But they’re not interested in plain, old-fashioned filtering algorithms that apply probabilistic models to clean up noise and Gaussian blur. No, that’s too easy for DARPA’s taskmasters. Instead, they’re asking for a mathematical formalism of stochasticity. Or in plain English: find a mathematical order to chaos. As those who reported on the solicitation rightfully note, an algorithm that can filter out random background filler in images and focus on objects of interest wouldn’t just be a really useful tool for the military and any intelligence service, but a quantum leap in AI as we know it.
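Just so we’re clear on what DARPA is waving off, here’s a minimal sketch of that “plain, old-fashioned” filtering: smoothing pixel-level noise by convolving an image with a small Gaussian kernel. This is my own illustration, not anything from the solicitation, and the image array is hypothetical.

```python
# Sketch of ordinary Gaussian smoothing: a fixed 3x3 kernel averages each
# pixel with its neighbors, blurring away high-frequency noise.
import numpy as np

def gaussian_blur_3x3(image: np.ndarray) -> np.ndarray:
    """Smooth a 2D grayscale image with a fixed 3x3 Gaussian kernel."""
    kernel = np.array([[1, 2, 1],
                       [2, 4, 2],
                       [1, 2, 1]], dtype=float) / 16.0
    padded = np.pad(image.astype(float), 1, mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + image.shape[0],
                                           dx:dx + image.shape[1]]
    return out

# Usage (hypothetical frame): blurred = gaussian_blur_3x3(noisy_frame)
```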

Some time ago, I wrote about a few of the problems with today’s image recognition algorithms, focusing on the actual act of recognition itself. What I decided to leave out was that most vision systems are actually trying to throttle the data coming from the sensor. For example, let’s say I’m working on an algorithm to get a rough idea of how to classify an object I scanned in with a LIDAR. Along with the object, I’ll capture other things in the frame: the desk it may be sitting on, random reflections from dust in the air or the walls, and so on. For computers today, that scan is just a collection of numbers which a program can convert into XYZ coordinates based on a rough idea of the order in which those numbers are supposed to be arranged in the data stream. And because all I’ll get is a collection of numbers, I now need to write some routines to clean up the data and get rid of all the random reflections from particulates in the air, walls, tables, and so on. Only then can I even start working with my object, because only then do I know I’m actually looking at the data for it. But if the computer worked like my brain, all the stuff I had to remove would’ve been discarded from the stream and I wouldn’t have to write a hundred lines of code just to clean things up a little. That ability, to look at the background and decide that it’s not important to the search, is what DARPA really wants, and it’s an enormous challenge because we don’t really know how we do it, only that we evolved with a brain that’s extremely good at complex interpolation.
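To give you an idea of what that manual cleanup looks like in practice, here’s a rough sketch. It assumes the scanner hands back a flat stream of floats in x, y, z order, and the range and table-height cutoffs are numbers I made up purely for illustration; the point is that every one of these heuristics has to be written by hand.

```python
# Sketch of hand-rolled LIDAR cleanup: reshape the raw number stream into
# points, then throw away anything that looks like background.
import numpy as np

def parse_scan(stream: list[float]) -> np.ndarray:
    """Reshape a flat number stream into an (N, 3) array of XYZ points."""
    return np.asarray(stream, dtype=float).reshape(-1, 3)

def strip_background(points: np.ndarray,
                     max_range: float = 2.0,
                     table_z: float = 0.02) -> np.ndarray:
    """Keep points close enough to be the object and above the table plane;
    walls, dust specks, and the desk itself get dropped."""
    dist = np.linalg.norm(points, axis=1)
    keep = (dist < max_range) & (points[:, 2] > table_z)
    return points[keep]

# Usage (hypothetical stream): object_points = strip_background(parse_scan(raw_numbers))
```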

Countless experiments have shown that our brains work by hints and inference more than anything else, and that so-called photographic memory is extremely rare and usually seen in savants. Evolutionarily, it’s much better to separate random background noise from what’s really important because you can spot predators and accidents about to happen faster, make a quicker decision, and avoid becoming lunch or a bloody splotch on a hard surface. But how exactly this is done is still an open question. We know that as visual stimuli are being processed, they get more and more refined, and as we recognize shapes and colors, certain neuron clusters activate and channel where the information is supposed to go, so some sort of filtering is involved as we try to make sense of what we see. But how those decisions are made, and how the final image is matched to our conception of it in the cortices of the brain that help handle abstract thought, are still being studied. It may seem that just using artificial neural networks would be a good approach, but that’s far easier said than done. What should be the proper activation threshold for each node when you’re trying to interpret images that are just a row of pixels to a computer? How do you teach it to decide what it’ll keep and what it’ll remove for any and all possible images, considering how many different forms background noise can take?
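To make that threshold question a bit more concrete, here’s what a single artificial neuron looking at one row of pixels boils down to. The weights and the cutoff value below are placeholders of my own; picking them well for every possible image and every kind of background clutter is exactly the open problem.

```python
# Sketch of a single thresholded unit: weigh each pixel, sum the result,
# and "fire" only if the weighted sum clears the threshold.
import numpy as np

def unit_fires(pixel_row: np.ndarray,
               weights: np.ndarray,
               threshold: float = 0.5) -> bool:
    """Return True if the weighted sum of this pixel row exceeds the cutoff."""
    activation = float(np.dot(pixel_row, weights))
    return activation > threshold

# Usage (hypothetical row of 8-bit pixels, uniform weights):
# fired = unit_fires(row / 255.0, np.full(row.size, 1.0 / row.size))
```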

You could start by applying an edge detection algorithm (Sobel’s is straightforward and effective, in my humble opinion), but then what if the object is turned at an angle for which you don’t have edge templates? And if you scan every possible angle of an object and start comparing each one to what you get, you’d be looking at a brute-force algorithm that requires multithreading just to run in a reasonable time. And DARPA wouldn’t want that either. It’s very explicit about wanting a revolutionary change in image processing rather than an evolutionary one. I’d bet that the solution to this problem will involve something a lot more complex than just a few formulas, and that DARPA will not get it in three and a half years like it wants. But I certainly wouldn’t rule out that enough researchers competing with each other in academic journals and commercial applications would eventually come up with at least a partial solution, because there’s so much to be gained if we can make computer vision a little bit more human and build robots that actually see rather than merely parse and detect.
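For reference, here’s a bare-bones version of that Sobel step: two fixed kernels estimate the horizontal and vertical intensity gradients, and the gradient magnitude marks the edges. It’s a sketch that assumes a 2D grayscale array, not production code, and it says nothing about the much harder template-matching problem that follows.

```python
# Sketch of Sobel edge detection: gradient magnitude computed from two
# fixed 3x3 kernels, one per axis.
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_edges(image: np.ndarray) -> np.ndarray:
    """Return the gradient magnitude at every interior pixel."""
    img = image.astype(float)
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            window = img[dy:dy + h - 2, dx:dx + w - 2]
            gx += SOBEL_X[dy, dx] * window
            gy += SOBEL_Y[dy, dx] * window
    return np.hypot(gx, gy)

# Usage (hypothetical frame): edges = sobel_edges(frame); mask = edges > edges.mean()
```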

# tech // artificial intelligence / computer science / computer vision / military

