MIT weighs in on how to train your neural network
Computer scientists at MIT are spearheading an effort to make designing and training artificial neural networks a lot faster and more efficient by powering through a paradox in their implementation.
One of the trickiest parts of creating an artificial neural network is trying to figure out exactly how many layers it needs and how many artificial neurons need to be in each layer. Consider the following exercise. Previously, we looked at how convolutional neural networks work when it comes to analyzing images. We know that the outputs would be two artificial neurons that tell us how likely a patient is to have cancer and how likely the patient is to be in the clear. But how many inputs do we need? An image from a CT scan could be thousands of pixels wide and thousands more tall. Do we cut it into 30-pixel squares? 50-pixel? 100-pixel? At what point is the resolution too high to matter? How many hidden and pooling layers do we need, and with how many neurons each? We have to figure it out by trial and error.
If we have plenty of computing power, we’ll probably want to start with a large neural network with very high input resolution and a bunch of layers. This increases the odds that we’ll be able to quickly make the right connections to train the network correctly, though quickly is a relative term. Bigger neural networks take a lot more time to train because each neuron in a particular layer has to process the input from every neuron in the previous layer. If we take a 9,000 pixel by 9,000 pixel image and feed every pixel into the network, we’ll have 81,000,000 inputs to crunch through, or three times that if each RGB channel gets its own input. Cut it into 30-pixel squares and the first hidden layer will have just 90,000 inputs to process. It’s very likely that both setups will arrive at similar accuracy, but the first will take vastly more time to train due to its immense size.
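The arithmetic here can be checked in a few lines of Python; the figures come straight from the 9,000 by 9,000 pixel example above:

```python
# Sketch: how image size and patch size change the input count,
# using the 9,000 x 9,000 pixel CT scan example from the text.

width = height = 9_000

# Feeding every pixel in directly: one input per pixel,
# or three per pixel if each RGB channel gets its own input.
per_pixel_inputs = width * height            # 81,000,000
per_channel_inputs = per_pixel_inputs * 3    # 243,000,000

# Cutting the image into 30-pixel squares instead: one input per square.
patch = 30
patched_inputs = (width // patch) * (height // patch)  # 90,000

print(per_pixel_inputs, per_channel_inputs, patched_inputs)
```

Going from 81 million raw pixels to 90,000 patches is a 900-fold reduction in inputs, which is why the patch size question matters so much.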
And that’s the paradox of designing neural networks. The smaller we make the neural network, the faster it will train and load. The larger we make it, the more likely it is to be accurate. One of the easiest ways to address this is to create a massive neural net, then prune it after we’ve figured out the right size to retain the target accuracy. But that comes with the downside of a long and complicated training and optimization process, which can get really expensive if you have to use a lot of cloud computing power. There may be another way, however, according to a new paper from MIT. What if we created big neural networks to crunch through certain types of data sets, optimized them for size and performance, then created a database of guidelines defining the best size for neural networks meant to tackle certain kinds of problems?
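The train-big-then-prune approach can be sketched with simple magnitude pruning, which zeroes out a trained layer's smallest weights and keeps the rest. This is a minimal illustration on a random matrix with made-up sizes, not the method from the MIT paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one trained layer's weight matrix.
weights = rng.normal(size=(256, 128))

def magnitude_prune(w, fraction):
    """Zero out the smallest-magnitude weights, keeping roughly (1 - fraction)."""
    threshold = np.quantile(np.abs(w), fraction)
    mask = np.abs(w) >= threshold
    return w * mask, mask

pruned, mask = magnitude_prune(weights, fraction=0.8)

# About 80% of the weights are now zero; in sparse storage they cost nothing.
print(f"kept {mask.mean():.0%} of weights")
```

In practice the network is usually fine-tuned again after pruning to recover any lost accuracy, which is exactly the long, expensive loop the paragraph describes.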
A thousand-fold reduction like the one in our example is very, very unlikely, but in initial tests, the researchers saw up to a 20% decrease in neural network size. While that doesn’t seem like much, remember that every additional neuron has to process every neuron in the layer preceding it, and every neuron in the next layer has to include it in its computations, so the computing cycle savings from a few fewer neurons here and there over tens of thousands of training iterations really add up. Of course, these guidelines will be just that: guidelines. They’re not going to design a neural network for you; you’ll have to do that based on the problem you want to solve. But they will give you a decent idea of how large it would need to be for a good chance at achieving an acceptable accuracy without wasting a whole lot of computing power or relying on the digital muscle of expensive cloud platforms.
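To see why even a 20% size reduction compounds, count the weights in a fully connected network before and after shrinking its hidden layers by 20%. The layer sizes below are illustrative, not figures from the paper:

```python
def weight_count(layer_sizes):
    """Total weights in a fully connected net: each neuron connects
    to every neuron in the previous layer."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

full = [90_000, 1_000, 1_000, 2]  # input patches, two hidden layers, output
slim = [90_000, 800, 800, 2]      # same net with hidden layers 20% smaller

before = weight_count(full)
after = weight_count(slim)
print(f"{1 - after / before:.1%} fewer weights to update per pass")
```

Every one of those eliminated weights is skipped on every forward and backward pass, so the savings repeat on each of the tens of thousands of training iterations.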
The hope is that with a large enough repository of these guidelines, we could deploy AI across more devices and use it to tackle more problems, solving business intelligence tasks currently handled by top-tier virtual machines in the cloud on ordinary laptops and smartphones. In addition, if you don’t need to upload your data into the cloud to train your AI, it’s more secure, and you have a greater degree of control over how your security policy is implemented. It’s also an effort that makes a lot of sense right now. Since we’ve more or less standardized the libraries used to build, train, and run neural networks, coders and software architects are focused on platforms that make sharing and leveraging those networks easier. And having proven guidelines for designing them better in the first place will give them a huge advantage going forward.