i, for one, welcome our robot scientist overlords
A relatively recent post on Wired Science tries to paint an interesting new picture of the relationship between scientists and computers armed with advanced data parsing and analytical software. This new generation of artificial intelligence systems is supposed to crunch data from millions of scientific papers and form new and interesting hypotheses for future experiments by finding patterns in the material they’re fed. And not only that, but these systems supposedly operate with an inhuman logic that scientists will have to strive to decipher if they want to run the experiments correctly, reversing the usual roles of humans and machines: the computers tell the humans what to do while the organisms try to make sense of their complicated instructions. But honestly, I’m not worried that my research into AI is about to be taken over by a supercomputer, and neither should you give too much weight to the role of new data mining algorithms in scientific ventures for the foreseeable future.
When reading the post, I immediately thought back to a futurist’s prediction of something like this, which called the machines doing the data crunching “a super-intelligence,” and to the problems that idea faced and still faces in the real world. Obviously, with countless papers being published every year, it’s tempting to use a shortcut to keep on top of the many studies one has to read to stay current. However, as Wired duly notes, not all these papers are necessarily fountains of wisdom. Thanks to the many specialty journals and vanity or highly speculative publications, the soaring requirements for how many papers any future scientist should have on his or her CV, and the many new fields and sub-fields in scientific disciplines, we’ve had an explosion of papers. Properly peer-reviewing them all is a tall order, and despite a growing awareness of how today’s publications are letting bad papers slip through the cracks while denying good ones on rather arbitrary measures, a lot of purely speculative and very possibly flawed work makes it through. Feeding a big stack of more or less random papers to an AI system to spit out a potential experiment could easily turn into an interesting case study in how to create a GIGO model: garbage in, garbage out.
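To make that concrete, here’s a toy sketch in Python. Every name, weight, and number in it is invented for illustration, not taken from any real system, but it shows how the paper-ranking step in one of these hypothesis-mining pipelines bakes its programmers’ judgment calls right into the output:

```python
# A minimal sketch (all names and weights hypothetical) of how a paper-ranking
# step encodes its programmers' judgment calls into whatever "insights" come out.

# Toy corpus: (title, journal_prestige, citation_count, is_speculative)
papers = [
    ("solid replication study",   0.4,  12, False),
    ("flashy speculative result", 0.9, 250, True),
    ("niche but careful work",    0.2,   3, False),
]

def score(paper, w_prestige, w_citations, speculation_penalty):
    """Arbitrary quality score: change the weights, change the 'science'."""
    title, prestige, citations, speculative = paper
    s = w_prestige * prestige + w_citations * (citations / 100)
    return s - speculation_penalty if speculative else s

# Two equally defensible weightings...
for weights in [(1.0, 1.0, 0.0), (1.0, 0.1, 2.0)]:
    ranked = sorted(papers, key=lambda p: score(p, *weights), reverse=True)
    print(weights, "->", ranked[0][0])

# ...crown opposite "best" papers, so any hypothesis mined downstream
# inherits whichever bias the programmers happened to encode. GIGO.
```

Two perfectly reasonable-sounding weightings pick opposite winners, and whatever pattern-finding machinery sits downstream of that ranking never knows the difference.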
And when we use computers to find great data or interesting ideas buried in a paper that didn’t get the media coverage it deserved, how those computers treat that paper depends on what the scientists who programmed the data mining machine considered a good paper. Depending on their settings, they could miss a great piece of work, assign way too much importance to papers that almost certainly won’t yield serious breakthroughs, or unwittingly create a hodgepodge of major leads and muddled messes. It’s not that their AIs will sound like cryptic oracles, it’s that they’ll have coughed up a data soup with some potentially questionable and arbitrary ingredients. Knowing full well how the last scenario happens from seeing it on a nearly daily basis, this particular phrasing from Wired’s writers really rubs me the wrong way…
Programmers have turned computers from extraordinarily powerful but fundamentally dumb tools, into tools with smarts. Artificially intelligent programs make sense of data so complex that it defies human analysis. They even come up with hypotheses, the testable questions that drive science, on their own.
The reason computers often generate data that’s complex beyond human analysis is human error. When complicated software follows a lot of fuzzy and potentially self-contradicting rules, the result is a perfectly legible but ultimately very confusing report. It looks like it contains something interesting or important, but in reality it’s just noise created by going overboard on rules and conditional statements in the code, or by being careless in setting up the data sources: connecting to the wrong databases, mapping inputs to the wrong database columns, and so on, and so forth. It’s actually quite fitting that in their explanation of how these supposedly hyper-intelligent computers may sound, Wired referenced oracles. Just like the oracles of Delphi may have breathed fumes that made them hallucinate and attributed the strange visions and sounds to messages from the gods, supposedly brilliant AI systems coming up with very complex and cryptic conclusions after scanning five or six million papers might just be presenting you with the product of very confused and loopy logic given to them by those who wrote their code…
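And if the oracle analogy sounds overwrought, consider how little it takes for a pile of accreted rules to start contradicting itself. Here’s another invented-from-whole-cloth sketch, with made-up thresholds standing in for the fuzzy rules I’m describing:

```python
# A minimal sketch (rules and data invented for illustration) of how a pile of
# fuzzy, overlapping conditionals turns into a legible but meaningless report.

findings = [
    {"effect": 0.8, "p_value": 0.04, "sample_size": 9},
    {"effect": 0.1, "p_value": 0.30, "sample_size": 5000},
]

# Rules accreted over time by different programmers; nobody checked
# whether several of them can fire on the same finding at once.
rules = [
    ("promising lead", lambda f: f["p_value"] < 0.05),
    ("likely noise",   lambda f: f["sample_size"] < 30),
    ("strong effect",  lambda f: f["effect"] > 0.5),
    ("negligible",     lambda f: f["effect"] < 0.2),
    ("well powered",   lambda f: f["sample_size"] > 1000),
]

for finding in findings:
    labels = [name for name, test in rules if test(finding)]
    # The first finding gets tagged "promising lead", "strong effect",
    # AND "likely noise": a perfectly legible oracle saying nothing.
    print(finding, "->", labels)
```

Every label is perfectly legible and every rule probably seemed sensible to whoever added it, but the report as a whole tells you nothing, which is about what you should expect when you ask a very confused program for a scientific revelation.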