that’s not what we meant by computer scientist
A while ago, there was some buzz on the pop sci circuit about scientists using machines to keep up with the constantly growing body of published papers, and about whether those machines could ever qualify as actual scientists proposing hypotheses of their own. Now there seems to be an affirmative answer. A robot-aided investigation into yeast growth rates led to a discovery, and the researchers running the project credit a machine called Adam with finding how to make its specimens grow faster after updating its knowledge base with its previous findings about the genetics of yeast.
And like proud parents, the researchers wrote a paper about how well Adam performed, while arguing that the language of scientific research should be formalized and standardized in such a way that future scientific machinery could use existing research to hypothesize on the basis of the most current information in the realm it’s designed to tackle. Does this mean we should shed a tear for future lab assistants who will no longer be needed after all research is done by an obliging AI with a smiley face, happily plowing its way through oceans of data without making a peep about its tiny stipend?
As you keep reading on this blog, machines are great tools for very specific tasks and when you know what it is you want computers to do, you can outsource whatever heavy lifting, repetitive actions, and data crunching you want to a computer. While what Adam did was certainly a great use of a computer in a scientific setting, it was working on a highly specific problem. It measured growth in yeast and mapped its genomic data for any links between the organism’s chemistry and development. That’s a very important part here. It didn’t just jump off a desktop one day and declare that it wants to dedicate the lifetime of its circuitry to study yeast.
Someone had to program it to do its job, to account for its findings, and then to analyze whether what it finds during its experiments has any significance. And when talking about its experiments, what we actually mean is that it’s running through its dataset and looking for any results it decides are significant. In some ways, it’s actually a perfect scientist because it coldly and dispassionately evaluates its work through a lens of pure logic and statistics. It’s not subject to internal biases, it doesn’t want to please its handlers by delivering the result they want, and it’s not about to start fudging its own numbers just to get published.
However, how well computers conduct scientific experiments depends on how well they were coded, and humans can always build in biases and preferences for certain types of results. It wouldn’t be much of a shock were a homeopath’s supercomputer to suddenly run simulations in which water has memory of only medicine, since that’s what it would be programmed to find, and were those findings then presented as ironclad because hey, a computer did them and a computer really doesn’t care which way the numbers broke down. The human in charge of its programming, however, most certainly does. So while we could trust machines with some types of experiments where the results come down to measuring statistical significance between different batches and setups, we couldn’t really trust them with exploratory, vague questions.
For one thing, computers do not handle ambiguity well and ambiguous instructions will almost inevitably result in errors and crashes. And any kind of abstract, curiosity-driven tinkering is going to be impossible to program because without a goal or an elaborate set of steps to follow, computers will simply sit there and cycle through their background noise. It’s really more of an excess mental bandwidth thing to be curious and in my opinion, we may just have evolved curiosity in the absence of any other way to learn. Were we spoon-fed knowledge from the instant of our birth, maybe we would lack any drive or motivation to learn more and ask questions.
And that brings us back to the paper written by Adam’s custodians on the subject of formalizing scientific and experimental data into a language that can be broken down as metadata for research computers and future scientists who want to follow in the footsteps of previous promising research. By standardizing how we write about our research, they argue, it will be easier to follow everything we’re doing and to recreate our work from scratch in another lab. OK, that’s a noble idea, but the problem is that we already have a standard format for how to present our work to the outside world. It’s known as the research paper, and it has certain sections we have to fill out in certain ways, detailing what we did, how we did it, what the results were, and how everything broke down when the experiment was over.
What the researchers choose to share and how they choose to share it depends on their style, their audience, and how much they think they need to explain themselves. It seems to me that the authors were so happy with the vast logs Adam generated while running all the experiments, they thought that every scientist should make such detailed logs an integral part of their work. And they’re right that yes, they absolutely should. But that’s been the guideline for many decades now and we still have those who won’t follow it because they don’t want to write a 500 page paper outlining every little thing they did.
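For a rough sense of what a formalized, machine-readable experiment record could look like, here is a minimal Python sketch. It is not the format proposed in the paper. The `ExperimentRecord` class, every field name, and all of the values are hypothetical placeholders invented for illustration.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class ExperimentRecord:
    # Hypothetical schema: these field names are assumptions,
    # not the formalization described by Adam's authors.
    hypothesis: str
    organism: str
    method: str
    measurements: list = field(default_factory=list)
    conclusion: str = ""


# Placeholder values, invented purely to show the shape of a record.
record = ExperimentRecord(
    hypothesis="Removing a given gene slows growth",
    organism="yeast",
    method="growth curve measured by optical density",
    measurements=[0.10, 0.18, 0.31],
    conclusion="not yet evaluated",
)

# Serialize to JSON so another lab (or another machine) can read it back.
serialized = json.dumps(asdict(record), indent=2)
restored = ExperimentRecord(**json.loads(serialized))
print(serialized)
```

The record survives a round trip through JSON unchanged, which is the easy part. Getting every lab to fill in every field honestly and completely is the part no schema can enforce.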
Finally, there’s one last issue with robotic researchers that I have to mention again, an issue that would be a constant nuisance were we to adopt the authors’ suggestion in some way, shape, or form, and present all the metadata from thousands of experiments to machines conducting research. Humans have skepticism. They know that peer review can fail or be circumvented and that terrible papers will be published, and they are constantly on guard against building a garbage in, garbage out (GIGO) model. Computers have no such sense of skepticism and aren’t all that good at detecting the subtle manipulation used to fluff up insignificant results or present an old idea as novel by rewording it.
They’re going to assume that every bit of data they’re fed is valid and produce research that’s not going to hold up because it’s based on bad studies. Without careful human curators, machine scientists could become a giant exercise in GIGO research. And even worse, because we would be dealing with many, many gigabytes of data, if not several terabytes, at a time, the mistakes will be very difficult to detect and take years to accurately parse. This is why we shouldn’t rush to make machines our new lab assistants, but treat them as what they are: helpful tools which can get it wrong. We should also be very aware that overreliance on computers won’t make it any easier to make some huge breakthrough. In fact, it can easily lead you astray.
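To make the GIGO worry concrete, here is a toy Python sketch of a machine that pools reported effects from a handful of studies. Everything in it is an assumption made up for illustration: the study records, the effect sizes, the `curator_ok` flag, and the `pooled_effect` helper. It is not how Adam or any real system works.

```python
from statistics import mean

# Hypothetical study records; every number here is invented for illustration.
studies = [
    {"id": "A", "effect": 0.02, "curator_ok": True},
    {"id": "B", "effect": -0.01, "curator_ok": True},
    {"id": "C", "effect": 0.03, "curator_ok": True},
    # A flawed study that slipped through peer review.
    {"id": "D", "effect": 0.90, "curator_ok": False},
]


def pooled_effect(records, trust_everything):
    """Average the reported effects, optionally skipping studies
    that a human curator has rejected."""
    used = [r["effect"] for r in records if trust_everything or r["curator_ok"]]
    return mean(used)


# A machine with no skepticism treats every study as valid.
naive = pooled_effect(studies, trust_everything=True)
# A human curator has flagged the bad study, so it is left out.
curated = pooled_effect(studies, trust_everything=False)

print(f"naive: {naive:.3f}, curated: {curated:.3f}")
```

One bad study out of four is enough to turn a negligible pooled effect into one that looks impressive, and with four records the culprit is easy to spot. With terabytes of them, it isn’t.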
See: King, R., Liakata, M., Lu, C., Oliver, S., & Soldatova, L. (2011). On the formalization and reuse of scientific research. Journal of The Royal Society Interface. DOI: 10.1098/rsif.2011.0029