whoops, ray manages to do it again…

Ray Kurzweil seems to think that DNA is like software code which can be optimized and compressed. He's in for a rude awakening if he tries to treat it that way.


Ray Kurzweil’s recent prognostications about reverse-engineering the human brain within unrealistically short time frames have been making the rounds on the web, inspiring a web comic parody, a rant from PZ calling him the Deepak Chopra for the tech-minded, and, of course, a rebuttal from yours truly. Now, Ray responds with a clarification, claiming that he’s been taken out of context and that his talk was over-simplified, and trying to offer a more detailed and nuanced version of his predictions. Okay, fair enough. The media does get ahead of itself, and tech blogs like Gizmodo are notorious for tripping over a story before they learn enough of its details or its real-world significance. But the problem with Ray’s response is that even with some caveats and additional details, he’s still making significant mistakes about the brain and how it works, as well as what it takes to accurately simulate an organ this complex, and you bet I’m going to take issue with his arguments again.

Ray’s first correction is that he expects the brain to be reverse-engineered by 2030 rather than 2020, which is really not much of an improvement to those who don’t subscribe to the numerology of exponential technological and evolutionary advancement he passionately advocates. According to him, we accomplish so much more with each passing year and decade that an extra ten years basically gives neurologists and computer scientists enough leeway and then some in his Singularity schedule. But how much more time? What’s the formula being used to measure how much more we’ll accomplish in a given time span compared to the previous one? Ray says those who doubt his predictions don’t understand his exponential curve, and that may be true. After all, it’s his creation and he picked all the arbitrary milestones and did his own calculations. And on top of his attempts to play the tech world’s Nostradamus, he also makes major mistakes about biology like this one…

The amount of information in the genome (after lossless compression, which is feasible because of the massive redundancy in the genome) is about 50 million bytes (down from 800 million bytes in the uncompressed genome). It is true that information in the genome goes through a complex route to create a brain, but the information in the genome constrains the amount of information in the brain prior to the brain’s interaction with its environment.

Let’s see, where do we start with this one? Remember that in my previous post on the subject, I noted that the redundancy is there due to natural selection and that simply filtering out redundancies isn’t going to be of any help in retrieving only the information you need to recreate the instructions for a brain’s growth and bottom-up development. Also, how does Ray get the 800 million byte figure? If you store each nucleotide in your DNA as a 16-bit character, you’ll have to allocate two bytes per nucleobase in memory. So with 6 billion nucleobases, you’re looking at roughly 12 billion bytes, or about 11.2 GB, of data that you’d need to save to a persistent object. That will still be hundreds of megabytes, even after some serious compression. Why do you think experts in genomics are calling for supercomputers to analyze genetic data? It’s not because it’s so easy to parse the immense amount of information. And those “redundancies,” like repeated genes and STRs, are actually rather important to growth and development in ways we still don’t know, because we lack both the biological knowledge and a really good algorithm for parsing nucleotide sequences in an efficient and practical manner.
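
To make the arithmetic above easy to check, here is a minimal sketch that estimates raw storage for the 6 billion nucleobases mentioned in this post under a few encodings. The function names `storage_bytes` and `to_gib` are made up for illustration, and the 8-bit and 2-bit rows are added assumptions for comparison (the addendum at the end of the post comes back to the 2-bit case). Sizes are reported in binary gigabytes of 2^30 bytes.

```python
# Back-of-the-envelope storage estimates for a genome under different encodings.
# The base count is the figure used in the post; the list of encodings is an
# illustrative assumption, not something taken from Ray's response.

DIPLOID_BASES = 6_000_000_000  # nucleobases across both copies of the genome


def storage_bytes(num_bases: int, bits_per_base: int) -> int:
    """Bytes needed for num_bases at bits_per_base, rounded up to a whole byte."""
    total_bits = num_bases * bits_per_base
    return (total_bits + 7) // 8


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to binary gigabytes (2**30 bytes each)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    encodings = [("16-bit char", 16), ("8-bit char", 8), ("2-bit code", 2)]
    for label, bits in encodings:
        size = storage_bytes(DIPLOID_BASES, bits)
        print(f"{label:12s} {size:>15,d} bytes  ({to_gib(size):.2f} GiB)")
```

None of this says anything about how much of that data is biologically meaningful. It only shows how strongly the raw file size depends on the encoding you pick.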

With that in mind, let’s move on to Ray’s bizarre claim that the genome limits the amount of information in the brain prior to it actually growing and developing. He keeps using the word information, but I wonder what he’s actually talking about when he does. In computer science, information is anything to be stored or computed in some way, shape or form. That information comes from a database, or user input, or one of the processes of the program with which we’re working. But in biology, the genome lays out key stages of how the brain may form based on environmental cues and genetic triggers. And those triggers and cues begin as soon as cells start to divide into a new embryo. Where’s the limit of the information? What do you gain by knowing a string of nucleobases and the amino acids they encode when it isn’t actually a strict blueprint in the same sense we’d expect from a program? And what information could be in the brain when that brain hasn’t even started to form and create connections? Those connections are what make our brains what they are, and they should be the focus, instead of how we’re going to take a shortcut in our efforts to reverse-engineer them by ignoring their redundancies…

For example, the cerebellum contains a module of four types of neurons. That module is repeated about ten billion times. The cortex, a region that only mammals have and that is responsible for our ability to think symbolically and in hierarchies of ideas, also has massive redundancy. It has a basic pattern-recognition module that is considerably more complex than the repeated module in the cerebellum, but that cortex module is repeated about a billion times. There is also information in the interconnections, but there is massive redundancy in the connection pattern as well.

You know, for someone who says he studied this topic for four decades and is supposed to be up on virtually everything new in neurology and computer science, it’s pretty amazing that Ray is suggesting that all of those redundant connections could be ignored to get the structure of the brain and derive how it works on a neuron by neuron level. See, those redundancies are associated with higher cognition, and every living thing has a certain degree of redundancy as dictated by evolutionary processes. So what Ray is suggesting here is a terrific plan not to study what actually allows us to develop self-awareness and intellect, which we could really only get by studying the entire growth and development of the brain from day one. There are no shortcuts here and if Ray actually took the time to follow what neurologists and biologists say about these redundancies and how important their linking seems to be for high-level and complex brain functions, he would know that. Along the way, he also would’ve realized the true scope of the challenge. But then again, he really wants to become immortal so facing our limitations would also mean facing his fears. And he’s just not ready to do that.

addendum 08.24.2010: Okay, so it looks like I missed that Ray was thinking of a two bit data type for the base pairs, which really would yield an appreciably smaller file. That said, his reasoning behind the million lines of code it would take to simulate a human brain is still wrong (please see comments for elaboration as to why), and considering that we would only capture a sequence of amino acids, we would still need far more data. In fact, as biologists responding to Kurzweil have pointed out, you can’t derive a brain from the genome, and we can’t even derive complete microstructures from proteins yet because of all the complex interactions we have to take into account, interactions that depend on the development and the environment of the organism rather than its genome. Some of Ray’s defenders say that he’s not actually proposing to derive the brain from protein sequences, but if that’s the case, why even bother with DNA at all? Those amino acid sequences determine which proteins are made and how, so it’s how these proteins interact that’s of vital importance if you want to understand how a brain is built during development, not just which amino acids are being encoded with no additional context.
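
For the curious, a two bit data type for bases could look roughly like the sketch below. The A/C/G/T to 0..3 mapping and the helper names `pack` and `unpack` are arbitrary choices for illustration, not anything Ray specified, and the sketch ignores ambiguity codes such as N that real sequence files contain. The 3.2 billion base count near the end is an assumed round number for a single copy of the human genome, not a figure from this post.

```python
# Minimal 2-bit packing of a DNA string (four bases per byte).
# The mapping below is an arbitrary, hypothetical choice made for this sketch.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a string of A/C/G/T into 2 bits per base, lowest bits first."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from data produced by pack()."""
    return "".join(
        BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


# Assumed size of one genome copy; a rough round number, not from the post.
HAPLOID_BASES = 3_200_000_000
HAPLOID_BYTES = HAPLOID_BASES * 2 // 8  # 2 bits per base, 8 bits per byte

if __name__ == "__main__":
    packed = pack("GATTACA")
    print(len(packed), unpack(packed, 7))
    print(f"{HAPLOID_BYTES:,d} bytes")
```

Under those assumptions the raw size lands at about 800 million bytes, which would be consistent with the uncompressed figure quoted above. It still only captures the sequence itself, not how the resulting proteins interact.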

[ story first spotted on io9 ]


