why your dna is nothing like a database

Your DNA contains useful information that's read and translated, but comparing it to a computer database profoundly misses the point.

Greg Fish
Oct 21 2009

How many times have you heard DNA being compared to a computer database? How many creationists took this metaphor to heart and started wandering off into statements about how the information contained in our genome must have been engineered to mimic the back ends of complex software systems? And how many science classes still make this analogy? On the surface, it seems more or less accurate, since DNA stores a great deal of hereditary information that’s read and executed by cellular organelles. But when we actually take a deeper look into this concept, the similarities quickly begin to break down, which makes perfect sense since we’re trying to compare a dynamic, evolving system to a static tool created to perform a limited set of tasks.

One of the first problems we would encounter when treating our genome like a database would be the way it encodes amino acids. As we know from basic biology, the bases along the DNA strands form a triplet code that is transcribed into mRNA (or messenger RNA), which carries the message to ribosomes to assemble the dictated proteins. However, both DNA and mRNA have only four nucleotides doing the encoding, which gives 4 × 4 × 4 = 64 possible triplets for only 20 amino acids and a stop signal. That means the same amino acid is often specified by several different triplet codons. And indeed, the breakdown of what each codon means is full of redundancies. It’s like each entry in a database having several unique keys, something most modern database development tools won’t even let you do. Without a truly unique identifier for each piece of data, database structures quickly break down and you’re going to have a whole lot of problems retrieving and working with their contents.
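To make the arithmetic concrete, here is a minimal Python sketch that builds the standard genetic code and counts how many codons point at each amino acid. The variable names are my own, and the 64-letter string is just a compact way of writing out the standard codon table in U, C, A, G order, with `*` standing in for a stop signal.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order
# (third base changes fastest). "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every triplet such as "GGU" to its one-letter amino acid code.
CODON_TABLE = {
    "".join(triplet): amino
    for triplet, amino in zip(product(BASES, repeat=3), AMINO)
}

# How many different codons specify each amino acid (or stop)?
codons_per_amino_acid = Counter(CODON_TABLE.values())

if __name__ == "__main__":
    print(len(CODON_TABLE), "codons in total")
    for amino, count in sorted(codons_per_amino_acid.items()):
        print(amino, count)
```

In database terms, the codon is the key and the amino acid is the value, and most values show up under more than one key.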

Sure, you could trick the database into successfully associating one piece of data with several identifiers, but it’s a very sloppy and bug-prone way to organize anything. When I worked on a little app that allowed high school biology students to enter a DNA or an mRNA triplet and see the detailed chemical structure of the amino acid that it encoded, I had to trick the computer into realizing that GGU, GGC, GGA and GGG could all mean glycine by storing all the amino acids and all the codons separately, then pointing them to each other in the code. So every time I’ve ever heard creationists talk about DNA being perfectly designed, my mind just flashes back to the challenges I had with designing that little app. If someone designed DNA to work like a blueprint, it’s a very messy and jury-rigged design at best. It would’ve been far more efficient to make each codon specify only one amino acid. All you’d need would be 20 combinations for a one-to-one match.
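The workaround described above can be sketched in a few lines of Python. This is not the original app, just an illustration of the idea: the amino acids live in one structure, the codons in another, and each codon points at an amino acid. The names `AMINO_ACIDS`, `CODON_TO_AMINO` and `lookup` are hypothetical, only two amino acids are filled in, and I am assuming a DNA triplet is entered as the coding strand so that T simply becomes U.

```python
# Amino acids stored once, keyed by their three-letter abbreviation.
AMINO_ACIDS = {
    "Gly": {"name": "glycine", "formula": "C2H5NO2"},
    "Met": {"name": "methionine", "formula": "C5H11NO2S"},
}

# Codons stored separately; several codons can point at the same entry.
CODON_TO_AMINO = {
    "GGU": "Gly",
    "GGC": "Gly",
    "GGA": "Gly",
    "GGG": "Gly",
    "AUG": "Met",
}


def lookup(triplet):
    """Return the amino acid record for a DNA or mRNA triplet, or None."""
    # Assumption: DNA input is the coding strand, so T maps straight to U.
    codon = triplet.strip().upper().replace("T", "U")
    key = CODON_TO_AMINO.get(codon)
    return AMINO_ACIDS[key] if key is not None else None


if __name__ == "__main__":
    for triplet in ("GGU", "ggc", "GGA", "GGG", "ATG"):
        print(triplet, "->", lookup(triplet)["name"])
```

The many-to-one lookup works, but nothing about the codon itself tells you which amino acid it belongs to. The mapping has to be spelled out entry by entry.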

But of course there’s a reason why the redundancy of the DNA has been kept by natural selection. Unlike our computer systems, genomes change due to transcription errors, environmental damage and so on. When a computer database starts randomly changing its data on you, it’s called corruption and it renders your piece of software pretty much useless. In fact, just see an IT person’s reaction to the words “data corruption.” It will probably involve cursing, groaning, and if a big deadline is looming overhead, uncontrolled crying or a panic attack. But when we’re talking about DNA, the redundancy in its encoding is one of the things that protects us from potentially harmful side effects of mutations. Even if there are single-nucleotide changes, or “snips,” in the strand, there’s still a fairly good chance that the right amino acid will be encoded. And again, if DNA were well designed and organized, there’d be no need for the redundancy. Mutations would always be corrected so the code could function as intended, not just patched up here and there or countered.
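How good is that chance? A rough way to get a feel for it is to take every codon that encodes an amino acid, swap each of its three bases for each of the other three, and count how often the result still encodes the same amino acid. The sketch below does that for the standard genetic code. It treats every substitution as equally likely, which real mutations are not, so take the result as a ballpark figure rather than a measurement. The function name is my own.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): amino
    for triplet, amino in zip(product(BASES, repeat=3), AMINO)
}


def silent_mutation_counts(table=CODON_TABLE):
    """Count single-base substitutions that leave the amino acid unchanged.

    Returns (silent, total) over all codons that encode an amino acid.
    """
    silent = 0
    total = 0
    for codon, amino in table.items():
        if amino == "*":
            continue  # skip stop codons, they encode no amino acid
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue  # not a change
                mutant = codon[:position] + base + codon[position + 1:]
                total += 1
                if table[mutant] == amino:
                    silent += 1
    return silent, total


if __name__ == "__main__":
    silent, total = silent_mutation_counts()
    print(f"{silent} of {total} single-base changes are silent "
          f"({silent / total:.0%})")
```

Under these simplified assumptions, roughly a quarter of single-base changes leave the amino acid untouched, almost all of them in the third position of the codon. A code with exactly one codon per amino acid would have no such cushion.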

Finally, we have to remember that a good part of our genomes doesn’t actively code for a protein or pass any important information during embryo development. In any database, having data that’s simply an archive and wouldn’t be required for day-to-day function is considered wasteful and an administrator would run routines to get rid of it. Leaner databases mean faster execution. If this archival data suddenly becomes necessary, the developers would simply request that the needed bits be added back and work with them. When we add this to our list of problems with DNA running like some sort of digital blueprint, it seems that our genomes are over-engineered, redundant, inefficient and prone to random changes that would easily cripple any computer system designed to carry out a specific task. But that’s OK because biological systems are formed bottom up, not top down. They evolve for change and propagation, not self-contained data processing or a rather limited data exchange dictated by a strict, inflexible system of rules and regulations. And this is exactly why we should not be comparing the two in books or in science classes.


# science // computers / database / design / dna

by: Greg Fish

Los Angeles-based ex-Soviet computer lobotomist. Specializes in popular science, technology, the web, and conspiracy theories. His work has also appeared in Rantt, BusinessWeek, io9, HowStuffWorks, SEED, RawStory, Science To The People, Le Monde, and Discovery News/Seeker, and he is a weekly contributor on Canadian radio.
