when politicians and technology don’t mix
Something new, odd, and rather exciting is starting to happen in the world of defense and intelligence. Rather than classifying everything, as per usual, right down to the pencil chewed by a programmer working on a subroutine for a defense agency, Navy-backed labs and the NSA are starting to open-source the code for their upcoming projects. And the NSA’s project is something very big and very useful, based on code used by giant search engines and other companies which have to process vast amounts of data quickly. As a massive facility intended to intercept and store vast amounts of potentially actionable intelligence is being built in Utah, the intelligence agency is trying to figure out how to efficiently retrieve this data for later use.
If the search queries take too long or they can’t find a way to accurately find the data they need, they may as well do the taxpayers a favor, shut down construction of their data farm, and give up on electronic snooping. But we already have companies dealing with petabytes of barely structured data and they’ve published a few white papers on the subject, with the most influential of these papers coming from Google and detailing the design of their custom BigTable database. It’s huge, efficient, and perfect for massive data warehouses.
There are several projects that were spawned from this white paper, the most recognizable one (well, if you happen to be a DBA or a software architect) being Hadoop, but for the NSA none of them provided what they’d need to follow the regulations for storing classified information. Its solution? The Apache Accumulo project, a new implementation of BigTable which secures each cell of data based on its access level. I’d assume access would be governed by the clearance level of the individual trying to read the data. This bakes security into some of the lowest levels of their applications, rather than relying on a gateway which can be compromised, leaving the rest of the data at the intruder’s mercy.
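To make the idea a bit more concrete, here is a toy sketch in Python of what per-cell access checks could look like. It is not Accumulo's actual API; the names `Cell`, `ToyTable`, `required`, and the sample tokens are all made up for illustration, and the label is simplified to "the reader must hold every listed token".

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: str
    # Hypothetical label: every token listed here must be held by the reader.
    required: frozenset = frozenset()


@dataclass
class ToyTable:
    cells: list = field(default_factory=list)

    def put(self, row, column, value, required=()):
        self.cells.append(Cell(row, column, value, frozenset(required)))

    def scan(self, authorizations):
        """Return only the cells the caller is cleared to see."""
        held = frozenset(authorizations)
        # The check runs per cell inside the read path, so there is no
        # single outer gateway whose compromise exposes everything.
        return [c for c in self.cells if c.required <= held]


table = ToyTable()
table.put("report-1", "summary", "weather delayed the exercise")
table.put("report-1", "source", "field asset 7", required={"secret"})
table.put("report-1", "method", "intercept", required={"secret", "sigint"})

public_view = [c.column for c in table.scan([])]
secret_view = [c.column for c in table.scan(["secret"])]
full_view = [c.column for c in table.scan(["secret", "sigint"])]
```

The point of the design is that the same scan returns a different slice of the same row depending on who is asking, without a separate filtering layer sitting in front of the database.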
Problem solved, terrific new tool created and handed to the public via the Apache Foundation, and we can all go back to worrying exactly how all of this electronic snooping that the NSA wants to do will get done, right? Wrong, say the politicians. In response to a release of Accumulo, certain lawmakers declared that this effort runs afoul of a law requiring government agencies to buy software when commercial alternatives exist instead of developing their own tools. In this case, the lawmakers want the NSA to use other open-source projects, which they think could do the same thing as Accumulo if the agency added its custom code to the already existing codebase.
Now, it’s odd to see a common architectural debate playing out in a defense appropriations bill, but here it is, politicians and programmers arguing about how to build a database. Personally, I would side with the programmers because the changes Accumulo requires go to the very linchpin of the addressing system and redefine how the system handles data storage and retrieval. Considering that after this sort of change every level of the code would have to be modified, it’s actually much faster and easier to write a custom implementation.
What politicians are asking the NSA to do is comparable to buying an old house only to replace its foundations, redo its floor plan, and renovate the kitchens and bathrooms, leaving the outside untouched, instead of just building a new house to fit its needs. Eager to follow the letter of the law rather than its spirit, lawmakers are trying to play architect rather than accept that the agency under their authority hired an able team of experts who know what they’re doing. The question is, why are powerful lawmakers getting into such low-level trivialities as what database system an intel agency uses to work with its data? Aren’t there problems that need a lot more legal attention than that? Isn’t oversight supposed to be about strategic issues instead of diving into problems most organizations usually solve at the middle management level?