do we need github to save our code forever? or should we rethink how we code?

Code storage and sharing platform GitHub wants to archive open source software for up to 10,000 years. But is that a good idea, and could we use its momentum to rethink how and why we code?

by: Greg Fish on 01.08.2020

Some of the most interesting discoveries about our past were made from examining things our ancestors threw away and forgot about while trying to preserve their temples, libraries, art, and monuments. This is not to mention that parsing through what has been forgotten or simply stashed out of sight allows us to understand our past for what it really was rather than only for what the people of the times wanted to be remembered. Maybe it’s with this in mind that one of the biggest computer code management services on the planet, GitHub, wants to preserve its contents as of 02/02/2020 for as long as the next ten millennia across various storage media, stating that since our world is powered by software, it only makes sense that we treat code as heritage we want to save for posterity.

However, to someone who has written a lot of code used by government agencies and private industry, this seems like a very grandiose way to treat the contents of GitHub. First of all, let’s talk about the numbers. There are over 100 million projects on the platform written in more than 330 languages. Some 27 million of them are public, and the vast majority of those are active and meet the criteria for inclusion in the archive. It’s an absolutely overwhelming amount of code ranging from simple hacks for text formatting in a particular scripting language to wildly popular and widely used packages to build and train artificial neural networks. And nearly all of those active projects are under constant refinement, meaning that GitHub’s deep time archives will be updated on a constant basis across various storage media and phases.

And as if that wasn’t complicated and burdensome enough, eventually GitHub is planning to etch the code for the millions of projects being preserved on quartz glass plates with a laser, allowing it to survive for well over 10,000 years if those plates remain in good enough shape to be examined. We’re essentially talking about a library filled with billions of plates of sturdy glass, updated every five or more years, sitting in a cavern under the Arctic in an effort to make sure that we don’t lose any potentially important piece of technology. The only way to describe this task is Herculean, and while I can appreciate the reasoning behind it and the resources that will be involved, there are some important things to keep in mind before we start going down the road of treating code the same way we do textbooks.

why code is seldom made to be preserved

Unlike books or blueprints, code is meant to be a living thing, frequently in a state of flux. It’s often forgotten, overwritten, or discarded into obsolescence by design. Since we constantly have a stream of new technologies and updates to those technologies, the code that inhabits them often has to change as well. Certainly, there’s software that’s been running for decades, but that’s an exception rather than the norm. The life cycle of the typical project is about five years and nearly three in four fail their task in some way, shape, or form. The industry is also rife with code which more or less is just reinventing the wheel, part of the reason why over the decades we ended up with nearly 9,000 programming languages (more than all our spoken languages in existence today) and hundreds of operating systems.

So many of us tend to think that we can do better or see something millions of those who tried before us couldn’t, only to discover that we arrived at the same solution at a slightly different angle. We created frameworks that made coders’ lives easier but made user experience slower and more cumbersome unless they have top of the line broadband services. We’ve created vast and complex mechanisms to create online data entry and manipulation tools that eventually got so complex, we had to redesign them from the ground up with new standards. Is any .NET developer out there going to clamor that we save WebForms? Should we hold on to Angular and Vue, their spiritual successors in JavaScript, just as we’re trying to switch to WebAssembly?

And who among those of us writing code for a living hasn’t created some rough patch or nasty function to get a nasty problem technically solved because management was breathing down our necks, with the naive hope that we’d come back to fix it the right way later, only for later to never come because new tasks were piled on our plates? Preserve that kind of code for posterity and watch those of us who wrote it have a panic attack. In short, code is a living thing meant to be constantly changed, improved, and discarded when it’s no longer needed so rogue snippets of it won’t hide bugs for future implementations. Most of us tend to end up writing our code for the next release cycle or to solve the next bug, not for our great to the 500th degree grandkids to power their spacecraft or robot toys, because that’s how we pay our bills.

do we need a shift in how we write software?

Of course, none of this is to say that all programmers write disposable throw-away code, and that for all its experiments, the industry hasn’t come up with worthwhile tools that need to be preserved for the future. My concern is that the GitHub Arctic Code Vault will end up with a few thousand meaningful and unique projects, and ten million slightly different versions of “left-pad,” creations of their moment, for very specific and temporary goals. Maybe each repository to be saved should be reviewed very thoroughly with its creators to make sure future updates have this ambitious preservation scheme in mind and the design of future tools considers what will happen if someone a thousand years from now tries to use them.

In other words, programmers could use this effort as an opportunity to talk about software design and maintenance in a completely new way. How much reinvention do we need? How often do we need to update our operating systems, how many of them are necessary, and why? Right now, we have the luxury and incentive to experiment because we don’t really have to worry about scenarios where we wouldn’t be able to issue a patch or create a brand-new code repository. But we’ve actually covered some scenarios where we might never have the ability to update our code at will and any major patch or upgrade could take centuries, if not millennia, to execute and just as long to verify. Why not start considering them seriously and focus on standardizing our discipline?

If for just a moment we stop thinking about code as something we need to check in at the end of the day to meet our sprint commitments and client requirements of the quarter, we can ask questions like how unique the logic behind a banking website needs to be and how many different phone operating systems are really required. Maybe, just maybe, a lot of custom and complex solutions are no longer the best approach to having software powering our world and yet another programming language isn’t necessary? Maybe we can make do with more off the shelf software and less reinventing the wheel? Could it be time to start pruning the variety of tools and code at our disposal and picking which ones have the best potential to stick around not just for the next decade, but for the next century, if not thousands of years?

# tech // code / computer science / futurism

by: Greg Fish

Los Angeles-based editor and founder of Weird Things, co-host of the WoWT Podcast, ex-Soviet computer lobotomist with a graduate degree in computer science. Specializes in, but is not limited to, popular science, technology, the web, and conspiracy theories. His work has also appeared in Rantt, BusinessWeek, io9, HowStuffWorks, SEED, RawStory, Science To The People, Le Monde, and Discovery News/Seeker, and he has a weekly radio segment on The Shift With Shane Hewitt.
