a moral dilemma for an intelligent search engine
There’s been a blip in the news cycle I’ve been meaning to dive into, but lately, more and more projects have been getting in the way of a steady writing schedule, and there are only so many hours in the day. So what’s the blip? Well, professional tech prophet and the public face of the Singularity as most of us know it, Ray Kurzweil, has a new gig at Google. His goal? To use stats to create an artificial intelligence that will handle web searches and explore the limits of how one could use statistics and inference to teach a synthetic mind. Unlike many of his prognostications about where technology is headed, this project is actually on very sound ground because we’re using search engines more and more to find what we want, and we do it based on the same type of educated guessing that machine learning can tackle quite well. And that’s why instead of what you’ve probably come to expect from me when Kurzweil embarks on a mission, you’ll get a small preview of the problems an artificially intelligent search engine will eventually face.
Machine learning and artificial neural networks are all the rage in the press right now because lots and lots of computing power can now run the millions of simulations required to train rather complex and elaborate behaviors in a relatively short amount of time. Watson couldn’t have been built a few decades ago, when artificial neural networks were first being mathematically formalized, because we simply didn’t have the technology we do today. Today’s cloud platforms command roughly the same kind of computational might an intelligent system needs, and the thinking goes that if you pair the two, you’ll not only have your data available anywhere with an internet connection, but you’ll also have a digital assistant to fetch you what you need without having to browse through a myriad of folders. Hence, systems like Watson and Siri, and now, whatever will come out of the joint Google-Kurzweil effort, and these functional AI prototypes are good at navigating context with a probabilistic approach, which successfully models how we think about the world.
So far so good, right? If we’re looking for something like “auto mechanics in Random, AZ,” your search assistant living in the cloud would know to look at the relevant business listings, and if a lot of these listings link to reviews, it would assume that reviews are an important part of such a search result and bring them over as well. Knowing that reviews are important, it would likely do what it can to read through the reviews and select the mechanics with the most positive reviews that really read as if they were written by actual customers, parsing the text and looking for any telltale signs of sockpuppeting, like too many superlatives or a rash of users posting in what seems like a strangely short time window compared to the rest of the reviews. You get good results, some warnings about who to avoid, the AI did its job, you’re happy, the search engine is happy, and a couple of dozen tech reporters write gushing articles about this Wolfram Alpha Mark 2. But what if, just what if, you were to search for something scientific, something that brings up lots and lots of manufactroversies, like evolution, climate change, or sexual education materials? The AI isn’t going to have the tools to give you the most useful or relevant recommendations there.
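To make that sockpuppet-sniffing concrete, here’s a toy sketch of the two telltale signs mentioned above: superlative-heavy text and a burst of reviews in a strangely short window. Everything here — the word list, the cutoffs, the function names — is invented for illustration; a real search engine’s signals would be far more elaborate.

```python
from datetime import datetime, timedelta

# Illustrative-only list of gushing words; a real system would use a much
# richer language model, not a hand-picked set.
SUPERLATIVES = {"best", "greatest", "amazing", "perfect", "incredible"}

def superlative_density(text):
    """Fraction of words in a review that are superlatives."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w.strip(".,!") in SUPERLATIVES for w in words) / len(words)

def looks_like_sockpuppets(reviews, density_cutoff=0.15,
                           burst_window_days=2, burst_fraction=0.5):
    """Flag a review set (list of (text, timestamp) pairs) if too many
    reviews gush with superlatives, or if a large share of them landed
    inside one suspiciously short time window. Thresholds are guesses."""
    gushing = sum(superlative_density(text) > density_cutoff
                  for text, _ in reviews)
    if gushing / len(reviews) > burst_fraction:
        return True
    # Slide over the sorted timestamps looking for a dense burst.
    times = sorted(ts for _, ts in reviews)
    window = timedelta(days=burst_window_days)
    for i, start in enumerate(times):
        in_window = sum(1 for t in times[i:] if t - start <= window)
        if in_window / len(times) > burst_fraction:
            return True
    return False
```

Notice that both checks are purely statistical: they catch clumsy astroturfing just fine, but they say nothing about whether a review’s claims are actually true — which is exactly the limitation the next paragraph runs into.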
First off, there’s only so much that knowing context will do. For the AI, any page discussing the topic is valid, so a creationist website savaging evolution with unholy fury and a barrage of very, very carefully mined quotes designed to look respectable to the novice reader, and the archives at Talk Origins, have the same validity unless a human tells it to prioritize scientific content over religious misrepresentations. Likewise, sites discussing healthy adult sexuality, sites launching into condemnations of monogamy, and sites decrying any sexual activity before marriage as an immoral indulgence of the emotionally defective, are all the same to an AI without human input. I shudder to think of the kind of mess trying to accommodate a statistical approach here can make. Yes, we could say that if a user lives in what we know to be a socially conservative area, place a marked emphasis on the prudish and religious side of things, and if a user is in a moderate or a liberal area, show a gradient of sound science and alternative views on sexuality. Statistically, it makes sense. In the big picture, it perpetuates socio-political echo chambers.
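The echo-chamber problem above is easy to see in miniature. In this toy sketch — page names, leanings, and weights are all made up — the same query returns different top results depending on nothing but the user’s regional profile, which is precisely the statistically sensible, socially corrosive behavior being described:

```python
# Invented example pages with a "leaning" score from -1.0 to 1.0 and a
# base relevance to the query. None of this reflects any real ranking system.
PAGES = [
    {"title": "peer-reviewed overview", "leaning": 0.0, "relevance": 0.9},
    {"title": "advocacy site A", "leaning": -1.0, "relevance": 0.8},
    {"title": "advocacy site B", "leaning": 1.0, "relevance": 0.8},
]

def personalized_rank(pages, region_leaning, personalization_weight=0.5):
    """Score each page by relevance plus a bonus for matching the
    region's presumed leaning, then sort best-first."""
    def score(page):
        # Affinity is 1.0 for a perfect ideological match, 0.0 for the
        # opposite pole; the weight controls how much the bubble wins.
        affinity = 1.0 - abs(page["leaning"] - region_leaning) / 2.0
        return page["relevance"] + personalization_weight * affinity
    return sorted(pages, key=score, reverse=True)
```

Run it for a region leaning one way and advocacy site A floats to the top; flip the region and site B wins; only a perfectly neutral profile surfaces the peer-reviewed overview first. Each individual ranking is defensible by the numbers, and that is exactly the trap.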
And that introduces a moral dilemma Google and Kurzweil will have to face. Today’s search bar takes in your input, finds what look like good matches, and spits them out in pages. Good? Bad? Moral? Immoral? Scientifically valid? Total crackpottery? You, the human, will decide. Having an intelligent search assistant, however, places at least some of the responsibility for trying to filter out or flag bad or heavily biased information on the technology involved, and if the AI is way too accommodating to the user, it will simply perpetuate misinformation and propaganda. If it’s a bit too confrontational, or follows a version of the Golden Mean fallacy, it will be seen as defective by users who don’t like to step outside of their bubble too much, or those who’d like their AI to be a little more opinionated and put up an intellectual challenge. Hey, no one said that indexing and curating all human knowledge will be easy and that it won’t require making a stand on what gets top billing when someone tries to dive into your digital library. And here, no amount of machine learning and statistical analysis will save your thinking search engine…