a moral dilemma for an intelligent search engine

January 29, 2013 — 5 Comments

[image: android chip]

There’s been a blip in the news cycle I’ve been meaning to dive into, but lately, more and more projects have been getting in the way of a steady writing schedule, and there are only so many hours in the day. So what’s the blip? Well, professional tech prophet and the public face of the Singularity as most of us know it, Ray Kurzweil, has a new gig at Google. His goal? To use statistics and inference to create an artificial intelligence that will handle web searches, and to explore the limits of how those techniques can teach a synthetic mind. Unlike many of his prognostications about where technology is headed, this project is on very sound ground because we use search engines more and more to find what we want, and we do it with the same kind of educated guessing that machine learning tackles quite well. And that’s why, instead of what you’ve probably come to expect from me when Kurzweil embarks on a mission, you’ll get a small preview of the problems an artificially intelligent search engine will eventually face.

Machine learning and artificial neural networks are all the rage in the press right now because lots and lots of computing power can now run the millions of simulations required to train rather complex and elaborate behaviors in a relatively short amount of time. Watson couldn’t have been built a few decades ago, when artificial neural networks were first being mathematically formalized, because we simply didn’t have the technology we do today. Today’s cloud storage platforms marshal roughly the same kind of computational might an intelligent system needs, and the thinking goes that if you pair the two, you’ll not only have your data available anywhere with an internet connection, but you’ll also have a digital assistant to fetch what you need without having to browse through a myriad of folders. Hence systems like Watson and Siri, and now, whatever will come out of the joint Google-Kurzweil effort. These functional AI prototypes are good at navigating context with a probabilistic approach, which models quite well how we think about the world.
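
To make that probabilistic approach a little more concrete, here is a minimal sketch of the kind of educated guessing involved, in the spirit of a bare-bones naive Bayes classifier. The categories, toy queries, and smoothing choices are all mine for illustration; nothing here comes from Google or Kurzweil.

```python
from collections import Counter
import math

# Hypothetical training data: queries we've seen before, labeled by what
# the user turned out to want. Purely illustrative.
LABELED_QUERIES = {
    "local_business": ["auto mechanics near me", "plumber random az", "best pizza downtown"],
    "reference":      ["how does evolution work", "climate change evidence", "what is dna"],
}

def train(labeled):
    """Count word frequencies per category."""
    counts = {cat: Counter(w for q in qs for w in q.split()) for cat, qs in labeled.items()}
    totals = {cat: sum(c.values()) for cat, c in counts.items()}
    vocab = {w for c in counts.values() for w in c}
    return counts, totals, vocab

def classify(query, counts, totals, vocab):
    """Pick the category whose word statistics best explain the query."""
    scores = {}
    for cat in counts:
        # Laplace smoothing keeps unseen words from zeroing out a category.
        scores[cat] = sum(
            math.log((counts[cat][w] + 1) / (totals[cat] + len(vocab)))
            for w in query.lower().split()
        )
    return max(scores, key=scores.get)

counts, totals, vocab = train(LABELED_QUERIES)
print(classify("mechanics in Random AZ", counts, totals, vocab))  # -> local_business
```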

So far so good, right? If we’re looking for something like "auto mechanics in Random, AZ," your search assistant living in the cloud would know to look at the relevant business listings, and if a lot of these listings link to reviews, it would assume that reviews are an important part of such a search result and bring them over as well. Knowing that reviews are important, it would likely do what it can to read through them and select the mechanics with the most positive reviews that really read as if they were written by actual customers, parsing the text and looking for any telltale signs of sockpuppeting, like too many superlatives or a rash of reviews posted in what seems like a strangely short time window compared to the rest. You get good results, some warnings about who to avoid, the AI did its job, you’re happy, the search engine is happy, and a couple of dozen tech reporters write gushing articles about this Wolfram Alpha Mark 2. But what if, just what if, you were to search for something scientific, something that brings up lots and lots of manufactroversies like evolution, climate change, or sexual education materials? The AI isn’t going to have the tools to give you the most useful or relevant recommendations there.
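
A toy version of that sockpuppet sniffing might look like the sketch below. The thresholds, superlative list, and review fields are all invented for illustration; a real engine would rely on far subtler signals.

```python
from datetime import datetime, timedelta

SUPERLATIVES = {"best", "amazing", "incredible", "perfect", "awesome", "flawless"}

def looks_astroturfed(reviews, burst_window=timedelta(days=2),
                      burst_share=0.5, superlative_ratio=0.15):
    """Flag the two telltale signs mentioned above: too many superlatives,
    or a rash of reviews posted in a strangely short time window."""
    if not reviews:
        return False

    # Signal 1: superlative density across all review text.
    words = [w.lower().strip(".,!") for r in reviews for w in r["text"].split()]
    if words and sum(w in SUPERLATIVES for w in words) / len(words) > superlative_ratio:
        return True

    # Signal 2: a large share of reviews landing inside one short window.
    times = sorted(r["posted"] for r in reviews)
    for i, start in enumerate(times):
        in_window = sum(1 for t in times[i:] if t - start <= burst_window)
        if in_window / len(times) > burst_share:
            return True
    return False

reviews = [
    {"text": "Best mechanic ever, amazing perfect service!", "posted": datetime(2013, 1, 5)},
    {"text": "Incredible, flawless, simply the best!", "posted": datetime(2013, 1, 5)},
    {"text": "Awesome shop, the best in town", "posted": datetime(2013, 1, 6)},
]
print(looks_astroturfed(reviews))  # -> True, on both signals
```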

First off, there’s only so much that knowing context will do. For the AI, any page discussing the topic is valid, so a creationist website savaging evolution with unholy fury and a barrage of very, very carefully mined quotes designed to look respectable to the novice reader, and the archives at Talk Origins, have the same validity unless a human tells it to prioritize scientific content over religious misrepresentations. Likewise, sites discussing healthy adult sexuality, sites railing against monogamy, and sites decrying any sexual activity before marriage as an immoral indulgence of the emotionally defective are all the same to an AI without human input. I shudder to think of the kind of mess a statistical approach could make here. Yes, we could say that if a user lives in what we know to be a socially conservative area, place a marked emphasis on the prudish and religious side of things, and if a user is in a moderate or liberal area, show a gradient of sound science and alternative views on sexuality. Statistically, it makes sense. In the big picture, it perpetuates socio-political echo chambers.
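
Sketched out, that statistically sensible approach is disturbingly simple to build, which is exactly the problem. Every locale label, leaning score, and weight below is invented purely to show the mechanism:

```python
# Hypothetical results for a sex-ed query, each tagged with a leaning score
# (0 = strictly abstinence-only, 1 = mainstream medical consensus).
results = [
    {"url": "health.example/sex-ed-basics",  "relevance": 0.9, "leaning": 0.9},
    {"url": "ministry.example/wait-for-it",  "relevance": 0.7, "leaning": 0.1},
    {"url": "forum.example/open-discussion", "relevance": 0.6, "leaning": 0.8},
]

# The "statistically sensible" move: infer the user's likely preference
# from their locale and boost results that match it.
LOCALE_PREFERENCE = {"conservative_county": 0.2, "liberal_city": 0.8}

def rank(results, locale):
    pref = LOCALE_PREFERENCE[locale]
    # Score results higher the closer their leaning sits to the local norm.
    return sorted(results,
                  key=lambda r: r["relevance"] * (1 - abs(r["leaning"] - pref)),
                  reverse=True)

for locale in LOCALE_PREFERENCE:
    print(locale, "->", rank(results, locale)[0]["url"])
# conservative_county -> ministry.example/wait-for-it
# liberal_city        -> health.example/sex-ed-basics
# Each locale gets its local consensus echoed back at it: the feedback
# loop described above.
```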

And that introduces a moral dilemma Google and Kurzweil will have to face. Today’s search bar takes in your input, finds what look like good matches, and spits them out in pages. Good? Bad? Moral? Immoral? Scientifically valid? Total crackpottery? You, the human, decide. Having an intelligent search assistant, however, places at least some of the responsibility for filtering out or flagging bad or heavily biased information on the technology involved, and if the AI is way too accommodating to the user, it will simply perpetuate misinformation and propaganda. If it’s a bit too confrontational, or follows a version of the Golden Mean fallacy, it will be seen as defective by users who don’t like to step outside of their bubble too much, or by those who’d like their AI to be a little more opinionated and put up an intellectual challenge. Hey, no one said that indexing and curating all human knowledge would be easy, or that it wouldn’t require taking a stand on what gets top billing when someone tries to dive into your digital library. And here, no amount of machine learning and statistical analysis will save your thinking search engine…
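
To see how thin the line is, you can model the accommodating-versus-confrontational tradeoff as a single knob. Everything in this toy sketch is hypothetical, but it shows why there is no neutral setting to hide behind:

```python
def blended_score(result, user_affinity, challenge=0.3):
    """Blend how well a result flatters the user's existing views with an
    independent credibility signal. challenge=0 is a pure echo chamber;
    challenge=1 ignores the user and gets the AI branded as defective.
    All fields and weights here are hypothetical."""
    agreement = 1 - abs(result["leaning"] - user_affinity)
    return (1 - challenge) * agreement + challenge * result["credibility"]

# A fringe page that happens to flatter this particular user's views.
result = {"leaning": 0.1, "credibility": 0.2}
for challenge in (0.0, 0.5, 1.0):
    print(challenge, round(blended_score(result, user_affinity=0.1, challenge=challenge), 2))
# 0.0 -> 1.0, 0.5 -> 0.6, 1.0 -> 0.2: the same page swings from top result
# to bottom of the pile, and a human had to pick the knob's setting.
```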

  • TheBrett

    Interesting stuff. It sounds like they’ll have to introduce a lot of rules as it goes along, to try and steer it in useful directions.

  • Paul451

    For a while Google had something called… Wonder Wheel, I think. Which bunched related searches by keywords other than the ones you entered. I could see a similar tool for Kurzweil’s AI to bundle philosophically similar information. I assume the intent of Kurzweil’s project is to actually find an answer, not just produce a site list, so I could see it put together a kind of Wikipedia page, with information split into sections under appropriate categories. You self-select the information you want, drilling down as you need.

    Even now, it annoys me that Google doesn’t let you group results unless you add specific keywords. Search for a product and you get the manufacturer’s site, professional reviews, user review sites, local retailers, online overseas retailers, blogs, and forums, all jumbled together. I would think it’s within Google’s existing ability to tell the difference between news, forums, blogs, retailers, etc.

    Grouping them by type would allow you to turn on/off whole categories of results, say, the generic online retailers, to clear 50 nearly identical sites from the first few pages, without having to guess a keyword that will clear the unwanted sites without losing the stuff you want. Hell, it would be nice to be able to filter sites by the amount of information they provide, get rid of the sites that just happen to have the name of the item on a page but no information about it. [If you can't tell, I've recently spent time looking up products.]
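
    Something like this rough sketch is all I’m picturing; the categories and pre-tagged results are obviously stand-ins, since assigning the categories is the hard part:

    ```python
    # Toy version of on/off result categories. How each page earns its label
    # is the hard part; here it's just a pre-tagged field.
    results = [
        {"url": "maker.example/widget",        "category": "manufacturer"},
        {"url": "reviewsite.example/widget",   "category": "professional_review"},
        {"url": "megastore.example/widget",    "category": "online_retailer"},
        {"url": "megastore2.example/widget",   "category": "online_retailer"},
        {"url": "forum.example/widget-thread", "category": "forum"},
    ]

    def filter_results(results, disabled):
        """Drop whole categories instead of guessing at magic keywords."""
        return [r for r in results if r["category"] not in disabled]

    # Clear the 50 nearly identical retailer sites in one move:
    for r in filter_results(results, disabled={"online_retailer"}):
        print(r["url"])
    ```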

  • gfish3000

    Eh, Wonder Wheel was no AI precursor. If anything, they have to work really quickly and really hard to catch up with Wolfram|Alpha. Now that’s a project well on its way to useful, everyday AI-hood.

  • Paul451

    Wonderwheel wasn’t AI, hell it wasn’t even that useful. (Probably why it’s past tense.) I just meant the concept of dividing content by aspects other than those being directly searched for.

    Even without “understanding” the content, I suspect that Google could already break site lists into categories based on the format/style of pages. (Blogs, news, retailers, forums, content-free lists, etc.) And it annoys me that they don’t.

    A natural language AI should be able to do this even better based on the content (even if it can’t tell what it’s grouping, chances are it will end up with similar content in categories based on style and vocabulary alone). I.e., it won’t know why CvsE debates all fall into the same category, only that they share distinct stylistic similarities. But for us humans, a single extracted summary for the category will make the nature of the content obvious. You just need to present those categories to the user, and they can filter themselves. No moral issues, no risk of censorship.
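
    To show what I mean by grouping on style and vocabulary alone, here’s a quick sketch with off-the-shelf scikit-learn; the toy snippets stand in for real crawled pages:

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    # Toy page snippets; real input would be crawled page text.
    pages = [
        "add to cart free shipping order now in stock",
        "buy now checkout discount shipping and returns",
        "in this study we measured allele frequencies over generations",
        "the fossil record and radiometric dating show common descent",
        "reply with quote posted by member join date posts",
        "new thread reply with quote forum rules",
    ]

    # Vectorize by vocabulary alone; the model never "understands" a word.
    X = TfidfVectorizer().fit_transform(pages)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

    for label, page in zip(labels, pages):
        print(label, page[:40])
    # Retailer-speak, science-speak, and forum-speak each land in their own
    # cluster on style and vocabulary alone, as described above.
    ```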

  • gfish3000

    Good news for an aspiring AI. A lot of the work of gauging what belongs in what category has been done by the human-curated DMOZ. But what information is best to present, and how to judge what’s best suited, is still something you’ll need to train for. At what point do you say you’re done with the results and show them to the human? That’s a very specific functional requirements question…