quantum of bigot: fighting trolls with a.i.
If you’re going to fight trolls, racists, and bigots with AI, you’re going to need a much more comprehensive approach than a few neural networks…
Many of us working in mathematics-heavy fields share a similar vice. We want to quantify everything, especially if the quantification process is going to be an extremely complicated and imperfect one. In fact, the level of difficulty is the main draw because it forces us to think about what makes up the very thing we’re trying to quantify and how we can objectively define and measure it in the real world. And when it comes to quantifying bigotry that’s exploding on social media, this isn’t an abstract problem for the curious. Just ask Twitter. After years of hemorrhaging cash, it’s been looking for a buyer interested in monetizing its users for its own devices and willing to absorb the losses for a flood of new sales. But despite some interest and a few bids, the deals went nowhere for one simple reason: Twitter’s troll problem. And as the problem spreads to Facebook and the comment sections of news sites and blogs, Google tried using its artificial intelligence know-how to help flag bigotry. But when used against actual hate, its system came up short on many counts because it has to rely on key words and the sequences in which they are used to judge how toxic a comment is. It’s the fundamental principle by which neural networks used for such problems are built, and they’re rather limited.
For example, let’s say someone posts a comment that says “all black people are thugs” which is obviously racist as hell. Google’s neural net learned by analyzing phrases containing slurs like this and their intended targets again and again until it sank in that the key words “black,” “people,” and “thug” put in close verbal and logical proximity to each other are, say, 90% toxic. So far, the system works, but let’s set the complexity bar higher. Let’s consider another hypothetical post that says “black people should just play basketball” which definitely has a racist connotation, but doesn’t have slurs and obvious negatives for the system to react to. It sees nothing wrong in a combination of “black,” “people,” and “basketball,” yet the quote is obviously saying that black people should just be athletes, implying other careers to be off limits, and not just any athletes, but athletes in a sport designated for them. It’s a solid 90% or higher on the toxicity scale, but the algorithm sees little to be suspicious about other than the word “just” and flags it as 60% toxic at the very most. Simply looking at sequences of words and their logical distances from each other in a phrase is not a reliable method for a bigotry detector. But how do we remedy these shortcomings?
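To make that blind spot concrete, here is a deliberately naive sketch of a keyword-proximity scorer. It is not how Google’s system is actually built (that is a trained neural network); the word lists, weights, and window size below are all invented for illustration, and the only point is to show how a comment with no flagged vocabulary slips through.

```python
# Toy keyword-proximity scorer. The lexicon, weights, and window are
# hypothetical and exist only to illustrate the failure mode.
FLAGGED = {"thugs": 0.9, "just": 0.2}  # made-up per-word toxicity weights
TARGETS = {"black", "people"}          # made-up "target group" tokens
WINDOW = 4                             # max token distance that counts as "close"


def naive_toxicity(comment: str) -> float:
    """Return the highest weight of any flagged word found near a target token."""
    tokens = [t.strip(".,!?\"'") for t in comment.lower().split()]
    target_positions = [i for i, t in enumerate(tokens) if t in TARGETS]
    score = 0.0
    for i, token in enumerate(tokens):
        weight = FLAGGED.get(token)
        if weight is None:
            continue
        # A flagged word only counts when it sits close to a target token.
        if any(abs(i - p) <= WINDOW for p in target_positions):
            score = max(score, weight)
    return score


overt = naive_toxicity("all black people are thugs")
coded = naive_toxicity("black people should just play basketball")
```

With these made-up weights, the overt comment scores far higher than the coded one, even though a human reader would rate them about the same. No amount of tuning the word list fixes that, because the problem is the subtext, not the vocabulary.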
To try and answer that, we need to step way, way back and first talk about bigotry not as an algorithm, but as a social entity. Who exactly are bigots and what makes them tick, not by the dictionary definition one would expect to find in a heavily padded college essay, but by the practical, real world manifestations that quickly make them stand out? They don’t just use slurs or bash liberal, egalitarian ideas by calling them something vile or comparing them to some horrible disease. Just note how Google’s algorithm goes astray when given quotes light on invective but heavy on bigoted subtext and what’s known in journalist circles as dog whistles. Sarcasm adds another problem. How do you know on the basis of one comment that the person isn’t just mocking a bigot by pretending to be one, or conversely, mocking those calling out his bigoted statements? Well, the obvious answer is that we need context every time we evaluate a comment because two of the core features of bigotry are sincerity and a self-defensive attitude. Simply put, bigots say bigoted things because they truly believe them, and they hate being called bigots for it.
Only sociopaths and psychopaths are perfectly fine with seeing themselves as villains. Ordinary people don’t think of themselves as bad people or want others to consider them as such. Even when they say and do terrible things we will use as cautionary tales in the future, they will approach it from the standpoint that they’re either standing up for what they know to be right or just doing their jobs. It’s a phenomenon explored in the famous Holocaust treatise The Banality of Evil, which argues that what we think of as evil on national and global scales can’t be explained by greed, jealousy, or even a strain of religious fundamentalism, but by a climate in which everyone is a cog in a machine whose stated goal is some nebulous “greatness.” No, this is not to draw a direct parallel between Trumpism and Nazism because they have fundamentally opposite goals. The latter was based around ethnic cleansing and global domination, while the former is based on isolationism and is fine with cultural homogeneity and forced assimilation. But those who were taken in by Trumpism don’t want to be reminded that this is still bigotry.
In fact, the common message given to tech businessman Sam Altman on his interview tour of Trump’s America was that they detest being called bigots, bad people, or xenophobes, and warn that they will cling closer to Trump if they keep being labeled as such. I have no doubt that they don’t think they are bigoted or xenophobic, but it’s hard to take their word for it when it gets followed by a stream of invective about immigrants destroying culture and bringing crime and disease with them, and about minorities getting fortunes in government handouts while “real Americans” like them are just tossed by the wayside by “un-American” politicians. It’s the classic rule that any statement beginning with “I’m not racist, but” will almost always end up being bigoted because the conjunction pretty much demands that something not exactly open-minded follow for the sentence to make sense. This is more than likely how the aforementioned algorithm knows to start raising its toxicity score for the argument: it detects a pattern that raises a red flag that something very negative is about to make its appearance.
And this is ultimately what a successful bigotry-flagging AI needs: patterns and context. Instead of just looking at what was said, it needs to know who said it. Does this person frequently trip the bigot sensor, pushing it into the 55% to 65% range and above? Does this person escalate when called out by others, tripping the sensor even more? What is this person’s social score as determined by feedback from other users in their replies and votes? Yes, the social score can be brigaded, but there are telltale signs which can be used to disqualify votes: large numbers of people arriving from sites known for certain biases to vote a certain way, correlations between some of these sites posting a link and a rush of voters heavily skewing one way, and floods of comments that trigger the sensor. These are well understood problems that can already be managed. We should also track where users are coming from on the web. Are they coming from sites favorited and frequented by bigots to post stuff that trips the sensor? That’s also a potential red flag.
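As a rough sketch of how those questions could feed into one number, the snippet below adjusts a single comment’s sensor score using the commenter’s history. Everything here is an assumption made for illustration: the `UserContext` fields, the weights, and the idea of a simple additive adjustment are placeholders, and a real system would learn them from data rather than hard-code them.

```python
from dataclasses import dataclass, field
from typing import List

TRIP_THRESHOLD = 0.55  # low end of the 55% to 65% range mentioned above


@dataclass
class UserContext:
    """Hypothetical per-user signals; the field names are illustrative only."""
    past_scores: List[float] = field(default_factory=list)  # earlier sensor scores, 0..1
    escalations: int = 0            # times the user scored higher right after being called out
    social_score: float = 0.0       # feedback from other users, -1..1, brigaded votes removed
    flagged_referrer: bool = False  # arrived from a site on an assumed watch list


def contextual_score(comment_score: float, ctx: UserContext) -> float:
    """Nudge one comment's score up or down based on who posted it."""
    if ctx.past_scores:
        tripped = sum(s >= TRIP_THRESHOLD for s in ctx.past_scores)
        trip_rate = tripped / len(ctx.past_scores)
    else:
        trip_rate = 0.0  # no history yet, so no penalty
    adjustment = (
        0.20 * trip_rate                  # frequently trips the sensor
        + 0.05 * min(ctx.escalations, 3)  # escalates when called out (capped)
        - 0.10 * ctx.social_score         # good standing lowers the score
        + (0.05 if ctx.flagged_referrer else 0.0)
    )
    return max(0.0, min(1.0, comment_score + adjustment))


newcomer = contextual_score(0.6, UserContext())
repeat_offender = contextual_score(
    0.6,
    UserContext(past_scores=[0.7, 0.8, 0.6, 0.2], escalations=2,
                social_score=-0.5, flagged_referrer=True),
)
```

The same borderline comment lands at its raw score for a newcomer and much higher for someone with a track record, which is exactly the distinction a single-comment model can’t make.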
A flow that tracks where the user came from, their reputation, their pattern of comments, and how they handle feedback won’t be a perfect system, but it’s not supposed to be. It will give users the benefit of the doubt, then crack down when they show their true colors. In the end, we should end up with a user with a track record and a social score reflective of it, and if that score is very problematic, the best practice would be to shadow ban this person. You will also be able to model the telltale signs of a verbal drive-by over time to flag it before anyone sees it and take appropriate automated action. Again, it would be impossible to build a perfect anti-abuse system, but a flow of data moderated by several purpose-built neural nets will definitely give you a leg up on toxic users. And certainly, for some users it will almost be a kind of perverse challenge to see how far they can push the system and become the commenter with the lowest reputation or the highest offense score. But for a number of others, it could actually be an important piece of feedback.
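The “benefit of the doubt, then crack down” part of that flow could look something like the sketch below. The thresholds, the minimum history length, and the three possible actions are all assumptions picked for the example, not recommendations.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    HOLD_FOR_REVIEW = "hold_for_review"
    SHADOW_BAN = "shadow_ban"


def choose_action(comment_score: float, offense_count: int,
                  total_comments: int, min_history: int = 10) -> Action:
    """Pick a moderation action from one comment's score and the user's record.

    All numeric cutoffs here are hypothetical placeholders.
    """
    if total_comments < min_history:
        # Too little history to judge the person: only hold extreme comments.
        return Action.HOLD_FOR_REVIEW if comment_score >= 0.9 else Action.ALLOW
    offense_rate = offense_count / total_comments
    if offense_rate >= 0.5:
        # An established pattern: the track record outweighs any single comment.
        return Action.SHADOW_BAN
    if comment_score >= 0.65:
        return Action.HOLD_FOR_REVIEW
    return Action.ALLOW
```

Note that an established offender gets shadow banned even on a mild comment, while a newcomer with one bad comment does not. That asymmetry is the whole point of judging the track record instead of the sentence.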
You see, there is absolutely such a thing as toxic political correctness, which takes good, noble ideas and turns them into verbal cudgels used on anyone not on board with post-modernist intersectionality. It has the potential to turn into strong-armed mollycoddling, which has led the progressive wing of the left down a politically self-destructive path. Certainly this may have led some people, who were at first happy to go along with the ideas behind a lot of politically correct rhetoric but were confronted with condescension or whiplash-inducing rebukes for the slightest failure to follow the woke canon of the group they interacted with, to turn towards the anti-PC right. They may have thought of them as sober skeptics who worry more about facts than feelings, but after immersing themselves in Trumpist bubbles, they were led to embrace bigotry through distorted, misleading data and outright lies. They still think of themselves as good, upstanding people without a hateful bone in their bodies. But a computer which can show them when what they said tripped a bigot sensor, how often, and the severity and degree of their rants might show them that no, they’re not the nice people they thought.
And being able to transparently present this feedback may be just as key to good anti-troll AI as monitoring sources of traffic, the actual content, users’ histories, and learning how to flag dog whistles from those histories and the input of other users and administrators. We don’t want something that can flag abuse without us knowing how it works. We want something that shows us an audit trail to inform users and programmers about what happened, and that uses the same process we use to identify bigots: over time, in context, giving time and opportunity for the hood to slip and reveal what’s beneath. Then we can mute or quarantine users who leave toxic or bigoted comments, and tell them what we find so objectionable and why. It’s true that there’s no law against hate speech or racism, but social media is not a government-run enterprise which must respect their First Amendment rights and cannot do anything about their speech so as not to violate the law.
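One possible shape for that audit trail is sketched below: every signal that contributed to a decision gets recorded with a plain-language note, so the same record can be shown to the user and exported for the programmers. The class names, fields, and sample values are hypothetical and only meant to show the idea.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class AuditEntry:
    signal: str   # which check fired, e.g. "comment_score"
    value: float  # what it measured
    note: str     # plain-language explanation for users and moderators


@dataclass
class AuditTrail:
    user_id: str
    comment_id: str
    entries: List[AuditEntry] = field(default_factory=list)

    def record(self, signal: str, value: float, note: str) -> None:
        self.entries.append(AuditEntry(signal, value, note))

    def explain(self) -> str:
        """Human-readable summary to show the commenter."""
        lines = [f"Comment {self.comment_id} by {self.user_id}:"]
        lines += [f"- {e.signal} = {e.value:.2f}: {e.note}" for e in self.entries]
        return "\n".join(lines)

    def to_json(self) -> str:
        """Machine-readable export for administrators and developers."""
        return json.dumps(asdict(self), indent=2)


# Made-up example values, purely to show the output format.
trail = AuditTrail(user_id="user-123", comment_id="c-456")
trail.record("comment_score", 0.72, "Language model flagged the wording.")
trail.record("trip_rate", 0.60, "Most of this user's recent comments were flagged.")
print(trail.explain())
```

Keeping the explanation and the export in one structure means the user sees the same reasons the moderators do, which is what makes the feedback credible.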
Social media was created and is maintained by private companies that don’t have to give bigots a major platform, and its users are fed up with trolls who sincerely believe not only that their opinions are only offensive to “libtards, cucks, and kikes,” but that any disagreement and any consequences for their actions and words violate their right to free speech. Since they don’t, we can finally do something about the popular refrain that the comment section is where a misanthrope goes to reaffirm his hatred of humanity, and where reason, along with civil discourse, goes to die a horrible death by a thousand insults. Google’s new Perspective algorithm is a good start, but it’s just one piece of a puzzle we can’t solve with the data points from a single comment. Ultimately, we need to teach computers to follow a conversation and form an informed opinion of a person’s character, something that can’t be done by a single neural net heavily reliant on parsing language. Understanding how to do it, and how to do it soon, may be one of the most important technical issues we tackle…
Note: if you liked this post, please recommend and share it. I’d love to hear some thoughts and feedback on this approach, and if it can get the attention of companies that are working on such algorithms, have millions of comments’ worth of training data, and are looking for ways to refine their approach, I think it can help. Often, the first step to improving a complicated algorithm is to define how it works. Code is the easy part; knowing what to code and why is where all software ultimately succeeds or fails.