Instagram Unleashes an AI System to Blast Away Nasty Comments

The social media site wants to turn itself into the friendliest place on the internet.

Every word has at least one meaning when it stands alone. But the meaning can change depending on context, or even over time. A sentence full of neutral words can be hostile (“Only whites should have rights”), and a sentence packed with potentially hostile words (“Fuck what, fuck whatever y'all been wearing”) can be neutral when you recognize it as a Kanye West lyric.

Humans are generally good at this kind of parsing, and machines are generally bad. Last June, however, Facebook announced that it had built a text classification engine to help machines interpret words in context.

The system, called DeepText, is based on recent advances in artificial intelligence and a concept called word embeddings, which means it is designed to mimic the way language works in our brains. When the system encounters a new word, it does what we do and tries to deduce meaning from all the other words around it.

White, for instance, means something completely different when it’s near the words snow, Sox, House, or power. DeepText is designed to operate the way a human thinks and, like a human, to improve over time.

DeepText was built as an in-house tool that would let Facebook engineers quickly sort through mass amounts of text, create classification rules, and then build products to help users. If you’re on Facebook griping about the White Sox, the system should quickly figure out that you’re talking about baseball, which, at a deeper level, it should already know is a sport. If you’re talking about the White House, you might want to read the news. If you use the word white near snow, you might want to buy boots, unless you also use the words seven and dwarfs. If you’re talking about white power, maybe you shouldn’t be on the platform.
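To make the idea concrete, here is a toy sketch in Python, with three-number "embeddings" invented for illustration (this is not DeepText's code), showing how averaging the vectors of nearby words can steer the reading of an ambiguous word like white:

```python
# Toy illustration (not DeepText): tiny hand-made vectors along invented
# [sports, politics, weather] axes show how neighbors disambiguate "white".
import numpy as np

embeddings = {
    "white": np.array([0.3, 0.3, 0.3]),  # ambiguous on its own
    "sox":   np.array([0.9, 0.0, 0.1]),
    "house": np.array([0.1, 0.9, 0.0]),
    "snow":  np.array([0.0, 0.1, 0.9]),
}
topics = {
    "sports":   np.array([1.0, 0.0, 0.0]),
    "politics": np.array([0.0, 1.0, 0.0]),
    "weather":  np.array([0.0, 0.0, 1.0]),
}

def guess_topic(words):
    """Average the word vectors, then pick the topic with the highest cosine similarity."""
    v = np.mean([embeddings[w] for w in words], axis=0)
    cosine = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(topics, key=lambda t: cosine(v, topics[t]))

print(guess_topic(["white", "sox"]))    # sports
print(guess_topic(["white", "house"]))  # politics
print(guess_topic(["white", "snow"]))   # weather
```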

Getting access to DeepText, as Facebook explains it, is akin to getting a lesson in spear fishing (and a really good spear). Then the developers wade out into the river.

Almost immediately after learning about DeepText, executives at Instagram—which Facebook acquired in 2012—saw an opportunity to combat one of the scourges of its platform: spam. People come to Instagram for the photographs, but they often leave because of the layers of malarkey underneath, where bots (and sometimes humans too) pitch products, ask for follows, or just endlessly repeat the word succ.

Instagram’s first step was to hire a team of men and women to sort through comments on the platform and to classify them as spam or not spam. This kind of job, which is roughly the social media equivalent of being asked to dive onto a grenade, is common in the technology industry. Humans train machines to perform monotonous or even demoralizing tasks, which the machines will ultimately do better. If the humans do the job well, they lose the work. In the meantime, however, everyone else’s feeds get saved.

After the contractors had sorted through massive piles of bilge, buffoonery, and low-grade extortion, four-fifths of the data was fed into DeepText. Instagram’s engineers then built algorithms to classify spam the way the human raters had.

The system analyzed the semantics of each sentence, and also took the source into account. A note from someone you don’t follow is more likely to be spam than one from someone you do; a comment repeated endlessly on Selena Gomez’s feed probably isn’t being made by a human.

The algorithms that resulted were then tested on the one-fifth of the data that hadn’t been given to DeepText, to see how well the machines had matched the humans. Eventually, Instagram became satisfied with the results, and the company quietly launched the product last October. Spam began to vanish as the algorithms did their work, circling like high-IQ Roombas let loose in an apartment overrun with dust bunnies.
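DeepText itself is proprietary, but the train-on-four-fifths, test-on-one-fifth routine is a standard machine-learning workflow. Here is a minimal sketch using off-the-shelf scikit-learn stand-ins; the comments, labels, and model are placeholders, not Instagram's:

```python
# A minimal sketch of the 80/20 workflow described above, using scikit-learn
# stand-ins rather than the proprietary DeepText engine. The comments and
# labels (1 = spam, 0 = not spam) are invented placeholders.
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

comments = ["follow me for a free watch", "gorgeous shot!", "succ succ succ",
            "love this photo", "check my page for deals", "great colors"]
labels = [1, 0, 1, 0, 1, 0]

# Four-fifths of the human-rated comments train the model...
X_train, X_test, y_train, y_test = train_test_split(
    comments, labels, test_size=0.2, random_state=0)

vectorizer = TfidfVectorizer()
model = LogisticRegression().fit(vectorizer.fit_transform(X_train), y_train)

# ...and the held-out fifth measures how closely the machine matches the raters.
# (The real system also weighed non-text signals, like whether the commenter
# follows the poster.)
predictions = model.predict(vectorizer.transform(X_test))
print("agreement with human raters:", accuracy_score(y_test, predictions))
```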

Instagram won’t say exactly how much the tool reduced spam, or divulge the inner secrets of how the system works. Reveal your defenses to a spammer and they’ll figure out how to counterpunch. But Kevin Systrom, Instagram’s CEO, was delighted.

He was so delighted, in fact, that he decided to try using DeepText on a more complicated problem: eliminating mean comments. Or, more specifically, eliminating comments that violate Instagram’s Community Guidelines, either explicitly or, as a spokesman for the company says, “in spirit.” The Guidelines serve as something like a constitution for the social media platform. Instagram publishes a 1,200-word version publicly—asking people to be always respectful and never naked—and has a much longer, private set that employees use as a guide.

Once again, a team of contractors got to work. A rater looks at a comment and determines whether it is appropriate; if it isn’t, he or she sorts it into a category of verboten behavior, like bullying, racism, or sexual harassment. The raters, all of whom are at least bilingual, have analyzed roughly two million comments, and each comment has been rated at least twice.

Meanwhile, Instagram employees have been testing the system internally on their own phones, and the company has been adjusting the algorithms: selecting and modifying ones that seem to work and discarding ones that don’t. The machines give each comment a score between 0 and 1, which is a measure of Instagram’s confidence that the comment is offensive or inappropriate. Above a certain threshold, the comment gets zapped. As with spam, the comments are rated based both on a semantic analysis of the text and factors such as the relationship between the commenter and the poster, as well as the commenter’s history. Something typed by someone you’ve never met is more likely to be graded poorly than something typed by a friend.
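Instagram hasn’t published the scoring internals, so the sketch below is only a guess at the shape of the logic: the score function, the signals, and the 0.9 cutoff are all invented for illustration:

```python
# Hypothetical sketch of the thresholding step: every comment gets a score
# between 0 and 1, and anything above a cutoff gets zapped. The score
# function, signals, and cutoff below are invented, not Instagram's.
from dataclasses import dataclass

THRESHOLD = 0.9  # invented confidence cutoff

@dataclass
class Comment:
    text: str
    commenter_follows_poster: bool   # relationship between commenter and poster
    commenter_prior_violations: int  # the commenter's history

def offensiveness_score(c: Comment) -> float:
    """Stand-in for the model: blend a fake text score with source signals."""
    score = 0.8 if "idiot" in c.text.lower() else 0.1  # placeholder semantics
    if not c.commenter_follows_poster:
        score += 0.1  # strangers are graded more harshly than friends
    score += min(0.1, 0.02 * c.commenter_prior_violations)
    return min(score, 1.0)

def should_zap(c: Comment) -> bool:
    return offensiveness_score(c) >= THRESHOLD

print(should_zap(Comment("you idiot", commenter_follows_poster=False,
                         commenter_prior_violations=3)))  # True
print(should_zap(Comment("you idiot", commenter_follows_poster=True,
                         commenter_prior_violations=0)))  # False: same text, from a friend
```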

This morning, Instagram will announce that the system is going live. Type something mean or hostile or harassing, and, if the system works, it should disappear. (The person who typed it will still see it on his phone, which is one of the ways Instagram is trying to make the process hard to game.) The technology will be automatically incorporated into people’s feeds, but it will also be easy to turn off: just click the ellipsis in the settings menu and then click Comments.

The filter will be available only in English at first, but other languages will follow. Meanwhile, Instagram is also announcing that its automated spam filter now works in nine languages: English, Spanish, Portuguese, Arabic, French, German, Russian, Japanese, and Chinese.

Some hateful comments will get through; it’s the internet, after all. The new risk, of course, is false positives: innocuous or even helpful comments that the system deletes. Thomas Davidson, who helped build a machine-learning system to identify hate speech on Twitter, points out just how hard a problem Instagram is trying to solve. Machines are smart, but they can be tripped up by words that mean different things in different languages or different contexts. Here are some benign tweets that his system falsely identified as hateful:

“I didnt buy any alcohol this weekend, and only bought 20 fags. Proud that I still have 40 quid tbh”

“Intended to get pics but didn't have time.. Must be a mud race/event here this weekend.. Is like a redneck convoy out there”

“Alabama is overrated this yr the last 2 weeks has shown too many chinks in their armor WV gave them hell too.”

When asked about these particular sentences, Instagram didn’t respond specifically, noting only that there would be errors. The system is based on the judgment of the original raters, and all humans make mistakes. Algorithms are flawed too, and they can carry biases built in from the data they were trained on.

Furthermore, the system is designed to be wrong 1 percent of the time, and 1 percent isn’t zero. Before the launch, I asked Systrom whether he struggled with the choice between making the system aggressive, which would mean blocking things it shouldn’t, or passive, which would mean the opposite.

“It’s the classic problem,” he responded. “If you go for accuracy, you misclassify a bunch of stuff that was actually pretty good. So, you know, if you’re my friend and I’m just joking around with you, Instagram should let that through because you’re just joking around and I’m just giving you a hard time.… The thing we don’t want to do is have any instance where we block something that shouldn’t be blocked. The reality is it’s going to happen, so the question is: Is that margin of error worth it for all the really bad stuff that’s blocked?” He then added, “We’re not here to curb free speech. We’re not here to curb fun conversations between friends. But we are here to make sure we’re attacking the problem of bad comments on Instagram.”
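The trade-off Systrom describes can be put in numbers. In the toy run below (the scores and labels are invented), a higher cutoff blocks fewer good comments but misses more bad ones:

```python
# Toy illustration of the threshold trade-off: raising the cutoff reduces
# false positives (good comments blocked) but lets more bad comments through.
scored = [  # (invented model score, truly offensive?)
    (0.95, True), (0.85, True), (0.97, False), (0.60, True),
    (0.92, True), (0.40, False), (0.88, False), (0.99, True),
]

for threshold in (0.80, 0.90, 0.95):
    blocked = [(s, bad) for s, bad in scored if s >= threshold]
    false_positives = sum(1 for _, bad in blocked if not bad)
    missed = sum(1 for s, bad in scored if bad and s < threshold)
    print(f"threshold={threshold}: blocked={len(blocked)}, "
          f"good comments blocked={false_positives}, bad comments missed={missed}")
```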

If Systrom’s right, and the system works, Instagram could become one of the friendliest places on the internet. Or maybe it will seem too polished and controlled. Or maybe the system will start deleting friendly banter or political speech. Systrom is eager to find out. “The whole idea of machine learning is that it’s far better about understanding those nuances than any algorithm has in the past, or than any single human being could,” he says. “And I think what we have to do is figure out how to get into those gray areas and judge the performance of this algorithm over time to see if it actually improves things. Because, by the way, if it causes trouble and it doesn’t work, we’ll scrap it and start over with something new.”