How to Remove Online Hate Speech in Under 24 Hours

 In Social Networks
Note: This post was originally published on July 5th, 2016. We’ve updated the content in light of the draft bill presented by the German government on March 14th.

In July of last year, the major players in social media came together as a united front with a pact to remove hate speech within 24 hours. Facebook defines hate speech as “content that attacks people based on their perceived or actual race, ethnicity, religion, sex, gender, sexual orientation, disability or disease.” Hate speech is a serious issue, as it shapes the core beliefs of people all over the globe.

Earlier this week, the German government took their fight against online hate speech one step further. They have proposed a new law that would levy fines up to €50 million against social media companies that failed to remove or block hate speech within 24 hours of a complaint. And the proposed law wouldn’t just affect companies — it would affect individuals as well. Social media companies would be expected to appoint a “responsible contact person.” This individual could be subject to a fine up to €5 million if user complaints aren’t dealt with promptly.

Those are big numbers — the kinds of numbers that could potentially cripple a business.

As professionals with social products, we tend to rally around the shared belief that empowering societies to exchange ideas and information will create a better, more connected world. The rise of the social web has been one of the most inspiring and amazing changes in recent history, impacting humanity for the better.

Unfortunately, like many good things in the world, there tends to be a dark underbelly hidden beneath the surface. While the majority of users use social platforms to share fun content, interesting information and inspirational news, there is a small fraction of users that use these platforms to spread messages of hate.

It is important to make the distinction that we are not talking about complaints, anger, or frustration. We recognize that there is a huge difference between trash talking vs. harassing specific individuals or groups of people.

We are a protection layer for social products, and we believe everyone should have the power to share without fear of harassment or abuse. We believe that social platforms should be as expressive as possible, where everyone can share thoughts, opinions, and information freely.

We also believe that hate speech does not belong on any social platform. To this end, we want to enable all social platforms to remove hate speech as fast as possible — and not just because they could be subject to a massive fine. As professionals in the social product space, we want everyone to be able to get this right — not just the huge companies like Google.

Smaller companies may be tempted to do this manually, but the task becomes progressively harder to manage with increased scale and growth. Eventually, moderators will be spending every waking moment looking at submissions, making for an inefficient process and slow reaction time.

Instead of removing hate speech within 24 hours, we want to remove it within minutes or even seconds. That is our big, hairy, audacious goal.

Here’s how we approach this vision of ‘instant hate speech removal.’

Step 1 — Label everything.

Full disclosure: traditional filters suck. They have a bad reputation for being overly-simplistic, unable to address context, and prone to flagging false-positives. Still, leaving it up to users to report all terrible content is unfair to them and bad for your brand. Filters are not adequate for addressing something as complicated as hate speech, so we decided to invest our money into creating something different.

Using the old environmentally-friendly adage of “reduce, reuse, recycle (in that specific order)”, we first want to reduce all the noise. Consider movie ratings: all films are rated, and “R” ratings come accompanied by explanations. For instance, “Rated R for extreme language and promotion of genocide.” We want to borrow this approach and apply labels that indicate the level of risk associated with the content.

There are two immediate benefits: First, users can decide what they want to see; and second, we can flag any content above our target threshold. Of course, content that falls under ‘artistic expression’ can be subjective. Films like “Schindler’s List” are hard to watch but do not fall under hate speech, despite touching upon subjects of racism and genocide. On social media, some content may address challenging issues without promoting hate. The rating allows people to prepare themselves for what they are about to see, but we need more information to know if it is hate speech.

In the real world, we might look at the reputation of the individual to gain a better sense of what to expect. Likewise, content on social media does not exist in a vacuum; there are circumstances at play, including the reputation of the speaker. To simulate human judgment, we have built out our system with 119 features to examine the text, context, and reputation. Just looking for words like “nigga” will generate tons of noise, but if you combine that with past expressions of racism and promotions of violence, you can start sifting out the harmless stuff to determine what requires immediate action.

User reputation is a powerful tool in the fight against hate speech. If a user has a history of racism, you can prioritize reviewing — and removing — their posts above others.

The way we approach this with Community Sift is to apply a series of lenses to the reported content — internally, we call this ‘classification.’ We assess the content on a sliding scale of risk, note the frequency of user-submitted reports, the context of the message (public vs. large group vs. small group vs. 1:1), and the speaker’s reputation. Note that at this point in the process we have not done anything yet other than label the data. Now it is time to do something with it.

Step 2 — Take automatic action.

After we label the data, we can place it into three distinct ‘buckets.’ The vast majority (around 95%) will fall under ‘obviously good’, since social media predominantly consists of pictures of kittens, food, and reposted jokes. Just like there is the ‘obviously good,’ however, there is also the ‘obviously bad’.

In this case, think of the system like anti-virus technology. Every day, people are creating new ways to mess up your computer. Cybersecurity companies dedicate their time to finding the latest malware signatures so that when one comes to you, it is automatically removed. Similarly, our company uses AI to find new social signatures by processing billions of messages across the globe for our human professionals to review. The manual review is critical to reducing false positives. Just like with antivirus technology, you do not want to delete innocuous content on people’s computers, lest you end up making some very common mistakes like this one.

So what is considered ‘obviously bad?’ That will depend on the purpose of the site. Most already have a ‘terms of use’ or ‘community guidelines’ page that defines what the group is for and the rules in place to achieve that goal. When users break the rules, our clients can configure the system to take immediate action with the reported user, such as warning, muting, or banning them. The more we can automate meaningfully here, the better. When seconds matter, speed is of the essence.

Now that we have labeled almost everything as either ‘obviously good’ and ‘obviously bad,’ we can prioritize which messages to address first.

Step 3 — Create prioritized queues for human action.

Computers are great at finding the good and the bad, but what about all the stuff in the middle? Currently, the best practice is to crowdsource judgment by allowing your users to report content. Human moderation of some kind is key to maintaining and training a quality workflow to eliminate hate speech. The challenge is going to be getting above the noise of bad reports and dealing with the urgent right now.

Remember the Steven Covey model of time management? Instead of only using a simple chronologically sorted list of hate speech reports, we want to provide humans with a streamlined list of items to action quickly, with the most important items at the top of the list.

A simple technique is to have two lists. One list has all the noise of user reported content. We see that about 80–95% of those reports are junk (one user like dogs, so they report the person who likes cats). Since we labeled the data in step 1, we know a fair bit about it already: the severity of the content, the intensity of the context, and the person’s reputation. If the community thinks the content violates the terms of use and our label says it is likely bad, chances are, it is bad. Alternatively, if the label thinks it is fine, then we can wait until more people report it, thus reducing the noise.

The second list focuses on high-risk, time-sensitive content. These are rare events, so this work queue is kept minuscule. Content enters when the system thinks it is high-risk, but cannot be sure; or, when users report content that is right on the border of triggering the conditions necessary for a rating of ‘obviously bad.’ The result is a prioritized queue that humans can stay on top of and remove content from in minutes instead of days.

In our case, we devote millions of dollars a year into continual refinement and improvement with human professionals, so product owners don’t have to. We take care of all that complexity to get product owners back to the fun stuff instead — like making more amazing social products.

Step 4 — Take human action.

Product owners could use crowdsourced, outsourced, or internal moderation to handle these queues, though this depends on the scale and available resources within the team. The important thing is to take action as fast as humanly possible, starting with the questionable content that the computers cannot catch.

Step 5 — Train artificial intelligence based on decisions.

To manage the volume of reported content for a platform like Facebook or Twitter, you need to employ some level of artificial intelligence. By setting up the moderation AI to learn from human decisions, the system becomes increasingly effective at automatically detecting and taking action against emerging issues. The more precise the automation, the faster the response.

After five years of dedicated research in this field, we’ve learned a few tricks.

Machine learning AI is a powerful tool. But when it comes to processing language, it’s far more efficient to use a combination of a well-trained human team working alongside an expert system AI.

By applying the methodology above, it is now within our grasp to remove hate speech from social platforms almost instantly. Prejudice is an issue that affects everyone, and in an increasingly connected global world, it affects everyone in real-time. We have to get this right.

Since Facebook, YouTube, Twitter and Microsoft signed the EU hate speech code back in 2016, more and more product owners have taken up the fight and are looking for ways to combat intolerance in their communities. With this latest announcement by the German government— and the prospect of substantial fines in the future — we wanted to go public with our insights in hopes that someone sees something he or she could apply to a platform right now. In truth, 24 hours just isn’t fast enough, given the damage that racism, threats, and harassment can cause. Luckily, there are ways to prevent hate speech from ever reaching the community.

At Community Sift and Two Hat Security, we have a dream — that all social products have the tools at their disposal to protect their communities. The hardest problems on the internet are the most important to solve. Whether it’s hate speech, child exploitation, or rape threats, we cannot tolerate dangerous or illegal content in our communities.

If we work together, we have a real shot at making the online world a better place. And that’s never been more urgent than it is today.

Recommended Posts
Contact Us

Hello! Send us your question, and we'll get back to you as soon as possible.

Start typing and press Enter to search