How To Prevent Offensive Images From Appearing in Your Social Platform

If you manage a social platform like Instagram or Tumblr, you’ll inevitably face the task of removing offensive UGC (user-generated content) from your website, game, or app.

At first, this is simple, with only the occasional inappropriate image or three to remove. Since it seems like such a small issue, you just delete the offending images as needed. However, as your user base grows, so does the percentage of users who refuse to adhere to your terms of use.

There are some fundamental issues with human moderation:

  • It’s expensive. Manual review costs far more, since every image has to pass before flawed human eyes.
  • Moderators get tired and make mistakes. As you throw more pictures at people, they tire of hunting for needles in haystacks, and fatigue leads to missed threats.
  • Increased risk. If your platform allows for ‘instant publishing’ without an approval step, then you take on the additional risk of exposing users to offensive images.
  • Unmanageable backlogs. The more users you have, the more content you’ll receive. If you’re not careful, you can overload your moderators with massive queues full of stuff to review.
  • Humans aren’t scalable. When you throw human time at the problem, you’re spending payroll dollars on work that doesn’t move your product forward.
  • Stuck in the past. If all of your time goes to moderating, you’re reacting to problems rather than building for the future.

At Two Hat, we believe in empowering humans to make purposeful decisions with their time and brain power. We built Community Sift to take care of the crappy stuff so you don’t have to. That’s why we’ve worked with leading professionals and partners to provide a service that automatically assesses and prioritizes user-generated content based on probable risk levels.

Do you want to build and maintain your own anti-virus software and virus signatures?

Here’s the thing: you could build some sort of in-house image system to evaluate the risk of incoming UGC. But here’s a question for you: would you create your own anti-virus system just to protect your computer from viruses? Would you write your own project management system just because you need to manage projects? Or build a bug-tracking database just to track bugs? In the case of anti-virus software, that would be kind of nuts. After all, if you create your own anti-virus software, you’re the first one to get infected with new viruses as they emerge. And humans are clever… they create new viruses all the time. We know, because that’s what we deal with every day.

Offensive images are much like viruses. Instead of having to manage your own set of threat signatures, you can just use a third-party service and decrease the scope required to keep those images at bay. By using an automated text and image classification system on your user-generated content, you can protect your users at scale, without the need for an army of human moderators leafing through the content.

Here are some offensive image types we can detect:

  • Pornography
  • Graphic Violence
  • Weapons
  • Drugs
  • Custom Topics
[Image: example image analysis result]
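To make the idea concrete, here is a minimal sketch of what an automated image classification result might look like. The function, field names, topics, and scores below are invented for illustration; they are not the actual Community Sift API.

```python
# Hypothetical sketch of a third-party image classification result.
# Field names, topics, and scores are illustrative assumptions only.
def classify_image(image_url):
    """Stand-in for a real API call: returns a per-topic risk score (0-1)."""
    return {
        "image": image_url,
        "topics": {
            "pornography": 0.97,
            "graphic_violence": 0.02,
            "weapons": 0.01,
            "drugs": 0.00,
        },
    }

result = classify_image("https://example.com/upload.jpg")
# A platform might block the upload when any topic crosses its chosen threshold.
blocked = any(score >= 0.80 for score in result["topics"].values())
```

With a result shaped like this, the platform decides the threshold; the service only assigns the risk.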


Some benefits to an automated threat prevention system like Community Sift:

  • Decreased costs. Reduces moderation queues by 90% or more.
  • Increased efficiency. Prioritized queues for purposeful moderation, sorted by risk.
  • Empowers automation. Instead of pre-moderating or reacting after inappropriate images are published, you can let the system filter or prevent the images from being posted in the first place.
  • Increased scalability. You can grow your community without worrying about the scope of work required to moderate the content.
  • Safer than managing it yourself. In the case of Community Sift, we’re assessing images, videos, and text across multiple platforms. You gain a lot from the network effect.
  • Shape the community you want. You can educate your user base proactively. For example, instead of just accepting inbound pornographic images, you can warn the user that they are uploading content that breaks your terms of use. A warning system is one of the most practical ways to encourage positive user behavior in your app.
  • Get back to what matters. Instead of trying to tackle this problem, you can focus on building new features and ideas. Let’s face it… that’s the fun stuff, and that’s where you should be spending your time — coming up with new features for the community that’s gathered together because of your platform.

In the latest release of the Community Sift image classification service, the system has been rebuilt from the ground up with our partners using machine learning and artificial intelligence. This new incarnation of the image classifier was trained on millions of images so it can distinguish between, for example, a pornographic photo and a picture of a skin-colored donut.

Classifying images can be tricky. In earlier iterations of our image classification service, the system wrongly flagged plain glazed donuts and fingernails as pornographic, since both image types contain skin-tone colors. We’ve since fixed this, and the classifier now runs at a 98.14% detection rate and a 0.32% false positive rate for pornography. The remaining 1.86%? Likely blurry images or pictures taken from a distance.

On the image spectrum, some content is so severe it will always be filtered — that’s the 98.14%. Some content you will see again and again, and requires that action be taken on the user, like a ban or suspension — that’s when we factor in user reputation. The more high-risk content they post, the closer we look at their content.

Some images are on the lower end of the severity spectrum. In other words, there is less danger if they appear on the site briefly, are reported, and then removed — that’s the 1.86%.
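The severity-and-reputation spectrum described above can be sketched as a simple policy. The thresholds and action names here are assumptions for illustration, not Community Sift’s actual rules.

```python
# Sketch of a severity/reputation policy like the one described above.
# Thresholds and action names are illustrative assumptions.
def moderate(severity, prior_high_risk_posts):
    """Map an image's risk severity (0-1) and the user's history to an action."""
    if severity >= 0.90:
        return "filter"            # the severe end of the spectrum: always blocked
    if severity >= 0.50:
        # Repeat offenders draw closer scrutiny and user-level action.
        return "suspend_user" if prior_high_risk_posts >= 3 else "queue_for_review"
    return "publish"               # low severity: allow, rely on user reports
```

The key design point is that the same image can warrant different actions depending on who posted it, which is what user reputation adds.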

By combining the image classifier with the text classifier, Community Sift can also catch less-overt pornographic content. Some users may post obscene text within a picture instead of an actual photo, while other users might try to sneak in a picture with an innuendo, but with a very graphic text description.
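One simple way to combine the two classifiers is to take the strongest signal from any channel. The combination rule below is an assumption for illustration; the actual method isn’t described here.

```python
# Sketch: combine image, in-image text (e.g. OCR'd), and caption risk scores.
# Taking the maximum means obscene text baked into a picture, or a graphic
# caption on an innocuous photo, still trips the filter.
# The combination rule is an illustrative assumption.
def combined_risk(image_score, in_image_text_score, caption_score):
    return max(image_score, in_image_text_score, caption_score)
```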

Keeping on top of incoming user-generated content is a huge amount of work, but it’s absolutely worth the effort. In studies conducted by our Data Science team, we’ve observed that users who engage in social interactions are three times more likely to keep using your product and less likely to leave your community.

By creating a social platform that allows people to share ideas and information, you have the ability to create connections between people from all around the world.

Community is built through connections between like-minded individuals who bond over shared interests. Those relationships are strengthened, and harder to break, when people come together through shared beliefs. MMOs like World of Warcraft and Ultima Online mastered the art of gaming communities, resulting in long-term businesses rather than short-term wins.

To learn more about how we help shape healthy online communities, reach out to us anytime. We’d be happy to share more about our vision to create a harassment-free, healthy social web.

Can Community Sift Outperform Google Jigsaw’s Conversation AI in the War on Trolls?

There are some problems in the world that everyone should be working on, like creating a cure for cancer and ensuring that everyone in the world has access to clean drinking water.

On the internet, there is a growing epidemic of child exploitative content, and it is up to us as digital service providers to protect users from illegal and harmful content. Another issue that’s been spreading is online harassment — celebrities, journalists, game developers, and many others face an influx of hate speech and destructive threats on a regular basis.

Harassment is a real problem, not a novelty startup idea like ‘the Uber for emergency hairstylists.’ Cyberbullying and harassment affect people in real life, causing psychological damage and trauma, and sometimes even driving people to self-harm or take their own lives. Young people are particularly susceptible, but so are many adults. There is no disconnect between our virtual lives and our real lives in our interconnected, mesh-of-things society. Our actual reality is already augmented.

Issues such as child exploitation, hate speech, and harassment are problems we should be solving together.

We are excited to see that our friends at Alphabet (Google) are publicly joining the fray, taking proactive action against harassment. The internal incubator formerly known as Google Ideas will now be known as Jigsaw, with a mission to make people in the world safer. It’s encouraging to see that they are tackling the same problems that we are — countering extremism and protecting people from harassment and hate speech online.

Like Jigsaw, we also employ a team of engineers, scientists, researchers, and designers from around the world. And like the talented folks at Google, we also collaborate to solve the really tough problems using technology.

There are also some key differences in how we approach these problems!

Since the Two Hat Security team started by developing technology solutions for child-directed products, we have unique, rich, battle-tested experience with conversational subversion, grooming, and cyberbullying. We’re not talking about sitting on the sidelines here — we have hands-on experience protecting kids’ communities from high-risk content and behaviours.

Our CEO, Chris Priebe, helped code and develop the original safety and moderation solutions for Club Penguin, the children’s social network with over 300 million users acquired by The Walt Disney Company in 2007. Chris applied what he’s learned over the past 20 years of software development and security testing to Community Sift, our flagship product.

At Two Hat, we have an international, native-speaking team of professionals from all around the world (Italy, France, Germany, Brazil, Japan, India, and more). We combine their expertise with computer algorithms to validate their decisions, increase efficiency, and improve future results. Instead of depending on crowdsourced reports (which force users to see a message before they can report it), we focus on enabling platforms to sift out messages before they are published.

Google vs. Community Sift — Test Results

In a recent article published in Wired, writer Andy Greenberg put Google Jigsaw’s Conversation AI to the test. As he rightly stated in his article, “Conversation AI, meant to curb that abuse, could take down its own share of legitimate speech in the process.” This is exactly the issue we have in maintaining Community Sift — ensuring that we don’t take down legitimate free speech in the process of protecting users from hate speech.

We thought it would be interesting to run the same phrases featured in the Wired article through Community Sift to see how we’re measuring up. After all, the Google team sets a fairly high bar when it comes to quality!

From these examples, you can see that our human-reviewed language signatures provided a more nuanced classification to the messages than the artificial intelligence did. Instead of starting with artificial intelligence assigning risk, we bring conversation trends and human professionals to the forefront, then allow the A.I. to learn from their classifications.

Here’s a peek behind the scenes at some of our risk classifications.

We break sentences apart into phrase patterns, instead of just looking at individual words or the phrase on its own. Then we assign other labels to the data, such as the user’s reputation, the context of the conversation, and other variables like vertical chat to catch subversive behaviours, which is particularly important for child-directed products.
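A toy version of phrase-pattern matching with a context label might look like the sketch below. The patterns, risk levels, and escalation rule are invented examples, not Community Sift’s actual language signatures.

```python
import re

# Illustrative sketch of phrase-pattern classification with a context label.
# Patterns, risk levels, and the escalation rule are invented examples.
PHRASE_PATTERNS = [
    (re.compile(r"\bkill\s+(yourself|urself)\b", re.I), 8),  # directed threat
    (re.compile(r"\b(stupid|idiot)\b", re.I), 3),            # mild insult
]

def risk_level(message, addressed_at_user=False):
    """Return the highest risk level of any matching phrase pattern."""
    base = max(
        (level for pattern, level in PHRASE_PATTERNS if pattern.search(message)),
        default=0,
    )
    # Context label: a message aimed directly at another user carries
    # a higher risk of escalation.
    if addressed_at_user and base > 0:
        base += 1
    return base
```

Matching whole phrase patterns rather than bare words is what lets the same token score differently depending on how, and at whom, it is used.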

Since both of the previous messages contain a common swearword, we need to classify that to enable child-directed products to filter this out of their chat. However, in this context, the message is addressing another user directly, so it is at higher risk of escalation.

This phrase, while seemingly harmless to an adult audience, contains some risk for younger demographics, as it could be used inappropriately in some contexts.

As the Wired writer points out in his article, “Inside Google’s Internet Justice League and Its AI-Powered War on Trolls”, this phrase is often a response from troll victims to harassment behaviours. In our system, this is a lower-risk message.

The intention of our classification system is to empower platform owners to make informed and educated decisions about their content. Much like how the MPAA rates films or the ESRB rates video games, we rate user-generated content to empower informed decision-making.


Trolls vs. Regular Users

We’re going to go out on a limb here and say that every company cares about how their users are being treated. We want customers to be treated with dignity and respect.

Imagine you’re the owner of a social platform like a game or app. If your average cost of acquisition sits at around $4, then it will cost you a lot of money if a troll starts pushing people away from your platform.
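Using that $4 figure, the math adds up quickly. The churn numbers below are hypothetical, just to show the back-of-envelope calculation:

```python
# Back-of-envelope cost of a single troll, using the $4 acquisition
# cost above. The number of users driven away is a hypothetical figure.
cost_per_acquisition = 4.00
users_driven_away_per_month = 250        # assumption for illustration
monthly_replacement_cost = cost_per_acquisition * users_driven_away_per_month
# Replacing those users costs $1,000 every month, before counting
# any revenue lost from the users themselves.
```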

Unfortunately, customers who become trolls don’t have your community’s best interests or your marketing budget in mind — they care more about getting attention… at any cost. Trolls show up on a social platform to get the attention they’re not getting elsewhere.

Identifying who these users are is the first step to helping your community, your product, and even the trolls themselves. Here at Two Hat, we like to talk about our “Troll Performance Improvement Plans” (Troll PIPs), where we identify who your top trolls are, and work on a plan to give them a chance to reform their behaviour before taking disciplinary action. After all, we don’t tolerate belligerent behaviour or harassment in the workplace, so why would we tolerate it within our online communities?

Over time, community norms set in, and it’s difficult to reshape those norms. Take 4chan, for example. While this adult-only anonymous message board has a team of “volunteer moderators and janitors”, the site is still regularly filled with trolling, flame wars, racism, grotesque images, and pornography. And while there may be many legitimate, civil conversations lurking beneath the surface of 4chan, the site has earned a reputation that likely won’t change in the eyes of the public.

Striking a balance between protecting free speech and preventing online harassment is tricky, yet necessary. If you allow trolls to harass other users, you are inadvertently enabling one person to cause another psychological harm. However, if you suppress every message, you’ll just annoy users who are trying to express themselves.


We’ve spent the last four years improving and advancing our technology to help make the internet great again. It’s a fantastic compliment to have a company as amazing as Google jumping into the space we’ve been focused on for so long, where we’re helping social apps and games like Dreadnought, PopJam, and ROBLOX.

Having Google join the fray shows that harassment is a big problem worth solving, and it also helps show that we have already made some tremendous strides to pave the way for them. We have had conversations with the Google team about the Riot Games’ experiments and learnings about toxic behaviours in games. Seeing them citing the same material is a great compliment, and we are honored to welcome them to the battle against abusive content online.

Back at Two Hat, we are already training the core Community Sift system on huge data sets: we’re under contract to process four billion messages a day, across multiple languages, in real time. As we all continue to train artificial intelligence to recognize toxic behaviors like harassment, we can better serve the real people using these social products. We can give users meaningful choices, like opting out of rape threats if they so choose. After all, we believe a woman shouldn’t have to censor herself, questioning whether that funny meme will result in a rape or death threat against her family. We’d much rather enable people to filter out inappropriate messages from that special kind of idiot who threatens to rape women.

While it’s a shame that we have to develop technology to curb behaviours that would be obviously inappropriate (and in some cases, illegal) in real-life, it is encouraging to know that there are so many groups taking strides to end hate speech now. From activist documentaries and pledges like The Bully Project, inspiring people to stand up against
bullying, to Alphabet/Google’s new Jigsaw division, we are on-track to start turning the negative tides in a new direction. And we are proud to be a part of such an important movement.