3 Takeaways From The 16th International Bullying Prevention Conference

I recently had the privilege of speaking on the keynote gaming panel at the 16th Annual International Bullying Prevention Conference, an event themed Kindness & Compassion: Building Healthy Communities.

The International Bullying Prevention Association is a 501(c)(3) nonprofit organization founded in 2003, when grassroots practitioners and researchers came together to convene the first conference in the US focused entirely on bullying prevention. It hosts an annual conference in Chicago where attendees can benefit from workshops, poster sessions, and TED-inspired sessions that deliver both hands-on solutions and theoretical, research-based presentations.

Below, I focus on the sessions and discussions I participated in regarding cyberbullying, and present a brief account of the takeaways I brought back to Canada and Two Hat.

1. User-centric approaches to online safety

A few people on the tech panels referred to the concept of “user-centric safety” — letting users set their own boundaries and comfort levels for online interactions. Catherine Teitelbaum, a renowned Global Trust & Safety executive who heads up Trust & Safety for Twitch, is a big champion of the idea and spoke about how the concept of “safety” varies from person to person. Offering customized control over the user experience, as Twitch does with AutoMod by empowering channel owners to set their own chat filtering standards, is the way of the future.

Online communities are diverse and unique, and often platforms contain many communities with different norms. The ability to tailor chat settings to those unique characteristics is critical.

Wouldn’t it be great for users to be able to choose their safety settings and what they are comfortable with – the same way they can set their privacy settings on online platforms? What if a mother wants to enjoy an online platform with her child, but wants to ensure that they don’t see any sexual language? Perhaps a gamer just wants to relax and play a few rounds without experiencing the violent language that might be the norm in a mature game centered around combat. The more agency and flexibility we give to users and players online, the better we can cater to the different expectations we all have when we log in.
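To make the idea concrete, here is a minimal sketch of what user-chosen safety settings could look like under the hood. The category names, scores, and thresholds below are invented for illustration; they are not any platform's actual settings or API.

```python
# Illustrative sketch of user-centric safety settings.
# Categories and risk scores are hypothetical, not any platform's real API.

# Each user chooses a tolerance (0 = block anything flagged, 10 = allow all)
# per content category, much like they would set privacy settings.
DEFAULT_SETTINGS = {"sexual": 5, "violence": 5, "profanity": 5}

def is_visible(message_scores: dict, user_settings: dict) -> bool:
    """Show a message only if every category score is within the user's tolerance."""
    for category, score in message_scores.items():
        if score > user_settings.get(category, DEFAULT_SETTINGS.get(category, 5)):
            return False
    return True

# A parent's profile: no sexual language at all, low tolerance elsewhere.
parent = {"sexual": 0, "violence": 3, "profanity": 2}
# A player in a mature combat game: violent language is the norm.
gamer = {"sexual": 5, "violence": 9, "profanity": 8}

message = {"violence": 7, "profanity": 4}  # e.g. trash talk after a match
print(is_visible(message, parent))  # False: violence 7 exceeds tolerance 3
print(is_visible(message, gamer))   # True: within both tolerances
```

The point of the sketch is that the same message can be acceptable to one user and unacceptable to another; the platform's job is to honor both preferences at once.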

2. Shared Responsibility, and the Importance of Diverse Voices

The concept of sharing knowledge and contributing to the greater good of online safety practices across the tech industry also came up. Here at Two Hat, we believe that ushering in a new age of content moderation, and empowering an Internet that fulfills its true purpose of connecting human beings, is only possible through a shared responsibility approach — an idea that surfaced at the conference as well. It will take the efforts of everyone involved to truly change things for the better: academia, industry, government, and users.

In his 2018 book “Farsighted: How We Make the Decisions That Matter the Most”, Steven Johnson writes about how complex decisions require a comprehensive mapping of all the factors involved, and how that mapping benefits enormously from a set of diverse perspectives. The best, farsighted decisions compile the voices of a variety of people. The intricate human interaction systems we are creating on the Internet require complex decision-making at both the inception and design stages. Right now, however, those decisions are rarely informed by multi-disciplinary lenses. No wonder we are so shortsighted when it comes to anticipating issues with online behaviour and online harms.

A true, collaborative community of practice is needed. We need that rising tide that floats all boats, as my good friend Dr. Kim Voll says.

3. Empathy as an Antidote

Another good friend, Dr. Sameer Hinduja, was one of the speakers at the conference. Dr. Hinduja is a Professor in the School of Criminology and Criminal Justice at Florida Atlantic University and Co-Director of the Cyberbullying Research Center, recognized internationally for his groundbreaking work on cyberbullying and safe social media use. You will be hard-pressed to find someone more dedicated to the well-being of others.

He talked about how empathy can be used to prevent bullying, pulling from research and practical applications that have improved peer-to-peer relationships. He stressed the importance of practices that lead youth beyond the traditional approach of “being in someone else’s shoes” to feel empathy, to a point where they truly value others. This is so important, and it makes me wonder: How can we design human interaction systems online where we perceive each other as valuable individuals and are constantly reminded of our shared humanity? How do we create platforms that discourage purely transactional interaction? How do we bring offline social cues into the online experience? How can we design interaction proxies to reduce friction between users — and ultimately lead us to more positive and productive online spaces? I don’t have all the answers — no one does. But I am encouraged by the work of people like Dr. Hinduja, the Trust and Safety team at Twitch, the incredible Digital Civility efforts of Roblox and my friend Laura Higgins, their Director of Community Safety & Digital Civility, and events like The International Bullying Prevention Conference.

Moving Forward

Cyberbullying is one of the many challenges facing online platforms today. And it’s not just cyberbullying — there is a wider umbrella of behaviors that we need to better understand and define, including harassment, reputation tarnishing, doxxing, and more. We need to facilitate better digital interactions in general: through how we design online spaces, through how we encourage positive and productive exchanges, and by understanding that it will take a wider perspective, informed by many lenses, to create online spaces that fulfill their true potential.

If you’re reading this, you’re likely in the industry, and you’re definitely a participant in online communities. So what can you do, today, to make a difference? How can industry better collaborate to advance online safety practices?

Three Techniques to Protect Users From Cyberbullying

CEO Chris Priebe founded Two Hat Security back in 2012 with a big goal: to protect people of all ages from online bullying. Over the last six years, we’ve been given the opportunity to help some of the largest online games, virtual worlds, and messaging apps in the world grow healthy, engaged communities on their platforms.

Organizations like The Cybersmile Foundation provide crucial services, including educational resources and 24-hour global support, to victims of cyberbullying and online abuse.

But what about the platforms themselves? What can online games and social networks do to prevent cyberbullying from happening in the first place? And how can community managers play their part?

In honour of #StopCyberbullyingDay 2018 and our official support of the event, today we are sharing our top three techniques that community managers can implement to stop cyberbullying and abuse in their communities.

1. Share community guidelines.
Clear community standards are the building blocks of a healthy community. Sure, they won’t automatically prevent users from engaging in toxic or disruptive behaviour, but they go a long way in setting language and behaviour expectations up front.

Post guidelines where every community member can see them. For a forum, pin a “Forum Rules, Read Before Posting” post at the top of the page. For comment sections, include a link or popup next to the comment box. Online games can even embed code of conduct reminders within their reporting feature. Include consequences — what can users expect to happen if policies are broken?

Don’t just include what not to do — include tips for what you would like your users to do, as well. Want the community to encourage and support each other? Tell them!

2. Use proactive moderation.
Once community standards are clearly communicated, community managers need a method to filter, escalate, and review abusive content.

Often, that involves choosing the right moderation software. Proactive moderation means filtering cyberbullying and abuse before it ever reaches the community. Most community managers use either a simple profanity filter or a full content moderation tool. Profanity filters rely on a strict blacklist/whitelist to detect harassment, but they’re not sophisticated or accurate enough to understand context or nuance, and some only work for the English language.

Instead, find a content moderation tool that can accurately identify cyberbullying, remove it in real-time — and ultimately prevent users from experiencing abuse.
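To see why a plain blacklist falls short, here is a toy example (the word list is invented for illustration). Exact-match filtering misses trivial obfuscation and can’t distinguish harassment from harmless self-deprecation — exactly the context problem a proper moderation tool has to solve.

```python
import re

# Minimal blacklist-style filter, for illustration only (word list is invented).
BLACKLIST = {"idiot", "loser"}

def blacklist_flags(text: str) -> bool:
    """Flag a message if any token appears verbatim on the blacklist."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return any(token in BLACKLIST for token in tokens)

print(blacklist_flags("you are such a loser"))  # True: exact match caught
print(blacklist_flags("you are such a l0ser"))  # False: trivial obfuscation slips through
# False positive: self-deprecation flagged as if it were harassment.
print(blacklist_flags("I always feel like a loser after ranked"))  # True
```

Both failure modes — evasion and false positives — are why context-aware moderation beats word lists alone.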

Of course, platforms should still always have a reporting system. But proactive moderation means that users only have to report questionable, “grey-area” content or false positives, instead of truly damaging content like extreme bullying and hate speech.

3. Reward positive users.
A positive user experience leads to increased engagement, loyalty, and profits.

Part of a good experience involves supporting the community’s code of conduct. Sanctioning users who post abusive comments or attack other community members is an essential technique in proactive moderation.

But with so much attention paid to disruptive behaviour, positive community members can start to feel like their voices aren’t heard.

That’s why we encourage community managers to reinforce positive behaviour by rewarding power users.

Emotional rewards add a lot of value, cost nothing, and take very little time. Forum moderators can upvote posts that embody community standards. Community managers can comment publicly on encouraging or supportive posts. Mods and community managers can even send private messages to users who contribute to community health and well-being.

Social rewards like granting access to exclusive content and achievement badges work, too. Never underestimate the power of popularity and peer recognition when it comes to encouraging healthy behaviour!

When choosing a content moderation tool to aid in proactive moderation, look for software that measures user reputation based on behaviour. This added technology takes the guesswork and manual review out of identifying positive users.
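One simple way to measure reputation from behaviour — this scoring scheme is a hypothetical sketch, not how any particular product computes it — is a decayed running score over a user's moderation events, so recent behaviour counts more than old behaviour:

```python
# Hypothetical reputation score: recent behaviour outweighs old behaviour.
# Event weights and decay factor are invented for illustration.
EVENT_WEIGHTS = {
    "helpful_post": +2.0,
    "post_approved": +0.5,
    "post_filtered": -1.0,
    "confirmed_abuse": -5.0,
}
DECAY = 0.9  # older events fade a little with each new event

def reputation(events: list) -> float:
    """Fold a user's event history into a single decayed score."""
    score = 0.0
    for event in events:
        score = score * DECAY + EVENT_WEIGHTS[event]
    return score

history = ["post_approved", "helpful_post", "helpful_post", "post_filtered"]
print(round(reputation(history), 2))  # 2.78: mostly positive, one filtered post
```

A score like this lets moderation software surface consistently positive users automatically, instead of relying on manual review.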

#StopCyberbullyingDay 2018, organized by the Cybersmile Foundation.

The official #StopCyberbullyingDay takes place once every year, on the third Friday in June. But for community managers, moderators, and anyone who works with online communities (including those of us at Two Hat Security), protecting users from bullying and harassment is a daily task. Today, start out by choosing one of our three healthy community building recommendations — and watch your community thrive.

After all, doesn’t everyone deserve to share online without fear of harassment or abuse?

Gamers Unite to End Online Harassment

Southern New Hampshire University sophomore Abbey Sager has accomplished more in her eighteen years than most of us do in a lifetime.

In addition to studying Business Administration and Nonprofit Management in university, Abbey is the founder and president of Diverse Gaming Coalition, a 501(c)3 non-profit organization dedicated to ending online bullying and harassment.

Abbey Sager, founder of Diverse Gaming Coalition

Bullied so badly as a teen that she dropped out of high school, Abbey later pursued her GED and completed her high school education. Determined not to let the same thing happen to other bullied teens, she founded Diverse Gaming Coalition. The coalition distinguishes itself from other anti-bullying organizations by making fun an essential pillar of its initiatives — no mind-numbing PowerPoint presentations or bland speeches allowed.

We spoke to Abbey about her experiences with online harassment, how she thinks online games can promote healthy interactions, and the Diverse Gaming Coalition’s current initiatives.

Tell us about your experiences with harassment in online games.

Being a female gamer, online harassment happens almost daily. Plus, harassment can be a two-way street. Sometimes, people don’t care at all and will spew obscenities over their microphone. Other times, people choose to send mean, hurtful messages. That includes adding me, finding other personal accounts, finding out information about me, threatening me, and doing things people wouldn’t normally do through voice chat, let alone to your face.

In games that involve or promote voice chat, whenever I speak, questions like “Are you a girl?” or “How old are you?” are common. I would escape real-life bullying to find solace in video games with my friends, but sometimes it just made matters worse.

On one occasion, I was playing a game that involved voice chat in the game itself. It didn’t take too long for some random person to find my address, the names of my parents, and announce it to the entire game. I felt extremely unsafe, and it even made me not want to play games for months.

We’ve seen major changes in the industry this year. For example, Twitch released AutoMod, which allows broadcasters to moderate their own channels. Overwatch has been updating their reporting system. And companies like Riot Games are pioneering innovative initiatives like the League of Legends High School Clubs in Australia and New Zealand. As a gamer, what do you think games can do to further promote fair play & digital citizenship in their products? What can players do?

Both the games themselves and players play a big role in promoting fair play. For one, gamers have control over what they say and do. For instance, if a teammate is flaming or harassing them, most of the time ignoring the bully gets them to stop. You can’t fuel the flames if you don’t react.

Plus, games can take more action by taking reports more seriously and finding ways to stop offenders from continuing their behaviour.

What initiatives are the Coalition working on?

Currently, Diverse Gaming Coalition is working on various initiatives, each catering to people with an interest in a specific topic.

Our main project is our Comic Project, which includes two parts. The first part is a full 16-page comic focusing on a story of bullying, friendship, and differences within others.

A sneak peek at the next Diverse Gaming Coalition comic book

The second part includes monthly webcomics that focus on different topics each month to cater to today’s prominent social issues. You can read some of our online comics on our blog.

Our other initiatives focus on the online world: social media, the internet, and beyond. We want to spread inclusivity and create safe spaces on all of the platforms where we’re present. That’s why we created our “Diverse Gamers” groups on platforms such as Discord, Twitch, Steam, and League of Legends. By doing this, we intend to create an environment that caters to everyone, while promoting those who do good on their platforms.

We’re always actively working on other projects. Everyone can follow our social media and keep up to date by subscribing to our mailing list on our site.

Explain the significance of diverse gaming. What does diversity mean to you? Why is it important?

At Diverse Gaming Coalition, our focus is to end bullying, while collaborating with other causes to support people from all walks of life. We do this by incorporating youth into everything we do, including events, workshops, streaming, gaming, and anything of interest! We understand how dull and repetitive anti-bullying organizations can be. The Diverse Gaming team is fueled by Millennials, so we know the bland messaging that anti-bullying campaigns tend to center on, and we strive to be different. Our main goal is to relay all our information in a lively and fun way.

With this in mind, a lot of people don’t get the recognition they deserve simply because they come from a diverse background. In video games, LGBTQ+ persons don’t get the representation they deserve, women are overly sexualized, and black people have little to no representation. Diversity is what fuels creativity, compassion, and overall kindness.

The Coalition at work — and play. ; )

What do you hope to achieve with the coalition?

I hope that our organization can promote peace, love, and positivity in the world through our work. We want people that would have never gotten involved with bullying initiatives in the past to come join us and realize, “That’s actually a pretty big issue that we should work on ending”.

How can gamers and non-gamers get involved with Diverse Gaming Coalition?

You can find out more about how to get involved here.

We’re always looking for blog writers, and anyone passionate about ending bullying, on and offline. Feel free to send us an email at contact@diversegaming.co!

At Two Hat Security, we believe that everyone has the right to share online without fear of harassment or abuse.

With our chat filter and moderation software Community Sift, we empower social and gaming platforms to build healthy, engaged online communities, all while protecting their brand and their users from high-risk content.

Contact us today to discuss how we can help you grow a thriving community of users in a safe, welcoming, diverse environment.

Want more articles like this? Subscribe to our newsletter and never miss a blog!



Top Three Reasons You Should Meet us at Gamescom

Heading to Gamescom or devcom this year? It’s a huge conference, and you have endless sessions, speakers, exhibits, and meetings to choose from. Your time is precious — and limited. How do you decide where you go, and who you talk to?

Here are three reasons we think you should meet with us while you’re in Cologne.

You need practical community-building tips.

Got trolls?

Our CEO & founder Chris Priebe is giving an awesome talk at devcom. He’ll be talking about the connection between trolls, community toxicity, and increased user churn. The struggle is real, and we’ve got the numbers to prove it.

Hope to build a thriving, engaged community in your game? Want to increase retention? Need to reduce your moderation workload so you can focus on fun stuff like shipping new features?

Chris has been in the online safety and security space for 20 years now and has learned a few lessons along the way. He’ll be sharing practical, time-and-industry-proven moderation strategies that actually work.

Check out Chris’s talk on Monday, August 21st, from 14:30 – 15:00.

You don’t want to get left behind in a changing industry.

This is the year the industry gets serious about user-generated content (UGC) moderation.

With recent Facebook Live incidents (remember this and this?), new hate speech legislation in Germany, and the latest online harassment numbers from the Pew Research Center, online behavior is a hot topic.

We’ve been studying online behavior for years now. We even sat down with Kimberly Voll and Ivan Davies of Riot Games recently to talk about the challenges facing the industry in 2017.

Oh, and we have a kinda crazy theory about how the internet ended up this way. All we’ll say is that it involves Maslow’s hierarchy of needs.

So, it’s encouraging to see that more and more companies are acknowledging the importance of smart, thoughtful, and intentional content moderation.

If you’re working on a game/social network/app in 2017, you have to consider how you’ll handle UGC (whether it’s chat, usernames, or images). Luckily, you don’t have to figure it out all by yourself.


You deserve success.

And we love this stuff.

Everyone says it, but it’s true: We really, really care about your success. And smart moderation is key to any social product’s success in a crowded and highly competitive market.

Increasing user retention, reducing moderation workload, keeping communities healthy — these are big deals to us. We’ve been fortunate enough to work with hugely successful companies like Roblox, Supercell, Kabam, and more, and we would love to share the lessons we’ve learned and best practices with you.

We’re sending three of our very best Two Hatters/Community Sifters to Germany. Sharon has a wicked sense of humor (and the biggest heart around), Mike has an encyclopedic knowledge of Bruce Springsteen lore, and Chris — well, he’s the brilliant, free-wheeling brain behind the entire operation.

So, if you’d like to meet up and chat at Gamescom, Sharon, Mike, and Chris will be in Cologne from Monday, August 21st to Friday, August 25th. Send us a message at hello@twohat.com, and one of them will be in touch.


How Are You Celebrating Stop Cyberbullying Day?

Bullied: A Life in Two Stories

One. She wakes with a heaviness in her heart. It’s only Tuesday; still four school days left to go if she includes today. She glances at her phone, swipes to open the screen. Seventeen notifications. Texts, message threads, every app lit up with a new comment.

She ignores them all. She already knows what they say, anyway.

She gets dressed, carefully avoiding the mirror. Eats her breakfast in silence; soggy little rainbow circles, drenched in milk.

Her phone vibrates. She glances at the screen. It’s briefly lit with a message from a number she doesn’t recognize. u r faaaaaaat lil piggy, it says. She looks away, reads the back of the cereal box instead.

Breakfast finished, she shrugs into her backpack. Time to face the day. Time to leave.

Tucked away in her back pocket, her phone vibrates again.

Two. He double-clicks the bronze shield on his desktop. The game opens with a burst of heroic drums and horns. He enters his username and password, selects his favorite server, armors up for battle, and strides into the town square where several members of his clan wait. The square is crowded, teeming with barrel-chested warriors, tall mages draped in black cloaks, hideous pop-eyed goblins hopping from foot to foot.

He scans the usernames, looking for one in particular. Doesn’t see it. Feels his shoulders loosen and his back relax. He hadn’t realized how much tension he was holding inside, just looking for the name.

“who is ready to fight?” he types in the room chat.

A private message flashes in the lower right corner of his screen.

“hey faggot loser im baaaaaack”
“when u r goin to kill yrslf”
“log off n die loser”

His shoulders tighten again. It’s going to be a long session.


Those are only two examples of online bullying. There are countless others.

In 2017, there are no safe spaces for the bullied. We are all connected, day and night. Kids can’t disengage. We can’t expect them to put their iPhones away, stop using social networks, and walk away from the internet.

Online communities are just as meaningful as offline communities. And for kids and teens, they can — and should — be spaces that encourage personal growth, curiosity, and discovery. But too often, the online space is riddled with obstacles that stop kids from reaching their true potential.

The internet grew up fast.

We’re only just starting to realize that we’ve created a culture of bullying and abuse. So it’s up to us to change the culture.

As adults, it’s our job to ensure that when kids and teens are online, they are safe. Safe to be themselves, safe to share who they really are, and safe from abuse.

Today we celebrate Stop Cyberbullying Day. Launched by the Cybersmile Foundation in 2012, it’s a dedicated day to campaign for a better internet — for a truly inclusive space where everyone is free to share without fear of harassment or abuse.

Here at Two Hat Security, we believe in a world free of online bullying, harassment, and child exploitation. Today’s message of solidarity and empathy is core to our vision of a better internet. No one can fix this problem on their own, which is why days like today are so important.

Let’s come together — as families, as companies, as co-workers, and as citizens of this new digital world — and take a stand against bullying. The Cybersmile Foundation has some great ideas on their site — like Tweeting something nice to a person you follow or coming up with a new anti-bullying slogan.

We’ll continue to find new ways to protect online communities around the world. And we’ll keep trying to change the culture and the industry, every day. We hope you’ll join us.

Quora: How big a problem are bullying and harassment of pre-teens and teenagers on social media?

The numbers indicate that cyberbullying and harassment are huge problems for young people on social media. A 2016 report from the Cyberbullying Research Center indicates that 33.8% of students between 12 and 17 were victims of cyberbullying in their lifetime. Meanwhile, 11.5% of students between 12 and 17 indicated that they had engaged in cyberbullying in their lifetime.

Cyberbullying is different from “traditional” bullying in that it happens 24/7. For victims, there is no escape. It’s not confined to school or the playground. Kids and teens connect through social media, so for many, there is no option to simply go offline.

Even more troubling is the connection between cyberbullying and child exploitation. At Two Hat Security, we’ve identified a cycle in which child predators groom young victims, who are tricked into taking explicit photos which are then shared online; this leads to bullying and harassment from peers and strangers. Finally, the victim suffers from depression, engages in self-harm, and sometimes — tragically — commits suicide. It’s a heartbreaking cycle.

Cyberbullying and online harassment are profoundly dangerous and alarming behaviors with real, often severe and sometimes fatal, consequences for victims.

Social media platforms have options, though. AI-based text and image filters like Community Sift are the first lines of defense against cyberbullying. Purposeful, focused moderation of User Generated Content (UGC) is the next step. And finally, education and honest, open discussions about the effects of cyberbullying on real victims are crucial. The more we talk about it, the more comfortable victims will feel speaking out about their experiences.

Originally published on Quora, featured in Huffington Post and Forbes


Quora: Does it make sense for media companies to disallow comments on articles?

It’s not hard to understand why more and more media companies are inclined to turn off comments. If you’ve spent any time reading the comments section on many websites, you’re bound to run into hate speech, vitriol, and abuse. It can be overwhelming and highly unpleasant. But the thing is, even though it feels like they’re everywhere, hate speech, vitriol, and abuse are only present in a tiny percentage of comments. Do the math, and you find that thoughtful, reasonable comments are the norm. Unfortunately, toxic voices almost always drown out healthy voices.

But it doesn’t have to be that way.

The path of least resistance is tempting. It’s easy to turn off comments — it’s a quick fix, and it always works. But there is a hidden cost. When companies remove comments, they send a powerful message to their best users: Your voice doesn’t matter. After all, users who post comments are engaged, they’re interested, and they’re active. If they feel compelled to leave a comment, they will probably also feel compelled to return, read more articles, and leave more comments. Shouldn’t media companies cater to those users, instead of the minority?

Traditionally, most companies approach comment moderation in one of two ways, both of which are ineffective and inefficient:

  • Pre-moderation. Costly and time-consuming, pre-moderating everything requires a large team of moderators. As companies scale up, it can become impossible to review every comment before it’s posted.
  • Crowdsourcing. A band-aid solution that doesn’t address the bigger problem. When companies depend on users to report the worst content, they force their best users to become de facto moderators. Engaged and enthusiastic users shouldn’t have to see hate speech and harassment. They should be protected from it.

I’ve written before about techniques to help build a community of users who give high-quality comments. The most important technique? Proactive moderation.

My company Two Hat Security has been training and tuning AI since 2012 using multiple unique data sets, including comments sections, online games, and social networks. In our experience, proactive moderation uses a blend of AI-powered automation, human review, real-time user feedback, and crowdsourcing.

It’s a balancing act that combines what computers do best (finding harmful content and taking action on users in real-time) and what humans do best (reviewing and reporting complex content). Skim the dangerous content — things like hate speech, harassment, and rape threats — off the top using a finely-tuned filter that identifies and removes it in real-time. That way no one has to see the worst comments. You can even customize the system to warn users when they’re about to post dangerous content. Then, your (much smaller and more efficient) team of moderators can review reported comments, and even monitor comments as they’re posted for anything objectionable that slips through the cracks.
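The balancing act described above can be sketched as a tiny triage loop. The thresholds and risk scores here are hypothetical stand-ins for whatever classifier a platform actually uses: the machine removes clearly dangerous content in real time, routes grey-area content to human moderators, and publishes the rest.

```python
from dataclasses import dataclass, field

# Illustrative triage sketch: thresholds and risk scores are invented.
BLOCK_THRESHOLD = 8   # clearly dangerous: remove automatically
REVIEW_THRESHOLD = 5  # grey area: queue for human review

@dataclass
class Moderation:
    published: list = field(default_factory=list)
    review_queue: list = field(default_factory=list)
    blocked: list = field(default_factory=list)

    def triage(self, comment: str, risk: int) -> None:
        if risk >= BLOCK_THRESHOLD:
            self.blocked.append(comment)        # machine: real-time removal
        elif risk >= REVIEW_THRESHOLD:
            self.review_queue.append(comment)   # human: nuanced judgment
        else:
            self.published.append(comment)      # healthy majority goes through

mod = Moderation()
mod.triage("Great article, thanks!", risk=0)
mod.triage("This take is garbage", risk=6)      # ambiguous: a person decides
mod.triage("<extreme hate speech>", risk=9)
print(len(mod.published), len(mod.review_queue), len(mod.blocked))  # 1 1 1
```

The design choice is that no user ever has to see the highest-risk content, while humans spend their time only on the cases machines can’t judge.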

Comments sections don’t have to be the darkest places on the internet. Media companies have a choice — they can continue to let the angriest, loudest, and most hateful voices drown out the majority, or they can give their best users a platform for discussion and debate.

Originally published on Quora


Can Community Sift Outperform Google Jigsaw’s Conversation AI in the War on Trolls?

There are some problems in the world that everyone should be working on, like creating a cure for cancer and ensuring that everyone in the world has access to clean drinking water.

On the internet, there is a growing epidemic of child exploitative content, and it is up to us as digital service providers to protect users from illegal and harmful content. Another issue that’s been spreading is online harassment — celebrities, journalists, game developers, and many others face an influx of hate speech and destructive threats on a regular basis.

Harassment is a real problem — not a novelty startup idea like ‘the Uber for emergency hairstylists.’ Cyberbullying and harassment affect people in real life, causing psychological damage and trauma, and sometimes driving people to self-harm or take their own lives. Young people are particularly susceptible, but so are many adults. There is no disconnect between our virtual lives and our real lives in our interconnected, mesh-of-things society. Our actual reality is already augmented.

Issues such as child exploitation, hate speech, and harassment are problems we should be solving together.

We are excited to see that our friends at Alphabet (Google) are publicly joining the fray, taking proactive action against harassment. The internal incubator formerly known as Google Ideas will now be known as Jigsaw, with a mission to make people in the world safer. It’s encouraging to see that they are tackling the same problems that we are — countering extremism and protecting people from harassment and hate speech online.

Like Jigsaw, we also employ a team of engineers, scientists, researchers, and designers from around the world. And like the talented folks at Google, we also collaborate to solve the really tough problems using technology.

There are also some key differences in how we approach these problems!

Since the Two Hat Security team started by developing technology solutions for child-directed products, we have unique, rich, battle-tested experience with conversational subversion, grooming, and cyberbullying. We’re not talking about sitting on the sidelines here — we have hands-on experience protecting kids’ communities from high-risk content and behaviours.

Our CEO, Chris Priebe, helped code and develop the original safety and moderation solutions for Club Penguin, the children’s social network with over 300 million users acquired by The Walt Disney Company in 2007. Chris applied what he’s learned over the past 20 years of software development and security testing to Community Sift, our flagship product.

At Two Hat, we have an international, native-speaking team of professionals from all around the world — Italy, France, Germany, Brazil, Japan, India, and more. We combine their expertise with computer algorithms that validate their decisions, increase efficiency, and improve future results. Instead of depending on crowdsourced reports (which require that users see a message before they can report it), we focus on enabling platforms to sift out messages before they are deployed.

Google vs. Community Sift — Test Results

In a recent article published in Wired, writer Andy Greenberg put Google Jigsaw’s Conversation AI to the test. As he rightly stated in his article, “Conversation AI, meant to curb that abuse, could take down its own share of legitimate speech in the process.” This is exactly the issue we have in maintaining Community Sift — ensuring that we don’t take down legitimate free speech in the process of protecting users from hate speech.

We thought it would be interesting to run the same phrases featured in the Wired article through Community Sift to see how we’re measuring up. After all, the Google team sets a fairly high bar when it comes to quality!

From these examples, you can see that our human-reviewed language signatures provided a more nuanced classification to the messages than the artificial intelligence did. Instead of starting with artificial intelligence assigning risk, we bring conversation trends and human professionals to the forefront, then allow the A.I. to learn from their classifications.

Here’s a peek behind the scenes at some of our risk classifications.

We break sentences apart into phrase patterns, instead of just looking at the individual words or the phrase on its own. Then we assign other labels to the data, such as the user’s reputation, the context of the conversation, and other variables like vertical chat to catch subversive behaviours, which is particularly important for child-directed products.

Since both of the previous messages contain a common swearword, we need to classify that to enable child-directed products to filter this out of their chat. However, in this context, the message is addressing another user directly, so it is at higher risk of escalation.

This phrase, while seemingly harmless to an adult audience, contains some risk for younger demographics, as it could be used inappropriately in some contexts.

As the Wired writer points out in his article, “Inside Google’s Internet Justice League and Its AI-Powered War on Trolls”, this phrase is often a response from troll victims to harassment behaviours. In our system, this is a lower-risk message.
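The phrase-pattern idea above can be pictured with a toy sketch. Everything here is invented for illustration — the pattern, the stand-in vocabulary, and the three risk tiers are not Community Sift's actual signatures — but it shows why a directed message ("you are X") classifies higher than the same word in isolation:

```python
import re

# Toy phrase pattern: a swearword aimed directly at another user
# ("you are a ...") carries a higher risk of escalation than the
# same word used on its own, mirroring the point above.
DIRECTED = re.compile(r"\byou(?:'re| are)\s+(?:an?\s+)?(\w+)", re.IGNORECASE)
SWEARS = {"idiot", "jerk"}  # stand-in vocabulary for illustration

def classify(msg: str) -> str:
    """Return a toy risk tier for a chat message."""
    m = DIRECTED.search(msg)
    if m and m.group(1).lower() in SWEARS:
        return "high"    # addressed at another user: risk of escalation
    if any(w in msg.lower().split() for w in SWEARS):
        return "medium"  # swearword present, but not directed at anyone
    return "low"
```

A real system would also fold in reputation, conversation context, and subversion checks such as vertical chat, as described above.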

The intention of our classification system is to empower platform owners to make informed and educated decisions about their content. Much like how the MPAA rates films or the ESRB rates video games, we rate user-generated content to empower informed decision-making.


Trolls vs. Regular Users

We’re going to go out on a limb here and say that every company cares about how their users are being treated. We want customers to be treated with dignity and respect.

Imagine you’re the owner of a social platform like a game or app. If your average cost of acquisition sits at around $4, then it will cost you a lot of money if a troll starts pushing people away from your platform.

Unfortunately, customers who become trolls don’t have your community’s best interests or your marketing budget in mind — they care more about getting attention… at any cost. Trolls show up on a social platform to get the attention they’re not getting elsewhere.

Identifying who these users are is the first step to helping your community, your product, and even the trolls themselves. Here at Two Hat, we like to talk about our “Troll Performance Improvement Plans” (Troll PIPs), where we identify who your top trolls are, and work on a plan to give them a chance to reform their behaviour before taking disciplinary action. After all, we don’t tolerate belligerent behaviour or harassment in the workplace, so why would we tolerate it within our online communities?

Over time, community norms set in, and it’s difficult to reshape those norms. Take 4chan, for example. While this adult-only anonymous message board has a team of “volunteer moderators and janitors”, the site is still regularly filled with trolling, flame wars, racism, grotesque images, and pornography. And while there may be many legitimate, civil conversations lurking beneath the surface of 4chan, the site has earned a reputation that likely won’t change in the eyes of the public.

Striking a balance between protecting free speech and preventing online harassment is tricky, yet necessary. If you allow trolls to harass other users, you are inadvertently enabling them to cause psychological harm. However, if you suppress every message, you will simply annoy users who are just trying to express themselves.


We’ve spent the last four years improving and advancing our technology to help make the internet great again. It’s a fantastic compliment to have a company as amazing as Google jumping into the space we’ve been focused on for so long, where we’re helping social apps and games like Dreadnought, PopJam, and ROBLOX.

Having Google join the fray shows that harassment is a big problem worth solving, and that we have already made tremendous strides to pave the way. We have had conversations with the Google team about Riot Games’ experiments and learnings about toxic behaviours in games. Seeing them cite the same material is a great compliment, and we are honored to welcome them to the battle against abusive content online.

Back at Two Hat, we are already training the core Community Sift system on huge data sets — we’re under contract to process four billion messages a day across multiple languages in real time. As we continue to train artificial intelligence to recognize toxic behaviors like harassment, we can better serve the real people using these social products. We can give users meaningful choices, like the ability to opt out of seeing rape threats. After all, we believe a woman shouldn’t have to self-censor, wondering whether a funny meme will result in a rape or death threat against her family. We’d much rather enable people to filter out inappropriate messages from the kind of people who make those threats.

While it’s a shame that we have to develop technology to curb behaviours that would be obviously inappropriate (and in some cases, illegal) in real life, it is encouraging to know that there are so many groups taking strides to end hate speech now. From activist documentaries and pledges like The Bully Project, inspiring people to stand up against bullying, to Alphabet/Google’s new Jigsaw division, we are on track to start turning the negative tides in a new direction. And we are proud to be a part of such an important movement.

How to Remove Online Hate Speech in Under 24 Hours

Note: This post was originally published on July 5th, 2016. We’ve updated the content in light of the draft bill presented by the German government on March 14th.

In July of last year, the major players in social media came together as a united front with a pact to remove hate speech within 24 hours. Facebook defines hate speech as “content that attacks people based on their perceived or actual race, ethnicity, religion, sex, gender, sexual orientation, disability or disease.” Hate speech is a serious issue, as it shapes the core beliefs of people all over the globe.

Earlier this week, the German government took its fight against online hate speech one step further. It has proposed a new law that would levy fines of up to €50 million against social media companies that fail to remove or block hate speech within 24 hours of a complaint. And the proposed law wouldn’t just affect companies — it would affect individuals as well. Social media companies would be expected to appoint a “responsible contact person.” This individual could be subject to a fine of up to €5 million if user complaints aren’t dealt with promptly.

Those are big numbers — the kinds of numbers that could potentially cripple a business.

As professionals with social products, we tend to rally around the shared belief that empowering societies to exchange ideas and information will create a better, more connected world. The rise of the social web has been one of the most inspiring and amazing changes in recent history, impacting humanity for the better.

Unfortunately, like many good things in the world, there tends to be a dark underbelly hidden beneath the surface. While the majority of users use social platforms to share fun content, interesting information and inspirational news, there is a small fraction of users that use these platforms to spread messages of hate.

It is important to make the distinction that we are not talking about complaints, anger, or frustration. We recognize that there is a huge difference between trash talking vs. harassing specific individuals or groups of people.

We are a protection layer for social products, and we believe everyone should have the power to share without fear of harassment or abuse. We believe that social platforms should be as expressive as possible, where everyone can share thoughts, opinions, and information freely.

We also believe that hate speech does not belong on any social platform. To this end, we want to enable all social platforms to remove hate speech as fast as possible — and not just because they could be subject to a massive fine. As professionals in the social product space, we want everyone to be able to get this right — not just the huge companies like Google.

Smaller companies may be tempted to do this manually, but the task becomes progressively harder to manage with increased scale and growth. Eventually, moderators will be spending every waking moment looking at submissions, making for an inefficient process and slow reaction time.

Instead of removing hate speech within 24 hours, we want to remove it within minutes or even seconds. That is our big, hairy, audacious goal.

Here’s how we approach this vision of ‘instant hate speech removal.’

Step 1 — Label everything.

Full disclosure: traditional filters suck. They have a bad reputation for being overly simplistic, unable to address context, and prone to false positives. Still, leaving it up to users to report all terrible content is unfair to them and bad for your brand. Filters are not adequate for addressing something as complicated as hate speech, so we decided to invest in building something different.

Using the old environmentally-friendly adage of “reduce, reuse, recycle (in that specific order)”, we first want to reduce all the noise. Consider movie ratings: all films are rated, and “R” ratings come accompanied by explanations. For instance, “Rated R for extreme language and promotion of genocide.” We want to borrow this approach and apply labels that indicate the level of risk associated with the content.

There are two immediate benefits: First, users can decide what they want to see; and second, we can flag any content above our target threshold. Of course, content that falls under ‘artistic expression’ can be subjective. Films like “Schindler’s List” are hard to watch but do not fall under hate speech, despite touching upon subjects of racism and genocide. On social media, some content may address challenging issues without promoting hate. The rating allows people to prepare themselves for what they are about to see, but we need more information to know if it is hate speech.
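The two benefits above can be sketched in a few lines. The numeric scale and the threshold values here are illustrative assumptions, not Community Sift's real ratings:

```python
# Hypothetical risk scale for illustration: 0 = harmless, up to
# 5 = illegal/high-risk. A platform (or a user) picks a threshold,
# much like choosing which film ratings suit an audience.

def visible(message_risk: int, audience_threshold: int) -> bool:
    """Benefit 1: users decide what they want to see.
    A message is shown only at or below the audience's threshold."""
    return message_risk <= audience_threshold

def flagged(message_risk: int, review_threshold: int = 4) -> bool:
    """Benefit 2: anything above the target threshold is flagged
    for attention rather than published silently."""
    return message_risk > review_threshold

CHILD_THRESHOLD = 1   # a child-directed product allows very little
MATURE_THRESHOLD = 4  # a mature combat game allows much more
```

With these assumed settings, a level-3 message would be hidden in the child community but shown in the mature one — the same content, rated once, handled differently per audience.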

In the real world, we might look at the reputation of the individual to gain a better sense of what to expect. Likewise, content on social media does not exist in a vacuum; there are circumstances at play, including the reputation of the speaker. To simulate human judgment, we have built out our system with 119 features to examine the text, context, and reputation. Just looking for words like “nigga” will generate tons of noise, but if you combine that with past expressions of racism and promotions of violence, you can start sifting out the harmless stuff to determine what requires immediate action.

User reputation is a powerful tool in the fight against hate speech. If a user has a history of racism, you can prioritize reviewing — and removing — their posts above others.

The way we approach this with Community Sift is to apply a series of lenses to the reported content — internally, we call this ‘classification.’ We assess the content on a sliding scale of risk, note the frequency of user-submitted reports, the context of the message (public vs. large group vs. small group vs. 1:1), and the speaker’s reputation. Note that at this point in the process we have not done anything yet other than label the data. Now it is time to do something with it.
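The lenses described above — severity, report frequency, context, and reputation — can be combined into a single label. This is a minimal sketch with invented weights; the real system uses far more features:

```python
from dataclasses import dataclass

# Illustrative context weights: a public message reaches more people
# than a 1:1 conversation, so it scores higher. Values are assumptions.
CONTEXT_WEIGHT = {"public": 1.0, "large_group": 0.8,
                  "small_group": 0.6, "1:1": 0.4}

@dataclass
class Report:
    severity: int      # 0-5 sliding scale from content analysis
    report_count: int  # how many users reported it
    context: str       # public / large_group / small_group / 1:1
    reputation: float  # 0.0 (trusted) .. 1.0 (history of abuse)

def label(r: Report) -> float:
    """Combine the lenses into one risk score. As noted above, this
    step only labels the data; no action is taken yet."""
    base = r.severity * CONTEXT_WEIGHT[r.context]
    crowd = min(r.report_count, 10) / 10  # cap crowd influence at 10 reports
    return base * (1 + crowd) * (1 + r.reputation)
```

The reputation multiplier captures the earlier point: identical words from a user with a history of abuse deserve a higher-priority look.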

Step 2 — Take automatic action.


After we label the data, we can place it into three distinct ‘buckets.’ The vast majority (around 95%) will fall under ‘obviously good’, since social media predominantly consists of pictures of kittens, food, and reposted jokes. Just like there is the ‘obviously good,’ however, there is also the ‘obviously bad’.
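The three-bucket triage reduces to a pair of thresholds. The cutoff values here are placeholders; in practice each platform tunes its own:

```python
def bucket(score: float, low: float = 1.0, high: float = 6.0) -> str:
    """Triage a labeled message into one of the three buckets above.
    Thresholds are illustrative; real cutoffs depend on the platform."""
    if score < low:
        return "obviously_good"  # ~95% of traffic: publish untouched
    if score >= high:
        return "obviously_bad"   # act automatically, no human needed
    return "needs_review"        # the middle goes to humans (step 3)
```

Everything that lands in `needs_review` feeds the prioritized queues described in step 3.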

In this case, think of the system like anti-virus technology. Every day, people create new ways to mess up your computer. Cybersecurity companies dedicate their time to finding the latest malware signatures so that when one reaches you, it is automatically removed. Similarly, we use AI to find new social signatures by processing billions of messages across the globe for our human professionals to review. The manual review is critical to reducing false positives. Just as antivirus vendors do not want to quarantine legitimate files on people’s computers, we do not want to delete innocuous content.

So what is considered ‘obviously bad?’ That will depend on the purpose of the site. Most already have a ‘terms of use’ or ‘community guidelines’ page that defines what the group is for and the rules in place to achieve that goal. When users break the rules, our clients can configure the system to take immediate action with the reported user, such as warning, muting, or banning them. The more we can automate meaningfully here, the better. When seconds matter, speed is of the essence.
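One common way to configure those automatic actions is an escalation ladder: warn on a first offense, then mute, then ban. The tiers below are assumptions for illustration, not a documented Community Sift policy:

```python
from collections import defaultdict

# Illustrative escalation ladder: repeat offenders climb toward a ban.
LADDER = ["warn", "mute_1h", "mute_24h", "ban"]

offenses = defaultdict(int)  # user id -> number of prior violations

def act(user: str) -> str:
    """Return the automatic action for this user's next rule violation,
    escalating one step per repeat offense."""
    step = min(offenses[user], len(LADDER) - 1)
    offenses[user] += 1
    return LADDER[step]
```

Because the lookup is immediate, the action fires the moment the rule is broken — which matters when seconds count.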

Now that we have labeled almost everything as either ‘obviously good’ or ‘obviously bad,’ we can prioritize which messages to address first.

Step 3 — Create prioritized queues for human action.

Computers are great at finding the good and the bad, but what about all the stuff in the middle? Currently, the best practice is to crowdsource judgment by allowing your users to report content. Human moderation of some kind is key to maintaining and training a quality workflow to eliminate hate speech. The challenge is rising above the noise of junk reports and dealing with urgent items right now.

Remember the Stephen Covey model of time management? Instead of a simple chronologically sorted list of hate speech reports, we want to provide humans with a streamlined list of items to action quickly, with the most important items at the top.

A simple technique is to keep two lists. The first holds all the noise of user-reported content. About 80–95% of those reports are junk (one user likes dogs, so they report the person who likes cats). Since we labeled the data in step 1, we already know a fair bit about it: the severity of the content, the intensity of the context, and the person’s reputation. If the community thinks the content violates the terms of use and our label says it is likely bad, chances are it is bad. Alternatively, if the label thinks it is fine, we can wait until more people report it, thus reducing the noise.

The second list focuses on high-risk, time-sensitive content. These are rare events, so this work queue is kept minuscule. Content enters when the system thinks it is high-risk, but cannot be sure; or, when users report content that is right on the border of triggering the conditions necessary for a rating of ‘obviously bad.’ The result is a prioritized queue that humans can stay on top of and remove content from in minutes instead of days.
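The two-list routing above can be sketched as follows. The thresholds, the agreement rule (three reports), and the queue names are all illustrative assumptions:

```python
import heapq

ordinary_reports = []  # list one: routine user reports, reviewed in order
urgent_queue = []      # list two: min-heap on -risk, highest risk popped first

def route(item_id: str, label_risk: float, report_count: int) -> str:
    """Route a reported item using its step-1 label plus report volume."""
    if label_risk >= 6.0:
        return "auto_remove"              # obviously bad: no queue needed
    if label_risk >= 4.0:
        heapq.heappush(urgent_queue, (-label_risk, item_id))
        return "urgent"                   # borderline high-risk: minutes, not days
    if label_risk >= 1.0 and report_count >= 3:
        ordinary_reports.append(item_id)  # label and crowd agree: review it
        return "review"
    return "ignore"                       # likely a junk report; wait for more
```

Because genuinely urgent items are rare, the second queue stays tiny, and moderators can clear it in minutes.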

In our case, we devote millions of dollars a year to continual refinement and improvement by human professionals, so product owners don’t have to. We take care of all that complexity so product owners can get back to the fun stuff — like making more amazing social products.

Step 4 — Take human action.

Product owners could use crowdsourced, outsourced, or internal moderation to handle these queues, though this depends on the scale and available resources within the team. The important thing is to take action as fast as humanly possible, starting with the questionable content that the computers cannot catch.

Step 5 — Train artificial intelligence based on decisions.

To manage the volume of reported content for a platform like Facebook or Twitter, you need to employ some level of artificial intelligence. By setting up the moderation AI to learn from human decisions, the system becomes increasingly effective at automatically detecting and taking action against emerging issues. The more precise the automation, the faster the response.
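The feedback loop can be sketched minimally: record each human decision against a content signature, and automate only once a signature has been judged often and consistently enough. The thresholds and names here are assumptions, not the production system:

```python
from collections import Counter

decisions = {}  # content signature -> Counter of human verdicts

def record(signature: str, verdict: str) -> None:
    """Log a human moderator's decision for this content signature."""
    decisions.setdefault(signature, Counter())[verdict] += 1

def auto_verdict(signature: str, min_seen: int = 20, min_agree: float = 0.95):
    """Return the automated verdict once humans have agreed consistently,
    or None while the signature still needs human review."""
    counts = decisions.get(signature)
    if not counts or sum(counts.values()) < min_seen:
        return None
    verdict, n = counts.most_common(1)[0]
    if n / sum(counts.values()) >= min_agree:
        return verdict
    return None
```

The `min_agree` guard is the point: automation only takes over where human judgment is near-unanimous, which is what keeps precision high as coverage grows.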

After five years of dedicated research in this field, we’ve learned a few tricks.

Machine learning AI is a powerful tool. But when it comes to processing language, it’s far more efficient to use a combination of a well-trained human team working alongside an expert system AI.

By applying the methodology above, it is now within our grasp to remove hate speech from social platforms almost instantly. Prejudice is an issue that affects everyone, and in an increasingly connected global world, it affects everyone in real-time. We have to get this right.

Since Facebook, YouTube, Twitter and Microsoft signed the EU hate speech code back in 2016, more and more product owners have taken up the fight and are looking for ways to combat intolerance in their communities. With this latest announcement by the German government — and the prospect of substantial fines in the future — we wanted to go public with our insights in hopes that someone sees something he or she could apply to a platform right now. In truth, 24 hours just isn’t fast enough, given the damage that racism, threats, and harassment can cause. Luckily, there are ways to prevent hate speech from ever reaching the community.

At Community Sift and Two Hat Security, we have a dream — that all social products have the tools at their disposal to protect their communities. The hardest problems on the internet are the most important to solve. Whether it’s hate speech, child exploitation, or rape threats, we cannot tolerate dangerous or illegal content in our communities.

If we work together, we have a real shot at making the online world a better place. And that’s never been more urgent than it is today.