To Mark Zuckerberg

Re: Building Global Communities

“There are billions of posts, comments and messages across our services each day, and since it’s impossible to review all of them, we review content once it is reported to us. There have been terribly tragic events — like suicides, some live streamed — that perhaps could have been prevented if someone had realized what was happening and reported them sooner. There are cases of bullying and harassment every day, that our team must be alerted to before we can help out. These stories show we must find a way to do more.” — Mark Zuckerberg

This is hard.

I built a company (Two Hat Security) that’s also contracted to process 4 billion chat messages, comments, and photos a day. We specifically look for high-risk content in real-time, such as bullying, harassment, threats of self-harm, and hate speech. It is not easy.

“There are cases of bullying and harassment every day, that our team must be alerted to before we can help out. These stories show we must find a way to do more.”

I must ask — why wait until cases get reported?

If you wait for a report to be filed, hasn't someone already been hurt? Some things that are reported can never be unseen. Some, like Amanda Todd, can never have that image retracted. Others post while enraged or drunk, and those words, like air, cannot be taken back. As the updated saying goes, "What happens in Vegas stays on Facebook, Twitter, and Instagram forever," so maybe some things should never go live. What if you could proactively create a safe global community for people by preventing (or pausing) personal attacks in real time instead?

This, it appears, is key to creating the next vision point.

“How do we help people build an informed community that exposes us to new ideas and builds common understanding in a world where every person has a voice?”

One of the biggest challenges to free speech online in 2017 is that we grant a small group of toxic trolls the 'right' to silence a much larger group of people. Ironically, these users' claim to free speech often ends up becoming hate speech and harassment, destroying the opportunity for anyone else to speak up, much like bullies in the lunchroom. Why would someone share their deepest thoughts if others would just attack them? The dream of real conversations gets lost beneath a blanket of fear. Instead, we get puppy pictures, non-committal thumbs up, and posts that are 'safe.' If we want to create an inclusive community, people need to be able to share ideas and information online without fear of abuse from toxic bullies. I applaud your manifesto for calling this out, and for calling us all to work together to achieve it.

But how?

Fourteen years ago, we both set out to change the social networks of our world. We were both entrepreneurial engineers, hacking together experiments using the power of code. It was back in the days of MySpace, Friendster, and, later, Orkut. We had to browse to every single friend we had on MySpace just to see if they had written anything new. To solve this, I created myTWU, a social stream of the latest blogs and photos from fellow students, alumni, and sports teams on our internal social tool. Our office was in charge of building online learning, but we realized that education is not just about ideas but about community. It was not enough to dump curriculum online for independent study; people needed places of belonging.

A year later “The Facebook” came out. You reached beyond the walls of one University and over time opened it to the world.

So I pivoted. As part of our community, we had a little chat room where you could waddle around and talk to others. It was a skin of a little experiment my brother was running. He was caught by surprise when it grew to a million users, which showed how much people long for community and places of belonging. In those days, chat rooms were the dark part of the web, and it was nearly impossible to keep up with the creative ways users tried to hurt each other.

So I helped my brother code the safety mechanisms for his little social game. That little social game grew into a global community with over 300 million users, and Disney bought it in 2007. I remember huddling in my brother's basement, rapidly building the backend to fix the latest trick to get around the filter. Club Penguin was huge.

After a decade of kids breaking the filter and of building tools to moderate the millions upon millions of user reports, I had a breakthrough. By then I was working in security at Disney, with the job of hacking everything with a Mouse logo on it. In my training, we learned that if someone DDoSes a network or tries to break the system, you find a signature of what they are doing and turn up the firewall against it.

“What if we did that with social networks and social attacks?” I thought.

I’ve spent the last five years building an AI system with signatures and firewalls as it relates to social content. As we process billions of messages with Community Sift, we build reputation scores in real-time. We know who the trolls are — they leave digital signatures everywhere they go. Moreover, I can adjust the AI to turn up the sensitivity only where it counts. In so doing we drastically dropped false positives, opened communication with the masses while detecting the highest risk when it matters.

I had to build whole new AI algorithms to do this, since traditional methods only reach 90–95% accuracy. That is great for most AI tasks, but when it comes to cyberbullying, hate speech, and suicide, the stakes are too high for the current state of the art in NLP.

“To prevent harm, we can build social infrastructure to help our community identify problems before they happen. When someone is thinking of suicide or hurting themselves, we’ve built infrastructure to give their friends and community tools that could save their life.”

Since Two Hat is a security company, we are uniquely positioned to prevent harm with the largest vault of high-risk signatures, covering things like grooming conversations and CSAM (child sexual abuse material). In collaboration with our partners at the RCMP (Royal Canadian Mounted Police), we are developing a system to predict and prevent child exploitation before it happens, complementing the efforts our friends at Microsoft have made with PhotoDNA. We are training AI models to find CSAM, and have lined up millions of dollars of Ph.D. research to give students world-class experience working with our team.

“Artificial intelligence can help provide a better approach. We are researching systems that can look at photos and videos to flag content our team should review. This is still very early in development, but we have started to have it look at some content, and it already generates about one-third of all reports to the team that reviews content for our community.”

It is incredible what deep learning has accomplished in the last few years. And although we have been able to achieve near-perfect recall in finding pornography with our current work, there is an explosion of new topics we are training on. Further, the subtleties you outline are key.

I look forward to two changes to resolve this:

  1. I call on networks to trust that their users have resilience. It is not imperative to find everything, just the worst. If all content can be sorted from possibly bad to absolutely bad, we can draw a line in the sand: these things can never be unseen, and those the community will find. In so doing, we don't have to wait for technology to reach perfection, nor wait for users to report things we already know are bad. Let computers do what they do well, and let humans deal with the rest.
  2. I call on users to be patient. Yes, sometimes in our ambition to prevent harm we may mistakenly flag a Holocaust photo. We know this is terrible, but we ask for your patience. Computer vision is like a child still learning: a child who sees that image for the first time is still deeply impacted and concerned. Join us in reporting these problems and helping train the system to mature and discern.
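The first call above, sorting content from possibly bad to absolutely bad and drawing a line in the sand, can be sketched as a simple triage. The severity scores and the position of the line are invented for illustration:

```python
# Toy triage: everything at or above the line is removed by the machine
# before anyone sees it; everything below is published and left for the
# community to report if needed.

def triage(items, line=0.9):
    """Split (content, severity) pairs into machine-removed vs
    community-handled lists. severity is 0.0 (fine) to 1.0 (worst)."""
    auto_removed = [content for content, severity in items if severity >= line]
    community = [content for content, severity in items if severity < line]
    return auto_removed, community

items = [
    ("graphic threat", 0.95),   # cannot be unseen: machine removes it
    ("mild insult", 0.4),       # community resilience can handle this
    ("sarcastic jab", 0.2),
]
removed, rest = triage(items)
assert removed == ["graphic threat"]
assert rest == ["mild insult", "sarcastic jab"]
```

The design choice is exactly the one argued for above: the machine does not need to be perfect about everything, only reliable about the worst.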

However, you are right that many more strides need to happen to get this where it needs to be. We need to call on the world's greatest thinkers. Of all the hard problems to solve, our next one is child pornography (CSAM). Some things cannot be unseen, and there are images that, each time they are seen, re-victimize over and over again. We are the first to gain access to hundreds of thousands of CSAM images and train deep learning models on them. We are pouring millions of dollars into this and putting the best minds on the topic. It is a problem that must be solved.

And before I move on, I want to give a shout-out to your incredible team, whom I have had the chance to volunteer with at hack-a-thons and who have helped me think through how to get this done. Your company's commitment to social good is outstanding, and your people have helped many other companies and not-for-profits.

“The guiding principles are that the Community Standards should reflect the cultural norms of our community, that each person should see as little objectionable content as possible, and each person should be able to share what they want while being told they cannot share something as little as possible. The approach is to combine creating a large-scale democratic process to determine standards with AI to help enforce them.”

That is cool. I have a couple of the main pieces needed for that completed, if you need them.

“The idea is to give everyone in the community options for how they would like to set the content policy for themselves. Where is your line on nudity? On violence? On graphic content? On profanity?”

I had the chance to swing by Twitter 18 months ago. I took their sample firehose and have been running it through our system. We label each message across 1.8 million of our signatures, then put together a quick demo of what it would be like if you could turn off the toxicity on Twitter. It shows low-, medium-, and high-risk content. I would not expect to see anything severe on there, as they have recently tried to clean it up.

My suggestion to Twitter was to allow each user the option to choose what they want to see. The suggestion was that a global policy gets rid of clear infractions against terms of use for content that can never be unseen such as gore or CSAM. After the global policy is applied, you can then let each user choose their own risk and tolerance levels.
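That two-stage suggestion can be sketched directly. The topic names, risk levels, and banned list below are hypothetical stand-ins, not a real policy:

```python
# Two-stage visibility check (illustrative only): a global policy first
# removes content that can never be unseen, then each user's own
# tolerance setting decides what remains visible to them.

GLOBAL_BANNED = {"gore", "csam"}          # stage 1: no opt-in possible
RISK_LEVELS = ["low", "medium", "high"]   # ordered from mildest to worst

def visible_to(message, user_tolerance):
    """Return True if a labelled message should be shown to this user.

    message: dict with 'topic' and 'risk' (one of RISK_LEVELS)
    user_tolerance: the highest risk level this user opted in to see
    """
    if message["topic"] in GLOBAL_BANNED:
        return False  # global policy wins regardless of user settings
    # stage 2: the user's own line on nudity, violence, profanity, etc.
    return RISK_LEVELS.index(message["risk"]) <= RISK_LEVELS.index(user_tolerance)

# Gore is never shown, even to the most tolerant user:
assert not visible_to({"topic": "gore", "risk": "high"}, "high")
# Profanity is shown or hidden according to each user's own choice:
assert visible_to({"topic": "profanity", "risk": "medium"}, "high")
assert not visible_to({"topic": "profanity", "risk": "medium"}, "low")
```

The key property is that tightening or loosening one user's tolerance never changes what anyone else sees.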

We are committed to helping you and the Facebook team with your mission to build a safe, supportive, and inclusive community. We are already discussing ways we can help your team, and we are always open to feedback. Good luck on your journey to connect the world, and I hope we cross paths next time I am in the Valley.

Chris Priebe
CEO, Two Hat Security


Originally published on Medium 

How To Prevent Offensive Images From Appearing in Your Social Platform

If you manage a social platform like an Instagram or a Tumblr, you’ll inevitably face the task of having to remove offensive UGC (user-generated content) from your website, game, or app.

At first, this is simple, with only the occasional inappropriate image or three to remove. Since it seems like such a small issue, you just delete the offending images as needed. However, as your user base grows, so does the percentage of users who refuse to adhere to your terms of use.

There are some fundamental issues with human moderation:

  • It’s expensive. It costs much more to review images manually, as each message needs to be reviewed by flawed human eyes.
  • Moderators get tired and make mistakes. As you throw more pictures at people, they tend to get sick of looking for needles in haystacks and start to get fatigued.
  • Increased risk. If your platform allows for ‘instant publishing’ without an approval step, then you take on the additional risk of exposing users to offensive images.
  • Unmanageable backlogs. The more users you have, the more content you’ll receive. If you’re not careful, you can overload your moderators with massive queues full of stuff to review.
  • Humans aren’t scalable. When you’re throwing human time at the problem, you’re spending human resource dollars on things that aren’t about your future.
  • Stuck in the past. If you’re spending all of your time moderating, you’re wasting precious time reacting to things rather than building for the future.

At Two Hat, we believe in empowering humans to make purposeful decisions with their time and brain power. We built Community Sift to take care of the crappy stuff so you don’t have to. That’s why we’ve worked with leading professionals and partners to provide a service that automatically assesses and prioritizes user-generated content based on probable risk levels.

Do you want to build and maintain your own anti-virus software and virus signatures?

Here’s the thing — you could go and build some sort of image system in-house to evaluate the risk of incoming UGC. But here’s a question for you: would you create your own anti-virus system just to protect yourself from viruses on your computer? Would you make your own project management system just because you need to manage projects? Or would you build a bug-tracking database system just to track bugs? In the case of anti-virus software, that would be kind of nuts. After all, if you create your own anti-virus software, you’re the first one to get infected with new viruses at they emerge. And humans are clever… they create new viruses all the time. We know because that’s what we deal with every day.

Offensive images are much like viruses. Instead of having to manage your own set of threat signatures, you can just use a third-party service and decrease the scope required to keep those images at bay. By using an automated text and image classification system on your user-generated content, you can protect your users at scale, without the need for an army of human moderators leafing through the content.

Here are some offensive image types we can detect:

  • Pornography
  • Graphic Violence
  • Weapons
  • Drugs
  • Custom Topics


Some benefits to an automated threat prevention system like Community Sift:

  • Decreased costs. Reduces moderation queues by 90% or more.
  • Increased efficiency. Prioritized queues for purposeful moderation, sorted by risk.
  • Empowers automation. Instead of pre-moderating or reacting after inappropriate images are published, you can let the system filter or prevent the images from being posted in the first place.
  • Increased scalability. You can grow your community without worrying about the scope of work required to moderate the content.
  • Safer than managing it yourself. In the case of Community Sift, we’re assessing images, videos, and text across multiple platforms. You gain a lot from the network effect.
  • Shape the community you want. You can educate your user base proactively. For example, instead of just accepting inbound pornographic images, you can warn the user that they are uploading content that breaks your terms of use. A warning system is one of the most practical ways to encourage positive user behavior in your app.
  • Get back to what matters. Instead of trying to tackle this problem, you can focus on building new features and ideas. Let’s face it… that’s the fun stuff, and that’s where you should be spending your time — coming up with new features for the community that’s gathered together because of your platform.

In the latest release to the Community Sift image classification service, the system has been built from the ground up with our partners using machine learning and artificial intelligence. This new incarnation of the image classifier was trained on millions of images to be able to distinguish the difference between a pornographic photo and a picture of a skin-colored donut, for example.

Classifying images can be tricky. In earlier iterations of our image classification service, the system wrongly believed that plain glazed donuts and fingernails were pornographic, since both image types contained a skin-tone color. We've since fixed this, and the classifier now runs at a 98.14% detection rate and a 0.32% false positive rate for pornography. The remaining 1.86%? Likely blurry images or pictures taken from a distance.
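For readers who want to sanity-check numbers like a 98.14% detection rate and a 0.32% false positive rate, here is how they fall out of a labelled evaluation set. The counts below are invented for illustration; they are not our evaluation data:

```python
# Detection rate (recall) and false positive rate from a confusion
# matrix. tp/fn count truly pornographic images; fp/tn count clean ones.

def rates(tp, fn, fp, tn):
    detection_rate = tp / (tp + fn)       # share of pornographic images caught
    false_positive_rate = fp / (fp + tn)  # share of clean images wrongly flagged
    return detection_rate, false_positive_rate

# Hypothetical evaluation set of 10,000 pornographic + 10,000 clean images:
det, fpr = rates(tp=9_814, fn=186, fp=32, tn=9_968)
assert round(det, 4) == 0.9814
assert round(fpr, 4) == 0.0032
```

Note that the two rates come from different denominators, which is why a tiny false positive rate can coexist with a much larger miss rate.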

On the image spectrum, some content is so severe it will always be filtered — that’s the 98.14%. Some content you will see again and again, and requires that action be taken on the user, like a ban or suspension — that’s when we factor in user reputation. The more high-risk content they post, the closer we look at their content.

Some images are on the lower end of the severity spectrum. In other words, there is less danger if they appear on the site briefly, are reported, and then removed — that’s the 1.86%.

By combining the image classifier with the text classifier, Community Sift can also catch less-overt pornographic content. Some users may post obscene text within a picture instead of an actual photo, while other users might try to sneak in a picture with an innuendo, but with a very graphic text description.
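One simple way to combine the two signals, sketched here with invented function names and scores rather than the Community Sift API, is to take the worse of the two:

```python
# Illustrative fusion of the image and text classifiers: flag if either
# signal is high, so obscene text inside an innocuous-looking picture
# (or an innuendo photo with a graphic caption) is still caught.

def combined_risk(image_score, text_score):
    """Both scores are 0.0 (benign) to 1.0 (worst); return the max so
    neither channel can hide the other's signal."""
    return max(image_score, text_score)

# A harmless-looking photo carrying graphic text is still flagged:
assert combined_risk(image_score=0.1, text_score=0.92) == 0.92
# Both signals low means low combined risk:
assert combined_risk(0.1, 0.2) == 0.2
```

Taking the max is deliberately conservative; a production system might weight the channels or learn the fusion, but the principle of not letting one channel mask the other stays the same.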

Keeping on top of incoming user-generated content is a huge amount of work, but it’s absolutely worth the effort. In some of the studies conducted by our Data Science team, we’ve observed that users who engage in social interactions are 3x more likely to continue using your product and less likely to leave your community.

By creating a social platform that allows people to share ideas and information, you have the ability to create connections between people from all around the world.

Community is built through connections from like-minded individuals that bond through shared interests. The relationships between people in a community are strengthened and harder to break when individuals come together through shared beliefs. MMOs like World of Warcraft and Ultima Online mastered the art of gaming communities, resulting in long-term businesses rather than short-term wins.

To learn more about how we help shape healthy online communities, reach out to us anytime. We’d be happy to share more about our vision to create a harassment-free, healthy social web.

Can Community Sift Outperform Google Jigsaw’s Conversation AI in the War on Trolls?

There are some problems in the world that everyone should be working on, like creating a cure for cancer and ensuring that everyone in the world has access to clean drinking water.

On the internet, there is a growing epidemic of child exploitative content, and it is up to us as digital service providers to protect users from illegal and harmful content. Another issue that’s been spreading is online harassment — celebrities, journalists, game developers, and many others face an influx of hate speech and destructive threats on a regular basis.

Harassment is a real problem — not a novelty startup idea like 'the Uber for emergency hairstylists.' Cyberbullying and harassment affect people in real life, causing psychological damage and trauma, and sometimes even leading people to self-harm or take their own lives. Young people are particularly susceptible, but so are many adults. There is no disconnect between our virtual lives and our real lives in our interconnected, mesh-of-things society. Our actual reality is already augmented.

Issues such as child exploitation, hate speech, and harassment are problems we should be solving together.

We are excited to see that our friends at Alphabet (Google) are publicly joining the fray, taking proactive action against harassment. The internal incubator formerly known as Google Ideas will now be known as Jigsaw, with a mission to make people in the world safer. It’s encouraging to see that they are tackling the same problems that we are — countering extremism and protecting people from harassment and hate speech online.

Like Jigsaw, we also employ a team of engineers, scientists, researchers, and designers from around the world. And like the talented folks at Google, we also collaborate to solve the really tough problems using technology.

There are also some key differences in how we approach these problems!

Since the Two Hat Security team started by developing technology solutions for child-directed products, we have unique, rich, battle-tested experience with conversational subversion, grooming, and cyberbullying. We’re not talking about sitting on the sidelines here — we have hands-on experience protecting kids’ communities from high-risk content and behaviours.

Our CEO, Chris Priebe, helped code and develop the original safety and moderation solutions for Club Penguin, the children’s social network with over 300 million users acquired by The Walt Disney Company in 2007. Chris applied what he’s learned over the past 20 years of software development and security testing to Community Sift, our flagship product.

At Two Hat, we have an international, native-speaking team of professionals from all around the world — Italy, France, Germany, Brazil, Japan, India, and more. We combine their expertise with computer algorithms to validate their decisions, increase efficiency, and improve future results. Instead of depending on crowdsourced results (which require that users be exposed to a message before they can report it), we focus on enabling platforms to sift out messages before they are deployed.

Google vs. Community Sift — Test Results

In a recent article published in Wired, writer Andy Greenberg put Google Jigsaw’s Conversation AI to the test. As he rightly stated in his article, “Conversation AI, meant to curb that abuse, could take down its own share of legitimate speech in the process.” This is exactly the issue we have in maintaining Community Sift — ensuring that we don’t take down legitimate free speech in the process of protecting users from hate speech.

We thought it would be interesting to run the same phrases featured in the Wired article through Community Sift to see how we’re measuring up. After all, the Google team sets a fairly high bar when it comes to quality!

From these examples, you can see that our human-reviewed language signatures provided a more nuanced classification to the messages than the artificial intelligence did. Instead of starting with artificial intelligence assigning risk, we bring conversation trends and human professionals to the forefront, then allow the A.I. to learn from their classifications.

Here’s a peak behind the scenes at some of our risk classifications.

We break sentences apart into phrase patterns, instead of just looking at the individual words or the phrase on its own. Then we assign other labels to the data, such as the user's reputation, the context of the conversation, and other variables like vertical chat to catch subversive behaviours, which is particularly important for child-directed products.
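Two of those ideas can be sketched concretely. The patterns, risk labels, and the vertical-chat check below are deliberately minimal illustrations, not the production signatures:

```python
import re

# Phrase patterns distinguish the same word in different constructions:
# "kill yourself" directed at a person is high risk, while "kill the
# dragon" is ordinary game talk. (Patterns invented for illustration.)
PHRASE_PATTERNS = [
    (re.compile(r"\bkill your ?self\b", re.I), "high"),
    (re.compile(r"\bkill (the|that) (boss|dragon)\b", re.I), "low"),
]

def classify(message):
    for pattern, risk in PHRASE_PATTERNS:
        if pattern.search(message):
            return risk
    return "none"

def vertical_word(messages):
    """Vertical chat: read the first letter of consecutive messages,
    which subversive users exploit to spell words the filter would
    otherwise catch."""
    return "".join(m[0].lower() for m in messages if m)

assert classify("go kill yourself") == "high"
assert classify("let's kill the dragon together") == "low"
# 'a'-'s'-'s' hidden down the first letters of three innocent lines:
assert vertical_word(["anyone here?", "so bored", "seriously"]) == "ass"
```

A real system would check the vertical string against the same phrase signatures as horizontal text; the sketch only shows how the hidden string is recovered.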

Since both of the previous messages contain a common swearword, we need to classify that to enable child-directed products to filter this out of their chat. However, in this context, the message is addressing another user directly, so it is at higher risk of escalation.

This phrase, while seemingly harmless to an adult audience, contains some risk for younger demographics, as it could be used inappropriately in some contexts.

As the Wired writer points out in his article, “Inside Google’s Internet Justice League and Its AI-Powered War on Trolls”, this phrase is often a response from troll victims to harassment behaviours. In our system, this is a lower-risk message.

The intention of our classification system is to empower platform owners to make informed and educated decisions about their content. Much like how the MPAA rates films or the ESRB rates video games, we rate user-generated content to empower informed decision-making.


Trolls vs. Regular Users

We’re going to go out on a limb here and say that every company cares about how their users are being treated. We want customers to be treated with dignity and respect.

Imagine you’re the owner of a social platform like a game or app. If your average cost of acquisition sits at around $4, then it will cost you a lot of money if a troll starts pushing people away from your platform.

Unfortunately, customers who become trolls don’t have your community’s best interests or your marketing budget in mind — they care more about getting attention… at any cost. Trolls show up on a social platform to get the attention they’re not getting elsewhere.

Identifying who these users are is the first step to helping your community, your product, and even the trolls themselves. Here at Two Hat, we like to talk about our “Troll Performance Improvement Plans” (Troll PIPs), where we identify who your top trolls are, and work on a plan to give them a chance to reform their behaviour before taking disciplinary action. After all, we don’t tolerate belligerent behaviour or harassment in the workplace, so why would we tolerate it within our online communities?

Over time, community norms set in, and it’s difficult to reshape those norms. Take 4chan, for example. While this adult-only anonymous message board has a team of “volunteer moderators and janitors”, the site is still regularly filled with trolling, flame wars, racism, grotesque images, and pornography. And while there may be many legitimate, civil conversations lurking beneath the surface of 4chan, the site has earned a reputation that likely won’t change in the eyes of the public.

Striking a balance between free speech while preventing online harassment is tricky, yet necessary. If you allow trolls to harass other users, you are inadvertently enabling someone to cause another psychological harm. However, if you suppress every message, you’re just going to annoy users who are just trying to express themselves.


We’ve spent the last four years improving and advancing our technology to help make the internet great again. It’s a fantastic compliment to have a company as amazing as Google jumping into the space we’ve been focused on for so long, where we’re helping social apps and games like Dreadnought, PopJam, and ROBLOX.

Having Google join the fray shows that harassment is a big problem worth solving, and it also helps show that we have already made some tremendous strides to pave the way for them. We have had conversations with the Google team about the Riot Games’ experiments and learnings about toxic behaviours in games. Seeing them citing the same material is a great compliment, and we are honored to welcome them to the battle against abusive content online.

Back at Two Hat, we are already training the core Community Sift system on huge data sets — we're under contract to process four billion messages a day across multiple languages in real time. As we all continue to train artificial intelligence to recognize toxic behaviors like harassment, we can better serve the real people who use these social products online. We can give users meaningful choices, like opting out of rape threats if they so choose. After all, we believe a woman shouldn't have to self-censor, questioning whether that funny meme will result in a rape or death threat against her family. We'd much rather enable people to censor out inappropriate messages from the special kind of idiot who threatens to rape women.

While it’s a shame that we have to develop technology to curb behaviours that would be obviously inappropriate (and in some cases, illegal) in real-life, it is encouraging to know that there are so many groups taking strides to end hate speech now. From activist documentaries and pledges like The Bully Project, inspiring people to stand up against
bullying, to Alphabet/Google’s new Jigsaw division, we are on-track to start turning the negative tides in a new direction. And we are proud to be a part of such an important movement.

How to Remove Online Hate Speech in Under 24 Hours

Note: This post was originally published on July 5th, 2016. We’ve updated the content in light of the draft bill presented by the German government on March 14th.

In July of last year, the major players in social media came together as a united front with a pact to remove hate speech within 24 hours. Facebook defines hate speech as “content that attacks people based on their perceived or actual race, ethnicity, religion, sex, gender, sexual orientation, disability or disease.” Hate speech is a serious issue, as it shapes the core beliefs of people all over the globe.

Earlier this week, the German government took its fight against online hate speech one step further. It has proposed a new law that would levy fines of up to €50 million against social media companies that fail to remove or block hate speech within 24 hours of a complaint. And the proposed law wouldn't just affect companies — it would affect individuals as well. Social media companies would be expected to appoint a "responsible contact person," an individual who could be subject to a fine of up to €5 million if user complaints aren't dealt with promptly.

Those are big numbers — the kinds of numbers that could potentially cripple a business.

As professionals with social products, we tend to rally around the shared belief that empowering societies to exchange ideas and information will create a better, more connected world. The rise of the social web has been one of the most inspiring and amazing changes in recent history, impacting humanity for the better.

Unfortunately, like many good things in the world, there tends to be a dark underbelly hidden beneath the surface. While the majority of users use social platforms to share fun content, interesting information and inspirational news, there is a small fraction of users that use these platforms to spread messages of hate.

It is important to make the distinction that we are not talking about complaints, anger, or frustration. We recognize that there is a huge difference between trash talking vs. harassing specific individuals or groups of people.

We are a protection layer for social products, and we believe everyone should have the power to share without fear of harassment or abuse. We believe that social platforms should be as expressive as possible, where everyone can share thoughts, opinions, and information freely.

We also believe that hate speech does not belong on any social platform. To this end, we want to enable all social platforms to remove hate speech as fast as possible — and not just because they could be subject to a massive fine. As professionals in the social product space, we want everyone to be able to get this right — not just the huge companies like Google.

Smaller companies may be tempted to do this manually, but the task becomes progressively harder to manage with increased scale and growth. Eventually, moderators will be spending every waking moment looking at submissions, making for an inefficient process and slow reaction time.

Instead of removing hate speech within 24 hours, we want to remove it within minutes or even seconds. That is our big, hairy, audacious goal.

Here’s how we approach this vision of ‘instant hate speech removal.’

Step 1 — Label everything.

Full disclosure: traditional filters suck. They have a bad reputation for being overly simplistic, unable to address context, and prone to flagging false positives. Still, leaving it up to users to report all terrible content is unfair to them and bad for your brand. Filters are not adequate for addressing something as complicated as hate speech, so we decided to invest our money in creating something different.

Using the old environmentally-friendly adage of “reduce, reuse, recycle (in that specific order)”, we first want to reduce all the noise. Consider movie ratings: all films are rated, and “R” ratings come accompanied by explanations. For instance, “Rated R for extreme language and promotion of genocide.” We want to borrow this approach and apply labels that indicate the level of risk associated with the content.

There are two immediate benefits: First, users can decide what they want to see; and second, we can flag any content above our target threshold. Of course, content that falls under ‘artistic expression’ can be subjective. Films like “Schindler’s List” are hard to watch but do not fall under hate speech, despite touching upon subjects of racism and genocide. On social media, some content may address challenging issues without promoting hate. The rating allows people to prepare themselves for what they are about to see, but we need more information to know if it is hate speech.

In the real world, we might look at the reputation of the individual to gain a better sense of what to expect. Likewise, content on social media does not exist in a vacuum; there are circumstances at play, including the reputation of the speaker. To simulate human judgment, we have built out our system with 119 features to examine the text, context, and reputation. Just looking for words like “nigga” will generate tons of noise, but if you combine that with past expressions of racism and promotions of violence, you can start sifting out the harmless stuff to determine what requires immediate action.

User reputation is a powerful tool in the fight against hate speech. If a user has a history of racism, you can prioritize reviewing — and removing — their posts above others.

The way we approach this with Community Sift is to apply a series of lenses to the reported content — internally, we call this ‘classification.’ We assess the content on a sliding scale of risk, note the frequency of user-submitted reports, the context of the message (public vs. large group vs. small group vs. 1:1), and the speaker’s reputation. Note that at this point in the process we have not done anything yet other than label the data. Now it is time to do something with it.
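The labelling idea can be sketched in a few lines of Python. To be clear, the feature names, weights, and thresholds below are invented for illustration; the real system combines 119 features of text, context, and reputation, all tuned per community.

```python
# Hypothetical sketch of the labelling step: blend text severity,
# conversation context, and speaker reputation into one risk label.
from dataclasses import dataclass

@dataclass
class Message:
    text_severity: float   # 0.0 (benign) to 1.0 (extreme), from a text classifier
    context_weight: float  # e.g. public=1.0, large group=0.8, small group=0.5, 1:1=0.3
    reputation: float      # 0.0 (trusted) to 1.0 (history of abuse)

def risk_label(msg: Message) -> str:
    """Return a coarse, movie-rating-style label for the message."""
    # Severity is the base signal; context and reputation scale it up or down.
    score = msg.text_severity * (0.6 + 0.2 * msg.context_weight + 0.2 * msg.reputation)
    if score < 0.2:
        return "low"
    if score < 0.6:
        return "medium"
    return "high"

# The same words score differently depending on who says them and where:
print(risk_label(Message(0.9, 1.0, 0.9)))  # high: public post, history of abuse
print(risk_label(Message(0.9, 0.3, 0.0)))  # medium: private chat, clean record
```

The design point is that severity alone never decides the label; the same phrase from a trusted user in a private conversation lands in a lower band than from a repeat offender posting publicly.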

Step 2 — Take automatic action.


After we label the data, we can place it into three distinct ‘buckets.’ The vast majority (around 95%) will fall under ‘obviously good’, since social media predominantly consists of pictures of kittens, food, and reposted jokes. Just like there is the ‘obviously good,’ however, there is also the ‘obviously bad’.

In this case, think of the system like anti-virus technology. Every day, people are creating new ways to mess up your computer. Cybersecurity companies dedicate their time to finding the latest malware signatures so that when one reaches you, it is automatically removed. Similarly, our company uses AI to find new social signatures by processing billions of messages across the globe for our human professionals to review. The manual review is critical to reducing false positives: just as antivirus software should never delete innocuous files on people’s computers, a moderation system should never remove harmless content.

So what is considered ‘obviously bad?’ That will depend on the purpose of the site. Most already have a ‘terms of use’ or ‘community guidelines’ page that defines what the group is for and the rules in place to achieve that goal. When users break the rules, our clients can configure the system to take immediate action with the reported user, such as warning, muting, or banning them. The more we can automate meaningfully here, the better. When seconds matter, speed is of the essence.
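The three-bucket routing amounts to a single threshold function. The cutoffs below are illustrative; a real deployment would tune them per community and per content type.

```python
def bucket(label_score: float, good_cutoff: float = 0.2, bad_cutoff: float = 0.8) -> str:
    """Route a labelled message into one of three buckets.

    Most content is obviously good, a small slice is obviously bad,
    and the ambiguous middle goes to humans (Step 3).
    """
    if label_score <= good_cutoff:
        return "allow"         # kittens, food, reposted jokes
    if label_score >= bad_cutoff:
        return "auto_action"   # warn, mute, or ban per the community's rules
    return "human_review"      # computers aren't sure; escalate
```

Because roughly 95% of traffic falls into "allow", human attention is reserved for the narrow band where judgment is actually needed.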

Now that we have labeled almost everything as either ‘obviously good’ or ‘obviously bad,’ we can prioritize which messages to address first.

Step 3 — Create prioritized queues for human action.

Computers are great at finding the good and the bad, but what about all the stuff in the middle? Currently, the best practice is to crowdsource judgment by allowing your users to report content. Human moderation of some kind is key to maintaining and training a quality workflow to eliminate hate speech. The challenge is rising above the noise of junk reports and dealing with the truly urgent right now.

Remember the Stephen Covey model of time management? Instead of a simple chronologically sorted list of hate speech reports, we want to provide humans with a streamlined list of items to action quickly, with the most important items at the top.

A simple technique is to have two lists. One list has all the noise of user-reported content. We see that about 80–95% of those reports are junk (one user likes dogs, so they report the person who likes cats). Since we labeled the data in step 1, we know a fair bit about it already: the severity of the content, the intensity of the context, and the person’s reputation. If the community thinks the content violates the terms of use and our label says it is likely bad, chances are, it is bad. Alternatively, if the label thinks it is fine, then we can wait until more people report it, thus reducing the noise.

The second list focuses on high-risk, time-sensitive content. These are rare events, so this work queue is kept minuscule. Content enters when the system thinks it is high-risk but cannot be sure, or when users report content that is right on the border of triggering a rating of ‘obviously bad.’ The result is a prioritized queue that humans can stay on top of, removing content in minutes instead of days.
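The two-queue idea maps naturally onto priority queues. Here is a minimal sketch using Python’s `heapq`; the message IDs and the priority formula are invented for illustration.

```python
import heapq
from typing import Optional

# Two queues: a tiny, time-sensitive one for borderline high-risk content,
# and the noisy bulk of user reports. Priorities come from the Step 1 labels.
urgent: list = []   # high-risk, time-sensitive items
reports: list = []  # everything users have reported

def enqueue(msg_id: str, severity: float, report_count: int, high_risk: bool) -> None:
    # heapq is a min-heap, so negate the score: higher severity and more
    # reports surface sooner. The 0.1 report weight is illustrative.
    priority = -(severity + 0.1 * report_count)
    heapq.heappush(urgent if high_risk else reports, (priority, msg_id))

def next_item() -> Optional[str]:
    """Moderators drain the urgent queue before touching the report backlog."""
    for queue in (urgent, reports):
        if queue:
            return heapq.heappop(queue)[1]
    return None
```

Keeping the urgent queue small is the point: it is the only list that must stay empty at all times, so a modest team can realistically keep up with it.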

In our case, we devote millions of dollars a year to continual refinement and improvement with human professionals, so product owners don’t have to. We take care of all that complexity to get product owners back to the fun stuff instead: making more amazing social products.

Step 4 — Take human action.

Product owners could use crowdsourced, outsourced, or internal moderation to handle these queues, though this depends on the scale and available resources within the team. The important thing is to take action as fast as humanly possible, starting with the questionable content that the computers cannot catch.

Step 5 — Train artificial intelligence based on decisions.

To manage the volume of reported content for a platform like Facebook or Twitter, you need to employ some level of artificial intelligence. By setting up the moderation AI to learn from human decisions, the system becomes increasingly effective at automatically detecting and taking action against emerging issues. The more precise the automation, the faster the response.
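As a toy illustration of learning from human decisions (not the production AI), here is a perceptron-style update that nudges feature weights whenever a moderator overturns the system’s call. The feature names are invented.

```python
# Each time a human confirms or overturns the system's verdict, adjust the
# weights of the features that fired. A naive online update, for illustration.

def update_weights(weights: dict, features: list,
                   predicted_bad: bool, human_said_bad: bool,
                   lr: float = 0.1) -> None:
    if predicted_bad == human_said_bad:
        return  # the system was right; nothing to learn
    # False negative: push weights up. False positive: push them down.
    direction = 1.0 if human_said_bad else -1.0
    for f in features:
        weights[f] = weights.get(f, 0.0) + direction * lr

weights = {"slur": 0.5}
# A moderator overturns a false positive (e.g. reclaimed in-group usage):
update_weights(weights, ["slur", "in_group_context"],
               predicted_bad=True, human_said_bad=False)
print(weights)  # the features that caused the mistake are now weighted lower
```

Over thousands of decisions, updates like this shift the automatic buckets of Step 2, so fewer ambiguous items reach the human queues at all.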

After five years of dedicated research in this field, we’ve learned a few tricks.

Machine learning AI is a powerful tool. But when it comes to processing language, it’s far more efficient to use a combination of a well-trained human team working alongside an expert system AI.

By applying the methodology above, it is now within our grasp to remove hate speech from social platforms almost instantly. Prejudice is an issue that affects everyone, and in an increasingly connected global world, it affects everyone in real-time. We have to get this right.

Since Facebook, YouTube, Twitter and Microsoft signed the EU hate speech code back in 2016, more and more product owners have taken up the fight and are looking for ways to combat intolerance in their communities. With the latest announcement by the German government, and the prospect of substantial fines in the future, we wanted to go public with our insights in hopes that someone sees something he or she could apply to a platform right now. In truth, 24 hours just isn’t fast enough, given the damage that racism, threats, and harassment can cause. Luckily, there are ways to prevent hate speech from ever reaching the community.

At Community Sift and Two Hat Security, we have a dream — that all social products have the tools at their disposal to protect their communities. The hardest problems on the internet are the most important to solve. Whether it’s hate speech, child exploitation, or rape threats, we cannot tolerate dangerous or illegal content in our communities.

If we work together, we have a real shot at making the online world a better place. And that’s never been more urgent than it is today.

Freedom of Speech ≠ Freedom From Accountability

We believe freedom of speech can be a positive force, especially when used with a level of care and respect for others. Realistically, we don’t live in a world where people will always be sweet and happy like Teletubbies. People are not always going to be kind to each other, and everyone is bound to have a bad day…

Here’s where the true challenge comes in for product owners — what do you do to protect free speech while also protecting your community from hate speech and online harassment? Do you allow users to threaten to rape other users in the name of freedom of expression? How will you discern the difference between someone having a bad day versus repeat offenders in need of correction?

Context is everything

The knee-jerk reaction might be to implement an Orwellian censorship strategy. In some cases, this may be the correct approach. Strict filtration is the right strategy for a child-directed product, where there are topics that are never acceptable from a legal perspective. However, filtering out certain words or phrases may not be the solution for a 17+ gaming community or a social media platform for adults. The context of a conversation between two strangers is much different from a conversation between a group of old friends, or a public chatroom where many voices are ‘shouting’ at each other.

Every community has different rules of engagement — each company has a philosophy about what they deem to be an appropriate conversation within their social product. What Flickr considers acceptable will differ significantly from what’s socially accepted on 4chan, or from within a professional Slack channel. Every product is different and unique, and that is one of the challenges we have in providing a protection service to such a wide variety of social apps and games.

Each product has a set of goals and guidelines that govern what they believe is acceptable or unacceptable within their community. Similarly, online collectives tend to have expectations about what they think is appropriate or inappropriate behaviour within their tribe. A moderator or community manager should act as a facilitator, reconciling any differences of expectation between the product owners and the community.

Respect each other as humans

With anonymity, it is much easier to divorce oneself from the reality that there’s a real human on the receiving end of cruel comments or so-called rape ‘jokes’.

Writer Lindy West shared a bit of her story about confronting a ‘troll’ who had been harassing her online in an excellent episode of “This American Life”. The writer and the man engage in a civil conversation, acknowledging the tension directly, and eventually come to something of an understanding about each other.

People forget that the victims of these ‘trolls’ are real people, but they also forget that ‘trolls’ are real people, too. As Lindy West describes, “empathy, boldness, and kindness” are some practical ways to bridge differences between two humans. There is a substantial difference between a virus and a corrupted file, just as there is a difference between a real troll and someone who’s having a bad day. With respect comes an opportunity to see each other as human beings rather than avatars on the other side of a screen.

Freedom of speech does not equal freedom from accountability

Some have described the internet as a haven for freedom of expression, where there is less pressure to be “politically correct”. While this may be partially true, there is still an inherent accountability that comes with our freedom. When someone chooses to exploit their freedom to publish hate speech, he or she will likely face some natural consequences, like the effect on his or her personal reputation (or in some extreme cases, legal repercussions).

Freedom of speech is not always sweet. It can even be ugly without crossing the line into toxic behavior. It can also be amazing and transformative. The democratization of thought enabled by modern social platforms has had a profound effect on society, empowering millions to share and exchange ideas and information.

One of our goals with Community Sift is to create safety without censorship, empowering product owners to preserve user freedom while also protecting their social apps and games. There are so many issues that plague online communities, including spam, radicalization, and illegal content. Businesses work with us because we use a combination of machine learning, artificial intelligence, and human community professionals to protect their products and services.

Moreover, while we respect the need for freedom of speech, we cannot support any activity that results in someone taking their own life. That is why we do what we do. If we can protect a single life through automated escalations and improved call-for-help workflows, we will have made the world a better place. While this may sound overly altruistic, we believe this is a challenge that is worth tackling head-on, regardless of the perspective about “freedom of speech.”


Originally published on Medium

Photo by Cory Doctorow. Source: Flickr

How We Manage Toxicity for Social Apps and Websites

At Two Hat, we believe the social internet is a positive place with unlimited potential. We also believe bullying and toxicity are causing harm to real people and causing irreparable damage to social products. That’s why we made Community Sift.

We work with leading game studios and social platforms to find and manage toxic behaviours in their communities. We do this in real-time, and (at the time of writing) process over 1 billion messages a month.

Some interesting facts about toxicity in online communities:

  • According to the Fiksu Index, the cost of acquiring a loyal user is now $4.23, making user acquisition one of the biggest costs to a game.
  • Player Behavior in Online Games research published by Riot Games indicates that “players are 320% more likely to quit, the more toxicity they experience.”

Toxicity hurts everyone:

  • An estimated 1% of a new community is toxic. If that is ignored, the best community members leave and toxicity can grow as high as 20%.
  • If a studio spends $1 million launching its game and a handful of toxic users send destructive messages, their investment is at risk.
  • Addressing the problem early will model what the community is for, and what is expected of future members, thus reducing future costs.
  • Behaviour does change. That’s why we’ve created responsive tools that adapt to changing trends and user behaviours. We believe people are coachable and have built our technology with this assumption.
  • Even existing communities see an immediate drop in toxicity with the addition of strong tools.

Here’s a little bit about what Community Sift can do to help:

  • More than a Filter: Unlike products that only look for profanity, we have over 1 million human-validated rules and multiple AI systems to seek out bullying, toxicity, racism, fraud, and more.
  • Emphasis on Reputation: Every user has a bad day. The real problem is users who are consistently damaging the community.
  • Reusable Common Sense: Instead of a simple regex or blacklist/whitelist, we measure severity on a spectrum, from extreme good to extreme bad. You can use the same rules but a different permission level for group chat vs. private chat, and for one game vs. another.
  • Industry Veterans: Our team has made games with over 300 million users and managed a wide variety of communities across multiple languages. We are live and battle-tested on top titles, processing over 1 billion messages a month at the time of writing.

To install Community Sift, you have your backend servers make one simple API call for each message, and we handle all the complexity in our cloud.
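For illustration only, the per-message request body might look something like the sketch below. The field names are assumptions for this example, not the actual Community Sift API; a real integration would follow the vendor’s API reference.

```python
import json

def build_classify_request(user_id: str, room: str, text: str) -> bytes:
    """Build a hypothetical JSON body for the one-call-per-message integration."""
    return json.dumps({
        "user": user_id,  # lets the service apply the speaker's reputation
        "room": room,     # context: public, group, or 1:1
        "text": text,     # the message to classify
    }).encode("utf-8")

# Your backend would POST this body once per message and act on the response.
body = build_classify_request("u123", "public", "hello world")
```

Sending user and room identifiers alongside the text is what lets a service weigh reputation and context, not just the words themselves.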

When toxic behaviour is found, we can:

  • Hash out the negative parts of a message, e.g. “####ed out message”
  • Educate the user
  • Reward positive users who are consistently helping others
  • Automatically trigger a temporary mute for regular offenders
  • Escalate for internal review when certain conditions like “past history of toxicity” are met
  • Group toxic users on a server together to help protect new users
  • Provide daily stats, BI reports, and analytics

We’d love to show you how we can help protect your social product. Feel free to book a demo anytime.

Empowering Young Adults While Managing Online Risk

I recall being a young boy living in orchard country in the beautiful Okanagan Valley. By the age of 8, I had the run of my 37-acre orchard and its surrounding gullies and fields. I’d run, bike, hike, and explore with a German Shepherd as my co-conspirator and a backpack filled with trail mix. Occasionally, I’d wipe out and return home with some tears in my eyes and a wound on my leg, but it always healed and I was all the more diligent the next time.

Surrounding the orchard were various homes of people my family knew, and I knew I could visit them if I ever needed help. I was aware that talking to strangers could be dangerous, and I knew well enough to stay away from the dangerous bits of landscape (not that there were any cliffs or raging rivers; had there been, my radius of freedom might have been a little smaller).

Was there some risk? Yes. Was the risk of death or serious harm high? No. Had it been, I wouldn’t have been allowed to travel so far and wide. My life had also been guided by my parents to ensure I knew how to make good decisions (which, for the most part, I did). This ability to assume an appropriate amount of risk helped shape the person I now am. In truth, I’m a bit of an experience junkie, but I’m also a little risk averse. When thrust into difficult situations, though, I don’t shy away from them.

My company provides filter and moderation tools for online communities. We do it very well. In years past, filters for online communities (that is to say, the bit of technology that blocks certain words and phrases) had to be either blacklist filters or whitelist filters. Blacklist filters make sure that nothing on the list is said; the problem is that you’re constantly trying to figure out the new ways of saying bad things. Whitelist filters are the opposite: they only allow users to say things that are on the whitelist, which proves to be a very restrictive way to communicate.

We decided to do it differently: we look at words and phrases and assign a risk to each. We can then gauge how a word is used and look at the context in which it’s used (is the user trusted, has the user demonstrated negative behaviour in the past, is the environment for older users or younger users, and so on). We can then filter uniquely by user and context, eliminating the overbearing act of declaring all words either good or bad (yes, some words are just bad and others are just good).
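A toy sketch of the severity-spectrum idea: each phrase carries a risk level, and each community sets the level it will tolerate. The phrases, risk values, and thresholds below are invented for illustration.

```python
# Instead of a binary blacklist, rate each phrase on a severity spectrum
# and let each context choose its own tolerance.
RISK = {"stupid": 3, "idiot": 4, "kill yourself": 9}  # 0 = benign, 10 = extreme

def allowed(text: str, threshold: int) -> bool:
    """A kids' chat might set threshold=2; a 17+ game might set 6."""
    worst = max((risk for phrase, risk in RISK.items() if phrase in text.lower()),
                default=0)
    return worst <= threshold

print(allowed("you idiot", threshold=6))  # True in an adult community
print(allowed("you idiot", threshold=2))  # False in a kids' community
```

The same rule table serves every community; only the threshold (and, in the real system, the user’s trust level) changes per context.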

An area we’re keenly interested in is how we can help replicate a healthy amount of risk in an online community without putting users in danger. Most parents accept that a child might fall and scrape their knee while playing on a playground. We also accept the risk that when a child plays with other children they might be on the receiving end of some not-nice behaviour. We hope this won’t happen, but when it does we comfort them and teach them about character and how they should react to such people. They will meet bullies throughout their entire life. In the online arena though, we’ve become quite scared of anything that might cause risk to a child, possibly with good reason. When we think about the effects of this, we are concerned that children are no longer learning important life lessons.

I love how Tanya Byron said that we must “use our understanding of how [children] develop to empower them to manage risks and make the digital world safer.”

Recently we’ve been asking ourselves, ‘How can we allow for a safe amount of risk to be present while providing tools that mimic real life?’ For example, in real life, a bully has to look into the eye of his or her victim. Although we can’t mimic that, we can deliver specific and timely responses to a bully that encourage them, at the moment of their bullying, to picture how others might receive what they’re saying. Another example might be the way an adult can engage in a situation that is beginning to get more serious. Even though we start to filter the sort of words that become more abusive, how can we then get this information to an adult or moderator as quickly and efficiently as possible so the adult can intervene? This is the subject of our current development, as we believe deeply that in order for kids to be truly safe online, they need to grow and develop skills that cause them to make smarter decisions and show greater amounts of empathy. This includes the need to look at what’s an appropriate amount of risk for children at all ages.

The internet is providing an unprecedented amount of access to people of all ages and backgrounds. Perhaps, as we progress in our understanding of its impact, more and more companies will start to realize the role they must take in helping it develop well. We must be willing to challenge assumptions and work through our own discomforts so that we can engage in a healthy discussion. As parents, we must challenge ourselves to see how technology has changed the way our children interact and how the risks we’re well aware of from when we were children are experienced in the digital world. How can we learn to help our kids fall gracefully and stand up again more confidently?


Originally published on the Family Online Safety Institute website


Incredible Moderators Who Stay Focused

When I was young, I was one of those good kids. When a teacher walked by I would say, “Hello Ms. so-and-so” and if our class was asked for volunteers, I would be among the first to thrust my hand excitedly into the air. My parents taught me to respect other adults and to respect authority, and I did.

But there was this one noon-hour-supervisor I didn’t respect (one among many other really fantastic ones). I think she hated all children and became a noon-hour-supervisor so she could prove it. She made rash decisions and seemed to enjoy giving out pink slips. I never got in trouble, except with this supervisor, from whom I got two pink slips. Having to get my parents to sign those was terrible, as they just couldn’t accept that the school board could employ someone as bad as I said she was. With so many fantastic noon-hour-supervisors, I suppose I was pretty fortunate to only experience one that was less than stellar.

I liken the role of noon-hour-supervisor (hopefully the good ones) to that of an online community moderator. Moderators are there to ensure a community is safe and healthy. They exist to serve a community as someone who has the authority to remove users that are disruptive or hurtful to the larger community. The difference between being a noon-hour-supervisor and being a moderator is that, as a moderator, you don’t get to look into the eyes of the community you’re taking care of. The temptation, then, is to become like the noon-hour-supervisor who disciplines for the sake of disciplining and seems to enjoy catching the bad apples. Moderators spend hours a day reviewing chat logs and content created by the community, in an effort to find the bad apples while also finding ways to encourage the community (well, the good moderators do that, anyway).

I’ve worked with some really incredible moderators who have been able to stay focused on why they do what they do.

Three things helped them stay that way rather than turning into the crusty sort of person who seems to regard the whole community as troublesome.

1. Speak often of your ideals

Every day, speak about the ideals of your community with your fellow moderators. Constantly paint the picture to each other of what this community is capable of being and highlight the users that model this and make you believe it can become like that.

2. Use tools that allow you to focus on the good while ensuring safety from the bad

Not all tools are created equal. Moderation tools should just as easily allow you to celebrate the vast majority of good users rather than always focusing on the negative behaviour of the minority.

3. Refuse to celebrate the bad

It’s really easy, and sometimes quite funny, to highlight the users who are being disruptive. For the same reasons the nightly news focuses on the negative, it’s easy for moderators to get hung up on those users. Discipline yourself and your team to care about the good that is done. Eventually, that will become the habit of your moderation, and the community will be the better for it.


Originally published on the PRIVO Online Privacy Matters blog