Quora: What are the different ways to moderate content?

There are five different approaches to User-Generated Content (UGC) moderation:

  • Pre-moderate all content
  • Post-moderate all content
  • Crowdsourced (user reports)
  • 100% computer-automated
  • 100% human review

Each option has its merits and its drawbacks. But as with most things, the best method lies somewhere in between — a mixture of all five techniques.

Let’s take a look at the pros and cons of your different options.

Pre-moderate all content

  • Pro: You can be fairly certain that nothing inappropriate will end up in your community; you know you have human eyes on all content.
  • Con: Time- and resource-consuming; subject to human error; does not happen in real time, which can frustrate users who expect to see their posts immediately.

Post-moderate all content

  • Pro: Users can post and experience content in real-time.
  • Con: Once risky content is posted, the damage is done; puts the burden on the community as it usually involves a lot of crowdsourcing and user reports.

Crowdsourcing/user reports

  • Pro: Gives your community a sense of ownership; people are good at finding subtle language.
  • Con: As with post-moderating all content, once threatening content is posted, it’s already had its desired effect, regardless of whether it’s removed; forces the community to police itself.

100% computer-automated

  • Pro: Computers are great at identifying the worst and best content; automation frees up your moderation team to engage with the community.
  • Con: Computers aren’t great at identifying gray areas and making tough decisions.

100% human review

  • Pro: Humans are good at making tough decisions about nuanced topics; moderators become highly attuned to community sentiment.
  • Con: Humans burn out easily; not a scalable solution; reviewing disturbing content can have an adverse effect on moderators’ health and wellness.

So, if all five options have valid pros and cons, what’s the solution? In our experience, the most effective technique uses a blend of both pre- and post-moderation, human review, and user reports, in tandem with some level of automation.

The first step is to nail down your community guidelines. Social products that don’t clearly define their standards from the very beginning have a hard time enforcing them as they scale up. Twitter is a cautionary tale for all of us, as we witness their current struggles with moderation. They launched the platform without the tools to enforce their (admittedly fuzzy) guidelines, and the company is facing a very public backlash because of it.

Consider your stance on the following:

  • Bullying: How do you define bullying? What behavior constitutes bullying in your community?
  • Profanity: Do you block all swear words or only the worst obscenities? Do you allow acronyms like WTF?
  • Hate speech: How do you define hate speech? Do you allow racial epithets if they’re used in a historical context? Do you allow discussions about religion or politics?
  • Suicide/Self-harm: Do you filter language related to suicide or self-harm, or do you allow it? Is there a difference between a user saying “I want to kill myself,” “You should kill yourself,” and “Please don’t kill yourself”?
  • PII (Personally Identifiable Information): Do you encourage users to use their real names, or does your community prefer anonymity? Can users share email addresses, phone numbers, and links to their profiles on other social networks? If your community is under-13 and in the US, you may be subject to COPPA.

Different factors will determine your guidelines, but the most important things to consider are:

  • The nature of your product. Is it a battle game? A forum to share family recipes? A messaging app?
  • Your target demographic. Are users over or under 13? Are portions of the experience age-gated? Is it marketed to adults only?

Once you’ve decided on community guidelines, you can start to build your moderation workflow. First, you’ll need to find the right software. There are plenty of content filters and moderation tools on the market, but in our experience, Community Sift is the best.

A high-risk content detection system designed specifically for social products, Community Sift works alongside moderation teams to automatically identify threatening UGC in real time. It’s built to detect and block the worst of the worst (as defined by your community guidelines), so your users and moderators don’t ever have to see it. There’s no need to force your moderation team to review disturbing content that a computer algorithm can be trained to recognize in a fraction of a second. Community Sift also allows you to move content into queues for human review, and automate actions (like player bans) based on triggers.

Once you’ve tuned the system to meet your community’s unique needs, you can create your workflows.

You may want to pre-moderate some content, even with a content filter running in the background. If your product is targeted at under-13 users, as an added layer of human protection, you might pre-moderate anything that the filter doesn’t classify as high-risk. Or maybe you route all content flagged as high-risk (extreme bullying, hate speech, rape threats, etc.) into queues for moderators to review. For older communities, you may not require any pre-moderation and instead depend on user reports for any post-moderation work.

With an automated content detection system in place, you give your moderators their time back to do the tough, human stuff, like dealing with calls for help and reviewing user reports.

Another piece of the moderation puzzle is addressing negative user behavior. We recommend using automation, with the severity increasing with each offense. Techniques include warning users when they’ve posted high-risk content, and muting or banning their accounts for a short period. Users who persist can eventually lose their accounts. Again, the process and severity here will vary based on your product and demographic. The key is to have a consistent, well-thought-out process from the very beginning.
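The escalating responses described above can be sketched as a small lookup keyed on how many offenses a user already has. This is a minimal illustration only: the step names, the durations, and the `next_sanction` helper are assumptions made for the example, not a recommended policy or any product's actual behavior.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical escalation ladder: each repeat offense maps to a harsher step.
# Durations are illustrative placeholders, not recommendations.
LADDER = [
    ("warn", 0),            # first offense: warning only
    ("mute", 15),           # second offense: 15-minute mute
    ("suspend", 24 * 60),   # third offense: 24-hour suspension
    ("ban", None),          # fourth offense and beyond: account removed
]

@dataclass
class Sanction:
    action: str
    minutes: Optional[int]  # None means permanent

def next_sanction(prior_offenses: int) -> Sanction:
    """Return the sanction for a user with `prior_offenses` earlier offenses."""
    step = min(prior_offenses, len(LADDER) - 1)  # stay on the last rung once reached
    action, minutes = LADDER[step]
    return Sanction(action, minutes)
```

Keeping the ladder in one table makes the process easy to review and adjust for a different product or demographic without touching the logic.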

You will also want to ensure that you have a straightforward and accessible process for users to report offensive behavior. Don’t bury the report option, and make sure that you provide a variety of report tags to select from, like bullying, hate speech, sharing PII, etc. This will make it much easier for your moderation team to prioritize which reports they review first.
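As a rough sketch of how report tags can drive review order, the example below places reports in a priority queue. The tag names follow the examples above, but the severity ranking and the `ReportQueue` class are assumptions for illustration; your own guidelines would determine the real ordering.

```python
import heapq
import itertools

# Hypothetical severity ranking for report tags; lower number = reviewed sooner.
TAG_PRIORITY = {"hate_speech": 0, "bullying": 1, "pii": 1, "other": 5}

class ReportQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserves arrival order

    def submit(self, tag, content_id):
        priority = TAG_PRIORITY.get(tag, 9)  # unknown tags are reviewed last
        heapq.heappush(self._heap, (priority, next(self._counter), tag, content_id))

    def next_report(self):
        """Pop the most urgent report still waiting for review."""
        _, _, tag, content_id = heapq.heappop(self._heap)
        return tag, content_id
```

A moderator tool would call `next_report()` repeatedly, so the most serious categories surface first regardless of when they were filed.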

Ok, so moderation is a lot of work. It requires patience and dedication and a strong passion for community-building. But it doesn’t have to be hard if you leverage the right tools and the right techniques. And it’s highly rewarding, in the end. After all, what’s better than shaping a positive, healthy, creative, and engaged community in your social product? It’s the ultimate goal, and ultimately, it’s an attainable one — when you do it right.

 

Originally published on Quora

Want more articles like this? Subscribe to our newsletter and never miss an update!



Quora: What can social networks do to provide safer spaces for women?

For many women, logging onto social media is inherently dangerous. Online communities are notoriously hostile towards women, with women in the public eye—journalists, bloggers, and performers—often facing the worst abuse. But abuse is not just the province of the famous. Nearly every woman who has ever expressed an opinion online has had these experiences: Rape threats. Death threats. Harassment. Sometimes, even their children are targeted.

In the last few years, we’ve seen many well-documented cases of ongoing, targeted harassment of women online. Lindy West. Anita Sarkeesian. Leslie Jones. These women were once famous for their talent and success. Now their names are synonymous with online abuse of the worst kind.

And today we add a new woman to the list: Allie Rose-Marie Leost. An animator for EA Labs, her social media accounts were targeted this weekend in a campaign of online harassment. A blog post misidentified her as the lead animator for Mass Effect: Andromeda, and blamed her for the main character’s awkward facial animations. Turns out, Leost never even worked on Mass Effect: Andromeda. And yet she was forced to spend a weekend defending herself against baseless, crude, and sexually violent attacks from strangers.

Clearly, social media has a problem. It has been happening for years, and it’s not going away anytime soon.

A 2014 report by the Pew Research Center found that:

Young women, those 18-24, experience certain severe types of harassment at disproportionately high levels: 26% of these young women have been stalked online, and 25% were the target of online sexual harassment.

We don’t want to discount the harassment and abuse that men experience online, in particular in gaming communities. This issue affects all genders. However, there is an additional level of violence and vitriol directed at women. And it almost always includes threats of sexual violence. Women are also more likely to be doxxed, that is, to have their personal information shared online without their consent.

So, what can social networks do to provide safer spaces for women?

First, they need to make clear in their community guidelines that harassment, abuse, and threats are unacceptable —regardless of whether they’re directed at a man or a woman. For too long social networks have adopted a “free speech at all costs” approach to community building. If open communities want to flourish, they have to define where free speech ends, and accountability begins.

Then, social networks need to employ moderation strategies that:

Prevent abuse in real time. Social networks cannot only depend on moderators or users to find and remove harassment as it happens. Not only does that put undue stress on the community to police itself, it also ignores the fundamental problem—when a woman receives a rape threat, the damage is already done, regardless of how quickly it’s removed from her feed.

The best option is to stop abuse in real time, which means finding the right content filter. Text classification is faster and more accurate than it’s ever been, thanks to recent advances in artificial intelligence, machine learning, and Natural Language Processing (NLP).

Our expert system uses a cutting-edge blend of human ingenuity and automation to identify and filter the worst content in real time. People make the rules, and the system implements them.

When it comes to dangerous content like abuse and rape threats, we decided that traditional NLP wasn’t accurate enough. Community Sift uses Unnatural Language Processing (uNLP) to find the hidden, “unnatural” meaning. Any system can identify the word “rape,” but a determined user will always find a way around the obvious. The system also needs to identify the l337 5p34k version of r4p3, the backwards variant, and the threat hidden in a string of random text.
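To make the idea of catching disguised variants concrete, here is a toy sketch of text normalization. It is not how Community Sift or uNLP works; the substitution table, the `normalize` and `contains_blocked` helpers, and the placeholder blocked term are all assumptions for illustration.

```python
# Hypothetical substitution table; a real system would cover far more variants.
LEET_MAP = str.maketrans(
    {"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)

def normalize(text):
    """Lowercase, undo common character substitutions, and drop non-letters."""
    mapped = text.lower().translate(LEET_MAP)
    return "".join(ch for ch in mapped if ch.isalpha())

def contains_blocked(text, blocked):
    """True if any blocked term appears forwards or backwards in the cleaned text."""
    cleaned = normalize(text)
    backwards = cleaned[::-1]
    return any(term in cleaned or term in backwards for term in blocked)

# Placeholder term standing in for whatever your guidelines block.
blocked_terms = {"threat"}
leet_hit = contains_blocked("7hr347", blocked_terms)           # leet-speak variant
reversed_hit = contains_blocked("xq taerht zz", blocked_terms)  # backwards, buried in noise
```

Plain substring matching like this also flags innocent words that happen to contain a blocked term, which is one reason simple filters need human-written rules and review on top.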

Take action on bad actors in real time. It’s critical that community guidelines are reinforced. Most people will change their behavior once they know it’s unacceptable. And if they don’t, social networks can take more severe action, including temporary or permanent bans. Again, automation is critical here. Companies can use the same content filter tool to automatically warn, mute, or suspend accounts as soon as they post abusive content.

Encourage users to report offensive content. Content filters are great at finding the worst stuff and allowing the best. Automation does the easy work. But there will always be content in between that requires human review. It’s essential that social networks provide accessible, user-friendly reporting tools for objectionable content. Reported content should be funneled into prioritized queues based on content type. Moderators can then review the most potentially dangerous content and take appropriate action.

Social networks will probably never stop users from attempting to harass women with rape or death threats. It’s built into our culture, although we can hope for a change in the future. But they can do something right now—leverage the latest, smartest technology to identify abusive language in real time.

Originally published on Quora



Four Moderation Strategies To Keep the Trolls Away

To paraphrase the immortal Charles Dickens:

It was the : ) of times, it was the : ( of times…

Today, our tale of two communities continues.

Yesterday, we tested our theory that toxicity can put a dent in your profits. We used our two fictional games AI Warzone and Trials of Serathian as an A/B test, and ran their theoretical financials through our mathematical formula to see how they performed.

And what were the results? The AI Warzone community flourished. With a little help from a powerful moderation strategy, they curbed toxicity and kept the trolls at bay. The community was healthy, and users stuck around.

Trials of Serathian paid the cost of doing nothing. As toxicity spread, user churn went up, and the company had to spend more and more on advertising to attract new users just to meet their growth target.

Today, we move from the hypothetical to the real. Do traditional techniques like crowdsourcing and muting actually work? Are there more effective strategies? And what does it mean to engineer a healthy community?

Charles Kettering famously said that “A problem well stated is a problem half-solved”; so let’s start by defining a word that gets used a lot in the industry, but can mean very different things to different people: trolls.

What is a Troll?

We’re big fans of the Glove and Boots video Levels of Trolling.

Technically these are goblins, but still. These guys again!

The crux of the video is that trolling can be silly and ultimately harmless — like (most) pranks — or it can be malicious and abusive, especially when combined with anonymity.

When we talk about trolls, we refer to users who maliciously and persistently seek to ruin other users’ experiences.

Trolls are persistent. Their goal is to hurt the community. And unfortunately, traditional moderation techniques have inadvertently created a culture where trolls are empowered to become the loudest voices in the room.

Strategies That Aren’t Working

Many social networks and gaming companies, including Trials of Serathian, take a traditional approach to moderation. It follows a simple pattern: depend on your users to report everything, give users the power to mute, and let the trolls control the conversation.

Let’s take a look at each strategy to see where it falls short.

Crowdsourcing Everything

Crowdsourcing — depending on users to report toxic chat — is the most common moderation technique in the industry. As we’ll discover later, crowdsourcing is a valuable tool in your moderation arsenal. But it can’t be your only tool.

Let’s get real — chat happens in real time. So by relying on users to report abusive chat, aren’t you in effect allowing that abuse to continue? The damage is already done by the time the abusive player is finally banned, or the chat is removed. It’s already affected its intended victim.

Imagine if you approached software bugs the same way. You have QA testers for a reason — to find the big bugs. Would you release a game that was plagued with bugs? Would you expect your users to do the heavy lifting? Of course not.

Community is no different. There will always be bugs in our software, just as there will always be users who have a bad day, say something to get a rise out of a rival, or just plain forget the guidelines. Just like there will always be users who want to watch the world burn — the ones we call trolls. If you find and remove trolls without depending on the community to do it for you, you go a long way towards creating a healthier atmosphere.

You earn your audience’s trust — and by extension their loyalty — pretty quickly when you ship a solid, polished product. That’s as true of community as it is of gameplay.

If you’ve already decided that you won’t tolerate harassment, abuse, and hate speech in your community, why let it happen in the first place?

Muting Annoying Players

Muting is similar to crowdsourcing. Again, you’ve put all of the responsibility on your users to police abuse. In a healthy community, only about 1% of users are true trolls — players who are determined to upset the status quo and hurt the community. When left unmoderated, that number can rise to as much as 20%.

That means that the vast majority of users are impacted by the behavior of the few. So why would you ask good players to press mute every time they encounter toxic behavior? It’s a band-aid solution and doesn’t address the root of the problem.

It’s important that users have tools to report and mute other players. But they cannot be the only line of defense in the war on toxicity. It has to start with you.

Letting The Trolls Win

We’ve heard this argument a lot. “Why would I get rid of trolls? They’re our best users!” If trolls make up only 1% of your user base, why are you catering to a tiny minority?

Good users — the kind who spend money and spread the word among their friends — don’t put up with trolls. They leave, and they don’t come back.

Simon Fraser University’s Reddit study found that a rise in toxicity correlates with slower community growth. Remember our formula in yesterday’s post? The more users you lose, the more you need to acquire, and the smaller your profits.

Trust us — there is a better way.

Strategies That Work

Our fictional game AI Warzone took a new approach to community. They proactively moderated chat with the intention to shape a thriving, safe, and healthy community using cutting-edge techniques and the latest in artificial and human intelligence.

The following four strategies worked for AI Warzone — and luckily, they work in the real world too.

Knowing Community Resilience

One of the hardest things to achieve in games is balance. Developers spend tremendous amounts of time, money, and resources ensuring that no one dominant strategy defines gameplay. Both Trials of Serathian and AI Warzone spent a hefty chunk of development time preventing imbalance in their games.

The same concept can be applied to community dynamics. In products where tension and conflict are built into gameplay, doesn’t it make sense to ensure that your community isn’t constantly at each other’s throats? Some tension is good, but a community that is always at war can hardly sustain itself.

It all comes down to resilience — how much negativity can a community take before it collapses?

Without moderation, players in battle games like AI Warzone and Trials of Serathian are naturally inclined to acts — and words — of aggression. Unfortunately, that’s also true of social networks, comment sections, and forums.

The first step to building an effective moderation strategy is determining your community’s unique resilience level. Dividing content into quadrants can help:

  • High Risk, High Frequency
  • High Risk, Low Frequency
  • Low Risk, High Frequency
  • Low Risk, Low Frequency

 

Where does your community draw the line?

Younger communities will always have a lower threshold for high-risk chat. That means stricter community guidelines with a low tolerance for swearing, bullying, and other potentially dangerous activity.

The older the community gets, the stronger its resilience. An adult audience might be fine with swearing, as long as it isn’t directed at other users.

Once you know what your community can handle, it’s time to look closely at your userbase.

Dividing Users Based on Behavior

It’s tempting to think of users as just a collection of usernames and avatars, devoid of personality or human quirks. But the truth is that your community is made up of individuals, all with different behavior patterns.

You can divide this complex community into four categories based on behavior.

 

The four categories of user behavior.

Let’s take a closer look at each risk group:

  • Boundary testers: High risk, low frequency offenders. These players will log in and instantly see what they can get away with. They don’t start out as trolls — but they will upset your community balance if you let them get away with it.
  • Trolls: High risk, high frequency offenders. As we’ve discussed, these players represent a real threat to your community’s health. They exist only to harass good players and drive them away.
  • Average users/don’t worry: Low risk, low frequency offenders. These players usually follow community guidelines, but they have a bad day now and then. They might take their mood out on the rest of the community, mostly in a high-stress situation.
  • Spammers: Low risk, high frequency offenders. Annoying and tenacious, but they pose a minor threat to the community.

Once you’ve divided your users into four groups, you can start figuring out how best to deal with them.

Taking Action Based on Behavior

Each of the four user groups should be treated differently. Spammers aren’t trolls. And players who drop an f-bomb during a heated argument aren’t as dangerous as players who frequently harass new users.

 

How to deal with different kinds of behavior.

Filter and Ban Trolls

Your best option is to deal with trolls swiftly and surely. Filter their abusive chat, and ban their accounts if they don’t stop. Set up escalation queues for potentially dangerous content like rape threats, excessive bullying, and other threats, then let your moderation team review them and take action.

Warn Boundary Testers

A combination of artificial intelligence and human intelligence works great for these users. Set up computer automation to warn and/or mute them in real time. If you show them that you’re serious about community guidelines early on, they are unlikely to re-offend.

Crowdsource Average Users

Crowdsourcing is ideal for this group. Content here is low risk and low frequency, so if a few users see it, it’s unlikely that the community will be harmed. Well-trained moderators can review reported content and take action on users if necessary.

Mute Spammers

There are a couple of options here. You can mute spammers and let them know they’ve been muted. Or, for a bit of fun, try a stealth ban. Let them post away, blissfully unaware that no one in the room can see what they’re saying.
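The four behavior groups and their matching responses can be summarized as a simple lookup. In this sketch, how a user gets flagged as high risk or high frequency is left out entirely; the `classify` function, the `ACTIONS` table, and the label strings are hypothetical names chosen for the example.

```python
def classify(high_risk, high_frequency):
    """Map the two behavior axes onto the four user groups."""
    if high_risk and high_frequency:
        return "troll"
    if high_risk:
        return "boundary_tester"
    if high_frequency:
        return "spammer"
    return "average_user"

# One response per group, mirroring the four strategies described above.
ACTIONS = {
    "troll": "filter_and_ban",
    "boundary_tester": "warn",
    "average_user": "crowdsource",
    "spammer": "mute",
}

def action_for(high_risk, high_frequency):
    return ACTIONS[classify(high_risk, high_frequency)]
```

In practice the two flags would come from your own thresholds on content severity and offense counts, tuned to your community's resilience level.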

Combining Artificial and Human Intelligence

The final winning strategy? Artificial intelligence (AI) and computer automation are smarter, more advanced, and more powerful than they’ve ever been. Combine that with well-trained and thoughtful human teams, and you have the opportunity to bring moderation and community health to the next level.

A great real-world example of this is Twitch. In December 2016 they introduced a new tool called AutoMod.

It allows individual streamers to select a unique resilience level for their own channel. On a scale of 1–4, streamers set their tolerance level for hate speech, bullying, sexual language, and profanity. AutoMod reviews and labels each message for the above topics. Based on the streamer’s chosen tolerance level, AutoMod holds the message back for moderators to review, then approve or reject.
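The per-channel tolerance idea can be illustrated with a small threshold check. This is not Twitch's API or AutoMod's actual algorithm; the severity scores, the direction of the scale (a lower tolerance means stricter), and the `should_hold` helper are all assumptions made for the sketch.

```python
TOPICS = ("hate_speech", "bullying", "sexual_language", "profanity")

def should_hold(message_scores, tolerance):
    """Hold a message for review if any topic's severity reaches the channel's tolerance.

    message_scores: topic -> severity from 0 (not detected) to 4 (most severe).
    tolerance: topic -> level from 1 (strictest) to 4 (most permissive).
    Both scales are assumptions for this example.
    """
    return any(
        message_scores.get(topic, 0) >= tolerance.get(topic, 1)  # default to strictest
        for topic in TOPICS
    )

# A hypothetical channel that is relaxed about profanity but strict elsewhere.
channel = {"hate_speech": 1, "bullying": 2, "sexual_language": 2, "profanity": 4}
```

Held messages would then go to a moderator queue for approval or rejection, which keeps the final decision with a human.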

Reactions to AutoMod were resoundingly positive:

Positive user responses and great press? We hope the industry is watching.

The Cost of Doing Nothing

So, what have Trials of Serathian and AI Warzone taught us? First, we really, really need someone to make these games. Like seriously. We’ll wait…

 

This is as far as we got.

 

We learned that toxicity increases user churn, that traditional moderation techniques don’t work, and that community resilience is essential. We learned that trolls can impact profits in surprising ways.

In the end, there are three costs of doing nothing:

  • Financial. Money matters.
  • Brand. Reputation matters.
  • Community. People matter.

Our fictional friends at AI Warzone found a way to keep the trolls away — and keep profits up. They carefully considered how to achieve community balance, and how to build resilience. They constructed a moderation strategy that divided users into four distinct groups and dealt with each group differently. They consistently reinforced community guidelines in real-time. And in the process, they proved to their community that a troll-free environment doesn’t diminish tension or competition. Quite the opposite — it keeps it alive and thriving.

Any community can use the four moderation strategies outlined here, whether it’s an online game, social sharing app, or comments section, and regardless of demographic. And as we’ve seen with Twitch’s AutoMod, communities are welcoming these strategies with open arms and open minds.

One final thought:

Think of toxicity as a computer virus. We know that online games and social networks attract trolls. And we know that if we go online without virus protection, we’re going to get a virus. It’s the nature of social products, and the reality of the internet. Would you deliberately put a virus on your computer, knowing what’s out there? Of course not. You would do everything in your power to protect your computer from infection.

By the same token, shouldn’t you do everything in your power to protect your community from infection?


At Two Hat Security, we use Artificial Intelligence to protect online communities from high-risk content. Visit our website to learn more.

Just getting started? Growing communities deserve to be troll-free, too.

Originally published on Medium



Doing The Math: Does Moderation Matter?

Welcome back to our series about the cost of doing nothing. Feeling lost? Take a minute to read the first two posts, The Other Reason You Should Care About Online Toxicity and A Tale of Two Online Communities.

Today we test our theory: when social products do nothing about toxicity, they lose money. Using AI Warzone and Trials of Serathian (two totally-made-up-but-awesome online games) as examples, we’ll run their theoretical financials through our mathematical formula to see how they perform.

Remember: despite being slightly different games, AI Warzone and Trials of Serathian have similar communities. They’re both competitive MMOs, targeted to a 13+ audience, with predominantly male player bases.

But they differ in one key way. Our post-apocalyptic robot battle game AI Warzone proactively moderates the community, and our epic medieval fantasy Trials of Serathian does nothing.

Let’s take a look at the math.

The Math of Toxicity

In 2014, Jeffrey Lin from Riot Games presented a stat at GDC that turned the gaming world on its head. According to their research, users who experience toxicity are 320% more likely to quit. That’s huge. To put that number in further perspective, consider this statistic from a 2015 study:

52% of MMORPG players reported that they had been cyber-victimized, and 35% said they had committed cyberbullying themselves.

A majority of players have experienced toxicity. And a surprising number of them admit to engaging in toxic behavior.

We’ll take those numbers as our starting point. Now, let’s add a few key facts — based on real data — about our two fictional games to fill in the blanks:

  • Each community has 1 million users
  • Each community generates $13.51 in revenue from each user
  • The base monthly churn rate for an MMO is 5%, regardless of moderation
  • According to the latest Fiksu score, it costs $2.78 to acquire a new user
  • They’ve set a 10% month-over-month growth target

So far, so good — they’re even.

Now let’s add toxicity into the mix.

Even with a proactive moderation strategy in place, we expect AI Warzone users to experience about 10% toxicity. It’s a complex battle game where tension is built into the game mechanic, so there will be conflict. Users in Trials of Serathian, our community that does nothing to mitigate that tension, experience a much higher rate of toxicity, at 30%.

Using a weighted average, we’ll raise AI Warzone’s churn rate from 5% to 6.6%. And we’ll raise Trials of Serathian to 9.8%.
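The weighting itself isn't spelled out here, so the sketch below is one reading that happens to reproduce both figures: treat "320% more likely to quit" as a 4.2x multiplier on the 5% base churn for users who experience toxicity, then average the two groups by their share of the community. The `blended_churn` name and this interpretation are assumptions.

```python
def blended_churn(base_churn, toxicity_share, quit_multiplier):
    """Weighted average of churn for unaffected users and users exposed to toxicity."""
    exposed_churn = base_churn * quit_multiplier
    return (1 - toxicity_share) * base_churn + toxicity_share * exposed_churn

BASE_CHURN = 0.05        # 5% monthly churn for an MMO
QUIT_MULTIPLIER = 4.2    # assumed reading of "320% more likely to quit" (1 + 3.2)

ai_warzone = round(blended_churn(BASE_CHURN, 0.10, QUIT_MULTIPLIER), 3)  # 10% toxicity
serathian = round(blended_churn(BASE_CHURN, 0.30, QUIT_MULTIPLIER), 3)   # 30% toxicity
```

Under this reading, 10% toxicity gives 0.9 × 5% + 0.1 × 21% = 6.6%, and 30% toxicity gives 0.7 × 5% + 0.3 × 21% = 9.8%.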

Taking all of these numbers into account, we can calculate the cost of doing nothing using a fairly simple formula, where U is total users, and U¹ is next month’s total users:

U¹ = U - (U * Loss Rate) + Acquired through Advertising

Using our formula to calculate user churn and acquisition costs, let’s watch what happens in their first quarter.

Increased User Churn = Increased Acquisition Costs

In their first quarter, AI Warzone loses 218,460 users. And to meet their 10% growth rate target, they spend $1,527,498 to acquire more.

Trials of Serathian, however, loses 324,380 users (remember, their toxicity rate is much higher). And they have to spend $1,821,956 to acquire more users to meet the same growth target.

Let’s imagine that AI Warzone spends an additional $60,000 in that first quarter on moderation costs. Even with the added costs, they’ve still saved $234,457 in profits.
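The first-quarter figures can be reproduced with a short simulation of the formula. This sketch assumes each studio buys exactly enough users every month to hit the 10% growth target; the `simulate` function and variable names are hypothetical, and `Decimal` is used only to keep the arithmetic exact.

```python
from decimal import Decimal

def simulate(users, churn, growth, cost_per_user, months=3):
    """Apply U1 = U - (U * loss rate) + acquired, once per month.

    Returns (users lost, acquisition spend, final user count).
    """
    lost_total = Decimal(0)
    spend_total = Decimal(0)
    for _ in range(months):
        lost = users * churn
        target = users * (1 + growth)        # next month's required user count
        acquired = target - (users - lost)   # replace churned users, then grow
        users = users - lost + acquired
        lost_total += lost
        spend_total += acquired * cost_per_user
    return lost_total, spend_total, users

START = Decimal("1000000")
GROWTH = Decimal("0.10")
COST = Decimal("2.78")

warzone_lost, warzone_spend, _ = simulate(START, Decimal("0.066"), GROWTH, COST)
trials_lost, trials_spend, _ = simulate(START, Decimal("0.098"), GROWTH, COST)

# Difference in acquisition spend, minus AI Warzone's $60,000 moderation cost.
savings = trials_spend - warzone_spend - Decimal("60000")
```

Under these assumptions the sketch yields 218,460 and 324,380 users lost, acquisition spend of about $1,527,498 and $1,821,956, and savings of about $234,457, matching the figures above.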

That’s a lot. Not enough to break a company, but enough to make executives nervous.

Let’s check back in at the end of the year.

The Seven Million Dollar Difference

We gathered a few key stats from our two communities.

When Trials of Serathian does nothing, their EOY results are:

  • Churn rate: 9.8%
  • User Attrition: -8,672,738
  • Total Profits (after acquisition costs): $39,784,858

And when AI Warzone proactively moderates, their EOY results are:

  • Churn rate: 6.6%
  • User Attrition: -5,840,824
  • Total Profits (after acquisition costs): $47,177,580

AI Warzone deals with toxicity in real time and loses fewer users in the process — by nearly 3 million. They can devote more of their advertising budget to acquiring new users, and their userbase grows exponentially. The end result? They collect $7,392,722 more in profits than Trials of Serathian, who does nothing.

Userbase growth with constant 30% revenue devoted to advertising.

And what does AI Warzone do with $7 million more in profits? Well, they develop and ship new features, fix bugs, and even start working on their next game. AI Warzone: Aftermath, anyone?

These communities don’t actually exist, of course. And there are a multitude of factors that can affect userbase growth and churn rate. But it’s telling, nonetheless.

And there are real-world examples, too.

Sticks and Stones

Remember the human cost that we talked about earlier? Money matters — but so do people.

We mentioned Twitter in The Other Reason You Should Care About Online Toxicity. Twitter is an easy target right now, so it’s tempting to forget how important the social network is, and how powerful it can be.

Twitter is a vital platform for sharing new ideas and forging connections around the globe. Crucially, it’s a place where activists and grassroots organizers can assemble and connect with like-minded citizens to incite real political change. The Arab Spring in 2011 and the Women’s March in January of this year are only two examples out of thousands.

But it’s become known for the kind of abuse that Lily Allen experienced recently — and for failing to deal with it adequately. Twitter is starting to do something — over the last two years, they’ve released new features that make it easier to report and block abusive accounts. And earlier this week even more new features were introduced. The question is, how long can a community go without doing something before the consequences catch up to them?

Twitter’s user base is dwindling, and their stock is plummeting, in large part due to their inability to address toxicity. Can they turn it around? We hope so. And we have some ideas about how they can do it (stay tuned for tomorrow’s post).

What Reddit Teaches us About Toxicity and Churn

Reddit is another real-world example of the cost of doing nothing.

In collaboration with Simon Fraser University, we provided the technology to conduct an independent study of 180 subreddits, using a public Reddit data set. In their academic paper “The Impact of Toxic Language on the Health of Reddit Communities,” SFU analyzes the link between toxicity and community growth.

They found a correlation between an increase in toxic posts and a decrease in community growth. Here is just one example:

The blue line shows high-risk posts decreasing; the red line shows the corresponding increase in community growth.

It’s a comprehensive study and well worth your time. You can download the whitepaper here.

What Now?

Using our formula, we can predict how a proactive moderation strategy can impact your bottom line. And using our two fictional games as a model, we can see how a real-world community might be affected by toxicity.

AI Warzone chose to engineer a healthy community — and Trials of Serathian chose to do nothing.

But what does it mean to “engineer a healthy community”? And what strategies can you leverage in the real world to shape a troll-free community?

In tomorrow’s post, we examine the moderation techniques that AI Warzone used to succeed.

Spoiler alert: They work in real games, too.

Originally published on Medium

Want more articles like this? Subscribe to our newsletter and never miss an update!



A Tale of Two Online Communities

What happens when two games with similar communities take two very different approaches to chat?

Welcome to the end of the world. We have robots!

Picture this:

It’s dark. The faint green glow of a computer screen lights your field of vision. You swipe left, right, up, down, tracing the outline of a floating brain, refining a neural network, making connections. Now, an LED counter flashes red to your right, counting down from ten. You hear clanking machinery and grinding cogs in the distance. To your left, a new screen appears: a scrap yard, miles of twisted, rusty metal. The metal begins to move, slowly. It shakes itself like a wet dog. The counter is closer to zero. Urgent voices, behind, below, above you:

“NOW.”

“YOUR TURN.”

“DON’T MESS IT UP!”

“LET’S DO THIS!”

“YOU GOT THIS!”

Welcome to AI Warzone, a highly immersive, choice-driven game in which players create machines that slowly gain self-awareness, based on users’ key moral decisions. In the year 3030, machines battle each other in the industrial ruins of Earth. Players create and join factions with other users who can help or hinder their progress, leading to, as we see above, a tense atmosphere rife with competition. A complex game with a steep learning curve, AI Warzone is not for the faint of heart.

Welcome to the past. We have dragons!

Now, imagine this:

You stand atop a great rocky crag, looking down on a small village consisting of a few thatch-roofed cottages. A motley crew stands behind you; several slope-browed goblins, the towering figure of a hooded female Mage, and two small dragons outfitted with rough-hewn leather saddles.

You hold a gleaming silver sword in your hand. A group of black-robed men and women, accompanied by trolls and Mages, approach the village, some on dragon-back, others atop snarling wolves. Some of them shout, their voices ringing across the bleak landscape. Almost time, you whisper, lifting your broadsword in the air and swinging it, so it shines in the pale sun. Almost time.

“FUCK YOU FAGGOT,” you hear from far below.

“kill yurself,” a goblin behind you says.

“Show us yr tits!” yells one of the black-robed warriors in the village.

“Oh fuck this,” says the hooded female Mage. She disappears abruptly.

This is life in Trials of Serathian, an MMO set in the Medieval world of Haean. Users can play on the Dawn or Dusk side. On the Dawn side, they can choose to be descendants of the famed warrior Serathian, Sun Mages, or goblins; on the Dusk side, they can play as descendants of the infamous warrior Lord Warelind, Moon Mages, or trolls. Dawn and Dusk clans battle for the ultimate goal — control of Haean.

Two Communities, Two Approaches to Chat

Spoiler alert: AI Warzone and Trials of Serathian aren’t real games. We cobbled together elements from existing games to create two typical gaming communities.

Like most products with social components, both AI Warzone and Trials of Serathian struggle with trolls. And not the mythical, Tolkien-esque kind — the humans-behaving-badly-online kind.

In both games, players create intense bonds with their clan or faction, since they are dependent on fellow players to complete challenges. When players make mistakes, both games have seen incidents of ongoing harassment in retaliation. Challenges are complex, and new users are subject to intense harassment if they don’t catch on immediately.

Second spoiler alert: Only one of these games avoids excessive user churn. Only one of these games has to spend more and more out of their advertising budget to attract new users. And only one of these games nurtures a healthy, growing community that is willing to follow the creators — that’s you — to their next game. The difference? One of these games took steps to deal with toxicity, and the other did nothing.

In tomorrow’s post, we take a deep dive into the math. Remember our “math magic” from The Other Reason You Should Care About Online Toxicity? We’re going to put it to the test.

Originally published on Medium



The Other Reason You Should Care About Online Toxicity

In these divisive and partisan times, there seems to be one thing we can all agree on, regardless of party lines — online toxicity sucks.

Earlier this week Lily Allen announced that she was leaving Twitter. When you read this recent thread about her devastating early labor in 2010, it’s not hard to see why:

Does anyone want their social feeds to be peppered with hate speech or threats? Does anyone like logging into their favorite game and being greeted with a barrage of insults? And does anyone want to hear another story about cyberbullying gone tragically, fatally wrong? And yet we allow it to happen, time and time again.

The human cost of online abuse is obvious. But there’s another hidden cost when you allow trolls and toxicity to flourish in your product.

Toxicity is poison — and it will eat away at your profits.

Every company faces a critical decision when creating a social network or online game. Do you take steps to deal with toxicity from the very beginning? Do you proactively moderate the community to ensure that everyone plays nice?

Or — do you do nothing? Do you launch your product and hope for the best? Maybe you build a Report feature so users can report abuse or harassment. Maybe you build a Mute button so players can ignore other players who post offensive content. Sure, it’s a traditional approach to moderation, but does it really work?

If you’re not sure what to choose, you’re not alone. The industry has grappled with these questions for years now.

We want to make it an easy choice. We want it to be a no-brainer. We want doing something to be the industry standard. We believe that chat is a game mechanic like any other, and that community balance is as important as game balance.

When you choose to do something, not only do you build the framework for a healthy, growing, loyal community — you’ll also save yourself a bunch of money in the process.

In this series of posts, we’ll introduce two fictional online games, AI Warzone and Trials of Serathian. We’ll people them with communities, each a million users strong. One game will choose to proactively moderate the community, and the other will do nothing. Think of it as an A/B test.

Then, armed with real-world statistics, our own research, and a few brilliant data scientists, we’ll perform a bit of math magic. We’ll toss them all into a hat (minus the data scientists; they get cranky when we try to put them in hats), say the magic words, wave our wands, and — tada! — pull out a formula. We’ll run both games’ profits, user churn, and acquisition costs through our formula to determine, once and for all, the cost of doing nothing.

But first, let’s have a bit of fun and delve into our fictional communities. Who is Serathian and why is he on trial? And what kind of virtual battles can one expect in an AI Warzone?

Join us tomorrow for our second installment in this four-part series: A Tale of Two Online Communities.

 

Originally published on Medium



To Mark Zuckerberg

Re: Building Global Communities

“There are billions of posts, comments and messages across our services each day, and since it’s impossible to review all of them, we review content once it is reported to us. There have been terribly tragic events — like suicides, some live streamed — that perhaps could have been prevented if someone had realized what was happening and reported them sooner. There are cases of bullying and harassment every day, that our team must be alerted to before we can help out. These stories show we must find a way to do more.” — Mark Zuckerberg

This is hard.

I built a company (Two Hat Security) that’s also contracted to process 4 billion chat messages, comments, and photos a day. We specifically look for high-risk content in real-time, such as bullying, harassment, threats of self-harm, and hate speech. It is not easy.

“There are cases of bullying and harassment every day, that our team must be alerted to before we can help out. These stories show we must find a way to do more.”

I must ask — why wait until cases get reported?

If you wait for a report to be filed by someone, haven’t they already been hurt? Some things that are reported can never be unseen. Some, like Amanda Todd, cannot have that image retracted. Others post when they are enraged or drunk, and the words, like air, cannot be taken back. The saying goes, “What happens in Vegas stays in Vegas, Facebook, Twitter and Instagram forever,” so maybe some things should never go live. What if you could proactively create a safe global community for people by preventing (or pausing) personal attacks in real-time instead?

This, it appears, is key to creating the next vision point.

“How do we help people build an informed community that exposes us to new ideas and builds common understanding in a world where every person has a voice?”

One of the biggest challenges to free speech online in 2017 is that we allow a small group of toxic trolls the ‘right’ to shut up a larger group of people. Ironically, these users’ claim to free speech often ends up becoming hate speech and harassment, destroying the opportunity for anyone else to speak up, much like bullies in the lunchroom. Why would someone share their deepest thoughts if others would just attack them? Instead, the dream for real conversations gets lost beneath a blanket of fear. Instead, we get puppy pictures, non-committal thumbs up, and posts that are ‘safe.’ If we want to create an inclusive community, people need to be able to share ideas and information online without fear of abuse from toxic bullies. I applaud your manifesto, as it calls this out, and calls us all to work together to achieve this.

But how?

Fourteen years ago, we both set out to change the social network of our world. We were both entrepreneurial engineers, hacking together experiments using the power of code. It was back in the days of MySpace and Friendster and the later Orkut. We had to browse to every single friend we had on MySpace just to see if they wrote anything new. To solve this I created myTWU, a social stream of all the latest blogs and photos of fellow students, alumni and sports teams on our internal social tool. Our office was in charge of building online learning but we realized that education is not about ideas but community. It was not enough to dump curriculum online for independent study; people needed places of belonging.

A year later “The Facebook” came out. You reached beyond the walls of one University and over time opened it to the world.

So I pivoted. As part of our community, we had a little chat room where you could waddle around and talk to others. It was a skin of a little experiment my brother was running. He was caught by surprise when it grew to a million users, which showed how much users long for community and places of belonging. In those days chat rooms were the dark part of the web and it was nearly impossible to keep up with the creative ways users tried to hurt each other.

So I was helping my brother code the safety mechanisms for his little social game. That little social game grew to become a global community with over 300 million users and Disney bought it back in 2007. I remember huddling in my brother’s basement rapidly building the backend to fix the latest trick to get around the filter. Club Penguin was huge.

After a decade of watching kids break the filter and of building tools to moderate the millions upon millions of user reports, I had a breakthrough. By then I was working in security at Disney, with the job of hacking everything with a Mouse logo on it. In my training, we learned that if someone DDoSes a network or tries to break the system, you find a signature of what they are doing and turn up the firewall against that.

“What if we did that with social networks and social attacks?” I thought.

I’ve spent the last five years building an AI system with signatures and firewalls as it relates to social content. As we process billions of messages with Community Sift, we build reputation scores in real-time. We know who the trolls are — they leave digital signatures everywhere they go. Moreover, I can adjust the AI to turn up the sensitivity only where it counts. In so doing we drastically dropped false positives, opened communication with the masses while detecting the highest risk when it matters.

I had to build whole new AI algorithms to do this since traditional methods only hit 90–95 percent. That is great for most AI tasks, but when it comes to cyberbullying, hate speech, and suicide, the stakes are too high for the current state of the art in NLP.

“To prevent harm, we can build social infrastructure to help our community identify problems before they happen. When someone is thinking of suicide or hurting themselves, we’ve built infrastructure to give their friends and community tools that could save their life.”

Since Two Hat is a security company, we are uniquely positioned to prevent harm with the largest vault of high-risk signatures, like grooming conversations and CSAM (child sexual abuse material). In collaboration with our partners at the RCMP (Royal Canadian Mounted Police), we are developing a system to predict and prevent child exploitation before it happens, complementing the efforts our friends at Microsoft have made with PhotoDNA. With CEASE.ai, we are training AI models to find CSAM, and have lined up millions of dollars of Ph.D. research to give students world-class experience in working with our team.

“Artificial intelligence can help provide a better approach. We are researching systems that can look at photos and videos to flag content our team should review. This is still very early in development, but we have started to have it look at some content, and it already generates about one-third of all reports to the team that reviews content for our community.”

It is incredible what deep learning has accomplished in the last few years. And although we have been able to see near-perfect recall in finding pornography with our current work, there is an explosion of new topics we are training on. Further, the subtleties you outline are key.

I look forward to two changes to resolve this:

  1. I call on networks to trust that their users have resilience. It is not imperative to find everything, just the worst. If all content can be sorted from maybe bad to absolutely bad, we can then draw a line in the sand and say: these cannot be unseen, and these the community will find. In so doing, we don’t have to wait for technology to reach perfection, nor wait for users to report things we already know are bad. Let computers do what they do well and let humans deal with the rest.
  2. I call on users to be patient. Yes, sometimes in our ambition to prevent harm we may find a Holocaust photo. We know this is terrible but we ask for your patience. Computer vision is like a child still learning. A child that sees that image for the first time is still deeply impacted and is concerned. Join us to report these problems and to help train the system to mature and discern.
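The first of these two points, sorting content by risk and drawing a line in the sand, can be sketched in a few lines of Python. This is an illustration only, not a description of how Community Sift works: the 0 to 1 `risk` scores, the `CANNOT_BE_UNSEEN` threshold, and the `triage` function are hypothetical names and values.

```python
CANNOT_BE_UNSEEN = 0.9  # hypothetical line in the sand, on a 0-1 risk scale


def triage(items, threshold=CANNOT_BE_UNSEEN):
    """Split scored content into what is blocked up front and what goes live.

    `items` is a list of (text, risk) pairs, where risk runs from
    0.0 (maybe bad) to 1.0 (absolutely bad).
    """
    # Rank from absolutely bad down to maybe bad.
    ranked = sorted(items, key=lambda item: item[1], reverse=True)
    blocked = [text for text, risk in ranked if risk >= threshold]
    published = [text for text, risk in ranked if risk < threshold]
    return blocked, published


blocked, published = triage([
    ("borderline joke", 0.40),
    ("explicit threat", 0.97),
    ("mild insult", 0.65),
])
print(blocked)    # handled automatically, before anyone sees it
print(published)  # goes live; the community can still report it
```

Only the content above the line is stopped automatically. Everything else is published and left to human judgment and user reports, which is the division of labor the first point argues for.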

However, you are right that many more strides need to happen to get this to where it needs to be. We need to call on the world’s greatest thinkers. Of all the hard problems to solve, our next one is child pornography (CSAM). Some things cannot be unseen. There are things that, when seen, re-victimize over and over again. We are the first to gain access to hundreds of thousands of pieces of CSAM and to train deep learning models on them with CEASE.ai. We are pouring millions of dollars into this topic and putting the best minds on it. It is a problem that must be solved.

And before I move on, I want to give a shout-out to your incredible team, whom I have had the chance to volunteer with at hack-a-thons and who have helped me think through how to get this done. Your company’s commitment to social good is outstanding, and they have helped many other companies and not-for-profits.

“The guiding principles are that the Community Standards should reflect the cultural norms of our community, that each person should see as little objectionable content as possible, and each person should be able to share what they want while being told they cannot share something as little as possible. The approach is to combine creating a large-scale democratic process to determine standards with AI to help enforce them.”

That is cool. I have got a couple of the main pieces needed for that completed if you need them.

“The idea is to give everyone in the community options for how they would like to set the content policy for themselves. Where is your line on nudity? On violence? On graphic content? On profanity?”

I had the chance to swing by Twitter 18 months ago. I took their sample firehose and have been running it through our system. We label each message across 1.8 million of our signatures, then put together a quick demo of what it would be like if you could turn off the toxicity on Twitter. It shows low, medium, and high-risk. I would not expect to see anything severe on there, as they have recently tried to clean it up.

My suggestion to Twitter was to allow each user the option to choose what they want to see. The suggestion was that a global policy gets rid of clear infractions against terms of use for content that can never be unseen such as gore or CSAM. After the global policy is applied, you can then let each user choose their own risk and tolerance levels.
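A two-stage policy like this could look roughly as follows. The sketch assumes every message already carries a topic label and a low, medium, or high risk level. The label names, the `GLOBAL_BLOCKLIST` set, and the `visible_to` function are hypothetical and do not come from any real Twitter or Community Sift API.

```python
from dataclasses import dataclass

# Hypothetical topic labels that the global policy always removes.
GLOBAL_BLOCKLIST = {"gore", "csam"}

RISK_LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Message:
    text: str
    topic: str   # e.g. "profanity", "violence", "gore"
    risk: str    # "low", "medium", or "high"


def visible_to(user_tolerance, messages):
    """Return the messages a user sees under a two-stage policy."""
    # Stage 1: the global policy removes clear infractions for everyone.
    allowed = [m for m in messages if m.topic not in GLOBAL_BLOCKLIST]
    # Stage 2: each user's own tolerance decides what is left.
    limit = RISK_LEVELS[user_tolerance]
    return [m for m in allowed if RISK_LEVELS[m.risk] <= limit]


feed = [
    Message("nice goal!", "sports", "low"),
    Message("you absolute muppet", "profanity", "medium"),
    Message("graphic image", "gore", "high"),
]
print([m.text for m in visible_to("low", feed)])
print([m.text for m in visible_to("high", feed)])
```

Even a user with the highest tolerance never sees the globally blocked item, while a user with low tolerance also skips the medium-risk insult.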

We are committed to helping you and the Facebook team with your mission to build a safe, supportive, and inclusive community. We are already discussing ways we can help your team, and we are always open to feedback. Good luck on your journey to connect the world, and hope we cross paths next time I am in the valley.

Sincerely,
Chris Priebe
CEO, Two Hat Security

 

Originally published on Medium 

Can Community Sift Outperform Google Jigsaw’s Conversation AI in the War on Trolls?

There are some problems in the world that everyone should be working on, like creating a cure for cancer and ensuring that everyone in the world has access to clean drinking water.

On the internet, there is a growing epidemic of child exploitative content, and it is up to us as digital service providers to protect users from illegal and harmful content. Another issue that’s been spreading is online harassment — celebrities, journalists, game developers, and many others face an influx of hate speech and destructive threats on a regular basis.

Harassment is a real problem — not a novelty startup idea like ‘the Uber for emergency hairstylists.’ Cyberbullying and harassment are problems that affect people in real-life, causing them psychological damage, trauma, and sometimes even causing people to self-harm or take their own lives. Young people are particularly susceptible to this, but so are many adults. There is no disconnect between our virtual lives and our real lives in our interconnected, mesh-of-things society. Our actual reality is already augmented.

Issues such as child exploitation, hate speech, and harassment are problems we should be solving together.

We are excited to see that our friends at Alphabet (Google) are publicly joining the fray, taking proactive action against harassment. The internal incubator formerly known as Google Ideas will now be known as Jigsaw, with a mission to make people in the world safer. It’s encouraging to see that they are tackling the same problems that we are — countering extremism and protecting people from harassment and hate speech online.

Like Jigsaw, we also employ a team of engineers, scientists, researchers, and designers from around the world. And like the talented folks at Google, we also collaborate to solve the really tough problems using technology.

There are also some key differences in how we approach these problems!

Since the Two Hat Security team started by developing technology solutions for child-directed products, we have unique, rich, battle-tested experience with conversational subversion, grooming, and cyberbullying. We’re not talking about sitting on the sidelines here — we have hands-on experience protecting kids’ communities from high-risk content and behaviours.

Our CEO, Chris Priebe, helped code and develop the original safety and moderation solutions for Club Penguin, the children’s social network with over 300 million users acquired by The Walt Disney Company in 2007. Chris applied what he’s learned over the past 20 years of software development and security testing to Community Sift, our flagship product.

At Two Hat, we have an international, native-speaking team of professionals from all around the world: Italy, France, Germany, Brazil, Japan, India, and more. We combine their expertise with computer algorithms to validate their decisions, increase efficiency, and improve future results. Instead of depending on crowdsourced results (which force users to see a message before they can report it), we focus on enabling platforms to sift out messages before they are deployed.

Google vs. Community Sift — Test Results

In a recent article published in Wired, writer Andy Greenberg put Google Jigsaw’s Conversation AI to the test. As he rightly stated in his article, “Conversation AI, meant to curb that abuse, could take down its own share of legitimate speech in the process.” This is exactly the issue we have in maintaining Community Sift — ensuring that we don’t take down legitimate free speech in the process of protecting users from hate speech.

We thought it would be interesting to run the same phrases featured in the Wired article through Community Sift to see how we’re measuring up. After all, the Google team sets a fairly high bar when it comes to quality!

From these examples, you can see that our human-reviewed language signatures provided a more nuanced classification to the messages than the artificial intelligence did. Instead of starting with artificial intelligence assigning risk, we bring conversation trends and human professionals to the forefront, then allow the A.I. to learn from their classifications.

Here’s a peek behind the scenes at some of our risk classifications.

We break apart sentences into phrase patterns, instead of just looking at the individual words or the phrase on its own. Then we assign other labels to the data, such as the user’s reputation, the context of the conversation, and other variables like vertical chat to catch subversive behaviours, which is particularly important for child-directed products.

Since both of the previous messages contain a common swearword, we need to classify that to enable child-directed products to filter this out of their chat. However, in this context, the message is addressing another user directly, so it is at higher risk of escalation.
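As a rough illustration of the idea (and not our actual classifier), the sketch below matches phrase patterns rather than single words, rates a phrase aimed directly at another user higher than the same word used in passing, and then adjusts the result with a context label. The patterns, the 0 to 5 risk scale, the `sender_reputation` label, and the `classify` function are all hypothetical.

```python
import re

# Hypothetical phrase patterns with a base risk from 0 (safe) to 5 (severe).
# The first pattern targets another user directly, so it is rated higher.
PHRASE_PATTERNS = [
    (re.compile(r"\byou\s+(are|r)\s+(a\s+|an\s+)?idiot\b", re.I), 3),
    (re.compile(r"\bidiot\b", re.I), 2),
]


def classify(text, sender_reputation=0):
    """Return a risk level for `text`, using phrase patterns plus context.

    `sender_reputation` is an assumed label: 0 for a user in good
    standing, positive values for a user with a history of abuse.
    """
    base = 0
    for pattern, risk in PHRASE_PATTERNS:
        if pattern.search(text):
            base = max(base, risk)
    # Context label: a poor reputation raises the risk of a risky phrase.
    if base > 0:
        base += sender_reputation
    return min(base, 5)


print(classify("that boss fight made me feel like an idiot"))
print(classify("you are an idiot"))
print(classify("you are an idiot", sender_reputation=2))
```

The same word yields three different ratings depending on who it is aimed at and who is saying it, which is the kind of nuance a single-word blacklist cannot express.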

This phrase, while seemingly harmless to an adult audience, contains some risk for younger demographics, as it could be used inappropriately in some contexts.

As the Wired writer points out in his article, “Inside Google’s Internet Justice League and Its AI-Powered War on Trolls”, this phrase is often a response from troll victims to harassment behaviours. In our system, this is a lower-risk message.

The intention of our classification system is to empower platform owners to make informed and educated decisions about their content. Much like how the MPAA rates films or the ESRB rates video games, we rate user-generated content to empower informed decision-making.

*****

Trolls vs. Regular Users

We’re going to go out on a limb here and say that every company cares about how their users are being treated. We want customers to be treated with dignity and respect.

Imagine you’re the owner of a social platform like a game or app. If your average cost of acquisition sits at around $4, then it will cost you a lot of money if a troll starts pushing people away from your platform.

Unfortunately, customers who become trolls don’t have your community’s best interests or your marketing budget in mind — they care more about getting attention… at any cost. Trolls show up on a social platform to get the attention they’re not getting elsewhere.

Identifying who these users are is the first step to helping your community, your product, and even the trolls themselves. Here at Two Hat, we like to talk about our “Troll Performance Improvement Plans” (Troll PIPs), where we identify who your top trolls are, and work on a plan to give them a chance to reform their behaviour before taking disciplinary action. After all, we don’t tolerate belligerent behaviour or harassment in the workplace, so why would we tolerate it within our online communities?

Over time, community norms set in, and it’s difficult to reshape those norms. Take 4chan, for example. While this adult-only anonymous message board has a team of “volunteer moderators and janitors”, the site is still regularly filled with trolling, flame wars, racism, grotesque images, and pornography. And while there may be many legitimate, civil conversations lurking beneath the surface of 4chan, the site has earned a reputation that likely won’t change in the eyes of the public.

Striking a balance between protecting free speech and preventing online harassment is tricky, yet necessary. If you allow trolls to harass other users, you are inadvertently enabling someone to cause another person psychological harm. However, if you suppress every message, you’re just going to annoy users who are only trying to express themselves.

*****

We’ve spent the last four years improving and advancing our technology to help make the internet great again. It’s a fantastic compliment to have a company as amazing as Google jumping into the space we’ve been focused on for so long, where we’re helping social apps and games like Dreadnought, PopJam, and ROBLOX.

Having Google join the fray shows that harassment is a big problem worth solving, and it also helps show that we have already made some tremendous strides to pave the way for them. We have had conversations with the Google team about the Riot Games’ experiments and learnings about toxic behaviours in games. Seeing them citing the same material is a great compliment, and we are honored to welcome them to the battle against abusive content online.

Back at Two Hat, we are already training the core Community Sift system on huge data sets. We’re under contract to process four billion messages a day across multiple languages in real-time. As we all continue to train artificial intelligence to recognize toxic behaviors like harassment, we can better serve the real people who are using these social products online. We can give users the freedom to choose meaningful settings, like opting out of rape threats if they so choose. After all, we believe a woman shouldn’t have to self-censor, questioning whether that funny meme will result in a rape or death threat against her family. We’d much rather enable people to censor out inappropriate messages from that special kind of idiot who threatens to rape women.

While it’s a shame that we have to develop technology to curb behaviours that would be obviously inappropriate (and in some cases, illegal) in real-life, it is encouraging to know that there are so many groups taking strides to end hate speech now. From activist documentaries and pledges like The Bully Project, inspiring people to stand up against bullying, to Alphabet/Google’s new Jigsaw division, we are on-track to start turning the negative tides in a new direction. And we are proud to be a part of such an important movement.