Affected by the Smyte Closure? Two Hat Security Protects Communities From Abusive Comments and Hate Speech

Statement from CEO and founder Chris Priebe: 

As many of you know, Smyte was recently acquired by Twitter and its services are no longer available, affecting many companies in the industry.

As CEO and founder of Two Hat Security, creators of the chat filter and content moderation solution Community Sift, I would like to assure both our valued customers and the industry at large that we are, and will always remain, committed to user protection and safety. For six years we have worked with many of the largest gaming and social platforms in the world to protect their communities from abuse, harassment, and hate speech.

We will continue to serve our existing clients and welcome the opportunity to work with anyone affected by this unfortunate situation. Our mandate is and will always be to protect the users on behalf of all sites. We are committed to uninterrupted service to those who rely on us.

If you’re in need of a filter to protect your community, we can be reached at hello@twohat.com.

Quora: Does it make sense for media companies to disallow comments on articles?

It’s not hard to understand why more and more media companies are inclined to turn off comments. If you’ve spent any time reading the comments section on many websites, you’re bound to run into hate speech, vitriol, and abuse. It can be overwhelming and highly unpleasant. But the thing is, even though it feels like they’re everywhere, hate speech, vitriol, and abuse are only present in a tiny percentage of comments. Do the math, and you find that thoughtful, reasonable comments are the norm. Unfortunately, toxic voices almost always drown out healthy voices.

But it doesn’t have to be that way.

The path of least resistance is tempting. It’s easy to turn off comments — it’s a quick fix, and it always works. But there is a hidden cost. When companies remove comments, they send a powerful message to their best users: Your voice doesn’t matter. After all, users who post comments are engaged, they’re interested, and they’re active. If they feel compelled to leave a comment, they will probably also feel compelled to return, read more articles, and leave more comments. Shouldn’t media companies cater to those users, instead of the minority?

Traditionally, most companies approach comment moderation in one of two ways, both of which are ineffective and inefficient:

  • Pre-moderation. Costly and time-consuming, pre-moderating everything requires a large team of moderators. As companies scale up, it can become impossible to review every comment before it’s posted.
  • Crowdsourcing. A band-aid solution that doesn’t address the bigger problem. When companies depend on users to report the worst content, they force their best users to become de facto moderators. Engaged and enthusiastic users shouldn’t have to see hate speech and harassment. They should be protected from it.

I’ve written before about techniques to help build a community of users who give high-quality comments. The most important technique? Proactive moderation.

My company Two Hat Security has been training and tuning AI since 2012 using multiple unique data sets, including comments sections, online games, and social networks. In our experience, proactive moderation uses a blend of AI-powered automation, human review, real-time user feedback, and crowdsourcing.

It’s a balancing act that combines what computers do best (finding harmful content and taking action on users in real-time) and what humans do best (reviewing and reporting complex content). Skim the dangerous content — things like hate speech, harassment, and rape threats — off the top using a finely-tuned filter that identifies and removes it in real-time. That way no one has to see the worst comments. You can even customize the system to warn users when they’re about to post dangerous content. Then, your (much smaller and more efficient) team of moderators can review reported comments, and even monitor comments as they’re posted for anything objectionable that slips through the cracks.
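To make that division of labour concrete, here is a minimal sketch of what a proactive moderation pipeline can look like. The function names, risk scores, and thresholds are illustrative assumptions for this post, not Community Sift’s actual API.

```python
# Illustrative sketch of a proactive comment-moderation pipeline.
# All names, scores, and thresholds are hypothetical, not a product API.

from dataclasses import dataclass

@dataclass
class Verdict:
    action: str        # "publish", "warn", "queue_for_review", or "block"
    risk_score: float  # 0.0 (harmless) to 1.0 (severe)

BLOCK_THRESHOLD = 0.9   # hate speech, threats: removed in real time
WARN_THRESHOLD = 0.6    # borderline: warn the author before posting
REVIEW_THRESHOLD = 0.4  # ambiguous: publish, but queue for human review

def score_comment(text: str) -> float:
    """Toy stand-in for the real classifier (filter rules plus trained models)."""
    high_risk_phrases = {"kill yourself", "go die"}  # illustrative only
    lowered = text.lower()
    return 1.0 if any(p in lowered for p in high_risk_phrases) else 0.1

def moderate(text: str) -> Verdict:
    risk = score_comment(text)
    if risk >= BLOCK_THRESHOLD:
        return Verdict("block", risk)             # no one ever sees it
    if risk >= WARN_THRESHOLD:
        return Verdict("warn", risk)              # nudge the author to rephrase
    if risk >= REVIEW_THRESHOLD:
        return Verdict("queue_for_review", risk)  # humans handle the grey area
    return Verdict("publish", risk)

print(moderate("Great article, thanks!").action)  # publish
print(moderate("go die").action)                  # block
```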

Comments sections don’t have to be the darkest places on the internet. Media companies have a choice — they can continue to let the angriest, loudest, and most hateful voices drown out the majority, or they can give their best users a platform for discussion and debate.

Originally published on Quora

Want more articles like this? Subscribe to our newsletter and never miss an update!



Can Community Sift Outperform Google Jigsaw’s Conversation AI in the War on Trolls?

There are some problems in the world that everyone should be working on, like creating a cure for cancer and ensuring that everyone in the world has access to clean drinking water.

On the internet, there is a growing epidemic of child exploitative content, and it is up to us as digital service providers to protect users from illegal and harmful content. Another issue that’s been spreading is online harassment — celebrities, journalists, game developers, and many others face an influx of hate speech and destructive threats on a regular basis.

Harassment is a real problem — not a novelty startup idea like ‘the Uber for emergency hairstylists.’ Cyberbullying and harassment affect people in real life, causing psychological damage and trauma, and sometimes even leading people to self-harm or take their own lives. Young people are particularly susceptible, but so are many adults. There is no disconnect between our virtual lives and our real lives in our interconnected, mesh-of-things society. Our actual reality is already augmented.

Issues such as child exploitation, hate speech, and harassment are problems we should be solving together.

We are excited to see that our friends at Alphabet (Google) are publicly joining the fray, taking proactive action against harassment. The internal incubator formerly known as Google Ideas will now be known as Jigsaw, with a mission to make people in the world safer. It’s encouraging to see that they are tackling the same problems that we are — countering extremism and protecting people from harassment and hate speech online.

Like Jigsaw, we also employ a team of engineers, scientists, researchers, and designers from around the world. And like the talented folks at Google, we also collaborate to solve the really tough problems using technology.

There are also some key differences in how we approach these problems!

Since the Two Hat Security team started by developing technology solutions for child-directed products, we have unique, rich, battle-tested experience with conversational subversion, grooming, and cyberbullying. We’re not talking about sitting on the sidelines here — we have hands-on experience protecting kids’ communities from high-risk content and behaviours.

Our CEO, Chris Priebe, helped code and develop the original safety and moderation solutions for Club Penguin, the children’s social network with over 300 million users acquired by The Walt Disney Company in 2007. Chris applied what he’s learned over the past 20 years of software development and security testing to Community Sift, our flagship product.

At Two Hat, we have an international team of native-speaking professionals from around the world — Italy, France, Germany, Brazil, Japan, India, and more. We combine their expertise with computer algorithms to validate their decisions, increase efficiency, and improve future results. Instead of depending on crowdsourced reports (which force users to see a message before they can report it), we focus on enabling platforms to sift out messages before they are deployed.

Google vs. Community Sift — Test Results

In a recent article published in Wired, writer Andy Greenberg put Google Jigsaw’s Conversation AI to the test. As he rightly stated in his article, “Conversation AI, meant to curb that abuse, could take down its own share of legitimate speech in the process.” This is exactly the issue we have in maintaining Community Sift — ensuring that we don’t take down legitimate free speech in the process of protecting users from hate speech.

We thought it would be interesting to run the same phrases featured in the Wired article through Community Sift to see how we’re measuring up. After all, the Google team sets a fairly high bar when it comes to quality!

From these examples, you can see that our human-reviewed language signatures provided a more nuanced classification to the messages than the artificial intelligence did. Instead of starting with artificial intelligence assigning risk, we bring conversation trends and human professionals to the forefront, then allow the A.I. to learn from their classifications.

Here’s a peek behind the scenes at some of our risk classifications.

We break sentences apart into phrase patterns, instead of just looking at the individual words or the phrase on its own. Then we assign other labels to the data, such as the user’s reputation, the context of the conversation, and other variables like vertical chat to catch subversive behaviours, which is particularly important for child-directed products.

Since both of the previous messages contain a common swearword, we need to classify that to enable child-directed products to filter this out of their chat. However, in this context, the message is addressing another user directly, so it is at higher risk of escalation.

This phrase, while seemingly harmless to an adult audience, contains some risk for younger demographics, as it could be used inappropriately in some contexts.

As the Wired writer points out in his article, “Inside Google’s Internet Justice League and Its AI-Powered War on Trolls”, this phrase is often a response from troll victims to harassment behaviours. In our system, this is a lower-risk message.

The intention of our classification system is to empower platform owners to make informed and educated decisions about their content. Much like how the MPAA rates films or the ESRB rates video games, we rate user-generated content to empower informed decision-making.
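To give a sense of what those classifications can look like in practice, here is a hedged sketch. The phrase patterns, topic names, and risk scale are invented for illustration and are not our production signature format.

```python
# Hypothetical illustration of labelling a message rather than issuing a verdict.
# Topic names, the 1-7 risk scale, and the pattern normalization are invented.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Classification:
    phrase_pattern: str       # normalized pattern, not just the raw words
    topics: list[str]         # e.g. ["vulgarity"], ["bullying"]
    risk_level: int           # 1 = harmless .. 7 = severe (invented scale)
    directed_at_user: bool    # messages aimed at someone escalate more easily
    sender_reputation: float  # 0.0 = repeat offender .. 1.0 = trusted

def classify(text: str, reply_to: Optional[str], reputation: float) -> Classification:
    """Toy stand-in for signature matching plus context and reputation checks."""
    swears = {"damn", "hell"}  # illustrative placeholder list
    words = text.lower().split()
    pattern = " ".join("<swear>" if w.strip(",.!?") in swears else w for w in words)
    topics = ["vulgarity"] if "<swear>" in pattern else []
    directed = reply_to is not None
    risk = 4 if (topics and directed) else 2 if topics else 1
    return Classification(pattern, topics, risk, directed, reputation)

# The same words carry more risk when aimed directly at another user.
print(classify("damn you", reply_to="player42", reputation=0.3).risk_level)   # 4
print(classify("damn, nice goal", reply_to=None, reputation=0.9).risk_level)  # 2
```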

*****

Trolls vs. Regular Users

We’re going to go out on a limb here and say that every company cares about how their users are being treated. We want customers to be treated with dignity and respect.

Imagine you’re the owner of a social platform like a game or app. If your average cost of acquisition sits at around $4 per user, a troll who drives away even a few hundred people has quietly burned more than a thousand dollars of your marketing spend, before you even count the lost revenue from those users.

Unfortunately, customers who become trolls don’t have your community’s best interests or your marketing budget in mind — they care more about getting attention… at any cost. Trolls show up on a social platform to get the attention they’re not getting elsewhere.

Identifying who these users are is the first step to helping your community, your product, and even the trolls themselves. Here at Two Hat, we like to talk about our “Troll Performance Improvement Plans” (Troll PIPs), where we identify who your top trolls are, and work on a plan to give them a chance to reform their behaviour before taking disciplinary action. After all, we don’t tolerate belligerent behaviour or harassment in the workplace, so why would we tolerate it within our online communities?

Over time, community norms set in, and it’s difficult to reshape those norms. Take 4chan, for example. While this adult-only anonymous message board has a team of “volunteer moderators and janitors”, the site is still regularly filled with trolling, flame wars, racism, grotesque images, and pornography. And while there may be many legitimate, civil conversations lurking beneath the surface of 4chan, the site has earned a reputation that likely won’t change in the eyes of the public.

Striking a balance between protecting free speech and preventing online harassment is tricky, yet necessary. If you allow trolls to harass other users, you are inadvertently enabling them to do real psychological harm. However, if you suppress every message, you will only end up annoying users who are simply trying to express themselves.

*****

We’ve spent the last four years improving and advancing our technology to help make the internet great again. It’s a fantastic compliment to have a company as amazing as Google jumping into the space we’ve been focused on for so long, where we’re helping social apps and games like Dreadnought, PopJam, and ROBLOX.

Having Google join the fray shows that harassment is a big problem worth solving, and it also shows that we have already made tremendous strides to pave the way. We have had conversations with the Google team about Riot Games’ experiments and learnings about toxic behaviours in games. Seeing them cite the same material is a great compliment, and we are honored to welcome them to the battle against abusive content online.

Back at Two Hat, we are already training the core Community Sift system on huge data sets — we’re under contract to process four billion messages a day across multiple languages in real-time. As we all continue to train artificial intelligence to recognize toxic behaviors like harassment, we can better serve the real people who use these social products online. We can give users meaningful choices, like the ability to opt out of seeing rape threats entirely. After all, we believe a woman shouldn’t have to censor herself, wondering whether that funny meme will result in a rape or death threat against her family. We’d much rather enable people to filter out inappropriate messages from the special kind of idiot who threatens to rape women.

While it’s a shame that we have to develop technology to curb behaviours that would be obviously inappropriate (and in some cases, illegal) in real life, it is encouraging to know that there are so many groups taking strides to end hate speech now. From activist documentaries and pledges like The Bully Project, inspiring people to stand up against bullying, to Alphabet/Google’s new Jigsaw division, we are on track to start turning the tide. And we are proud to be a part of such an important movement.

How to Remove Online Hate Speech in Under 24 Hours

Note: This post was originally published on July 5th, 2016. We’ve updated the content in light of the draft bill presented by the German government on March 14th.

In July of last year, the major players in social media came together as a united front with a pact to remove hate speech within 24 hours. Facebook defines hate speech as “content that attacks people based on their perceived or actual race, ethnicity, religion, sex, gender, sexual orientation, disability or disease.” Hate speech is a serious issue, as it shapes the core beliefs of people all over the globe.

Earlier this week, the German government took its fight against online hate speech one step further, proposing a new law that would levy fines of up to €50 million against social media companies that fail to remove or block hate speech within 24 hours of a complaint. And the proposed law wouldn’t just affect companies — it would affect individuals as well. Social media companies would be expected to appoint a “responsible contact person,” who could be subject to a fine of up to €5 million if user complaints aren’t dealt with promptly.

Those are big numbers — the kinds of numbers that could potentially cripple a business.

As professionals with social products, we tend to rally around the shared belief that empowering societies to exchange ideas and information will create a better, more connected world. The rise of the social web has been one of the most inspiring and amazing changes in recent history, impacting humanity for the better.

Unfortunately, like many good things in the world, there tends to be a dark underbelly hidden beneath the surface. While the majority of users share fun content, interesting information, and inspirational news, a small fraction use these platforms to spread messages of hate.

It is important to make the distinction that we are not talking about complaints, anger, or frustration. We recognize that there is a huge difference between trash talking and harassing specific individuals or groups of people.

We are a protection layer for social products, and we believe everyone should have the power to share without fear of harassment or abuse. We believe that social platforms should be as expressive as possible, where everyone can share thoughts, opinions, and information freely.

We also believe that hate speech does not belong on any social platform. To this end, we want to enable all social platforms to remove hate speech as fast as possible — and not just because they could be subject to a massive fine. As professionals in the social product space, we want everyone to be able to get this right — not just the huge companies like Google.

Smaller companies may be tempted to do this manually, but the task becomes progressively harder to manage with increased scale and growth. Eventually, moderators will be spending every waking moment looking at submissions, making for an inefficient process and slow reaction time.

Instead of removing hate speech within 24 hours, we want to remove it within minutes or even seconds. That is our big, hairy, audacious goal.

Here’s how we approach this vision of ‘instant hate speech removal.’

Step 1 — Label everything.

Full disclosure: traditional filters suck. They have a bad reputation for being overly simplistic, unable to address context, and prone to false positives. Still, leaving it up to users to report all the terrible content is unfair to them and bad for your brand. Filters alone are not adequate for addressing something as complicated as hate speech, so we decided to invest in creating something different.

Using the old environmentally-friendly adage of “reduce, reuse, recycle (in that specific order)”, we first want to reduce all the noise. Consider movie ratings: all films are rated, and “R” ratings come accompanied by explanations. For instance, “Rated R for extreme language and promotion of genocide.” We want to borrow this approach and apply labels that indicate the level of risk associated with the content.

There are two immediate benefits: First, users can decide what they want to see; and second, we can flag any content above our target threshold. Of course, content that falls under ‘artistic expression’ can be subjective. Films like “Schindler’s List” are hard to watch but do not fall under hate speech, despite touching upon subjects of racism and genocide. On social media, some content may address challenging issues without promoting hate. The rating allows people to prepare themselves for what they are about to see, but we need more information to know if it is hate speech.

In the real world, we might look at the reputation of the individual to gain a better sense of what to expect. Likewise, content on social media does not exist in a vacuum; there are circumstances at play, including the reputation of the speaker. To simulate human judgment, we have built out our system with 119 features to examine the text, context, and reputation. Just looking for words like “nigga” will generate tons of noise, but if you combine that with past expressions of racism and promotions of violence, you can start sifting out the harmless stuff to determine what requires immediate action.

User reputation is a powerful tool in the fight against hate speech. If a user has a history of racism, you can prioritize reviewing — and removing — their posts above others.

The way we approach this with Community Sift is to apply a series of lenses to the reported content — internally, we call this ‘classification.’ We assess the content on a sliding scale of risk, note the frequency of user-submitted reports, the context of the message (public vs. large group vs. small group vs. 1:1), and the speaker’s reputation. Note that at this point in the process we have done nothing other than label the data. Now it is time to do something with it.
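Before we do, here is a minimal sketch of what “label everything” might look like as data. The field names, scales, and example values are illustrative assumptions; the point is that nothing has been removed or hidden yet.

```python
# Hedged sketch of Step 1: attach labels to a reported message without acting on it.
# Field names, scales, and the example values are illustrative assumptions.

from dataclasses import dataclass
from enum import IntEnum

class Scope(IntEnum):
    ONE_TO_ONE = 1
    SMALL_GROUP = 2
    LARGE_GROUP = 3
    PUBLIC = 4        # the wider the audience, the wider the potential harm

@dataclass
class LabeledReport:
    text: str
    content_risk: float       # sliding 0..1 scale from the text classifier
    report_count: int         # how many users have flagged this message
    scope: Scope              # conversational context of the message
    sender_reputation: float  # 0 = long history of abuse, 1 = clean record

report = LabeledReport(
    text="<reported message>",
    content_risk=0.82,
    report_count=4,
    scope=Scope.PUBLIC,
    sender_reputation=0.15,
)
# At this point we have only labeled the data; Steps 2 and 3 decide what to do with it.
```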

Step 2 — Take automatic action.

 

After we label the data, we can place it into three distinct ‘buckets.’ The vast majority (around 95%) will fall under ‘obviously good’, since social media predominantly consists of pictures of kittens, food, and reposted jokes. Just like there is the ‘obviously good,’ however, there is also the ‘obviously bad’.

In this case, think of the system like anti-virus technology. Every day, people are creating new ways to mess up your computer. Cybersecurity companies dedicate their time to finding the latest malware signatures so that when one reaches you, it is automatically removed. Similarly, we use AI to find new social signatures by processing billions of messages across the globe for our human professionals to review. That manual review is critical to reducing false positives: just as with antivirus technology, you do not want to delete innocuous content from people’s computers, a kind of high-profile mistake the antivirus industry has made more than once.

So what is considered ‘obviously bad’? That depends on the purpose of the site. Most sites already have a ‘terms of use’ or ‘community guidelines’ page that defines what the group is for and the rules in place to achieve that goal. When users break those rules, our clients can configure the system to take immediate action against the offending user, such as warning, muting, or banning them. The more we can automate meaningfully here, the better. When seconds matter, speed is of the essence.
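Here is a rough sketch of that triage. The thresholds are invented for illustration; every product would tune its own against its terms of use.

```python
# Hedged sketch of the 'obviously good / obviously bad / grey area' triage.
# The thresholds are invented; every community tunes its own.

def triage(content_risk: float, report_count: int, sender_reputation: float) -> str:
    if content_risk >= 0.9 and sender_reputation < 0.3:
        return "auto_remove"   # 'obviously bad': act immediately (warn, mute, ban)
    if content_risk <= 0.1 and report_count < 3:
        return "publish"       # 'obviously good': the ~95% of harmless traffic
    return "human_review"      # the grey area, handled in Step 3

print(triage(content_risk=0.95, report_count=1, sender_reputation=0.10))  # auto_remove
print(triage(content_risk=0.05, report_count=0, sender_reputation=0.80))  # publish
print(triage(content_risk=0.55, report_count=6, sender_reputation=0.40))  # human_review
```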

Now that we have labeled almost everything as either ‘obviously good’ or ‘obviously bad,’ we can prioritize which of the remaining messages to address first.

Step 3 — Create prioritized queues for human action.

Computers are great at finding the good and the bad, but what about all the stuff in the middle? Currently, the best practice is to crowdsource judgment by allowing your users to report content. Human moderation of some kind is key to maintaining and training a quality workflow to eliminate hate speech. The challenge is rising above the noise of bad reports and dealing with the truly urgent items right now.

Remember the Stephen Covey model of time management? Instead of relying on a single chronologically sorted list of hate speech reports, we want to provide humans with a streamlined list of items they can action quickly, with the most important items at the top.

A simple technique is to keep two lists. The first holds all the noise of user-reported content. We see that about 80–95% of those reports are junk (one user likes dogs, so they report the person who likes cats). Since we labeled the data in step 1, we already know a fair bit about each report: the severity of the content, the intensity of the context, and the person’s reputation. If the community thinks the content violates the terms of use and our label says it is likely bad, chances are it is bad. Alternatively, if the label says it is fine, we can wait until more people report it, thus reducing the noise.

The second list focuses on high-risk, time-sensitive content. These are rare events, so this work queue is kept minuscule. Content enters when the system thinks it is high-risk, but cannot be sure; or, when users report content that is right on the border of triggering the conditions necessary for a rating of ‘obviously bad.’ The result is a prioritized queue that humans can stay on top of and remove content from in minutes instead of days.
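A hedged sketch of the two-queue idea follows, with illustrative weights rather than production values.

```python
# Hypothetical sketch of the two moderation queues: a small, urgent queue that
# humans clear in minutes, and a noisy backlog sorted by a priority score.

from typing import TypedDict

class Report(TypedDict):
    text: str
    content_risk: float       # 0..1 from the classifier
    report_count: int         # how many users flagged it
    audience_size: int        # 2 for a 1:1 chat, thousands for a public room
    sender_reputation: float  # 0 = repeat offender, 1 = trusted

def priority(r: Report) -> float:
    """Higher means review sooner. Weights are illustrative, not tuned values."""
    return (r["content_risk"] * 3.0
            + min(r["report_count"], 10) * 0.2
            + min(r["audience_size"], 1000) / 1000.0
            + (1.0 - r["sender_reputation"]))

def build_queues(reports: list[Report]) -> tuple[list[Report], list[Report]]:
    urgent = [r for r in reports if r["content_risk"] >= 0.7]   # kept deliberately tiny
    backlog = [r for r in reports if r["content_risk"] < 0.7]   # the noisy user reports
    return (sorted(urgent, key=priority, reverse=True),
            sorted(backlog, key=priority, reverse=True))

example = [
    Report(text="you idiot", content_risk=0.85, report_count=2,
           audience_size=200, sender_reputation=0.2),
    Report(text="cats are better than dogs", content_risk=0.05, report_count=6,
           audience_size=50, sender_reputation=0.9),
]
urgent_queue, backlog_queue = build_queues(example)
```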

In our case, we devote millions of dollars a year to continual refinement and improvement with human professionals, so product owners don’t have to. We take care of all that complexity so product owners can get back to the fun stuff instead — like making more amazing social products.

Step 4 — Take human action.

Product owners could use crowdsourced, outsourced, or internal moderation to handle these queues, though this depends on the scale and available resources within the team. The important thing is to take action as fast as humanly possible, starting with the questionable content that the computers cannot catch.

Step 5 — Train artificial intelligence based on decisions.

To manage the volume of reported content for a platform like Facebook or Twitter, you need to employ some level of artificial intelligence. By setting up the moderation AI to learn from human decisions, the system becomes increasingly effective at automatically detecting and taking action against emerging issues. The more precise the automation, the faster the response.
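As a simple, hedged illustration of that feedback loop (the feature layout and choice of model are assumptions for this example, not our production blend of expert system and machine learning):

```python
# Hypothetical human-in-the-loop retraining step: moderator decisions become
# training labels so the system handles more of the grey area automatically.

from sklearn.linear_model import LogisticRegression

# Each row: [content_risk, report_count, sender_reputation]; label 1 = removed.
features = [
    [0.91, 5, 0.10],
    [0.42, 2, 0.70],
    [0.88, 7, 0.20],
    [0.12, 1, 0.95],
]
moderator_decisions = [1, 0, 1, 0]  # what the human team actually did

model = LogisticRegression()
model.fit(features, moderator_decisions)

# New grey-area items can now be scored; low-confidence items stay with humans,
# and every fresh human decision is appended to the training set.
print(model.predict_proba([[0.85, 3, 0.25]])[0][1])  # estimated removal probability
```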

After five years of dedicated research in this field, we’ve learned a few tricks.

Machine learning AI is a powerful tool. But when it comes to processing language, it’s far more efficient to use a combination of a well-trained human team working alongside an expert system AI.

By applying the methodology above, it is now within our grasp to remove hate speech from social platforms almost instantly. Prejudice is an issue that affects everyone, and in an increasingly connected global world, it affects everyone in real-time. We have to get this right.

Since Facebook, YouTube, Twitter and Microsoft signed the EU hate speech code back in 2016, more and more product owners have taken up the fight and are looking for ways to combat intolerance in their communities. With this latest announcement by the German government — and the prospect of substantial fines in the future — we wanted to go public with our insights in hopes that someone sees something he or she could apply to a platform right now. In truth, 24 hours just isn’t fast enough, given the damage that racism, threats, and harassment can cause. Luckily, there are ways to prevent hate speech from ever reaching the community.

At Community Sift and Two Hat Security, we have a dream — that all social products have the tools at their disposal to protect their communities. The hardest problems on the internet are the most important to solve. Whether it’s hate speech, child exploitation, or rape threats, we cannot tolerate dangerous or illegal content in our communities.

If we work together, we have a real shot at making the online world a better place. And that’s never been more urgent than it is today.

Freedom of Speech ≠ Freedom From Accountability

We believe freedom of speech can be a positive force, especially when used with a level of care and respect for others. Realistically, we don’t live in a world where people will always be sweet and happy like Teletubbies. People are not always going to be kind to each other, and everyone is bound to have a bad day…

Here’s where the true challenge comes in for product owners — what do you do to protect free speech while also protecting your community from hate speech and online harassment? Do you allow users to threaten to rape other users in the name of freedom of expression? How will you discern the difference between someone having a bad day versus repeat offenders in need of correction?

Context is everything

The knee-jerk reaction might be to implement an Orwellian censorship strategy. In some cases, this may be the correct approach. Strict filtration is the right strategy for a child-directed product, where there are topics that are never acceptable from a legal perspective. However, filtering out certain words or phrases may not be the solution for a 17+ gaming community or a social media platform for adults. The context of a conversation between two strangers is much different from a conversation between a group of old friends, or a public chatroom where many voices are ‘shouting’ at each other.
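One way to picture this is as a per-community policy applied to the same labeled message. The communities, topic names, and risk levels below are invented for illustration.

```python
# Hedged sketch: the same labeled message is acceptable in one community and
# not in another. Community names, topics, and the 0-7 risk scale are invented.

COMMUNITY_POLICIES = {
    "kids_game":     {"vulgarity": 0, "bullying": 0, "hate_speech": 0},
    "teen_social":   {"vulgarity": 2, "bullying": 1, "hate_speech": 0},
    "adult_17_plus": {"vulgarity": 5, "bullying": 2, "hate_speech": 0},
}

def permitted(message_labels: dict[str, int], community: str) -> bool:
    """message_labels maps topic -> risk level (0 = none .. 7 = severe)."""
    policy = COMMUNITY_POLICIES[community]
    return all(risk <= policy.get(topic, 0) for topic, risk in message_labels.items())

labels = {"vulgarity": 3}  # mild swearing, not directed at anyone
print(permitted(labels, "kids_game"))      # False: filtered for children
print(permitted(labels, "adult_17_plus"))  # True: fine for a 17+ community
```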

Every community has different rules of engagement — each company has a philosophy about what they deem to be an appropriate conversation within their social product. What Flickr considers acceptable will differ significantly from what’s socially accepted on 4chan, or from within a professional Slack channel. Every product is different and unique, and that is one of the challenges we have in providing a protection service to such a wide variety of social apps and games.

Each product has a set of goals and guidelines that govern what they believe is acceptable or unacceptable within their community. Similarly, online collectives tend to have expectations about what they think is appropriate or inappropriate behaviour within their tribe. A moderator or community manager should act as a facilitator, reconciling any differences of expectation between the product owners and the community.

Respect each other as humans

With anonymity, it is much easier to divorce oneself from the reality that there’s a real human on the receiving end of cruel comments or so-called rape ‘jokes’.

Writer Lindy West shared part of her story about confronting a ‘troll’ who had been harassing her online in an excellent episode of “This American Life”. The writer and the man engage in a civil conversation, acknowledging the tension directly and eventually coming to something of an understanding of each other.

People forget that the victims of these ‘trolls’ are real people, but they also forget that ‘trolls’ are real people, too. As Lindy West describes, “empathy, boldness, and kindness” are some practical ways to bridge differences between two humans. There is a substantial difference between a virus and a corrupted file, just as there is a difference between a real troll and someone who’s having a bad day. With respect comes an opportunity to see each other as human beings rather than avatars on the other side of a screen.

Freedom of speech does not equal freedom from accountability

Some have described the internet as a haven for freedom of expression, where there is less pressure to be “politically correct”. While this may be partially true, there is still an inherent accountability that comes with our freedom. When someone chooses to exploit their freedom to publish hate speech, he or she will likely face some natural consequences, like the effect on his or her personal reputation (or in some extreme cases, legal repercussions).

Freedom of speech is not always sweet. It can be ugly without crossing the line into toxic behavior. It can also be amazing and transformative. The democratization of thought enabled by modern social platforms has had a profound effect on society, empowering millions to share and exchange ideas and information.

One of our goals with Community Sift is to create safety without censorship, empowering product owners to preserve user freedom while also protecting their social apps and games. There are so many issues that plague online communities, including spam, radicalization, and illegal content. Businesses work with us because we use a combination of machine learning, artificial intelligence, and human community professionals to protect their products and services.

Moreover, while we respect the need for freedom of speech, we cannot support any activity that results in someone taking their own life. That is why we do what we do. If we can protect a single life through automated escalations and improved call-for-help workflows, we will have made the world a better place. While this may sound overly altruistic, we believe this is a challenge that is worth tackling head-on, regardless of the perspective about “freedom of speech.”

 

Originally published on Medium

Photo by Cory Doctorow. Source: Flickr