Tech Perspectives: Surpassing 100 billion online interactions in a month

In 2020, social platforms that wish to expand their product and scale their efforts are faced with a critical decision — how will they automate the crucial task of content moderation? As platforms grow from hundreds to thousands to millions of users, that growth means more usernames, more live chat, and more comments, all of which require some form of moderation. From app store requirements to legal compliance with global legislation, ensuring that all user-generated content is aligned with community guidelines is nothing short of an existential matter.

When it comes to making a technical choice for a content moderation platform, what I hear in consultations and demos can be distilled down to this: engineers want a solution that’s simple to integrate and maintain, and that can scale as their product scales. They are also looking for a solution that’s battle-tested and allows for easy troubleshooting — and that won’t keep them up at night with downtime issues!

“Processing 100 billion online interactions in one month is technically hard to achieve. That is not simply taking a message and passing it on to users, but doing deep textual analysis against over 3 million patterns of harmful things people can say online. It includes building user reputation and knowing whether the word on the line above, combined with this line, is also bad. Just trying to maintain user reputation for that many people is a very large technical challenge. And to do it all in 20 milliseconds per message is incredible,” says Chris Priebe, Two Hat’s CEO and Founder.
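To make the scale of that challenge concrete, here is a deliberately simplified sketch of the kind of per-message pipeline the quote describes: matching against a large pattern set, combining the current line with the previous one, and updating a per-user reputation score, all within a tight latency budget. The pattern list, data structures, and scoring below are illustrative assumptions, not a description of Two Hat’s actual system.

```python
# Illustrative sketch only: toy patterns and scoring, not Community Sift's implementation.
import re
import time

HARMFUL_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"\bkill yourself\b", r"\byou suck\b")]
LATENCY_BUDGET_MS = 20            # the per-message budget mentioned in the quote

reputation = {}                   # user_id -> rolling risk score
last_line = {}                    # user_id -> previous message, for cross-line checks

def classify(user_id, message):
    """Return True if the message, alone or joined with the user's previous line, matches a pattern."""
    start = time.perf_counter()
    joined = (last_line.get(user_id, "") + " " + message).strip()
    hit = any(p.search(joined) for p in HARMFUL_PATTERNS)
    # Reputation drifts up on hits and decays slowly on clean messages.
    reputation[user_id] = max(0.0, reputation.get(user_id, 0.0) + (1.0 if hit else -0.1))
    last_line[user_id] = message
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms > LATENCY_BUDGET_MS:
        print(f"over budget: {elapsed_ms:.1f} ms")   # a real system would alert on this
    return hit
```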

Surpassing 100 Billion Online Interactions in a Month
I caught up with Laurence Brockman, Two Hat’s Vice President of Core Services, and Manisha Eleperuma, our Manager of Development Operations, just as we surpassed the mark of 100 billion human interactions processed in one month.

I asked them about what developers value in a content moderation platform, the benefits of an API-based service, and the technical challenges and joys of safeguarding hundreds of millions of users globally.

Carlos Figueiredo: Laurence, 100 billion online interactions processed in one month. Wow! Can you tell us about what that means to you and the team, and the journey to getting to that landmark?

“At the core, it’s meant we were able to keep people safe online and let our customers focus on their products and communities. We were there for each of our customers when they needed us most”.

Laurence Brockman: The hardest part for our team was the pace of getting to 100 billion. We tripled the volume in three months! When trying to scale and process that much data in such a short period, you can’t cut any corners. And you know what? I’m pleased to say that it’s been business as usual – even with this immense spike in volume. We took preventative measures along the way and focused on key areas to ensure we could scale. Don’t get me wrong, there were a few late nights and a week of frantic refactoring, but our team and our solution delivered. I’m very proud of the team and how they dug in, identified potential problem areas, and jumped right in. At 100 billion, minor problems can become major problems, and our priority is to ensure our system is ready to handle those volumes.

“What I find crazy is our system is now processing over 3 billion events every day! That’s six times the volume of Twitter”.

CF: Manisha, what are the biggest challenges and joys of running a service that safeguards hundreds of millions of users globally?

Manisha Eleperuma: I would start off with the joys. I personally feel really proud to be a part of making the internet a safer place. The positive effect that we can have on an individual’s life is immense. We could be stopping a kid from harming themselves, we could be saving them from a predator, we could be stopping a friendly conversation from turning into a cold battle of hate speech. This is possible because of the safety net that our services provide to online communities. Also, it is very exciting to have some of the technology giants and leaders in the entertainment industry using our services to safeguard their communities.

It is not always easy to provide such top-notch service, and it definitely has its own challenges. We as an engineering group are maintaining a massive, complex system and keeping it up and running with almost zero downtime. We are equipped with monitoring tools to check the system’s health, and engineers have to be vigilant for alerts triggered by these tools and promptly act upon any anomalies in the system, even during non-business hours. A few months ago, when the pandemic was starting to affect the world, the team could foresee an increase in transactions that would soon start hitting our system.

“This allowed the team to get ahead of the curve and pre-scale some of the infrastructure components to be ready for the new wave so that when traffic increases, it hits smoothly without bringing down the systems”. 

Another strenuous exercise that the team often goes through is to maintain the language quality of the system. Incorporating language-specific characteristics into the algorithms is challenging, but exciting to deal with. 

CF: Manisha, what are the benefits of using an API-based service? What do developers value the most in a content moderation platform?

ME: In our context, when Two Hat’s Community Sift is acting as a classification tool for a customer, all transactions happen via customer APIs. Depending on the customer’s requirements, each API can access different components of our platform without much hassle. For example, certain customers rely on getting the player/user context, their reputation, etc. The APIs they use to communicate with our services are easily configurable to fetch all that information from the internal context system, without extra implementation on the customer’s end.

This API approach has accelerated the integration process as well. We recently had a customer who integrated with our APIs and went live successfully within a 24-hour period.
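As an illustration of what that server-to-server flow can look like, here is a minimal sketch of a classification call that passes a user identifier so the service can attach context and reputation. The endpoint URL, field names, and response shape are placeholders invented for this example, not Community Sift’s actual API.

```python
# Hypothetical example: placeholder endpoint and fields, not the real Community Sift API.
import requests

def classify_message(api_key: str, user_id: str, text: str) -> dict:
    response = requests.post(
        "https://moderation.example.com/v1/classify",   # placeholder URL
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "user_id": user_id,            # lets the service attach player context and reputation
            "text": text,
            "include_user_context": True,
        },
        timeout=2,
    )
    response.raise_for_status()
    return response.json()                 # e.g. {"risk": 7, "topics": ["bullying"], ...}
```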

Customers expect reliability and usability in moderation platforms. When a moderator goes through content in a Community Sift queue, we have equipped them with all the necessary data, including player/user information with the context of the conversation, history, and the reputation of the player, which eases decision-making. This is how we support their human moderation efforts. Further, we are happy to say that Two Hat has expanded the paradigm to another level of automated moderation, using AI models that make decisions on behalf of human moderators after learning from their consistent decisions, which lowers moderation costs for customers.

CF: Laurence, many of our clients prefer to use our services via a server to server communication, instead of self-hosting a moderation solution. Why is that? What are the benefits of using a service like ours?

LB: As any SaaS company will tell you, our systems scale to meet demand without our customers’ engineers having to worry about it. It also means that as we release new features and functions, our customers don’t have to worry about expensive upgrades or deployments. While all this growth was going on, we also delivered more than 40 new subversion detection capabilities in our core text-classification product.

Would you like to see our content moderation platform in action? Request a demo today.

Two Hat Named One of 2019 “Ready to Rocket” Growth Companies in British Columbia’s Technology Sectors

List Profiles B.C.’s Tech Companies Best-Positioned to Capitalize on Current Sector Trends

VANCOUVER, B.C. (March 20, 2019) – Rocket Builders announced its seventeenth (17th) annual “Ready to Rocket” lists naming leading automated content moderation company Two Hat as one of the “Ready to Rocket” companies in the Information and Communication Technology category. The list profiles British Columbia technology companies that are best positioned to capitalize on the technology sector trends that will lead them to faster growth than their peers. Two Hat was highlighted for their leading Community Sift chat filter.

The annual 2019 “Ready to Rocket” lists provide accurate predictions of private companies that will likely experience significant growth, venture capital investment or acquisition by a major player in the coming year. Two Hat is listed among 85 companies across this year’s list of companies in the Information and Communication Technology category.

“We’ve experienced incredible growth over the last year, and we expect it to only get better in 2019,” said Chris Priebe, Two Hat CEO and founder. “We’ve been working with the biggest gaming companies in the world for several years now. But last year social platforms went through a major paradigm shift, which opened content moderation solutions like ours to break into new and emerging industries like edtech, fintech, travel and hospitality, and more.”

Two Hat is the creator of Community Sift, a powerful risk-based chat filter and content moderation software that protects online communities, brands, and bottom lines. Community Sift is the industry leader in high-risk content detection and moderation, protecting some of the biggest online games, virtual worlds, and social products on the internet. With the number of child pornography incidents in Canada on the rise, Two Hat collaborated with Canadian law enforcement and leading academic partners to train a groundbreaking new AI model, CEASE.ai, to detect and remove child sexual abuse material (CSAM) for investigators and social platforms.

“Over the 17 years of the program, the B.C. technology sector has steadily grown each year, and presents a growing challenge to select and identify the most likely to succeed for our Ready to Rocket lists,” said Geoffrey Hansen, Managing Partner at Rocket Builders.

“In recent years, a startup economy has blossomed, yielding a rich field of companies for our consideration, with over 450 companies reviewed to make our selections of 203 winners. Our Emerging Rocket lists enable us to profile those earlier-stage companies that are well positioned for investment.”

The average growth rate on the list was over 40 percent, with 32 companies exceeding double-digit growth and six companies exceeding 100 percent growth.

Two Hat has been named a “Ready to Rocket” company for four consecutive years. This year’s award follows Two Hat’s recent acquisition of ImageVision, an image recognition and visual search company, and the launch of CEASE.ai.

About Two Hat
Founded in 2012, Two Hat is an AI-based technology company that empowers gaming and social platforms to grow and protect their online communities. With their flagship product Community Sift, an enterprise-level content filter and automated chat, image, and video moderation tool, online communities can proactively filter abuse, harassment, hate speech, adult content, and other disruptive behavior.

About Rocket Builders
Rocket Builders produces the “Ready to Rocket” list, which profiles information technology companies with the greatest potential for revenue growth in the coming year. The lists are predictive of future success, making them unique in approach and in value for our business audience. The “Ready to Rocket” lists are the only predictive lists of their kind in North America, requiring many months of sector and company analysis. The 2019 list features 85 “Ready to Rocket” technology growth companies and 118 “Emerging Rocket” early-stage startups.

Contact
GreenSmith PR
Mike Smith, 703.623.3834
mike@greensmithpr.com

The Changing Landscape of Automated Content Moderation in 2019

Is 2019 the year that content moderation goes mainstream? We think so.

Things have changed a lot since 1990 when Tim Berners-Lee invented the World Wide Web. A few short years later, the world started to surf the information highway – and we’ve barely stopped to catch our collective breath since.

Learn about the past, present, and future of online content moderation in an upcoming webinar

The internet has given us many wonderful things over the last 30 years – access to all of recorded history, an instant global connection that bypasses country, religious, and racial lines, Grumpy Cat – but it’s also had unprecedented and largely unexpected consequences.

Rampant online harassment, an alarming rise in child sexual abuse imagery, urgent user reports that go unheard – it’s all adding up. Now that well over half of Earth’s population is online (4 billion people as of January 2018), we’re finally starting to see an appetite to clean up the internet and create safe spaces for all users.

The change started two years ago.

Mark Zuckerberg’s 2017 manifesto hinted at what was to come:

“There are billions of posts, comments, and messages across our services each day, and since it’s impossible to review all of them, we review content once it is reported to us. There have been terribly tragic events — like suicides, some live streamed — that perhaps could have been prevented if someone had realized what was happening and reported them sooner. There are cases of bullying and harassment every day, that our team must be alerted to before we can help out. These stories show we must find a way to do more.”

In 2018, the industry finally realized that it was time to find solutions to the problems outlined in Facebook’s manifesto. The question was no longer, “Should we moderate content on our platforms?” and instead became, “How can we better moderate content on our platforms?”

Learn how you can leverage the latest advances in content moderation in an upcoming webinar

The good news is that in 2019, we have access to the tools, technology, and years of best practices to make the dream of a safer internet a reality. At Two Hat, we’ve been working behind the scenes for nearly seven years now (alongside some of the biggest games and social networks in the industry) to create technology to auto-moderate content so accurately that we’re on the path to “invisible AI” – filters that are so good you don’t even know they’re in the background.

On February 20th, we invite you to join us for a very special webinar, “Invisible AI: The Future of Content Moderation”. Two Hat CEO and founder Chris Priebe will share his groundbreaking vision of artificial intelligence in this new age of chat, image, and video moderation.

In it, he’ll discuss the past, present, and future of content moderation, expanding on why the industry shifted its attitude towards moderation in 2018, with a special focus on the trends of 2019.

He’ll also share exclusive, advance details about:

We hope you can make it. Give us 30 minutes of your time, and we’ll give you all the information you need to make 2019 the year of content moderation.

PS: Another reason you don’t want to miss this – the first 25 attendees will receive a free gift! ; )


Read about Two Hat’s big announcements:

Two Hat Is Changing the Landscape of Content Moderation With New Image Recognition Technology

Two Hat Leads the Charge in the Fight Against Child Sexual Abuse Images on the Internet

Two Hat Releases New Artificial Intelligence to Moderate and Triage User-Generated Reports in Real Time

 

The Future of Image Moderation: Why We’re Creating Invisible AI (Part Two)

Yesterday, we announced that Two Hat has acquired image moderation service ImageVision. With the addition of ImageVision’s technology to our existing image recognition tech stack, we’ve boosted our filter accuracy — and are determined to push image moderation to the next level.

Today, Two Hat CEO and founder Chris Priebe discusses why ImageVision was the ideal choice for a technology acquisition — and how he hopes to change the landscape of image moderation in 2019.

We were approached by ImageVision over a year ago. Their founder Steven White has a powerful story that led him to found the company (it’s his to tell so I won’t share). His story resonated with me and my own journey of why I founded Two Hat. He spent over 10 years perfecting his art. His clients included Facebook, Yahoo, Flickr, and Apple. That is 10 years of experience and over $10 million in investment to solve the problem of accurately detecting pornographic images.

Of course, 10 years ago we all did things differently. Neural networks weren’t popular yet. Back then, you would look at how much skin tone was in an image. You looked at angles and curves and how they related to each other. ImageVision built 185 of these hand-coded features.

Later, they moved on to neural networks, but ImageVision did something amazing. They took their manually coded features and fed both them and the pixels into the neural network. And they got a result different from what everyone else was doing at the time.

Now here is the reality — there is no way I’m going to hire people to write nearly 200 manually coded features in this modern age. And yet the problem of child sexual abuse imagery is so important that we need to throw every resource we can at it. It’s not good enough to only prevent 90% of exploitation — we need all the resources we can get.

Like describing an elephant

So we did a study. We asked, “What would happen if we took several image detectors and mixed them together? Would they give a better answer than any alone?”

It’s like the story of several blind men describing an elephant. One describes a tail, another a trunk, another a leg. They each think they know what an elephant looks like, but until they start listening to each other they’ll never actually “see” the real elephant. Likewise in AI, some systems are good at finding one kind of problem and another at another problem. What if we trained another model (called an ensemble) to figure out when each of them is right?

For our study, we took 30,000 pornographic images and 55,000 clean images. We used ImageVision images since they are full of really hard ones to catch: the kind of images you might actually see in real life, not just in a lab experiment. The big cloud providers found between 89% and 98% of the 30,000 pornographic images, while precision was around 95-98% for all of them (precision is the proportion of positive identifications that are correct).
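For readers who want those two metrics pinned down in code, here is a small sketch of how the detection rate (recall) and precision are computed from a labelled test set like the one described above.

```python
def recall_and_precision(true_labels, predicted_labels):
    """Labels: 1 = pornographic, 0 = clean."""
    tp = sum(1 for t, p in zip(true_labels, predicted_labels) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(true_labels, predicted_labels) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(true_labels, predicted_labels) if t == 1 and p == 0)
    recall = tp / (tp + fn)        # "found X% of the pornographic images"
    precision = tp / (tp + fp)     # proportion of positive identifications that are correct
    return recall, precision
```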

We were excited that our current system found most of the images, but we wanted to do better.

For the CEASE.ai project, we had to create a bunch of weak learners to find CSAM. Detecting CSAM is such a huge problem that we needed to throw everything we could at it. So we ensembled the weak learners all together to see what would happen — and we got another 1% of accuracy, which is huge because the gap from 97% to 100% is the hardest to close.

But how do you close the last 2%? This is where millions of dollars and decades of experience are critical. This is where we must acquire and merge every trick in the book. When we took ImageVision’s work and merged it with our own, we squeezed out another 1%. And that’s why we bought them.
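As a rough illustration of the ensembling idea, here is a minimal stacking sketch: each detector scores an image, and a small meta-model (the “arbiter”) learns how much to trust each one. The detectors, data, and choice of logistic regression are assumptions made for the example, not a description of the production system.

```python
# Minimal stacking sketch with placeholder data and an assumed meta-model choice.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_arbiter(detector_scores: np.ndarray, labels: np.ndarray) -> LogisticRegression:
    """detector_scores: (n_images, n_detectors) scores in [0, 1]; labels: 1 = NSFW, 0 = clean."""
    meta = LogisticRegression()
    meta.fit(detector_scores, labels)
    return meta

def ensemble_score(meta: LogisticRegression, scores_for_one_image: np.ndarray) -> float:
    """Blend the individual detectors' opinions into a single probability."""
    return float(meta.predict_proba(scores_for_one_image.reshape(1, -1))[0, 1])
```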

We’re working on a white paper where we’ll present our findings in further detail. Stay tuned for that soon.

The final result

So if we bought ImageVision, not only would we gain 10 years of experience, multiple patents, and over $10 million in technology, but we would also have the best NSFW detector in the industry. And if we added that into our CSAM detector (along with age detection, face detection, body part detection, and abuse detection), then we could push that accuracy even higher and hopefully save more kids from the horrors of abuse. Spending money to solve this problem was a no-brainer for us.

Today, we’re on the path to making AI invisible.


Learn more about Priebe’s groundbreaking vision of artificial intelligence in an on-demand webinar. He shares more details about the acquisition, CEASE.ai, and the content moderation trends that will dominate 2019. Register to watch the webinar here.

Further reading:

Part One of The Future of Image Moderation: Why We’re Creating Invisible AI
Official ImageVision acquisition announcement.

The Future of Image Moderation: Why We’re Creating Invisible AI (Part One)

In December and early January, we teased exciting Two Hat news coming your way in the new year. Today, we’re pleased to share our first announcement of 2019 — we have officially acquired ImageVision, an image recognition and visual search company. With the addition of ImageVision’s groundbreaking technology, we are now poised to provide the most accurate NSFW image moderation service in the industry.

We asked Two Hat CEO and founder Chris Priebe to discuss the ambitious technology goals that led to the acquisition. Here is part one of that discussion:

The future of AI is all about quality. Right now, the study of images is still young. Anyone can download TensorFlow or PyTorch, feed it a few thousand images, and get a model that gets things right 80-90% of the time. People are excited about that because it seems magical – “They fed a bunch of images into a box and it gave an answer that was surprisingly right most of the time!” But even if you get 90% right, you are still getting 10% wrong.

Think of it this way: If you process 10 million images a day, that is a million mistakes. A million times someone tried to upload a picture that was innocent and meaningful to them, and they had to wait for a human to review it. That is one million images humans need to review. We call those false positives.
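Here is that arithmetic spelled out, with an assumed average review time added to show the operational cost; the eight-second figure is an illustration only, not a number from the article.

```python
daily_images = 10_000_000
error_rate = 0.10                                   # a model that is "right 90% of the time"
mistakes_per_day = int(daily_images * error_rate)   # 1,000,000 images flagged incorrectly

seconds_per_review = 8                              # assumed average human review time
review_hours = mistakes_per_day * seconds_per_review / 3600
print(mistakes_per_day, "false positives per day")
print(round(review_hours), "person-hours of review per day")   # ~2222 hours
```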

Worse than false positives are false negatives, where someone uploads an NSFW (not safe for work) picture or video and it isn’t detected. Hopefully, it was a mature adult who saw it. Even if it was an adult, they weren’t expecting to see adult content, so their trust in the site is in jeopardy. They’re probably less likely to encourage a friend to join them on the site or app.

Worse if it was a child who saw it. Worst of all if it was a graphic depiction of a child being abused.

Protecting children is the goal

That last point is closest to our heart. A few years ago we realized that what really keeps our clients awake at night is the possibility someone will upload child sexual abuse material (CSAM; also known as child exploitive imagery, or CEI, and formerly called child pornography) to their platform. We began a long journey to solve that problem. It began with a hackathon where we gathered some of the largest social networks in the world with international law enforcement and academia all in the same room and attempted to build a solution together.

So AI must mature. We need to get beyond a magical box that’s “good enough” and push it until AI becomes invisible. What do I mean by invisible? For us, that means you don’t even notice that there is a filter because it gets it right every time.

Today, everyone is basically doing the same thing, like what I described earlier — label some NSFW images and throw them at the black box. Some of us are opening up the black box and changing the network design to hotrod the engine, but for the most part it’s a world of “good enough”.

Invisible AI

But in the future, “good enough” will no longer be tolerated. The bar of expectation will rise, and people will expect it to just work. From that, we expect companies to hyper-specialize. Models will be trained that do one thing really, really well. Instead of a single model that answers all questions, there will be groups of hyper-specialists with a final arbiter over them, deciding how to best blend all their opinions together to make AI invisible.

We want to be at the top of the list for those models. We want to be the best at detecting child abuse, bullying, sextortion, grooming, and racism. We are already top of the market in several of those fields and trusted by many of the largest games and social sharing platforms. But we can do more.

Solving the biggest problems on the internet

That’s why we’ve turned our attention to acquisitions. These problems are too big, too important to have a “not built here, not interested” attitude. If someone else has created a model that brings new experience to our answers, then we owe it to our future to embrace every advantage we can get.

Success for me means that one day my children will take for granted all the hard work we’re doing today. That our technology will be invisible.

In part two, Chris discusses why ImageVision was the ideal choice for a technology acquisition — and how he hopes to change the landscape of image moderation in 2019.

Sneak peek:

“It’s like the story of several blind men describing an elephant. One describes a tail, another a trunk, another a leg. They each think they know what an elephant looks like, but until they start listening to each other they’ll never actually “see” the real elephant. Likewise in AI, some systems are good at finding one kind of problem and another at another problem. Could we train another model (called an ensemble) to figure out when each of them is right?”

 

Read the official ImageVision acquisition announcement

What Is the Difference Between a Profanity Filter and a Content Moderation Tool?

Profanity filter, content moderation, automated moderation tool, oh my! Ever noticed that these terms are often used interchangeably in the industry? The thing is, the many subtle (and not so subtle) differences between them can affect your long-term growth plans, and leave you stuck in a lengthy service contract with a solution that doesn’t fit your community.

Selecting the right software for content moderation is an important step if you want to build a healthy, engaged online community. To make things easier for you, let’s explore the main points of confusion between profanity filters and automated moderation tools.

Profanity filters catch, well, profanity
Profanity filters are pretty straightforward. They work by using a fixed blacklist/whitelist to allow or deny certain words. They’re great at finding your typical four-letter words, especially when they’re spelled correctly. Be aware, though — the minute you implement a blacklist/whitelist, your users are likely to start using language subversions to get around the filter. Even a simple manipulation like adding punctuation in the middle of an offensive word can cause a profanity filter to misread it, allowing it to slip through the cracks.

Be prepared to work overtime adding words to your allow and deny list, based on community trends and new manipulations.

A typical example of escalating filter subversion.
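To make the problem concrete, here is a toy blacklist filter and the kinds of trivial manipulations that defeat it; the word list and messages are placeholders.

```python
# Toy example with a placeholder word list.
BLACKLIST = {"jerk", "badword"}

def naive_filter(message: str) -> bool:
    """Return True if the message should be blocked."""
    return any(token.lower().strip(".,!?") in BLACKLIST for token in message.split())

print(naive_filter("you jerk"))       # True:  the exact word is caught
print(naive_filter("you j.e.r.k"))    # False: punctuation inside the word slips through
print(naive_filter("you jerkk"))      # False: so does a creative misspelling
```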

Profanity filters can be set up fast
One benefit of profanity filters, at least at first glance? They’re easy to set up. Many profanity filters allow you to enter your credit card and integrate in just a few minutes, and they often offer freemium versions or free trials to boot.

While this is great news for pre-revenue platforms and one-person shows, trading accuracy for speed can come back to bite you in the end. If you’re in a growth mindset and expect your community to scale, it’s in your best interest to implement the most effective and scalable moderation tools at launch. Remember that service contract we mentioned earlier? This is where you don’t want to get stuck with the wrong software for your community.

So, what are your other options? Let’s take a look at content moderation tools.

Content moderation tools filter more than just profanity
Online communities are made up of real people, not avatars. That means they behave like real people and use language like real people. Disruptive behavior (what we used to call “toxicity”) comes in many forms, and it’s not always profanity.

Some users will post abusive content in other languages. Some will harass other community members in more subtle ways — urging them to harm themselves or even commit suicide, using racial slurs, engaging in bullying behavior without using profanity, or doxxing (sharing personal information without consent). Still others will manipulate language with l337 5p34k, ÙniÇode ÇharaÇters, or kreative mizzpellingzz.
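Catching those manipulations usually starts with a normalization pass before any matching happens. The sketch below shows the idea with a small leetspeak map and Unicode folding; real moderation pipelines go considerably further than this.

```python
# Simplified normalization sketch; a real pipeline handles far more substitutions.
import unicodedata

LEET_MAP = str.maketrans({"3": "e", "1": "i", "7": "t", "5": "s", "4": "a", "0": "o"})

def normalize(text: str) -> str:
    # Fold accented or stylized characters to their closest ASCII form.
    folded = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Map common digit-for-letter substitutions back, then lowercase.
    return folded.translate(LEET_MAP).lower()

print(normalize("l337 5p34k"))           # "leet speak"
print(normalize("ÙniÇode ÇharaÇters"))   # "unicode characters"
```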

Accuracy is key here — and a profanity filter that only finds four-letter words cannot provide that same level of fine-tuned detection.

A context-based moderation tool can even make a distinction between words that are perfectly innocent in one context… but whose meaning changes based on the conversation (“balls” or “sausage” are two very obvious examples).

What else should you look for?

1. Vertical Chat
Also known as “dictionary dancing”. Those same savvy users who leverage creative misspellings to bypass community guidelines will also use multiple lines of chat to get their message across:

Vertical chat in action.
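One way a moderation tool can handle vertical chat is to evaluate a short rolling window of each user’s recent lines in addition to the current line, as in the sketch below; check_text stands in for whatever classifier is actually in use, and the window size is arbitrary.

```python
# Sketch of a rolling window for "vertical chat"; window size and classifier are placeholders.
from collections import defaultdict, deque

WINDOW = 4                                        # how many recent lines to keep per user
history = defaultdict(lambda: deque(maxlen=WINDOW))

def check_with_vertical_context(user_id, line, check_text):
    """Flag the line if it is risky on its own, or when joined with the user's recent lines."""
    history[user_id].append(line)
    joined = " ".join(history[user_id])
    return check_text(line) or check_text(joined)
```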

2. Usernames
Most platforms allow users to create a unique username for their profile. But don’t assume that a simple profanity filter will detect and flag offensive language in usernames. Unlike other user-generated content like chat, messages, comments, and forum posts, usernames rarely consist of “natural” language. Instead, they’re made up of long strings of letters and numbers — “unnatural” language. Most profanity filters lack the complex technology to filter usernames accurately, but some moderation tools are designed to adapt to all kinds of different content.

3. Language & Culture
Can you think of many online communities where users only chat in English? Technology has brought people of different cultures, languages, and backgrounds together in ways that were unheard of in the past. If scaling into the global market is part of your business plan, choose a moderation tool that can support multiple languages. Accuracy and context are key here. Look for moderation software that supports languages built in-house by native speakers with a deep understanding of cultural and contextual nuances.

4. User Reputation
One final difference that we should call out here. Profanity filters treat everyone in the community the same. But anyone who has worked in online community management or moderation knows that human behavior is complex. Some users will never post a risky piece of content in their lifetime; some users will break your community guidelines occasionally; some will consistently post content that needs to be filtered.

Profanity filters apply the same settings to all of these users, while some content moderation tools will actually look at the user’s reputation over time, and apply a more permissive or restrictive filter based on behavior. Pretty sophisticated stuff.
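A sketch of what reputation-aware filtering can look like: the same message risk score leads to different actions depending on the sender’s track record. The thresholds and trust scale here are arbitrary, purely to illustrate the idea.

```python
# Illustrative policy only; thresholds and trust scale are arbitrary.
def apply_policy(risk_score: int, user_trust: float) -> str:
    """risk_score: classifier output for the message; user_trust: 0.0 (untrusted) to 1.0 (trusted)."""
    if user_trust >= 0.8:                       # long-standing, well-behaved users
        return "allow" if risk_score < 7 else "queue_for_review"
    if user_trust >= 0.4:                       # typical users
        return "allow" if risk_score < 5 else "queue_for_review"
    # Repeat offenders get a more restrictive filter.
    return "queue_for_review" if risk_score < 5 else "block"
```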

Content moderation tools can be adapted to fit your community
A “set it and forget it” approach might work for a static, unchanging community with no plans for growth. If that’s the case for you, a profanity filter might be your best option. But if you plan to scale up, adding new users while keeping your current userbase healthy, loyal, and engaged, a content moderation tool with a more robust feature set is a much better long-term option.

Luckily, in today’s world, most content moderation technology is just a simple RESTful API call away.

Not only that, content moderation tools allow you to moderate your community much more efficiently and effectively than a simple profanity filter. With automated workflows in place, you can escalate alarming content (suicide threats, child exploitation, extreme harassment) to queues for your team to review, as well as take automatic action on accounts that post disruptive content.
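Put together, the integration can be as small as a single request plus some routing logic, as in the hypothetical sketch below; the endpoint, field names, and topic labels are invented for illustration rather than taken from any specific vendor’s API.

```python
# Hypothetical REST call and workflow routing with placeholder endpoint, fields, and labels.
import requests

ESCALATION_TOPICS = {"suicide_threat", "child_exploitation", "extreme_harassment"}

def moderate_and_route(api_key: str, text: str) -> str:
    result = requests.post(
        "https://moderation.example.com/v1/classify",    # placeholder URL
        headers={"Authorization": f"Bearer {api_key}"},
        json={"text": text},
        timeout=2,
    ).json()

    if ESCALATION_TOPICS & set(result.get("topics", [])):
        return "escalation_queue"            # urgent human review
    if result.get("risk", 0) >= 7:
        return "standard_review_queue"
    return "publish"
```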

Selecting a moderation solution for your platform is no easy task. When it’s time to decide, we hope you’ll use the information outlined above to make the right choice for your online community.