Thinking of Building Your Own Chat Filter? Five Reasons You’re Wasting Your Time!
If you’re building an online community, whether a game or social network, flagging and dealing with abusive language and users is critical to success. Back in 2014, a Riot Games study suggested that users who experience abuse their first time in the game are potentially three times more likely to quit and never return.
“Chatting is a major step in our funnel towards creating engaged, paying users. And so, it’s really in Twitch’s best interests — and in the interest of most game dev companies and other social media companies — to make being social on our platform as pleasant and safe as possible.” – Ruth Toner, Twitch
But that raises the question: should you build it yourself or use an outside vendor? Like anti-virus software, a chat filter is better left to a team dedicated, day in and day out, to keeping it updated.
Here are a few things to consider before investing a great deal of time and expense in an in-house chat filter.
1. A blacklist/whitelist doesn’t work because language isn’t binary
Traditionally, most filters use a binary blacklist/whitelist. The thing is, language isn’t binary. It’s complex and nuanced.
For instance, in many gaming communities with older audiences, some swear words will be acceptable, depending on context. You could build a regex tool to string-match input text, and it would have no problem finding an f-bomb. But can it recognize the critical difference between “Go #$%^ yourself” and “That was #$%^ing awesome”?
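To make the point concrete, here is a minimal sketch of the kind of regex matcher described above. The word "frak" is a stand-in for a real swear word, and the one-entry blocklist is an assumption made purely for illustration.

```python
import re

# "frak" is a placeholder for a real swear word; the pattern is illustrative only.
BLOCKLIST = re.compile(r"\bfrak\w*\b", re.IGNORECASE)


def is_flagged(message: str) -> bool:
    """Return True if the message contains a blocklisted term."""
    return BLOCKLIST.search(message) is not None


hostile = "Go frak yourself"
friendly = "That was fraking awesome"

# The matcher sees the same root word in both messages, so it cannot
# tell an attack on another player from an enthusiastic compliment.
print(is_flagged(hostile), is_flagged(friendly))
```

A pure string match has no notion of who or what the word is aimed at, which is exactly the context a community with relaxed language rules needs.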
What if your players spell a word incorrectly? What if they use l337 5p34k (and they will)? What if they deliberately try to manipulate the filter?
It’s an endless arms race, and your users have way more time on their hands than you do.
Think about the hundreds of different variations of these phrases:
You should kill yourself / She deserves to die / He needs to drink bleach / etc
You are a [insert racial slur here]
Imagine the time and effort it would take to enter every single variation. Now add misspellings. Now add l337 mapping. Now add the latest slang. Now add the latest latest slang.
It never ends.
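A rough sketch of what "adding l337 mapping" looks like in practice follows. The substitution table, the placeholder blocked word, and the function names are all assumptions for illustration, and even this small example shows how quickly a trivial workaround defeats it.

```python
import re

# Hypothetical, deliberately tiny substitution table. Real evasion is far
# more varied, and some mappings are ambiguous ("1" can mean "l" or "i").
LEET_MAP = str.maketrans(
    {"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)

BLOCKED_WORDS = {"loser"}  # placeholder standing in for a real blocklist


def normalize(message: str) -> str:
    """Lowercase, undo simple l337 substitutions, collapse long letter runs."""
    lowered = message.lower().translate(LEET_MAP)
    # Collapse runs of three or more identical characters ("looooser" -> "loser").
    return re.sub(r"(.)\1{2,}", r"\1", lowered)


def is_flagged(message: str) -> bool:
    words = re.findall(r"[a-z]+", normalize(message))
    return any(word in BLOCKED_WORDS for word in words)


for attempt in ["l053r", "LOOOOSER", "l o s e r"]:
    print(attempt, is_flagged(attempt))
```

The first two attempts are caught, but simply spacing out the letters slips through, and every new rule added to close that gap risks flagging innocent messages.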
Now, imagine using a filter that has access to billions of lines of chat across dozens of different platforms. By using a third-party filter, you’ll benefit from the network effect, detecting words and phrases you would likely never find on your own.
2. Keep your team focused on building an awesome product — not chasing a few bad actors around the block
“When I think about being a game developer, it’s because we love creating this cool content and features. I wish we could take the time that we put into putting reporting [features] on console, and put that towards a match history system or a replay system instead. It was the exact same people that had to work on both who got re-routed to work on the other.” – Jeff Caplin, Blizzard Entertainment
Like anything else built in-house, someone has to maintain the filter as well as identify and resolve specific incidents. If your plan is to scale your community, maintaining your own filter will quickly become unmanageable. The dev and engineering teams will end up spending more time keeping the community safe than actually building the community and features.
Compare that with simply tapping into the RESTful API of a service provider that reliably uses AI and human review to keep abusive language definitions current and quickly process billions of reports per day. Imagine letting community managers identify and effectively deal with the few bad actors while the rest of your team relentlessly improves the community itself.
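For a sense of how little integration code a hosted filter can require, here is a hedged sketch of a client call. The endpoint URL, the payload fields, and the response shape are all hypothetical and do not describe any particular vendor's API. The network call is injected so the example runs offline with a stub.

```python
import io
import json
import urllib.request

# Hypothetical endpoint and payload shape, for illustration only.
API_URL = "https://moderation.example.com/v1/classify"


def build_request(text: str, user_id: str, api_key: str) -> urllib.request.Request:
    """Package one chat message as a JSON POST request."""
    payload = json.dumps({"text": text, "user_id": user_id}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def classify(text, user_id, api_key, send=urllib.request.urlopen):
    """Send the message for classification and return the decoded JSON reply."""
    request = build_request(text, user_id, api_key)
    with send(request, timeout=2) as response:
        return json.load(response)


def fake_send(request, timeout):
    """Offline stub that mimics a (hypothetical) service reply."""
    return io.BytesIO(b'{"allowed": false, "reason": "abusive"}')


result = classify("some chat line", "player-42", "demo-key", send=fake_send)
print(result["allowed"])
```

In this arrangement the game server only decides what to do with the verdict; the language rules themselves live with the provider.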
3. Moderation without triage means drowning in user reports
There is a lot more to moderation than just filtering abusive chat. Filtering — regardless of how strict or permissive your community may be — is only the first layer of defense against antisocial behavior.
You’ll also need a way for users to report abusive behavior, an algorithm that bubbles the worst reports to the top for faster review, an automated process for escalating especially dangerous (and potentially illegal) content for your moderation team to review, various workflows to accurately and progressively message, warn, mute, and sanction accounts and (hopefully) correct user behavior, a moderation tool with content queues for moderators to actually review UGC, a live chat viewer, an engine to generate business intelligence reports…
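As one small piece of that list, the "bubble the worst reports to the top" step can be sketched as a priority queue. The category names, severity weights, and scoring rule below are assumptions for illustration, not a recommended policy.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical severity weights; a real system would tune these carefully.
SEVERITY = {"spam": 1, "harassment": 5, "self_harm": 9, "threat": 10}


@dataclass(order=True)
class Report:
    priority: int                      # negated score, so the worst sorts first
    sequence: int                      # tie-breaker: older reports come first
    message: str = field(compare=False)
    category: str = field(compare=False)


class TriageQueue:
    def __init__(self):
        self._heap = []
        self._count = 0

    def add(self, message: str, category: str, reporter_count: int = 1) -> None:
        score = SEVERITY.get(category, 1) * reporter_count
        # heapq is a min-heap, so store the negated score to pop the highest first.
        heapq.heappush(self._heap, Report(-score, self._count, message, category))
        self._count += 1

    def next_report(self):
        """Return the most urgent report, or None when the queue is empty."""
        return heapq.heappop(self._heap) if self._heap else None


queue = TriageQueue()
queue.add("buy cheap gold", "spam", reporter_count=3)
queue.add("I know where you live", "threat")
queue.add("you are trash", "harassment")
print(queue.next_report().category)
```

Even this toy version needs decisions about weights, ties, and repeat reporters, and it is only one of the components listed above.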
“Invest in tools so you can focus on building your game with the community.”
That’s Lance Priebe, co-creator of the massively popular kids’ virtual world Club Penguin, sharing one of the biggest lessons he learned as a developer.
Focus on what matters to you, and on what you and your team do best — developing and shipping kickass new game features.
4. It’s obsolete before it ships
The more time and money you can put into your core product — improved game mechanics, new features, world expansions — the better.
Think of it this way. Would you build your own anti-virus software? Of course not. It would be outdated before launch. Researching, reviewing, and fighting the latest malware isn’t your job. Instead, you rely on the experts.
Now, imagine you’ve built your own chat filter and are hosting it locally. Every day, users find new ways around the filter, faster than you can keep up. That means every day you have to spend precious time updating the repository with new expressions. And that means testing and finally deploying the update… and that means an increase in game downtime.
This all adds up to a significant loss of resources and time: your time, your team’s time, and your players’ time.
5. Users don’t only chat in English
What if your community uses other languages? Consider the work that you’ll have to put into building an English-only filter. Now, double, triple, quadruple that work when you add Spanish, Portuguese, French, German, etc.
Word-for-word translation might work for simple profanity, but as soon as you venture into colloquial expressions (“let’s bang,” “I’m going to pound you,” etc.) it gets messy.
In fact, many languages have complicated grammar rules that make direct translation literally impossible. Creating a chat filter in, say, Spanish, would require the expertise of a native speaker with a deep understanding of the language. That means hiring or outsourcing multiple language experts to build an internal multi-language filter.
And anyone who has ever run a company knows — people are awesome but they’re awfully expensive.
How complex are other languages? German has four grammatical cases and three genders. Finnish has around 15 noun cases, each with singular and plural forms. And Japanese uses three writing systems (hiragana, katakana, kanji), all three of which can be combined in a single sentence.
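The Japanese point is easy to check with Python's standard `unicodedata` module. The sketch below classifies each character by its Unicode name; the sample sentence ("I like games") and the `scripts_used` helper are chosen here for illustration.

```python
import unicodedata


def scripts_used(text: str) -> set:
    """Return which Japanese writing systems appear in the text."""
    scripts = set()
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith("HIRAGANA"):
            scripts.add("hiragana")
        elif name.startswith("KATAKANA"):
            scripts.add("katakana")
        elif name.startswith("CJK UNIFIED IDEOGRAPH"):
            scripts.add("kanji")
    return scripts


# An ordinary short sentence that mixes all three systems.
sentence = "私はゲームが好きです"
print(sorted(scripts_used(sentence)))
```

A word list written in only one script would miss the same word spelled in another, before grammar even enters the picture.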
TL;DR, because grammar: Every language is complex in its own way. Running your English filter through a direct translation tool like Google Translate won’t result in a clean, accurate chat filter. In fact, it will likely alienate your community if you get it wrong.
Engineering time is too valuable to waste
Is there an engineering team on the planet that has the time (not to mention the resources) to maintain an internally hosted solution?
Dev teams are already overtaxed with overflowing sprint cycles, impossible QA workloads, and resource-depleting deployment processes. Do you really want to maintain another internal tool?
If the answer is “no,” luckily there is a solution — instead of building it yourself, rely on the experts.
Think of it as anti-virus software for your online community.
Talk to the experts
Consider Community Sift by Two Hat Security for your community’s chat filter. Specializing in identification and triage of high-risk and illegal content, we are under contract to process 4 billion messages every day. Since 2012 we have been empowering gaming and social platforms to build healthy, engaged communities by providing cost-effective, purposeful automated moderation.
You’ll be in good company with some of the largest online communities, from Supercell, Roblox, Kabam, and many more. Simply call our secure RESTful API to moderate text, usernames, and images in over 20 of the most popular IRL and digital languages, all built and maintained by our on-site team of real live native speakers.