Feeling overwhelmed by content moderation? Not sure where to start? Determining how you will handle text – including chat, forums, comments, and usernames – is a great place to start building your community moderation processes and workflows.
We’ve put together six highly effective chat filtering best practices that you can implement today in your moderation strategy. All of these tactics can be used with Two Hat’s content moderation platform to build safe and healthy online communities.
Check back soon for more best practices, including content escalations, image moderation, how to craft effective Community Guidelines, and more.
1. Filter the worst of the worst content
2. Create situationally appropriate settings
3. Consider different filter options
4. Send users warnings prior to sanctioning
5. Automatically moderate usernames
6. Leverage the ability to mute users
Start with knowing and enforcing your non-negotiables. From here, you can focus on encouraging all the positive behavior you want to foster on your platform. We’ll cover that in an upcoming best practice.
There are types of user-generated content that threaten the well-being of your users, the health of your community, and the reputation of your brand. You’ve likely already identified the content that you simply cannot tolerate in your community, including things like hate speech, child abuse, and violent threats. It’s essential that you implement proactive chat filters to identify and action on this type of content, in addition to facilitating expedient escalation processes for moderation review.
Filters that identify and action on abusive behaviors are table stakes. This is an essential piece of Safety by Design for online communities. The days of allow/disallow lists are long gone; it’s critical that you use contextual filters, capable of identifying harmful content, but also precise enough that they don’t over-filter and curb user expression.
Ask (and regularly revisit) the following question: what type of content do you always want to proactively identify and prevent by filtering? Is it extreme hate speech (attacks on groups of people based on nationality, gender, sexual orientation, etc.)? Do you want to enforce zero tolerance for sexual harassment (i.e., unwelcome sexual advances and comments)? The answers will translate into your operational baseline filter settings.
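Those baseline settings can live as a simple, reviewable policy definition. Here is a minimal sketch in Python; the category names, actions, and `enforce_baseline` helper are all hypothetical illustrations, not Two Hat's actual taxonomy or API:

```python
# A sketch of a "non-negotiable" baseline policy: each category a classifier
# might flag maps to an action and whether it should be escalated for review.
# Category names are hypothetical stand-ins.
BASELINE_POLICY = {
    "hate_speech":       {"action": "block", "escalate": True},
    "child_abuse":       {"action": "block", "escalate": True},
    "violent_threats":   {"action": "block", "escalate": True},
    "sexual_harassment": {"action": "block", "escalate": True},
}

def enforce_baseline(categories):
    """Given the set of categories a message was classified into,
    return the baseline decision: block and/or escalate if any
    non-negotiable category was hit, otherwise allow."""
    hits = [BASELINE_POLICY[c] for c in categories if c in BASELINE_POLICY]
    if not hits:
        return {"action": "allow", "escalate": False}
    return {"action": "block", "escalate": any(h["escalate"] for h in hits)}
```

Keeping the policy as data rather than scattered conditionals makes the "regularly revisit" step above a matter of reviewing one table with your team.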
Run a team exercise to provide clear examples of content that falls into this non-negotiable bucket. Frame it within the purpose of your platform and what your Community Guidelines stand for.
Your community is unique, and so are the different experiences within it.
That’s why it’s important to leverage flexible settings to customize the social/communication experience in your product, and in different areas of your product.
It’s paramount that you understand the different contexts within your platform. A public area (like a lobby or waiting room where users gather before and after a match or app experience) requires stricter settings since most users interacting there don’t know each other yet. Public areas are also where you would typically find new users. It’s critical to provide a positive, welcoming experience for new users if we are to expect them to keep coming back. After all, it’s far more expensive to acquire a new user than it is to keep the ones you already have.
If you provide private, friend, clan, or alliance chat in your product, you can have “lighter” settings for your chat filtering in this area. Users typically know each other well in these private, curated groups, and will have different behavior norms and expectations. Instead of adopting a heavy-handed filter, action on the worst of the worst content like abuse and hate speech, while enabling maximum user expression in more intimate settings. In some cases, you can even let that subgroup choose their own settings. Some will make it a group that is “E for Everyone,” while others will make theirs for older audiences only. All groups will be subject to your minimum standard, but you can give them the flexibility to shape their own sub-community.
Different audiences (kids, teens, mature, etc.) will also call for unique settings. A kid-oriented platform catering to under-13 users in the US will need filter settings that block personally identifiable information (PII) to help achieve COPPA compliance, while a mature audience may actually depend on the ability to share PII.
Ultimately, part of an effective community moderation strategy includes self-declaring what your platform and community are all about. Be intentional about the online community you are fostering. Don’t leave it to chance.
Map the different communication and interaction needs within your platform:
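One lightweight way to capture that mapping is a table of contexts to settings. The context names and setting fields below are hypothetical examples, purely to illustrate the exercise:

```python
# Hypothetical mapping of product areas to filter settings.
# "strictness" and "allow_pii" are illustrative knobs, not a vendor's schema.
CONTEXT_SETTINGS = {
    "public_lobby":  {"strictness": "high",   "allow_pii": False},
    "match_chat":    {"strictness": "medium", "allow_pii": False},
    "clan_chat":     {"strictness": "low",    "allow_pii": False},
    "under13_chat":  {"strictness": "high",   "allow_pii": False},  # COPPA
    "adult_private": {"strictness": "low",    "allow_pii": True},
}

def settings_for(context):
    # Unknown or new contexts fall back to the strictest settings,
    # so a missed mapping fails safe.
    return CONTEXT_SETTINGS.get(context, {"strictness": "high", "allow_pii": False})
```

The defensive fallback reflects the point above: public and unfamiliar spaces should default to the stricter end of the spectrum.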
Another consideration when building a chat filtering strategy is what you do with filtered text.
As with situationally appropriate settings, what you choose depends on your community and goals.
In communities with savvy users known for circumventing the filters with creative text (e.g., Unicode characters, l33t speak, internet lingo, etc.), we recommend dropping the message (known as a false send or shadow ban) without providing feedback to the user that their message was filtered.
This is helpful in larger group chats, as disruptive users will only receive attention for their constructive messages rather than their disruptive ones, thereby reinforcing positive, healthy behavior. However, a word of caution: false sends do not work well in private chat or low-volume areas (where potentially only two or three users are in a room together), as they can actually lead to an increase in disruptive behavior if users believe their messages are being maliciously ignored.
If you want to provide feedback to your users so they see which part(s) of their message were blocked, you can instead replace the offensive content with hashes.
Some companies even display positive emojis or messages (“You’re awesome!”) in place of offensive content, e.g., “Why don’t you go 🙂🙂 yourself.”
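The three options above (false send, hash replacement, emoji replacement) can be sketched as a single dispatch over flagged spans. This is an illustrative implementation under the assumption that an upstream classifier supplies character spans; the function name and signature are hypothetical:

```python
def apply_action(message, spans, strategy):
    """Apply one of three common actions to flagged spans of a message.

    message:  the raw chat text.
    spans:    list of (start, end) index pairs flagged by an upstream
              classifier (assumed to exist; not implemented here).
    strategy: "false_send" drops the message silently (shadow ban),
              "hash" masks flagged text with '#' characters,
              "emoji" swaps flagged text for a positive emoji.
    Returns the text to broadcast, or None if the message is dropped.
    """
    if strategy == "false_send":
        # Sender still sees their message as sent; nobody else receives it.
        return None
    out = message
    # Process spans right-to-left so earlier indices stay valid after edits.
    for start, end in sorted(spans, reverse=True):
        repl = "#" * (end - start) if strategy == "hash" else "🙂"
        out = out[:start] + repl + out[end:]
    return out
```

Note that the right-to-left replacement matters for the emoji strategy, where the replacement is shorter than the flagged span and would otherwise shift later indices.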
Take a moment to plan for the types of messages that are still sensitive in nature but shouldn’t be filtered. For example, consider not filtering messages of suicide or self-harm ideation (different from self-harm and suicide encouragement). Filtering this kind of content could result in a negative experience for a user that is already in distress. It can also prevent well-meaning users from offering support and comfort.
Use triggers to warn users (in tandem with filtering content) to nudge them in a positive direction before resorting to account sanctions. Several Two Hat clients have benefited from sending warning messages to users when they attempt to post abusive content.
Here’s an industry example in which warnings have significantly reduced disruptive behavior by simply reminding users that the message they are typing could be perceived as abusive:
This also forces us to take accountability for the interactive features we design for users and examine how those very features might create friction that inadvertently encourages users to behave negatively. If we are to foster more positive online communities, it’s critical that we look at user behavior and motivation holistically, and recognize the role that product and feature design play in user choices.
This is a key motivation of organizations like the Fair Play Alliance.
Differentiate messages that are designed to support users as a means of intervention from messages that are meant to modify negative behavior. They have different outcomes and, as such, should be treated differently within your moderation strategy and operations.
A good example of messaging with the purpose of discouraging users from negative behavior is letting them know that their message could be perceived as abusive or is not aligned with the community values: “The message you are about to send is not aligned with our Community Guidelines,” “Most users in our community could find that message offensive. We encourage you to reconsider it,” or, “The content you are trying to share is not something our community stands for.”
If you don’t allow users to share personally identifiable information (especially pertinent to under-13 communities), take this opportunity to clearly communicate your stance: “Please don’t share or request personally identifiable information, including your real name, email, address, etc.”
Messages that are meant to support users and intervene at key moments can be leveraged when you identify suicidal ideation in chat. Consider sharing a helpful resource like a crisis phone line. Here’s a handy list of global suicide hotlines.
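Putting the last few paragraphs together, the routing logic distinguishes behavior-modification warnings (block and nudge) from supportive interventions (deliver and offer a resource). A minimal sketch, where the classifier labels and field names are assumptions for illustration:

```python
# Hypothetical sketch of routing classified messages to different responses.
# Labels ("self_harm_ideation", "abusive", "pii") are illustrative, not a
# specific vendor's taxonomy.
CRISIS_RESOURCE = (
    "It sounds like you may be going through a hard time. "
    "If you need support, please reach out to a local crisis line."
)

def trigger_for(label):
    """Return whether to deliver the message and what notice (if any) to show."""
    if label == "self_harm_ideation":
        # Supportive intervention: don't filter, offer a resource instead.
        return {"deliver": True, "notice": CRISIS_RESOURCE}
    if label == "abusive":
        # Behavior modification: block and warn before any account sanction.
        return {"deliver": False,
                "notice": "The message you are about to send is not aligned "
                          "with our Community Guidelines."}
    if label == "pii":
        return {"deliver": False,
                "notice": "Please don't share or request personally "
                          "identifiable information."}
    return {"deliver": True, "notice": None}
```

The key design point is that "deliver" and "notice" are independent: a supportive intervention delivers the message *and* shows a notice, which a single block/allow flag cannot express.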
Learn how to leverage robust language classification for better triggers in our Five Layers of Community Protection.
Username moderation is a crucial first step in building your community moderation strategy.
If you allow users to create their own display names, it’s best practice to proactively filter at username creation to stop unwanted names and reduce moderation workload.
After all, usernames are front and center for new users. Hate speech and overt sexual content in a username can create an early negative user experience and a potential brand reputation issue for your platform. Inappropriate display names become a billboard that says, “This community doesn’t have good safeguarding mechanisms in place.” It sets a negative tone and can potentially normalize anti-social behaviors from the start.
Create a filtering policy specific to usernames. In addition to the core topics you want to block (sexual content, profanity, slurs, etc.) as a starting point, consider the following elements:
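As a rough illustration of checking names proactively at creation time, the sketch below combines a blocklist with simple normalization (to catch l33t-speak variants) and PII-shaped patterns. Every term, pattern, and the `username_allowed` helper are hypothetical stand-ins, not a production ruleset:

```python
import re

# Stand-in blocklist; a real one would come from your policy work above.
BLOCKED_TERMS = {"badword", "hateterm"}

# PII-shaped patterns: phone-like digit runs and email-like strings.
PII_PATTERNS = [
    re.compile(r"\d{3}[-.]?\d{3}[-.]?\d{4}"),
    re.compile(r"\S+@\S+\.\S+"),
]

def username_allowed(name):
    """Return True if the proposed username passes creation-time checks."""
    lowered = name.lower()
    # Normalize common l33t substitutions (0->o, 1->l, 3->e, 4->a, 5->s, 7->t)
    # before matching blocked terms, so "b4dword" is caught as "badword".
    normalized = lowered.translate(str.maketrans("013457", "oleast"))
    if any(term in normalized for term in BLOCKED_TERMS):
        return False
    if any(p.search(name) for p in PII_PATTERNS):
        return False
    return True
```

A contextual classification service would do far better than these regexes; the point of the sketch is simply that the check runs *before* the name is ever displayed, which is what keeps the moderation queue small.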
Muting is an effective way to limit social interactions without preventing disruptive users from accessing your platform. With this option, the user is still able to send and view messages, but none of their content is actually viewable to other users. Unlike filtering, muting applies to only one user at a time and can be toggled off and on depending on behavior. The mute feature is a way to avoid other more drastic sanctions like suspensions and bans by allowing a user to still access a platform without disrupting other users’ experiences.
Users can be muted automatically based on behavioral triggers (this is a key feature in Two Hat’s patented User Reputation technology) or can be set manually by moderators. You can even define a specific timeframe for muting, for example, 1 hour, 30 minutes, etc. Once that time period has elapsed, the user is able to chat again.
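The timed-mute mechanics described above can be sketched with a small structure that checks expiry whenever a mute is consulted. The class and method names are hypothetical illustrations, not Two Hat's User Reputation API:

```python
import time

class MuteList:
    """A minimal sketch of per-user timed mutes (expiry checked on read)."""

    def __init__(self):
        self._muted_until = {}  # user_id -> monotonic deadline

    def mute(self, user_id, duration_s):
        """Mute a user for duration_s seconds (e.g., 1800 for 30 minutes)."""
        self._muted_until[user_id] = time.monotonic() + duration_s

    def unmute(self, user_id):
        self._muted_until.pop(user_id, None)

    def is_muted(self, user_id):
        until = self._muted_until.get(user_id)
        if until is None:
            return False
        if time.monotonic() >= until:
            # Mute elapsed: clean up so the user can chat again.
            self._muted_until.pop(user_id, None)
            return False
        return True

    def should_broadcast(self, sender_id):
        # Muted users can still send and read; their messages simply
        # aren't shown to anyone else.
        return not self.is_muted(sender_id)
```

Checking expiry lazily on read (rather than running a timer per user) keeps the implementation simple and matches the behavior described: once the period elapses, the user can chat again.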
We believe that social platforms should strive to enable user interactions, not stifle them. Sometimes that means tackling the negative behavior that gets in the way of productive and positive interactions. If you pair muting with warning messages that clearly and promptly indicate why a user is receiving this timeout and what the desired behavior is, you can guide them to a more productive path aligned with your community’s purpose.
Map your behavioral modification and sanctioning approach by considering these questions.