3 Use Cases for Automated Triggers in Content Moderation

Addressing the challenges of moderating the world’s content requires both artificial intelligence and human intelligence, the formula we refer to as AI+HI. In the case of triggers specifically, the essential task is to pair key moments of behavior with an automated process triggered by those behaviors.

In simplest terms, that means the ability for technology to intelligently identify not just content, but particular actions, or sets of actions, that are known to cause harm. By setting various triggers and measuring their impact, it is possible to prevent harmful content, such as bullying or hate speech, from ever making it online.

More than this, triggers can be set to expand or restrict a user’s permissions, such as what they are allowed to share or comment on, and can also be used to escalate contentious or urgent issues to human moderators. The following use cases are based on some of our clients’ approaches to applying automated triggers: managing User Reputation, responding to incidents of encouragement of suicide or self-harm, and handling live-streamed content similar to the Christchurch terrorist video.
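Before turning to those use cases, it helps to picture the general pattern: a trigger pairs a detectable behavior signal with an automated response. The sketch below is a minimal illustration of that pattern in Python; the ModerationEvent and TriggerRule structures, signal names, and actions are hypothetical simplifications, not a description of any product’s actual API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModerationEvent:
    """Hypothetical event emitted by a moderation pipeline."""
    user_id: str
    signal: str    # e.g. "hate_speech", "self_harm_encouragement"
    severity: int  # e.g. 1 (mild) to 5 (severe)

@dataclass
class TriggerRule:
    """Pairs a behavior signal with an automated response."""
    signal: str
    min_severity: int
    action: Callable[[ModerationEvent], None]

def apply_triggers(event: ModerationEvent, rules: list[TriggerRule]) -> None:
    """Run every rule whose signal matches the event at or above its severity threshold."""
    for rule in rules:
        if rule.signal == event.signal and event.severity >= rule.min_severity:
            rule.action(event)

# Example wiring: block severe hate speech outright, and escalate any
# encouragement of self-harm to a human moderator.
rules = [
    TriggerRule("hate_speech", 4, lambda e: print(f"block post by {e.user_id}")),
    TriggerRule("self_harm_encouragement", 1,
                lambda e: print(f"escalate {e.user_id} to a human moderator")),
]

apply_triggers(ModerationEvent("user_123", "hate_speech", 5), rules)
```

Each of the use cases that follow is, in effect, a set of rules like these, wired to different signals and responses.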

1. Setting Trust Levels to vet new users

The ability to automatically adjust users’ Trust Levels is a key component of our patented User Reputation technology. Consider placing new user accounts into a special moderation queue wherein their posts are pre-moderated before being shared, until the user reaches a certain threshold of trust, e.g. five posts have to be approved by moderators before the account moves into the standard posting workflow. As a user’s trust rating improves, triggers can automatically grant them new privilege or access levels.

Conversely, triggers can be set to automatically reduce Trust Levels based on incidents of flagged behavior, which in turn could restrict a user’s future ability to share or comment. If your community uses profile recognition (e.g. rankings, stickers, etc.), these could also be publicly applied or removed when a given threshold is met.
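As a rough illustration of how those thresholds might be wired together, the sketch below models the pre-moderation workflow described above. The five-approval threshold comes from the example in the text; the class, field names, and the demotion threshold are hypothetical simplifications, not a description of the patented User Reputation system.

```python
from dataclasses import dataclass

APPROVALS_TO_TRUST = 5  # from the example above: five approved posts
FLAGS_TO_DEMOTE = 3     # hypothetical threshold for losing trusted status

@dataclass
class UserTrust:
    user_id: str
    approved_posts: int = 0
    flagged_incidents: int = 0
    trusted: bool = False  # untrusted users stay in the pre-moderation queue

    def needs_premoderation(self) -> bool:
        """New or demoted users have every post reviewed before it is shared."""
        return not self.trusted

    def record_approval(self) -> None:
        """A moderator approved one of this user's posts."""
        self.approved_posts += 1
        if self.approved_posts >= APPROVALS_TO_TRUST:
            self.trusted = True  # trigger: promote into the standard posting workflow

    def record_flag(self) -> None:
        """The user's behavior was flagged by AI or by other users."""
        self.flagged_incidents += 1
        if self.flagged_incidents >= FLAGS_TO_DEMOTE:
            self.trusted = False  # trigger: back to pre-moderation, restricted sharing

# A new account starts in pre-moderation and graduates after five approvals.
user = UserTrust("new_user_42")
for _ in range(APPROVALS_TO_TRUST):
    user.record_approval()
assert not user.needs_premoderation()
```

The same counters could also drive the public profile recognition mentioned above, adding or removing a badge whenever the trusted status changes.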

2. Escalating responses to incidents of self-harm or suicide

Certain strings of text and patterns of discussion are known to indicate either an intent toward self-harm or the encouragement of self-harm by another. Incidents of encouragement are of particular concern in communities frequented by young people.

In these incidents, triggers could be applied to mute any harassing party, send a written message to at-risk users, escalate the incident to human intervention (e.g. a phone call), or even alert local police or medical professionals in real time to a possible mental health crisis.
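A simplified sketch of that escalation ladder might look like the following. The category labels, routing logic, and response functions are hypothetical placeholders for whatever classifier and tooling a platform actually uses, and the highest tier is shown here as a flag for human follow-up rather than an automatic call to emergency services.

```python
from typing import Optional

# Hypothetical response functions; in practice these would call into the
# platform's moderation tooling rather than print.
def mute_user(user_id: str) -> None:
    print(f"muted harassing user {user_id}")

def send_support_message(user_id: str) -> None:
    print(f"sent helpline resources to at-risk user {user_id}")

def escalate_to_human(incident_id: str) -> None:
    print(f"queued incident {incident_id} for a trained human responder")

def flag_for_emergency_outreach(incident_id: str) -> None:
    print(f"flagged incident {incident_id} for possible real-time emergency outreach")

def handle_self_harm_incident(incident_id: str, category: str,
                              author_id: str, target_id: Optional[str]) -> None:
    """Route a classified incident to the appropriate automated responses."""
    if category == "encouragement_of_self_harm":
        mute_user(author_id)                 # stop the harassing party first
        if target_id:
            send_support_message(target_id)  # reach out to the person at risk
        escalate_to_human(incident_id)
    elif category == "expressed_intent":
        send_support_message(author_id)
        escalate_to_human(incident_id)
        flag_for_emergency_outreach(incident_id)  # highest tier: human follow-up

handle_self_harm_incident("inc_001", "encouragement_of_self_harm",
                          author_id="user_a", target_id="user_b")
```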

3. Identifying and responding to live streaming events in real time

AI can only act on things it has seen many times before (computer vision models require 50,000 examples to do it well). For live streaming events such as the Christchurch shootings, the goal is for AI to detect a gun and the threat of violence before the shooter even enters the front door. However (and fortunately), events like the Christchurch shooting haven’t happened often enough for AI to truly learn from them. And it’s not just one murderous rampage: live streaming of bullying, fights, thefts, sexual assaults and even killings is all too common.

To help manage the response to such incidents, triggers can be set that use language signals to escalate content requiring human intervention. For example, an escalation could be triggered by the text conversation around the live stream: “Is this really happening?” “Is that a real gun?” “OMG he’s shooting.” “Somebody please help,” etc.
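One way to approximate that kind of language signal is to watch the chat around a stream for a burst of distress phrases inside a short time window. The sketch below is a hypothetical illustration of the idea only; a production system would use trained text classifiers rather than a hand-written phrase list, and would tune the window and threshold carefully to avoid false alarms.

```python
from collections import deque
import re
import time
from typing import Optional

# Hypothetical distress phrases of the kind quoted above; a production system
# would rely on trained text classifiers, not a hand-written list.
DISTRESS_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r"is this really happening",
    r"is that a real gun",
    r"he'?s shooting",
    r"somebody please help",
)]

WINDOW_SECONDS = 60  # look at the last minute of chat around the stream
THRESHOLD = 3        # escalate after this many distress messages in the window

class LiveStreamWatcher:
    """Escalates a live stream when its chat shows a burst of distress signals."""

    def __init__(self, stream_id: str) -> None:
        self.stream_id = stream_id
        self.hits = deque()  # timestamps of recent distress messages

    def on_chat_message(self, text: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if any(p.search(text) for p in DISTRESS_PATTERNS):
            self.hits.append(now)
        # Forget hits that have fallen outside the time window.
        while self.hits and now - self.hits[0] > WINDOW_SECONDS:
            self.hits.popleft()
        if len(self.hits) >= THRESHOLD:
            print(f"escalating stream {self.stream_id} to human moderators")
            return True
        return False

watcher = LiveStreamWatcher("stream_9")
for t, msg in [(0, "Is this really happening?"), (5, "Is that a real gun?"),
               (9, "OMG he's shooting")]:
    watcher.on_chat_message(msg, now=t)
```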

In concert with improving AI models for recognizing images of violent acts, these triggers could alert human moderators, network operators and law enforcement to events in real time. This, in turn, will help prevent future violent live streams from making their way online and limit the virality and reach of content that does; for example, once a stream is identified, an automated trigger can prevent users from sharing it.
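That share-blocking step is commonly built on content fingerprinting: once a stream or clip has been identified as harmful, new upload and share attempts are checked against its fingerprint. The sketch below uses a plain cryptographic hash to keep the idea visible; real deployments typically rely on perceptual hashing so that re-encoded or lightly edited copies still match. All names here are hypothetical.

```python
import hashlib

# Fingerprints of content already identified as harmful (hypothetical store).
blocked_fingerprints = set()

def fingerprint(content: bytes) -> str:
    """Exact-match fingerprint; real deployments use perceptual hashing so that
    re-encoded or lightly edited copies of a video still match."""
    return hashlib.sha256(content).hexdigest()

def mark_as_blocked(content: bytes) -> None:
    """Called once AI or human moderators have identified the content as harmful."""
    blocked_fingerprints.add(fingerprint(content))

def allow_share(content: bytes) -> bool:
    """Trigger that runs on every share or upload attempt."""
    return fingerprint(content) not in blocked_fingerprints

identified_clip = b"...bytes of an identified violent stream..."
mark_as_blocked(identified_clip)
assert allow_share(identified_clip) is False
assert allow_share(b"an unrelated cat video") is True
```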

Summary

For the first time in history, the collective global will exists to make the internet a safer place. By learning to set automated triggers to manage common incidents and workflows, content platforms can ensure faster response times to critical incidents, reduce stress on human moderators, and provide users with a safer, more enjoyable experience.

Prepare for Online Harms Legislation With a Community Audit

Duty of Care

The regulatory landscape is changing rapidly. In the last two months, we have seen huge changes in the UK and Australia, with potentially more countries to follow, including France and Canada. And just this week 18 countries and 8 major tech companies pledged to eliminate terrorist and violent extremist content online in the Christchurch Call.

As part of my job as a Trust and Safety professional, I’ve been studying the UK Online Harms white paper, which proposes establishing a Duty of Care law that would hold companies accountable for online harms on their platforms. Online harms would include anything from illegal activity and content to behaviours that are “harmful but not necessarily illegal.”

It’s an important read, and I encourage everyone in the industry to spend time reviewing the Department for Digital, Culture, Media & Sport’s proposal, because it could very well end up as the basis for similar legislation around the world.

Safety by Design

All of this has got me thinking – how can platforms be proactive and embed purposeful content moderation at the core of their DNA? As an industry, none of us want hate speech, extremism, or abuse happening on our platforms – but how prepared are we to comply with changing regulations? Where are our best practices?

Are we prepared to deal with the increasing challenges to maintain healthy spaces online?

The changes are complex but also deeply important. Australia’s eSafety Commissioner has identified three Safety by Design (SbD) principles and is creating a framework for SbD, with a white paper set to be published in the coming months. It’s exciting that they are proactively establishing best practice guidance for online safety.

Organizations like the Fair Play Alliance (FPA) are also taking a proactive path and looking at how the very design of products (online games, in this particular case) can be conducive to productive and positive interactions while mitigating abuse and harassment.

Over the past year, I have been consulted on pioneering initiatives and participated in roundtables as well as industry panels to discuss these topics. I also co-founded the FPA along with industry friends and have seen positive changes first-hand as more and more companies come together to drive lasting change in this space. Now I want to do something else that can hopefully bring value: something tangible that I can offer my industry friends today.

Protect Your Platform 

To that end, I’m offering free community audits to any platform that is interested. I will examine your community, locate areas of potential risk, and provide you with a personalized community analysis, including recommended best practices and tips to maximize positive social interactions and user engagement.

Of course, I can’t provide legal advice, but I can share tips and best practices based on my years of experience, first at Disney Online Studios and now at Two Hat, working with social and gaming companies across the globe.

I believe in a shared responsibility when it comes to fostering healthy online spaces and protecting users. I’m already talking to many companies about the audit process, and I look forward to providing as much value as I possibly can.

If you’re concerned about community health, user safety, and compliance, let’s talk.

Witnessing the Dawn of the Internet’s Duty of Care

As I write this, we are a little more than two months removed from the terrorist attacks in Christchurch. Among many things, Christchurch will be remembered as the incident that galvanized world opinion, and more importantly global action, around online safety.

In the last two months, there has been a seismic shift in how we look at internet safety and how content is shared. Governments in London, Canberra, Washington, DC, Paris and Ottawa are considering or introducing new laws, financial penalties and even prison time for those who fail to remove harmful content quickly. Others will follow, and that’s a good thing: securing the internet’s future requires the world’s governments to collectively raise the bar on safety and cooperate across borders.

In order to reach this shared goal, it is essential that technology companies engage fully as partners. We witnessed a huge step forward just last week when Facebook, Amazon, and other tech leaders came out in strong support of the Christchurch Call to Action. Two Hat stands proudly with them.

Clear terms of use, timely action by social platforms on user reports of extremist content, and transparent public reporting are the building blocks of a safer internet. Two Hat also believes every website should have baseline filtering for cyberbullying, images of sexual abuse, extremist content, and encouragement of self-harm or suicide.

Crisis protocols for service providers and regulators are essential, as well — we have to get better at managing incidents when they happen. Two Hat also echoes the need for bilateral education initiatives with the goal of helping people become better informed and safer internet users.

In all cases, open collaboration between technology companies, governments, not-for-profit organizations, and both public and private researchers will be essential to create an internet of the future that is Safe by Design. AI+HI (artificial intelligence plus human intelligence) is the formula we believe can make it happen.

AI+HI is the perfect marriage of machines, which excel at processing billions of units of data quickly, guided by humans, who provide empathy, compassion and critical thinking. Add a shared global understanding of what harmful content is and how we define and categorize it, and we are starting to address online safety in a coordinated way.

New laws and technology solutions to moderate internet content are necessary instruments to help prevent the incitement of violence and the spread of online hate, terror and abuse. Implementing duty of care measures in the UK and around the world requires a purposeful, collective effort to create a healthier and safer internet for everyone.

Our vision of that safer internet will be realized when exposure to hate, abuse, violence and exploitation no longer feels like the price of admission for being online.

The United Kingdom’s new duty of care legislation, the Christchurch Call to Action, and the world’s growing collective will all move us closer to that day.

Two Hat is currently offering no-cost, no-obligation community audits to anyone who could benefit from a second look at their moderation techniques.

Our Director of Community Trust & Safety will examine your community, locate areas of potential risk, and provide you with a personalized community analysis, including recommended best practices and tips to maximize user engagement. This is a unique opportunity to gain insight into your community from an industry expert.

Book your audit today.