Addressing the challenges of moderating the world’s content requires both artificial intelligence and human interaction, the formula we refer to as AI+HI. In the case of triggers specifically, the essential task is to pair key behaviors with automated processes that those behaviors set in motion.
In simplest terms, that means technology able to intelligently identify not just content, but particular actions, or sets of actions, that are known to cause harm. By setting various triggers and measuring their impact, it is possible to prevent harmful content and messages of bullying or hate speech from ever making it online.
More than this, triggers can be set to increase or reduce different user permissions, such as what a user is allowed to share or comment on, and can also be used to escalate contentious or urgent issues to human moderators. The following use cases are based on some of our clients’ approaches to applying automated triggers in managing User Reputation, incidents of encouragement of suicide or self-harm, and live-streamed content similar to the Christchurch terrorist video.
1. Setting Trust Levels to vet new users
The ability to automatically adjust users’ Trust Levels is a key component of our patented User Reputation technology. Consider placing new user accounts into a special pre-moderation queue, where their posts are reviewed before sharing until the user reaches a certain threshold of trust, e.g. five posts must be approved by moderators before the user’s content enters the standard posting workflow. As a user’s trust rating improves, triggers can automatically grant new privilege or access levels.
Conversely, triggers can be set to automatically reduce trust levels based on incidents of flagged behavior, which in turn could restrict a user’s future ability to share. If your community uses profile recognition (e.g. rankings, stickers), these could also be publicly applied or removed based on a given threshold being met.
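The trust-level workflow above can be sketched in a few lines of Python. This is a minimal illustration, not the patented system: the class, function names, and thresholds (five approvals to leave pre-moderation, three flags to return to it) are all assumptions for the sake of the example.

```python
from dataclasses import dataclass

# Illustrative thresholds, not product defaults.
PROMOTION_THRESHOLD = 5   # approved posts before leaving pre-moderation
DEMOTION_FLAG_LIMIT = 3   # flags that drop a user back into pre-moderation


@dataclass
class User:
    name: str
    approved_posts: int = 0
    flags: int = 0
    pre_moderated: bool = True  # new accounts start in the pre-moderation queue


def record_approval(user: User) -> None:
    """Trigger: a moderator approved one of the user's posts."""
    user.approved_posts += 1
    if user.pre_moderated and user.approved_posts >= PROMOTION_THRESHOLD:
        # Trust threshold reached: posts now enter the standard workflow.
        user.pre_moderated = False


def record_flag(user: User) -> None:
    """Trigger: one of the user's posts was flagged for harmful behavior."""
    user.flags += 1
    if user.flags >= DEMOTION_FLAG_LIMIT:
        # Trust reduced: restrict sharing until trust is rebuilt.
        user.pre_moderated = True
```

A real system would persist these counters and likely weight flags by severity and recency; the point here is only that promotion and demotion are both driven by explicit, measurable trigger conditions.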
2. Escalating responses for incidents of self-harm or suicide
Certain strings of text and discussion are known to indicate either intent toward self-harm or the encouragement of self-harm by another. Incidents of encouragement of self-harm are of particular concern in communities frequented by young people.
In these incidents, triggers could be applied to mute any harassing party, send a written message to at-risk users, escalate the incident to human interaction (e.g. a phone call), or even alert local police or medical professionals in real time to a possible mental health crisis.
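A simple sketch of how such a trigger might map detected signals to the escalating responses described above. The phrase lists and action names are illustrative assumptions; production systems use trained classifiers rather than keyword matching, which misses paraphrases and flags benign uses.

```python
# Hypothetical signal lists for illustration only.
SELF_HARM_SIGNALS = ["want to hurt myself", "end my life"]
ENCOURAGEMENT_SIGNALS = ["you should just do it", "kill yourself"]


def triage_message(text: str) -> list[str]:
    """Return the trigger actions for a single message.

    Encouragement of self-harm mutes the sender; expressed intent
    messages the at-risk user. Both escalate to a human moderator.
    """
    lowered = text.lower()
    actions: list[str] = []
    if any(phrase in lowered for phrase in ENCOURAGEMENT_SIGNALS):
        actions += ["mute_sender", "escalate_to_human"]
    if any(phrase in lowered for phrase in SELF_HARM_SIGNALS):
        actions += ["message_at_risk_user", "escalate_to_human"]
    return actions
```

The key design point is that automation handles the immediate, reversible steps (muting, sending a supportive message) while anything consequential, such as contacting emergency services, is routed through human review.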
3. Identifying and responding to live streaming events in real time
AI can only act on patterns it has seen many times before (computer vision models require on the order of 50,000 examples to perform well). For live-streamed events such as the Christchurch shootings, AI is not yet able to detect a gun and the threat of violence before the shooter even enters the front door. Fortunately, events like the Christchurch shooting haven’t happened often enough for AI to really learn from them. But it’s not just one murderous rampage: live streams of bullying, fights, thefts, sexual assaults and even killings are all too common.
To help manage the response to such incidents, triggers can be set that use language signals to escalate content that requires human intervention. For example, an escalation could be triggered by text conversations around the live stream: “Is this really happening?” “Is that a real gun?” “OMG he’s shooting.” “Somebody please help,” etc.
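The chat-signal idea can be sketched as a simple counter over viewer messages. The alarm phrases and the threshold of two distinct signals are assumptions made for this example; a deployed system would score messages with a language model and weight by viewer volume and velocity.

```python
# Illustrative phrase list drawn from the example conversation above.
ALARM_PHRASES = [
    "is this really happening",
    "is that a real gun",
    "he's shooting",
    "somebody please help",
]
ESCALATION_THRESHOLD = 2  # alarm messages required before paging a moderator


def should_escalate(chat_messages: list[str]) -> bool:
    """Escalate a live stream when enough viewers signal real-world danger."""
    hits = sum(
        any(phrase in message.lower() for phrase in ALARM_PHRASES)
        for message in chat_messages
    )
    return hits >= ESCALATION_THRESHOLD


# Usage: two distinct alarm signals in the chat triggers escalation.
chat = ["Is this really happening?", "lol", "Is that a real gun??"]
assert should_escalate(chat)
```

Requiring multiple independent signals before escalating is the trade-off that keeps moderators from being paged for every sarcastic “OMG” in a gaming stream.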
In concert with improving AI models for recognizing images and violent acts, these triggers could alert human moderators, network operators and law enforcement to events in real time. This, in turn, could help prevent future violent live streams from making their way online and limit the virality and reach of content that does (e.g. once identified, an automated trigger prevents users from sharing it).
For the first time in history, the collective global will exists to make the internet a safer place. By learning to set automated triggers to manage common incidents and workflows, content platforms can ensure faster response times to critical incidents, reduce stress on human moderators, and provide users with a safer, more enjoyable experience.