To make the internet of the future a safer and more enjoyable place, it is critical to get a clearly defined minimum standard of Safety by Design established internet-wide. That said, it is important to recognize that “Design for Scale” and “Design for Monetization” are the embedded norms.
Many websites and apps are built to go live as a first priority. Their makers forget safety, or fail to come back to it, until the product is mired in a situation where making it safe is very hard. To that end, it’s important we develop guidelines for startups and SMEs to understand best practices for Safety by Design, and access resources to help them build that way.
The regulation stems from the concept of “Duty of Care”. This is an old concept that says if you are going to make a social space, such as a nightclub, you have a responsibility to ensure it is safe. Likewise, we need to learn from our past mistakes and build out shared standards of best practices so users don’t get hurt in our online social spaces.
We believe that there are four layers of protection every site should have:
Communities don’t just happen, we create them. In real life, if you add a swing set to a park, the community expectation is that it is a place for kids. As a society, we change our language and behaviour based on that environment. We still have free speech, but we regulate ourselves for the benefit of the kids. The adult equivalent of this scenario is a nightclub; the environment allows for a loosening of behavioural norms, but step out of line with house rules and the establishment’s bouncers deal with you. Likewise, step out of line while online, and there must be consequences.
2. Embedded filters that are situationally appropriate
Many don’t add automated filters because they are afraid of the slippery slope of inhibiting free speech. In so doing they fall down the other slippery slope, doing nothing, and allow harm to continue. For the most part, this is a solved problem. Just as you can buy anti-virus technology, you can buy off-the-shelf solutions that match known signatures of things users say or share. These filters must be on every social platform, app, and website.
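To make the idea of matching known signatures concrete, here is a minimal sketch in Python. It is not how any particular product works: it assumes the simplest possible signature, an exact SHA-256 digest of normalized text, and the blocklist entry and function names are hypothetical.

```python
import hashlib


def normalize(text: str) -> bytes:
    """Lowercase and collapse whitespace so trivial edits do not evade the match."""
    return " ".join(text.lower().split()).encode("utf-8")


def signature(text: str) -> str:
    """Return the SHA-256 digest used as the content's signature."""
    return hashlib.sha256(normalize(text)).hexdigest()


# Hypothetical blocklist: signatures of content already judged harmful.
KNOWN_BAD_SIGNATURES = {signature("example abusive message")}


def is_known_bad(text: str) -> bool:
    """True when the text matches a signature on the blocklist."""
    return signature(text) in KNOWN_BAD_SIGNATURES
```

Exact digests only catch content that has been seen before, which is why the layers that follow (reputation and user reports) are still needed.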
3. Using User Reputation to make smarter decisions
Reward positive users. For those who keep harassing everyone else, take automated action. Two Hat are pioneers of a new technique where you can give all users maximum expression by only filtering the worst abusive content, and then increasing the filter level incrementally on those who harass others. Predictive Moderation based on user reputation is a must.
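The incremental approach described above can be sketched roughly as follows. The severity scale, the thresholds, and the class name are all assumptions made for illustration; they are not taken from Two Hat's product.

```python
from dataclasses import dataclass

# Hypothetical severity scale: 0 = harmless ... 5 = worst abusive content.
DEFAULT_THRESHOLD = 5    # by default, block only the most severe content
STRICTEST_THRESHOLD = 2  # never tighten the filter beyond this level


@dataclass
class UserFilterState:
    threshold: int = DEFAULT_THRESHOLD

    def record_harassment(self) -> None:
        """Tighten this user's filter one step per confirmed incident."""
        self.threshold = max(STRICTEST_THRESHOLD, self.threshold - 1)

    def allows(self, severity: int) -> bool:
        """Content is allowed when its severity is below the user's threshold."""
        return severity < self.threshold
```

A user with no incidents can post anything below severity 5; each confirmed incident lowers that ceiling by one step, and only for that user.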
4. Let users report bad content
If someone has to report something then harm is already done. Everything that users can create needs to be able to be reported. When content is reported, record the moderator decisions (in a pseudonymized, minimized way) and train AI (like our Predictive Moderation) to scale out the easy decision-making and escalate critical issues. Engaging and empowering users to assist in identifying and escalating objectionable content is a must.
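As an illustration of recording moderator decisions in a pseudonymized, minimized way, here is a small sketch. The key, the field names, and the helper functions are hypothetical; it assumes a keyed hash (HMAC-SHA256) is an acceptable pseudonym and that only the category and the decision need to be kept for training.

```python
import hashlib
import hmac

# Assumption: a secret key stored separately from the decision log.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(user_id: str) -> str:
    """Replace a user ID with a keyed hash so the log holds no raw identity."""
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def record_decision(log: list, reported_user_id: str, category: str, decision: str) -> None:
    """Store only what is needed later: pseudonym, category, and outcome."""
    log.append({
        "user": pseudonymize(reported_user_id),
        "category": category,
        "decision": decision,
    })
```

The same user always maps to the same pseudonym, so repeat behavior can still be counted without the log revealing who the user is.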
Why we must create a better internet
In 2019, the best human intentions paired with the best technology platforms and companies in the world couldn’t stop a terrorist from live-streaming the murder of innocents. We still can’t understand why 1.5 million chose to share it.
What we can do is continue to build and connect datasets and train AI models to get better. We can also find new ways to work together to make the internet a better, safer place.
We’ll know it’s working when exposure to bullying, hate, abuse, and exploitation no longer feels like the price of admission for being online.
To learn more about Two Hat’s vision for a better internet that’s Safe by Design, download our white paper By Design: 6 Tenets for a Safer Internet.
Addressing the challenges of moderating the world’s content requires both artificial intelligence and human interaction, the formula we refer to as AI+HI. In the case of triggers specifically, the essential task is to align key moments of behavior with an automated process triggered by those behaviors.
In simplest terms, that means the ability for technology to intelligently identify not just content, but particular actions, or set of actions, that are known to cause harm. By setting various triggers and measuring their impact, it is possible to prevent harmful content and messages of bullying or hate speech from ever making it online.
More than this, triggers can be set to increase or reduce different user permissions, such as what they are allowed to share or comment on, and can also be used to escalate contentious or urgent issues to human moderators. The following use cases are based on some of our clients’ approaches for applying automated triggers in managing User Reputation, incidents of encouragement of suicide or self-harm, and live-streamed content similar to the Christchurch terrorist video.
1. Setting Trust Levels to vet new users
The ability to automatically adjust users’ Trust Levels is a key component of our patented User Reputation technology. Consider placing new user accounts into a special moderation workflow in which their posts are pre-moderated before sharing until the user reaches a certain threshold of trust, e.g. five posts have to be approved by moderators before the user moves into the standard posting workflow. As a user’s trust rating improves, triggers can automatically move them to new privilege or access levels.
Conversely, triggers can be set to automatically reduce trust levels based on incidents of flagged behavior, which in turn could restrict the user’s future ability to share. If your community uses profile recognition (e.g. rankings, stickers, etc.), these could also be publicly applied or removed once a set threshold is met.
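The trust-level workflow above can be sketched in a few lines of Python. The five-approval threshold comes from the example in the text; the class, its method names, and the choice to reset the counter after a flag are assumptions made for illustration.

```python
from dataclasses import dataclass

APPROVALS_REQUIRED = 5  # from the example above: five approved posts


@dataclass
class Account:
    approved_posts: int = 0
    trusted: bool = False

    def needs_premoderation(self) -> bool:
        """New or demoted accounts have every post reviewed before it is shared."""
        return not self.trusted

    def record_approval(self) -> None:
        """Count a moderator approval and promote once the threshold is reached."""
        self.approved_posts += 1
        if self.approved_posts >= APPROVALS_REQUIRED:
            self.trusted = True

    def record_flag(self) -> None:
        """Demote on flagged behavior; the user must earn trust again (assumption)."""
        self.trusted = False
        self.approved_posts = 0
```

A real system would likely use several trust levels rather than a single trusted flag, but the trigger logic has the same shape.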
2. Escalating responses for incidents of self-harm or suicide
Certain strings of text and discussion are known to indicate either a will towards self-harm, or the encouragement of self-harm by another. Incidents of encouragement of self-harm are of particular concern in communities frequented by young people.
In these incidents, triggers could be applied to mute any harassing party, send a written message to at-risk users, escalate the incident to human interaction (e.g. a phone call), or even alert local police or medical professionals in real time to a possible mental health crisis.
3. Identifying and responding to live streaming events in real time
AI can only act on things it has seen many times before (computer vision models require 50,000 examples to do it well). For live streaming events, such as the Christchurch shootings, AI is currently able to detect a gun and threat of violence before the shooter even enters the front door. However (and fortunately), events like the Christchurch shooting haven’t happened enough for AI to really learn from them. But it’s not just one murderous rampage: live streams of bullying, fights, thefts, sexual assaults and even killings are all too common.
To help manage the response to such incidents, triggers can be set that use language signals to escalate content that requires human intervention. For example, an escalation could be set based on text conversations around the live stream: “Is this really happening?” “Is that a real gun?” “OMG he’s shooting.” “Somebody please help,” etc.
In concert with improving AI models for image recognition and violent acts, these triggers could alert human moderators, network operators and law enforcement of events in real time. This, in turn, will help prevent future violent live streams from making their way online and limit the virality and reach of content that does get through. For example, once a stream is identified, an automated trigger prevents users from sharing it.
For the first time in history, the collective global will exists to make the internet a safer place. By learning to set automated triggers to manage common incidents and workflows, content platforms can ensure faster response times to critical incidents, reduce stress on human moderators, and provide users with a safer, more enjoyable experience.
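A language-signal trigger of this kind could look something like the sketch below. The phrases are taken from the example above; the 60-second window, the threshold of three messages, and the class name are assumptions chosen only to make the example concrete.

```python
from collections import deque

# Phrases from the example above, lowercased for matching.
SIGNAL_PHRASES = (
    "is this really happening",
    "is that a real gun",
    "he's shooting",
    "somebody please help",
)
WINDOW_SECONDS = 60        # assumption: how far back to look
ESCALATION_THRESHOLD = 3   # assumption: signal messages needed to escalate


class StreamChatMonitor:
    def __init__(self) -> None:
        self._hits = deque()  # timestamps of recent signal messages

    def observe(self, timestamp: float, message: str) -> bool:
        """Record a chat message; return True when a human should be alerted."""
        text = message.lower().replace("\u2019", "'")  # normalize curly apostrophes
        if any(phrase in text for phrase in SIGNAL_PHRASES):
            self._hits.append(timestamp)
        # Drop signal messages that have fallen out of the time window.
        while self._hits and timestamp - self._hits[0] > WINDOW_SECONDS:
            self._hits.popleft()
        return len(self._hits) >= ESCALATION_THRESHOLD
```

One monitor per live stream would be fed each chat message as it arrives. Requiring several signals inside a short window keeps a single offhand comment from paging a moderator.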
As I write this, we are a little more than two months removed from the terrorist attacks in Christchurch. Among many things, Christchurch will be remembered as the incident that galvanized world opinion, and more importantly global action, around online safety.
In the last two months, there has been a seismic shift in how we look at internet safety and how content is shared. Governments in London, Sydney, Washington, DC, Paris and Ottawa are considering or introducing new laws, financial penalties and even prison time for those who fail to remove harmful content quickly. Others will follow, and that’s a good thing: securing the internet’s future requires the world’s governments to collectively raise the bar on safety, and cooperate across boundaries.
In order to reach this shared goal, it is essential that technology companies engage fully as partners. We witnessed a huge step forward just last week when Facebook, Amazon, and other tech leaders came out in strong support of the Christchurch Call to Action. Two Hat stands proudly with them.
Crisis protocols for service providers and regulators are essential, as well — we have to get better at managing incidents when they happen. Two Hat also echoes the need for bilateral education initiatives with the goal of helping people become better informed and safer internet users.
In all cases, open collaboration between technology companies, government, not for profit organizations, and both public and private researchers will be essential to create an internet of the future that is Safe by Design. AI + HI (artificial intelligence plus human intelligence) is the formula we talk about that can make it happen.
AI+HI is the perfect marriage of machines, which excel at processing billions of units of data quickly, guided by humans, who provide empathy, compassion and critical thinking. Add a shared global understanding of what harmful content is and how we define and categorize it, and we are starting to address online safety in a coordinated way.
New laws and technology solutions to moderate internet content are necessary instruments to help prevent the incitement of violence and the spread of online hate, terror and abuse. Implementing duty of care measures in the UK and around the world requires a purposeful, collective effort to create a healthier and safer internet for everyone.
Our vision of that safer internet will be realized when exposure to hate, abuse, violence and exploitation no longer feels like the price of admission for being online.
The United Kingdom’s new duty of care legislation, the Christchurch Call to Action, and the rise of the world’s collective will move us closer to that day.
Two Hat is currently offering no cost, no obligation community audits for anyone who could benefit from a second look at their moderation techniques.
Our Director of Community Trust & Safety will examine your community, locate areas of potential risk, and provide you with a personalized community analysis, including recommended best practices and tips to maximize user engagement. This is a unique opportunity to gain insight into your community from an industry expert.
Today, user-generated content like chat, private messaging, comments, images, and videos are all must-haves in an overstuffed market where user retention is critical to long-term success. Users love to share, and nothing draws a crowd like a crowd — and a crowd of happy, loyal, and welcoming users will always bring in more happy, loyal, and welcoming users.
But as we’ve seen all too often, there is risk involved when you have social features on your platform. You run the risk of users posting offensive content – like hate speech, NSFW images, and harassment – which can cause serious damage to your brand’s reputation.
That’s why understanding the risks when adding social features to your product is also critical to long-term success.
Here are four questions to consider when it comes to user-generated content on your platform.
1. How much risk is my brand willing to accept?
Every brand is different. Community demographic will usually be a major factor in determining your risk tolerance.
Communities with under-13 users in the US have to be COPPA compliant, so preventing them from sharing PII (personally identifiable information) is essential. Edtech platforms should be CIPA and FERPA compliant.
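As a rough illustration of preventing users from sharing PII, the sketch below redacts email addresses and US-style phone numbers before a message is posted. The patterns and the function name are assumptions; they are deliberately simple, will miss many forms of PII, and are not a statement of what COPPA compliance requires.

```python
import re

# Simplified, illustrative patterns; real PII detection needs far more coverage.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")


def redact_pii(message: str, placeholder: str = "[removed]") -> str:
    """Replace email addresses and phone numbers with a placeholder."""
    message = EMAIL_PATTERN.sub(placeholder, message)
    return PHONE_PATTERN.sub(placeholder, message)
```

For example, redact_pii("mail me at kid@example.com or 555-123-4567") returns "mail me at [removed] or [removed]".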
If your users are teens and 18+, you might be less risk-averse, but will still need to define your tolerance for high-risk content.
Consider your brand’s tone and history. Review your corporate guidelines to understand what your brand stands for. This is a great opportunity to define exactly what kind of an online community you want to create.
2. What type of high-risk content is most dangerous to my brand?
Try this exercise: Imagine that just one pornographic post was shared on your platform. How would it affect the brand? How would your audience react? How would your executive team respond? What would happen if the media/press found out?
What about hate speech? Sexual harassment? What is your brand’s definition of abuse or harassment? The better you can define these often vague terms, the better you will understand what kind of content you need to moderate.
3. How will I communicate my expectations to the community?
Don’t expect your users to automatically know what is and isn’t acceptable on your platform. Post your community guidelines where users can see them. And make sure users have to agree to your guidelines before they can post.
4. What content moderation tools and strategies can I leverage to protect my community?
We recommend taking a proactive instead of a reactive approach to managing risk and protecting your community. That means finding the right blend of pre- and post-moderation for your platform, while also using a mixture of automated artificial intelligence with real human moderation.
On top of these techniques, there are also different tools you can use to take a proactive approach, including in-house filters (read about the build internally vs buy externally debate), or content moderation solutions like Two Hat’s Community Sift (learn about the difference between a simple profanity filter and a content moderation tool).
While social features may be inherently risky, remember that they’re also inherently beneficial to your brand and your users. Whether you’re creating a new social platform or adding chat and images to your existing product, nothing engages and delights users more than being part of a positive and healthy online community.
And if you’re not sure where to start – we have good news.
Two Hat is currently offering a no-cost, no-obligation community audit. Our team of industry experts will examine your community, locate high-risk areas, and identify how we can help solve any moderation challenges.
It’s a unique opportunity to sit down with our Director of Community Trust & Safety to see how you can mitigate risk in your community.
Two Hat Security CEO Chris Priebe says the extent to which people are harassed online can end up costing businesses big bucks in the long run if companies don’t take the right steps to fight the problem. His Kelowna-based tech firm has been employing artificial-intelligence-powered tools to weed out inappropriate language or abusive content, such as pornographic images, on social networks.