As AI and machine learning technologies continue to advance, there is more and more hype – and debate – about what they can and cannot do effectively. On June 29, the World Economic Forum released a pivotal report on Digital Safety. Among the challenges identified in the report:

  • The pandemic created challenges for countering vaccine misinformation.
  • January 6 (the storming of the US Capitol) has necessitated a deeper look into the relationship between social platforms and extremist activity.
  • Child sexual exploitation and abuse material (CSEAM) has continued to spread online.

Internationally, the G7 has committed to growing the international safety tech sector. We at Two Hat made several trips to the UK pre-pandemic to provide feedback on the new online harms bill. With attention on solving online harms on the rise, we are excited to see new startups enter the field. Over 500 new jobs were created in the last year, and the industry needs to continue attracting the best technology talent to solve this problem.

AI is a Valuable Component

For many, AI is a critical part of the solution. As the largest digital safety provider, we alone handle 100B human interactions a month. To put that in scale, that is 6.57 times the volume of Twitter. If a human could review 500 items an hour, you would need 1.15 million humans to review all that data. Asking humans to do that would never scale. Worse, human eyes would gloss over and miss things. And if they only saw the worst of it, we would be subjecting people to days filled with beheadings, rape, child abuse, harassment, and many other harms, leading to PTSD.
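The arithmetic behind that staffing estimate can be sketched as follows. The ~174 working hours per month per reviewer is an assumption (a full-time schedule), not a figure stated above:

```python
# Back-of-the-envelope staffing estimate for fully manual review.
MESSAGES_PER_MONTH = 100_000_000_000  # 100B human interactions a month
ITEMS_PER_HOUR = 500                  # items one reviewer can handle per hour
HOURS_PER_MONTH = 174                 # assumed full-time working hours

reviews_per_human = ITEMS_PER_HOUR * HOURS_PER_MONTH  # 87,000 per month
humans_needed = MESSAGES_PER_MONTH / reviews_per_human

print(f"{humans_needed:,.0f} reviewers needed")  # roughly 1.15 million
```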

AI Plus Humans is the Solution

One of our mantras at Two Hat is, “Let humans do what humans do well and let computers do what they do well.” Computers are great at scale. Teach a machine that a clear signal like “hello” is good and “hateful badwords” are bad, and it will scale that to the billions. Humans, however, understand why those “hateful badwords” are bad. They bring empathy and loosely connected context, and they can make exceptions. Humans fit things into a bigger picture while machines (as magical as they may seem) are just following rules. We need both, so a human feedback loop is essential. Humans provide the creativity, teach the nuances, act as the ethics committee, and stay on top of emerging trends in language and culture. According to Pandorabots CEO Lauren Kunze, internet trolls have tried and failed to corrupt Mitsuku, an award-winning chatbot persona, on several occasions because human supervisors must approve any knowledge the AI retains globally.

We also need multiple forms of AI. “If all you have is a hammer, everything looks like a nail” – modern proverb. A common mistake we see is people relying too heavily on one form of AI and neglecting the others.

Let’s consider definitions of several branches of AI:

  • Artificial Intelligence refers to any specialized task done by a machine. This includes machine learning and expert systems.
  • Expert System refers to systems that use databases of expert knowledge to offer advice or make decisions.
  • Machine Learning refers to systems coded to learn a task ‘on their own’ from the data they are given; the decisions they make are not explicitly coded.
  • Deep Learning refers to a specific form of machine learning, which is very trendy at the moment. This type of machine learning is based on ‘deep’ artificial neural networks.
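A toy contrast between two of those definitions, with all rules and training data hypothetical: the expert system decides from hand-written rules, while the machine-learning sketch derives its word list from labelled examples rather than from code.

```python
# Expert system: decisions come from rules curated by human experts.
BLOCKED_PHRASES = {"hateful badwords"}  # hand-written rule database

def expert_filter(message: str) -> bool:
    """Return True if the message trips a hand-written rule."""
    return any(phrase in message.lower() for phrase in BLOCKED_PHRASES)

# Machine learning: decisions are learned from labelled examples.
# Here, a deliberately naive word-frequency "model".
def train(examples):
    """Learn which words appear only in messages labelled bad."""
    bad_words, good_words = set(), set()
    for text, label in examples:
        (bad_words if label == "bad" else good_words).update(text.lower().split())
    return bad_words - good_words

model = train([("hello friend", "good"),
               ("hateful badwords here", "bad"),
               ("hello here", "good")])

def ml_filter(message: str) -> bool:
    """Flag a message if it contains any learned bad word."""
    return any(word in model for word in message.lower().split())
```

The expert system only ever finds exactly what it was told; the learned model generalizes from its examples, for better and for worse.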

To avoid the “everything looks like a nail” trap, we use Expert Systems, Machine Learning, and Deep Learning within our stack. The trick is to use the right tool at the right level and to maintain a good human feedback loop.

And because we view AI as a contributor to the solution rather than The Solution, we can spot the screws and other “non-nails” and use humans and other systems and methods, rather than the AI hammer, to solve those issues more effectively.

Don’t Leave It To Chance

In a great article, Neal Lathia reminds us that we shouldn’t be afraid to launch a product without machine learning. In our case, if you know a particular offensive phrase is not acceptable in your community, you don’t need to train a giant neural network to find it. An expert system will do. The problem with a neural network here is that you’re leaving it to chance. You feed examples of the phrase into a black box, and it begins to see it everywhere – perhaps where you don’t want it. If you give it mislabelled examples, or simply too many counterexamples, it may ignore the phrase completely.
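As a minimal sketch of that expert-system option (with “badword” standing in for a real offensive phrase), a single deterministic rule needs no training data at all:

```python
import re

# Hand-written rule: flag a known unacceptable phrase deterministically.
# No training, no black box - the rule matches exactly what you wrote.
BANNED = re.compile(r"\bbadword\b", re.IGNORECASE)

def violates_guidelines(message: str) -> bool:
    """Return True if the message contains the banned phrase."""
    return BANNED.search(message) is not None
```

Unlike the neural-network route, there is no chance involved: the rule fires on “BadWord” and nothing else.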

At this point we learned something from antivirus companies that shaped how we’ve modelled our approach:

  1. Process 100 billion messages a month.
  2. Be aware of new patterns that are harming one community.
  3. Have humans write manual signatures that are well vetted and accurate.
  4. Roll those signatures out proactively to the other communities.
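The human-driven part of that workflow (steps 2–4) might be sketched like this; all names here are hypothetical, not our actual system:

```python
# Hypothetical sketch of the antivirus-style signature workflow.
signatures = []  # shared, centrally vetted signature database

def add_signature(pattern: str, vetted_by: str):
    """Steps 2-3: a human spots a new harmful pattern in one
    community and writes a well-vetted signature for it."""
    signatures.append({"pattern": pattern, "vetted_by": vetted_by})

def rollout(communities: list):
    """Step 4: proactively push every vetted signature to all
    communities, not just the one where the harm first appeared."""
    for community in communities:
        community["active_signatures"] = list(signatures)
```

The key design choice mirrors antivirus vendors: one community’s incident becomes every community’s protection.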

Determined Users will Subvert Your Filter

“The moment you fix a new problem, the solution is obsolete.” Many think the problem is “find badword”, not realizing that the moment they find “badword”, users change their behaviour and no longer use it. Now they use “ba.dword” and “b4dw0rd”. When you solve that, they move on to “pɹoʍpɐq” and “baᕍw⬡rd”, and somehow hide “badword” inside “goodword” or in a phrase. After 9 years we have so many tests for these types of subversions that you would want to give these users an honorary PhD in creative hacking.
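One of those counter-subversion tests can be sketched as a normalization pass that undoes common digit-for-letter swaps and separator tricks before matching. Real systems layer many more transforms than this; the substitution table here is illustrative only.

```python
# Normalize common obfuscations before filtering.
# Handles only digit-for-letter swaps and separator insertion.
LEET = str.maketrans({"4": "a", "3": "e", "1": "i",
                      "0": "o", "5": "s", "7": "t"})
SEPARATORS = ".-_* "

def normalize(text: str) -> str:
    """Lowercase, undo leetspeak substitutions, strip separators."""
    text = text.lower().translate(LEET)
    return "".join(ch for ch in text if ch not in SEPARATORS)
```

After normalization, “b4dw0rd”, “ba.d-word”, and “B4D W0RD” all collapse to “badword”, where a plain rule can catch them.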

However, if you rely on logical rules alone to find “badword” in all its many subversive forms, you run the risk of missing similar words. For instance, if you take the phrase “bad word” and feed it into a pre-trained machine learning model to find words that are similar, you get words like “terrible”, “horrible”, and “lousy”. In the antivirus analogy, humans use their imagination to create a manual signature. They might find “badword” is trending, but did they consider “terrible”, “horrible”, and “lousy”? Maybe – maybe not; it depends on their imagination. That is not a good strategy if missing “lousyword” means someone may commit suicide. Obviously we are not really talking about “lousyword”, but about things that really matter.

The Holistic 5-Layer Approach to Community Protection

How do you get all your tools to work together? Self-driving cars have a piece of the answer. In that context, if the AI gets it wrong, someone gets run over. To mitigate that, manufacturers mount as many cameras and sensors as they can. They train multiple AI systems and blend them together; if one system fails, another takes over. My new van can read the lines on the side of the road and “assist” me by turning the corner on the highway. One day, coming home from skiing with my kids in the back, it flashed at me to tell me a human was required.

To scale to billions of messages we need that multi-layered approach. If one layer is defeated there is another behind it to back us up. If the AI is not confident, it should call in humans and it should learn from them. That is why Community Sift has 5 Layers of Community Protection. Each layer combines AI plus human insight, using the best of both.

  • Community Guidelines: Tell your community what you expect. In this way, you are creating safety via defining the acceptable context for your community. This is incredibly effective, as it solves the problem before it’s even begun. You are creating a place of community so set the tone at the door. This can be as simple as a short page of icons of what the community is about as you sign up. You can learn more about designing for online safety here. Additionally, our Director of Trust & Safety, Carlos Figueiredo, consults clients on setting the right community tone from the ground up and creating community guidelines as the foundational element to community health and safety operations.
  • Classify and Filter: The moment you state in your Community Guidelines that you will not tolerate harassment, abuse, child exploitation and hate, someone will test you to see if you really care. The classify and filter line of defense backs up your promise that you actually care about these things by finding and removing the obviously bad and damaging. Think of this like anti-virus technology but for words and images. This should focus on what are “deal-breakers” to your company; things once seen that cannot be unseen. Things that will violate the trust your community has in you. Just like with anti-virus technology, you use a system that works across the industry so that new trends and signatures can keep you safe in real-time.
  • User Reputation: Some online harm occurs in borderline content over several interactions. You don’t want to over-filter for this because it restricts communication and frustrates normally positive members of your community. In this layer we address those types of harm by building a reputation for each user and each context. There is a difference between a normally positive community member exceptionally sharing something offensive and a bad actor or a bot willfully trying to disrupt normal interactions. For example, it may be okay for someone to say “do you want to buy” once. It is not okay if they say it 20 times in a row. In a more advanced setup, everything about buying is marked as borderline spam. For new and long-standing users that may be allowed; for people or bots that misuse the privilege, it is taken away automatically and automatically re-added when they return to normal. The same principle works for sexual harassment, hate speech, grooming of children, and filter manipulations. All those categories are full of borderline words and counter-statements that need context. If context is King, then reputation is Queen. Working in concert with the other layers, user reputation discourages bad actors while only reinforcing the guidelines for the occasional misstep.
  • User Report Automation: Even with the best technology in the above three layers, some things will get through. We need another layer of protection. Anytime you allow users to add content, allow other users to report that content. Feedback from your community is essential to keep the other three layers fresh and relevant. As society is continuously establishing new norms, your community is doing the same and telling you through user reports. Those same reports can also tell you a crisis is emerging. Our custom AI learns to take the same actions your moderators take consistently, reducing manual review by up to 70%, so your human moderators can focus on the things that matter.
  • Transparency Reports: In addition to satisfying legislation being introduced worldwide that requires transparency from social networks on safety measures, data insights from the other four layers drive actions to improve your community. Are interactions growing over time? Are you filtering too heavily and restricting the flow of communication? Is bullying on the rise in a particular language? How long does it take you to respond to a suicide risk or a public threat? How long do high-reputation members keep contributing to the community? These data insights demonstrate the return on investment of community management, because a community that plays well together stays together. A community that stays together longer builds the foundation and potential for a successful business.
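The User Reputation layer’s allow-then-revoke behaviour can be sketched as follows. The threshold and decay values are hypothetical: borderline content stays allowed for members in good standing, and the privilege is suspended and restored automatically as behaviour changes.

```python
# Sketch of per-user reputation for borderline content
# (threshold and decay rule are illustrative assumptions).
class UserReputation:
    def __init__(self, threshold: int = 3):
        self.borderline_hits = 0
        self.threshold = threshold

    def record(self, is_borderline: bool):
        """Track each interaction; normal behaviour decays the count."""
        if is_borderline:
            self.borderline_hits += 1
        else:
            self.borderline_hits = max(0, self.borderline_hits - 1)

    def allow_borderline(self) -> bool:
        """Repeated borderline posts suspend the privilege automatically."""
        return self.borderline_hits < self.threshold
```

Under this sketch, saying “do you want to buy” once passes, while a bot repeating it in a row loses the privilege until its behaviour returns to normal.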

To Achieve Digital Safety, Use A Multi-Layered Approach

Digital safety is a complex problem which is getting increasing attention from international governments and not-for-profit organizations like the World Economic Forum. AI is a critical part of the solution, but AI alone is not enough. To scale to billions of messages we need that multi-layered approach that blends multiple types of AI systems together with human creativity and agility to respond to emerging trends. At the end of the day, digital safety is not just classifying and filtering bad words and phrases. Digital safety is about appropriately handling what really matters.

