Subtopic Deep Dive
Explainable Hate Speech Detection
Research Guide
What is Explainable Hate Speech Detection?
Explainable Hate Speech Detection uses interpretable AI techniques such as attention mechanisms and counterfactuals to provide rationales for hate speech classification decisions.
This subtopic integrates XAI methods with hate speech models to enhance transparency in content moderation. Key works include Mehta and Passi (2022) applying XAI to deep learning for social media hate speech detection (75 citations). Meske and Bunde (2022) outline design principles for user interfaces in explainable hate speech systems (52 citations). Over 10 papers from 2020-2024 address explainability in this domain.
Why It Matters
Transparency in hate speech detection supports regulatory compliance on platforms like YouTube, as explored by Ma and Kou (2021) on algorithmic moderation impacts (70 citations). It builds user trust through personalized moderation interfaces, per Jhaver et al. (2023) user studies (65 citations). XAI rationales mitigate biases in politically skewed data, as shown by Wich et al. (2020) on hate speech classification (55 citations), enabling fairer deployment in social media.
Key Research Challenges
Interpreting Deep Learning Decisions
Deep models for hate speech lack transparency, complicating rationale extraction. Mehta and Passi (2022) highlight the need for XAI in understanding AI decisions on social media content. Attention mechanisms often fail to align with human judgments.
User Interface Design
Effective user interfaces for explainable systems remain underdeveloped. Meske and Bunde (2022) propose principles for AI decision support in hate speech detection. Challenges include balancing detail and usability for moderators.
Bias in Training Data
Politically biased datasets distort hate speech models. Wich et al. (2020) demonstrate their impact on classification accuracy. Explainability must reveal and correct these biases for reliable deployment.
Essential Papers
Taxonomy of Risks posed by Language Models
Laura Weidinger, Jonathan Uesato, Maribeth Rauh et al. · 2022 · 2022 ACM Conference on Fairness, Accountability, and Transparency · 482 citations
Responsible innovation on large-scale Language Models (LMs) requires foresight into and in-depth understanding of the risks these models may pose. This paper develops a comprehensive taxonomy o...
Detection and moderation of detrimental content on social media platforms: current status and future directions
Vaishali U. Gongane, Mousami V. Munot, Alwin Anuse · 2022 · Social Network Analysis and Mining · 112 citations
A Human-Centered Systematic Literature Review of the Computational Approaches for Online Sexual Risk Detection
Afsaneh Razi, Seung-Hyun Kim, Ashwaq Alsoubai et al. · 2021 · Proceedings of the ACM on Human-Computer Interaction · 82 citations
In the era of big data and artificial intelligence, online risk detection has become a popular research topic. From detecting online harassment to the sexual predation of youth, the state-of-the-ar...
Social Media Hate Speech Detection Using Explainable Artificial Intelligence (XAI)
Harshkumar Mehta, Kalpdrum Passi · 2022 · Algorithms · 75 citations
Explainable artificial intelligence (XAI) characteristics have flexible and multifaceted potential in hate speech detection by deep learning models. Interpreting and explaining decisions made by co...
"How advertiser-friendly is my video?": YouTuber's Socioeconomic Interactions with Algorithmic Content Moderation
Renkai Ma, Yubo Kou · 2021 · Proceedings of the ACM on Human-Computer Interaction · 70 citations
To manage user-generated harmful video content, YouTube relies on AI algorithms (e.g., machine learning) in content moderation and follows a retributive justice logic to punish convicted YouTubers ...
Personalizing Content Moderation on Social Media: User Perspectives on Moderation Choices, Interface Design, and Labor
Shagun Jhaver, Alice Qian Zhang, Quan Ze Chen et al. · 2023 · Proceedings of the ACM on Human-Computer Interaction · 65 citations
Social media platforms moderate content for each user by incorporating the outputs of both platform-wide content moderation systems and, in some cases, user-configured personal moderation preferenc...
Exhaustive Study into Machine Learning and Deep Learning Methods for Multilingual Cyberbullying Detection in Bangla and Chittagonian Texts
Tanjim Mahmud, Michał Ptaszyński, Fumito Masui · 2024 · Electronics · 61 citations
Cyberbullying is a serious problem in online communication. It is important to find effective ways to detect cyberbullying content to make online environments safer. In this paper, we investigated ...
Reading Guide
Foundational Papers
No pre-2015 foundational papers available; start with Wich et al. (2020) for bias basics in hate speech models.
Recent Advances
Prioritize Mehta and Passi (2022) for XAI applications and Meske and Bunde (2022) for UI designs in detection systems.
Core Methods
Core techniques: attention-based explanations (Mehta and Passi, 2022), user-centered interfaces (Meske and Bunde, 2022), bias auditing (Wich et al., 2020).
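The attention-based explanation technique above can be sketched in a few lines. This is a minimal, hypothetical illustration (not code from Mehta and Passi, 2022): it assumes a classifier has produced per-token attention logits, normalizes them with a softmax, and surfaces the highest-weighted tokens as the rationale.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of logits
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_rationale(tokens, logits, k=2):
    """Return the k highest-attention tokens (in text order) as a rationale."""
    weights = softmax(np.asarray(logits, dtype=float))
    top = np.argsort(weights)[::-1][:k]
    return [(tokens[i], round(float(weights[i]), 3)) for i in sorted(top)]

# Hypothetical attention logits for one post, as a model might emit them
tokens = ["you", "people", "are", "awful"]
logits = [0.2, 1.5, 0.1, 2.8]
print(attention_rationale(tokens, logits))
```

In a real pipeline the logits would come from the model's attention layer; as the challenges section notes, such weights do not always align with human judgments, so rationales like these should be validated, not taken at face value.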
How PapersFlow Helps You Research Explainable Hate Speech Detection
Discover & Search
Research Agent uses searchPapers and exaSearch to find explainable hate speech papers like Mehta and Passi (2022), then citationGraph reveals connections to Meske and Bunde (2022) and Jhaver et al. (2023). findSimilarPapers expands to related XAI moderation works.
Analyze & Verify
Analysis Agent applies readPaperContent to extract XAI methods from Mehta and Passi (2022), verifies claims with CoVe chain-of-verification, and uses runPythonAnalysis for re-running attention visualizations with NumPy/pandas. GRADE grading scores evidence strength in bias studies like Wich et al. (2020).
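The kind of attention re-analysis runPythonAnalysis performs might look like the following sketch. Everything here is hypothetical illustration (the posts, tokens, and weights are invented, not data from Mehta and Passi, 2022): it normalizes per-token attention weights within each post with pandas and flags tokens above a rationale threshold.

```python
import pandas as pd

# Hypothetical per-token attention weights for two example posts
rows = [
    {"post": 1, "token": "those",  "weight": 0.05},
    {"post": 1, "token": "people", "weight": 0.55},
    {"post": 1, "token": "again",  "weight": 0.40},
    {"post": 2, "token": "great",  "weight": 0.70},
    {"post": 2, "token": "game",   "weight": 0.30},
]
df = pd.DataFrame(rows)

# Normalize weights within each post, then flag rationale tokens
df["norm"] = df.groupby("post")["weight"].transform(lambda w: w / w.sum())
df["rationale"] = df["norm"] > 0.5
print(df[df["rationale"]][["post", "token", "norm"]])
```

From here, a heatmap of the `norm` column per post (e.g. with matplotlib) gives the familiar attention-visualization view; the threshold of 0.5 is an arbitrary choice for the sketch.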
Synthesize & Write
Synthesis Agent detects gaps in UI designs post-Meske and Bunde (2022) and flags contradictions in bias explanations from Wich et al. (2020). Writing Agent uses latexEditText and latexSyncCitations to draft and cite, latexCompile to produce publication-ready reports, and exportMermaid for model-flow and rationale diagrams.
Use Cases
"Reproduce attention mechanism bias analysis from hate speech XAI papers"
Research Agent → searchPapers('explainable hate speech attention') → Analysis Agent → runPythonAnalysis (load Mehta 2022 datasets, plot attention weights with matplotlib) → statistical verification output with bias metrics.
"Draft a paper section on UI principles for explainable hate speech detection"
Synthesis Agent → gap detection (Meske 2022) → Writing Agent → latexEditText (insert principles) → latexSyncCitations (add Jhaver 2023) → latexCompile → PDF with integrated rationale diagrams.
"Find GitHub repos with code for XAI hate speech models"
Research Agent → paperExtractUrls (Mehta 2022) → Code Discovery → paperFindGithubRepo → githubRepoInspect → cleaned code snippets and reproduction notebooks.
Automated Workflows
Deep Research workflow conducts systematic reviews of 50+ papers on XAI hate speech via searchPapers → citationGraph → structured reports with GRADE scores. DeepScan applies 7-step analysis with CoVe checkpoints to verify Mehta and Passi (2022) methods against Wich et al. (2020) biases. Theorizer generates hypotheses on UI improvements from Meske and Bunde (2022) literature.
Frequently Asked Questions
What defines Explainable Hate Speech Detection?
It applies XAI techniques like attention and counterfactuals to justify hate speech predictions, as in Mehta and Passi (2022).
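A counterfactual probe of the kind mentioned above can be sketched with a toy model. This is an illustration only, with an invented lexicon classifier standing in for a real hate speech model: it substitutes words and reports which substitutions flip the prediction, which is the essence of a counterfactual rationale.

```python
def toy_classifier(text):
    """Toy lexicon-based stand-in for a hate speech model (illustration only)."""
    flagged = {"awful", "vermin"}
    return any(tok in flagged for tok in text.lower().split())

def counterfactual_flips(text, replacements):
    """Return substitutions that change the prediction — a simple counterfactual probe."""
    original = toy_classifier(text)
    flips = []
    for old, new in replacements:
        variant = text.replace(old, new)
        if toy_classifier(variant) != original:
            flips.append((variant, toy_classifier(variant)))
    return original, flips

pred, flips = counterfactual_flips("those people are vermin",
                                   [("vermin", "welcome")])
print(pred, flips)  # True [('those people are welcome', False)]
```

With a real model, the same loop would call the classifier's predict function instead of `toy_classifier`; the flipping substitutions then serve as the explanation for the original decision.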
What are key methods in this subtopic?
Methods include attention mechanisms and UI designs for rationales, detailed in Mehta and Passi (2022) and Meske and Bunde (2022).
What are major papers?
Mehta and Passi (2022, 75 citations) on XAI for hate speech; Meske and Bunde (2022, 52 citations) on UI principles.
What open problems exist?
Challenges include bias mitigation in explanations (Wich et al., 2020) and scalable UIs for real-time moderation.
Research Hate Speech and Cyberbullying Detection with AI
PapersFlow provides specialized AI tools for Computer Science researchers. Here are the most relevant for this topic:
AI Literature Review
Automate paper discovery and synthesis across 474M+ papers
Code & Data Discovery
Find datasets, code repositories, and computational tools
Deep Research Reports
Multi-source evidence synthesis with counter-evidence
AI Academic Writing
Write research papers with AI assistance and LaTeX support
See how researchers in Computer Science & AI use PapersFlow
Field-specific workflows, example queries, and use cases.
Start Researching Explainable Hate Speech Detection with AI
Search 474M+ papers, run AI-powered literature reviews, and write with integrated citations — all in one workspace.
See how PapersFlow works for Computer Science researchers