Subtopic Deep Dive
Explainable Hate Speech Detection
Research Guide
What is Explainable Hate Speech Detection?
Explainable Hate Speech Detection uses interpretable AI techniques such as attention mechanisms and counterfactuals to provide rationales for hate speech classification decisions.
This subtopic integrates XAI methods with hate speech models to enhance transparency in content moderation. Key works include Mehta and Passi (2022) applying XAI to deep learning for social media hate speech detection (75 citations). Meske and Bunde (2022) outline design principles for user interfaces in explainable hate speech systems (52 citations). Over 10 papers from 2020-2024 address explainability in this domain.
Why It Matters
Transparency in hate speech detection supports regulatory compliance on platforms like YouTube, as explored by Ma and Kou (2021) on algorithmic moderation impacts (70 citations). It builds user trust through personalized moderation interfaces, per Jhaver et al. (2023) user studies (65 citations). XAI rationales mitigate biases in politically skewed data, as shown by Wich et al. (2020) on hate speech classification (55 citations), enabling fairer deployment in social media.
Key Research Challenges
Interpreting Deep Learning Decisions
Deep models for hate speech lack transparency, complicating rationale extraction. Mehta and Passi (2022) highlight the need for XAI in understanding AI decisions on social media content. Attention mechanisms often fail to align with human judgments.
User Interface Design
Effective user interfaces for explainable systems remain underdeveloped. Meske and Bunde (2022) propose principles for AI decision support in hate speech detection. Challenges include balancing detail and usability for moderators.
Bias in Training Data
Politically biased datasets distort hate speech models. Wich et al. (2020) demonstrate their impact on classification accuracy. Explainability must reveal and correct these biases for reliable deployment.
Essential Papers
Taxonomy of Risks posed by Language Models
Laura Weidinger, Jonathan Uesato, Maribeth Rauh et al. · 2022 · 2022 ACM Conference on Fairness, Accountability, and Transparency · 482 citations
Responsible innovation on large-scale Language Models (LMs) requires foresight into and in-depth understanding of the risks these models may pose. This paper develops a comprehensive taxonomy o...
Detection and moderation of detrimental content on social media platforms: current status and future directions
Vaishali U. Gongane, Mousami V. Munot, Alwin Anuse · 2022 · Social Network Analysis and Mining · 112 citations
A Human-Centered Systematic Literature Review of the Computational Approaches for Online Sexual Risk Detection
Afsaneh Razi, Seung-Hyun Kim, Ashwaq Alsoubai et al. · 2021 · Proceedings of the ACM on Human-Computer Interaction · 82 citations
In the era of big data and artificial intelligence, online risk detection has become a popular research topic. From detecting online harassment to the sexual predation of youth, the state-of-the-ar...
Social Media Hate Speech Detection Using Explainable Artificial Intelligence (XAI)
Harshkumar Mehta, Kalpdrum Passi · 2022 · Algorithms · 75 citations
Explainable artificial intelligence (XAI) characteristics have flexible and multifaceted potential in hate speech detection by deep learning models. Interpreting and explaining decisions made by co...
"How advertiser-friendly is my video?": YouTuber's Socioeconomic Interactions with Algorithmic Content Moderation
Renkai Ma, Yubo Kou · 2021 · Proceedings of the ACM on Human-Computer Interaction · 70 citations
To manage user-generated harmful video content, YouTube relies on AI algorithms (e.g., machine learning) in content moderation and follows a retributive justice logic to punish convicted YouTubers ...
Personalizing Content Moderation on Social Media: User Perspectives on Moderation Choices, Interface Design, and Labor
Shagun Jhaver, Alice Qian Zhang, Quan Ze Chen et al. · 2023 · Proceedings of the ACM on Human-Computer Interaction · 65 citations
Social media platforms moderate content for each user by incorporating the outputs of both platform-wide content moderation systems and, in some cases, user-configured personal moderation preferenc...
Exhaustive Study into Machine Learning and Deep Learning Methods for Multilingual Cyberbullying Detection in Bangla and Chittagonian Texts
Tanjim Mahmud, Michał Ptaszyński, Fumito Masui · 2024 · Electronics · 61 citations
Cyberbullying is a serious problem in online communication. It is important to find effective ways to detect cyberbullying content to make online environments safer. In this paper, we investigated ...
Reading Guide
Foundational Papers
No pre-2015 foundational papers available; start with Wich et al. (2020) for bias basics in hate speech models.
Recent Advances
Prioritize Mehta and Passi (2022) for XAI applications and Meske and Bunde (2022) for UI designs in detection systems.
Core Methods
Core techniques: attention-based explanations (Mehta and Passi, 2022), user-centered interfaces (Meske and Bunde, 2022), bias auditing (Wich et al., 2020).
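The attention-based explanation technique above can be sketched in a few lines. This is a minimal, hypothetical illustration (not code from Mehta and Passi, 2022): it assumes a classifier has produced per-token attention logits, normalizes them with a softmax, and surfaces the highest-weighted tokens as the rationale.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of logits
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_rationale(tokens, logits, k=2):
    """Return the k highest-attention tokens (in text order) as a rationale."""
    weights = softmax(np.asarray(logits, dtype=float))
    top = np.argsort(weights)[::-1][:k]
    return [(tokens[i], round(float(weights[i]), 3)) for i in sorted(top)]

# Hypothetical attention logits for one post, as a model might emit them
tokens = ["you", "people", "are", "awful"]
logits = [0.2, 1.5, 0.1, 2.8]
print(attention_rationale(tokens, logits))
```

In a real pipeline the logits would come from the model's attention layer; as the challenges section notes, such weights do not always align with human judgments, so rationales like these should be validated, not taken at face value.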
How PapersFlow Helps You Research Explainable Hate Speech Detection
Discover & Search
Research Agent uses searchPapers and exaSearch to find explainable hate speech papers like Mehta and Passi (2022), then citationGraph reveals connections to Meske and Bunde (2022) and Jhaver et al. (2023). findSimilarPapers expands to related XAI moderation works.
Analyze & Verify
Analysis Agent applies readPaperContent to extract XAI methods from Mehta and Passi (2022), verifies claims with CoVe chain-of-verification, and uses runPythonAnalysis for re-running attention visualizations with NumPy/pandas. GRADE grading scores evidence strength in bias studies like Wich et al. (2020).
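The kind of attention re-analysis runPythonAnalysis performs might look like the following sketch. Everything here is hypothetical illustration (the posts, tokens, and weights are invented, not data from Mehta and Passi, 2022): it normalizes per-token attention weights within each post with pandas and flags tokens above a rationale threshold.

```python
import pandas as pd

# Hypothetical per-token attention weights for two example posts
rows = [
    {"post": 1, "token": "those",  "weight": 0.05},
    {"post": 1, "token": "people", "weight": 0.55},
    {"post": 1, "token": "again",  "weight": 0.40},
    {"post": 2, "token": "great",  "weight": 0.70},
    {"post": 2, "token": "game",   "weight": 0.30},
]
df = pd.DataFrame(rows)

# Normalize weights within each post, then flag rationale tokens
df["norm"] = df.groupby("post")["weight"].transform(lambda w: w / w.sum())
df["rationale"] = df["norm"] > 0.5
print(df[df["rationale"]][["post", "token", "norm"]])
```

From here, a heatmap of the `norm` column per post (e.g. with matplotlib) gives the familiar attention-visualization view; the threshold of 0.5 is an arbitrary choice for the sketch.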
Synthesize & Write
Synthesis Agent detects gaps in UI designs post-Meske and Bunde (2022) and flags contradictions in bias explanations from Wich et al. (2020). Writing Agent uses latexEditText and latexSyncCitations to draft and cite, latexCompile to produce publication-ready reports, and exportMermaid for model-flow and rationale diagrams.
Use Cases
"Reproduce attention mechanism bias analysis from hate speech XAI papers"
Research Agent → searchPapers('explainable hate speech attention') → Analysis Agent → runPythonAnalysis (load Mehta 2022 datasets, plot attention weights with matplotlib) → statistical verification output with bias metrics.
"Draft a paper section on UI principles for explainable hate speech detection"
Synthesis Agent → gap detection (Meske 2022) → Writing Agent → latexEditText (insert principles) → latexSyncCitations (add Jhaver 2023) → latexCompile → PDF with integrated rationale diagrams.
"Find GitHub repos with code for XAI hate speech models"
Research Agent → paperExtractUrls (Mehta 2022) → Code Discovery → paperFindGithubRepo → githubRepoInspect → cleaned code snippets and reproduction notebooks.
Automated Workflows
Deep Research workflow conducts systematic reviews of 50+ papers on XAI hate speech via searchPapers → citationGraph → structured reports with GRADE scores. DeepScan applies 7-step analysis with CoVe checkpoints to verify Mehta and Passi (2022) methods against Wich et al. (2020) biases. Theorizer generates hypotheses on UI improvements from Meske and Bunde (2022) literature.
Frequently Asked Questions
What defines Explainable Hate Speech Detection?
It applies XAI techniques like attention and counterfactuals to justify hate speech predictions, as in Mehta and Passi (2022).
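A counterfactual probe of the kind mentioned above can be sketched with a toy model. This is an illustration only, with an invented lexicon classifier standing in for a real hate speech model: it substitutes words and reports which substitutions flip the prediction, which is the essence of a counterfactual rationale.

```python
def toy_classifier(text):
    """Toy lexicon-based stand-in for a hate speech model (illustration only)."""
    flagged = {"awful", "vermin"}
    return any(tok in flagged for tok in text.lower().split())

def counterfactual_flips(text, replacements):
    """Return substitutions that change the prediction — a simple counterfactual probe."""
    original = toy_classifier(text)
    flips = []
    for old, new in replacements:
        variant = text.replace(old, new)
        if toy_classifier(variant) != original:
            flips.append((variant, toy_classifier(variant)))
    return original, flips

pred, flips = counterfactual_flips("those people are vermin",
                                   [("vermin", "welcome")])
print(pred, flips)  # True [('those people are welcome', False)]
```

With a real model, the same loop would call the classifier's predict function instead of `toy_classifier`; the flipping substitutions then serve as the explanation for the original decision.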
What are key methods in this subtopic?
Methods include attention mechanisms and UI designs for rationales, detailed in Mehta and Passi (2022) and Meske and Bunde (2022).
What are major papers?
Mehta and Passi (2022, 75 citations) on XAI for hate speech; Meske and Bunde (2022, 52 citations) on UI principles.
What open problems exist?
Challenges include bias mitigation in explanations (Wich et al., 2020) and scalable UIs for real-time moderation.
Research Hate Speech and Cyberbullying Detection with AI
PapersFlow provides specialized AI tools for Computer Science researchers. Here are the most relevant for this topic:
AI Literature Review
Automate paper discovery and synthesis across 474M+ papers
Code & Data Discovery
Find datasets, code repositories, and computational tools
Deep Research Reports
Multi-source evidence synthesis with counter-evidence
AI Academic Writing
Write research papers with AI assistance and LaTeX support
See how researchers in Computer Science & AI use PapersFlow
Field-specific workflows, example queries, and use cases.
Start Researching Explainable Hate Speech Detection with AI
Search 474M+ papers, run AI-powered literature reviews, and write with integrated citations — all in one workspace.
See how PapersFlow works for Computer Science researchers