Subtopic Deep Dive

Interpretable Models versus Post-Hoc Explanations
Research Guide

What is Interpretable Models versus Post-Hoc Explanations?

This subtopic compares inherently transparent models, such as decision trees and rule lists, against post-hoc interpretability methods applied to black-box models, evaluating trade-offs in accuracy, fidelity, and human understanding.

This subtopic examines intrinsically interpretable models that provide direct transparency versus post-hoc techniques like LIME and SHAP that explain opaque models after training. Key surveys include Murdoch et al. (2019), with 1,925 citations, which defines interpretability methods, and Carvalho et al. (2019), with 1,644 citations, which surveys evaluation metrics. More than ten of the papers collected here, all from 2019 onward, benchmark these approaches across domains such as healthcare.

11 Curated Papers · 3 Key Challenges

Why It Matters

In regulated fields like healthcare, interpretable models ensure direct accountability, while post-hoc methods let high-accuracy black boxes ship with explanations, as surveyed by Tjoa and Guan (2020, 1,908 citations) for medical XAI. Benchmarks reveal post-hoc vulnerabilities, such as the attacks of Slack et al. (2020, 680 citations) that fool LIME and SHAP, undermining trustworthiness in clinical decisions (Antoniadi et al., 2021, 525 citations). This debate drives standards for AI deployment in high-stakes applications, balancing performance and transparency as reviewed by Samek et al. (2021, 1,177 citations).

Key Research Challenges

Fidelity-Accuracy Trade-off

Interpretable models can sacrifice predictive accuracy for transparency, while post-hoc methods risk low fidelity to the black-box model they explain. Murdoch et al. (2019) show how confusion over interpretability definitions obscures this trade-off. Carvalho et al. (2019) survey metrics showing that sparse models often underperform dense ones.
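Fidelity has a simple operational reading: how often does the interpretable surrogate agree with the black box on the same inputs? A minimal sketch in Python (both models and the data here are hypothetical stand-ins, not from any of the cited papers):

```python
import random

# Hypothetical "black box": a nonlinear decision rule.
def black_box(x):
    return 1 if x[0] * x[1] > 0.25 else 0

# Hypothetical interpretable surrogate: a single-threshold stump.
def surrogate(x):
    return 1 if x[0] > 0.5 else 0

def fidelity(model, reference, samples):
    """Fraction of samples on which the two models agree."""
    agree = sum(model(x) == reference(x) for x in samples)
    return agree / len(samples)

random.seed(0)
data = [(random.random(), random.random()) for _ in range(1000)]
print(f"fidelity: {fidelity(surrogate, black_box, data):.2f}")
```

A surrogate with fidelity well below 1.0 may still be useful for intuition, but its explanations describe the surrogate, not the deployed model.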

Post-Hoc Explanation Vulnerabilities

Techniques like LIME and SHAP can be manipulated or fail under distribution shifts. Slack et al. (2020) demonstrate fooling attacks that undermine their reliability. Zhou et al. (2021) evaluate explanation-quality metrics that expose these flaws.
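The core trick in the Slack et al. attack is scaffolding: the deployed classifier detects the off-manifold perturbations that LIME and SHAP generate and routes them to an innocuous model, while real inputs hit a biased one. A toy illustration of that routing idea (all models and the out-of-distribution test below are hypothetical simplifications):

```python
# Toy version of the scaffolding idea in Slack et al. (2020):
# behave differently on real inputs vs. explainer-generated probes.

def biased_model(x):          # applied to real, in-distribution inputs
    return 1 if x["sensitive"] > 0.5 else 0

def innocuous_model(x):       # shown to the explainer's perturbed samples
    return 1 if x["benign"] > 0.5 else 0

def looks_perturbed(x):
    # Real data here has binary features; LIME-style perturbations
    # produce continuous values, so a non-binary value flags a probe.
    return any(v not in (0.0, 1.0) for v in x.values())

def scaffolded_model(x):
    return innocuous_model(x) if looks_perturbed(x) else biased_model(x)

real = {"sensitive": 1.0, "benign": 0.0}
probe = {"sensitive": 0.37, "benign": 0.2}   # explainer's perturbation
print(scaffolded_model(real))   # 1: biased decision on real data
print(scaffolded_model(probe))  # 0: innocuous behavior shown to explainer
```

Because the explainer only ever sees the probe-side behavior, the resulting attribution hides the sensitive feature entirely.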

Human Interpretability Evaluation

Assessing whether explanations actually aid human decision-making remains subjective in the absence of standardized metrics. Burkart and Huber (2021) survey explainability and note that model insights remain opaque to humans in high-stakes use. Hassija et al. (2023) review the challenges of interpreting black boxes for real-world trust.

Essential Papers

1.

Definitions, methods, and applications in interpretable machine learning

William J. Murdoch, Chandan Singh, Karl Kumbier et al. · 2019 · Proceedings of the National Academy of Sciences · 1.9K citations

Significance: The recent surge in interpretability research has led to confusion on numerous fronts. In particular, it is unclear what it means to be interpretable and how to select, evaluate, or ev...

2.

A Survey on Explainable Artificial Intelligence (XAI): Toward Medical XAI

Erico Tjoa, Cuntai Guan · 2020 · IEEE Transactions on Neural Networks and Learning Systems · 1.9K citations

Recently, artificial intelligence and machine learning in general have demonstrated remarkable performances in many tasks, from image processing to natural language processing, especially with the ...

3.

Machine Learning Interpretability: A Survey on Methods and Metrics

Diogo V. Carvalho, Eduardo M. Pereira, Jaime S. Cardoso · 2019 · Electronics · 1.6K citations

Machine learning systems are becoming increasingly ubiquitous. These systems' adoption has been expanding, accelerating the shift towards a more algorithmic society, meaning that algorithmically i...

4.

Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence

Vikas Hassija, Vinay Chamola, A. Mahapatra et al. · 2023 · Cognitive Computation · 1.3K citations

Recent years have seen a tremendous growth in Artificial Intelligence (AI)-based methodological development in a broad range of domains. In this rapidly evolving field, large number of met...

5.

Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications

Wojciech Samek, Grégoire Montavon, Sebastian Lapuschkin et al. · 2021 · Proceedings of the IEEE · 1.2K citations

With the broader and highly successful usage of machine learning in industry and the sciences, there has been a growing demand for Explainable AI. Interpretability and explanation methods for gai...

6.

A Survey on the Explainability of Supervised Machine Learning

Nadia Burkart, Marco F. Huber · 2021 · Journal of Artificial Intelligence Research · 900 citations

Predictions obtained by, e.g., artificial neural networks have a high accuracy but humans often perceive the models as black boxes. Insights about the decision making are mostly opaque for humans. ...

7.

Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods

Dylan Slack, Sophie Hilgard, Emily Jia et al. · 2020 · Proceedings of the AAAI/ACM Conference on AI Ethics and Society · 680 citations

As machine learning black boxes are increasingly being deployed in domains such as healthcare and criminal justice, there is growing emphasis on building tools and techniques for explaining these b...

Reading Guide

Foundational Papers

Start with Murdoch et al. (2019) for core definitions and methods that resolve the confusion around interpretability, then Kelly (2011) on simplicity-truth trade-offs for the early foundations.

Recent Advances

Study Slack et al. (2020) for post-hoc vulnerabilities, Hassija et al. (2023) for a review of black-box methods, and Antoniadi et al. (2021) for clinical challenges.

Core Methods

Intrinsically interpretable: decision trees, rule lists. Post-hoc: LIME (local surrogates), SHAP (additive feature attributions), gradient visualization. Core metrics: fidelity, stability, and human comprehensibility, drawn from Carvalho et al. (2019) and Zhou et al. (2021).
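The "local surrogate" idea behind LIME can be sketched in a few lines: perturb the instance, weight samples by proximity, and fit a local linear model. The version below fits one weighted least-squares slope per feature for brevity; real LIME fits a joint weighted sparse linear model, and the black box and constants here are hypothetical stand-ins:

```python
import math
import random

def black_box(x):
    # Hypothetical model: nonlinear in x[0], weakly linear in x[1].
    return math.tanh(3 * x[0]) + 0.1 * x[1]

def local_slopes(instance, n_samples=2000, sigma=0.3, seed=0):
    """Per-feature local slopes around `instance`, weighted by proximity."""
    rng = random.Random(seed)
    slopes = []
    for i in range(len(instance)):
        num = den = 0.0
        for _ in range(n_samples):
            d = rng.gauss(0, sigma)          # perturb one feature
            z = list(instance)
            z[i] += d
            w = math.exp(-d * d / (2 * sigma ** 2))  # proximity kernel
            num += w * d * (black_box(z) - black_box(instance))
            den += w * d * d
        slopes.append(num / den)             # weighted least-squares slope
    return slopes

print(local_slopes([0.0, 0.0]))  # x[0] dominates the local explanation
```

The slopes are the "explanation": near this instance, x[0] drives the prediction while x[1] barely matters, even though neither fact is visible from the model's global form.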

How PapersFlow Helps You Research Interpretable Models versus Post-Hoc Explanations

Discover & Search

Research Agent uses citationGraph on Murdoch et al. (2019) to map the works connected to its 1,925-citation survey linking interpretable models to post-hoc methods; findSimilarPapers then uncovers Carvalho et al. (2019) for metrics, and an exaSearch query for 'fidelity trade-offs decision trees SHAP' retrieves 50+ papers such as Slack et al. (2020).

Analyze & Verify

Analysis Agent applies readPaperContent to Slack et al. (2020) to expose the LIME/SHAP vulnerabilities, then verifyResponse with CoVe chain-of-verification flags contradictions; runPythonAnalysis recreates the fooling attacks with NumPy/pandas for statistical verification, with GRADE-style grading of evidence strength for fidelity claims.

Synthesize & Write

Synthesis Agent detects gaps in post-hoc reliability via contradiction flagging across Tjoa and Guan (2020) and Antoniadi et al. (2021); Writing Agent uses latexEditText for model-comparison tables, latexSyncCitations integrates the 10 papers, latexCompile generates the report, and exportMermaid diagrams the trade-off flows.

Use Cases

"Reproduce fooling attacks on SHAP from Slack et al. in Python sandbox."

Research Agent → searchPapers 'Slack 2020 fooling LIME SHAP' → Analysis Agent → readPaperContent + runPythonAnalysis (NumPy/matplotlib replots attacks) → researcher gets verified attack code and fidelity stats.

"Draft LaTeX review comparing decision trees vs LIME fidelity metrics."

Synthesis Agent → gap detection on Murdoch (2019)/Carvalho (2019) → Writing Agent → latexEditText (add sections) → latexSyncCitations (10 papers) → latexCompile → researcher gets compiled PDF with cited benchmarks.

"Find GitHub repos implementing interpretable rule lists from XAI papers."

Research Agent → searchPapers 'interpretable rule lists XAI' → Code Discovery → paperExtractUrls → paperFindGithubRepo → githubRepoInspect → researcher gets 5 repos with code examples and usage stats.

Automated Workflows

The Deep Research workflow scans 50+ papers via searchPapers on 'interpretable vs post-hoc', structures a report with citationGraph clustering around Murdoch et al. (2019) hubs, and outputs benchmark tables. DeepScan applies a 7-step analysis with CoVe checkpoints on Slack et al. (2020), verifying the explanation attacks. Theorizer generates hybrid-model theories from Carvalho et al. (2019) metrics and Samek et al. (2021) reviews.

Frequently Asked Questions

What defines interpretable models versus post-hoc explanations?

Interpretable models like decision trees offer inherent transparency; post-hoc methods like SHAP explain black boxes externally. Murdoch et al. (2019) define this distinction amid interpretability confusion.

What are key methods in each category?

Interpretable: decision trees, rule lists; post-hoc: LIME, SHAP, gradient-based. Carvalho et al. (2019) survey methods and metrics; Samek et al. (2021) review DNN explanations.
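SHAP's "additive feature attribution" property means per-feature attributions sum exactly to the model output minus a baseline. With very few features, Shapley values can be computed exactly by enumerating coalitions, which makes the additivity easy to check (the model, baseline, and instance below are hypothetical):

```python
import itertools
import math

def model(x):
    # Hypothetical model: one additive term, one interaction term.
    return 3 * x[0] + 2 * x[1] * x[2]

baseline = [0.0, 0.0, 0.0]   # reference input ("feature absent")
instance = [1.0, 1.0, 1.0]   # input to explain
N = 3

def value(coalition):
    """Model output with coalition features from the instance, rest baseline."""
    x = [instance[i] if i in coalition else baseline[i] for i in range(N)]
    return model(x)

def shapley(i):
    """Exact Shapley value of feature i via coalition enumeration."""
    others = [j for j in range(N) if j != i]
    total = 0.0
    for r in range(N):
        for s in itertools.combinations(others, r):
            w = (math.factorial(len(s)) * math.factorial(N - len(s) - 1)
                 / math.factorial(N))
            total += w * (value(set(s) | {i}) - value(set(s)))
    return total

phis = [shapley(i) for i in range(N)]
print(phis)                   # [3.0, 1.0, 1.0]
print(sum(phis))              # equals model(instance) - model(baseline) = 5.0
```

The interaction term's credit splits evenly between the two symmetric features; SHAP approximates these same values when enumeration is infeasible.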

What are major papers on this debate?

Murdoch et al. (2019, 1,925 citations) on definitions; Slack et al. (2020, 680 citations) on fooling post-hoc explainers; Tjoa and Guan (2020, 1,908 citations) on medical applications.

What open problems exist?

Open problems include standardized fidelity metrics, post-hoc methods that remain robust under attack, and human-evaluation benchmarks. Zhou et al. (2021) survey explanation quality; Burkart and Huber (2021) note the opacity that persists in high-stakes settings.
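One candidate for a standardized metric is stability: how far attributions drift under tiny input perturbations. A minimal sketch, with a hypothetical stand-in explainer:

```python
import random

def explain(x):
    # Hypothetical attribution function: gradient-like weights
    # of a fixed model f(x) = x[0]**2 + x[1]**3 / 3.
    return [2 * x[0], x[1] ** 2]

def l2(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

def stability(x, eps=0.01, trials=100, seed=0):
    """Mean attribution drift under uniform perturbations of size eps."""
    rng = random.Random(seed)
    base = explain(x)
    dists = [l2(explain([v + rng.uniform(-eps, eps) for v in x]), base)
             for _ in range(trials)]
    return sum(dists) / len(dists)

print(stability([1.0, 1.0]))   # small value = locally stable explanation
```

An explainer whose attributions swing wildly under imperceptible input changes fails this check, which is one of the quality criteria surveyed by Zhou et al. (2021).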

Research Explainable Artificial Intelligence (XAI) with AI

PapersFlow provides specialized AI tools for Computer Science researchers; the workflows above highlight those most relevant to this topic.

See how researchers in Computer Science & AI use PapersFlow

Field-specific workflows, example queries, and use cases.

Computer Science & AI Guide

Start Researching Interpretable Models versus Post-Hoc Explanations with AI

Search 474M+ papers, run AI-powered literature reviews, and write with integrated citations — all in one workspace.

See how PapersFlow works for Computer Science researchers