Patient Similarity Metrics
Research Guide

What Are Patient Similarity Metrics?

Patient similarity metrics are computational methods that quantify phenotypic and genotypic resemblance between patients to enable cohort discovery and personalized treatment recommendations.

These metrics compare patients using embedding-based, kernel-based, and graph-based representations derived from electronic health records (EHRs). Key works include Deep Patient by Miotto et al. (2016, 1653 citations), which learns unsupervised representations from EHRs to predict future disease, and Gottlieb et al. (2013, 90 citations), which infers diagnoses from patient similarities. More than 20 papers in this guide's curated lists address related EHR analysis and deep learning applications.
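
A minimal sketch of the embedding-based approach, assuming each patient has already been mapped to a fixed-length vector (for example by an autoencoder such as Deep Patient): cosine similarity between vectors ranks the closest patients. The function names and the toy 4 x 3 embedding matrix are hypothetical illustrations, not code from any of the cited papers.

```python
import numpy as np


def cosine_similarity_matrix(X: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows (patients) of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against all-zero rows (no division by zero)
    Xn = X / norms
    return Xn @ Xn.T


def top_k_similar(S: np.ndarray, query: int, k: int) -> np.ndarray:
    """Indices of the k patients most similar to `query`, excluding itself."""
    scores = S[query].astype(float).copy()
    scores[query] = -np.inf  # never return the query patient
    return np.argsort(scores)[::-1][:k]


# Hypothetical toy embeddings: 4 patients x 3 latent dimensions.
X = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
S = cosine_similarity_matrix(X)
print(top_k_similar(S, query=0, k=2))
```

Cosine similarity ignores vector length, so two patients with proportionally similar profiles match even if one has a denser record.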

Why It Matters

Patient similarity metrics power precision medicine by identifying similar cases to inform treatment recommendations; Gottlieb et al. (2013), for example, infer diagnoses from phenotypic similarities. Miotto et al.'s Deep Patient (2016) predicts future diseases from EHR embeddings, supporting clinical decision-making. For heart failure, Choi et al. (2016) use RNNs to model temporal relations among EHR events, improving early detection of heart failure onset.
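
The idea of inferring diagnoses from similar patients can be illustrated with a generic similarity-weighted k-nearest-neighbour vote. This is a simplified stand-in, not the specific method of Gottlieb et al. (2013); the function name, the toy similarity matrix S, and the binary diagnosis matrix D are assumptions made for illustration.

```python
import numpy as np


def infer_diagnoses(S: np.ndarray, D: np.ndarray, query: int, k: int = 3) -> np.ndarray:
    """Score each diagnosis for `query` as the similarity-weighted share of its
    k nearest neighbours that carry that diagnosis.

    S: (n, n) patient-patient similarity matrix (higher = more similar).
    D: (n, m) binary matrix, D[i, j] = 1 if patient i has diagnosis j.
    """
    sims = S[query].astype(float).copy()
    sims[query] = -np.inf                     # a patient never votes for itself
    neighbours = np.argsort(sims)[::-1][:k]   # k most similar patients
    w = np.clip(sims[neighbours], 0.0, None)  # ignore negative similarities
    if w.sum() == 0:
        return np.zeros(D.shape[1])
    return (w @ D[neighbours]) / w.sum()      # weighted vote per diagnosis, in [0, 1]


# Hypothetical toy data: 4 patients, 2 diagnoses.
S = np.array([
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
])
D = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])
scores = infer_diagnoses(S, D, query=0, k=2)
print(scores)
```

Here patient 0's two nearest neighbours both carry diagnosis 0, so it scores 1.0, while diagnosis 1 receives only the weight of the less similar neighbour (about 0.47).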

Key Research Challenges

High-dimensional EHR heterogeneity

EHR data spans unstructured notes, lab results, and time series, which complicates similarity computation (Miotto et al., 2016). Deep Patient addresses this heterogeneity with autoencoders but struggles with rare phenotypes. The Miotto et al. (2017) review, cited more than 2,700 times, identifies high dimensionality and heterogeneity as persistent challenges.
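
One classic way to compare records that mix numeric labs with categorical attributes is Gower's similarity coefficient. It is shown here only as a simple illustration of the heterogeneity problem; it is not what Deep Patient does, which instead learns a dense representation with autoencoders. The feature layout, ranges, and values below are hypothetical.

```python
import math
from typing import Optional, Sequence


def gower_similarity(a: Sequence, b: Sequence, kinds: Sequence[str],
                     ranges: Sequence[Optional[float]]) -> float:
    """Gower similarity between two mixed-type patient records.

    kinds[j] is "num" or "cat"; ranges[j] is the cohort range of numeric
    feature j (ignored for categorical features). Missing values (None or
    NaN) are skipped rather than imputed.
    """
    total, count = 0.0, 0
    for x, y, kind, r in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # missing value: feature does not contribute
        if kind == "num":
            if math.isnan(x) or math.isnan(y):
                continue  # missing numeric value encoded as NaN
            s = 1.0 - abs(x - y) / r if r else 1.0  # range-normalised distance
        else:
            s = 1.0 if x == y else 0.0  # exact match for categorical features
        total += s
        count += 1
    return total / count if count else float("nan")


kinds = ["num", "num", "cat", "cat"]  # age, HbA1c, sex, smoker (hypothetical)
ranges = [60.0, 8.0, None, None]      # hypothetical cohort ranges
a = [50, 7.0, "F", "yes"]
b = [62, float("nan"), "F", "no"]
print(gower_similarity(a, b, kinds, ranges))
```

Skipping missing values is a design choice: it keeps sparse records comparable, but two patients may end up compared on different feature subsets.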

Privacy in federated similarity

Sharing patient embeddings across institutions risks data leakage, a problem addressed by federated learning frameworks (Xu et al., 2020, 1280 citations). Centralizing raw records to compute similarity can conflict with privacy regulations, which motivates decentralized computation of kernels and other metrics. Integrating multimodal data adds further privacy considerations (Acosta et al., 2022).
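
Xu et al. (2020) review federated learning, in which institutions share model updates instead of raw records. A widely used aggregation rule is federated averaging (FedAvg), where a server averages site parameters weighted by local sample counts. The sketch below shows only that aggregation step under assumed toy weights and site sizes; sharing parameters alone is not a formal privacy guarantee, so deployments may add safeguards such as secure aggregation or differential privacy.

```python
from typing import List

import numpy as np


def federated_average(site_weights: List[np.ndarray], site_sizes: List[int]) -> np.ndarray:
    """Sample-size-weighted average of model parameters (FedAvg-style aggregation).

    Only parameter arrays leave each site; raw patient records stay local.
    """
    sizes = np.asarray(site_sizes, dtype=float)
    stacked = np.stack(site_weights)            # shape: (n_sites, *param_shape)
    coef = sizes / sizes.sum()                  # each site's share of all patients
    return np.tensordot(coef, stacked, axes=1)  # weighted sum over the site axis


# Hypothetical 2 x 2 embedding projection trained at two hospitals.
w_site_a = np.array([[1.0, 0.0], [0.0, 1.0]])
w_site_b = np.array([[3.0, 2.0], [2.0, 3.0]])
global_w = federated_average([w_site_a, w_site_b], site_sizes=[100, 300])
print(global_w)
```

With 100 and 300 patients the sites receive weights 0.25 and 0.75, so the global parameters are [[2.5, 1.5], [1.5, 2.5]].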

Interpretable phenotypic graphs

Graph-based metrics capture relationships but lack clinical interpretability (Gottlieb et al., 2013). Embedding methods like Med-BERT excel in prediction yet obscure feature contributions (Rasmy et al., 2021, 742 citations). Validating graph similarities against outcomes remains open.
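
One simple route to interpretability is to build a k-nearest-neighbour patient graph over human-readable features and explain each edge by decomposing its cosine similarity into per-feature terms, since cos(x, y) is the sum of x_i y_i / (||x|| ||y||). This is a generic illustration, not a method from Gottlieb et al. (2013) or Rasmy et al. (2021), and it helps only when the features themselves are clinically meaningful; the dimensions of a learned embedding such as Med-BERT's usually are not. Feature names and data are hypothetical.

```python
import numpy as np


def cosine_contributions(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Per-feature terms whose sum equals cos(x, y)."""
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0:
        return np.zeros_like(x, dtype=float)
    return (x * y) / denom


def knn_graph_edges(X: np.ndarray, k: int):
    """Directed k-nearest-neighbour edges (i, j, cosine) for a patient graph."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    Xn = X / norms
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)  # exclude self-loops
    edges = []
    for i in range(len(X)):
        for j in np.argsort(S[i])[::-1][:k]:
            edges.append((i, int(j), float(S[i, j])))
    return edges


features = ["diabetes", "hypertension", "ckd", "asthma"]  # hypothetical phenotypes
X = np.array([[1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]], dtype=float)
edges = knn_graph_edges(X, k=1)

# Explain the first edge by the features that drive its similarity.
i, j, sim = edges[0]
contrib = cosine_contributions(X[i], X[j])
top = [features[t] for t in np.argsort(contrib)[::-1] if contrib[t] > 0]
print(f"patient {i} -> {j}: cos={sim:.3f}, driven by {top}")
```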

Essential Papers

Deep learning for healthcare: review, opportunities and challenges

Riccardo Miotto, Fei Wang, Shuang Wang et al. · 2017 · Briefings in Bioinformatics · 2.8K citations

Gaining knowledge and actionable insights from complex, high-dimensional and heterogeneous biomedical data remains a key challenge in transforming health care. Various types of data have been emerg...

MIMIC-IV, a freely accessible electronic health record dataset

Alistair E. W. Johnson, Lucas Bulgarelli, Lu Shen et al. · 2023 · Scientific Data · 2.2K citations

Digital data collection during routine clinical practice is now ubiquitous within hospitals. The data contains valuable information on the care of patients and their response to treatments...

Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records

Riccardo Miotto, Li Li, Brian Kidd et al. · 2016 · Scientific Reports · 1.7K citations

Federated Learning for Healthcare Informatics

Jie Xu, Benjamin S. Glicksberg, Chang Su et al. · 2020 · Journal of Healthcare Informatics Research · 1.3K citations

Multimodal biomedical AI

Julián Acosta, Guido J. Falcone, Pranav Rajpurkar et al. · 2022 · Nature Medicine · 928 citations

The increasing availability of biomedical data from large biobanks, electronic health records, medical imaging, wearable and ambient biosensors, and the lower cost of genome and microbiome sequenci...

Using recurrent neural network models for early detection of heart failure onset

Edward Choi, Andy Schuetz, Walter F. Stewart et al. · 2016 · Journal of the American Medical Informatics Association · 925 citations

Objective: We explored whether use of deep learning to model temporal relations among events in electronic health records (EHRs) would improve model performance in predicting initial diagnosis of h...

Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction

Laila Rasmy, Yang Xiang, Ziqian Xie et al. · 2021 · npj Digital Medicine · 742 citations

Reading Guide

Foundational Papers

Start with Gottlieb et al. (2013) for the core similarity-to-diagnosis method, then read Miotto et al.'s Deep Patient (2016), which establishes unsupervised EHR representations.

Recent Advances

Study Med-BERT (Rasmy et al., 2021) for contextualized embeddings, and use MIMIC-IV (Johnson et al., 2023) as a large-scale dataset for validating similarity metrics.

Core Methods

Autoencoders for dense representations (Miotto et al., 2016), RNNs for sequences (Choi et al., 2016), kernel/graph distances (Gottlieb et al., 2013), BERT variants (Rasmy et al., 2021).
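
As a concrete instance of a kernel-based metric, the Gaussian (RBF) kernel maps the squared Euclidean distance between patient vectors to a similarity in (0, 1]. The kernel choice and the bandwidth parameter gamma are assumptions for illustration; the cited papers do not necessarily use this kernel.

```python
import numpy as np


def rbf_kernel(X: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian (RBF) kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b ; clip tiny negatives from rounding.
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * sq_dists)


# Hypothetical 2-feature patient vectors.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
K = rbf_kernel(X, gamma=0.5)
print(np.round(K, 4))
```

A larger gamma makes similarity decay faster with distance, so only very close patients count as similar. Because the RBF kernel is positive semidefinite, the resulting matrix can be passed to kernel methods such as support vector machines.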

How PapersFlow Helps You Research Patient Similarity Metrics

Discover & Search

Research Agent uses searchPapers and citationGraph to map the connections between Deep Patient (Miotto et al., 2016) and its 1653 citing works on EHR embeddings, then runs exaSearch for 'patient similarity kernels in MIMIC-IV' to surface Johnson et al. (2023). findSimilarPapers expands the search to Gottlieb et al. (2013) for phenotypic methods.

Analyze & Verify

Analysis Agent applies readPaperContent to extract the Deep Patient autoencoder design from Miotto et al. (2016), checks similarity metrics with runPythonAnalysis (NumPy cosine distance on EHR subsets), and uses verifyResponse (CoVe) with GRADE grading to score embedding efficacy against the RNN baselines of Choi et al. (2016).

Synthesize & Write

Synthesis Agent detects gaps between kernel and graph metrics across Miotto (2016) and Gottlieb (2013) and flags contradictions in federated scalability (Xu et al., 2020). Writing Agent uses latexEditText for metric comparisons, latexSyncCitations for 10+ papers, latexCompile for publication-ready tables, and exportMermaid for similarity network diagrams.

Use Cases

"Reproduce Deep Patient similarity on MIMIC-IV heart failure cohort"

Research Agent → searchPapers('Deep Patient MIMIC-IV') → Analysis Agent → readPaperContent(Miotto 2016) → runPythonAnalysis(pandas EHR loading, NumPy embedding cosine similarity) → matplotlib distance histograms output.
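
The EHR-loading and cosine-similarity steps of this workflow can be sketched with pandas and NumPy: pivot a long-format diagnosis table into a patient-by-code count matrix, then compare rows. This is not PapersFlow's internal code; the column names subject_id and icd_code mimic MIMIC-IV naming but are assumptions to check against the actual schema, and the rows are toy data. The matplotlib histogram step is omitted.

```python
import numpy as np
import pandas as pd

# Toy long-format diagnosis table (hypothetical rows; column names are assumptions).
events = pd.DataFrame({
    "subject_id": [1, 1, 1, 2, 2, 3],
    "icd_code": ["I509", "E119", "I10", "I509", "I10", "J45909"],
})

# One row per patient, one column per code, cell = times the code was recorded.
counts = pd.crosstab(events["subject_id"], events["icd_code"])

# Row-normalise and take dot products to get pairwise cosine similarity.
X = counts.to_numpy(dtype=float)
norms = np.linalg.norm(X, axis=1, keepdims=True)
norms[norms == 0] = 1.0
Xn = X / norms
similarity = pd.DataFrame(Xn @ Xn.T, index=counts.index, columns=counts.index)
print(similarity.round(3))
```

A distance histogram could then be drawn from the upper triangle of 1 - similarity.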

"Compare patient similarity papers in LaTeX review"

Synthesis Agent → gap detection(Deep Patient vs Med-BERT) → Writing Agent → latexEditText(intro section) → latexSyncCitations(Gottlieb 2013, Rasmy 2021) → latexCompile(full PDF with tables) → researcher gets arXiv-ready manuscript.

"Find GitHub code for EHR patient similarity metrics"

Research Agent → citationGraph(Deep Patient) → Code Discovery → paperExtractUrls(Choi 2016) → paperFindGithubRepo(RNN similarity) → githubRepoInspect(code quality, metrics impl) → exportCsv(repos with cosine/kernel functions).

Automated Workflows

The Deep Research workflow conducts a systematic review: searchPapers('patient similarity EHR') → 50+ papers including Miotto (2016/2017) → a structured report with citation clusters. DeepScan applies a 7-step analysis: readPaperContent(Gottlieb 2013) → verifyResponse on graph metrics → CoVe checkpoints. Theorizer generates hypotheses on multimodal similarity from Acosta et al. (2022) and Venugopalan et al. (2021).

Frequently Asked Questions

What defines patient similarity metrics?

Computational measures of phenotypic/genotypic resemblance from EHRs for cohort matching and predictions (Miotto et al., 2016; Gottlieb et al., 2013).

What are common methods?

Unsupervised embeddings (Deep Patient, Miotto et al., 2016), RNN temporal modeling (Choi et al., 2016), and similarity-based diagnosis inference (Gottlieb et al., 2013).

What are key papers?

Deep Patient (Miotto et al., 2016, 1653 citations), MIMIC-IV dataset (Johnson et al., 2023, 2205 citations), diagnoses from similarities (Gottlieb et al., 2013, 90 citations).

What open problems exist?

Federated privacy-preserving metrics (Xu et al., 2020), interpretable multimodal graphs (Acosta et al., 2022), and rare disease cohort scaling.

Research Machine Learning in Healthcare with AI

PapersFlow provides specialized AI tools for Computer Science researchers. Here are the most relevant for this topic:

AI Literature Review

Automate paper discovery and synthesis across 474M+ papers

Code & Data Discovery

Find datasets, code repositories, and computational tools

Deep Research Reports

Multi-source evidence synthesis with counter-evidence

AI Academic Writing

Write research papers with AI assistance and LaTeX support

See how researchers in Computer Science & AI use PapersFlow

Field-specific workflows, example queries, and use cases.

Computer Science & AI Guide

Start Researching Patient Similarity Metrics with AI

Search 474M+ papers, run AI-powered literature reviews, and write with integrated citations — all in one workspace.

See how PapersFlow works for Computer Science researchers

Part of the Machine Learning in Healthcare Research Guide