Subtopic Deep Dive

Machine Learning for Crystal Structure Prediction
Research Guide

What is Machine Learning for Crystal Structure Prediction?

Machine Learning for Crystal Structure Prediction uses ML models to predict stable crystal structures from chemical composition, reducing the need for costly density functional theory (DFT) calculations through graph neural networks (GNNs) and generative approaches.

Researchers apply GNNs and diffusion models to generate atomic arrangements matching target properties. Models train on databases like OQMD, which contains nearly 300,000 DFT total energy calculations (Kirklin et al., 2015). Over 20 papers since 2019 explore ML acceleration of structure search (Schmidt et al., 2019).

15 Curated Papers · 3 Key Challenges

Why It Matters

ML structure prediction screens thousands of compositions in hours, enabling discovery of perovskites and novel alloys (Bartel et al., 2019). It can cut compute time by roughly 1000x relative to DFT, accelerating battery and catalyst design (Ramprasad et al., 2017). High-throughput workflows identify synthesizable materials from vast chemical spaces (Hautier et al., 2012).

Key Research Challenges

Accurate Energy Ranking

ML models must predict formation energies within 10 meV/atom of DFT to identify true ground states. Errors compound in generative sampling of 10^6 structures (Kirklin et al., 2015). Active learning mitigates data scarcity but requires iterative DFT validation (Kusne et al., 2014).
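The point about energy ranking can be illustrated with a minimal sketch. It assumes three hypothetical polymorphs of one composition with made-up energies (not taken from OQMD or any paper), and shows that a model can stay under a 10 meV/atom mean absolute error while still picking the wrong ground state, because the gap between competing polymorphs is smaller than the per-structure error. The function names are illustrative only.

```python
from typing import Dict


def mae_mev_per_atom(dft: Dict[str, float], ml: Dict[str, float]) -> float:
    """Mean absolute error of ML vs. DFT energies in meV/atom (inputs in eV/atom)."""
    return 1000.0 * sum(abs(ml[name] - dft[name]) for name in dft) / len(dft)


def ground_state(energies: Dict[str, float]) -> str:
    """Name of the lowest-energy structure."""
    return min(energies, key=energies.get)


# Hypothetical polymorphs; all energies (eV/atom) are invented for illustration.
dft = {"polymorph_A": -1.250, "polymorph_B": -1.242, "polymorph_C": -1.100}
ml = {"polymorph_A": -1.243, "polymorph_B": -1.249, "polymorph_C": -1.108}

mae = mae_mev_per_atom(dft, ml)
print(f"MAE = {mae:.2f} meV/atom")
print("DFT ground state:", ground_state(dft))
print("ML ground state: ", ground_state(ml))
```

In this toy case the two lowest polymorphs differ by only 8 meV/atom in DFT, so per-structure errors of about 7 meV/atom are enough to swap their order. This is why a small average error alone does not guarantee a correct ranking.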

Symmetry and Space Groups

Predicting correct space groups from compositions remains error-prone for low-symmetry crystals. GNNs struggle with long-range order beyond local neighborhoods (Wang and Ma, 2014). Datasets lack diversity in high-pressure phases (Larsen et al., 2017).

Scalable Generative Design

Diffusion models generate realistic structures but fail to enforce physical constraints like charge balance. Sampling efficiency drops for multi-element systems (Choudhary et al., 2022). Transfer learning from OQMD helps but needs domain adaptation (Friederich et al., 2021).
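As a rough illustration of the charge-balance constraint, the sketch below filters candidate compositions by searching for one oxidation state per element that sums to zero charge. The oxidation-state table is a small hypothetical subset chosen for the example, and the single-state-per-element assumption is a simplification that excludes mixed-valence compounds. This is a post-hoc filter, not a way of enforcing the constraint inside a generative model.

```python
from itertools import product
from typing import Dict, List, Optional

# Hypothetical, deliberately incomplete table of allowed oxidation states.
STATES: Dict[str, List[int]] = {
    "Sr": [2],
    "Ti": [2, 3, 4],
    "O": [-2],
    "Na": [1],
    "Cl": [-1],
}


def neutral_assignment(
    composition: Dict[str, int], states: Dict[str, List[int]]
) -> Optional[Dict[str, int]]:
    """Return one oxidation state per element that gives zero net charge, else None.

    Assumes every atom of an element carries the same oxidation state.
    """
    elements = list(composition)
    for choice in product(*(states[el] for el in elements)):
        total = sum(composition[el] * q for el, q in zip(elements, choice))
        if total == 0:
            return dict(zip(elements, choice))
    return None


print(neutral_assignment({"Sr": 1, "Ti": 1, "O": 3}, STATES))
print(neutral_assignment({"Na": 1, "Cl": 2}, STATES))
```

A generated structure whose composition returns None here could be rejected or down-weighted before any energy evaluation.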

Essential Papers

1.

Open Babel: An open chemical toolbox

Noel M. O’Boyle, Michael Banck, Craig A. James et al. · 2011 · Journal of Cheminformatics · 10.4K citations

Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities from conformer searching and 2D depiction, to filtering...

2.

The atomic simulation environment—a Python library for working with atoms

Ask Hjorth Larsen, Jens Jørgen Mortensen, Jakob Blomqvist et al. · 2017 · Journal of Physics Condensed Matter · 4.3K citations

The atomic simulation environment (ASE) is a software package written in the Python programming language with the aim of setting up, steering, and analyzing atomistic simulations. In ASE, tasks are...

3.

Recent advances and applications of machine learning in solid-state materials science

Jonathan Schmidt, Mário R. G. Marques, Silvana Botti et al. · 2019 · npj Computational Materials · 2.2K citations

One of the most exciting tools that have entered the material science toolbox in recent years is machine learning. This collection of statistical methods has already proved to be capable o...

4.

The Open Quantum Materials Database (OQMD): assessing the accuracy of DFT formation energies

Scott Kirklin, James E. Saal, Bryce Meredig et al. · 2015 · npj Computational Materials · 2.2K citations

The Open Quantum Materials Database (OQMD) is a high-throughput database currently consisting of nearly 300,000 density functional theory (DFT) total energy calculations of compounds from ...

5.

Machine learning in materials informatics: recent applications and prospects

Rampi Ramprasad, Rohit Batra, Ghanshyam Pilania et al. · 2017 · npj Computational Materials · 1.6K citations

6.

New tolerance factor to predict the stability of perovskite oxides and halides

Christopher J. Bartel, Christopher Sutton, Bryan R. Goldsmith et al. · 2019 · Science Advances · 1.5K citations

Simple and interpretable data-driven descriptor accurately predicts the synthesizability of single and double perovskites.

7.

Recent advances and applications of deep learning methods in materials science

Kamal Choudhary, Brian DeCost, Chi Chen et al. · 2022 · npj Computational Materials · 941 citations

Deep learning (DL) is one of the fastest-growing topics in materials data science, with rapidly emerging applications spanning atomistic, image-based, spectral, and textual data modalities...

Reading Guide

Foundational Papers

Start with Kirklin et al. (2015) OQMD for DFT benchmark data, then Hautier et al. (2012) for high-throughput discovery context, and Larsen et al. (2017) ASE for simulation workflows.

Recent Advances

Schmidt et al. (2019) surveys ML methods; Choudhary et al. (2022) covers deep learning advances; Friederich et al. (2021) discusses learned potentials.

Core Methods

Crystal graphs via GNNs (Deringer et al., 2019); SOAP descriptors from DScribe (Himanen et al., 2019); diffusion models and active learning (Choudhary et al., 2022).
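To make the crystal graph idea concrete, here is a minimal sketch of how such a graph might be built: atoms are nodes, and an edge connects any two atoms (including periodic images) closer than a cutoff radius. It is not the construction used by any specific paper or library. The lattice constant, the CsCl-type cell, and the function names are assumptions for illustration, and searching only the nearest shell of images (n_images=1) is adequate only when the cutoff is no larger than the cell dimensions.

```python
import math
from itertools import product
from typing import List, Tuple

Vec = Tuple[float, float, float]


def frac_to_cart(frac: Vec, lattice: List[Vec]) -> Vec:
    """Convert fractional coordinates to Cartesian using row-vector lattice vectors."""
    return tuple(sum(frac[i] * lattice[i][j] for i in range(3)) for j in range(3))


def crystal_graph_edges(lattice, frac_coords, cutoff, n_images=1):
    """List (i, j, image, distance) for all atom pairs within cutoff, with periodic images."""
    edges = []
    for i, fi in enumerate(frac_coords):
        ri = frac_to_cart(fi, lattice)
        for j, fj in enumerate(frac_coords):
            for image in product(range(-n_images, n_images + 1), repeat=3):
                if i == j and image == (0, 0, 0):
                    continue  # skip the atom paired with itself
                shifted = tuple(fj[k] + image[k] for k in range(3))
                d = math.dist(ri, frac_to_cart(shifted, lattice))
                if d <= cutoff:
                    edges.append((i, j, image, d))
    return edges


# Hypothetical CsCl-type cubic cell with a = 4.0 angstrom.
a = 4.0
lattice = [(a, 0.0, 0.0), (0.0, a, 0.0), (0.0, 0.0, a)]
frac_coords = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]

edges = crystal_graph_edges(lattice, frac_coords, cutoff=3.6)
print(len(edges), "directed edges")
```

Each atom in this cell has eight nearest neighbors of the other species at a distance of a·√3/2, so the cutoff of 3.6 Å captures only that first shell. A GNN would then attach features to these nodes and edges. Because only bonds inside the cutoff appear, order beyond the local neighborhood is not directly encoded.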

How PapersFlow Helps You Research Machine Learning for Crystal Structure Prediction

Discover & Search

Research Agent uses searchPapers('machine learning crystal structure prediction') to find Schmidt et al. (2019) with 2227 citations, then citationGraph reveals Kirklin et al. (2015) OQMD dataset connections, and findSimilarPapers uncovers Bartel et al. (2019) tolerance factors.

Analyze & Verify

Analysis Agent applies readPaperContent on Kirklin et al. (2015) to extract details of OQMD's nearly 300,000 DFT total energy calculations. verifyResponse with CoVe then checks energy prediction claims against ASE benchmarks (Larsen et al., 2017), and runPythonAnalysis fits Gaussian processes to formation energy errors, with GRADE scoring for model reliability.

Synthesize & Write

Synthesis Agent detects gaps in perovskite stability prediction between Bartel et al. (2019) and generative models, while Writing Agent uses latexEditText for structure diagrams, latexSyncCitations for 50+ references, and latexCompile to produce camera-ready reviews with exportMermaid for GNN architecture flows.

Use Cases

"Analyze OQMD formation energies with Python to benchmark ML structure predictors"

Research Agent → searchPapers('OQMD') → Analysis Agent → readPaperContent(Kirklin 2015) → runPythonAnalysis(pandas energy histograms, matplotlib RMSE plots) → CSV export of 300k structures for local ML training.
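The kind of analysis this use case describes might look like the sketch below. It uses only the Python standard library rather than pandas or matplotlib, and it is not the output of runPythonAnalysis. The CSV column names and every number in the sample are invented for illustration.

```python
import csv
import io
import math
from collections import Counter

# Made-up sample standing in for a CSV export; column names are hypothetical.
SAMPLE_CSV = """entry,e_dft,e_ml
entry_1,-2.10,-2.08
entry_2,-3.05,-3.09
entry_3,-3.50,-3.47
entry_4,-0.40,-0.41
"""


def load_errors(text: str) -> list:
    """Signed ML-minus-DFT formation energy errors in eV/atom."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row["e_ml"]) - float(row["e_dft"]) for row in reader]


def rmse(errors: list) -> float:
    """Root-mean-square error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def histogram(errors: list, bin_width: float) -> Counter:
    """Count errors per bin; key k covers [k * bin_width, (k + 1) * bin_width)."""
    return Counter(math.floor(e / bin_width) for e in errors)


errors = load_errors(SAMPLE_CSV)
print(f"RMSE = {rmse(errors):.4f} eV/atom")
print(sorted(histogram(errors, bin_width=0.025).items()))
```

Integer bin indices are used as histogram keys to avoid comparing floating-point bin edges.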

"Write LaTeX review of GNNs for crystal prediction citing 20 papers"

Synthesis Agent → gap detection(Schmidt 2019 + Choudhary 2022) → Writing Agent → latexEditText(structure prediction section) → latexSyncCitations(20 papers) → latexCompile(PDF) with crystal symmetry diagrams.

"Find GitHub code for ML crystal generators from recent papers"

Research Agent → searchPapers('graph neural crystal prediction') → Code Discovery → paperExtractUrls → paperFindGithubRepo(DScribe Himanen 2019) → githubRepoInspect(descriptor notebooks) → cloneable training pipelines.

Automated Workflows

Deep Research workflow scans 50+ papers via searchPapers chains, producing structured reports ranking ML methods by MAE on OQMD benchmarks (Kirklin et al., 2015). DeepScan's 7-step analysis verifies Bartel et al. (2019) tolerance factor against ASE simulations (Larsen et al., 2017). Theorizer generates hypotheses linking DScribe descriptors to stable perovskite prediction (Himanen et al., 2019).
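For reference, the tolerance factor of Bartel et al. (2019) mentioned above can be evaluated in a few lines. The sketch below uses the published form τ = r_X/r_B − n_A·(n_A − (r_A/r_B)/ln(r_A/r_B)), where τ < 4.18 indicates a perovskite. The ionic radii for SrTiO3 are approximate Shannon values quoted as assumptions and should be checked against a tabulated source before any real use. The function names are illustrative.

```python
import math

TAU_THRESHOLD = 4.18  # tau below this value indicates a perovskite (Bartel et al., 2019)


def bartel_tau(r_a: float, r_b: float, r_x: float, n_a: int) -> float:
    """Tolerance factor tau for ABX3; radii in angstrom, n_a = oxidation state of A.

    Requires r_a > r_b so that ln(r_a / r_b) is positive.
    """
    if r_a <= r_b:
        raise ValueError("expected r_a > r_b")
    ratio = r_a / r_b
    return r_x / r_b - n_a * (n_a - ratio / math.log(ratio))


def is_perovskite(tau: float) -> bool:
    return tau < TAU_THRESHOLD


# SrTiO3 with approximate Shannon radii (assumed values, in angstrom):
# Sr2+ (12-coordinate) ~1.44, Ti4+ (6-coordinate) ~0.605, O2- ~1.40.
tau = bartel_tau(r_a=1.44, r_b=0.605, r_x=1.40, n_a=2)
print(f"tau = {tau:.2f}, perovskite predicted: {is_perovskite(tau)}")
```

With these radii τ comes out near 3.8, below the threshold, consistent with SrTiO3 being a well-known perovskite.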

Frequently Asked Questions

What defines ML for crystal structure prediction?

ML models predict lowest-energy atomic arrangements from stoichiometry, using GNNs trained on DFT databases like OQMD to bypass explicit optimization (Kirklin et al., 2015).

What are core methods used?

Graph neural networks encode crystal graphs, diffusion models generate coordinates, and Gaussian processes rank stability; DScribe provides atomic descriptors (Himanen et al., 2019).

What are key papers?

Schmidt et al. (2019) reviews ML applications; Kirklin et al. (2015) provides the OQMD dataset; Bartel et al. (2019) predicts perovskite stability (2227, 2175, and 1546 citations, respectively).

What open problems exist?

Reliable prediction of low-symmetry and high-pressure phases; enforcing physical constraints in generative models; scaling to 10+ element compositions (Wang and Ma, 2014).

Research Machine Learning in Materials Science with AI

PapersFlow provides specialized AI tools for researchers in your field.

Start Researching Machine Learning for Crystal Structure Prediction with AI

Search 474M+ papers, run AI-powered literature reviews, and write with integrated citations — all in one workspace.