PapersFlow Research Brief

Machine Learning in Materials Science
Research Guide

What is Machine Learning in Materials Science?

Machine learning in materials science is the use of statistical and algorithmic models trained on materials-related data (e.g., structures, simulations, or measurements) to predict properties, guide simulations, or propose candidate materials for targeted applications.

The provided topic corpus contains 125,646 works on machine learning in materials science, indicating a large and mature research area, although a 5-year growth rate is not available in the provided data. Highly cited enabling infrastructure for data generation and interpretation includes simulation software such as "GROMACS 4:  Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation" (2008) and visualization tools such as "Visualization and analysis of atomistic simulation data with OVITO–the Open Visualization Tool" (2009) and "VESTA 3 for three-dimensional visualization of crystal, volumetric and morphology data" (2011). Widely used computational chemistry components that often supply training labels or baseline physics models include "Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen" (1989) and "The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and transition elements: two new functionals and systematic testing of four M06-class functionals and 12 other functionals" (2007).
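In its simplest form, the supervised-prediction idea described above maps a material's composition to a feature vector and fits a model against known property values. A minimal sketch of that pipeline follows; the element vocabulary, compositions, and "band gap" labels are invented toy values, not data from the corpus:

```python
# Minimal sketch of a composition-based property predictor.
# All compositions and labels below are synthetic toy values.
import math

def featurize(composition):
    """Map {element: count} to a fixed-length vector of element fractions."""
    elements = ["Ti", "O", "Zn", "Ga", "N"]  # toy element vocabulary
    total = sum(composition.values())
    return [composition.get(e, 0) / total for e in elements]

train = [
    ({"Ti": 1, "O": 2}, 3.2),  # toy (composition, property) pairs
    ({"Zn": 1, "O": 1}, 3.4),
    ({"Ga": 1, "N": 1}, 3.4),
]

def predict(composition):
    """1-nearest-neighbour prediction in feature space."""
    x = featurize(composition)
    def dist(item):
        return math.dist(x, featurize(item[0]))
    return min(train, key=dist)[1]

print(predict({"Ti": 2, "O": 4}))  # same element fractions as TiO2 -> 3.2
```

Real pipelines replace the toy featurizer and nearest-neighbour model with richer descriptors and learned models, such as those provided by packages like maml listed under Code & Tools below.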

Papers: 125.6K
5-yr Growth: N/A
Total Citations: 913.5K

Research Sub-Topics

Machine Learning for Crystal Structure Prediction

Researchers develop ML models to predict stable crystal structures from chemical composition, bypassing expensive DFT calculations. This sub-topic focuses on graph neural networks and generative models for proposing novel materials with target properties.

15 papers
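Graph neural networks consume structures encoded as graphs, with atoms as nodes and edges between atoms closer than a distance cutoff. A minimal sketch of that encoding, using invented atom positions and an assumed 3.0 Å cutoff:

```python
# Sketch: encoding an atomic structure as a graph (nodes = atoms,
# edges = atom pairs within a distance cutoff), the input format used
# by graph neural networks. Positions (in angstroms) are illustrative.
import math

positions = {  # toy atom positions
    "Na1": (0.0, 0.0, 0.0),
    "Cl1": (2.8, 0.0, 0.0),
    "Na2": (2.8, 2.8, 0.0),
}
CUTOFF = 3.0  # assumed cutoff: edge if interatomic distance < 3.0 A

def build_edges(pos, cutoff):
    atoms = sorted(pos)
    edges = []
    for i, a in enumerate(atoms):
        for b in atoms[i + 1:]:
            if math.dist(pos[a], pos[b]) < cutoff:
                edges.append((a, b))
    return edges

print(build_edges(positions, CUTOFF))
```

Production encoders additionally handle periodic boundary conditions and attach node/edge features (species, distances, angles), but the cutoff-graph idea is the same.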

Accelerating Density Functional Theory with ML

This area uses machine learning to create surrogate models that approximate DFT energies, forces, and properties with near-quantum accuracy at a fraction of the computational cost. Active research includes kernel methods, neural network potentials, and uncertainty quantification for reliable predictions.

12 papers
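One way to picture a surrogate of this kind is kernel-weighted regression over stored reference calculations: a new geometry's energy is predicted from nearby reference points, weighted by a kernel. The sketch below uses the Nadaraya-Watson form; the descriptor values, "energies", and kernel width are synthetic toy numbers, not DFT output:

```python
# Sketch of a kernel-based surrogate: predict a "DFT energy" at a new
# descriptor value as a kernel-weighted average of reference results.
import math

reference = [  # (descriptor value, reference energy) pairs, synthetic
    (0.9, -10.2),
    (1.0, -10.8),
    (1.1, -10.5),
]
SIGMA = 0.05  # Gaussian kernel width, an assumed hyperparameter

def surrogate(x):
    weights = [math.exp(-((x - xi) ** 2) / (2 * SIGMA ** 2))
               for xi, _ in reference]
    total = sum(weights)
    return sum(w * e for w, (_, e) in zip(weights, reference)) / total

print(round(surrogate(1.0), 2))  # dominated by the central reference point
```

Kernel ridge regression and neural network potentials play the same role at scale, trading this simple weighted average for learned weights and richer descriptors.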

ML Potentials for Molecular Dynamics Simulations

ML interatomic potentials trained on quantum data enable accurate, scalable molecular dynamics for materials like alloys and polymers. Researchers study equivariant networks and active learning to improve transferability across chemical spaces.

15 papers
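Active learning in this setting often uses committee disagreement to decide which new configuration most deserves a reference calculation. A toy query-by-committee sketch, with two invented surrogate models standing in for independently trained potentials:

```python
# Sketch of an active-learning selection step: a committee of two toy
# surrogate models disagrees most where new reference data would be
# most informative (query-by-committee; all values are synthetic).

def model_a(x):  # toy committee member 1
    return -10.0 + 0.5 * x

def model_b(x):  # toy committee member 2 (diverges away from x = 1.0)
    return -10.0 + 0.5 * x + 0.3 * (x - 1.0) ** 2

candidates = [0.8, 1.0, 1.5, 2.0]  # toy candidate configurations

def select_next(cands):
    """Pick the candidate with the largest committee disagreement."""
    return max(cands, key=lambda x: abs(model_a(x) - model_b(x)))

print(select_next(candidates))  # the point farthest from the training region
```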

Machine Learning for Inverse Materials Design

Inverse design uses ML to search chemical spaces for materials with specified properties like band gap or elasticity, often via generative models and Bayesian optimization. This sub-topic explores multi-objective optimization and experimental validation pipelines.

15 papers
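Stripped to its core, the inverse-design step is ranking candidates by how close a surrogate's predicted property is to the target, then sending the top of the list to validation. A toy sketch (candidate names and predicted band gaps are invented):

```python
# Sketch of an inverse-design screening step: rank candidates by
# closeness of a surrogate-predicted property to a target value.
# Candidate names and predicted band gaps are invented toy values.

target_gap = 1.5  # eV, desired band gap

predictions = {  # candidate -> surrogate-predicted band gap
    "candidate_A": 0.9,
    "candidate_B": 1.6,
    "candidate_C": 3.2,
}

def rank_candidates(preds, target):
    """Closest-to-target first; top entries go on to DFT/experiment."""
    return sorted(preds, key=lambda name: abs(preds[name] - target))

print(rank_candidates(predictions, target_gap))
```

Bayesian optimization and generative models extend this by proposing new candidates and balancing exploration against exploitation, rather than ranking a fixed list.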

Uncertainty Quantification in Materials ML Models

Researchers investigate methods like Bayesian neural networks and ensemble models to quantify prediction uncertainty in materials ML, essential for decision-making in active learning loops. Focus areas include epistemic and aleatoric uncertainty separation for high-stakes applications.

11 papers
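The ensemble approach mentioned above can be sketched directly: the spread of predictions across independently trained models serves as a proxy for epistemic uncertainty. All numbers below are synthetic:

```python
# Sketch of ensemble-based uncertainty quantification: low spread
# across models -> confident prediction; high spread -> uncertain.
import statistics

ensemble_predictions = {  # toy per-model predictions for two materials
    "material_X": [1.48, 1.52, 1.50, 1.49],  # models agree
    "material_Y": [0.8, 1.9, 1.3, 2.4],      # models disagree
}

def mean_and_std(preds):
    """Ensemble mean as the prediction, standard deviation as uncertainty."""
    return statistics.mean(preds), statistics.stdev(preds)

for name, preds in ensemble_predictions.items():
    mu, sigma = mean_and_std(preds)
    print(f"{name}: {mu:.2f} +/- {sigma:.2f}")
```

In an active-learning loop, high-spread cases like material_Y are exactly the ones routed back for new reference calculations.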

Why It Matters

Machine learning workflows in materials research are commonly built around high-throughput computation and simulation outputs, then validated and interpreted using established modeling and visualization toolchains. For example, molecular simulation outputs produced with Hess et al. (2008) in "GROMACS 4:  Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation" are frequently post-processed and quality-checked using Stukowski (2009) in "Visualization and analysis of atomistic simulation data with OVITO–the Open Visualization Tool" to extract features (e.g., local environments, defects, trajectories) that can serve as ML inputs or evaluation diagnostics. In porous materials discovery and screening, the design space described by Furukawa et al. (2013) in "The Chemistry and Applications of Metal-Organic Frameworks"—noting that “more than 20,000 different MOFs” had been reported—illustrates why ML-based surrogate models and prioritization can be practically valuable: exhaustive experimental or first-principles evaluation over such combinatorial libraries is costly, so learned predictors can be used to rank candidates before committing resources. In molecular and interfacial applications where binding or adsorption is central, Trott and Olson (2009) in "AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading" provides a concrete example of a fast scoring-and-search engine whose outputs can be used as labels, baselines, or filters in ML-driven screening pipelines.
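The screening funnel described above (a cheap approximate scorer first, expensive evaluation only for a shortlist) can be sketched generically. The library size echoes the MOF example; the scoring function is a deterministic stand-in, not a real docking or ML model:

```python
# Sketch of a two-stage screening funnel: a fast approximate scorer
# filters a large candidate library, and only the top-ranked fraction
# is forwarded to expensive evaluation. Entries and scores are toy.

library = [f"mof_{i:05d}" for i in range(20000)]  # toy 20,000-entry library

def fast_score(name):
    """Deterministic stand-in for a cheap ML/docking-style scorer."""
    return int(name.split("_")[1]) * 7919 % 1000  # pretend higher is better

BUDGET = 50  # assumed number of expensive evaluations we can afford

shortlist = sorted(library, key=fast_score, reverse=True)[:BUDGET]
print(len(shortlist), "candidates forwarded to expensive evaluation")
```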

Reading Guide

Where to Start

Start with Furukawa et al. (2013) "The Chemistry and Applications of Metal-Organic Frameworks" because it clearly defines a major materials family with an explicitly stated large design space (“more than 20,000 different MOFs”), making it easy to see why prediction and screening problems arise.

Key Papers Explained

A practical ML-in-materials workflow often begins with physics-based data generation, continues with analysis/visualization, and then supports screening and interpretation. Dunning (1989) "Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen" and Zhao and Truhlar (2007) "The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and transition elements: two new functionals and systematic testing of four M06-class functionals and 12 other functionals" represent core ingredients used to produce quantum-chemistry/DFT labels, while Perdew and Zunger (1981) "Self-interaction correction to density-functional approximations for many-electron systems" frames a key limitation that can propagate into ML training data. For atomistic dynamics data, Hess et al. (2008) "GROMACS 4:  Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation" provides the simulation engine, and Stukowski (2009) "Visualization and analysis of atomistic simulation data with OVITO–the Open Visualization Tool" provides post-processing and feature/defect analysis capabilities. For structural inspection and communication of results, Pettersen et al. (2004) "UCSF Chimera—A visualization system for exploratory research and analysis" and Momma and Izumi (2011) "VESTA 3 for three-dimensional visualization of crystal, volumetric and morphology data" cover complementary visualization needs (molecular/volumetric/crystal structure), supporting dataset curation and error detection before model fitting.

Paper Timeline

1981 · Self-interaction correction to d... · 20.4K cites
1984 · The reflective practitioner: How... · 19.9K cites
1989 · Gaussian basis sets for use in c... · 31.1K cites
2004 · UCSF Chimera—A visualization sys... · 46.5K cites (most cited)
2007 · The M06 suite of density functio... · 29.0K cites
2009 · AutoDock Vina: Improving the spe... · 34.7K cites
2011 · VESTA 3 for three-dimensi... · 23.5K cites

Papers ordered chronologically; the most-cited paper is marked.

Advanced Directions

Within the provided list, the most concrete frontier direction is scaling ML-driven screening and design over very large candidate families like those highlighted in Furukawa et al. (2013) "The Chemistry and Applications of Metal-Organic Frameworks" while maintaining physically grounded labeling pipelines based on the electronic-structure and simulation stack reflected by Dunning (1989), Zhao and Truhlar (2007), and Hess et al. (2008). Another advanced direction is systematic treatment of label noise and bias introduced by approximate physics, motivated by Perdew and Zunger (1981) "Self-interaction correction to density-functional approximations for many-electron systems", because ML models can otherwise learn and amplify these artifacts.

Papers at a Glance

Paper · Year · Venue · Citations
1. UCSF Chimera—A visualization system for exploratory research a... · 2004 · Journal of Computation... · 46.5K
2. AutoDock Vina: Improving the speed and accuracy of docking wit... · 2009 · Journal of Computation... · 34.7K
3. Gaussian basis sets for use in correlated molecular calculatio... · 1989 · The Journal of Chemica... · 31.1K
4. The M06 suite of density functionals for main group thermochem... · 2007 · Theoretical Chemistry ... · 29.0K
5. VESTA 3 for three-dimensional visualization of crystal,... · 2011 · Journal of Applied Cry... · 23.5K
6. Self-interaction correction to density-functional approximatio... · 1981 · Physical review. B, Co... · 20.4K
7. The reflective practitioner: How professionals think in action · 1984 · Patient Education and ... · 19.9K
8. The Chemistry and Applications of Metal-Organic Frameworks · 2013 · Science · 15.8K
9. GROMACS 4:  Algorithms for Highly Efficient, Load-Balanced, an... · 2008 · Journal of Chemical Th... · 15.7K
10. Visualization and analysis of atomistic simulation data with O... · 2009 · Modelling and Simulati... · 15.0K

Code & Tools

GitHub - metatensor/metatomic: Atomistic machine learning models you can use everywhere for everything
github.com

machine learning models, and atomistic simulation engines. Our main goal is to define and train models once, and then be able to re-use them across...

GitHub - PaddlePaddle/PaddleMaterials: PaddleMaterials is a data-mechanism dual-driven, foundation model development and deployment, end to end toolkit based on PaddlePaddle deep learning framework for materials science and engineering.
github.com

**PaddleMaterials**is a data-mechanism dual-driven, development and deployment of AI4Materials foundation models, end to end toolkit based on Paddl...

mala-project
github.com

Materials Learning Algorithms. A framework for machine learning materials properties from first-principles data.

GitHub - dhw059/maml: maml (MAterials Machine Learning) is a Python package that aims to provide useful high-level interfaces that make ML for materials science as easy as possible.
github.com

maml (MAterials Machine Learning) is a Python package that aims to provide useful high-level interfaces that make ML for materials science as easy ...

GitHub - IntelLabs/matsciml: Open MatSci ML Toolkit is a framework for prototyping and scaling out deep learning models for materials discovery supporting widely used materials science datasets, and built on top of PyTorch Lightning, the Deep Graph Library, and PyTorch Geometric.
github.com

Open MatSci ML Toolkit is a framework for prototyping and scaling out deep learning models for materials discovery supporting widely used materials...

Latest Developments

Recent developments in machine learning in materials science include the ongoing evolution of the Materials Project's AI capabilities for accelerated materials discovery, with plans for enhanced computational methods and better data handling (Berkeley Lab, 01/13/2026); the increasing adoption of predictive models tailored to experimental constraints; and the use of generative AI models like DiffSyn for synthesizing complex materials more rapidly (MIT). Additionally, AI-driven approaches such as graph neural networks and autonomous laboratories are significantly transforming R&D timelines and expanding chemical space (Cypris, December 2025), with foundation models and generative AI for crystal structures also emerging as key areas of research (Nature, March 2025; Nature, December 2025).

Frequently Asked Questions

What is machine learning in materials science used for in practice?

Machine learning in materials science is used to predict materials properties, accelerate screening over large candidate sets, and assist analysis of simulation or experimental data. In practice, these pipelines often rely on simulation and post-processing infrastructure such as Hess et al. (2008) "GROMACS 4:  Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation" and Stukowski (2009) "Visualization and analysis of atomistic simulation data with OVITO–the Open Visualization Tool" to generate and interpret the data that ML models learn from.

How do researchers generate training data for ML models in computational materials studies?

A common approach is to generate labels from physics-based calculations and simulations, then pair them with structural representations for supervised learning. Examples of widely used components include Dunning (1989) "Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen" and Zhao and Truhlar (2007) "The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and transition elements: two new functionals and systematic testing of four M06-class functionals and 12 other functionals" as part of electronic-structure workflows.

Which tools are commonly used to visualize and sanity-check materials structures and atomistic trajectories in ML workflows?

Visualization is commonly handled by general-purpose molecular and atomistic viewers that support structures, volumetric data, and trajectories. Pettersen et al. (2004) "UCSF Chimera—A visualization system for exploratory research and analysis", Momma and Izumi (2011) "VESTA 3 for three-dimensional visualization of crystal, volumetric and morphology data", and Stukowski (2009) "Visualization and analysis of atomistic simulation data with OVITO–the Open Visualization Tool" are frequently cited examples of such tooling.

Why does density-functional theory (DFT) still matter in machine learning for materials science?

DFT remains a major source of consistent, computable labels for training and benchmarking ML models, especially when experimental labels are sparse or costly. Methodological choices and known approximation issues—such as the self-interaction problem discussed by Perdew and Zunger (1981) in "Self-interaction correction to density-functional approximations for many-electron systems"—directly affect the quality and transferability of the data used to train ML models.

Which materials classes are often highlighted as large search spaces where ML can help prioritize candidates?

Metal-organic frameworks are a canonical example of a large, chemically tunable family where prioritization is valuable. Furukawa et al. (2013) "The Chemistry and Applications of Metal-Organic Frameworks" states that “more than 20,000 different MOFs” had been reported, illustrating the scale that motivates surrogate modeling and candidate ranking.

Which highly cited methods papers connect to ML-driven screening in molecular and materials contexts?

Docking and fast scoring methods are often used for screening and can provide labels or baselines for ML models in molecular-scale materials problems. Trott and Olson (2009) "AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading" reports an “approximately two orders of magnitude speed-up” compared with AutoDock 4, exemplifying why fast approximate evaluators are useful in high-throughput pipelines.

Open Research Questions

  • How can ML models be trained on simulation data while explicitly accounting for approximation errors and known pathologies in the underlying electronic-structure methods, such as the self-interaction issues discussed in Perdew and Zunger (1981) "Self-interaction correction to density-functional approximations for many-electron systems"?
  • Which representations and learning objectives best preserve the physically meaningful degrees of freedom present in atomistic trajectories produced by Hess et al. (2008) "GROMACS 4:  Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation" while remaining stable to visualization/analysis choices used in Stukowski (2009) "Visualization and analysis of atomistic simulation data with OVITO–the Open Visualization Tool"?
  • How should ML-driven screening objectives be defined and validated for combinatorial materials families at the scale described by Furukawa et al. (2013) "The Chemistry and Applications of Metal-Organic Frameworks" (more than 20,000 MOFs), given that different end uses require different target properties and constraints?
  • How can fast approximate evaluators used in screening, such as Trott and Olson (2009) "AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading", be integrated with ML so that uncertainty and systematic bias are quantified rather than implicitly inherited?
  • What standardized visualization and reporting practices (e.g., via Pettersen et al. (2004) "UCSF Chimera—A visualization system for exploratory research and analysis" and Momma and Izumi (2011) "VESTA 3 for three-dimensional visualization of crystal, volumetric and morphology data") most improve reproducibility and comparability of ML-ready datasets built from heterogeneous simulations and structural sources?

Research Machine Learning in Materials Science with AI

PapersFlow provides specialized AI tools for researchers in your field. Here are the most relevant for this topic:

Start Researching Machine Learning in Materials Science with AI

Search 474M+ papers, run AI-powered literature reviews, and write with integrated citations — all in one workspace.