PapersFlow Research Brief

Biomedical Text Mining and Ontologies
Research Guide

What is Biomedical Text Mining and Ontologies?

Biomedical Text Mining and Ontologies is the development and application of ontologies, text mining, and natural language processing (NLP) techniques to extract, annotate, and integrate knowledge from the biomedical literature. Its scope includes gene annotation, disease data integration, phenotype ontologies, and semantic web technologies for knowledge management.

The field comprises 130,525 works on biomedical ontologies and on text mining of the literature. Key resources include the Gene Ontology, which unifies biology by standardizing how gene and protein functions are described ('Gene Ontology: tool for the unification of biology' by Ashburner et al. (2000), 43,070 citations), and the STRING database, whose protein–protein association networks support functional discovery ('STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets' by Szklarczyk et al. (2018), 18,294 citations).

Topic Hierarchy

Life Sciences → Biochemistry, Genetics and Molecular Biology → Molecular Biology → Biomedical Text Mining and Ontologies

Papers: 130.5K
5-year growth: N/A
Total citations: 720.6K

Why It Matters

Biomedical Text Mining and Ontologies turns a vast literature into structured knowledge for applications such as gene annotation and disease data integration. 'Gene Ontology: tool for the unification of biology' by Ashburner et al. (2000), cited 43,070 times, standardizes terms across databases and so makes cross-species comparisons possible. DAVID, described in 'DAVID: Database for Annotation, Visualization, and Integrated Discovery' by Dennis et al. (2003) (9,342 citations), supports integrated discovery from genome-scale datasets and speeds up analysis in functional genomics. STRING, updated in 'STRING v11' by Szklarczyk et al. (2018) (18,294 citations), integrates protein interactions to support functional insights from genome-wide experiments. More recently, tools such as OntoGPT use large language models (LLMs) for ontology-grounded extraction from text.

Reading Guide

Where to Start

Start with 'Gene Ontology: tool for the unification of biology' by Ashburner et al. (2000; 43,070 citations), which provides the foundational vocabulary for gene function annotation that is central to the field.

Key Papers Explained

'Gene Ontology: tool for the unification of biology' by Ashburner et al. (2000) establishes the core ontology for describing gene function. Later tools build on it: 'DAVID: Database for Annotation, Visualization, and Integrated Discovery' by Dennis et al. (2003) uses such annotations for integrated analysis of gene lists, and 'Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research' by Conesa et al. (2005) transfers GO annotations to new sequences through similarity search. Network resources such as 'STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets' by Szklarczyk et al. (2018) then combine interaction maps with ontology-based functional annotation.

Paper Timeline

  • 1993 · A translation approach to portable ontology specifications · 12.4K citations
  • 2000 · Gene Ontology: tool for the unification of biology · 43.1K citations
  • 2003 · DAVID: Database for Annotation, Visualization, and Integrated Discovery · 9.3K citations
  • 2005 · Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research · 11.8K citations
  • 2008 · Research electronic data capture (REDCap)—A metadata-driven me... · 48.2K citations (most cited)
  • 2014 · STRING v10: protein–protein interaction networks, integrated o... · 10.8K citations
  • 2018 · STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets · 18.3K citations

Papers are listed chronologically; the most-cited paper is marked.

Advanced Directions

Recent preprints focus on LLMs in bio-ontology research, on mention-agnostic extraction for ontological annotation, and on community approaches to knowledge graph quality. Tools such as OntoGPT and SciLinker extend text mining to entity-association extraction and AI-driven knowledge graphs.
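Entity-association mining often starts from a simple baseline: count which entities appear in the same sentence and score each pair. The sketch below is a generic co-occurrence baseline, not SciLinker's actual pipeline (which this brief does not describe). The lexicon, the example sentences, and the use of pointwise mutual information (PMI) as the association score are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence_counts(sentences, lexicon):
    """Count entity mentions and same-sentence entity pairs.

    `lexicon` maps a lowercase surface string to an entity label. Matching is a
    naive substring search, an assumption made to keep the sketch short.
    """
    entity_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        text = sentence.lower()
        # Each entity counts at most once per sentence.
        found = sorted({label for term, label in lexicon.items() if term in text})
        entity_counts.update(found)
        pair_counts.update(combinations(found, 2))  # pairs come out in sorted order
    return entity_counts, pair_counts

def pmi(a, b, entity_counts, pair_counts, n_sentences):
    """Pointwise mutual information (log2) of two entities sharing a sentence."""
    pair = tuple(sorted((a, b)))
    joint = pair_counts[pair]
    if joint == 0:
        return float("-inf")
    return math.log2(joint * n_sentences / (entity_counts[a] * entity_counts[b]))

# Made-up sentences and a hypothetical lexicon.
sentences = [
    "BRCA1 mutations raise breast cancer risk.",
    "TP53 is mutated in many tumours.",
    "BRCA1 and TP53 interact in DNA repair.",
    "Breast cancer screening guidelines.",
]
lexicon = {"brca1": "BRCA1", "tp53": "TP53", "breast cancer": "breast cancer"}
entities, pairs = cooccurrence_counts(sentences, lexicon)
score = pmi("BRCA1", "TP53", entities, pairs, len(sentences))
```

PMI rewards pairs that co-occur more often than their individual frequencies predict; at PubMed scale, real systems typically add proper named-entity recognition and normalization instead of substring matching.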

Papers at a Glance

# | Paper | Year | Venue | Citations
1 | Research electronic data capture (REDCap)—A metadata-driven me... | 2008 | Journal of Biomedical ... | 48.2K
2 | Gene Ontology: tool for the unification of biology | 2000 | Nature Genetics | 43.1K
3 | STRING v11: protein–protein association networks with increase... | 2018 | Nucleic Acids Research | 18.3K
4 | A translation approach to portable ontology specifications | 1993 | Knowledge Acquisition | 12.4K
5 | Blast2GO: a universal tool for annotation, visualization and a... | 2005 | Bioinformatics | 11.8K
6 | STRING v10: protein–protein interaction networks, integrated o... | 2014 | Nucleic Acids Research | 10.8K
7 | DAVID: Database for Annotation, Visualization, and Integrated ... | 2003 | Genome Biology | 9.3K
8 | KEGG: new perspectives on genomes, pathways, diseases and drugs | 2016 | Nucleic Acids Research | 9.0K
9 | Toward principles for the design of ontologies used for knowle... | 1995 | International Journal ... | 7.6K
10 | The STRING database in 2017: quality-controlled protein–protei... | 2016 | Nucleic Acids Research | 7.3K

Recent Preprints

Artificial Intelligence in Biomedical Sciences: A Scoping Review

pmc.ncbi.nlm.nih.gov Preprint

The aim of this scoping review is to explore AI in biomedical sciences. Specific objectives are to synthesize six scopes addressing the characteristics of AI in biomedical sciences and to provide i...

Large Language Models in Bio-Ontology Research: A Review

Nov 2025 mdpi.com Preprint

Biomedical ontologies are critical for structuring domain knowledge and enabling integrative analyses in the life sciences. Traditional ontology development is labor-intensive, requiring extensive ...

Ontologies as the semantic bridge between artificial intelligence and healthcare

Aug 2025 frontiersin.org Preprint

Enrichment involves refining ontologies by adding new concepts, relationships, and data properties, which enhances their completeness and contextual relevance. Artificial intelligence, particularly...

Mention-Agnostic Information Extraction for Ontological Annotation of Biomedical Articles

Nov 2025 hal.science Preprint

propose a novel two-stage system for information extraction where we annotate biomedical articles based on a specific ontology (HOIP). The major challenge is annotating relation between biomedica...

Improving Biomedical Knowledge Graph Quality: A Community Approach

Aug 2025 arxiv.org Preprint

Biomedical knowledge graphs (KGs) are widely used across research and translational settings, yet their design decisions and implementation are often opaque. Unlike ontologies that more frequentl...

Latest Developments

Recent developments in biomedical text mining and ontology research include new methods for ontology enrichment and concept discovery, such as datasets for discovering mentions of concepts missing from a knowledge base (out-of-KB mentions) and for placing new concepts into ontologies such as SNOMED CT using large language models (arXiv, ACM). Pipelines such as RELATE combine ontology constraints with large language models for relation extraction from the biomedical literature, improving standardization and accuracy (arXiv). Large-scale biomedical knowledge graphs and AI systems are also being built to support data-driven research, with recent work emphasizing the interpretability and sustainability of AI models (Nature, CBMS). Overall, the field is integrating NLP, machine learning, and ontological frameworks to improve how biomedical knowledge is extracted and organized (MDPI).
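One way ontology constraints can improve LLM-based relation extraction is by rejecting candidate triples whose subject or object type violates the predicate's declared domain and range. The sketch below shows only that generic filtering idea; it does not reproduce RELATE's pipeline. The schema, entity types, and candidate triples are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical schema: predicate -> (allowed subject types, allowed object types).
SCHEMA = {
    "treats": ({"Drug"}, {"Disease"}),
    "interacts_with": ({"Gene", "Protein"}, {"Gene", "Protein"}),
}

# Hypothetical typing of entities, e.g. produced by an earlier grounding step.
ENTITY_TYPES = {"aspirin": "Drug", "headache": "Disease", "TP53": "Gene", "MDM2": "Protein"}

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str

def satisfies_schema(triple, schema=SCHEMA, types=ENTITY_TYPES):
    """True if the predicate is known and both arguments have allowed types."""
    if triple.predicate not in schema:
        return False
    domain, range_ = schema[triple.predicate]
    # Untyped entities map to None, which is never in an allowed-type set.
    return types.get(triple.subject) in domain and types.get(triple.obj) in range_

# Candidate triples as an LLM might propose them (made up for illustration).
candidates = [
    Triple("aspirin", "treats", "headache"),
    Triple("headache", "treats", "aspirin"),   # wrong direction
    Triple("TP53", "interacts_with", "MDM2"),
    Triple("aspirin", "causes", "headache"),   # predicate not in schema
]
accepted = [t for t in candidates if satisfies_schema(t)]
```

Rejecting rather than repairing violating triples keeps the sketch simple; a production pipeline might instead re-prompt the model or flag the triple for curation.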

Frequently Asked Questions

What is the Gene Ontology?

The Gene Ontology unifies biology by providing a structured vocabulary for gene and protein functions across databases. 'Gene Ontology: tool for the unification of biology' by Ashburner et al. (2000) introduced this tool, cited 43,070 times. It supports consistent annotation in molecular biology research.
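Because GO terms form a hierarchy, an annotation to a specific term implies annotation to all of its more general ancestors, which is what lets annotations from different databases be compared at a common level. The sketch below propagates annotations up a tiny hand-picked fragment of the is_a hierarchy; the term relationships are copied for illustration and should be checked against a current GO release, the gene name is hypothetical, and real GO also uses relations such as part_of that the sketch ignores.

```python
# Toy fragment of the GO is_a hierarchy (term names only; illustrative).
IS_A = {
    "apoptotic process": ["programmed cell death"],
    "programmed cell death": ["cell death"],
    "cell death": ["cellular process"],
    "cellular process": ["biological_process"],
    "biological_process": [],
}

def ancestors(term, is_a=IS_A):
    """All terms reachable from `term` by following is_a edges upward."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen

def propagate(annotations, is_a=IS_A):
    """Map each gene to its direct terms plus every ancestor of those terms."""
    return {
        gene: set(terms).union(*(ancestors(t, is_a) for t in terms))
        for gene, terms in annotations.items()
    }

# "GENE_X" is a hypothetical gene annotated directly to one specific term.
annotated = propagate({"GENE_X": ["apoptotic process"]})
```

A query for all genes involved in "cell death" now finds GENE_X even though it was never annotated to that term directly.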

How does STRING support biomedical research?

STRING provides protein-protein association networks from known and predicted interactions. 'STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets' by Szklarczyk et al. (2018) expanded coverage, with 18,294 citations. It aids functional discovery in large datasets.
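STRING scores each protein pair separately per evidence channel (for example co-expression or text mining) and then merges the channels into one confidence score. The sketch below follows the general shape of that scheme as I understand it: remove a background prior from each channel, combine the channels assuming independence, then add the prior back once. The prior value 0.041 and the clamping of sub-prior scores to zero are assumptions; check the current STRING help pages before relying on them.

```python
from math import prod

def combine_channel_scores(scores, prior=0.041):
    """Combine per-channel confidence scores in [0, 1] into a single score.

    `prior` is an assumed background probability that two random proteins
    are associated; it is removed per channel and added back at the end.
    """
    # Remove the prior from each channel; scores at or below it contribute nothing.
    corrected = [max(0.0, (s - prior) / (1 - prior)) for s in scores]
    # Probability that at least one channel is right, assuming channel independence.
    combined = 1 - prod(1 - s for s in corrected)
    # Add the prior back once.
    return combined + prior * (1 - combined)

single = combine_channel_scores([0.5])
merged = combine_channel_scores([0.7, 0.6])
```

With one channel the score passes through unchanged; with several independent channels agreeing, the combined confidence exceeds any single channel.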

What is Blast2GO used for?

Blast2GO assigns Gene Ontology annotations to sequences that lack prior GO data and provides visualization and analysis of the results. 'Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research' by Conesa et al. (2005), cited 11,766 times, transfers annotations from similar sequences found by BLAST searches. It supports data mining in functional genomics.
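The core idea of similarity-based annotation transfer is that a GO term found on a sufficiently similar, well-annotated hit is assigned to the query sequence. The sketch below keeps only a simplified core: filter hits by e-value, score each term as hit similarity times an evidence-code weight, keep the best score per term, and apply a cut-off. The real Blast2GO annotation rule is more involved; the weights, thresholds, and term names here are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical evidence-code weights: experimental evidence trusted more than electronic.
EC_WEIGHTS = {"EXP": 1.0, "IDA": 1.0, "ISS": 0.8, "IEA": 0.7}

@dataclass
class Hit:
    e_value: float
    similarity: float   # percent similarity to the query, 0..100
    go_terms: dict      # GO term -> evidence code attached to the hit

def transfer_annotations(hits, max_e_value=1e-6, threshold=55.0):
    """Return {term: score} for terms whose best score reaches `threshold`."""
    best = {}
    for hit in hits:
        if hit.e_value > max_e_value:
            continue  # discard weak BLAST hits
        for term, code in hit.go_terms.items():
            score = hit.similarity * EC_WEIGHTS.get(code, 0.0)
            best[term] = max(best.get(term, 0.0), score)
    return {term: score for term, score in best.items() if score >= threshold}

hits = [
    Hit(1e-30, 90.0, {"term_A": "IDA", "term_B": "IEA"}),
    Hit(1e-2, 99.0, {"term_C": "EXP"}),   # fails the e-value filter
    Hit(1e-10, 60.0, {"term_B": "ISS"}),
]
annotations = transfer_annotations(hits)
```

Taking the maximum rather than summing over hits prevents many weak, redundant hits from inflating a term's score.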

What role do ontologies play in knowledge sharing?

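Ontologies provide portable, shared specifications of knowledge that can be reused across systems. 'A translation approach to portable ontology specifications' by Gruber (1993), cited 12,400 times, defined an ontology as an explicit specification of a conceptualization and proposed translating a single ontology specification into the representation languages of different systems. Ontologies in turn enable semantic integration in biomedical applications.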
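The translation approach means writing an ontology once in a neutral form and generating system-specific encodings from it. The sketch below illustrates that idea with two modern targets, OBO-style term stanzas and OWL functional-style axioms; Gruber's own work targeted other knowledge-representation systems. The class IDs, labels, and the `ex:` prefix are hypothetical, and the OWL output omits the Prefix and Ontology wrapper a complete file needs.

```python
# Hypothetical neutral specification: class id -> (label, parent id or None).
SPEC = {
    "EX:0001": ("cell death", None),
    "EX:0002": ("programmed cell death", "EX:0001"),
}

def to_obo(spec):
    """Render the neutral spec as OBO-style [Term] stanzas."""
    stanzas = []
    for term_id, (label, parent) in spec.items():
        lines = ["[Term]", f"id: {term_id}", f"name: {label}"]
        if parent is not None:
            lines.append(f"is_a: {parent} ! {spec[parent][0]}")
        stanzas.append("\n".join(lines))
    return "\n\n".join(stanzas)

def to_owl_functional(spec, prefix="ex:"):
    """Render the same spec as OWL functional-style axioms (no Ontology wrapper)."""
    def iri(term_id):
        return prefix + term_id.replace(":", "_")

    axioms = []
    for term_id, (label, parent) in spec.items():
        axioms.append(f"Declaration(Class({iri(term_id)}))")
        axioms.append(f'AnnotationAssertion(rdfs:label {iri(term_id)} "{label}")')
        if parent is not None:
            axioms.append(f"SubClassOf({iri(term_id)} {iri(parent)})")
    return "\n".join(axioms)

obo_text = to_obo(SPEC)
owl_text = to_owl_functional(SPEC)
```

Both outputs come from the same source of truth, so a change to a label or parent in `SPEC` propagates to every target format.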

How does DAVID facilitate data analysis?

DAVID offers annotation, visualization, and integrated discovery for genome-scale data. 'DAVID: Database for Annotation, Visualization, and Integrated Discovery' by Dennis et al. (2003), cited 9,342 times, links analysis results back to primary data sources. It helps researchers move from raw gene lists to biological interpretation.
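A central step in moving from a gene list to biological interpretation is term enrichment: asking whether a GO term annotates more genes in the list than chance would predict. The sketch below computes the plain one-sided hypergeometric tail probability. DAVID itself reports the EASE score, a modified and more conservative variant of Fisher's exact test, which this sketch does not reproduce. The argument names are assumptions.

```python
from math import comb

def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: annotated genes in the background
    K: background genes annotated to the term
    n: genes in the user's list
    k: list genes annotated to the term
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probabilities of seeing k or more annotated genes in the list.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

# Example: 10 of 200 listed genes hit a term covering 100 of 20,000 background genes.
p = enrichment_p_value(10, 200, 100, 20_000)
```

With 200 genes drawn from this background, about 1 annotated gene would be expected, so 10 yields a very small p-value; in practice the p-values of many terms are then corrected for multiple testing.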

What is the current state of text mining in this field?

Recent preprints highlight LLMs for bio-ontology research and information extraction. Tools such as OntoGPT combine LLMs with ontology grounding for structured extraction, and NaCTeM (the UK National Centre for Text Mining) provides text mining services for UK academics.
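Ontology grounding means mapping each text span an extractor returns to an ontology identifier, so that "heart attack" and "myocardial infarction" end up as the same concept. The sketch below shows only an exact-match grounding step over labels and synonyms; OntoGPT's own grounding machinery is not reproduced, the LLM extraction step is omitted, and the ontology slice and `EX:` identifiers are hypothetical.

```python
import re

# Hypothetical ontology slice: ID -> preferred label and synonyms.
ONTOLOGY = {
    "EX:0001": {"label": "myocardial infarction", "synonyms": ["heart attack", "MI"]},
    "EX:0002": {"label": "hypertension", "synonyms": ["high blood pressure"]},
}

def normalize(text):
    """Lowercase and collapse whitespace so trivial variants match."""
    return re.sub(r"\s+", " ", text.strip().lower())

def build_index(ontology):
    """Map every normalized label and synonym to its ontology ID."""
    index = {}
    for term_id, entry in ontology.items():
        for name in [entry["label"], *entry["synonyms"]]:
            index[normalize(name)] = term_id
    return index

def ground(spans, index):
    """Map each extracted span to an ontology ID, or None if it cannot be grounded."""
    return {span: index.get(normalize(span)) for span in spans}

# Spans as an upstream extractor might return them (made up for illustration).
grounded = ground(["Heart  attack", "hypertension", "gout"], build_index(ONTOLOGY))
```

Returning None instead of guessing keeps ungrounded spans visible for review; note that lowercasing short abbreviations such as "MI" can cause collisions in a real, larger ontology.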

Open Research Questions

  • How can LLMs automate labor-intensive ontology curation while maintaining semantic accuracy?
  • What methods best extract implicit relations between biomedical processes that are not explicitly mentioned in text?
  • How can construction and documentation practices for biomedical knowledge graphs be standardized?
  • Which co-occurrence and relation extraction techniques work best for mining large-scale entity associations from PubMed?
  • How can ontologies bridge AI techniques such as NLP and clinical data for healthcare integration?
