PapersFlow Research Brief
Biomedical Text Mining and Ontologies
Research Guide
What is Biomedical Text Mining and Ontologies?
Biomedical Text Mining and Ontologies is the development and application of ontologies, text mining, and natural language processing techniques to extract, annotate, and integrate knowledge from biomedical literature, including gene annotation, disease integration, phenotype ontology, and semantic web technologies for knowledge management.
This field encompasses 130,525 works on biomedical ontologies and text mining for literature processing. Key resources such as the Gene Ontology unify biology by standardizing descriptions of gene and protein function, as shown in 'Gene Ontology: tool for the unification of biology' by Ashburner et al. (2000), with 43,070 citations. Protein-protein association networks in the STRING database support functional discovery; 'STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets' by Szklarczyk et al. (2018) has received 18,294 citations.
Research Sub-Topics
Biomedical Ontologies
This sub-topic develops and evaluates standardized ontologies such as the Gene Ontology and SNOMED CT for biomedical knowledge representation. Researchers focus on ontology engineering, interoperability, and semantic consistency.
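To make ontology engineering concrete: ontologies such as the Gene Ontology are commonly distributed in the OBO flat-file format, where each [Term] stanza has an id, a name, and is_a links to parent terms. The minimal sketch below parses a tiny hand-written fragment and computes a term's ancestors by following is_a links. The EX: identifiers are hypothetical, and the parser handles only the id, name, and is_a tags; a real project would use a dedicated OBO library.

```python
"""Minimal OBO [Term] parser and is_a ancestor lookup (illustrative only)."""

# Hand-written toy fragment; the EX: IDs are hypothetical, not real terms.
OBO_TEXT = """\
format-version: 1.2

[Term]
id: EX:0000001
name: biological process

[Term]
id: EX:0000002
name: cell death
is_a: EX:0000001 ! biological process

[Term]
id: EX:0000003
name: programmed cell death
is_a: EX:0000002 ! cell death

[Typedef]
id: part_of
name: part of
"""


def parse_obo(text):
    """Return {term_id: {"id", "name", "is_a": [parent ids]}} for [Term] stanzas."""
    terms = {}
    current = None  # None means "not inside a [Term] stanza"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; keep only [Term] stanzas (skip [Typedef] etc.).
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, or non-Term stanzas
        tag, value = line.split(":", 1)
        value = value.split("!", 1)[0].strip()  # drop trailing "! comment"
        if tag == "id":
            current["id"] = value
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
    return terms


def ancestors(terms, term_id):
    """All terms reachable from term_id via is_a links (term_id itself excluded)."""
    seen = set()
    stack = list(terms[term_id]["is_a"])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(terms[parent]["is_a"])
    return seen


terms = parse_obo(OBO_TEXT)
print(terms["EX:0000003"]["name"])             # programmed cell death
print(sorted(ancestors(terms, "EX:0000003")))  # ['EX:0000001', 'EX:0000002']
```

Ancestor sets like this are what annotation tools typically use to propagate an annotation on a specific term to its more general parents.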
Biomedical Text Mining
This sub-topic applies machine learning and NLP to extract entities, relations, and events from PubMed literature. Researchers improve techniques for named entity recognition and information retrieval.
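A common starting point for named entity recognition, before any machine learning is added, is a dictionary baseline. The sketch below shows one: greedy longest-match lookup of token sequences in a lexicon. The lexicon and its GENE:/DIS: identifiers are hypothetical; real systems draw synonyms from curated gene and disease vocabularies and add disambiguation that this sketch omits.

```python
import re

# Hypothetical lexicon: lowercase surface form -> (entity type, identifier).
# The GENE:/DIS: identifiers are placeholders, not real database accessions.
LEXICON = {
    "tumor necrosis factor": ("Gene", "GENE:1"),
    "tnf": ("Gene", "GENE:1"),
    "rheumatoid arthritis": ("Disease", "DIS:1"),
    "arthritis": ("Disease", "DIS:2"),
}
MAX_WORDS = max(len(key.split()) for key in LEXICON)


def tag_entities(text, lexicon=LEXICON, max_words=MAX_WORDS):
    """Greedy longest-match dictionary NER; returns (span, type, id, start, end)."""
    tokens = [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    hits, i = [], 0
    while i < len(tokens):
        # Try the longest phrase first so "rheumatoid arthritis" beats "arthritis".
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            phrase = " ".join(tok for tok, _, _ in tokens[i:i + n])
            if phrase in lexicon:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                etype, ident = lexicon[phrase]
                hits.append((text[start:end], etype, ident, start, end))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this token
    return hits


sentence = "TNF levels are elevated in rheumatoid arthritis."
for span, etype, ident, start, end in tag_entities(sentence):
    print(span, etype, ident, start, end)
# TNF Gene GENE:1 0 3
# rheumatoid arthritis Disease DIS:1 27 47
```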
Gene Annotation
This sub-topic automates functional annotation of genes and proteins using tools like Blast2GO and DAVID. Researchers enhance accuracy through integration of experimental and computational evidence.
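Similarity-based annotation transfer, the idea behind tools such as Blast2GO, can be sketched as follows: keep sequence-similarity hits that pass identity and E-value cutoffs, then assign the GO terms that enough of those hits share. The thresholds and the min_support voting rule are illustrative assumptions, not Blast2GO's actual annotation rule, and the hits are made-up records standing in for parsed BLAST output.

```python
from collections import Counter


def transfer_annotations(hits, min_identity=40.0, max_evalue=1e-10, min_support=2):
    """Assign GO terms to a query sequence from its similarity-search hits.

    hits: iterable of (subject_id, percent_identity, evalue, go_terms).
    A GO term is kept if at least `min_support` qualifying hits carry it.
    The default cutoffs are illustrative, not taken from any specific tool.
    """
    support = Counter()
    for _subject, identity, evalue, go_terms in hits:
        if identity >= min_identity and evalue <= max_evalue:
            support.update(set(go_terms))  # count each term once per hit
    return {term for term, count in support.items() if count >= min_support}


# Hypothetical hits for one query protein.
hits = [
    ("P1", 85.0, 1e-50, {"GO:0003824", "GO:0005737"}),  # catalytic activity, cytoplasm
    ("P2", 62.0, 1e-30, {"GO:0003824"}),
    ("P3", 30.0, 1e-3, {"GO:0005634"}),   # nucleus; fails both cutoffs
    ("P4", 55.0, 1e-20, {"GO:0005634"}),
]
print(transfer_annotations(hits))  # {'GO:0003824'}
```

Raising min_support trades coverage for precision; lowering it to 1 would also accept the cytoplasm and nucleus terms here.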
Phenotype Ontology
This sub-topic constructs ontologies such as the Human Phenotype Ontology to standardize phenotypic descriptions across species and diseases. Researchers apply them to genotype-phenotype mapping and rare disease diagnostics.
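Genotype-phenotype matching with a phenotype ontology typically compares a patient's phenotype terms with each disease's terms while crediting shared ancestors, so that "focal seizure" partially matches "seizure". The sketch below uses a simple measure, the Jaccard index of the two ancestor-closed term sets (sometimes called simUI). The term IDs are hypothetical, not real HPO terms, and production tools commonly use information-content-based measures such as Resnik similarity instead.

```python
# Toy phenotype hierarchy, child -> parents. IDs are hypothetical, not HPO terms.
PARENTS = {
    "P:root": [],
    "P:nervous_system": ["P:root"],
    "P:seizure": ["P:nervous_system"],
    "P:focal_seizure": ["P:seizure"],
    "P:skeletal_system": ["P:root"],
    "P:scoliosis": ["P:skeletal_system"],
}


def closure(terms, parents=PARENTS):
    """Return the given terms together with all of their ancestors."""
    result, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in result:
            result.add(term)
            stack.extend(parents[term])
    return result


def profile_similarity(a, b):
    """Jaccard index of two ancestor-closed phenotype profiles (simUI-style)."""
    ca, cb = closure(a), closure(b)
    return len(ca & cb) / len(ca | cb)


patient = {"P:focal_seizure"}
diseases = {"disease_A": {"P:seizure"}, "disease_B": {"P:scoliosis"}}
for name, terms in diseases.items():
    print(name, round(profile_similarity(patient, terms), 3))
# disease_A 0.75
# disease_B 0.167
```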
Biomedical Data Integration
This sub-topic develops methods to harmonize heterogeneous biomedical data sources using semantic web technologies like RDF and SPARQL. Researchers address challenges in linking genomic, clinical, and literature data.
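In RDF, every fact is a (subject, predicate, object) triple, and a SPARQL query is built from triple patterns that contain variables. The dependency-free sketch below evaluates a conjunction of such patterns (a basic graph pattern) to link a clinical, a genomic, and a literature-derived fact. It is not a SPARQL engine, and the ex: identifiers, predicates, and patient record are hypothetical; in practice an RDF library or triple store would run the equivalent SPARQL query.

```python
# Toy triples from three hypothetical sources (all ex: names are made up).
TRIPLES = [
    ("ex:BRCA1", "ex:associatedWith", "ex:BreastCancer"),    # genomic database
    ("ex:TP53", "ex:associatedWith", "ex:BreastCancer"),     # genomic database
    ("ex:BRCA1", "ex:mentionedIn", "ex:Paper1"),             # literature mining
    ("ex:Patient7", "ex:diagnosedWith", "ex:BreastCancer"),  # clinical record
]


def match(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`; return None on conflict."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):  # variable: bind it or check consistency
            if new.setdefault(p, t) != t:
                return None
        elif p != t:  # constant: must match exactly
            return None
    return new


def query(patterns, triples):
    """Join triple patterns left to right, like a SPARQL basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Roughly: SELECT ?g ?paper WHERE { ex:Patient7 ex:diagnosedWith ?d .
#          ?g ex:associatedWith ?d . ?g ex:mentionedIn ?paper }
results = query(
    [("ex:Patient7", "ex:diagnosedWith", "?d"),
     ("?g", "ex:associatedWith", "?d"),
     ("?g", "ex:mentionedIn", "?paper")],
    TRIPLES,
)
print(results)
# [{'?d': 'ex:BreastCancer', '?g': 'ex:BRCA1', '?paper': 'ex:Paper1'}]
```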
Why It Matters
Biomedical Text Mining and Ontologies enables the extraction of structured knowledge from the vast literature for applications such as gene annotation and disease data integration. For example, 'Gene Ontology: tool for the unification of biology' by Ashburner et al. (2000) standardizes terms across databases, facilitating cross-species comparisons; the paper has 43,070 citations. DAVID supports integrated discovery from genome-scale datasets, as described in 'DAVID: Database for Annotation, Visualization, and Integrated Discovery' by Dennis et al. (2003) (9,342 citations), accelerating analysis in functional genomics. STRING, updated in 'STRING v11' by Szklarczyk et al. (2018) (18,294 citations), integrates known and predicted protein interactions to support functional insights from genome-wide experiments. Newer tools such as OntoGPT use large language models (LLMs) for ontology-grounded extraction from text.
Reading Guide
Where to Start
Start with 'Gene Ontology: tool for the unification of biology' by Ashburner et al. (2000), which provides the foundational vocabulary for gene function annotation that is central to the field (43,070 citations).
Key Papers Explained
'Gene Ontology: tool for the unification of biology' by Ashburner et al. (2000) establishes the core ontology principles. 'DAVID: Database for Annotation, Visualization, and Integrated Discovery' by Dennis et al. (2003) builds on such annotations for integrated analysis of gene lists, and 'Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research' by Conesa et al. (2005) extends them to sequence-based annotation. These lead toward network resources such as 'STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets' by Szklarczyk et al. (2018), which combines interaction networks with ontology-based functional annotation.
Advanced Directions
Preprints focus on LLMs in bio-ontology research, mention-agnostic extraction for ontological annotation, and community approaches to knowledge graph quality. Tools such as OntoGPT (LLM-based, ontology-grounded extraction) and SciLinker (large-scale mapping of associations among biological entities) extend text mining toward structured outputs such as knowledge graphs.
In the News
National Centre for Text Mining (NaCTeM)
The National Centre for Text Mining (NaCTeM) describes itself as the first publicly funded text mining centre in the world. It provides text mining services in response to the requirements of the UK academic community.
SciLinker: a large-scale text mining framework for mapping associations among biological entities
Conclusion: SciLinker represents a novel text mining approach that extracts and quantifies associations between biomedical entities through co-occurrence analysis and relationship extraction fr...
Large Language Models in Bio-Ontology Research
Biomedical ontologies are critical for structuring domain knowledge and enabling integrative analyses in the life sciences. Traditional ontology development is labor-intensive, requiring extensive ...
A comprehensive large scale biomedical knowledge graph for AI powered data driven biomedical research
Datalinx AI Raises $4.2M Seed Round to Solve Data Readiness Challenges for Enterprise Marketing Data Management
January 29, 2026
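The SciLinker item in this list describes extracting associations through co-occurrence analysis. One common way to score co-occurrence, assumed here rather than taken from SciLinker, is pointwise mutual information (PMI) over documents: PMI(a, b) = log2(P(a, b) / (P(a) P(b))), with each probability estimated as a fraction of documents. The sketch takes entity sets that an upstream NER step has already produced; the example documents are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def cooccurrence_pmi(documents):
    """Count entity pairs per document and score each pair by PMI.

    documents: list of iterables of entity names found in each document
    (for example, one abstract). Returns (pair_counts, pmi_scores) keyed by
    sorted (a, b) tuples.
    """
    n_docs = len(documents)
    entity_counts = Counter()
    pair_counts = Counter()
    for entities in documents:
        unique = sorted(set(entities))  # count each entity once per document
        entity_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    pmi = {
        (a, b): log2(n_ab * n_docs / (entity_counts[a] * entity_counts[b]))
        for (a, b), n_ab in pair_counts.items()
    }
    return pair_counts, pmi


docs = [
    {"BRCA1", "breast cancer"},
    {"BRCA1", "breast cancer"},
    {"TP53", "breast cancer"},
    {"TP53"},
]
counts, pmi = cooccurrence_pmi(docs)
print(counts[("BRCA1", "breast cancer")])         # 2
print(round(pmi[("BRCA1", "breast cancer")], 3))  # 0.415
print(round(pmi[("TP53", "breast cancer")], 3))   # -0.585
```

Positive PMI means a pair co-occurs more often than independence would predict. Because PMI is unstable for rare pairs, a minimum co-occurrence count is usually applied before ranking.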
Code & Tools
OntoGPT is a Python package for extracting structured information from text with larg...
A tool for mapping free-text descriptions of (biomedical) entities to ontology terms
Tools for biological identifiers, names, synonyms, xrefs, hierarchies, relations, and properties through the perspective of OBO.
Onto2Vec is a program that can be used to produce feature vectors for biological entities based on their annotations to biomedical ontologies. Onto...
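Several of the tools listed above map free-text descriptions to ontology terms. The sketch below illustrates only the basic idea, using fuzzy string similarity from Python's standard library (difflib.SequenceMatcher) against each term's labels and synonyms; it is not how any listed tool works internally. The EX: term IDs, labels, and the 0.8 cutoff are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical ontology slice: term ID -> primary label plus synonyms.
ONTOLOGY = {
    "EX:001": ["myocardial infarction", "heart attack"],
    "EX:002": ["type 2 diabetes mellitus", "adult-onset diabetes"],
    "EX:003": ["hypertension", "high blood pressure"],
}


def normalize(text):
    """Lowercase, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(text.lower().replace("-", " ").split())


def map_to_term(text, ontology=ONTOLOGY, min_score=0.8):
    """Return (term_id, matched_label, score) for the closest label, or None."""
    query = normalize(text)
    best = None
    for term_id, labels in ontology.items():
        for label in labels:
            score = SequenceMatcher(None, query, normalize(label)).ratio()
            if best is None or score > best[2]:
                best = (term_id, label, score)
    return best if best is not None and best[2] >= min_score else None


print(map_to_term("Heart-attack"))           # ('EX:001', 'heart attack', 1.0)
print(map_to_term("high blood presure")[0])  # EX:003
print(map_to_term("photosynthesis"))         # None
```

Matching against synonyms as well as primary labels is what lets lay phrasing such as "heart attack" resolve to the formal term.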
Recent Preprints
Artificial Intelligence in Biomedical Sciences: A Scoping Review
The aim of this scoping review is to explore AI in biomedical sciences. Specific objectives are to synthesize six scopes addressing the characteristics of AI in biomedical sciences and to provide i...
Large Language Models in Bio-Ontology Research: A Review
Biomedical ontologies are critical for structuring domain knowledge and enabling integrative analyses in the life sciences. Traditional ontology development is labor-intensive, requiring extensive ...
Ontologies as the semantic bridge between artificial intelligence and healthcare
Enrichment involves refining ontologies by adding new concepts, relationships, and data properties, which enhances their completeness and contextual relevance. Artificial intelligence, particularly...
Mention-Agnostic Information Extraction for Ontological Annotation of Biomedical Articles
The authors propose a novel two-stage information extraction system that annotates biomedical articles based on a specific ontology (HOIP). The major challenge is annotating relation between biomedica...
Improving Biomedical Knowledge Graph Quality: A Community Approach
Biomedical knowledge graphs (KGs) are widely used across research and translational settings, yet their design decisions and implementation are often opaque. Unlike ontologies that more frequentl...
Latest Developments
Recent developments in biomedical text mining and ontologies include advances in ontology enrichment and concept discovery, such as new datasets for out-of-knowledge-base (out-of-KB) mention discovery and for LLM-based concept placement in ontologies such as SNOMED CT (arXiv, ACM). Pipelines such as RELATE combine ontology constraints with large language models for relation extraction from biomedical literature, improving standardization and accuracy (arXiv). Large-scale biomedical knowledge graphs and AI systems are also being developed to support data-driven research, with recent work emphasizing interpretability and sustainability in AI models (Nature, CBMS). Overall, the field is integrating NLP, machine learning, and ontological frameworks to improve how biomedical knowledge is extracted and organized (MDPI).
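RELATE, mentioned above, uses ontology constraints to standardize relations proposed by a language model. Its details are not given here, so the sketch below shows only the generic idea: reject candidate triples whose relation is absent from the ontology, or whose subject and object types violate the relation's declared domain and range. The schema, type assignments, and candidate triples are hypothetical.

```python
# Hypothetical relation schema: relation -> (domain type, range type).
SCHEMA = {
    "treats": ("Drug", "Disease"),
    "inhibits": ("Drug", "Protein"),
    "associated_with": ("Gene", "Disease"),
}

# Hypothetical entity typing, e.g. from an upstream NER/normalization step.
ENTITY_TYPES = {
    "imatinib": "Drug",
    "BCR-ABL": "Protein",
    "chronic myeloid leukemia": "Disease",
    "BRCA1": "Gene",
}


def satisfies_schema(triple, schema=SCHEMA, types=ENTITY_TYPES):
    """True if the relation exists and subject/object types fit domain/range."""
    subject, relation, obj = triple
    if relation not in schema:
        return False  # relation not defined in the ontology
    domain, range_ = schema[relation]
    return types.get(subject) == domain and types.get(obj) == range_


# Candidate triples, e.g. proposed by a language model from a sentence.
candidates = [
    ("imatinib", "treats", "chronic myeloid leukemia"),
    ("imatinib", "inhibits", "BCR-ABL"),
    ("chronic myeloid leukemia", "treats", "imatinib"),  # reversed: wrong domain
    ("BRCA1", "regulates", "chronic myeloid leukemia"),  # relation not in schema
]
accepted = [t for t in candidates if satisfies_schema(t)]
print(accepted)
# [('imatinib', 'treats', 'chronic myeloid leukemia'), ('imatinib', 'inhibits', 'BCR-ABL')]
```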
Frequently Asked Questions
What is the Gene Ontology?
The Gene Ontology unifies biology by providing a structured vocabulary for gene and protein functions across databases. 'Gene Ontology: tool for the unification of biology' by Ashburner et al. (2000) introduced this tool, cited 43,070 times. It supports consistent annotation in molecular biology research.
How does STRING support biomedical research?
STRING provides protein-protein association networks from known and predicted interactions. 'STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets' by Szklarczyk et al. (2018) expanded coverage, with 18,294 citations. It aids functional discovery in large datasets.
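STRING reports one combined confidence score per protein pair, derived from separate evidence channels (for example co-expression, experiments, curated databases, and text mining). A sketch of the combination, assuming independent channels and following the prior-corrected scheme described in STRING's documentation: remove the prior p from each channel score, combine as S = 1 - ∏(1 - s_i), then add the prior back once. The prior value 0.041 and the example channel scores are assumptions for illustration; STRING's own documentation is the authority on the exact procedure.

```python
from math import prod

PRIOR = 0.041  # assumed prior probability that two random proteins are linked


def combined_score(channel_scores, prior=PRIOR):
    """Combine per-channel scores in [0, 1] into one confidence score.

    Assumes channels are independent; each channel score already includes
    the prior, so it is removed per channel and added back once at the end.
    """
    without_prior = [max(0.0, (s - prior) / (1 - prior)) for s in channel_scores]
    # Probability that at least one channel is correct.
    combined = 1 - prod(1 - s for s in without_prior)
    return combined + prior * (1 - combined)


print(round(combined_score([0.7]), 3))       # 0.7 (a single channel is unchanged)
print(round(combined_score([0.5, 0.5]), 3))  # 0.739
```

Removing the prior before combining keeps the random-chance baseline from being counted once per channel, which would otherwise inflate scores for pairs with many weak channels.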
What is Blast2GO used for?
Blast2GO enables Gene Ontology annotation, visualization, and analysis for sequences that lack prior GO annotation. 'Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research' by Conesa et al. (2005), cited 11,766 times, assigns GO terms by transferring annotations from sequence-similarity (BLAST) hits. It supports data mining in functional genomics research.
What role do ontologies play in knowledge sharing?
Ontologies specify portable knowledge representations for sharing across systems. 'A translation approach to portable ontology specifications' by Gruber (1993) defined principles for this, with 12,400 citations. They enable semantic integration in biomedical applications.
How does DAVID facilitate data analysis?
DAVID offers annotation, visualization, and integrated discovery for genome-scale data. 'DAVID: Database for Annotation, Visualization, and Integrated Discovery' by Dennis et al. (2003), cited 9,342 times, links analysis results back to primary data. It helps researchers move from raw gene lists to biological insight.
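Moving from a gene list to biological insight usually starts with an over-representation test: is an annotation term carried by more list genes than chance predicts? The sketch below computes a one-sided hypergeometric p-value with the standard library. DAVID itself reports a more conservative modified Fisher exact test (the EASE score), so this is a simplified stand-in, and the counts in the example are hypothetical.

```python
from math import comb


def enrichment_p(list_hits, list_size, term_size, background_size):
    """One-sided hypergeometric p-value, P(X >= list_hits).

    X counts list genes annotated to the term when `list_size` genes are drawn
    without replacement from a background of `background_size` genes,
    `term_size` of which carry the term.
    """
    total = comb(background_size, list_size)
    upper = min(term_size, list_size)
    tail = sum(
        comb(term_size, i) * comb(background_size - term_size, list_size - i)
        for i in range(list_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 3 of 300 list genes carry a term that 40 of 30,000
# background genes carry (about 0.4 list genes expected by chance).
print(f"{enrichment_p(3, 300, 40, 30_000):.2e}")
```

In a real analysis this test runs once per annotation term, so the resulting p-values need multiple-testing correction before terms are reported as enriched.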
What is the current state of text mining in this field?
Recent preprints highlight LLMs for bio-ontology research and information extraction. Tools like OntoGPT use LLMs with ontology grounding for structured extraction. NaCTeM provides text mining services for UK academics.
Open Research Questions
- How can LLMs automate labor-intensive ontology curation while maintaining semantic accuracy?
- What methods best extract implicit relations between biomedical processes that are not explicitly mentioned in text?
- How can construction and documentation practices for biomedical knowledge graphs be standardized?
- Which co-occurrence and relationship extraction techniques best support large-scale mapping of entity associations from PubMed?
- How do ontologies bridge AI techniques such as NLP with clinical data for healthcare integration?
Recent Trends
Preprints from the last 6 months emphasize large language models for automating ontology development, as in 'Large Language Models in Bio-Ontology Research: A Review,' and mention-agnostic extraction in 'Mention-Agnostic Information Extraction for Ontological Annotation of Biomedical Articles.' News covers SciLinker for co-occurrence analysis from PubMed and comprehensive biomedical knowledge graphs for AI research.
Tools like OntoGPT integrate LLMs with ontology grounding; the field has 130,525 works.
Research Biomedical Text Mining and Ontologies with AI
PapersFlow provides specialized AI tools for Biochemistry, Genetics and Molecular Biology researchers. Here are the most relevant for this topic:
AI Literature Review
Automate paper discovery and synthesis across 474M+ papers
Paper Summarizer
Get structured summaries of any paper in seconds
Deep Research Reports
Multi-source evidence synthesis with counter-evidence