Subtopic Deep Dive

Data Reuse in Computational Workflows
Research Guide

What is Data Reuse in Computational Workflows?

Data Reuse in Computational Workflows refers to the practice of integrating shared datasets into reproducible pipelines built with tools such as Nextflow and Galaxy, along with methods for assessing reuse metrics and workflow portability.

Researchers examine barriers to data reuse in workflow-based analysis, finding low reuse rates in long-tail science domains (Wallis et al., 2013, 483 citations). FAIR principles extend to workflows, enabling findable, accessible, interoperable, and reusable computational pipelines (Goble et al., 2019, 158 citations). Over 30 studies since 2013 quantify reuse challenges across disciplines.

15 Curated Papers · 3 Key Challenges

Why It Matters

Data reuse in workflows accelerates innovation by enabling reproducible analyses in fields like neuroimaging, where COINS supports large heterogeneous datasets (Scott et al., 2011, 190 citations). It reduces redundant data collection, although data creators hold a reuse advantage over secondary users (Pasquetto et al., 2019). Tenopir et al. (2020, 269 citations) show that scientists worldwide still store data on personal drives, hindering scalable pipelines; FAIR workflows address this by making research transparent and reusable.

Key Research Challenges

Low Data Quality for Reuse

Publicly archived data often lacks the documentation needed for reanalysis: a survey of 100 archived datasets found widespread insufficient metadata (Roche et al., 2015, 317 citations). Workflow integration fails without standardized formats.
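The metadata gap described above can be made concrete with a small completeness check. A minimal sketch, assuming an illustrative set of required fields and hypothetical dataset records (neither is drawn from the cited studies):

```python
# Minimal sketch of a metadata-completeness check for archived datasets.
# The required fields and example records below are illustrative
# assumptions, not a standard from Roche et al. (2015).

REQUIRED_FIELDS = ["title", "variables", "units", "collection_method", "license"]

def completeness(metadata: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    present = sum(1 for field in REQUIRED_FIELDS if metadata.get(field))
    return present / len(REQUIRED_FIELDS)

datasets = [
    {"title": "field survey 2014", "variables": ["mass"], "units": "g"},
    {"title": "plot A", "variables": ["count"], "units": "n",
     "collection_method": "trap", "license": "CC0"},
]

for d in datasets:
    print(f"{d['title']}: {completeness(d):.0%} complete")
```

A reanalysis pipeline could use a score like this to filter archived datasets before attempting workflow integration.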

Workflow Portability Barriers

Heterogeneous tools like Nextflow and Galaxy complicate data integration across environments (Goble et al., 2019). Portability metrics show low reuse due to dependency issues. FAIR workflows propose solutions but adoption lags.
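One common mitigation for the dependency issues noted above is to pin each workflow step to an immutable container digest rather than a mutable tag. A minimal sketch of such a portability lint, assuming a hypothetical step-record structure (not the Nextflow or Galaxy format):

```python
# Minimal sketch: flag workflow steps whose container image is not pinned
# to an immutable sha256 digest. The step records are illustrative
# assumptions, not an actual Nextflow/Galaxy configuration schema.

def unpinned_steps(steps):
    """Return names of steps whose container reference lacks a sha256 digest."""
    return [s["name"] for s in steps if "@sha256:" not in s.get("container", "")]

workflow = [
    {"name": "align", "container": "quay.io/biocontainers/bwa@sha256:abc123"},
    {"name": "qc", "container": "fastqc:latest"},  # mutable tag: not portable
]

print(unpinned_steps(workflow))  # prints ['qc']
```

Steps pinned by digest resolve to the same image in any environment, which is the kind of guarantee FAIR workflow guidance aims for.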

Limited Reuse Incentives

Scientists perceive sharing as high effort with low reward (Wallis et al., 2013, 483 citations). Data creators dominate reuse, marginalizing secondary users (Pasquetto et al., 2019). Public data archiving (PDA) mandates increase archiving but not reuse (Roche et al., 2015).

Essential Papers

1.

Open science challenges, benefits and tips in early career and beyond

Christopher Allen, David Marc Anton Mehler · 2019 · PLoS Biology · 593 citations

The movement towards open science is a consequence of seemingly pervasive failures to replicate previous research. This transition comes with great benefits but also significant challenges that are...

2.

If We Share Data, Will Anyone Use Them? Data Sharing and Reuse in the Long Tail of Science and Technology

Jillian C. Wallis, Elizabeth Rolando, Christine L. Borgman · 2013 · PLoS ONE · 483 citations

Research on practices to share and reuse data will inform the design of infrastructure to support data collection, management, and discovery in the long tail of science and technology. These are re...

3.

FAIRsharing as a community approach to standards, repositories and policies

Susanna‐Assunta Sansone, Peter McQuilton, Philippe Rocca‐Serra et al. · 2019 · Nature Biotechnology · 350 citations

4.

Public Data Archiving in Ecology and Evolution: How Well Are We Doing?

Dominique G. Roche, Loeske E. B. Kruuk, Robert Lanfear et al. · 2015 · PLoS Biology · 317 citations

Policies that mandate public data archiving (PDA) successfully increase accessibility to data underlying scientific publications. However, is the data quality sufficient to allow reuse and reanalys...

5.

Data sharing, management, use, and reuse: Practices and perceptions of scientists worldwide

Carol Tenopir, Natalie M. Rice, Suzie Allard et al. · 2020 · PLoS ONE · 269 citations

Most respondents displayed what we describe as high and mediocre risk data practices by storing their data on their personal computer, departmental servers or USB drives. Respondents appeared to be...

6.

Ten Simple Rules for the Care and Feeding of Scientific Data

Alyssa Goodman, Alberto Pepe, Alexander W. Blocker et al. · 2014 · PLoS Computational Biology · 235 citations

In the early 1600s, Galileo Galilei turned a telescope toward Jupiter. In his log book each night, he drew to-scale schematic diagrams of Jupiter and some oddly moving points of light near it. Gali...

7.

COINS: An Innovative Informatics and Neuroimaging Tool Suite Built for Large Heterogeneous Datasets

Adam Scott, Will Courtney, Dylan Wood et al. · 2011 · Frontiers in Neuroinformatics · 190 citations

The availability of well-characterized neuroimaging data with large numbers of subjects, especially for clinical populations, is critical to advancing our understanding of the healthy and diseased ...

Reading Guide

Foundational Papers

Start with Wallis et al. (2013, 483 citations) for long-tail reuse barriers; Goodman et al. (2014, 235 citations) for data care rules; Scott et al. (2011, 190 citations) for workflow tools like COINS.

Recent Advances

Goble et al. (2019, 158 citations) on FAIR workflows; Pasquetto et al. (2019) on creator advantages; Tenopir et al. (2020, 269 citations) on global practices.

Core Methods

FAIR workflow principles (Goble et al., 2019); public data archiving (Roche et al., 2015); citation tracking for reuse (Mooney and Newton, 2012).

How PapersFlow Helps You Research Data Reuse in Computational Workflows

Discover & Search

Research Agent uses searchPapers and exaSearch to find core papers like 'FAIR Computational Workflows' by Goble et al. (2019); citationGraph reveals connections from Wallis et al. (2013) to Tenopir et al. (2020), while findSimilarPapers uncovers reuse metrics studies.

Analyze & Verify

Analysis Agent applies readPaperContent to extract reuse barriers from Roche et al. (2015), then verifyResponse with CoVe checks claims against abstracts; runPythonAnalysis parses citation data for trends, with GRADE grading evidence quality on workflow portability.

Synthesize & Write

Synthesis Agent detects gaps in Nextflow-Galaxy integration from Goble et al. (2019); Writing Agent uses latexEditText, latexSyncCitations for reports, and latexCompile to generate manuscripts with exportMermaid diagrams of FAIR workflow pipelines.

Use Cases

"Analyze reuse rates in computational workflows from 2013-2020 papers"

Research Agent → searchPapers + runPythonAnalysis → pandas trend plot of citations from Wallis (2013) and Tenopir (2020); researcher gets CSV of reuse metrics with matplotlib visualization.
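The trend-analysis step above can be sketched with pandas and matplotlib. The citation counts are the per-paper totals quoted in this guide; the table layout, file names, and chart are illustrative assumptions about what runPythonAnalysis would produce:

```python
# Minimal sketch of the citation-trend step: tabulate the papers cited in
# this guide and plot citations by publication year. The DataFrame layout
# and output file names are illustrative assumptions, not PapersFlow output.
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs headless
import matplotlib.pyplot as plt

papers = pd.DataFrame(
    {
        "paper": ["Wallis 2013", "Roche 2015", "Goble 2019", "Tenopir 2020"],
        "year": [2013, 2015, 2019, 2020],
        "citations": [483, 317, 158, 269],
    }
)

trend = papers.groupby("year")["citations"].sum()
trend.plot(kind="bar", ylabel="citations", title="Citations by publication year")
plt.savefig("reuse_trend.png")
papers.to_csv("reuse_metrics.csv", index=False)
```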

"Draft LaTeX review on FAIR workflows with citations"

Synthesis Agent → gap detection → Writing Agent → latexEditText + latexSyncCitations (Goble 2019, Scott 2011) → latexCompile; researcher gets compiled PDF with synced bibliography.

"Find GitHub repos for Nextflow data reuse examples"

Research Agent → paperExtractUrls (Goble 2019) → paperFindGithubRepo → githubRepoInspect; researcher gets inspected repos with workflow code and reuse documentation.

Automated Workflows

Deep Research workflow scans 50+ papers via searchPapers, producing structured reports on reuse metrics (e.g., Wallis et al. 2013 to Pasquetto et al. 2019). DeepScan applies 7-step CoVe analysis to verify FAIR workflow claims in Goble et al. (2019). Theorizer generates hypotheses on portability barriers from Tenopir et al. (2020) data practices.

Frequently Asked Questions

What defines data reuse in computational workflows?

It involves integrating shared datasets into reproducible pipelines like Nextflow and Galaxy, assessing metrics and portability (Goble et al., 2019).

What are key methods for enabling reuse?

FAIR principles for workflows ensure findability and interoperability (Goble et al., 2019); tools like COINS handle heterogeneous data (Scott et al., 2011).

What are seminal papers?

Wallis et al. (2013, 483 citations) on long-tail reuse; Goodman et al. (2014, 235 citations) on data care rules; Goble et al. (2019, 158 citations) on FAIR workflows.

What open problems persist?

Low data quality blocks reanalysis (Roche et al., 2015); weak incentives limit sharing (Pasquetto et al., 2019); portability across tools remains unstandardized.

Explore Research Data Management Practices with AI

PapersFlow provides specialized AI tools for Computer Science researchers. Here are the most relevant for this topic:

See how researchers in Computer Science & AI use PapersFlow

Field-specific workflows, example queries, and use cases.

Computer Science & AI Guide

Start Researching Data Reuse in Computational Workflows with AI

Search 474M+ papers, run AI-powered literature reviews, and write with integrated citations — all in one workspace.

See how PapersFlow works for Computer Science researchers