Subtopic Deep Dive

Corpus-Based Translation Studies
Research Guide

What is Corpus-Based Translation Studies?

Corpus-Based Translation Studies applies corpus linguistics methods to analyze translation patterns, shifts, universals, and norms using large parallel and comparable corpora across languages and genres.

Researchers build bilingual corpora to quantify phenomena such as explicitation and simplification (Baker, 2019). Key tools include Sketch Engine for collocation analysis (Kilgarriff et al., 2014, 1936 citations). The papers in this guide cover applications ranging from lexical creativity to translator training (Kenny, 2001; Zanettin, 2002).

15 Curated Papers · 3 Key Challenges

Why It Matters

Corpus-Based Translation Studies provides empirical data for translation universals, challenging intuition-based theories (Baker, 2019, 602 citations). It informs translator training with comparable corpora activities (Zanettin, 2002, 275 citations). Tools like Sketch Engine enable analysis of stylistic shifts in real datasets (Kilgarriff et al., 2014). Applications include quality assessment models integrating corpus evidence (House, 2002, 256 citations).

Key Research Challenges

Parallel Corpora Scarcity

Few high-quality aligned bilingual corpora exist for low-resource languages. Building them requires manual alignment and annotation (Zanettin, 2002). This limits cross-genre studies (Baker, 2019).
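Because alignment is largely manual, a cheap automatic pre-check can help decide which candidate pairs a human should look at first. The following is a minimal sketch, not a method taken from the cited papers: it flags sentence pairs whose character-length ratio falls outside a band. The function name flag_suspect_pairs, the thresholds 0.5 and 2.0, and the toy English-French pairs are all assumptions chosen for illustration. The thresholds would need tuning per language pair, especially across different scripts.

```python
def flag_suspect_pairs(pairs, low=0.5, high=2.0):
    """Return indices of (source, target) pairs that deserve manual review.

    A pair is flagged when either side is empty or when the
    target/source character-length ratio lies outside [low, high].
    The default band is a hypothetical starting point, not a standard.
    """
    suspects = []
    for i, (src, tgt) in enumerate(pairs):
        if not src.strip() or not tgt.strip():
            suspects.append(i)  # one side is missing entirely
            continue
        ratio = len(tgt) / len(src)
        if ratio < low or ratio > high:
            suspects.append(i)  # lengths diverge too much to trust
    return suspects


# Toy data (invented for this example)
pairs = [
    ("The house is small.", "La maison est petite."),
    ("Thank you.", "Merci beaucoup pour votre aide précieuse et votre patience."),
    ("See the appendix.", ""),
]
suspect_indices = flag_suspect_pairs(pairs)
```

A length check of this kind only narrows the manual workload. It cannot confirm that a pair is a correct translation, so flagged and unflagged pairs both still need sampling by a human annotator.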

Quantifying Translation Shifts

Measuring explicitation or simplification demands robust statistical baselines. Normalization across corpus sizes poses issues (Kilgarriff et al., 2014). Directionality in comparable corpora confounds causality (Kenny, 2001).
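The normalization issue can be made concrete with a small example. The sketch below compares the rate of a few explicitation markers in two texts of different length by scaling raw counts to occurrences per million tokens. The tokenizer, the marker set CONNECTIVES, and the two toy sentences are assumptions made for illustration. A real study would use a principled marker inventory and a proper tokenizer for each language.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes (deliberately naive)."""
    return re.findall(r"[a-z']+", text.lower())


def per_million(count, corpus_size):
    """Scale a raw count to occurrences per million tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1_000_000 / corpus_size


def marker_rate(tokens, markers):
    """Normalized frequency of a set of single-word markers."""
    counts = Counter(tokens)
    hits = sum(counts[m] for m in markers)
    return per_million(hits, len(tokens))


# Hypothetical marker list and toy texts (not drawn from any cited corpus)
CONNECTIVES = {"because", "therefore", "however", "that"}
translated = tokenize("She said that she left because it rained. However, he stayed.")
original = tokenize("She said she left; it rained. He stayed.")

rate_translated = marker_rate(translated, CONNECTIVES)
rate_original = marker_rate(original, CONNECTIVES)
```

Normalized rates make corpora of different sizes comparable, but they are not a baseline on their own: a higher rate in the translated text still needs a significance test and a control for genre and source-language influence before it is read as explicitation.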

Tool Integration Gaps

Adapting general corpus tools like Sketch Engine to translation-specific queries is non-trivial. Multilingual data can push such software to its limits (O’Keeffe and McCarthy, 2010). Validation of corpus findings against human judgments remains inconsistent (House, 2002).

Essential Papers

1.

The Sketch Engine

Adam Kilgarriff, Vít Baisa, Jan Bušta et al. · 2014 · Lexicography · 1.9K citations

The Sketch Engine is a leading corpus tool, widely used in lexicography. Now, at 10 years old, it is mature software. The Sketch Engine website offers many ready-to-use corpora, and tools for users...

2.

Corpus Linguistics and Translation Studies

Mona Baker · 2019 · 602 citations

The rise of corpus linguistics has serious implications for any discipline in which language plays a major role. This paper explores the impact that the availability of corpora is likely to have on...

3.

The Routledge Handbook of Corpus Linguistics

Anne O’Keeffe, Michael McCarthy · 2010 · 494 citations

The Routledge Handbook of Corpus Linguistics is edited by Anne O'Keeffe (University of Limerick, Ireland) and Michael McCarthy (University of Nottingham, UK and Pennsylvania State University, USA)....

4.

Findings of the 2019 Conference on Machine Translation (WMT19)

Loïc Barrault, Ondřej Bojar, Marta R. Costa‐jussà et al. · 2019 · 470 citations

Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santa...

5.

Contemporary Corpus Linguistics

Paul Baker · 2009 · 360 citations

1. Introduction, Paul Baker 2. Metaphor, Alice Deignan 3. Corpora and Critical Discourse Analysis, Gerlinde Mautner 4. Corpus stylistics and the Pickwickian watering-pot, Michaela Mahlberg 5. The m...

6.

Corpus-Based Translation Studies

Mona Baker · 2019 · 351 citations

The rise of corpus linguistics has serious implications for any discipline in which language plays a major role. This paper explores the impact that the availability of corpora is likely to have on...

7.

Building a translation competence model

PACTE · 2003 · Dipòsit Digital de Documents de la UAB (Universitat Autònoma de Barcelona) · 337 citations

Authors listed in alphabetical order. Principal investigator: A. Hurtado Albir

Reading Guide

Foundational Papers

Start with Kilgarriff et al. (2014, 1936 citations) for Sketch Engine tool mastery, then Kenny (2001, 275 citations) for lexis applications, and Zanettin (2002) for pedagogical uses.

Recent Advances

Baker (2019, 602 citations) surveys corpus impacts; O’Keeffe and McCarthy (2010, 494 citations) handbook covers methods; Barrault et al. (2019) links to MT evaluation.

Core Methods

Core techniques: parallel corpus alignment, keyword extraction, mutual information scores, Sketch Engine word sketches, and comparable corpus normalization (Kilgarriff et al., 2014; Baker, 2009).
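As an illustration of one of these techniques, the sketch below computes pointwise mutual information (PMI) for adjacent word pairs. It uses the common simplified estimate PMI(x, y) = log2(c(x, y) * N / (c(x) * c(y))), where N is the token count. This is an assumption-laden toy, not Sketch Engine's own association measure. The function name pmi_scores, the min_count parameter, and the sample tokens are invented for the example.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=1):
    """Return {(w1, w2): PMI} for adjacent pairs seen at least min_count times.

    Uses the token count N to estimate both unigram and bigram
    probabilities, a simplification that is adequate for illustration.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (w1, w2), c12 in bigrams.items():
        if c12 < min_count:
            continue  # skip rare pairs, whose PMI is unreliable
        scores[(w1, w2)] = math.log2(c12 * n / (unigrams[w1] * unigrams[w2]))
    return scores


# Toy token list (invented)
tokens = "strong tea strong tea weak coffee the tea".split()
scores = pmi_scores(tokens)
top_pair = max(scores, key=scores.get)
```

PMI rewards pairs of rare words, so on real corpora a frequency floor such as min_count is usually needed before the ranking is meaningful.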

How PapersFlow Helps You Research Corpus-Based Translation Studies

Discover & Search

Research Agent uses searchPapers and exaSearch to find 50+ papers on 'explicitation in parallel corpora', then citationGraph on Baker (2019) reveals 602 citing works. findSimilarPapers expands to Kenny (2001) for lexis studies.

Analyze & Verify

Analysis Agent runs readPaperContent on Kilgarriff et al. (2014) to extract Sketch Engine APIs, verifies claims with CoVe against O’Keeffe and McCarthy (2010), and uses runPythonAnalysis for collocate frequency stats with GRADE scoring on translation shift metrics.

Synthesize & Write

Synthesis Agent detects gaps in explicitation studies across genres and flags contradictions between Baker (2019) and House (2002). Writing Agent then applies latexEditText to corpus method sections, latexSyncCitations to 10-paper bibliographies, and latexCompile to produce camera-ready manuscripts, with exportMermaid generating shift visualization diagrams.

Use Cases

"Analyze collocations in English-Chinese parallel corpora for explicitation patterns"

Research Agent → searchPapers('explicitation parallel corpora') → Analysis Agent → runPythonAnalysis(pandas on extracted freq tables from Kilgarriff et al. 2014) → statistical output with p-values and matplotlib plots.

"Draft a review on corpus tools in translator training"

Research Agent → citationGraph(Zanettin 2002) → Synthesis Agent → gap detection → Writing Agent → latexEditText + latexSyncCitations(10 papers) + latexCompile → PDF with integrated bibliography.

"Find GitHub repos for building translation corpora"

Research Agent → exaSearch('open source parallel corpora builders') → Code Discovery → paperExtractUrls(Kilgarriff 2014) → paperFindGithubRepo → githubRepoInspect → list of 5 repos with Sketch Engine extensions.

Automated Workflows

The Deep Research workflow scans 50+ papers via searchPapers on 'corpus translation universals' and structures a report with GRADE-verified sections on shifts (Baker 2019). DeepScan applies a 7-step analysis to Sketch Engine applications (Kilgarriff et al. 2014) with CoVe checkpoints. Theorizer generates hypotheses on normalization methods from Kenny (2001) and House (2002).

Frequently Asked Questions

What defines Corpus-Based Translation Studies?

It uses corpus linguistics to study translation empirically through parallel/comparable corpora, quantifying shifts like explicitation (Baker, 2019).

What are the main methods?

Methods include collocation extraction with Sketch Engine, frequency normalization, and log-likelihood tests on aligned texts (Kilgarriff et al., 2014; Kenny, 2001).
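The log-likelihood test mentioned here can be sketched in a few lines. The version below is the two-term keyness form widely used in corpus linguistics: G2 = 2 * sum(O * ln(O / E)) over the two corpora, with expected counts proportional to corpus size. It is offered as an assumption about what such a test typically looks like, not as the procedure of any cited paper. The function name and the example counts are hypothetical.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood (G2) for one word.

    freq_a, freq_b: occurrences of the word in corpus A and corpus B.
    size_a, size_b: total token counts of the two corpora.
    Zero observed counts contribute nothing to the sum.
    """
    if size_a <= 0 or size_b <= 0:
        raise ValueError("corpus sizes must be positive")
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical counts: a connective in translated vs. non-translated text
g2 = log_likelihood(30, 1000, 10, 1000)
```

A G2 value above 3.84 corresponds to p < 0.05 at one degree of freedom. When many words are tested at once, a stricter threshold or a multiple-comparison correction is advisable.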

What are the key papers?

Baker (2019, 602 citations) on corpus impact; Kilgarriff et al. (2014, 1936 citations) on Sketch Engine; Zanettin (2002) on training corpora.

What open problems exist?

Two problems stand out: the scarcity of parallel corpora for low-resource languages, and the difficulty of inferring what causes a shift when no baseline is available (Baker, 2019; House, 2002).
