Phraseology and Collocations
Research Guide

What Are Phraseology and Collocations?

Phraseology and collocations study multi-word units, idioms, and statistically significant word combinations across languages using corpus-based methods.

Researchers catalogue collocations and phraseological units with corpus statistics to analyze non-compositional meaning, while Feilke (2004; 76 citations) examines word combinations from a pragmatic, language-theoretical perspective. Hausmann (2007; 62 citations) traces the concept of collocation through French lexicography, and Teliya et al. (1998; 74 citations) study how phraseology represents culture. Over ten major papers published since 1998 explore applications in translation, lexicography, and legal discourse.

15 curated papers · 3 key challenges

Why It Matters

Phraseology research can improve NLP models by identifying non-compositional units, which machine translation and language generation must handle as wholes. In lexicography, corpus analysis of collocations enriches dictionary entries (Hausmann, 2007; Ďurčo, 2010). Language teaching benefits from phraseodidactics, and phraseotranslation addresses translation problems (Sułkowska, 2016), while corpus-based legal phraseology supports analysis across legal languages and genres (Goźdź‐Roszkowski and Pontrandolfo, 2015). Using the World Wide Web as a corpus gives access to much larger samples for set-phrase extraction (Colson, 2007).

Key Research Challenges

Defining collocation boundaries

Distinguishing collocations from free combinations remains contentious, because definitions vary across phraseological frameworks (Hausmann, 2007). The historical development of the concept in French lexicography shows that the roles of base and collocator have been assigned inconsistently. Corpus-statistical approaches also depend on frequency thresholds that are set more or less subjectively (Feilke, 2004).
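
To see why the cut-off matters, the sketch below counts adjacent word pairs in a tiny invented corpus and keeps only those at or above a minimum frequency. The corpus, the helper names (window_pairs, candidates) and the thresholds are illustrative assumptions, not taken from the cited studies.

```python
from collections import Counter

def window_pairs(sentences, window=1):
    """Count ordered pairs (w1, w2) where w2 follows w1 within `window` tokens.
    Pairs are counted inside each sentence only."""
    pairs = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, w1 in enumerate(tokens):
            for w2 in tokens[i + 1 : i + 1 + window]:
                pairs[(w1, w2)] += 1
    return pairs

def candidates(pairs, min_freq):
    """Keep only the pairs whose frequency reaches the chosen threshold."""
    return {pair for pair, freq in pairs.items() if freq >= min_freq}

# Hypothetical toy corpus, one sentence per string.
corpus = ["heavy rain fell", "heavy rain again", "heavy rain",
          "heavy smoker", "strong tea", "strong tea again", "powerful tea"]
pairs = window_pairs(corpus, window=1)  # window=1 means adjacent bigrams

for threshold in (1, 2, 3):
    print(threshold, sorted(candidates(pairs, threshold)))
```

At a threshold of 1 every free combination survives, at 2 heavy smoker drops out, and at 3 even strong tea, a textbook collocation, is lost. On this toy data the cut-off, not the data alone, decides what counts as a collocation.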

Cross-lingual equivalence detection

Parallel corpora often show only partial equivalence between phraseological units in two languages, which complicates translation (Pęzik, 2017). Culture-specific idioms frequently have no direct equivalent (Teliya et al., 1998). Phraseotranslation therefore needs equivalence measures that go beyond literal alignment (Sułkowska, 2016).
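
One simple way to quantify partial equivalence, shown purely as an illustration and not as the method behind Paralela, is the Dice coefficient over sentence-aligned pairs: how often a source phrase and a candidate target phrase occur in the same aligned pair, relative to how often each occurs at all. The four English-German sentence pairs and the function name dice are invented for this example, and matching is naive lower-case substring search.

```python
def dice(aligned, src_phrase, tgt_phrase):
    """Dice = 2 * |pairs containing both| / (|pairs with src| + |pairs with tgt|)."""
    src_hits = tgt_hits = both = 0
    for src, tgt in aligned:
        s = src_phrase in src  # naive contiguous substring match
        t = tgt_phrase in tgt
        src_hits += s
        tgt_hits += t
        both += int(s and t)
    if src_hits + tgt_hits == 0:
        return 0.0
    return 2 * both / (src_hits + tgt_hits)

# Hypothetical sentence-aligned sample (English, German), lower-cased.
aligned = [
    ("we must make a decision today", "wir müssen heute eine entscheidung treffen"),
    ("she made a decision", "sie traf eine entscheidung"),
    ("they make a decision quickly", "sie treffen schnell eine entscheidung"),
    ("make a cake", "einen kuchen backen"),
]

print(dice(aligned, "make a decision", "entscheidung treffen"))
print(dice(aligned, "make a decision", "entscheidung"))
```

Literal matching of entscheidung treffen scores only 2/3, because inflection (traf) and main-clause word order split the German phrase; the bare noun scores 0.8. The English side has the same problem with made. A fuller approach would lemmatize and allow discontinuous matches.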

Corpus scale for rare phrases

Web corpora aid set-phrase discovery but introduce noise from non-standard usage (Colson, 2007). Rare collocations require data volumes that traditional corpora cannot supply. In addition, statistically validated collocations do not always match psycholinguistic evidence on how phrases are processed (Bürger, 2017).
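
Part of the data-volume problem is statistical: pointwise mutual information (MI) rewards pairs whose words are themselves rare. The sketch below compares MI with the t-score for two hypothetical pairs in a corpus of one million tokens; all counts are made up for illustration.

```python
import math

def pmi(f_xy, f_x, f_y, n):
    """Pointwise mutual information (log base 2) of a word pair."""
    return math.log2(f_xy * n / (f_x * f_y))

def t_score(f_xy, f_x, f_y, n):
    """Observed minus expected pair frequency, scaled by sqrt(observed).
    Only defined for pairs that actually occur (f_xy > 0)."""
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)

N = 1_000_000
# (pair frequency, frequency of word 1, frequency of word 2): hypothetical counts
rare = (1, 1, 1)                # two words seen once each, and once together
frequent = (500, 2_000, 1_000)  # a well-attested pair

for name, counts in (("rare", rare), ("frequent", frequent)):
    print(f"{name:9} MI={pmi(*counts, N):6.2f}  t={t_score(*counts, N):6.2f}")
```

MI ranks the one-off pair far above the well-attested one (about 19.93 vs 7.97), whereas the t-score reverses the order (about 1.00 vs 22.27). This is one reason rare collocations need either much more data or measures that account for sampling uncertainty.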

Essential Papers

1.

Kontext - Zeichen - Kompetenz. Wortverbindungen unter sprachtheoretischem Aspekt [Context, Sign, Competence: Word Combinations from a Language-Theoretical Perspective]

Helmuth Feilke · 2004 · 76 citations

The continuous expansion of phraseology's subject area over the past 30 years has gone hand in hand with a pragmatization of basic theoretical assumptions within the discipline itself. With this ...

2.

Phraseology as a Language of Culture: Its Role in the Representation of a Collective Mentality

Veronika Teliya, Natalya Bragina, Elena Oparina et al. · 1998 · 74 citations

Phraseology is a domain of linguistic study which to a high degree illustrates the correlation between language and culture. In a typological approach, it is necessary to define and classi...

3.

Die Kollokationen im Rahmen der Phraseologie – systematische und historische Darstellung [Collocations within Phraseology: A Systematic and Historical Account]

Franz Josef Hausmann · 2007 · Zeitschrift für Anglistik und Amerikanistik · 62 citations

This paper gives an overview of the development of the concept and the term collocation, focussing on French lexicography. The first section examines the place of collocations within phraseology,...

4.

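Phraséodidactique et phraséotraduction: quelques remarques sur les nouvelles disciplines de la phraséologie appliquée [Phraseodidactics and Phraseotranslation: Some Remarks on the New Disciplines of Applied Phraseology]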

Monika Sułkowska · 2016 · Yearbook of Phraseology · 26 citations

The major task of this paper is the implementation of new emerging phraseological disciplines, such as phraseodidactics and phraseotranslation. The author discusses the attempt to specify ...

5.

Exploring phraseological equivalence with Paralela

Piotr Pęzik · 2017 · CeON Repository (Centre for Evaluation in Education and Science) · 24 citations

In: Gruszczyńska, Ewa; Leńko-Szymańska, Agnieszka (eds.) (2016). Polskojęzyczne korpusy równoległe / Polish-language Parallel Corpora. Warsaw: Instytut Lingwistyki Stosowanej, pp. 67-81.

6.

Legal Phraseology Today: Corpus-based Applications Across Legal Languages and Genres

Stanisław Goźdź‐Roszkowski, Gianluca Pontrandolfo · 2015 · Fachsprache · 21 citations

Phraseology is now taking centre stage in a wide range of fields. However, there are still relatively few empirical studies of word combinations in the domain of law and in the many different conte...

7.

The World Wide Web as a corpus for set phrases

Jean-Pierre Colson · 2007 · 21 citations

The use of the World Wide Web for linguistic purposes is a fairly recent development. In their everyday practice, more and more teachers, language students, linguists or translators take recourse t...

Reading Guide

Foundational Papers

Start with Feilke (2004; 76 citations) for pragmatic theory, Hausmann (2007; 62 citations) for the history of the collocation concept, and Teliya et al. (1998; 74 citations) for cultural aspects; together they establish the core concepts.

Recent Advances

Then read Pęzik (2017) on equivalence in parallel corpora, Sułkowska (2016) on phraseodidactics and phraseotranslation, and Bürger (2017) on advances in German phraseology research over the past 30 years.

Core Methods

Corpus frequency analysis (Colson, 2007); parallel alignment for translation (Pęzik, 2017); association measures such as mutual information (MI) for collocational salience (Ďurčo, 2010).
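
For reference, association measures such as MI and the t-score are usually computed from the pair frequency f(w1, w2), the word frequencies f(w1) and f(w2), and the corpus size N. The forms below are common textbook versions; the exact variants used in the cited studies may differ.

```latex
\mathrm{MI}(w_1, w_2) = \log_2 \frac{f(w_1, w_2)\, N}{f(w_1)\, f(w_2)},
\qquad
t(w_1, w_2) = \frac{f(w_1, w_2) - f(w_1)\, f(w_2) / N}{\sqrt{f(w_1, w_2)}}
```

MI compares observed with expected co-occurrence on a log scale; the t-score measures how confidently the observed count exceeds the expected count f(w1) f(w2) / N.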

How PapersFlow Helps You Research Phraseology and Collocations

Discover & Search

Research Agent uses searchPapers and exaSearch to find core papers such as Hausmann (2007) on collocations in phraseology. citationGraph can trace citation links from Feilke (2004; 76 citations) to more recent work such as Pęzik (2017), and findSimilarPapers can expand a seed paper such as Teliya et al. (1998) to related studies.

Analyze & Verify

Analysis Agent applies readPaperContent to extract corpus methods from Colson (2007), then runPythonAnalysis with pandas to compute MI scores on sample collocation data. verifyResponse uses CoVe to cross-check claims about papers such as Hausmann (2007), with GRADE evidence grading of their reliability.
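
A step like "compute MI scores on sample collocation data" can be approximated in a few lines. The sketch below is a plain-Python stand-in for such an analysis, not PapersFlow's actual code: it counts adjacent bigrams in a toy token list, drops pairs below a minimum frequency, and ranks the rest by pointwise MI. The function name mi_table, the toy sentence and the min_freq=2 default are assumptions; the resulting rows could be loaded into a pandas DataFrame for further work.

```python
import math
from collections import Counter

def mi_table(tokens, min_freq=2):
    """Rank adjacent bigrams by pointwise MI, skipping pairs seen fewer than min_freq times.
    Uses the token count n for both unigram and bigram probabilities (a common simplification)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    rows = []
    for (w1, w2), f_xy in bigrams.items():
        if f_xy < min_freq:
            continue
        score = math.log2(f_xy * n / (unigrams[w1] * unigrams[w2]))
        rows.append((w1, w2, f_xy, round(score, 2)))
    # Highest MI first; ties keep their first-seen order (stable sort).
    return sorted(rows, key=lambda row: row[3], reverse=True)

# Hypothetical toy input.
tokens = ("the heavy rain and the strong tea and "
          "the heavy rain and the strong wind").split()
for row in mi_table(tokens):
    print(row)
```

On this toy input heavy rain ranks first, but rain and comes second, which is why real pipelines usually filter function words or restrict candidates by part of speech before ranking.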

Synthesize & Write

Synthesis Agent detects gaps in cross-lingual studies published after Sułkowska (2016) and flags contradictions in cultural phraseology (Teliya et al., 1998). Writing Agent uses latexEditText and latexSyncCitations to draft phraseology reviews, latexCompile for publication-ready output, and exportMermaid for collocation network diagrams.

Use Cases

"Compute mutual information scores for English-German collocations using sample corpus data."

Research Agent → searchPapers (Hausmann 2007) → Analysis Agent → runPythonAnalysis (pandas MI computation) → matplotlib plot of top collocations.

"Draft a LaTeX review of phraseodidactics citing Sułkowska 2016 and Feilke 2004."

Synthesis Agent → gap detection → Writing Agent → latexEditText (integrate sections) → latexSyncCitations (add 5 papers) → latexCompile (PDF output).

"Find GitHub repos extracting collocations from legal texts like Goźdź‐Roszkowski 2015."

Research Agent → searchPapers (legal phraseology) → Code Discovery → paperExtractUrls → paperFindGithubRepo → githubRepoInspect (corpus scripts).

Automated Workflows

Deep Research workflow conducts systematic reviews of 50+ phraseology papers, chaining searchPapers → citationGraph → structured report on collocation evolution (Hausmann 2007 to Pęzik 2017). DeepScan applies 7-step analysis with CoVe checkpoints to verify corpus claims in Colson (2007). Theorizer generates hypotheses on web corpora for rare idioms from Bürger (2017).

Frequently Asked Questions

What defines phraseology and collocations?

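Phraseology examines fixed multi-word units and idioms; collocations are habitual word combinations, which corpus studies identify as statistically salient pairs. Hausmann (2007) traces how the concept of collocation developed, and Feilke (2004; 76 citations) frames word combinations within a pragmatic theory of language.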

What are the main methods in this subtopic?

Corpus statistics identify collocations via MI or t-scores; parallel corpora test equivalence (Pęzik, 2017). Web data extracts set phrases (Colson, 2007).

Which are the key papers?

Feilke (2004; 76 citations) on theory; Teliya et al. (1998; 74 citations) on culture; Hausmann (2007; 62 citations) on collocations.

What open problems exist?

Cross-lingual mapping lacks robust metrics (Sułkowska, 2016); rare phrase detection needs larger web corpora (Colson, 2007); psycholinguistic validation trails corpus evidence (Bürger, 2017).

Linguistic Research and Analysis with AI

PapersFlow provides specialized AI tools for Arts and Humanities researchers.

Start Researching Phraseology and Collocations with AI

Search 474M+ papers, run AI-powered literature reviews, and write with integrated citations — all in one workspace.
