PapersFlow Research Brief

Physical Sciences · Computer Science

Authorship Attribution and Profiling
Research Guide

What is Authorship Attribution and Profiling?

Authorship attribution and profiling is the application of stylometry, text classification, machine learning, and forensic linguistics to identify authors of anonymous texts, predict demographic attributes from online content, and analyze linguistic uniqueness across genres and languages, including gender differences and language use in social media.

The field encompasses 23,296 works focused on authorship attribution, stylometry, and user profiling in text. Techniques include text classification, machine learning, and forensic linguistics for analyzing gender differences and language use in social media. Research targets author identification of anonymous texts and prediction of demographic attributes from online content.

Topic Hierarchy

Physical Sciences > Computer Science > Artificial Intelligence > Authorship Attribution and Profiling

23.3K papers · 5-year growth: N/A · 123.6K total citations

Research Sub-Topics

Stylometry for Authorship Attribution

This sub-topic examines statistical and machine learning methods to identify authors based on linguistic style markers such as n-gram frequencies, function word usage, and syntactic patterns. Researchers develop and evaluate stylometric models for attributing authorship in literary, forensic, and digital texts across languages.

15 papers
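Burrows' Delta is one well-known instance of a function-word stylometric model. The minimal, stdlib-only sketch below works under stated assumptions. The function names, the 30-word cutoff, and the tiny sample texts are hypothetical. The z-score statistics are computed over the candidate samples alone, not over a larger reference corpus as is usual in practice.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in one token list."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]


def burrows_delta(candidates, disputed, n_words=30):
    """Return {author: Delta} for the disputed text; lower means more similar.

    candidates maps each (hypothetical) author name to a writing sample.
    """
    tokenized = {author: tokenize(text) for author, text in candidates.items()}
    # The n most frequent words in the pooled candidate samples
    # (in realistic corpora these are mostly function words).
    pooled = Counter(tok for toks in tokenized.values() for tok in toks)
    vocab = [w for w, _ in pooled.most_common(n_words)]
    if not vocab:
        raise ValueError("candidate samples contain no words")

    profiles = {a: relative_freqs(t, vocab) for a, t in tokenized.items()}
    # Per-word mean and population standard deviation across candidates;
    # "or 1.0" avoids dividing by zero when all candidates share a frequency.
    columns = list(zip(*profiles.values()))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]

    def zscores(freqs):
        return [(f - m) / s for f, m, s in zip(freqs, means, stds)]

    target = zscores(relative_freqs(tokenize(disputed), vocab))
    # Delta: mean absolute difference of z-scores over the vocabulary.
    return {
        a: sum(abs(x - y) for x, y in zip(zscores(p), target)) / len(vocab)
        for a, p in profiles.items()
    }


samples = {
    "A": "the cat sat on the mat and the dog sat on the rug",
    "B": "a bird flew over a tree while a fox ran under a bush",
}
scores = burrows_delta(samples, "the cow sat on the hay and the pig sat on the straw")
print(min(scores, key=scores.get))
```

Here the disputed sentence reuses author A's function words ("the", "on", "and"), so A gets the lower Delta. With only two short samples, the z-scores are crude. Real studies use many texts per author and a few hundred frequent words.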

Authorship Attribution in Social Media

This area focuses on attributing authorship and profiling users from short, noisy social media texts using features like emojis, hashtags, and posting patterns combined with deep learning classifiers. Studies address challenges like evolving language and multi-author accounts in platforms such as Twitter and Facebook.

15 papers
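A few regular expressions can turn surface markers like hashtags, mentions, and emojis into per-post features. This is an illustrative sketch, not a method from the surveyed papers. The feature names are hypothetical, and the emoji pattern covers only a few common Unicode blocks, so it is an approximation. Posting-pattern features and the deep learning classifier itself are out of scope.

```python
import re

HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")
# Rough emoji detection: only a few common Unicode emoji ranges (assumption).
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def social_features(post):
    """Return simple per-post counts usable as classifier input features."""
    words = post.split()
    return {
        "hashtags": len(HASHTAG.findall(post)),
        "mentions": len(MENTION.findall(post)),
        "urls": len(URL.findall(post)),
        "emojis": len(EMOJI.findall(post)),
        "exclamations": post.count("!"),
        # Share of all-caps tokens, a common "shouting" style marker.
        "upper_ratio": sum(w.isupper() for w in words) / max(len(words), 1),
        "tokens": len(words),
    }


print(social_features("Loving the new release!! 🎉 #python @dev_team https://example.com"))
```

Averaging these counts over an account's posts gives a crude user-level style vector. That vector could be fed to a classifier alongside learned text representations.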

Cross-Lingual Authorship Attribution

Researchers investigate methods to attribute authorship across different languages, leveraging transfer learning and language-independent stylometric features to handle multilingual corpora. Work includes evaluating performance on low-resource languages and genre-specific adaptations.

13 papers
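Character n-grams are a common example of a language-independent stylometric feature. They need no tokenizer, stemmer, or lexicon, so the same code runs on any language written in characters. Matching one author across two different languages usually needs more than this, for example the transfer learning mentioned above. The sketch below compares character trigram profiles by cosine similarity. The function names, the trigram size, and the example sentences are assumptions for illustration.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(p, q):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def attribute(candidates, disputed, n=3):
    """Return the candidate whose n-gram profile is most similar to the disputed text."""
    target = char_ngrams(disputed, n)
    return max(candidates, key=lambda a: cosine(char_ngrams(candidates[a], n), target))


candidates = {
    "X": "el gato negro duerme en la casa",
    "Y": "der schwarze Hund schläft im Haus",
}
print(attribute(candidates, "el perro negro duerme en la calle"))
```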

Demographic Profiling from Text

This sub-topic explores predicting user attributes like age, gender, and personality from linguistic cues in online texts using supervised learning and topic models. Research quantifies biases in profiling models and improves accuracy across genres like blogs and forums.

15 papers
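A supervised bag-of-words classifier is one simple way to predict a user attribute from text. The sketch below uses multinomial naive Bayes with add-one smoothing. This standard baseline was chosen for brevity and is not taken from the surveyed papers. The class name, the toy texts, and the placeholder labels `group_a` and `group_b` are hypothetical. Real profiling studies use far larger corpora and should audit such models for the biases mentioned above.

```python
from collections import Counter, defaultdict
import math
import re


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesProfiler:
    """Multinomial naive Bayes over word counts with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / n_docs)
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[w] + 1) / (self.totals[label] + v))
            if score > best_score:
                best, best_score = label, score
        return best


texts = [
    "omg lol that game was epic",
    "lol cant wait for the weekend party",
    "the quarterly report is attached",
    "please review the attached budget",
]
labels = ["group_a", "group_a", "group_b", "group_b"]
model = NaiveBayesProfiler().fit(texts, labels)
print(model.predict("lol epic party"))
```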

Adversarial Stylometry and Obfuscation

Studies develop attacks on stylometric systems and countermeasures, including authorship obfuscation techniques that alter text style while preserving semantics using GANs and paraphrasing. Researchers benchmark robustness of attribution models against such evasions.

15 papers

Why It Matters

Authorship attribution and profiling enables identification of authors behind anonymous texts, with applications in forensic linguistics and security. Bertrand and Mullainathan (2004) demonstrated labor market discrimination by sending fictitious resumes with African-American- or White-sounding names. Resumes with White names received 50 percent more callbacks, showing how a single cue in a text can bias decisions made about its author. Caliskan et al. (2017) showed that semantics derived from language corpora contain human-like biases, because machine learning models absorb word associations that mirror societal prejudices. This has implications for AI fairness in applications such as hiring and content moderation. Chung and Pennebaker (2012) describe LIWC (Linguistic Inquiry and Word Count), software that classifies texts along psychological dimensions using word categories and supports predicting outcomes from social media language.

Reading Guide

Where to Start

Start with "Linguistic Inquiry and Word Count (LIWC)" by Chung and Pennebaker (2012). It describes a practical tool for classifying text along psychological dimensions using word categories, making it an accessible entry point to profiling techniques.

Key Papers Explained

Bertrand and Mullainathan (2004), in "Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination," showed name-based discrimination in hiring: resumes with White-sounding names received 50% more callbacks, a widely cited example of profiling from textual cues. Brown et al. (1992), in "Class-based n-gram models of natural language," predicted words via word classes built from co-occurrence statistics. Brown et al. (1993), in "The mathematics of statistical machine translation: parameter estimation," then built foundational statistical models for word alignment. Church and Hanks (1990), in "Word association norms, mutual information, and lexicography," applied mutual information to measure word association patterns. Caliskan et al. (2017), in "Semantics derived automatically from language corpora contain human-like biases," connected corpus-derived word associations to bias detection in modern ML.

Paper Timeline

1967 · Language identification in the l... · 3.6K cites
1990 · Word association norms, mutual i... · 3.7K cites
1992 · Class-based n-gram models of na... · 2.9K cites
1993 · The mathematics of statistical m... · 4.1K cites
2004 · Are Emily and Greg More Employab... · 4.3K cites (most cited)
2010 · Quantitative Analysis of Culture... · 3.0K cites
2017 · Semantics derived automatically ... · 2.6K cites

Papers ordered chronologically.

Advanced Directions

Research continues to build on the statistical models of Brown et al. (1993) and the class-based n-gram approach of Brown et al. (1992), with a focus on social media analysis. No recent preprints were reported. Emphasis remains on applied text analysis in the tradition of Chung and Pennebaker (2012) and on bias mitigation following Caliskan et al. (2017).

Papers at a Glance

#	Paper	Year	Venue	Citations	Open Access
1	Are Emily and Greg More Employable Than Lakisha and Jamal? A F...	2004	American Economic Review	4.3K	✕
2	The mathematics of statistical machine translation: parameter ...	1993	—	4.1K	✕
3	Word association norms, mutual information, and lexicography	1990	Computational Linguistics	3.7K	✕
4	Language identification in the limit	1967	Information and Control	3.6K	✕
5	Quantitative Analysis of Culture Using Millions of Digitized B...	2010	Science	3.0K	✓
6	Class-based n-gram models of natural language	1992	Computational Linguistics	2.9K	✕
7	Semantics derived automatically from language corpora contain ...	2017	Science	2.6K	✓
8	A survey of named entity recognition and classification	2007	Lingvisticae Investiga...	2.5K	✕
9	Linguistic Inquiry and Word Count (LIWC)	2012	IGI Global eBooks	2.2K	✕
10	On the prediction of occurrence of particular verbal intrusion...	1959	Journal of Experimenta...	2.1K	✕

Frequently Asked Questions

What is stylometry in authorship attribution?

Stylometry analyzes linguistic features to identify the authors of anonymous texts. It uses features such as word frequencies and n-gram statistics to distinguish individual writing styles. Brown et al. (1993) developed statistical models for word alignment in translation, a parameter-estimation framework this brief treats as foundational for statistical stylometry.

How does machine learning contribute to user profiling?

Machine learning classifiers predict demographic attributes from online texts. Caliskan et al. (2017) showed that word embeddings learned from language corpora encode human-like biases in word associations, a risk for any profiling model trained on such data. Chung and Pennebaker (2012) describe LIWC, which provides efficient dictionary-based psychological classification of texts.
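Caliskan et al. (2017) measured these biases with the Word Embedding Association Test (WEAT). The test compares how strongly two sets of target words associate with two sets of attribute words. The minimal sketch below uses hand-made 2-D vectors in place of real embeddings; the vectors and function names are hypothetical. The sample standard deviation in the denominator is an assumption about a detail that implementations handle differently.

```python
import math
from statistics import mean, stdev


def cos(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(w, A, B):
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to set B."""
    return mean(cos(w, a) for a in A) - mean(cos(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """Difference in mean association of target sets X and Y, scaled by the
    standard deviation of the associations of all targets in X and Y."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Toy 2-D "embeddings": attribute set A lies near the first axis, B near the second.
A = [(1.0, 0.0), (0.9, 0.1)]
B = [(0.0, 1.0), (0.1, 0.9)]
X = [(0.8, 0.2), (0.9, 0.3)]  # targets leaning toward A
Y = [(0.2, 0.8), (0.3, 0.9)]  # targets leaning toward B
print(round(weat_effect_size(X, Y, A, B), 3))
```

A positive effect size means X is more associated with A than Y is. Swapping X and Y flips the sign.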

What role does forensic linguistics play in gender differences analysis?

Forensic linguistics examines language use, including in social media, to attribute authorship and profile users. Nadeau and Sekine (2007) surveyed named entity recognition, from early systems built on hand-crafted rules and grammars to later approaches. Church and Hanks (1990) used mutual information between co-occurring words to describe word association patterns statistically.
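Church and Hanks (1990) score word association with an association ratio based on mutual information, log2( P(x,y) / (P(x) P(y)) ), where P(x,y) counts y following x within a small window. The sketch below is minimal. The function name and the window convention (the next `window` tokens after x) are assumptions. Probabilities are estimated by dividing raw counts by the corpus size.

```python
from collections import Counter
import math


def association_ratios(tokens, window=5):
    """Score ordered pairs (x, y) where y occurs among the next `window` tokens after x.

    Score = log2(P(x, y) / (P(x) * P(y))), each probability estimated as
    a raw count divided by the corpus size N.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    pairs = Counter()
    for i, x in enumerate(tokens):
        for y in tokens[i + 1:i + 1 + window]:
            pairs[(x, y)] += 1
    return {
        (x, y): math.log2((c / n) / ((unigrams[x] / n) * (unigrams[y] / n)))
        for (x, y), c in pairs.items()
    }


tokens = "she drinks strong tea and he drinks strong coffee".split()
scores = association_ratios(tokens, window=1)
print(max(scores, key=scores.get))
```

Positive scores mark pairs that co-occur more often than chance would predict. Church and Hanks also note that the ratio is asymmetric: (x, y) and (y, x) are scored separately.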

What are key methods in text classification for author identification?

Methods include class-based n-gram models and dictionary-based tools such as LIWC. Brown et al. (1992) developed n-gram models that assign words to classes based on co-occurrence statistics. Chung and Pennebaker (2012) describe LIWC, which counts grammatical, psychological, and content words for text classification.
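The class-based idea can be written as a worked equation. In the bigram form associated with Brown et al. (1992), each word w belongs to exactly one class c(w). The probability of a word given its predecessor then factors through the classes. The count-based estimates on the second line are the standard maximum-likelihood choice, shown as an illustration rather than a transcription of the paper's notation.

```latex
% Class-based bigram model; c(w) is the single class assigned to word w.
P(w_i \mid w_{i-1}) \approx P\big(w_i \mid c(w_i)\big)\, P\big(c(w_i) \mid c(w_{i-1})\big)

% Maximum-likelihood estimates from corpus counts N(\cdot):
P\big(w \mid c(w)\big) = \frac{N(w)}{N\big(c(w)\big)}, \qquad
P(c_2 \mid c_1) = \frac{N(c_1\, c_2)}{N(c_1)}
```

Because class-to-class transitions are shared by all words in a class, the model needs far fewer parameters than a word bigram model and generalizes to word pairs never seen together.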

How has LIWC advanced authorship profiling?

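LIWC matches the words of a text against a dictionary of word categories and reports the percentage of words in each category, yielding a psychological profile of the text. Chung and Pennebaker (2012) describe its use in predicting outcomes from language use. Because it relies only on word counting, it can efficiently analyze social media text for user profiling and demographic prediction.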
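At its core, LIWC-style counting matches words against category word lists and reports percentages. The sketch below uses a tiny hypothetical dictionary, since the real LIWC dictionaries are licensed and far larger. It mimics LIWC's convention that an entry ending in `*` matches any word beginning with that stem. The category names and word lists are illustrative only.

```python
import re

# Hypothetical mini-dictionary; not the real LIWC word lists.
# A trailing "*" matches any word that starts with the stem.
CATEGORIES = {
    "pronoun_i": ["i", "me", "my", "mine"],
    "posemo": ["happ*", "love*", "good"],
    "negemo": ["sad*", "hate*", "angry"],
}


def matches(word, entry):
    """True if the word equals the entry, or starts with a wildcard stem."""
    return word.startswith(entry[:-1]) if entry.endswith("*") else word == entry


def category_percentages(text):
    """Percent of all words in the text that fall into each category."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1
    return {
        cat: 100 * sum(any(matches(w, e) for e in entries) for w in words) / total
        for cat, entries in CATEGORIES.items()
    }


print(category_percentages("I love my dog but I hate rain"))
```

Percentages rather than raw counts make texts of different lengths comparable, which matters for short social media posts.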

What is the current state of research in this field?

The field includes 23,296 works on stylometry and profiling across languages and genres. Top papers focus on statistical models and bias detection; the most cited, Bertrand and Mullainathan (2004), has 4,317 citations. No recent preprints or news coverage were reported in the last 12 months.

Open Research Questions

? How can stylometric models distinguish authors across multiple languages and genres while accounting for topic variation?
? What techniques mitigate human-like biases in machine learning models trained for authorship profiling?
? How do n-gram models and word association norms improve prediction of demographic attributes from short social media texts?
? Which linguistic features best capture individual uniqueness for forensic attribution of anonymous online content?
? How can LIWC categories be extended to real-time profiling in diverse cultural contexts?

Recent Trends

The field comprises 23,296 works; no 5-year growth rate is reported.

Highly cited papers dominate, notably Bertrand and Mullainathan (2004) with 4,317 citations and Brown et al. (1993) with 4,118 citations, focusing on name-based discrimination and statistical language models.

The absence of recent preprints or news coverage in the last 12 months suggests continued reliance on established methods.

Research Authorship Attribution and Profiling with AI

PapersFlow provides specialized AI tools for Computer Science researchers. Here are the most relevant for this topic:

AI Literature Review

Automate paper discovery and synthesis across 474M+ papers

Code & Data Discovery

Find datasets, code repositories, and computational tools

Deep Research Reports

Multi-source evidence synthesis with counter-evidence

AI Academic Writing

Write research papers with AI assistance and LaTeX support

