Subtopic Deep Dive

Authorship Attribution in Social Media
Research Guide

What is Authorship Attribution in Social Media?

Authorship attribution in social media identifies authors or profiles users from short, noisy texts using linguistic features, emojis, hashtags, and deep learning on platforms like Twitter and Facebook.

This subtopic analyzes brief posts whose language evolves quickly and whose accounts may have more than one author. More than 10 key papers since 2011 explore n-gram features, CNNs, and correlations between language and personality or gender; the most cited is Schwartz et al. (2013), with 1701 citations. Methods also target cybersecurity applications such as fake account detection.

Why It Matters

Authorship attribution enables content moderation by identifying trolls and fake accounts on social platforms (Schwartz et al., 2013; Cheng et al., 2011). It combats misinformation through gender and personality profiling from digital footprints (Hinds and Joinson, 2018). Applications include cybersecurity forensics and anonymization defenses (Shetty et al., 2023; Abbasi et al., 2022).

Key Research Challenges

Short Noisy Texts

Social media posts are short, so they carry far fewer of the cues that traditional stylometric features rely on. Character n-grams and CNNs address this (Shrestha et al., 2017; Miller et al., 2012). Noise from emojis and hashtags further complicates the signal.
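To make the character n-gram idea concrete, here is a minimal sketch using only the Python standard library. It assumes a simple profile-based setup (one merged n-gram count vector per author, compared by cosine similarity); this is an illustrative baseline, not the method of any paper cited here, and the authors and posts are invented toy data.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams, keeping emojis, hashtags, and case."""
    padded = f" {text} "  # pad so n-grams at word boundaries are captured
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(posts_by_author, n=3):
    """Merge each author's posts into a single n-gram profile."""
    profiles = {}
    for author, posts in posts_by_author.items():
        profile = Counter()
        for post in posts:
            profile.update(char_ngrams(post, n))
        profiles[author] = profile
    return profiles


def attribute(post, profiles, n=3):
    """Return the author whose profile is most similar to the post."""
    grams = char_ngrams(post, n)
    return max(profiles, key=lambda author: cosine(grams, profiles[author]))


# Hypothetical toy data, invented for illustration only.
training = {
    "alice": ["omg sooo tired today #mondays", "sooo hungry omg #lunch"],
    "bob": ["Market update: stocks rally.", "Market update: bonds fall."],
}
profiles = build_profiles(training)
predicted = attribute("omg sooo late #mondays", profiles)
```

Because the n-grams are taken over raw characters, elongated words, hashtags, and punctuation habits all contribute to the profile without any tokenizer, which is one reason such features tend to hold up on short posts.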

Evolving Language Patterns

Platform slang and trends shift rapidly, degrading models. Surveys highlight sociolinguistic variation needs (Nguyen et al., 2016). Adversarial training counters attribute leakage (Shetty et al., 2023).
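One rough way to see this degradation is to measure how much of an early period's character n-gram vocabulary survives in later periods. The sketch below is an assumption-laden proxy (Jaccard overlap of n-gram sets), not a technique taken from the cited papers; the years and posts are hypothetical.

```python
def ngram_set(posts, n=3):
    """Collect the distinct character n-grams seen in a list of posts."""
    grams = set()
    for post in posts:
        grams.update(post[i:i + n] for i in range(len(post) - n + 1))
    return grams


def jaccard(a, b):
    """Jaccard overlap |A & B| / |A | B|, defined as 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def drift_by_period(posts_by_period, n=3):
    """Overlap between the earliest period's n-grams and each later period's."""
    periods = sorted(posts_by_period)
    baseline = ngram_set(posts_by_period[periods[0]], n)
    return {
        period: jaccard(baseline, ngram_set(posts_by_period[period], n))
        for period in periods[1:]
    }


# Hypothetical posts grouped by year, invented for illustration only.
posts_by_year = {
    2019: ["that was lit", "so lit fam"],
    2020: ["that was lit"],
    2021: ["no cap fr fr"],
}
drift = drift_by_period(posts_by_year)
```

A falling overlap suggests that a model trained on the earliest period sees increasingly unfamiliar features, which is a cue to retrain or to evaluate on time-ordered splits rather than random ones.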

Multi-Author Accounts

Shared accounts mix styles, evading attribution. Ensemble learning tackles anonymity in texts (Abbasi et al., 2022). Dialectal variations add complexity (Zampieri et al., 2017).
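As a minimal sketch of the ensemble idea, the snippet below combines several base attributors by majority vote. The three rule-based attributors and the author labels are hypothetical stand-ins; a real system would plug in trained classifiers, and this is not the specific ensemble of Abbasi et al. (2022).

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_attribute(post, attributors):
    """Ask every base attributor for a label and return the majority label."""
    votes = [attributor(post) for attributor in attributors]
    return majority_vote(votes)


# Hypothetical base attributors, each keyed to a single stylistic cue.
def by_exclamations(post):
    return "alice" if post.count("!") >= 2 else "bob"


def by_hashtags(post):
    return "alice" if "#" in post else "bob"


def by_capitalization(post):
    return "bob" if post[:1].isupper() else "alice"


attributors = [by_exclamations, by_hashtags, by_capitalization]
first = ensemble_attribute("so excited!! #friday", attributors)
second = ensemble_attribute("Quarterly results are out #earnings", attributors)
```

In the second post the hashtag cue points one way while the other two cues point the other, so the vote absorbs a single misleading signal. That is the basic appeal of ensembles when styles are mixed.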

Essential Papers

Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach

H. Andrew Schwartz, Johannes C. Eichstaedt, Margaret L. Kern et al. · 2013 · PLoS ONE · 1.7K citations

We analyzed 700 million words, phrases, and topic instances collected from the Facebook messages of 75,000 volunteers, who also took standard personality tests, and found striking variations in lan...

Author gender identification from text

Na Cheng, R. Chandramouli, K. P. Subbalakshmi · 2011 · Digital Investigation · 220 citations

Computational Sociolinguistics: A Survey

Dong Nguyen, A. Seza Doğruöz, Carolyn Penstein Rosé et al. · 2016 · Computational Linguistics · 219 citations

Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimens...

Do Characters Abuse More Than Words?

Yashar Mehdad, Joel Tetreault · 2016 · 198 citations

Although word and character n-grams have been used as features in different NLP applications, no systematic comparison or analysis has shown the power of character-based features for detecting abus...

Convolutional Neural Networks for Authorship Attribution of Short Texts

Prasha Shrestha, Sebastián Sierra, Fabio A. González et al. · 2017 · 171 citations

Prasha Shrestha, Sebastian Sierra, Fabio González, Manuel Montes, Paolo Rosso, Thamar Solorio. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Lingui...

Findings of the VarDial Evaluation Campaign 2017

Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić et al. · 2017 · 145 citations

This paper describes the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2021. The campaign was part of the eighth workshop on Natural Language Processing (NLP) for...

What demographic attributes do our digital footprints reveal? A systematic review

Joanne Hinds, Adam Joinson · 2018 · PLoS ONE · 102 citations

To what extent does our online activity reveal who we are? Recent research has demonstrated that the digital traces left by individuals as they browse and interact with others online may reveal who...

Reading Guide

Foundational Papers

Start with Schwartz et al. (2013) for open-vocabulary analysis of personality, gender, and age across 700 million words, phrases, and topic instances from Facebook messages; Cheng et al. (2011) for gender identification baselines; Miller et al. (2012) for n-gram features on Twitter.

Recent Advances

Study Shrestha et al. (2017) CNNs for short texts; Shetty et al. (2023) adversarial anonymization; Abbasi et al. (2022) ensemble attribution.

Core Methods

Core techniques: character/word n-grams (Miller et al., 2012), CNN classifiers (Shrestha et al., 2017), open-vocabulary language analysis, as opposed to closed-vocabulary lexica such as LIWC (Schwartz et al., 2013), and ensembles (Abbasi et al., 2022).
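To illustrate how a CNN classifier can turn a post of any length into a fixed-size feature vector, here is a toy forward pass in plain Python: embed each character, slide filters over character windows, apply ReLU, and keep the maximum response per filter (max-over-time pooling). The weights are random rather than trained, the final classification layer is omitted, and all names and sizes are assumptions for illustration; this is not the architecture of Shrestha et al. (2017).

```python
import random


def make_embeddings(alphabet, dim, seed=0):
    """Assign each character a fixed random vector (stand-in for learned embeddings)."""
    rng = random.Random(seed)
    return {ch: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for ch in alphabet}


def make_filters(count, width, dim, seed=1):
    """Create `count` random (weights, bias) filters spanning `width` characters."""
    rng = random.Random(seed)
    return [
        ([rng.uniform(-1.0, 1.0) for _ in range(width * dim)], 0.0)
        for _ in range(count)
    ]


def conv_max_pool(text, embeddings, filters, width, dim):
    """Convolve each filter over character windows, then ReLU and max-over-time."""
    unknown = [0.0] * dim  # characters outside the alphabet map to zeros
    vectors = [embeddings.get(ch, unknown) for ch in text]
    features = []
    for weights, bias in filters:
        best = 0.0  # ReLU floor: negative activations are clipped to zero
        for start in range(len(vectors) - width + 1):
            window = [x for vec in vectors[start:start + width] for x in vec]
            activation = sum(w * x for w, x in zip(weights, window)) + bias
            best = max(best, activation)
        features.append(best)
    return features


DIM, WIDTH = 4, 3
embeddings = make_embeddings("abcdefghijklmnopqrstuvwxyz #!", DIM)
filters = make_filters(count=5, width=WIDTH, dim=DIM)
short_features = conv_max_pool("omg #late", embeddings, filters, WIDTH, DIM)
long_features = conv_max_pool(
    "running so late again today #mondays", embeddings, filters, WIDTH, DIM
)
```

Both posts yield one value per filter, so a downstream classifier sees the same input size regardless of post length. In a trained model each filter would learn to respond to a character pattern that helps separate authors.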

How PapersFlow Helps You Research Authorship Attribution in Social Media

Discover & Search

Research Agent uses searchPapers and exaSearch to find top papers like Schwartz et al. (2013) on personality-gender in Facebook texts. citationGraph reveals connections from Cheng et al. (2011) to Shrestha et al. (2017) CNNs. findSimilarPapers expands from Miller et al. (2012) Twitter n-grams.

Analyze & Verify

Analysis Agent applies readPaperContent to extract features from Shrestha et al. (2017), then runPythonAnalysis reproduces n-gram classifiers with pandas/NumPy on tweet datasets. verifyResponse (CoVe) checks claims against Nguyen et al. (2016) survey. GRADE scores evidence strength for noisy text challenges.

Synthesize & Write

Synthesis Agent detects gaps in multi-author defenses following Shetty et al. (2023) and flags contradictions in reported feature efficacy. Writing Agent uses latexEditText for methods sections, latexSyncCitations for 10+ papers, latexCompile for full reviews, and exportMermaid for stylometry pipeline diagrams.

Use Cases

"Reproduce n-gram gender prediction accuracy on Twitter data from Miller et al. 2012."

Research Agent → searchPapers → Analysis Agent → runPythonAnalysis (pandas tokenization, scikit-learn classifier) → matplotlib accuracy plot and CSV export.

"Draft LaTeX review comparing CNN vs ensemble authorship methods."

Synthesis Agent → gap detection → Writing Agent → latexEditText (intro/methods) → latexSyncCitations (Shrestha 2017, Abbasi 2022) → latexCompile → PDF output.

"Find GitHub repos implementing social media attribution from recent papers."

Research Agent → citationGraph (Shetty 2023) → Code Discovery → paperExtractUrls → paperFindGithubRepo → githubRepoInspect → runnable Jupyter notebooks.

Automated Workflows

Deep Research scans 50+ papers from Schwartz et al. (2013) to Abbasi et al. (2022) and outputs a structured report on feature evolution. DeepScan verifies claims in Shrestha et al. (2017) via 7-step CoVe with runPythonAnalysis checkpoints. Theorizer generates hypotheses on adversarial defenses from Shetty et al. (2023) and Nguyen et al. (2016).

Frequently Asked Questions

What defines authorship attribution in social media?

It identifies authors or profiles users from short texts, using features such as n-grams and emojis and models such as CNNs, on platforms like Twitter and Facebook (Shrestha et al., 2017; Miller et al., 2012).

What are main methods used?

Character n-grams for gender (Miller et al., 2012), CNNs for short texts (Shrestha et al., 2017), ensembles for anonymity (Abbasi et al., 2022).

What are key papers?

Schwartz et al. (2013, 1701 citations) on personality/gender; Cheng et al. (2011, 220 citations) on gender ID; Shrestha et al. (2017, 171 citations) on CNNs.

What open problems exist?

Handling evolving slang, mixed styles in multi-author accounts, and adversarial anonymization all remain open (Nguyen et al., 2016; Shetty et al., 2023).

Part of the Authorship Attribution and Profiling Research Guide