Subtopic Deep Dive
Authorship Attribution in Social Media
Research Guide
What is Authorship Attribution in Social Media?
Authorship attribution in social media identifies authors or profiles users from short, noisy texts using linguistic features, emojis, hashtags, and deep learning on platforms like Twitter and Facebook.
This subtopic covers the analysis of brief posts, where language evolves quickly and a single account may have several authors. More than 10 key papers published since 2011 explore n-gram features, CNNs, and correlations between language and personality or gender; the most cited is Schwartz et al. (2013), with 1,701 citations. Cybersecurity applications include fake account detection.
Why It Matters
Authorship attribution enables content moderation by identifying trolls and fake accounts on social platforms (Schwartz et al., 2013; Cheng et al., 2011). It combats misinformation through gender and personality profiling from digital footprints (Hinds and Joinson, 2018). Applications include cybersecurity forensics and anonymization defenses (Shetty et al., 2023; Abbasi et al., 2022).
Key Research Challenges
Short Noisy Texts
Social media posts are short, so they offer few of the cues that traditional stylometric features rely on. Character n-grams and CNNs address this (Shrestha et al., 2017; Miller et al., 2012). Emojis and hashtags add noise that complicates the stylistic signal.
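To make the character n-gram idea concrete, the sketch below counts overlapping character trigrams in a single post. It is a minimal illustration, not the feature pipeline of any cited paper: the function name `char_ngrams`, the whitespace normalization, and the example post are all assumptions introduced here.

```python
from collections import Counter

def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in one post."""
    # Lowercase and collapse runs of whitespace; '#', punctuation, and emojis
    # are kept, because on short texts they may carry stylistic information.
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))

# Hypothetical example post, not taken from any dataset.
post = "sooo tired #mondays 😴"
features = char_ngrams(post, n=3)
print(features.most_common(3))
```

Because the window slides over characters rather than words, elongated spellings such as "sooo" and symbols such as "#" still produce features even when a post contains only a handful of words.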
Evolving Language Patterns
Platform slang and trends shift rapidly, which degrades models trained on older data. Surveys highlight the need to account for sociolinguistic variation (Nguyen et al., 2016). Adversarial training counters attribute leakage (Shetty et al., 2023).
Multi-Author Accounts
Shared accounts mix styles, evading attribution. Ensemble learning tackles anonymity in texts (Abbasi et al., 2022). Dialectal variations add complexity (Zampieri et al., 2017).
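One simple form of ensemble learning is hard majority voting, sketched below. This is a generic illustration and not the method of Abbasi et al. (2022); the three base classifiers are hypothetical stand-ins for trained models, and the labels `author_a` and `author_b` are invented for the example.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

Classifier = Callable[[str], str]  # maps a post to a predicted author label

def majority_vote(post: str, classifiers: Sequence[Classifier]) -> Tuple[str, float]:
    """Return the most common predicted label and the share of votes it received."""
    votes = Counter(clf(post) for clf in classifiers)
    # On a tie, most_common keeps the label that was voted for first.
    label, count = votes.most_common(1)[0]
    return label, count / len(classifiers)

# Hypothetical base models; each looks at a single surface cue.
def by_emoji(post: str) -> str:
    return "author_a" if "😂" in post else "author_b"

def by_hashtag(post: str) -> str:
    return "author_a" if "#" in post else "author_b"

def by_length(post: str) -> str:
    return "author_a" if len(post) < 40 else "author_b"

models = [by_emoji, by_hashtag, by_length]
label, agreement = majority_vote("lol 😂 #mood", models)
print(label, agreement)
```

The agreement share is returned alongside the label because, under the assumptions of this sketch, a low value on many posts from one account could be one hint that several writing styles are mixed.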
Essential Papers
Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach
H. Andrew Schwartz, Johannes C. Eichstaedt, Margaret L. Kern et al. · 2013 · PLoS ONE · 1.7K citations
We analyzed 700 million words, phrases, and topic instances collected from the Facebook messages of 75,000 volunteers, who also took standard personality tests, and found striking variations in lan...
Author gender identification from text
Na Cheng, R. Chandramouli, K. P. Subbalakshmi · 2011 · Digital Investigation · 220 citations
Computational Sociolinguistics: A Survey
Dong Nguyen, A. Seza Doğruöz, Carolyn Penstein Rosé et al. · 2016 · Computational Linguistics · 219 citations
Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimens...
Do Characters Abuse More Than Words?
Yashar Mehdad, Joel Tetreault · 2016 · 198 citations
Although word and character n-grams have been used as features in different NLP applications, no systematic comparison or analysis has shown the power of character-based features for detecting abus...
Convolutional Neural Networks for Authorship Attribution of Short Texts
Prasha Shrestha, Sebastián Sierra, Fabio A. González et al. · 2017 · 171 citations
Prasha Shrestha, Sebastian Sierra, Fabio González, Manuel Montes, Paolo Rosso, Thamar Solorio. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Lingui...
Findings of the VarDial Evaluation Campaign 2017
Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić et al. · 2017 · 145 citations
This paper describes the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2021. The campaign was part of the eighth workshop on Natural Language Processing (NLP) for...
What demographic attributes do our digital footprints reveal? A systematic review
Joanne Hinds, Adam Joinson · 2018 · PLoS ONE · 102 citations
To what extent does our online activity reveal who we are? Recent research has demonstrated that the digital traces left by individuals as they browse and interact with others online may reveal who...
Reading Guide
Foundational Papers
Start with Schwartz et al. (2013) for open-vocabulary analysis of personality, gender, and age across 700 million words, phrases, and topic instances from Facebook messages; Cheng et al. (2011) for gender identification baselines; Miller et al. (2012) for n-gram features on Twitter.
Recent Advances
Study Shrestha et al. (2017) CNNs for short texts; Shetty et al. (2023) adversarial anonymization; Abbasi et al. (2022) ensemble attribution.
Core Methods
Core techniques: character/word n-grams (Miller et al., 2012), CNN classifiers (Shrestha et al., 2017), open-vocabulary language analysis (Schwartz et al., 2013), ensembles (Abbasi et al., 2022).
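As a rough illustration of n-gram based attribution, the sketch below builds one character trigram profile per known author and assigns a new post to the author whose profile is closest by cosine similarity. The nearest-profile rule is a simple baseline chosen for this example, not the approach of any paper listed here; the function names, user labels, and posts are all hypothetical.

```python
import math
from collections import Counter

def profile(texts, n=3):
    """Sum character n-gram counts over all of an author's posts."""
    counts = Counter()
    for text in texts:
        t = text.lower()
        counts.update(t[i:i + n] for i in range(len(t) - n + 1))
    return counts

def cosine(a, b):
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def attribute(post, author_profiles, n=3):
    """Return the author whose profile is most similar to the post."""
    query = profile([post], n)
    return max(author_profiles, key=lambda author: cosine(query, author_profiles[author]))

# Hypothetical toy posts; real studies use far more text per author.
known = {
    "user_1": ["omg sooo good!!! #blessed", "sooo tired!!! #mondays"],
    "user_2": ["Meeting moved to 3pm. Please confirm.", "Report attached. Please review."],
}
profiles = {author: profile(posts) for author, posts in known.items()}
print(attribute("sooo hungry!!! #lunch", profiles))
```

With so little text per author, the result depends heavily on a few shared trigrams, which is the short-text problem the methods above try to address.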
How PapersFlow Helps You Research Authorship Attribution in Social Media
Discover & Search
Research Agent uses searchPapers and exaSearch to find top papers like Schwartz et al. (2013) on personality-gender in Facebook texts. citationGraph reveals connections from Cheng et al. (2011) to Shrestha et al. (2017) CNNs. findSimilarPapers expands from Miller et al. (2012) Twitter n-grams.
Analyze & Verify
Analysis Agent applies readPaperContent to extract features from Shrestha et al. (2017), then runPythonAnalysis reproduces n-gram classifiers with pandas/NumPy on tweet datasets. verifyResponse (CoVe) checks claims against Nguyen et al. (2016) survey. GRADE scores evidence strength for noisy text challenges.
Synthesize & Write
Synthesis Agent detects gaps in multi-author defenses since Shetty et al. (2023) and flags contradictions in reported feature efficacy. Writing Agent uses latexEditText for methods sections, latexSyncCitations for 10+ papers, latexCompile for full reviews, and exportMermaid for stylometry pipelines.
Use Cases
"Reproduce n-gram gender prediction accuracy on Twitter data from Miller et al. 2012."
Research Agent → searchPapers → Analysis Agent → runPythonAnalysis (pandas tokenization, scikit-learn classifier) → matplotlib accuracy plot and CSV export.
"Draft LaTeX review comparing CNN vs ensemble authorship methods."
Synthesis Agent → gap detection → Writing Agent → latexEditText (intro/methods) → latexSyncCitations (Shrestha 2017, Abbasi 2022) → latexCompile → PDF output.
"Find GitHub repos implementing social media attribution from recent papers."
Research Agent → citationGraph (Shetty 2023) → Code Discovery → paperExtractUrls → paperFindGithubRepo → githubRepoInspect → runnable Jupyter notebooks.
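The analysis step in the first use case (tokenize, train a classifier, report accuracy) can be sketched without any external libraries. The example below swaps the pandas and scikit-learn stack named in that workflow for a small hand-written multinomial naive Bayes over character trigrams, so it is a stand-in rather than a reproduction of Miller et al. (2012). The class name, the neutral labels "a" and "b", and the toy posts are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def ngrams(text, n=3):
    """Return the list of overlapping character n-grams in a lowercased post."""
    t = text.lower()
    return [t[i:i + n] for i in range(len(t) - n + 1)]

class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams with add-one smoothing."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter(labels)         # label -> number of training posts
        for text, label in zip(texts, labels):
            self.counts[label].update(ngrams(text))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, text):
        grams = ngrams(text)
        n_docs = sum(self.docs.values())

        def log_score(label):
            # Smoothed denominator: n-gram tokens seen for this label plus vocabulary size.
            total = sum(self.counts[label].values()) + len(self.vocab)
            prior = math.log(self.docs[label] / n_docs)
            return prior + sum(math.log((self.counts[label][g] + 1) / total) for g in grams)

        return max(self.docs, key=log_score)

def accuracy(model, texts, labels):
    """Fraction of posts whose predicted label matches the true label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)

# Hypothetical toy split; a real experiment needs a held-out set of realistic size.
train_texts = ["omg sooo good!!! #blessed", "sooo tired!!! #mondays",
               "Meeting moved to 3pm. Please confirm.", "Report attached. Please review."]
train_labels = ["a", "a", "b", "b"]
test_texts = ["sooo hungry!!! #lunch", "Please confirm the report."]
test_labels = ["a", "b"]

model = NgramNaiveBayes().fit(train_texts, train_labels)
print(accuracy(model, test_texts, test_labels))
```

Accuracy on a toy split like this says nothing about real performance; the point is only the shape of the pipeline, with features, a classifier, and a held-out evaluation as separate steps.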
Automated Workflows
Deep Research scans 50+ papers from Schwartz et al. (2013) to Abbasi et al. (2022) and outputs a structured report on feature evolution. DeepScan verifies claims in Shrestha et al. (2017) via 7-step CoVe with runPythonAnalysis checkpoints. Theorizer generates hypotheses on adversarial defenses from Shetty et al. (2023) and Nguyen et al. (2016).
Frequently Asked Questions
What defines authorship attribution in social media?
It identifies authors, or infers author profiles, from short texts using features such as n-grams and emojis and models such as CNNs, on platforms like Twitter and Facebook (Shrestha et al., 2017; Miller et al., 2012).
What are main methods used?
Character n-grams for gender (Miller et al., 2012), CNNs for short texts (Shrestha et al., 2017), ensembles for anonymity (Abbasi et al., 2022).
What are key papers?
Schwartz et al. (2013, 1701 citations) on personality/gender; Cheng et al. (2011, 220 citations) on gender ID; Shrestha et al. (2017, 171 citations) on CNNs.
What open problems exist?
Handling evolving slang, multi-author mixing, and adversarial anonymization persist (Nguyen et al., 2016; Shetty et al., 2023).
Research Authorship Attribution and Profiling with AI
PapersFlow provides specialized AI tools for Computer Science researchers. Here are the most relevant for this topic:
AI Literature Review
Automate paper discovery and synthesis across 474M+ papers
Code & Data Discovery
Find datasets, code repositories, and computational tools
Deep Research Reports
Multi-source evidence synthesis with counter-evidence
AI Academic Writing
Write research papers with AI assistance and LaTeX support