Subtopic Deep Dive
Text Mining and Social Media Analysis
Research Guide
What is Text Mining and Social Media Analysis?
Text Mining and Social Media Analysis applies natural language processing, topic modeling, and sentiment analysis to extract insights from social media texts and unstructured data.
This subtopic covers techniques such as keyword extraction, fuzzy K-means clustering, and text network analysis for processing large-scale social media corpora (Rashid et al., 2019; Park, 2019). Key applications include real-time Twitter trend detection and election opinion mining (Bae et al., 2013; Rodrigues et al., 2021). More than 30 papers have appeared since 2013; the most cited in this guide is Huh's (2018) work on keyword extraction for personalized health activities, with 68 citations.
Why It Matters
Text mining of social media enables real-time tracking of public opinion during elections, as shown by Bae et al.'s (2013) Twitter analysis of the 2012 South Korea presidential election. In healthcare, Huh (2018) applied keyword extraction to big data for personalized obesity interventions, while Song and Ryu (2015) developed a big data analysis framework for the healthcare and social sectors. Rodrigues et al. (2021, 32 citations) used machine learning for real-time Twitter trend analysis, which supports crisis management and marketing.
Key Research Challenges
Handling Noisy Social Data
Social media texts contain slang, emojis, and non-standard formats, all of which complicate preprocessing. Bae et al. (2013) addressed this in their Twitter election analysis but noted limits on real-time processing. Current methods still struggle with multilingual text and evolving slang.
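To make the preprocessing problem concrete, here is a minimal normalization sketch using only the Python standard library. It is not the pipeline used by Bae et al. (2013). The `SLANG` map, the regular expressions, and the function name `normalize_post` are all hypothetical choices made for illustration.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
REPEAT_RE = re.compile(r"(.)\1{2,}")   # three or more of the same character
TOKEN_RE = re.compile(r"[a-z0-9']+")   # ASCII-only: emojis and non-Latin text are dropped

# Hypothetical slang dictionary; a real one would be much larger and would go stale.
SLANG = {"u": "you", "gr8": "great", "idk": "i do not know"}


def normalize_post(text: str) -> list[str]:
    """Lowercase, strip URLs and mentions, unwrap hashtags, squeeze repeats, expand slang."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)    # "#election2012" -> "election2012"
    text = REPEAT_RE.sub(r"\1\1", text)   # "sooooo" -> "soo"
    tokens = []
    for tok in TOKEN_RE.findall(text):
        tokens.extend(SLANG.get(tok, tok).split())
    return tokens


print(normalize_post("@bob U r gr8!!! sooooo happy #Election2012 https://t.co/x"))
```

The ASCII-only tokenizer shows the limitation noted above: emojis and non-English posts vanish entirely, and a static slang map cannot keep up with new vocabulary.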
Scalable Topic Modeling
Extracting coherent topics from massive corpora requires efficient clustering methods such as fuzzy K-means. Rashid et al. (2019) combined inverse document frequency with fuzzy K-means for biomedical texts, yet scaling to petabyte-scale social data remains challenging, and computational overhead limits real-time use.
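The sketch below illustrates the general idea of soft clustering over IDF-weighted document vectors. It uses the textbook fuzzy c-means (fuzzy K-means) update rules and a plain TF-IDF weighting, not the specific hybrid scheme of Rashid et al. (2019). The toy documents, the evenly spaced initialization, and all function names are assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, L2-normalised TF-IDF vectors) for tokenised documents."""
    n = len(docs)
    vocab = sorted({t for d in docs for t in d})
    df = Counter(t for d in docs for t in set(d))
    # Smoothed IDF; the "+ 1.0" keeps terms that occur in every document.
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def memberships(points, centers, m):
    """Membership u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Small offset avoids division by zero when a point sits on a center.
        d = [math.dist(p, c) + 1e-9 for c in centers]
        rows.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return rows


def fuzzy_c_means(points, c, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    n, dim = len(points), len(points[0])
    # Deterministic start from evenly spaced points (an assumption;
    # random restarts are more common in practice).
    centers = [list(points[i * n // c]) for i in range(c)]
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][k] for i in range(n)) / total for k in range(dim)]
            )
    return memberships(points, centers, m), centers


# Toy corpus (hypothetical): two health posts, two election posts.
docs = [
    ["obesity", "diet", "health", "exercise"],
    ["diet", "health", "obesity", "weight"],
    ["election", "vote", "candidate", "twitter"],
    ["vote", "candidate", "election", "poll"],
]
vocab, vectors = tfidf_vectors(docs)
u, centers = fuzzy_c_means(vectors, c=2)
for doc, row in zip(docs, u):
    print(doc, [round(x, 3) for x in row])
```

Each membership update costs time proportional to documents × clusters × vocabulary size, and the loop repeats every iteration. That cost is one reason dense fuzzy clustering is hard to run in real time on very large corpora.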
Sentiment Accuracy in Contexts
Sentiment classifiers such as Naive Bayes struggle with sarcasm and context shifts in reviews. Zuo (2018, 22 citations) applied Naive Bayes and decision trees to Steam reviews, but domain adaptation across platforms remains unresolved. Rodrigues et al. (2021) highlighted the need for machine learning in Twitter trend analysis.
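For readers unfamiliar with the classifier, the following is a minimal multinomial Naive Bayes sketch with Laplace smoothing, written against the standard library. It is an illustration only, not Zuo's (2018) implementation. The class name, the tiny training set, and the labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Bag-of-words multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for counts in self.word_counts.values() for t in counts}
        return self

    def log_scores(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(count / n_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # words never seen in training are skipped
                    num = self.word_counts[label][t] + self.alpha
                    score += math.log(num / (total + self.alpha * v))
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical, hand-made training reviews.
train_docs = [
    ["great", "fun", "game"],
    ["love", "this", "game"],
    ["boring", "buggy", "game"],
    ["hate", "the", "crashes"],
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayesSentiment().fit(train_docs, train_labels)
print(model.predict(["fun", "great"]))
print(model.predict(["buggy", "crashes"]))
```

Because the model treats each review as an unordered bag of words, it has no way to tell a sincere "great" from a sarcastic one. Its word statistics are also tied to the training platform, which is the domain adaptation problem described above.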
Essential Papers
Big Data Analysis for Personalized Health Activities: Machine Learning Processing for Automatic Keyword Extraction Approach
Jun‐Ho Huh · 2018 · Symmetry · 68 citations
The obese population is increasing rapidly due to the change of lifestyle and diet habits. Obesity can cause various complications and is becoming a social disease. Nonetheless, many obese patients...
Topic Modeling Technique for Text Mining Over Biomedical Text Corpora Through Hybrid Inverse Documents Frequency and Fuzzy K-Means Clustering
Junaid Rashid, Syed Muhammad Adnan Shah, Aun Irtaza et al. · 2019 · IEEE Access · 44 citations
Text data plays an imperative role in the biomedical domain. As patient's data comprises of a huge amount of text documents in a non-standardized format. In order to obtain the relevant data, the t...
Content Analytics: The Definition, Scope, and an Overview of Published Research
Vitomir Kovanović, Srećko Joksimović, Dragan Gašević et al. · 2017 · Society for Learning Analytics Research (SoLAR) eBooks · 38 citations
With the large amounts of data related to student learning being collected by digital systems, the potential for using this data for improving learning processes educational researchers, practition...
Big Data Analysis Framework for Healthcare and Social Sectors in Korea
Tae-Min Song, Seewon Ryu · 2015 · Healthcare Informatics Research · 36 citations
There are some concerns with the utilization of big data in the healthcare and social welfare sectors. Thus, research on these issues must be conducted so that sophisticated and practical solutions...
Using Text Network Analysis for Analyzing Academic Papers in Nursing
Chan Sook Park · 2019 · Perspectives in Nursing Science · 33 citations
Purpose: This study examined the suitability of using text network analysis (TNA) methodology for topic analysis of academic papers related to nursing. Methods: TNA background theories, software pr...
Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques
Junghwan Bae, Jieun Son, Min Song · 2013 · Journal of Intelligence and Information Systems · 33 citations
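Social media has recently become a global communication tool. Because using it requires no specialized knowledge or skills, it lets users produce and share content in real time, transforming existing modes of communication. In particular, as a new medium of communication, it spreads domestic and international social issues in real time and lets users share their opinions with acquaintances and the public, and more broadly, society...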
Real‐Time Twitter Trend Analysis Using Big Data Analytics and Machine Learning Techniques
Anisha P Rodrigues, Roshan Fernandes, Adarsh Bhandary et al. · 2021 · Wireless Communications and Mobile Computing · 32 citations
Twitter is a popular microblogging social media, using which its users can share useful information. Keeping a track of user postings and common hashtags allows us to understand what is happening a...
Reading Guide
Foundational Papers
Start with Bae et al. (2013, 33 citations) for Twitter text mining in elections, then read Logunov et al. (2011) on the predictability of sentiment series. Together they establish core social media analysis techniques.
Recent Advances
Study Huh (2018, 68 citations) for keyword extraction, Rashid et al. (2019, 44 citations) for hybrid topic modeling, and Rodrigues et al. (2021, 32 citations) for real-time trends.
Core Methods
Core techniques include keyword extraction (Huh, 2018), fuzzy K-means with inverse document frequency (Rashid et al., 2019), text network analysis (Park, 2019), and Naive Bayes and decision tree classifiers (Zuo, 2018).
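As a rough illustration of text network analysis, the sketch below builds a word co-occurrence network and ranks words by weighted degree. It assumes that two words are linked whenever they appear in the same document, which is one common construction and not necessarily the one used by Park (2019). The sample documents and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs):
    """Count, for each unordered word pair, the number of documents containing both."""
    edges = Counter()
    for tokens in docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum of edge weights attached to each word (a simple centrality measure)."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Hypothetical tokenised titles.
docs = [
    ["nursing", "care", "patient"],
    ["nursing", "education", "student"],
    ["patient", "care", "safety"],
]
edges = cooccurrence_edges(docs)
degree = weighted_degree(edges)
print(edges.most_common(3))
print(degree.most_common(3))
```

Words with high weighted degree sit at the center of the network and are natural candidates for topic labels. Heavily weighted edges point to word pairs that tend to be discussed together.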
How PapersFlow Helps You Research Text Mining and Social Media Analysis
Discover & Search
Research Agent uses searchPapers and exaSearch to find core papers like Bae et al. (2013) on Twitter election text mining, then citationGraph reveals 33 citing works on social media trends. findSimilarPapers expands to Rodrigues et al. (2021) for real-time Twitter analysis.
Analyze & Verify
Analysis Agent uses readPaperContent on Huh (2018) to extract keyword methods, verifies claims via verifyResponse (CoVe) against the topic modeling in Rashid et al. (2019), and calls runPythonAnalysis with pandas to compute sentiment dataset statistics from Zuo (2018). GRADE grading scores the strength of evidence in social media applications.
Synthesize & Write
Synthesis Agent detects gaps in real-time trend papers such as Rodrigues et al. (2021) and flags contradictions between the preprocessing approaches of Bae et al. (2013) and Huh (2018). Writing Agent uses latexEditText, latexSyncCitations for Bae et al., and latexCompile to produce reports, with exportMermaid for topic model diagrams.
Use Cases
"Replicate sentiment analysis from Zuo (2018) Steam reviews on my Twitter health dataset"
Research Agent → searchPapers(Zuo 2018) → Analysis Agent → runPythonAnalysis(Naive Bayes on uploaded CSV) → outputs classified sentiments and accuracy metrics.
"Write a review paper on Twitter election mining citing Bae et al. (2013)"
Research Agent → citationGraph(Bae 2013) → Synthesis Agent → gap detection → Writing Agent → latexSyncCitations + latexCompile → generates LaTeX PDF with 33 citations integrated.
"Find GitHub code for fuzzy K-means topic modeling from Rashid et al. (2019)"
Research Agent → paperExtractUrls(Rashid 2019) → Code Discovery → paperFindGithubRepo → githubRepoInspect → delivers runnable fuzzy K-means scripts for social media texts.
Automated Workflows
Deep Research workflow conducts systematic review: searchPapers on 'Twitter text mining' → clusters 50+ papers like Bae et al. (2013) and Rodrigues et al. (2021) → structured report with GRADE scores. DeepScan applies 7-step analysis with CoVe checkpoints to verify Huh (2018) keyword extraction on custom datasets. Theorizer generates hypotheses on scalable sentiment from Zuo (2018) and Rashid et al. (2019).
Frequently Asked Questions
What defines Text Mining and Social Media Analysis?
It applies NLP, topic modeling, and sentiment analysis to extract insights from social media and unstructured texts, as in the Twitter election study by Bae et al. (2013).
What are key methods used?
Methods include keyword extraction (Huh, 2018), fuzzy K-means clustering (Rashid et al., 2019), text network analysis (Park, 2019), and Naive Bayes sentiment (Zuo, 2018).
What are major papers?
Top papers: Huh (2018, 68 citations) on health keywords; Bae et al. (2013, 33 citations) on election Twitter; Rashid et al. (2019, 44 citations) on biomedical topics.
What open problems exist?
Challenges include noisy data handling, scalable real-time modeling, and context-aware sentiment, as noted in Rodrigues et al. (2021) and Zuo (2018).
Part of the Technology and Data Analysis Research Guide