Subtopic Deep Dive

Query Analysis and Reformulation
Research Guide

What is Query Analysis and Reformulation?

Query Analysis and Reformulation is the process of examining user queries to infer intent, correct errors, and expand terms for improved information retrieval performance.

Researchers analyze query logs to identify patterns in user behavior and reformulate queries with techniques such as automatic expansion and spell correction. Key studies include log analyses of AltaVista (Silverstein et al., 1999, 1155 citations) and of web queries generally (Jansen et al., 2000, 1318 citations). More than ten of the curated papers focus on query characteristics and expansion methods.
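The spell correction these systems apply can be sketched as a simple similarity match against a term dictionary. A minimal illustration using Python's standard library (the dictionary and cutoff are illustrative assumptions, not taken from any of the papers):

```python
# Toy spell correction: map a possibly misspelled query term to the
# closest dictionary term by string similarity. Dictionary is illustrative.
from difflib import get_close_matches

DICTIONARY = ["retrieval", "reformulation", "query", "expansion"]

def correct(term: str) -> str:
    # get_close_matches ranks candidates by difflib's similarity ratio;
    # below the cutoff, the term is returned unchanged.
    matches = get_close_matches(term, DICTIONARY, n=1, cutoff=0.6)
    return matches[0] if matches else term

print(correct("retreival"))  # -> retrieval
```

Production spell correctors instead learn corrections from query logs, but the dictionary-lookup shape is the same.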

15 Curated Papers · 3 Key Challenges

Why It Matters

Query analysis reduces the vocabulary mismatch between users and systems, as shown by Furnas et al. (1987, 1478 citations), enabling systems to match users' spontaneously chosen terms to documents. Jansen et al. (2000) analyzed real web queries, revealing the short lengths and under-specification that reformulation addresses in order to lower abandonment. Rose and Levinson (2004, 933 citations) classified search goals such as navigational and informational, improving result relevance in search engines.

Key Research Challenges

Vocabulary Mismatch

Users spontaneously choose terms that systems fail to match (Furnas et al., 1987), so systems must anticipate user vocabulary rather than expect users to learn the system's terms. Log analyses confirm that most query terms are rare (Silverstein et al., 1999).

Inferring User Intent

Queries carry little context, complicating goal classification (Rose and Levinson, 2004). Jansen et al. (2000) found that users often need multiple refinements across sessions. Distinguishing informational from navigational searches remains difficult.
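The navigational/informational distinction can be illustrated with a toy rule-based classifier (the heuristics below are invented for illustration; Rose and Levinson's actual framework is a manually applied taxonomy, not these rules):

```python
import re

def classify_intent(query: str) -> str:
    """Toy heuristic classifier for the navigational vs. informational
    distinction. Rules are illustrative only; real classifiers learn
    from click logs and query features."""
    tokens = query.lower().split()
    if not tokens:
        return "informational"
    # Domain-like tokens suggest a navigational goal (reach a known site).
    if any(re.search(r"\.(com|org|net|edu)$", t) for t in tokens):
        return "navigational"
    # Question words typically open informational queries.
    if tokens[0] in {"how", "what", "why", "when", "who", "where"}:
        return "informational"
    return "informational"  # default: most web queries are informational

print(classify_intent("weather.com"))          # navigational
print(classify_intent("how do routers work"))  # informational
```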

Effective Expansion

Query expansion risks topic drift (Qiu and Frei, 1993). Probabilistic models built on similarity thesauri help, but require careful evaluation (Järvelin and Kekäläinen, 2000). Balancing precision against recall remains an open trade-off.
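Concept-based expansion in the style of Qiu and Frei can be sketched as adding thesaurus neighbours with damped weights, one simple way to limit topic drift (the thesaurus entries, similarity cutoff, and weighting below are fabricated for illustration):

```python
# Toy concept-based query expansion: add thesaurus neighbours whose
# similarity to an original term exceeds a cutoff, down-weighting them
# so the original terms dominate. Entries are fabricated.
THESAURUS = {
    "car":   [("automobile", 0.9), ("vehicle", 0.7), ("engine", 0.3)],
    "cheap": [("inexpensive", 0.8), ("budget", 0.6)],
}

def expand_query(terms, min_sim=0.5, weight=0.5):
    expanded = {t: 1.0 for t in terms}  # original terms keep full weight
    for t in terms:
        for neighbour, sim in THESAURUS.get(t, []):
            if sim >= min_sim and neighbour not in expanded:
                expanded[neighbour] = weight * sim  # damped expansion weight
    return expanded

print(expand_query(["cheap", "car"]))
```

Raising `min_sim` or lowering `weight` trades recall for protection against drift, which is exactly the balance the challenge describes.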

Essential Papers

1.

The vocabulary problem in human-system communication

George W. Furnas, Thomas K. Landauer, Louis M. Gomez et al. · 1987 · Communications of the ACM · 1.5K citations

In almost all computer applications, users must enter correct words for the desired objects or actions. For success without extensive training, or in first-tries for new targets, the system must re...

2.

Real life, real users, and real needs: a study and analysis of user queries on the web

Bernard J. Jansen, Amanda Spink, Tefko Saračević · 2000 · Information Processing & Management · 1.3K citations

3.

Information filtering and information retrieval: two sides of the same coin?

Nicholas J. Belkin, W. Bruce Croft · 1992 · Communications of the ACM · 1.3K citations

4.

Analysis of a very large web search engine query log

Craig Silverstein, Hannes Marais, Monika Henzinger et al. · 1999 · ACM SIGIR Forum · 1.2K citations

In this paper we present an analysis of an AltaVista Search Engine query log consisting of approximately 1 billion entries for search requests over a period of six weeks. This represents almost 285...

5.

IR evaluation methods for retrieving highly relevant documents

Kalervo Järvelin, Jaana Kekäläinen · 2000 · 958 citations

This paper proposes evaluation methods based on the use of non-dichotomous relevance judgements in IR experiments. It is argued that evaluation methods should credit IR methods for their ability to...

6.

Understanding user goals in web search

Daniel E. Rose, Danny Levinson · 2004 · 933 citations

Previous work on understanding user web search behavior has focused on how people search and what they are searching for, but not why they are searching. In this paper, we describe a framework for ...

7.

How are we searching the World Wide Web? A comparison of nine search engine transaction logs

Bernard J. Jansen, Amanda Spink · 2005 · Information Processing & Management · 802 citations

Reading Guide

Foundational Papers

Start with Furnas et al. (1987) for the core vocabulary-mismatch problem, then Jansen et al. (2000) for analysis of real web queries, and Silverstein et al. (1999) for large-scale log insights.

Recent Advances

Rose and Levinson (2004) on user goals; Pan et al. (2007) on trust in reformulated results; Jansen and Spink (2005) comparing search engine logs.

Core Methods

Query log mining (Silverstein et al., 1999), probabilistic concept-based expansion (Qiu and Frei, 1993), and non-dichotomous evaluation (Järvelin and Kekäläinen, 2000).
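The non-dichotomous evaluation idea is what later measures such as discounted cumulative gain build on. A minimal sketch of (n)DCG over graded judgements, using the common log-base-2 discount (a simplification for illustration, not the paper's exact formulation):

```python
import math

def dcg(grades):
    """Discounted cumulative gain over graded relevance judgements
    (e.g. 0 = non-relevant ... 3 = highly relevant), in ranked order."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(grades))

def ndcg(grades):
    """DCG normalised by the ideal (descending-grade) ranking."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0

ranking = [3, 2, 0, 1]  # system ranking with graded judgements
print(round(ndcg(ranking), 3))
```

Unlike binary precision/recall, this credits systems for ranking highly relevant documents first, which is the paper's central argument.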

How PapersFlow Helps You Research Query Analysis and Reformulation

Discover & Search

The Research Agent runs searchPapers on 'query reformulation techniques' to find Jansen et al. (2000); citationGraph then reveals backward citations to Furnas et al. (1987), and findSimilarPapers surfaces Silverstein et al. (1999) for parallel log analyses.

Analyze & Verify

The Analysis Agent applies readPaperContent to extract query-length statistics from Silverstein et al. (1999), verifies claims against Jansen et al. (2000) with CoVe, runs PythonAnalysis on log data to quantify reformulation impact, and rates evidence strength with GRADE.

Synthesize & Write

The Synthesis Agent detects gaps in intent classification after Rose and Levinson (2004) and flags tensions with Belkin and Croft's (1992) view of filtering versus retrieval, while the Writing Agent uses latexEditText, latexSyncCitations for references such as Furnas et al. (1987), and latexCompile to produce query-expansion surveys.

Use Cases

"Analyze query log statistics for reformulation research"

Research Agent → searchPapers → Analysis Agent → runPythonAnalysis (pandas on extracted log stats from Silverstein et al.) → matplotlib plots of query lengths and reformulation needs.
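The kind of analysis this workflow automates can be sketched directly: query-length statistics over a small sample of web queries, in the style of the log studies (the sample data and column names below are fabricated for illustration):

```python
import pandas as pd

# Fabricated sample standing in for extracted log data.
queries = pd.DataFrame({"query": [
    "weather", "cheap flights to paris", "python", "how to tie a tie",
    "news", "best laptop 2024 under 1000",
]})

# Terms per query: the basic statistic the log studies report.
queries["n_terms"] = queries["query"].str.split().str.len()

print(queries["n_terms"].describe())     # mean, quartiles, etc.
print((queries["n_terms"] <= 2).mean())  # share of very short queries
```

Plotting `queries["n_terms"]` as a histogram with matplotlib would reproduce the query-length distributions those papers discuss.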

"Write survey on query intent classification"

Synthesis Agent → gap detection → Writing Agent → latexEditText for sections → latexSyncCitations (Rose and Levinson) → latexCompile → PDF with bibliography.

"Find code for query expansion models"

Research Agent → searchPapers 'concept query expansion' → Code Discovery → paperExtractUrls (Qiu and Frei) → paperFindGithubRepo → githubRepoInspect for thesaurus implementations.

Automated Workflows

Deep Research workflow scans 50+ papers on query logs via searchPapers, structures reports with citationGraph from Jansen et al. (2000), and synthesizes reformulation trends. DeepScan applies 7-step analysis with CoVe checkpoints on Furnas et al. (1987) vocabulary claims. Theorizer generates hypotheses on intent evolution from Silverstein et al. (1999) logs.

Frequently Asked Questions

What is query analysis in information retrieval?

Query analysis examines user input for errors, intent, and expansion needs (Jansen et al., 2000). It uses log mining to identify patterns like short queries (Silverstein et al., 1999).

What are main reformulation methods?

Methods include concept-based expansion (Qiu and Frei, 1993) and intent classification (Rose and Levinson, 2004). Vocabulary studies such as Furnas et al. (1987) motivate probabilistic term-matching models.

What are key papers on query analysis?

Furnas et al. (1987, 1478 citations) on vocabulary; Jansen et al. (2000, 1318 citations) on web queries; Silverstein et al. (1999, 1155 citations) on AltaVista logs.

What open problems exist?

Inferring intent from sparse context and avoiding topic drift during expansion remain open (Belkin and Croft, 1992; Qiu and Frei, 1993). Evaluation also needs graded, non-dichotomous relevance judgements (Järvelin and Kekäläinen, 2000).

Research Information Retrieval and Search Behavior with AI

PapersFlow provides specialized AI tools for Computer Science researchers. Here are the most relevant for this topic:

See how researchers in Computer Science & AI use PapersFlow

Field-specific workflows, example queries, and use cases.

Computer Science & AI Guide

Start Researching Query Analysis and Reformulation with AI

Search 474M+ papers, run AI-powered literature reviews, and write with integrated citations — all in one workspace.

See how PapersFlow works for Computer Science researchers