Subtopic Deep Dive
Time Series Feature Extraction
Research Guide
What is Time Series Feature Extraction?
Time Series Feature Extraction is the automated process of deriving statistical, frequency-domain, and shape-based features from raw time series data to enable machine learning tasks like classification and forecasting.
This subtopic encompasses methods such as catch22 features, shapelet transforms, and scalable extraction pipelines for large datasets. A key library is tsfresh (Christ et al., 2018, 1140 citations), which automates hypothesis-test-based extraction and selection. More than ten highly cited papers published between 1993 and 2021 address feature engineering for time series analysis.
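To make the idea concrete, here is a minimal sketch of statistical and frequency-domain feature extraction in plain NumPy. This is illustrative only: a library such as tsfresh computes hundreds of such features, and the function name and the five features below are representative choices, not any library's API.

```python
import numpy as np

def extract_basic_features(x: np.ndarray) -> dict:
    """Derive a few statistical and frequency-domain features from a 1-D series.

    Illustrative sketch -- real libraries such as tsfresh or catch22
    compute hundreds of features per series.
    """
    x = np.asarray(x, dtype=float)
    # Magnitude spectrum of the mean-removed signal (real-input FFT).
    spectrum = np.abs(np.fft.rfft(x - x.mean()))
    return {
        "mean": float(x.mean()),
        "std": float(x.std()),
        "abs_energy": float(np.sum(x ** 2)),
        "autocorr_lag1": float(np.corrcoef(x[:-1], x[1:])[0, 1]),
        "dominant_freq_bin": int(np.argmax(spectrum[1:]) + 1),  # skip DC term
    }

# Example: a sine wave with 5 cycles over 100 samples.
t = np.linspace(0, 1, 100, endpoint=False)
feats = extract_basic_features(np.sin(2 * np.pi * 5 * t))
print(feats["dominant_freq_bin"])  # -> 5
```

Downstream, a vector of such features per series becomes one row of the design matrix fed to a classifier or forecaster.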
Why It Matters
Extracted features transform raw signals into inputs for predictive models, improving forecasting accuracy in electricity consumption (Zhou et al., 2021) and anomaly detection (Breunig et al., 2000). Shapelet-based features enhance time series classification (Bagnall et al., 2016), while symbolic representations like SAX enable efficient streaming analysis (Lin et al., 2003). These features drive applications in finance, traffic forecasting (Wu et al., 2020), and industrial monitoring, reducing manual engineering time (Christ et al., 2018).
Key Research Challenges
Scalability for Long Sequences
Extracting features from long time series can exceed the memory limits of batch pipelines like tsfresh. Zhou et al. (2021) highlight the efficiency requirements of long sequence time-series forecasting (LSTF); real-time applications therefore need scalable approximations.
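One common way to bound memory on long sequences is to compute features in a single streaming pass instead of materializing the whole series. The sketch below uses Welford's online algorithm for mean and variance; it is an illustrative example of the streaming idea, not part of any particular pipeline.

```python
class RunningStats:
    """Single-pass mean/variance (Welford's algorithm), so statistical
    features of arbitrarily long series need only O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance of everything seen so far.
        return self.m2 / self.n if self.n else 0.0

stats = RunningStats()
for value in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
    stats.update(value)
print(round(stats.mean, 6), round(stats.variance, 6))  # -> 5.0 4.0
```

The same pattern extends to other accumulable features (min/max, energy, counts above a threshold), which is why streaming approximations are a natural fit for LSTF-scale data.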
Feature Relevance Selection
Feature libraries generate thousands of candidate features, which overwhelm downstream models unless filtering is automated. Christ et al. (2018) filter candidates with hypothesis tests, but selection remains computationally intensive, and unsupervised methods struggle with noisy data (Längkvist et al., 2014).
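The hypothesis-test filtering idea can be sketched as follows. This is a simplified, binary-label version of the approach behind tsfresh's selection step, using one Mann-Whitney U test per feature with Benjamini-Hochberg control of the false discovery rate; the function name is illustrative and this is not the tsfresh API.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def fresh_style_filter(features: np.ndarray, labels: np.ndarray,
                       fdr: float = 0.05) -> np.ndarray:
    """Keep features whose distributions differ between two classes,
    controlling the false discovery rate via Benjamini-Hochberg."""
    # One two-sample test per feature column.
    pvals = np.array([
        mannwhitneyu(col[labels == 0], col[labels == 1]).pvalue
        for col in features.T
    ])
    m = len(pvals)
    order = np.argsort(pvals)
    # Benjamini-Hochberg step-up: compare sorted p-values to fdr * k / m.
    thresholds = fdr * np.arange(1, m + 1) / m
    passed = pvals[order] <= thresholds
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    keep = np.zeros(m, dtype=bool)
    keep[order[:k]] = True
    return keep

# Synthetic check: one informative feature plus five pure-noise features.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
informative = np.where(labels == 1, 2.0, 0.0) + rng.normal(size=200)
noise = rng.normal(size=(200, 5))
X = np.column_stack([informative, noise])
keep = fresh_style_filter(X, labels)
print(keep[0])  # the informative feature survives filtering
```

Even this toy version shows why selection is expensive: cost grows linearly in the number of candidate features, which in practice runs into the thousands.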
Interpretability of Deep Features
End-to-end deep networks bypass explicit feature extraction but lack interpretability (Wang et al., 2017). Hybrid approaches that blend handcrafted and learned features still need systematic validation; the Bagnall et al. (2016) benchmark shows trade-offs between accuracy and explainability.
Essential Papers
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting
Haoyi Zhou, Shanghang Zhang, Jieqi Peng et al. · 2021 · Proceedings of the AAAI Conference on Artificial Intelligence · 5.1K citations
Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction ca...
LOF: Identifying Density-Based Local Outliers
Markus Breunig, Hans‐Peter Kriegel, Raymond T. Ng et al. · 2000 · 3.6K citations
For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers, can be more interesting than finding the common patterns. Existing work i...
Support Vector Data Description
David M. J. Tax, Robert P. W. Duin · 2003 · Machine Learning · 3.4K citations
Efficient similarity search in sequence databases
Rakesh Agrawal, Christos Faloutsos, Arun Swami · 1993 · Lecture notes in computer science · 2.0K citations
Time series classification from scratch with deep neural networks: A strong baseline
Zhiguang Wang, Weizhong Yan, Tim Oates · 2017 · 1.9K citations
We propose a simple but strong baseline for time series classification from scratch with deep neural networks. Our proposed baseline models are pure end-to-end without any heavy preprocessing on th...
A symbolic representation of time series, with implications for streaming algorithms
Jessica Lin, Eamonn Keogh, Stefano Lonardi et al. · 2003 · 1.8K citations
The parallel explosions of interest in streaming data, and data mining of time series have had surprisingly little intersection. This is in spite of the fact that time series data are typically str...
Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks
Zonghan Wu, Shirui Pan, Guodong Long et al. · 2020 · 1.6K citations
Modeling multivariate time series has long been a subject that has attracted researchers from a diverse range of fields including economics, finance, and traffic. A basic assumption behind multivar...
Reading Guide
Foundational Papers
Start with the SAX representation (Lin et al., 2003, 1771 citations) for streaming basics, then LOF (Breunig et al., 2000, 3642 citations) and SVDD (Tax and Duin, 2003, 3384 citations) for anomaly features, and the Längkvist et al. (2014, 1228 citations) review for unsupervised learning foundations.
Recent Advances
Move on to the tsfresh package (Christ et al., 2018, 1140 citations) for scalable extraction, the bake off (Bagnall et al., 2016, 1318 citations) for classification benchmarks, and Informer (Zhou et al., 2021, 5132 citations) for long-sequence challenges.
Core Methods
Hypothesis-test feature selection (tsfresh), symbolic approximations (SAX), shapelet transforms, deep convolutional networks (Wang et al., 2017), and graph-based multivariate forecasting (Wu et al., 2020).
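As an example of one of these core methods, the discretization step of SAX can be sketched in a few lines: z-normalize, reduce via Piecewise Aggregate Approximation (PAA), then map segment means to symbols using Gaussian breakpoints. This sketch assumes an alphabet of size 4 and a series length divisible by the segment count; production code (per Lin et al., 2003) handles arbitrary lengths and alphabet sizes.

```python
import numpy as np

def sax(series, n_segments=8, breakpoints=(-0.6745, 0.0, 0.6745)):
    """Minimal SAX word for a 1-D series, alphabet size 4.

    Breakpoints are the quartiles of the standard normal, so symbols
    are roughly equiprobable for z-normalized data.
    """
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()                  # z-normalize
    paa = x.reshape(n_segments, -1).mean(axis=1)  # PAA segment means
    symbols = np.searchsorted(breakpoints, paa)   # indices 0..3
    return "".join("abcd"[s] for s in symbols)

t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
word = sax(np.sin(t))
print(word)  # -> cddcbaab
```

The resulting short strings support string-based indexing and streaming algorithms with a distance measure that lower-bounds the distance on the original series.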
How PapersFlow Helps You Research Time Series Feature Extraction
Discover & Search
Research Agent uses searchPapers with query 'time series feature extraction scalable' to find tsfresh (Christ et al., 2018), then citationGraph reveals 100+ downstream works and findSimilarPapers uncovers catch22 extensions. exaSearch on 'shapelet transform efficiency' surfaces Bagnall et al. (2016) benchmarks.
Analyze & Verify
Analysis Agent runs readPaperContent on Christ et al. (2018) to extract tsfresh algorithm details, verifies feature counts via runPythonAnalysis on UCR archive datasets, and applies GRADE assessment of statistical significance. verifyResponse (CoVe) cross-checks claims against Lin et al. (2003) SAX methods.
Synthesize & Write
Synthesis Agent detects gaps in scalable shapelet extraction post-Bagnall et al. (2016), flags contradictions between deep baselines (Wang et al., 2017) and symbolic features. Writing Agent uses latexEditText for equations, latexSyncCitations for 20+ papers, latexCompile for arXiv-ready manuscript, and exportMermaid for feature pipeline diagrams.
Use Cases
"Benchmark tsfresh features on UCR time series classification datasets"
Research Agent → searchPapers 'tsfresh UCR' → Analysis Agent → runPythonAnalysis (load UCR CSV, extract 700+ features, compute AUC via sklearn) → GRADE report with p-values and visualizations.
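The features-to-AUC step of this workflow might look like the sketch below. The dataset is a synthetic stand-in for a UCR split, and the two hand-picked features are placeholders for a full tsfresh feature matrix; names and parameters here are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for a UCR dataset: class-1 series have higher variance.
scales = np.where(np.arange(400) < 200, 1.0, 2.0)[:, None]
series = rng.normal(scale=scales, size=(400, 128))
labels = (np.arange(400) >= 200).astype(int)

# Toy 2-column feature matrix (a real run would use tsfresh's 700+ features).
X = np.column_stack([series.std(axis=1), np.abs(series).mean(axis=1)])

X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, random_state=0, stratify=labels)
clf = LogisticRegression().fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(round(auc, 2))  # near 1.0 for this well-separated toy data
```

On real UCR datasets the same scaffold applies: swap the synthetic arrays for loaded CSVs and the toy features for an extracted feature matrix.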
"Write LaTeX review of shapelet vs deep feature extraction"
Research Agent → citationGraph on Bagnall et al. (2016) → Synthesis Agent → gap detection → Writing Agent → latexEditText (add SAX equations from Lin et al., 2003), latexSyncCitations, latexCompile → PDF with diagrams.
"Find GitHub repos implementing catch22 time series features"
Research Agent → searchPapers 'catch22 features' → Code Discovery → paperExtractUrls → paperFindGithubRepo → githubRepoInspect (verify hctsa/catch22 implementation) → runPythonAnalysis test on sample data.
Automated Workflows
Deep Research workflow scans 50+ papers via searchPapers on 'time series feature extraction', chains citationGraph → findSimilarPapers, outputs structured report ranking tsfresh (Christ et al., 2018) vs deep methods (Wang et al., 2017). DeepScan applies 7-step analysis: readPaperContent on Zhou et al. (2021) → runPythonAnalysis efficiency tests → CoVe verification → GRADE synthesis. Theorizer generates hypotheses like 'hybrid SAX-shapelet features outperform deep baselines' from Lin et al. (2003) and Bagnall et al. (2016).
Frequently Asked Questions
What is Time Series Feature Extraction?
It automates the derivation of statistical, frequency-domain, and shape-based features from time series for machine learning. tsfresh (Christ et al., 2018) extracts 700+ features and filters them via hypothesis tests; such features are widely used in classification (Bagnall et al., 2016).
What are main methods?
Hypothesis-test features (Christ et al., 2018), symbolic SAX (Lin et al., 2003), shapelets (Bagnall et al., 2016). Deep end-to-end alternatives in Wang et al. (2017). Scalable pipelines address long sequences (Zhou et al., 2021).
What are key papers?
tsfresh (Christ et al., 2018, 1140 citations), bake off (Bagnall et al., 2016, 1318 citations), SAX (Lin et al., 2003, 1771 citations). Deep baseline (Wang et al., 2017, 1913 citations).
What open problems exist?
Scalability for multivariate long sequences (Zhou et al., 2021). Interpretable deep features (Längkvist et al., 2014). Automated relevance filtering from 1000+ candidates (Christ et al., 2018).
Research Time Series Analysis and Forecasting with AI
PapersFlow provides specialized AI tools for Computer Science researchers. Here are the most relevant for this topic:
AI Literature Review
Automate paper discovery and synthesis across 474M+ papers
Code & Data Discovery
Find datasets, code repositories, and computational tools
Deep Research Reports
Multi-source evidence synthesis with counter-evidence
AI Academic Writing
Write research papers with AI assistance and LaTeX support
See how researchers in Computer Science & AI use PapersFlow
Field-specific workflows, example queries, and use cases.
Start Researching Time Series Feature Extraction with AI
Search 474M+ papers, run AI-powered literature reviews, and write with integrated citations — all in one workspace.
See how PapersFlow works for Computer Science researchers