TLDR: This research introduces Spatio-Temporal Multivariate Time Series Forecast with Chosen Variables (STCV), a new problem focused on optimally selecting a subset of ‘m’ variables (e.g., sensor locations) from ‘n’ total variables to maximize forecast accuracy under budget constraints. The proposed solution, Variable-Parameter Iterative Pruning (VIP), is a unified framework that jointly performs variable selection and model optimization. VIP uses masked variable-parameter pruning to reduce model complexity, dynamic extrapolation to infer values for unselected variables, and prioritized variable-parameter replay to prevent catastrophic forgetting. Experiments show VIP significantly outperforms baselines in accuracy and efficiency, demonstrating strong scalability for large-scale systems.
Spatio-Temporal Multivariate Time Series Forecast (STMF) predicts future values of multiple variables distributed across space from their past observations. Its applications range from forecasting road traffic to predicting air pollution levels. Traditionally, STMF models assume that data from all relevant locations is available for both training and real-time forecasting. Real-world deployments often break that assumption: some variables are missing.
Addressing the Challenge of Missing Variables
In many practical applications, such as deploying sensors for traffic monitoring or air quality assessment, budget constraints make it impossible to place a sensor at every location of interest. As a result, the number of available sensors (m) is far smaller than the total number of locations (n) that need to be monitored. Previous research has acknowledged this issue, focusing on how to build forecast models when a subset of variables is permanently missing from the input. However, a critical question remained unanswered: which ‘m’ locations should receive sensors to maximize the accuracy of the overall forecast?
This is the new problem introduced by recent research: STMF with Chosen Variables (STCV). The goal is to intelligently select the ‘m’ variables for model input to achieve the best possible forecast accuracy for all ‘n’ locations. Imagine a city with hundreds or thousands of intersections. Deploying expensive permanent traffic sensors at every single one is cost-prohibitive. Instead, temporary sensors might be used to collect comprehensive training data initially, but for ongoing operations, only a limited number of permanent sensors can be deployed. The challenge then becomes deciding where to place these permanent sensors for the most effective prediction.
The problem presents several complexities. First, traffic or air quality patterns can vary significantly across different locations, making a uniform selection strategy ineffective. Second, the sheer number of possible combinations for selecting ‘m’ variables from ‘n’ is astronomically large, making exhaustive search impossible. Third, the scale of data can be immense, requiring efficient models. Finally, as variables are selected and removed during the learning process, models risk ‘catastrophic forgetting,’ losing previously learned patterns.
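To make the combinatorial point concrete, the short sketch below counts the candidate sensor subsets for an illustrative case. The figures n = 100 and m = 10 are assumptions chosen for the example, not values from the research, and `count_subsets` is a hypothetical helper name.

```python
import math

def count_subsets(n: int, m: int) -> int:
    """Number of ways to choose m sensor locations out of n candidates."""
    return math.comb(n, m)

# Illustrative sizes only (assumed, not taken from the paper).
n_locations = 100
budget = 10

print(count_subsets(n_locations, budget))  # 17310309456440
```

Even at this small scale there are more than 17 trillion candidate subsets, and each one would need its own trained model to be evaluated exactly, which is why a search-free selection strategy is needed.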
Introducing VIP: A Unified Framework for Optimal Selection and Forecasting
To tackle the STCV problem, researchers have proposed a novel model-learning framework called Variable-Parameter Iterative Pruning (VIP). This framework offers a unified approach that jointly performs variable selection and model optimization, considering both forecast accuracy and computational efficiency. VIP is built upon three innovative technical components:
The first component is Masked Variable-Parameter Pruning. This mechanism progressively identifies and eliminates less informative variables (locations) and attention parameters within the model. It uses learnable masks that are iteratively optimized, effectively reducing the number of input variables and the model’s internal complexity. This not only improves forecast accuracy by focusing on the most relevant data but also significantly reduces the model’s memory footprint and inference time, making it scalable for large systems.
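The iterative part of this idea can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it only prunes variables (not attention parameters), and `score_fn` is a hypothetical stand-in for the step that re-optimizes the learnable masks and returns one importance score per variable. The `rate` parameter (fraction pruned per round) is also an assumption.

```python
def iterative_prune(score_fn, n, m, rate=0.5):
    """Shrink the active variable set from n to m over several rounds.

    score_fn(active) is a hypothetical callback standing in for mask
    optimization; it returns a dict {variable index: mask score}.
    """
    active = list(range(n))
    while len(active) > m:
        scores = score_fn(active)
        # Drop roughly `rate` of the active variables, never going below m
        # and always removing at least one variable per round.
        keep = max(m, min(len(active) - 1, int(len(active) * (1 - rate))))
        ranked = sorted(active, key=lambda i: scores[i], reverse=True)
        active = sorted(ranked[:keep])
    return active

# Toy usage with fixed scores in place of learned masks.
fixed = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6]
selected = iterative_prune(lambda active: {i: fixed[i] for i in active}, n=8, m=2)
print(selected)  # [1, 3]
```

Pruning gradually rather than in one shot gives the remaining masks a chance to be re-optimized after each reduction, which is the point of the iterative design described above.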
The second component is Dynamic Extrapolation. Even with a reduced set of input variables, the model still needs to forecast values for all ‘n’ locations. Dynamic extrapolation addresses this by propagating information from the selected variables to all other variables. It achieves this by computing a similarity-based attention matrix over variable embeddings and fusing it with the spatial adjacency information (like road networks). This creates a ‘global information bridge,’ ensuring accurate forecasting even when many variables are not directly observed.
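A minimal sketch of this propagation step follows. Several details are assumptions made for illustration: similarity is a dot product passed through a softmax, adjacency weights are row-normalized over the observed variables, the two are fused by a convex combination with a hypothetical weight `alpha`, and the propagation acts on raw values rather than on hidden representations inside a model.

```python
import math

def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def extrapolate(values, embeddings, adjacency, observed, alpha=0.5):
    """Estimate a value for every variable from the observed ones.

    values:     {index: value} for observed variables only
    embeddings: one embedding vector per variable (all n of them)
    adjacency:  n x n matrix of non-negative spatial weights
    observed:   list of indices of the selected variables
    alpha:      assumed fusion weight between attention and adjacency
    """
    estimates = []
    for i in range(len(embeddings)):
        if i in observed:
            estimates.append(values[i])
            continue
        # Similarity-based attention over the observed variables.
        sims = [sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
                for j in observed]
        attn = softmax(sims)
        # Spatial adjacency, normalized; fall back to attention if isolated.
        adj = [adjacency[i][j] for j in observed]
        total = sum(adj)
        adj_norm = [a / total for a in adj] if total > 0 else attn
        # Fuse the two weightings; the result still sums to 1.
        weights = [alpha * a + (1 - alpha) * b for a, b in zip(attn, adj_norm)]
        estimates.append(sum(w * values[j] for w, j in zip(weights, observed)))
    return estimates

# Toy usage: variable 1 is unobserved, and close to variable 0.
emb = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
print(extrapolate({0: 10.0, 2: 20.0}, emb, adj, observed=[0, 2]))
```

In the toy case, the estimate for variable 1 lands much nearer to variable 0's value than to variable 2's, because both the embedding similarity and the adjacency point toward variable 0.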
The third component is Prioritized Variable-Parameter Replay. As variables and parameters are pruned, the model can lose valuable learned knowledge, a failure mode known as catastrophic forgetting. To prevent this, VIP employs a prioritized replay strategy that buffers and reuses past training samples that yielded low loss. By prioritizing these informative samples, the model can preserve learned patterns, ensuring stable optimization and convergence throughout the iterative pruning process.
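One simple way to realize a low-loss replay buffer is a bounded heap that evicts the highest-loss sample once it is full. The sketch below is an illustration under that assumption; the class name, the fixed `capacity`, and the eviction rule are hypothetical and may differ from how VIP manages its buffer.

```python
import heapq
import itertools

class LowLossReplayBuffer:
    """Keeps the `capacity` samples with the lowest recorded loss."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []                    # entries: (-loss, tiebreak, sample)
        self._counter = itertools.count()  # tiebreak so samples are never compared

    def add(self, sample, loss):
        entry = (-loss, next(self._counter), sample)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        else:
            # The heap root holds the highest loss; pushpop evicts it
            # (or the new entry itself, if its loss is the highest).
            heapq.heappushpop(self._heap, entry)

    def replay(self, k):
        """Return up to k buffered samples, lowest loss first."""
        ranked = sorted(self._heap, key=lambda e: -e[0])
        return [sample for _, _, sample in ranked[:k]]

# Toy usage: the high-loss sample "a" is evicted once the buffer is full.
buf = LowLossReplayBuffer(capacity=2)
for name, loss in [("a", 0.9), ("b", 0.1), ("c", 0.5)]:
    buf.add(name, loss)
print(buf.replay(2))  # ['b', 'c']
```

During each pruning round, samples drawn from `replay` would be mixed into the regular training batches so that patterns learned before pruning keep being reinforced.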
Key Findings and Impact
Experiments on five real-world datasets, four for road traffic (METR-LA, PEMSBAY, PEMS04, PEMS08) and one for air quality (AQI), show that the VIP framework is effective. VIP consistently outperforms state-of-the-art baseline models in forecast accuracy, with improvements of up to 30% in some metrics. VIP is also more efficient, achieving high accuracy at relatively low computational cost even when the pre-training phase is counted.
A qualitative case study using the METR-LA dataset visually illustrates VIP’s intelligent location selection. The model strategically chooses sensor locations that effectively cover major junctions and areas with high traffic variability, such as the US-101 corridor in Los Angeles, which connects central areas to Burbank. This spatially diverse selection strategy is key to maintaining high accuracy even with a limited sensor deployment ratio (e.g., 10%).
Perhaps one of the most compelling findings is VIP’s exceptional scalability. As the total number of variables increases from 100 to 100,000, VIP demonstrates significantly better memory efficiency and inference speed compared to existing models. For instance, at 100,000 variables, VIP’s model size is substantially smaller, and its inference latency remains manageable, while other models experience out-of-memory errors. This makes VIP highly suitable for real-time spatio-temporal forecasting in very large-scale systems.
In conclusion, the VIP framework introduces a groundbreaking solution to the problem of spatio-temporal multivariate time series forecast with chosen variables. By jointly optimizing variable selection and parameter reduction through iterative pruning, dynamic extrapolation, and prioritized replay, VIP achieves superior forecast accuracy and efficiency, paving the way for more practical and cost-effective deployment of sensing applications.


