WaveSFNet: A Wavelet-Based Codec and Spatial-Frequency Dual-Domain Gating Network for Spatiotemporal Prediction
Authors: Xinyong Cai, Runming Xie, Hu Chen, Yuankai Wu
Date: 2026-03-24
Paper ID: arxiv:2603.23284
Summary
WaveSFNet is an efficient, recurrent-free framework for spatiotemporal prediction. It targets the loss of high-frequency detail that is common with standard downsampling. The design has two parts. A wavelet-based codec retains high-frequency subband information across resolution changes. A spatial-frequency dual-domain gated translator strengthens dynamic features by injecting adjacent-frame differences, then uses gating to fuse local spatial features with global frequency-domain modulation. In the reported experiments, WaveSFNet reaches competitive accuracy on benchmarks such as Moving MNIST, TaxiBJ, and WeatherBench while keeping computational overhead low.
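To make the high-frequency argument concrete, here is a minimal NumPy sketch of a one-level 2D Haar transform and its inverse. The choice of the Haar wavelet, the subband names, and the function names `haar_dwt2` and `haar_idwt2` are assumptions for illustration; the summary does not say which wavelet or implementation the paper uses.

```python
import numpy as np


def haar_dwt2(x):
    """One-level 2D Haar transform of an (H, W) array with even H and W.

    Returns four half-resolution subbands: LL (scaled local average) and
    three detail bands. The LH/HL/HH naming is an assumption; conventions vary.
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0
    lh = (a - b + c - d) / 2.0  # differences across columns
    hl = (a + b - c - d) / 2.0  # differences across rows
    hh = (a - b - c + d) / 2.0  # diagonal differences
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Invert haar_dwt2 exactly, rebuilding the full-resolution array."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


# A checkerboard is pure high-frequency content.
rows, cols = np.indices((4, 4))
checker = ((rows + cols) % 2).astype(float)

ll, lh, hl, hh = haar_dwt2(checker)
# LL alone is constant, so a low-pass-only downsample loses the pattern,
# while the detail bands still carry it and allow exact reconstruction.
restored = haar_idwt2(ll, lh, hl, hh)
```

In this toy case the LL band is flat and the pattern survives only in the HH band, which is the kind of information a plain average-pooling downsample would discard.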
Key Contributions
- A wavelet-based codec that preserves high-frequency subband cues during downsampling and reconstruction, yielding sharper predictions.
- A dual-domain gated spatiotemporal translator that explicitly enhances dynamics via adjacent-frame differences.
- Gated fusion of large-kernel spatial modeling and frequency-domain global modulation within the translator.
- Competitive prediction accuracy on benchmarks such as Moving MNIST, TaxiBJ, and WeatherBench at low computational complexity.
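The adjacent-frame difference idea from the list above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `inject_frame_differences`, the zero difference assigned to the first frame, and the choice to concatenate differences along the channel axis are all assumptions, since the summary does not specify how the differences are injected.

```python
import numpy as np


def inject_frame_differences(frames):
    """Append adjacent-frame differences to a clip as extra channels.

    frames: array of shape (T, C, H, W).
    Returns an array of shape (T, 2C, H, W): the original channels followed
    by frames[t] - frames[t - 1]. The first frame has no predecessor, so its
    difference is set to zero (an assumption made for this sketch).
    """
    diffs = np.zeros_like(frames)
    diffs[1:] = frames[1:] - frames[:-1]
    # Concatenation along channels is one possible way to inject the
    # differences; adding them to the features would be another.
    return np.concatenate([frames, diffs], axis=1)


# Three single-channel 2x2 frames with values 0..11.
clip = np.arange(12, dtype=float).reshape(3, 1, 2, 2)
augmented = inject_frame_differences(clip)
```

Static regions produce zero differences, so the extra channels highlight only what changes between frames.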
Limitations
The paper focuses on efficiency and detail preservation, but it does not explicitly detail comparisons against state-of-the-art recurrent models or deep generative sequence models.
Key Concepts
- Wavelet-Based Codec: A codec utilizing wavelet transforms to preserve high-frequency subband information during spatial downsampling and reconstruction in spatiotemporal prediction.
- Spatial-Frequency Dual-Domain Gating Network: A spatiotemporal translator that fuses large-kernel spatial modeling with frequency-domain modulation via gated fusion mechanisms.
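As a rough illustration of the dual-domain gating concept above, the sketch below blends a large-kernel depthwise spatial filter with a per-frequency modulation applied through a real FFT. Every specific here is an assumption: the function names, the circular padding, the sigmoid gate, and the convex-combination form of the fusion are illustrative choices, and in a trained model the kernel, frequency weights, and gate would be learned rather than passed in by hand.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def spatial_branch(x, kernel):
    """Depthwise filtering with circular padding.

    x: float array of shape (C, H, W); kernel: (C, K, K) with odd K.
    """
    k = kernel.shape[-1]
    r = k // 2
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            # np.roll by (r - i, r - j) aligns x[y + i - r, x + j - r]
            # with output position (y, x).
            shifted = np.roll(x, shift=(r - i, r - j), axis=(1, 2))
            out += kernel[:, i, j][:, None, None] * shifted
    return out


def frequency_branch(x, freq_weight):
    """Global modulation: scale each frequency bin, then transform back.

    freq_weight: array of shape (C, H, W // 2 + 1), matching rfft2 output.
    """
    spec = np.fft.rfft2(x, axes=(-2, -1))
    return np.fft.irfft2(spec * freq_weight, s=x.shape[-2:], axes=(-2, -1))


def gated_fusion(x, kernel, freq_weight, gate_logits):
    """Blend the two branches with an elementwise sigmoid gate."""
    gate = sigmoid(gate_logits)
    return gate * spatial_branch(x, kernel) + (1.0 - gate) * frequency_branch(x, freq_weight)


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 8))
kernel = rng.normal(size=(2, 7, 7)) * 0.1   # hypothetical 7x7 large kernel
freq_weight = rng.uniform(0.5, 1.5, size=(2, 8, 5))
gate_logits = rng.normal(size=(2, 8, 8))
fused = gated_fusion(x, kernel, freq_weight, gate_logits)
```

The spatial branch only sees a K x K neighborhood, whereas each frequency weight affects every pixel at once, which is why the two are complementary candidates for fusion.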
Datasets
- Moving MNIST
- TaxiBJ
- WeatherBench
Metadata & Links
- url: https://arxiv.org/abs/2603.23284
- paper_id: 2603.23284
- paper_source: arxiv
- domain: computer-vision
- tags: computer-vision, object-detection, image-segmentation, convolutional-neural-network, efficient-transformer
- architectures:
- datasets: Moving MNIST, TaxiBJ, WeatherBench
- skill: TimeSeriesSkill
- created_at: 2026-03-25T21:18:23Z