WaveSFNet: A Wavelet-Based Codec and Spatial-Frequency Dual-Domain Gating Network for Spatiotemporal Prediction


Authors: Xinyong Cai, Runming Xie, Hu Chen, Yuankai Wu
Date: 2026-03-24
Paper ID: arxiv:2603.23284

Summary

WaveSFNet is an efficient, recurrent-free framework for spatiotemporal prediction that targets the loss of high-frequency detail caused by standard downsampling. It has two parts: a wavelet-based codec that retains high-frequency subband information across resolution changes, and a spatial-frequency dual-domain gated translator. The translator strengthens dynamic features by injecting adjacent-frame differences, then fuses local spatial features with global frequency-domain modulation through gating. Experiments report competitive accuracy on benchmarks such as Moving MNIST, TaxiBJ, and WeatherBench at low computational overhead.
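The adjacent-frame difference idea can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name `inject_frame_differences`, the additive injection, and the `alpha` scale are all assumptions made for illustration, since the summary does not say how the differences are combined with the features.

```python
import numpy as np

def inject_frame_differences(frames, alpha=1.0):
    """Add scaled adjacent-frame differences to a frame sequence.

    frames: array of shape (T, C, H, W).
    alpha:  hypothetical scale for the injected differences (an assumption).
    """
    diffs = np.zeros_like(frames)
    # The first frame has no predecessor, so its difference stays zero.
    diffs[1:] = frames[1:] - frames[:-1]
    return frames + alpha * diffs

# Toy sequence: 3 frames, 1 channel, 2x2 pixels.
frames = np.arange(12, dtype=float).reshape(3, 1, 2, 2)
enhanced = inject_frame_differences(frames, alpha=0.5)
```

Static regions have zero difference and pass through unchanged, while moving regions are amplified, which is one simple way to make dynamics more explicit to a recurrent-free model.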

Key Contributions

  • Introduction of a wavelet-based codec to preserve high-frequency subband cues during downsampling and reconstruction for sharper predictions.
  • Design of a dual-domain gated spatiotemporal translator that explicitly enhances dynamics via adjacent-frame differences.
  • Implementation of gated fusion between large-kernel spatial modeling and frequency-domain global modulation within the translator.
  • Competitive prediction accuracy on the Moving MNIST, TaxiBJ, and WeatherBench benchmarks with low computational complexity.
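To make the wavelet-codec contribution concrete, here is a minimal sketch of a single-level 2D wavelet transform used as a downsampling step. It assumes the orthonormal Haar wavelet, which this summary does not specify; the function names `haar_dwt2` and `haar_idwt2` are illustrative and not taken from the paper. The point is that the four half-resolution subbands together lose no information, whereas average pooling keeps only a scaled copy of the low-pass band.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar transform on the last two axes (even H and W).

    Returns four subbands, each of shape (..., H/2, W/2).
    """
    a = x[..., 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0      # low-pass band (2x the block average)
    det_w = (a - b + c - d) / 2.0    # detail from differences along width
    det_h = (a + b - c - d) / 2.0    # detail from differences along height
    det_d = (a - b - c + d) / 2.0    # diagonal detail
    return low, det_w, det_h, det_d

def haar_idwt2(low, det_w, det_h, det_d):
    """Exact inverse of haar_dwt2: rebuilds the full-resolution array."""
    h, w = low.shape[-2] * 2, low.shape[-1] * 2
    x = np.empty(low.shape[:-2] + (h, w), dtype=float)
    x[..., 0::2, 0::2] = (low + det_w + det_h + det_d) / 2.0
    x[..., 0::2, 1::2] = (low - det_w + det_h - det_d) / 2.0
    x[..., 1::2, 0::2] = (low + det_w - det_h - det_d) / 2.0
    x[..., 1::2, 1::2] = (low - det_w - det_h + det_d) / 2.0
    return x

rng = np.random.default_rng(0)
frame = rng.normal(size=(1, 8, 8))
subbands = haar_dwt2(frame)
restored = haar_idwt2(*subbands)  # matches `frame` up to float rounding
```

In an encoder-decoder setting, the detail subbands could be carried alongside the low-pass band and reused at reconstruction time; how WaveSFNet actually routes them is not described in this summary.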

Limitations

The paper focuses on efficiency and detail preservation; it does not explicitly detail comparisons against state-of-the-art recurrent models or deep generative sequence models.

Key Concepts

  • Wavelet-Based Codec: A codec utilizing wavelet transforms to preserve high-frequency subband information during spatial downsampling and reconstruction in spatiotemporal prediction.
  • Spatial-Frequency Dual-Domain Gating Network: A spatiotemporal translator that fuses large-kernel spatial modeling with frequency-domain modulation via gated fusion mechanisms.
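The gated fusion of a spatial branch and a frequency-domain branch can be sketched as follows. Everything here is a simplified stand-in under stated assumptions: a box filter replaces the large-kernel spatial convolution, a per-frequency multiplicative `weight` replaces the learned frequency modulation, and a sigmoid gate forming a convex combination replaces whatever gating the paper uses. The names `box_filter`, `frequency_modulation`, and `gated_fusion` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def box_filter(x, k=3):
    """Spatial branch stand-in: k x k mean filter with edge padding on (H, W)."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            out += xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out / (k * k)

def frequency_modulation(x, weight):
    """Frequency branch stand-in: scale each 2D frequency, then invert.

    weight has shape (H, W // 2 + 1), matching the output of np.fft.rfft2.
    Each frequency weight affects every output pixel, hence "global".
    """
    spectrum = np.fft.rfft2(x)
    return np.fft.irfft2(spectrum * weight, s=x.shape)

def gated_fusion(x, weight, gate_logits):
    """Blend the two branches per pixel with a sigmoid gate (an assumption)."""
    g = sigmoid(gate_logits)
    return g * box_filter(x) + (1.0 - g) * frequency_modulation(x, weight)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
weight = np.ones((8, 5))          # identity modulation for this demo
gate_logits = np.zeros((8, 8))    # equal mix of both branches
fused = gated_fusion(x, weight, gate_logits)
```

In a trained network the weights and gate logits would be learned, and likely input-dependent; here they are fixed arrays so the data flow is easy to follow.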

Datasets

  • Moving MNIST
  • TaxiBJ
  • WeatherBench

Metadata & Links

url: https://arxiv.org/abs/2603.23284
paper_id: 2603.23284
paper_source: arxiv
domain: computer-vision
tags: computer-vision, object-detection, image-segmentation, convolutional-neural-network, efficient-transformer
architectures:
datasets: Moving MNIST, TaxiBJ, WeatherBench
skill: TimeSeriesSkill
created_at: 2026-03-25T21:18:23Z