Integrating Inductive Biases in Transformers via Distillation for Financial Time Series Forecasting

Authors: Yu-Chen Den, Kuan-Yu Chen, Kendro Vincent, Darby Tien-Hao Chang
Date: 2026-03-17
Paper ID: openalex:2603.16985

Summary

This paper introduces TIPS (Transformer with Inductive Prior Synthesis), a knowledge distillation framework that injects three inductive biases (causality, locality, and periodicity) into Transformer models, which otherwise struggle with the non-stationarity of financial time series. TIPS trains specialized Transformer teachers using attention masks that correspond to these biases, then distills their knowledge into a single student model that adapts to the observed market regime. The student achieves state-of-the-art results across four major equity markets, significantly improving financial performance metrics while reducing inference-time computation by 62% relative to the teacher ensemble. The analysis confirms that synthesizing complementary temporal priors is crucial for robust generalization in complex, non-stationary financial environments.
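To make the idea of bias-specific attention masking concrete, the following is a minimal sketch of what causal, local, and periodic masks could look like. The summary does not give the paper's exact mask definitions, so the function names, the `window` and `period` parameters, and the rule used for each mask are illustrative assumptions rather than the authors' implementation.

```python
# Illustrative attention masks; NOT the paper's exact definitions.
# mask[i][j] is True when query position i may attend to key position j.

def causal_mask(n):
    """Causality: a position may only attend to itself and earlier positions."""
    return [[j <= i for j in range(n)] for i in range(n)]

def local_mask(n, window):
    """Locality: a position may only attend within `window` steps of itself.
    `window` is a hypothetical hyperparameter."""
    return [[abs(i - j) <= window for j in range(n)] for i in range(n)]

def periodic_mask(n, period):
    """Periodicity: a position may attend to positions a multiple of
    `period` steps away. `period` is a hypothetical hyperparameter."""
    return [[(i - j) % period == 0 for j in range(n)] for i in range(n)]

def show(mask):
    """Render a mask as text: 'x' means attention allowed, '.' means blocked."""
    return "\n".join("".join("x" if ok else "." for ok in row) for row in mask)

if __name__ == "__main__":
    for name, mask in [
        ("causal", causal_mask(6)),
        ("local (window=1)", local_mask(6, 1)),
        ("periodic (period=3)", periodic_mask(6, 3)),
    ]:
        print(name)
        print(show(mask))
        print()
```

In a real Transformer such a boolean mask would typically be applied by setting the blocked attention logits to a large negative value before the softmax.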

Key Contributions

Introduced TIPS (Transformer with Inductive Prior Synthesis), a knowledge distillation framework that injects domain-specific inductive biases (causality, locality, periodicity) into a Transformer for financial forecasting.
Achieved state-of-the-art performance across four major equity markets, significantly outperforming ensemble baselines on metrics such as annual return (55% improvement) and Sharpe ratio (9% improvement).
Demonstrated that the TIPS student model requires only 38% of the inference-time computation of the ensemble of specialized teacher models, a 62% reduction.
Showed that TIPS exhibits regime-dependent alignment with classical architectures (like CNNs/RNNs) during their respective profitable periods, validating the integration of complementary temporal priors.
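For readers unfamiliar with the financial metrics named above, here is a small sketch of how annual return and Sharpe ratio are commonly computed from a series of daily returns. The paper's exact definitions are not given in this summary, so the 252-trading-day convention, the zero risk-free rate default, and the function names are assumptions.

```python
import math

TRADING_DAYS = 252  # assumption: common convention for equity markets

def annualized_return(daily_returns, periods_per_year=TRADING_DAYS):
    """Compound the daily returns, then scale the growth to one year."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth ** (periods_per_year / len(daily_returns)) - 1.0

def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods_per_year=TRADING_DAYS):
    """Annualized Sharpe ratio: mean excess return over its sample standard
    deviation, scaled by the square root of the periods per year.
    Requires at least two returns with non-zero variance."""
    excess = [r - risk_free_daily for r in daily_returns]
    mean = sum(excess) / len(excess)
    variance = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)

if __name__ == "__main__":
    returns = [0.004, -0.002, 0.003, 0.001, -0.001]  # made-up illustrative data
    print("annualized return:", annualized_return(returns))
    print("Sharpe ratio:", sharpe_ratio(returns))
```

The returns in the usage example are invented purely to exercise the functions and are unrelated to the paper's results.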

Limitations

The paper suggests that no single inductive bias dominates across all markets and regimes, implying that the distillation process depends heavily on the quality and selection of the specialized teacher models trained on specific market dynamics.

Open Questions & Future Work

Key Concepts

Transformer with Inductive Prior Synthesis (TIPS): A knowledge distillation framework that integrates causality, locality, and periodicity inductive biases into a unified Transformer student model for financial time series forecasting.
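As a rough illustration of distilling several teachers into one student, the sketch below combines a supervised error term with a weighted imitation term over teacher predictions. The summary does not state the loss used by TIPS, so the squared-error form, the `alpha` trade-off, and the `teacher_weights` argument (which could stand in for regime-dependent weighting) are all hypothetical.

```python
# Hypothetical multi-teacher distillation loss; NOT the loss from the paper.

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distillation_loss(student, target, teacher_preds, teacher_weights, alpha=0.5):
    """alpha * MSE(student, target)
       + (1 - alpha) * sum_k w_k * MSE(student, teacher_k),
    where the weights w_k are normalized to sum to 1."""
    total = sum(teacher_weights)
    imitation = sum(
        (w / total) * mse(student, preds)
        for w, preds in zip(teacher_weights, teacher_preds)
    )
    return alpha * mse(student, target) + (1.0 - alpha) * imitation

if __name__ == "__main__":
    student = [1.0, 2.0]
    target = [1.0, 2.0]
    teachers = [[1.0, 2.0], [3.0, 4.0]]  # e.g. one teacher per inductive bias
    print(distillation_loss(student, target, teachers, teacher_weights=[1.0, 1.0]))
```

Raising a teacher's weight pulls the student toward that teacher's predictions, which is one simple way a student could lean on different priors in different market regimes.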
