Integrating Inductive Biases in Transformers via Distillation for Financial Time Series Forecasting

Authors: Yu-Chen Den, Kuan-Yu Chen, Kendro Vincent, Darby Tien-Hao Chang
Date: 2026-03-17
Paper ID: openalex:2603.16985

Summary

This paper introduces TIPS (Transformer with Inductive Prior Synthesis), a knowledge distillation framework that injects three inductive biases (causality, locality, and periodicity) into Transformer models, which otherwise struggle with the non-stationarity of financial time series. TIPS trains specialized Transformer teachers with attention masks corresponding to these biases, then distills their knowledge into a single student model that adapts to the observed market regime. The student achieved state-of-the-art results across four major equity markets, improving financial performance metrics while cutting inference-time computation by 62% relative to the teacher ensemble. The authors' analysis indicates that synthesizing complementary temporal priors is important for robust generalization in non-stationary financial environments.
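To make the idea of bias-specific attention masking concrete, here is a minimal NumPy sketch of one plausible mask per bias. It is an illustration under assumptions, not the paper's implementation: the function names, the `window` and `period` parameters, and the exact mask shapes are hypothetical, since the summary does not specify how the teachers' masks are built.

```python
import numpy as np

def causal_mask(n):
    # Causality (assumed form): position i may attend only to positions j <= i.
    return np.tril(np.ones((n, n), dtype=bool))

def local_mask(n, window):
    # Locality (assumed form): position i attends to positions within `window` steps.
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def periodic_mask(n, period):
    # Periodicity (assumed form): position i attends to positions a multiple
    # of `period` steps away, including itself.
    idx = np.arange(n)
    return (idx[:, None] - idx[None, :]) % period == 0

def masked_attention(q, k, v, mask):
    # Scaled dot-product attention where disallowed pairs get zero weight.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 4))  # 6 time steps, 4 features (toy data)
    out = masked_attention(x, x, x, periodic_mask(6, period=3))
    print(out.shape)
```

Every mask keeps the diagonal, so each row has at least one allowed position and the softmax stays well defined. In this sketch a teacher would differ from a plain Transformer only in which mask is passed to its attention layers.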

Key Contributions

  • Introduced TIPS (Transformer with Inductive Prior Synthesis), a knowledge distillation framework that injects domain-specific inductive biases (causality, locality, periodicity) into a Transformer for financial forecasting.
  • Achieved state-of-the-art performance across four major equity markets, significantly outperforming ensemble baselines on metrics like annual return (55% improvement) and Sharpe ratio (9% improvement).
  • Demonstrated that the TIPS student model requires only 38% of the inference-time computation of the ensemble of specialized teacher models (a 62% reduction).
  • Showed that TIPS exhibits regime-dependent alignment with classical architectures (like CNNs/RNNs) during their respective profitable periods, validating the integration of complementary temporal priors.
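The contributions above are reported in terms of annual return and Sharpe ratio. As a reminder of how such metrics are commonly computed from daily strategy returns, here is a short sketch. The 252-trading-day convention, the zero default risk-free rate, and the function names are assumptions made for illustration; the summary does not state the paper's exact definitions.

```python
import numpy as np

def annualized_return(daily_returns, periods_per_year=252):
    # Compound the daily returns, then rescale the growth to a one-year horizon.
    r = np.asarray(daily_returns, dtype=float)
    growth = np.prod(1.0 + r)
    return growth ** (periods_per_year / r.size) - 1.0

def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    # Annualized Sharpe ratio: mean excess return over its sample standard
    # deviation, scaled by the square root of the number of periods per year.
    excess = np.asarray(daily_returns, dtype=float) - risk_free_daily
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

if __name__ == "__main__":
    toy = [0.01, -0.01, 0.02, 0.0]  # toy data, not results from the paper
    print(annualized_return(toy), sharpe_ratio(toy))
```

A relative improvement such as the 55% figure would then be `(new - old) / old` applied to the chosen metric, assuming the percentages in the list are relative rather than absolute differences.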

Limitations

The paper suggests that no single inductive bias dominates across all markets and regimes, which implies that the distillation process depends heavily on the quality and selection of the specialized teacher models trained on specific market dynamics.

Open Questions & Future Work

Key Concepts

  • Transformer with Inductive Prior Synthesis: A knowledge distillation framework that integrates causality, locality, and periodicity inductive biases into a unified Transformer student model for financial time series forecasting.
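The distillation step can be pictured with a generic multi-teacher objective. The sketch below is a common textbook form, not the loss used in the paper: the equal weighting of teachers, the mean-squared-error terms, and the `alpha` trade-off parameter are all assumptions, and it omits the regime-dependent adaptation the summary mentions.

```python
import numpy as np

def distillation_loss(student_pred, teacher_preds, target, alpha=0.5):
    # student_pred: shape (batch,); teacher_preds: shape (num_teachers, batch);
    # target: shape (batch,). `alpha` (hypothetical) balances the two terms.
    student_pred = np.asarray(student_pred, dtype=float)
    teacher_preds = np.asarray(teacher_preds, dtype=float)
    target = np.asarray(target, dtype=float)
    # Task term: fit the ground-truth targets.
    task = np.mean((student_pred - target) ** 2)
    # Distillation term: stay close to every teacher, weighted equally.
    distill = np.mean((student_pred[None, ...] - teacher_preds) ** 2)
    return alpha * task + (1.0 - alpha) * distill

if __name__ == "__main__":
    loss = distillation_loss(
        student_pred=[1.0, 2.0],
        teacher_preds=[[1.0, 2.0], [3.0, 2.0]],
        target=[1.0, 2.0],
    )
    print(loss)
```

Because only the student is evaluated at inference time, a setup like this is consistent with the reported compute savings relative to running all teachers as an ensemble.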

Metadata & Links

url
https://arxiv.org/abs/2603.16985
paper_id
2603.16985
paper_source
openalex
domain
finance
tags
language-model, transformer, time-series, forecasting, finance, knowledge-distillation, attention-mechanism, reasoning
architectures
transformer
datasets
skill
TimeSeriesSkill
created_at
2026-03-27T14:09:22Z