The role of spatial context and multitask learning in the detection of organic and conventional farming systems based on Sentinel-2 time series

Authors: Jan Hemmerling, Marcel Schwieder, Philippe Rufin, Leon-Friedrich Thomas, Mirela Tulbure, Patrick Hostert, Stefan Erasmi
Date: 2026-03-25
Paper ID: arxiv:2603.24552

Summary

This study proposes a Vision Transformer approach, extended from the TSViT architecture, to classify organic versus conventional farming systems using intra-annual Sentinel-2 time series. The work examines how incorporating crop type prediction via multitask learning and varying the input spatial context (patch size) influences classification performance. While the approach successfully discriminates farming systems for certain cereal crops (F1 >= 0.8), multitask learning offered limited gains, whereas increasing the spatial context consistently improved the accuracy for both classification tasks. The results confirm feasibility but highlight significant challenges in distinguishing management practices for permanent and specialty crops.
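
To make the multitask idea concrete, the following is a minimal sketch of how a farming-system loss and an auxiliary crop-type loss could be combined into one training objective. It is an illustration under stated assumptions, not the authors' implementation: the function names, the weighted-sum form, and the `crop_weight` hyperparameter are all hypothetical, and the summary does not say how the paper balances the two tasks.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample; `target` is a class index."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


def multitask_loss(system_logits, system_target,
                   crop_logits, crop_target, crop_weight=0.5):
    """Farming-system loss plus a weighted auxiliary crop-type loss.

    `crop_weight` is a hypothetical hyperparameter (not from the paper);
    setting it to 0 recovers single-task training.
    """
    system_loss = cross_entropy(system_logits, system_target)
    crop_loss = cross_entropy(crop_logits, crop_target)
    return system_loss + crop_weight * crop_loss


# Toy logits: 2 farming-system classes, 3 crop-type classes.
loss = multitask_loss([2.0, 0.5], 0, [0.1, 1.2, -0.3], 1)
```

With a shared encoder, both heads would receive gradients from this single scalar, which is the usual way an auxiliary task is meant to shape the learned representation.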

Key Contributions

  • Demonstrated the feasibility of discriminating between organic and conventional farming systems using intra-annual Sentinel-2 time series data.
  • Investigated the impact of incorporating crop type classification as a concurrent task (multitask learning) alongside farming system classification, finding only limited benefit.
  • Quantified the positive effect of increasing spatial context (via patch size) on the classification accuracy for both farming system and crop type discrimination.
  • Achieved high F1 scores (>= 0.8) for discriminating organic from conventional farming for specific crops such as winter rye, wheat, and oat, but performed poorly for permanent grassland and specialty crops.
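
The per-crop reporting described above can be sketched as follows. This is a hedged example of computing an F1 score separately for each crop type; treating "organic" as the positive class is an assumption, the summary does not state how the paper averages across classes, and the records are invented toy data rather than results from the study.

```python
from collections import defaultdict


def f1_per_crop(records, positive="organic"):
    """Compute F1 for the `positive` farming system, grouped by crop.

    `records` is an iterable of (crop, true_system, predicted_system).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for crop, truth, pred in records:
        c = counts[crop]
        if pred == positive and truth == positive:
            c["tp"] += 1
        elif pred == positive:
            c["fp"] += 1
        elif truth == positive:
            c["fn"] += 1
    scores = {}
    for crop, c in counts.items():
        denom = 2 * c["tp"] + c["fp"] + c["fn"]
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives
        scores[crop] = 2 * c["tp"] / denom if denom else 0.0
    return scores


# Invented toy records, for illustration only.
records = [
    ("winter rye", "organic", "organic"),
    ("winter rye", "conventional", "conventional"),
    ("winter rye", "organic", "organic"),
    ("winter rye", "conventional", "organic"),
    ("hops", "organic", "conventional"),
    ("hops", "conventional", "conventional"),
]
scores = f1_per_crop(records)
```

Grouping by crop before scoring is what exposes the gap between cereals and specialty crops; a single pooled F1 would hide it.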

Limitations

Classification performance varies substantially across crop types, with poor reliability for permanent grassland, orchards, grapevines, and hops.

Key Concepts

  • Temporo-Spatial Vision Transformer (TSViT): A Vision Transformer architecture for spatio-temporal data such as multi-temporal satellite imagery. It combines spatial patch processing with temporal sequence modeling.
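
As a rough illustration of why the input spatial context matters for a patch-based transformer, the sketch below counts the tokens produced when each acquisition date's square input window is cut into non-overlapping square token patches. All names and numbers (`num_dates`, `window_size`, `token_size`, 36 dates, 4-pixel tokens) are hypothetical; the summary does not give the actual TSViT configuration, and the real tokenization may differ.

```python
def token_grid(num_dates, window_size, token_size):
    """Count tokens for a time series of square image windows.

    Each of the `num_dates` acquisitions is a window_size x window_size
    pixel window, split into non-overlapping token_size x token_size patches.
    All parameter names are illustrative assumptions.
    """
    if window_size % token_size != 0:
        raise ValueError("window_size must be divisible by token_size")
    per_side = window_size // token_size
    spatial_tokens = per_side * per_side
    return {
        "spatial_tokens": spatial_tokens,
        "temporal_tokens": num_dates,
        "total": spatial_tokens * num_dates,
    }


# A larger input window yields more spatial tokens around the field of interest.
for window in (8, 16, 32):
    print(window, token_grid(num_dates=36, window_size=window, token_size=4))
```

Doubling the window side quadruples the number of spatial tokens, so a wider spatial context gives the model more surrounding landscape to attend to at a correspondingly higher compute cost.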

Metadata & Links

url: https://arxiv.org/abs/2603.24552
paper_id: 2603.24552
paper_source: arxiv
domain: computer-vision
tags: vision-transformer, multimodal, time-series, remote-sensing, multitask-learning, patch-size, evaluation
architectures: vit
datasets: Sentinel-2 time series
skill: TimeSeriesSkill
created_at: 2026-03-26T07:10:46Z