The role of spatial context and multitask learning in the detection of organic and conventional farming systems based on Sentinel-2 time series
Authors: Jan Hemmerling, Marcel Schwieder, Philippe Rufin, Leon-Friedrich Thomas, Mirela Tulbure, Patrick Hostert, Stefan Erasmi
Date: 2026-03-25
Paper ID: arxiv:2603.24552
Summary
This study proposes a Vision Transformer approach, extended from the TSViT architecture, to classify organic versus conventional farming systems using intra-annual Sentinel-2 time series. The work examines how incorporating crop type prediction via multitask learning and varying the input spatial context (patch size) influences classification performance. While the approach successfully discriminates farming systems for certain cereal crops (F1 >= 0.8), multitask learning offered limited gains, whereas increasing the spatial context consistently improved the accuracy for both classification tasks. The results confirm feasibility but highlight significant challenges in distinguishing management practices for permanent and specialty crops.
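The multitask setup described above can be illustrated with a combined loss, where crop type prediction acts as an auxiliary task next to farming system classification. This is a minimal sketch, not the authors' implementation: the summary does not state how the two losses are weighted, so `crop_weight` and the function names are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample, computed from raw logits in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multitask_loss(system_logits, system_label, crop_logits, crop_label, crop_weight=0.5):
    """Weighted sum of the farming-system loss and the auxiliary crop-type loss.

    crop_weight is a hypothetical hyperparameter, not a value taken from the paper.
    """
    loss_system = cross_entropy(system_logits, system_label)
    loss_crop = cross_entropy(crop_logits, crop_label)
    return loss_system + crop_weight * loss_crop
```

Setting `crop_weight=0` reduces this to single-task training on the farming system label, which is the baseline a multitask variant would be compared against.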
Key Contributions
- Demonstrated the feasibility of discriminating between organic and conventional farming systems using intra-annual Sentinel-2 time series data.
- Investigated the impact of incorporating crop type classification as a concurrent task (multitask learning) alongside farming system classification, finding only limited benefit.
- Quantified the positive effect of increasing spatial context (via patch size) on the classification accuracy for both farming system and crop type discrimination.
- Achieved high F1 scores (>= 0.8) for discriminating organic/conventional farming for specific crops like winter rye, wheat, and oat, while showing poor performance for permanent grassland and specialty crops.
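The crop-specific results above imply an evaluation that is stratified by crop type. The sketch below shows one way such per-crop F1 scores could be computed. It assumes that "organic" is treated as the positive class and that each record carries a crop label; both are assumptions for illustration, and the function name `f1_per_crop` is hypothetical.

```python
from collections import defaultdict


def f1_per_crop(records, positive="organic"):
    """Compute the F1 score of the positive class separately for each crop.

    records: iterable of (crop, true_system, predicted_system) tuples.
    Returns a dict mapping crop -> F1 (0.0 when there are no positives at all).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for crop, truth, pred in records:
        c = counts[crop]  # touching the key registers the crop even for true negatives
        if pred == positive and truth == positive:
            c["tp"] += 1
        elif pred == positive:
            c["fp"] += 1
        elif truth == positive:
            c["fn"] += 1
    scores = {}
    for crop, c in counts.items():
        denom = 2 * c["tp"] + c["fp"] + c["fn"]
        scores[crop] = 2 * c["tp"] / denom if denom else 0.0
    return scores
```

Reporting the score per crop rather than pooled across all fields is what makes a contrast such as cereals versus permanent grassland visible.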
Limitations
Classification performance varies substantially across crop types, with poor reliability for permanent grassland, orchards, grapevines, and hops.
Open Questions & Future Work
- Generalizability of crop-specific separability
- Incorporating multi-year data and rotations
- Phenology-aware time series sampling
- Testing larger spatial contexts
- Transferability of the multitask findings
Key Concepts
- Temporo-Spatial Vision Transformer: A Vision Transformer architecture specifically designed to process and leverage spatio-temporal data, such as multi-temporal satellite imagery, by integrating spatial patch processing with temporal sequence modeling.
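To make the interplay of spatial context and temporal modeling concrete, the sketch below crops a context window from a single-band image time series and groups it into per-tile temporal sequences. It is a simplified, pure-Python illustration: the nested-list layout `cube[t][row][col]`, the function names, and the distinction between a context window `size` and a token tile `patch` are assumptions, and the actual TSViT tokenization and embedding are not reproduced here.

```python
def center_crop(cube, size):
    """Crop a size x size window around the center of every time step.

    cube is a nested list indexed as cube[t][row][col].
    """
    height, width = len(cube[0]), len(cube[0][0])
    if size > height or size > width:
        raise ValueError("context window larger than the input")
    top, left = (height - size) // 2, (width - size) // 2
    return [[row[left:left + size] for row in frame[top:top + size]] for frame in cube]


def temporal_token_sequences(cube, patch):
    """Split each time step into patch x patch tiles and collect one sequence per tile.

    Returns a dict mapping (tile_row, tile_col) -> list of T flattened tiles.
    """
    height, width = len(cube[0]), len(cube[0][0])
    if height % patch or width % patch:
        raise ValueError("spatial size must be divisible by the patch size")
    sequences = {}
    for r in range(0, height, patch):
        for c in range(0, width, patch):
            sequences[(r // patch, c // patch)] = [
                [v for row in frame[r:r + patch] for v in row[c:c + patch]]
                for frame in cube
            ]
    return sequences
```

A larger context window yields more tiles per sample, so the model sees more of the surroundings of a field at the cost of more tokens to process.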
Datasets
- Sentinel-2 time series
Metadata & Links
- url: https://arxiv.org/abs/2603.24552
- paper_id: 2603.24552
- paper_source: arxiv
- domain: computer-vision
- tags: vision-transformer, multimodal, time-series, remote-sensing, multitask-learning, patch-size, evaluation
- architectures: vit
- datasets: Sentinel-2 time series
- skill: TimeSeriesSkill
- created_at: 2026-03-26T07:10:46Z