From Blender to farm: Transforming controlled environment agriculture with synthetic data and SwinUNet for precision crop monitoring
Abstract
The aim of this study was to train a Vision Transformer (ViT) model for semantic segmentation that differentiates ripe from unripe strawberries using synthetic data, thereby avoiding the challenges of conventional data collection. Blender was used to generate synthetic strawberry images together with their corresponding masks for precise segmentation. These synthetic images were then used to train and evaluate a SwinUNet segmentation model, with Deep Domain Confusion applied for domain adaptation. The trained model was tested on real images from the Strawberry Digital Images dataset, where it achieved a Dice Similarity Coefficient of 94.8% for ripe strawberries and 94% for unripe strawberries, highlighting its effectiveness for applications such as fruit ripeness detection. The results also show that increasing the volume and diversity of the training data can significantly improve the segmentation accuracy of each class. Overall, this approach demonstrates how synthetic datasets can serve as a cost-effective and efficient solution to data scarcity in agricultural applications.
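For context, the Dice Similarity Coefficient reported above measures the overlap between a predicted mask and its ground-truth mask for each class. Below is a minimal sketch of a per-class Dice computation; it assumes binary NumPy masks, and the function and variable names are illustrative rather than taken from the paper's code.

```python
# Minimal sketch of the Dice Similarity Coefficient used to score each class
# (ripe / unripe). Assumes binary masks as NumPy arrays; names are illustrative
# and not taken from the paper's implementation.
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2*|A intersect B| / (|A| + |B|) for binary masks of equal shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Example: per-class Dice for a 3-class label map (0=background, 1=ripe, 2=unripe).
pred_labels = np.random.randint(0, 3, size=(256, 256))
true_labels = np.random.randint(0, 3, size=(256, 256))
for cls, name in [(1, "ripe"), (2, "unripe")]:
    score = dice_coefficient(pred_labels == cls, true_labels == cls)
    print(f"Dice ({name}): {score:.3f}")
```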
- Publication: PLoS ONE
- Pub Date: April 2025
- DOI:
- Bibcode: 2025PLoSO..2022189A