🌴 Evaluating TerraMind's Thinking-in-Modality capability for oil palm segmentation
Background
If you like chocolate spread on your toast, chances are you consume palm oil. This oil is produced from oil palms, often grown in areas that used to be rainforest, where cloud cover persists. Stakeholders interested in monitoring these palms include:
- Environmental NGOs
- Local governments and law enforcement
- Sustainability certifiers
- Service providers for compliance monitoring under the upcoming European Union Deforestation Regulation

These stakeholders often work in a conventional way, relying on cloud-free optical images. This hinders frequent monitoring.
The TerraMind foundation model, with its Thinking-in-Modality (TiM) capability, could help overcome the persistent cloud-cover problem in oil palm monitoring from optical images by generating an intermediate Synthetic Aperture Radar (SAR) modality. SAR images are less affected by clouds and carry a strong signal from the palms' distinctive canopy shape, making it easier to differentiate oil palm from the surrounding non-palm vegetation.
How we used TerraMind
Most of TerraMind's demonstrations have focused on objects that are clearly distinguishable in optical images, such as burn scars and flooded areas. We evaluated the capability of TerraMind's TiM to segment oil palm from optical Sentinel-2 RGB images, in which palms look as green as other vegetation. We obtained the oil palm mask and followed the image preprocessing steps from Descals et al. (2021). All experiments used a frozen terramind_v1_base_tim backbone; only the decoder was fine-tuned. For comparison, we also trained U-Net baselines on Sentinel-2 RGB and Sentinel-1 SAR images. Models trained on optical images were evaluated on both clear and cloudy test sets.
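To illustrate what "frozen backbone, fine-tuned decoder" means in practice, here is a minimal PyTorch sketch. It is not our training code and does not use the TerraMind API: `TinyBackbone`, `SegmentationModel`, and `freeze` are hypothetical stand-ins, and the random tensors only take the place of real image tiles and masks.

```python
import torch
from torch import nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a pretrained encoder (NOT the TerraMind API)."""

    def __init__(self, in_channels: int = 3, features: int = 16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, features, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.conv(x)


class SegmentationModel(nn.Module):
    """Backbone followed by a small per-pixel classification head (the decoder)."""

    def __init__(self, backbone: nn.Module, features: int = 16, num_classes: int = 2):
        super().__init__()
        self.backbone = backbone
        self.decoder = nn.Conv2d(features, num_classes, kernel_size=1)

    def forward(self, x):
        return self.decoder(self.backbone(x))


def freeze(module: nn.Module) -> None:
    """Disable gradients for every parameter and switch the module to eval mode."""
    for param in module.parameters():
        param.requires_grad = False
    module.eval()


torch.manual_seed(0)
model = SegmentationModel(TinyBackbone())
freeze(model.backbone)

# Only parameters that still require gradients (the decoder) reach the optimizer.
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)

# Random stand-ins for a batch of RGB tiles and binary palm / non-palm masks.
images = torch.rand(4, 3, 32, 32)
masks = torch.randint(0, 2, (4, 32, 32))

backbone_before = [p.clone() for p in model.backbone.parameters()]
decoder_before = [p.clone() for p in model.decoder.parameters()]

# One training step.
loss = nn.functional.cross_entropy(model(images), masks)
optimizer.zero_grad()
loss.backward()
optimizer.step()

backbone_unchanged = all(
    torch.equal(a, b) for a, b in zip(backbone_before, model.backbone.parameters())
)
decoder_changed = any(
    not torch.equal(a, b) for a, b in zip(decoder_before, model.decoder.parameters())
)
print("backbone unchanged:", backbone_unchanged)
print("decoder changed:", decoder_changed)
```

Freezing keeps the pretrained representation intact and reduces the number of trainable parameters to those of the decoder, which is why this setup is comparatively cheap to fine-tune.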
Results
Accuracy metrics on the test set
| Model | Input | F1-score | mIoU |
|---|---|---|---|
| U-Net | Sentinel-2 RGB (clear) | 0.89 | 0.51 |
| U-Net | Sentinel-2 RGB (cloudy) | 0.79 | 0.29 |
| U-Net | Sentinel-1 SAR | 0.95 | 0.74 |
| TerraMind TiM base | Sentinel-2 RGB (clear) | 0.92 | 0.81 |
| TerraMind TiM base | Sentinel-2 RGB (cloudy) | 0.88 | 0.74 |
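For readers less familiar with the two metrics, the sketch below shows one common way to compute them from flattened binary masks in plain Python. It is an illustration under assumptions, not the evaluation code behind the table: here the F1-score is computed for the palm class only and mIoU is averaged over the palm and non-palm classes, and the helper names and toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true positives, false positives, and false negatives for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def iou(tp, fp, fn):
    """Intersection over union = TP / (TP + FP + FN)."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def mean_iou(y_true, y_pred, classes=(0, 1)):
    """Average the per-class IoU over all classes."""
    scores = [iou(*confusion_counts(y_true, y_pred, c)) for c in classes]
    return sum(scores) / len(scores)


# Toy flattened masks: 1 = oil palm, 0 = non-palm (made-up values).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

palm_f1 = f1_score(*confusion_counts(y_true, y_pred, positive=1))
miou = mean_iou(y_true, y_pred)
print(f"F1 (palm): {palm_f1:.2f}")  # 0.75
print(f"mIoU: {miou:.2f}")          # 0.60
```

Because IoU penalises every false positive and false negative against a smaller denominator, it is always at most as large as the F1-score for the same class, which is consistent with the mIoU column sitting below the F1 column in the table.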
Snapshots of model predictions on test images:



Lessons learned and future outlook
- Cloudy optical images are a poor input for the U-Net baseline.
- TerraMind with SAR TiM, however, seems to improve model performance despite cloud cover.
- We plan to test a larger version of the TiM model and extend our evaluation to different regions.
- It is a bit of a stretch, but we would like to lower the barrier for stakeholders to use this method for oil palm monitoring by integrating the model into a desktop GIS plugin that produces oil palm segments when fed cloudy, near-real-time optical images. Inference with the base model should be feasible on a consumer-grade computer.
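The first two points above can be checked directly against the results table. The short sketch below only does the arithmetic on the reported scores; the dictionary layout and variable names are our own illustration.

```python
# Scores copied from the results table: (F1-score, mIoU).
scores = {
    "U-Net": {"clear": (0.89, 0.51), "cloudy": (0.79, 0.29)},
    "TerraMind TiM base": {"clear": (0.92, 0.81), "cloudy": (0.88, 0.74)},
}

drops = {}
for model_name, result in scores.items():
    clear_f1, clear_miou = result["clear"]
    cloudy_f1, cloudy_miou = result["cloudy"]
    # Absolute drop when moving from the clear to the cloudy test set.
    drops[model_name] = (
        round(clear_f1 - cloudy_f1, 2),
        round(clear_miou - cloudy_miou, 2),
    )

for model_name, (f1_drop, miou_drop) in drops.items():
    print(f"{model_name}: F1 drop {f1_drop:.2f}, mIoU drop {miou_drop:.2f}")
# U-Net: F1 drop 0.10, mIoU drop 0.22
# TerraMind TiM base: F1 drop 0.04, mIoU drop 0.07
```

On these numbers, cloud cover costs the U-Net baseline roughly three times as much mIoU as it costs TerraMind TiM base.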
Authors
Afif Fauzan, student, University of Tartu (fauzan [at] ut [dot] ee)
Deha Umarhadi, student, Ludwig-Maximilians-Universität München (deha.agus.u [at] mail [dot] ugm [dot] ac [dot] id)
Acknowledgement
This is an extension of an earlier study supervised by Iris van Duren and Raian Vargas Maretto. We thank Serkan Girgin for recovering the data used in the study.
Relevant references:
- Earlier study blogpost on Big Geodata Story and Medium
- GitHub repository
- Descals et al. (2021). High-resolution global map of smallholder and industrial closed-canopy oil palm plantations. ESSD.