What recent developments in deep learning can we use in SAR imaging, in a very simple way?
Part 1 — Segmentation
The current pace of AI research makes it extremely challenging to stay competitive in the race for algorithm performance, which often requires huge datasets and significant computing power. In this context, SAR imagery remains a relatively unexplored area. Rather than starting from zero in this corner of remote sensing, it is worth asking how best to leverage recent AI advances, particularly through fine-tuning: instead of training a new network from scratch, we can reuse an existing one, getting good performance while saving time and resources.
Here I share my latest findings on how recent deep learning tools could help us advance our research.
Note that my goal is not to develop new artificial intelligence tools; I am simply a user. My core expertise remains understanding the content of images: inverting physical parameters, finding new contexts of use, proposing new acquisition modalities, and so on. I view recent AI advances as a great toolbox for achieving these objectives.
My blog posts focus on four main functionalities:
- segmentation
- detection of particular objects
- simulation of SAR images
- coregistration
Segmentation
The SAM (Segment Anything Model) tool has recently made the news. It is an image segmentation model published by Meta’s FAIR lab, built on foundation models, in particular vision Transformers.
From convolutional networks to Vision Transformers
Transformer techniques were originally developed for natural language processing tasks. More recently, work has focused on their use in computer vision, and in 2020 Google Research published the Vision Transformer (ViT) model, with results that compete with state-of-the-art models in image recognition (in particular convolutional neural networks such as U-Net).
Transformer neural networks are models that transform a given sequence of elements, such as the words of a sentence, into another sequence of variable length. They are inspired by recurrent networks, but instead of a single encoding layer and a single decoding layer, the architecture consists of a stack of encoders and a stack of decoders of the same depth. Moreover, the network relies on an attention mechanism that decides, at each step, which parts of the input sequence matter when producing the output sequence.
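For intuition, the heart of this attention mechanism fits in a few lines. Below is a minimal sketch of scaled dot-product self-attention in Python (a toy illustration, not a full Transformer layer):

```python
import torch

def scaled_dot_product_attention(q, k, v):
    """Each output token is a weighted mix of the values v, where the
    weights express which input elements matter for which outputs."""
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    weights = torch.softmax(scores, dim=-1)   # the attention weights
    return weights @ v

# Toy sequence: 5 tokens of dimension 8, attending to itself.
x = torch.randn(1, 5, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # torch.Size([1, 5, 8])
```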
The principle used to adapt a Transformer network to images is to divide an image into patches, which are then processed in the same way as words. The authors wanted to stay as close as possible to the original Transformer model, in order to benefit from its scaling behavior and its highly optimized implementations.
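As a rough illustration of this patch-based encoding, here is a minimal sketch of how an image can be cut into fixed-size patches and projected into token embeddings (the sizes follow common ViT defaults; this is not the exact ViT code):

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and project each one to
    an embedding vector, so patches play the role of word tokens."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution is the standard trick: one step per patch.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (B, C, H, W)
        x = self.proj(x)                     # (B, D, H/P, W/P)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, D)

tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768])
```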
To encode the image, SAM uses a vision Transformer, in this case a pre-trained MAE ViT that has been specially adapted to handle high-resolution inputs. This architecture lets SAM segment images with high accuracy and speed, while remaining flexible enough to adapt to a variety of computer vision tasks.
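In practice, with Meta’s segment-anything package, loading a pre-trained SAM and encoding an image takes only a few lines. A sketch (the image file name is a placeholder of mine; the checkpoint is the one distributed by Meta):

```python
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load a pre-trained SAM checkpoint (ViT-H image encoder).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SAM expects an RGB uint8 image; the heavy ViT encoding runs once here.
image = cv2.cvtColor(cv2.imread("scene.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)
```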
Can it be applied to radar images?
SAM already produces interesting results, particularly for object selection. I have tested it on temporally colored images (produced with the REACTIV algorithm) to select agricultural fields in Sentinel-1 data, on polarimetrically colored images (see my previous post) to select buildings, and on oil-slick detection.
Selecting agricultural plots
From a Sentinel-1 time series, the REACTIV visualization algorithm produces, almost instantly, a colored image in which agricultural plots stand out with strong saturation, unlike the surrounding pixels, which remain in grayscale. This image can then be fed directly into SAM.
A single click is enough to select these cultivated plots.
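Here is a sketch of that one-click workflow with the segment-anything API; the file name and click coordinates are hypothetical placeholders:

```python
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Hypothetical REACTIV composition exported as an 8-bit RGB image.
image = cv2.cvtColor(cv2.imread("reactiv_rgb.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One positive click (x, y) on a saturated agricultural plot.
point = np.array([[850, 420]])
label = np.array([1])                       # 1 = foreground click
masks, scores, _ = predictor.predict(point_coords=point,
                                     point_labels=label,
                                     multimask_output=True)
best = masks[scores.argmax()]               # boolean mask of the plot
```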
Be careful: the results do not depend only on the resolution, but also on the size of the image footprint! If you want to detect small objects, don’t expect to do it in one pass over the whole image: pre-crop a smaller region first.
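A minimal sketch of this pre-cropping idea (the tile size and overlap are arbitrary choices of mine, not a standard recipe):

```python
import numpy as np

def iter_tiles(image, tile=1024, overlap=128):
    """Yield overlapping crops so that small objects stay large
    relative to the footprint SAM actually sees."""
    h, w = image.shape[:2]
    step = tile - overlap
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            yield (y, x), image[y:y + tile, x:x + tile]

# Example: run SAM tile by tile instead of on the full scene, then
# offset each resulting mask by its tile origin (y0, x0).
# for (y0, x0), crop in iter_tiles(full_scene):
#     predictor.set_image(crop)
#     ...
```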
Selecting buildings in built-up areas
I explained in a previous post how to build color compositions from a (VV, VH) or (HH, HV) time series. Here again, the color composition highlights man-made objects with saturated colors. Once imported into SAM, it is very easy to select a particular building.
As before, if we do not guide the segmentation toward a specific object, the size of the selected objects depends on the size of the image footprint.
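When nothing guides the segmentation, SAM’s automatic mode samples a grid of point prompts over the whole footprint. A sketch with the segment-anything package (the input file name is a placeholder):

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam, points_per_side=32)

# Hypothetical (VV, VH) color composition exported as 8-bit RGB.
image = cv2.cvtColor(cv2.imread("dualpol_rgb.png"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # list of dicts: 'segmentation',
                                        # 'area', 'bbox', 'predicted_iou', ...
```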
Selecting oil spills
To convince myself that the algorithm remains effective even in the absence of color and regular contours, I tested it on an image of oil slicks, taken directly from an article that aims to detect them:
Derrode, S., & Mercier, G. (2007). Unsupervised multiscale oil slick segmentation from SAR images using a vector HMC model. Pattern Recognition, 40(3), 1135–1147.
The result is immediate. On the left, the original ENVISAT SAR image; on the right, the detection result obtained from a single click, whose location appears as a small blue dot.
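One practical detail: SAM expects a three-channel 8-bit image, so a single-channel SAR amplitude image must be rescaled and replicated before prompting. A minimal sketch (the percentile thresholds are an arbitrary choice):

```python
import numpy as np

def sar_to_rgb8(amplitude, p_low=1, p_high=99):
    """Clip a single-channel SAR amplitude image between two percentiles,
    rescale to [0, 255], and replicate to 3 channels for SAM."""
    lo, hi = np.percentile(amplitude, [p_low, p_high])
    scaled = np.clip((amplitude - lo) / (hi - lo + 1e-12), 0, 1)
    gray8 = (255 * scaled).astype(np.uint8)
    return np.stack([gray8] * 3, axis=-1)   # (H, W, 3) uint8

# predictor.set_image(sar_to_rgb8(envisat_amplitude))
# then prompt with a single click inside the dark slick
```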
These tests could be run extremely quickly with SAM, without any adaptation of the original network. The results are very encouraging and open the door to a multitude of downstream applications. Of course, to adapt the model to a particular use case, fine-tuning looks very promising!
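As a hint of what such fine-tuning could look like, a recipe often used with SAM (my sketch, not a method demonstrated in this post) is to freeze the heavy image encoder and train only the lightweight mask decoder:

```python
import torch
from segment_anything import sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")

# Freeze the image encoder and prompt encoder; train the mask decoder only.
for p in sam.image_encoder.parameters():
    p.requires_grad = False
for p in sam.prompt_encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(sam.mask_decoder.parameters(), lr=1e-4)
# ... then iterate over (SAR image, prompt, reference mask) batches,
# comparing predicted masks to the reference with e.g. a Dice loss.
```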