What recent developments in deep learning can we use in SAR imaging?
Part 4 — Simulation
Diffusion for SAR images
You may have heard of networks like DALL-E or Stable Diffusion, which can generate, from a text prompt, an image of your favorite president in a white down jacket or a portrait of your pet in the style of Van Gogh.
You might then wonder what this has to do with radar images. While much research has focused on generating SAR (Synthetic Aperture Radar) images with GANs (Generative Adversarial Networks), we can now consider simulating SAR images with networks like Stable Diffusion, using one of these text-to-image approaches.
Here, for example, are the results obtained by Nicolas Trouvé (Onera-DEMR, SEM) and his team, during work initiated as part of Nathan Letheule’s thesis on SAR image generation.
To achieve these images, they used DreamBooth to teach the network the “radar” style:
DreamBooth is a recent technique for customizing deep generative image models, such as those behind Stable Diffusion. It fine-tunes an existing model to recognize and generate images in a specific style, or with specific characteristics, without losing its original generalization capabilities.
In practice, DreamBooth uses a small number of representative images to teach the model a particular subject, object, or style. For example, given a handful of images from a specific type of radar, DreamBooth can specialize a generalist model in generating radar images while preserving its ability to perform other image generation tasks. This specificity makes DreamBooth particularly useful for applications that require fine customization without a large training dataset.
It is this very specificity that was utilized to generate images in the “radar” style.
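To make this concrete, here is a minimal inference sketch using the Hugging Face diffusers library. The local checkpoint path ./sar-dreambooth and the rare identifier token "sks" are assumptions for illustration, not the team's actual setup:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion model previously fine-tuned with DreamBooth on a
# small set of SAR patches ("./sar-dreambooth" is a hypothetical local path).
pipe = StableDiffusionPipeline.from_pretrained(
    "./sar-dreambooth", torch_dtype=torch.float16
).to("cuda")

# DreamBooth binds the new concept to a rare identifier token (here "sks"),
# so the prompt must include it to trigger the learned "radar" style.
image = pipe(
    "a sks radar image of an urban area crossed by a river",
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("simulated_sar.png")
```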
The Contribution of Deep Learning to Radar Image Simulation
Diffusion networks are not, in fact, the only deep learning technology applicable to radar image simulation. Nathan Letheule’s thesis explores several possible contributions, summarized in this diagram:
- When a physical simulator is available, i.e., one that generates a radar image from a physical model of the scene by solving the associated Maxwell equations, the most labor-intensive manual step is the scene description. Typically, a simulator needs its input scene described in terms of several material classes whose dielectric permittivity, or more generally whose backscattering behavior, is known. It is therefore conceivable to automate part of this scene description with deep learning, for example by automatically segmenting the scene into material classes from an optical image (a minimal sketch of this step follows the list below).
- Alternatively, one can implement an image-to-image translation approach from optical to radar using a cGAN (conditional GAN). When training on a dataset of paired optical and radar images, the most suitable architecture is pix2pix; a sketch of such a generator appears after the limitations below.
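As a minimal sketch of the first idea, a standard semantic segmentation network can turn an optical patch into a material-class map for a physical simulator to consume. Everything here is illustrative: the five material classes and the checkpoint landcover_deeplabv3.pt are assumptions, not the classes actually fed to EMPRISE.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# Hypothetical material classes that a physical simulator could map to known
# backscattering behaviors (the real class set depends on the simulator).
MATERIALS = ["water", "bare_soil", "vegetation", "building", "road"]

model = deeplabv3_resnet50(num_classes=len(MATERIALS))
# Assumed checkpoint: a network fine-tuned on land-cover annotations.
model.load_state_dict(torch.load("landcover_deeplabv3.pt"))
model.eval()

@torch.no_grad()
def optical_to_material_map(optical_patch: torch.Tensor) -> torch.Tensor:
    """Map a normalized (3, H, W) optical patch to (H, W) material indices."""
    logits = model(optical_patch.unsqueeze(0))["out"]  # (1, C, H, W)
    return logits.argmax(dim=1).squeeze(0)             # per-pixel class index
```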
Both techniques run into limitations:
- In the first case, learning a classification into classes that are meaningful from an electromagnetic standpoint requires supervised training databases… which are not always available. Moreover, at high resolution, it becomes important to model the scene in three dimensions, especially for certain specific objects.
- In the cGAN-based image-to-image approaches, speckle and bright spots are difficult to reproduce in practice.
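To make the second approach concrete, here is a deliberately small pix2pix-style generator: a U-Net that maps a 3-channel optical patch to a 1-channel SAR amplitude patch. This is a sketch of the general architecture, not the exact network used in the thesis:

```python
import torch
import torch.nn as nn

def down(cin, cout):  # encoder block: stride-2 conv halves the resolution
    return nn.Sequential(
        nn.Conv2d(cin, cout, 4, stride=2, padding=1),
        nn.BatchNorm2d(cout),
        nn.LeakyReLU(0.2),
    )

def up(cin, cout):    # decoder block: transposed conv doubles the resolution
    return nn.Sequential(
        nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(),
    )

class Pix2PixGenerator(nn.Module):
    """Tiny U-Net: 3-channel optical patch in, 1-channel SAR patch out."""
    def __init__(self):
        super().__init__()
        self.e1, self.e2, self.e3 = down(3, 64), down(64, 128), down(128, 256)
        self.d1 = up(256, 128)
        self.d2 = up(256, 64)  # input: d1 output concatenated with e2 skip
        self.d3 = nn.Sequential(
            nn.ConvTranspose2d(128, 1, 4, stride=2, padding=1), nn.Tanh()
        )

    def forward(self, x):
        h1 = self.e1(x)                          # (64,  H/2, W/2)
        h2 = self.e2(h1)                         # (128, H/4, W/4)
        h3 = self.e3(h2)                         # (256, H/8, W/8)
        u1 = self.d1(h3)                         # (128, H/4, W/4)
        u2 = self.d2(torch.cat([u1, h2], 1))     # skip connection from e2
        return self.d3(torch.cat([u2, h1], 1))   # skip connection from e1
```

In the full pix2pix setup, this generator is trained against a conditional discriminator with an adversarial loss plus a strong L1 reconstruction term; that averaging L1 term is one plausible reason why speckle and bright point scatterers tend to come out smoothed.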
Of the three images shown below, the one on the left is a real image from the Sentinel-1 satellite. The middle image was generated from a seven-class segmentation and Onera’s EMPRISE physical simulator. The image on the right was produced by a Pix2Pix architecture trained to translate optical imagery into radar imagery. Although the middle image may seem less accurate at first glance, it actually reproduces the statistics of the SAR image more faithfully, especially the speckle and the bright spots.
These conclusions were presented at IGARSS 2023:
Letheule, N., Weissgerber, F., Lobry, S., & Colin, E. (2023, July). Automatic simulation of SAR images: comparing a deep-learning based method to a hybrid method.
Diffusion networks are very appealing BUT they require a vast amount of data because the models have billions of parameters. Therefore, it is legitimate to want to start from networks that are already pre-trained… and today, Stable Diffusion is the ideal candidate.
What are the challenges for the realism of a SAR image?
We have seen that diffusion networks, once fine-tuned on radar images, reproduce highly realistic speckle textures very well. One of the major challenges with image generation using Stable Diffusion lies in how the model is guided toward the desired result. Diffusion networks are typically paired with text encoders derived from large language models and are driven by a textual prompt.
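As an illustration, here is a hedged sketch, reusing the hypothetical fine-tuned checkpoint from the earlier example, in which the same scene is requested with increasingly explicit spatial context; only the prompt wording changes:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "./sar-dreambooth", torch_dtype=torch.float16  # hypothetical checkpoint
).to("cuda")

# From vague to contextually rich: the extra scene logic (river, bridges,
# districts on both banks) is what pushes the model toward spatial coherence.
prompts = [
    "a sks radar image",
    "a sks radar image of a city",
    "a sks radar image of a city crossed by a dark river, "
    "several bridges, dense districts on both banks",
]
for i, prompt in enumerate(prompts):
    pipe(prompt, num_inference_steps=50).images[0].save(f"prompt_{i}.png")
```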
To generate realistic and complex scenes, it is essential to refine the prompts, as they play a crucial role in directing the image generation process. Take, for example, the two images below: one is a genuine image captured by the Sentinel-1 satellite, while the other is a simulation.
How can we tell which one is real? When we asked this question on social networks (in a LinkedIn post), we noticed that correct answers mostly came from people who paid attention to context. The notion of “spatial coherence” becomes relevant here: certain visual logics must be respected to deceive the human eye. For example:
- A road should not abruptly stop in the middle of nowhere.
- It is unusual to find a skyscraper isolated in the countryside.
- A bridge should span a highway or a river, not emerge in the middle of a field.
Regarding the two images I previously shared, the image on the right is the actual satellite image of the city of Rouen, France. In this image, the large black expanse represents the Seine, the famous French river.
On the left, the AI-generated image also shows an urban area. The small white dots could well be bridges spanning the river. But the arrangement is unusual: several bridges seem to lead to a strangely empty southern area. The problem would be the same if the river were a road. This anomaly illustrates a gap in the model’s understanding of geographic and urban context.
In summary, the success of generating realistic images by AI strongly depends on our ability to guide the model with precise and contextually appropriate prompts. This example highlights how artificial intelligence, beyond its ability to replicate shapes and textures, must also learn to synthesize a variety of concepts: in addition to the specifics of speckle in radar images and the subtleties of image processing such as contrast adjustment, we must “learn” the natural composition of a region, the organization of urban spaces, etc.
Thank you to everyone who participated in this revealing experiment! The ongoing advances in the field promise to continually improve our ability to generate ever more faithful and contextually appropriate images.