How is remote sensing specific to deep learning?

Or for now, why there’s still work to be done in this field

Elise Colin
Jun 15, 2024

I’m back with a brief post outlining my reflections on “How Remote Sensing Constitutes a Distinct Application Domain for Deep Learning in Image Processing.” I acknowledge that this overview may not cover every aspect.

The Earth is not flat…

I suspect flat-earthers don’t do remote sensing: anyone who has ever worked in this field knows how complicated it is to represent a non-planar surface in a suitable reference frame.
And yet this notion of reference frame matters as soon as you prepare a database. The reference frame conditions both how images can be queried (at a given zoom level) and how the results will eventually be exploited. Remote sensing imagery is first and foremost geospatial information: the coordinates associated with an image are just as important as the image itself.

Most of the time, mapping tools are offered in a Mercator reference frame. But if you’re working on Antarctica, you’re in a bit of a bind when it comes to knowing whether you’re looking east or west!
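To make this concrete, here is a minimal sketch (assuming pyproj is installed) that expresses the same Antarctic ground point in Web Mercator (EPSG:3857) and in the Antarctic Polar Stereographic projection (EPSG:3031); the coordinates of the point are made up for the illustration.

```python
# Minimal sketch: the same ground point in two reference frames.
# EPSG:4326 = WGS84 lat/lon, EPSG:3857 = Web Mercator,
# EPSG:3031 = Antarctic Polar Stereographic.
from pyproj import Transformer

lon, lat = 135.0, -82.5  # an arbitrary point on the Antarctic plateau

to_mercator = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)
to_polar = Transformer.from_crs("EPSG:4326", "EPSG:3031", always_xy=True)

print("Web Mercator        :", to_mercator.transform(lon, lat))
print("Polar Stereographic :", to_polar.transform(lon, lat))
# Near the poles, the Mercator y coordinate stretches enormously and local
# distances become meaningless; the polar projection keeps neighbouring
# pixels genuinely adjacent.
```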

Acquisition geometries are different

If you want to combine images from different sensors, you need to georeference them and, at high resolution, orthorectify them, i.e. correct the projection distortions that arise because the images are not acquired from a purely vertical viewing direction.
People who work with radar imagery know just how difficult it can be to superimpose geometries.
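As a rough sketch of the easier half of that problem, here is how one might resample one georeferenced image onto the grid of another with rasterio; the file names are placeholders, and a proper orthorectification would additionally require a DEM and the sensor model, which this does not attempt.

```python
# Sketch: warp image B onto the exact grid (CRS, transform, shape) of image A,
# so the two can be compared pixel by pixel. File names are placeholders.
# True orthorectification also needs a DEM + sensor model; not shown here.
import numpy as np
import rasterio
from rasterio.warp import reproject, Resampling

with rasterio.open("optical_scene.tif") as ref, rasterio.open("radar_scene.tif") as src:
    aligned = np.zeros((src.count, ref.height, ref.width), dtype=np.float32)
    reproject(
        source=src.read().astype(np.float32),
        destination=aligned,
        src_transform=src.transform,
        src_crs=src.crs,
        dst_transform=ref.transform,
        dst_crs=ref.crs,
        resampling=Resampling.bilinear,
    )
# 'aligned' now shares the reference image's geometry, up to the residual
# parallax that only a proper orthorectification would remove.
```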

Labels are called Ground Truth… and are not easy to define

Supervised training relies on labels. ImageNet-style natural images are so widespread that annotation efforts can be pooled and reused. Remote sensing annotation, on the other hand, is far less common and far less straightforward: most of the time the label itself is ill-defined, because we are describing natural objects that are harder to pin down than they appear.

Here’s an example of some of the situations we encounter:
- What is a house? (is a hobbit house still a house? Is a building with a tennis court on top a building or a sports court?)
- What is a tree? (If a single trunk splits in two, is it one tree or two? Are huge bamboos trees?)
- How do you define a change in time between two dates? If an object has appeared and disappeared between t0 and tN, is it still a change? Is a change of texture or paint a change of building? The addition of a floor?

Acquisitions are “sparse”

The acquisition conditions of remote sensing images are heavily constrained. Images cannot be acquired at an arbitrary heading angle, because the orbits are fixed; incidence angles, overpass times and revisit intervals are constrained too. The space of resolutions, pixel dimensions and frequencies is therefore only sparsely sampled by the set of images available today.

Dealing with Pixels or Objects

Depending on the application, the task can be framed “pixel-wise” or at the “object” level. Evaluation procedures depend on this choice, and approaches need to be assessed accordingly.
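Here is a toy illustration of how the two viewpoints diverge, with invented masks and only numpy/scipy: the same prediction gets a mediocre pixel-wise IoU while still “finding” half of the objects.

```python
# Toy illustration: the same prediction scored pixel-wise (IoU) and
# object-wise (how many ground-truth blobs are detected at all).
import numpy as np
from scipy import ndimage

truth = np.zeros((20, 20), dtype=bool)
truth[2:6, 2:6] = True      # object 1
truth[12:15, 12:15] = True  # object 2

pred = np.zeros_like(truth)
pred[2:8, 2:8] = True       # object 1 found, but oversized; object 2 missed

# Pixel-wise intersection-over-union
iou = np.logical_and(truth, pred).sum() / np.logical_or(truth, pred).sum()

# Object-wise: a ground-truth blob counts as detected if any pixel overlaps
labels, n_objects = ndimage.label(truth)
detected = sum(
    np.logical_and(labels == i, pred).any() for i in range(1, n_objects + 1)
)

print(f"pixel IoU     : {iou:.2f}")                 # penalises the oversized blob
print(f"objects found : {detected}/{n_objects}")    # ignores shape, counts hits
```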

A lot of dimensions

Many neural networks are designed to take grayscale or 3-channel RGB images as input. How do we deal with multispectral and hyperspectral images? How do we handle the complex-valued nature of SAR images? How do we manage multimodal databases?
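For the multispectral case, one common workaround (sketched here in PyTorch with torchvision, and certainly not the only option) is to swap the first convolution of an RGB-pretrained backbone for one that accepts N bands, initialising the new kernels from the mean of the pretrained RGB weights; the band count of 12 is arbitrary.

```python
# Sketch: adapt an RGB-pretrained ResNet-18 to a 12-band multispectral input.
# The new first-layer weights reuse the pretrained RGB kernels (their mean)
# so training does not start from scratch on the extra bands.
import torch
import torch.nn as nn
from torchvision import models

n_bands = 12
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

old_conv = model.conv1                        # Conv2d(3, 64, 7, stride=2, padding=3)
new_conv = nn.Conv2d(n_bands, old_conv.out_channels,
                     kernel_size=old_conv.kernel_size,
                     stride=old_conv.stride,
                     padding=old_conv.padding,
                     bias=False)

with torch.no_grad():
    mean_rgb = old_conv.weight.mean(dim=1, keepdim=True)   # (64, 1, 7, 7)
    new_conv.weight.copy_(mean_rgb.repeat(1, n_bands, 1, 1))

model.conv1 = new_conv
out = model(torch.randn(1, n_bands, 224, 224))  # forward pass now accepts 12 bands
```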

Not only images but other heterogeneous data

Images are often conditioned by meteorological and seasonal factors. How can this kind of non-image information be added to the learning process?
There are also a number of areas in which historical archives contain valuable spatial information. How can we make the most of them?
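For the first question, one simple pattern among many (sketched in PyTorch; every name and dimension below is invented for the illustration) is late fusion: encode the scalar metadata with a small MLP and concatenate it with the image features before the classification head.

```python
# Sketch: late fusion of image features with non-image metadata
# (e.g. month of acquisition, sun elevation). Dimensions are arbitrary.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, image_backbone, img_dim=512, meta_dim=4, n_classes=10):
        super().__init__()
        self.backbone = image_backbone                  # any module -> (B, img_dim)
        self.meta_net = nn.Sequential(                  # encode scalar metadata
            nn.Linear(meta_dim, 32), nn.ReLU(), nn.Linear(32, 32), nn.ReLU()
        )
        self.head = nn.Linear(img_dim + 32, n_classes)  # joint decision

    def forward(self, image, metadata):
        img_feat = self.backbone(image)
        meta_feat = self.meta_net(metadata)
        return self.head(torch.cat([img_feat, meta_feat], dim=1))

# Example usage with a dummy backbone producing 512-dim features
backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(512), nn.ReLU())
model = FusionClassifier(backbone)
logits = model(torch.randn(8, 3, 64, 64), torch.randn(8, 4))
```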

Not classical “visible” information but different underlying physics

As we all know, radars do not see the same things as human eyes. Even today’s large generative models still have a lot to learn from the information contained in these somewhat esoteric remote sensing images: if I ask DALL-E (via GPT) or Stable Diffusion to generate a radar image, at the moment they simply cannot do it.

The dynamic range is huge

And it sometimes matters more than it seems.
In 2016, Google Earth Engine made all Sentinel-1 images available on its platform. These images had undergone thermal noise removal, radiometric calibration and terrain correction, but also a rather unfortunate quantization step: we were told the data had been prepared with a “quantization after log scaling / clamping of the pixel values to the 1st and 99th percentile”. Cue an outcry from radar specialists like me, who know that the extreme values of radar data carry valuable statistical properties. A few years later, in 2019, the whole collection was reprocessed and encoded in float format, without log scaling!

“This reasoning may be flawed, and we are open to reprocessing the collection in a different way if there are obvious issues with it.”
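Here is a toy numpy illustration of why that clamping worried us: simulate single-look SAR amplitudes (Rayleigh-distributed speckle over a homogeneous area) and clamp them to the 1st and 99th percentiles. Only about 2% of the pixels are touched, but they are precisely the ones carrying the bright scatterers and the tail statistics.

```python
# Toy illustration: percentile clamping on simulated single-look SAR amplitudes.
# Single-look amplitude over a homogeneous area follows a Rayleigh distribution;
# the bright tail (strong point scatterers) is exactly what the clamp removes.
import numpy as np

rng = np.random.default_rng(0)
amplitude = rng.rayleigh(scale=1.0, size=1_000_000)

p1, p99 = np.percentile(amplitude, [1, 99])
clamped = np.clip(amplitude, p1, p99)

print(f"max before clamp : {amplitude.max():.2f}")
print(f"max after clamp  : {clamped.max():.2f}  (everything above {p99:.2f} is gone)")
altered = (amplitude > p99).mean() + (amplitude < p1).mean()
print(f"fraction of pixels altered: {altered:.2%}")
# ~2% of pixels sounds small, but those pixels carry the statistical tail
# (point targets, texture moments) that many radar methods rely on.
```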

Some data need to be protected

Very high-resolution and radar data are invaluable to intelligence services, and open-source solutions are not necessarily ideal for processing them. Much remains to be done here to make the most of the masses of data available…


Elise Colin

Researcher with broad experience in signal and image processing, focusing on big data and AI aspects of Earth observation and medical images.