Emissions from above

Air pollution is harmful to the environment and human health. Nearly 7 million people die prematurely each year due to air pollution, with 99% of the global population living in areas where air pollution exceeds safe levels.

Power plants, metal smelters and other industrial facilities emit large quantities of harmful pollutants, including nitrogen oxides (NOx). Many countries lack effective emission monitoring, which makes it difficult to enforce air pollution regulations or to estimate a facility's true health impact. Without emission data, large polluters easily get off the hook.

Exploiting new technologies

We want to combine satellite technology with machine learning to estimate NOx emissions from large polluters worldwide, wherever no other emission data exists or the existing data is unreliable. State-of-the-art air quality satellites measure NO2 at a granularity of a few kilometers, which is sufficient to detect large individual sources. Machine learning techniques have been successfully applied to estimating carbon dioxide and methane emissions at the power plant level (Jain, 2022; Joyce et al., 2023; Hobbs et al., 2023), and first attempts at estimating NOx emissions (Alnaim et al., 2022) are promising.

The Challenge

In this challenge, the aim is to estimate NOx emissions from power plants. After NOx is released into the atmosphere (i.e. emitted), it is dispersed by wind and may be removed from the air through various processes. All of these atmospheric processes, together with the different sources, determine the NOx concentration at a particular location and time. This is what the satellite can measure: the total amount of a pollutant in the atmosphere. Large sources create a strong plume that can be recognized in satellite images (see figure below). The task is to link the satellite-measured NO2 to the emitted NOx.

Figure 1 – Example of NO2 vertical column density measured by Sentinel-5P TROPOMI on 6 August 2023 over western Java, Indonesia.

Get started!

Apply machine learning techniques to determine NOx emissions for given power plants. We provide a dataset of satellite measurements together with weather data and ground truth emission data. Although the aim is to estimate NOx emissions for power plants that have no other emission data, power plants with ground truth data are used for training and validation. No prior knowledge of atmospheric science is needed: we have prepared the dataset by sampling good-quality data at the power plant locations and by selecting the relevant weather variables.

Frequently Asked Questions

As we are interested in emissions from power plants for which we have no ground truth data, you should use different power plants for training, validation and testing. Good performance metrics are correlation and mean absolute error (MAE) or root mean squared error (RMSE). You could also compute the evaluation metric on the time series of each individual power plant, to see whether the model performs exceptionally well or poorly for certain plants.
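
The split and the metrics described above can be sketched in plain Python. This is only an illustration under stated assumptions: the record keys "plant_id" and "nox_emission" are hypothetical placeholders (the actual variable names are given in the README accompanying the data), and the 25% test fraction is an arbitrary choice.

```python
import math
import random
from collections import defaultdict


def split_by_plant(records, test_fraction=0.25, seed=0):
    """Split records so that no power plant appears in both sets.

    Each record is a dict with a hypothetical "plant_id" key.
    """
    plants = sorted({r["plant_id"] for r in records})
    random.Random(seed).shuffle(plants)
    n_test = max(1, round(len(plants) * test_fraction))
    test_plants = set(plants[:n_test])
    train = [r for r in records if r["plant_id"] not in test_plants]
    test = [r for r in records if r["plant_id"] in test_plants]
    return train, test


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(y_true, y_pred):
    """Pearson correlation coefficient (NaN if either series is constant)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        return float("nan")
    return cov / math.sqrt(var_t * var_p)


def per_plant_rmse(records, predictions):
    """RMSE computed separately for each plant's time series.

    "nox_emission" is a hypothetical key holding the ground truth value.
    """
    truth = defaultdict(list)
    preds = defaultdict(list)
    for record, prediction in zip(records, predictions):
        truth[record["plant_id"]].append(record["nox_emission"])
        preds[record["plant_id"]].append(prediction)
    return {plant: rmse(truth[plant], preds[plant]) for plant in truth}
```

Splitting on the plant identifier rather than on individual rows keeps every observation of a test plant out of training, which mimics the real use case of plants with no emission data.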

A README document accompanies the data. It describes all variables, gives their physical units and explains the abbreviations.

The signals in the satellite data get stronger when averaging over longer time periods, and the effect of individual weather events becomes smaller. Try starting with yearly averages.
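
As a starting point, yearly averaging could look like the sketch below. It assumes each observation is a tuple of (plant identifier, date, value), which is a hypothetical layout and not the dataset's actual structure, and that gaps in the satellite data appear as None or NaN. The min_count parameter is an illustrative addition for discarding years with too few valid observations.

```python
from collections import defaultdict
from datetime import date


def yearly_means(observations, min_count=1):
    """Average values per (plant, year), ignoring missing measurements.

    observations: iterable of (plant_id, date, value) tuples, where value
    may be None or NaN when the satellite measurement is missing.
    Returns a dict mapping (plant_id, year) to the mean of the valid values.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for plant_id, day, value in observations:
        # value != value is True only for NaN
        if value is None or value != value:
            continue
        key = (plant_id, day.year)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums if counts[key] >= min_count}
```

Raising min_count is one way to avoid yearly means that rest on only a handful of cloud-free days.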

To simplify the data, start with using only the data point closest to the power plant. You can consider the spatial dimension later.
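
One way to pick the closest data point is to compare great-circle distances, as in this sketch. The pixel representation (dicts with "lat" and "lon" keys in degrees) is an assumption made for illustration; check the README for how coordinates are actually stored.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the valid range
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(a, 1.0)))


def nearest_pixel(plant_lat, plant_lon, pixels):
    """Return the pixel (dict with hypothetical "lat"/"lon" keys) closest to the plant."""
    return min(
        pixels,
        key=lambda px: haversine_km(plant_lat, plant_lon, px["lat"], px["lon"]),
    )
```

The same distance function can later be reused to select all pixels within a given radius when you start exploring the spatial dimension.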

Until now, NOx emissions have mainly been estimated from satellite data using physical approaches that rely on simplifications of the problem (for example Lorente et al., 2019 and Beirle et al., 2019). In favorable conditions these techniques offer reasonably good accuracy, but they often lack either spatial or temporal granularity, which limits their effectiveness in emission monitoring. Additionally, correctly attributing emissions to point sources that lie close to each other remains an open challenge.

Another conventional approach uses sophisticated and computationally expensive inversion modelling techniques (e.g. Miyazaki et al., 2017). These methods often require integration with chemical transport models and are better suited to creating comprehensive emission inventories than to pinpointing emissions from individual sources.

To our knowledge, only Alnaim et al. (2022), He et al. (2022), and Xing et al. (2022) have published machine-learning-based methods for estimating NO2/NOx emissions from satellite data. Among these, only the work by Alnaim et al. specifically addresses point source emissions.

Measuring trace amounts of air pollution from a satellite flying hundreds of kilometers away is very challenging, and in many conditions impossible with current technology. Most gaps in the data are caused by clouds blocking the signal, but some surface types and sunglint can also hinder measurements. In the given dataset, poor-quality satellite data has been filtered out, so large and persistent gaps sometimes occur.

NOx encompasses both NO2 (nitrogen dioxide) and NO (nitric oxide). The goal is to estimate the NOx emission, which is the total emission of all nitrogen oxides. However, only NO2 is available from satellite observations. The NOx-to-NO2 ratio varies with atmospheric conditions (e.g. temperature and the concentrations of other chemical species, which determine the NOx chemistry in the air).

Good metrics for evaluating the performance of your solution are correlation and RMSE. Furthermore, as we are interested in emissions from power plants for which we have no ground truth data, we suggest using different power plants for training, validation and testing, and also computing the evaluation metric on the time series of each individual power plant.
