Weather-correction of air pollution – Application to COVID-19

[Methodology note]

Air pollution levels are generally determined by two main factors: emissions and weather conditions. The latter are known to affect air quality in various ways: dispersion, precipitation or the facilitation of chemical reactions.

This impact of weather creates uncertainty when it comes to understanding the effect of policies on air quality or, for that matter, assessing the impact of COVID-related lockdowns. Are the unusually blue skies observed during the lockdown due to the reduction in traffic? Or could they be due to exceptional weather conditions?

To better understand what is due to weather as opposed to emissions, we, at CREA, have adopted weather-correction techniques to isolate the effects of weather and better assess the impact of other factors. The main principles of this methodology are explained below.

For more information, contact hubert@energyandcleanair.org

Weather-correction

In essence, the weather-correction technique consists of training a model that links air quality to weather data at a location of interest. It is then possible to identify and correct for the effects of weather variations. Other variables such as season and day of the week can also be included in the model to capture regular emission patterns.

Two approaches

We can use this type of model in two different ways, depending on the objective:

  • Identify long-term trends: by including a trend term (typically the date), the model can capture variations in air quality that cannot be explained by weather conditions alone. Because the model is regularised to avoid overfitting, this approach is well suited to long-term trends, but may fail to capture complex short-term variations.
  • Identify short-term anomalies: in this approach, we build a counterfactual, i.e. an estimate of the air pollution level one would expect under the observed weather conditions, based on past observations. The predicted levels are then compared with observed pollution levels: the anomaly is defined as the observed level minus the predicted level. A negative anomaly indicates air pollution levels lower than one would expect under the observed weather conditions. In some cases, we bring these anomalies to a more natural scale by adding the mean air pollution level during the model training period.[1]
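The anomaly arithmetic of the counterfactual approach can be sketched in a few lines. This is only an illustration: the function names (anomalies, rescale) and all numbers are made up for the example, and the single-number training mean is just one of the options mentioned in footnote [1].

```python
from statistics import mean


def anomalies(observed, counterfactual):
    """Observed minus counterfactual (predicted) level, per time step."""
    if len(observed) != len(counterfactual):
        raise ValueError("series must have the same length")
    return [o - c for o, c in zip(observed, counterfactual)]


def rescale(anomaly, training_levels):
    """Shift anomalies by the mean pollution level of the training period."""
    baseline = mean(training_levels)
    return [a + baseline for a in anomaly]


# Made-up pollutant levels, for illustration only.
observed = [30.0, 24.0, 18.0]
counterfactual = [31.0, 35.0, 38.0]
training_levels = [40.0, 44.0, 36.0]

anomaly = anomalies(observed, counterfactual)
print(anomaly)                             # [-1.0, -11.0, -20.0]
print(rescale(anomaly, training_levels))   # [39.0, 29.0, 20.0]
```

The negative values indicate levels below what the model would expect for the observed weather; the rescaled series expresses the same information on the scale of typical pollution levels.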

The ‘trend’ approach

Pros
  • No need for a separate training period: the whole period for which data is available can be used both for model training and for insights
  • Only significant trends are extracted, leading to less noisy information

Cons
  • Very sudden and repeated changes are difficult to capture (typically the sharp reduction and rebound as lockdown measures were implemented and then eased)

The ‘counterfactual’ approach

Pros
  • Can better capture short-term changes

Cons
  • One needs to separate the period of interest from the training data (i.e. the approximate period of interest must be set a priori)
  • Long-term trends are ignored, which may lead to bias in regions where air pollution has been steadily decreasing for reasons other than the studied phenomenon

Table 1. Pros and cons of weather-correction techniques

There are various ways to build these models. At CREA, we use tailored Gradient Boosting Machine models, building upon research from the University of York.[2]
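To give a feel for what a gradient boosting model does in this setting, here is a deliberately tiny sketch that uses only the Python standard library. It is not CREA's tailored model, nor the implementation from the cited research: the depth-1 trees, the synthetic data, the two weather features (wind speed and temperature) and every function name are assumptions made for illustration.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def fit_stump(X, residuals):
    """Best single split (feature, threshold, left mean, right mean) by squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = _mean(left), _mean(right)
            sse = (sum((r - ml) ** 2 for r in left)
                   + sum((r - mr) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:
        raise ValueError("features are constant; nothing to split on")
    return best[1:]


def fit_gbm(X, y, n_trees=40, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = _mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, ml, mr = fit_stump(X, residuals)
        stumps.append((j, t, ml, mr))
        pred = [p + learning_rate * (ml if row[j] <= t else mr)
                for row, p in zip(X, pred)]
    return base, learning_rate, stumps


def predict(model, X):
    """Predicted pollution level for each row of weather features."""
    base, lr, stumps = model
    return [base + lr * sum(ml if row[j] <= t else mr for j, t, ml, mr in stumps)
            for row in X]


# Synthetic training data (assumption): pollution falls as wind speed rises.
rng = random.Random(0)
X_train = [[rng.uniform(0, 10), rng.uniform(0, 30)] for _ in range(100)]
y_train = [60 - 4 * wind + 0.2 * temp + rng.gauss(0, 2) for wind, temp in X_train]

model = fit_gbm(X_train, y_train)

# Counterfactual: expected levels for a calm day and a windy day.
counterfactual = predict(model, [[1.0, 15.0], [9.0, 15.0]])
print(counterfactual)
```

In practice one would rely on an established library and add the seasonal and day-of-week variables described above; the sketch only shows the principle of fitting successive small trees to the remaining residuals.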

Application: assessing COVID-19 impact

To estimate the impact of COVID-19 on air pollution, we use the so-called ‘counterfactual’ approach described above. A model linking weather conditions to air pollution levels is trained on three years of data, from 2016-12-01 to 2019-11-30. A counterfactual is then produced by feeding the model the weather conditions observed during the period of interest.
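Separating the training period from the period of interest can be sketched as follows. The record layout (day, weather, pollution level), the function name split_by_period and the sample values are hypothetical, and treating both training dates as inclusive is an assumption.

```python
from datetime import date

TRAIN_START = date(2016, 12, 1)
TRAIN_END = date(2019, 11, 30)


def split_by_period(records, train_start=TRAIN_START, train_end=TRAIN_END):
    """Split (day, weather, level) records into training data and later data."""
    training = [r for r in records if train_start <= r[0] <= train_end]
    later = [r for r in records if r[0] > train_end]
    return training, later


# Made-up records, for illustration only.
records = [
    (date(2016, 11, 30), {"wind_speed": 3.2}, 48.0),  # before training period
    (date(2016, 12, 1), {"wind_speed": 2.1}, 52.0),
    (date(2019, 11, 30), {"wind_speed": 4.0}, 41.0),
    (date(2020, 2, 1), {"wind_speed": 3.5}, 22.0),    # period of interest
]

training, later = split_by_period(records)
print(len(training), len(later))  # 2 1
```

The model is fitted on the training records only; the weather in the later records is then used to produce the counterfactual.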

The chart below shows both the counterfactual and the actually observed NO2 levels in 2020 in Wuhan, Hubei, China.

Figure 1. Observed and counterfactual NO2 levels in Wuhan in 2020

As we can see, there is very good alignment of counterfactual and observed data before any lockdown measure was implemented, raising our confidence in the model’s ability to ‘predict’ air pollution.

After lockdown, we can see the two curves diverging from each other. The counterfactual goes up, indicating that we would have expected an increase in NO2 levels in these weather conditions. By contrast, the observed levels kept decreasing.

Two observations:

  • without weather correction, we would have been unable to safely attribute the observed decrease to COVID-related measures;
  • had we simply considered the decrease in observed levels, we would have (in this particular instance) underestimated the effect of COVID-related measures.

In the chart below, we look at the anomaly, i.e. the difference between the observed levels and the counterfactual. A negative value indicates that NO2 levels are below what would have been expected in Wuhan in these weather conditions.

Figure 2. NO2 level anomaly in Wuhan in 2020

In the absence of accurate and timely emission data, this anomaly is the best estimate we have of the impact of COVID on air pollution. It can be expressed as an absolute reduction or as a percentage of average levels in that city.
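Converting an absolute anomaly into a percentage is a one-line calculation, sketched below. The function name anomaly_percent and the numbers are illustrative, and which period the city's average level is computed over is left as an assumption.

```python
def anomaly_percent(anomaly, average_level):
    """Express an absolute anomaly as a percentage of the average level."""
    if average_level <= 0:
        raise ValueError("average level must be positive")
    return 100.0 * anomaly / average_level


# Made-up example: levels 12 units below expectation, city average of 40 units.
print(anomaly_percent(-12.0, 40.0))  # -30.0
```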

Discussion

The cautious use of satellite imagery

Satellite images have been widely used to illustrate the impact of COVID on air quality, and have portrayed dramatic improvements in regions such as the North China Plain and Italy’s highly polluted Po Valley.

Sensors have recently made tremendous progress in spectral and spatial resolution, notably with the commissioning of the TROPOMI sensor on the Sentinel-5P satellite. Nevertheless, one should remain cautious when using satellite imagery for such analyses, for reasons such as:

  • clouds and aerosols can alter or simply prevent the estimation of air pollutant density;
  • the influence of weather is not accounted for.

Seasonality

In many cities, air pollutant levels follow a seasonal pattern. In Wuhan for instance, we observe:

  • a U-shape for NO2 levels, i.e. higher in winter, lower in summer;
  • an inverted U-shape for O3 levels, i.e. lower in winter, higher in summer.

This illustrates the risk of unduly attributing to COVID certain trends that are actually due to other phenomena. The weather and season correction techniques described in this document are precisely attempts to account for these underlying patterns.
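A small numerical sketch shows why seasonality matters. It compares a naive month-to-month change with a change measured against the same calendar month in earlier years. All values are made up, the function name monthly_means is hypothetical, and this simple comparison corrects for season only, not for weather.

```python
from collections import defaultdict


def monthly_means(records):
    """Average level per calendar month from (year, month, level) records."""
    buckets = defaultdict(list)
    for _year, month, level in records:
        buckets[month].append(level)
    return {month: sum(levels) / len(levels) for month, levels in buckets.items()}


# Made-up NO2-like levels: high in January, lower in April.
history = [(2018, 1, 60.0), (2019, 1, 56.0), (2018, 4, 40.0), (2019, 4, 44.0)]
climatology = monthly_means(history)

january_2020, april_2020 = 57.0, 41.0
naive_change = april_2020 - january_2020        # -16.0
seasonal_change = april_2020 - climatology[4]   # -1.0
print(naive_change, seasonal_change)
```

In this made-up case, most of the apparent drop of 16 units between January and April is the usual seasonal decline; only about 1 unit departs from what is typical for April.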

Scientists from the Copernicus programme itself have recently warned against potentially flawed interpretations of satellite imagery.[3]

Figure 3. Observed NO2 and O3 levels in Wuhan in 2015-2020

[1] Depending on the trend terms included in the model, the mean can be either a single number or, for instance, the mean for the corresponding time of year.

[2] Grange, S. K., & Carslaw, D. C. (2019). Using meteorological normalisation to detect interventions in air quality time series. Science of the Total Environment, 653, 578–588. https://doi.org/10.1016/j.scitotenv.2018.10.344

[3] Flawed estimates of the effects of lockdown measures on air quality derived from satellite observations | Copernicus. Retrieved July 31, 2020, from https://atmosphere.copernicus.eu/flawed-estimates-effects-lockdown-measures-air-quality-derived-satellite-observations