Methodology note
Air pollution levels are generally influenced by two main factors: emissions and weather conditions. Weather conditions affect air quality through dispersion, precipitation, and facilitating chemical reactions.
This impact of weather creates uncertainty when it comes to understanding the effect of policies on air quality. For instance, during COVID-related lockdowns, the observed clearer skies could be attributed to reduced traffic or exceptional weather conditions, or to a mix of both.
To differentiate the impact of weather from emissions, CREA employs weather-correction techniques to isolate weather effects and accurately assess the impact of other factors. The core principles of this methodology are outlined below.
For more information, contact hubert@energyandcleanair.org
Weather-correction
The weather-correction technique starts with the training of a model that links air quality data with weather data for a specific location.
Two approaches
We can then use this type of model in two different ways, depending on the objective:
- Identify long-term trends: by including a trend term (e.g., the date) on top of weather parameters, the model captures variations in air quality that can’t be explained by weather conditions alone. Because of regularisation and the associated need to avoid overfitting, this approach is well suited for long-term trends, but may fail to capture complex short-term variations.
- Identify short-term anomalies: in this approach, we build a counterfactual, i.e. an estimate of the air pollution level one would expect under the observed weather conditions, based on past observations. The predicted levels are then compared with observed pollution levels: the anomaly is defined as the observed level minus the predicted level. A negative anomaly indicates air pollution levels lower than what one would expect under the observed weather conditions. In some cases, we bring these anomalies to a more natural scale by adding the mean air pollution level over the model training period to the anomaly.[1]
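The counterfactual approach can be sketched in a few lines. This is only an illustration under stated assumptions, not CREA's implementation: a one-variable least-squares line stands in for the actual model, and every name (fit_line, train_wind, obs_no2, and so on) and every data value is hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    covariance = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    variance = sum((xi - x_bar) ** 2 for xi in x)
    slope = covariance / variance
    return y_bar - slope * x_bar, slope


# Hypothetical training period: wind speed (m/s) and NO2 (µg/m³).
train_wind = [1.0, 2.0, 3.0, 4.0, 5.0]
train_no2 = [50.0, 45.0, 40.0, 35.0, 30.0]
intercept, slope = fit_line(train_wind, train_no2)

# Hypothetical period of interest, kept out of the training data.
obs_wind = [2.0, 3.0, 4.0]
obs_no2 = [30.0, 28.0, 20.0]

# Counterfactual: NO2 expected under the observed weather.
counterfactual = [intercept + slope * w for w in obs_wind]

# Anomaly: observed minus counterfactual (negative = cleaner than expected).
anomaly = [o - c for o, c in zip(obs_no2, counterfactual)]

# Optional rescaling: add the training-period mean to the anomaly.
training_mean = mean(train_no2)
rescaled = [a + training_mean for a in anomaly]

print(anomaly)   # [-15.0, -12.0, -15.0]
print(rescaled)  # [25.0, 28.0, 25.0]
```

In this toy example, the rescaled values read as pollution levels rather than differences, which is what makes them easier to interpret.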
The ‘Trend’ Approach
Pros:
- No need for a separate training period; the entire data period can be used for both model training and insights.
- Only significant trends are extracted, resulting in less noisy information.
Cons:
- Sudden and repeated changes, such as those following lockdown measures, are difficult to capture.
The ‘Counterfactual’ Approach
Pros:
- Better captures short-term changes.
Cons:
- Requires separating the period of interest from the training data, necessitating an a priori definition of the period of interest.
- Ignores long-term trends, which may lead to bias in regions with steadily decreasing air pollution due to factors other than the studied phenomenon.
There are various ways to build these models. At CREA, we use tailored Gradient Boosting Machine models, building upon research from the University of York.[2]
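The ‘trend’ approach can be illustrated with a much simpler stand-in than a Gradient Boosting Machine. The sketch below fits a plain linear model with one weather variable and a date term, without any regularisation. It is an assumption-laden illustration, not CREA's model: the function names (solve, fit_trend) and the synthetic data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_trend(wind, day, pollution):
    """Least-squares fit of pollution = b0 + b_wind * wind + b_day * day.

    Returns [b0, b_wind, b_day]; b_day is the change per day that the
    weather variable does not explain.
    """
    rows = [[1.0, w, d] for w, d in zip(wind, day)]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, pollution)) for i in range(k)]
    return solve(xtx, xty)  # normal equations


# Synthetic, noise-free data: pollution falls by 0.5 per day on top of
# a wind effect of -2 per m/s.
day = [float(d) for d in range(10)]
wind = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
pollution = [60.0 - 2.0 * w - 0.5 * d for w, d in zip(wind, day)]

b0, b_wind, b_day = fit_trend(wind, day, pollution)
print(round(b_day, 6))  # weather-corrected trend per day
```

Because the synthetic data contain no noise, the fit recovers the trend of -0.5 per day that was built in. With real data, the date term would instead capture whatever slow change remains once the weather variables have been accounted for.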
Counterfactual application: assessing COVID-19 impact on air quality
To estimate the impact of COVID-19 on air pollution, we use the so-called ‘counterfactual’ approach described above. A model linking weather conditions to air pollution levels is trained on three years of data, from 2016-12-01 to 2019-11-30. A counterfactual is then produced by feeding the model with the weather conditions observed during the period of interest.
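Separating the training data from the period of interest could look like the sketch below. The training dates are those stated above. The record layout, the function name split_by_period, and the start of the period of interest (2020-01-01) are assumptions made for illustration.

```python
from datetime import date

# Training window as stated in the text.
TRAIN_START = date(2016, 12, 1)
TRAIN_END = date(2019, 11, 30)


def split_by_period(records, period_start):
    """Split (day, weather, no2) records into training and evaluation sets.

    Records dated between the training window and period_start are
    left out of both sets.
    """
    training = [r for r in records if TRAIN_START <= r[0] <= TRAIN_END]
    evaluation = [r for r in records if r[0] >= period_start]
    return training, evaluation


# Hypothetical records: (day, wind speed in m/s, NO2 in µg/m³).
records = [
    (date(2016, 11, 30), 2.0, 48.0),
    (date(2016, 12, 1), 3.0, 44.0),
    (date(2019, 11, 30), 1.5, 52.0),
    (date(2019, 12, 15), 2.5, 47.0),
    (date(2020, 2, 1), 2.0, 25.0),
]

training, evaluation = split_by_period(records, period_start=date(2020, 1, 1))
print(len(training), len(evaluation))  # 2 1
```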
The chart below shows both the counterfactual and the actually observed NO2 levels in 2020 in Wuhan, Hubei, China.
Figure 1. Observed and counterfactual NO2 levels in Wuhan in 2020
As we can see, there is very good alignment of counterfactual and observed data before any lockdown measure was implemented, raising our confidence in the model’s ability to ‘predict’ air pollution.
After lockdown, we can see the two curves diverging. The counterfactual goes up, indicating that we would have expected NO2 levels to increase under these weather conditions. In contrast, the observed levels kept decreasing.
Two observations:
- without weather correction, we would have been unable to confidently attribute the observed decrease to COVID-related measures;
- had we simply considered the decrease in observed levels, we would have (in this particular instance) underestimated the effect of COVID-related measures.
In the chart below, we look at the anomaly, i.e. the observed levels minus the counterfactual. A negative value indicates that NO2 levels are below what would have been expected in Wuhan under these weather conditions.
Figure 2. NO2 level anomaly in Wuhan in 2020
In the absence of accurate and timely emission data, this anomaly is the best estimate we have of the impact of COVID on air pollution. It can be expressed as an absolute reduction or as a percentage of the average levels in that city.
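Converting an absolute anomaly into a percentage is a one-line calculation. In the sketch below, the function name relative_anomaly and the numbers are hypothetical. Which average to use as the reference (for example the training-period mean) is an assumption the analyst has to state.

```python
def relative_anomaly(anomaly, average_level):
    """Express an absolute anomaly as a percentage of an average level."""
    if average_level <= 0:
        raise ValueError("average_level must be positive")
    return 100.0 * anomaly / average_level


# Hypothetical values: anomaly of -12 µg/m³ against an average of 40 µg/m³.
print(relative_anomaly(-12.0, 40.0))  # -30.0
```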
Discussion
The cautious use of satellite imagery
Satellite images have been widely used to illustrate the impact of COVID on air quality, and have portrayed dramatic improvements in air quality in regions such as the North China Plain and Italy’s highly polluted Po Valley.
Sensors have recently made tremendous progress in spectral and spatial resolution, notably with the commissioning of the TROPOMI sensor on the Sentinel-5P satellite. Nevertheless, one should remain cautious when using satellite imagery for such analyses, for reasons such as:
- clouds and aerosols can alter or simply prevent the estimation of air pollutant density;
- the influence of weather is not accounted for.
Seasonality
In many cities, air pollutant levels follow a seasonal pattern. In Wuhan for instance, we observe:
- a U-shape for NO2 levels i.e. higher in winter, lower in summer;
- an inverted U-shape for O3 levels i.e. lower in winter, higher in summer.
This illustrates the risk of unduly attributing to COVID certain trends that are actually due to other phenomena. The weather and season correction techniques described in this document are precisely attempts to account for these underlying patterns.
Scientists from the Copernicus programme itself have recently warned against potentially flawed interpretations of satellite imagery.[3]
Figure 3. Observed NO2 and O3 levels in Wuhan in 2015-2020
Notes
[1] Depending on the trend terms included in the model, the mean can be either a single number or a time-varying value, for example the mean for the corresponding time of year.
[2] Grange, S. K., & Carslaw, D. C. (2019). Using meteorological normalisation to detect interventions in air quality time series. Science of the Total Environment, 653, 578–588. https://doi.org/10.1016/j.scitotenv.2018.10.344
[3] Flawed estimates of the effects of lockdown measures on air quality derived from satellite observations | Copernicus. Retrieved July 31, 2020, from https://atmosphere.copernicus.eu/flawed-estimates-effects-lockdown-measures-air-quality-derived-satellite-observations