OPEN ACCESS


Hybridization of Air Quality Forecasting Models Using Machine Learning and Clustering: An Original Approach to Detect Pollutant Peaks

Category: Air Pollution Modeling

Volume: 16 | Issue: 2 | Pages: 405-416
DOI: 10.4209/aaqr.2015.03.0193

Wani Tamas1, Gilles Notton1, Christophe Paoli1,2, Marie-Laure Nivet1, Cyril Voyant1,3

  • 1 University of Corsica - Pasquale Paoli, UMR CNRS 6134 SPE, 20250 Corte, France
  • 2 Galatasaray University, Department of Computer Engineering, TR-34357 Istanbul, Turkey
  • 3 CHD Castelluccio, radiophysics unit, BP85 20177 Ajaccio, France

Highlights

High accuracy forecasting of air pollution peaks with machine learning methods.
3 methods: simple MLP, hybridized MLP with hierarchical and k-means clustering.
Robustness verified by multi-location and multi-pollutant (PM10, O3, NO2) study.
ROC curves used to produce a complete sensitivity analysis.
Combining clustering and MLP improves forecasting results.
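
As an illustration of how ROC curves can characterize peak detection, the following is a minimal sketch, not the authors' implementation. It assumes a peak is any observed concentration at or above a fixed threshold, and that an alert is raised whenever the forecast reaches a given alert level. The function names, the threshold values and the sample data are all hypothetical.

```python
def roc_points(observed, predicted, peak_threshold, alert_levels):
    """Return one (false_positive_rate, true_positive_rate) pair per alert level.

    Assumption: a "peak" is an observed value >= peak_threshold, and an
    alert is raised when the predicted value >= the alert level.
    """
    is_peak = [o >= peak_threshold for o in observed]
    n_pos = sum(is_peak)
    n_neg = len(is_peak) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one peak and one non-peak observation")

    points = []
    for level in alert_levels:
        alerts = [p >= level for p in predicted]
        true_pos = sum(a and k for a, k in zip(alerts, is_peak))
        false_pos = sum(a and not k for a, k in zip(alerts, is_peak))
        points.append((false_pos / n_neg, true_pos / n_pos))
    return points


def auc(points):
    """Trapezoidal area under the ROC points, anchored at (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Made-up hourly concentrations (µg/m3), for illustration only.
observed = [20, 35, 80, 55, 90, 30, 60, 25]
predicted = [25, 30, 70, 65, 85, 45, 40, 20]

points = roc_points(observed, predicted,
                    peak_threshold=50,
                    alert_levels=[20, 40, 50, 60, 80])
area = auc(points)
```

Sweeping the alert level traces the trade-off between missed peaks and false alarms, which is the kind of behaviour a ROC-based sensitivity analysis makes visible.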


Abstract

This paper presents an original approach combining Artificial Neural Networks (ANNs) and clustering in order to detect pollutant peaks. We developed air quality forecasting models using machine learning methods applied to hourly concentrations of ozone (O3), nitrogen dioxide (NO2) and particulate matter (PM10), forecast 24 hours ahead. A MultiLayer Perceptron (MLP) was used alone, then hybridized successively with hierarchical clustering and with a combination of self-organizing map and k-means clustering. The clustering methods were used to subdivide the dataset, and an MLP was then trained on each subset. Two urban sites on the island of Corsica, in the western Mediterranean Sea, were investigated. These models showed good overall accuracy (Index of Agreement reaching 0.87 for O3, 0.80 for NO2 and 0.74 for PM10). Because it is particularly important that forecasting models used on an operational basis correctly predict pollution peaks, a sensitivity analysis was performed using Receiver Operating Characteristic (ROC) curves. This made it possible to evaluate the behaviour and robustness of the models in high-concentration situations. The results show that for PM10 and O3, hybrid models combining clustering and MLP outperform the classical MLP most of the time for high-concentration prediction. An operational tool has been built with the models presented in this paper and is used for air quality forecasting in Corsica.
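
For readers unfamiliar with the Index of Agreement quoted above, the sketch below computes it under the assumption that the commonly used Willmott definition applies, d = 1 - Σ(P - O)² / Σ(|P - Ō| + |O - Ō|)², where O are observations, P are predictions and Ō is the observed mean. The abstract does not state which variant the authors used, and the function name and sample values are hypothetical.

```python
def index_of_agreement(observed, predicted):
    """Willmott's index of agreement (assumed definition, see lead-in).

    Returns 1.0 for a perfect forecast and 0.0 when the forecast is no
    better than always predicting the observed mean.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_obs = sum(observed) / len(observed)
    squared_error = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    potential_error = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                          for o, p in zip(observed, predicted))
    if potential_error == 0:
        raise ValueError("index is undefined when all values equal the mean")
    return 1.0 - squared_error / potential_error


# Made-up values, for illustration only.
perfect = index_of_agreement([1, 2, 3], [1, 2, 3])
mean_only = index_of_agreement([1, 2, 3], [2, 2, 2])
```

Because the index summarizes agreement over the whole series, a high value does not by itself guarantee that rare high-concentration episodes are well predicted, which is why a separate peak-focused analysis is useful.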

Keywords

Air quality forecasting; ROC curve; Multilayer perceptron; Clustering

