Hybridization of Air Quality Forecasting Models Using Machine Learning and Clustering: An Original Approach to Detect Pollutant Peaks

Category: Air Pollution Modeling

Volume: 16 | Issue: 2 | Pages: 405-416
DOI: 10.4209/aaqr.2015.03.0193

Wani Tamas1, Gilles Notton1, Christophe Paoli1,2, Marie-Laure Nivet1, Cyril Voyant1,3

  • 1 University of Corsica - Pasquale Paoli, UMR CNRS 6134 SPE, 20250 Corte, France
  • 2 Galatasaray University, Department of Computer Engineering, TR-34357 Istanbul, Turkey
  • 3 CHD Castelluccio, radiophysics unit, BP85 20177 Ajaccio, France


High accuracy forecasting of air pollution peaks with machine learning methods.
Three methods compared: a simple MLP, and MLPs hybridized with hierarchical clustering or with k-means clustering.
Robustness verified by multi-location and multi-pollutant (PM10, O3, NO2) study.
ROC curves used to produce a complete sensitivity analysis.
Combining clustering with MLP improves forecasting results.


This paper presents an original approach that combines Artificial Neural Networks (ANNs) and clustering to detect pollutant peaks. We developed air quality forecasting models using machine learning methods to predict hourly concentrations of ozone (O3), nitrogen dioxide (NO2) and particulate matter (PM10) 24 hours ahead. A MultiLayer Perceptron (MLP) was first used alone, then hybridized successively with hierarchical clustering and with a combination of a self-organizing map and k-means clustering. In the hybrid models, clustering was used to subdivide the dataset, and an MLP was then trained on each subset. Two urban sites on the island of Corsica, in the western Mediterranean Sea, were investigated. The models showed good overall precision (Index of Agreement reaching 0.87 for O3, 0.80 for NO2 and 0.74 for PM10). Because it is particularly important that forecasting models used on an operational basis correctly predict pollution peaks, a sensitivity analysis was performed using Receiver Operating Characteristic (ROC) curves. This made it possible to evaluate the behaviour and robustness of the models in high-concentration situations. The results show that, for PM10 and O3, hybrid models combining clustering and MLP outperform the classical MLP most of the time for high-concentration prediction. An operational tool has been built with the models presented in this paper and is used for air quality forecasting in Corsica.
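The two evaluation tools named in the abstract can be illustrated with a minimal sketch. It assumes the Index of Agreement is Willmott's original formulation (the abstract does not say which variant is used) and that a ROC point is obtained by comparing forecasts against a decision threshold while a fixed peak level defines observed peaks. The function names, the concentration values and the peak level of 90 are hypothetical and are not taken from the paper.

```python
def index_of_agreement(observed, predicted):
    """Willmott's index of agreement d: 1 means a perfect match.

    Assumption: the original (1981) form,
    d = 1 - sum((P - O)^2) / sum((|P - mean(O)| + |O - mean(O)|)^2).
    """
    mean_obs = sum(observed) / len(observed)
    num = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(observed, predicted))
    # A zero denominator means every value equals the observed mean.
    return 1.0 - num / den if den else 1.0


def roc_points(observed, predicted, peak_level, decision_thresholds):
    """Return (threshold, false positive rate, true positive rate) tuples.

    An observation is a peak when it reaches peak_level; an alarm is
    raised when the forecast reaches the decision threshold.
    """
    points = []
    for t in decision_thresholds:
        tp = fp = fn = tn = 0
        for o, p in zip(observed, predicted):
            is_peak = o >= peak_level
            alarm = p >= t
            if is_peak and alarm:
                tp += 1
            elif is_peak:
                fn += 1
            elif alarm:
                fp += 1
            else:
                tn += 1
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        points.append((t, fpr, tpr))
    return points


# Hypothetical hourly concentrations, for illustration only.
observed = [30, 45, 80, 120, 60, 150, 40, 95]
predicted = [35, 50, 70, 100, 65, 130, 55, 110]

print(index_of_agreement(observed, predicted))
for threshold, fpr, tpr in roc_points(observed, predicted, 90, [60, 90, 120]):
    print(threshold, fpr, tpr)
```

Sweeping the decision threshold traces the ROC curve: a lower threshold catches more peaks at the cost of more false alarms, which is the trade-off a sensitivity analysis of this kind examines.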


Air quality forecasting; ROC curve; Multilayer perceptron; Clustering
