Particulate Matter Estimation from Photochemistry : A Modelling Approach Using Neural Networks and Synoptic Clustering

We report on the development and validation of a neural network (NN) model of PM10 concentrations in terms of photochemical measurements of NO, NO2 and O3 and temporal parameters that include the day of the week and the day of the year with its sinusoidal variation. A long-term record (≈10 yr) from 2001–2012 (inclusive) assembled from measurements taken at 10 station nodes in the air quality monitoring network of the Greater Athens Area in Greece has been used. Eight synoptic categorizations of the circulation at 850 hPa were used to partition the data record, and to train individual NNs with Bayesian regularization using 90% of available data for different atmospheric conditions. The time series of PM10 estimates was then reconstructed from the partitioned output. As a control, a NN without synoptic clustering was trained on the same data. The remaining 10% of the data was used for testing the simulation performance. NN models with synoptic clustering achieved an average root mean square error (RMSE) ≈ 16 μg m across the station nodes with an average index of agreement (IA) of 0.71 (somewhat better than the control network whose performance statistics were RMSE ≈ 17 μg m and IA = 0.61, respectively). For routine measurements below the EU Air Quality Directive limit value of 50 μg m, the average error is as low as RMSE ≈ 11 μg m across the station nodes. NN models were found to strongly outperform analogous MLR models over all station nodes.


INTRODUCTION
Epidemiological studies have established a clear association between human exposure to ambient air pollution and the risk of increased mortality (Dockery and Pope, 1994;Boezen et al., 1999;Samet et al., 2000;Davidson et al., 2005;Kassomenos et al., 2008;Pope III et al., 2009;Rückerl et al., 2011;Raaschou-Nielsen et al., 2013).In 2012 alone, ambient air pollution was responsible for 3.7 million deaths; 6.7% of total deaths that year (Brauer et al., 2012).One of the most important components of air pollution is particulate matter (PM) having aerodynamic diameters up to 10 µm (PM 10 ) as this embraces both anthropogenic and natural pollutants (IPCC, 2013).While the highest concentrations of PM tend to be measured next to busy roads in megacities, the observation of relatively high values of PM 10 values at urban background stations (Vardoulakis and Kassomenos, 2008), suggests that large proportions of the population are being exposed to this type of ambient air pollution (Dimitriou and Kassomenos, 2014).In addition to the important impact of PM on human health, high PM concentrations are known to be harmful to ecosystems (Grantz et al., 2003) and strongly affect the balance between air quality and climate change (Unger et al., 2010;Tai et al., 2012;Barnes et al., 2013;Hedegaard et al., 2013;Mues et al., 2013).The European Commission has issued legislation (Directive 2008/50/EC) involving limit values to help control the level of PM.Two limit values have been set for PM 10 : i) the daily mean concentration of 50 µg m -3 should not to be exceeded more than 35 times per year and ii) the annual mean concentration should not exceed 40 µg m -3 , and a "health-based" target value has been set for PM 2.5 with an annual limit of 25 µg m -3 .It should be noted that there is currently no regulation for PM 1 despite calls by the scientific community (Gerasopoulos et al., 2007).As we write, 195 countries are attending the 2015 UN Climate Change Conference (UNCCC) in the 11th session of the meeting of the parties to the 1997 Kyoto Protocol and 166 countries have submitted national commitments to reduce their CO 2 emissions.Importantly, PM is absent from the agenda.This is despite assessments of the International Panel on Climate Change (IPCC, 2013) who have called for expansion of global air quality monitoring of the spatiotemporal distribution of PM.
PM levels show a great deal of spatial variability at the regional level (Hoek et al., 1997;Kassomenos et al., 2014) and also between and within large urban areas (Eeftens et al., 2012;Van Poppel et al., 2013;Kassomenos et al., 2014).However, the sparsity of ground-based monitoring stations has cast doubt on the generality of results obtained from point sampling and has motivated a shift in the last decade towards satellite mapping of surface PM concentrations estimated from aerosol optical depth (AOD) retrieved by space sensors (Hoff and Christopher, 2009), from LIDAR vertical profile information (Boyouk et al., 2010;Zeeshan and Oanh, 2014), or from simulations performed by global atmospheric chemistry models (Liu et al., 2004;van Donkelaar et al., 2006van Donkelaar et al., , 2010)).The problem is that satellite estimates of surface air pollution need to be validated in pixels where coincident ground measurements exist.Indeed, the Surface PARTiculate mAtter Network (SPARTAN) described by Snider et al. (2015) has been initiated for precisely this purpose.In the context of satellite-derived PM models, it is important to note that the prediction of PM from AOD is accurate to only ≈ 30% (Hoff and Christopher, 2009) and that the precision of the measurement of AOD itself is estimated to be only ≈ 20% accurate.Furthermore, a substantial range of skill has been observed in linear regression models of estimated PM (obtained from AOD retrievals) on measured PM values, with correlation coefficients (R) varying greatly (0.4 ≤ R ≤ 0.8) both regionally and also with aerosol type (Engel-Cox et al., 2004).As a result, such high levels of uncertainty have so far hampered the incorporation of such models in operational satellite retrieval of PM 10 and/or PM 2.5 concentrations.Given that the spatial resolution of ground-based PM measurements is low and inhomogeneous, and that this limits the capacity for validating satellite-derived estimates, there is a need to supplement the SPARTAN network of ground-based measurements with independent estimates of PM.In this paper, we therefore develop and validate a data-driven approach for estimating PM 10 directly from photochemical measurements and meteorological parameters since these are both more spatially representative and are also routinely recorded by station nodes in existing air quality monitoring networks .
In the context of meteorology, evidence is also growing with respect to the impact of atmospheric circulation and synoptic conditions on PM; in particular the role of longrange transport on elevated PM 10 levels.In a study of PM in European capitals, Kukkonen et al. (2005) concluded that the vast majority of cases where PM 10 values exceeded the EC Directive were related to the prevalence of specific meteorological conditions including high pressure systems and temperature inversions.A more important example is the appearance of extreme smog episodes in the Greater Athens Area (GAA) in Greece which has been found to be associated with stagnating and re-circulating air masses (Vardoulakis and Kassomenos, 2008).In a landmark study involving a statistical determination of the variables that Granger causes the variability in PM 10 in the GAA, Sfetsos and Vlachogiannis (2010) established causal relationships between the prevailing weather conditions and the observation of elevated values of PM 10 .This conclusion is also confirmed by the study of Yuval et al. (2012) who found that general levels of air pollutants and their spatial distribution are determined by the state of the atmosphere with most of the variability being associated with the atmospheric synoptic scale.These studies suggest that synoptic conditions should be a key element in the development of models of PM.For a survey of the role of meteorology on different PM size fractions (PM 10 , PM 2.5 and PM 2.5-10 ) in the GAA used as the study region in this paper, we refer the reader to Pateraki et al. (2012).The GAA is an important location for air pollution studies since it is situated at an important global air pollution cross-road, and has a well established air quality monitoring network contributing a long-term (decadal) data record of measurements of meteorological parameters and atmospheric chemistry.While not meeting the population criteria of a megacity per se, pollutant concentrations in Athens have been found to rival cities having tens of millions of inhabitants (Kanakidou et al., 2011) and have emerged as a result of the impact of the 2008 financial crisis on household fuel bills and wood burning during the winter months (Vrekoussis et al., 2013).
In constructing a model of PM in terms of chemical, meteorological and/or temporal parameters, it is important to understand how PM co-varies (i.e., correlates) with each of them.In relation to chemistry, a 2003 study in the GAA (Chaloulakou et al., 2003a) found that annual Pearson product-moment correlations (R) between PM 10 concentrations and routinely measured chemical variables were strong in the case of CO (0.71 ≤ R ≤ 0.72), NO and NO x (0.64 ≤ R ≤ 0.69), moderate for SO 2 and NO 2 (0.49 ≤ R ≤ 0.6), weak to moderate for surface temperature (0.39 ≤ R ≤ 0.46), and moderately-negative (i.e., inversely correlated) for local wind speed (-0.43 ≤ R ≤ -0.54).The measurement of the variation of PM 10 in 31 Chinese provincial capital cities using data from 286 monitoring sites (Xie et al., 2015) found that the pairwise correlation between PM 10 and O 3 was weak but that the correlation with CO was unstable (highly variable).These studies suggest that PM 10 is dependent upon photochemical "markers" of ambient air pollution and fairly robust to standard meteorological parameters.In a very comprehensive study, Kukkonen et al. (2003) used 4 temporal variables (the hour, the day of the week and the sine and cosine of the day of the year) and a set of 34 meteorological variables (see Table 1 of Kukkonen et al., 2003 for details) to construct a neural network (NN) 1-step ahead forecast model of PM 10 in terms of time lagged values of PM 10 and CO measured in downtown Helsinki.However, the performance skill of the NN was found to be too low to suggest its adoption for predicting spatial concentration distributions in the urban areas.A similar finding was obtained in the GAA where the impact of meteorological variables (ambient temperature, wind speed and direction, and relative humidity) and the day of the week as a temporal variable on the next-day prediction  et al., 2003b).There is an indication that synergistic approaches like the one adopted by Michaelides et al. (2011) whereby a NN was first deployed to classify synoptic patterns and this is then integrated with satellite AOD retrievals and time-lagged PM 10 measurements in order to make one-stepahead predictions of PM 10 , have the potential to better capture the complex dynamics involved.So, while there are indications that NN models of PM 10 have the potential to provide independent estimates of PM, a high skill recipe involving chemical inputs, temporal and meteorological variables has yet to be found.With this in mind, in this paper we exploit the long record of coincident observations of a broad range of air pollutants available at nodes in the GAA air quality monitoring network combined with the power of permutation analysis, synoptic clustering and NN models, to develop and test a general spatiotemporal model of PM 10 that can potentially fill-in data gaps and support the development of more spatially representative prognostic models.To help maximize the likelihood of the uptake of the models we develop and their general utility, we will focus on photochemical measurements that are routinely made, and we use publicly-available maps of atmospheric circulation.The rest of this paper is arranged as follows.The next section presents the data sources used in this work as well as a sensitivity analysis of the chemical constituents with multiple linear regression (MLR) used to determine the optimal combination of photochemical measurements in the NN models developed.This is followed by a description of the methodology adopted to construct and train NN solvers for PM 10 from input variables that include photochemical measurements of NO, NO 2 and O 3 , associated temporal parameters, and synoptic conditions together with training statistics.In this section, we also provide references on the Bayesian regularization procedure for nterested readers.In the results section, daily PM 10 simulations are assessed using performance statistics for NNs trained with and without synoptic clustering and are compared against the results of the MLR model.The discussion section brings out our main findings and reports on the regimes of validity and general accuracy of the NN models.Finally, we conclude with a review of the potential of this approach for extending studies and data records of PM 10 .

METHODS
In this work, data is drawn from 2 sources: in situ chemical measurements, and a daily synoptic categorization at the 850 hPa isobaric level.The dataset of coincident values was constructed from measurements taken at 20 station nodes in the GAA.In terms of geographical coverage, the GAA contains the capital of Greece, Athens, and spans a basin on the west coast of the Attica Peninsula in a small area of just 450 km 2 containing over 3.8 million inhabitants (35% of the total population of Greece).The GAA is subject to significant local sources of aerosol such as traffic, small to medium-scale industry, domestic combustion of fossil fuel and biomass, but, as we mentioned in the introduction, also long-range transport of atmospheric PM and ozone precursors.Surrounded by mountains on three sides, the topography of the area is unfavorable for the dispersion of air pollutants, and ventilation of the basin takes place only under northeasterly flow (Grivas et al., 2008).

In Situ Chemical Data
A long (≈10 yr) data record (2001-2012 inclusive) of concentration measurements of 10 different chemical species (CO, NO, NO 2 , NO x , O 3 , SO 2 , PM 2.5 , PM 10 , C 6 H 6 and 'smoke') has been assembled from station nodes in the air quality monitoring network spanning the GAA.The characteristics of the sites is summarized in Table 1 and their geographical distribution is shown in Fig. 1 together with a "microarray" depicting data availability.
The spatial distribution of station nodes is not homogeneous and there is a much higher density of sites occupying the south-eastern part of the study region centred on the PAT site.Measurements of CO, NO, NO 2 , NO x , O 3 , SO 2 and C 6 H 6 are provided at the hourly timescale and measurements of PM 2.5 , PM 10 and smoke are provided daily.All measurements are in units of µg m -3 apart from CO which is in mg m -3 .There is also strong inhomogeneity in the array of chemical data as presented in the microarray of chemical measurements in Fig. 1.A first observation is that while chemical data for CO, NO, NO 2 , O 3 and SO 2 are the most numerous (being in excess of 10,000 measurements at any given station node), the spatial representivity of CO is low with measurements only available at 8 of the 20 station nodes.The record of C 6 H 6 measurements is volumous but only exists at the PAT station.Smoke measurements are only coincident with PM 10 at the ARI site.The lack of spatial representivity and utility of C 6 H 6 and smoke led us to exclude these chemical species from our study.A word of caution here is in order here.While there likely to be overlap between some of the chemical constituents present in smoke (e.g., NO x ) and the concentration of individual species measured, in this study, we do not attempt to decouple these effects due to the large uncertainty involved in not having spatially-representative data across the network of stations studied.Instead, we wish to emphasize the lowest common denominators -the routine measurements of individual chemical constituents measured directly, in the hope of widening the applicability and generality of the model for estimation of PM in monitoring studies.The record of PM 10 and PM 2.5 measurements is smaller than the photochemical data (numbering up to several 1000 records) since measurements are made daily, and are correspondingly less numerous than hourly data by a factor of 24.Only 4 station nodes (AGP, GOU, LYK and PIR) providing PM 2.5 data are coincident and overlap with the 12 station nodes providing PM 10 data.Since our aim here is to construct and validate a multivariate model of the spatiotemporal variation of PM 10 concentrations across the study region using a NN, synchronous data is required (i.e., it is necessary to supply simultaneous values of dependent and independent variables).This places a constraint on the number of data records and consequently the level of chemical detail.It is therefore necessary to quantify how the number of sites providing synchronous values varies with the number of different chemical species included.In order to do this, daily values and daily average values were selected.To be more specific, for the chemical species CO, NO, NO 2 , O 3 , PM 2.5 , PM 10 and SO 2 , we extracted synchronous data for each of the 52 different chemical combinations that included PM 10 .We then constructed a MLR model having the general form: PM 10 = f(CO, NO, NO 2 , O 3 , PM 2.5 , SO 2 ) for each combination.To quantify the appropriateness of using each chemical combination for constructing a multivariate NN model of PM 10 the value of the coefficient of determination (R 2 ) calculated from the goodness of fit of the MLR model as compared with the actual PM 10 measurements, was used as a statistic.The rationale behind such an sensitivity analysis is to quantify and make objective the choice of chemical input data to be used for the construction of an NN model, i.e., the choice of explanatory variables.
Table 2 shows the results of this analysis for combinations of chemical species that provide synchronous data for at least 3 station nodes (so as to guarantee a minimum level of spatial representivity).Note that the number of input chemical species used in this pre-processing data compression step varies from 1 to 6 (the latter corresponds to the case of an input vector containing all of CO, NO, NO 2 , O 3 , PM 2.5 and SO 2 , and which gave a goodness of fit value of R 2 = 0.432 at the PIR station).
Table 2 shows that there is a trade-off between potential model complexity (in terms of the number of different chemical species included) and the spatial extent of Table 2. Sensitivity analysis on the chemical constituents.The value of the coefficient of determination (R 2 ) at each station node for input chemical species combinations corresponding to at least 3 station nodes (N).The value of the median value of R 2 calculated across contributing station nodes in each combination is provided and used to rank the input parameter combinations across all contributing stations.

Synoptic Categories for Clustering
The classification of the synoptic scale atmospheric circulation we adopt derives from the manual scheme proposed by Kassomenos et al. (1998a).The scheme has already been employed with success in connection to air quality (Kassomenos et al., 1998b), daily mortality (Kassomenos et al., 2001) and urban heat islands over the GAA (Michalakakou et al., 2002).More recently, Zagouras et al. (2012) automated the scheme by applying a new method based on graph theory.In particular, eight synoptic categories were identified based on the geopotential distribution at 850 hPa isobaric level and the flow direction.
The 850 hPa isobaric level, representative of the low level troposphere, is preferable to the use of the surface geopotential, as it avoids topographical effects.The 8 categories are statistically-distinct and were found to be representive of the whole range of synoptic scale patterns over the Mediterranean region (Kassomenos et al., 1998a), and are shown in Fig. 2.
In this study, the daily synoptic classification was applied to the GAA for the period 2001-2012 (inclusive) by employing the charts at 12:00 UTC derived from the European Meteorological Bulletin (EMB).The time series of daily synoptic conditions comprises a total of 4164 categorizations.The categories, the number of days where the atmospheric circulation corresponds to each category, and their descriptor are as follows: • Synoptic category 1 (465 days): South-Westerly Flow.A trough is observed south-west of the GAA, resulting in south-westerly flow, being accompanied by advection of warm and moist air masses from Africa (Fig. 2(a)).• Synoptic category 2 (401 days): North-Westerly Flow.
When the trough has passed, a strong north-westerly flow is established over GAA.This category is characterized by strong cold air advection (Fig. 2(b)).• Synoptic category 3 (396 days): Long-Wave Trough.
Greece is dominated by a quasi stationary long-wave trough with its axis being positioned over GAA (Fig. 2(c)).This category is related to rainfall.• Synoptic category 6 (1207 days): Open Anti-Cyclone.A large-scale ridge dominates over the Greek area, usually for several days (Fig. 2(f)).with weak variable winds, favouring the development of local flows.• Synoptic category 7 (76 days): Closed Anti-Cyclone.This category is characterized by the presence of a closed anticyclone that extends over the major Greek area (Fig. 2(g)).
This category appears with similar characteristics to the previous one, but causes weaker winds or calm conditions.• Synoptic category 8 (919 days): High-Low.A ridge is combined with a trough over the central-eastern Mediterranean basin, resulting in rather complicated regimes over GAA.In the warm period, this category is mainly characterized by a strengthening of the pressure gradient and strong north-easterlies (the well known "Etesians") that blow over the Aegean Sea and into the GAA (Fig. 2(h)).
Cluster indices (1-8) corresponding to the synoptic category over the GAA on each day were then used to partition the coincident chemical data for PM 10 , NO, NO 2 and O 3 from the 10 stations: AGP, ALI, ELE, KOR, LYK, MAR, OIN, PAN, PIR, THR into 8 subsets (one for each different synoptic category), under the assumption that the entire study region was governed by the same atmospheric circulation on any individual day.

Temporal Inputs
In their study to predict urban PM 10 (and also NO 2 ) concentrations in central Helsinki using NN models, Kukkonen et al. (2003) found that it was necessary to incorporate the temporal/periodic variables into their NN models.The basis for including such periodic components in air quality forecasting is well established (Kolehmainen et al., 2001) and we implicitly included the day of the week (DOW), the day of the year (DOY) and Sin(DOY) and Cos(DOY) into our model design.As mentioned in the introduction, the models we develop aim to address the lack of success in modeling PM 10 in previous studies by: i) increasing the complexity of the range of chemical inputs used in the NN model and ii) by training different NNs for different and specific atmospheric circulation conditions.The subject of spatiotemporal prediction and forecasting of PM 10 using lagged photochemical inputs will be the focus of a follow-up paper.

Statistics and Treatment of Errors
In Appendix A we present a brief description of the difference statistics used to assess the performance of the models developed in this work.While, measurement errors exist in the determination of the chemical concentration of each species and also in the assignment of synoptic categories to atmospheric circulation, it is assumed here that the observations are "error-free" so that the statistical framework described above is valid and can be applied.In this work then, the goodness of fit of NN model simulations to measurements of PM 10 will be assessed with reference to the above statistics.We wish to note also that the above statistics will be calculated for measurements of PM 10 above and also below the daily mean limit value of PM 10 = 50 µg m -3 set by the EU Air Quality Directive.

Data pre-Processing
The sensitivity analysis applied to the chemical in situ data is an important step in model design to enhance the information content of the data.The dimensionality reduction it provides, like alternative approaches (e.g., principal components analysis or factor analysis) helps reduce parameter redundancy in the model.Another key step in preparing our data for use in model construction was normalization to remove potentially undesirable variances that arise from parameters having very different min-max ranges.In the neural network (NN) models we develop (described in the next section), all input and output matrices were preprocessed by mapping each parameter's mean to 0 and their standard deviations to 1 (i.e., to z-scores).In addition, a random number generator was used to ensure that identical initial weights were used in each run so that NN models with the same initial conditions could be compared.In particular, the twister algorithm (Matsumoto and Nishimura, 1998) based on Marsenne prime (2 19937 −1) was called with a constant integer seed value to return a single uniformly distributed pseudo-random number in the interval (0, 1).

THE NEURAL NETWORK MODEL
In this section, we describe the construction, training and validation of a NN model for retrieval of PM 10 estimates (outputs) from time series measurements of the daily photochemistry (NO, NO 2 and O 3 ) together with the periodic temporal variables: DOW, DOY, Sin(DOY) and Cos(DOY) as the input variables.For multivariate and temporallystatic input-output data, a feed-forward NN having at least one layer of "hidden" neurons whose activation functions are general nonlinear sigmoidal functions (e.g., the tanh hyperbolic tangent function) has been proven to be able to operate as a universal function approximator (Cybenko, 1989;Hornik, Stinchombe and White, 1989).This means that, given enough hidden neurons and training data, such networks are capable of learning the exact mathematical relation between inputs and outputs.Fig. 3 presents the topology of a generic 3-layer multiple input, single output function approximating NN.
The exact mathematical equation relating the output to the inputs for this type of NN is given by the matrix equation (by analogy with the method described in Taylor et al. 2014): The multiplication of the matrix IW 1,1 and the vector X is a dot product equivalent to the summation of all input connections to each neuron in the hidden layer.Eq. ( 1) is the continuous (nonlinear) functional approximation that relates the output vector to the matrix of input vectors.
For the NN models we construct in this work, Y = [PM 10 ] T and X = [NO, NO 2 , O 3 , DOW, DOY, Sin(DOY), Cos(DOY)] T .The performance of NN models depends on their architecture (Bishop, 1995) and it is recommended that a sensitivity analysis is performed on the network parameters (Taylor et al., 2014).To identify candidate optimal NN model structures, we initially trained a grid of NNs having a range of architectures: a) with 1 or 2 hidden layers of neurons, b) with between 10 and 40 neurons with tanh activation functions and c) with the proportion of training data varying between 70% and 90%.The NN models were coded using MATLAB's object-oriented scripting language in conjunction with its Neural Network Toolbox (Beale et al., 2015).The size of the available data record is 17,760 coincident input-output vectors.90% of the dataset (16,021 records) was used for NN training and the remaining 10% of the dataset (1,776 records) was used for simulation (see the Results section).The same training data was used in training each NN in the grid of different architectures and the number of learning iterations was set to 100.
The training algorithm used for the NN models developed in this study is Bayesian regularization using a Laplace prior (Williams, 1995) based on the Bayesian framework described by Mackay (1992).Bayesian regularization converts a nonlinear regression into a "well-posed" statistical problem by minimizing a linear combination of squared errors and weights and modifies the linear combination so that at the end of training the resulting network has good generalization qualities (Beale et al., 2015).Bayesian regularization in MATLAB automates the determination of the optimal regularization parameters in its NN training function "trainbr" and takes place within the Levenberg-Marquardt algorithm where back-propagation is used to calculate the Jacobian of performance with respect to the weight and bias variables (Beale et al., 2015).In line with the guidance of MATLAB's Neural Network Toolbox User's Guide (Beale et al., 2015), we scaled all network inputs and targets to the range [-1,1] during training and then inverted outputs back to their true scales for application of the difference statistics described in Appendix A. For mathematical detail of the statistical regularization procedure implemented by MATLAB's "trainbr" NN training function we refer the reader to Williams (1995) and Mackay (1992).Bayesian regularization is well suited to complex/nonlinear problems involving a sizeable number of hidden neurons (e.g., > 10) and deep learning (2+ layers of hidden neurons) where, by restricting the magnitudes of weights, this method avoids overfitting.By optimizing the model parameters, it also has better generalization properties that other NN learning algorithms that may overfit or get trapped in local minima.Indeed, this was the main driving force behind our choice of Bayesian regularization -i.e., it was found to perform much better than three other standard back-propagation training algorithms (the Levenberg-Marquardt algorithm "trainlm", the resilient back-propagation algorithm "trainrp" and the Scaled Conjugate Gradient algorithm "trainscg" training functions in MATLAB).In contrast to the high target versus output correction values obtained with Bayesian regularizarion training (R = 0.97 for the optimal NN in Table 3), the other algorithms achieved much lower values in the range 0.52 ≤ R ≤ 0.73 for the optimal NN architecture (2 layers of 30 hidden neurons).We will see in the results section that NN models trained with Bayesian regularization strongly outperform analogous MLR models (see Table 11) and we recommend that this scheme is adopted for problems of similar complexity.
For each NN, the back-propagation optimization algorithm (Rumelhart et al., 1986) then proceeded to minimize the mean squared error (MSE) calculated from the difference between NN-derived PM 10 outputs (y i ) and target PM 10 measurements (t i ) in the validation dataset (100% -training %): Table 3 shows the results of the sensitivity analysis we performed on the parameters of the generic NN.For this grid of NNs and input-output data, the optimal architecture corresponds to a NN containing 2 hidden layers of 30 neurons when 90% of the data is used for training and 10% is used for validation.This reflects the fact that the relation between PM 10 and the photochemistry is highly complex and nonlinear.The relative success of 3-layer NNs (i.e., containing 2 layers of hidden neurons) over 2-layer NNs supports the notion that deeper learning is required for this particular problem.This is also confirmed to some extent by the relatively large number of 30 neurons needed despite there being only a single output parameter and 7 input parameters.Having established an appropriate NN architecture, we then proceeded to train NNs for each synoptic cluster as described in the next section.

NN Training
Since daily atmospheric circulation over the study region was classified into 8 categories, it was necessary to train a total of 8 NNs.For each NN, the training dataset comprises complete records of daily-averaged values of NO, NO 2 and O 3 together with the associated day of the week (DOW; an integer index running from 1 to 7 starting on Sunday), the day of the year (DOY; an index running from 1 to 365 or 366 in the case of a leap year) and the sine and cosine of the DOY as inputs, together with the daily values of PM 10 as the output assembled from coincident measurements at the 10 station nodes: AGP, ALI, ELE, PAN, THR, KOR, LYK, MAR, OIN and PIR in the air quality monitoring network during the period 2001-2012 (inclusive).As such, each of the NN models has 7 input variables and 1 output variable.The indices of each synoptic category were then used to partition the data into 8 subsets.Each subset was further partitioned into data used for NN training and ("unseen") data used for NN validation (the results of which are presented in the next section).For each synoptic category, a search was performed in each of the 12 years of the study period (2001-2012 inclusive) and 90% of the indices were stored for training while the remaining 10% of the indices were stored for validation.In total, 8 sets of training and validation indices were produced.For each category, the indices were then used to extract coincident chemical measurements together with their associated temporal variables.Fig. 4 shows the progression of training for the 'zonal flow' (Cluster 5) NN towards convergence at the horizontal asymptote for the "best" validation MSE after 100 epochs of back-propagation learning using Bayesian regularization.At this point where the slope of the validation MSE approaches zero, the effective number of parameters used in the regularization has converged.Fig. 4 shows that a high level of correlation has been achieved between target measured values of PM 10 and the NN outputs with a Pearson product-moment correlation coefficient, R = 0.944.Robust regression was performed using the method of Theil (1950) and Sen (1968).The training results and associated statistics for each of the 8 synoptically-clustered NNs (labelled "NN1" to "NN8") are presented in Table 4.The variation of linear and nonlinear average error and goodness of fit statistics are also presented in Fig. 4. Table 4. Statistics associated with training each of the 8 synoptically-clustered NNs (NN1 to NN8).In the table, N is the number of daily values assembled from coincident data drawn from the 10 station nodes and N(> 50) is the number of values which exceed the limit set by the EU Air Quality Directive (2008/EC/50).The proportion of the data which are in excess of this limit is also given as a percentage.R is the Pearson Product-Moment Correlation coefficient from the regression of NN outputs on measured values.All other statistical quantities are described in the last section.6.9 5.7 6.9 4.7 0.2 4.0 6.2 6.1 1.9 0.12 0.12 0.06 Mean + 2SD 4938 1.00 53.3 33.5 53.2 24.6 0.2 18.4 27.3 22.3 13.5 0.87 1.00 1.00 NN5, corresponding to the case of 'zonal flow' (Cluster 5), presents the lowest linear and nonlinear measure of average error (MAE, RMSE, RMSE s and RMSE u ) and concurrently the highest values of the linear and nonlinear goodness of fit statistics (R, d 1 , d 2 , R 2 ).In order to assess whether or not partitioning of the data into synoptic categories and training a NN for each category offers an advantage over not using synoptic clustering (i.e., that taking meteorology into account provides a computational benefit), we also trained an additional NN without clustering using exactly the same input-output vectors (i.e., using 90% of the data from each synoptic category and then accumulation of the input-output vectors over all clusters).In order to be able to directly compare this "unclustered" NN which we label "NNU" with the 8 synoptically-clustered NNs, we recombined the simulated outputs from NN1 to NN8 into a single matrix of input-output vectors which we label "NNC".The statistics associated with NNC reflect the mean training performance over all 8 synoptically-clustered NNs.Table 5 shows the statistics associated with these two models.
The statistics for the NNs presented in Table 5 are based on the largest available sample of coincident data (N = 16021 daily input-output vectors).The training performance is reasonably good in both cases (e.g., the linear and nonlinear average error measures MAE, RMSE, RMSE s and RMSE u are in the range 10.06 to 18.04 µg m -3 , and the linear and nonlinear goodness of fit statistics R, d 1 , d 2 and R 2 are in the range 0.56 to 0.88).Some slight improvement is apparent in the case of NNC suggesting that the use of synoptic clustering appears to offer some benefit.The simulation performance of the NNs is tested by feeding the trained NNs with unseen data (i.e. the 10% of the datasets not used during network learning).In the next section we evaluate the performance of the NN model by comparing simulated PM 10 outputs against daily measurements from station nodes in the air quality monitoring network.

RESULTS
As a result of applying the methodology described in the previous section, a total of 9 NNs were trained (one for each of the 8 synoptic categories plus one without clustering).In each case, 90% of the dataset was used for training.In order to assess the simulation performance of the NNs, the remaining 10% of the dataset containing daily photochemical measurements: NO, NO 2 , O 3 and associated temporal variables (DOW, DOY and the sine and cosine of DOY) coincident with PM 10 measurements and synoptic categories at the station nodes: AGP, ALI, ELE, PAN, THR, KOR, LYK, MAR, OIN and PIR, was used.As for the NN training, cluster indices for synoptic categories were used to divide the data into 8 subsets (one for each synoptic category).The input vectors in each of the subsets were then fed to each of the 8 synoptically-categorized NNs to simulate daily PM 10 values.The time series at each station node was then reconstructed from the simulated outputs for each synoptic category with reference to the associated cluster indices.Since coincident input-output vectors are used at every stage, simulation performance was assessed using the statistical difference measures described earlier.
For the case of the NN trained without synoptic clustering (NNU), PM 10 simulations and coincident measurements at each station node are directly compared without the need to perform time series reconstruction using cluster indices.In addition to the calculation of average error and goodness of fit statistics for time series at each station node, we have also flagged PM 10 measurements that are in excess of the EU Air Quality Directive limit value of 50 µg m -3 so as to assess the performance of the NNs on routine values that are not outliers (in the sense that they exceed a limit value).In Table 6 we present statistics for simulated PM 10 values obtained with the NN trained without synoptic clustering (NNU) for the data points at each station node.In Table 7 we present analogous statistics for data points that are below the limit value.
In Table 8 we present performance statistics associated with the reconstructed time series of PM 10 simulations obtained from the output of the 8 synoptically-clustered trained NNs at each station node (NNC).Table 9 presents analogous statistics for data points that are below the limit value.
In order to assess whether or not partitioning of the data into synoptic categories and training a NN for each category offers an advantage, we present the simulation results from the NN trained without clustering for data accumulated across all station nodes ("NNU") with that resulting from simulation with the 8 synoptically-categorized NNs: NN1 to NN8 ("NNC").Table 10 shows the statistics associated with these two simulations.
While comparable in performance, both NN models strongly improve on the results of the MLR model of PM 10 where the median R 2 value across the 10 station nodes was found to be only 0.25 (see the entry for NO, NO 2 , O 3 in Table 2).Table 11 shows that this improvement is systematic across all station nodes.

DISCUSSION
Comparison of NN estimates against measured values of PM 10 across 10 station nodes in the air quality monitoring network of the GAA using photochemical inputs and associated temporal variables over the 2001-2012 (inclusive) study period, suggests that PM 10 can be retrieved at the daily timescale with some confidence.This is true both for the case of an NN trained on the 90% of the dataset and also for time series of PM 10 reconstructed from 8 NNs trained on synoptically-clustered data -with some small improvement with regards to the overall RMSE with the latter.Fig. 5 shows the size of linear and nonlinear average error statistics across the station nodes obtained from NNs trained with and without synoptic clustering.
The largest average error corresponds to the NN trained without synoptic clustering whereby at the KOR station node the RMSE = 25.68 µg m -3 (associated with 134 pairs of NN estimates and measured daily values).For the 125 data points of this sample where PM 10 < 50 µg m -3 (i.e., below the EU Air Quality Directive limit value) the corresponding RMSE falls to 9.82 µg m -3 .For the same input data fed to synoptically-clustered NNs, RMSE = 23.66 µg m -3 for the 134 NN estimates and RMSE = 9.83 µg m -3 for the 125 data points of this sample where PM 10 < 50 µg m -3 .A similar trend is observed at all station nodes and suggests that the NN model is capturing the bulk of routine values very well (to within ≈ 11 µg m -3 ).Outlier values (i.e., measurements which are in excess of the limit value) are less well modeled by the NNs.This effect translates into a modulation of the goodness of fit at station nodes such as KOR and OIN where measurements are strongly in excess of the limit value of 50 µg m -3 as shown in the lower panels of Fig. 5.
To place this effect in context, at the KOR station node, while only 9 out of 134 data points (i.e., 6.7%) are in excess of the EU Air Quality Directive limit value, they have a strong impact on the goodness of fit measures when comparing the whole simulation sample ("All") with the statistics for routine values where PM 10 < 50 µg m -3 .At the LYK station node, where some 40.2% of the simulation data exceed the limit value, the performance of the NNs trained both with and without synoptic clustering is robust for the subset of routine values (below the limit value) as shown in Fig. 6.This gives additional credence to the notion that the NNs perform reliably well for routine values.
To sum up our findings in this section, NN models with synoptic clustering achieved an average root mean square error (RMSE) ≈ 16 µg m -3 across the station nodes with an average index of agreement (IA) of 0.71 (somewhat better than the control network whose performance statistics were RMSE ≈ 17 µg m -3 and IA = 0.61 respectively).For routine measurements below the EU Air Quality Directive limit value of 50 µg m -3 , the average error is as low as RMSE ≈ 11 µg m -3 across the station nodes.

CONCLUSIONS
The NN modeling approach developed and tested here for estimating daily PM 10 concentrations from photochemical measurements can be readily applied to model other PM concentrations such as PM 2.5 and/or PM 1 , and can help increase the spatiotemporal coverage and representivity of existing air quality monitoring networks enabling them to contribute more data to urban, suburban or interurban assessments of PM.The estimated daily values of PM 10 have average errors small enough (≈ 11 µg m -3 ) to make such models appear to be operationally feasible.Both NN models with and without synoptic clustering, were found to greatly outperform their linear MLR model counterparts across all sites of the air quality monitoring network.MLR was found to be extremely helpful in helping determine the most relevant chemical species to be included in the model design and, despite the complexity of the NN architecture needed to achieve deep learning (two layers of 30 hidden neurons), Bayesian regularization was found to be instrumental in helping stabilize network learning, avoid overfitting and to maximize the generalization potential of the trained NN models.
A major advantage of the methodology described in this paper is that photochemical data (NO, NO 2 and O 3 ), unlike PM measurement, is already a feature of most monitoring networks.Furthermore, daily analysis of atmospheric circulation and synoptic pattern categorization is also something readily available and easy to obtain from daily meteorological bulletins.It is envisaged that NN models like the ones developed here can be used: i) to 'fill-in' gaps in PM 10 time series were only coincident NO, NO 2 , and O 3 measurements are available, and ii) to help extend existing time series of PM 10 (and potentially PM 2.5 and PM 1 from construction of analogous models) for the development of PM nowcast (1-day ahead forecast) models, and the production of (interpolated) air quality maps of study regions.This spatiotemporal approach therefore has the potential to increase the effectiveness and reach of existing air quality monitoring networks that can support the legislation of environmental policy for engendering behavioural change to reduce air pollution levels.

Fig. 1 .
Fig. 1.Upper Panels: The Attica study region in Greece and the location of the 20 air quality monitoring station nodes.Lower Panel: The number (N) of chemical concentration measurements at each station for the period 2001-2012 (inclusive).

•
Synoptic category 4 (465 days): Closed Low.This category is characterized by the presence of a closed low, being accompanied by intense winds, usually from the northern sector, and rainfall (Fig. 2(d)).• Synoptic category 5 (235 days): Zonal Flow.The circulation is almost zonal over the GAA, resulting in a prevailing westerly flow (Fig. 2(e)) with considerably lower intensity in the warm period of the year.

Fig. 2 .
Fig. 2. Charts of the atmospheric circulation at the 850 hPa isobaric level (as derived from the European Meteorological Bulletin) for each of the 8 statistically-distinct synoptic categories over the Greater Athens Area (GAA): a) South-Westerly Flow, b) North-Westerly Flow, c) Long-Wave Trough, d) Closed Low, e) Zonal Flow, f) Open Anti-Cyclone, g) Closed Anti-Cyclone, h) High-Low, as per the taxonomy provided by Zagouras et al. (2012).

Fig. 3 .
Fig. 3. Schematic showing the neural connectivity between input parameters and an output parameter in a multiple input, single output feed-forward NN having 2 layers of hidden neurons and a linear output layer.

Fig. 4 .
Fig. 4. Upper Left: Training of the 'zonal flow' (Cluster 5) NN showing how the MSE varies with back-propagation iteration (epoch).This NN has 2 layers of 30 hidden neurons with tanh activation functions and is trained with Bayesian regularization with a goal for the back-propagation cost function set to 1/100th of the total variance of the targets.Upper Right: A high level of correlation has been achieved between target measured values of PM 10 and the NN simulation outputs (Pearson product-moment correlation coefficient, R = 0.944).Lower Left: Linear and nonlinear average error statistics for the 8 NNs corresponding to the partitioning of the data by synoptic category.Lower Right: Linear and nonlinear goodness of fit statistics for each trained NN.

Fig. 5 .
Fig. 5. Upper Panels: Linear and nonlinear average error statistics associated with PM 10 estimates compared with measured values for the NN trained without synoptic clustering values.The average error statistics for measured values of PM 10 < 50 µg m -3 is shown at right.Middle Panels: Error statistics for the output produced by training a separate NN for each data partition corresponding to a different synoptic cluster.The average error statistics for measured values of PM 10 < 50 µg m -3 is shown at right.Lower Panels: Linear and nonlinear goodness of fit statistics for the output produced from the 8 synoptic cluster NNs.The goodness of fit statistics for measured values of PM 10 < 50 µg m -3 is shown at right.

Fig. 6 .
Fig. 6.Upper: NN estimates of PM 10 (cyan) at the LYK station node obtained by the NN trained without synoptic clustering and corresponding measurements (dark grey) connected by black lines when PM 10 < 50 µg m -3 and by red lines when PM 10 ≥ 50 µg m -3 .The right hand panel shows regression of NN estimates on measured values for the case of PM 10 ≥ 50 µg m -3 (red) and PM 10 < 50 µg m -3 (black).Lower: As for the upper panels but with NN estimates of PM 10 obtained by NNs trained with synoptic clustering.

Table 1 .
Description of the 20 station nodes of the air quality monitoring network in the Attica region contributing chemical measurement data during the ≈10 yr study period: 2001-2012 (inclusive) used in this work.
(in terms of the number of contributing station nodes).Within these constraints, the optimal multivariate model arising from this data preprocessing step is that corresponding to the highest value of N, and concurrently the highest value of median R 2 , and involves the chemical species quadruplet: PM 10 , NO, NO 2 , O 3 .This optimal data combination involves synchronous data from the 10 station nodes: AGP, ALI, ELE, PAN, THR, KOR, LYK, MAR, OIN and PIR.Note that, while the actual performance of the multiple linear model in terms of the magnitude of R 2 is not important in the context of the aim of this work -which is to develop a much more powerful and accurate model, it is a necessary pre-processing step for data compression in terms of allowed chemical complexity in the model being constructed.Since the need to maximize the spatial representivity of our model was a constraint in our study, rather than perform a correlation analysis at each site and report the magnitudes and signs of the impact of each explanatory variable on the modeled PM, we have opted instead to show the overall effect of the choice of explanatory variables on modeled PM 10 at each station by ranking goodness of fit statistics (the coefficient of determination, R 2 ).

Table 3 .
Validation performance of a grid of NN architectures with: a) 1 or 2 hidden layers of neurons with hyperbolic tangent (tanh) activation functions, b) between 10 and 40 neurons in each hidden layer, and c) the proportion of training data varying between 70% and 90%.The optimal NN architecture is indicated by **.

Table 5 .
Statistics associated with training a NN without synoptic clustering ("NNU") and by reconstructing the data from aggregation of the simulations from the 8 synoptically-clustered NNs ("NNC").

Table 6 .
Statistics associated with PM 10 simulations obtained from NNU at each station node compared with measured values.

Table 7 .
Statistics associated with PM 10 simulations obtained from NNU compared with measurements which are below the EU Air Quality Directive limit value of 50 µgm -3 .

Table 8 .
Statistics associated with the reconstructed time series of PM 10 simulations obtained from the output of the 8 synoptically-clustered trained NNs at each station node (NNC) compared with measured values.

Table 9 .
Statistics associated with the reconstructed time series of PM 10 simulations obtained from the output of the 8 synoptically-clustered trained NNs at each station node (NNC) compared with measurements which are below the EU Air Quality Directive limit value of 50 µgm -3 .

Table 10 .
Statistics associated with simulation across the 10 station nodes with the NN trained without synoptic clustering ("NNU") together with statistics associated with reconstructing and aggregating outputs across the 10 station nodes using the 8 synoptically-clustered NNs ("NNC").

Table 11 .
Comparison of the coefficient of determination (R 2 ) for the multiple lnear regression (MLR) model of PM 10 using the photochemical input triplet (NO,NO 2 ,O 3 ) and the neural network models without synoptic clustering (NNU) and with synoptic clustering (NNC) for each of the 10 stations providing coincident measurement data.For the MLR model the average value of R 2 across the network of stations is reported as the median of the values while for the NNU and NNC, the mean value is reported.