Forecasting of Hourly PM2.5 in South-West Zone in Santiago de Chile

We present the results of a neural network model designed for the forecasting of hourly PM2.5 concentrations in Santiago, Chile. The study focuses on the observed values at two of the monitoring stations, which are located in the south-west zone of the city and are among the stations that register the highest concentrations during the period between April and August. This is the season when air quality is very often in ranges that are harmful to the population and some restrictions on emissions become useful. The forecasting model is a multilayer neural network. The input variables are observed values of hourly PM10 and PM2.5 concentrations measured at the station of interest and at a neighboring station at 7 PM of the present day and some observed and forecasted meteorological variables. NO2 concentrations during the morning and afternoon hours, which may be associated with secondary particle formation, are also used as input. The implemented models are trained with 2014 and 2015 data and tested with 2016 values. Information is collected until 7 PM of the present day, and the largest forecasting error up to 21 hours in advance is 32%. The accuracy of this forecasting is better than that obtained with a neural model previously used for the forecasting of hourly PM2.5 concentrations in the north-west zone in Santiago. Our neural models show better results than those obtained with linear models with the same input variables. The developed models provide a tool for anticipating episodes in Santiago and other cities with similarly unfavorable conditions for pollutant dispersion.


INTRODUCTION
Atmospheric fine particulate matter (PM2.5) has received more attention in Chile since 2012, when a standard for this pollutant was established. Five levels were defined:
• Level 1 (good): 24-h average of PM2.5 is less than 50 µg m⁻³
• Level 2 (fair): 24-h average is between 50 µg m⁻³ and 80 µg m⁻³
• Level 3 (bad): 24-h average is between 80 µg m⁻³ and 110 µg m⁻³
• Level 4 (critical): 24-h average is between 110 µg m⁻³ and 170 µg m⁻³
• Level 5 (emergency): 24-h average is greater than 170 µg m⁻³
Regulations establish that when 24-hour moving average concentrations are in Level 3 or higher, some restrictions on emissions apply. This condition was observed in at least one of the 11 monitoring stations in Santiago during at least one hour on 42 days during 2016.
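The level definitions above amount to a threshold lookup on the 24-h average. The following is a minimal sketch; the function names are illustrative, and assigning a value that falls exactly on a boundary to the higher level is an assumption, since the standard as quoted here does not say which side the boundary belongs to.

```python
def pm25_level(avg_24h):
    """Return the air quality level (1-5) for a 24-h average PM2.5 in µg/m³."""
    # Upper bounds of Levels 1 to 4, as listed in the text.
    upper_bounds = [50.0, 80.0, 110.0, 170.0]
    for level, bound in enumerate(upper_bounds, start=1):
        # Assumption: a value exactly on a bound belongs to the higher level.
        if avg_24h < bound:
            return level
    return 5


def restrictions_apply(avg_24h):
    """Restrictions on emissions apply from Level 3 upward."""
    return pm25_level(avg_24h) >= 3
```

For example, a 24-h average of 95 µg m⁻³ falls in Level 3, so `restrictions_apply(95.0)` is true.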
during the cold season. This trapping through inversion has been verified with lidar measurements (Muñoz and Alcafuz, 2012) and temperature readings from a meteorological tower (Muñoz and Corral, 2017). Another factor that contributes to the increase of PM2.5 concentrations during this cold season is the combustion of wood and fossil fuels used in residential heating. In order to attenuate the effect of episodes of high pollution, local authorities enforce several permanent and occasional restrictions on emission sources. When PM2.5 concentrations are in Level 3 or higher, wood stoves are not allowed in the whole region, including rural areas. When concentrations reach Level 4, 20% of motor vehicles cannot circulate. We must consider that the probability of respiratory and cardiovascular disease increases when PM2.5 concentrations in the breathable air increase (Kim et al., 2015). There is also an impact on early-life mortality (Jayachandran, 2009). The results of a recent study suggest that long-term exposure to PM2.5 above the current US EPA standards is associated with increased risk of Alzheimer's disease (Lin and Hwang, 2015).
In cities where particulate matter concentrations reach values considered harmful for the population, it becomes necessary to implement operational air quality forecasting models that may be used to anticipate unfavorable situations. Among the air quality forecasting models in use today we must mention the deterministic chemical transport models (CTM), which incorporate equations for the behavior of air components and pollutants. Emission information of pollutants in the region of interest is required. Examples of CTMs applied to air quality forecasting in different regions of the planet are CHIMERE (Rouïl et al., 2009), CMAQ (Otte et al., 2005), WRF/Chem (McKeen et al., 2007) and CALIOPE (Baldasano et al., 2008). Shahraiyni and Sodoudi (2016) recently presented an extensive review of statistical models for particulate matter concentration forecasting. That review shows that more than 50% of the successfully implemented forecasting models correspond to the multilayer perceptron (MLP). In most cases when a comparison between neural networks and multilinear regressions has been performed, the neural approach has been more accurate. According to this review, in 57% of the models the forecasted quantity is a 24-hour average of particulate matter concentrations.
Parameters of the statistical models are adjusted based on historical values of associated variables and may vary depending on the specific factors that are more relevant at a given location. Being based on first principles, deterministic chemical models are expected to be more accurate than statistical models, provided it is possible to feed the details of emissions and topography for the region of interest. However, according to the results of forecasting models implemented in recent years, both types of models produce results with errors of the same order of magnitude. Some authors have suggested that the best results could be obtained with combination and feedback of CTMs with results from statistical models (Stern et al., 2008; Konovalov et al., 2009; Neal et al., 2014; Cheu et al., 2015). With regard to the computational resources needed to run a next-day forecast, statistical models are significantly less demanding (Fernando et al., 2012; Zhang et al., 2012).
Thus, depending on specific local conditions related to criticality of air pollution, accuracy needed and resources available, authorities will tend to implement operational air quality forecasting models, whether CTM or statistical, in order to anticipate situations that may put the population at risk. In this paper we analyze the possibility of building a statistical forecasting model to estimate, several hours in advance, hourly concentrations of PM2.5 in Santiago, Chile. Time series analysis of Santiago's PM2.5 has shown that it has a chaotic behavior (Salini and Perez, 2015). This finding puts a strong restriction on the potential accuracy to be obtained with fine particulate matter forecasting models. However, experience shows that including multiple associated predictors in the models allows forecasting with reasonable accuracy a significant number of hours in advance. It seems less difficult to build models to forecast daily averages or 24-hour moving averages than hourly values, because the former variables have a smoother behavior.
Ordieres et al. (2005) predicted the daily average of PM2.5 on the US-Mexico border, showing that neural network models outperform linear regression models. They used PM2.5 concentration, temperature, humidity and wind information measured at 8 AM to predict the daily average (16 hours in advance). Data from 2000 and 2001 from one monitoring station were used for parameter adjustment and 2002 data were used to test the models. Perez and Salini (2008) compared three methods for forecasting the following day's 24-hour moving average of PM2.5 concentrations after collecting concentration and meteorological information until 8 PM of the present day. Their results show that a combination of a neural network and a nearest neighbor method is better suited to the forecasting of high concentration cases. Cobourn (2010) developed an empirical nonlinear regression to provide a next-day forecast of daily PM2.5 concentration in metropolitan areas of Kentucky. He shows that the nonlinear relation between predictor and predicted variables is important to consider. Forecasting hourly averages seems to provide more accurate information for effective pollution control actions intended to protect the population, because in general restrictions based on 24-hour averages are taken with a greater delay. Moreover, daily averages may still be calculated from hourly values.
Perez et al. (2000) developed a very simple multilayer neural network (MLP) to predict hourly concentrations of PM2.5 from 1 to 24 h in advance, based only on the information content of the time series, obtaining percent errors from 30% for early hours to 60% for late hours. That study shows that accuracy can be improved by including meteorological variables like relative humidity and wind speed as predictors. A model for forecasting hourly concentrations of fine particulate matter near a road with high traffic, based on concentration and meteorological information from the previous hour, was reported some time ago (Thomas et al., 2007). The authors show that forecasting of similar accuracy is obtained with a linear model and a neural network, provided the predictors are chosen appropriately. An explanation for this result may be that over such a short time nonlinear contributions are not relevant. A wavelet-based neural network model was used for the forecasting of hourly, daily mean and daily maximum PM2.5 concentrations in Delhi (Prakash et al., 2011). One-step-ahead hourly PM2.5 was forecasted with a 10% error. In this work our interest is to generate values with an anticipation from 1 to 20 hours.
Mishra et al. (2015) have shown that a neuro-fuzzy model outperforms a neural network and a linear regression when forecasting hourly values of PM2.5 during haze episodes in Delhi. The neuro-fuzzy system can combine the learning abilities of neural networks with the human-like explanation abilities of fuzzy algorithms. Nevertheless, the presented model generates estimations of pollutant concentration only one step ahead. Recently, a model-based protocol for the selection of the best neural network aimed at forecasting hourly values of PM2.5 was presented (Oprea et al., 2016). The authors conclude that it is possible to generate a very accurate estimation of hourly PM2.5 one hour in advance using the pollutant concentration information from the previous 4 hours. In a previous work, Perez and Gramsch (2016) reported the results of an hourly PM2.5 forecasting model for Santiago. There, previous concentrations of particulate matter and selected meteorological data were found to be necessary as input in order to estimate future values with an anticipation of up to 20 hours and with a mean percent error of the order of 30%.
In summary, based on previous statistical forecasting models for particulate matter concentrations developed elsewhere, the best results for hourly and daily average values are obtained with nonlinear rather than linear algorithms. Among nonlinear algorithms, the most used scheme is the neural network model. Among neural networks, the most used scheme is the multilayer perceptron (MLP). In general, the proper choice of input variables is more relevant than the specific nonlinear model variation used.
The present work uses the results reported in Perez and Gramsch (2016) as a starting point, introducing several modifications that seem to improve the accuracy of the forecasts.

DATA
Pollution data was obtained from two monitoring stations located in the south-western zone of Santiago. They belong to a monitoring network (Macam network) which consists of eleven stations distributed through the city. They report 1-hour averages of PM10, PM2.5 and NO2, among other pollutants, with a delay of one half hour (see Fig. 1). Meteorological data like temperature, wind speed, wind direction and relative humidity are also measured in these stations. Our goal is to forecast hourly concentrations of PM2.5 at Cerrillos and El Bosque stations, which are among the stations that show frequent episodes with high values during most of the polluted season in Santiago. The two stations are located in the south-west area of the city, and they are 6 km apart (see Fig. 1). Due to this relatively short distance, pollutant concentrations from both stations have a significant correlation. From the forecasted hourly values, we can generate the 24-h averages in order to establish whether the limit of 80 µg m⁻³, which implies restrictions in the city, is exceeded. Table 1 displays average PM2.5 concentrations and number of exceedances of this limit for the period and stations of interest for 2014, 2015 and 2016. Fig. 2 shows the average hourly PM2.5 concentrations at both stations for different hours of the day. The lower concentrations observed during afternoon hours are due mainly to the stronger winds observed at this time of the day. We must mention that PM10 concentrations show a qualitatively similar behavior, being on average 30% higher than PM2.5.

THE FORECASTING MODEL
Among statistical forecasting models of pollution, artificial neural networks have given positive results in many cities around the world (Perez et al., 2000; Ordieres et al., 2005; Perez et al., 2006, 2008; Voukantsis et al., 2011; Zhou et al., 2014; Perez et al., 2016; Shahraiyni and Sodoudi, 2016). This work uses a feed forward multilayer neural network (MLP) model for the forecasting of hourly concentrations at different times at Cerrillos and El Bosque stations. Given the positive results obtained by Perez and Gramsch (2016) for the forecasting of hourly PM2.5 concentrations at a nearby zone in the city of Santiago, initially we used similar input variables (Model 1), and later we introduced a few modifications in order to improve the performance (Model 2).
Input variables for Model 1 comprise the hourly PM2.5 and PM10 concentrations at 6 PM and 7 PM at Cerrillos and El Bosque stations, wind speed and relative humidity at 7 PM, the thermal amplitude of the present day, the forecasted thermal amplitude for the following day and the forecasted ventilation factor in the area during the following day. The ventilation factor is a meteorological parameter which is estimated daily on the basis of regional atmospheric conditions and which incorporates estimations of boundary layer height, patterns of winds and humidity (Ruttland and Garreaud, 1995). The reason to use PM2.5 and PM10 concentrations at 6 PM and 7 PM as input is that around these times, when afternoon winds usually weaken, a sudden and significant increase of particulate matter is observed during the onset of a night episode, so this information is a good predictor for these events (Perez and Gramsch, 2016). Wind speed and relative humidity are hourly values. The neural network has a hidden layer with 8 neurons and a single output, the forecasted hourly PM2.5 concentration at Cerrillos (or El Bosque) at a given time after 7 PM of the present day.
In order to increase the accuracy of Model 1 we explored several options and we found that the best implementation consisted of a feed forward neural network with 13 input variables, 8 neurons in a first hidden layer, 4 neurons in a second hidden layer and one output, the forecasted hourly PM2.5 concentration a certain number of steps ahead at Cerrillos (or El Bosque). The number of hidden layers and of neurons in each layer was chosen by minimizing the forecasting error after exploring different combinations of up to three layers, with the restriction that the number of weights does not exceed 20% of the training cases. The forecasted values are intended to reproduce the observed values at the respective stations. The network is depicted in Fig. 3. The set of input variables for Model 2 was obtained with a stripping method (Bhat and McAvoy, 1992), starting from a relatively large set including Model 1 variables and some new quantities like dominant wind direction during the previous night. Variables that do not improve the quality of the forecasting are discarded.

Fig. 3. Neural network with two hidden layers used in Model 2. In this case the network has 13 inputs, 8 units in the first hidden layer, 4 units in the second hidden layer and one neuron in the output layer.
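The 13-8-4-1 architecture and the restriction that the number of weights should not exceed 20% of the training cases can be illustrated with a short sketch. The code below counts the connection weights and runs one forward pass. The bias terms, the sigmoid activation in the hidden layers, the linear output and the random initialization are all assumptions made for illustration; the paper does not state them, and the function names are hypothetical.

```python
import math
import random


def count_weights(layer_sizes, include_biases=True):
    """Count trainable parameters of a fully connected feed forward network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out          # connection weights between two layers
        if include_biases:
            total += n_out             # one bias per receiving node (assumed)
    return total


def init_network(layer_sizes, seed=0):
    """Create small random weights and zero biases for every layer."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, inputs):
    """Propagate inputs through the network; sigmoid hidden units, linear output."""
    signal = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        sums = [sum(w * x for w, x in zip(row, signal)) + b
                for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        signal = sums if is_output else [1.0 / (1.0 + math.exp(-s)) for s in sums]
    return signal


MODEL_2_LAYERS = [13, 8, 4, 1]
n_weights = count_weights(MODEL_2_LAYERS)
# Weights <= 20% of training cases means at least 5 cases per weight.
min_training_cases = 5 * n_weights
```

Under these assumptions the Model 2 network has 153 parameters with biases (140 without), so the 20% rule would call for at least 765 training cases.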
The reason for not including measured concentrations of particulate matter later than 7 PM of the present day as input is that our goal is to generate a forecasting report at 8 PM, so that authorities have enough time to announce eventual restrictions for the following day. In this case we have found that in order to detect the presence of an episode it is enough to consider the 7 PM value (and not the 6 PM value as in Model 1). In this new set, the ventilation factor is left out. The reason for this finding may be that some more specific variables (thermal amplitudes, wind speed, wind direction and relative humidity) work more efficiently than the global ventilation factor. Thermal amplitudes are good predictors of episodes of pollution. Most of the night episodes are classified as type A (Ruttland and Garreaud, 1995), which is characterized by clear days, high pressures, low minimum temperatures at sunrise (around 0°C) and high maximum temperatures in the afternoon (between 20°C and 25°C). These conditions are not favorable for vertical and horizontal dispersion of pollutants (Perez and Gramsch, 2016). Low wind speed and relatively high humidity are frequently present at times of high PM2.5 concentrations (Chu et al., 2010; Voukantsis et al., 2011; Zhou et al., 2014). Given that chemical analysis of measured particulate matter at Cerrillos and El Bosque stations indicates a significant fraction of nitrates, it is very likely that there exists secondary PM2.5 formation from available NO2. For this reason, in Model 2 we include afternoon NO2 concentrations as input, which may be produced from morning vehicular NO emissions (Perez and Trier, 2001). In Fig. 4 we observe that on average NO2 concentrations are higher during daylight hours, especially for episode days. Wind direction is also a relevant input variable because, according to the rose plots displayed in Fig. 5, east direction dominates at night during episodes, allowing pollutant accumulation at western stations (the coastal range blocks farther transportation of low-altitude atmospheric pollutants). An additional improvement with respect to the results obtained with Model 1 is possible by using forecasted values of hourly PM2.5 and PM10 as input. As an example, in order to forecast PM2.5 at 9 PM, we use the forecasted value of PM2.5 at 8 PM as input. Hourly PM10 is forecasted by training a model that has the same inputs as the PM2.5 Model 2.
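The use of forecasted PM2.5 and PM10 values as inputs for the next hour can be sketched as a chain of one-step models. In the sketch below each model is any callable that maps a feature list to a concentration; the function name, the feature ordering and the toy models are hypothetical. Two simplifications are assumed: the remaining inputs are held fixed across steps (in the paper, forecasted wind speed and relative humidity change with the target hour), and the first step takes the observed 7 PM values as its "previous hour".

```python
def recursive_forecast(observed_pm25, observed_pm10, fixed_inputs, step_models):
    """Chain one-step models so each step uses the previous hour's forecast.

    step_models is a list of (pm25_model, pm10_model) pairs, one pair per
    step ahead; every model is a callable taking a list of features.
    """
    prev_pm25, prev_pm10 = observed_pm25, observed_pm10
    pm25_forecasts = []
    for pm25_model, pm10_model in step_models:
        # Features: inputs known at 7 PM plus last hour's PM2.5 and PM10.
        features = list(fixed_inputs) + [prev_pm25, prev_pm10]
        next_pm25 = pm25_model(features)
        next_pm10 = pm10_model(features)
        pm25_forecasts.append(next_pm25)
        # The forecasts become the "previous hour" inputs of the next step.
        prev_pm25, prev_pm10 = next_pm25, next_pm10
    return pm25_forecasts


# Toy stand-ins for trained networks (hypothetical, for illustration only).
def toy_pm25(features):
    return 0.5 * features[-2] + 10.0


def toy_pm10(features):
    return 1.3 * features[-2]


forecasts = recursive_forecast(100.0, 130.0, [2.0, 70.0],
                               [(toy_pm25, toy_pm10)] * 3)
```

A consequence of this design is that errors made at early steps propagate to later ones, which is one reason forecasts tend to degrade as the horizon grows.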
In a multilayer perceptron (MLP) the algorithm has a structure of nodes arranged in layers. The first layer is the input layer, in which nodes are the predictor variables. Every input unit is connected to all the nodes in the next layer, called a hidden layer. Nodes in this hidden layer are connected to a next hidden layer or to the output layer. Every node i in a hidden or output layer generates a numerical signal

$$y_i = f\Big(\sum_j w_{ij}\, x_j\Big) \qquad (1)$$

Here $f$ is the activation function of the node, $w_{ij}$ is the connection weight between node j and node i and the $x_j$ are the inputs to unit i. The signal generated by unit i is sent to every node in the following layer or is registered as an output if the output layer is reached. Connection weights are calculated using some optimization algorithm in order to reproduce a sample of cases of the problem to solve. For this task we have chosen the Backpropagation algorithm (Rumelhart et al., 1986). It has been shown that MLPs are universal function approximators (Hornik et al., 1989).
The purpose of this study is to develop hourly PM2.5 forecasting models for Cerrillos and El Bosque stations from 1 to 21 hours in advance during the cold season, using year 2016 as a test sample. Parameters of the models were adjusted during a training stage based on 2014 and 2015 data. The architecture was kept fixed for every time delay, but different weights were generated depending on the number of steps ahead. Four statistical indicators were compared in order to validate the models:

Pearson correlation:
$$r = \frac{\langle y_{ta}\, y_{tp} \rangle - \langle y_{ta} \rangle \langle y_{tp} \rangle}{\sigma_a\, \sigma_p} \qquad (2)$$

Normalized percent error:
$$\mathrm{NPE} = 100\, \frac{\langle |y_{ta} - y_{tp}| \rangle}{\langle y_{ta} \rangle} \qquad (3)$$

Percent error:
$$\mathrm{PE} = 100\, \left\langle \frac{|y_{ta} - y_{tp}|}{y_{ta}} \right\rangle \qquad (4)$$

Index of agreement:
$$\mathrm{IA} = 1 - \frac{\langle (y_{ta} - y_{tp})^2 \rangle}{\langle (|y_{tp} - \langle y_{ta} \rangle| + |y_{ta} - \langle y_{ta} \rangle|)^2 \rangle} \qquad (5)$$

Perez and Menares, Aerosol and Air Quality Research, 18: 2666-2679, 2018

Here, triangular brackets denote the average over the sample, $y_{ta}$ is the actual value, $y_{tp}$ is the forecasted value, and $\sigma_a$ and $\sigma_p$ are the standard deviations of the actual and forecasted values. Pearson correlation is an estimation of the comparison between the combined dispersion and the single dispersion of observed and forecasted values. Since only dispersion is quantified with this parameter, additional criteria are necessary for model validation. Normalized percent error (Eq. (3)) is a convenient indicator that avoids the inflation of errors for small values in the case of plain percent error (Eq. (4)). The index of agreement is a non-dimensional and bounded measure, with value 1 indicating optimum fitting (Willmott, 1981). The denominator in this expression is sometimes called the potential error. The numerator is the mean square error. This implies that IA is more sensitive to peak values than to low-range values.
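The four indicators can be computed directly from paired series of observed and forecasted values. The following is a minimal sketch, assuming the usual definitions: mean absolute error divided by the mean observed value for NPE, mean relative absolute error for PE, and Willmott's form for the index of agreement. Function names are illustrative and not taken from the original work.

```python
import math


def mean(values):
    """Average over the sample (the triangular bracket in the text)."""
    values = list(values)
    return sum(values) / len(values)


def pearson(actual, predicted):
    """Pearson correlation between observed and forecasted values."""
    ma, mp = mean(actual), mean(predicted)
    cov = mean((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    var_a = mean((a - ma) ** 2 for a in actual)
    var_p = mean((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def normalized_percent_error(actual, predicted):
    """Mean absolute error relative to the mean observed value, in percent."""
    abs_err = mean(abs(a - p) for a, p in zip(actual, predicted))
    return 100.0 * abs_err / mean(actual)


def percent_error(actual, predicted):
    """Mean of the relative absolute errors, in percent (inflated by small values)."""
    return 100.0 * mean(abs(a - p) / a for a, p in zip(actual, predicted))


def index_of_agreement(actual, predicted):
    """Willmott's index: 1 minus mean square error over the potential error."""
    ma = mean(actual)
    numerator = mean((a - p) ** 2 for a, p in zip(actual, predicted))
    denominator = mean((abs(p - ma) + abs(a - ma)) ** 2
                       for a, p in zip(actual, predicted))
    return 1.0 - numerator / denominator
```

Because PE divides each error by the observed value, a 5 µg m⁻³ miss on a 10 µg m⁻³ observation counts as 50%, whereas NPE weighs the same miss against the sample mean.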
After establishing that, with the initial set of input variables, neural network Model 1 is more accurate than Linear Model 1, we analyzed the possibility of improving the results by including new input variables: NO2 concentrations, wind direction and forecasted values of PM2.5 and PM10. The new neural network is called Model 2, and we compare it with Neural Model 1 and with a new linear model (Linear Model 2) that has the same input variables. Figs. 6 and 7 show the evaluation of Pearson correlation and NPE for Cerrillos station. We observe a significant improvement in forecasting accuracy with Neural Model 2, and we have verified that every new input variable contributes to this improvement. Figs. 8 and 9 show the results of the different forecasting models developed for El Bosque station. The higher accuracy of Neural Model 2 is more evident for forecasts 17-21 hours in advance. In Figs. 10 and 11 we display the index of agreement for the different implemented models. The superiority of Neural Model 2 over Neural Model 1 and Linear Model 2 is confirmed by these plots. For the case of Cerrillos, the IA of Model 2 is over 0.85 from 1 to 11 hours in advance, which can be seen as an indication of good performance during episodes, which occur mostly at night. For El Bosque station, it is remarkable that the IA of Model 2 stays over 0.8 most of the time. This is consistent with the result that the percent error is well below 30% up to 18 hours in advance.
Usually the month of June is the time of the year when the highest concentrations of PM2.5 are observed in the city. As a sample of the quality of forecasting with Neural Model 2 for days during this month in Cerrillos, we can see in Fig. 12 the comparison between predicted and observed hourly values at 11 PM (4 hours in advance forecasting). This is around the time of the day when the highest concentrations are registered. Accuracy for up to 21 hours in advance is still reasonable. Qualitatively similar behavior is observed in El Bosque station.
We must keep in mind that in most countries, air quality is defined in terms of a 24-hour average of concentrations of the pollutant of interest. In particular, in Chile, air quality for a given day is established by the maximum of the 24-h moving average of PM2.5 concentrations. When more than one monitoring station is present in a city, air quality is assumed from the readings of the station with the highest values. Although the purpose of this study was the forecasting of hourly values in order to provide useful information that may allow more efficient restrictions on emissions when necessary, we can provide an estimation of air quality by averaging the forecasted values. Fig. 13 shows the comparison between measured 24-h average maxima and values calculated from forecasted concentrations for Cerrillos and El Bosque stations. We observe that these maxima are very well reproduced. The exceedances of the limit of 80 µg m⁻³ are also well captured in both stations.
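The step from hourly values to the daily maximum of the 24-h moving average can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a trailing 24-h window, a gap-free hourly series that starts at midnight, and that a day counts as an exceedance when its maximum is strictly above 80 µg m⁻³. Function names are hypothetical.

```python
def moving_average_24h(hourly):
    """Trailing 24-h moving average; None until 24 values are available."""
    averages = []
    window_sum = 0.0
    for i, value in enumerate(hourly):
        window_sum += value
        if i >= 24:
            window_sum -= hourly[i - 24]   # drop the value leaving the window
        averages.append(window_sum / 24 if i >= 23 else None)
    return averages


def daily_maxima(hourly, hours_per_day=24):
    """Maximum of the 24-h moving average within each calendar day."""
    averages = moving_average_24h(hourly)
    maxima = []
    for start in range(0, len(hourly), hours_per_day):
        day = [a for a in averages[start:start + hours_per_day] if a is not None]
        maxima.append(max(day) if day else None)
    return maxima


def exceedance_days(hourly, limit=80.0):
    """Indices of days whose daily maximum exceeds the limit (µg/m³)."""
    return [day for day, m in enumerate(daily_maxima(hourly))
            if m is not None and m > limit]
```

The same functions can be applied to observed or forecasted hourly series, which is how a comparison like the one in Fig. 13 could be set up.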

DISCUSSION
Our study shows that hourly PM2.5 concentrations at a given zone in Santiago, Chile, can be forecasted with an acceptable accuracy (in terms of a potential operational model) several steps in advance by using a neural network model. Previous PM2.5 and PM10 concentrations and a set of meteorological variables are used as input. Wind direction, not always considered in this type of study, has helped to improve the quality of our forecasting. Average concentrations of NO2, taking into account the possibility of secondary particle formation, are also included as predictors. Our predictions are more accurate than those of Perez and Gramsch (2016), which may be attributed to the inclusion of additional input variables and more efficient training. The accuracy of the forecasting may still be improved by taking into account input variables that have not been considered until now. These could be data from other related pollutants (CO, SO2) or additional meteorological information. Given that the hourly PM2.5 reaches its maximum around 11 PM during high concentration episodes and that our results have a reasonable accuracy for this time period, we are able to provide a tool that may be useful for air quality management in the city. Restrictions on fuels for heating, for example, may be applied to the coming night. The described model possesses an accuracy similar to that of deterministic models (Borrego et al., 2011; Saide et al., 2016) but has the advantage of requiring significantly fewer computer resources. After adaptations for convenience, our findings may be extrapolated to address the situation of cities with a similar size and/or comparable topographical and meteorological conditions (e.g., Livingstone, 2009; Chen, 2014). Including past concentrations of the pollutants, temperatures, wind speed and relative humidity as inputs seems to be necessary for forecasting in most cases elsewhere. The inclusion of wind direction and NO2 concentrations may depend on the local conditions of the city or region of interest.

Fig. 1. Map of Santiago and location of PM2.5 monitoring stations (circles). Cerrillos and El Bosque are located in the south-west zone of the city and are among the stations that register the highest concentrations.

Fig. 2. Average hourly PM2.5 concentrations in Cerrillos and El Bosque stations over the period April-August 2014, 2015 and 2016 for different hours of the day.
Input variables used for Model 1 are:
• Hourly PM2.5 concentration at 6 PM in Cerrillos station
• Hourly PM2.5 concentration at 7 PM in Cerrillos station
• Hourly PM2.5 concentration at 6 PM in El Bosque station
• Hourly PM2.5 concentration at 7 PM in El Bosque station
• Hourly PM10 concentration at 6 PM in Cerrillos station
• Hourly PM10 concentration at 7 PM in Cerrillos station
• Hourly PM10 concentration at 6 PM in El Bosque station
• Hourly PM10 concentration at 7 PM in El Bosque station
• Wind speed at 7 PM
• Relative humidity at 7 PM
• Thermal amplitude of present day
• Forecasted thermal amplitude for the following day
• Forecasted ventilation factor in the area during the following day

Fig. 3 shows the general scheme for a multilayer neural network.

Input variables for Model 2 are:
• Hourly PM2.5 concentration at 7 PM in Cerrillos station
• Hourly PM10 concentration at 7 PM in Cerrillos station
• Hourly PM2.5 concentration at 7 PM in El Bosque station
• Hourly PM10 concentration at 7 PM in El Bosque station
• Forecasted wind speed at the previous hour to the intended forecasting
• Forecasted relative humidity at the previous hour to the intended forecasting
• Thermal amplitude of present day
• Forecasted thermal amplitude for the following day
• Average afternoon NO2 concentration of present day
• Dominant wind direction during previous night
• Dominant wind direction expected for following night
• Forecasted PM2.5 concentration at Cerrillos (or El Bosque) one hour previous to the hour of the intended forecasting
• Forecasted PM10 concentration at Cerrillos (or El Bosque) one hour previous to the hour of the intended forecasting

Fig. 4. Average hourly NO2 concentrations in Cerrillos station over the period April-August 2014, 2015 and 2016 for different hours of the day. Two curves are shown: one for days on which the maximum of the 24-h moving average of PM2.5 concentrations exceeds 80 µg m⁻³, and one for days on which the maximum is less than 80 µg m⁻³.

Fig. 5. Wind patterns in Cerrillos station. Angles are measured clockwise with 0° at North direction. Episode days are those on which the maximum of the 24-h moving average of PM2.5 concentrations exceeds 80 µg m⁻³.

Fig. 6. Pearson correlation for observed and forecasted hourly values of 2016 PM2.5 concentrations in Cerrillos station. Comparison between Neural Model 1, Neural Model 2 and Linear Model 2.

Fig. 7. Comparison of forecasting errors obtained with Neural Model 1, Neural Model 2 and Linear Model 2. Data are 2016 hourly values of PM2.5 concentrations in Cerrillos station.

Fig. 8. Pearson correlation for observed and forecasted hourly values of 2016 PM2.5 concentrations in El Bosque station. Comparison between Neural Model 1, Neural Model 2 and Linear Model 2.

Fig. 9. Comparison of forecasting errors obtained with Neural Model 1, Neural Model 2 and Linear Model 2. Data are 2016 hourly values of PM2.5 concentrations in El Bosque station.

Fig. 10. Index of agreement for Neural Model 1, Neural Model 2 and Linear Model 2. Data are 2016 hourly values of PM2.5 concentrations in Cerrillos station.

Fig. 11. Index of agreement for Neural Model 1, Neural Model 2 and Linear Model 2. Data are 2016 hourly values of PM2.5 concentrations in El Bosque station.

Fig. 12. Observed and forecasted hourly PM2.5 values at 11 PM (4 hours in advance from the time of data collection) during June 2016 in Cerrillos station.

Fig. 13. Daily maxima of the 24-h moving average of PM2.5 concentrations between April 1 and August 31, 2016, in Cerrillos and El Bosque stations. Forecasted values are calculated from forecasted hourly concentrations.

Table 1. Average PM2.5 concentrations and days with exceedances of the 24-h average limit of 80 µg m⁻³ for the period between April and August. Years 2014, 2015 and 2016 in Cerrillos and El Bosque stations.