Neuro-Fuzzy Approach to Forecast NO2 Pollutants Addressed to Air Quality Dispersion Model over Delhi, India

Air pollution forecasting is an important environmental issue in urban areas, as it is useful for assessing the effects of air pollutants on human health. Air pollution in the urbanized area of Delhi has been observed to rise above the standard level and is expected to be a major problem in the next few years. Therefore, the main objective of the present study is to develop a model that can forecast daily concentrations of air pollutants one day in advance. An artificial-intelligence-based Neuro-Fuzzy (NF) model has been proposed for air quality forecasting, and the concentration of nitrogen dioxide (NO2) has been chosen for analysis. The model inputs are the available meteorological variables, viz. temperature, pressure, relative humidity, wind speed and direction, and visibility, together with the concentrations estimated through AERMOD. AERMOD is introduced to improve the forecasting ability of the model by accounting for the emissions from anthropogenic sources. Training and validation have been carried out with eight and two years of available seasonal daily data, respectively. The model has been evaluated by comparing its results with observed values as well as with other statistical models, namely multiple linear regression (MLR) and artificial neural network (ANN), which reveals that the NF model performs well and can be used operationally.


INTRODUCTION
Air pollution is now recognized as one of the major environmental problems faced by most countries across the world. In addition to natural sources like windblown dust, smoke from bush fires and volcanic eruptions, many anthropogenic sources such as factories, power plants and automobiles are responsible for pollution. Emissions from major anthropogenic sources have increased rapidly in the last few decades due to urbanization, increases in per capita energy consumption, and industrialization.
The problems of air pollution in developing countries deserve greater attention because the air in these countries is more polluted and less research on reducing emissions has been carried out than in developed countries. Hence, screening, assessment and forecasting of ambient air pollutants in urban corridors have become an essential requirement of an efficient local/episodic urban air quality management plan (Gokhale and Khare, 2005).
Forecasting the concentration of air pollutants in urban areas is a topic of great interest in air quality research because of its recognized association with health effects. Long-term air pollution control is needed to prevent the situation from becoming worse in the long run, whereas short-term forecasting is required to take preventive and evasive action during an episode of airborne pollution. Delhi, the capital city of India, is counted among the most polluted cities of the world. The major air pollutants in Delhi include CO, NO2, PM10, PM2.5, SO2 and O3. The present study focuses on NO2 because of its rising concentrations and the availability of data for the study period in Delhi. The problem has become so important, especially for a city like Delhi, that timely information about changes in pollutant levels is needed. However, forecasting is not easy in large urban areas because the pollutants emitted from numerous concentrated sources, as well as from area sources, are dispersed over the entire geographical area. Any given location within an urban area receives pollutants from different sources in varying amounts, depending on the prevailing winds and other meteorological variables (Goyal et al., 2014). Airborne pollutant particles generally pass through the nose and throat and enter the lungs. Once inhaled, these particles can affect the heart and lungs and cause serious health effects. Particulate matter is a complex mixture of solid and liquid particles that vary in size and composition and remain suspended in the air. It is made up of a number of components, including acids (such as nitrates and sulfates), organic chemicals, metals, and soil or dust particles. In urban areas, it is mainly a product of combustion from vehicle sources such as cars, buses, ships and trucks, and from stationary sources such as power plants and factories. Over the past decade, many health effect studies have shown an association between exposure to air pollutants and increases in daily mortality and in symptoms of certain illnesses (Schwartz, 1994).
Modelling real-world processes such as air quality with traditional time series analysis has proven difficult because of their chaotic and nonlinear nature. There are several limitations associated with the regression-based models that have been used in air quality forecasting. The major conceptual limitation of all regression models is that one can only ascertain relationships, but can never be sure about the underlying causal mechanism. Such models give optimal results when the relationships between the independent variables (meteorology and concentrations estimated from emissions) and the dependent variable (pollutant concentration) are almost linear. Regression models tend to predict the mean better than the extreme values (i.e., the highest pollutant concentrations), and they are likely to underpredict high concentrations and overpredict low concentrations. To overcome some of these limitations, the artificial neural network (ANN) model, which makes no prior assumptions about the form of the relationship, has been adopted and described in many studies (Kumar and Goyal, 2013; Mishra and Goyal, 2015a). ANNs are universal function approximators and, being inherently nonlinear, are good at detecting nonlinearities, which results in better forecasts than regression-based models. Still, ANNs have some limitations, because of which their forecasting accuracy is not as desired and they are open to criticism (Kumar and Goyal, 2013). In particular, the ANN architecture is difficult to determine exactly, as many alternatives must be explored to find the optimum number of hidden layers and neurons.
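The regression-to-the-mean behaviour described above can be illustrated with a short sketch. It fits an ordinary least-squares MLR to synthetic data (made-up numbers standing in for meteorological inputs and NO2, not observations from Delhi) and compares the residuals at the two ends of the observed range. The function names `fit_mlr` and `predict_mlr` are hypothetical, and this is not the SPSS configuration used in the study.

```python
import numpy as np


def fit_mlr(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least-squares multiple linear regression with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # [intercept, b_1, ..., b_k]


def predict_mlr(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Evaluate the fitted linear model."""
    return coef[0] + X @ coef[1:]


# Synthetic stand-in for "meteorology -> NO2" (illustrative numbers only)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 2))
y = 60.0 + 30.0 * X[:, 0] + 20.0 * X[:, 1] + rng.normal(0.0, 20.0, size=500)

y_hat = predict_mlr(fit_mlr(X, y), X)
residual = y - y_hat                      # positive residual = underprediction
order = np.argsort(y)
top, bottom = order[-25:], order[:25]     # highest and lowest 5 % of observed days

print("spread of observations:", y.std(), "spread of predictions:", y_hat.std())
print("mean residual on the highest days:", residual[top].mean())
print("mean residual on the lowest days:", residual[bottom].mean())
```

Even though the synthetic relationship is exactly linear, the fitted values have a smaller spread than the observations, so the highest observed days are underpredicted and the lowest overpredicted on average. This is a general property of least squares (the variance of the predictions equals R² times the variance of the observations), not a peculiarity of air quality data.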
Despite the many methods commonly used for forecasting air pollutants, none is generally accepted and none gives consistently satisfactory results. Most of the results come from physical methods, which may not be reliable because of problems in obtaining credible pollutant data, especially for communal sources and traffic; this makes them hard to use in operational modelling. For this reason, the present study uses an artificial-intelligence-based NF model, which combines neural network and fuzzy logic methods, as previously used for forecasting haze episodes in Delhi (Mishra et al., 2015). Neural network approaches are attractive because of their ability to handle non-linear data and their self-learning capabilities (Hornik, 1991); they can forecast the concentrations, but the degree to which an input influences the output cannot be known (Pao, 1989). Fuzzy logic, in contrast, is an effective rule-based modelling approach in soft computing that not only tolerates imprecise information but also provides a framework for approximate reasoning. Its main disadvantage is the lack of self-learning capability. The combination of neural network and fuzzy logic can overcome these disadvantages (Mishra et al., 2015). Bates and Granger (1969) were the first to introduce combining forecasts as an alternative to using a single forecast. Work on time series forecasting has demonstrated that performance increases when forecasts are combined (Makridakis et al., 1982). Selecting among combinations has been found, on average, to lead to significantly better performance than a selected individual forecast. The idea of combining forecasts is to use each model's unique features to capture different patterns in the dataset. Hence, by combining forecasts from different models, forecasting accuracy can often be improved over the individual forecasts (Makridakis et al., 1982; Goyal et al., 2011). Thus, the learning capabilities of a neural network and the reasoning capabilities of fuzzy logic are combined to give enhanced prediction capability compared with using a single methodology alone. Therefore, the artificial-intelligence-based NF model has been used in the present study, along with multiple linear regression (MLR) and ANN, for forecasting air pollutants in Delhi, India.
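The forecast-combination idea attributed to Bates and Granger (1969) can be sketched as follows, assuming the simplest variant in which each model is weighted by the inverse of its past mean squared error and error correlations are ignored. This illustrates combining forecasts in general; it is not how the NF model fuses its neural and fuzzy components, and the function names and numbers are hypothetical.

```python
import numpy as np


def inverse_mse_weights(observed, forecasts):
    """Weights proportional to 1/MSE of each model's past forecasts.

    forecasts has shape (n_models, n_days). Error correlations are ignored,
    which is the simplest Bates-Granger-style weighting.
    """
    observed = np.asarray(observed, dtype=float)
    forecasts = np.asarray(forecasts, dtype=float)
    mse = np.mean((forecasts - observed) ** 2, axis=1)
    inverse = 1.0 / mse
    return inverse / inverse.sum()


def combine(forecasts, weights):
    """Weighted linear combination of the individual forecasts."""
    return np.asarray(weights, dtype=float) @ np.asarray(forecasts, dtype=float)


# Made-up example: two models with different error sizes
observed = np.array([100.0, 120.0, 90.0, 110.0])
model_a = observed + np.array([10.0, -10.0, 10.0, -10.0])   # MSE = 100
model_b = observed + np.array([-20.0, 20.0, -20.0, 20.0])   # MSE = 400

w = inverse_mse_weights(observed, [model_a, model_b])       # approximately [0.8, 0.2]
combined = combine([model_a, model_b], w)
combined_mse = np.mean((combined - observed) ** 2)          # approximately 16
```

The two made-up error series are negatively correlated, which flatters the combination; with positively correlated errors, as is common when models share the same inputs, the improvement over the better single model is smaller.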

Study Area
Delhi is situated in the northern part of India (latitude 28°35'N, longitude 77°12'E). The river Yamuna forms the eastern boundary of the city. Delhi lies between the Great Indian Desert (Thar Desert) of Rajasthan to the west, the central hot plains to the south and the cooler hilly region to the north and east. Delhi has four distinct seasons. The winter season (December, January and February) is dominated by cold, dry air, with very frequent ground-based inversions and low wind conditions that increase pollutant concentrations (Anfossi et al., 1990). The summer (March, April and May) is governed by high temperatures and strong winds, while the monsoon season (June, July and August) is dominated by rain. The post-monsoon season (September, October and November) has moderate temperature and wind conditions.
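Because the models are trained and validated separately for each season, a small helper that assigns dates to the seasons defined above is useful. The sketch below assumes, as the dates quoted in the results (e.g. 24 January 2001 and 14 December 2008, both in the 2001-2008 training set) suggest, that the training/validation split is by calendar year; the function names are hypothetical.

```python
from datetime import date

# Season definitions for Delhi as given in the text
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "summer", 4: "summer", 5: "summer",
    6: "monsoon", 7: "monsoon", 8: "monsoon",
    9: "post-monsoon", 10: "post-monsoon", 11: "post-monsoon",
}


def season_of(day: date) -> str:
    """Map a calendar date to its Delhi season."""
    return SEASON_BY_MONTH[day.month]


def split_season(days, season, train_years, valid_years):
    """Select the days of one season and split them by calendar year."""
    in_season = [d for d in days if season_of(d) == season]
    train = [d for d in in_season if d.year in train_years]
    valid = [d for d in in_season if d.year in valid_years]
    return train, valid


days = [date(2001, 1, 24), date(2008, 12, 14), date(2009, 1, 5), date(2009, 5, 1)]
train, valid = split_season(days, "winter", range(2001, 2009), range(2009, 2011))
# train holds 2001-01-24 and 2008-12-14; valid holds 2009-01-05
```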
Fig. 1 shows the air quality and meteorological monitoring stations in the Delhi study area. The air pollution monitoring stations are ITO (Income Tax Office), DCE (Delhi College of Engineering), Sirifort and Nizamuddin, while Safdarjung Airport serves as the meteorological monitoring station. The study area includes coal-based and gas-based thermal power plants as well as many industries.

Data Preparation
The chosen air quality monitoring station, ITO, is approximately 5 km from Safdarjung Airport (Fig. 1). Forecasting techniques are based on an analysis of the current pollution level together with current and predicted weather conditions over a specific region. The novelty of the present study is the development of an improved air quality forecasting model that includes estimated emissions from different types of sources, e.g., vehicles, power plants, industries and households. The estimated emissions are included in the form of the concentrations predicted through AERMOD. This is the first study to include emissions for forecasting (Chang and Hanna, 2004).
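One way to assemble the one-day-ahead training pairs described here is to pair each day's inputs (meteorological variables plus the AERMOD estimate) with the observed NO2 of the following day, dropping pairs that straddle a gap between seasons or contain missing values. This is a sketch under assumptions: the column layout, the function name `make_one_day_ahead` and the use of NumPy are illustrative, not taken from the study.

```python
import numpy as np


def make_one_day_ahead(dates, features, no2_obs):
    """Pair the inputs of day t with the observed NO2 of day t + 1.

    dates     ISO date strings or datetime64 values, sorted ascending
    features  (n_days, n_features) daily inputs, e.g. meteorological variables
              followed by the AERMOD estimate (hypothetical column layout)
    no2_obs   (n_days,) observed daily mean NO2 (µg m-3)
    """
    dates = np.asarray(dates, dtype="datetime64[D]")
    features = np.asarray(features, dtype=float)
    no2_obs = np.asarray(no2_obs, dtype=float)

    X, y = features[:-1], no2_obs[1:]
    # Keep only pairs of consecutive days (no jump across a season gap) ...
    consecutive = (dates[1:] - dates[:-1]) == np.timedelta64(1, "D")
    # ... and with no missing input or target values
    complete = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    keep = consecutive & complete
    return X[keep], y[keep]


# Two winter fragments: the 28 Feb -> 1 Dec jump must not become a training pair
dates = ["2001-02-27", "2001-02-28", "2001-12-01", "2001-12-02"]
features = [[1.0], [2.0], [3.0], [4.0]]
no2 = [10.0, 20.0, 30.0, 40.0]
X, y = make_one_day_ahead(dates, features, no2)
```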

Air Quality Concentration: AERMOD
AERMOD (AMS/EPA Regulatory Model), a steady-state plume model, is recommended by the USEPA for estimating the impact of new or existing sources of pollution on ambient air quality at source-receptor distances of less than 50 km. In the present study, the model has been used to estimate NO2 concentrations from the estimated emission rates. More information about the model is available in the model formulation document (USEPA, 2004).
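To make the idea of a steady-state plume model concrete, the following is only the textbook Gaussian plume with ground reflection, not AERMOD's formulation: AERMOD is more elaborate (for example, in convective conditions it uses a bi-Gaussian vertical distribution) and derives its dispersion parameters from AERMET output, whereas here the dispersion parameters are simply supplied as assumed values. The function name and the example inputs are hypothetical.

```python
import math


def gaussian_plume(q, u, sigma_y, sigma_z, y, z, h):
    """Classic steady-state Gaussian plume with total ground reflection.

    q        emission rate (g s-1)
    u        mean wind speed at release height (m s-1)
    sigma_y  lateral dispersion parameter at the receptor's downwind distance (m)
    sigma_z  vertical dispersion parameter at the same distance (m)
    y, z     crosswind distance and receptor height (m)
    h        effective release height (m)
    Returns the concentration in g m-3.
    """
    lateral = math.exp(-y ** 2 / (2 * sigma_y ** 2))
    vertical = (math.exp(-(z - h) ** 2 / (2 * sigma_z ** 2))
                + math.exp(-(z + h) ** 2 / (2 * sigma_z ** 2)))  # image source below ground
    return q / (2 * math.pi * u * sigma_y * sigma_z) * lateral * vertical


# Illustrative values only (not Delhi emissions): 10 g/s from a 30 m stack,
# ground-level receptor on the plume axis, assumed sigmas
c = gaussian_plume(q=10.0, u=3.0, sigma_y=70.0, sigma_z=35.0, y=0.0, z=0.0, h=30.0)
print(f"{c * 1e6:.1f} µg m-3")
```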
AERMOD requires hourly surface and upper-air meteorological observations for simulating pollutant dispersion (USEPA, 2004). Its meteorological preprocessor, AERMET, calculates the boundary layer parameters used by AERMOD (Goyal et al., 2013), viz. friction velocity, Monin-Obukhov length, convective velocity scale, temperature scale, mixing height and surface heat flux, from local surface characteristics (surface roughness and Bowen ratio) in combination with standard meteorological observations (wind speed, wind direction, temperature and cloud cover). These parameters are then passed to an interface within AERMOD that calculates vertical profiles of wind speed, lateral and vertical turbulent fluctuations, and the potential temperature gradient. The same methodology as in our previous study (Goyal et al., 2013) has been adopted.
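Two of the AERMET outputs listed above follow from simple formulas. The sketch below uses the standard boundary-layer definitions of the Monin-Obukhov length, L = -ρ c_p T_ref u*³ / (k g H), and the convective velocity scale, w* = (g H z_i / (ρ c_p T_ref))^(1/3), which AERMOD's formulation also uses. The air density, heat capacity and example inputs are assumed values, and AERMET's own estimation of u* and H from surface observations is not reproduced.

```python
import math

VON_KARMAN = 0.4   # von Karman constant
G = 9.81           # gravitational acceleration (m s-2)


def monin_obukhov_length(u_star, heat_flux, t_ref, rho=1.2, cp=1004.0):
    """L = -rho * cp * T_ref * u*^3 / (k * g * H), in metres.

    u_star     friction velocity (m s-1)
    heat_flux  sensible surface heat flux H (W m-2)
    t_ref      reference air temperature (K)
    Negative for an upward (convective) heat flux, positive for a stable one.
    """
    if heat_flux == 0:
        return math.inf  # neutral limit: |L| grows without bound
    return -rho * cp * t_ref * u_star ** 3 / (VON_KARMAN * G * heat_flux)


def convective_velocity_scale(heat_flux, mixing_height, t_ref, rho=1.2, cp=1004.0):
    """w* = (g * H * z_i / (rho * cp * T_ref)) ** (1/3), in m s-1; convective cases only."""
    if heat_flux <= 0:
        raise ValueError("w* is only defined for an upward heat flux (H > 0)")
    return (G * heat_flux * mixing_height / (rho * cp * t_ref)) ** (1.0 / 3.0)


# Assumed daytime example: u* = 0.3 m/s, H = 100 W/m2, T = 300 K, z_i = 1000 m
L_conv = monin_obukhov_length(0.3, 100.0, 300.0)          # about -24.9 m
w_star = convective_velocity_scale(100.0, 1000.0, 300.0)  # about 1.39 m/s
```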

Neuro-Fuzzy Modelling: Combination of ANN and Fuzzy Logic
Neural networks and fuzzy logic are naturally complementary tools in an NF model. Neural networks are low-level computational structures that perform well when dealing with raw data, whereas fuzzy logic deals with reasoning at a higher level, using linguistic information acquired from domain experts. Integrated Neuro-Fuzzy systems can combine the parallel computation and learning abilities of neural networks with the human-like knowledge representation and explanation abilities of fuzzy logic. The Neuro-Fuzzy system is therefore more powerful, as the neural network becomes more transparent and the fuzzy logic becomes capable of learning (Jang et al., 1997). A fuzzy system is built from if-then rules on the basis of membership functions defined for the input and output variables of the system, and this fuzzy system is then trained as a neural network on the input data. The structure of a Neuro-Fuzzy system is similar to that of a multi-layer neural network. In general, a Neuro-Fuzzy system has (i) input and output layers and (ii) three hidden layers that represent membership functions and fuzzy rules. The selection of membership functions (type and number) depends on the characteristics of the input and output variables and can be decided by experts on the basis of experiment, observation and experience. Fig. 2 shows the architecture of the Neuro-Fuzzy structure. The same methodology as in Mishra et al. (2015) has been adopted in the present study, and the methodology of Mishra and Goyal (2015a) has been adopted for the MLR and ANN analyses in each season.
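The layer-by-layer computation of a first-order Sugeno neuro-fuzzy system can be sketched as below, assuming the standard ANFIS formulation of Jang et al. (1997) with Gaussian membership functions and product rule firing. The toy two-rule system and all names are hypothetical; they only show how fuzzification, rule firing, normalisation and the weighted output fit together.

```python
import numpy as np


def gaussmf(x, c, sigma):
    """Gaussian membership function exp(-(x - c)^2 / (2 sigma^2))."""
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)


def sugeno_forward(x, centers, sigmas, rules, consequents):
    """One forward pass of a first-order Sugeno (ANFIS-style) fuzzy system.

    x            (n_inputs,)            one input vector
    centers      (n_inputs, n_mfs)      Gaussian MF centres
    sigmas       (n_inputs, n_mfs)      Gaussian MF widths
    rules        (n_rules, n_inputs)    index of the MF each rule uses per input
    consequents  (n_rules, n_inputs+1)  linear output parameters [p_1..p_n, r]
    """
    x = np.asarray(x, dtype=float)
    # Layer 1: fuzzification, membership grade of every input in each fuzzy set
    mu = gaussmf(x[:, None], centers, sigmas)
    # Layer 2: rule firing strength = product of the selected membership grades
    w = np.prod(mu[np.arange(x.size), rules], axis=1)
    # Layer 3: normalisation of the firing strengths
    w_bar = w / w.sum()
    # Layer 4: first-order consequent of each rule, f_r = p_r . x + r_r
    f = consequents[:, :-1] @ x + consequents[:, -1]
    # Layer 5: weighted sum gives the crisp output
    return float(w_bar @ f)


# Toy system: one input, two fuzzy sets ("low", "high"), one rule per set
centers = np.array([[0.0, 10.0]])
sigmas = np.array([[2.0, 2.0]])
rules = np.array([[0], [1]])
consequents = np.array([[0.0, 0.0],     # rule 1: output 0
                        [0.0, 100.0]])  # rule 2: output 100
print(sugeno_forward([5.0], centers, sigmas, rules, consequents))  # 50.0 (midway)
```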

RESULTS AND DISCUSSION
An artificial-intelligence-based NF model has been developed to forecast the 24-hour average concentration of air pollutants over Delhi one day in advance. The ANN and NF models have been trained and validated in MATLAB 8.1 (licensed to IIT Delhi), and the MLR model has been built with SPSS for Windows (version 17). The input layer consists of eight parameters, comprising meteorological variables (temperature, pressure, relative humidity, wind speed, wind direction index and visibility) and the previous day's concentrations estimated from emissions, i.e., the output of AERMOD. In the NF model, once the input data have been loaded, a Sugeno-type fuzzy inference system (FIS) is generated with the input and output variables. The FIS has been trained with the hybrid learning algorithm, which combines back-propagation for the membership function parameters with least-squares estimation for the output (consequent) parameters. Each input is described by four Gaussian membership functions, categorized as "good", "moderate", "bad" and "hazardous". The daily forecasted NO2 values have been evaluated in each of the four seasons by comparing them with observed air quality values and with the other statistical models, MLR and ANN.
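In the hybrid learning scheme, the model output is linear in the consequent parameters once the membership-function parameters are held fixed, so the consequents can be solved by least squares in the forward pass; back-propagation then adjusts the Gaussian centres and widths (not shown). A minimal sketch of the least-squares half, assuming first-order Sugeno rules, is given below; it is not MATLAB's implementation, and the names are hypothetical.

```python
import numpy as np


def lse_consequents(X, w_bar, y):
    """Least-squares estimate of first-order Sugeno consequent parameters.

    X      (n_samples, n_inputs) inputs
    w_bar  (n_samples, n_rules) normalised firing strengths, premise parameters fixed
    y      (n_samples,) targets
    Returns an (n_rules, n_inputs + 1) array of [p_1, ..., p_n, r] per rule.
    """
    n, d = X.shape
    Xa = np.column_stack([X, np.ones(n)])                    # append constant term
    # Output = sum_r w_bar[:, r] * (Xa @ theta_r): linear in the stacked theta
    A = (w_bar[:, :, None] * Xa[:, None, :]).reshape(n, -1)  # (n, n_rules * (d + 1))
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta.reshape(w_bar.shape[1], d + 1)


# Noise-free check: two rules ("low", "high") on one input
X = np.linspace(0.0, 10.0, 50)[:, None]
g = np.column_stack([np.exp(-0.5 * (X[:, 0] / 3.0) ** 2),
                     np.exp(-0.5 * ((X[:, 0] - 10.0) / 3.0) ** 2)])
w_bar = g / g.sum(axis=1, keepdims=True)
true_params = np.array([[2.0, 1.0],     # rule 1: f = 2x + 1
                        [-1.0, 5.0]])   # rule 2: f = -x + 5
y = np.sum(w_bar * (X @ true_params[:, :-1].T + true_params[:, -1]), axis=1)
estimated = lse_consequents(X, w_bar, y)  # should reproduce true_params
```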

Winter
The daily data for the winter months, i.e., December, January and February, of the years 2001-2008 (722 days in total) have been chosen for training the models, and validation has been performed for the winter months of the years 2009-2010 (149 days in total). Fig. 3(a) represents the scatter plot between the observed and forecasted NO2 values for the training period of the NF model. The observed NO2 concentrations have a maximum of 539 µg m⁻³ (14 December 2008) and a minimum of 20 µg m⁻³ (24 January 2001), whereas the mean for the winter months of 2001-2008 is 95 µg m⁻³, which is about three times the ambient air quality standard. The NO2 concentrations forecasted by the model have a maximum of 539 µg m⁻³ (14 December 2008) and a minimum of 15 µg m⁻³ (20 December 2008), and the mean is also 95 µg m⁻³. Most of the forecasted values match the observed values. The statistical measures calculated to assess the model's ability in the training phase are given in Table 1. For the NF model, the correlation coefficient (R) between observed and predicted values is 0.88, which is close to its ideal value of 1.0. The index of agreement (IOA) is 0.95, which is very close to its ideal value of 1.0. The values of the root mean square error (RMSE), normalized mean square error (NMSE), fractional bias (FB) and factor of two (FAC2) in the training phase are also listed in Table 1 against their corresponding ideal values.
The scatter plot for the NF model between observed and forecasted NO2 concentrations for the validation years 2009-2010 is shown in Fig. 3(b), and the corresponding statistical measures are given in Table 1.

Summer
The daily data for the summer months, i.e., March, April and May, of the years 2001-2008 (736 days in total) have been chosen for training, and the data for the years 2009-2010 (184 days in total) have been chosen for validation of the models. The training phase of the NF model is shown in Fig. 4(a) as a scatter plot between the observed and forecasted NO2 values. The training points lie within a factor of two, and the trends have been well captured by the model. The observed NO2 concentrations have a maximum of 428 µg m⁻³ (1 May 2007) and a minimum of 31 µg m⁻³ (29 May 2005), whereas the mean for the summer months of 2001-2008 is 103 µg m⁻³, approximately three times the National Ambient Air Quality Standards (NAAQS). The NO2 concentrations forecasted by the model have a maximum of 428 µg m⁻³ (1 May 2007) and a minimum of 44 µg m⁻³ (7 April 2002), and the mean is 103 µg m⁻³. Most of the forecasted values match the observed values. The statistical measures calculated to assess the model's forecasting ability in the training phase are given in Table 2. For the NF model, R is 0.97, which is close to its ideal value of 1.0, and IOA is 0.98, which is very close to its ideal value of 1.0. The values of RMSE, NMSE, FB and FAC2 in the training phase are 14.01, 0.023, 0.02 and 0.99, respectively, against their corresponding ideal values.
Most points of the scatter plot between the observed and forecasted NO2 concentrations for the NF model over 2009-2010 lie within a factor of two (Fig. 4(b)). The observed maximum is 614 µg m⁻³ (1 May 2010) and the minimum is 67 µg m⁻³ (3 April 2009). The model performance in the validation phase, assessed by computing the statistical measures given in Table 2, shows good agreement with the observations. Thus, the NF model performs satisfactorily and better than the other statistical models, MLR and ANN (Table 2).

CONCLUSIONS
The present study investigates the use of meteorological variables and emissions in the development of a one-day-ahead air quality forecasting model. An NF model has been developed for NO2 in an urban area with high traffic and industrial influences. Such NF models possess self-learning, self-organizing and self-tuning capabilities and thereby improve forecast quality. The developed model gave precise and effective predictions compared with the observed values. The scatter plots between observed and predicted values show that the models capture the trend of the observed NO2 time series, but the extreme values are underestimated in all four seasons. A statistical error analysis shows that the Neuro-Fuzzy model performs better than MLR and ANN in Delhi. The present study indicates that the Neuro-Fuzzy model is a well-suited method that gives promising results for modelling the highly non-linear air pollution problem in an urban area like Delhi. Moreover, the input meteorological variables employed are generally available from routine weather forecasting models. Thus, the developed model can be used for operational forecasts of air pollutants over polluted urban regions of the world.

Fig. 1. Study area of Delhi with air pollution (circles) and meteorological monitoring (square) stations.

Fig. 4. Scatter plot between observed and NF model's predicted values of NO2 in summer: (a) training period, (b) validation period.

Table 1. Statistical measures of the MLR, ANN and NF models for NO2 in the winter season.

Table 2. Statistical measures of the MLR, ANN and NF models for NO2 in the summer season.
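The statistical measures reported in Tables 1 and 2 can be computed as in the following sketch. It assumes the definitions commonly used following Chang and Hanna (2004) for NMSE, FB and FAC2, and Willmott's form of the index of agreement; since sign conventions for FB differ between authors, this is not necessarily the exact formulation behind the tables. Ideal values are R = 1, IOA = 1, RMSE = 0, NMSE = 0, FB = 0 and FAC2 = 1, and observations are assumed to be non-zero.

```python
import numpy as np


def evaluation_measures(observed, predicted):
    """Return R, IOA, RMSE, NMSE, FB and FAC2 for paired daily values."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    o_mean, p_mean = o.mean(), p.mean()

    r = np.corrcoef(o, p)[0, 1]
    # Willmott's index of agreement
    ioa = 1.0 - np.sum((p - o) ** 2) / np.sum((np.abs(p - o_mean) + np.abs(o - o_mean)) ** 2)
    rmse = np.sqrt(np.mean((o - p) ** 2))
    nmse = np.mean((o - p) ** 2) / (o_mean * p_mean)
    # Chang and Hanna (2004) convention: positive FB means underprediction
    fb = (o_mean - p_mean) / (0.5 * (o_mean + p_mean))
    ratio = p / o
    fac2 = np.mean((ratio >= 0.5) & (ratio <= 2.0))  # share of days within a factor of two
    return {"R": r, "IOA": ioa, "RMSE": rmse, "NMSE": nmse, "FB": fb, "FAC2": fac2}


# Made-up example: the last day is overpredicted by more than a factor of two
scores = evaluation_measures([50, 100, 150, 200], [60, 90, 160, 450])
print(scores)
```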