Evaluation of Model Performance for Forecasting Fine Particle Concentrations in Korea

The performance of a modeling system consisting of WRF model v3.4.1 and CMAQ model v4.7.1 for forecasting fine particle concentrations was evaluated using measurement data at the surface. Twenty-four hour averages of PM2.5 and its major components at Bulgwang (located in the northwest of Seoul) during the period February 2012 through January 2013, as well as hourly averages of inorganic ions measured at Yongin (located to the southeast of Seoul) in spring 2012, were compared with predicted concentrations. The mean fractional bias (MFB) of -0.37 for PM2.5 at Bulgwang fell just outside the goal of -0.3, the level of accuracy that the best model can be expected to achieve. Negative values of MFB, especially in winter, along with the correlation coefficient of 0.61 between measured and predicted concentrations, showed that the model performance at Bulgwang was closer to that for Europe than that for North America. However, underestimation of SO4 and overestimation of NO3 were observed at Bulgwang, as in the United States. Although diurnal variations in the measured values at Yongin showed distinctive features according to the classified patterns, most variations in the predicted values showed a peak early in the morning followed by an increase at night.


INTRODUCTION
At the beginning of 2013, high concentrations of PM2.5 (particulate matter having an aerodynamic diameter ≤ 2.5 µm) with 24-h averages that exceeded 100 µg m⁻³ attracted much attention in Korea, along with 1-h average PM2.5 concentrations approaching 1000 µg m⁻³ in Beijing (Shimadera et al., 2014; Wang et al., 2014; Zhang et al., 2014b). Although interest is not as intense as it was in 2013, public attention in Korea to PM10 and PM2.5 is still great because of their health risks. In fact, the health risks of coarse particles with aerodynamic diameters > 2.5 µm are unclear because a significant portion of these particles is fugitive dust of crustal origin (United States Environmental Protection Agency, 2006). By contrast, while PM2.5 levels in the 1990s decreased by more than 10 µg m⁻³ compared with those in the 1980s in many parts of the United States, similar positive associations between gains in life expectancy and reductions in PM2.5 levels were observed for both time periods (Pope et al., 2009).
To meet the public demand for immediate information on particulate matter, PM10 forecasting started on a trial basis for the greater Seoul area in August 2013, and has been expanded nationwide since February 2014. In November 2014, the forecasting regions were refined by increasing their number from six to ten, and the forecasting frequency was increased from two to four times per day. In addition, the number of pollution levels was reduced from five to four (good, moderate, bad, and serious) by removing 'slightly bad', which had fallen between moderate and bad (KME and NIER, 2014). PM2.5 and ozone were added to the target pollutants for forecasting in January and April 2015, respectively.
Despite considerable efforts to improve the accuracy of the forecasting, reliable forecasting is a complicated task due to: (1) model uncertainties that are considered more pronounced in East Asia (Park and Kim, 2014); and (2) requisite technical skills in interpreting model results with reasonable uncertainties (Murphy, 1993; Pliske et al., 2004). The Korean government has sponsored the development of emission inventories for both Korea and Northeast Asia (NIER, 2013; Jang et al., 2014) and attempted to improve and/or optimize the modeling system for forecasting (KME, 2014; NIER, 2015). Recently, the National Institute of Environmental Research (NIER) coordinated a plan to develop a numerical air quality model tailored to the Korean environment (Cho et al., 2016).
In this study, the performance of the model for PM2.5 was examined by comparing model results with measurement data from a particle-into-liquid sampler (PILS) at Yongin and filters at Bulgwang (see Fig. 1 for the locations). The purpose of this study was to evaluate the performance of the air quality forecasting model in Korea, which could provide a basis for seeking remedies to reduce the differences between model results and measurement data. Although the target for forecasting is 24-h average PM2.5, the model performance was also investigated in terms of the major components constituting PM2.5, and of monthly and diurnal variations, to find weaknesses in predicting PM2.5.

Modeling
PM2.5 simulations were conducted using a three-dimensional air quality forecasting system for a year from February 2012 to January 2013. The system consisted of the Weather Research and Forecasting (WRF) model v3.4.1 (Skamarock and Klemp, 2008) and the Community Multiscale Air Quality (CMAQ) modeling system v4.7.1 (Byun and Schere, 2006). WRF model simulations were initialized with Global Forecast System (GFS) data sets. The WRF model results were prepared for daily emission processing and air quality simulations using the Meteorology-Chemistry Interface Processor. The CMAQ configurations are listed in Table 1.
As for anthropogenic emissions, the Intercontinental Chemical Transport Experiment-Phase B (INTEX-B) inventory for the year 2006 (Zhang et al., 2009; Li et al., 2014) was used for Northeast Asia, while the Clean Air Policy Support System (CAPSS) inventory for the year 2007 was used for Korea (Kim et al., 2008; Lee et al., 2011). Raw emission data were processed to generate gridded, hourly emission fields using the Sparse Matrix Operator Kernel Emissions Processor (SMOKE) v2.1 (http://www.smoke-model.org). Biogenic emissions were obtained using the Model of Emissions of Gases and Aerosols from Nature (MEGAN) version 2.04 (Guenther et al., 2006). Fig. 1 shows the modeling domain consisting of three grids with horizontal resolutions of 27, 9, and 3 km. There were 15 vertical layers on a sigma coordinate up to 50 kPa, with a lowest layer thickness of about 32 m. Default profiles provided with CMAQ were used as the boundary conditions for the outermost grid, and the boundary conditions for the inner grids were updated by the model outputs from the outer grids.

Measurements
The Yongin site (127.27°E, 37.34°N, 167 m above sea level) was located on the rooftop of a five-story building on a hill, about 35 km southeast of downtown Seoul. Concentrations of inorganic ions in PM2.5 were measured every ~25 min using PILS (ADI 2081, Applikon Analytical) coupled with on-line ion chromatography (IC) (Advanced modules, Metrohm). Because sampling times varied from day to day, the data were resampled at an interval of 30 min using a cubic spline method, and one-hour averages were calculated. Diurnal data and twenty-four hour averages were obtained when at least 75% of the possible data were available in a day. The ion balance of the data was checked using the relative ion difference, which is the Global Atmosphere Watch (GAW) criterion suggested by Allan (2004). The total number of days with valid diurnal data for inorganic ions was 94 (86%) out of a total of 109 days during the measurement period between the middle of February and early June in 2012.
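The 75% data-completeness rule described above can be illustrated with a minimal sketch. It assumes that the rule is applied to the 24 one-hour averages of a day and that missing hours are marked as `None`; the function name and the sample values are hypothetical and do not come from the measurement data set.

```python
from statistics import mean


def daily_average(hourly_values, min_fraction=0.75, hours_per_day=24):
    """Return the 24-h average, or None when too few hours are valid.

    hourly_values: list of hours_per_day entries; missing hours are None.
    min_fraction: required fraction of valid hours (0.75 as in the text).
    """
    valid = [v for v in hourly_values if v is not None]
    if len(valid) < min_fraction * hours_per_day:
        return None  # day rejected: fewer than 75% of possible data
    return mean(valid)


# Hypothetical day with 18 valid hours (exactly 75%) and 6 missing hours.
day_ok = [10.0] * 18 + [None] * 6
# Hypothetical day with 17 valid hours (below 75%).
day_bad = [10.0] * 17 + [None] * 7

print(daily_average(day_ok))   # average over the 18 valid hours
print(daily_average(day_bad))  # None, because the day is incomplete
```

Whether the threshold was applied to 30-min resampled values or to one-hour averages is not stated in the text, so the hourly basis used here is only one possible reading.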
The Bulgwang site (126.93°E, 37.61°N, 67 m above sea level) is an intensive measurement station operated by the Korean Ministry of Environment, located in the northwest of Seoul. PM2.5 samples were collected on a Teflon filter (Zefluor, Pall) using a well impactor ninety-six (WINS) and a sequential sampler (PMS-103, APM) at a flow rate of 16.7 L min⁻¹ for 24 hours. Concentrations of PM2.5 and inorganic ions were determined using an automated filter weighing system (MTL) equipped with a microbalance (UMX2, Mettler Toledo) and IC (ICS 2000, Dionex), respectively. PM2.5 samples were also collected on a quartz filter (Tissuquartz 2500QAT-UP, Pall) to determine concentrations of organic and elemental carbon using an OCEC analyser (Sunset). Detailed information on sampling and analysis for Bulgwang can be found in Jeon et al. (2015). Concentrations of the components analyzed in this study as well as PM2.5 were available on 236 days (65%) out of 365 days between February 2012 and January 2013.

Model Performance Metrics
The model performance was evaluated using mean fractional bias (MFB), correlation coefficient (R), and the slope and relative intercept of the best-fit line for the plot of predicted vs. measured values. MFB is defined by

$$\mathrm{MFB} = \frac{2}{N}\sum_{i=1}^{N}\frac{p_i - m_i}{p_i + m_i}$$

where p_i and m_i denote predicted and measured values, respectively, and N denotes the number of data (Boylan and Russell, 2006). MFB was selected among various metrics measuring biases and errors because (1) this metric is symmetric, giving equal weight to overpredictions and underpredictions, (2) performance goals and criteria that depend on the concentration are available, and (3) it retains the sign of the difference between predicted and measured values, unlike mean fractional error, which uses absolute values. The performance goals are the level of accuracy that the best model can be expected to achieve, while the performance criteria are the level of accuracy that is acceptable for standard model applications. Boylan and Russell (2006) proposed the performance goals and criteria as follows:

$$|\mathrm{MFB}| \le 1.7\,\exp\!\left(-\bar{C}/0.5\right) + 0.3 \quad \text{(goals)}$$

$$|\mathrm{MFB}| \le 1.4\,\exp\!\left(-\bar{C}/0.5\right) + 0.6 \quad \text{(criteria)}$$

where C̄ is (p̄ + m̄)/2 in µg m⁻³, and p̄ and m̄ are the means of predicted and measured values, respectively. The value of R and the slope and intercept of the best-fit line were selected because these metrics provide overall information on the relationship between predicted and measured values. The relative intercept, which is the intercept divided by the mean of the predicted values, was used rather than the absolute value of the intercept, because the latter depends on the level of the predicted values.
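As a rough illustration of the metric defined above, the sketch below computes MFB from paired values and compares it with concentration-dependent bounds. The bound constants (amplitudes 1.7 and 1.4, asymptotes 0.3 and 0.6, and the 0.5 µg m⁻³ scale) follow the form proposed by Boylan and Russell (2006) as expressed in fractional units and should be checked against that reference before use. The function names and the sample values are hypothetical.

```python
import math
from statistics import mean


def mean_fractional_bias(predicted, measured):
    """MFB = (2/N) * sum((p_i - m_i) / (p_i + m_i)); bounded by -2 and +2."""
    pairs = list(zip(predicted, measured))
    return 2.0 / len(pairs) * sum((p - m) / (p + m) for p, m in pairs)


def mfb_bound(c_bar, amplitude, asymptote, scale=0.5):
    """Upper limit of |MFB| at mean concentration c_bar (ug m-3).

    The constants passed in are assumptions taken from Boylan and
    Russell (2006), not values derived in this sketch.
    """
    return amplitude * math.exp(-c_bar / scale) + asymptote


def classify(mfb, c_bar):
    """Label an MFB value relative to the assumed goal and criteria."""
    if abs(mfb) <= mfb_bound(c_bar, amplitude=1.7, asymptote=0.3):
        return "within goal"
    if abs(mfb) <= mfb_bound(c_bar, amplitude=1.4, asymptote=0.6):
        return "within criteria"
    return "outside criteria"


# Hypothetical 24-h averages in ug m-3.
predicted = [8.0, 12.0, 20.0]
measured = [10.0, 15.0, 30.0]

mfb = mean_fractional_bias(predicted, measured)
c_bar = (mean(predicted) + mean(measured)) / 2.0
print(round(mfb, 3), classify(mfb, c_bar))
```

At the concentration levels considered in this study the exponential term is negligible, so the bounds are effectively ±0.3 and ±0.6; the term matters only for components with mean concentrations below about 2 µg m⁻³, such as Na⁺ and Cl⁻.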

RESULTS AND DISCUSSION
Table 2 shows the model performance for meteorological variables at the Suwon and Seoul weather stations, which are operated by the Korea Meteorological Administration and located in the finest grid (see Fig. 1). Temperature and relative humidity were measured at 1.5 m above the ground, while the anemometer height was 18.7 m at Suwon and 10 m at Seoul (KMA, 2012). Predicted temperatures were in good agreement with the measured values. The model slightly underpredicted relative humidity at both Suwon and Seoul, but overpredicted wind speed, particularly at the Seoul weather station. Although the discrepancies in wind speed between measured and predicted values varied, overprediction of wind speed is common in regional-scale modeling for Korea and worldwide (Koo et al., 2012; Solazzo et al., 2012; Zhang et al., 2014a; Kim et al., 2017). The overpredicted winds could give rise to an underprediction of PM2.5 (to be discussed later) due to enhanced dispersion, and could emphasize the importance of transport over local contributions.

Ion Sum and PM 2.5
Diurnal variations are among the most important pieces of information that can be obtained from highly time-resolved measurements. However, these variations have rarely been studied, particularly for particulate inorganic ions, because their concentrations have traditionally been measured using filter sampling, which typically provides 24-h averages (McMurry et al., 2004; Solomon et al., 2008). Lee et al. (2016) distinguished five diurnal patterns from the PILS measurement data at Yongin using a hierarchical clustering method: L, low concentration pattern; Mam and Mpm, medium concentration patterns with primary peaks in the morning and afternoon, respectively; H+ and H-, high concentration patterns with increasing and decreasing trends, respectively, during the day (see Fig. 5 for the variations).
Table 3(a) shows the model performance for the sum of the concentrations of five ions (Cl⁻, NO3⁻, SO4²⁻, Na⁺, and NH4⁺) (ion sum) according to the diurnal pattern. Although eight ions were analyzed using IC, only five were summed because CMAQ does not provide concentrations for the other three (K⁺, Mg²⁺, and Ca²⁺). The mean predicted value over the entire period is 75% of the mean measured value. MFB of -0.25 falls within the goals of ≤ 0.3 and ≥ -0.3 (Fig. 2), and R of 0.64 can be considered a moderate value. However, the slope is 0.45, well below 1.0, along with a high relative intercept of 0.39. The ion sum increases from L to H+ and H-. The difference between measured and predicted values also increases from L to H+ and H-, and so does |MFB|. As a result, MFB at H+, for which the mean of the measured and predicted concentrations is highest, falls just outside the criteria in Fig. 2. The metrics in Table 3(a) demonstrate that the model performance generally degrades as the concentration level increases.
Table 3(b) shows the model performance for 24-h average PM2.5 at Bulgwang. The mean predicted value over the entire period is 68% of the mean measured value, which is lower than the 75% in Table 3(a). MFB, R, and the slope are also lower, and the relative intercept is higher, indicating that the overall performance in Table 3(b) is lower than that in Table 3(a), although the differences are not significant. The largest |MFB| and the lowest slope in winter reveal that the model performance is low at high concentration levels (Fig. 2). However, all the metrics in Table 3(b) indicate that the performance is highest in spring despite a high concentration level approaching that for winter.
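The slope, relative intercept, and R discussed above can be reproduced with a short sketch. It assumes that the best-fit line is an ordinary least-squares fit of predicted on measured values, which the text does not state explicitly; the function name and sample values are hypothetical.

```python
from statistics import mean


def fit_metrics(predicted, measured):
    """R, slope, and relative intercept of the line p = slope * m + intercept.

    Ordinary least squares is assumed. The relative intercept is the
    intercept divided by the mean of the predicted values.
    """
    m_bar, p_bar = mean(measured), mean(predicted)
    sxx = sum((m - m_bar) ** 2 for m in measured)
    syy = sum((p - p_bar) ** 2 for p in predicted)
    sxy = sum((m - m_bar) * (p - p_bar) for m, p in zip(measured, predicted))
    slope = sxy / sxx
    intercept = p_bar - slope * m_bar
    return {
        "R": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "relative_intercept": intercept / p_bar,
    }


# Hypothetical 24-h averages in ug m-3.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [9.0, 13.0, 18.0, 22.0]

result = fit_metrics(predicted, measured)
print(result)

# For a least-squares line, slope * (m_bar / p_bar) + relative intercept = 1.
check = result["slope"] * mean(measured) / mean(predicted) + result["relative_intercept"]
print(check)
```

Under this assumption the three summary numbers are not independent. With the Yongin values for the entire period (slope 0.45, mean predicted value 75% of the mean measured value, relative intercept 0.39), 0.45/0.75 + 0.39 = 0.99, which is consistent with the identity within rounding.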
The model performance for 24-h average PM2.5 at Bulgwang shown in Table 3(b) is similar to that for Europe rather than North America. In North America, PM2.5 was overestimated in winter and underestimated in summer, while in Europe it was underestimated throughout the year, especially in winter (Foley et al., 2010; Appel et al., 2012). In Table 3(b), PM2.5 is underestimated over the entire period, and the underestimation is pronounced in winter. The value of R of 0.61 over the entire period falls within the middle of the range of 0.55-0.75 for most modeling studies in Europe, but within the lower part of the range of 0.6-0.8 for most modeling studies in North America (Solazzo et al., 2012).

Chemical Components
Table 4(a) shows the model performance for each component at Yongin, where we measured only the concentrations of inorganic ions. As in Fig. 2, MFBs in Table 4(a) are compared with the performance goals and criteria in Fig. 3. MFBs for NO3⁻ and NH4⁺ fall within the goal, that for Na⁺ falls on the line, and those for SO4²⁻ and Cl⁻ fall outside the criteria. The variation in R is similar; the values of R for NO3⁻, NH4⁺, and Na⁺ are higher than those for SO4²⁻ and Cl⁻. In the case of the slope and relative intercept, higher slopes for NO3⁻ and NH4⁺ indicate a better performance, while relative intercepts for Na⁺ and Cl⁻ are lower despite lower slopes.
Table 4(b) shows the performance for the components at Bulgwang. Compared with Table 4(a) for Yongin, |MFB| for each ion is lower, except for NO3⁻. In fact, MFBs for all ions increase, but |MFB| for NO3⁻ becomes larger because the MFB for NO3⁻ is positive, while those for the other ions are negative. With three more components (OC, EC, and other [OTH]), Fig. 3 shows that MFBs for NH4⁺, Na⁺, and Cl⁻ fall within the goal, and those for the remaining five components fall within the criteria (including OC on the line). Unlike MFBs, which show a better performance than those at Yongin (except for NO3⁻), the other three metrics indicate a lower performance, particularly for Na⁺ and Cl⁻ at lower concentrations.
Park and Kim (2014) mentioned that underestimation of SO4²⁻ and overestimation of NO3⁻ were long-standing issues in PM modeling in East Asia. However, similar phenomena were also observed in the United States, although differences between predicted and measured values were smaller (Foley et al., 2010; Zhang et al., 2014a). In contrast to the underestimation of OTH at Bulgwang in Table 4(b) and Fig. 3, overestimation of OTH was the most significant problem in determining the performance for PM2.5 in the United States. Since OTH is the remainder after excluding the other components, various factors, such as those related to fugitive dust and errors in the PM measurements including aerosol water content and organic aerosol mass, could contribute to artificial inflation of OTH (Appel et al., 2008; Zhang et al., 2014a). However, these factors are difficult to quantify, and thus overestimation of OTH was not alleviated in the recent CMAQ update (Foley et al., 2010).
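Because OTH is defined as a remainder, any error in the PM2.5 mass or in a speciated component is carried directly into it. The sketch below illustrates this bookkeeping under the assumption that OTH is the PM2.5 mass minus the sum of the seven speciated components considered here, with OC taken as reported (no conversion to organic mass). The function name and the concentrations are hypothetical.

```python
def other_component(pm25, components):
    """OTH as PM2.5 mass minus the sum of the speciated components (ug m-3)."""
    return pm25 - sum(components.values())


# Hypothetical 24-h average concentrations in ug m-3.
sample = {
    "NO3": 6.0, "SO4": 5.0, "NH4": 3.5,
    "Na": 0.2, "Cl": 0.3, "OC": 4.0, "EC": 1.5,
}

oth = other_component(27.0, sample)
print(round(oth, 2))

# A 1 ug m-3 positive error in the PM2.5 mass (for example, retained
# aerosol water) shifts OTH by the same amount.
oth_biased = other_component(28.0, sample)
print(round(oth_biased - oth, 2))
```

The same arithmetic applies to the predicted values, so a bias in any modeled component that is not matched by the modeled PM2.5 mass also appears in the predicted OTH.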

Temporal Variations
Fig. 4 shows the variations in monthly averages at Bulgwang from filter measurement and model prediction. The sum of the components is PM2.5. Since PM2.5 concentration is underpredicted as shown in Table 3(b), predicted monthly averages are lower than measured ones, especially in January 2013. Variation in measured PM2.5 is larger than that in predicted PM2.5, but this is mostly due to the large value in January. The coefficient of variation (ρ), defined as the standard deviation divided by the mean, for the measured PM2.5 is 0.34 over the entire period and decreases to 0.29 when the value in January is excluded. On the other hand, ρ for the predicted PM2.5 is 0.33 over the entire period and remains at a similar value of 0.34 when the value in January is excluded. The values of ρ for NO3⁻, NH4⁺, and OTH are higher for the measured values (0.92, 0.46, and 0.49, respectively) than for the predicted values (0.49, 0.34, and 0.42). This tendency is maintained after excluding the values in January. Considering the particularly high ρ for the measured NO3⁻, overprediction of NO3⁻ in Table 4(b) and Fig. 3 was caused by sizable predicted values between June and October, when the measured values were small.
Fig. 5 shows the comparison of diurnal variations in the ion sum between measured and predicted values. Concentrations measured at Yongin using PILS-IC were used because hourly averages for diurnal variations were available, while only 24-h averages were available at Bulgwang from filter measurements. Fig. 5 demonstrates that the predicted values are higher early in the morning (except for H+) and increase again at night (except for H-), although the measured values have a distinctive variation for each pattern. This can be attributed to the fact that most variations in predicted values are characterized by a peak in the morning rush hour and a drop in the afternoon, except for H+. The differences between measured and predicted values are generally larger in the afternoon, particularly for Mpm and for the overall pattern based on mean values over the entire measurement period.
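The coefficient of variation used for the monthly averages, and its sensitivity to a single large month, can be illustrated as follows. The sketch assumes the population standard deviation, which the text does not specify, and the monthly values are hypothetical rather than the Bulgwang data.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population form assumed)."""
    return pstdev(values) / mean(values)


# Hypothetical monthly averages (ug m-3), February through January,
# with a large value in the final month.
monthly = [30, 35, 28, 20, 18, 15, 14, 16, 22, 27, 31, 52]

cv_all = coefficient_of_variation(monthly)
cv_without_last = coefficient_of_variation(monthly[:-1])
print(round(cv_all, 2), round(cv_without_last, 2))
```

With only twelve monthly values, one outlying month can change ρ noticeably, which is why values both with and without January are reported for the measured PM2.5.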

SUMMARY AND CONCLUSIONS
The performance of a model for predicting fine particle concentrations was evaluated using PILS-sampled inorganic ion concentration data at Yongin in spring 2012 and filter-sampled PM2.5 and major component concentration data at Bulgwang during the period February 2012 through January 2013. WRF model v3.4.1 and CMAQ modeling system v4.7.1 were used for meteorological and air quality modeling, respectively.
The model performance for the ion sum at Yongin generally degraded as the concentration level increased. However, MFB for the ion sum at Yongin over the entire measurement period was -0.25, which fell within the goals, the level of accuracy that the best model can be expected to achieve. The overall performance for PM2.5 at Bulgwang was slightly lower than that for the ion sum at Yongin. Seasonal variation in MFB for PM2.5 at Bulgwang was closer to that for Europe than that for North America. The value of R of 0.61 for a year fell within the middle range of the values obtained for Europe, but within the lower range of those for North America.
In the case of the components at Bulgwang, MFBs for NH4⁺, Na⁺, and Cl⁻ fell within the goal, and those for the other five components (NO3⁻, SO4²⁻, OC, EC, and OTH) fell within the criteria. A comparatively small underestimation of OTH was noteworthy, because overestimation of OTH was the most significant problem in the United States. By contrast, underestimation of SO4²⁻ and overestimation of NO3⁻ were similar both at Bulgwang and in the United States.
Comparison of monthly averages between measured and predicted values at Bulgwang showed that overestimation of NO3⁻ was caused by sizable predicted values when the measured values were small. The predicted diurnal variations generally showed peaks early in the morning followed by increases at night. These variations contrasted with the measured values at Yongin, where distinctive variations were observed for each pattern.

Fig. 1. Modeling domain consisting of three grids with horizontal resolutions of 27, 9, and 3 km. Measurement sites at Yongin and Bulgwang are shown on the finest grid along with Seoul and Suwon weather stations.

Fig. 2. Mean fractional biases (MFBs) for 24-h averages of ion sum at Yongin and PM2.5 at Bulgwang. Solid lines denote upper and lower limits of the performance criteria, and dotted lines denote those of the performance goals. MFBs at Yongin are shown by diurnal pattern while those at Bulgwang are shown by season. MFB for the overall mean at each site is also shown.

Fig. 3. Mean fractional biases for major components at Yongin and Bulgwang. The lines have the same interpretation as in Fig. 2.

Fig. 4. Comparisons of monthly averages between measured and predicted values at Bulgwang during the period February 2012 through January 2013. The sum of the components is PM2.5.

Fig. 5. Comparison of diurnal variations in the ion sum between measured and predicted values at Yongin. The number in parentheses indicates the number of occurrences.

Table 1. CMAQ modeling system configurations.

Table 2. Model performance for 1-hour averages of meteorological variables at Suwon and Seoul weather stations. MFB, mean fractional bias; R, correlation coefficient; slope and intercept are those of the best-fit line for the plot of predicted versus measured values; relative intercept is the intercept divided by the mean of predicted values.

Table 3. Model performance for 24-h averages of the ion sum at Yongin and PM2.5 at Bulgwang.

Table 4. Model performance for chemical components at Yongin and Bulgwang.