parameterization schemes during an extremely high PM2.5 pollution episode in Beijing.

The authors try to refine the ground meteorological stations surrounding the BeijingTianjin-Hebei region to achieve an improved forecast for particulate matter. This topic is interesting and has practical implications since right now more and more stations are constructed but few studies have studied how they can help improve numerical forecasts in reality. The refining approach introduced in this paper by considering the sensitive areas is reasonable and logical. Overall, the manuscript is well written and clearly structured. However, there are still several issues that should be addressed before acceptance.

the minimum value of nonlinear function (cost function) with an initial constraint condition, and the gradient of cost function with respect to the initial perturbation represents the descending direction of searching for the minimum of the cost function. Therefore, in this study, we have to rewrite the cost function Eq. (3)  can be obtained. Then, with the adjoint model of the WRF, the gradient of the cost function with respect to the initial perturbation, ( ( ) ) is calculated. Theoretically, the gradient presents the fastest descending direction of the cost function. However, in realistic numerical experiments, the gradient presents the fast-descending direction but not necessarily the fastest, so we need more iterations. After iteratively forward and backward integrations of the WRF model governed by the SPG2 algorithm, the initial perturbation is optimized and updated until the convergence condition is satisfied, where the convergence condition is ‖ ( ( ) − ( ( ) )) − ( ) ‖ 2 ≤ 1 and 1 is an extremely small positive number, ( ( ) ) projects the initial perturbation to the constraint condition. Finally, the CNOP ( ) can be obtained. The flow chart of the CNOP calculation is shown in Figure 2. For further details of the SPG2 algorithm, the readers can be referred to Birgin et al. (2001). To make it clearer, we add a flow chart of the CNOP calculation in the revised manuscript.

In section 5. The authors select two forecasts which possess large forecast errors in the control run as examples to show that the cost-effective stations provide observations of equivalent efficiency of the whole constructed stations. However, to better demonstrate the effectiveness of the cost-effective observations, I recommend the authors to have a look at the improvements when the costeffective observations are removed from the whole station observations. If the remained observations (after the removal of the cost-effective observations)
contribute to a slight improvement of the PM2.5 forecasts but with a larger number, then it will be more convincing that the cost-effective observations are necessary for the PM2.5 forecasts in BTH. Response: We thank the reviewer's suggestions. The CNOP-type error represents the initial error that results in the largest forecast error in the verification area at the verification time. The CNOP-type error considers the interaction among the errors on spatial grid points and in this situation, the errors on the grid points with large amplitude of the CNOP-type error contribute much more to the final prediction error. When we sort the spatial grid points with a decreasing order according to the amplitude of the error and choose the first 3% grid points as the essential grid points, the interactions between these grid points are remained, so that it is assumed that assimilating the observations on these grid points may contribute more to the improvements of forecast skills. Based on a series of OSSEs, it is verified that assimilating the essential or costeffective observations can indeed improve greatly the PM2.5 forecasts. Specifically, when the 279 cost-effective station observations are assimilated for the AF in section 5, they achieve an overall 41.11% the improvement of PM2.5 forecasting skills, which explains 99% the improvement when assimilating constructed station observations; furthermore, when the cost-effective station observations are removed from all the constructed station observations, the number of the rest station observations is 77 smaller than that of the cost-effective station observations and the assimilation of these observations explains much less, which is 70% the improvement obtained by assimilating all constructed station observations. To be specially emphasized, for the DF presented in section 5, when the simulated observations from the 241 cost-effective station observations are assimilated, it results in an improvement of 47.55% of PM2.5 forecasting skills, even 1.7% higher than the improvement of assimilating all constructed station observations; however, when the cost-effective station observations are removed, assimilating the rest 240 station observations would only result in an improvement of 22.60% PM2.5 forecasting skill. Obviously, although the number of rest station observations is almost the same with the cost-effective station observations, the improvement of PM2.5 forecasting skills is less than half of the improvements obtained by assimilating the cost-effective station observations. Totally, assimilating the cost-effective station observation will lead to much higher PM2.5 forecasting skills than assimilating the rest observations, which emphasizes the important role of the cost-effective station observations in improving the PM2.5 forecast skills.
In the current version of manuscript, we have considered the interactions among the errors on spatial grid points so that we choose the grid points with large amplitude of the CNOP-type errors, which will have large impact on final prediction result, to construct the cost-effective observation network. So the cost-effective observation network is constructed under the consideration of the interactions among the errors on spatial grid points. Moreover, it is verified numerically that assimilating these costeffective station observations can achieve the PM2.5 forecasting skills comparable to that obtained by assimilating all ground station observations. We think it is enough to demonstrate the effectiveness of cost-effective station observations. However, if we assimilate the rest station observations, since the interactions among errors on spatial grid points are considered, the impact of the cost-effective station observations could also be reflected in the improvement of forecasting skill obtained by the assimilation of rest station observations. That's why assimilating the rest station observations could explain roughly 70% the improvement obtained by assimilating all constructed station observations in the AF shown in section 5. In this situation, it is hard to distinguish the role of cost-effective station observation. Therefore, even though we have implemented the experiments suggested by the reviewer, to avoid confusing the readers, we think it is better to keep the original structure and do not include the additional experimental results in the revised manuscript.

Response:
We thank the reviewer's suggestions. For all the AFs and DFs in the study, we have compared their meteorological conditions before and after the assimilations of the cost-effective station observations and all the constructed station observations, respectively. We find that for the AFs, assimilating the cost-effective station observations will adjust the atmospheric stability; and for the DFs, assimilating the costeffective observations will correct both the dynamical and thermodynamical meteorological conditions, as we discussed on Lines 576-585 in the manuscript. Specially, we select two forecasts as examples to show the details. The other forecasts show similarities with the two example forecasts that assimilating the cost-effective station observations and all the constructed station observations correct the meteorological conditions in a similar way, which causes a comparative skill of PM2.5 forecasts. To make the interpretations clear and not superfluous, we think the interpretations in the present manuscript are acceptable; if more examples are included, it is much difficult to make the content logical.
Minor comments:

Line 104. For the application of CNOP in field campaigns, Feng et al., (2022) demonstrated its validity on identifying sensitive areas for typhoon forecasting.
Feng, J., Qin, X., Wu, C., and coauthors. Improving typhoon predictions by assimilating the retrieval of atmospheric temperature profiles from the FengYun-4A's Geostationary Interferometric Infrared Sounder (GIIRS). Atmospheric Research,280(15), 106391. Response: We thank the reviewer to providing the reference. We have read the paper and cited it in the manuscript. Please see Lines 105-106.

Response:
We have modified the "target observation" to "targeted observation". Please see Line 305. We also checked its usages throughout the paper.

Line 290. When the "cost-effective" first appeared in the manuscript, I did not
quite understand what it means. More explanations should be added here. Response: Sorry for confusing the reviewer. The "cost-effective" means assimilating the observations obtained from fewer meteorological stations can lead to higher PM2.5 forecasting skills. This kind of station network can be taken as cost-effective stations because it provides sensitive observations to the PM2.5 forecasts in the economic fashion. The explanations have been added in the revised manuscript. Please see Lines 308-311.

Response:
The CNOP-type initial errors which include wind, temperature and water vapor mixing ratio components at the ground level are calculated for each of the 48 PM2.5 forecasts in the "truth run" with the application of WRF and its adjoint model by using the SPG2 solver (see section 2). We have rephrased the sentence in the revised manuscript. Please see Lines 342-344.

Response:
We thank the reviewer's suggestion. As we showed on Lines 295-315 in the manuscript, to identify the sensitive area of the ground meteorological field in each forecast, we adopt the idea of Lorenz (1965) and take the better simulation initialized by ERA5 as "truth run" and the simulation initialized by GFS forecast data as "control run", where these two simulations have the same emission inventory and use the same model; so the difference between them reflect the sensitivities of forecast uncertainties of PM2.5 concentrations on the accuracy of initial meteorological field. When we compute the CNOP-type initial perturbation superimposed on the better simulation initialized by ERA5, it can be regarded as an approximation to the most sensitive initial error and the sensitive area identified by such CNOP-type error can be regarded as an approximation to the real sensitive area. If the approximate sensitive area is valid, assimilating the additional observations in the sensitive area of control forecast will make the updated forecasts approach to the truth run.
Although the present study is associated with hindcasts of PM2.5, it is still difficult to obtain the meteorological observations from the Monitor Center; therefore, we can only assimilate the simulated observations (i.e. the ERA5 data) to the control run to show the effectiveness of the cost-effective observation network. The effectiveness is verified by examining whether a forecast (i.e. the simulation initialized by GFS) after assimilating the observations from the cost-effective station network will be much closer to the good simulation (i.e. the simulation initialized by ERA5). If the costeffective station network is useful along this thinking, it can be inferred that assimilating real observations from the cost-effective stations to the meteorological initial field in the control forecast would improve the meteorological field forecasting and then the PM2.5 forecasting greatly against the observations. The relevant discussions have been added in the revised manuscript. Please see Lines 732-741.

The boundary layer height is also an important meteorological variable for PM2.5 forecasts. Why do not the authors consider the perturbation of this variable in the study? Response:
The CNOP in the present study only considers the sensitivity from initial uncertainties. We agree with the reviewer that the boundary layer height is an important meteorological variable for PM2.5 forecasts. Since the boundary layer simulation is more influenced by the parameterization in the WRF model (Chen et al., 2017;Mohan and Gupta, 2018), to study the role of boundary layer uncertainties in yielding the PM2.5 forecast uncertainties, an extension of the CNOP method, CNOP-parametric perturbation (CNOP-P; Mu et al., 2010) or nonlinear forcing singular vector (NFSV, Duan and Zhou, 2013), can be used to identify the sensitivities of boundary layer uncertainties. The related discussions have been added in the revised manuscript. Please see Lines 755-764. Mu, M., Duan, W. S., Wang, Q., and Zhang, R., 2010. An extension of conditional nonlinear optimal perturbation approach and its applications, Nonlin. Processes Geophys., 17(2), 211-220.

Response to Reviewer #2:
The manuscript entitled "An approach to refining the ground meteorological observation stations for improving PM2.5 forecasts in Beijing-Tianjin-Hebei region" introduced an approach to refine the ground stations by identifying the sensitive areas for targeted observations. The study is highly related to the studies of predictability, target observation and data assimilation. And it provides a scientific guidance on optimizing the ground stations. I believe the approach is not only useful for air quality forecasts, but can also be used to the forecasts of extreme weather events. Nevertheless, there is a gap between publication and the manuscript in current version. I hope the following comments will help authors improve the manuscript.

Response:
We thank your appreciations.

Specific comments:
1. Line 42. There are a great many publications addressing the meteorological conditions on PM2.5 variations, but the authors only cite one, which is not enough. More references are needed here. Response: We thank the reviewer's suggestions. We have added the references (Lou et al., 2019;Chen et al., 2020) in the revised manuscript. Please see Lines 43.

Line 68. "assimilating more observations may not necessarily lead to much higher forecast benefits." References are needed here.
Response: We have added the references here (Li et al., 2010;Liu et al., 2021). Please see Line 69.
3. Line 75. How are the worse forecast skills possible when the sensitivities are low? Please provide a detailed explanation here. Response: We have added a detailed explanation in the revised manuscript. Please see Lines 77-78. Theoretically, if the observations in the area where the forecast is not sensitive to the initial errors are assimilated, the forecast skills might be slightly improved or neutral. However, in realistic forecasts, the imperfect assimilation procedure or the unresolved scales and processes in the model may induce additional errors and lead to the worse forecasts when the observations in the area where the forecast is not sensitive to the initial errors are assimilated (Janjic et al., 2018). For example, in Yu et al. (2012), removing the initial error in the area that is not the most sensitive area could worsen the prediction results of ENSO. That emphasized the importance of identifying the most sensitive area and suggests that additional observations should be assimilated more carefully in this sense.

Line 195-202. The descriptions are insufficient and confuse me. Please add more details and make it clear.
Response: Sorry for confusing the reviewer. We have rewritten the paragraph and make it clearer. Please see Lines 205-219.
"The spectral projected gradient 2 (SPG2) method is used to solve the optimization problem in Eq. (3). It is noted that the SPG2 algorithm is generally designed to solve the minimum value of nonlinear function (cost function) with an initial constraint condition, and the gradient of cost function with respect to the initial perturbation represents the descending direction of searching for the minimum of the cost function. Therefore, in this study, we have to rewrite the cost function Eq. (3)  can be obtained. Then, with the adjoint model of the WRF, the gradient of the cost function with respect to the initial perturbation, ( ( ) ) is calculated. Theoretically, the gradient presents the fastest descending direction of the cost function. However, in realistic numerical experiments, the gradient presents the fast-descending direction but not necessarily the fastest, so we need more iterations. After iteratively forward and backward integrations of the WRF model governed by the SPG2 algorithm, the initial perturbation is optimized and updated until the convergence condition is satisfied, where the convergence condition is ‖ ( ( ) − ( ( ) )) − ( ) ‖ 2 ≤ 1 and 1 is an extremely small positive number, ( ( ) ) projects the initial perturbation to the constraint condition. Finally, the CNOP ( ) can be obtained. The flow chart of the CNOP calculation is shown in Figure 2. For further details of the SPG2 algorithm, the readers can be referred to Birgin et al. (2001)."

Line 313. Please clarify that the real "meteorological" observations are not in public archive, because in section 3.1, the authors have compared the simulations with the observed PM2.5 concentrations.
Response: We have clarified that the real meteorological observations are not available in public archive in the revised manuscript. Please see Line 333. The sentence have been corrected into "Since the real meteorological observations are not in public archive, the "additional observations" are correspondingly taken from the initial field of the truth run (i.e. the ERA5 data) and called as "simulated observations" according to the OSSEs.".

Line 322. Is the CNOP-type initial error that what has been described in section 2.3? It is suggested to add a detailed description on what variables the CNOPtype errors have contained here.
Response: Yes, the CNOP-type initial error is what has been described in section 2.3. We have added a detailed description of CNOP-type error here. The sentence has been corrected into "the CNOP-type initial errors which includes wind, temperature and water vapor mixing ratio components at the ground level are calculated for each of the 48 PM2.5 forecasts with the application of WRF and its adjoint model by using the SPG2 solver (see section 2). Please see Lines 342-344.

Line 349. "the area with larger values of TME can be regarded as the sensitive areas". It is ambiguous. Is there a threshold for the definition of sensitive area or just determined subjectively?
Response: The TME is applied to measure the comprehensive sensitivity of PM2.5 forecast uncertainties on initial meteorological perturbations. When we identify the essential observational network, we take the 3% as the threshold to determine the sensitive area. Then a total of 424 sensitive grid points is obtained. We select the 3% as the threshold here because the number 424 of sensitive grid points is close to the number of 481 of the meteorological stations within and surrounding the BTH region. Please see Lines 411-414.

Line 453. "the essential stations can indeed provide additional observations that help increase the skill of the PM2.5 forecasts, in comparison to other constructed stations but not in the sensitive grids". The authors did not do any comparison experiments to show the improvements are higher than assimilating the station observations which are not in the sensitive grids. How can they get such conclusions?
Response: We have corrected the sentence into "It is clear that the essential stations can indeed provide additional observations that help increase the skill of the PM2.5 forecast in the BTH much significantly". Please see Line 473. Figure 7 (a1, a2), assimilating the observations will lead to worse forecasts since the AEv and AEM are negative. It is hard to understand. Why will assimilating the observations will lead to worse forecasts? Response: The negative PM2.5 forecast skills occurred at the AF initialized at 20:00 on Nov 18 th 2016. For the AF initialized at 20:00 at Nov 18 th 2016, the PM2.5 concentration in the truth run increases from 139.5 μg m −3 to 151.5 μg m −3 averaged over the BTH region; while the control run forecasts the PM2.5 concentration of 159.6 μg m −3 averaged over the BTH region, 20.1 μg m −3 higher than the PM2.5 concentration in the truth run. When all the constructed station observations are assimilated, the PM2.5 concentration averaged over the BTH region is 153.11 μg m −3 at the forecast time, much closer to the truth run. However, the improvements in the BTH region are uneven (Figure 2), and the number of grids showing negative improvements overweigh those showing positive improvements, which results in a negative AEv and AEM. Figure 2 The improvements of PM2.5 forecast skills when all the constructed station observations are assimilated.

Line 703. "It is clear that assimilating the fewer observations can lead to higher PM2.5 forecast skills". It is inaccurate. It is suggested to rephrase it more carefully.
Response: We have corrected the sentence into "It is clear that assimilating the fewer sensitive observations can lead to higher PM2.5 forecast skill". Please see Line 723.

It is suggested to mention the limitation of the study in section6 that the results are based on OSSEs. If the real observations are available, how the refined station observations help improve the air quality forecasts deserve deeper studies.
Response: We thank the reviewer's suggestion. As we showed on Lines 295-315 in the manuscript, to identify the sensitive area of the ground meteorological field in each forecast, we adopt the idea of Lorenz (1965) and take the better simulation initialized by ERA5 as "truth run" and the simulation initialized by GFS forecast data as "control run", where these two simulations have the same emission inventory and use the same model; so the difference between them reflect the sensitivities of forecast uncertainties of PM2.5 concentrations on the accuracy of initial meteorological field. When we compute the CNOP-type initial perturbation superimposed on the better simulation initialized by ERA5, it can be regarded as an approximation to the most sensitive initial error and the sensitive area identified by such CNOP-type error can be regarded as an approximation to the real sensitive area. If the approximate sensitive area is valid, assimilating the additional observations in the sensitive area of control forecast will make the updated forecasts approach to the truth run.
Although the present study is associated with hindcasts of PM2.5, it is still difficult to obtain the meteorological observations from the Monitor Center; therefore, we can only assimilate the simulated observations (i.e. the ERA5 data) to the control run to show the effectiveness of the cost-effective observation network. In our study, the effectiveness is verified by examining whether a forecast (i.e. the simulation initialized by GFS) after assimilating the observations from the cost-effective station network will be much closer to the good simulation (i.e. the simulation initialized by ERA5). If the cost-effective station network is useful along this thinking, it can be inferred that assimilating real observations from the cost-effective stations to the meteorological initial field in the control forecast would improve the meteorological field forecasting and then the PM2.5 forecasting greatly against the observations. The relevant discussions have been added in the revised manuscript. Please see Lines 732-741.
CNOP method, CNOP-parametric perturbation (CNOP-P; Mu et al., 2010) or nonlinear forcing singular vector (NFSV, Duan and Zhou, 2013), can be used to identify the sensitivities of boundary layer uncertainties. Future studies addressing the boundary layer uncertainties using the extensions of CNOP method and the design on its relevant observation network can be expected. The related discussions have been added in the revised manuscript. Please see Lines 755-764.

Response:
We thank the reviewer's suggestions. The CNOP-type error represents the initial error that results in the largest forecast error in the verification area at the verification time. The CNOP-type error considers the interaction among the errors on spatial grid points and in this situation, the errors on the grid points with large amplitude of the CNOP-type error contribute much more to the final prediction error. When we sort the spatial grid points with a decreasing order according to the amplitude of the error and choose the first 3% grid points as the essential grid points, the interactions between these grid points are remained, so that it is assumed that assimilating the observations on these grid points may contribute more to the improvements of forecast skills. Based on a series of OSSEs, it is verified that assimilating the essential or costeffective observations can indeed improve greatly the PM2.5 forecasts. Specifically, when the 279 cost-effective station observations are assimilated for the AF in section 5, they achieve an overall 41.11% the improvement of PM2.5 forecasting skills, which explains 99% the improvement when assimilating constructed station observations; furthermore, when the cost-effective station observations are removed from all the constructed station observations, the number of the rest station observations is 77 smaller than that of the cost-effective station observations and the assimilation of these observations explains much less, which is 70% the improvement obtained by assimilating all constructed station observations. To be specially emphasized, for the DF presented in section 5, when the simulated observations from the 241 cost-effective station observations are assimilated, it results in an improvement of 47.55% of PM2.5 forecasting skills, even 1.7% higher than the improvement of assimilating all constructed station observations; however, when the cost-effective station observations are removed, assimilating the rest 240 station observations would only result in an improvement of 22.60% PM2.5 forecasting skill. Obviously, although the number of rest station observations is almost the same with the cost-effective station observations, the improvement of PM2.5 forecasting skills is less than half of the improvements obtained by assimilating the cost-effective station observations.
Totally, assimilating the cost-effective station observation will lead to much higher PM2.5 forecasting skills than assimilating the rest observations, which emphasizes the important role of the cost-effective station observations in improving the PM2.5 forecast skills.
In the current version of manuscript, we have considered the interactions among the errors on spatial grid points so that we choose the grid points with large amplitude of the CNOP-type errors, which will have large impact on final prediction result, to construct the cost-effective observation network. So the cost-effective observation network is constructed under the consideration of the interactions among the errors on spatial grid points. Moreover, it is verified numerically that assimilating these costeffective station observations can achieve the PM2.5 forecasting skills comparable to that obtained by assimilating all ground station observations. We think it is enough to demonstrate the effectiveness of cost-effective station observations. However, if we assimilate the rest station observations, since the interactions among errors on spatial grid points are considered, the impact of the cost-effective station observations could also be reflected in the improvement of forecasting skill obtained by the assimilation of rest station observations. That's why assimilating the rest station observations could explain roughly 70% the improvement obtained by assimilating all constructed station observations in the AF shown in section 5. In this situation, it is hard to distinguish the role of cost-effective station observation. Therefore, even though we have implemented the experiments suggested by the reviewer, to avoid confusing the readers, we think it is better to keep the original structure and do not include the additional experimental results in the revised manuscript.

A series of OSSEs is designed to verify the effectiveness of refined stations, due
to the unavailable of real meteorological observations. It is suggested to add more discussions on how future work could use real observations. Response: We thank the reviewer's suggestion. As we showed on Lines 295-315 in the manuscript, to identify the sensitive area of the ground meteorological field in each forecast, we adopt the idea of Lorenz (1965) and take the better simulation initialized by ERA5 as "truth run" and the simulation initialized by GFS forecast data as "control run", where these two simulations have the same emission inventory and use the same model; so the difference between them reflect the sensitivities of forecast uncertainties of PM2.5 concentrations on the accuracy of initial meteorological field. When we compute the CNOP-type initial perturbation superimposed on the better simulation initialized by ERA5, it can be regarded as an approximation to the most sensitive initial error and the sensitive area identified by such CNOP-type error can be regarded as an approximation to the real sensitive area. If the approximate sensitive area is valid, assimilating the additional observations in the sensitive area of control forecast will make the updated forecasts approach to the truth run.
Although the present study is associated with hindcasts of PM2.5, it is still difficult to obtain the meteorological observations from the Monitor Center; therefore, we can only assimilate the simulated observations (i.e. the ERA5 data) to the control run to show the effectiveness of the cost-effective observation network. In our study, the effectiveness is verified by examining whether a forecast (i.e. the simulation initialized by GFS) after assimilating the observations from the cost-effective station network will be much closer to the good simulation (i.e. the simulation initialized by ERA5). If the cost-effective station network is useful along this thinking, it can be inferred that assimilating real observations from the cost-effective stations to the meteorological