Effect of Uncertainty on Source Contributions from the Positive Matrix Factorization Model for a Source Apportionment Study

Uncertainty estimation plays an important role in source apportionment models such as the positive matrix factorization (PMF) model. In this study, synthetic datasets were generated and analyzed using PMF with uncertainties specified at different levels to investigate the impact of uncertainty inputs on PMF results, as well as the benefits and risks of emphasizing certain species. The results showed that: (1) uncertainties for the PMF model should be estimated based on the characteristics of the dataset being analyzed; (2) emphasizing the correct tracers improves model performance; and (3) emphasizing unsuitable tracers may lead to disruptive consequences that might not be captured by the Q metric. Tests were also performed on collected ambient PM2.5 samples, and similar conclusions were drawn: emphasizing the correct tracers improved the separation of important source categories from mixed sources. When incorrect tracers were emphasized, a counterfeit Fe-industry factor was extracted, which was inconsistent with field observations. These results provide insights into how uncertainties should be estimated for the PMF model.


INTRODUCTION
The Positive Matrix Factorization (PMF) model is a receptor-based model recommended by the United States Environmental Protection Agency (US EPA) for source apportionment of particulate matter (PM) in urban areas (Hopke et al., 2003; Zheng et al., 2007; Watson et al., 2008). PMF has been applied frequently throughout the world to quantify the contributions of individual sources to PM (Lee et al., 1999; Tian et al., 2013a; Huang et al., 2014; Alam et al., 2015), and its results have significant implications for PM control policies and for scientific investigations of PM and its impacts on public health (Zheng et al., 2005; Chen et al., 2012; Shen et al., 2012; Holzkamper et al., 2015; Lin et al., 2015; Panicker et al., 2015).
Compared with other receptor models, an important strength of PMF is its ability to account for the uncertainties of individual input variables (i.e., the concentration observations of the input chemical species), which may affect the solutions. In environmental science, uncertainties are intended to represent physical reality, and various types of information can be communicated to the model by specifying uncertainties accordingly (Paatero and Tapper, 1994; Cheng and Sandu, 2009). Hence, users of the PMF model are expected to specify uncertainties carefully so that the confidence levels associated with the corresponding input variables are estimated properly.
Uncertainty is a key parameter for PMF, and practice has shown that uncertainty inputs substantially affect PMF results (Lee et al., 1999; Lowenthal et al., 2010). Examples of such impacts (from our subsequent work) are provided in Table S1 in the Supplementary Material, which shows that results vary significantly for the same ambient concentration dataset when different uncertainty inputs are used. Therefore, a proper understanding of how uncertainties affect model results is critical, not only for appropriate use of the PMF model but also for the subsequent implementation of policies (Holzkamper et al., 2015; Panicker et al., 2015).
Despite the importance of uncertainties, studies that systematically explore how uncertainties affect PMF results are still limited. For certain datasets, analytical laboratories or reporting agencies provide uncertainty estimates for every measured value, but such data are unavailable in most studies. Instead, uncertainties must be specified subjectively (Reff et al., 2007; US EPA, 2008, 2014) or, in certain situations, estimated using empirical equations provided to the user (described in detail in the Methods and Materials section). In either case, the chosen uncertainty values are likely to differ among investigators, and there is no consensus on how uncertainties should be determined. Additionally, as reported by Paatero and Tapper (1994), certain species can be more important than others in the model; thus, the uncertainties of such important species may play a larger role in PMF performance. Emphasizing important species through their uncertainties might therefore influence the outcome, either positively or negatively, and this study tests that effect.
In this study, we explored this issue by investigating how different choices of uncertainties affect the performance of the PMF model. Two aspects were investigated: the impacts of different uncertainties on datasets with various error levels (a property of the concentration matrix); and the merits and risks of emphasizing important species through their uncertainties. Based on the results of these scenarios, suggestions for appropriately determining uncertainties for the PMF model are proposed. These suggestions were then applied to an ambient PM2.5 dataset collected in a megacity in China to evaluate their practicality.
Overall, the goals of this study are: (1) to provide examples, based on synthetic tests, of how to determine the empirical equations for uncertainty estimation; (2) to explore the influence of the uncertainties of important species on PMF results; and (3) to analyze an ambient PM2.5 dataset following the proposed suggestions and evaluate their practicality and efficacy. This work used synthetic tests to quantitatively estimate the influence of input uncertainties. Although specifying model inputs is a complex issue, this study may provide helpful guidance on estimating uncertainties when applying PMF, which could improve the accuracy of source apportionment. Model accuracy is the basis for using source apportionment results in PM control and epidemiological analyses, particularly those evaluating acute health effects (Maier et al., 2013).

Principle of PMF and Uncertainty
Positive Matrix Factorization (PMF), developed by Paatero and Tapper (1994), is a powerful method for source apportionment (further details of PMF are provided in the Supplementary Material, Paatero and Tapper (1994), and Paatero (2007)). The model used in this work is PMF2. A brief description of its features is provided in the Supplementary Material and in related literature (Brown et al., 2015; Hopke, 2016).
The principle of PMF is to minimize an objective function Q, defined as (Krecl et al., 2015):

$$Q=\sum_{i=1}^{n}\sum_{j=1}^{m}\left(\frac{e_{ij}}{\sigma_{ij}}\right)^{2},\qquad e_{ij}=x_{ij}-\sum_{k=1}^{p}g_{ik}f_{kj}\qquad(1)$$

where e_ij are the residuals; g_ik is the contribution of factor k to sample i and f_kj is the amount of species j in factor k; n, m and p are the numbers of samples, species and factors; and σ_ij, the focus of this work, is the "uncertainty" of the jth species in the ith sample. The uncertainty σ_ij is used to adjust the weights of observations that contain errors. Generally, users specify uncertainty values to indicate the expected agreement between the model and the data array. σ_ij is computed from three parameters (C1, C2, and C3) using the "Errormodel" (EM) code, as shown in Table S3. Typically, C2 is set to zero (Paatero, 2004), and the uncertainty equation reduces to

$$\sigma_{ij}=\mathrm{C1}+\mathrm{C3}\,\lvert x_{ij}\rvert$$

where x_ij is the observed concentration of species j in sample i. The uncertainty parameters are defined as follows: C1 is associated with laboratory uncertainty, and C3 is the model uncertainty coefficient describing expected residuals that are not caused by laboratory errors, such as the variation of source profiles over time (US EPA, 2009). In most error models (Table S3), C3 is multiplied by the species concentration x_ij because this component is assumed to scale with the magnitude of the observed value. Further details on EM and related parameters are provided in the Supplementary Material.
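A minimal sketch of the reduced error model and the objective function Q is given below, assuming plain Python lists for the n × m concentration matrix X, the n × p contribution matrix G and the p × m profile matrix F. The function names `uncertainty_matrix` and `q_value` are hypothetical illustrations, not part of PMF2, which applies this weighting internally through its error-model (EM) settings. C1 and C3 are passed per species so that individual species can later be emphasized.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def uncertainty_matrix(x: Matrix, c1: Sequence[float], c3: Sequence[float]) -> Matrix:
    """Reduced error model (C2 = 0): sigma_ij = C1_j + C3_j * |x_ij|.

    x  : n samples (rows) x m species (columns) of observed concentrations
    c1 : per-species laboratory-uncertainty term (length m)
    c3 : per-species model-uncertainty coefficient (length m)
    """
    return [[c1[j] + c3[j] * abs(v) for j, v in enumerate(row)] for row in x]


def q_value(x: Matrix, g: Matrix, f: Matrix, sigma: Matrix) -> float:
    """Q = sum_i sum_j (e_ij / sigma_ij)^2 with e_ij = x_ij - sum_k g_ik * f_kj."""
    q = 0.0
    for i, row in enumerate(x):
        for j, x_ij in enumerate(row):
            fitted = sum(g[i][k] * f[k][j] for k in range(len(f)))  # (G F)_ij
            e_ij = x_ij - fitted                                    # residual
            q += (e_ij / sigma[i][j]) ** 2                          # scaled residual squared
    return q
```

For example, with C1 = 0.5 and C3 = 0.25, an observation of 3.0 receives σ = 0.5 + 0.25 × 3.0 = 1.25, so a residual of 1.0 adds (1.0/1.25)² = 0.64 to Q; a larger σ shrinks that contribution, which is why Q falls as the specified uncertainties grow.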

Synthetic Datasets Development
To evaluate the impacts of uncertainties σ on PMF results, synthetic tests were performed (Varella et al., 2010; Li et al., 2015). First, synthetic datasets were developed, with errors reflecting the characteristics of the corresponding datasets built into them. Then, the synthetic datasets were analyzed using PMF with different uncertainty levels.
The method for developing synthetic datasets is similar to that of a previous study (Zeng and Hopke, 1992). Two types of errors were considered: errors associated with the magnitude of the measured concentrations, denoted e_x; and errors due to fluctuations of the source profiles, denoted e_f. The Monte Carlo method was used to simulate the datasets (Lee et al., 2007). The magnitudes of the errors were controlled by the parameters H_f and H_x: H_f was set to 0.01 for primary sources (Zeng and Hopke, 1992) and to 0 for secondary sources (Lee et al., 2007). H_x was set to 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.5, 0.7, and 0.9 to simulate datasets with various error levels. Further details on H_f and H_x can be found in the Supplementary Material and related studies (Javitz et al., 1988; Zeng and Hopke, 1992; Habre et al., 2011). Three ambient particle sources, light-duty gasoline vehicles (LDGV), soil dust (SDUST) and ammonium nitrate (AMNITR), were selected for this study. Their source profiles are described by Marmur et al. (2005) and Shi et al. (2011) and are summarized in Table 1. The source contributions were simulated from normally distributed random numbers; the mean contributions and variance matrix used to produce them are summarized in Table 1 (Habre et al., 2011). Each synthetic dataset contains 300 "daily samples", 22 species, and 3 sources.
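The generation step can be sketched as follows. This is a hypothetical illustration, not the authors' code: the exact error formulation is given in the Supplementary Material, and here we simply assume multiplicative Gaussian errors, with each profile value perturbed by a relative standard deviation H_f (e_f) and each simulated concentration by a relative standard deviation H_x (e_x). Contributions are drawn independently per source (the covariance terms of the variance matrix in Table 1 are ignored), and negative draws are clipped to zero.

```python
import random


def make_synthetic(profiles, mean_g, sd_g, n_samples, h_f, h_x, seed=0):
    """Hypothetical Monte Carlo generator for a synthetic PMF input matrix.

    profiles     : p x m source profiles (mass fraction of each species)
    mean_g, sd_g : per-source mean and standard deviation of daily contributions
    h_f          : per-source relative profile noise (H_f), e.g. 0 for secondary sources
    h_x          : relative concentration noise (H_x)
    Returns (x, g): the n x m noisy concentrations and the n x p true contributions.
    """
    rng = random.Random(seed)
    p, m = len(profiles), len(profiles[0])
    x, g = [], []
    for _ in range(n_samples):
        # True daily contributions: independent normal draws, clipped at zero.
        g_i = [max(0.0, rng.gauss(mean_g[k], sd_g[k])) for k in range(p)]
        row = []
        for j in range(m):
            true_ij = 0.0
            for k in range(p):
                # e_f: multiplicative fluctuation of the source profile.
                f_kj = max(0.0, profiles[k][j] * (1.0 + h_f[k] * rng.gauss(0.0, 1.0)))
                true_ij += g_i[k] * f_kj
            # e_x: multiplicative error proportional to the concentration.
            row.append(max(0.0, true_ij * (1.0 + h_x * rng.gauss(0.0, 1.0))))
        x.append(row)
        g.append(g_i)
    return x, g
```

For the three sources used here, `h_f` would be `[0.01, 0.01, 0.0]` (LDGV, SDUST, AMNITR), `n_samples` would be 300, and `h_x` would be one of the ten levels listed above; the profile and contribution values passed in should come from Table 1.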

Data Analysis
In environmental studies, C1 is associated with the detection limits of the instruments used and is typically considered constant for each species (Paatero, 2007). As recommended by Paatero (2007), C1 was set to twice the reported precision of the concentration measurement x_ij (Table S4). Note that laboratory uncertainty can also be estimated from similar studies if it is unavailable. The values of C3, on the other hand, are assigned arbitrarily by users. In this study, ten C3 values (C3 = 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.5, 0.7, and 0.9) were used to investigate their impact on the uncertainties σ (Table S5).
In addition, as mentioned previously, certain variables can be considered more important than others, and several previous studies have down-weighted some species. In this study, instead of down-weighting, we up-weighted selected variables by reducing their uncertainties, to explore the impact of such subjective weighting. Four scenarios were designed: (1) SCE 1: the simplest empirical method, with the same C3 value for all species. (2) SCE 2: Ideally, users can identify all of the source categories and their tracers, and the uncertainties σ of the corresponding tracers can then be decreased to emphasize them subjectively. To investigate the impact of such decisions, all of the tracers (i.e., EC and OC for LDGV, Si for SDUST, and NO3⁻ for AMNITR) were up-weighted in SCE 2.
(3) SCE 3: In most environmental studies, users may be unable to identify all of the source categories and may know only a subset of the sources and their tracers. In this scenario, it is assumed that only AMNITR can be identified, and the C3 value of NO3⁻ was reduced. (4) SCE 4: Under certain circumstances, users may identify a wrong source category. For example, an Fe-industry source may not contribute to ambient PM, but the user may incorrectly assume that this source category exists and decrease the uncertainties of Fe (i.e., an unsuitable species). Understanding the effect of such an incorrect subjective emphasis is important. This hypothetical scenario was therefore simulated by decreasing the C3 value of Fe (the unsuitable species).

In all scenarios, H_x was varied from 0.025 to 0.9 (the ten levels listed above), and ten datasets were produced for each H_x level (100 datasets per scenario). To emphasize suitable source tracers or unsuitable species in SCE 2-4, their C3 values were halved: C3 = 0.0125/0.025, 0.025/0.05, 0.05/0.1, 0.075/0.15, 0.1/0.2, 0.125/0.25, 0.15/0.3, 0.25/0.5, 0.35/0.7, and 0.45/0.9 for the emphasized species and the other species, respectively (Table S5). The C1 values were the same as in SCE 1 (Table S4). Each dataset was analyzed using PMF with ten different levels of σ, resulting in a total of 4000 analyses (4 scenarios × 100 datasets × 10 levels). Other PMF parameters are given in the Supplementary Material. Two indicators were used to evaluate the PMF results: the percentage of divided results (DP), defined as the number of tests in which the sources were divided (separated) as a percentage of the total number of tests in each case; and the average absolute error (AAE) (Javitz et al., 1988). Detailed definitions of both are given in the Supplementary Material.
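The scenario design can be expressed as per-species C3 vectors. The sketch below is illustrative only: the helper names are hypothetical, species labels such as "NO3-" are placeholders for the data-matrix column names, and it encodes just two facts from the text, namely that emphasized species receive half of the C3 used for the other species, and that C1 is twice the reported measurement precision.

```python
# Ten C3 levels applied to the non-emphasized species (Table S5).
C3_LEVELS = [0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.5, 0.7, 0.9]

# Species whose uncertainty is reduced in each scenario (labels are placeholders).
SCENARIOS = {
    "SCE1": set(),                       # same C3 for all species
    "SCE2": {"EC", "OC", "Si", "NO3-"},  # all correct tracers
    "SCE3": {"NO3-"},                    # one correct tracer (AMNITR)
    "SCE4": {"Fe"},                      # unsuitable species
}


def scenario_c3(species, emphasized, c3):
    """Per-species C3: emphasized species get c3 / 2, all others get c3."""
    return [c3 / 2 if name in emphasized else c3 for name in species]


def c1_from_precision(precision):
    """C1 = twice the reported measurement precision of each species."""
    return [2.0 * p for p in precision]


if __name__ == "__main__":
    species = ["EC", "OC", "Si", "NO3-", "Fe"]
    for name, emphasized in SCENARIOS.items():
        print(name, scenario_c3(species, emphasized, 0.2))
    # 4 scenarios x 100 datasets x 10 C3 levels
    print("total PMF runs:", len(SCENARIOS) * 100 * len(C3_LEVELS))  # 4000
```

Combining the resulting C1 and C3 vectors through σ_ij = C1 + C3|x_ij| gives one uncertainty matrix per scenario, dataset and C3 level.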

Sampling and Chemical Analysis of the Ambient PM2.5 Dataset
To investigate the influence of uncertainties on source apportionment of ambient PM data, an ambient PM2.5 dataset was analyzed using PMF with the uncertainty scenarios and levels described above. The PM2.5 samples were collected in 2012 in the megacity of Chengdu, China, using filter-based samplers. Two parallel medium-volume air samplers were used, one loaded with polypropylene membrane filters (90 mm diameter, Beijing Synthetic Fiber Research Institute, China) and the other with quartz-fiber filters (90 mm diameter, type 2500QAT-UP, Pall Life Sciences, USA). The samplers were calibrated before use, the flow rate was set at 100 L min⁻¹, and the pumps were operated continuously for 24 h.
The concentrations of 17 elements (i.e., Na, Mg, Al, Si, K, Ca, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, and Pb), total carbon (TC) and two water-soluble ions (i.e., NO3⁻ and SO4²⁻) were measured. Elements in the samples collected on polypropylene membrane filters were analyzed by inductively coupled plasma spectrometry (IRIS Intrepid II, Thermo Electron). Water-soluble ions were analyzed by ion chromatography (DX-120, DIONEX). A portion of each quartz-fiber filter was used for TC measurement with DRI/OGC carbon analyzers. Detailed information on the sampling area, sampling method, chemical analysis and quality assurance/quality control is reported in previous studies and in the Supplementary Material (Shi et al., 2009; Tian et al., 2013b).

Impacts of Uncertainties on Datasets with Diverse Errors
For SCE 1, the DP values, which were used to assess the stability of PMF, were calculated and are summarized in Table S6. Inputs with large errors can lead to nonrepresentative results, and the stability and performance of PMF decreased as errors increased. As shown in Table S6, when H_x exceeded 0.2, DP was below 50% for most cases. Hence, AAEs were calculated only for H_x = 0.025-0.2. The AAEs (averaged over the three sources) of the ten tests and their mean for each case are shown in Fig. 1. As H_x increased, AAE exhibited larger means and ranges. These results also indicate that PMF functioned stably and effectively when input errors were low. PMF extracts source profiles and quantifies contributions based on the temporal variations of chemical species, because species from the same source share the same temporal variation pattern (Pant and Harrison, 2012). When errors are high, PMF solutions may be unstable, and to some degree, unseparated sources or results with unclear physical meaning may appear. With lower input errors, PMF solutions are stable and consistent across multiple starting points.
Additionally, the performance of the PMF model varied with C3 at the same error level (Fig. 1). This finding suggests that the uncertainties specified for PMF significantly influence its performance, and that these effects depend on the errors in the input datasets. The averaged AAEs ranged from 2.07% (C3 = 0.1) to 29.3% (C3 = 0.9) when H_x = 0.025, and from 50.2% (C3 = 0.3) to 61.7% (C3 = 0.025) when H_x = 0.2. When H_x = 0.025 and 0.05, AAEs increased significantly at high C3 (above 0.3). When H_x = 0.1, AAEs changed less with C3. When H_x = 0.15, higher AAEs appeared at both the lower and higher ends of the C3 range. These combined effects of uncertainties and errors indicate that when errors in the datasets are small but the specified uncertainties are too high, PMF results deviate from the true values. Conversely, when errors are larger but still below a certain limit, PMF results may remain satisfactory despite relatively high uncertainty estimates. Thus, the uncertainties input into PMF should be estimated based on the characteristics of the analyzed datasets.
For synthetic datasets, the true contributions (i.e., the mean contributions) are available, so the AAEs between the true and estimated contributions can be calculated and used to evaluate the accuracy of PMF. In environmental studies, however, the true contributions are generally unknown, and Q is typically used as an indicator instead.
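Both evaluation indicators can be sketched in a few lines. The formulas below are assumptions for illustration: the exact AAE definition (Javitz et al., 1988) and the criterion for a "divided" run are given in the Supplementary Material, so here AAE is taken as the mean relative absolute difference between estimated and true contributions, and DP simply counts runs already flagged as separated. Function names are hypothetical.

```python
def aae_percent(estimated, true):
    """Average absolute error (%) of estimated vs. true source contributions.

    Assumed form: mean over samples of |estimated - true| / true, times 100.
    Samples with a true contribution of zero are skipped to avoid division by zero.
    """
    pairs = [(e, t) for e, t in zip(estimated, true) if t > 0]
    if not pairs:
        raise ValueError("no positive true contributions")
    return 100.0 * sum(abs(e - t) / t for e, t in pairs) / len(pairs)


def dp_percent(divided_flags):
    """DP (%): share of test runs in which the sources were separated (divided)."""
    flags = list(divided_flags)
    return 100.0 * sum(1 for f in flags if f) / len(flags)
```

An estimate that is 10% too low on one day and 10% too high on the next gives an AAE of 10%; three separated runs out of four give a DP of 75%.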
Q has limitations as a diagnostic metric, but it is still widely used (Brown et al., 2015). Note that in Eq. (1), Q is calculated from residuals and uncertainties. The Q of each PMF case was calculated to investigate its relationship with the uncertainties and with PMF performance. The theoretical Q for a model run equals nm − p(n + m), where n, m and p are the numbers of samples, species and factors, respectively (Koçak et al., 2011); it can be compared with the Q values obtained from PMF to indicate goodness of fit. The average Q values for each H_x and C3 combination in SCE 1, together with the theoretical Q (6204), are shown in Fig. S1 in the Supplementary Material. In Fig. S1, Q decreases as uncertainties increase at each error level, as expected from the definition of Q in Eq. (1). The cases with the best Q (i.e., Q closest to the theoretical Q) are circled in Fig. 1. The uncertainty level corresponding to the best Q increases as H_x increases, which reflects the combined effect of uncertainties and errors. These results suggest that using Q to guide the selection of PMF solutions is effective. For a given dataset, the differences between the Q values of alternative PMF solutions are often used as a criterion for rejecting solutions with "too high" Q values, for example when exploring rotational ambiguity or choosing the number of factors (Paatero et al., 2014).
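The theoretical Q and the "best Q" selection can be sketched as below. The helper names are hypothetical, and the mean Q values in the usage lines are invented placeholders rather than results of this study. The dimension check uses the Chengdu input described later (133 samples, 20 species, five factors), for which nm − p(n + m) = 2660 − 765 = 1895.

```python
def theoretical_q(n_samples, n_species, n_factors):
    """Theoretical Q = nm - p(n + m) for an n x m data matrix and p factors."""
    return n_samples * n_species - n_factors * (n_samples + n_species)


def best_c3(mean_q_by_c3, q_theory):
    """Return the C3 level whose mean Q lies closest to the theoretical Q."""
    return min(mean_q_by_c3, key=lambda c3: abs(mean_q_by_c3[c3] - q_theory))


if __name__ == "__main__":
    q_th = theoretical_q(133, 20, 5)
    print(q_th)  # 1895
    # Placeholder mean Q values for three C3 levels (not results from this study).
    print(best_c3({0.1: 5200.0, 0.3: 2300.0, 0.5: 1870.0}, q_th))  # 0.5
```

Selecting by absolute distance to the theoretical Q mirrors the "Q closest to the theoretical Q" criterion used for Fig. 1; as shown for SCE 4 below, a near-theoretical Q does not by itself guarantee a physically correct solution.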

Influence of Emphasis by Up-Weighting Uncertainties
In this section, the benefits and risks of subjectively emphasizing important variables in the PMF model (SCE 2-4) are discussed.

(1) Influence of Subjective Emphasis on All Correct Tracers
The DP results for SCE 2 are summarized in Table S6. In SCE 2, with σ decreased for all of the correct tracers, PMF tolerated higher errors. Hence, PMF with subjective emphasis on the correct tracers can separate the factors effectively even when dataset errors are relatively high. However, when errors are too high, the results may not be reliable. Because a high DP alone does not fully demonstrate the consistency of the results, AAEs for H_x = 0.025-0.3 were also calculated (Fig. S2, with the best Q), and the average Q values are shown in Fig. S3. These results lead to conclusions similar to those for SCE 1, except that PMF tolerates higher errors in SCE 2 than in SCE 1.
To further compare SCE 1 (all species had the same C3) and SCE 2 (C3 of all tracers was reduced), the AAEs of the two scenarios were compared for the same H_x and C3 combinations. The average AAEs and their standard deviations for each H_x and C3 are shown in Fig. 2 (H_x = 0.025 and 0.2) and Fig. S4 (H_x = 0.05, 0.1 and 0.15). When errors in the datasets are low (e.g., H_x = 0.025 in Fig. 2), no significant differences are seen. Conversely, when errors are high (e.g., H_x = 0.2 in Fig. 2), better and more stable results were obtained when the uncertainties of all correct tracers were decreased. Therefore, if all of the sources and their tracers can be identified accurately, reducing the uncertainties σ of these tracers can help PMF separate the factors effectively, and the model performs more stably when errors in the datasets are relatively high. Emphasizing the correct tracers increases the weight of the correct sources, so their temporal variations become more prominent and the source categories are easier to separate. Thus, the available factors can be used to explain the behavior of suppressed, irrelevant variables (Paatero and Tapper, 1994).

(2) Influence of Subjective Emphasis on One Tracer
SCE 3 investigated the situation in which only some of the sources and their tracers can be identified correctly. Extreme cases were excluded from further analysis. Comparisons between SCE 1 and SCE 3 for H_x = 0.025-0.2 and C3 = 0.05-0.25 are shown in Fig. S5, and the corresponding Q values are summarized in Fig. S6. No significant differences were found in the AAEs and Q values between SCE 3 (C3 of NO3⁻ reduced) and SCE 1 (the same C3 for all species). PMF generally performed better when the C3 of NO3⁻ was reduced, especially for the AMNITR source, although this did not hold in every case. Overall, these synthetic tests suggest that up-weighting the correct tracer(s) can improve PMF performance to some degree.

(3) Influence of Subjective Emphasis on Unsuitable Species
In SCE 4, an Fe-industry source was incorrectly assumed to exist, and the C3 values of Fe were reduced. Compared with the "true" conditions, the DP values in SCE 4 were rather low even when H_x = 0.15, indicating that PMF was affected by incorrect user inputs, leading to nonrepresentative results. As an example, the differences in AAEs between SCE 4 and SCE 1 for H_x = 0.025-0.15 and C3 = 0.05-0.25 are shown in Fig. S7. The estimated contributions of the three sources deviated from the "true" values even when H_x was low, indicating that subjective emphasis on unsuitable species can have disruptive consequences for PMF.

Fig. 2. Comparison of results from SCE 1 (C3 held the same for all species) and SCE 2 (C3 of all correct tracers decreased). Average AAEs and standard deviations are compared for the same errors (H_x = 0.025 and 0.2) and C3 combinations. TOT refers to total PM concentration.
To further demonstrate the disruptive consequences of emphasizing unsuitable species, Fig. 3 compares the source profiles derived in two cases for the same dataset. As shown in Fig. 3(a), three factors can be identified: AMNITR with NO3⁻ as its tracer; LDGV with OC and EC as tracers; and SDUST with Si as its tracer. This is consistent with the "true" condition. However, for SCE 4 (Fig. 3(b)), AMNITR and LDGV were not separated, and a distinct factor with Fe as its tracer was extracted. These results do not represent the "true" condition because of the incorrect assumption of an Fe-industry source. Therefore, if users accidentally or subjectively specify small uncertainties for a single unsuitable species, the noise introduced may be substantial, and a counterfeit factor may be extracted by PMF (Paatero and Tapper, 1994).
Additionally, the Q values in SCE 1 and SCE 4 were compared (Fig. S8 and Fig. 3). The Q values from the two scenarios are similar, which implies that Q may be inadequate for detecting the disruptive influence of emphasizing unsuitable species. In other words, if a source is incorrectly identified and its tracer species are mistakenly emphasized, PMF may extract the source as the user expected, yet Q may not reflect the actual quality of the result.
In summary, the synthetic tests on the impact of uncertainties in PMF lead to the following suggestions:
(1) Uncertainties of species should be estimated based on the characteristics of the datasets to be analyzed.
(2) Appropriate up-weighting of the correct tracers helps to obtain satisfactory outcomes.
(3) If unsuitable species are emphasized, PMF results may be disrupted, and counterfeit factors may be extracted. Unfortunately, such errors may not be captured by Q.

Application to the Ambient PM2.5 Dataset from Chengdu
(1) Levels and Chemical Compositions of PM2.5 in Chengdu
The average PM2.5 concentration in Chengdu during the sampling period was 131.98 ± 49.75 µg m⁻³, and its chemical composition is shown in Fig. S9. SO4²⁻ accounted for the largest fraction (21.69%), followed by TC (18.34%) and NO3⁻ (12.82%). Crustal elements were also important components of PM2.5 in Chengdu. Potential PM2.5 sources include energy-related sectors (Fig. S10) such as coal, natural gas, electric power and oil refineries.

(2) Influence of Uncertainties on Source Contributions to PM2.5 in Chengdu
To evaluate the practicality of the above suggestions, the ambient PM2.5 dataset from Chengdu was analyzed with PMF at different uncertainty levels. A 133 × 20 matrix (133 samples × 20 species) was used as the input concentration matrix, and the uncertainty inputs were constructed following the suggestions above.
Three tests were performed on the PM2.5 dataset. In Test A, uncertainties were constructed using the typical empirical method, with the same C3 value for all species. Solutions with different numbers of factors were examined, and five factors yielded the best performance. As in the synthetic tests, Q decreased as the uncertainties increased. When C3 = 0.5, Q was closest to the theoretical value (1895), and PMF performed better than at the other C3 levels.
A summary of the PMF performance (C3 = 0.5) is shown in Table 2(A), and the extracted factors are shown in Table S8(A). Factor 1 showed a strong association with Ca and was identified as cement dust. Factor 2 had high concentrations of crustal elements (i.e., Al, Si, and Fe) and TC. Al, Si, and Fe suggest the impact of soil dust, while TC typically indicates vehicular exhaust. The relatively high concentrations of Al, Si and TC may also suggest a coal-combustion source. Therefore, Factor 2 in Test A was likely a mixed source of soil dust, coal combustion and vehicular exhaust. NO3⁻ and SO4²⁻ showed high weights in Factors 3 and 4, which were identified as secondary nitrate and secondary sulfate, respectively. Factor 5 was associated with Zn, a marker of tire wear. Overall, with the traditional empirical method, three important sources (soil dust, coal combustion and vehicular exhaust) were combined in one factor. However, various other sources contributed more than 80% of the total PM2.5, suggesting less favorable model results.
For Test B, uncertainties were constructed by decreasing the C3 of selected suitable source markers. Based on field surveys and related studies (Tian et al., 2013b), the primary source categories in Chengdu are likely to be soil dust, vehicular exhaust, coal combustion, secondary sulfate, secondary nitrate and cement dust. Thus, the C3 values were decreased for the markers of these sources, including Al, Si, Ca, TC, NO3⁻ and SO4²⁻. A summary of the modeled results is shown in Table 2(B), and the extracted source profiles and source contributions are shown in Table S8(B). Factor 1 was associated with Al, Si, Fe and TC and was therefore identified as soil dust and coal combustion, which were not separated because of their collinearity. Factor 2 was a mixed source of cement dust and tire wear, indicated by the markers Ca and Zn. Factors 3 and 4 were secondary nitrate and secondary sulfate, respectively. Factor 5 was vehicular exhaust, with a high concentration of TC. In this test, vehicular exhaust was separated from the mixed source, while cement dust and tire wear were combined in one factor. For users, distinguishing important sources is generally more valuable than separating less important ones. The correlation coefficient between the measured and estimated total PM in Test B (R = 0.95) is also higher than that in Test A (R = 0.90). These results confirm that subjective emphasis on suitable tracers can improve the performance of the PMF model.
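The R values in Table 2 compare measured and estimated total PM concentrations. A minimal sketch of that comparison follows, assuming that the estimated total for each sample is the sum of its factor contributions (in mass units) and that R is the ordinary Pearson correlation coefficient; the function names are hypothetical.

```python
import math


def estimated_totals(g):
    """Estimated total PM per sample: sum of the factor contributions g_ik."""
    return [sum(row) for row in g]


def pearson_r(measured, estimated):
    """Pearson correlation coefficient between measured and estimated totals."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    s_me = sum((a - mean_m) * (b - mean_e) for a, b in zip(measured, estimated))
    s_mm = sum((a - mean_m) ** 2 for a in measured)
    s_ee = sum((b - mean_e) ** 2 for b in estimated)
    return s_me / math.sqrt(s_mm * s_ee)
```

A high R only shows that the factors reproduce total mass; as Test C illustrates, it does not confirm that the factors correspond to real sources.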
Nonetheless, decreasing C3 for source markers also poses risks: if the C3 of an unsuitable species is decreased, incorrect results may be obtained. In Test C, the C3 of Fe was decreased subjectively. As a result, a distinct factor with Fe as its tracer was extracted and showed the highest contribution to PM2.5 (45.02%), with a Q value (1861.96) close to the theoretical value and a high R (0.93) (Table 2(C) and Table S8(C)). However, field surveys and related studies indicated that the contribution of Fe-industry sources to PM2.5 is in fact low. Hence, the results in Table 2(C) are not considered representative. This finding is consistent with the synthetic results: if the uncertainties of unsuitable species are decreased, a counterfeit factor may be extracted, and Q may not reflect the disruptive consequences. Therefore, decreasing C3 should only be considered for suitable species. Such judgments require a good understanding of the monitoring area and may call for comprehensive field investigations.

CONCLUSIONS
In this study, the effects of uncertainty estimation on the performance of the PMF model were systematically investigated using synthetic and ambient PM2.5 datasets. The results suggest that both errors in the datasets and uncertainty inputs affect PMF performance. Uncertainties should be estimated based on the characteristics of the datasets being analyzed, and the appropriate uncertainty level increases as the errors in the datasets increase. In addition, the benefits and risks of subjectively emphasizing species were investigated. Reducing the uncertainties of appropriate tracers is likely to improve PMF performance. However, if the uncertainties of unsuitable species are decreased, the results may be adversely affected: counterfeit factors may be extracted, and the Q metric might not reflect the disruptive consequences. These results may help investigators better estimate uncertainties for PMF models, thus improving model performance. Furthermore, robust analytical methods are essential to ensure that the input data are representative and reliable.

Fig. 1. AAEs of ten tests and their average values for different uncertainty levels (C3) at various error levels (H_x = 0.025-0.2) in SCE 1. Cases with the best Q (Q closest to the theoretical Q) are marked for each H_x.

Fig. 3. Source profiles and Q values of one example dataset (H_x = 0.15) estimated by PMF using different uncertainties in SCE 1 (C3 = 0.15 for all species) and in SCE 4 (Fe incorrectly emphasized with C3 = 0.075; C3 = 0.15 for the other species).

Table 1. Three ambient particle sources and their source profiles used to develop the synthetic datasets.

Table 2. Tracers, source categories and Q values estimated by PMF in the three tests for the ambient PM2.5 dataset collected in Chengdu, China. The settings of the three tests and the correlation coefficients (R) between measured and estimated total PM concentrations are also listed.