Accuracy of Advanced and Traditional Three-Way Factor Analysis Models for Determining Source Contributions to Particulate Matter

Although three-way factor analysis models can take more information into account, their accuracy requires further investigation. We simulated numerous synthetic datasets to evaluate the traditional (PMF3) and advanced (AAB and ABB) three-way models. First, scenarios in which sources share the same profiles but have different emission patterns were constructed and introduced into PMF3 and the advanced AAB model, which estimates a common source profile matrix B but different emission pattern matrices A. Second, datasets with the same emission patterns but different profiles were constructed to simulate source profile variability and were used to evaluate the advanced ABB model, which estimates a common emission pattern matrix A but different source profile matrices B. The average absolute errors (AAEs) of PMF3 under the two conditions ranged from 2.95% to 90.22% and from 2.98% to 90.11%, respectively, while those of the advanced three-way models were 2.88%–27.51% and 2.89%–29.89%, respectively. PMF3 performed well when all sources showed strongly correlated emission patterns and source profiles were similar. The advanced AAB model was stable under various emission patterns, and the ABB model was stable under different degrees of source profile variability. In addition, several ambient datasets were analyzed using the advanced three-way models. Our findings suggest that advanced three-way models make full use of spatial or size distribution information to enhance the capacity to identify source categories.


INTRODUCTION
Atmospheric particulate matter (PM) is considered to be a major pollutant worldwide (Shen et al., 2014; Srithawirat and Brimblecombe, 2015; Meena et al., 2016). Numerous studies have described adverse effects of PM on public health, visibility, climate change and environmental quality (Zheng et al., 2007). To improve urban air quality, environmental management departments typically pay close attention to PM emission sources, which have become a topic of scientific debate worldwide (Huang et al., 2014; Kchih et al., 2015; Malaguti et al., 2015). However, the individual contribution of each source category to PM cannot be measured directly (Shen et al., 2014). Therefore, various source apportionment methods have been developed and applied (Zheng et al., 2011; Shen et al., 2014; Taiwo, 2016; et al., 2003; Abdollahi et al., 2010). Standard bilinear PMF has been used to resolve the source categories and their contributions to PM (Mantas et al., 2014), while PMF3, one of the most commonly used three-way receptor models, typically employs the PARAFAC (Parallel Factor Analysis) strategy (Paatero, 1997).
As discussed in related studies and in our previous work (Tian et al., 2014), the PARAFAC approach can extract only a single source profile matrix (matrix B) and a single source emission pattern matrix (matrix A) for different sites or for different sizes of PM, which might be inconsistent with actual conditions. Multi-size or multi-site three-way datasets, with temporal variability, chemical species and size distribution or spatial variability as dimensions, may provide more information for source apportionment (Pu et al., 2015; Parthasarathy et al., 2016). Different three-way datasets generally present different characteristics. For multi-size three-way datasets, source compositions depend on particle size (Peré-Trepat et al., 2007), meaning that source profiles may vary slightly between sizes, though the time series of daily contributions for one source category should be highly consistent (Shi et al., 2015). For multi-site three-way datasets, when sites fall within a certain geographical range (e.g., a city), the fundamental assumption is that source profiles for the receptors are identical; however, the time series of daily contributions for one source category can differ between sites. When the characteristics of the various three-way datasets are fully used, the accuracy of source apportionment results may be improved.
To take better advantage of three-way dataset information and to reflect real conditions, an advanced three-way factor analysis method based on ME2 (multilinear engine 2) was proposed and applied (Shi et al., 2015; Peré-Trepat et al., 2007). This advanced model can take size-composition variations into account; that is, it can obtain one emission pattern matrix (matrix A) but different source profile matrices (matrix B) for multi-size three-way datasets. Thus, it is referred to as the ABB three-way factor analysis model. A similar model can generate the same source profile matrix but different emission pattern matrices for multi-site three-way datasets, and it is thus referred to as the AAB three-way factor analysis model. The ABB model is suitable for multi-size datasets and the AAB model is appropriate for multi-site datasets, so that each addresses real conditions and takes full advantage of three-way dataset information.
The accuracy of three-way factor analysis models is central to source apportionment and to further health-related studies. However, advanced three-way factor analysis models have not yet been systematically explored, and assessments of their accuracy are scarce. In fact, it may be nearly impossible to fully determine the "true" contributions of sources under real ambient conditions. It is thus difficult to use real ambient datasets to explore the performance of models. Therefore, researchers have typically conducted studies with artificial datasets and simulations (Shi et al., 2011; Tian et al., 2014).
In this work, extensive synthetic datasets for several scenarios involving two aspects (varying emission patterns and varying degrees of source profile variability) are developed and analyzed using the advanced AAB and ABB three-way factor analysis models and the traditional PMF3 to evaluate the accuracy of three-way factor analysis models. The main purpose of this paper is to study the application potential of advanced three-way models. We, in turn, propose protocols of model selection and application for use in environmental management departments.

Advanced Three-Way Factor Analysis Models
The traditional PMF3 is based on the PARAFAC model, which can be expressed as follows (Hopke et al., 1998):

$$x_{ijk} = \sum_{p=1}^{P} a_{ip}\, b_{jp}\, c_{kp} + e_{ijk} \qquad (1)$$

where $x_{ijk}$ is an element of X, the three-way matrix of ambient concentrations (µg m⁻³) for different types of PM (e.g., sizes or sites); $a_{ip}$ is an element of A, the source contribution matrix, namely the emission pattern matrix; $b_{jp}$ is an element of B, the source profile matrix; $c_{kp}$ is an element of C, the fractional factor for different PM types; $e_{ijk}$ is the residual; and P is the number of sources.
According to the equation, for PM of different types, the same source profile matrix (B) and the same emission pattern matrix (A) are extracted through traditional PARAFAC.
The main principle of the advanced AAB three-way factor analysis model can be described as follows (Peré-Trepat et al., 2007):

$$x_{ijk} = \sum_{p=1}^{P} a_{ipk}\, b_{jp} + e_{ijk} \qquad (2)$$

where $a_{ipk}$ is the estimated contribution of the pth source to the ith sample for the kth PM type, and $b_{jp}$ is the mass fraction of the jth species in the pth source profile. The AAB model can estimate the same source profile matrix B (the matrix of $b_{jp}$) but different emission pattern matrices A (the matrices of $a_{ipk}$) for PM of different types. Similarly, the ABB model can be described as follows:

$$x_{ijk} = \sum_{p=1}^{P} a_{ip}\, b_{jpk} + e_{ijk} \qquad (3)$$

where $x_{ijk}$ is the concentration of the jth species in the ith sample for the kth PM type; $a_{ip}$ is the estimated contribution of the pth source to the ith sample; $b_{jpk}$ is the mass fraction of the jth species in the pth source profile for the kth PM type; and $e_{ijk}$ is the residual. From the ABB model, the same emission pattern matrix A (the matrix of $a_{ip}$) but different source profile matrices B (the matrices of $b_{jpk}$) can be obtained for PM of different types.
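The structural difference between the three models can be seen by building the noise-free part of each one with NumPy. This is a minimal sketch: the random factor values and the variable names are illustrative assumptions, and only the array dimensions (300 samples, 24 species, 2 PM types, 3 sources) follow the synthetic datasets described later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, P = 300, 24, 2, 3  # samples, species, PM types, sources

# PARAFAC (PMF3): one A, one B, and a per-type scaling factor C.
A = rng.random((I, P))      # a_ip
B = rng.random((J, P))      # b_jp
C = rng.random((K, P))      # c_kp
X_parafac = np.einsum("ip,jp,kp->ijk", A, B, C)

# AAB: type-specific contributions a_ipk, shared profiles b_jp.
A_k = rng.random((I, P, K))
X_aab = np.einsum("ipk,jp->ijk", A_k, B)

# ABB: shared contributions a_ip, type-specific profiles b_jpk.
B_k = rng.random((J, P, K))
X_abb = np.einsum("ip,jpk->ijk", A, B_k)

print(X_parafac.shape, X_aab.shape, X_abb.shape)
```

Each slice `X_aab[:, :, k]` is an ordinary two-way product of that type's contribution matrix with the shared profile matrix, which is why the AAB and ABB models can be viewed as two-way models arranged on a three-way array.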
The advanced three-way factor analysis models were derived from the multilinear engine 2 (ME2), a flexible program first developed by Paatero (1999). It serves the same purpose as general factorization methods and has typically been used to solve positive matrix factorization (PMF) problems. Detailed descriptions of ME2 are included in the related literature (Paatero, 1999; Ramadan et al., 2003; Amato and Hopke, 2012) and in our previous work (Tian et al., 2014; Shi et al., 2015). The advanced AAB and ABB models evolved from a logical two-way model, but they are organized as three-way arrays, and the input data should be three-way, characterized by composition, time and PM type (Peré-Trepat et al., 2007).

Synthetic Dataset Development and Data Analysis
To evaluate the accuracy of the advanced three-way factor analysis models, numerous synthetic datasets for different scenarios were developed. For the purposes of the present work, two types of synthetic datasets were developed: (1) datasets with different site-to-site correlations of emission patterns, used to evaluate the performance of the AAB model, and (2) datasets with differing degrees of source profile variability, used to evaluate the performance of the ABB model.
The method used to construct the synthetic datasets is based on the published literature (Habre et al., 2011; Tian et al., 2014; Shi et al., 2015). Two PM types, denoted as site I and site II (or size I and size II), were examined. For each PM type, the corresponding receptor concentration matrix $X_{m \times n}$ was developed according to:

$$X_{m \times n} = G_{m \times p}\, F_{p \times n} + E_{m \times n} \qquad (4)$$

where $G_{m \times p}$ is the source contribution matrix; m is the number of daily samples; p is the number of sources; $F_{p \times n}$ is the source profile matrix; n is the number of species; and $E_{m \times n}$ is the random perturbation matrix used to simulate analytical and measurement errors. Due to concerns related to feasibility and in accordance with previous related studies (Javitz et al., 1988; Habre et al., 2011), the collinearity of source profiles and contributions was not taken into account. For $F_{p \times n}$, the three source categories used in the present work are vehicular exhaust (denoted as VE), secondary sulfate (SS) and crustal dust (CD). Actual source profiles of vehicular exhaust and crustal dust measured in our previous work (Shi et al., 2015) were applied. The profile of secondary sulfate was largely expressed as pure (NH4)2SO4. The source profiles used to simulate the three-way datasets (referred to as actual source profiles) are shown in Table S1. Thus, p = 3 and n = 24. For the construction of $G_{m \times p}$, the Monte Carlo method was employed to randomly draw the contributions of each source category from a normal distribution with a set mean and standard deviation (Habre et al., 2011). Three hundred daily samples were simulated (m = 300). The coefficient of variation (CV, defined as the standard deviation of the species concentration divided by its mean) was used to perturb receptor concentrations. Following Javitz et al. (1988), a CV of 10% was used in all of the concentration matrices, and random uncertainties were generated from a log-normal distribution. The ambient concentration matrices of the two PM types were combined to form each three-way input as a 300 × 24 × 2 array denoting temporal, compositional and type variability.
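The construction described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' code: the profile matrix is a random placeholder (the paper uses the measured profiles of Table S1), the means and standard deviations of the contributions are hypothetical, negative draws are truncated at zero, and the log-normal perturbation is applied multiplicatively with unit mean so that its CV equals the chosen value.

```python
import numpy as np

def simulate_receptor_matrix(G, F, cv=0.10, rng=None):
    """Return G @ F perturbed by multiplicative log-normal noise with the given CV."""
    rng = np.random.default_rng() if rng is None else rng
    X_true = G @ F
    sigma2 = np.log(1.0 + cv ** 2)  # log-space variance that yields the target CV
    noise = rng.lognormal(mean=-0.5 * sigma2, sigma=np.sqrt(sigma2), size=X_true.shape)
    return X_true * noise           # the error term is implicit: E = X - G @ F

rng = np.random.default_rng(42)
m, p, n = 300, 3, 24                # samples, sources, species

# Placeholder profiles (each row sums to 1); hypothetical stand-in for Table S1.
F = rng.dirichlet(np.ones(n), size=p)

# Hypothetical means and standard deviations for VE, SS and CD (ug m^-3).
means = np.array([40.0, 35.0, 40.0])
sds = np.array([10.0, 9.0, 10.0])

def draw_contributions():
    G = rng.normal(loc=means, scale=sds, size=(m, p))
    return np.clip(G, 0.0, None)    # assumption: negative draws are set to zero

G_site1, G_site2 = draw_contributions(), draw_contributions()
X = np.stack(
    [simulate_receptor_matrix(G_site1, F, rng=rng),
     simulate_receptor_matrix(G_site2, F, rng=rng)],
    axis=2,
)
print(X.shape)  # time x species x PM type
```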
To systematically evaluate the performance of the AAB model and to examine the impacts of emission patterns on the advanced AAB and traditional PMF3 models, datasets with different site-to-site correlations of emission patterns were constructed based on four scenarios as follows (Table S2). SCE 1: for each source category, site-to-site correlations of daily contributions between the two types were set to zero to simulate emission patterns that are completely inconsistent between PM types; SCE 2: site-to-site correlations of daily contributions from secondary sulfate were set to 0.6 and the others were set to 0 to simulate one source exhibiting a moderate correlation between the two types; SCE 3: site-to-site correlations for the three source categories were set to 0.6 to simulate three sources exhibiting moderate site-to-site correlations between the two types; SCE 4: site-to-site correlations for the three source categories were set to 0.9 to simulate three sources exhibiting strong site-to-site correlations.
The site-to-site correlation of daily contributions between two sites (R) was denoted as:

$$R = \mathrm{CORREL}\left(g_{ij}^{\mathrm{Site\ I}},\; g_{ij}^{\mathrm{Site\ II}}\right) \qquad (5)$$

where CORREL denotes Pearson's correlation coefficient; $g_{ij}^{\mathrm{Site\ I}}$ is the simulated contribution of the jth source to the ith sample for Site I; and $g_{ij}^{\mathrm{Site\ II}}$ is the simulated contribution of the jth source to the ith sample for Site II.
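One way to obtain contribution series with a prescribed site-to-site correlation is to mix two independent standard normal draws. The paper does not state how the target correlations were imposed, so the function below, its name and the mean and standard deviation used are assumptions for illustration only.

```python
import numpy as np

def correlated_contributions(n, mean, sd, r, rng):
    """Draw two normal series of length n whose population correlation is r."""
    z1 = rng.standard_normal(n)
    z2 = rng.standard_normal(n)
    g_site1 = mean + sd * z1
    # r * z1 + sqrt(1 - r^2) * z2 has unit variance and correlation r with z1.
    g_site2 = mean + sd * (r * z1 + np.sqrt(1.0 - r ** 2) * z2)
    return g_site1, g_site2

rng = np.random.default_rng(7)
for target in (0.0, 0.6, 0.9):  # the correlation levels used in SCE 1 to SCE 4
    g1, g2 = correlated_contributions(300, mean=40.0, sd=10.0, r=target, rng=rng)
    R = np.corrcoef(g1, g2)[0, 1]  # Pearson correlation, i.e. CORREL
    print(f"target R = {target:.1f}, sample R = {R:.2f}")
```

With 300 samples the sample correlation scatters around the target, which is one reason several replicate datasets per scenario are useful.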
To systematically evaluate the performance of the ABB model and the influence of source profile variability on advanced and traditional three-way factor analysis models, datasets with varying degrees of source profile variability were developed. CVs of 10%, 30% and 50% were used to simulate three degrees of source profile variability. To represent uniform emission patterns, the correlations of daily contributions between the two PM types were set to 0.9 for the three source categories.
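The effect of the three CV levels on profile similarity can be illustrated with a short sketch. The paper does not specify the distribution used to perturb the profiles, so the unit-mean log-normal noise below (mirroring the concentration perturbation), the function name and the placeholder profiles are all assumptions.

```python
import numpy as np

def perturb_profiles(F, cv, rng):
    """Multiply each profile entry by unit-mean log-normal noise with the given CV."""
    sigma2 = np.log(1.0 + cv ** 2)
    noise = rng.lognormal(mean=-0.5 * sigma2, sigma=np.sqrt(sigma2), size=F.shape)
    return F * noise

rng = np.random.default_rng(3)
F_size1 = rng.dirichlet(np.ones(24), size=3)  # placeholder: 3 sources x 24 species

for cv in (0.10, 0.30, 0.50):
    F_size2 = perturb_profiles(F_size1, cv, rng)
    # Correlation between the two sizes' profiles for each source.
    similarity = [np.corrcoef(F_size1[s], F_size2[s])[0, 1] for s in range(3)]
    print(f"CV = {cv:.0%}: profile correlations = {np.round(similarity, 3)}")
```

Larger CVs tend to lower the correlation between the profiles of the two sizes, which is the condition under which a single shared profile matrix becomes a poor description.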
To guarantee stability, representativeness and feasibility, ten synthetic datasets were generated for each scenario, creating a total of 70 synthetic datasets.
The 40 synthetic datasets with different site-to-site correlations of emission patterns were introduced into the advanced AAB and traditional PMF3 models. The 30 synthetic datasets presenting different degrees of source profile variability were analyzed using the ABB and PMF3 models.
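Indicators used to estimate the performance of the three-way models included the Pearson correlation coefficient (COR) and the average absolute error (AAE) between actual (simulated) and modeled daily contributions. The two indicators were defined as follows (Javitz et al., 1988):

$$\mathrm{COR} = \frac{\sum_{i=1}^{n} (E_{ij} - \bar{E}_j)(T_{ij} - \bar{T}_j)}{\sqrt{\sum_{i=1}^{n} (E_{ij} - \bar{E}_j)^2}\,\sqrt{\sum_{i=1}^{n} (T_{ij} - \bar{T}_j)^2}} \qquad (6)$$

$$\mathrm{AAE} = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| E_{ij} - T_{ij} \right|}{T_{ij}} \times 100\% \qquad (7)$$

where $E_{ij}$ is the estimated contribution from the jth source category to the ith sample for each PM type; $T_{ij}$ is the simulated contribution (namely the true contribution) corresponding to $E_{ij}$; $\bar{E}_j$ and $\bar{T}_j$ are the means of $E_{ij}$ and $T_{ij}$ over the samples; and n is the number of samples, equal to 300 in this work. In addition, ambient datasets analyzed with the advanced three-way models in our prior work were used to study multi-dimensional source apportionment patterns.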
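The two indicators are straightforward to compute for one source category at one site. The sketch below assumes that the AAE is the mean relative absolute error expressed in percent, which is consistent with the percentage values reported in the paper; the function names and the small example values are illustrative.

```python
import numpy as np

def cor(estimated, true):
    """Pearson correlation between estimated and true daily contributions."""
    return float(np.corrcoef(estimated, true)[0, 1])

def aae(estimated, true):
    """Average absolute error in percent: mean of |E - T| / T over the samples."""
    estimated = np.asarray(estimated, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.mean(np.abs(estimated - true) / true) * 100.0)

# Hypothetical true and estimated contributions for one source (ug m^-3).
true = np.array([10.0, 20.0, 30.0, 40.0])
estimated = np.array([11.0, 18.0, 33.0, 36.0])
print(f"COR = {cor(estimated, true):.3f}, AAE = {aae(estimated, true):.1f}%")
```

Because the AAE divides by the true contribution, it is undefined for samples whose true contribution is zero; such samples would need to be excluded or handled separately.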

Evaluation of the AAB Three-Way Model for Different Site-to-Site Emission Pattern Correlations
To investigate the performance of the advanced AAB model for different site-to-site correlations of source contributions, 40 synthetic datasets based on 4 scenarios (with different site-to-site correlations of source contributions) were introduced into the advanced AAB and traditional PMF3 models, and the results were then compared.
The mean estimated contributions of the three source categories and the TC (total concentration of PM: the sum of the contributions from the three source categories) estimated by the advanced AAB and traditional PMF3 models are shown in Fig. 1. As discussed in our prior work (Shi et al., 2015), the PMF3 results are rather divergent and unstable, with high standard deviations. PMF3 performed very poorly, especially for SCE 1 and 2. In contrast, the performance of the advanced AAB model was satisfactory for all site-to-site correlations of source contributions. For SCE 1, the contributions of VE, SS, CD and TC estimated by the AAB model were 40.65 ± 4.10, 35.79 ± 2.81, 43.12 ± 4.40 and 119.55 ± 1.00 µg m⁻³, respectively, for Site I and 27.82 ± 3.47, 35.35 ± 3.25, 26.93 ± 3.06 and 90.11 ± 0.60 µg m⁻³ for Site II, while those estimated by PMF3 were 36.13 ± 10.21, 62.58 ± 9.40, 17.68 ± 15.66 and 116.39 ± 5.21.00 µg m⁻³ for Site I and 16.16 ± 24.84, 50.73 ± 29.65, 21.04 ± 14.09 and 87.93 ± 4.15 µg m⁻³ for Site II. Similar patterns appeared in the other scenarios. The mean contributions estimated by the advanced AAB model were very consistent with the true values, and the standard deviations were very small, denoting stable and accurate AAB results for different site-to-site correlations of source contributions.
To further evaluate whether the advanced AAB model and traditional PMF3 function well for different site-to-site correlations of daily source contributions, and to investigate the influence of site-to-site correlations on them, CORs and AAEs between simulated and estimated contributions were calculated. Fig. 2 presents the CORs of each source category and TC for the four scenarios at Site I (SI), and Fig. S1 presents those for Site II (SII). For all source categories and TC, the CORs of PMF3 were clearly unstable. For most fittings, especially those in SCE 1 and 2, the contributions estimated by PMF3 diverged from the true values; PMF3 performed better in SCE 3 and best in SCE 4. Thus, the solutions of the traditional PMF3 are suitable only for high site-to-site correlations of source contributions. For the advanced AAB model, stable results with high CORs were found for all source categories at both sites. CORs ranged from 0.68 to 0.92 for vehicular exhaust, from 0.70 to 0.95 for secondary sulfate, and from 0.68 to 0.94 for crustal dust in SCE 1; from 0.79 to 0.89 for vehicular exhaust, from 0.77 to 0.94 for secondary sulfate, and from 0.77 to 0.92 for crustal dust in SCE 2; from 0.83 to 0.92 for vehicular exhaust, from 0.75 to 0.95 for secondary sulfate, and from 0.81 to 0.94 for crustal dust in SCE 3; and from 0.82 to 0.92 for vehicular exhaust, from 0.82 to 0.94 for secondary sulfate, and from 0.79 to 0.94 for crustal dust in SCE 4. The CORs of TC estimated by the advanced AAB model exceeded 0.94 in all cases, which means that the results of the model were stable even though the site-to-site correlations of source contributions varied markedly. Furthermore, the AAEs shown in Figs. 3 and S2 are in good agreement with the CORs. In SCE 1 and 2, the AAEs of PMF3 presented larger ranges with high mean values, suggesting that PMF3 performs poorly when site-to-site correlations of source contributions are low. When site-to-site correlations of all source contributions were higher (SCE 3 and 4), the AAEs of PMF3 were relatively lower. These results show that the site-to-site correlations of source contributions strongly influence the performance of PMF3, which performs better when all sources show strong site-to-site associations. For the advanced AAB model, however, AAEs were stable. In SCE 1, the AAEs of the three source categories ranged from 12.30% to 39.49% with an average of 21.89% for SI and from 12.12% to 27.55% with an average of 19.70% for SII. In SCE 2, they ranged from 11.59% to 34.65% with an average of 19.36% for SI and from 12.28% to 27.69% with an average of 17.99% for SII. In SCE 3, they ranged from 10.48% to 35.88% with an average of 19.95% for SI and from 10.29% to 29.72% with an average of 18.81% for SII. In SCE 4, they ranged from 9.97% to 36.93% with an average of 20.81% for SI and from 10.11% to 40.65% with an average of 19.82% for SII. The AAEs of TC fell below 5% at both sites.
These results can be explained by the different principles underlying PMF3 and the advanced AAB model. According to Eq. (2), the AAB model takes differences in source contributions across the surveyed region fully into account, while PMF3 extracts only one matrix A (related to source contributions) for all of the sites; i.e., for each source, the site-to-site correlation of contributions is implicitly unity. Therefore, PMF3 can perform well only when all sources show strong site-to-site correlations among all investigated sites. In contrast, site-to-site correlations of source contributions do not influence the results of the advanced AAB model. Thus, both models can be used when all sources show high site-to-site correlations, as is the case during periods of regional pollution. When differences in source contributions increase, the performance of PMF3 worsens while the results of the advanced AAB model remain stable. When the datasets share the same source profiles but differ in their emission patterns, the AAB three-way model is appropriate.
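The unit-correlation constraint follows directly from the trilinear structure: under PARAFAC, the contribution of a source at each site is the shared time series scaled by a positive site-specific factor, so the series at any two sites are perfectly correlated. A minimal numerical check, with hypothetical values for the shared series and the site factors:

```python
import numpy as np

rng = np.random.default_rng(1)

# Shared emission pattern a_ip for one source (hypothetical values, ug m^-3).
a_p = rng.normal(40.0, 10.0, size=300)

# Hypothetical positive site factors c_kp for the two sites.
c_site1, c_site2 = 1.0, 0.7

# Under PARAFAC the site-specific contributions are a_ip * c_kp.
g_site1 = a_p * c_site1
g_site2 = a_p * c_site2

r = np.corrcoef(g_site1, g_site2)[0, 1]
print(f"site-to-site correlation implied by PARAFAC: {r:.6f}")
```

Any dataset whose true site-to-site correlation is well below one therefore cannot be represented exactly by a single shared matrix A, whereas the AAB model has a separate contribution series for each site.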
To evaluate the performance of the advanced AAB three-way factor analysis model on an ambient dataset, the results of the AAB model obtained in our other work were examined. Ambient PM2.5 samples were collected from two sites in Tianjin, a megacity in China. The ambient datasets for the two sites (three-way datasets) were introduced into the AAB three-way model. The results of the AAB three-way model are summarized in Figs. S5 and S6. According to the source profiles estimated by the AAB three-way model, six factors were identified: sea salt; secondary sulfate and secondary organic carbon; vehicular exhaust; crustal dust and cement dust; coal combustion and biomass burning; and secondary nitrate. In addition, the coastal and inland site datasets were apportioned separately by PMF (PMF2). Similar factors and source profiles were obtained at the coastal site, but the sea salt factor could not be extracted at the inland site. Overall, the results of the advanced AAB three-way model are consistent with those of the widely used PMF2. However, the AAB three-way model can make full use of spatial information to facilitate source category identification.

ABB Model Evaluation for Different Source Profile Variability Levels
As studied in our prior work, source profile variability should also affect three-way source apportionment. Thus, the advanced ABB model was employed to solve this problem. In this section, 30 synthetic datasets for 3 scenarios (with varying degrees of source profile variability: CV = 10%, 30% and 50%) were analyzed using the advanced ABB and traditional PMF3 models. The corresponding results were analyzed.
Due to differing assumptions and fundamental principles, the PMF3 and advanced ABB model performed differently.
For the advanced ABB model, source profiles are established individually for each size, while PMF3 shares the same source profiles across all sizes. This difference explains the results. PMF3 performed well only when the source profiles of all sizes were strongly correlated. As the receptor concentration disturbance increased, source profile similarities weakened and the performance of PMF3 deteriorated considerably. However, the results of the advanced ABB model remained robust even though the correlation between source profiles varied. The ABB three-way model requires similar emission patterns but allows different source profiles. Furthermore, the ABB three-way model was applied to an ambient dataset reported in our previous work (Tian et al., 2016). PM10 and PM2.5 were sampled synchronously in Chengdu, a megacity in southern China. The ambient dataset for each single size was analyzed by PMF2, and the three-way dataset of the two sizes was analyzed by the ABB three-way model. The percentage source contributions estimated by the two models are summarized in Fig. S7. For both the PMF2 and ABB three-way models, cement dust was enriched in the coarse fraction, and vehicular exhaust, secondary nitrate and SOC were enriched in the fine fraction. It is interesting to note several differences between the size distributions determined by the two models. The contributions of secondary sulfate fitted by PMF2 were similar in fine and coarse PM, whereas the secondary sulfate resolved by the ABB three-way model was clearly less enriched in PM10 than in PM2.5. According to related studies (Arimoto et al., 1996; Srimuruganandam and Shiva Nagendra, 2012), it can be concluded that the size distribution of percentage source contributions (defined as the percentage of the corresponding PM concentration accounted for by a given source contribution) estimated by the advanced ABB three-way model may be more reasonable than that of PMF2. This may be explained by the fact that the three-way model can consider internal information on chemical species, temporal variability and PM sizes, while two-way models analyze the dataset of each size independently, resulting in a loss of information between sizes.

CONCLUSION
To study the accuracy of advanced and traditional three-way factor analysis models, simulated three-way blocks were constructed and modeled by PMF3 and the advanced AAB and ABB models. Based on the results and discussion above, it can be concluded that emission patterns and the variability of source profiles across particle sizes influence the performance of three-way models and the accuracy of the estimated results. PMF3 performed well only when all sources showed strong site-to-site correlations among all investigated sites and when the source profiles of all sizes were strongly correlated. In contrast, the advanced three-way factor analysis models were not influenced by emission pattern or source profile variability. These findings may serve as valuable information for environmental scientists as they carry out multi-site and multi-size source apportionment tasks. Ambient datasets were also used to evaluate the performance of the advanced three-way factor analysis models. From the results of our other work on Chengdu and Tianjin, we conclude that the results of the advanced three-way models are not only consistent with those of the internationally used PMF2 but also make full use of spatial or size information to facilitate source category identification.
The main focus of our analysis was to systematically investigate the accuracy of the models examined. Our findings depend on the features of the simulated data analyzed. Although ambient datasets were also analyzed, these results may not fully reflect the performance of the models on real particle concentration data. More studies of the advanced three-way models should be conducted with real particle concentration data. Furthermore, additional applications of the advanced three-way models to other characteristic three-way datasets should be explored.

Fig. 1. The source contributions estimated by PMF3 and the advanced AAB model in four scenarios with different site-to-site correlations of emission patterns.

Fig. 3. The AAEs between simulated and predicted factor contributions for the advanced AAB model and PMF3 at Site I.

Fig. 4. The source contributions estimated by PMF3 and advanced ABB model in three CVs and two sizes of PM.

Fig. 5. The CORs between simulated and predicted factor contributions for the advanced ABB model and PMF3 in Size I.

Fig. 6. The AAEs between simulated and predicted factor contributions for the advanced ABB model and PMF3 in Size I.