Langley Calibration of Sunphotometer using Perez's Clearness Index at Tropical Climate

In the tropics, Langley calibration is often complicated by abundant cloud cover. The lack of an objective and robust cloud-screening algorithm is especially problematic at tropical sites, where short-lived, thin cirrus clouds are frequent. The resulting errors can be misleading and go undetected unless the best-fit line of each Langley regression is scrutinized individually. In this work, we introduce a new method to improve sun photometer calibration beyond the Langley uncertainty in a tropical climate. A total of 20 Langley plots were collected using a portable spectrometer at a mid-altitude (1,574 m a.s.l.) tropical site at Kinabalu Park, Sabah. The data were added to Langley plots daily, and the characteristics of each Langley plot were carefully examined. Our results show that the time series of the calculated Perez index evolved gradually for a good Langley plot, whereas days with poor Langley data showed the opposite behavior. Taking advantage of this fact, the possibly contaminated data points were filtered by calculating the Perez derivative at each distinct air mass until a negative value was obtained. Any points that exhibited a negative derivative were considered bad data and discarded from the Langley regression. The implementation was completely automated and objective, rendering qualitative observation unnecessary. The improved Langley plots show higher values of the correlation, R, and lower values of the aerosol optical depth, τa. The proposed method is sensitive enough to identify the occurrence of very short and thin cirrus clouds and is particularly useful for sun photometer calibration in a tropical climate.


INTRODUCTION
A sun photometer is an electronic device used to measure direct sun irradiance within a narrow spectral band. It is used to derive the atmospheric transmission along the optical path length. The atmospheric transmission profile is useful in retrieving the aerosol optical depth (AOD), which is an important radiative forcing parameter of the climate system (Guleria and Kuniyal, 2015). A sun photometer is also useful for measuring the optical depth of optically thin clouds (Guerrero-Rascado et al., 2013), the Angstrom exponent (Kaskaoutis and Kambezidis, 2008) and precipitable water vapor (Li et al., 2016). On a global scale, CIMEL sun photometers and PREDE sky radiometers are extensively employed to study the heterogeneity in columnar aerosol characteristics (Devara et al., 2013). Despite these many applications, one of the main challenges for most sun photometer networks is the calibration of the instrument itself (Holben et al., 1998). Calibration is imperative not only before and after a measurement campaign but also on a regular basis, because the calibration constant may shift over time (Reynold et al., 2001). This shift is detectable as a permanent change in the calibration constant of 2-6% in 1.3 years for the 440 to 1640 nm channels and 6-7% for the 340 and 380 nm channels, mainly caused by the degradation of the filters (Li et al., 2009). The most economical and simplest way to calibrate a sun photometer is the Langley method. It is based on extrapolating the sun photometer's diurnal signal to zero air mass within a suitable range of air masses. The extrapolated value predicts the extraterrestrial constant, which is used to convert the instrument's readings into physical units or to retrieve the aerosol optical depth directly after subtracting the contributions of other important optical depths. However, this method requires perfectly clean and clear sky conditions for an accurate extrapolation to zero air mass. Ideally, it is performed at high altitudes (> 3,000 meters above sea level) to guarantee such conditions.
Here, we introduce a new method to improve sun photometer calibration beyond the Langley uncertainty in tropical climates. We emphasize the tropical climate because Langley calibration in the tropics is often complicated by abundant cloud cover. In this work, a total of 20 Langley plots were collected using a portable spectrometer at a mid-altitude (1,574 m a.s.l.) tropical site at Kinabalu Park, Sabah. The data were added to Langley plots daily, and the characteristics of each Langley plot were carefully examined. We were able to identify some consistent patterns exhibited by a good Langley plot. These patterns are useful in characterizing the behavior of a good Langley plot and in further improving the Langley calibration. The details of these patterns are discussed in this paper.

THEORY
A ground-based sun photometer pointed at the sun with a narrow field of view and a band-pass filter measures a signal V of direct solar irradiance. This signal V can be related to the signal at the top of the atmosphere V0 by
$$V = \frac{V_0}{r^2}\, e^{-\tau m} \qquad (1)$$
where r is the normalized sun-to-Earth distance and m is the optical air mass. The air mass is approximately 1/cos(θ), where θ is the solar zenith angle (SZA). The total extinction τ is the sum of the contributions of the aerosol optical depth τa, the molecular Rayleigh optical depth τray and the ozone optical depth τO. The logarithm of the signal V has a linear relationship with the air mass. This relationship can be represented by a best-fit line with slope −τ and ordinate intercept ln(V0/r²). The calibration constant V0 can be determined by extrapolating to zero air mass, which is the basis of most Langley calibration methods. Knowing V0, the AOD can be calculated by rearranging Eq. (1):
$$\tau_a = \frac{1}{m} \ln\!\left(\frac{V_0}{r^2 V}\right) - \tau_{ray} - \tau_{O} \qquad (2)$$
Errors in the calibration are typically the largest sources of uncertainty in AOD retrieval. For example, the observation of a fictitious diurnal AOD cycle is a clear artifact of an incorrect calibration constant (Cachorro et al., 2008). This error was found to be closely related to the derived Angstrom exponent α, which can be used to offset the calibration error. A similar work by Kreuter et al. (2013) proposed a method to improve the Langley calibration by reducing the diurnal variation of the Angstrom exponent. However, like all variations of the Langley method, it implicitly depends on the natural variation in AOD. At low-altitude sites, adding solar aureole measurements to the Langley analysis can make the sun calibration possible (Nieke et al., 1999). Imposing strict data screening to select an appropriate dataset for a Langley plot is also useful for near-sea-level calibration (Chang et al., 2014).
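The Langley extrapolation described above amounts to a straight-line fit of ln(V) against air mass. The following is a minimal sketch, not the authors' code: the function name `langley_fit` and the synthetic signal values are assumptions made for illustration, and a plain least-squares fit is used.

```python
import math

def langley_fit(air_mass, signal):
    """Least-squares fit of ln(V) against air mass m.

    Returns (ln_v0, tau, r): the zero-air-mass intercept ln(V0 / r^2),
    the total optical depth (negative of the fitted slope) and Pearson's r.
    """
    y = [math.log(v) for v in signal]
    n = len(air_mass)
    mean_m = sum(air_mass) / n
    mean_y = sum(y) / n
    s_mm = sum((m - mean_m) ** 2 for m in air_mass)
    s_yy = sum((v - mean_y) ** 2 for v in y)
    s_my = sum((m - mean_m) * (v - mean_y) for m, v in zip(air_mass, y))
    slope = s_my / s_mm
    ln_v0 = mean_y - slope * mean_m
    r = s_my / math.sqrt(s_mm * s_yy)
    return ln_v0, -slope, r

# Synthetic, noise-free morning (made-up values): ln(V0 / r^2) = 10.0, tau = 0.30.
masses = [6.0, 5.0, 4.0, 3.0, 2.0]
signals = [math.exp(10.0 - 0.30 * m) for m in masses]
ln_v0, tau, r = langley_fit(masses, signals)
```

Because ln(V) decreases with air mass, Pearson's r from this fit is negative; a correlation quoted as a positive number would correspond to its magnitude.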
To produce a good Langley plot in a tropical climate, the most important condition is that the measurements must contain no cloud-contaminated data. Cloud cover is a mass of clouds covering all or most of the sky. Subjective removal of cloud-affected points by qualitative observation is unscientific. In the tropics, in addition to the abundant cloud cover of the rainy season, significant cloud loading is observable throughout the year. This is because the tropical climate is generally characterized by warm temperatures and high relative humidity. Depending on the type of tropical climate, most areas experience large quantities of precipitation all year round. Therefore, the performance of the Langley calibration in a tropical climate is heavily governed by cloud loading. Cloud loading can be characterized by calculating the sky's clearness index. One of the most common models used for this purpose is the Perez model of sky classification. The Perez model defines discrete sky clearness in eight categories bounded by a lower limit of 1.0 for completely overcast and an upper limit of 6.2 for completely clear. The index is calculated from the relationship between the diffuse and global components of solar irradiance (Perez et al., 1990) as
$$\varepsilon = \frac{\dfrac{I_{ed} + I_{dir}}{I_{ed}} + 1.041\,\phi_H^{3}}{1 + 1.041\,\phi_H^{3}}$$
where I_ed is the diffuse component irradiance and I_dir is the direct component irradiance, both taken over the spectral range λi,j, and ϕH is the solar zenith angle in radians.
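As an illustration of the clearness index of Perez et al. (1990), the sketch below evaluates the standard broadband form of the index. The function name and the sample irradiance values are assumptions made for this example; only the constant 1.041 and the use of the zenith angle in radians come from the text.

```python
def perez_clearness(i_diffuse, i_direct, zenith_rad):
    """Perez sky clearness index from diffuse and direct irradiance.

    zenith_rad is the solar zenith angle in radians.
    """
    k = 1.041 * zenith_rad ** 3
    return ((i_diffuse + i_direct) / i_diffuse + k) / (1.0 + k)

# With no direct beam the index reduces to its lower limit of 1.0 (overcast).
overcast = perez_clearness(100.0, 0.0, 1.0)
# Made-up clear-sky example with the sun at the zenith.
clear = perez_clearness(100.0, 400.0, 0.0)
```

The index rises as the direct component grows relative to the diffuse component, which is why a steadily increasing index over a morning indicates a progressively clearer optical path.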

METHOD
A total of 20 Langley plots were collected at Kinabalu Park, Sabah, from 26 to 30 August 2015, using a portable radiometer, the ASEQ LR-1 spectrometer. The study site is located in an open area at Kinabalu Park (6.0°N, 116°E, 1,574 m a.s.l.). Kinabalu Park is one of the national parks in Malaysia, located on the west coast of Sabah within the district of Ranau. The major economic activities in this district are small-scale agriculture and retail business. Therefore, aerosol loading is expected to be low because pollution emissions are limited. Measurements were made on visually clear mornings starting at sunrise, between 0600 and 0900 local time, at periodic intervals of 3 minutes. Afternoon measurements were not possible because abundant cloud cover always prevails towards sunset, as is typical of a tropical climate; thick fog and rainfall are regularly expected over the study area.
Table 1 shows the specifications of the spectrometer. The instrument has a 3648-element CCD-array silicon photodiode detector from Toshiba that enables an optical resolution as fine as 1 nm (FWHM). Each measurement series consists of global and diffuse irradiance components. The direct irradiance component was determined by subtracting the diffuse irradiance scans from the corresponding global irradiance scans as
$$I_{dir}(\lambda, t) = I_{global}(\lambda, t) - I_{ed}(\lambda, t)$$
where λ is the wavelength and t is the time of measurement. The LR-1 spectrometer is not equipped with a shadowing band; the diffuse irradiance component is measured using a manual shading disc after each global irradiance measurement. For each global scan, the shading disc is held on the sun-sensor axis so that it overfills the image of the solar disc and blocks the direct beam from the sensor's view (Fig. 1). This shading disc has a diameter of 0.09 m (D = 0.09 m) and is held 1.0 m from the sensor. The dimension of the shading disc is determined by the condition that the shading angle θs of the shading disc to the sensor should be the same as the viewing angle θv of the sensor. Here, the viewing angle is defined as the maximum angle at which the sensor can detect light radiation with acceptable performance. Given that the viewing angle of the cosine-corrected sensor is 5.0°, the ratio of the shading disc radius R to the distance L between the shading disc and the sensor is fixed by this angle, where R = D/2 and D represents the diameter of the shading disc. In this way, the shadow of the shading disc covers at least the whole sensor head of the spectrometer, while the margin area is kept to a minimum, as shown in Fig. 1.
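The geometry of the shading disc can be checked numerically. The sketch below is an illustration under a stated assumption: the 5.0° viewing angle is taken to be the full cone angle, so that the disc radius subtends half of it at the sensor. The function names are hypothetical; the disc diameter and distance are those given in the text.

```python
import math

def shading_full_angle_deg(diameter_m, distance_m):
    """Full angle (degrees) subtended at the sensor by a disc of given diameter."""
    return 2.0 * math.degrees(math.atan(diameter_m / (2.0 * distance_m)))

def direct_irradiance(global_scan, diffuse_scan):
    """Direct component as global minus diffuse for one wavelength and time."""
    return global_scan - diffuse_scan

# Disc of D = 0.09 m held L = 1.0 m from the sensor.
full_angle = shading_full_angle_deg(0.09, 1.0)  # slightly above 5 degrees
# Made-up readings, only to show the subtraction.
direct = direct_irradiance(1200.0, 300.0)
```

Under this assumption the disc subtends slightly more than the 5.0° viewing angle, which is consistent with the requirement that the shadow cover the whole sensor head with a minimal margin.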

RESULT AND DISCUSSION
A total of n = 219 raw data points were collected over 5 days of measurement, from 26 to 30 August 2015. Table 2 shows the important daily information of the campaign. Sky conditions were evaluated by qualitative observation, which may be useful for generic reference and discussion. Fig. 2 shows the cumulative fraction of direct normal irradiance (DNI) measured during the measurement period for each day. Our observations are consistent with the measured DNI pixels: the clear-sky conditions on Day 2 and Day 3 generally produced higher DNI pixels than on other days. As shown in Fig. 2, 75% of the data is greater than 600 pixels on Day 1 and greater than 500 pixels on Day 3. More specifically, no DNI lower than 2000 pixels was measured on Day 2, and only a small fraction (less than 0.25) was measured on Day 3. Day 5 was denoted as partly cloudy because short intervals of thin cirrus clouds were observable during the first few intervals of measurement. This observation is also consistent with the low DNI pixels measured in the time interval from 0 to 10 min on that day, which were followed by relatively high DNI values for the rest of the intervals. Day 4 measured relatively low DNI pixels, where only a fraction of 0.75 of the data was greater than 4000 pixels. Day 1 measured the lowest DNI pixels among all days.
Fig. 3 presents the boxplot and histogram of the Perez index calculated using the ratio of diffuse to global components of solar irradiance measured during the campaign. By definition, an index greater than 4.50 is considered clear sky, a value between 1.23 and 4.50 is partly cloudy and a value less than 1.23 is cloudy or overcast. On Day 1, the highest and lowest indices were 1.70 and 1.03, with an average of 1.38 ± 0.20, indicating that contamination by cloud cover was the most significant on this day. Another cloud-contaminated day was Day 4, which measured an average Perez index of 2.09 ± 0.66. Both Day 1 and Day 4, which would be considered cloudy days, have quite low standard deviations of less than 1.00 compared to other days. On the other hand, Day 3 measured the highest Perez index at 5.62 and also the largest standard deviation, with an average of 3.32 ± 1.31. The second highest standard deviation was measured on Day 5 at 3.06 ± 1.13, which was partly due to the short intervals of thin cirrus clouds occurring during the first few intervals of measurement (see Fig. 1). On average, Day 2 measured the highest Perez index at 3.72 ± 1.00, indicating the best available dataset for a good Langley plot.
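The three-way classification quoted above can be written as a small helper. This is only a sketch: the function name and the returned labels are assumptions, and the handling of values exactly equal to a threshold is a choice made here, since the text does not specify it.

```python
def classify_sky(perez_index):
    """Classify sky condition from the Perez index using the thresholds in the text."""
    if perez_index > 4.50:
        return "clear"
    if perez_index >= 1.23:
        return "partly cloudy"
    return "overcast"

# Values reported in the text: the Day 3 maximum, the Day 2 mean and the Day 1 minimum.
labels = [classify_sky(x) for x in (5.62, 3.72, 1.03)]
```

Note that even the Day 2 mean of 3.72 falls in the partly cloudy band, so the daily average alone does not separate usable from unusable Langley days.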

Langley Calibration
Fig. 4 shows the normal Langley plot extrapolated to zero air mass at 500 nm for each day. The aerosol optical depth was estimated from the slope of the regression line after subtracting the contributions of the Rayleigh and ozone optical depths. There are many methods of approximating the Rayleigh optical depth. The one used in the current study calculates the Rayleigh optical depth (ROD) as a function of wavelength, pressure and height (Frouin et al., 2001; Knobelspiesse et al., 2004) from the Rayleigh scattering coefficient kRay(λ), the site's atmospheric pressure p, the mean atmospheric pressure at sea level p0 and the altitude H above sea level in meters. Similarly, the ozone optical depth (OOD) was calculated from satellite observations of ozone in Dobson units (DU) following Knobelspiesse et al. (2004):
$$\tau_{O}(\lambda) = \frac{Z\, k_{oz}(\lambda)}{1000}$$
where Z is the ozone concentration in DU (1 DU = 2.69 × 10^16 molecules cm^-2) and koz(λ) is the ozone absorption cross section. Using the inverse technique, the AOD is hence retrievable from τλ after eliminating the effects of other relevant atmospheric constituents, which in this case are the Rayleigh and ozone contributions. The regression line on Day 2 has the best correlation at R = 0.88 and also exhibits the lowest aerosol optical depth at τa = 0.25 compared to other days. This observation agrees with the interpretation of the Perez index statistics, which predicts that Day 2 has the best dataset for the Langley plot. The second highest correlation, R = 0.67, and the second lowest aerosol optical depth, τa = 0.45, were measured on Day 3, whereas Day 1 and Day 4, predicted as cloudy days by the Perez index, have relatively poor correlation and higher aerosol optical depths of up to 0.60. Day 5, on the other hand, shows the greatest aerosol optical depth at τa = 2.40, resulting in the most discrepant extrapolated value at AM0 of V0 = 13.75.
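The subtraction of the Rayleigh and ozone contributions from the Langley slope can be sketched as follows. The function names are hypothetical, and every numeric input below (ozone column, absorption coefficient, total and Rayleigh optical depths) is a placeholder chosen for illustration rather than a value from the campaign.

```python
def ozone_optical_depth(ozone_du, k_oz):
    """Ozone optical depth from the column amount Z (in DU) and coefficient k_oz."""
    return ozone_du * k_oz / 1000.0

def aerosol_optical_depth(tau_total, tau_rayleigh, tau_ozone):
    """AOD as the Langley slope magnitude minus the Rayleigh and ozone terms."""
    return tau_total - tau_rayleigh - tau_ozone

# Placeholder inputs: 260 DU of ozone and an illustrative coefficient of 0.03.
tau_o = ozone_optical_depth(260.0, 0.03)
# Placeholder total and Rayleigh optical depths.
tau_a = aerosol_optical_depth(0.40, 0.12, tau_o)
```

With inputs of this size the ozone term is small next to the Rayleigh term, so errors in the AOD are dominated by the fitted total optical depth.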
In general, days perceived as cloudy by the Perez index (Days 1, 4 and 5) yielded higher extraterrestrial values than the V0 obtained on clear days (Days 2 and 3). We believe that this upward shift was due to the cloud contamination that occurred during the measurement period. Such conditions are likely to reduce the measured DNI pixels at each distinct air mass. As a result, the overall effect could shift the regression line lower and eventually lead to an overestimation when extrapolating to zero air mass. To visualize this effect, one can observe the Langley regression on Day 5 in Fig. 5, which significantly overestimated the extrapolated value because of short intervals of thin cirrus clouds during the early air masses (see Fig. 5(a)). After the selective removal of these cloudy data, the regression line gives a significantly improved extrapolated value of 10.09 (see Fig. 5(b)). We present these results to highlight two concerns. The first concern is that a fictitious extrapolation is likely when all data, whether clear or cloudy, are included in a Langley plot. This error could be misleading and undetectable unless the performance of the best-fit line on each Langley plot is scrutinized individually. The second concern is that linear regression is not robust; it is not surprising that a better correlation coefficient can easily lead to a large extrapolated value. Hence, this further highlights the crucial requirement of identifying possibly cloudy data and filtering them from the Langley plot to obtain more reliable and accurate extrapolated values.

Characterization and Improvement of Langley Plot
Fig. 6 shows the daily diurnal evolution of the Perez index calculated within the measurement campaign. The Perez index can be interpreted as an indicator of sky clearness, where a higher index represents clearer sky conditions. Therefore, plotting the index as a function of air mass gives a picture of the stability of the atmospheric conditions during the Langley measurement. From the figure, a gradual evolution pattern is observed for Day 2 and Day 3. Both patterns exhibit a similar evolution in which the increment of change is consistent with time (see Figs. 5(b) and 5(c)). Another important characteristic observed in the two patterns is that the increment is quite steep. However, the evolution pattern for the other days, particularly Day 1, is punctuated and unstable. We characterize this pattern as punctuated evolution because the increment of change is not consistent with time, and most of the time, there is virtually no change at all (see Fig. 6(a)). The observation here is consistent with the linearity of the Langley plot in Fig. 5. Taking this into consideration, characterizing the Perez index pattern as a time series offers a way to improve the Langley plot. The improvement is completely automated and objective because qualitative observation at each distinct air mass during the Langley measurement is no longer necessary.
In other words, the characterization of the Perez index pattern provides an objective way to identify and filter potentially contaminated data from the Langley plot. The identification is based on the reasoning that a perfect Langley plot should exhibit an ideally gradual evolution pattern that has no negative derivatives at any time within the measurement period. That means any data points that have negative derivatives are likely to be contaminated by cloud cover, aerosol loading or unstable atmospheric turbidity. Using this rule, we identified several instances of potentially contaminated data on each Langley plot. To visualize this filtration procedure, Day 5 is selected as an example. Fig. 7(a) shows the original Langley plot and Fig. 7(b) shows the improved Langley plot at 500 nm for Day 5. Data P1 is the initial point, so the derivative of its Perez index cannot be determined. Data P2 is the second data point, and the derivative of its Perez index (-0.14, see Fig. 6(b)) was calculated with respect to its preceding point, which is P1. For the next data points (P3 to P6), the respective derivative values are still calculated with respect to P1. This sequence is continued until a positive derivative is obtained; in this case the sequence stops at P7. Thereafter, the derivative for P8 follows the normal sequence and is calculated with respect to the preceding point. The same practice is followed whenever a negative derivative is obtained. For example, the next negative derivative lies on P12; hence, the derivative for Data P13 was calculated with respect to P11 instead of the preceding point. Finally, all data with negative derivative values were identified and filtered.
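The filtration procedure just described can be sketched in a few lines. This is an illustrative reading of the text, not the authors' implementation: the function name is hypothetical, the "derivative" is taken as the simple difference between a point's Perez index and that of the last retained point in time order, a zero difference is treated as acceptable, and the sample values are made up (only the first difference of -0.14 echoes the text).

```python
def filter_by_perez_derivative(perez_index):
    """Split time-ordered points into kept and dropped indices.

    Each point is compared with the last kept point (the reference).
    A negative change flags the point as contaminated and leaves the
    reference unchanged; otherwise the point is kept and becomes the
    new reference. The first point is always kept as the initial point.
    """
    if not perez_index:
        return [], []
    kept, dropped = [0], []
    ref = 0
    for i in range(1, len(perez_index)):
        change = perez_index[i] - perez_index[ref]
        if change < 0:
            dropped.append(i)
        else:
            kept.append(i)
            ref = i
    return kept, dropped

# Made-up morning: an early dip (thin cloud), recovery, then one later dip.
index_series = [2.00, 1.86, 1.90, 1.95, 2.10, 2.30, 2.20, 2.40]
kept, dropped = filter_by_perez_derivative(index_series)
fraction_filtered = len(dropped) / len(index_series)
```

Because every comparison is anchored to the last retained point, the outcome depends on the first point being valid, which is the sensitivity to the initial point examined in the text.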
The working principle of the improved Langley plot is highly dependent on the initial point used. The success of the calibration hinges on correctly locating an initial point at which the optical depth of the atmosphere is constant. Considering that linear regression itself is not robust, the improved Langley plot may be quite different for different initial points, especially when the initial point is ambiguous or missing for some unavoidable reason. To tackle this issue, a sensitivity test was performed on Day 5 for seven cases using different initial points, as shown in Table 3. Case 0 assumes no missing data from P1 to P7, while Case 1 assumes P1 is missing, Case 2 assumes P1 and P2 are missing and so on until Case 7.
There are 38 data points originally. In Case 1, when P1 is assumed to be unavailable, the total number of data points is 37. Thus, m was calculated as 37 - n (n = 25), which gives m = 12. In this way, Data P1 is not included in the Langley plot when calculating f, ΔV0 and Δτa. In Case 2, when P1 and P2 are assumed to be unavailable, the total number of data points is 36. Thus, m was calculated as 36 - n (n = 24), which gives m = 12. The same goes for Case 3 through Case 7. The number of data points, n, for each case depends on the dataset that remains after the proposed algorithm has filtered the points that exhibited negative derivatives. The sensitivity test shows that when a correct initial point is located, the resulting improved Langley plot produces quite consistent AM0 extrapolated values with low aerosol optical depth, τa, and high R². For example, Case 0 on Day 5 can be considered a viable Langley plot, considering its low τa and high R after the treatment using the proposed algorithm. In other words, Case 0 located the correct initial point and produced a robust Langley regression with reliable AM0 extrapolated values. The same goes for Cases 3, 4, 6 and 7, where all resulting Langley plots show low τa and high R. However, out of the seven cases, three (Cases 1, 2 and 5) resulted in fictitious Langley plots with remarkably low R² and high τa. These Langley plots are obviously erroneous; the correct initial point was not located, so the treatment could not recover a usable Langley plot. When using the new method, f becomes smaller in all cases, but ΔV0 and Δτa remain unchanged because both parameters are referenced to the original Langley plot.
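The bookkeeping for each sensitivity case can be reproduced with a short helper. The sketch assumes that the fraction of filtered data is f = m / (n + m), that is, the number of filtered points divided by the number of points available in that case; the text does not spell out this definition, and the function name is hypothetical.

```python
def filtered_fraction(total_points, kept_points):
    """Return (m, f): number of filtered points and their fraction of the total."""
    m = total_points - kept_points
    return m, m / total_points

# Case 1: P1 missing, 37 points available, n = 25 kept.
m_case1, f_case1 = filtered_fraction(37, 25)
# Case 2: P1 and P2 missing, 36 points available, n = 24 kept.
m_case2, f_case2 = filtered_fraction(36, 24)
```

Both cases give m = 12, matching the values quoted in the text, while f differs slightly because the number of available points changes.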
The value Δτa merely indicates the degree to which the treatment changed the original plot in recovering a useful Langley plot. On Day 5, the original Langley plot was partly contaminated by cloud loading, especially in the range of early air masses. Therefore, large ΔV0 and Δτa are expected in such cases for a reasonable f. Cases 0, 3, 4, 6 and 7 on Day 5 can thus be considered viable Langley plots after the treatment using the proposed algorithm, considering their low τa and high R². The viability of the improved Langley plot is subject to two main characteristics: (1) low aerosol optical depth, τa, and (2) high correlation strength, R², of the resulting Langley plot. In addition, a fictitious Langley plot is possible when the resulting Langley plot after treatment shows good correlation R² but high τa.

Performance Analysis
Table 4 presents the important information for each Langley plot before and after the filtration. All improved Langley plots showed a better correlation of the best-fit line (R² ≥ 0.88) than the normal Langley plots. The highest correlation after the filtration was observed on Day 4, with 0.98, followed by Day 5 (0.95), Day 2 (0.93) and Day 1 (0.91), whereas the lowest was on Day 3 (0.88). As previously discussed, high correlation alone is not robust enough to define a good Langley plot. Therefore, we have no intention of justifying the improved Langley plot with the highest R² as the best Langley plot. In fact, the Langley plot on Day 2 remains the best regression line among all days after the improvement. This is justified by examining the fraction of filtered data, f. A dataset that most likely represents the ideal atmospheric conditions for the Langley plot contains less contaminated data. A dataset that has the smallest fraction of filtered data most likely fulfils this criterion. The results in Table 4 show that Day 2 has the smallest fraction, 0.22, followed by Day 3 (0.26) and Day 5 (0.39). This fraction also reflects the feasibility of the improved Langley plot. When the fraction is small, the corresponding filtered dataset is more useful for yielding a reliable Langley plot than one with a greater fraction. When the fraction gets larger, it simply implies that the dataset originally contained a large amount of contaminated data, which could hinder the reliability of the Langley plot even in the improved version. This implication is exemplified on Day 1 and Day 4, when the improved version resulted in half of the data being filtered, and the absolute difference between the normal and the improved extrapolated value, ΔV0, is remarkably high (> 1.66).
Another important parameter that reflects the feasibility of the Langley plot is the aerosol optical depth. This parameter is obtained directly from the slope of the Langley regression line. In general, low aerosol loading conditions are likely to produce better Langley plots because of more stable atmospheric conditions during the Langley measurements. As shown in Table 4, lower aerosol loading was observed after the improvement for Day 3 and Day 5, with Δτa at -0.08 and -2.13, respectively. On the other hand, instead of a reduction, an increase in AOD after the filtration was observed on Day 1 (0.58), Day 2 (0.08) and Day 4 (0.49). When Δτa was used as the metric of the reliability of the Langley plot, its sign, whether increasing (positive) or decreasing (negative), had no great impact on the calibration constant. We believe that all individual Langley plots produced a constant V0 within a band of ± 1% for a low fraction of filtered data (f < 0.2-0.3). Instead, it is the magnitude of the value that has the greater impact. The proposed cloud screening for the improved Langley plot objectively filters data that exhibit negative derivatives relative to preceding data points within the measurement period. In this way, the resultant plot should have a low τa in terms of magnitude. In any case where a high τa is obtained, the result is highly likely to be a fictitious Langley plot. In our results, Day 1 and Day 4 are clear examples of this effect, where considerably high τa values were obtained after the filtration. This is partly due to over-filtration by the cloud screening imposed by the algorithm. It happens when extremely low DNI values measured during the early air mass interval are followed by extremely high DNI values towards the end of the interval. Under such conditions, a serious gap between the high and low air mass intervals is likely to occur. This gap opens the door to large uncertainty when extrapolating to zero air mass and hence incurs large errors that are too random to control. Therefore, we conclude that such a fictitious Langley plot is unlikely to be usable for calibration and should be avoided.
Under clear-day conditions, a Langley plot gives a stable V0,λ when the data are extrapolated to the top of the atmosphere. The calibration factor k is obtained by dividing the extrapolated value V0,λ by the extraterrestrial constant at the nominal wavelength from the ASTM G173-03 Reference Spectra (ASTM, 2012). The calibration constants for each observation day obtained from the normal and the improved Langley methods were computed and tabulated in Table 5. The unusually high ψ of 5.63 obtained on Day 5 is explained by the fact that the original dataset of the day was severely contaminated by heavy cloud loading, especially during the early air mass interval (see Figs. 3(e) and 5). Such conditions lead to an erroneous overestimation of the AM0 extrapolated value on a normal Langley plot. In the cases of low AOD on Day 2 (0.32) and Day 3 (0.57), the difference in ψ is low, with the forcing less than 0.10. In the cases of high AOD on Day 1 (1.17) and Day 4 (0.82), the difference in ψ is quite large, with the forcing greater than 0.81 and 0.59, respectively. This finding is in agreement with the observations reported by Ningombam et al. (2014) and Verman et al. (2010). Their studies revealed that the amount of variation in the extrapolated values V0 and AOD is positively correlated with AOD, i.e., an increasing trend of AOD tends to promote higher variation in V0. Hence, the difference between calibration constants obtained from normal and improved Langley plots may be quite small in a case with low AOD and a high-altitude clean atmosphere, but it may not be so small in a case with high AOD. Such studies also show that turbid measurements greatly affect the stability and variability of the calibration constant estimated on a regular basis (Ningombam et al., 2015). This indicates that the variation in the calibration constant before and after the filtration shows little to no significant difference for low AOD and a pristine atmosphere but a significant difference for high AOD and unstable atmospheric conditions.
Table 5. Calibration constant, k, obtained from the normal and improved methods of Langley plot using ASTM G173-03 Reference Spectra.

Table 6 shows the Langley regression lines obtained from both the normal and improved Langley plots at other wavelengths: 470 nm, 670 nm and 870 nm. All regression lines are best fitted in linear form, and the same filtration algorithm was used to obtain the improved Langley plot for all wavelengths. Fig. 8 shows ΔV0 and Δτa plotted against wavelength for each observation day. A consistent pattern was observed in that clear days (Day 2 and Day 3) tend to have low ΔV0 and Δτa at all wavelengths, whereas cloudy days tend to have high ΔV0 and Δτa. In this context, higher values of both quantities indicate a large discrepancy between the normal and improved Langley datasets. Under ideal Langley conditions, with perfectly constant atmospheric conditions and no cloud loading, low ΔV0 and Δτa are expected regardless of wavelength. That means a Langley dataset that exhibits low ΔV0 and Δτa has the highest likelihood of producing a more reliable V0, with little effect from cloud cover. Fig. 9 presents this behaviour by plotting Δτa against ΔV0. High correlation (R² > 0.90) is observed for all wavelengths. When the cloud loading is low, the variability in Δτa has little or insignificant impact on the variation in the calibration constant V0 and vice versa. Our results also show a weak, gradual increase in R² between ΔV0 and Δτa with increasing wavelength (see Fig. 9), suggesting that longer wavelengths tend to suffer from higher variability in Langley plots, particularly during high AOD conditions (Ningombam et al., 2015).
To further examine the reliability of the new calibration, the method was applied to another 5 days, from 13 January to 2 March 2016. The measurement site was located at UMS, Sepanggar (6.03°N, 116.12°E, 18 m a.s.l.), which is also in the tropics. The same measurement protocol was used to ensure that the results obtained are consistent. Fig. 10 shows the characterization of the calculated Perez index for Day 6 to Day 10. It shows that the Perez index follows a gradual evolution pattern for decreasing air mass, indicating that the effect of cloud loading is little or insignificant on all days. The magnitude of the index was in the range from 1.30 to 3.30, which represents partly cloudy sky conditions. There was also no severely punctuated pattern observed between small air mass intervals. By this definition, the weather on all of these days is considered good for a Langley plot. Table 7 summarizes the final product of the improved Langley plot for sun photometer data. Note that after filtration, the improved Langley plot showed a better R² and lower τa for all wavelengths. The method objectively removes cloudy data, assuming that an ever-rising calculated Perez index is expected for clean and clear-sky conditions. The resulting regression showed a better correlation and a reduced slope, but the magnitude of the change is small, considering that the original dataset was already good enough for a Langley plot. This finding is consistent with our previous results in that the variation in the calibration constant before and after the filtration shows little or insignificant difference for low AOD and a pristine atmosphere. However, for cases where poor R² and high τa remained even after the filtration, the resultant improved Langley plot was unrealistic and obviously erroneous. These cases could be due to many reasons, such as an incorrect initial point, a severely contaminated original dataset or the improper use of an air mass range.
Table 6. Daily AM0 Langley regression at 470 nm, 670 nm and 870 nm before and after filtration. The regression line is best fitted in linear form, y = a + bx.
Considering that linear regression itself is not robust, the new improved Langley plot may be quite different for different initial points used, especially when they are ambiguous or missing due to some unavoidable reason.

Fig. 1. Shading disc of 0.09 m diameter held 1 m from the sensor, parallel to it, to ensure that the shading angle θs to the sensor is the same as the viewing angle θv of the sensor.

Fig. 2. Daily cumulative fraction of direct normal irradiance (DNI) measured during the measurement period over the study area.

Fig. 3. Boxplot (upper) and histogram (lower) of Perez index calculated within the measurement period from Day 1 to Day 5 over the study area.

Fig. 5. Langley extrapolation on Day 5 (a) before and (b) after removal of cloudy data.

Fig. 6. Daily diurnal evolution of Perez index calculated using diffuse and global component solar irradiance within the measurement campaign.

Fig. 7. Langley plot at 500 nm on Day 5 (a) before filtration, (b) after filtration. Values presented on the lower figure are derivatives of the Perez index at each distinct air mass. Data with negative derivatives are to be filtered.

Table 4. Daily AM0 Langley regression at 500 nm before and after filtration. The regression line is best fitted in linear form, y = a + bx. n: number of data; m: number of filtered data; f: fraction of filtered data; ΔV0: absolute difference in V0; Δτa: absolute difference in τa.

Fig. 8. Absolute difference in (a) extrapolated value, ΔV0, and (b) AOD, Δτa, plotted in clustered columns by wavelength in nm.

Table 2. Important details of measurement campaign.

Table 3. Sensitivity test performed on Day 5 using seven cases of different initial points. The symbol "x" represents absent data.

Table 7. Reliability test of the new calibration method. Daily AM0 Langley regression at 470 nm, 500 nm, 670 nm and 870 nm before and after filtration. The regression line is best fitted in linear form, y = a + bx.