An anomaly in pH data in South Africa's national water quality monitoring database – implications for future use

The South African national water quality database (Water Management System) houses data records from several environmental monitoring programmes, including the National Chemical Monitoring Programme (NCMP). The NCMP comprises an extensive surface water quality monitoring programme, managed by the Department of Water and Sanitation (DWS). The purpose of this technical note is to alert users to a systematic anomaly recently observed in the pH dataset of the NCMP, reflected in an abrupt increase between pre- and post-1990 data records. Although the cause of the anomaly in pH could not be confirmed with high confidence, an inappropriate acid rinse procedure in pre-1990 analytical methods was identified as the most likely cause, based on available evidence. This was supported by the variation in relative sensitivity among waters with different buffering capacities: water with low buffering capacity (represented by total alkalinity < 10 mg/L, as CaCO₃) showed the largest anomaly, whereas water with higher buffering capacity (represented by total alkalinity > 30 mg/L, as CaCO₃) showed the smallest. Historical pH data records in the NCMP (i.e. pre-1990) therefore should be used with caution, especially in more weakly buffered systems. Reconstructing data using a correction factor derived from detailed statistical analyses of the post-1990 pH characteristics at selected sites is a possible solution that could be investigated in future. A key lesson learnt is the need to be diligent in capturing detailed meta-data on sampling procedures and analytical methods in datasets spanning several generations. Availability of such information is critical in order to provide users with a means of evaluating the suitability and comparability of data records in long-term datasets. The DWS includes such meta-data in the current version of the database, dating from about 1995 onwards.


INTRODUCTION
The chemistry of freshwater ecosystems is influenced by various factors, including geology, climatic conditions, soils, geomorphology and biological activity (Day et al., 1998; Bluth and Kump, 1994; Huizenga, 2011). pH describes the hydrogen ion activity in a solution and thus the acidity or basicity of water. It is largely determined by factors such as carbonate (CO₃²⁻) and bicarbonate (HCO₃⁻) ions released during chemical weathering (Kumbar, 2003; Huizenga, 2011). Underlying geology comprises various rock formations with different chemical composition, which contribute different quantities and proportions of ions to surface water and groundwater, influencing pH amongst other chemical variables (Davies and Day, 1998). However, biological processes and anthropogenic sources, such as effluents from industries, mining and agriculture, can also influence pH in surface and ground waters (Weber and Strumm, 1963; Davies and Day, 1998).
Total alkalinity defines the ability of natural waters to neutralise acid added to the system, also referred to as buffering capacity (Weber and Strumm, 1963; Davies and Day, 1998). Alkalinity of natural waters is mainly determined by the soil and bedrock through which they pass (e.g. Kney and Brandes, 2007), where the main sources are rocks and soils which contain carbonate, bicarbonate, and hydroxide compounds. Liu et al. (2000) found alkalinity values in the Chesapeake Bay drainage basin to be primarily related to the presence or absence of carbonate bedrock. Kney and Brandes (2007) found watersheds with alkalinity less than 30 mg/L (as CaCO₃) to be underlain by siliciclastic and crystalline bedrock, while watersheds underlain by carbonate bedrock had higher alkalinity (> 30 mg/L as CaCO₃). Jarvis et al. (2006) found very low alkalinity (< 10 mg/L as CaCO₃) to be associated with highly coloured Yorkshire moorland waters, draining base-poor sandy soils (Dimbleby, 1952). These moorland waters are comparable with waters draining fynbos-vegetated soils of the Table Mountain Group in South Africa (Midgley and Schafer, 1992; Allanson et al., 1990; Day et al., 1998; Lahav et al., 2001). Streams affected by acid mine drainage in the Mpumalanga area may have a pH as low as 2.3, but they are neutralised by the calcium in dolomite formations (Harrison, 1958).
The South African national water quality database stores data records from several environmental monitoring programmes, including the National Chemical Monitoring Programme (NCMP), under the management of the national Department of Water and Sanitation (DWS). The NCMP comprises extensive surface water monitoring programmes that have collected data from lakes, dams and rivers across South Africa since the 1960s (Huizenga, 2011; Huizenga et al., 2013). Chemical variables monitored include major cations, anions, compounds and indicators, such as: Ca²⁺ (calcium), Mg²⁺ (magnesium), K⁺ (potassium), Na⁺ (sodium) and NH₄⁺-N
Scientists at the Council for Scientific and Industrial Research (CSIR) recently detected an abrupt increase in pH records in the NCMP for rivers of the southern Cape around 1990. A screening of the complete NCMP pH database then indicated a systematic anomaly in all pH records, reflected as an abrupt increase from pre- to post-1990. The purpose of this technical note, therefore, is to alert users of the NCMP database to the anomaly in pH data, to propose a potential cause, to test the sensitivity of the anomaly among waters with varying buffering capacity (i.e. using total alkalinity as indicator) and, finally, to highlight implications for future use.

CHANGES IN pH ANALYTICAL METHODS
The abrupt anomaly between pre-1990 and post-1990 data records across the entire pH dataset in the NCMP pointed to a systematic analytical error. Investigations into historical documentation on analytical methods, as well as anecdotal sources, revealed that the laboratories of the DWS changed their analytical methods for pH around 1990.
The pre-1990 bubble-segmented continuous flow method used a different (incorrect) rinse procedure to clean the pH probe between sample analyses (e.g. Verhoef and Engelbrecht, 1977) compared with the present method (RQIS, 2017). The pre-1990 rinse procedure comprised an HCl solution at pH 2.5 (Fig. 1). Residual acid from the rinse phase might have caused the error in the pH readings, giving a lower sample pH reading than would have been obtained under ambient conditions. Although it could not be confirmed with certainty that this analytical error was the only cause of the systematic anomaly in pH results, it is considered the most likely cause based on the available evidence. The present method uses analytical quality purified water for the rinse phase (purified through a Millipore Milli-RX Water Purification System), rinsing twice between measurements to prevent contamination (RQIS, 2017).

Relative sensitivity of waters with different buffering capacity
Because the primary cause of the anomaly was attributed to an erroneous (acidic) rinse procedure in the pre-1990 period, it can be expected that water of lower buffering capacity (i.e. low total alkalinity) would show a greater effect than water with higher buffering capacity (i.e. higher total alkalinity). This is because low buffering capacity waters would have been less efficient in neutralising the residual acid introduced through inappropriate rinsing of the pH probe. As a result, the pre-1990 method would have produced a proportionally lower (false) pH reading in weakly buffered water than in more strongly buffered water, and thus a larger anomaly between pre- and post-1990 data.
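The buffering argument above can be illustrated with a small equilibrium calculation. The sketch below is not part of the original analysis. It assumes a closed system in which alkalinity is carried only by the carbonate species, textbook equilibrium constants at about 25 °C, and a purely hypothetical residual acid dose (`ACID_DOSE`); the function names are illustrative.

```python
# Assumed equilibrium constants for the carbonate system at about 25 degrees C.
K1 = 10 ** -6.35         # H2CO3* <-> H+ + HCO3-
K2 = 10 ** -10.33        # HCO3-  <-> H+ + CO3^2-
KW = 1e-14               # water autoprotolysis
EQ_PER_MG_CACO3 = 2e-5   # 1 mg/L as CaCO3 corresponds to 0.02 meq/L


def carbonate_term(h):
    """Equivalents of alkalinity per mole of inorganic carbon at [H+] = h."""
    denom = h * h + K1 * h + K1 * K2
    return (K1 * h + 2 * K1 * K2) / denom


def alkalinity(h, c_t):
    """Alkalinity (eq/L) for [H+] = h and total inorganic carbon c_t (mol/L)."""
    return c_t * carbonate_term(h) + KW / h - h


def ph_after_acid(ph0, alk_mg_l, acid_eq_l):
    """pH after adding strong acid (eq/L) to a closed carbonate system."""
    h0 = 10 ** -ph0
    alk0 = alk_mg_l * EQ_PER_MG_CACO3
    # Total inorganic carbon consistent with the initial pH and alkalinity.
    c_t = (alk0 - KW / h0 + h0) / carbonate_term(h0)
    target = alk0 - acid_eq_l  # strong acid lowers alkalinity one-for-one
    lo, hi = 0.0, 14.0
    for _ in range(100):       # bisection: alkalinity increases with pH
        mid = (lo + hi) / 2
        if alkalinity(10 ** -mid, c_t) < target:
            lo = mid           # too acidic, the solution lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


ACID_DOSE = 2e-5  # hypothetical residual acid, eq/L (not a measured value)
drop_low = 7.0 - ph_after_acid(7.0, 5.0, ACID_DOSE)    # total alkalinity 5 mg/L
drop_high = 7.0 - ph_after_acid(7.0, 50.0, ACID_DOSE)  # total alkalinity 50 mg/L
```

Under these assumptions the weakly buffered water shows the larger pH drop for the same acid dose, which is the pattern hypothesised above. The absolute values depend entirely on the assumed dose and should not be read as estimates of the actual analytical error.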
To test this hypothesis, pH records from stations in the NCMP were selected that had at least one record per year for the period 1980 to 2016 (Table 1). These were then grouped into three total alkalinity ranges (based on median alkalinity across the study period), as derived from the literature (Dimbleby, 1952; Kney and Brandes, 2007), namely, (i) total alkalinity < 10 mg/L (e.g. representative of waters draining base-poor sandy soils), (ii) 10-30 mg/L (e.g. representative of waters draining siliciclastic and crystalline bedrock), and (iii) greater than 30 mg/L (e.g. representative of water draining carbonate bedrock) (Fig. 2). Statistical analyses conducted on the pre- and post-1990 datasets in each of the alkalinity ranges are presented in Table 1.
Stations with an alkalinity range of < 10 mg/L (i.e. a lower buffering capacity) (Fig. 2A) showed the greatest median difference of 1.42 (Table 1), followed by stations in an alkalinity range of 10-30 mg/L (Fig. 2B) with a median difference of 0.79 (Table 1). The lowest median difference, 0.44, was found at stations in an alkalinity range of > 30 mg/L (i.e. a high buffering capacity) (Fig. 2C; Table 1).
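The grouping and comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' actual procedure: it assumes that "median difference" means the post-1990 median pH minus the pre-1990 median pH, pooled within each alkalinity range, and that records from 1990 onward count as post-1990. The record layout and the demonstration values are hypothetical and are not NCMP data.

```python
from collections import defaultdict
from statistics import median

CUTOFF_YEAR = 1990  # assumption: records from 1990 onward count as "post-1990"


def alkalinity_class(median_alk):
    """Assign a station to one of three total alkalinity ranges (mg/L as CaCO3)."""
    if median_alk < 10:
        return "<10"
    if median_alk <= 30:
        return "10-30"
    return ">30"


def median_ph_shift(records):
    """records: list of (station, year, ph, total_alkalinity) tuples.

    Returns {alkalinity class: post-1990 median pH minus pre-1990 median pH}.
    """
    # Median alkalinity per station across the whole study period.
    alk_by_station = defaultdict(list)
    for station, _year, _ph, alk in records:
        alk_by_station[station].append(alk)
    station_class = {s: alkalinity_class(median(v))
                     for s, v in alk_by_station.items()}

    # Pool pH values per alkalinity class and period.
    pooled = defaultdict(lambda: {"pre": [], "post": []})
    for station, year, ph, _alk in records:
        period = "pre" if year < CUTOFF_YEAR else "post"
        pooled[station_class[station]][period].append(ph)

    return {c: median(p["post"]) - median(p["pre"])
            for c, p in pooled.items() if p["pre"] and p["post"]}


# Hypothetical records, for illustration only (not NCMP data).
demo = [
    ("A", 1985, 5.0, 4.0), ("A", 1988, 5.2, 6.0),
    ("A", 1995, 6.4, 5.0), ("A", 2000, 6.6, 5.0),
    ("B", 1985, 7.4, 80.0), ("B", 1988, 7.6, 90.0),
    ("B", 1995, 7.9, 85.0), ("B", 2000, 8.1, 85.0),
]
shifts = median_ph_shift(demo)
```

Classifying each station by its median alkalinity over the full period, rather than record by record, keeps a station in a single class even if individual alkalinity readings cross a range boundary.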

IMPLICATIONS FOR FUTURE USE
The inappropriate acid rinse procedure in the pre-1990 method, which has since been corrected, is considered the most likely cause of the anomaly based on available evidence. This likely cause is supported by the relative sensitivity of waters with different buffering capacity: water with low buffering capacity (represented by total alkalinity < 10 mg/L, as CaCO₃) showed the largest shift in pH between pre- and post-1990 data, and water of higher buffering capacity (represented by total alkalinity > 30 mg/L, as CaCO₃) showed the smallest shift.
Scientists therefore should be cautious when using the historical pH data records in the NCMP (i.e., pre-1990), especially in more weakly buffered systems. Reconstructing data using a correction factor derived from detailed statistical analyses of the post-1990 pH characteristics at selected sites is proposed as a potential solution that could be investigated in future.

Figure 1
Schematic highlighting the wash procedure in the pH analytical method applied pre-1990 (the acid rinse channel is circled).

A key lesson learnt is the need to be diligent in capturing detailed meta-data on sampling procedures and analytical methods in datasets spanning several generations. Availability of such information is critical in order to provide scientists and managers with a means of evaluating the suitability and comparability of data records in long-term datasets. The DWS includes such meta-data in the current version of the NCMP database, dating from about 1995.