Toxicity testing: ecological relevance and relative efficacy and costs of toxicity tests in the South African context

The Direct Estimation of Ecological Effect Potential (DEEEP) is a suite of toxicological methods that was compiled to facilitate management of effluent discharges. DEEEP used a range of tests to assess different endpoints and test taxa from differing trophic levels. It was used at pilot scale but never adopted in South Africa formally. The use of toxicological testing in managing effluent discharge has been somewhat ad-hoc since. This study examined a range of tests for undertaking toxicological assessments of effluent from the perspectives of ecological realism, test tractability, and cost of testing. The assays assessed include some from DEEEP, some using South African test taxa, and some using commercial toxicity test kits. Results indicate that, in terms of returned endpoints, no clear difference between tests using immobilized and cultured or wild-collected test taxa was present. Culture maintenance was found to be a significant contributor to test costs where cultured test taxa were used (although culture costs are implicit in test kit costs too). Costing analysis looked at scenarios where equipment could be shared and reused, and how these contribute to laboratory costs. The research leads on to suggestions for testing implementation in laboratories while maximizing ecological realism and minimizing costs.


INTRODUCTION
The South African National Water Act (NWA) (No. 36 of 1998) provides for water resource protection through implementation of the Resource Directed Measures (RDMs) and Source Directed Controls (SDCs). The RDMs provide quantitative and qualitative resource quality objectives (RQOs) for the water resource, while SDCs regulate the impact from abstraction of water, set discharge license conditions, and use financial and other measures to regulate water use. These measures aim to secure adequate water quantity and quality for aquatic ecosystems, so that ecosystem health is maintained at a level that allows sustainable use of the resource (CSIR, 2010; DWA, 2013).
The use of water use licenses (WULs) to manage abstraction-related impacts is well known in South Africa. Their use in managing the impacts of effluent release to surface water is less well known but significant, and, together with other regulatory tools such as discharge standards, contributes to management of the quality of the resource. An obvious inclusion for water use licenses for effluent discharge is physicochemical parameters of concern that are known from the waste stream or are of importance for the receiving environment. However, it is apparent that assessing particular water quality parameters, such as pH, major salts, metal ions etc., cannot anticipate all potentially negative impacts, and an approach that assesses effluent toxicity is also required (DWAF, 2003).
The Direct Estimation of Ecological Effect Potential (DEEEP) is a suite of toxicological methods compiled by the then Department of Water Affairs and Forestry (DWAF) in South Africa to facilitate management of effluent discharge to receiving water bodies (DWAF, 2003; Slabbert, 2004). DEEEP was intended to assess the toxicological potential of whole effluent using tests that assess oxygen demand, lethal (acute) and sublethal (chronic) toxicity, bioaccumulation, mutagenicity and persistence potential of effluents (Slabbert, 2004). The suite of tests uses taxa from a range of trophic levels. DEEEP showed potential as a way to operationalise toxicity management of water resources, but the approach was only applied at a pilot scale and was never formally adopted (Chapman et al., 2011a).
Despite the lack of formalised implementation of DEEEP, research into approaches to support toxicity testing of effluents and consequent management of water quality has continued. Slabbert and Murray (2011), noting that limited understanding of toxicology on the part of the regulator hindered the application of toxicological testing, produced a tool to facilitate appropriate toxicological test identification. This tool offered a range of tests that went far beyond those recommended under DEEEP. Chapman et al. (2011a; 2011b) reported on local capacity to undertake toxicological testing and quality management and assurance in testing. Wepener and Chapman (2012) proposed approaches for the use of ecotoxicological testing in managing water quality in South Africa. Pearson et al. (2015) reviewed international practices and strategies regarding toxicological testing of effluent, and produced a tool intended to support the application of toxicity testing in water use licensing (largely reflecting on the DEEEP approach). Despite ongoing research and development of methods and tools, use of toxicological testing in water management remains ad-hoc. This is not surprising as many of the steps required to establish a regulatory framework to undertake routine toxicological monitoring in South Africa have not occurred (Chapman et al., 2011a).
https://doi.org/10.17159/wsa/2020.v46.i2.8241
Toxicological testing is most commonly undertaken using one or more standardized tests to assess the impact of the toxicant, effluent or water sample in question on a single, often standardized test organism (though tests using indigenous taxa, multiple taxa and mesocosms are known and arguably provide more ecologically relevant results) (Chapman, 2002; Preston, 2002; Wepener and Chapman, 2012). Traditionally, test taxa are provided by cultures maintained to ensure a supply of relevant test taxa. More recently, commercial test kits have been produced which simplify testing by removing the requirement for cultures by supplying immobilized test taxa as part of the kit. The results from the same tests using cultured taxa and taxa from kits have been compared and, while there are exceptions, the consensus seems to be that no significant differences exist in the endpoints from these methods (Blaise, 2000; Janssen et al., 2000; Mitchell et al., 2002; Daniel et al., 2004; Persoone et al., 2009; Persoone and Wadhia, 2009). Toxicity test kits have been adopted as a tool in aquatic toxicology because they are often fast, require little effluent, and are cost effective (Blaise, 1998; Daniel et al., 2004). While such kits may provide an alternative to traditional culture-based testing, not all taxa can be immobilized and some tests, for example those using vertebrates as test taxa, will remain dependent on cultures for test taxa. In most cases, indigenous taxa will also not be available for use in commercial kit-based tests.
The research reported on here aims to compare toxicological testing using cultured test organisms with testing using commercial toxicity test kits. In addition, tests using standardized and indigenous test taxa were compared. Comparisons were undertaken on the basis of test tractability (given the test regime) across a range of effluents, the endpoints (LC50 or EC50) produced, and the cost of testing.

METHODS
Toxicity tests were undertaken with four whole effluents collected late in 2007 from anonymous Eastern Cape sources: a tannery, a dairy farm, a wastewater treatment works (WWTW) and a textile factory. From each source, 125 L of effluent was collected, transported directly to the laboratory, and immediately frozen at −18°C to provide standardized effluent for testing. All frozen effluents were defrosted overnight and warmed to room temperature before testing. The toxicity tests listed in Table 1 were used to assess effluent toxicity. These comprised tests mandated under DEEEP (Slabbert, 2004), commonly used standard tests, and tests using wild-caught or cultured South African taxa.
All tests used the same initial dilution series, viz., 100%, 50%, 25%, 12.5% and 6.25% effluent. Where this series could not produce valid endpoints, tests were repeated on appropriately modified dilution series, provided sufficient effluent was available. Dechlorinated tap water was used as the diluent. LC50 or EC50 endpoints were used for comparisons between tests as these have been found to be relatively accurate and stable owing to the steepness of the dose-response curve at this point (Mitchell, 2002). Endpoints were calculated using the EPA's Probit analysis v1.4, or the Trimmed Spearman-Karber method (Hamilton et al., 1977) where the data were better suited to it.
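The endpoint calculation can be illustrated with a minimal sketch. It is not the EPA Probit program, nor the trimmed method of Hamilton et al. (1977); it assumes the simpler untrimmed Spearman-Karber estimator, which requires mortality to rise monotonically from 0 to 1 across the series. The function names and the mortality data are hypothetical.

```python
import math

def dilution_series(start=100.0, factor=0.5, steps=5):
    """Return effluent concentrations (% effluent) in a geometric series."""
    return [start * factor ** i for i in range(steps)]

def spearman_karber_lc50(concentrations, proportions_dead):
    """Untrimmed Spearman-Karber estimate of the LC50 (% effluent).

    concentrations: increasing % effluent values.
    proportions_dead: fraction dead at each concentration; must be
    non-decreasing, start at 0.0 and end at 1.0.
    """
    if proportions_dead[0] != 0.0 or proportions_dead[-1] != 1.0:
        raise ValueError("response must span 0 to 1; trimming or another method is needed")
    if any(b < a for a, b in zip(proportions_dead, proportions_dead[1:])):
        raise ValueError("proportions must be non-decreasing")
    logs = [math.log10(c) for c in concentrations]
    # Weighted mean of interval midpoints on the log-concentration scale.
    mean_log = sum(
        (p2 - p1) * (x1 + x2) / 2
        for x1, x2, p1, p2 in zip(logs, logs[1:], proportions_dead, proportions_dead[1:])
    )
    return 10 ** mean_log

series = sorted(dilution_series())      # 6.25% up to 100% effluent
mortality = [0.0, 0.1, 0.5, 0.9, 1.0]   # hypothetical observations
lc50 = spearman_karber_lc50(series, mortality)
print(f"LC50 is approximately {lc50:.1f}% effluent")
```

A series that fails the checks (for example, mortality below 100% in undiluted effluent) corresponds to the cases in which tests had to be repeated with a modified dilution series or a different method applied.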
The costs of undertaking these tests were determined to assess relative test cost efficiency (including the cost of maintaining cultures of test taxa where required). All costs were classified as either equipment (capital and other equipment re-used over time), consumable (consumables used during testing or culture maintenance), or labour (estimates of time taken in testing or culture maintenance at two relevant pay grades). All costing followed laboratory practice at the Unilever Centre for Environmental Water Quality. Costs were based on 2008 supplier or labour costs adjusted following inflation to 2017 costs. All costs are VAT-exclusive. A complete breakdown of all test costs, culture costs, testing rates, etc., is presented in Griffin et al. (2011).
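As a rough sketch of the costing structure described above, the three cost classes can be held together and scaled from 2008 to 2017 prices with a price-index ratio. The class name, the index values and the amounts are hypothetical; the actual figures and adjustment used are those reported in Griffin et al. (2011).

```python
from dataclasses import dataclass

@dataclass
class CostBreakdown:
    """Costs of one test in a single currency, VAT-exclusive."""
    equipment: float   # capital and other re-used equipment
    consumable: float  # consumables for testing or culture maintenance
    labour: float      # staff time at the relevant pay grades

    def total(self):
        return self.equipment + self.consumable + self.labour

    def adjusted(self, index_base, index_target):
        """Scale all classes by the ratio of two price-index values."""
        factor = index_target / index_base
        return CostBreakdown(
            self.equipment * factor,
            self.consumable * factor,
            self.labour * factor,
        )

# Hypothetical 2008 costs and hypothetical index values for 2008 and 2017.
cost_2008 = CostBreakdown(equipment=1000.0, consumable=200.0, labour=300.0)
cost_2017 = cost_2008.adjusted(index_base=100.0, index_target=150.0)
print(cost_2017.total())
```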
The costing generated above was used to assess various approaches to cost sharing. Assessment of test costs was undertaken by determining the basic cost of each test (all equipment, consumable and labour costs required to undertake a test and maintain such cultures as may have been required; this is effectively the cost associated with equipping a laboratory to undertake a particular test), as well as savings that accrue owing to reuse of equipment given a particular testing capacity, and sharing of equipment between tests.

Physicochemical data
The results of physicochemical tests on the whole effluents used in toxicity testing are presented in Table 2, and indicate that the effluents varied considerably with respect to all physicochemical parameters assessed. Several of the effluents used exceeded discharge limits for effluents, often with respect to several parameters (DWAF, 2004). In particular, limits for pH, electrical conductivity (salinity), nitrate, orthophosphate, and chemical oxygen demand (COD) were more often exceeded than not. Colour interference was noted in two effluents, which interfered with tests that rely on spectrophotometry, fluorescence or enumeration of smaller taxa to produce results, and constrained the range of dilutions that could be assessed.

[Table 1, partially recovered: … (Slabbert, 2004); Daphnia pulex 21 day reproduction test (Slabbert, 2004); Algal (Pseudokirchneriella subcapitata) 72 hr growth inhibition test (Slabbert, 2004). Tests using cultured or wild-collected native taxa: Mayfly 10 day lethality test (DWAF, 2000); Algal (…]

Toxicity
The results, as EC50 or LC50 expressed as percentage effluent, of the various toxicity tests undertaken are presented in Table 3. A number of tests did not return valid endpoints for various reasons, including excessive mortality/inhibition at lowest effluent concentrations, insufficient mortality/inhibition at highest effluent concentrations, growth stimulation, and/or colour interference. Where sufficient effluent was available, tests were repeated with more appropriate dilution ranges.
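The reasons for missing endpoints listed above can be expressed as a simple screening rule. The sketch below is an illustrative simplification and not the procedure used in the study: the function name, the returned labels, and the convention that negative values denote stimulation relative to the control are all assumptions, and colour interference is not modelled.

```python
def endpoint_status(effects):
    """Classify a dilution series by whether a 50% endpoint is bracketed.

    effects: fraction of organisms affected at each effluent
    concentration, ordered from lowest to highest concentration and
    expressed relative to the control. Negative values denote
    stimulation (assumed convention for growth tests).
    """
    if not effects:
        raise ValueError("no observations")
    if min(effects) < 0:
        return "stimulation"
    if effects[0] > 0.5:
        return "too toxic at lowest concentration"
    if effects[-1] < 0.5:
        return "too little effect at highest concentration"
    return "endpoint bracketed"

# Hypothetical series: more than half affected even at the lowest concentration.
print(endpoint_status([0.8, 0.9, 1.0, 1.0, 1.0]))
```

Under this rule, a series flagged as too toxic would be repeated with a more dilute series if enough effluent remained, which mirrors the repetition described in the text.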
Of the effluents tested, tannery effluent was the most toxic. Despite repetition of many tests with more dilute effluent, this effluent caused excessive mortality/inhibition even at effluent concentrations of 0.01-0.4%. Tannery effluent was also the least tractable as the effluent had a deep colour cast that necessitated dilution of the effluent in order that test organisms could be counted with minimal interference.
Textile effluent returned more valid endpoints than any other effluent. Textile effluent was less toxic than tannery effluent, and did not cause growth stimulation in any toxicity test. It also had a colour cast, though not as pronounced as tannery effluent.
Effluents from a wastewater treatment works and a dairy farm returned relatively few endpoints (2 and 3, respectively) from the range of tests undertaken. In many of these tests, the lack of an endpoint resulted from limited mortality at high effluent concentrations or from growth stimulation. This outcome suggests that these effluents had limited toxicity compared with tannery or textile effluent; nevertheless, where endpoints are comparable between tests, this tentative conclusion is not always supported.
All algal tests were found to show growth stimulation in WWTW and dairy farm effluent. This is likely a function of growth in elevated nutrients in the effluent together with a relative absence of other toxicants. For comparison, tannery effluent contained more nutrients than any other effluent assessed, but a combination of other factors led to this being the most toxic of effluents tested.

Table 2. Results of physicochemical testing of whole effluents used in toxicity testing. * indicates that results were below General and Special Discharge Limits for wastewater entering a water resource, and ** indicates that samples exceeded the Special Limits only (DWAF, 2004). BD indicates that results were below the test detection limit.
No single test returned an EC50 or LC50 endpoint for all effluents. The C. nilotica juvenile lethality test was the most tractable test assessed here in that it returned endpoints from three of the four effluents. It was also the most sensitive of the tests that returned endpoints in WWTW and dairy effluents. The V. fischeri bioluminescence test returned the next-highest number of valid endpoints, though it was not as sensitive. Other tests were less tractable with respect to the effluents tested. Examples include tests using algae, which often failed to return endpoints (frequently owing to growth stimulation in several effluents), and the C. nilotica 10-day lethality test, which returned no valid endpoints and could not be repeated as each test required 120 L of effluent. The D. pulex reproduction test was never undertaken owing to difficulty in producing sufficient viable neonates for testing of any effluent.
No clear differences were found between results from the same tests undertaken using live cultured test taxa and test kits. Selection of culture or kit-based tests should therefore be based on other operational criteria. An example of such a criterion is cost, which is further considered below.

Cost of testing
The costs of introducing any of the toxicological tests assessed here to a laboratory are presented in Table 4. Costs are broken up into start-up equipment costs, costs per test and, where applicable, culture maintenance. All costs presented assume no sharing of equipment between tests, in order that a clear picture of test costs is gained in the absence of any externalities. Potential cost savings owing to equipment reuse and/or sharing are considered below.
The data from Table 4 indicate that costs varied considerably between tests, both in the cost per test conducted and in the cost of start-up equipment. The greatest variation related to the start-up equipment. In many cases, start-up equipment costs were strongly influenced by the cost of a single specialized item. As an example, the tests using cultured taxa required a source of ultrapure water, and the cost of the purifier increased the costs of all these tests. Start-up equipment costs were considerably more variable between tests using cultured or wild-collected taxa than between tests using commercial kits.
Another major conclusion is that the costs of culturing test taxa, where cultures are used, make up a large part of the cost per test. Culture costs varied with test taxon, with costs of algal culture maintenance being the lowest, while the costs of maintaining cultures of C. nilotica were the highest on a per-test basis. Depending on the taxon used, culture costs could be the greater part of the cost per test and, where not, they were always significant. As tests using commercial kits are culture-independent, culture costs do not affect the costs per test of tests that used these kits (although such costs will form part of the overall test kit cost). Tests using mayflies had no culture costs as the test taxa were collected from the wild. The culture costs presented here are based on the practices of the laboratory where the research was undertaken, and were produced assuming that toxicity testing used all cultured taxa (i.e. the laboratory was running at full capacity). If the testing rate was lower than this, then the cost-per-test of culturing test taxa would be greater. Naturally, as the culture costs presented here are based on the practices of one laboratory, these cannot be reliably taken to represent all laboratories. Nevertheless, they can be used as an indicator of the relative cost of maintaining cultures for testing.
The costs of introducing a test, as presented in Table 4, do not assess how re-use or sharing of capital equipment might contribute to a decreased overall test cost. The decrease in test cost with repeated testing (without sharing of equipment between tests) is presented in Fig. 1. This plot shows how the up-front capital costs contribute to the overall cost per test and how this varies with repetition from 1 to 10 000 times. For most tests, the marginal contribution of capital equipment to overall test costs had reduced to a minimum after approximately 1 000 repetitions, when test cost was largely determined by the cost of testing and, where applicable, culture costs.
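The relationship behind this pattern can be sketched as a simple amortisation: the one-off equipment cost is spread over the number of tests, while running and culture costs recur with every test. The function below assumes this linear model, and all monetary values are hypothetical rather than taken from Table 4.

```python
def cost_per_test(equipment_cost, running_cost, n_tests, culture_cost=0.0):
    """Average cost of one test after n_tests repetitions.

    equipment_cost: one-off start-up equipment cost.
    running_cost: consumable and labour cost incurred by each test.
    culture_cost: per-test share of culture maintenance (0 for kits).
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return equipment_cost / n_tests + running_cost + culture_cost

# Hypothetical values: the equipment share shrinks as repetitions rise.
for n in (1, 10, 100, 1000, 10000):
    print(n, round(cost_per_test(50000.0, 400.0, n, culture_cost=150.0), 2))
```

With these illustrative inputs, the equipment share falls from 50 000 per test at one repetition to 50 at 1 000 repetitions, after which the recurring costs dominate.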
After 1 000 or more tests, the most cost-effective test was the D. pulex 48 hr lethality test undertaken using the Daphtoxkit F pulex commercial kit. This was followed by the V. fischeri 30 min bioluminescence test using the BioTox kit, and then the D. magna 48 hr lethality test using the Daphtoxkit F magna kit. At the other end of the scale, the most expensive of the tests assessed was the D. pulex 21 day reproduction test, followed by the mayfly 10 day lethality test and the C. nilotica 10 day lethality test. These extreme results highlight the extra cost in undertaking chronic tests that take longer to run. They also show the potential cost efficiency of use of commercial toxicity test kits.
An interesting comparison can be drawn between the P. subcapitata 72 hr growth inhibition test using cultured algae and the same test using the commercial kit. When the number of repetitions is low, the test using the kit is by far the more cost-effective. This is because of the very high equipment costs of the test using cultured algae as compared with the kit. However, the per-test costs of undertaking the test using cultured algae are low (the lowest of all tests assessed here), while the per-test costs of the test using a kit are higher as a result of the costs of the kit. As a result, once the test has been repeated enough times that the equipment costs are a minimal portion of the per-test cost, the cost of a test using cultured algae is less than that of the commercial kit test. Cost parity between the two tests is reached after 406 test repetitions. If an organisation anticipates undertaking fewer than 406 algal tests, then kits are more cost-effective. Above this point, tests using cultured algae are more cost-effective.
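The parity point follows from equating the cumulative costs of the two options. Assuming each option has a fixed equipment cost and a constant per-test cost, the sketch below solves for the number of repetitions at which the high-equipment, low-running-cost option catches up. The inputs are hypothetical and are not the values behind the 406 repetitions reported above.

```python
import math

def break_even_tests(equip_a, per_test_a, equip_b, per_test_b):
    """Repetitions at which option A becomes no more costly than option B.

    Option A has the higher equipment cost and lower per-test cost
    (e.g. cultured algae); option B the reverse (e.g. a commercial kit).
    Solves equip_a + n * per_test_a = equip_b + n * per_test_b for n.
    """
    if equip_a <= equip_b or per_test_a >= per_test_b:
        raise ValueError("A needs higher equipment cost and lower per-test cost")
    return math.ceil((equip_a - equip_b) / (per_test_b - per_test_a))

# Hypothetical costs for a culture-based test (A) and a kit-based test (B).
n_star = break_even_tests(100000.0, 200.0, 20000.0, 400.0)
print(n_star)
```

Below the returned number of repetitions the kit-based option is cheaper overall; above it, the culture-based option is.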
In summary, for all tests assessed, undertaking relatively few tests results in equipment costs making up a significant part of the per-test cost of testing, leading to an inflation of testing costs. Greater efficiencies are achieved when 100 or more tests are undertaken, and once 1 000 tests are completed, the capital costs are minimised with respect to ongoing costs of testing. At this point, the costs of testing are largely determined by the ongoing per-test costs as laid out in Table 4.
All cost analyses presented so far assume that no equipment is shared between tests. This gives an idea of the cost of introducing a particular test to a laboratory, but is essentially unrealistic as equipment would generally be used for multiple tests or functions. We therefore analysed the costs of adding a test to a laboratory already equipped for other tests. The clearest examples of sharing-induced savings related to tests where effectively the same methods and equipment are used, but with a new test taxon. Examples from the tests assessed here include the P. subcapitata 72 hr growth inhibition test and the S. bicaudatus and Chlorella sp. 96 hr growth inhibition tests, which shared all equipment, with the result that any of these tests could be added to a laboratory with no extra equipment costs provided that one of the other tests was already in use. The D. pulex and D. magna 48 hr lethality tests (Daphtoxkit F pulex and Daphtoxkit F magna, respectively) likewise shared all equipment.
Complete sharing of equipment costs did not always require a complete methodological overlap with existing tests. The D. pulex 48 hr lethality test could be added to a laboratory already undertaking the other tests using cultured and wild-collected taxa reported on here at no extra equipment cost. Likewise, the T. thermophila 24 hr growth inhibition test (Protoxkit F) could be added with no extra equipment cost to the test suite of a laboratory undertaking the other commercial kit-based tests assessed here.
Of the tests assessed here, none beyond those mentioned above could be added with no extra equipment costs to an already practising laboratory. However, equipment cost savings encountered while assessing scenarios where a new test is added to the test suite of an existing laboratory were considerable. When determining additional equipment costs attached to adding a test to a laboratory undertaking either tests using cultured/wild-collected taxa, or commercial kits, savings ranged from 0.2% to 56.8% of basic equipment costs for that test (excluding those tests covered above where no extra equipment cost was incurred). Overall, savings owing to shared equipment reduced the equipment requirement costs of new tests to 18.3% of total equipment requirements in tests using cultured/wild-collected taxa, and to 12.6% of total equipment costs in tests using commercial kits.
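The sharing calculation can be illustrated as a set difference between the equipment a new test needs and the equipment a laboratory already holds. This is a sketch of the idea only; the item names and prices are hypothetical and do not come from the costing in Griffin et al. (2011).

```python
def extra_equipment_cost(test_items, lab_items, prices):
    """Cost of the items a new test needs that the laboratory lacks."""
    missing = set(test_items) - set(lab_items)
    return sum(prices[item] for item in missing)

def saving_percent(test_items, lab_items, prices):
    """Saving from sharing, as a percentage of the test's basic equipment cost."""
    basic = sum(prices[item] for item in set(test_items))
    extra = extra_equipment_cost(test_items, lab_items, prices)
    return 100.0 * (basic - extra) / basic

# Hypothetical equipment lists and prices.
prices = {"incubator": 30000.0, "spectrophotometer": 50000.0, "microscope": 20000.0}
new_test = ["incubator", "spectrophotometer"]
existing_lab = ["incubator", "microscope"]

print(extra_equipment_cost(new_test, existing_lab, prices))
print(saving_percent(new_test, existing_lab, prices))
```

A test whose equipment list is a subset of what the laboratory already holds returns an extra cost of zero, which corresponds to the cases above where a test could be added at no extra equipment cost.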

DISCUSSION
The ecological realism of undertaking toxicological testing using standard test organisms that might not be present in affected water bodies has been questioned (Chapman, 2002). For this reason, this research included several South African indigenous taxa for comparison with standard test taxa. However, of the standard test taxa that were assessed, several have been recorded from South Africa. This group includes D. pulex (Jarvis et al., 1987), D. magna (Coetzer, 1987), and B. calyciflorus (Jarvis et al., 1987; Brain et al., 1995). No published South African records of P. subcapitata were found, although it has been reported from Zimbabwe (Dzinomwa and Ndagurwa, 2017). No records of T. thermophila outside of laboratories were found. As a result, the majority of taxa used in bioassays have ecological realism in South Africa. A simple comparison of results from indigenous and standard taxa was not possible as, for various reasons, many tests did not return valid endpoints.
There were many reasons that tests did not return valid EC50 endpoints, and some were more common than others. The first is that if the undiluted effluent is not toxic enough to kill or otherwise affect more than 50% of the test population, then no EC50 can be derived. This outcome was common when testing dairy and WWTW effluent. Another is that stimulation of some taxa occurred in effluents from the WWTW and dairy, most likely owing to nutrient loading in these effluents combined with limited toxicity. A third reason was that tannery effluent in particular was so toxic that tests needed repeating using a modified dilution series, and sufficient remaining effluent for this was not always available. In some cases, the diluted effluent was still too toxic after repeating the test with a new dilution series.
One of the tests using an indigenous test taxon that proved most tractable in this survey was the C. nilotica juvenile lethality test. The test returned endpoints for three of four test effluents, and the missing result was because the effluent was more toxic than anticipated, and insufficient effluent was available to repeat the test. The test has been used at the Unilever Centre for Environmental Water Quality (UCEWQ) for more than 15 years (e.g. Muller et al., 2004; Mensah et al., 2011; Vellemu et al., 2018) and has proved a valuable adjunct to other toxicological bioassays. C. nilotica is naturally found in South Africa and throughout much of Africa (GBIF, 2019), which confers ecological realism on this assay. The drawback to the use of this test is the cost of maintaining cultures of C. nilotica for testing, although the costs of undertaking the test and the start-up equipment are competitive. However, this needs to be viewed in the light of the requirement of some taxa such as fish, which are widely used in South African aquatic ecotoxicology (e.g. Ansara-Ross et al., 2009; Wepener and Chapman, 2012; Vellemu et al., 2018), to be sourced from culture.
The use of test kits for toxicity testing has been widely adopted as a means of quickly and conveniently introducing a range of tests to a laboratory with no concomitant cultures. Even where capacity exists for culture maintenance, use of the kits means that costs associated with culture maintenance are avoided (although they inherently form part of the cost of the kit). The research reported on here demonstrates that culture maintenance costs are significant. This accords with results from other authors (Persoone and Van de Vel, 1988; Wadhia and Clive Thompson, 2007). Avoiding culture costs by using kits with immobilized test taxa for toxicological testing would therefore be an advantage to any organisation undertaking toxicological testing. However, these tests rely on being able to practically and cost-effectively immobilize living organisms in some way that allows them to be easily brought back into a metabolically active state. The lack of viable and cost-effective methods to immobilize all or most live organisms limits the inclusion of all potential test taxa in kits. This includes common taxa used in aquatic toxicological testing such as the fish Danio rerio or Poecilia reticulata. Consequently, many sources of taxa for testing will remain cultures or collection from the wild.
The majority of tests assessed here were relatively quick (96 hours or less), and had mortality or growth inhibition as an endpoint. However, several ran for longer periods with endpoints that included mortality and reproductive success. Of these, none used test kits. Two of the tests, the D. pulex reproduction test and the mayfly 10 day lethality test, were considerably more costly to undertake than other tests assessed here. This cost is due to the large amount of hands-on time required of laboratory staff. The third longer-term test was the C. nilotica 10 day lethality test. The labour costs of this test were lower, leading to an overall lower test cost. Owing to the increased exposure and assessment time required for these tests, labour costs are always going to be significant. However, these tests are valuable as they assess the effect of longer-term exposure of taxa, and the impact of exposure on more than one endpoint. The D. pulex 21 day reproduction test was included in DEEEP for this reason (Slabbert, 2004).
Comparisons of traditional and kit-based versions of the D. magna lethality test and the P. subcapitata growth inhibition test have shown no differences between standard methods and kits (Daniel et al., 2004). Results presented here illustrate the cost-effectiveness of these kits, and indicate that, overall, the kits are a valuable addition to the suite of currently available methods.
The variation in endpoints between tests with differing test taxa and endpoints was notable, and supports the use of a battery of tests in assessing effluent toxicity. This approach was included in DEEEP (Slabbert, 2004), and has been maintained in aquatic toxicity assessments in South Africa (e.g. Pearson et al., 2015; Singh et al., 2017; Vellemu et al., 2018). The use of a test battery that includes a range of taxa, and ideally different endpoints, provides a more comprehensive indication of the toxic potential of an effluent to aquatic ecosystems. However, greater ecological realism in testing could be achieved by approaches such as in-situ testing, multispecies testing, and trait assessment, and by moving away from reliance on single-species laboratory tests (Calow and Forbes, 2003; Schmitt-Jansen et al., 2008; Segner, 2011; Clements et al., 2011).

CONCLUSIONS
The study described here assessed the suitability of commercial toxicity test kits for application in South Africa from the viewpoints of endpoint suitability, cost, and ecological relevance. No difference in endpoints from tests using kits and standard tests using cultured test taxa could be found. However, because not every test returned a valid endpoint, not all comparisons could be made. Commercial toxicity test kits were found to be cost effective in the light of significant costs of maintaining cultures of test organisms. Finally, despite commercial test kits using common test taxa, many of these have been recorded from the wild in South Africa, which adds to the ecological relevance of test kits. By all measures assessed, therefore, the commercial toxicity test kits that were assessed are appropriate for use in South Africa.