# Selecting Distributions

## Model selection

It can be difficult to select a ‘best fitting’ distribution when modeling species sensitivity data with small sample sizes. In these situations, model averaging can be used to fit multiple distributions and calculate a weighted average HCx and confidence limits (Schwarz and Tillmanns 2019). Distributions selected for model averaging must be bounded below by zero because effect concentrations cannot be negative. Furthermore, the selected distributions should provide a variety of shapes to capture the diversity of shapes seen in empirical species sensitivity distributions.
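As a sketch of what "weighted average HCx" means, the expression below uses our own notation (it is not taken from the source): $m$ candidate distributions, $\widehat{HC}_{x,i}$ the estimate from distribution $i$, and $w_i$ its model weight.

$$
\widehat{HC}_x^{\,\mathrm{avg}} = \sum_{i=1}^{m} w_i \, \widehat{HC}_{x,i},
\qquad w_i \ge 0, \qquad \sum_{i=1}^{m} w_i = 1
$$

A distribution that fits poorly relative to the others receives a weight near zero and so contributes little to the averaged estimate.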

By default, ssdtools uses the Akaike Information Criterion corrected for small sample size (AICc) as a measure of the relative quality of fit of the different distributions and as the basis for calculating the model-averaged HCx.
However, if two or more similar models fit the data well, the support for that type of shape will be over-inflated (Burnham and Anderson 2002).
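The over-inflation effect can be illustrated with a minimal base R sketch. The AICc and Akaike weight formulas are the standard ones from Burnham and Anderson (2002); the log-likelihood values, the sample size and the helper names `aicc()` and `akaike_weights()` are hypothetical and are not part of ssdtools.

```r
# AICc for a model with log-likelihood `loglik`, `k` parameters and `n` observations.
aicc <- function(loglik, k, n) {
  -2 * loglik + 2 * k + (2 * k * (k + 1)) / (n - k - 1)
}

# Akaike weights: relative support for each model within the candidate set.
akaike_weights <- function(ic) {
  delta <- ic - min(ic)
  exp(-delta / 2) / sum(exp(-delta / 2))
}

# Hypothetical log-likelihoods for three two-parameter fits to n = 20 species.
n <- 20
loglik <- c(lnorm = -60.0, llogis = -60.5, gamma = -61.0)
w <- akaike_weights(aicc(loglik, k = 2, n = n))

# Add a hypothetical near-copy of the log-normal fit to the candidate set.
loglik_dup <- c(loglik, lnorm_copy = -60.0)
w_dup <- akaike_weights(aicc(loglik_dup, k = 2, n = n))

# Total weight carried by the log-normal shape before and after adding the copy.
lnorm_shape_before <- w[["lnorm"]]
lnorm_shape_after <- w_dup[["lnorm"]] + w_dup[["lnorm_copy"]]
```

In this made-up example the data are unchanged, yet the combined weight for the log-normal shape rises once its near-copy is added, which is the redundancy problem described above.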

## Default Distributions

To avoid such model redundancy (Burnham and Anderson 2002), ssdtools and the accompanying Shiny app (Dalgarno 2018) fit three distributions with different shapes by default: the log-normal, log-logistic and gamma.

The three distributions are plotted below with a mean of 2 and a standard deviation of 2 on the (natural) log concentration scale; a log-scale mean of 2 corresponds to exp(2), or around 7.4, on the concentration scale.

```r
library(ssdtools)
library(ggplot2)

set.seed(7)
ssd_plot_cdf(ssd_match_moments(meanlog = 2, sdlog = 2))
```

### Log-normal distribution

The log-normal distribution was selected as the starting distribution given the data are for effect concentrations.

The log-normal distribution does have a couple of limitations to consider when fitting species sensitivity data. First, it is symmetrical on the logarithmic scale, where it reduces to the normal distribution. Second, it decays quickly in the tails, giving narrow tails that may not adequately fit the data. Additional distributions were selected to compensate for these deficiencies.

### Log-logistic distribution

The log-logistic distribution is often used as a candidate SSD, primarily because of its analytic tractability (Aldenberg and Slob 1993). We included it because it has wider tails than the log-normal and because it is a specific case of the more general Burr family of distributions (Burr 1942, Shao 2000).
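The difference in tail weight can be checked with base R alone. The sketch below works on the log concentration scale and gives the logistic distribution the same mean and standard deviation as the normal; the four-standard-deviation cutoff and all variable names are arbitrary choices made for illustration.

```r
# Shared mean and standard deviation on the natural log concentration scale.
meanlog <- 2
sdlog <- 2

# A logistic distribution with scale s has standard deviation s * pi / sqrt(3),
# so this scale gives the logistic the same standard deviation as the normal.
scalelog <- sdlog * sqrt(3) / pi

# Probability of a log concentration more than four standard deviations below the mean.
cutoff <- meanlog - 4 * sdlog
tail_lnorm <- pnorm(cutoff, mean = meanlog, sd = sdlog)
tail_llogis <- plogis(cutoff, location = meanlog, scale = scalelog)

# Ratio of tail probabilities; values above 1 mean the log-logistic tail is heavier.
tail_ratio <- tail_llogis / tail_lnorm
```

With these settings the log-logistic puts more than ten times as much probability in the far lower tail as the log-normal does, even though the two share the same log-scale mean and standard deviation.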

### Gamma distribution

The gamma distribution is a two-parameter distribution commonly used to model failure times or times to events. For use in modeling species sensitivity data, the gamma distribution has two key features that provide additional flexibility relative to the log-normal distribution: 1) it is non-symmetrical on the logarithmic scale; and 2) it has wider tails. The Weibull distribution was considered as an alternative, but the gamma distribution is generally more flexible.
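The asymmetry on the log scale can be seen by comparing quantiles. The following base R sketch measures how far the 5th and 95th percentiles sit from the median on the log scale; the gamma parameters and the helper name `log_spread()` are hypothetical, chosen only for illustration.

```r
# Distances from the median to the 5th and 95th percentiles on the log scale.
# `q` holds the 5%, 50% and 95% quantiles on the concentration scale.
log_spread <- function(q) {
  lq <- log(q)
  c(lower = lq[2] - lq[1], upper = lq[3] - lq[2])
}

p <- c(0.05, 0.5, 0.95)

# Log-normal: the two distances are equal, i.e. symmetric on the log scale.
spread_lnorm <- log_spread(qlnorm(p, meanlog = 2, sdlog = 2))

# Gamma with illustrative parameters: the lower distance is the larger one,
# i.e. the distribution is left-skewed on the log scale.
spread_gamma <- log_spread(qgamma(p, shape = 1, scale = 10))
```

For this choice of parameters the extra spread appears in the lower tail, which is the region that determines low HCx values such as the HC5.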

## All Distributions

For completeness, the hazard concentrations for the other main distributions provided by ssdtools are plotted below, again with a mean of 2 and a standard deviation of 2 on the (natural) log concentration scale, or around 7.4 on the concentration scale.

```r
set.seed(7)
ssd_plot_cdf(ssd_match_moments(dists = c(
  "gamma",
  "gompertz", "lgumbel", "llogis",
  "lnorm", "weibull"
), meanlog = 2, sdlog = 2)) +
  theme(legend.position = "bottom")
```

## References

Aldenberg, T., and Slob, W. 1993. Confidence Limits for Hazardous Concentrations Based on Logistically Distributed NOEC Toxicity Data. Ecotoxicology and Environmental Safety 25(1): 48–63. https://doi.org/10.1006/eesa.1993.1006.

Burnham, K.P., and Anderson, D.R. 2002. Model Selection and Multimodel Inference. Springer New York, New York, NY. https://doi.org/10.1007/b97636.

Burr, I.W. 1942. Cumulative frequency functions. Annals of Mathematical Statistics 13(2): 215–232.

Dalgarno, D. 2018. ssdtools: A shiny web app to analyse species sensitivity distributions. Prepared by Poisson Consulting for the Ministry of the Environment, British Columbia. https://bcgov-env.shinyapps.io/ssdtools

Schwarz, C.J., and Tillmanns, A.R. 2019. Improving statistical methods to derive species sensitivity distributions. Water Science Series, WSS2019-07, Province of British Columbia, Victoria.

Shao, Q. 2000. Estimation for hazardous concentrations based on NOEC toxicity data: an alternative approach. : 13.