Along with methods for improving the estimation of generalized linear models (see the `iteration` vignette), **brglm2** provides *pre-fit* and *post-fit* methods for the detection of separation and of infinite maximum likelihood estimates in binomial response generalized linear models.

The key methods are `detect_separation` and `check_infinite_estimates`, and this vignette describes their use.

**Note:** `detect_separation` and `check_infinite_estimates` will be removed from **brglm2** at version 0.8. New versions are now maintained in the **detectseparation** R package at <https://cran.r-project.org/package=detectseparation>. To use the versions in **detectseparation**, load **brglm2** first and then **detectseparation**, i.e. `library(brglm2); library(detectseparation)`.

Heinze and Schemper (2002) used a logistic regression model to analyze data from a study on endometrial cancer. Agresti (2015, Section 5.7) provides details on the data set. Below, we fit a probit regression model with the same linear predictor as the logistic regression model in Heinze and Schemper (2002).

```
library("brglm2")
data("endometrial", package = "brglm2")
modML <- glm(HG ~ NV + PI + EH, family = binomial("probit"), data = endometrial)
theta_mle <- coef(modML)
summary(modML)
#>
#> Call:
#> glm(formula = HG ~ NV + PI + EH, family = binomial("probit"),
#> data = endometrial)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -1.47007 -0.67917 -0.32978 0.00008 2.74898
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 2.18093 0.85732 2.544 0.010963 *
#> NV 5.80468 402.23641 0.014 0.988486
#> PI -0.01886 0.02360 -0.799 0.424066
#> EH -1.52576 0.43308 -3.523 0.000427 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 104.90 on 78 degrees of freedom
#> Residual deviance: 56.47 on 75 degrees of freedom
#> AIC: 64.47
#>
#> Number of Fisher Scoring iterations: 17
```

As is the case for the logistic regression in Heinze and Schemper (2002), the maximum likelihood (ML) estimate of the parameter for `NV` is actually infinite. The reported, apparently finite value is merely due to false convergence of the iterative estimation procedure. The same is true for the estimated standard error and, hence, the value 0.014 for the \(z\)-statistic cannot be trusted for inference on the size of the effect for `NV`.

Lesaffre and Albert (1989, Section 4) describe a procedure that can hint at the occurrence of infinite estimates. In particular, the model is successively refitted, increasing the maximum number of allowed IWLS iterations at each step. At each step, the estimated asymptotic standard errors are divided by the corresponding ones from the first fit. If the sequence of ratios diverges, then the maximum likelihood estimate of the corresponding parameter is minus or plus infinity. The following code chunk applies this process to `modML`.

```
check_infinite_estimates(modML)
#> Warning: 'check_infinite_estimates' will be removed from 'brglm2' at version
#> 0.8. A new version of 'check_infinite_estimates' is now maintained in the
#> 'detectseparation' package.
#> (Intercept) NV PI EH
#> [1,] 1.000000 1.000000e+00 1.000000 1.000000
#> [2,] 1.320822 1.710954e+00 1.352585 1.523591
#> [3,] 1.410578 5.468665e+00 1.483265 1.617198
#> [4,] 1.413505 3.225927e+01 1.490753 1.618462
#> [5,] 1.413559 3.311464e+02 1.490923 1.618484
#> [6,] 1.413560 5.786011e+03 1.490924 1.618484
#> [7,] 1.413560 1.704229e+05 1.490924 1.618484
#> [8,] 1.413560 3.232598e+06 1.490924 1.618484
#> [9,] 1.413560 5.402789e+06 1.490924 1.618484
#> [10,] 1.413560 9.295902e+06 1.490924 1.618484
#> [11,] 1.413560 2.290248e+07 1.490924 1.618484
#> [12,] 1.413560 3.953686e+07 1.490924 1.618484
#> [13,] 1.413560 3.953686e+07 1.490924 1.618484
#> [14,] 1.413560 3.953686e+07 1.490924 1.618484
#> [15,] 1.413560 3.953686e+07 1.490924 1.618484
#> [16,] 1.413560 3.953686e+07 1.490924 1.618484
#> [17,] 1.413560 3.953686e+07 1.490924 1.618484
#> [18,] 1.413560 3.953686e+07 1.490924 1.618484
#> [19,] 1.413560 3.953686e+07 1.490924 1.618484
#> [20,] 1.413560 3.953686e+07 1.490924 1.618484
```

Clearly, the ratios of estimated standard errors diverge for `NV`.
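The refitting procedure described above can be sketched in base R. This is not the implementation of `check_infinite_estimates`, only a minimal illustration of the idea. The function `se_ratios` and the data frame `toy` are hypothetical names introduced here, and the toy data are constructed so that every observation with `x1 == 1` has `y == 1`, while the responses overlap in `x2`.

```r
# Hypothetical toy data: every observation with x1 == 1 has y == 1, while the
# responses overlap in x2, so only the x1 parameter has an infinite ML estimate.
toy <- data.frame(
  x1 = c(0, 0, 0, 0, 0, 0, 1, 1, 1),
  x2 = c(1, 2, 3, 4, 5, 6, 2, 3, 4),
  y  = c(0, 1, 0, 1, 0, 1, 1, 1, 1)
)

# Refit the model allowing k = 1, ..., nsteps IWLS iterations and return the
# standard errors from each fit divided by those from the first fit.
se_ratios <- function(formula, data, family = binomial("logit"), nsteps = 12) {
  se <- NULL
  for (k in seq_len(nsteps)) {
    # A tiny epsilon stops glm from declaring convergence early; the
    # non-convergence warnings are expected and therefore suppressed.
    fit <- suppressWarnings(
      glm(formula, family = family, data = data,
          control = glm.control(maxit = k, epsilon = .Machine$double.eps))
    )
    se <- rbind(se, coef(summary(fit))[, "Std. Error"])
  }
  sweep(se, 2, se[1, ], "/")
}

ratios <- se_ratios(y ~ x1 + x2, data = toy)
round(ratios, 3)
```

In this sketch the column for `x1` should keep growing with the number of allowed iterations, while the columns for `(Intercept)` and `x2` settle to finite values, mirroring the pattern seen for `NV` above.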

`detect_separation` tests for the occurrence of complete or quasi-complete separation in datasets for binomial response generalized linear models, and finds which of the parameters will have infinite maximum likelihood estimates. `detect_separation` relies on the linear programming methods developed in Konis (2007).
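For intuition about what quasi-complete separation looks like in the simplest case, the sketch below cross-tabulates a binary covariate against the response. It is not the linear programming method of Konis (2007), which handles arbitrary linear predictors; it only shows the special case where one level of a binary covariate is observed with a single response value. The data frame `toy` and its columns are hypothetical.

```r
# Hypothetical toy data: all observations with x1 == 1 have y == 1.
toy <- data.frame(
  x1 = c(0, 0, 0, 0, 0, 0, 1, 1, 1),
  y  = c(0, 1, 0, 1, 0, 1, 1, 1, 1)
)

# Cross-tabulate the covariate against the response.
tab <- table(x1 = toy$x1, y = toy$y)
tab

# An empty cell means that one level of x1 occurs with only one response
# value, which implies quasi-complete separation in a model that includes
# an intercept and x1.
has_empty_cell <- any(tab == 0)
has_empty_cell
```

A check of this kind misses separation that arises only through a combination of covariates, which is why a general-purpose method such as `detect_separation` is needed in practice.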

`detect_separation` is a *pre-fit* method, in the sense that it does not need to estimate the model to detect separation and/or identify infinite estimates. For example:

```
endometrial_sep <- glm(HG ~ NV + PI + EH, data = endometrial,
                       family = binomial("logit"),
                       method = "detect_separation")
#> Warning: 'detect_separation' will be removed from 'brglm2' at version 0.8. A
#> new version of 'detect_separation' is now maintained in the 'detectseparation'
#> package.
endometrial_sep
#> Separation: TRUE
#> Existence of maximum likelihood estimates
#> (Intercept) NV PI EH
#> 0 Inf 0 0
#> 0: finite value, Inf: infinity, -Inf: -infinity
```

The `detect_separation` method reports that there is separation in the data, that the estimates for `(Intercept)`, `PI` and `EH` are finite (coded 0), and that the estimate for `NV` is plus infinity. So, the actual maximum likelihood estimates are

```
coef(modML) + endometrial_sep$betas
#> (Intercept) NV PI EH
#> 2.18092821 Inf -0.01886444 -1.52576146
```

and the estimated standard errors are
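A minimal sketch of that calculation, assuming the standard errors follow the same pattern as the coefficients above, so that a parameter with an infinite estimate also has an infinite standard error. The values are copied from the printed output of `summary(modML)` and `endometrial_sep` above rather than recomputed, and the names `se_reported`, `sep_codes` and `se_actual` are illustrative.

```r
# Standard errors as printed by summary(modML) above
se_reported <- c("(Intercept)" = 0.85732, NV = 402.23641,
                 PI = 0.02360, EH = 0.43308)

# Codes as printed by endometrial_sep above
# (0: finite value, Inf: infinity, -Inf: -infinity)
sep_codes <- c("(Intercept)" = 0, NV = Inf, PI = 0, EH = 0)

# Adding the absolute value of the codes leaves finite entries unchanged
# and turns the entry for any infinite estimate into Inf.
se_actual <- se_reported + abs(sep_codes)
se_actual
```

With the fitted objects at hand, the same idea would replace `se_reported` with `coef(summary(modML))[, "Std. Error"]` and `sep_codes` with `endometrial_sep$betas`.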

If you found this vignette, or **brglm2** in general, useful, please consider citing **brglm2** and the associated paper. You can find information on how to do this by typing `citation("brglm2")`.

Agresti, A. 2015. *Foundations of Linear and Generalized Linear Models*. Wiley Series in Probability and Statistics. Wiley.

Heinze, G., and M. Schemper. 2002. “A Solution to the Problem of Separation in Logistic Regression.” *Statistics in Medicine* 21: 2409–19.

Konis, Kjell. 2007. “Linear Programming Algorithms for Detecting Separated Data in Binary Logistic Regression Models.” DPhil, University of Oxford. https://ora.ox.ac.uk/objects/uuid:8f9ee0d0-d78e-4101-9ab4-f9cbceed2a2a.

Lesaffre, E., and A. Albert. 1989. “Partial Separation in Logistic Discrimination.” *Journal of the Royal Statistical Society. Series B (Methodological)* 51 (1): 109–16. http://www.jstor.org/stable/2345845.