gfboost_vignette

library(gfboost)
#> This is mboost 2.9-3. See 'package?mboost' and 'news(package  = "mboost")'
#> for a complete list of changes.

The gfboost package extends the mboost package (Hothorn et al. (2017), Hofner et al. (2014), Hothorn et al. (2010), Bühlmann and Hothorn (2007), Hofner, Boccuto, and Göker (2015), Hothorn and Bühlmann (2006)) as it provides an implementation of a so-called ‘’gradient-free Gradient Boosting’’ algorithm that allows for the application of Boosting to non-differentiable and non-convex loss functions as well as a modified, loss-based Stability Selection variant (Werner (2019b), Werner and Ruckdeschel (2019)). The motivation behind this type of Boosting algorithm is the application of Boosting to ranking problems which suffer from complicated, non-continuous loss functions.

Preliminaries

Assumptions

We assume that our training set is given by a data matrix $$\mathcal{D}^{train} \in \mathbb{R}^{n \times (p+1)}$$ where $$n$$ is the number of observations and $$p$$ is the number of predictors, i.e., our data matrix can be written as $$\mathcal{D}^{train}=(X^{train},Y^{train})$$ for the regressor matrix $$X^{train} \in \mathbb{R}^{n \times p}$$ and the response vector $$Y^{train} \in \mathbb{R}^n$$.

The relation between the response and the regressors can be described by some map $$f: \mathbb{R}^p \mapsto \mathbb{R}$$ in the sense that $$y=f(x)+\epsilon$$ for some error term $$\epsilon$$ with mean zero and variance $$\sigma^2 \in ]0,\infty[$$.

The goal is to approximate $$f$$ by a sparse and stable model where ‘’stable’’ refers to the property that the set of selected predictors does only change insignificantly on similar data.

Boosting families

Our package uses the family objects from the package mboost to define new loss functions. We refer to mboost (Hothorn et al. (2017)) for more details. For our applications, we allow for non-differentiable loss functions like ranking losses, i.e., losses that do not provide a gradient, leaving the ngradient argument of the family objects empty. We provide some specific losses for ranking problems. As a simple example, we demonstrate the family Rank():

y<-c(-3, 10.3,-8, 12, 14,-0.5, 29,-1.1,-5.7, 119)
yhat<-c(0.02, 0.6, 0.1, 0.47, 0.82, 0.04, 0.77, 0.09, 0.01, 0.79)
Rank()@risk(y,yhat)
#> [1] 0.1777778

The Rank family implements the hard ranking loss (Clémençon, Lugosi, and Vayatis (2008)) which compares the ordering of the two inserted vectors. More precisely, it sums up all misrankings, i.e., all pairs $$(i,j)$$ where $$Y_i$$ is higher/lower than $$Y_j$$ but where $$\hat Y_i$$ is lower/higher than $$\hat Y_j$$. At the end, this sum is standardized such that a hard ranking loss of 0 indicates that there are no misrankings while a hard ranking loss of, say, 0.5, indicates that in the half of all such possible pairs a misranking occurs. We provide further families for the so-called localized and weak ranking loss (Clémençon and Vayatis (2007)), including a normalized variant for the latter.

For more details about ranking problems, see Clémençon, Lugosi, and Vayatis (2008) or Werner (2019a) as well as references therein.

In this section, we describe our gradient-free Gradient Boosting algorithm SingBoost which is based on $$L_2-$$Boosting and allows for loss functions that do not provide a gradient. We further show how to get coefficient paths for the variables and describe an aggregation procedure for SingBoost.

SingBoost

The main function of this package is the singboost function which is based on glmboost (Hothorn et al. (2017)). In this version of the package, we restricted ourselves to linear regression base learners.

The usage of singboost is very similar to that of glmboost. The main difference is that SingBoost includes so-called ‘’singular iterations’’. In contrast to the gradient descent step executed in standard Boosting algorithms where the baselearner that fits the current negative gradient vector best, the singular iterations correspond to a secant step, i.e., the least-squares baselearner that fits the current residuals, evaluated in the (possibly complicated) loss function, best is taken. Therefore, singboost allows for loss functions that do not provide a (unique) gradient, so even a family object that has an empty ngradient argument can be inserted as the family argument of the singboost function.

The motivation behind these singular iterations comes from a measure-theoretical perspective in the sense that the selection frequencies can be regarded as a ‘’column measure’’ (cf. Werner and Ruckdeschel (2019)). Clearly, different loss functions lead to different column measures and potentially to different sets of selected variables, i.e., the columns that are relevant for our target loss function, maybe a ranking loss, and which are not selected by some $$L-$$Boosting for another loss function $$L$$, may be identified with a ‘’singular part’’ of column measures (cf. Werner and Ruckdeschel (2019)). We always use $$L_2-$$Boosting (Bühlmann and Yu (2003), Bühlmann (2006)) as reference Boosting model. Nevertheless, we assume that $$L_2-$$Boosting already selects relevant variables which would make a Boosting algorithm only containing singular iterations and therefore being computationally quite expensive unnecessary. Therefore, SingBoost alternates between singular iterations and standard $$L_2-$$Boosting iterations where the frequency of the singular iterations can be set by the parameter M, i.e., every $$M-$$th iteration is a singular iteration. For the sake of a wide applicability, we do not exclude standard (i.e., differentiable) loss functions. Note that our implementation contains an additional LS argument indicating whether singular iterations are gradient- or secant-based, requiring to be set to TRUE if a non-differentiable loss function is used.

The further input arguments are the number of iterations $$m_{iter}$$ (miter), the learning rate $$\kappa$$ (kap) and the data set $$\mathcal{D}$$ (D). It is important to note that the inserted data set must not contain an intercept column since this column will be added automatically internally in singboost. If one does not want an intercept coefficient, one has to center the data beforehand. Furthermore, the last column must contain the response variable, so singboost always treats the last column as the response column and all other columns as regressor columns. Since a formula argument cannot be inserted, one has to generate the model matrix first if basic functions, interaction terms or categorical variables are included. For the particular case of localized ranking losses, the argument best controls how large the proportion of the best instances is.

Consider the following simple example:

glmres<-glmboost(Sepal.Length~.,iris)
glmres
#>
#>   Generalized Linear Models Fitted via Gradient Boosting
#>
#> Call:
#> glmboost.formula(formula = Sepal.Length ~ ., data = iris)
#>
#>
#>   Squared Error (Regression)
#>
#> Loss function: (y - f)^2
#>
#>
#> Number of boosting iterations: mstop = 100
#> Step size:  0.1
#> Offset:  5.843333
#>
#> Coefficients:
#>      (Intercept)      Sepal.Width     Petal.Length Speciesvirginica
#>      -3.36560001       0.53639026       0.46090783      -0.01924627
#> attr(,"offset")
#> [1] 5.843333
attributes(varimp(glmres))$self #> [1] 0.00 0.38 0.57 0.00 0.00 0.05 attributes(varimp(glmres))$var
#> [1] (Intercept)       Sepal.Width       Petal.Length      Petal.Width
#> [5] Speciesversicolor Speciesvirginica
#> 6 Levels: (Intercept) < Petal.Width < ... < Petal.Length
firis<-as.formula(Sepal.Length~.)
Xiris<-model.matrix(firis,iris)
Diris<-data.frame(Xiris[,-1],iris$Sepal.Length) colnames(Diris)[6]<-"Y" coef(glmboost(Xiris,iris$Sepal.Length))
#>      (Intercept)      Sepal.Width     Petal.Length Speciesvirginica
#>      -3.36560001       0.53639026       0.46090783      -0.01924627
#> attr(,"offset")
#> [1] 5.843333
singboost(Diris)
#> $Selected variables #> [1] "Petal.Length>=Sepal.Width>=Speciesvirginica" #> #>$Coefficients
#> [1]  2.47773332  0.53639026  0.46090783  0.00000000  0.00000000 -0.01924627
#>
#> $Freqs #> [1] 0.00 0.38 0.57 0.00 0.00 0.05 #> #>$VarCoef
#>        Intercept      Sepal.Width     Petal.Length Speciesvirginica
#>       2.47773332       0.53639026       0.46090783      -0.01924627
singboost(Diris,LS=TRUE)
#> $Selected variables #> [1] "Petal.Length>=Sepal.Width>=Speciesvirginica" #> #>$Coefficients
#> [1]  2.47773332  0.53639026  0.46090783  0.00000000  0.00000000 -0.01924627
#>
#> $Freqs #> [1] 0.00 0.38 0.57 0.00 0.00 0.05 #> #>$VarCoef
#>        Intercept      Sepal.Width     Petal.Length Speciesvirginica
#>       2.47773332       0.53639026       0.46090783      -0.01924627

The application to the iris data set should demonstrate that $$L_2-$$Boosting is a special instance of our SingBoost algorithm (Werner and Ruckdeschel (2019)). More precisely, using the squared loss function, our SingBoost algorithm results in exactly the same model and coefficients, provided that the hyperparameters are identical. We did not specify them explicitly in the example, but the number of iterations in singboost is $$m_{iter}=100$$ per default and the learning rate is $$\kappa=0.1$$, mimicking the default setting for these parameters for $$L_2-$$Boosting in mboost (Hothorn et al. (2017)). Note that we did not explicitly insert a Boosting family. Per default, singboost uses Gaussian() which corresponds to standard $$L_2-$$Boosting. The LS argument indicates that we indeed perform so-called singular iterations. Clearly, for the squared loss function, both results are the same since $$L_2-$$Boosting with least-squares baselearners implicitly also takes the best baselearner in the sense that it fits the current residuals best, evaluated in the squared loss (and which is implemented however much more sophisticatedly in mboost by using correlation updates), see e.g. Zhao and Yu (2004), Zhao and Yu (2007). For clarity, we manually renamed the response variable Sepal.Length as Y in advance in the previous example (which is not necessary). singboost reports the selected variables, the coefficients and the selection frequencies, i.e., the relative number of iterations where a particular predictor has been selected. Note that singboost does not report the offset and the intercept separately.

singboost of course also allows for categorical variables, interaction terms etc.. See the following example:

glmres2<-glmboost(Sepal.Length~Petal.Length+Sepal.Width:Species,iris)
finter<-as.formula(Sepal.Length~Petal.Length+Sepal.Width:Species-1)
Xinter<-model.matrix(finter,iris)
Dinter<-data.frame(Xinter,iris$Sepal.Length) singboost(Dinter) #>$Selected variables
#> [1] "Petal.Length>=Sepal.Width.Speciessetosa>=Sepal.Width.Speciesvirginica"
#>
#> $Coefficients #> [1] 4.00831289 0.45662497 0.08896750 0.00000000 0.01751541 #> #>$Freqs
#> [1] 0.00 0.60 0.36 0.00 0.04
#>
#> $VarCoef #> Intercept Petal.Length #> 4.00831289 0.45662497 #> Sepal.Width.Speciessetosa Sepal.Width.Speciesvirginica #> 0.08896750 0.01751541 coef(glmres2) #> (Intercept) Petal.Length #> -1.83502045 0.45662497 #> Sepal.Width:Speciessetosa Sepal.Width:Speciesvirginica #> 0.08896750 0.01751541 #> attr(,"offset") #> [1] 5.843333 The example above shows how to deal with interaction terms. In fact, instead of inserting a formula argument as input for singboost, build the model matrix (without the intercept column) and enter this matrix as input argument. The same procedure is valid for categorical variables or basis functions. Note that we did not yet implement a group structure, i.e., we always treat all columns separately and do not perform block-wise updates as suggested in Gertheiss et al. (2011). We now demonstrate how to apply SingBoost for quantile regression and hard ranking regression: glmquant<-glmboost(Sepal.Length~.,iris,family=QuantReg(tau=0.75)) coef(glmquant) #> (Intercept) Sepal.Width Petal.Length #> -3.1274211 0.5191849 0.4756341 #> attr(,"offset") #> 50% #> 5.8 attributes(varimp(glmquant))$self
#> [1] 0.24 0.27 0.49 0.00 0.00 0.00
singboost(Diris,singfamily=QuantReg(tau=0.75),LS=TRUE)
#> $Selected variables #> [1] "Petal.Length>=Sepal.Width>=Speciesvirginica" #> #>$Coefficients
#> [1]  2.47836568  0.53612658  0.46095525  0.00000000  0.00000000 -0.01925953
#>
#> $Freqs #> [1] 0.00 0.38 0.57 0.00 0.00 0.05 #> #>$VarCoef
#>        Intercept      Sepal.Width     Petal.Length Speciesvirginica
#>       2.47836568       0.53612658       0.46095525      -0.01925953
singboost(Diris,singfamily=QuantReg(tau=0.75),LS=TRUE,M=2)
#> $Selected variables #> [1] "Petal.Length>=Sepal.Width>=Speciesvirginica" #> #>$Coefficients
#> [1]  2.44900980  0.54702733  0.45925162  0.00000000  0.00000000 -0.01196688
#>
#> $Freqs #> [1] 0.00 0.42 0.55 0.00 0.00 0.03 #> #>$VarCoef
#>        Intercept      Sepal.Width     Petal.Length Speciesvirginica
#>       2.44900980       0.54702733       0.45925162      -0.01196688
singboost(Diris,singfamily=Rank(),LS=TRUE)
#> $Selected variables #> [1] "Petal.Length>=Sepal.Width>=Speciesvirginica>=Speciesversicolor" #> #>$Coefficients
#> [1]  2.485989684  0.534163798  0.460436981  0.000000000  0.000344599
#> [6] -0.018630537
#>
#> $Freqs #> [1] 0.00 0.38 0.56 0.00 0.01 0.05 #> #>$VarCoef
#>         Intercept       Sepal.Width      Petal.Length Speciesversicolor
#>       2.485989684       0.534163798       0.460436981       0.000344599
#>  Speciesvirginica
#>      -0.018630537

The last simulation has the argument M=2 which indicates that we alternate between singular and $$L_2-$$Boosting iterations where the second simulation uses the default M=10.

More details concerning the implementation of the SingBoost algorithm can be found in Werner (2019b).

Coefficient paths

If coefficient paths have to be drawn, use the path.singboost function followed by the singboost.plot function instead of the singboost function since only path.singboost also saves the intercept and coefficients paths. In the singboost.plot function, $$M$$ and $$m_{iter}$$ have to be inserted again:

singpath<-path.singboost(Diris)
singboost.plot(singpath,10,100)

Column Measure Boosting (CMB)

CMB intends to aggregate SingBoost models that have been fitted on subsamples of the training data. The main motivation behind this idea is that SingBoost is expected to detect relevant variables for the respective (complicated) loss function that $$L_2-$$Boosting would not select. From a measure-theoretical perspective, these columns may be identified with a ‘’singular part’’ (cf. Werner and Ruckdeschel (2019)). In order to propose the opportunity to study this singular part for specific loss functions, we provide the CMB procedure that aggregates SingBoost models in order to get a more ‘’stable’’ singular part. Note that the CMB procedure does not replace a Stability Selection. Furthermore, if one is only interested in getting a suitable stable model, we recommend to only use a Stability Selection without the CMB procedure by setting Bsing=1 and nsing=ncmb when using the CMB3S function (see below) for the sake of computational costs.

The main idea is to draw $$B^{sing}$$ (Bsing) subsamples with $$n_{sing}$$ (nsing) rows from the training set and to compute the singboost model on each. Then, the models are aggregated where different aggregation procedures can be used, i.e., either a simple average (weights1) or a weighted average based on the reciprocal of the out-of-sample loss (weights2) computed on the complement of the current subsample.

Let us however provide an example for CMB:

set.seed(19931023)
cmb1<-CMB(Diris,nsing=100,Bsing=50,alpha=0.8,singfam=Rank(),
evalfam=Rank(),sing=TRUE,M=10,m_iter=100,
kap=0.1,LS=TRUE,wagg='weights1',robagg=FALSE,lower=0)
cmb1
#> $Column measure #> [1] 0.000 1.000 1.000 0.325 0.500 0.775 #> #>$Selected variables
#> [1] "Sepal.Width>=Petal.Length>=Speciesvirginica>=Speciesversicolor>=Petal.Width"
#>
#> $Variable names #> [1] "Intercept" "Sepal.Width" "Petal.Length" #> [4] "Petal.Width" "Speciesversicolor" "Speciesvirginica" #> [7] "Y" #> #>$Row measure
#>   [1] 0.725 0.650 0.600 0.650 0.750 0.650 0.775 0.700 0.600 0.500 0.725 0.725
#>  [13] 0.675 0.650 0.675 0.700 0.725 0.600 0.575 0.600 0.625 0.750 0.650 0.675
#>  [25] 0.650 0.650 0.750 0.625 0.625 0.650 0.675 0.700 0.700 0.725 0.600 0.600
#>  [37] 0.550 0.675 0.625 0.725 0.700 0.675 0.625 0.600 0.650 0.650 0.650 0.675
#>  [49] 0.825 0.525 0.525 0.750 0.625 0.600 0.625 0.750 0.600 0.700 0.700 0.625
#>  [61] 0.575 0.675 0.625 0.475 0.775 0.600 0.725 0.700 0.675 0.550 0.575 0.700
#>  [73] 0.600 0.700 0.775 0.625 0.700 0.700 0.525 0.700 0.800 0.700 0.650 0.625
#>  [85] 0.650 0.825 0.675 0.725 0.675 0.675 0.800 0.775 0.675 0.650 0.700 0.725
#>  [97] 0.400 0.700 0.775 0.775 0.750 0.625 0.700 0.650 0.650 0.675 0.725 0.725
#> [109] 0.700 0.700 0.625 0.650 0.675 0.600 0.650 0.650 0.525 0.650 0.650 0.650
#> [121] 0.625 0.700 0.600 0.625 0.575 0.675 0.750 0.700 0.700 0.575 0.700 0.650
#> [133] 0.725 0.725 0.700 0.725 0.775 0.700 0.725 0.650 0.725 0.700 0.750 0.525
#> [145] 0.650 0.625 0.725 0.650 0.600 0.725

In this example, we apply SingBoost with the hard ranking loss (inserted through singfam=Rank()) to $$B^{sing}=50$$ subsamples with $$n_{sing}=100$$ rows from the iris data set each. After computing the SingBoost models, their out-of-sample performance is evaluated based on the complement of the subsample used for training (and thus always contains 50 observations) using again the hard ranking loss (inserted through evalfam=Rank()). The parameter alpha=0.8 indicates that we only use the best 80% (i.e., the best 40) of the SingBoost models for aggregation and wagg=‘weights1’ leads to a simple average of the selection frequencies. Finally, the aggregated selection frequencies (which form an empirical aggregated ‘’column measure’‘) are reported as well as the selected variables. The last part of the output, the empirical aggregated’‘row measure’‘, reports the’‘selection frequencies’’ for the rows. This is not trivial since these frequencies are not just the relative numbers of occurences of the respective rows in the training sets due to the partitioning but also contain information about the ‘’quality’’ of the instances since the relative numbers of occurences in the training sets corresponding to the best models are computed.

A loss-based Stability Selection

$$L_2-$$Boosting is known to suffer from overfitting. This drawback is not limited to $$L_2-$$Boosting, but also affects the Lasso, other Boosting models and therefore also SingBoost. Usually, one applies the Stability Selection from Meinshausen and Bühlmann (2010) (for Lasso and its variants) resp. from Hofner, Boccuto, and Göker (2015) (for Boosting models) by computing the models on subsamples of the training set and by selecting a set of stable variables according to some threshold. Motivated by the fact that our SingBoost essentially combines $$L_2-$$Boosting and a secant step according to some other target loss function, we propose a Stability Selection that is based on the performance of the models on some validation set, evaluated in this target loss function.

The main issue concerning the standard Stability Selection is that two of three parameters (number of variables per model, PFER and threshold) have to be inserted in the function stabsel available in the package stabs (Hofner and Hothorn (2017)). However, since the recommendations for these parameters do not necessarily need to hold for SingBoost models, our idea is to define a grid of thresholds or a grid of cardinalities (number of variables in the final (stable) model). Then, we compute SingBoost models on $$B$$ subsamples with $$n_{cmb}$$ (ncmb) rows from the training set and average the selection frequencies. Now, having a grid of thresholds (indicated by setting gridtype=‘pigrid’), we compute the stable model for each threshold by taking the variables whose aggregated selection frequencies are at least as high as the threshold and evaluate their performances on a validation set according to our target loss function. To this end, we either compute SingBoost or least squares coefficients on the reduced data set which only contains the regressor columns corresponding to the stable variables. The stable model with the best validation performance (and with it, implicitly the optimal threshold) is finally reported. Note that the main issue that we learned when working with thresholds is that once the signal to noise ratio is rather low, it frequently happens that no variable passes the threshold which leads to an empty model, see Werner (2019b). Since we never think of an empty model as a reasonable model, we propose to alternatively define a grid of numbers of variables in the final model such that for each number, say $$q$$ (not to confuse with the $$q$$ in Hofner’s Stability Selection where this variable indicates the number of variables selected in each Boosting model), the $$q$$ variables with the highest aggregated selection frequencies enter the stable model (using this type of grid is indicated by setting gridtype=‘qgrid’). The selection of the best model and therefore, the optimal number of variables, is again based on the validation performance.

Note that although our Stability Selection requires the arguments nsing and Bsing, one can easily use it without the CMB aggregation procedure by specifying $$n_{sing}=n_{cmb}$$ and $$B^{sing}=1$$. This is important if one wants to compare our Stability Selection directly with the Stability Selection of Hofner, Boccuto, and Göker (2015).

On the other hand, the aggregation paradigms that we already mentioned in the CMB subsection can be directly applied when aggregating the SingBoost models for the Stability Selection. Per default, our CMB3S function averages the selection frequencies across the models. If a weighted aggregation or the aggregation of only the best models is desired, set $$n_{train}=n_{cmb}$$, $$B=1$$, $$B^{sing}>1$$ and $$n_{sing}<n_{cmb}$$. Effectively, $$B^{sing}$$ subsamples with $$n_{sing}$$ observations each are drawn from the training set, but the aggregation schemes that we mentioned above are available through the arguments alpha and wagg.

Let us now consider the following illustrative example:

set.seed(19931023)
ind<-sample(1:150,120,replace=FALSE)
Dtrain<-Diris[ind,]
Dvalid<-Diris[-ind,]
set.seed(19931023)
cmb3s1<-CMB3S(Dtrain,nsing=80,Dvalid=Dvalid,ncmb=80,Bsing=1,B=100,alpha=0.5,singfam=Gaussian(),
evalfam=Gaussian(),sing=FALSE,M=10,m_iter=100,kap=0.1,LS=FALSE,wagg='weights1',gridtype='pigrid',
grid=seq(0.8,0.9,1),robagg=FALSE,lower=0,singcoef=TRUE,Mfinal=10)
cmb3s1$Fin #> Intercept Sepal.Width Petal.Length #> 2.2572372 0.5979883 0.4636778 cmb3s1$Stab
#> [1] 0.00 1.00 1.00 0.22 0.41 0.52

We ran the CMB3S function where CMB-3S is the acronym for ‘’Column Measure Boosting with SingBoost and Stability Selection’’. We divided the data set into a training set and a validation set. From the training set containing 120 instances, we draw $$B=100$$ times a subsample containing $$n_{cmb}=80$$ instances. On each subsample, we compute the $$L_2-$$Boosting model (since singfam=Gaussian()). Having aggregated the SingBoost selection frequencies and the overall CMB selection frequencies, the 30 instances that were not contained in the initial training set form the validation set for the Stability Selection. We used a grid of thresholds since gridtype=‘pigrid’ and used the thresholds 0.8, 0.9 and 1 (inserted as the grid argument), so for each threshold, the stable model is computed as well as its performance on the validation set. To this end, the required coefficients are computed by SingBoost (since singcoef=TRUE) where each 10-th iteration is a singular iteration (due to Mfinal=10) according to the squared loss (since singfam=Gaussian()). The validation loss is also the squared loss since evalfam=Gaussian(). The CMB3S function outputs the aggregated selection frequencies for each variable as well as the finally selected stable variables. Note that only Sepal.Width and Petal.Length are selected since only these two variables pass the optimal threshold.

Alternatively, one can run the same simulation with a grid of cardinalities by just changing the input arguments gridtype and grid:

set.seed(19931023)
ind<-sample(1:150,120,replace=FALSE)
Dtrain<-Diris[ind,]
Dvalid<-Diris[-ind,]
set.seed(19931023)
cmb3s2<-CMB3S(Dtrain,nsing=80,Dvalid=Dvalid,ncmb=80,Bsing=1,B=100,alpha=0.5,singfam=Gaussian(),
evalfam=Gaussian(),sing=FALSE,M=10,m_iter=100,kap=0.1,LS=FALSE,wagg='weights1',gridtype='qgrid',
grid=seq(1,2,3),robagg=FALSE,lower=0,singcoef=TRUE,Mfinal=10)
cmb3s2$Fin #> Intercept Sepal.Width Petal.Length #> 2.2572372 0.5979883 0.4636778 cmb3s2$Stab
#> [1] 0.00 1.00 1.00 0.22 0.41 0.52

One can see that the optimal number of final variables turned out to be 2, so the stable predictor set is the same as in the previous example.

Optionally, we also implemented a function CV.CMB3S which takes into account that the initial data set is randomly divided into a training and a validation set. Therefore, the CV.CMB-3S procedure cross-validates the Stability Selection itself by using multiple partitions of the initial data set. The main reason to use this function is to compare the performance of our stable models against some competitor models, so the initial data set is not only partitioned into training and validation sets but also a test set is drawn such that theese three sets form a partition of the initial data set. The cross-validated Stability Selection reports the cross-validated loss, evaluated in the target loss function (inserted through targetfam). We strongly recommend to parallelize the cross-validated Stability Selection due to huge comptuational costs.

References

Bühlmann, Peter. 2006. “Boosting for High-Dimensional Linear Models.” The Annals of Statistics, 559–83.

Bühlmann, Peter, and Torsten Hothorn. 2007. “Boosting Algorithms: Regularization, Prediction and Model Fitting.” Statistical Science, 477–505.

Bühlmann, Peter, and Bin Yu. 2003. “Boosting with the $$L_2$$ Loss: Regression and Classification.” Journal of the American Statistical Association 98 (462): 324–39.

Clémençon, S., G. Lugosi, and N. Vayatis. 2008. “Ranking and Empirical Minimization of U-Statistics.” The Annals of Statistics, 844–74.

Clémençon, Stéphan, and Nicolas Vayatis. 2007. “Ranking the Best Instances.” Journal of Machine Learning Research 8 (Dec): 2671–99.

Gertheiss, Jan, Sara Hogger, Cornelia Oberhauser, and Gerhard Tutz. 2011. “Selection of Ordinally Scaled Independent Variables with Applications to International Classification of Functioning Core Sets.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 60 (3): 377–95.

Hofner, Benjamin, Luigi Boccuto, and Markus Göker. 2015. “Controlling False Discoveries in High-Dimensional Situations: Boosting with Stability Selection.” BMC Bioinformatics 16 (1): 144.

Hofner, Benjamin, and Torsten Hothorn. 2017. stabs: Stability Selection with Error Control. https://CRAN.R-project.org/package=stabs.

Hofner, Benjamin, Andreas Mayr, Nikolay Robinzonov, and Matthias Schmid. 2014. “Model-Based Boosting in R: A Hands-on Tutorial Using the R Package Mboost.” Computational Statistics 29 (1-2): 3–35.

Hothorn, Torsten, and Peter Bühlmann. 2006. “Model-Based Boosting in High Dimensions.” Bioinformatics 22 (22): 2828–9.

Hothorn, Torsten, Peter Bühlmann, Thomas Kneib, Matthias Schmid, and Benjamin Hofner. 2010. “Model-Based Boosting 2.0.” Journal of Machine Learning Research 11 (Aug): 2109–13.

———. 2017. mboost: Model-Based Boosting. https://CRAN.R-project.org/package=mboost.

Meinshausen, Nicolai, and Peter Bühlmann. 2010. “Stability Selection.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72 (4): 417–73.

Werner, Tino. 2019a. “A Review on Ranking Problems in Statistical Learning.”

———. 2019b. “Gradient-Free Gradient Boosting.” PhD thesis, Carl von Ossietzky Universität Oldenburg.

Werner, Tino, and Peter Ruckdeschel. 2019. “The Column Measure and Gradient-Free Gradient Boosting.” Submitted. Available on ArXiv, ArXiv: 1909.10960.

Zhao, Peng, and Bin Yu. 2004. “Boosted Lasso.” California University Berkeley Departement of Statistics.

———. 2007. “Stagewise Lasso.” Journal of Machine Learning Research 8 (Dec): 2701–26.