R/sl3: modern Super Learning with pipelines
A modern implementation of the Super Learner algorithm for ensemble learning and model stacking
Authors: Jeremy Coyle, Nima Hejazi, Ivana Malenica, Oleg Sofrygin
What’s sl3?
sl3 is a modern implementation of the Super Learner algorithm of van
der Laan, Polley, and Hubbard (2007). The Super Learner algorithm
performs ensemble learning in one of two fashions:
- The “discrete” Super Learner can be used to select the best prediction algorithm from a supplied library of learning algorithms (“learners” in the sl3 nomenclature) – that is, the algorithm that minimizes the cross-validated risk with respect to an appropriate loss function.
- The “ensemble” Super Learner can be used to assign weights to the learning algorithms in a user-supplied library in order to create a combination of these learners that minimizes the cross-validated risk with respect to an appropriate loss function. This notion of weighted combinations has also been called stacked regression (Breiman 1996). A brief code sketch contrasting the two variants follows this list.
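To make the distinction concrete, here is a minimal sketch of how the two flavors can be expressed as sl3 learner objects. It is illustrative rather than part of the worked example below, and it assumes the Lrnr_sl, Lrnr_cv_selector, Lrnr_nnls, and loss_squared_error objects as found in recent sl3 releases (older snapshots may differ):
library(sl3)
# a small library of candidate learners
lrnr_library <- Stack$new(Lrnr_mean$new(), Lrnr_glm$new())
# discrete Super Learner: select the single learner with the lowest
# cross-validated risk (squared-error loss here)
discrete_sl <- Lrnr_sl$new(
  learners = lrnr_library,
  metalearner = Lrnr_cv_selector$new(eval_function = loss_squared_error)
)
# ensemble Super Learner: learn non-negative weights combining the learners
# so that the weighted combination minimizes the cross-validated risk
ensemble_sl <- Lrnr_sl$new(
  learners = lrnr_library,
  metalearner = Lrnr_nnls$new()
)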
Installation
Install the most recent stable release from GitHub via
devtools:
devtools::install_github("jeremyrcoyle/sl3")
Issues
If you encounter any bugs or have any specific feature requests, please file an issue.
Examples
sl3 makes the process of applying screening algorithms, learning
algorithms, combining both types of algorithms into a stacked regression
model, and cross-validating this whole process essentially trivial. The
best way to understand this is to see the sl3 package in action:
set.seed(49753)
suppressMessages(library(data.table))
library(dplyr)
library(SuperLearner)
#> Loading required package: nnls
#> Super Learner
#> Version: 2.0-23-9000
#> Package created on 2017-11-07
library(origami)
#> origami: Generalized Cross-Validation Framework
#> Version: 0.8.2
library(sl3)
# load example data set
data(cpp)
cpp <- cpp %>%
dplyr::filter(!is.na(haz)) %>%
mutate_all(funs(replace(., is.na(.), 0)))
# use covariates of interest and the outcome to build a task object
covars <- c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs",
"sexn")
task <- sl3_Task$new(cpp, covariates = covars, outcome = "haz")
# set up screeners and learners via built-in functions and pipelines
slscreener <- Lrnr_pkg_SuperLearner_screener$new("screen.glmnet")
glm_learner <- Lrnr_glm$new()
screen_and_glm <- Pipeline$new(slscreener, glm_learner)
SL.glmnet_learner <- Lrnr_pkg_SuperLearner$new(SL_wrapper = "SL.glmnet")
# stack learners into a model (including screeners and pipelines)
learner_stack <- Stack$new(SL.glmnet_learner, glm_learner, screen_and_glm)
stack_fit <- learner_stack$train(task)
#> Loading required package: glmnet
#> Loading required package: Matrix
#>
#> Attaching package: 'Matrix'
#> The following object is masked from 'package:tidyr':
#>
#> expand
#> Loading required package: foreach
#>
#> Attaching package: 'foreach'
#> The following objects are masked from 'package:purrr':
#>
#> accumulate, when
#> Loaded glmnet 2.0-13
preds <- stack_fit$predict()
head(preds)
#> Lrnr_pkg_SuperLearner_SL.glmnet Lrnr_glm
#> 1: 0.35345519 0.36298498
#> 2: 0.35345519 0.36298498
#> 3: 0.24554305 0.25993072
#> 4: 0.24554305 0.25993072
#> 5: 0.24554305 0.25993072
#> 6: 0.02953193 0.05680264
#> Lrnr_pkg_SuperLearner_screener_screen.glmnet___Lrnr_glm
#> 1: 0.36228209
#> 2: 0.36228209
#> 3: 0.25870995
#> 4: 0.25870995
#> 5: 0.25870995
#> 6: 0.05600958
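The stack above returns side-by-side predictions from each learner; wrapping it in a full Super Learner cross-validates the stack and estimates ensemble weights. The snippet below is a hedged sketch rather than part of the original example: it reuses the task and learner_stack objects defined above and assumes the Lrnr_sl and Lrnr_nnls classes as found in recent sl3 releases.
# hedged sketch: wrap the stack above in an ensemble Super Learner
# (class names as in recent sl3 releases; older snapshots may differ)
metalearner <- Lrnr_nnls$new()               # non-negative least squares weights
sl <- Lrnr_sl$new(learners = learner_stack,  # the stack defined above
                  metalearner = metalearner)
sl_fit <- sl$train(task)                     # internally cross-validates the stack
sl_preds <- sl_fit$predict()                 # ensemble predictions for the task
head(sl_preds)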
Contributions
It is our hope that sl3 will grow to be widely used for creating
stacked regression models and the cross-validation of pipelines that
make up such models, as well as the variety of other applications in
which the Super Learner algorithm plays a role. To that end,
contributions are very welcome, though we ask that interested
contributors consult our contribution guidelines
prior to submitting a pull request.
After using the sl3 R package, please cite the following:
@misc{coyle2017sl3,
author = {Coyle, Jeremy R and Hejazi, Nima S and Malenica, Ivana and
Sofrygin, Oleg},
title = {{sl3}: Modern Pipelines for Machine Learning and {Super
Learning}},
year = {2017},
howpublished = {\url{https://github.com/jeremyrcoyle/sl3}},
url = {http://dx.doi.org/DOI_HERE},
doi = {DOI_HERE}
}
License
© 2017 Jeremy R. Coyle, Nima S. Hejazi, Ivana Malenica, Oleg Sofrygin
The contents of this repository are distributed under the GPL-3 license.
See file LICENSE for details.
References
Breiman, Leo. 1996. “Stacked Regressions.” Machine Learning 24 (1). Springer: 49–64.
van der Laan, Mark J., Eric C. Polley, and Alan E. Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1).
