Title: | Bindings for Additive TidyModels |
---|---|
Description: | Fit Generalized Additive Models (GAM) using 'mgcv' with 'parsnip'/'tidymodels' via 'additive' <doi:10.5281/zenodo.4784245>. 'tidymodels' is a collection of packages for machine learning; see Kuhn and Wickham (2020) <https://www.tidymodels.org>. The technical details of 'mgcv' are described in Wood (2017) <doi:10.1201/9781315370279>. |
Authors: | Hamada S. Badr [aut, cre] |
Maintainer: | Hamada S. Badr <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.1 |
Built: | 2024-11-06 05:44:03 UTC |
Source: | https://github.com/hsbadr/additive |
additive() is a way to generate a specification of a model before fitting; it allows the model to be created using the mgcv package in R.
additive(
  mode = "regression", engine = "mgcv", fitfunc = NULL,
  formula.override = NULL, family = NULL, method = NULL,
  optimizer = NULL, control = NULL, scale = NULL, gamma = NULL,
  knots = NULL, sp = NULL, min.sp = NULL, paraPen = NULL,
  chunk.size = NULL, rho = NULL, AR.start = NULL, H = NULL, G = NULL,
  offset = NULL, subset = NULL, start = NULL, etastart = NULL,
  mustart = NULL, drop.intercept = NULL, drop.unused.levels = NULL,
  cluster = NULL, nthreads = NULL, gc.level = NULL, use.chol = NULL,
  samfrac = NULL, coef = NULL, discrete = NULL, select = NULL,
  fit = NULL
)

## S3 method for class 'additive'
update(
  object, parameters = NULL, fitfunc = NULL,
  formula.override = NULL, family = NULL, method = NULL,
  optimizer = NULL, control = NULL, scale = NULL, gamma = NULL,
  knots = NULL, sp = NULL, min.sp = NULL, paraPen = NULL,
  chunk.size = NULL, rho = NULL, AR.start = NULL, H = NULL, G = NULL,
  offset = NULL, subset = NULL, start = NULL, etastart = NULL,
  mustart = NULL, drop.intercept = NULL, drop.unused.levels = NULL,
  cluster = NULL, nthreads = NULL, gc.level = NULL, use.chol = NULL,
  samfrac = NULL, coef = NULL, discrete = NULL, select = NULL,
  fit = NULL, fresh = FALSE, ...
)

additive_fit(formula, data, ...)
mode |
A single character string for the prediction outcome mode. Possible values for this model are "unknown", "regression", or "classification". |
engine |
A single character string specifying what computational
engine to use for fitting. Possible engines are listed below.
The default for this model is |
fitfunc |
A named character vector that describes how to call
a function for fitting a generalized additive model. This defaults
to |
formula.override |
Overrides the formula; for details see
|
family |
This is a family object specifying the distribution and link to use in
fitting etc (see |
method |
The smoothing parameter estimation method. |
optimizer |
An array specifying the numerical optimization method to use to optimize the smoothing
parameter estimation criterion (given by |
control |
A list of fit control parameters to replace defaults returned by
|
scale |
If this is positive then it is taken as the known scale parameter. Negative signals that the scale parameter is unknown. 0 signals that the scale parameter is 1 for Poisson and binomial and unknown otherwise. Note that (RE)ML methods can only work with scale parameter 1 for the Poisson and binomial cases. |
gamma |
Increase this beyond 1 to produce smoother models. |
knots |
An optional list containing user-specified knot values to be used for basis construction.
For most bases the user simply supplies the knots to be used, which must match up with the |
sp |
A vector of smoothing parameters can be provided here.
Smoothing parameters must be supplied in the order that the smooth terms appear in the model
formula. Negative elements indicate that the parameter should be estimated, and hence a mixture
of fixed and estimated parameters is possible. If smooths share smoothing parameters then |
min.sp |
Lower bounds can be supplied for the smoothing parameters. Note
that if this option is used then the smoothing parameters |
paraPen |
Optional list specifying any penalties to be applied to parametric model terms.
|
chunk.size |
The model matrix is created in chunks of this size, rather than ever being formed whole.
Reset to |
rho |
An AR1 error model can be used for the residuals (based on data frame order) of Gaussian-identity
link models. This is the AR1 correlation parameter. Standardized residuals (approximately
uncorrelated under correct model) returned in
|
AR.start |
Logical variable of the same length as data, |
H |
A user supplied fixed quadratic penalty on the parameters of the GAM can be supplied, with this as its coefficient matrix. A common use of this term is to add a ridge penalty to the parameters of the GAM in circumstances in which the model is close to un-identifiable on the scale of the linear predictor, but perfectly well defined on the response scale. |
G |
Usually |
offset |
Can be used to supply a model offset for use in fitting. Note
that this offset will always be completely ignored when predicting, unlike an offset
included in |
subset |
An optional vector specifying a subset of observations to be used in the fitting process. |
start |
Initial values for the model coefficients. |
etastart |
Initial values for the linear predictor. |
mustart |
Initial values for the expected response. |
drop.intercept |
Set to |
drop.unused.levels |
By default, unused levels are dropped from factors before fitting. For some smooths involving factor variables you might want to turn this off. Only do so if you know what you are doing. |
cluster |
|
nthreads |
Number of threads to use for non-cluster computation (e.g. combining results from cluster nodes).
If |
gc.level |
To keep the memory footprint down, it can help to call the garbage collector often, but this takes a substantial amount of time. Setting this to zero means that garbage collection only happens when R decides it should. Setting it to 2 gives frequent garbage collection; 1 is in between. This is not as much of a problem as it used to be, but it can really matter for very large datasets. |
use.chol |
By default |
samfrac |
For generalized additive models with very large sample sizes, the number of iterations needed for the model fit can
be reduced by first fitting a model to a random sample of the data and using the results to supply starting values. This initial fit is run with sloppy convergence tolerances, so it is typically very low cost. |
coef |
Initial values for model coefficients. |
discrete |
Experimental option for setting up models for use with discrete methods employed in |
select |
If this is |
fit |
If this argument is |
object |
A Generalized Additive Model (GAM) specification. |
parameters |
A 1-row tibble or named list with main
parameters to update. If the individual arguments are used,
these will supersede the values in |
fresh |
A logical for whether the arguments should be modified in place or replaced wholesale. |
... |
Other arguments passed to internal functions. |
formula |
A GAM formula, or a list of formulae (see |
data |
A data frame or list containing the model response variable and
covariates required by the formula. By default the variables are taken
from |
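As a rough illustration of the gamma and sp arguments described above, the sketch below calls mgcv::gam() directly rather than going through additive(). It assumes that additive() forwards these arguments unchanged to the fitting function, which this page suggests but does not spell out. The data come from mgcv's own simulator gamSim(); the object names (fit_base, fit_smooth, fit_sp) are illustrative only.

```r
library(mgcv)

set.seed(2020)
# Simulated data from mgcv's built-in generator (covariates x0..x3, response y)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)
form <- y ~ s(x0) + s(x1) + s(x2) + s(x3)

# Baseline fit: all smoothing parameters estimated, gamma left at its default
fit_base <- gam(form, data = dat)

# gamma > 1 asks for a smoother model
fit_smooth <- gam(form, data = dat, gamma = 1.4)

# sp: fix the first smoothing parameter at 0.1; negative entries are estimated
fit_sp <- gam(form, data = dat, sp = c(0.1, -1, -1, -1))

# Compare the total effective degrees of freedom of the three fits
c(
  base     = sum(fit_base$edf),
  gamma    = sum(fit_smooth$edf),
  fixed_sp = sum(fit_sp$edf)
)
```

Comparing the total effective degrees of freedom is one simple way to see how much each setting changes the wiggliness of the fitted smooths.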
The arguments are converted to their specific names at the time that the model is fit. Other options and arguments can be set using set_engine(). If left to their defaults here (NULL), the values are taken from the underlying model functions. If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.
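A minimal sketch of modifying a specification without recreating it, assuming the additive and parsnip packages are installed and attached. It only reuses calls that appear in the examples on this page; the object names (spec, spec_updated, spec_fresh) are illustrative.

```r
library(parsnip)
library(additive)

# Start from a specification with one argument set explicitly
spec <- additive(select = FALSE)

# Modify that argument, keeping any other previously set arguments
spec_updated <- update(spec, select = TRUE)

# With fresh = TRUE the previously set arguments are replaced wholesale
spec_fresh <- update(spec, select = TRUE, fresh = TRUE)

# Engine-specific options are attached separately with set_engine()
spec_engine <- set_engine(spec_updated, "mgcv")
print(spec_engine)
```

The original spec is left untouched; update() returns a new specification each time.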
The data given to the function are not saved and are only used to determine the mode of the model. For additive(), the possible modes are "regression" and "classification".
The model can be created by the fit() function using the following engines:
mgcv: "mgcv"
An updated model specification.
Engines may have pre-set default arguments when executing the model fit call. For this type of model, the template of the fit call is:
additive() |> set_engine("mgcv") |> translate()
## Generalized Additive Model (GAM) Specification (regression)
##
## Computational engine: mgcv
##
## Model fit template:
## additive::additive_fit(formula = missing_arg(), data = missing_arg(),
##     weights = missing_arg())
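The template suggests that fit() ultimately calls additive_fit() with a formula and a data frame. The sketch below makes that call by hand, assuming additive_fit() can be used directly with just those two arguments (its signature in the usage above is additive_fit(formula, data, ...)). What it returns is not documented on this page, so the code only prints a summary and makes no claim about the class of the result.

```r
library(additive)
library(mgcv)

set.seed(2020)
# Simulated data from mgcv's built-in generator
dat <- gamSim(1, n = 200, dist = "normal", scale = 2, verbose = FALSE)

# Direct call mirroring the fit template (weights omitted here)
fit_direct <- additive_fit(formula = y ~ s(x0) + s(x1), data = dat)

summary(fit_direct)
```

In ordinary use the specification plus fit() route is preferable, since it keeps the model inside the parsnip interface.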
mgcv-package, gam, bam, gamObject, gam.models, smooth.terms, predict.gam, plot.gam, summary.gam, gam.side, gam.selection, gam.control, gam.check, vis.gam, family.mgcv, formula.gam, family, formula, update.formula.
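# Packages providing the functions used below
library(additive)
library(parsnip) # set_engine(), fit(), show_model_info()
library(mgcv)    # gamSim()

additive()

show_model_info("additive")

additive(mode = "classification")
additive(mode = "regression")

# Simulate example data and fit a GAM through the parsnip interface
set.seed(2020)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)

additive_mod <-
  additive() |>
  set_engine("mgcv") |>
  fit(
    y ~ s(x0) + s(x1) + s(x2) + s(x3),
    data = dat
  )

summary(additive_mod$fit)

# Update an existing specification instead of recreating it
model <- additive(select = FALSE)
model
update(model, select = TRUE)
update(model, select = TRUE, fresh = TRUE)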