--- title: "Get started with `additive`" author: "Hamada S. Badr" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: keep_md: true vignette: > %\VignetteIndexEntry{Getting Started} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r results='hide', message=FALSE, warning=FALSE} library(additive) ``` ```{r results='hide', message=FALSE, warning=FALSE} library(recipes) library(workflows) ``` Let's simulate a data using `mgcv` package, which is automatically loaded by `additive`. ```{r} set.seed(2020) dat <- gamSim(1, n = 400, dist = "normal", scale = 2) ``` In a first step, we use the `recipes` package to prepare (a recipe for) the data. ```{r} test_recipe <- dat |> recipe() |> update_role(y, new_role = "outcome") |> update_role(x0, x1, x2, x3, new_role = "predictor") |> step_normalize(all_numeric_predictors()) ``` ```{r} print(test_recipe) ``` Above, we not only define the roles of the relevant variables but also normalized all numeric predictors to facilitate model fitting later on. In the next step, we use `additive` to set up a basic model structure. ```{r} test_model <- additive( family = gaussian(), method = "REML" ) |> set_engine("mgcv") |> set_mode("regression") ``` ```{r} print(test_model) ``` The `additive` function is the main function of the package to initialize a Generalized Additive Model (GAM). We can set up a lot of the information directly within the function or update the information later on, via the `update` method. For example, if we didn't specify the family initially or set it to something else that we now wanted to change, we could use the `update` method as follows ```{r} test_model <- test_model |> update(family = gaussian()) ``` Next, we define a workflow via the `workflows` package, by combining the above defined data processing recipe and the model plus the actual model formula to be passed to the `mgcv` engine. ```{r} test_workflow <- workflow() |> add_recipe(test_recipe) |> add_model( spec = test_model, formula = y ~ s(x0) + s(x1) + s(x2) + s(x3) ) ``` ```{r} print(test_workflow) ``` We are now ready to fit the model by calling the `fit` method with the data set we want to train the model on. ```{r results='hide', echo = FALSE} run_on_linux <- grepl("linux", R.Version()$os, ignore.case = TRUE) ``` ```{r results='hide', eval = run_on_linux} test_workflow_fit <- test_workflow |> fit(data = dat) ``` ```{r eval = run_on_linux} print(test_workflow_fit) ``` To extract the parsnip model fit from the workflow ```{r eval = run_on_linux} test_fit <- test_workflow_fit |> extract_fit_parsnip() ``` The `gamObject` object can be extracted as follows ```{r eval = run_on_linux} gam_fit <- test_workflow_fit |> extract_fit_engine() ``` ```{r eval = run_on_linux} class(gam_fit) ``` We can use the trained workflow, which includes the fitted model, to conveniently `predict` using new data without having to worry about all the data reprocessing, which is automatically applied using the workflow preprocessor (recipe). ```{r} newdata <- dat[1:5, ] ``` ```{r eval = run_on_linux} test_workflow_fit |> predict( new_data = newdata, type = "conf_int", level = 0.95 ) ``` To add the standard errors on the scale of the linear predictors ```{r eval = run_on_linux} test_workflow_fit |> predict( new_data = newdata, type = "conf_int", level = 0.95, std_error = TRUE ) ```