---
title: "Get started with `additive`"
author: "Hamada S. Badr"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    keep_md: true
vignette: >
  %\VignetteIndexEntry{Getting Started}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r results='hide', message=FALSE, warning=FALSE}
library(additive)
```

```{r results='hide', message=FALSE, warning=FALSE}
library(recipes)
library(workflows)
```

Let's simulate a data using `mgcv` package, which is automatically loaded by `additive`.

```{r}
set.seed(2020)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
```

In a first step, we use the `recipes` package to prepare (a recipe for) the data.

```{r}
test_recipe <- dat |>
  recipe() |>
  update_role(y, new_role = "outcome") |>
  update_role(x0, x1, x2, x3, new_role = "predictor") |>
  step_normalize(all_numeric_predictors())
```

```{r}
print(test_recipe)
```

Above, we not only define the roles of the relevant variables but also
normalized all numeric predictors to facilitate model fitting later on.
In the next step, we use `additive` to set up a basic model structure.

```{r}
test_model <- additive(
    family = gaussian(),
    method = "REML"
  ) |>
  set_engine("mgcv") |>
  set_mode("regression")
```

```{r}
print(test_model)
```

The `additive` function is the main function of the package to initialize a
Generalized Additive Model (GAM). We can set up a lot of the information directly
within the function or update the information later on, via the `update` method.
For example, if we didn't specify the family initially or set it to something else
that we now wanted to change, we could use the `update` method as follows


```{r}
test_model <- test_model |>
  update(family = gaussian())
```

Next, we define a workflow via the `workflows` package, by combining the above
defined data processing recipe and the model plus the actual model formula to be
passed to the `mgcv` engine.

```{r}
test_workflow <- workflow() |>
  add_recipe(test_recipe) |>
  add_model(
    spec = test_model,
    formula = y ~ s(x0) + s(x1) + s(x2) + s(x3)
  )
```

```{r}
print(test_workflow)
```

We are now ready to fit the model by calling the `fit` method
with the data set we want to train the model on.

```{r results='hide', echo = FALSE}
run_on_linux <- grepl("linux", R.Version()$os, ignore.case = TRUE)
```

```{r results='hide', eval = run_on_linux}
test_workflow_fit <- test_workflow |>
  fit(data = dat)
```

```{r eval = run_on_linux}
print(test_workflow_fit)
```

To extract the parsnip model fit from the workflow

```{r eval = run_on_linux}
test_fit <- test_workflow_fit |>
  extract_fit_parsnip()
```

The `gamObject` object can be extracted as follows

```{r eval = run_on_linux}
gam_fit <- test_workflow_fit |>
  extract_fit_engine()
```

```{r eval = run_on_linux}
class(gam_fit)
```

We can use the trained workflow, which includes the fitted model, to
conveniently `predict` using new data without having to worry about all
the data reprocessing, which is automatically applied using the workflow
preprocessor (recipe).

```{r}
newdata <- dat[1:5, ]
```

```{r eval = run_on_linux}
test_workflow_fit |>
  predict(
    new_data = newdata,
    type = "conf_int",
    level = 0.95
  )
```

To add the standard errors on the scale of the linear predictors

```{r eval = run_on_linux}
test_workflow_fit |>
  predict(
    new_data = newdata,
    type = "conf_int",
    level = 0.95,
    std_error = TRUE
  )
```