Understanding the BAMLSS model frame

Similar to the well-known model.frame() function used, e.g., by the linear model fitting function lm() or by glm() for generalized linear models, the bamlss.frame() function extracts a “model frame” for fitting distributional regression models. Internally, the function parses the model formulae, one for each parameter of the distribution, using the infrastructure of the Formula package (Zeileis and Croissant 2010). Linear effects are processed with model.matrix(), and smooth terms with the smooth.construct() methods of the mgcv package, which set up the design and penalty matrices for estimating unspecified smooth functions; see also, e.g., the documentation of the functions s() and te().
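
The two processing steps can be sketched without bamlss, using only base R and mgcv. This is a minimal illustration, not the package's internal code: the data set dat, the basis dimension k = 10, and the choice absorb.cons = TRUE are assumptions made for this example.

```r
library(mgcv)

# Hypothetical toy data, used only for this illustration.
set.seed(1)
n <- 200
dat <- data.frame(x1 = runif(n), x2 = runif(n))

# Linear effects: a design matrix built from a standard formula.
X_lin <- model.matrix(~ x1, data = dat)

# Smooth effects: smoothCon() calls the matching smooth.construct() method
# and returns a list of smooth objects (a single one here).
sm <- smoothCon(s(x2, k = 10), data = dat, absorb.cons = TRUE)[[1]]
X_smooth <- sm$X       # basis functions evaluated at the observations
S_smooth <- sm$S[[1]]  # penalty matrix belonging to the basis

dim(X_lin)
dim(X_smooth)
dim(S_smooth)
```

The penalty matrix is square with one row and column per basis function, so it can be added directly to the cross-product of the smooth design matrix in a penalized fit.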

The most important arguments are

bamlss.frame(formula, data = NULL, family = "gaussian",
  weights = NULL, subset = NULL, offset = NULL,
  na.action = na.omit, contrasts = NULL, ...)

The argument formula can be a classical model formula, e.g., as used by the lm() function, or an extended bamlss formula including smooth term specifications like s() or te(), which is parsed internally by the function bamlss.formula(). Note that the bamlss package uses special family objects. A family can be passed either as a character string, i.e., the bamlss family name without the "_bamlss" extension (see the manual ?bamlss.family for a list of available families and the corresponding vignette BAMLSS Families), or as the family function itself. In addition, all families of the gamlss and gamlss.dist (Stasinopoulos and Rigby 2019) packages are supported.
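
The two ways of passing a family might look as follows. This is a sketch under assumptions: the small two-parameter formula f0 is made up for the example, and the data come from GAMart(), the package's data simulator that is also used later in this section.

```r
library(bamlss)

set.seed(111)
d <- GAMart()

# Hypothetical formula for illustration: one formula per parameter.
f0 <- list(
  num ~ s(x1),
  sigma ~ x1
)

# Family given as a character string, without the "_bamlss" extension.
bf_chr <- bamlss.frame(f0, data = d, family = "gaussian")

# Family given as the family function itself.
bf_fun <- bamlss.frame(f0, data = d, family = gaussian_bamlss)

# Both calls should yield the same distribution parameters.
names(bf_chr$x)
names(bf_fun$x)
```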

The returned object, a named list of class "bamlss.frame", can be employed with all model fitting engines. The most important elements used for estimation are:

  • x: A named list whose elements correspond to the parameters specified in the family object. For each distribution parameter, the list contains all design and penalty matrices needed for modeling (see the upcoming example).
  • y: The response data.
  • family: The processed family object.

To better understand the structure of the "bamlss.frame" object, a print method is provided. For illustration, we simulate data

set.seed(111)
d <- GAMart()

and set up a "bamlss.frame" object for a Gaussian distributional regression model including smooth terms. First, a model formula is needed

f <- list(
  num ~ x1 + s(x2) + s(x3) + te(lon,lat),
  sigma ~ x1 + s(x2) + s(x3) + te(lon,lat)
)

Afterwards the model frame can be computed with

bf <- bamlss.frame(f, data = d, family = "gaussian")

The print method for "bamlss.frame" objects gives a compact overview of the structure.

print(bf)
## 'bamlss.frame' structure: 
##   ..$ call 
##   ..$ model.frame 
##   ..$ formula 
##   ..$ family 
##   ..$ terms 
##   ..$ x 
##   .. ..$ mu 
##   .. .. ..$ formula 
##   .. .. ..$ fake.formula 
##   .. .. ..$ terms 
##   .. .. ..$ model.matrix 
##   .. .. ..$ smooth.construct 
##   .. ..$ sigma 
##   .. .. ..$ formula 
##   .. .. ..$ fake.formula 
##   .. .. ..$ terms 
##   .. .. ..$ model.matrix 
##   .. .. ..$ smooth.construct 
##   ..$ y 
##   .. ..$ num 
##   ..$ delete
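
The elements used for estimation (x, y, and family) can be accessed with the usual list operators. The snippet below rebuilds the model frame from this section so that it runs on its own; printed output is omitted because it depends on the installed package version.

```r
library(bamlss)

set.seed(111)
d <- GAMart()

f <- list(
  num ~ x1 + s(x2) + s(x3) + te(lon,lat),
  sigma ~ x1 + s(x2) + s(x3) + te(lon,lat)
)
bf <- bamlss.frame(f, data = d, family = "gaussian")

# One list entry per distribution parameter.
names(bf$x)

# The response data, stored under the name of the response variable.
str(bf$y)

# The processed family object.
class(bf$family)

# Components available for a single parameter, here mu.
names(bf$x$mu)
```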

When writing a new estimation engine, the user can work directly with the model.matrix element for linear effects and the smooth.construct list for smooth effects. The smooth.construct element is a named list compiled with the smoothCon() function of the mgcv package, which in turn calls the generic smooth.construct() method to set up each smooth term.
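
As a rough sketch of how an engine could use these pieces, the following toy example fits a penalized least squares model for mu with the linear effects and the term s(x2) only. It is not one of the package's engines: the fixed smoothing parameter lambda and the restriction to a single smooth term are assumptions made to keep the example short, and the code assumes that each smooth object carries the usual mgcv fields X and S.

```r
library(bamlss)

set.seed(111)
d <- GAMart()

f <- list(
  num ~ x1 + s(x2) + s(x3) + te(lon,lat),
  sigma ~ x1 + s(x2) + s(x3) + te(lon,lat)
)
bf <- bamlss.frame(f, data = d, family = "gaussian")

# Building blocks for parameter mu.
X_lin <- bf$x$mu$model.matrix              # linear effects
sm    <- bf$x$mu$smooth.construct[["s(x2)"]]
X_sm  <- sm$X                              # smooth design matrix
S_sm  <- sm$S[[1]]                         # smooth penalty matrix

# Combined design matrix and block penalty (no penalty on linear effects).
y <- bf$y$num
X <- cbind(X_lin, X_sm)
P <- matrix(0, ncol(X), ncol(X))
idx <- ncol(X_lin) + seq_len(ncol(X_sm))
lambda <- 10                               # arbitrary value, for illustration
P[idx, idx] <- lambda * S_sm

# Penalized least squares estimate of the coefficients.
beta <- solve(crossprod(X) + P, crossprod(X, y))
length(beta)
```

A real engine would loop over all parameters and smooth terms and would estimate or sample the smoothing parameters instead of fixing them.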

print(names(bf$x$mu$smooth.construct))
## [1] "s(x2)"       "s(x3)"       "te(lon,lat)"

In this example, the list contains three smooth term objects for parameter mu; the same holds for parameter sigma.

See also the vignette Estimation Engines, which presents more details on how to work with "bamlss.frame" objects.

References

Stasinopoulos, Mikis, and Robert Rigby. 2019. gamlss.dist: Distributions for Generalized Additive Models for Location Scale and Shape. https://CRAN.R-project.org/package=gamlss.dist.

Umlauf, Nikolaus, Nadja Klein, Achim Zeileis, and Thorsten Simon. 2024. bamlss: Bayesian Additive Models for Location Scale and Shape (and Beyond). https://CRAN.R-project.org/package=bamlss.

Zeileis, Achim, and Yves Croissant. 2010. “Extended Model Formulas in R: Multiple Parts and Multiple Responses.” Journal of Statistical Software 34 (1): 1–13. https://doi.org/10.18637/jss.v034.i01.