bamlss.Rd
This is the main model fitting function of the package. Function bamlss()
is a wrapper function that parses the data
and the model formula
, or
extended bamlss.formula
, as well as the bamlss.family
into a bamlss.frame
. The bamlss.frame
then holds all model
matrices and information that is needed for setting up estimation engines.
The model matrices are based on mgcv
infrastructures, i.e.,
smooth terms are constructed using smooth.construct
and
smoothCon
. Therefore, all mgcv
model term constructors like
s
, te
, t2
and ti
can be used. Identifiability conditions are imposed using function gam.side
.
After the bamlss.frame
is set up function bamlss()
applies optimizer
and/or sampling functions. These functions can also be provided by the user. See the details
below on how to create new engines to be used with function bamlss()
.
Finally, the estimated parameters and/or samples are used to create model output results like summary statistics or effect plots. The computation of results may also be controlled by the user.
bamlss(formula, family = "gaussian", data = NULL,
start = NULL, knots = NULL, weights = NULL,
subset = NULL, offset = NULL, na.action = na.omit,
contrasts = NULL, reference = NULL, transform = NULL,
optimizer = NULL, sampler = NULL, samplestats = NULL,
results = NULL, cores = NULL, sleep = NULL,
combine = TRUE, model = TRUE, x = TRUE,
light = FALSE, ...)
A formula or extended formula, i.e., the formula
can be a
list
of formulas where each list entry specifies the details of one parameter
of the modeled response distribution, see bamlss.formula
. For incorporating
smooth terms, all model term constructors implemented in mgcv
such as
s
, te
and ti
can be used, amongst others.
A bamlss.family
object, specifying the details of the modeled
distribution such as the parameter names, the density function, link functions, etc.
Can be a character without the "_bamlss"
extension of the
bamlss.family
name.
A data.frame
or list
containing the model
response variable(s) and covariates specified in the formula
.
By default the variables are taken from environment(formula)
:
typically the environment from which bamlss
is called.
A named numeric vector containing starting values to be send to the optimizer
and/or sampler
function. For a possible naming convention for the parameters see
function parameters
, but this is not restrictive and engine specific.
An optional list containing user specified knots, see the documentation of
function gam
.
Prior weights on the data.
An optional vector specifying a subset of observations to be used in the fitting process.
Can be used to supply model offsets for use in fitting.
A function which indicates what should happen when the data
contain NA
's. The default is set by the na.action
setting of options
, and is na.omit
if set to NULL
.
An optional list. See the contrasts.arg
of
model.matrix.default
.
A character
specifying a reference category, e.g., when
fitting a multinomial model.
A transformer function that is applied on the bamlss.frame
.
See, e.g., function randomize
and bamlss.engine.setup
.
An optimizer function that returns, e.g., posterior mode estimates
of the parameters as a named numeric vector. The default optimizer function is
opt_bfit
. If set to FALSE
, no optimizer function will be used.
A sampler function that returns a matrix of samples, the columns represent the
parameters, the rows the iterations. The returned matrix must be coerced to an object of
class "mcmc"
, see as.mcmc
. The default sampler function is
sam_GMCMC
. If set to FALSE
, no sampler function will be used.
A function computing statistics from samples, per default function
samplestats
is used. If set to FALSE
, no samplestats
function
will be used. Note that this option is crucial for very large datasets, as computing
statistics from samples this way may be very time consuming!
A function computing results from the parameters and/or samples, e.g., for
creating effect plots, see function link{results.bamlss.default}
. If set FALSE
no results
function will be used.
An integer specifying the number of cores that should be used for the sampler
function. This is based on function mclapply
of the parallel
package.
Time the system should sleep before the next core is started.
If samples are computed on multiple cores, should the samples be combined into
one mcmc
matrix?
If set to FALSE
the model frame used for modeling is not part of the
return value.
If set to FALSE
the model matrices are not part of the return value.
Should the returned object be lighter, i.e., if light = TRUE
the returned
object will not contain the model.frame and design and penalty matrices are deleted.
Arguments passed to the transformer
, optimizer
, sampler
,
results
and samplestats
function.
The main idea of this function is to provide infrastructures that make it relatively easy to create estimation engines for new problems, or write interfaces to existing software packages.
The steps that are performed within the function are:
First, the function parses the data
, the formula
or the extended
bamlss.formula
as well as the bamlss.family
into a model frame
like object, the bamlss.frame
. This object holds all necessary model matrices
and information that is needed for subsequent model fitting engines. Per default,
all package mgcv
smooth term constructor functions like
s
, te
, t2
and
ti
can be used (see also function smooth.construct
),
however, even special user defined constructors can be included, see the manual of
bamlss.frame
.
In a second step, the bamlss.frame
can be transformed, e.g., if a mixed
model representation of smooth terms is needed, see function randomize
.
Then an optimizer function is started, e.g., a function that finds posterior mode estimates of the parameters. A convention for model fitting engines is that such functions should have the following arguments:
optimizer(x, y, family, start, weights, offset, ...)
Internally, function bamlss()
will send the x
object that holds all
model matrices, the response y
object, the family
object, start
ing
values for the parameters, possible weights
and offset
s of the created
bamlss.frame
to the
optimizer function (see the manual of bamlss.frame
for more details on the
x
, y
and other objects). The job of the optimizer is to return a named numeric
vector of optimum parameters. The names of the parameters should be such that they can be
uniquely mapped to the corresponding model matrices in x
. See function
parameters
for more details on parameter names. The default optimizer function
is opt_bfit
. The optimizer can return more information than only the optimum
parameters. It is possible to return a list, the convention here is that an element named
"parameters"
then holds the named vector of estimated parameters. Possible other return
values could be fitted values, the Hessian matrix, information criteria or information
about convergence of the algorithm, etc. Note that the parameters are simply added to the
bamlss.frame
in an (list) entry named parameters
.
After the optimization step, a sampler
function is started. The arguments of such
sampler functions are the same as for the optimizer
functions
sampler(x, y, family, start, weights, offset, ...)
Sampler functions must return a matrix of samples, each row represents one iteration and the matrix
can be coerced to mcmc
objects. The function may return a list of samples,
e.g., if multiple chains are returned each list entry then holds one sample matrix of
one chain. The column names of the sample matrix should be the same as the names of estimated
parameters. For a possible naming convention see function parameters
, which
ensures unique mapping of samples with the model matrices in the x
object of the
bamlss.frame
. The samples are added to the bamlss.frame
in an (list) entry named samples
.
Next, the samplestats
function is applied. This function can compute any quantity
from the samples and the x
object, the arguments of such functions are
samplestats(samples, x, y, family, ...)
where argument samples
are the samples returned from the sampler
function,
and x
, y
and family
are the same objects as passed to the optimizer
and or sampler functions. For example, the default function in bamlss()
for this task
is also called samplestats
and returns the mean of the log-likelihood and the
log-posterior computed of all samples, as well as the DIC.
The last step is to compute more complex information about the model using the
results
function. The arguments of such results
functions are
results(bamlss.frame, ...)
here, the full bamlss.frame
including possible parameters
and
samples
is passed to the function within bamlss()
. The default function
for this task is results.bamlss.default
which returns an object of class
"bamlss.results"
for which generic plotting functions are and a summary
function is provided. Hence, the user can control the output of the model, the plotting
and summary statistics, too.
Note that function transform()
, optimizer()
, sampler()
, samplestats()
and results()
can be provided from the bamlss.family
object, e.g.,
if a bamlss.family
object has an element named "optimizer"
, which
represents a valid optimizer function such as opt_bfit
, exactly this optimizer
function will be used as a default when using the family.
An object of class "bamlss"
. The object is in principle only a slight extension
of a bamlss.frame
, i.e., if an optimizer
is applied it will hold the
estimated parameters in an additional element named "parameters"
. If a sampler function
is applied it will additionally hold the samples in an element named "samples"
.
The same mechanism is used for results
function.
If the optimizer
function computes additional output next to the parameters, this will
be saved in an element named "model.stats"
. If a samplestats
function is applied,
the output will also be saved in the "model.stats"
element.
Additionally, all functions that are called are saved as attribute "functions"
in the
returned object.
Umlauf N, Klein N, Zeileis A (2019). BAMLSS: Bayesian Additive Models for Location, Scale and Shape (and Beyond). Journal of Computational and Graphical Statistics, 27(3), 612--627. doi:10.1080/10618600.2017.1407325
Umlauf N, Klein N, Simon T, Zeileis A (2021). bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond). Journal of Statistical Software, 100(4), 1--53. doi:10.18637/jss.v100.i04
if (FALSE) ## Simulated data example.
d <- GAMart()
f <- num ~ s(x1) + s(x2) + s(x3) + te(lon, lat)
b <- bamlss(f, data = d)
#> Error in eval(expr, envir, enclos): object 'd' not found
summary(b)
#> Error in eval(expr, envir, enclos): object 'b' not found
plot(b)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'plot': object 'b' not found
plot(b, which = 3:4)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'plot': object 'b' not found
plot(b, which = "samples")
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'plot': object 'b' not found
## Use of optimizer and sampler functions:
## * first run optimizer,
b1 <- bamlss(f, data = d, optimizer = opt_bfit, sampler = FALSE)
#> Error in eval(expr, envir, enclos): object 'd' not found
print(b1)
#> Error in eval(expr, envir, enclos): object 'b1' not found
summary(b1)
#> Error in eval(expr, envir, enclos): object 'b1' not found
## * afterwards, start sampler with staring values,
b2 <- bamlss(f, data = d, start = coef(b1), optimizer = FALSE, sampler = sam_GMCMC)
#> Error in eval(expr, envir, enclos): object 'd' not found
print(b2)
#> Error in eval(expr, envir, enclos): object 'b2' not found
summary(b2)
#> Error in eval(expr, envir, enclos): object 'b2' not found
## Continue sampling.
b3 <- continue(b2, n.iter = 12000, burnin = 0, thin = 10)
#> Error in eval(expr, envir, enclos): object 'b2' not found
plot(b3, which = "samples")
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'plot': object 'b3' not found
plot(b3, which = "max-acf")
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'plot': object 'b3' not found
plot(b3, which = "max-acf", burnin = 500, thin = 4)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'plot': object 'b3' not found