MPTmultiverse provides a single function, fit_mpt(), that performs a multiverse analysis for multinomial processing tree (MPT) models. The idea of a multiverse analysis (Steegen, Tuerlinckx, Gelman, & Vanpaemel, 2016) is to perform and report all possible combinations of reasonable modeling choices. This contrasts with a more common analysis approach in which only one specific analysis (i.e., one path through the garden of forking paths) is reported. fit_mpt performs a multiverse analysis combining two different factors: the pooling level (no, complete, or partial pooling) and the statistical framework (frequentist or Bayesian).
For the frequentist approaches, no pooling (with and without
parametric or nonparametric bootstrap) and complete pooling are
implemented using MPTinR
(Singmann & Kellen, 2013). For the Bayesian approaches, no pooling,
complete pooling, and three different variants of partial pooling are
implemented using TreeBUGS
(Heck, Arnold, & Arnold, 2017).
First of all, make sure that you have installed the latest version of R, of all necessary R packages, and of JAGS. To install JAGS, go to mcmc-jags.sourceforge.net and follow the installation instructions. After that, install or update the required R packages:
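A minimal sketch of this installation step (the package names are the three mentioned in this vignette; MPTmultiverse should pull in MPTinR and TreeBUGS as dependencies, but installing them explicitly ensures they are up to date):

```r
# Install (or update) the packages used in this vignette from CRAN
install.packages(c("MPTmultiverse", "MPTinR", "TreeBUGS"))
```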
Here we use parts of the data from Experiment 1 reported in Bayen and
Kuhlmann (2011) investigating source-memory of participants under two
different cognitive load conditions. The 24 participants in the
load
condition generated a random number between one and
nine every time they heard a tone, which happened every 2 s during the
entire study phase, and said it out loud. The 24 participants in the
no_load
condition performed the study phase without a
secondary task. Participants in both conditions performed the test
phase in the same manner.
We use the same model as Bayen and Kuhlmann (2011), model variant 4 of the 2-high-threshold source-memory model (2HTSM) introduced by Bayen, Murnane, and Erdfelder (1996). Model variant 4 of the 2HTSM decomposes the observed source-memory data into four parameters: parameter D (item recognition); parameter d (source memory); parameter b (probability of guessing that an item is old); and parameter g (probability of guessing that an item comes from source A versus source B).
Both data and model come with the package MPTmultiverse, so their locations can be obtained using the function system.file(). In other applications, the file paths need to be provided by the user to match the locations of data and model on their file system. It often makes sense to set the working directory to the directory in which the data and model files reside, either via setwd() or via the menu.
# load packages
library("MPTmultiverse")
# If you're running the analysis from an .rmd file, you only need to ensure that
# the .rmd, .eqn, and .csv files are all in the same directory.
# ------------------------------------------------------------------------------
# MPT model definition & data
EQN_FILE <- system.file("extdata", "2HTSM_Submodel4.eqn", package = "MPTmultiverse")
DATA_FILE <- system.file("extdata", "Kuhlmann_dl7.csv", package = "MPTmultiverse")
# if .csv format uses semicolons ";" (German format):
data <- read.csv2(DATA_FILE, fileEncoding = "UTF-8-BOM") ## use correct encoding if necessary
# if .csv format uses commas "," (international format):
# data <- read.csv(DATA_FILE, fileEncoding = "UTF-8-BOM")
# We first take a look at the data using head()
head(data)
#> Subject ExpCond Yee Yeu Yen Yue Yuu Yun Yne Ynu Ynn
#> 1 110 1 6 12 5 2 18 5 0 2 14
#> 2 138 1 11 8 6 3 16 4 2 2 12
#> 3 141 1 9 10 7 1 16 5 2 1 13
#> 4 149 1 10 9 4 0 19 6 0 0 16
#> 5 102 1 12 4 8 5 11 8 0 0 16
#> 6 105 1 14 2 5 1 19 7 2 0 14
## We then plot the response frequencies using plotFreq from the TreeBUGS package
TreeBUGS::plotFreq(data, boxplot = FALSE, eqn = EQN_FILE)
A look at the data.frame tells us which columns contain the subject identifier and the variable encoding group membership. We need to record these variable names for use in fit_mpt.
The plot of the individual response frequencies shows quite some individual variability, but nothing concerning.
Next, we prepare the data for fitting.
COL_ID <- "Subject" # name of the variable encoding subject ID
COL_CONDITION <- "ExpCond" # name of the variable encoding group membership
# Experimental conditions should be labeled in a meaningful way. To accomplish
# this, you may want to use the `factor` function:
unique(data[, COL_CONDITION])
#> [1] 1 2
data[[COL_CONDITION]] <- factor(
data[[COL_CONDITION]]
, levels = c(1:2)
, labels = c("no_load", "load")
)
### check input data frame
head(data)
#> Subject ExpCond Yee Yeu Yen Yue Yuu Yun Yne Ynu Ynn
#> 1 110 no_load 6 12 5 2 18 5 0 2 14
#> 2 138 no_load 11 8 6 3 16 4 2 2 12
#> 3 141 no_load 9 10 7 1 16 5 2 1 13
#> 4 149 no_load 10 9 4 0 19 6 0 0 16
#> 5 102 no_load 12 4 8 5 11 8 0 0 16
#> 6 105 no_load 14 2 5 1 19 7 2 0 14
Every time the package MPTmultiverse is loaded, it automatically sets some more or less useful defaults for model estimation, usage of multiple processor cores, number of posterior predictive samples, etc. By calling mpt_options() without any arguments, you can inspect these default values. If you want to change them, call mpt_options() with the respective option specified, e.g., mpt_options(n.iter = 1000). For testing purposes, you can also specify mpt_options("test"), which is a shorthand for fast, but highly unreliable, settings. You can reset the options to their defaults by typing the shorthand mpt_options("default").
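The option handling described above can be sketched as follows (the n.iter value is purely illustrative):

```r
library("MPTmultiverse")

mpt_options()               # inspect the current defaults
mpt_options(n.iter = 1000)  # change a single option, e.g., the number of iterations
mpt_options("test")         # fast but highly unreliable settings, for testing only
mpt_options("default")      # reset all options to their defaults
```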
The main computations are performed with a call to
fit_mpt
. In the default settings, the ten analysis options
offered by MPTmultiverse
are performed. Type
?fit_mpt
in the R console if you want to see what these
options are and find out more about the parameters of the function. The
help page also contains a comprehensive overview of the results object
returned by fit_mpt
.
Before fitting, we set a seed to make the analysis reproducible and set the options to the default settings.
set.seed(42)
mpt_options("default")
results <- fit_mpt(
model = EQN_FILE
, dataset = DATA_FILE
, data = data
, id = COL_ID
, condition = COL_CONDITION
, core = c("D", "d")
)
After fitting, it is a good idea to save the results as a binary R data file for later use. This is most easily done using save(). To save all information necessary to recreate the analysis, we only need to save the results tibble, as it contains both data and model as attributes (see str(results, 1)).
We can automatically create a file name for the file holding the results based on the model and data file.
In the current example this would usually lead to quite a long
filename (e.g., see EQN_FILE
), so one can also use a custom
filename.
One can also directly save in a subfolder of the current working directory (if the subfolder exists).
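For example, a minimal sketch of the saving step (the automatic file-name construction is one possible approach, and RESULTS_FILE is a name chosen here for illustration; the commented-out line shows a custom name in an existing subfolder instead):

```r
# Construct a file name from the model and data file names ...
RESULTS_FILE <- paste0(
  tools::file_path_sans_ext(basename(EQN_FILE)), "-",
  tools::file_path_sans_ext(basename(DATA_FILE)), ".RData"
)
# ... or simply use a (shorter) custom name, possibly in a subfolder:
# RESULTS_FILE <- "fits/2HTSM_results.RData"
save(results, file = RESULTS_FILE)

# Later, the complete analysis can be restored with:
# load(RESULTS_FILE)
```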
After the computations have finished, which may take anywhere from a couple of hours to several days, check whether model estimation worked using the function check_results().
check_results(results)
#> ## MPTinR: no pooling
#> Based on asymptotic method, proportion of participants with non-identified parameters:
#> condition core proportion
#> 1 load FALSE 0
#> 2 load TRUE 0.0417
#> 3 no_load FALSE 0.0208
#> 4 no_load TRUE 0.0208
#>
#> Based on asymptotic CIs, table of non-identified parameters:
#> condition core parameter n
#> 1 load TRUE d 2
#> 2 no_load FALSE g 1
#> 3 no_load TRUE d 1
#>
#> Based on PB/MLE method, proportion of participants with non-identified parameters:
#> condition core proportion
#> 1 load FALSE 0
#> 2 load TRUE 0.354
#> 3 no_load FALSE 0.0625
#> 4 no_load TRUE 0
#>
#> Based on PB/MLE CIs, table of non-identified parameters:
#> condition core parameter n
#> 1 load TRUE d 17
#> 2 no_load FALSE g 3
#>
#> Based on NPB/MLE method, proportion of participants with non-identified parameters:
#> condition core proportion
#> 1 load FALSE 0
#> 2 load TRUE 0.354
#> 3 no_load FALSE 0.0625
#> 4 no_load TRUE 0
#>
#> Based on NPB/MLE CIs, table of non-identified parameters:
#> condition core parameter n
#> 1 load TRUE d 17
#> 2 no_load FALSE g 3
#>
#>
#> ## MPTinR: complete pooling
#> No convergence problems.
#>
#>
#> ## /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/2HTSM_Submodel4.eqn // /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/Kuhlmann_dl7.csv // no // TreeBUGS // simple:
#> All Rhat < 1.05 .
#> All effect sample sizes > 2000 .
#>
#>
#> ## /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/2HTSM_Submodel4.eqn // /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/Kuhlmann_dl7.csv // complete // TreeBUGS // simple:
#> All Rhat < 1.05 .
#> All effect sample sizes > 2000 .
#>
#>
#> ## /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/2HTSM_Submodel4.eqn // /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/Kuhlmann_dl7.csv // partial // TreeBUGS // trait:
#> All Rhat < 1.05 .
#> All effect sample sizes > 2000 .
#>
#>
#> ## /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/2HTSM_Submodel4.eqn // /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/Kuhlmann_dl7.csv // partial // TreeBUGS // trait_uncorrelated:
#> All Rhat < 1.05 .
#> All effect sample sizes > 2000 .
#>
#>
#> ## /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/2HTSM_Submodel4.eqn // /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/Kuhlmann_dl7.csv // partial // TreeBUGS // beta:
#> All Rhat < 1.05 .
#> 5 core parameters with effect sample size n.eff < 2000 :
#> alph[d], alph[D], bet[d], bet[D], mean[d]
#> 0 auxiliary parameters with effect sample size n.eff < 2000 :
#>
#>
#>
#> ## /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/2HTSM_Submodel4.eqn // /home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/MPTmultiverse/extdata/Kuhlmann_dl7.csv // partial // TreeBUGS // betacpp:
#> All Rhat < 1.05 .
#> 12 core parameters with effect sample size n.eff < 2000 :
#> sd[d], sd[D], alph[d], alph[D], bet[d], bet[D], mean[d], alph[d], alph[D], bet[d], bet[D], d[21]
#> 0 auxiliary parameters with effect sample size n.eff < 2000 :
#>
In this example, for the no-pooling asymptotic approaches the rate of participants with non-identified parameters is very low. For the bootstrap-based approaches the pattern is different: here, the rate of participants with non-identified parameters in the load condition is considerably higher, around .17 versus .03 in the no_load condition. In particular, the d parameter shows problematic behavior.
For the Bayesian approaches, the betacpp method did not reach an effective sample size (ESS) of at least 2,000. Increasing the number of iterations, e.g., via mpt_options(n.iter = 2e5), and re-running the analysis should solve this problem.
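A sketch of this re-run, re-using the objects defined above (note that refitting will again take considerable time):

```r
mpt_options(n.iter = 2e5)  # increase the number of MCMC iterations

# refit with the same model, data, and column specifications as before
results <- fit_mpt(
  model = EQN_FILE
  , dataset = DATA_FILE
  , data = data
  , id = COL_ID
  , condition = COL_CONDITION
  , core = c("D", "d")
)

check_results(results)     # verify that the ESS criterion is now met
```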
fit_mpt returns a tibble with an additional class, multiverseMPT, with one row per estimation method. With the default settings, and if all methods succeed, fit_mpt uses ten estimation methods, so this tibble contains ten rows. The first five columns contain information about data and method; the remaining columns contain the results. Most of the results columns are list columns, whose content is most conveniently inspected using the packages dplyr and tidyr. We therefore load these packages before taking a glimpse at the results.
library("dplyr")
library("tidyr")
glimpse(results)
#> Rows: 10
#> Columns: 18
#> $ model <chr> "/home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/…
#> $ dataset <chr> "/home/mariusbarth/R/x86_64-pc-linux-gnu-library/3.6/…
#> $ pooling <chr> "complete", "no", "no", "no", "no", "complete", "part…
#> $ package <chr> "MPTinR", "MPTinR", "MPTinR", "MPTinR", "TreeBUGS", "…
#> $ method <chr> "asymptotic", "asymptotic", "PB/MLE", "NPB/MLE", "sim…
#> $ est_group <list> [<tbl_df[8 x 9]>], [<tbl_df[8 x 9]>], [<tbl_df[8 x 9…
#> $ est_indiv <list> [<tbl_df[0 x 0]>], [<tbl_df[192 x 11]>], [<tbl_df[19…
#> $ est_rho <list> [<tbl_df[0 x 3]>], [<tbl_df[0 x 3]>], [<tbl_df[0 x 3…
#> $ test_between <list> [<tbl_df[4 x 11]>], [<tbl_df[4 x 11]>], [<tbl_df[4 x…
#> $ test_within <list> [<tbl_df[12 x 14]>], [<tbl_df[12 x 14]>], [<tbl_df[1…
#> $ gof <list> [<tbl_df[1 x 6]>], [<tbl_df[1 x 6]>], [<tbl_df[1 x 6…
#> $ gof_group <list> [<tbl_df[2 x 7]>], [<tbl_df[2 x 7]>], [<tbl_df[2 x 7…
#> $ gof_indiv <list> [<tbl_df[0 x 0]>], [<tbl_df[48 x 8]>], [<tbl_df[48 x…
#> $ fungibility <list> [<tbl_df[0 x 3]>], [<tbl_df[0 x 3]>], [<tbl_df[0 x 3…
#> $ test_homogeneity <list> [<tbl_df[2 x 4]>], [<tbl_df[2 x 4]>], [<tbl_df[2 x 4…
#> $ convergence <list> [<tbl_df[3 x 4]>], [<tbl_df[48 x 5]>], [<tbl_df[48 x…
#> $ estimation <list> [<tbl_df[4 x 2]>], [<tbl_df[1 x 2]>], [<tbl_df[1 x 2…
#> $ options <list> [<tbl_df[1 x 14]>], [<tbl_df[1 x 14]>], [<tbl_df[1 x…
Bayen and Kuhlmann (2011) report a difference in the g parameter across conditions (they actually report an interaction with a further within-subjects factor, which is not considered here). Thus, we can take a look at the difference in the g parameter across conditions and methods in the following manner: We first select the column containing the results of the between-condition tests, unnest this column, and then filter only the rows containing the relevant parameter.
results %>%
select(pooling:method, test_between) %>%
unnest(cols = test_between) %>%
filter(parameter == "g") %>%
print(width = 150)
#> # A tibble: 10 × 14
#> pooling package method parameter core condition1 condition2
#> <chr> <chr> <chr> <chr> <lgl> <chr> <chr>
#> 1 complete MPTinR asymptotic g FALSE no_load load
#> 2 no MPTinR asymptotic g FALSE no_load load
#> 3 no MPTinR PB/MLE g FALSE no_load load
#> 4 no MPTinR NPB/MLE g FALSE no_load load
#> 5 no TreeBUGS simple g FALSE no_load load
#> 6 complete TreeBUGS simple g FALSE no_load load
#> 7 partial TreeBUGS trait g FALSE no_load load
#> 8 partial TreeBUGS trait_uncorrelated g FALSE no_load load
#> 9 partial TreeBUGS beta g FALSE no_load load
#> 10 partial TreeBUGS betacpp g FALSE no_load load
#> est_diff se p ci_0.025 ci_0.1 ci_0.9 ci_0.975
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 -0.186 0.0316 NA -0.248 -0.227 -0.146 -0.125
#> 2 -0.151 0.0781 0.0587 -0.304 -0.251 -0.0511 0.00189
#> 3 -0.133 0.0808 0.104 -0.292 -0.237 -0.0298 0.0250
#> 4 -0.133 0.0808 0.104 -0.292 -0.237 -0.0298 0.0250
#> 5 -0.145 0.0281 0 -0.200 -0.181 -0.109 -0.0888
#> 6 -0.187 0.0312 0 -0.247 -0.227 -0.147 -0.126
#> 7 -0.184 0.0978 0.0693 -0.370 -0.308 -0.0582 0.0157
#> 8 -0.199 0.0949 0.0462 -0.379 -0.318 -0.0759 -0.00323
#> 9 -0.153 0.0710 0.0360 -0.289 -0.243 -0.0611 -0.0110
#> 10 -0.152 0.0712 0.0385 -0.288 -0.242 -0.0609 -0.00990
Inspecting the differences across the analysis multiverse shows that the estimated difference is negative in each case and that the 95% credible/confidence intervals around the estimate exclude zero for 6 out of the 10 methods. Only the CIs of the no-pooling frequentist methods and of the most sophisticated model, the latent-trait model, include 0. However, the 80% CIs exclude zero for all methods. Taken together, this provides evidence that the g parameter is larger in the load than in the no_load condition.
In a similar manner, it is also possible to examine differences between parameter estimates within each group: We first select the column containing the within-condition tests, unnest this column, and then filter only the rows containing the relevant parameters.
results %>%
select(pooling:method, test_within) %>%
unnest(cols = test_within) %>%
filter(condition == "no_load") %>%
filter(parameter1 == "d" & parameter2 == "D") %>%
print(width = 150)
#> # A tibble: 10 × 17
#> pooling package method condition parameter1 parameter2 core1
#> <chr> <chr> <chr> <chr> <chr> <chr> <lgl>
#> 1 complete MPTinR asymptotic no_load d D TRUE
#> 2 no MPTinR asymptotic no_load d D TRUE
#> 3 no MPTinR PB/MLE no_load d D TRUE
#> 4 no MPTinR NPB/MLE no_load d D TRUE
#> 5 no TreeBUGS simple no_load d D TRUE
#> 6 complete TreeBUGS simple no_load d D TRUE
#> 7 partial TreeBUGS trait no_load d D TRUE
#> 8 partial TreeBUGS trait_uncorrelated no_load d D TRUE
#> 9 partial TreeBUGS beta no_load d D TRUE
#> 10 partial TreeBUGS betacpp no_load d D TRUE
#> core2 est se statistic df p ci_0.025 ci_0.1 ci_0.9 ci_0.975
#> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 TRUE -0.00493 0.0530 -0.0930 NA 0.926 -0.109 -0.0728 0.0630 0.0989
#> 2 TRUE -0.0126 0.0542 -0.232 22 0.819 -0.125 -0.0842 0.0591 0.0999
#> 3 TRUE -0.0216 0.0527 -0.410 23 0.686 -0.131 -0.0911 0.0479 0.0874
#> 4 TRUE -0.0216 0.0527 -0.410 23 0.686 -0.131 -0.0911 0.0479 0.0874
#> 5 TRUE -0.000551 0.0438 NA NA 0.982 -0.0857 -0.0568 0.0551 0.0853
#> 6 TRUE -0.00221 0.0526 NA NA 0.951 -0.104 -0.0684 0.0657 0.102
#> 7 TRUE -0.000351 0.0743 NA NA 0.973 -0.143 -0.0925 0.0945 0.152
#> 8 TRUE -0.00389 0.0714 NA NA 0.944 -0.144 -0.0924 0.0865 0.141
#> 9 TRUE -0.00505 0.0615 NA NA 0.930 -0.125 -0.0833 0.0741 0.117
#> 10 TRUE -0.00597 0.0604 NA NA 0.911 -0.123 -0.0827 0.0722 0.115
In this example, these comparisons are probably not meaningful, but for other designs this column may be used for within-subjects comparisons.
The analysis output results is an object of class multiverseMPT, which has its own plot() method. The plot method returns ggplot2 objects, which allows for further customization (such as applying themes). Type ?plot.multiverseMPT to see the documentation of the possible arguments to this method.
To plot group-level parameter estimates use:
To plot between-subjects comparisons across all parameters:
To plot overall goodness-of-fit use:
To plot group-wise goodness-of-fit use:
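The four plots listed above can be produced along the following lines (a sketch: the `which` values shown here are assumed from ?plot.multiverseMPT and should be checked against that help page):

```r
plot(results, which = "est")           # group-level parameter estimates
plot(results, which = "test_between")  # between-subjects comparisons, all parameters
plot(results, which = "gof1")          # overall goodness of fit
plot(results, which = "gof2")          # group-wise goodness of fit
```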