Ladislas Nalborczyk
Univ. Grenoble Alpes, CNRS, LPNC (France) • Ghent University (Belgium)
How many participants/observations should I collect for my next experiment, in order to show the effect I want to show?
Usual algorithm (a priori NHST-PA): guess the population effect size, plug the numbers (effect size and requested power) into the magic hat (the power function), and plan for a specific sample size.
But if I knew what (effect size) I was looking for, I would not bother running an experiment, right?
Another approach consists in recruiting participants until we reach a predefined level of evidence. This is called sequential testing.
But this requires that we have a way to quantify evidence…
Example of the Sequential Bayes Factor procedure as presented in Schönbrodt, Wagenmakers, Zehetleitner, & Perugini (2017) and Schönbrodt & Wagenmakers (2017).
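The general logic of a sequential design can be sketched in a few lines of R. Note that this is a minimal illustration, not the actual SBF procedure of Schönbrodt et al. (which uses a default-prior Bayes factor): here the Bayes factor is approximated from the BIC (Wagenmakers, 2007), and the effect size, thresholds, and sample cap are made-up values for the simulation.

```r
# Illustrative sketch of a sequential design (not the exact SBF procedure):
# we approximate BF10 via the BIC (Wagenmakers, 2007) and recruit
# participants one by one until the evidence crosses a threshold.
set.seed(666)
d <- 0.5           # assumed population effect size (for simulation only)
threshold <- 10    # stop when BF10 > 10 or BF10 < 1/10
n_max <- 500       # safety cap on the sample size
y <- rnorm(10, mean = d)  # minimum sample before looking at the evidence

repeat {
  mod0 <- lm(y ~ 0)  # null model: mean fixed at zero
  mod1 <- lm(y ~ 1)  # alternative model: freely estimated mean
  bf10 <- exp((BIC(mod0) - BIC(mod1)) / 2)  # BIC approximation of BF10
  if (bf10 > threshold || bf10 < 1 / threshold || length(y) >= n_max) break
  y <- c(y, rnorm(1, mean = d))  # recruit one more participant
}

c(n = length(y), bf10 = bf10)
```

The final sample size is thus an outcome of the procedure, rather than something fixed in advance.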
Quantifying the relative evidence for a hypothesis/model
↳ Bayes Factors
Making decisions while controlling error rates in the long-run
↳ NHST & p-values (à la Neyman-Pearson)
Comparing the (out-of-sample) predictive abilities of models
↳ Information Criteria (e.g., AIC, WAIC)
Hirotugu Akaike noticed that minus two times the log-likelihood of a model, plus two times its number of parameters, was approximately equal to the out-of-sample deviance of that model…
\[ \text{AIC} = \underbrace{-2\log(\mathcal{L}(\hat{\theta}|\text{data}))}_{\text{deviance}} + 2K \]
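We can check this definition numerically in R on a toy linear model (the data below are simulated for illustration); `AIC()` matches the formula computed by hand, with \( K \) including the residual standard deviation.

```r
# Checking the AIC formula by hand on a toy linear model (simulated data)
set.seed(666)
data <- data.frame(x = rnorm(100))
data$y <- 2 * data$x + rnorm(100)

mod <- lm(y ~ 1 + x, data)
k <- attr(logLik(mod), "df")  # number of estimated parameters (incl. sigma)
aic_by_hand <- -2 * as.numeric(logLik(mod)) + 2 * k
all.equal(aic_by_hand, AIC(mod))  # TRUE
```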
In-sample deviance: how badly a model explains the current dataset (the dataset we used to fit the model)
Out-of-sample deviance: how badly a model explains a future dataset issued from the same data generation process (the same population)
Isn't this crazy?
The individual AIC values are not interpretable in absolute terms, as they contain arbitrary constants. We usually rescale them by subtracting from the AIC of each model \( i \) the minimum AIC in the set:
\[ \Delta_{AIC} = AIC_{i} - AIC_{min} \]
It is convenient to normalise the model likelihoods so that they sum to 1 and can be treated as probabilities.
\[ w_{i} = \dfrac{\exp(-\Delta_{i}/2)}{\sum_{r=1}^{R}\exp(-\Delta_{r}/2)} \]
The \( w_i \), called Akaike weights, quantify the weight of evidence in favour of model \( g_i(\cdot|\theta) \) being the actual Kullback-Leibler best model in the set (the K-L best model).
Evidence can be judged by taking the ratio of Akaike weights \( w_i/w_j \). Such ratios are called evidence ratios (ERs) and represent the evidence about fitted models as to which is better in a Kullback-Leibler information sense (i.e., which model has the lowest out-of-sample deviance).
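These two steps are easy to carry out in base R. The AIC values below are made up for illustration; in practice they would come from `AIC()` applied to fitted models.

```r
# Computing Akaike weights and an evidence ratio from a set of AIC values
# (the AIC values here are made up for illustration)
aics <- c(mod1 = 102.3, mod2 = 100.1, mod3 = 107.8)
delta <- aics - min(aics)                    # rescaled AICs (Delta_AIC)
w <- exp(-delta / 2) / sum(exp(-delta / 2))  # Akaike weights, sum to 1
er <- w["mod2"] / w["mod1"]                  # evidence ratio of mod2 over mod1
round(w, 3)
```

By construction, the model with the minimum AIC gets the largest weight.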
BFs and AIC-based ERs have different asymptotic properties (because they answer different questions).
The (default t-test) Bayes Factor is said to be consistent for model identification (i.e., for identifying the true model). The AIC is not consistent for model identification, but it is an efficient estimator of the expected K-L information loss.
Simulation studies usually involve comparing two nested models, one of them being the true model (the model from which we generated the data).
mod1 <- lm(y ~ 1, data) # "true" model when effect size = 0
mod2 <- lm(y ~ 1 + x, data) # "true" model when effect size != 0
This situation is nicely addressed by using Bayes Factors (or equivalently, using the BIC), which aim at identifying the true model in the set of models.
However, the goal of identifying the true model assumes that a true model is actually in the set of models… Alternatively, one could instead be interested in identifying the K-L best model (the model closest to the full reality).
As an example, we are going to simulate data from the following model, which assumes tapering effect sizes. Let \( x_{1}, \dots, x_{8} \) be independent \( \mathrm{Normal}(0, 1) \) random variables. Given the \( x_{i} \), let
\[ y = 100 + 10 x_{1} + 5 x_{2} + 1 x_{3} + 0.5 x_{4} + 0.1 x_{5} + 0.05 x_{6} + 0.025 x_{7} + 0.01 x_{8} + \epsilon, \quad \epsilon \sim \mathrm{Normal}(0, 1) \]
Then, we can compare two nested models (neither of them being the true model).
mod1 <- lm(y ~ 1 + x1 + x2 + x3 + x4, data)
mod2 <- lm(y ~ 1 + x1 + x2 + x3 + x4 + x5, data)
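Putting the two previous snippets together, the full simulation can be sketched as follows (sample size and seed are arbitrary choices):

```r
# Simulating from the tapering-effects model above, then comparing the two
# nested candidate models by AIC (note that neither of them is the true model)
set.seed(666)
n <- 100
X <- matrix(rnorm(n * 8), n, 8, dimnames = list(NULL, paste0("x", 1:8)))
betas <- c(10, 5, 1, 0.5, 0.1, 0.05, 0.025, 0.01)  # tapering effect sizes
data <- data.frame(y = 100 + X %*% betas + rnorm(n), X)

mod1 <- lm(y ~ 1 + x1 + x2 + x3 + x4, data)
mod2 <- lm(y ~ 1 + x1 + x2 + x3 + x4 + x5, data)
AIC(mod1, mod2)  # AIC targets the K-L best model, not the true one
```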