class: center, middle, inverse, title-slide

# An introduction to Bayesian multilevel models using R, brms, and Stan
### Ladislas Nalborczyk
### Univ. Grenoble Alpes, CNRS, LPNC
### 28.11.2019

---

# Overview

1. Theoretical background
    + What is Bayesian inference?
    + What is a multilevel model?
    + Introducing the brms package

2. Practical part / tutorial
    + Worked example #1: modelling proportion data with a Bayesian multilevel GLM (BMLGLM?)
    + Worked example #2: fitting a Bayesian three-level meta-analytic model

---

# Who am I?

I was recently awarded a PhD in experimental psychology / cognitive science. My Bayesian adventure started in 2015, when I read [Kruschke’s book](https://sites.google.com/site/doingbayesiandataanalysis/) "Doing Bayesian Data Analysis" and started to use this approach in my research (e.g., see this recent [preprint](https://psyarxiv.com/8v5yd/)).

Since then, I have read a few more books (e.g., [Noël, 2010](https://laboutique.edpsciences.fr/produit/773/9782759817566/Psychologie%20statistique%20avec%20R); [Gelman et al., 2013](http://www.stat.columbia.edu/~gelman/book/); [McElreath, 2015](https://xcelab.net/rm/statistical-rethinking/)), given a doctoral course at UGA with [Thierry Phénix](https://scholar.google.fr/citations?user=2dMiOKEAAAAJ&hl=fr) (2017-2019), and given a workshop at UGent (2018) with [Paul Bürkner](https://paul-buerkner.github.io/about/) (`brms`'s father), with whom I wrote a tutorial paper about Bayesian multilevel models using `brms` ([Nalborczyk et al., 2019](https://psyarxiv.com/guhsa/)) and a commentary about confidence and credibility intervals ([Nalborczyk, Bürkner & Williams, 2019](https://www.collabra.org/articles/10.1525/collabra.197/)). I also gave a few presentations and wrote a few [blogposts](https://www.barelysignificant.com).

---

class: middle, center

# Theoretical background - Bayesian inference

---

# Exercise - The marbles bag (from McElreath, 2015)

Let's say we have a bag containing four marbles.
These marbles come in two colours: blue and white. We know there are four marbles in the bag, but we don't know how many of them are blue...

--

We know there are five possibilities (we call these possibilities our *hypotheses*):

<br>

<p align = "center"> ⚪ ⚪ ⚪ ⚪</p>
<p align = "center"> 🔵 ⚪ ⚪ ⚪</p>
<p align = "center">🔵 🔵 ⚪ ⚪</p>
<p align = "center">🔵 🔵 🔵 ⚪</p>
<p align = "center">🔵 🔵 🔵 🔵</p>

---

# Exercise - The marbles bag (from McElreath, 2015)

Our goal is to determine which hypothesis is the most plausible, given some **evidence** about the content of the bag. To obtain evidence, we can pull some marbles from the bag (with replacement). We do this three times and obtain the following sequence:

<p align = "center">🔵 ⚪ 🔵</p>

--

This sequence represents some evidence about the content of the bag (in other words, our data). From there, what **inference** can we reasonably make about the content of the bag? What can we say about the relative plausibility of each hypothesis?

---

# Counting possibilities

<p align = "center"> hypothesis: 🔵 ⚪ ⚪ ⚪ &emsp;&emsp;&emsp;&emsp; data: 🔵 </p>

<img src="RGUG2019_files/figure-html/unnamed-chunk-1-1.svg" width="75%" style="display: block; margin: auto;" />

---

# Counting possibilities

<p align = "center"> hypothesis: 🔵 ⚪ ⚪ ⚪ &emsp;&emsp;&emsp;&emsp; data: 🔵 ⚪</p>

<img src="RGUG2019_files/figure-html/unnamed-chunk-2-1.svg" width="75%" style="display: block; margin: auto;" />

---

# Counting possibilities

<p align = "center"> hypothesis: 🔵 ⚪ ⚪ ⚪ &emsp;&emsp;&emsp;&emsp; data: 🔵 ⚪ 🔵</p>

<img src="RGUG2019_files/figure-html/unnamed-chunk-3-1.svg" width="75%" style="display: block; margin: auto;" />

---

# Counting possibilities

<p align = "center"> hypothesis: 🔵 ⚪ ⚪ ⚪ &emsp;&emsp;&emsp;&emsp; data: 🔵 ⚪ 🔵 </p>

<img src="RGUG2019_files/figure-html/unnamed-chunk-4-1.svg" width="75%" style="display: block; margin: auto;" />

---

# Counting possibilities

Under this hypothesis, `\(3\)` paths (out of `\(4^{3}=64\)`) are consistent with the data.
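This count can be checked by brute force in R, enumerating all `\(4^{3} = 64\)` possible sequences of three draws (a sketch, assuming blue and white marbles are labelled "B" and "W"):

```r
# Hypothesis: one blue marble among four
bag <- c("B", "W", "W", "W")

# All 4^3 = 64 possible sequences of three draws (with replacement)
draws <- expand.grid(first = bag, second = bag, third = bag)

# Count the sequences consistent with the observed data: blue, white, blue
sum(draws$first == "B" & draws$second == "W" & draws$third == "B")  # 3
```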
What about the other hypotheses?

<p align = "center"> ⚪ ⚪ ⚪ 🔵 &emsp;&emsp; ⚪ 🔵 🔵 🔵 &emsp;&emsp; ⚪ ⚪ 🔵 🔵 </p>

<img src="RGUG2019_files/figure-html/unnamed-chunk-5-1.svg" width="40%" style="display: block; margin: auto;" />

---

# Model comparison (comparing hypotheses)

Given some data, the most *plausible* hypothesis is the one that **maximises** the number of ways these data could have been produced.

<center>
<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg .tg-gq9a{font-family:"Lucida Sans Unicode", "Lucida Grande", sans-serif !important;text-align:center;font-size:20px;font-weight:normal;padding:5px 5px 5px 5px}
</style>
<table class="tg" width="400" height="400">
<tr> <th class="tg-gq9a">Hypothesis</th> <th class="tg-gq9a">Ways of producing the data</th> </tr>
<tr> <td class="tg-gq9a"> <p> ⚪ ⚪ ⚪ ⚪ </p></td> <td class="tg-gq9a"> 0 x 4 x 0 = 0</td> </tr>
<tr> <td class="tg-gq9a"><p> 🔵 ⚪ ⚪ ⚪ </p></td> <td class="tg-gq9a">1 x 3 x 1 = 3</td> </tr>
<tr> <td class="tg-gq9a"><p> 🔵 🔵 ⚪ ⚪ </p></td> <td class="tg-gq9a">2 x 2 x 2 = 8</td> </tr>
<tr> <td class="tg-gq9a"><p> 🔵 🔵 🔵 ⚪ </p></td> <td class="tg-gq9a">3 x 1 x 3 = 9</td> </tr>
<tr> <td class="tg-gq9a"><p> 🔵 🔵 🔵 🔵 </p></td> <td class="tg-gq9a">4 x 0 x 4 = 0</td> </tr>
</table>
</center>

---

# Accumulating evidence

We previously considered that all hypotheses were a priori equiprobable (cf. the [principle of indifference](https://en.wikipedia.org/wiki/Principle_of_indifference)).

--

However, we could have prior knowledge, coming from our beliefs, from previous data, or from knowledge about usual bags of this kind.

--

Let's say we pull another marble out of the bag. How can we incorporate this new information into our previous analysis?
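The arithmetic of this update is a single element-wise multiplication, sketched below with the counts computed above (assuming the new draw turns out to be a blue marble):

```r
previous <- c(0, 3, 8, 9, 0)  # ways for each hypothesis to produce the first three draws
new_blue <- c(0, 1, 2, 3, 4)  # ways for each hypothesis to produce one more blue marble
previous * new_blue           # updated counts: 0 3 16 27 0
```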
---

# Accumulating evidence

We can update our previous counts by multiplying them by the new information, following the famous Bayesian dictum: "*Yesterday's posterior is today's prior*" ([Lindley, 2000](http://www.phil.vt.edu/dmayo/personal_website/Lindley_Philosophy_of_Statistics.pdf)).

<center>
<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg .tg-gq9a{font-family:"Lucida Sans Unicode", "Lucida Grande", sans-serif !important;text-align:center;font-size:20px;font-weight:normal;padding:5px 5px 5px 5px}
</style>
<table class="tg">
<tr> <th class="tg-gq9a">Hypothesis</th> <th class="tg-gq9a"><p>Ways of producing 🔵</p></th> <th class="tg-gq9a">Previous counts</th> <th class="tg-gq9a">New count</th> </tr>
<tr> <td class="tg-gq9a"><p> ⚪ ⚪ ⚪ ⚪ </p></td> <td class="tg-gq9a">0</td> <td class="tg-gq9a">0</td> <td class="tg-gq9a">0 x 0 = 0</td> </tr>
<tr> <td class="tg-gq9a"><p> 🔵 ⚪ ⚪ ⚪ </p></td> <td class="tg-gq9a">1</td> <td class="tg-gq9a">3</td> <td class="tg-gq9a">3 x 1 = 3</td> </tr>
<tr> <td class="tg-gq9a"><p> 🔵 🔵 ⚪ ⚪ </p></td> <td class="tg-gq9a">2</td> <td class="tg-gq9a">8</td> <td class="tg-gq9a">8 x 2 = 16</td> </tr>
<tr> <td class="tg-gq9a"><p> 🔵 🔵 🔵 ⚪ </p></td> <td class="tg-gq9a">3</td> <td class="tg-gq9a">9</td> <td class="tg-gq9a">9 x 3 = 27</td> </tr>
<tr> <td class="tg-gq9a"><p> 🔵 🔵 🔵 🔵 </p></td> <td class="tg-gq9a">4</td> <td class="tg-gq9a">0</td> <td class="tg-gq9a">0 x 4 = 0</td> </tr>
</table>
</center>

---

# Using prior information

Someone from the marble factory tells us that blue marbles are rare. For every bag containing three blue marbles, they make two bags that contain only two blue marbles, and three bags that contain only one blue marble. They also tell us that every bag contains at least one marble of each colour.
<center>
<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg .tg-gq9a{font-family:"Lucida Sans Unicode", "Lucida Grande", sans-serif !important;text-align:center;font-size:20px;font-weight:normal;padding:5px 5px 5px 5px}
</style>
<table class="tg">
<tr> <th class="tg-gq9a">Hypothesis</th> <th class="tg-gq9a"><p>Previous counts</p></th> <th class="tg-gq9a">Factory count</th> <th class="tg-gq9a">New count</th> </tr>
<tr> <td class="tg-gq9a"><p> ⚪ ⚪ ⚪ ⚪ </p></td> <td class="tg-gq9a">0</td> <td class="tg-gq9a">0</td> <td class="tg-gq9a">0 x 0 = 0</td> </tr>
<tr> <td class="tg-gq9a"><p> 🔵 ⚪ ⚪ ⚪ </p></td> <td class="tg-gq9a">3</td> <td class="tg-gq9a">3</td> <td class="tg-gq9a">3 x 3 = 9</td> </tr>
<tr> <td class="tg-gq9a"><p> 🔵 🔵 ⚪ ⚪ </p></td> <td class="tg-gq9a">16</td> <td class="tg-gq9a">2</td> <td class="tg-gq9a">16 x 2 = 32</td> </tr>
<tr> <td class="tg-gq9a"><p> 🔵 🔵 🔵 ⚪ </p></td> <td class="tg-gq9a">27</td> <td class="tg-gq9a">1</td> <td class="tg-gq9a">27 x 1 = 27</td> </tr>
<tr> <td class="tg-gq9a"><p> 🔵 🔵 🔵 🔵 </p></td> <td class="tg-gq9a">0</td> <td class="tg-gq9a">0</td> <td class="tg-gq9a">0 x 0 = 0</td> </tr>
</table>
</center>

---

# From counts to probability

The plausibility of a hypothesis after seeing some data is proportional to the number of ways this hypothesis can "produce" the data, multiplied by its *a priori* plausibility.

`$$p(hypothesis|data)\propto p(data|hypothesis) \times p(hypothesis)$$`

--

From there, we construct probabilities by standardising these plausibilities so that the sum of all plausibilities is equal to 1.

`$$p(hypothesis|data) = \frac{p(data|hypothesis) \times p(hypothesis)}{sum\ of\ products}$$`

---

# From counts to probability

Let's define `\(p\)` as the proportion of blue marbles.
<center>
<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg .tg-gq9a{font-family:"Lucida Sans Unicode", "Lucida Grande", sans-serif !important;text-align:center;font-size:20px;font-weight:normal;padding:5px 5px 5px 5px}
</style>
<table class="tg">
<tr> <th class="tg-gq9a">Hypothesis</th> <th class="tg-gq9a">p</th> <th class="tg-gq9a">Ways of producing the data</th> <th class="tg-gq9a">Probability</th> </tr>
<tr> <td class="tg-gq9a"> ⚪ ⚪ ⚪ ⚪ </td> <td class="tg-gq9a">0</td> <td class="tg-gq9a">0</td> <td class="tg-gq9a">0</td> </tr>
<tr> <td class="tg-gq9a"> 🔵 ⚪ ⚪ ⚪ </td> <td class="tg-gq9a">0.25</td> <td class="tg-gq9a">3</td> <td class="tg-gq9a">0.15</td> </tr>
<tr> <td class="tg-gq9a"> 🔵 🔵 ⚪ ⚪ </td> <td class="tg-gq9a">0.5</td> <td class="tg-gq9a">8</td> <td class="tg-gq9a">0.40</td> </tr>
<tr> <td class="tg-gq9a"> 🔵 🔵 🔵 ⚪ </td> <td class="tg-gq9a">0.75</td> <td class="tg-gq9a">9</td> <td class="tg-gq9a">0.45</td> </tr>
<tr> <td class="tg-gq9a"> 🔵 🔵 🔵 🔵 </td> <td class="tg-gq9a">1</td> <td class="tg-gq9a">0</td> <td class="tg-gq9a">0</td> </tr>
</table>
</center>

```r
ways <- c(0, 3, 8, 9, 0)
(probability <- ways / sum(ways) )
```

```
## [1] 0.00 0.15 0.40 0.45 0.00
```

---

# Notations, terminology

- `\(\theta\)`: a parameter or vector of parameters (e.g., the proportion of blue marbles).

- `\(\color{orangered}{p(x\vert \theta)}\)`<span style="color:orangered">: the conditional probability of the data `\(x\)` given `\(\theta\)`.
Once we know `\(x\)`, it can be seen as the likelihood function of `\(\theta\)`.</span>

- `\(\color{steelblue}{p(\theta)}\)`<span style="color:steelblue">: the prior probability of `\(\theta\)`.</span>

- `\(\color{purple}{p(\theta \vert x)}\)`<span style="color:purple">: the posterior probability of `\(\theta\)` (given `\(x\)`).</span>

- `\(\color{green}{p(x)}\)`<span style="color:green">: the marginal probability of `\(x\)` (over `\(\theta\)`).</span>

<br>

`$$\color{purple}{p(\theta \vert x)} = \dfrac{\color{orangered}{p(x\vert \theta)} \color{steelblue}{p(\theta)}}{\color{green}{p(x)}} = \dfrac{\color{orangered}{p(x\vert \theta)} \color{steelblue}{p(\theta)}}{\color{green}{\sum\limits_{\theta}p(x|\theta)p(\theta)}} = \dfrac{\color{orangered}{p(x\vert \theta)} \color{steelblue}{p(\theta)}}{\color{green}{\int\limits_{\theta}p(x|\theta)p(\theta)\mathrm{d}\theta}} \propto \color{orangered}{p(x\vert \theta)} \color{steelblue}{p(\theta)}$$`

---

# Probability axioms (Kolmogorov, 1933)

A probability is **a numerical value** assigned to an event `\(A\)`, this event being one possibility from a set `\(\Omega\)` (the set of all possible events). Probabilities conform to the following axioms (the third axiom is only valid for mutually exclusive events):

+ **Non-negativity.** `\(P(A_{i}) \geq 0\)`
+ **Normalisation.** `\(P(\Omega) = 1\)`
+ **Additivity.** `\(P(A_{1} \lor A_{2}) = P(A_{1}) + P(A_{2})\)`

--

This last axiom is also known as the **sum rule**, and can be generalised to non-mutually exclusive events: `\(P(A_{1} \lor A_{2}) = P(A_{1}) + P(A_{2}) - P(A_{1} \land A_{2})\)`.
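The generalised sum rule can be checked on a toy sample space, here a fair eight-sided die (a sketch; the two events are arbitrary choices for illustration):

```r
omega <- 1:8  # sample space: a fair eight-sided die
p <- function(event) length(event) / length(omega)

A <- c(2, 4, 6, 8)  # event "even number"
B <- 5:8            # event "at least 5"

# P(A or B) = P(A) + P(B) - P(A and B)
p(union(A, B)) == p(A) + p(B) - p(intersect(A, B))  # TRUE
```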
---

# Deriving Bayes theorem

From these axioms (and the definitions of joint, marginal and conditional probability), we can derive the **product rule**, which states that:

`$$p(x,y) = p(x|y)p(y) = p(y|x)p(x)$$`

--

Then, restating the above equality,

`$$p(y|x)p(x) = p(x|y)p(y)$$`

--

And dividing each side by `\(p(x)\)` gives Bayes theorem, which can be written as:

`$$p(y|x) = \dfrac{p(x|y)p(y)}{p(x)}$$`

--

Or, in plain English (replacing `\(x\)` by *data* and `\(y\)` by *hypothesis*):

`$$p(hypothesis|data) = \frac{p(data|hypothesis) \times p(hypothesis)}{sum\ of\ products}$$`

---

# Bayesian inference

For each statistical problem, we follow three steps (from [Gelman et al., 2013](http://www.stat.columbia.edu/~gelman/book/)):

- Building the model (the story of the data): likelihood + priors

- Updating the model with the information contained in the data using Bayes theorem (aka computing or approximating the posterior probability)

- Evaluating the model, its fit, and its assumptions, summarising the results, and readjusting the model

--

> "*Bayesian inference is really just counting and comparing possibilities [...]
in order to make good inference about what actually happened, it helps to consider everything that could have happened."* ([McElreath, 2015](http://xcelab.net/rm/statistical-rethinking/))

---

# The problem with posteriors

Relatively simple problems, like estimating the mean `\(\mu\)` and standard deviation `\(\sigma\)` of a Normal distribution with conjugate priors, have a closed-form solution:

`$$\color{purple}{p(\mu, \sigma | x)} = \frac{\prod_{i} \color{orangered}{\mathrm{Normal}(x_{i}|\mu, \sigma)}\color{steelblue}{\mathrm{Normal}(\mu|0,10)\mathrm{Uniform}(\sigma|0,50)}} {\color{green}{\int \int \prod_{i} \mathrm{Normal}(x_{i}|\mu,\sigma)\mathrm{Normal}(\mu|0,10)\mathrm{Uniform}(\sigma|0,50)d\mu d\sigma}}$$`

`$$\color{purple}{p(\mu, \sigma | x)} \propto \prod_{i} \color{orangered}{\mathrm{Normal}(x_{i}|\mu, \sigma)}\color{steelblue}{\mathrm{Normal}(\mu|0,10)\mathrm{Uniform}(\sigma|0,50)}$$`

--

However, for models containing more parameters or featuring non-conjugate priors, the integral (the *marginalising constant* in green) may be very difficult or impossible to compute.

---

# Sampling the unknowns

Instead of computing the marginalising constant, we will **approximate** the shape of the posterior distribution. Different methods exist...

--

* Grid sampling: computing the exact value of the posterior probability at a finite number of points (can be computationally very expensive)

--

* Sampling from the distribution using Markov Chain Monte Carlo algorithms (much more efficient than grid approximation)...
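Grid sampling can be sketched in a few lines of R, here approximating the posterior for the proportion of blue marbles after observing 2 blues out of 3 draws (a sketch, assuming a flat prior):

```r
p_grid <- seq(from = 0, to = 1, length.out = 1000)  # grid of candidate values for p
prior <- rep(1, 1000)                               # flat prior
likelihood <- dbinom(2, size = 3, prob = p_grid)    # 2 blue marbles out of 3 draws
unstd_posterior <- likelihood * prior
posterior <- unstd_posterior / sum(unstd_posterior) # standardise so it sums to 1
p_grid[which.max(posterior)]                        # posterior mode, close to 2/3
```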
---

class: middle, center

<embed src="https://chi-feng.github.io/mcmc-demo/app.html?algorithm=RandomWalkMH&target=standard" width=1000px height=600px />

---

class: middle, center

<embed src="https://chi-feng.github.io/mcmc-demo/app.html?algorithm=HamiltonianMC&target=standard" width=1000px height=600px />

---

class: middle, center

# Theoretical background - Multilevel linear models

---

# Linear models

What follows is an excerpt from this [blogpost](https://www.barelysignificant.com/post/glm/): "*Using R to make sense of the generalised linear model*".

> "[Statistical] models are devices that connect theories to data. A model is an instantiation of a theory as a set of probabilistic statements" ([Rouder, Morey, & Wagenmakers, 2016](https://www.collabra.org/articles/10.1525/collabra.28/)).

--

The usual linear model is of the following form:

$$
`\begin{aligned}
y_{i} &\sim \mathrm{Normal}(\mu_{i}, \sigma) \\
\mu_{i} &= \alpha + \beta \cdot x_{i} \\
\end{aligned}`
$$

Where, in Bayesian terms, the first line of the model corresponds to the *likelihood* of the model, which is the assumption made about the data generation process. We make the assumption that the outcomes `\(y_{i}\)` are normally distributed around a mean `\(\mu_{i}\)` with some error `\(\sigma\)`. This is equivalent to saying that the errors are normally distributed around `\(0\)`.

---

# Linear models

Below, we illustrate a simple Gaussian linear model using the `Howell1` dataset from the `rethinking` package ([McElreath, 2016](https://github.com/rmcelreath/rethinking)), which contains data about 544 individuals, including height (centimetres), weight (kilograms), age (years) and gender (0 indicating female and 1 indicating male).

<img src="RGUG2019_files/figure-html/howell-1.svg" width="66%" style="display: block; margin: auto;" />

---

# Linear models

A quick visual exploration of the dataset reveals a positive relationship between height and weight.
The previously plotted regression line corresponds to the following model, where we assume a Normal likelihood:

$$
`\begin{aligned}
\text{height}_{i} &\sim \mathrm{Normal}(\mu_{i}, \sigma) \\
\mu_{i} &= \alpha + \beta \cdot \text{weight}_{i} \\
\end{aligned}`
$$

--

This model can be fitted easily in `R` with the following syntax.

```r
(mod1 <- lm(height ~ weight, data = d) )
```

```
## 
## Call:
## lm(formula = height ~ weight, data = d)
## 
## Coefficients:
## (Intercept)       weight  
##     113.879        0.905
```

---

# Generalised linear model

Of course, the distributional assumption is not restricted to be Gaussian, and can be adapted to whatever distribution makes sense for the data at hand. Generalising to other distributions, the generalised linear model can be rewritten as:

$$
`\begin{aligned}
y_{i} &\sim \mathrm{D}(f(\eta_{i}), \theta) \\
\eta &= \mathbf{X} \beta \\
\end{aligned}`
$$

Where the response `\(y_{i}\)` is **predicted** through the linear combination `\(\eta\)` of predictors transformed by the inverse link function `\(f\)`, assuming a certain distribution `\(D\)` for `\(y\)` (also called the *family*), and family-specific parameters `\(\theta\)` ([Bürkner, 2017](https://www.jstatsoft.org/article/view/v080i01)).

---

# Multilevel modelling

Multilevel models generalise linear models by allowing some parameters (e.g., intercepts) to vary by group.

$$
`\begin{align}
\text{Level 1}: y_{i} &\sim \mathrm{Normal}(\mu_{i},\sigma) \\
\mu_{i} &= \alpha_{\text{group}[i]} + \beta \cdot x_{i} \\
\text{Level 2}: \alpha_{\text{group}} &\sim \mathrm{Normal}(\alpha,\sigma_{\text{group}}) \\
\end{align}`
$$

The "prior" on the group intercepts `\(\alpha_{\text{group}}\)` is a function of two parameters, `\(\alpha\)` and `\(\sigma_{\text{group}}\)`. These parameters are called **hyper-parameters**, as they are parameters for parameters. There are two levels in the model...
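As a sketch of what these two levels describe, we can simulate grouped data from this model and recover the parameters with `lme4` (all variable names and parameter values below are hypothetical):

```r
library(lme4)

set.seed(666)

# Level 2: ten group intercepts drawn from Normal(alpha = 50, sigma_group = 5)
alpha_group <- rnorm(10, mean = 50, sd = 5)

# Level 1: observations drawn around their group intercept, with slope beta = 2
d <- data.frame(group = rep(1:10, each = 20), x = rnorm(200) )
d$y <- alpha_group[d$group] + 2 * d$x + rnorm(200)

# Varying-intercept model, in lme4 syntax (mirrored by brms)
mod <- lmer(y ~ x + (1 | group), data = d)
fixef(mod)  # estimates should be close to alpha = 50 and beta = 2
```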
---

class: middle, center

<embed src="http://mfviz.com/hierarchical-models/" width=1000px height=600px />

---

# Shrinkage magic (Efron & Morris, 1977)

<img src="stein1.png" width="66%" style="display: block; margin: auto;" />

The James-Stein estimator is defined as `\(z = \bar{y} + c(y - \bar{y})\)`, where `\(\bar{y}\)` is the grand average, `\(y\)` an individual estimate, and `\(c\)` a constant, the "shrinking factor" ([Efron & Morris, 1977](http://statweb.stanford.edu/~ckirby/brad/other/Article1977.pdf)).

---

# Shrinkage magic (Efron & Morris, 1977)

The James-Stein estimator is determined both by the variability of the measure (e.g., its standard deviation, influencing the shrinking factor `\(c\)`) and by its distance from the average estimate (i.e., `\(y - \bar{y}\)`). In other words, extreme observations are trusted less. Shrinkage therefore acts as **a safeguard against overfitting** in multilevel models.

<img src="stein2.png" width="75%" style="display: block; margin: auto;" />

---

class: middle, center

# Theoretical background - Introducing brms

---

# Under the hood: Stan

`Stan` is a probabilistic programming language written in `C++`. It implements MCMC algorithms such as HMC and NUTS, as well as optimisation algorithms (e.g., L-BFGS) and variational inference.

```stan
data {
  int<lower=0> J;          // number of schools
  real y[J];               // estimated treatment effects
  real<lower=0> sigma[J];  // s.e. of effect estimates
}
parameters {
  real mu;
  real<lower=0> tau;
  real eta[J];
}
transformed parameters {
  real theta[J];
  for (j in 1:J) theta[j] = mu + tau * eta[j];
}
model {
  target += normal_lpdf(eta | 0, 1);
  target += normal_lpdf(y | theta, sigma);
}
```

---

# Bayesian Regression Models using Stan

The `brms` package ([Bürkner, 2017](https://www.jstatsoft.org/article/view/v080i01)) allows fitting complex Bayesian multilevel linear and non-linear models in `Stan` using an intuitive high-level syntax (similar to the syntax of `lme4`).
For instance, the following model:

$$
`\begin{align}
y_{i} &\sim \mathrm{Normal}(\mu_{i}, \sigma) \\
\mu_{i} &= \alpha + \alpha_{subject[i]} + \alpha_{item[i]} + \beta \cdot x_{i} \\
\end{align}`
$$

can be fitted using `brms` (as with `lme4`) as follows:

```r
brm(y ~ x + (1|subject) + (1|item), data = d, family = gaussian() )
```

---

# Example: Analysing the sleepstudy dataset

The `sleepstudy` dataset contains data from the study described in [Belenky et al. (2003)](https://www.ncbi.nlm.nih.gov/pubmed/12603781). It describes the average reaction time per day for subjects in a sleep deprivation study. On day 0 the subjects had their normal amount of sleep. Starting that night, they were restricted to 3 hours of sleep per night. The observations represent the average reaction time on a series of tests given each day to each subject.

```r
library(lme4)
data(sleepstudy)
head(sleepstudy, 10)
```

```
##    Reaction Days Subject
## 1  249.5600    0     308
## 2  258.7047    1     308
## 3  250.8006    2     308
## 4  321.4398    3     308
## 5  356.8519    4     308
## 6  414.6901    5     308
## 7  382.2038    6     308
## 8  290.1486    7     308
## 9  430.5853    8     308
## 10 466.3535    9     308
```

---

# Example: Analysing the sleepstudy dataset

```r
sleepstudy %>%
  ggplot(aes(x = Days, y = Reaction) ) +
  geom_smooth(method = "lm", colour = "black") +
  geom_point() +
  facet_wrap(~Subject, nrow = 2) +
  theme_bw(base_size = 10)
```

<img src="RGUG2019_files/figure-html/plotsleep-1.svg" width="75%" style="display: block; margin: auto;" />

---

# Example: Analysing the sleepstudy dataset

The `get_prior()` function returns a list of all priors to be specified given some `formula` and some data, as well as the default priors (if any) used by `brms`.
```r
library(brms)
get_prior(Reaction ~ Days + (1 + Days|Subject), sleepstudy)
```

```
##                   prior     class      coef   group resp dpar nlpar bound
## 1                               b                                        
## 2                               b      Days                              
## 3                lkj(1)       cor                                        
## 4                             cor           Subject                      
## 5  student_t(3, 289, 59) Intercept                                      
## 6    student_t(3, 0, 59)        sd                                      
## 7                              sd           Subject                      
## 8                              sd      Days Subject                      
## 9                              sd Intercept Subject                      
## 10   student_t(3, 0, 59)     sigma
```

---

# Example: Analysing the sleepstudy dataset

Priors can be specified in several ways, as explained in the documentation (accessible by typing `?prior`).

```r
prior1 <- c(
  prior(normal(200, 10), class = Intercept),
  prior(normal(0, 10), class = b, coef = Days),
  prior(cauchy(0, 10), class = sigma),
  prior(lkj(2), class = cor)
  )
```

```r
mod1 <- brm(
  Reaction ~ Days + (1 + Days | Subject),
  data = sleepstudy,
  family = gaussian(),
  prior = prior1,
  warmup = 2000, iter = 1e4,
  chains = 2
  )
```

---

# Example: Analysing the sleepstudy dataset

```r
posterior_summary(mod1, pars = c("^b_", "^sd_", "sigma"), probs = c(0.025, 0.975) )
```

```
##                          Estimate Est.Error       Q2.5      Q97.5
## b_Intercept           216.0746641 15.535217 183.859586 245.620803
## b_Days                  0.7233977  3.510254  -6.507006   7.198048
## sd_Subject__Intercept  45.6950094 15.437435  22.305404  81.634903
## sd_Subject__Days       12.2214325  3.974647   6.270876  21.588890
## sigma                  25.6498007  1.514462  22.878514  28.826137
```

<br>

--

<img src="easy.gif" width="33%" style="display: block; margin: auto;" />

---

# Some elements of syntax

The `brms` package features the same syntax as base `R` functions such as `lm()`, or as functions from the `lme4` package.

```r
Reaction ~ Days + (1 + Days | Subject)
```

--

The left-hand side of the formula defines the *outcome* (i.e., what is predicted). The `brms` package also allows fitting multivariate models (i.e., models with several outcomes) by combining these outcomes with `mvbind()`:

```r
mvbind(Reaction, Memory) ~ Days + (1 + Days | Subject)
```

--

The right-hand side of the formula defines the *predictors* (i.e., what is used to predict the outcome(s)).
Note that an intercept is implicitly assumed by default in `R`, so that the two notations below correspond to the same model.

```r
mvbind(Reaction, Memory) ~ Days + (1 + Days | Subject)
mvbind(Reaction, Memory) ~ 1 + Days + (1 + Days | Subject)
```

---

# Some elements of syntax

To fit a model without an intercept (because we can), we need to specify it explicitly as follows.

```r
mvbind(Reaction, Memory) ~ 0 + Days + (1 + Days | Subject)
```

--

The first part of the right-hand side of the formula corresponds to *constant effects* (aka *fixed effects*), whereas the second part (inside parentheses) corresponds to *varying effects* (aka *random effects*).

```r
mvbind(Reaction, Memory) ~ Days + (1 | Subject)
mvbind(Reaction, Memory) ~ Days + (Days | Subject)
```

The first model above contains an intercept varying by `Subject`. The second model additionally contains a slope varying by `Subject`.

---

# Some elements of syntax

When we include several varying effects (e.g., both a varying intercept and a varying slope), `brms` estimates the correlation between each pair of varying effects by default. If we are not interested in estimating this correlation, we can remove this parameter from the model (which is equivalent to fixing its value to 0) by using `||`.

```r
mvbind(Reaction, Memory) ~ Days + (1 + Days || Subject)
```

--

The previous models assumed a Gaussian response distribution. We can change this assumption by using the `family` argument.

```r
brm(Reaction ~ 1 + Days + (1 + Days | Subject), family = lognormal() )
```

---

# Helpful brms functions

```r
# Specify the model using R formulas:
brmsformula(formula, ...)

# Generate the Stan code:
make_stancode(formula, ...)
stancode(fit)

# Generate the data passed to Stan:
make_standata(formula, ...)
standata(fit)

# Handle priors:
get_prior(formula, ...)
set_prior(prior, ...)

# Generate expected values and predictions:
fitted(fit, ...)
predict(fit, ...)
marginal_effects(fit, ...)
# Model comparison:
loo(fit1, fit2, ...)
bayes_factor(fit1, fit2, ...)
model_weights(fit1, fit2, ...)

# Hypothesis testing:
hypothesis(fit, hypothesis, ...)
```

---

class: middle, center

# Worked example #1 - Experimental absenteeism

---

# Worked example #1 - Experimental absenteeism

Working with human subjects requires a minimum amount of reciprocal cooperation. But that requirement is not always fulfilled. A non-negligible proportion of students who register for a Psychology experiment in our lab do not show up for the experiment... We wanted to estimate the probability that a registered participant would come to the lab, depending on whether they received a reminder by email (this example is presented in more depth in two blogposts, available [here](http://www.barelysignificant.com/post/absenteeism/) and [there](http://www.barelysignificant.com/post/absenteeism2/)).

```r
library(tidyverse)
data <- read.csv("absenteeism.csv")
data %>% sample_frac %>% head(10)
```

```
##    reminder researcher presence absence total
## 1         1          3       87       9    96
## 2        -1          5       34      49    83
## 3        -1         10       23      58    81
## 4        -1          8       38      54    92
## 5         1          5       65      18    83
## 6        -1          1       16      86   102
## 7        -1          4       61      34    95
## 8         1          8       81      11    92
## 9         1          1       96       6   102
## 10       -1          2       53      59   112
```

---

# Worked example #1 - Experimental absenteeism

In other words, we want to estimate the probability `\(p_{i}\)` that a participant is present. We assume `\(y_{i}\)` (i.e., the number of participants being present, out of `\(n_{i}\)` participants) to be binomially distributed with probability of being present `\(p_{i}\)`.

`$$y_{i} \sim \mathrm{Binomial}(n_{i}, p_{i})$$`

--

The probability `\(p_{i}\)` is predicted via a linear combination of some predictors:

`$$logit(p_{i}) = log \Big(\frac{p_{i}}{1-p_{i}}\Big) = \alpha + \beta \times \text{reminder}_{i}$$`

Where `\(logit(\cdot)\)` is a **link function**, relating the bounded scale of the parameter (i.e., `\(0 \leq p_{i} \leq 1\)`) to the unbounded scale of the predictors.
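In base R, the link function and its inverse are available as `qlogis()` and `plogis()` (a quick sketch):

```r
p <- 0.75
(log_odds <- qlogis(p) )  # logit: log(0.75 / 0.25), approximately 1.1
plogis(log_odds)          # the inverse logit (logistic) maps it back to 0.75
```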
---

# Worked example #1 - Experimental absenteeism

We can extend this model on several levels, allowing the average probability of presence (i.e., the intercept) and the effect of the reminder (i.e., the slope) to vary by researcher.

<img src="logit.png" width="50%" style="display: block; margin: auto;" />

<!--
$$
`\begin{aligned}
y_{i} &\sim \mathrm{Binomial}(n_{i}, p_{i}) \\
logit(p_{i}) &= \alpha_{researcher_{[i]}} + \beta_{researcher_{[i]}} \times \text{reminder}_{i} \\
`\begin{bmatrix} \alpha_{\text{researcher}} \\ \beta_{\text{researcher}} \\ \end{bmatrix}` &\sim \mathrm{MVNormal}\bigg(\begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \textbf{S}\bigg) \\
\textbf{S} &= `\begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta} \\ \end{pmatrix}` \textbf{R} \begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta} \\ \end{pmatrix} \\
\alpha &\sim \mathrm{Normal}(0, 10) \\
\beta &\sim \mathrm{Normal}(0, 10) \\
(\sigma_{\alpha}, \sigma_{\beta}) &\sim \mathrm{HalfCauchy}(0, 10) \\
\textbf{R} &\sim \mathrm{LKJcorr}(2) \\
\end{aligned}`
$$
-->

---

# Worked example #1 - Experimental absenteeism

```r
prior2 <- c(
  prior(normal(0, 10), class = Intercept, coef = ""),
  prior(cauchy(0, 10), class = sd),
  prior(normal(0, 10), class = b),
  prior(lkj(2), class = cor)
  )
```

```r
mod2 <- brm(
  presence | trials(total) ~ 1 + reminder + (1 + reminder|researcher),
  family = binomial(link = "logit"),
  prior = prior2,
  data = data,
  sample_prior = TRUE,
  warmup = 2000, iter = 1e4,
  cores = parallel::detectCores(),
  control = list(adapt_delta = 0.95)
  )
```

---

# Worked example #1 - Experimental absenteeism

```r
mod2 %>%
  plot(
    combo = c("hist", "trace"), widths = c(1, 1.5),
    theme = theme_bw(base_size = 16)
    )
```

<img src="RGUG2019_files/figure-html/unnamed-chunk-28-1.svg" width="66%" style="display: block; margin: auto;" />

---

# Worked example #1 - Experimental absenteeism

```r
posterior_summary(mod2, pars = c("^b_", "^sd_"), probs = c(0.025, 0.975) )
```

```
##                           Estimate Est.Error      Q2.5     Q97.5
##
## b_Intercept              0.8403057 0.2960996 0.2602550 1.4320422
## b_reminder               1.6052251 0.1762257 1.2564517 1.9665889
## sd_researcher__Intercept 0.8702792 0.2855384 0.4812413 1.5746246
## sd_researcher__reminder  0.4777322 0.1742964 0.2295891 0.8997469
```

--

These estimates are on the log-odds scale... To interpret them more easily, we need to apply the inverse link function (i.e., the inverse logit). For instance, the average probability of presence (i.e., regardless of researcher or reminder status) is given by `\(p = \exp(\alpha) / (1 + \exp(\alpha) )\)`.

--

```r
# retrieving the intercept
a <- fixef(mod2)[1]

# transforming it back to the probability scale (equivalent to plogis(a))
exp(a) / (1 + exp(a) )
```

```
## [1] 0.6985296
```

---

# Worked example #1 - Experimental absenteeism

We then assess the effect of the reminder email. Here again, it is difficult to interpret the raw estimates... but we know that `\(\text{exp}(\beta)\)` gives an [odds ratio](https://en.wikipedia.org/wiki/Odds_ratio).

```r
fixef(mod2)[2, c(1, 3, 4)] %>% exp
```

```
## Estimate     Q2.5    Q97.5 
## 4.978980 3.512934 7.146258
```

We can interpret this ratio (OR = 4.99, 95% CrI [3.51, 7.15]) by saying that the odds of a participant being present are about 5 times higher when a reminder email was sent...

---

# Plotting the predictions of the model

An elegant way of representing the predictions of the model and their uncertainty is to directly plot some samples (i.e., some regression lines) from the posterior distribution. We call this kind of plot a *spaghetti plot*.
<img src="RGUG2019_files/figure-html/unnamed-chunk-32-1.svg" width="66%" style="display: block; margin: auto;" />

---

# Plotting the predictions of the model

```r
library(modelr)    # for data_grid() and seq_range()
library(tidybayes) # for add_fitted_samples()

data %>%
  group_by(researcher, total) %>%
  data_grid(reminder = seq_range(reminder, n = 1e2) ) %>%
  add_fitted_samples(mod2, newdata = ., n = 100, scale = "linear") %>%
  mutate(estimate = plogis(estimate) ) %>%
  ggplot(aes(x = reminder, y = estimate, group = .iteration) ) +
  geom_hline(yintercept = 0.5, lty = 2) +
  geom_line(aes(y = estimate, group = .iteration), size = 0.5, alpha = 0.1) +
  facet_wrap(~researcher, nrow = 2) +
  theme_bw(base_size = 20) +
  labs(x = "Reminder", y = "Estimate")
```

<img src="RGUG2019_files/figure-html/unnamed-chunk-33-1.svg" width="75%" style="display: block; margin: auto;" />

---

# Hypothesis test - 1

The `brms` package offers several ways of testing hypotheses. The `hypothesis()` function computes an *evidence ratio* (equivalent to a Bayes factor for point hypotheses, such as `\(\theta = 0\)`). When testing point hypotheses, this evidence ratio is computed using the **Savage-Dickey** method.

```r
(hyp1 <- hypothesis(mod2, "reminder = 0") )
```

```
## Hypothesis Tests for class b:
##       Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
## 1 (reminder) = 0     1.61      0.18     1.26     1.97          0         0    *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
``` ```r 1 / hyp1$hypothesis$Evid.Ratio ``` ``` ## [1] -7.810388e+15 ``` --- # Hypothesis test - 1 ```r plot(hyp1, theme = theme_bw(base_size = 20) ) ``` <img src="RGUG2019_files/figure-html/unnamed-chunk-35-1.svg" width="50%" style="display: block; margin: auto;" /> --- # Comparing the posterior to the prior ```r data.frame(prior = hyp1$prior_samples$H1, posterior = hyp1$samples$H1) %>% gather(type, value) %>% ggplot(aes(x = value) ) + geom_histogram(bins = 50, alpha = 0.8) + geom_vline(xintercept = 0, lty = 2, size = 1) + facet_wrap(~type, scales = "free") + xlab(expression(beta[reminder]) ) + theme_bw(base_size = 20) ``` <img src="RGUG2019_files/figure-html/unnamed-chunk-36-1.svg" width="75%" style="display: block; margin: auto;" /> --- # Hypothesis test - 2 We can also follow the **model comparison** approach. Testing the hypothesis of `\(\theta = 0\)` is equivalent to comparing two models: a model including the effect of interest and a model without this effect. ```r prior3 <- c( prior(normal(0, 10), class = Intercept, coef = ""), prior(cauchy(0, 10), class = sd), prior(lkj(2), class = cor) ) mod2 <- brm(presence | trials(total) ~ 1 + reminder + (1 + reminder|researcher), family = binomial(link = "logit"), prior = prior2, data = data, # this line is important for bridgesampling save_all_pars = TRUE, warmup = 2000, iter = 1e4, cores = parallel::detectCores(), control = list(adapt_delta = 0.95) ) mod3 <- brm(presence | trials(total) ~ 1 + (1 + reminder|researcher), family = binomial(link = "logit"), prior = prior3, data = data, save_all_pars = TRUE, warmup = 2000, iter = 1e4, cores = parallel::detectCores(), control = list(adapt_delta = 0.95) ) ``` --- # Hypothesis test - 2 We can then compare the marginal likelihood of these two models (i.e., compute a Bayes factor). 
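To build intuition for what "comparing marginal likelihoods" means, here is a toy beta-binomial illustration (with made-up numbers, unrelated to `mod2` and `mod3`) in which both marginal likelihoods happen to have closed forms:

```r
# Bayes factor as a ratio of marginal likelihoods (toy example)
y <- 7   # hypothetical number of successes
n <- 10  # hypothetical number of trials

# M0: theta fixed at 0.5, so the marginal likelihood is just the likelihood
ml0 <- dbinom(y, n, prob = 0.5)

# M1: theta ~ Beta(1, 1); averaging the likelihood over this prior gives
# a closed-form marginal likelihood (which reduces to 1 / (n + 1) here)
ml1 <- choose(n, y) * beta(y + 1, n - y + 1)

(bf01 <- ml0 / ml1)  # BF in favour of M0 over M1, about 1.29
```

For models like `mod2` and `mod3`, these integrals have no closed form, which is precisely why `brms` resorts to bridge sampling to approximate them.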
The `brms` package offers the `bayes_factor()` method, which relies on an approximation of the marginal likelihood through `bridgesampling` ([Gronau & Singmann, 2017](https://arxiv.org/abs/1710.08162)). ```r bayes_factor(mod2, mod3) ``` ``` ## Iteration: 1 ## Iteration: 2 ## Iteration: 3 ## Iteration: 4 ## Iteration: 5 ## Iteration: 6 ## Iteration: 1 ## Iteration: 2 ## Iteration: 3 ## Iteration: 4 ## Iteration: 5 ## Iteration: 6 ## Iteration: 7 ## Iteration: 8 ``` ``` ## Estimated Bayes factor in favor of bridge1 over bridge2: 3015.96437 ``` --- # Posterior predictive checking Another important tool to assess the predictive abilities of the model is known as *posterior predictive checking* (PPC). The idea is simple: we compare the observed data to data generated from the posterior distribution. Once we have a posterior distribution on `\(\theta\)`, we can generate data from the *posterior predictive distribution*: `$$p(\widetilde{y} | y) = \int p(\widetilde{y} | \theta) p(\theta | y) d \theta$$` If a model is a good model, it should be able to generate data that looks like the observed data (e.g., [Gabry, Simpson, Vehtari, Betancourt, & Gelman, 2017](https://arxiv.org/abs/1709.01449)). In some sense, posterior predictive checking can be seen as a generalisation of the NHST logic. --- # Posterior predictive checking ```r data %>% ggplot(aes(x = presence / total) ) + geom_density(fill = "grey20") + theme_bw(base_size = 20) ``` <img src="RGUG2019_files/figure-html/unnamed-chunk-39-1.svg" width="75%" style="display: block; margin: auto;" /> --- # Posterior predictive checking This procedure is implemented in `brms` via the `pp_check()` method, which allows various kinds of checks. For instance, we compare below the posterior predictions of the model (n = 100) to the observed data.
```r pp_check(mod2, nsamples = 1e2) + theme_bw(base_size = 20) ``` <img src="RGUG2019_files/figure-html/unnamed-chunk-40-1.svg" width="75%" style="display: block; margin: auto;" /> --- # Posterior predictive checking ```r pp_check(mod2, nsamples = 1e3, type = "stat_2d") + theme_bw(base_size = 20) ``` <img src="RGUG2019_files/figure-html/unnamed-chunk-41-1.svg" width="66%" style="display: block; margin: auto;" /> --- # Adjusting Stan's behaviour When fitting more complex models, you will probably have to face warning messages such as "`There were x divergent transitions after warmup`". You can adjust `Stan`'s behaviour inside the `brm()` call via the `control` argument. ```r mod2 <- brm( presence | trials(total) ~ 1 + reminder + (1 + reminder|researcher), family = binomial(link = "logit"), prior = prior2, data = data, warmup = 2000, iter = 1e4, cores = parallel::detectCores(), # using all available cores control = list(adapt_delta = 0.95) # increasing the target acceptance rate ) ``` For instance, we can increase the target average acceptance rate via `adapt_delta` (set to 0.8 by default), which forces the sampler to take smaller steps. This will probably slow down the sampling but will also improve the "quality" of the obtained samples. More generally, pay attention to warnings and messages from `brms` as they are usually quite informative and useful. --- class: middle, center # Worked example #2 - Meta-analysis --- # Worked example #2 - Meta-analysis A meta-analysis is an analysis of analyses. It is (or can be) a linear model similar to those discussed previously, except that observations are (usually) already summarised by an effect size. We can treat these effect sizes as observations with a known variance (i.e., estimated and reported in the original articles included in the meta-analysis).
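To make the "known variance" idea concrete, here is a minimal (non-Bayesian) sketch of the classical inverse-variance pooling of effect sizes, using made-up numbers (the `yi`/`vi` names simply follow the usual meta-analytic convention):

```r
# Classical constant-effect estimate: each effect size is weighted
# by the inverse of its known sampling variance
yi <- c(0.74, 0.93, -0.12, 0.35)  # hypothetical effect sizes
vi <- c(0.30, 0.05, 0.18, 0.06)   # their (known) sampling variances

w   <- 1 / vi                # inverse-variance weights
est <- sum(w * yi) / sum(w)  # pooled effect size estimate
se  <- sqrt(1 / sum(w))      # standard error of the pooled estimate
c(estimate = est, se = se)
```

More precise studies (smaller `vi`) thus pull the pooled estimate more strongly towards their own effect size.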
There are (at least) two types of meta-analytic models: - **Constant-effect (aka fixed-effect) meta-analysis**: we consider that the effect size estimated in each study refers to the same population effect size -- - **Varying-effect (aka random-effect) meta-analysis**: we model the variability between studies and we exploit the similarities between existing groups (e.g., experiments from the same study or team) to improve the estimation of the overall effect (cf. the shrinkage magic). Note that a "fixed-effect" model can be considered as a "random-effect" model with a fixed `\(\tau = 0\)`. --- # Worked example #2 - Meta-analysis This dataset contains the results of 32 experiments aiming to assess the role of biomechanical constraints in the visual estimation of distances ([Molto, Nalborczyk, Palluel-Germain & Morgado, 2019](https://psyarxiv.com/2zutq/)). ```r d <- read.csv("meta.csv") head(d, 15) ``` ``` ## study experiment yi vi ## 1 Costello (2015) 1 0.735209366 0.29846545 ## 2 Costello (2015) 2 0.925134052 0.04796738 ## 3 Durgin & Russell (2008) 1 -0.124624069 0.17928177 ## 4 Hutchison & Loomis (2006) 1 -0.128328394 0.10001576 ## 5 Hutchison & Loomis (2006) 2 0.350487089 0.05963093 ## 6 Kirsch & Kunde (2012) 1 0.609278390 0.31866670 ## 7 Kirsch & Kunde (2013a) 1 0.511022255 0.16231426 ## 8 Kirsch & Kunde (2013a) 2 0.407867633 0.04216585 ## 9 Kirsch & Kunde (2013b) 1 0.229159882 0.32082216 ## 10 Kirsch & Kunde (2013b) 2 -0.223592130 0.08876431 ## 11 Kirsch & Kunde (2013b) 3 0.145865089 0.10038189 ## 12 Lessard, Linkenauger & P. (2009) 1 0.172318408 0.09078810 ## 13 Morgado & al. (2013) 1 0.002778791 0.13837508 ## 14 Osiurak & Morgado (2012) 1 0.460865217 0.22233808 ## 15 Proffitt, S, B & Epstein (2003) 1 0.351381281 0.10386001 ``` --- # Worked example #2 - Meta-analysis We can write a first model as follows.
$$ `\begin{aligned} y_{j} &\sim \mathrm{Normal}(\mu_{j}, \sigma_{j}) \\ \mu_{j} &= \alpha + \alpha_{study[j]} \\ \alpha_{study[j]} &\sim \mathrm{Normal}(0, \tau_{s}) \\ \alpha &\sim \mathrm{Normal}(0, 1) \\ \tau_{s} &\sim \mathrm{HalfCauchy}(0, 1) \\ \end{aligned}` $$ where `\(\sigma_{j}^2\)` is the variance of the effect in the `\(j\text{th}\)` study and `\(\alpha\)` is the population effect size. The varying effect `\(\alpha_{study[j]}\)` represents the deviation of the `\(j\text{th}\)` study from the population effect size (so that `\(\alpha + \alpha_{study[j]}\)` is the average effect size in the `\(j\text{th}\)` study). In addition to the sample variance `\(\sigma_{j}^2\)` (which is known), we also estimate the variability of the effect between studies `\(\text{Var}(\alpha_{study}) = \tau_{s}^{2}\)` (level 2). --- # Worked example #2 - Meta-analysis The previous model can be extended to more than two levels to take into account the dependency structures existing in the data. Indeed, each individual study (i.e., each published paper) may contain several experiments. We could expect experiments from the same study to be more similar than experiments from different studies... <img src="meta_structure.png" width="75%" style="display: block; margin: auto;" /> --- # Worked example #2 - Meta-analysis We can write this model on three levels, as follows: $$ `\begin{aligned} y_{ij} &\sim \mathrm{Normal}(\mu_{ij}, \sigma_{ij}) \\ \mu_{ij} &= \alpha + \alpha_{study[j]} + \alpha_{experiment[ij]} \\ \alpha_{study[j]} &\sim \mathrm{Normal}(0, \tau_{s}) \\ \alpha_{experiment[ij]} &\sim \mathrm{Normal}(0, \tau_{e}) \\ \alpha &\sim \mathrm{Normal}(0, 1) \\ \tau_{e}, \tau_{s} &\sim \mathrm{HalfCauchy}(0, 1) \\ \end{aligned}` $$ In addition to the residual variability `\(\sigma_{ij}\)`, we also estimate two other sources of variation: the variation of effects between different experiments from the same study `\(\text{Var}(\alpha_{experiment}) = \tau_{e}^{2}\)` (level 2) and the variation of the effect between different studies `\(\text{Var}(\alpha_{study}) = \tau_{s}^{2}\)` (level 3).
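One way to build intuition for this generative structure is to simulate fake data from it; below is a sketch using arbitrary "true" values for `\(\alpha\)`, `\(\tau_{s}\)` and `\(\tau_{e}\)` (all numbers and names are made up):

```r
set.seed(1234)              # for reproducibility
n_study <- 10; n_exp <- 3   # 10 hypothetical studies, 3 experiments each
alpha <- 0.4                # 'true' population effect size
tau_s <- 0.2; tau_e <- 0.3  # between-study and between-experiment SDs

study   <- rep(1:n_study, each = n_exp)
a_study <- rnorm(n_study, 0, tau_s)[study]    # study-level deviations
a_exp   <- rnorm(n_study * n_exp, 0, tau_e)   # experiment-level deviations
vi      <- runif(n_study * n_exp, 0.02, 0.3)  # known sampling variances
yi      <- rnorm(n_study * n_exp, alpha + a_study + a_exp, sqrt(vi))

sim <- data.frame(study, experiment = rep(1:n_exp, n_study), yi, vi)
```

Fitting the meta-analytic model to such simulated data, and checking that `\(\alpha\)`, `\(\tau_{s}\)` and `\(\tau_{e}\)` are recovered, is a useful sanity check before analysing the real dataset.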
--- # Worked example #2 - Meta-analysis This model is easily fitted with `brms`. ```r prior4 <- c( prior(normal(0, 1), coef = intercept), prior(cauchy(0, 1), class = sd) ) mod4 <- brm( yi | se(sqrt(vi) ) ~ 0 + intercept + (1|study) + (1|experiment), data = d, prior = prior4, save_all_pars = TRUE, warmup = 2000, iter = 1e4, cores = parallel::detectCores(), control = list(adapt_delta = .99) ) ``` --- # Worked example #2 - Meta-analysis ```r posterior_summary(mod4, pars = c("^b_", "^sd_"), probs = c(0.025, 0.975) ) ``` ``` ## Estimate Est.Error Q2.5 Q97.5 ## b_intercept 0.4155609 0.22557013 -0.03298233 0.8714997 ## sd_experiment__Intercept 0.3551237 0.26368540 0.05669669 1.0527606 ## sd_study__Intercept 0.2206812 0.09877247 0.03555515 0.4310317 ``` --- # Worked example #2 - Meta-analysis ```r mod4 %>% plot(combo = c("hist", "trace"), theme = theme_bw(base_size = 18) ) ``` <img src="RGUG2019_files/figure-html/unnamed-chunk-47-1.svg" width="66%" style="display: block; margin: auto;" /> --- # Worked example #2 - Meta-analysis <img src="forest.png" width="50%" style="display: block; margin: auto;" /> --- # Further resources A list of blogposts about `brms`: https://paul-buerkner.github.io/blog/brms-blogposts/ The article introducing `brms` ([Bürkner, 2017](https://www.jstatsoft.org/article/view/v080i01)) as well as the "advanced" version ([Bürkner, 2018](https://journal.r-project.org/archive/2018/RJ-2018-017/index.html)). A tutorial on ordinal logistic regression models using `brms` ([Bürkner & Vuorre, 2018](https://journals.sagepub.com/doi/10.1177/2515245918823199)). Our tutorial paper on Bayesian multilevel models using `brms` ([Nalborczyk et al., 2019](https://pubs.asha.org/doi/10.1044/2018_JSLHR-S-18-0006), [preprint](https://psyarxiv.com/guhsa/)) and another one ([Vasishth et al., 2018](https://www.sciencedirect.com/science/article/pii/S0095447017302310)). 
The materials of our 2019 doctoral course "Introduction à la modélisation statistique bayésienne" (in French, [slides + code](https://github.com/lnalborczyk/IMSB2019)). --- # Take-home messages * **Bayesian inference**: A general statistical inference framework relying on the use of Bayes' theorem (i.e., basic probability rules). Can be seen as an extension of logic, where conclusions are derived from premises (prior information) and evidence (information contained in the data). * **Multilevel models**: Hierarchical (or not) extensions of the linear model taking into account complex data structures to provide more precise estimations via partial pooling (cf. the concept of *shrinkage*). * **The brms package**: Bayesian equivalent of `lme4` (but way more flexible), providing an interface to the probabilistic language `Stan`. Allows fitting complex non-linear multilevel models. <br>
[<font>lnalborczyk</font>](https://twitter.com/lnalborczyk)
[<font>lnalborczyk</font>](https://github.com/lnalborczyk)
[<font>https://osf.io/ba8xt</font>](https://osf.io/profile/)
[<font>www.barelysignificant.com</font>](http://www.barelysignificant.com)