Chapter 1 Theoretical framework

As you read these words, you might notice the presence of an inner voice. This phenomenon, albeit occurring on a daily basis, usually remains unnoticed until we pay attention to it. However, if I asked you to focus on that little voice while reading these lines, you would perhaps be able to provide a relatively fine-grained description of this phenomenon. Whose voice is it? Is it yours? Is it gendered? It is often possible to examine these aspects as well as lower-level features such as the tone, pitch, tempo, or virtually any sensory aspect of this voice. The phenomenological observations we can make about our inner voice reveal that inner speech is (or can be) accompanied by sensory percepts (e.g., speech sounds, kinaesthetic feelings). This raises another set of fascinating questions about the origin and nature of inner speech percepts. Where do these percepts come from? Why do they resemble the ones we experience when we speak overtly?

This first set of questions refers to the nature of inner speech, that is, to what it is. In the present work, we are mostly concerned with these questions. Another set of issues revolves around the functions of inner speech, that is, what it is for. The influential Vygotskian theory of inner speech development suggests that inner speech develops from so-called egocentric speech (i.e., self-addressed overt speech or private speech) during childhood. As such, for the present purpose, in line with Fernyhough (2004) and Alderson-Day & Fernyhough (2015), we assume that the functions of inner speech are inherited from those of egocentric speech via a process of progressive internalisation. The specific features of this internalisation process are worthy of investigation on their own (and we briefly discuss them later on). However, we are mostly interested here in the “what it is” question (i.e., the nature of inner speech). Thus, we will only sparsely address the question of the functions of inner speech.

That being said, a lot can be learned about inner speech by looking at situations in which these functions deviate from their original trajectory. These dysfunctions are instances of inner speech in which its adaptive functions, such as problem-solving, self-regulation or planning, do not work as intended. These dysfunctional instances of inner speech may include auditory verbal hallucinations (AVHs; for a detailed investigation of the relation between inner speech and AVHs, see Rapin, 2011), where the sense of agency (i.e., the feeling of who the author of the internal speech is) is impaired, or repetitive negative thinking such as worry or rumination, where the ability to control (or to disengage from) negative thoughts is impaired. In the present work, we investigate (some of) the psychophysiological correlates of rumination, starting with the theoretical assumption that rumination can be considered as a form of inner speech. Therefore, we study rumination as we would study inner speech, with the potential of refining our understanding of both rumination and inner speech.

Rumination is implicated in the development and maintenance of several psychiatric disorders such as depression or anxiety. For instance, rumination has been shown to be associated with the development, severity and maintenance of depressive episodes (e.g., Treynor, Gonzalez, & Nolen-Hoeksema, 2003; Nolen-Hoeksema, 2000; Nolen-Hoeksema, Wisco, & Lyubomirsky, 2008). Given the central role of rumination in depression and the societal importance of depression (both in terms of lifetime prevalence and associated costs), rumination has been considered a key target in modern cognitive and behavioural therapies (e.g., Watkins, 2016). However, although rumination has mainly been studied in the framework of depression and anxiety, it has been suggested to be a key process in many other disorders (e.g., Baeyens, Kornacka, & Douilliez, 2012; Ehring & Watkins, 2008; Watkins, 2008). Thus, rumination can generally be understood as a transdiagnostic process (i.e., a process that is not specific to a single disorder).

In this first chapter, we briefly review the main theoretical frameworks in which rumination has been studied. We then review the historical and contemporary accounts of inner speech and suggest how rumination can be considered and studied as a form of inner speech. We then broaden the discussion by considering the analogy between inner speech and the more general phenomenon of motor imagery. Finally, we discuss how electromyography can be used (and has been used) to investigate covert actions (including inner speech), before moving to a brief introduction to the technical aspects of the present work (cf. Chapter 2).

1.1 Rumination: theories and measures

1.1.1 Theoretical perspectives on rumination

It is intuitively straightforward to understand how the mental rehearsing of negative content might impair cognitive functioning and worsen negative affects. Repetitively thinking about why you were unable to solve that sudoku during breakfast might lead to sustained negative affects throughout the day. However, research on rumination suggests that the process of thinking (i.e., how we think) about a certain content rather than the content of the thought (i.e., what we think about) is a more accurate predictor of the cognitive and affective consequences of repetitive negative thinking. Accordingly, rumination is described as a repetitive and passive thinking process that is focused on negative content. Whereas this definition is general enough to encompass several conceptualisations of rumination, it does not tell much about its functions or mechanisms. In this section, we review the most important theoretical models that have been proposed to explain the origin and the role of rumination in psychopathology. We do not aim to provide an exhaustive review of the existing theoretical perspectives on rumination. Instead, we refer the reader to more extensive work (e.g., reviews or books) when appropriate.

One of the most prolific models of rumination is the response styles theory (Nolen-Hoeksema, 1991). This theory was developed to explain the relation between rumination and depression, as well as to account for gender differences in the way individuals respond to negative affects. Indeed, it has been suggested that female individuals would be more likely to ruminate in response to negative affects whereas male individuals would be more likely to distract themselves. The tendency for female individuals to ruminate more than male individuals has been confirmed and quantified in a recent meta-analysis (Johnson & Whisman, 2013). According to the response styles theory, rumination consists of repetitively and passively thinking about the possible causes and consequences of negative affects. Thus, rumination is conceptualised as a mode of response to negative affects. Importantly, rumination is defined as an unconstructive thinking process, that is, a mode of thinking that does not lead to active problem-solving. Rather, rumination is thought to lead to a fixation on the problems and the feelings evoked by these problems.

The response styles theory suggests that rumination exacerbates and prolongs distress (including depression) through four main mechanisms (as reviewed in Nolen-Hoeksema et al., 2008). First, rumination has been suggested to “enhance” the effects of negative mood on cognition. This mechanism has been confirmed in experimental settings where rumination is induced and compared to distraction (e.g., following the rumination induction procedure developed in Nolen-Hoeksema & Morrow, 1993). In these experimental settings, rumination has been shown to be associated with a negativity bias (i.e., a tendency toward negative interpretations) and to increase the recall of negative autobiographical memories (e.g., Lyubomirsky & Nolen-Hoeksema, 1995; Lyubomirsky, Caldwell, & Nolen-Hoeksema, 1998; Watkins & Teasdale, 2001). Second, rumination has been suggested to interfere with problem-solving abilities. This has been observed in both dysphoric1 participants (e.g., Lyubomirsky & Nolen-Hoeksema, 1995) and clinically depressed participants (e.g., Watkins & Moulds, 2005). Third, rumination might also interfere with motivation and instrumental behaviour. More precisely, one study has shown that whereas dysphoric ruminators recognise that some activities might be beneficial for their mood, they are unwilling to engage in them (Lyubomirsky & Nolen-Hoeksema, 1993). Finally, rumination has been suggested to erode social support. For instance, Nolen-Hoeksema & Davis (1999) have shown that although chronic ruminators were more likely to reach out for social support, they reported less emotional support from others. According to the response styles theory, rumination is therefore maladaptive in that it worsens negative affects. In the first formulation of this theory, the adaptive alternative to rumination was thought to be distraction, during which the focus of attention is directed away from distress (e.g., by engaging in distractive activities such as sport or group activities). However, the adaptive status of distraction is still a matter of debate (for review, see Nolen-Hoeksema et al., 2008).

Trapnell & Campbell (1999) later attempted to distinguish different forms of rumination based on their outcome. They suggested distinguishing between rumination and intellectual self-reflection. Whereas the latter construct is supposed to reflect a more adaptive component of the self-reflective process, empirical data on that question are not conclusive (Nolen-Hoeksema et al., 2008). Treynor et al. (2003) have suggested, based on a reanalysis of the ruminative response scale (a rumination questionnaire discussed in the next section), that two components of rumination could be distinguished. More precisely, they obtained two factors coined as brooding and reflective pondering. Brooding refers to more negative aspects of self-reflection and a focus on abstract questions such as “Why do I always react the way I do?” and is positively correlated with depression. Pondering refers to a more general self-reflective process, which might be more related to problem-solving abilities. However, pondering has also been shown to be positively correlated with depression concurrently (but to be negatively correlated with depression longitudinally, Treynor et al., 2003).

Along another line, self-regulation theories (Carver & Scheier, 1998; Martin & Tesser, 1996) suggest that rumination is triggered by perceived discrepancies between one’s current state and a desired goal or state. For instance, if a researcher has the goal of publishing her research in a prestigious academic journal but has heated discussions with reviewer #2, she is likely to focus on and to repetitively think about the discrepancy between her goal (publishing the paper) and her current state (having endless discussions with a critical peer). In that situation, the self-focused thinking might end either when the researcher acts in the direction of reducing the discrepancy between the situation and the goal (e.g., by complying with the reviewer’s requests) or when she gives up on her initial goal. In either case, self-focused thinking would be instrumental, in the sense that it would help to resolve the discrepancy. However, the researcher might also continue to focus on the discrepancies between her desired state and the current state in a passive way. In that situation, the discrepancy might persist and she might experience negative affects. Thus, self-regulation theories suggest that rumination can be either adaptive or maladaptive. In brief, rumination is adaptive when it leads to (efficient) problem-solving but is maladaptive when it does not lead to (efficient) problem-solving.

Another attempt to distinguish different types of rumination according to their outcome has been developed by Edward Watkins and colleagues, building upon Teasdale’s (1999) work on emotional processing modes. The theory of processing mode (Watkins, 2004, 2008) makes a distinction between two types of rumination. The first type of rumination involves abstract and evaluative thoughts about oneself (e.g., thinking about the causes, meanings and consequences of). The second type of rumination involves non-evaluative and concrete thoughts about present experiences (e.g., focusing on the experience of). A number of studies have confirmed that different forms of rumination might be distinguished according to their adaptive or maladaptive outcomes (for review, see Watkins, 2008). These results (amongst others) constitute the theoretical basis upon which rumination-focused therapies have been developed (e.g., Watkins, 2015, 2016).

So far, we have defined rumination as either a trait, a stable and habitual mode of response (response styles theory), or as momentary thoughts that are triggered by goal-state discrepancies (self-regulation theories). In other words, the former explains how rumination can be considered as a stable mode of response whereas the latter explains how rumination might start. However, there have been a few attempts to integrate these two views in a common framework. One promising integrative approach has been proposed in the form of the habit-goal framework of depressive rumination (Watkins & Nolen-Hoeksema, 2014). This framework is built on the idea that rumination could be explicitly considered as a mental habit (Hertel, 2004). In classical conditioning and learning theories, a stimulus-response habit is formed when a response is repetitively associated with a stimulus (and when this association is reinforced). An important aspect of habits is their automaticity and the lack of awareness attached to them. Indeed, habitual responses are evoked “automatically” (i.e., without conscious effort) by contextual cues. Moreover, as habits are usually slow to learn, they are also slow to unlearn (i.e., they are relatively stable over time). The habit-goal framework considers rumination as a form of habitual response to goal-state discrepancies that occur frequently and repetitively in the same emotional context (i.e., depressed mood). Therefore, this framework makes it possible to explain how rumination, while being originally triggered by goal-state discrepancies, might become independent of these goals through repetition. After learning, rumination might simply be “evoked” by contextual cues (e.g., negative mood). This would partially explain why rumination, as a habitual response, is particularly difficult to interrupt. This view of rumination also has implications for rumination-focused therapies (see discussion in Watkins & Nolen-Hoeksema, 2014).

Another line of research is interested in the cognitive correlates of the deficits and biases associated with rumination (e.g., Joormann & Gotlib, 2010; Koster, De Lissnyder, Derakshan, & De Raedt, 2011). One of the central features of rumination is its perseverative nature (Mor & Daches, 2015). As suggested by Christoff, Irving, Fox, Spreng, & Andrews-Hanna (2016), rumination and other forms of thoughts can be considered in a common conceptual space (see Figure 1.1). This space is built upon two dimensions: the deliberate constraints dimension and the automatic constraints dimension. These dimensions represent two general mechanisms that constrain the contents of mental states and the transitions between them. The first constraint corresponds to a deliberate process and is implemented through cognitive control (Miller, 2000).2 The second constraint refers to more automatic processes such as sensory afferents (e.g., visual or auditory saliency). In this framework, rumination is characterised by the highest level of automatic constraints and is spread all along the deliberate constraints dimension. In other words, rumination is characterised by a strong automaticity, which is consistent with the mental habit view of rumination discussed above.

Figure 1.1: Conceptual space of different types of thought according to deliberate and automatic constraints (Figure from Christoff et al., 2016).

Accordingly, cognitive theories of rumination have tried to describe the cognitive mechanisms that are associated with rumination and its perseverative nature. These approaches try to answer questions such as: What are the cognitive underpinnings of the tendency to ruminate? What kind of cognitive biases does rumination cause? To answer these questions, the cognitive control processes that are the most often investigated in relation to depression (and rumination) are the abilities to i) inhibit irrelevant content or a prepotent response, ii) shift between tasks and iii) update current working memory content (for reviews, see Mor & Daches, 2015; Grahek, Everaert, Krebs, & Koster, 2018; LeMoult & Gotlib, 2019). Linville (1996) first suggested that deficits in attentional inhibition may underlie rumination. This proposition was later confirmed and refined by Joormann and colleagues (e.g., Joormann & Gotlib, 2010; Joormann & Vanderlind, 2014; Joormann, Yoon, & Zetsche, 2007), who have shown that rumination is associated with biases in multiple inhibitory processes. They have shown that rumination is associated with inhibition deficits for mood-congruent (i.e., negative) material. More precisely, they proposed that rumination is associated with a decreased ability to limit the access of irrelevant negative information (inhibition) and to discard negative irrelevant information (updating). Koster et al. (2011) proposed that rumination would be the result of a combination of impaired conflict signalling and impaired attentional control. A conflict usually emerges when self-evaluative negative thinking is cued by internal or external stressors and conflicts with an individual’s goals. According to this model, it is impaired conflict signalling and an impaired ability to disengage attention from self-relevant negative information that explain prolonged ruminative thinking. This idea has since been corroborated by experimental work showing that difficulty disengaging attention was associated with rumination (e.g., Grafton, Southworth, Watkins, & MacLeod, 2016; Southworth, Grafton, MacLeod, & Watkins, 2017) and by a recent meta-analysis (Zetsche, Bürkner, & Schulze, 2018).

Another view on the relation between cognitive control and rumination has been developed by Whitmer & Gotlib (2013) and is known as the attentional scope model of rumination. In this framework, negative mood would “facilitate” rumination by narrowing the scope of attention. A narrowed scope of attention would limit the number of available thoughts and reduce the ability to inhibit irrelevant information or to switch to other information. In contrast, a broader attentional scope (e.g., caused by positive mood) would increase the array of available thoughts. Although some studies indeed found a narrower attentional breadth following a rumination induction (e.g., Grol, Hertel, Koster, & De Raedt, 2015), it is not clear whether attentional breadth is causally involved in ruminative thinking. For instance, Fang et al. (2018) failed to obtain transfer effects following a visual attentional breadth training.

Overall, a large number of studies have demonstrated that cognitive control abilities are impaired in individuals with a strong propensity to ruminate (trait rumination) or following a rumination induction (state rumination). For instance, Davis & Nolen-Hoeksema (2000) showed that ruminators (in comparison with non-ruminators) committed more errors in the Wisconsin card sorting task, highlighting a lack of cognitive flexibility in ruminators. Another study using a mixed antisaccade task showed impaired inhibition but intact switching abilities in ruminators (De Lissnyder, Derakshan, De Raedt, & Koster, 2011). Using the Stroop task, Philippot & Brutoux (2008) observed that rumination was associated with impaired inhibition. Moreover, recent results suggest that training inhibition might reduce the negativity bias and state rumination (e.g., Daches & Mor, 2014; Daches, Mor, & Hertel, 2019). Experimental work also demonstrated that difficulties in shifting between different tasks were associated with higher levels of rumination (particularly brooding) in both depressed and nonclinical participants (for reviews, see Koster, Hoorelbeke, Onraedt, Owens, & Derakshan, 2017; LeMoult & Gotlib, 2019; Mor & Daches, 2015; Whitmer & Gotlib, 2013).

Studies using cognitive bias modification also make it possible to experimentally manipulate information-processing biases to assess their effect on mood and behaviour. For instance, Siegle, Ghinassi, & Thase (2007) showed that participants who received six sessions of cognitive control training (the attention control training and the paced auditory serial addition task, Wells, 2000; Gronwall, 1977) presented reduced levels of rumination after the training. Hoorelbeke & Koster (2017) confirmed this finding by showing that an internet-delivered training of ten sessions led to reduced levels of rumination and depressive symptoms after the training in remitted depressed patients (for a review of cognitive control interventions for depression, see Koster et al., 2017).

In brief and as summarised by van Vugt, van der Velde, & ESM-MERGE Investigators (2018), the cognitive approaches to rumination can be divided into three (non-exclusive) classes. These approaches consider rumination i) as arising from a bias toward negatively valenced information (e.g., Whitmer & Gotlib, 2013), ii) as arising from difficulties in discarding or disengaging from negative and self-relevant information (e.g., Koster et al., 2011; Joormann & Vanderlind, 2014), or iii) as a “habit of thoughts” defined by a specific pattern of memory associations (e.g., Cramer et al., 2016). Following the latter conception, van Vugt et al. (2018) developed a computational model of rumination implementing the idea that rumination can be considered a maladaptive habit of thought. They showed how rumination can result from particular configurations of memory chunks and their associative structure. This model was able to predict the decline in cognitive task performance observed in depressed patients. Therefore, the computational approach in psychopathology and psychiatry might make it possible to implement the cognitive models described previously and to make testable predictions about cognitive task performance (see also Grahek, Shenhav, Musslick, Krebs, & Koster, 2019, for a mechanistic approach to motivation and cognitive control in depression).

1.1.2 Measures of rumination

In the following, we make a distinction between measures aiming to assess the stable tendency of individuals to engage in rumination (i.e., trait rumination) and measures aiming to assess the presence, quality or intensity of momentary rumination (i.e., state rumination). Likewise, we present and discuss several types of measures, from self-reported measures to physiological measures. For each type of measure, we first present and discuss measures of trait rumination before turning to measures of state rumination.

Rumination has traditionally been assessed through self-administered questionnaires. The most commonly used measure of trait rumination is the Ruminative Response Scale (RRS) of the Response Styles Questionnaire (RSQ, Nolen-Hoeksema & Morrow, 1991). The RSQ is an operationalisation of rumination as it was conceptualised in the response styles theory (Nolen-Hoeksema, 1991). The RRS consists of 22 items describing responses to dysphoric mood that are self-focused, symptom-focused, and focused on the causes and consequences of one’s mood. A short version of the scale containing ten items has been shown to be highly correlated (r = .90) with the full version of the questionnaire (Nolen-Hoeksema & Jackson, 2001). However, it has been argued that some RRS items overlap with measures of depression (Treynor et al., 2003). In response to these concerns, Treynor et al. (2003) removed the ambiguous items from the original RRS and conducted a novel factor analysis. This analysis revealed two distinct components: brooding and reflective pondering (as discussed in the previous section).

Based on Watkins’s (2008) distinction between constructive (concrete experiential thinking) and unconstructive (abstract analytical thinking) forms of rumination described previously, Barnard, Watkins, Mackintosh, & Nimmo-Smith (2007) developed the Cambridge Exeter Repetitive Thought Scale (CERTS) to assess different facets of rumination. This questionnaire contains 84 items arranged in three parts assessing i) the context of rumination, ii) the self-evaluation of the functionality of rumination and iii) ruminative processes. The short version of this questionnaire, the Mini-CERTS (Douilliez, Philippot, Heeren, Watkins, & Barnard, 2012), contains 16 items extracted from the third part of the CERTS. These items evaluate more specifically the two dimensions identified by Watkins (2008). Interestingly, the concrete dimension of the Mini-CERTS appears to be related to the brooding dimension of the RRS, whereas no relation was found between the concrete dimension of the Mini-CERTS and other subscales from the RRS (Douilliez et al., 2012).

Several questionnaires have also been developed to assess the tendency to ruminate (i.e., trait rumination) as a transdiagnostic process. This includes (amongst others) the rumination-reflection questionnaire (Trapnell & Campbell, 1999), the repetitive thinking questionnaire (McEvoy, Mahoney, & Moulds, 2010) or the perseverative thinking questionnaire (Ehring et al., 2011). Several other measures have also been developed to assess more specific forms of repetitive thoughts or processes related to ruminative thoughts such as meta-cognitions, thought control or stress or sadness-reactive rumination (for a review of existing measures of rumination, see Luminet, 2004).

Rumination can also be seen as a momentary response (state rumination). The effects of state rumination are usually assessed in laboratory settings where rumination is induced and compared to another (more adaptive) form of emotion regulation such as distraction or problem-solving (for review, see Lyubomirsky, Layous, Chancellor, & Nelson, 2015). Some measures have been developed to assess state rumination but usually in reaction to specific events (e.g., stress-reactive, offence-reactive or sadness-reactive rumination). Moreover, until recently, there was no comprehensive and validated measure of state rumination. Nevertheless, the increasing use of the experience sampling methodology (Csikszentmihalyi & Larson, 1987) to investigate rumination in a more naturalistic environment led to the development of short scales that could be used quickly and repetitively throughout the day. For instance, Moberly & Watkins (2008) operationalised momentary ruminative thinking using two items. The first item asked participants to rate the extent to which they were focused on their symptoms, consistent with the conceptualisation of rumination of the response styles theory (Nolen-Hoeksema, 1991). The second item asked participants to rate the extent to which they were focused on their problems, consistent with self-regulation theories (Carver & Scheier, 1998; Martin & Tesser, 1996). Moberly & Watkins (2008) considered this two-item measure to reflect “ruminative self-focus”, independently of current (negative) affects. These two items are rated on a scale from 0 (not at all) to 7 (very much), from which a mean score is then computed.3
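The scoring of the two-item measure described above can be sketched as follows. This is a minimal illustration in Python, not the authors' procedure: the function name and argument names are hypothetical, and the only elements taken from the text are the 0 to 7 rating range and the averaging of the two items.

```python
def ruminative_self_focus(symptom_focus: float, problem_focus: float) -> float:
    """Return the mean of the two momentary items (hypothetical helper).

    Each item is assumed to be rated from 0 (not at all) to 7 (very much),
    as described in the text.
    """
    for rating in (symptom_focus, problem_focus):
        # Reject ratings outside the response scale described in the text.
        if not 0 <= rating <= 7:
            raise ValueError("each rating must lie between 0 and 7")
    return (symptom_focus + problem_focus) / 2


# Example: one experience-sampling prompt with ratings of 2 and 5.
score = ruminative_self_focus(2, 5)
```

In an experience sampling design, such a score would be computed once per prompt, yielding a time series of momentary ruminative self-focus for each participant.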

Very recently, Marchetti, Mor, Chiorri, & Koster (2018) developed the Brief State Rumination Inventory (BSRI) to provide a more comprehensive and validated measure of state rumination. They report two studies showing good reliability and validity of this scale in both its English and Dutch versions. This questionnaire is composed of eight Visual Analogue Scales (VASs) ranging from “completely disagree” (numerically recoded as 0) to “totally agree” (numerically recoded as 100). These items are then summed to provide an indicator of momentary rumination. The BSRI is (to the best of our knowledge) the first validated full-length scale assessing momentary rumination.
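As an illustration of the BSRI scoring rule described above, the following Python sketch sums eight recoded VAS ratings. The function name and the input format (a list of numbers) are assumptions made for this example. The number of items, the 0 to 100 recoding and the summation come from the text, which implies a total score between 0 and 800.

```python
from typing import Sequence

N_ITEMS = 8        # number of visual analogue scales, as stated in the text
VAS_MIN, VAS_MAX = 0, 100  # numerical recoding of the scale anchors


def bsri_total(vas_ratings: Sequence[float]) -> float:
    """Sum the eight recoded VAS ratings into a total score (hypothetical helper)."""
    if len(vas_ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(vas_ratings)}")
    for rating in vas_ratings:
        # Each rating must fall within the recoded range of the scale.
        if not VAS_MIN <= rating <= VAS_MAX:
            raise ValueError("each rating must lie between 0 and 100")
    return sum(vas_ratings)


# Example: a participant placing every cursor at the midpoint of the scale.
total = bsri_total([50] * 8)
```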

Overall, the validity of self-report measures is based on the hypothesis that individuals have reliable access to their internal states. However, we know that self-reports are prone to reconstruction biases (e.g., Brewer, 1986; Conway, 1990). Moreover, we know that individuals usually have a low level of awareness of the cognitive processes that underlie their behaviours (Nisbett & Wilson, 1977). To overcome these difficulties, some authors have attempted to quantify state rumination and trait rumination more objectively, by recording physiological or neuroanatomical correlates of rumination (for review, see Siegle & Thayer, 2003). Peripheral physiological manifestations (e.g., pupil dilation, blood pressure, cardiac rhythm, cardiac variability) have been examined during induced rumination or in association with trait rumination. For instance, a consistent link between perseverative cognition and decreased heart rate variability (HRV) was found in a meta-analysis conducted by Ottaviani et al. (2016). They also observed a positive association between (both trait and state) perseverative cognition and heart rate, systolic blood pressure, diastolic blood pressure, and cortisol activity (see also Zoccola & Dickerson, 2012, for a review of the relation between rumination and cortisol).

With regard to state rumination, Vickers & Vogeltanz-Holm (2003) have observed an increased systolic blood pressure after rumination induction, suggesting the involvement of the autonomic nervous system in rumination. Moreover, galvanic skin response has been shown to be increased after a rumination induction in highly anxious women (Sigmon, Dorhofer, Rohan, & Boulard, 2000). According to Siegle & Thayer (2003), disrupted autonomic activity could provide a reliable physiological correlate of rumination. In this vein, Key, Campbell, Bacon, & Gerin (2008) have observed a decrease in the high-frequency component of heart rate variability (HF-HRV) after rumination induction in people with a low tendency to ruminate (see also Woody, McGeary, & Gibb, 2014). Moreover, Zoccola, Rabideau, Figueroa, & Woody (2014) showed that the physiological consequences of rumination might depend on the level of construal (i.e., abstract vs. concrete). More precisely, they showed that an induction of abstract rumination led to lower blood pressure in comparison to an induction of concrete rumination. Woody, Smolak, Rabideau, Figueroa, & Zoccola (2015) further showed that the type of ruminative thought (imagery vs. verbal thought) was also associated with distinct physiological outcomes. They observed that verbal ruminative thoughts led to greater increases in heart rate than ruminative thoughts in a visual imagery modality. This effect was moderated by trait rumination and was only present in high ruminators.

In the present work, we used facial surface electromyography (in addition to self-reports) to investigate the muscular correlates of induced rumination. Before turning to this experimental work, however, we need to discuss why we think rumination can be considered a form of inner speech and how inner speech (and therefore, by inclusion, rumination) can be examined using surface electromyography.

1.1.3 On the verbal and sensory properties of rumination

One of the most salient features of rumination is that it is mostly expressed in a verbal modality (Ehring & Watkins, 2008; Goldwin & Behar, 2012; Goldwin, Behar, & Sibrava, 2013; McLaughlin, Borkovec, & Sibrava, 2007). In other words, while ruminating, we are mostly talking to ourselves silently. However, rumination can also be experienced as visual imagery (Goldwin & Behar, 2012; Newby & Moulds, 2012; Pearson, Brewin, Rhodes, & McCarron, 2008). By “visual imagery” we refer to a process during which perceptual information is retrieved from long-term memory, resulting in the experience of “seeing with the mind’s eye” (Ganis, Thompson, & Kosslyn, 2004). Some authors have suggested that because rumination is usually past-oriented, it should increase access to negative autobiographical memories (Lyubomirsky et al., 1998). Moreover, because autobiographical memories are often experienced as visual images, rumination should likewise include visual features (Pearson et al., 2008). Several studies have obtained results that are consistent with this claim. Among samples of patients who were diagnosed as clinically depressed, a significant majority (94.7% and more than 70%) reported that rumination combined verbal and sensory elements, including visual imagery (Newby & Moulds, 2012; Pearson et al., 2008, respectively). When unselected individuals were asked about the quality of their rumination directly while ruminating, 60.53% of them said they had been experiencing verbal thoughts and 35.92% mental visual images (McLaughlin et al., 2007). Another study comparing naturally occurring depressive and anxious thoughts in a non-clinical sample found that depressive thoughts involved more images than anxious thoughts (Papageorgiou & Wells, 1999). In addition, a recent study demonstrated that a considerable number of people experience depressive cognition in a visual form (Lawrence, Haigh, Siegle, & Schwartz-Mette, 2018). Furthermore, this study showed that individuals with a visual depressive cognitive style reported a similar amount of rumination as individuals with a verbal style. Overall, the existing literature indicates that rumination can have visual features, despite being predominantly verbal.

These observations about the quality of ruminative thoughts are consistent with those concerning worry (e.g., Stöber, 1998; McLaughlin et al., 2007). Indeed, the cognitive avoidance theory (Sibrava & Borkovec, 2006) suggests that worry, as a primarily linguistic repetitive thought, can be considered an avoidance response whose goal is to restrain aversive images, thus reducing somatic activation and emotion processing. Similarly, forming negative mental visual images has been shown to lead to a greater increase in anxiety than forming negative descriptive sentences (Holmes & Mathews, 2005). Taken together, these findings suggest that different modalities of rumination could have different effects on individuals. This idea is supported by studies showing the effectiveness of mental imagery in accessing and modifying emotion in therapy (for an overview, see Hackmann & Holmes, 2004). Overall, investigating the verbal and visual features of rumination could help sharpen our understanding of ruminative processes and lead to better-adapted therapeutic strategies.

Some of the few studies specifically manipulating verbal and visual rumination were carried out by Zoccola and colleagues (Woody et al., 2015; Zoccola et al., 2014). The verbal or visual form of rumination (or mentation type, as these authors refer to it) was induced by playing audio tapes that directed participants’ thoughts. Prompts were similar in both conditions, differing only in the verbal/visual instruction (“Recall the speech task using words, phrases, and sentences.” vs. “Recall the speech task using pictures and images.”). Participants were subsequently asked to estimate the proportion of verbal thoughts and mental visual images. Importantly, in none of the studies in which thinking modality was manipulated did participants use only one type of thought. Even though participants in the imagery group of Zoccola et al. (2014) reported higher levels of mental images than participants in the verbal group, the latter group also reported a certain level of mental imagery. This is in line with studies showing that rumination includes both verbal and visual components (e.g., Goldwin & Behar, 2012; McLaughlin et al., 2007), implying that it is not exclusively experienced in one modality. These results are substantiated by a recent study showing that participants generate visual images both when told to visualise and when told to think verbally, whereas they have strong verbal representations only when asked to think verbally (Amit, Hoeflin, Hamzah, & Fedorenko, 2017). Amit et al. (2017) concluded that there is a difference in volitional control of verbal and visual thinking and that people have better control over inner speech than over visual thought.

To sum up, although rumination might be expressed in different modalities, it is usually expressed in a verbal form. Therefore, we suggest that verbal rumination might be considered as a form of inner speech. To understand what this assumption implies for the study of rumination, we now turn to a brief historical overview of inner speech research. This historical tour will allow us to introduce the experimental tools that have been used to investigate inner speech throughout history. We will then present the main theoretical perspectives on inner speech and discuss its analogies with the broader phenomenon of motor imagery.

1.2 What is that little voice inside my head?

To begin our investigation with a clear definition, when we use the term “inner speech”, we refer broadly to the activity of silently talking to oneself. Although the exact nature of inner speech is still a matter of lively debate, Gregory (2017) lists some consensual properties of inner speech, namely, that i) inner speech takes place in the mind, ii) an instance of inner speech is a linguistic occurrence, iii) inner speech is episodic (i.e., it occurs at a given moment in time), iv) an episode of inner speech involves mental imagery (be it auditory, visual, or kinaesthetic imagery), v) inner speech can be used in the service of working memory, vi) inner speech does not necessarily (and often does not) take the form of complete grammatical sentences (cf. our discussion of Vygotsky’s theory of inner speech development in section 1.2.1.2), vii) we do not have the same level of control over our inner speech as over our overt speech (whereas it is easy to stop producing external speech, it can be quite arduous to override inner speech).

Although many individuals produce inner speech on a daily basis to conduct inner monologues or dialogues, or to prepare or remember conversations, this activity nevertheless remains difficult to investigate in a controlled environment. Like most psychological phenomena, the study of inner speech started with introspective observations (Lœvenbruck et al., 2018; Morin, 2009; Perrone-Bertolotti, Rapin, Lachaux, Baciu, & Lœvenbruck, 2014). At the end of the XIXth century and throughout the XXth century, experimental psychologists took a new look at inner speech through novel (neuro)physiological methods (we review these findings later on). Because it is both a multifaceted phenomenon (inner speech can be expressed in many forms or varieties) and one that has been studied from different perspectives (from philosophy to linguistics and neurosciences), the activity of inner speech has been given many other names such as covert speech, subvocal speech, verbal thinking, implicit speech, internal monologue, internal dialogue, endophasy, speech imagery, auditory verbal imagery, silent talk or silent speech. This plethora of names might be explained by the variety of the activity in itself but also by the relatively vague definition that is usually attached to it.

Indeed, as noted by Vygotsky (1934/2012), the term “inner speech” has been used to describe somewhat different phenomena. More precisely, Vygotsky (1934/2012) suggested that this term was initially employed to refer to “verbal memory”, citing for example the “silent recital of a poem known by heart” (p. 238). In that vein, Cardaillac (1830) earlier said: “la parole intérieure n’est que le souvenir de la sensation que produit la parole extérieure” (as quoted in Egger, 1881, p. 53).4 Accordingly, investigations of inner speech conducted throughout the XIXth century mostly revolved around the question of how words were reproduced in memory (either as auditory, visual or motor images). Under that view, inner speech is thought to correspond to an “image” of actual (overt) speech and this position may be said to correspond to the imagined speech view described in Gregory (2017).

According to the second perspective listed by Vygotsky (1934/2012), inner speech could be conceived as truncated overt speech, that is, “speech minus sounds” or “subvocal speech” (Watson, 1919). For instance, in line with his reflexologist theory of thought, Sechenov considered inner speech to be an inhibited (motor) reflex and wrote: “I never think directly in words, but always instead in muscular sensation which accompany my thought in the form of a conversation” (cited in Sokolov, 1972, p. 4). It should be noted however, as highlighted by Sokolov (1972), that the behaviourist approach and the reflex approach differ in that the former considers that inner speech “originates” from peripheral muscular activations, whereas the latter considers inner speech to result from central (cerebral) processes. According to that latter perspective, the peripheral muscular activity recorded during imagined actions (or inner speech) would be a side-effect of these central processes.5 In that view, inner speech is considered to be actual speech (as overt speech is) and not an “image” of overt speech. This position may be said to correspond to the actual speech view described in Gregory (2017).

According to Vygotsky (1934/2012), a third interpretation of inner speech would refer to everything that “precedes the motor act of speaking”. In other words, inner speech would include speech “motives” (or intentions) and the preverbal message that precedes speech production. We will come back to that position briefly when mentioning psycholinguistic models of speech production (e.g., Levelt, 1989) as well as the motor simulation model of motor imagery (e.g., Jeannerod, 2006). However, for the purpose of the current section, we are mostly concerned with the first and second positions, namely, the view of inner speech as either imagined or actual speech.

In trying to separate these two views, Gregory (2017) first notes that, phenomenologically, producing inner speech feels like speaking (albeit covertly), and not like imagining speaking. Gregory then lists some further arguments in favour of the actual speech view: i) the embedding argument: we can imagine producing inner speech, but we cannot imagine imagining producing inner speech, therefore inner speech is actual speech (rather than imagined speech), ii) the paralleled case argument: inner speech stands in the same relation to speech in a pretend scenario as overt speech does, therefore, inner speech is also actual speech (for more details, see Gregory, 2017, p. 40), iii) the continuity argument: inner speech sits on a continuum with various kinds of external (and therefore actual) speech, iv) the precisification argument: the imagined speech view leaves too many details unspecified (e.g., who is speaking? In what context?), which is not the case for the actual speech view.

Although we will not directly assess the empirical arguments in favour of either the imagined speech or the actual speech view of inner speech, we wanted to give the reader a clear definition of what we mean by “inner speech” and to present the two main conceptions about the nature of inner speech. We think these two conceptions, and the arguments that have been advanced in favour of or against each view, are important to keep in mind while reviewing the empirical evidence on the topic. In the next section, we will briefly review the historical development of ideas and methods used to describe inner speech, before turning to a description of the developmental mechanisms of inner speech and to contemporary neurocognitive models of inner speech production.

1.2.1 Historical overview of inner speech investigations

1.2.1.1 From introspection to experimental psychology

The question of the relation and intertwinement of thought and language is one of the most enduring philosophical questions. The most notable reflections can be traced back to Plato’s Theaetetus, in which Plato defines thinking as “the conversation which the soul holds with itself in considering anything”. For Plato, thinking thus corresponds to “word[s] spoken in silence”. Sokolov (1972) notes that ancient thinkers, by noticing a relation between thoughts and words, and between words and breathing, used to think that thoughts and words originated in the lungs. For instance, Socrates, in Plato’s Phaedrus, said that “his chest is full of thoughts” (as quoted in Sokolov, 1972, p. 14). In another context, noting the progressive internalisation of external speech into inner speech during normal development, Egger (1881) wonders whether phylogeny (the evolution of the species) followed the same course of development. In support of that idea, Egger (1881), citing Maspero (1875), reports the existence of an ancient Egyptian ideogram, representing a crouched man, with the right hand close to the mouth. Egger (1881) explains that this ideogram was meant to represent indistinctly the ideas of eating, drinking, screaming, talking, meditating, knowing or judging, suggesting that thought, during the Egyptian Ancient Empire, was considered to be associated with the mouth organ (p. 84).

Somewhat consistently with that idea, Stricker (1880) reported (based on his own introspections) that he was not able to mentally produce speech sounds without contracting his articulators. He also reported not being able to produce two different speech sounds, or to produce speech sounds that were incongruent with movement of the articulators. To give a reproducible example of his intuition, Stricker (1880) suggested the following experiment: open your mouth and try to pronounce a word including labials or dentals, such as “bubble” or “toddle”. Ask yourself whether the image of the word (your inner speech) is clear or distinct. According to Stricker, most people would find it very difficult to imagine these words clearly with the mouth being open. Instead, the image of the word is rather imprecise and sounds as if we were trying to produce (overtly) the word while keeping the mouth open. This sensation was already nicely described and analysed by Bain (1855):

“When we recall the impression of a word or sentence, if we do not speak it out, we feel the twitter of the organs just about to come to that point. The articulating parts –the larynx, the tongue, the lips,– are all sensibly excited; a suppressed articulation is in fact the material of our recollection, the intellectual manifestation, the idea of speech.”

James (1890) then notes that Stricker, “Like most psychologists, however, […] makes of his personal peculiarities a rule, and says that verbal thinking is normally and universally an exclusively motor representation.” Indeed, Paulhan (1886) replied to Stricker that he was able to produce overtly the phoneme /a/ while simultaneously being able to get and maintain the mental image of any other vowel. He also reported that he was able to imagine the sound of any vowel without motor actions or feelings (images). On a similar note, Egger (1881) believed inner speech to exist independently of motor phenomena and to be based predominantly on auditory representations. He noticed that although inner speech may be accompanied by vivid auditory imagery, inner speech is also very different from overt (external) speech, with inner speech being usually shorter and less grammatically structured than overt speech (we will come back to that observation later when discussing the development of inner speech).

In an attempt to reconcile the view of Stricker (1880), for whom inner speech was purely motor, with the view of Paulhan (1886) and Egger (1881), Ballet (1886) suggested (as did James, 1890) that these authors probably generalised to the population what they observed in themselves. Ballet then asserted that the predominance of motor over sensory representations (or the reverse) might be a question of individual differences. We might add that the relative predominance of motor or sensory representations during inner speech might also be due to individual differences in the phenomenological sensitivity to some specific representation (some might be very acute in discriminating similar auditory images while not being able to discriminate similar visual images) and to contextual differences.6 Nonetheless, for many authors, this debate highlighted the limitations of the introspective method (e.g., Reed, 1916). To be able to decide between different individual experiences and interpretations, some researchers therefore tried to find more objective methods to assess inner speech, or as put by Reed (1916), to go beyond introspection and to start looking for “the stamp of objective certainty”. With this ambitious goal in mind, Reed (1916) described the apparatus he used to examine tongue movements (see Figure 1.2). Reed then reported the results of an experiment aiming to examine the involvement of inner speech (and speech motor processes) in thinking.

Figure 1.2: Figure 1 & 2 from Reed (1916) describing the apparatus used to record tongue movements during thinking and inner speech.

Reed (1916) observed that while reading, his participants were moving their tongue and lips (and were sometimes whispering). These observations, in addition to the behaviourist revolution in psychology, paved the way for new lines of research. The initial suggestion of Watson (1913) that “thought processes are really motor habits in the larynx” led to a fruitful line of research about the muscular bases and/or correlates of thought and inner speech. Sokolov (1972) gives an overview of the experiments carried out at the beginning of the XXth century in that perspective. For instance, Dodge (1896) anaesthetised his lips and tongue and realised that it did not have any impact on his inner speech. Curtis (1900) and Courten (1902) recorded laryngeal movements using a pneumatic drum and a kymograph while their participants recited verses or were reading. They observed that laryngeal movements were not always present and depended on what was being read and/or produced, as well as on the “degree of understanding” of the participant (for further references, see Sokolov, 1972, pp. 43–45).

Using a galvanometer and electrodes inserted in the tip of the tongue, in the cheek, or under the lip, Jacobson (1931) recorded muscular action potentials when participants were asked to produce verbal content covertly (e.g., counting or reciting a poem), but not during relaxation. Interestingly, Jacobson (1931) adds that “the series of vibrations during the mental activity occur in patterns evidently corresponding with those present during actual speech.” More precisely, the pattern of muscular activity recorded during inner speech production was similar to the pattern of muscular activity recorded during overt speech production, but of lesser amplitude.

Throughout the present section, we briefly reviewed the history of ideas and methods used to describe inner speech in the second part of the XIXth century and at the beginning of the XXth century. In the next section, we make a brief pause in our historical tour to discuss the developmental trajectory of inner speech. How and when do we (humans) acquire the ability to talk to ourselves silently? Is it even acquired? To answer these questions, we will briefly review Vygotsky’s theory of inner speech development and some of its more recent refinements. Moreover, by examining how inner speech develops, we might gain new insights about the characteristics of inner speech in the adult mind.

1.2.1.2 Interlude: the development of inner speech

The developmental course of inner speech was possibly the most investigated issue related to inner speech in the first part of the XXth century. Among many, Watson, Piaget, Luria, Leontiev, and most famously Vygotsky confronted this question. Watson (1919) suggested that thought was rooted in (overt) speech, with maturation leading from speech to thought (where thought is to be understood as a synonym for inner speech, in Watson’s terminology). This hypothesis also applied to reading, with the novice reader reading overtly and progressively shifting to silent reading. For Vygotsky, inner speech in the mature (i.e., adult) brain could only be understood from a developmental perspective. In the last chapter of his book Thought and Language, Vygotsky analyses the relationship between thought and word in the mature mind. The central idea of this chapter is stated as follows:

“The relationship between thought and language is not a thing, but a process, a continual movement back and forth from thought to word and from word to thought. Viewed in the light of a psychological analysis, this relation is a process that passes through a series of phases and stages, during which its essential features undergo changes that may be called development in the strict sense. Of course, this is a functional development, not development in the sense of aging; but the path traversed by thinking as a process from thought to word is development nonetheless.”

Fundamentally, Vygotsky believed that language was a psychological tool and that its development during childhood interacts with the development of abstract thinking. Vygotsky observed, as Piaget before him, that children tend to speak (aloud) to themselves while playing. Piaget characterised this form of speech as “egocentric speech” because in this form of speech, according to Piaget, the child does not try to take the perspective of the listener. Piaget thought this form of speech to disappear at the age of seven or eight. In contrast, Vygotsky thought that the so-called egocentric speech (or private speech) continues but that it becomes more and more internalised, until reaching the status of “inner” speech. For Vygotsky, this internalisation process starts with social speech, that is, speech addressed to others. During development, this form of speech evolves into either communicative speech (speech addressed to others) or so-called egocentric speech (speech addressed overtly to oneself). Egocentric speech appears naturally in children when they are faced with a problem to solve, but also in adults faced with difficult problems. This egocentric speech would then become internalised, resulting in what we call inner speech. This led Vygotsky to claim a functional equivalence between egocentric speech and internal speech, the latter resulting from a progressive internalisation of the former.

However, and importantly, this internalisation process does not only entail a movement from the outside to the inside but also entails a transformation of speech, or, as put by Vygotsky, an “internal reconstruction of an external operation”. Therefore, for Vygotsky, it follows that the passage from inner speech to overt speech consists not in simply “vocalising” inner speech but in restructuring inner speech (e.g., retrieving a syntax proper to overt speech, retrieving the phonetic structure, etc.). According to Vygotsky (1934/2012), inner speech is described by some essential properties such as: i) abbreviation: the phonetic aspect is “diminished”, reduced: “In inner speech we do not need to pronounce a word in its entirety. We understand, by virtue of our very intention, what word we wanted to say […] Strictly speaking, inner speech is almost wordless”, ii) predicativeness, “Psychologically, inner speech consists of predicates only”; “the subject of our inner reason is always present in our thought”; it is always implicitly understood, iii) it has a semantic structure of its own: predominance of sense over meaning,7 it is idiomatic, agglutination of semantic units (several words can be “merged” into a single word), and infusion of sense into a word (a word in inner speech becomes “loaded” with more associations than in conventional use).

Interestingly, Vygotsky rejected both the verbal memory view of inner speech (i.e., inner speech is simply the retrieval of acoustic, optic or motor images of words) and the behaviourist view of inner speech as merely a soundless form of external speech (à la Watson). For Vygotsky, the most determining factors of inner speech are its semantic (psychological) features, as expressed by his famous dictum: Thought is not expressed in words; it comes into existence through them.

More recently, Fernyhough (2004) proposed an extension of Vygotsky’s three-level model of inner speech development (i.e., external speech, egocentric speech, inner speech) to a four-level model, from external dialogue to private speech, expanded inner speech and condensed inner speech (see Figure 1.3). Fernyhough (2004) notes that this model describes stages in the development of inner speech (during childhood) but also movements “between the levels at any given point in time”. Indeed, it is possible to “move” between levels under certain conditions. For instance, in cognitively demanding conditions, we can observe transitions between levels, with condensed inner speech being transformed to expanded inner speech and even private speech through a process of “re-expansion”. This idea is supported by many studies showing a progressive externalisation of inner speech under cognitively demanding situations (e.g., Sokolov, 1972).

Figure 1.3: Stages of internalisation. Figure from Fernyhough (2004).

To sum up, it is suggested that inner speech (in the adult mind) is the result of a progressive internalisation process. This internalisation process covers different stages or expressions of speech, from social speech, through self-addressed speech (private speech or egocentric speech), to inner speech (first in a fully expanded form and then in a more condensed form). Being an internalised version of private speech, inner speech is hypothesised to serve the same functions as private speech. In other words, adults use inner speech with the same goals as they previously used (during childhood) overt private speech. Importantly, this process does not only entail an internalisation but also a transformation of the way speech is expressed: the characteristics of inner speech are distinguishable from the characteristics of overt (private) speech. Interestingly, these different levels (or stages) in the internalisation process, in addition to describing stages of development, also describe “movements” that can be performed between levels or stages. More precisely, the externalisation of inner speech would entail the inverse of the transformation that was applied during the internalisation of private speech. In the next section, we come back to our historical perspective by reviewing inner speech research that has been carried out in the second part of the XXth century, before turning to an overview of the main theoretical perspectives about inner speech production.

1.2.1.3 Inner speech research from 1950 to present days

Following the pioneering work of Jacobson (1931), the second part of the XXth century witnessed an upsurge of electrophysiological methods (and especially of electromyography)8 to study the production of inner speech. Interestingly, the dominant interpretation of the muscular correlates of inner speech (as identified by Jacobson, 1931) at the beginning of the last century was that the peripheral muscular activity observed during imagined actions was the source of the mental content. However, as explained by Jeannerod (2006), this interpretation of mental processes as a consequence of peripheral feedback is now disproved, for instance by the simple fact that many people can experience inner speech (or motor imagery) without any visible muscular activity. From there, one can ask whether the peripheral muscular activity observed during inner speech is necessary for the production of inner speech, or rather can be considered a consequence of inner speech production. As pinpointed by Cohen (1986), to prove that a pattern of motor activity is necessary for some mental activity, it is not enough to show that this pattern is always associated with the mental activity; we also have to show that when the pattern of motor activity is disrupted, the mental activity is in turn disrupted. In that vein, the peripheralist interpretation of the motor correlates of inner speech (see Box ) has been disproved by the heroic experiment carried out by Smith, Brown, Toman, & Goodman (1947). Smith used d-tubocurarine (curare) to paralyse his own facial muscles in order to test whether peripheral muscular activation was necessary for inner speech. He reported that, while being paralysed, he was still able to think in words and to solve mathematical problems (these results echo those of Dodge, 1896, mentioned earlier).

Another way of looking at the motor correlates of inner speech production is to assume that these correlates are instead a consequence of central processes involved in inner speech production. As such, a disruption of these correlates does not necessarily entail a disruption of the ongoing mental processes. Depending on the framework, these peripheral correlates might be considered as either necessary at the first stages of development of inner speech (as in behaviourist views of inner speech) or not necessary at all in other centralist perspectives such as the Russian reflexology or the more recent simulation or emulation frameworks. In these simulationist frameworks, the peripheral muscular activity observed during inner speech production (or motor imagery) may be hypothesised to be the result of an incomplete inhibition of motor output during the mental states involving motor simulation (although the precise nature of these inhibitory mechanisms is still a matter of debate, cf. section 1.2.3).

Another fruitful line of research consisted in using mental chronometry (i.e., the timing of mental operations) to examine the cognitive processes underlying inner speech production. The logic underlying this paradigm is that if inner speech and overt speech production involve the same (or the same kind of) cognitive processes, their production should take approximately the same time. By varying the conditions in which inner (or overt) speech is to be produced and by noting the temporal equivalence (or non-equivalence) between inner and overt speech, we can infer whether the underlying cognitive processes are (dis)similar and how they are impacted by contextual demands. In that vein, Landauer (1962) has shown (in a single subject) that it takes approximately the same amount of time to say the alphabet (or series of numbers) aloud as it takes to produce it covertly. Similarly, Weber & Bach (1969) and Weber & Castleman (1970) have shown that the rate of inner speech and overt speech is approximately the same (around 6 to 6.5 letters per second in these experiments). However, other researchers have obtained opposite findings, with inner speech being faster to produce than overt speech (e.g., Anderson, 1982; Coltheart, 1999; Korba, 1990; MacKay, 1981). More recently, Netsell, Kleinsasser, & Daniel (2016) have examined the rate of spontaneous speech production in both overt and covert modes. They asked participants to produce the first thing that came to their mind and observed that the rate of inner speech (around 5.8 syllables / second) was faster than the rate of overt speech (around 5.2 syllables / second). They suggest that this difference may be due to the time taken to effectively move the articulators during overt speech production (whereas these movements are inhibited during inner speech production). However, they also highlight that the rate of inner speech and the temporal equivalence between inner speech and overt speech may be affected by i) the type of speaking task (i.e., whether the task consists in reciting some learned verbal material or novel material) and ii) the form of inner speech (e.g., condensed vs. expanded inner speech). More precisely, they suggest that the rate of inner speech should be faster for learned material than for novel material and that condensed inner speech should be faster than expanded inner speech.
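To make the chronometric comparison above more concrete, the following minimal sketch converts the two approximate rates reported by Netsell et al. (2016) into production times. It is only an illustration: the 100-syllable passage, the function names and the assumption of a constant speech rate are ours and do not come from the original study.

```python
# Illustrative sketch only. The two rates are the approximate values reported
# by Netsell et al. (2016); the passage length and the constant-rate
# assumption are hypothetical simplifications made for this example.
INNER_RATE = 5.8  # syllables per second, inner (covert) speech
OVERT_RATE = 5.2  # syllables per second, overt speech


def production_time(n_syllables: float, rate: float) -> float:
    """Return the time (in seconds) needed to produce n_syllables at a constant rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return n_syllables / rate


def relative_speedup(fast_rate: float, slow_rate: float) -> float:
    """Return how much faster fast_rate is than slow_rate, as a proportion."""
    return fast_rate / slow_rate - 1.0


n = 100  # hypothetical passage length, in syllables
inner_time = production_time(n, INNER_RATE)
overt_time = production_time(n, OVERT_RATE)

print(f"inner speech: {inner_time:.1f} s, overt speech: {overt_time:.1f} s")
print(f"inner speech faster by {relative_speedup(INNER_RATE, OVERT_RATE):.1%}")
```

Under these assumptions, a 100-syllable passage would take about 17.2 s covertly against about 19.2 s overtly, that is, inner speech would be about 11.5% faster. Real speech rates vary with the task and with the form of inner speech, as discussed above, so these numbers should be read as an order of magnitude rather than as a prediction.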

MacKay (1992) notes that the faster rate that is usually observed for inner speech in comparison to overt speech is reminiscent of the faster rates that also occur for other highly trained skills (e.g., tying a shoelace). Indeed, the fact that inner speech is usually faster than overt speech (or that some forms of inner speech are faster than overt speech) and the fact that the chronometric similarity between inner speech and overt speech may be affected by the task echo findings from the field of motor imagery studies. In their review of the determinants of the temporal equivalence (or non-equivalence) between overt and covert actions, Guillot et al. (2012b) have clearly identified that this temporal equivalence may be affected by the type of action to be performed and the form of imagery. For instance, they suggest that there exists a sigmoidal relation between the duration of the overt action and the duration of the covert action, with short actions (less than a few seconds) being usually overestimated, medium actions showing an isochrony in overt and covert modes and longer actions (more than 30 seconds) being usually underestimated in motor imagery (cf. Figure 1.4). In addition to the duration of the movement, Guillot et al. (2012b) suggest that environmental constraints (e.g., temporal constraints, circadian rhythms), motor imagery content (e.g., imagery type, imagery perspective), individual strategy (e.g., where the focus of attention is), individual characteristics (e.g., expertise level, age) and motor skills characteristics (e.g., task duration, task difficulty) may also affect the duration of covert actions and the temporal congruence between overt and covert actions. Accordingly, the rate of inner speech (and its correspondence to overt speech rate) might depend, as suggested by Netsell et al. (2016), on the type of inner speech to produce, on the length of the material to be produced as well as on individual characteristics (e.g., age, expertise).

Figure 1.4: Relationship between the actual duration of a movement and its mental representation. Figure from Guillot et al. (2012).

In addition to mental chronometry, many authors in the second part of the twentieth century turned to psychophysiological methods to investigate inner speech. The idea that the production of inner speech may involve the speech motor system is supported by many studies showing peripheral muscular activation during inner speech production (as reviewed for instance in Garrity, 1977; Locke, 1970; Sokolov, 1972). Among these, Faaborg-Andersen, Edfeldt, & Nykøbing (1958) and McGuigan & Rodier (1968) found an increase in peripheral muscular activity in the speech muscles during silent reading. Interestingly, this activity was more strongly marked for novice readers or for difficult material. Locke & Fehr (1970) compared the electromyographic correlates of subvocal speech (inner speech) during the (visual) presentation and rehearsal of disyllabic words that either contained or did not contain labial phonemes. They observed a greater EMG amplitude recorded over a “chin-lip” site during the presentation and rehearsal of labial words than for non-labial words.

In his seminal book, Sokolov (1972) meticulously describes a series of experiments conducted in order to examine the relation between inner speech and thought. Sokolov (1972) starts with a review of previous theories about the relation between speech and thought, before turning to the specific question of inner speech. He then presents his experimental work under two main parts. First, Sokolov (1972) used articulatory suppression9 to interfere with mental activity (e.g., perception, memorisation, thinking). Second, he used electromyography to investigate the involvement of the speech motor system during inner speech as well as in verbal and concrete thinking.

Summarising the studies using articulatory suppression, Sokolov (1972) notes that “mechanical retardation of external articulation (speech movements of lips and tongue) has an insignificant effect on the performance of mental tasks by adults; in many cases it has no effect at all. In children, the mechanical retardation of articulation has a noticeable negative effect” (p. 152). This result is consistent with the idea of a progressive internalisation of inner speech, which would become more and more independent from the speech motor system throughout development (and thus less affected by motor constraints). However, Sokolov notes that articulated speech and verbal-auditory stimuli have a strong effect on memory (p. 152). Moreover, Sokolov discusses some of his previous experimental work showing that motor interference (e.g., articulatory suppression) ceases to be effective when the mental activity (inner speech) is automatised (e.g., rehearsing a poem learned by heart). In addition to age and expertise, Sokolov discusses findings from Teplov, who observed that the involvement of the speech motor system during inner speech might vary according to the “voluntariness” (deliberateness) of the speech to be produced. According to Teplov, the speech motor system would be necessarily involved during voluntary inner singing (a musical form of inner speech) whereas it may or may not be involved during involuntary inner singing (Sokolov, 1972, p. 51).

Using electromyography, Sokolov (1972) also provided seminal observations that inner speech is involved during reading, to an extent that is directly related to the difficulty of the ongoing reading task (as observed previously by Faaborg-Andersen et al., 1958). More precisely, he observed that the more difficult the task was, the stronger the “speech motor impulses” (i.e., the EMG amplitude) in the speech muscles. Moreover, the difficulty of the task was also related to the abbreviatedness of inner speech. Simpler reading tasks were associated with abbreviated (condensed)10 inner speech whereas difficult tasks were associated with “unfolded” (expanded) inner speech, and sometimes externalised (overt) speech. Sokolov later says (on p.202):

“[…] thus, it is evident that both the degree to which mental operations are automatized and the degree of complexity of the operations being performed can be assessed with a high degree of probability [confidence] on the basis of the intensity of hidden motor speech reactions.”

Moreover, Sokolov observes that the muscular activity associated with inner speech production decreases when the verbal material is repeated many times (pp. 200-201). It increases again when new content is to be produced. For instance, he observes substantial muscular activity during the reading of a new text, whereas this activity decreases when the text is read again. Interestingly, this reduction of peripheral muscular activity as a function of repetition may be countered by the instruction given to the participant. For instance, when the participant is given the instruction to “read it more attentively” or to “memorize it more accurately”, the reading of a known text results in similar peripheral muscular activity (in the speech muscles) as for the reading of a novel text (read without such instructions).

To summarise the research that preceded Sokolov, together with the articulatory suppression and electromyographic investigations conducted by Sokolov (1972), the involvement of the speech motor system during inner speech may vary according to the content of the verbal material, to characteristics of the task as well as to individual characteristics. More precisely, the intensity of “motor speech impulses” (in Sokolov’s terms) may increase or decrease depending on i) the difficulty and novelty of the mental tasks being performed, ii) the degree of automatisation, iii) the inclusion of visual elements (whether the task is purely verbal or not), iv) individual disposition toward specific types of imagery. We could also add to these factors the age of the participant, with the involvement of the speech motor system possibly being a decreasing function of age. Overall, these findings are consistent with the idea of a progressive internalisation of speech into inner speech, which led Sokolov to state that “inner speech is nothing but speech to oneself” and that it can be considered as an internalisation, a psychological transformation or an “internal projection” of overt speech (Sokolov, 1972). Sokolov concludes his work by stating that inner speech is “the principal mechanism of thought” and “an essential factor to human consciousness” (Sokolov, 1972, p. 262).

Following seminal work by Jacobson (1931) and Sokolov (1972), the 1970s and 1980s witnessed an upsurge of electromyographic studies of inner speech production. For instance, McGuigan & Winstead (1974) recorded both lip and tongue EMG activity during the reading, viewing, memorising or recalling of either bilabial or lingual-alveolar verbal material. They observed a double dissociation, with the bilabial material being associated with a greater EMG amplitude recorded over the lip and the lingual-alveolar material being associated with a greater EMG amplitude recorded over the tongue (whereas EMG amplitude recorded over the arm or the leg did not show these condition-specific changes). Similarly, Garrity (1975) observed a greater lip activity during the covert production of labial items than during the covert production of nonlabial items. Importantly, in her review, Garrity (1977) highlights some methodological limitations to EMG studies of inner speech and makes practical recommendations to avoid these pitfalls (see Box ). McGuigan & Dollins (1989) recorded EMG activity over the lip and the tongue during the processing of single phonemes (“P” vs. “T”) and observed a greater activity of the lip during the processing of “P” and a greater activity of the tongue during the processing of “T”, confirming previous results suggesting a discriminative relationship between the content of inner speech and its peripheral muscular correlates. In the same vein, Livesay, Liebke, Samaras, & Stanley (1996) recorded EMG over the lip during the production of inner speech and during the visualisation of non-linguistic material and observed a greater EMG amplitude recorded over the lip during the production of inner speech. Taken together, these results suggest that the peripheral muscular correlates of inner speech are content-specific and that it should be possible to use electromyographic measurements to identify or “decode” the content of inner speech. 
This idea has been corroborated by recent work showing that surface EMG can be used to discriminate between different digits produced covertly, and that it could be used as a silent communication device (e.g., Kapur, Kapur, & Maes, 2018). However, other teams find contrasting results (e.g., our results in Chapter 5 or Meltzner et al., 2008) and we discuss this issue further in Chapter 5.

Besides mental chronometry and electromyography, the second part of the last century also witnessed a revival of introspective methods, with the aim of refining the description of the phenomenological properties of inner speech. For instance, the use of the experience sampling methodology (ESM, Csikszentmihalyi & Larson, 1987) made it possible to examine inner speech in a naturalistic environment and to assess its frequency, forms and usages. Klinger & Cox (1987), for example, asked 29 students to carry a beeper that probed them randomly to describe the properties of their mental activity. They observed that around 51% of the samples contained some form of internal monologue. Using a modified version of the ESM known as the descriptive experience sampling methodology (DES, Hurlburt, 2011; Hurlburt & Akhter, 2006; Hurlburt & Heavey, 2001; Hurlburt, Heavey, & Kelsey, 2013),11 Heavey & Hurlburt (2008) assessed the frequency of common inner experiences and found that inner speech fills around 25% of our conscious inner life. Their results suggest that the rest of our inner experience is filled with four other main components: inner seeing, feeling (i.e., affective experiences such as happiness or sadness), sensory awareness (i.e., paying attention to immediate sensations such as hunger), and unsymbolised thinking (i.e., thinking without words, images, or any other symbol). Thus, our inner life is not only filled with language: other forms of thinking (defined broadly, as before, as any sort of mental activity) may coexist with it (for a review and synthesis of DES findings, see Hurlburt, 2011; Hurlburt et al., 2013).

Moreover, based on historical and DES data, Hurlburt (2011) argues for a distinction between two forms of inner speech (or two phenomenological aspects of inner speech). According to Hurlburt, it is possible to make a distinction between the phenomenon of inner speaking and the phenomenon of inner hearing, which would feel similar to talking into a tape recorder and to hearing your voice played back, respectively (Hurlburt et al., 2013). Hurlburt, Alderson-Day, Kühn, & Fernyhough (2016) provide data suggesting that these two phenomena may have distinct neural correlates (but see Grandchamp et al., 2019; Lœvenbruck et al., 2018, for another stance on these data). The distinction between inner speaking and inner hearing echoes previous distinctions (e.g., MacKay, 1992) such as the one between the “generative component” (i.e., the feeling of producing speech) and the “auditory component” (i.e., the feeling of hearing speech) and the distinction between the inner ear and the inner voice in studies of working memory (e.g., Baddeley, Lewis, & Vallar, 1984; Buchsbaum, 2013).

Another source of information concerning the nature of inner speech comes from the study of errors produced during inner speech. For instance, Dell & Repka (1992) asked participants to produce tongue twisters (such as “Unique New York”) either aloud or mentally and to report the type of error they made (if any). They observed that the participants made the same kind of errors in overt and inner speech, indicating that inner speech, like overt speech, may involve the same kind of units (e.g., phonological, morphological or lexical units). As suggested by Oppenheim & Dell (2008), the similarity of errors found in inner speech and overt speech indicates that slips of the tongue are not really slips of the tongue, but rather slips of speech planning. More recently, Oppenheim & Dell (2008) have shown that the covert recitation of tongue twisters is accompanied by the lexical bias also observed in overt production but does not show the phonemic similarity bias (i.e., the tendency to exchange phonemes with common articulatory features) observed in overt speech. This observation led Oppenheim & Dell (2008) and Oppenheim & Dell (2010) to claim that although inner speech is specified at a lexical level, it is impoverished at the featural (articulatory) level. In contrast to these results, however, Corley, Brocklehurst, & Moat (2011) found the phonemic similarity effect to be present in inner speech, suggesting that inner speech may not necessarily be impoverished at the articulatory level. However, Oppenheim (2012) takes a different perspective on Corley et al. (2011)’s data and argues that finding an effect in both overt and inner speech is not the same as finding “equal effects” (i.e., an absence of interaction) in the two conditions. Through a detailed reanalysis of Corley et al. (2011)’s data, Oppenheim (2012) suggests that these data actually corroborate the sub-phonemic attenuation hypothesis. 
At this point, the question of whether inner speech includes sub-phonemic features is therefore still unresolved.
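
The logic of Oppenheim's (2012) argument, namely that an effect present in both conditions does not imply equal effects, can be made concrete with a small arithmetic sketch. The error rates below are purely hypothetical numbers chosen for illustration; they are not taken from Corley et al. (2011), Oppenheim (2012), or any other study, and the variable names are our own.

```python
# Hypothetical proportions of phoneme exchange errors (NOT real data).
# Keys: (speech mode, phonemic similarity of the exchanged phonemes).
rates = {
    ("overt", "similar"): 0.10,
    ("overt", "dissimilar"): 0.05,
    ("inner", "similar"): 0.06,
    ("inner", "dissimilar"): 0.04,
}


def similarity_effect(mode: str) -> float:
    """Phonemic similarity effect: extra errors for similar phonemes."""
    return rates[(mode, "similar")] - rates[(mode, "dissimilar")]


overt_effect = similarity_effect("overt")
inner_effect = similarity_effect("inner")

# The interaction contrast is the difference between the two effects.
# It is zero only when the effect is equally large in both modes.
interaction = overt_effect - inner_effect

print(f"overt effect: {overt_effect:.2f}")
print(f"inner effect: {inner_effect:.2f}")
print(f"interaction:  {interaction:.2f}")
```

With these invented numbers, the similarity effect is positive in both modes, yet it is smaller in inner speech, so the interaction contrast is not zero. In an actual analysis, each of these quantities would of course be estimated with its uncertainty rather than compared as raw differences.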

Besides, some studies tried to directly interfere with the motor system in order to make causal claims about inner speech and the role of the motor system during inner speech production. These studies include for instance articulatory suppression studies (see our more detailed discussion of articulatory suppression findings in Chapter 6) at the behavioural level and transcranial magnetic stimulation studies at the neural level. Using repetitive transcranial magnetic stimulation, Aziz-Zadeh, Cattaneo, Rochat, & Rizzolatti (2005) induced both overt and covert speech arrests (i.e., a transient inability to produce speech) by targeting both a motor (posterior) and a “non-motor” (anterior, corresponding to the inferior frontal gyrus) area of the left hemisphere. Many studies investigated the cerebral correlates of both overt speech and inner speech and showed that both modes involve language areas in the left hemisphere, such as motor and premotor cortices in the frontal lobe, including Broca’s area or the left inferior frontal gyrus (IFG). These studies also highlight the involvement of regions involved in speech perception such as auditory areas, Wernicke’s area and the left parietal lobule, an associative region (for review, see Geva, 2018; Lœvenbruck et al., 2018; Perrone-Bertolotti et al., 2014). Moreover, different forms of inner speech may be associated with different (although partially overlapping) cerebral landscapes. For instance, Tian & Poeppel (2010) compared the MEG responses elicited during two tasks: i) imagining saying something in one’s own voice and ii) imagining hearing something in someone else’s voice. They found that whereas both tasks were associated with activity in the bilateral temporal cortex, speaking imagery was first associated with activity in the left parietal cortex. 
Tian & Poeppel (2010) and Tian & Poeppel (2012) argue that these results suggest the involvement of a forward model during the speaking imagery task, which would provide the basis for the sensory content (the subjective feelings) of inner speech (see also Alderson-Day & Fernyhough, 2015, and Grandchamp et al., 2019, for a further discussion of these results). Ackermann & Riecker (2004) also observed an activation of the left supplementary motor area (SMA), left primary motor cortex (M1), and right cerebellum during covert speech. Activation of the primary motor cortex during inner speech (and more broadly, during imagined action) is still controversial and may depend on characteristics of the task (e.g., instructions given to the participant, characteristics of the content/material to be produced) and the type of inner speech. Overall, activation in the motor, premotor and sensory cortices is known to be stronger during overt speech than inner speech, supporting the idea that inner speech may be considered on a continuum from inner speech to overt speech. However, inner speech also involves additional regions in comparison to overt speech (e.g., Basho, Palmer, Rubio, Wulfeck, & Müller, 2007). Importantly, inner speech recruits regions involved in the inhibition of overt responses (e.g., cingulate gyrus, left middle frontal gyrus, pre-SMA). To sum up, neuroimaging studies support the idea that inner speech may be conceived as simulated speech, involving similar motor and sensory areas (but see Gauvin, De Baene, Brass, & Hartsuiker, 2016), albeit to a lesser extent than overt speech. 
In addition to common areas, inner speech also involves supplementary areas related to the inhibition of overt responses, supporting the idea that inner speech is simulated overt speech resulting from inhibited speech acts (for more details, see the cerebral landscape of (wilful) inner speech production proposed in the next section, as well as in recent reviews, Alderson-Day & Fernyhough, 2015; Lœvenbruck et al., 2018; Perrone-Bertolotti et al., 2014).

More recently, technical and methodological developments from the field of neural engineering offered new ways of investigating inner speech. Several teams are conducting research with the aim of “decoding inner speech”, that is, of deciphering the content of inner speech based on neurophysiological signals. For instance, Martin et al. (2014) recorded electrocorticographic (ECoG) signals from epileptic patients performing either overt or covert reading tasks. Then, they built a neural decoding model capable of reconstructing the spectrotemporal auditory features of the overt reading task and evaluated whether this model could reconstruct auditory speech features in the covert reading condition. They demonstrated that it is possible to decode (or to infer) inner speech content by using a model learned on corresponding overt speech data, with the superior temporal gyrus as well as the pre- and post-central gyrus providing the most diagnostic information. Martin et al. (2016) also used ECoG recordings from the temporal lobe and sensorimotor cortex and showed that it is possible to reach a relatively high accuracy level in a two-class classification framework and above-chance accuracy levels in classifying fifteen word-pairs based on ECoG signals (for a recent review, see Martin, Iturrate, Millán, Knight, & Pasley, 2018). Using a different technique, Kapur et al. (2018) developed a wearable device capable of discriminating inner speech content based on surface electromyographic signals. They showed that their method was able to discriminate with relatively high accuracy digits (between 0 and 9) that were produced covertly. However, as mentioned previously, other teams find contrasting results (e.g., our results in Chapter 5 or Meltzner et al., 2008) and we discuss this issue further in the discussion of Chapter 5. Overall, these results show that it is presently possible to decode inner speech based on neurophysiological signals above chance levels. 
Although these results currently hold only for limited vocabulary sets, it might soon be possible to have fully operational online inner speech decoding systems. However, the issue of prediction and the issue of explanation are not reducible to one another: although we might be in a position to correctly infer (predict) the content of inner speech based on neurophysiological signals, some theoretical issues still need to be resolved (we turn to theoretical propositions in the next section).
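
The general strategy of learning a model on overt speech data and applying it to covert speech data can be sketched with a toy example. What follows is not the reconstruction model of Martin et al. (2014) nor the classifier of Kapur et al. (2018): it is a deliberately simple nearest-centroid classifier, swapped in for illustration, and the two-dimensional feature vectors, the labels `word_a` and `word_b`, and the assumption that covert trials are weaker versions of the overt patterns are all invented for this sketch.

```python
import math

# Invented "overt" training trials: label -> list of feature vectors.
# Real studies use high-dimensional ECoG or EMG features instead.
overt_trials = {
    "word_a": [[1.0, 0.1], [0.9, 0.2]],
    "word_b": [[0.1, 1.0], [0.2, 0.8]],
}


def centroid(vectors):
    """Average a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def train(trials):
    """Learn one prototype (centroid) per label from overt trials."""
    return {label: centroid(vectors) for label, vectors in trials.items()}


def cosine(a, b):
    """Cosine similarity, insensitive to the overall amplitude of a signal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def decode(model, features):
    """Return the label whose overt prototype best matches the features."""
    return max(model, key=lambda label: cosine(model[label], features))


model = train(overt_trials)

# Invented "covert" trial: same pattern as word_a, but much weaker.
covert_trial = [0.3, 0.05]
decoded = decode(model, covert_trial)
print(decoded)
```

Cosine similarity is used here so that the comparison depends on the shape of the pattern rather than on its amplitude, which reflects the assumption (made only for this sketch) that covert signals resemble attenuated overt signals.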

In this section, we briefly reviewed the history of inner speech research carried out over the last 170 years (from 1850 to the present day) to give an overview of the evolution of ideas and methods related to inner speech research (these investigations are summarised in a non-exhaustive timeline presented in Figure ??). The interested reader will find supplementary information in more comprehensive reviews, theses, and books (e.g., Alderson-Day & Fernyhough, 2015; Fernyhough, 2016; Gregory, 2017; Langland-Hassan & Vicente, 2018; Lœvenbruck, 2019; Lœvenbruck et al., 2018; Perrone-Bertolotti et al., 2014; Rapin, 2011; Smadja, 2019). In the next section, we discuss the most recent and important theoretical positions about the nature and production of inner speech.

1.2.2 Theoretical perspectives on inner speech

1.2.2.1 The psycholinguistics perspective

How do we (humans) produce speech? At a biomechanical level, producing speech means coordinating a complex dynamic system (i.e., the ensemble of speech muscles and organs) to produce perturbations of the air flow (sound waves). At a psychological level, speech production can be said to consist in the translation of thoughts into speech, with the goal of communicating information. Before being communicated however, the information of interest is submitted to several important transformations.

Although speech production is an everyday phenomenon, exactly how this process is performed is still the subject of lively debates. However, current models generally agree on the core steps occurring during speech production. Willem Levelt (Levelt, 1989, 2000) proposed an influential psycholinguistic model of speech production (see Figure 1.5). According to this model, speech production involves three levels: conceptualisation, formulation and articulation. The first step is managed by a component called the conceptualizer, and consists in selecting a conceptual message to be produced (message generation). In other words, the speaker conceives a communicative intention that she wishes to reveal to an interlocutor. This preverbal message is then forwarded to another component, the formulator, that handles both grammatical encoding (i.e., selecting appropriate words or lemmas) and phonological encoding (i.e., selecting the appropriate speech sounds, lexemes and phonemes). During grammatical encoding, lemmas are retrieved from the lexicon and are ordered in a syntactically appropriate way, giving the message its surface structure. During phonological encoding, the message is given its phonetic or articulatory characteristics. At this stage, phonemes are grouped into pronounceable syllables. Then, each syllable is associated with an articulatory program, composed of an ensemble of articulatory gestures (i.e., coordinative structures of movements). These articulatory programs are stored in the syllabary. In brief, the formulator component transforms a preverbal message into a linguistic phonetic object. Finally, the phonetic plan is forwarded to the articulator, responsible for the activation of articulatory gestures, to be executed by the speech articulators (e.g., tongue, lips, jaw).12

Figure 1.5: Illustration of Levelt’s (1989, 2000) model of speech production.

Interestingly, in this model, inner speech is thought to correspond to the phonetic plan. In other words, inner speech is considered as a plan for overt speech, something that precedes overt speech. The idea that inner speech is some sort of a plan for overt speech is widespread in psycholinguistics. According to Levelt, Roelofs, & Meyer (1999), we produce inner speech in the same way we produce overt speech, except that articulation is absent (we already encountered the continuum hypothesis previously). One of the roles of this covert mode in speech production would be to allow for monitoring planned speech for errors (e.g., Hartsuiker & Kolk, 2001; Levelt, 1983). For some authors, inner speech would only be a by-product of the need of the speaker to control overt speech (e.g., Oppenheim, 2013). If we are to accept the continuum hypothesis, according to which there is a continuum between inner speech and overt speech, we are faced with the question of the locus of truncation. If both inner and overt speech lie on the same continuum, where does inner speech cease to be inner speech?

Figure 1.6: Hypotheses regarding inner speech’s locus of generation. Depending on the framework, inner speech is thought to be specified at an articulatory level (motor simulation view) or not (abstraction view). Figure from Oppenheim & Dell (2010).

Oppenheim & Dell (2008) listed and examined three hypotheses regarding this issue. First, inner speech may be exactly like overt speech, except that articulators are not moved. Second, inner speech may be impoverished at a surface level (featural representations). Third, inner speech may be impoverished at a deeper level (e.g., the lexical level) with relatively intact phonological or articulatory features. As discussed in the previous section, the observation that only the lexical bias (but not the phonemic similarity effect) was found in inner speech led Oppenheim & Dell (2008) to claim that inner speech was impoverished at a featural (articulatory) level. Oppenheim & Dell (2010) further added that theories about inner speech may be classified into two main classes (cf. Figure 1.6). According to the first class of theories, referred to as the motor simulation view, inner speech would be like overt speech, except that articulators are not moved (this represents the first hypothesis listed in Oppenheim & Dell, 2008). The second class of theories is known as the abstraction view and considers inner speech to be the consequence “of the activation of abstract linguistic representations” (Oppenheim & Dell, 2010). After reviewing supporting and contradictory evidence for each view, Oppenheim & Dell (2010) suggest a reconciliatory hypothesis, according to which the degree of abstraction of inner speech would be flexible. More precisely, the flexible abstraction account postulates that inner speech would only be specified at a phonological level but that this phonological level would be affected by articulation. In support of this idea, Oppenheim & Dell (2010) observed that mouthed inner speech showed both a lexical bias and a phonemic similarity effect, which was not the case for unmouthed inner speech.

1.2.2.2 The motor theory of voluntary thinking

The motor theory of voluntary thinking (MTVT, Cohen, 1986) aims to explain how thinking and the experience of volition can emerge from motor activity. Cohen first notes that a critical aspect of motor theories is that they rely on peripheral motor feedback (i.e., afferent feedback from the contraction of the muscles). However, he then suggests that although motor feedback might be necessary at the initial stages of an internalised action (e.g., inner speech), this feedback might become unnecessary through repeated associations that would “short-circuit connections within the central nervous system” (Cohen, 1986, p. 21). According to the MTVT, motor activity would be necessary for mental experiences without external sensation (i.e., for imagery or thoughts). More precisely, Cohen (1986) suggests that inner speech (or rather, the sensory percepts associated with inner speech) might be explained by its associations with motor activity. Indeed, according to Cohen (1986), “associations between one’s voice and kinesthetic sensations from one’s speech musculature are very specific, consistent, and frequently repeated” (p. 22). Therefore, slight (unconscious) contractions of the speech musculature might evoke speech auditory images. In support of this idea, Cohen reports the results of an experiment led by Hefferline & Perera (1963):

“when the subject occasionally emitted an invisibly small thumb twitch (detected electromyographically), he received a tone as a signal to press a key. After several conditioning sessions, the tone was progressively diminished to zero. The subject nevertheless continued to press the key whenever he emitted a thumb twitch, and he reported that he still heard the tone.”

These observations support the idea that motor activity (and kinaesthetic feedback) might, after frequent association, evoke auditory sensations. Cohen then moves to a presentation of the motor theory of attention, according to which motor activity allows one to emphasise (or weight) one aspect of perception over another. According to Cohen, the MTVT, albeit not suggesting that motor activity is necessary for any sort of mental image or thought, suggests that motor feedback can evoke mental images and thoughts (e.g., via the principle of association discussed above) and that motor activity is responsible for the experience of volition in thinking. The MTVT suggests that thoughts that are experienced as voluntary (e.g., rehearsing a novel phone number) are accompanied by motor activity whereas involuntary thoughts (e.g., intrusive thoughts or ruminative thoughts) are not. Interestingly, Cohen also suggests that “a thought may appear to be effortless because no motor activation is involved, or because the motor activity is of an automatic nature” (p. 27). Cohen interprets the effect of distraction on the implication of the motor system during motor imagery (and inner speech) in terms of attentional sharing, building upon Norman & Shallice (1986)’s work:

“In order to rehearse a telephone number one would simply ‘speak’ the numbers covertly – that is, activate the appropriate speech motor patterns, but too slightly to produce audible speech. To take the case of rehearsing a telephone number a step further, consider that the person is being distracted by loud music. Because the music would be competing for his attention, he would have to increase the amplitude of his rehearsal by increasing the speech motor activity, perhaps to the point of making actual lip and tongue movements. Were the music loud he might have to speak the numbers aloud so that the numbers would capture enough of his awareness to remain in his short-term memory.”

This idea is consistent with some work showing a greater implication of the speech motor system during cognitively demanding tasks (e.g., Sokolov, 1972; McGuigan & Rodier, 1968) and provides a mechanism to explain these observations (but see our own theoretical interpretation in the next section). To sum up, the MTVT suggests that all voluntary images and thoughts are associated with motor activity and that “deliberate inner speech is based13 on the appropriate covert activity in the speech musculature” (Cohen, 1986, pp. 45–46).

1.2.2.3 Predictive and motor control account(s) of inner speech

Speech production requires the fine-grained timing and coordination of complex sequences of movements (cf. biomechanical aspects of speech production in Chapter 2) and can therefore be considered in a common conceptual framework with other forms of motor actions. Complex motor actions have been successfully modelled in a motor control framework (e.g., Kawato, Furukawa, & Suzuki, 1987; Kawato, 1999; Wolpert, Ghahramani, & Jordan, 1995; Wolpert, 1997). Motor control models describe how the central nervous system and the musculoskeletal system interact in order to perform motor actions. Applied to speech production, these models describe how humans generate and regulate speech acts (for an introduction to motor control models and a review of speech motor control models, see Parrell, Lammert, Ciccarelli, & Quatieri, 2019).

Figure 1.7: A forward model of motor control. Crossed circles represent comparators (see text for explanation). Figure adapted from Rapin et al. (2013).

As reviewed in Rapin, Dohen, Polosan, Perrier, & Lœvenbruck (2013) or Lœvenbruck et al. (2018), motor control models generally assume two types of interacting internal models: a forward model that is used to predict the consequences of some planned action and an inverse model that is used to compute (to predict) the necessary movements to attain some goal (cf. Box ??). As can be seen from Figure 1.7, the inverse model is first used to compute the necessary motor commands to attain some intended state (e.g., producing the /i/ vowel). When motor commands are sent to the motor system, a copy of these motor commands (known as the efference copy) is sent to a second internal model (a forward model) that predicts the sensory consequences of these motor commands. This predicted sensory feedback, known as the corollary discharge (i.e., what is expected to happen if the motor commands were to be executed), is then compared to actual sensory feedback (the sensory consequences of actual motor actions) by a comparator (the crossed circle). It has been suggested by many authors (for review, see Lœvenbruck et al., 2018) that this comparison is responsible for perceptual attenuation, when predicted sensory feedback and actual sensory feedback match.[14]

An essential role of this predictive mechanism is to allow for fast correction of potential errors before the actual sensory feedback is even available to the central nervous system. Indeed, by computing a prediction of what is expected to happen, the central nervous system can correct or adjust motor commands (if needed), without having to wait until the action is executed. This mechanism of monitoring by feedforward control allows for online correction during speech production and accounts for the notoriously low rate of errors in speech production.
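To make the logic of this loop concrete, the following toy sketch reduces it to a single dimension. It is only an illustration under strong simplifying assumptions: the function names, the scalar "gains", and the tolerance are all hypothetical and are not taken from Rapin et al. (2013) or from any published motor control model. The inverse model proposes a command, the forward model uses the efference copy to predict its sensory consequences, the command is adjusted before execution, and a comparator then checks actual against predicted feedback.

```python
# Toy one-dimensional sketch of the loop in Figure 1.7.
# Every function name, gain, and tolerance below is an illustrative assumption.

def inverse_model(intended_state, inverse_gain):
    """Compute the motor command expected to reach the intended state."""
    return intended_state / inverse_gain


def forward_model(efference_copy, forward_gain):
    """Predict the sensory consequences of a command (the corollary discharge)."""
    return efference_copy * forward_gain


def motor_system(motor_command, true_gain):
    """Stand-in for the articulators: return the actual sensory feedback."""
    return motor_command * true_gain


def produce(intended_state, inverse_gain, forward_gain, true_gain, tolerance=1e-9):
    command = inverse_model(intended_state, inverse_gain)

    # Internal comparison: intended state versus predicted feedback.
    # It is available before any actual feedback, so the command can be
    # adjusted ahead of execution.
    predicted = forward_model(command, forward_gain)
    command += (intended_state - predicted) / forward_gain
    predicted = forward_model(command, forward_gain)

    # External comparison: actual versus predicted feedback.
    actual = motor_system(command, true_gain)
    mismatch = actual - predicted
    return {
        "predicted": predicted,
        "actual": actual,
        "attenuated": abs(mismatch) <= tolerance,
    }


# Forward model matches the (hypothetical) motor system: feedback is attenuated.
accurate = produce(1.0, inverse_gain=2.5, forward_gain=2.0, true_gain=2.0)
# Forward model mispredicts the motor system: no attenuation.
mismatched = produce(1.0, inverse_gain=2.5, forward_gain=2.0, true_gain=3.0)
print(accurate["attenuated"], mismatched["attenuated"])
```

In this sketch, attenuation is reduced to a yes/no flag for readability; in the models reviewed above it is usually conceived as graded.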

Interestingly, the efference copy is not only useful for coordinating and correcting ongoing actions but is also hypothesised to play a role in the feeling of agency (i.e., the feeling of being the agent of some action). This feeling is hypothesised to arise from (internal) successful comparisons between actual movements and predicted movements (i.e., the comparison on the right side of Figure 1.7). More precisely, agency might emerge when predicted sensory experience and actual sensory experience match. This motor control framework has been successfully applied to speech (e.g., Guenther, Ghosh, & Tourville, 2006; Houde & Nagarajan, 2011; Parrell et al., 2019) and has also been applied to inner speech production, initially with the aim of explaining the experience of AVHs in patients with schizophrenia. For instance, Feinberg (1978), Frith, Blakemore, & Wolpert (2000), and S. R. Jones & Fernyhough (2007a) have suggested that a defective predictive system could lead to control delusions and the experience of AVHs. Indeed, they suggest that a mismatch between predicted sensory experience and actual sensory experience would not lead to sensory attenuation and would lead to agency not being felt. The idea that episodes of AVHs are accompanied by (partially inhibited) motor commands is supported by several EMG studies showing an increase of peripheral muscular activity in the speech muscles during these episodes (e.g., Gould, 1948; Rapin, 2011; Rapin et al., 2013). More generally, the generation of a corollary discharge and its role in inner speech production is supported by many behavioural and neurophysiological findings (e.g., Ford & Mathalon, 2004; Tian, Ding, Teng, Bai, & Poeppel, 2018; Tian & Poeppel, 2010, 2012; Tian, Zarate, & Poeppel, 2016; Whitford et al., 2017).

By building upon previous motor control models of speech production (e.g., Houde & Nagarajan, 2011; Wolpert et al., 1995) and motor control models applied to inner speech in the context of schizophrenia (e.g., Frith et al., 2000; Feinberg, 1978; S. R. Jones & Fernyhough, 2007b; Rapin, 2011; Rapin et al., 2013), Lœvenbruck et al. (2018) recently introduced a novel model of (deliberate) inner speech. In this model, Lœvenbruck et al. (2018) describe inner speech as “multi-modal acts with multi-sensory percepts stemming from coarse multi-sensory goals”. In other words, the auditory and kinaesthetic sensations perceived during inner speech prediction are assumed to be the predicted sensory consequences of (inhibited) speech motor acts, emulated by internal forward models, that use the efference copies issued from an inverse model (cf. Figure 1.8).

Figure 1.8: Predictive control account of inner speech production. Figure from Lœvenbruck et al. (2018).

In the previous section, we discussed the relation between the degree of automaticity, the difficulty, and the involvement of the speech motor system during inner speech production. Cohen (1986) suggested that in difficult situations (e.g., noisy environment, difficult, novel, or degraded verbal material), inner speech percepts may be accentuated to draw more attention as compared to other (non-relevant) stimuli. We can reinterpret these findings in the motor control framework by saying that the involvement of the speech motor system during inner speech (that can be examined via peripheral muscular activation, neuroimaging, or neurostimulation) is a function of the degree of inhibition (the inhibitory signals represented by the vertical dotted lines in Figure 1.8), with a greater involvement of the speech motor system when inhibition is weaker, and reciprocally, a weaker involvement of the speech motor system when inhibition is stronger. The “quantity and quality” of inhibition (i.e., what proportion of motor commands and their efference copies are inhibited, when, and at which level) may in turn be a function of contextual and individual characteristics. We might speculate that the reason why inhibition is weaker in demanding situations (e.g., when reading a difficult text or rehearsing novel material) is that understanding more difficult material requires “clearer” (more vivid) inner speech percepts than understanding known or easy material (the same goes for noisy or degraded material). The exact nature of these inhibitory signals and how the “amount of inhibition” is determined still need to be examined, however (but see our discussion in section 1.2.3).

Figure 1.9: A cerebral landscape of deliberate inner speech production. Figure from Lœvenbruck et al. (2018).

In addition to explicitly modelling inner speech production in a formal motor control model, Lœvenbruck et al. (2018) proposed a cerebral landscape underlying the production of deliberate inner speech (cf. Figure 1.9). This model aims to integrate findings and models from the fields of psycholinguistics and neurolinguistics, as well as neuroanatomical theories of speech production (e.g., Hickok, 2012; Tian & Poeppel, 2013). This model proposes that lemma retrieval is performed by the left middle temporal gyrus. Then, the lemma is converted to a lexeme in a multisensory format via two routes, the first one providing the auditory representation (a) and the second one providing the somatosensory representation (b). The auditory specification of the desired auditory state then activates the left posterior superior temporal gyrus (pSTG) and the superior temporal sulcus (STS), represented by arrow 1a. In parallel, the somatosensory route activates the anterior supramarginal gyrus (aSMG) and the primary somatosensory cortex (S1), represented by arrow 1b. An inverse model transformation is then performed, again involving two routes. The auditory specification is sent to the temporo-parietal junction (TPJ), represented by arrow 2a. The somatosensory specification is sent to the cerebellum, represented by arrow 2b. Then, motor programmes are specified. The transformed auditory goals are sent from the TPJ to the left IFG and to the left ventral premotor cortex (arrow 3a). The transformed somatosensory goals are sent from the cerebellum to the lower primary motor cortex (M1), represented by arrow 3b. Motor programmes issued by the left IFG are then sent to M1 (represented by arrow 4), where the two motor programmes computed in the auditory and somatosensory routes are integrated. Importantly, articulation is inhibited via inhibitory signals emitted by the rostral prefrontal cortex (BA 10) and the anterior cingulate gyrus (BA 32) and sent to M1 only, or to both the left IFG and M1. A residual somatosensory feedback may be felt (aSMG and S1), resulting from attenuated motor commands being sent to the motor system. The efference copy mediated by the left IFG is sent to the TPJ (arrow 4a) and is inversed into a predicted auditory signal, activating the pSTG and the STS (arrow 5a). The other copy, in M1, is sent to the cerebellum (arrow 4b) and is inversed into a predicted somatosensory signal, activating the aSMG and S1 (arrow 5b). The comparison between predicted and original desired states (C2) takes place at two sites, in auditory and somatosensory cortices (for more details, see Lœvenbruck et al., 2018). It should be noted that this model has been further developed and is presented in Grandchamp et al. (2019) or Lœvenbruck (2019). However, because this work has not been published yet and because of length constraints, we will not discuss the latest version of this model here.

An interesting question related to the application of motor control models to inner speech and imagined actions (but also to executed actions more broadly) is the issue of whether we need both an inverse and a forward model. Pickering & Clark (2014) make a distinction between two types of architectures that differ in the role forward models play in them: the auxiliary forward model (AFM) account, according to which forward models are “special-purpose prediction mechanisms implemented by additional circuitry distinct from core mechanisms of perception and action” and the integrated forward model (IFM) account, according to which “forward models lie at the heart of all forms of perception and action”. On a similar note, Friston (2011) argues for an IFM architecture (instead of conventional motor control schemes) and shows how motor control can be formalised in a Bayesian predictive framework, where optimal control can be seen as an (active) inference. Recently, Wilkinson & Fernyhough (2017) similarly suggested modelling inner speech production in a predictive processing framework (PPF; for an introduction, see for instance Clark, 2013). In this framework, the main task of the brain is thought to be inferring, from incoming signals, what the causes of these signals are. Accordingly, the only information that is passed up the cortical hierarchy is prediction error, and the hypotheses (about the causes of the percepts) that minimise prediction error are selected (or “inferred”). An interesting consequence of this model applied to motor control is that it does not postulate the existence of motor commands but rather the presence of predictions only, that are fulfilled (or not) by bodily movements (with the aim of minimising prediction error). According to the PPF account of inner speech sketched by Wilkinson & Fernyhough (2017), sensory aspects of inner speech (e.g., motor or auditory percepts) may be conceived as predictions in themselves (predictions that have been “selected” to reduce prediction error), instead of resulting from a stimulus to be monitored.
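The selection principle at the core of the PPF can be illustrated with a deliberately minimal sketch. It is a drastic simplification that we introduce for illustration only: it ignores the hierarchical and Bayesian aspects of the framework, and the function name, the hypothesis labels, and the numerical values are all hypothetical. Each candidate hypothesis predicts a signal, and the hypothesis whose prediction yields the smallest (squared) prediction error is selected.

```python
# Minimal, non-hierarchical sketch of hypothesis selection by
# prediction error minimisation. All names and values are hypothetical.

def select_hypothesis(signal, hypotheses):
    """Return the label of the hypothesis with the smallest prediction error.

    `signal` is a list of numbers (the incoming signal) and `hypotheses`
    maps each label to the signal that this hypothesis predicts.
    """
    def prediction_error(predicted):
        # Sum of squared differences between incoming and predicted signal.
        return sum((s - p) ** 2 for s, p in zip(signal, predicted))

    return min(hypotheses, key=lambda label: prediction_error(hypotheses[label]))


incoming = [1.0, 0.0]
candidates = {
    "hypothesis A": [0.9, 0.1],  # close to the incoming signal
    "hypothesis B": [0.0, 1.0],  # far from the incoming signal
}
selected = select_hypothesis(incoming, candidates)
print(selected)
```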

To understand the appeal of predictive and motor control modelling applied to inner speech and imagined actions, let’s consider the analogy between speaking and playing an instrument (e.g., playing the piano). Essentially, learning how to play the piano can be said to consist in learning and coordinating complex and fine-grained motor sequences that produce in turn sensory (e.g., kinaesthetic, auditory, visual) feedback to the producer of the action (the agent). Therefore, it seems that, at a certain level of analysis, the act of speech can be paralleled with the act of playing an instrument in that it consists in the coordination of complex movements that result in some modifications of the environment, that in turn generate sensory feedback (e.g., kinaesthetic, auditory) for the agent. Thus, pursuing the analogy, we could argue that the relation between playing an instrument and imagining playing an instrument is similar to the relation between producing speech and imagining speaking (i.e., producing inner speech). This analogy suggests that we might be able to study the development of (pairs of) internal models responsible for the sensory experience accompanying imagined actions in the adult mind (e.g., when an individual is learning either a novel musical instrument or a new language with speech sounds that are not present in his/her native language). By examining the development of novel imagined actions in the adult mind, we might gain new insights about the internalisation of speech during childhood.[15]

This view on the relation between inner speech and overt speech is broadly consistent with Vygotsky’s view of inner speech as internalised egocentric speech but it proposes a formal mechanism to explain how overt speech develops into inner speech. More precisely, we might speculate that what is internalised during childhood is an internal model (or a hierarchy of paired internal models). This internalisation is a slow and gradual process and might be similar to the internalisation of other types of motor actions. Considering inner speech as a form of motor action brings some interesting insights. Indeed, if speech production can be broadly described as the coordinated sequence of (groups of) muscular movements that results in some predictable sensory consequences (e.g., auditory, visual, kinaesthetic or somaesthetic feelings), then it can be compared to other actions. In that sense, the process of speech internalisation, like the process of “internalised walking”, might follow the same general steps. This process can be broadly defined as the learning of the mapping between some muscular commands (or patterns of muscular commands) and the associated sensory consequences. Learning these associations results in the construction of internal models, which make it possible to predict the consequences of ongoing actions, but also to simulate these actions in the absence of any overt movement. Therefore, the process of inner speech might be considered under the broad category of imagined actions (motor imagery).

1.2.3 Explaining the muscular activity observed during inner speech

Motor imagery can be defined as the mental process by which one rehearses a given action, without engaging in the physical movements involved in this particular action. One of the most influential theoretical explanations for this phenomenon is the motor simulation theory (MST, Jeannerod, 1994, 2001, 2006). In this framework, the concept of simulation refers to the “offline rehearsal of neural networks” (Jeannerod, 2006) and motor imagery is conceptualised as a simulation of the covert (i.e., invisible and inaudible to an external observer) stage of the same executed action (O’Shea & Moran, 2017). The MST shares some similarities with the theories of embodied and grounded cognition (Barsalou, 2008) in that both account for motor imagery by appealing to a simulation mechanism. However, the concept of simulation in grounded theories is assumed to be multi-modal (not just motoric) and to operate in order to acquire specific conceptual knowledge (O’Shea & Moran, 2017), which is not the concern of the MST.[16] As highlighted by O’Shea & Moran (2017), the MST contains the three following postulates at its core: i) there exists a continuum between the covert (the mental representation) and the overt execution of an action, ii) action representations can operate off-line, via a simulation mechanism, and iii) covert actions rely on the same set of mechanisms as the overt actions they simulate, except that execution is inhibited. The MST is supported by a wealth of findings, ranging from mental chronometry studies showing that the time taken to perform an action is often found to be similar to the time needed to imagine the corresponding action[17] (but see Glover & Baran, 2017, for a review of chronometric findings and for an alternative conceptualisation of motor imagery), to neuroimaging and neurostimulation studies showing that both motor imagery and overt actions tend to recruit similar frontal, parietal and sub-cortical regions (e.g., Hétu et al., 2013; Jeannerod, 2001). The involvement of the motor system during motor imagery is also supported by repeated observations of autonomic responses, increased corticospinal excitability, as well as peripheral muscular activity during motor imagery (for an overview, see Collet & Guillot, 2010; Jeannerod, 2006; Stinear, 2010).

Motor imagery has consistently been defined as the mental rehearsal of a motor action without any overt movement. One consequence of this claim is that, in order to prevent execution, the neural commands for muscular contractions should be blocked at some level of the motor system by active inhibitory mechanisms (for review, see Guillot et al., 2012a). Despite these inhibitory mechanisms, there is now abundant evidence for peripheral muscular activation during motor imagery (for review, see Guillot & Collet, 2005; Guillot et al., 2012a). As suggested by Jeannerod (1994), the incomplete inhibition of the motor commands would provide a valid explanation to account for the peripheral muscular activity observed during motor imagery. Consistent with this assumption, Schwoebel, Boronat, & Branch Coslett (2002) showed that a brain-damaged patient failed to inhibit the motor consequences of motor imagery, and thus fully “executed the imagined action”, hence highlighting uninhibited movements during mental rehearsal.[18] This idea has also been corroborated by studies of changes in the excitability of the motor pathways during motor imagery tasks. Bonnet, Decety, Jeannerod, & Requin (1997) measured spinal reflexes while participants were instructed to either press a pedal with the foot or to simulate the same action mentally. They observed that both H-reflexes and T-reflexes increased during motor imagery, and that these increases correlated with the force of the simulated pressure. Using transcranial magnetic stimulation and motor evoked potentials (MEPs), several investigators observed muscle-specific increases of MEPs during various motor imagery tasks, whereas no such increase could be observed in antagonist muscles (e.g., Fadiga et al., 1999; Rossini, 1999).

However, although there are many observations showing a peripheral muscular activity during motor imagery, there are also many studies failing to do so, or reporting surprisingly high levels of inter-subject variability, with some participants showing no muscular activity at all (for review, see Guillot, Lebon, & Collet, 2010). Two main explanations have been advanced to resolve these discrepancies. First, the electromyographic activity recorded during motor imagery could be moderated by the perspective taken in motor imagery.[19] Indeed, it has been shown that a first-person perspective may result in greater EMG activity than motor imagery in a third-person perspective (Hale, 1982; Harris & Robinson, 1986). Second, some authors postulated that the intensity of the EMG activity recorded during motor imagery might be related to the individual ability to form an accurate mental representation of the motor skill (i.e., the vividness of the mental image). However, after reviewing the available evidence, Guillot et al. (2009) concluded that this is unlikely to be the case. Alternatively, discrepancies in experimental design and methodological choices (e.g., use of intramuscular versus surface electromyography) could also explain these contradictory results (Guillot et al., 2010).

In order to investigate the inhibitory mechanisms involved during motor imagery, Rieger, Dahm, & Koch (2017) extended the logic of task switching paradigms and developed a novel action mode (imagery vs. execution) switching paradigm. In these procedures, performance in the current trial is analysed depending on the condition of the previous trial, assuming that execution or inhibition in the previous trial persists to a certain degree. Put simply, the main idea is that inhibition during motor imagery should leave after-effects by increasing activation thresholds, thereby affecting the performance of subsequently executed (or imagined) movements. In analysing sequential effects, Rieger et al. (2017) observed shorter movement times when motor execution (ME) preceded motor imagery (MI) than when motor imagery preceded motor execution, corroborating the idea of a global inhibition (i.e., the second option from Box ) mechanism taking place during motor imagery. In addition, they observed hand repetition costs (i.e., movement times were longer when the task had to be performed with the same hand than with the other hand in motor imagery trials), suggesting that effector-specific inhibitory mechanisms may also be taking place during motor imagery (corroborating the third option discussed in Box ). However, as highlighted by O’Shea & Moran (2018), global inhibitory mechanisms may also induce longer movement times in MI-ME sequences than in ME-ME sequences, but this effect was not observed in Rieger et al. (2017). To push forward this investigation, O’Shea & Moran (2018) used pupillometry to examine the degree of attentional effort involved in the execution or the inhibition of a motor response during both motor imagery and motor execution in a Go/NoGo procedure, embedded in a modified task-switching paradigm. They observed that the amount of attentional effort (assessed via pupillometry) varied according to the type of block (i.e., pure vs. mixed), suggesting that different inhibitory mechanisms (or “routes”) may underlie inhibition during motor imagery. For instance, it may be that inhibition during motor imagery is programmed in a pre-emptive way when the participants know that the next block will be uniquely composed of motor imagery trials or in a more active (and more effort-costly) way in mixed blocks. Therefore, different inhibitory mechanisms (e.g., proactive vs. reactive, global vs. selective) may also vary according to the task characteristics (for a more detailed discussion of these findings, see also O’Shea, 2017). Although these studies are among the first to investigate these issues, they show that it is possible to use a combination of cognitive and psychophysiological tasks to assess the inhibitory mechanisms involved during motor imagery.
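The logic of a sequential-effects analysis (performance in the current trial conditioned on the action mode of the previous trial) can be sketched as follows. This is not the analysis pipeline of Rieger et al. (2017) or O’Shea & Moran (2018), which we do not reproduce here; the function name is ours and the movement times are invented purely to show the bookkeeping.

```python
from collections import defaultdict
from statistics import mean


def sequential_means(trials):
    """Mean movement time for each (previous mode, current mode) pair.

    `trials` is an ordered list of (mode, movement_time_ms) tuples, where
    mode is "MI" (motor imagery) or "ME" (motor execution). The first
    trial has no predecessor and is therefore not analysed.
    """
    by_transition = defaultdict(list)
    for (previous_mode, _), (mode, movement_time) in zip(trials, trials[1:]):
        by_transition[(previous_mode, mode)].append(movement_time)
    return {pair: mean(times) for pair, times in by_transition.items()}


# Hypothetical movement times in ms, invented for illustration only;
# they do not reproduce any published result.
example_trials = [
    ("ME", 500), ("MI", 520), ("MI", 560),
    ("ME", 580), ("ME", 510), ("MI", 530),
]
means = sequential_means(example_trials)
for (previous_mode, mode), value in sorted(means.items()):
    print(f"{previous_mode} -> {mode}: {value} ms")
```

Comparing, for instance, the MI-ME entry with the ME-ME entry corresponds to the contrast discussed by O’Shea & Moran (2018); a real analysis would of course model participants and trial counts explicitly.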

To summarise this section, the available neural and psychophysiological evidence suggests that inner speech and imagined actions may result from internal simulation (or emulation) of the corresponding executed action. This appealing idea however presupposes that the motor commands emitted during inner speech (which give rise to the sensory percepts of inner speech such as the inner voice) are somehow completely or partially inhibited in order to prevent execution. We discussed several explanations with regard to the source of these inhibitory signals (which remain to be tested in the case of inner speech). Interestingly, these questions echo our previous discussion of the centralism versus peripheralism debate (cf. Box ). Recent theoretical frameworks of inner speech and motor imagery (e.g., motor control models, simulation and emulation theories) are centralist theories of motor cognition. Indeed, in these frameworks, the peripheral muscular activity observed during imagined action is conceived as a consequence of (a partial inhibition of) these actions, rather than a necessary condition for imagining actions (including speech). This idea was well summarised by Jeannerod (2006), discussing the motor inhibition problem and the case of subvocal (inner) speech:

“Subvocal speech was first interpreted as a source of peripheral kinesthetic information which, when projected to central nervous structures, generated auditory images of the corresponding words. The same interpretation was given to the low intensity EMG recorded during mental motor imagery of limb actions, which was thought to be the origin of the feelings experienced by the subject during mental rehearsal (Jacobson, 1930), or to the eye movements recorded during mental visual imagery (e.g., Brandt and Stark, 1997). However, this interpretation of mental processes as consequences of peripheral feedback is now disproved by recent experiments showing complete absence of muscular activity in many subjects during motor imagery. When present, this activity is rather assumed to be a consequence of incomplete inhibition of motor output during mental states involving motor simulation. This same interpretation might also hold for inner speech.” (p. 153)

Therefore, although the precise neural generators of these inhibitory signals remain to be examined, the peripheral muscular activation observed during inner speech may result from an incomplete inhibition of motor commands. Moreover, we may speculate that some forms of inner speech may or may not be accompanied by peripheral muscular activations in the speech muscles, depending on the degree (the amount) of inhibition.

1.3 Summary, research question and directions

We reviewed the main theoretical positions about rumination, the different ways it has been assessed (either as a trait or as a state) and discussed the sensory properties of ruminative thoughts. Acknowledging the predominantly verbal character of rumination, we suggested that it might be considered as a form of inner speech. In order to understand the repercussions of this assumption with regard to the study of rumination, we presented a brief historical review of inner speech research from 1850 to the present day. This review led us to a presentation of the main contemporary theoretical views on inner speech and to the suggestion that inner speech may be conceived as a form of motor imagery and that it could be understood and modelled in a motor control framework. We then briefly discussed findings from the field of motor cognition and the study of motor imagery to take a new perspective on the findings previously discussed about the involvement of the speech motor system during inner speech production. In consideration of this discussion, the main goals of the present work are i) to refine the description of inner speech and the involvement of the speech motor system during its production by studying a particular form of inner speech (i.e., rumination) and ii) to shed new light on rumination by studying it as a form of inner speech, with the potential outcome of providing psychophysiological (electromyographic) markers of (induced) rumination, as well as possible guidelines for remediation. Before turning to a presentation of this experimental work (where each study is presented as a standalone empirical article, cf. Chapters 3 to 7), in the next chapter, we briefly introduce some key elements and technical details with regard to the methods we used in this work.


  1. In the context of depression, dysphoria is usually defined as a preclinical state of general dissatisfaction or discomfort. In the DSM-V, dysphoria (or dysphoric mood) is defined as “a condition in which a person experiences intense feelings of depression, discontent, and in some cases indifference to the world around them.”

  2. Cognitive control refers to a set of mental processes allowing flexible adaptation of cognition and behaviour in accordance with one’s current goals (Braver, 2012; Friedman & Miyake, 2017). We use the terms cognitive control, executive control and executive functions interchangeably.

  3. The exact items are not specified in Moberly & Watkins (2008). However, Huffziger, Ebner-Priemer, Koudela, Reinhard, & Kuehner (2012) used a similar methodology and report the items they used, which were “At the moment, I am thinking about my feelings” and “At the moment, I am thinking about my problems”.

  4. Which can be translated as “inner speech is only the memory of the sensation produced by external speech”.

  5. We will come back to this important distinction in more detail later when discussing the “centralism versus peripheralism” debate.

  6. Indeed, it is plausible that the predominance of some sort of representation over other forms might be contingent on contextual demands. In other words, depending on the task to be realised, the motoric and sensory aspects of inner speech might be weighted differently.

  7. Referring to Paulhan’s distinction between the dictionary meaning of a word on one hand, and the individual sense of a word which is acquired by usage, on the other hand.

  8. See Chapter 2 for a brief introduction to (surface) electromyography.

  9. The expression articulatory suppression usually refers to a task which requires participants to utter speech sounds (or to produce speech gestures without sound), so that this activity disrupts ongoing speech production processes.

  10. Sokolov (1972) uses the term “curtailment” for abbreviation (p. 203).

  11. The DES differs from the classical random-beeping strategy in that the participant, in addition to being probed several times a day, also has to meet with the experimenter at the end of the study. During this “expositional interview”, the experimenter and the participant work together to clarify the meaning of these reported inner experiences as well as to contextualise them.

  12. This model explains how a communicative intention is transformed into speech acts. However, it does not explicitly account for how speech acts are executed by the articulators. In Chapter 2, we briefly introduce some of the core principles related to the biomechanics of speech production.

  13. NB: "based" is used here in a developmental sense (cf. the beginning of this section).

  14. That’s the reason why, according to Blakemore, Wolpert, & Frith (1998), we cannot tickle ourselves. Because when we deliberately produce an action we formulate a prediction of the sensory consequences of this action, the actual sensory consequences of this action (when they match our predictions) are attenuated.

  15. While keeping in mind the obvious limitation that the child mind is not equivalent to the adult mind, nor is it equivalent to a smaller version of the adult mind. Nevertheless, examining the development of novel imagined actions in adults avoids the contamination of the process of interest (imagined action) by developmental confounds present during childhood.

  16. We should also make a distinction between embodiment of content, which concerns the semantic content of language, and embodiment of form, which concerns “the vehicle of thought”, that is, proper verbal production (Pickering & Garrod, 2013).

  17. Although not always. As previously discussed in section 1.2.1, Guillot et al. (2012b) reviewed chronometric findings related to motor imagery and listed several factors that may affect the temporal equivalence between executed and imagined actions.

  18. However, it should be noted that Schwoebel et al. (2002) reported no difficulty for this patient to read silently.

  19. We usually make a distinction between a first-person perspective or internal imagery (i.e., imagining an action as we would execute it) and a third-person perspective or external imagery (i.e., imagining an action as an observer of this action), that seem to involve different neural and cognitive processes (Ruby & Decety, 2001).