Inner verbalisation can be willful, as when we deliberately engage in inner speech (e.g., mental rehearsal, counting, list-making), or more involuntary, as when unbidden verbal thoughts occur. It can be either expanded (fully phonologically specified) or condensed (cast in a prelinguistic format). Introspection and empirical data suggest that willful expanded inner speech recruits the motor system and involves auditory, proprioceptive, tactile, and perhaps visual sensations. We present a neurocognitive predictive control model in which willful inner speech derives from multisensory goals arising from sensory cortices. An inverse model transforms desired sensory states into motor commands, which are specified in motor regions and inhibited by prefrontal cortex. An efference copy of these motor commands is transformed by a forward model into simulated multimodal acts (inner phonation, articulation, gesture). These simulated acts provide predicted multisensory percepts that are processed in sensory regions and perceived as an inner voice unfolding over time. The comparison between desired sensory states and predicted sensory end states provides the sense of agency, the feeling of being in control of one’s inner speech. Three types of inner verbalisation can be accounted for in this framework: unbidden thoughts, willful expanded inner speech, and auditory verbal hallucination.
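The control loop described above can be illustrated with a minimal computational sketch. It is a toy under stated assumptions, not the model itself: the per-modality gains, the function names, and the linear sensory-motor mapping are all hypothetical and chosen only to make the flow of information (goal, inverse model, inhibition, efference copy, forward model, comparison) explicit.

```python
# Toy sketch of the predictive control loop. All names and numbers are
# hypothetical; the linear gains stand in for an unknown sensory-motor mapping.

# Assumed gain linking one motor dimension to one sensory modality.
GAINS = {"auditory": 2.0, "proprioceptive": 0.5, "tactile": 1.0}


def inverse_model(desired):
    """Map desired sensory states to the motor commands that would produce them."""
    return {m: desired[m] / GAINS[m] for m in desired}


def inhibit(motor_commands):
    """Block overt execution: nothing reaches the articulators."""
    return {m: 0.0 for m in motor_commands}


def forward_model(efference_copy):
    """Map a copy of the motor commands to predicted sensory percepts."""
    return {m: efference_copy[m] * GAINS[m] for m in efference_copy}


def mismatch(desired, predicted):
    """Largest absolute difference between desired and predicted states."""
    return max(abs(desired[m] - predicted[m]) for m in desired)


def inner_speech_step(desired, forward=forward_model):
    """Run one pass of the loop and return (overt output, percepts, mismatch)."""
    motor = inverse_model(desired)
    efference_copy = dict(motor)          # copy taken before inhibition
    overt = inhibit(motor)                # no overt speech is produced
    predicted = forward(efference_copy)   # simulated multisensory percepts
    return overt, predicted, mismatch(desired, predicted)


if __name__ == "__main__":
    goal = {"auditory": 1.0, "proprioceptive": 0.2, "tactile": 0.4}
    overt, predicted, error = inner_speech_step(goal)
    print("overt output:", overt)
    print("predicted percepts:", predicted)
    print("desired vs. predicted mismatch:", error)
```

In this sketch a small mismatch between desired and predicted states stands in for an intact sense of agency; how that quantity should be scaled or thresholded is left open.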