Helmholtz on Unconscious Inference

You have probably seen this meme, or a variant of it, before. Have you ever wondered why it works, that is, why people unconsciously read ‘What I if told you’ as ‘What if I told you’? Before I get to the answer, let’s talk very briefly about unconscious inference.

Unconscious Inference

Every evening apparently before our eyes the sun goes down behind the stationary horizon, although we are well aware that the sun is fixed and the horizon moves.

An actor who cleverly portrays an old man is for us an old man there on the stage, so long as we let the immediate impression sway us, and do not forcibly recall that the programme states that the person moving about there is the young actor with whom we are acquainted. We consider him as being angry or in pain according as he shows us one or the other mode of countenance and demeanour. He arouses fright or sympathy in us […]; and the deep-seated conviction that all this is only show and play does not hinder our emotions at all, provided the actor does not cease to play his part. On the contrary, a fictitious tale of this sort, which we seem to enter into ourselves, grips and tortures us more than a similar true story would do when we read it in a dry documentary report.

Helmholtz made these observations about perception (quoted above) in 1867, and he coined the term unconscious inference to explain this behaviour. Unfortunately, his theory was long ignored or even dismissed by philosophers and researchers. However, the free energy principle, which provides a theoretical framework for understanding the mind, is inspired by unconscious inference, and there is some experimental evidence suggesting that this is how our brains work.

If you are interested, you can learn more about the math behind the free energy principle and predictive coding here (A tutorial on the free-energy framework for modelling perception and learning), here (Basic Mathematics of Predictive Coding), or here (Tutorial on Active Inference).

(From Wikipedia) Siegfried Frey has pointed out the revolutionary quality of Helmholtz’s proposition that it is from the perceiver, not the actor, whence springs the meaning-attribution process performed when we interpret a nonverbal stimulus:

By failing to distinguish appearance from reality, the psychology of expression merely perpetuated a fallacy deeply ingrained in everyday language: with unswerving belief in our perceptions, we routinely call the other person’s expression what is, in plain truth, our own impression of her or him.

How Perception works

This is how most people imagine perception works: our body gets sensory information from the environment, the information is processed by a bunch of biological neural networks, and the result reaches our consciousness. This sounds quite reasonable; however, it turns out that it is not entirely accurate.

According to predictive coding (which can be derived from the free energy principle), and based on experimental evidence, perception can be thought of as a two-way process. Information doesn’t just flow one way, from the environment (via the senses) to consciousness. Instead, our brain has a hierarchical generative model: higher layers of the model generate predictions for lower layers, and lower layers send prediction errors back to higher layers.

This diagram from Basic Mathematics of Predictive Coding should give a better idea of how the predictions and prediction errors flow through the biological neural network.
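To make the two-way flow concrete, here is a minimal sketch in Python. It is a toy with a single higher layer and a single number standing in for the sensory input, not the model from any of the tutorials linked above; the function name `infer` and the parameters `learning_rate` and `steps` are made up for illustration.

```python
def infer(sensory_input, prior_belief, learning_rate=0.1, steps=200):
    """Toy predictive coding loop: predictions go down, errors come up.

    Assumption: one higher layer holding a single scalar belief, and an
    identity mapping from belief to predicted input.
    """
    belief = prior_belief
    for _ in range(steps):
        prediction = belief                 # top-down: higher layer predicts the input
        error = sensory_input - prediction  # bottom-up: lower layer reports the mismatch
        belief += learning_rate * error     # higher layer revises its belief
    return belief


if __name__ == "__main__":
    # The belief starts at the prior (0.0) and is pulled toward the input (5.0).
    print(infer(sensory_input=5.0, prior_belief=0.0))
```

In this stripped-down version the belief always ends up matching the input, because nothing tells the higher layer how much to trust its own expectation versus the incoming data. That missing ingredient is the uncertainty discussed further down.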

How and why the meme works

Now that we have a rudimentary understanding of predictive coding, we can explain how the meme works. Most people have encountered the text ‘What if I told you’ far more often than ‘What I if told you’. Unlike the latter, the former is grammatically correct and makes sense. Therefore, even though the sensory input reads ‘What I if told you’, our brain’s higher layers overwrite this, leading us to perceive it as ‘What if I told you’.

This is a very simplified explanation, and it doesn’t address how we manage to read the sentence accurately after realizing we’ve made a mistake. Predictive coding offers an explanation for this, but I would prefer not to delve into it in this post. The intuitive answer is that higher layers don’t just make predictions about the data from lower layers; they also model its uncertainty. When you read the sentence a second time, you are paying more attention, so the uncertainty of the data from lower layers is low and the higher layers don’t overwrite it.

Here is a more accurate but still simplified schematic of hierarchical predictive coding in the cortex, from Hierarchical disruption in the Bayesian brain: Focal epilepsy and brain networks. It also includes the uncertainty aspect, shown as a precision signal.

Quote from Surfing Uncertainty: Prediction, Action, and the Embodied Mind

Attention, thus construed, is a means of variably balancing the potent interactions between top-down and bottom-up influences by factoring in their so-called ‘precision’, where this is a measure of their estimated certainty or reliability (inverse variance, for the statistically savvy). This is achieved by altering the weighting (the gain or ‘volume’, to use a common analogy) on the error units accordingly.
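The role of precision as a gain on the error units can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the model from the book or the papers mentioned above: the two readings of the meme are encoded as numbers (0.0 for the expected ‘What if I told you’, 1.0 for the text actually on the page), and the function `perceive` and all its parameters are hypothetical names.

```python
def perceive(prior_mean, prior_precision, sensory_mean, sensory_precision,
             rate=0.05, steps=200):
    """Settle on a belief by balancing two precision-weighted prediction errors.

    Precision is the inverse variance of a signal; here it acts as the gain
    on the corresponding error. Assumes rate * (sum of precisions) < 2 so
    that the simple update loop converges.
    """
    belief = prior_mean
    for _ in range(steps):
        sensory_error = sensory_precision * (sensory_mean - belief)  # bottom-up
        prior_error = prior_precision * (prior_mean - belief)        # top-down
        belief += rate * (sensory_error + prior_error)
    return belief


if __name__ == "__main__":
    # Skimming: low precision on the senses, so the expectation wins (close to 0.1).
    print(perceive(0.0, 9.0, 1.0, 1.0))
    # Careful second read: high precision on the senses, so the page wins (close to 0.9).
    print(perceive(0.0, 1.0, 1.0, 9.0))
```

The loop settles on the precision-weighted average of the expectation and the input, so turning up the gain on the sensory error (paying attention) is enough to stop the higher layer from overwriting what is actually on the page.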

If you are still not convinced about predictive coding, here is a 30-second video which might change your mind. You can find more examples like this in the book Surfing Uncertainty: Prediction, Action, and the Embodied Mind.
