Predictive processing 101

Planted 2026-05-02 seedling · 2 min read

#neuroscience #philosophy #ai #free-energy

The brain is not a stimulus-response machine. It is a prediction machine that issues continuous top-down hypotheses about its sensory input and revises them only when prediction error survives several levels of suppression.

The minimal sketch

The cortex maintains a hierarchical generative model of the world (and of the body).
Higher levels predict the activity of lower levels.
Lower levels return only what was not predicted — the residual.
Learning is the long-run minimisation of these residuals; perception is the short-run minimisation; action is “minimisation by changing the world to match the prediction”.
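
A minimal sketch of the first two regimes, under heavy assumptions: one latent cause, one scalar observation, a linear generative model `observation ≈ weight * latent`, and unit precisions everywhere. The names `perceive`, `learn`, `prior_mean` and all step sizes are mine, chosen for illustration; nothing here is meant as the canonical scheme.

```python
"""Toy one-level predictive coding loop (illustrative assumptions throughout)."""


def perceive(observation, weight, prior_mean, steps=50, rate=0.1):
    """Short-run minimisation: settle the latent estimate for one observation."""
    latent = prior_mean
    for _ in range(steps):
        sensory_error = observation - weight * latent  # residual sent upward
        prior_error = latent - prior_mean              # mismatch with the higher-level prediction
        # Gradient descent on 0.5 * (sensory_error**2 + prior_error**2) w.r.t. latent.
        latent += rate * (weight * sensory_error - prior_error)
    return latent


def learn(observations, weight=0.5, prior_mean=1.0, epochs=200, rate=0.1):
    """Long-run minimisation: slowly adjust the generative weight itself."""
    for _ in range(epochs):
        for obs in observations:
            latent = perceive(obs, weight, prior_mean)
            sensory_error = obs - weight * latent
            # Gradient descent on 0.5 * sensory_error**2 w.r.t. weight.
            weight += rate * sensory_error * latent
    return weight


if __name__ == "__main__":
    learned = learn([2.0, 2.0, 2.0])
    print("learned weight:", learned)
```

Perception is the fast inner loop and learning the slow outer one; both descend the same squared-error objective. In this toy setting the residual vanishes when `weight` approaches `observation / prior_mean`, so with the inputs above the weight should settle close to 2.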

This is the active inference framing (Karl Friston and friends). The free-energy principle says any self-organising system that maintains its boundary against a fluctuating world will behave as if it were minimising variational free energy.
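
For reference, the quantity being minimised is usually written as below. The notation is my choice, not something fixed by this note: o for observations, s for hidden states, q(s) for the system's approximate posterior, and p(o, s) for its generative model.

```latex
\begin{aligned}
F[q] &= \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big] \\
     &= D_{\mathrm{KL}}\big[\,q(s)\,\|\,p(s \mid o)\,\big] - \ln p(o) \\
     &\ge -\ln p(o)
\end{aligned}
```

Because the KL term is non-negative, F is an upper bound on surprise, -ln p(o). Lowering F therefore either improves the posterior estimate (perception) or makes observations less surprising (action and learning).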

Why I keep coming back to it

It dissolves the artificial split between perception, action, and attention. They become three regimes of the same prediction-error loop.
It gives a clean, formal account of why boredom and novelty feel the way they do.
It is computationally executable: you can write a small active-inference agent in PyTorch and watch it form habits, get curious, and panic when its model collapses.
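
To give the "computationally executable" point something concrete, here is a sketch of only the action-selection step, in plain Python rather than PyTorch so it runs without dependencies. It assumes a discrete world and scores each action by a one-step expected free energy, split into risk plus ambiguity. The two-state "cold/warm" world, every probability in it, and the function names are hypothetical. There is no learning, no multi-step policy and no habit formation here, so this is far short of the full agent described above.

```python
import math


def predict_states(belief, transition):
    """q(s') = sum_s belief[s] * transition[s][s'] for one action."""
    return [
        sum(belief[s] * transition[s][s2] for s in range(len(belief)))
        for s2 in range(len(transition[0]))
    ]


def expected_free_energy(state_probs, likelihood, preferred_obs):
    """Risk + ambiguity for one action.

    likelihood[s][o] is p(o | s); preferred_obs must be strictly positive.
    """
    predicted_obs = [
        sum(state_probs[s] * likelihood[s][o] for s in range(len(state_probs)))
        for o in range(len(preferred_obs))
    ]
    # Risk: KL divergence from predicted observations to preferred ones.
    risk = sum(
        q * math.log(q / p) for q, p in zip(predicted_obs, preferred_obs) if q > 0
    )
    # Ambiguity: expected entropy of p(o | s) under the predicted states.
    ambiguity = sum(
        state_probs[s] * -sum(p * math.log(p) for p in likelihood[s] if p > 0)
        for s in range(len(state_probs))
    )
    return risk + ambiguity


def choose_action(belief, transitions, likelihood, preferred_obs):
    """Return the index of the lowest-scoring action, plus all scores."""
    scores = [
        expected_free_energy(predict_states(belief, t), likelihood, preferred_obs)
        for t in transitions
    ]
    return min(range(len(scores)), key=scores.__getitem__), scores


# Hypothetical world: state 0 = cold, state 1 = warm; observations mirror states.
LIKELIHOOD = [[0.9, 0.1], [0.1, 0.9]]
PREFERRED_OBS = [0.1, 0.9]             # the agent expects to feel warm
TRANSITIONS = [
    [[1.0, 0.0], [0.0, 1.0]],          # action 0: stay put
    [[0.2, 0.8], [0.2, 0.8]],          # action 1: move towards warmth
]

if __name__ == "__main__":
    action, scores = choose_action([1.0, 0.0], TRANSITIONS, LIKELIHOOD, PREFERRED_OBS)
    print("chosen action:", action, "scores:", scores)
```

An agent that believes it is cold scores "move" lower than "stay", so it moves: the preference over observations does the work a reward function would do elsewhere.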

Open questions I want to chew on

Is consciousness a particular pattern in the prediction hierarchy (the global workspace as a high-precision attentional bottleneck), or orthogonal to it?
How does this framework handle language, which is half motor act, half symbolic substrate?
How seriously should we take the claim that LLMs are doing a related thing, namely sequence prediction in a learned generative model, and what is missing from that picture (embodiment, homeostasis)?

Cross-links: Attention as relation, not state, The loop and the self.