Predictive processing 101
The brain is not a stimulus-response machine. It is a prediction machine: it continuously issues top-down hypotheses about its sensory input and revises them only when a prediction error survives several levels of suppression.
The minimal sketch
- The cortex maintains a hierarchical generative model of the world (and of the body).
- Higher levels predict the activity of lower levels.
- Lower levels return only what was not predicted — the residual.
- Learning is the long-run minimisation of these residuals; perception is the short-run minimisation; action is “minimisation by changing the world to match the prediction”.
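The four bullets above can be sketched as a toy loop. This is a minimal illustration, not anyone's canonical model: it assumes a single scalar level, a linear prediction `w * mu`, squared error with unit precision, and a world that is sensed directly. All names (`predict`, `residual`, `perceive`, `learn`, `act`, `gain`) are hypothetical and chosen for readability.

```python
# Toy one-level predictive-processing loop (illustrative assumptions only:
# scalar belief mu, scalar model weight w, squared-error objective).

def predict(w, mu):
    """Top-down prediction of the lower level's activity."""
    return w * mu


def residual(obs, w, mu):
    """Bottom-up signal: only the part of obs that was not predicted."""
    return obs - predict(w, mu)


def perceive(obs, w, mu, lr=0.1, steps=50):
    """Short-run minimisation: update the belief mu, keep the model w fixed."""
    for _ in range(steps):
        err = residual(obs, w, mu)
        mu += lr * w * err  # gradient descent on 0.5 * err**2 w.r.t. mu
    return mu


def learn(obs, w, mu, lr=0.01):
    """Long-run minimisation: one slow gradient step on the model weight w."""
    err = residual(obs, w, mu)
    return w + lr * mu * err


def act(world, w, mu, gain=0.5):
    """Minimisation by action: push the world toward the prediction."""
    return world + gain * (predict(w, mu) - world)


if __name__ == "__main__":
    w, mu, world = 2.0, 0.0, 4.0
    mu = perceive(world, w, mu)      # belief settles so that w * mu matches world
    print("residual after perception:", residual(world, w, mu))
    world = 0.0                      # the world is perturbed away from the belief
    world = act(world, w, mu)        # action moves it back toward the prediction
    print("residual after one action step:", residual(world, w, mu))
```

The three update functions differ only in which variable they are allowed to change (`mu`, `w`, or `world`), which is the sense in which perception, learning, and action share one residual.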
This is the active inference framing (Karl Friston and friends). The free-energy principle says any self-organising system that maintains its boundary against a fluctuating world will behave as if it were minimising variational free energy.
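For reference, the quantity being minimised can be written out. The symbols are standard notation rather than anything defined in this note: o for observations, s for hidden states, p for the generative model, and q for the approximate posterior the system holds.

```latex
F[q] \;=\; \mathbb{E}_{q(s)}\!\left[\ln q(s) - \ln p(o, s)\right]
     \;=\; D_{\mathrm{KL}}\!\left[\,q(s)\,\|\,p(s \mid o)\,\right] \;-\; \ln p(o)
```

Because the KL term is never negative, F is an upper bound on surprise, -ln p(o). Lowering F therefore either pulls q toward the true posterior or makes the observations less surprising under the model.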
Why I keep coming back to it
- It dissolves the artificial split between perception, action, and attention. They become three regimes of the same prediction-error loop.
- It gives a clean, formal account of why boredom and novelty feel the way they do.
- It is computationally executable: you can write a small active-inference agent in PyTorch and watch it form habits, get curious, and panic when its model collapses.
Open questions I want to chew on
- Is consciousness a particular pattern in the prediction hierarchy (the global workspace as a high-precision attentional bottleneck), or orthogonal to it?
- How does this framework handle language, which is half motor act, half symbolic substrate?
- How seriously should we take the claim that LLMs are doing a related thing (sequence prediction in a learned generative model), and what is missing from that picture (embodiment, homeostasis)?
Cross-links: Attention as relation, not state; The loop and the self.