
The Surprise. Why "better outcome than predicted" is rewarded

If the brain is a prediction machine, common sense says it should simply penalize errors in its predictions. So it seems rather counterintuitive that a prediction error is sometimes rewarded and treated as a positive "surprise".

However, research suggests that rewarding errors that lead to better-than-predicted outcomes can be beneficial.

This idea is applied successfully in reinforcement learning, where it helps keep the agent from getting trapped in a local optimum and encourages it to explore alternative ways of earning rewards.
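To make the "positive surprise" concrete, here is a minimal sketch of a prediction-error update. It assumes a simple delta rule; the names `update_value` and `learning_rate` are illustrative and not taken from any particular library. The error is positive when the outcome beats the prediction, and that raises the prediction for next time.

```python
def update_value(predicted, reward, learning_rate=0.1):
    """Return (new_prediction, prediction_error) after one observed reward.

    Assumption: a plain delta rule, used here only as an illustration.
    """
    # Prediction error: positive when the outcome is better than predicted,
    # negative when it is worse.
    error = reward - predicted
    # Move the prediction a fraction of the way toward the observed outcome.
    new_prediction = predicted + learning_rate * error
    return new_prediction, error


# Better than predicted: the error is positive and the prediction goes up.
better, surprise = update_value(predicted=0.5, reward=1.0)

# Worse than predicted: the error is negative and the prediction goes down.
worse, disappointment = update_value(predicted=0.5, reward=0.0)
```

In this sketch the same signal handles both cases: a pleasant surprise strengthens the expectation of reward, a disappointment weakens it.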

Finding an optimal strategy here runs into a fundamental problem known as the exploration-exploitation dilemma: how often the agent should try something new versus stick with what has already paid off.
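One common way to trade off the two is an epsilon-greedy rule: with a small probability the agent picks a random action, otherwise it picks the action it currently rates highest. The sketch below is an illustration under assumptions, not a definitive implementation: the two-armed "slot machine" setup and the names `run_bandit`, `payout_probs` and `epsilon` are hypothetical.

```python
import random


def run_bandit(payout_probs, epsilon, steps=5000, learning_rate=0.1, seed=0):
    """Play a multi-armed bandit with an epsilon-greedy agent.

    payout_probs: chance that each arm pays a reward of 1 (hypothetical setup).
    epsilon: probability of exploring a random arm on any given step.
    Returns (value_estimates, total_reward).
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(payout_probs)
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try any arm, regardless of current estimates.
            arm = rng.randrange(len(payout_probs))
        else:
            # Exploit: pick the arm with the highest estimate
            # (ties go to the lowest index).
            arm = max(range(len(estimates)), key=estimates.__getitem__)
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        # Update the estimate using the prediction error (reward - estimate).
        estimates[arm] += learning_rate * (reward - estimates[arm])
        total_reward += reward
    return estimates, total_reward


# The second arm pays off far more often, but a purely greedy agent
# never finds out because it never tries it.
greedy_estimates, greedy_total = run_bandit([0.2, 0.8], epsilon=0.0)
explorer_estimates, explorer_total = run_bandit([0.2, 0.8], epsilon=0.1)
```

With `epsilon=0.0` the agent in this sketch stays on the first arm forever, which is the local optimum described above. A small amount of exploration lets it discover the better arm, at the cost of occasionally wasting a step on a worse one.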