What does this actually mean, and what is the motivation for saying it?
Agent-ness being a leaky abstraction is not exactly a novel concept for Less Wrong; it has been touched upon several times, such as in Scott Alexander’s Blue-Minimizing Robot Sequence. At the same time, I do not think that it has been quite fully internalized yet, and that many foundational posts on LW go wrong due to being premised on the assumption of humans being agents. In fact, I would go as far as to claim that this is the biggest flaw of the original Sequences: they were attempting to explain many failures of rationality as being due to cognitive biases, when in retrospect it looks like understanding cognitive biases doesn’t actually make you substantially more effective. But if you are implicitly modeling humans as goal-directed agents, then cognitive biases is the most natural place for irrationality to emerge from, so it makes sense to focus the most on there.
This was what piqued my interest in reading, and what kept me going for five hours straight (!). But I didn't find what I was looking for. I did get a lot of other useful information, though. All in all, this is a wonderful, brilliant, deep and highly thought-provoking text. A lot of work and thought has gone into it. Kaj is surely one of the brightest minds around.
Now, I have no problem whatsoever with the first part of the passage quoted above. Of course the intentional agent is an abstraction, and as such of course it is leaky. My concern lies with the second part: It seems to suggest that viewing people as intentional agents is mistaken; too coarse a model; misleading. Which seems to lead to the conclusion that cognitive biases are the wrong way to characterize human thinking and behavior. Which suggests that they are not even real...
I may be overly trigger-happy here. I am not out to criticize Sotala himself, nor - as it turns out - any major part of what he actually has written in this sequence so far. It is just that I am currently (yet again) in a process of investigation and possible re-orientation of what I believe is best characterized as the latest round of the "rationality wars". I am currently reading Gerd Gigerenzer's book "Risk Savvy", in a long stretch of similar stuff (e.g. Mercier & Sperber), with the aim of trying to reconcile the seemingly opposing sides in an ever ongoing battle for the right to define "rationality".
I am a long-time fan of Kahneman (and Dennett). It may be self-delusion on my part, but much of the criticism leveled at him and others seems to me either plain wrong, ideologically motivated, or mistaken. The more I read, the more I get the feeling that my intuitive interpretation of Kahneman and others does not need updating; rather, it is his critics who either straw-man him or just do not have the whole picture.
Sotala promises to tell me why the biases and fallacies school of thought is lacking. But I just don't see it.
I would go as far as to claim that this is the biggest flaw of the original Sequences: they were attempting to explain many failures of rationality as being due to cognitive biases, when in retrospect it looks like understanding cognitive biases doesn’t actually make you substantially more effective. But if you are implicitly modeling humans as goal-directed agents, then cognitive biases is the most natural place for irrationality to emerge from, so it makes sense to focus the most on there.
Just knowing that an abstraction leaks isn’t enough to improve your thinking, however. To do better, you need to know about the actual underlying details to get a better model. In this sequence, I will aim to elaborate on various tools for thinking about minds which look at humans in more granular detail than the classical agent model does. Hopefully, this will help us better get past the old paradigm.
There is a sense in which I get this: Higher-level abstractions trade accuracy for expediency, yes. And sometimes you need to go down an explanatory level or two, depending on your goals. But when it comes to explaining human decision making, or how humans view themselves and others and the society that emerges from their interaction, or why this is the case, or how and when problems and contradictions occur, or what to do about it... Well, I just don't see the need to shed the intentional stance or to deconstruct it. (Apart from convincing people that they are usually MoreWrong than they think.)
My main question when reading the above was - and still is: Are we talking descriptively, prescriptively, or normatively?
Mercier & Sperber, for instance, accuse Kahneman of assuming a logical, but flawed, human psyche. Rationality to them seems to mean a description of what people are, what they have evolved into - even if the process is incomplete. They then go on to sarcastically point out that there is no evolutionary reason to expect people to be logical inference machines. At the same time they redefine rationality to mean "socially flexible and pragmatic" rather than logical, and to contend that that is exactly what people are - so stop shaming them for not being able to solve logical puzzles. Also, they and Gigerenzer and others go on to say: "Oh, and by the way, people are quite adept at logic, statistics and probabilities - if you just stop tricking them!"
To me, this is highly confused. Man is not the measure of everything. Rationality, meaning logic, statistical thinking, utility maximization, etc., is a cultural invention, a norm, a standard to which we aspire - and should aspire. The fact that we have a hard time living up to those ideals is an observation of fact, and there are plenty of good reasons why this is the case. But it is equally obvious that we should do whatever we can to get better at it. We can't (yet) re-engineer ourselves, so we need to work on education, societal structures, political systems etc.
Gigerenzer thinks Sunstein is an autocrat who doesn't trust people to know their own good. I think that Sunstein is way too libertarian.
---
One line of evidence for this is subliminal priming experiments, not to be confused with the controversial “social priming” effects in social psychology; unlike those effects, these kinds of priming experiments are well-defined and have been replicated many times.
Is there a difference? Sotala never uses the term, but I constantly think of associative networks (and perceptrons). Priming is potential for spreading activation. Priming is priming, however mixed and exaggerated the results from sloppy social-psychology experiments have been.
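What I have in mind is roughly the toy sketch below: spreading activation over an associative network. The network, the decay constant and the word list are all made up by me; this is not anything from the sequence, just what I mean by the term:

```python
# Priming a node raises its activation, and activation leaks to associated nodes,
# making them easier to retrieve for a while. That is all "priming" needs to mean here.
DECAY = 0.5  # fraction of activation passed along each link per step (hypothetical)

associations = {           # a tiny associative network
    "doctor": ["nurse", "hospital"],
    "nurse": ["doctor", "needle"],
    "hospital": ["needle"],
    "needle": [],
}

def spread(prime: str, steps: int = 2) -> dict:
    activation = {node: 0.0 for node in associations}
    activation[prime] = 1.0
    for _ in range(steps):
        new = dict(activation)
        for node, level in activation.items():
            for neighbour in associations[node]:
                new[neighbour] += DECAY * level
        activation = new
    return activation

print(spread("doctor"))
# "nurse" ends up more activated than "needle": closely associated words are
# primed more strongly than distant ones.
```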
---
First, in order for the robot to take physical actions, the intent to do so has to be in its consciousness for a long enough time for the action to be taken. If there are any subagents that wish to prevent this from happening, they must muster enough votes to bring into consciousness some other mental object replacing that intention before it’s been around for enough time-steps to be executed by the motor system. (This is analogous to the concept of the final veto in humans, where consciousness is the last place to block pre-consciously initiated actions before they are taken.)
Oh, oh, oh! Veto without a libertarian prime mover. Yes! This resolves the tension I experienced when reading Patrik Lindenfors' speculations on free will in his new book "The Cultural Animal". Libet experiments should measure several different signals simultaneously.
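To check that I follow the voting-and-veto mechanism, here is a minimal toy sketch of it in code. This is my own construction, not Sotala's actual model; the delay constant, the subagents and their bid values are all made up for illustration:

```python
# A toy global workspace: subagents bid on mental objects, and an intention must
# survive in the workspace for ACTION_DELAY consecutive time-steps before the
# motor system executes it. Anything that outbids it in the meantime acts as a veto.
from dataclasses import dataclass
from typing import Callable, List

ACTION_DELAY = 3  # hypothetical number of time-steps before execution

@dataclass
class MentalObject:
    content: str
    is_intention: bool = False

@dataclass
class Subagent:
    name: str
    bid: Callable[[MentalObject], float]  # how strongly this subagent votes for an object

def run_workspace(subagents: List[Subagent], candidates: List[MentalObject], steps: int = 10) -> str:
    current, age = None, 0
    for _ in range(steps):
        # The mental object with the most total support occupies consciousness.
        winner = max(candidates, key=lambda m: sum(a.bid(m) for a in subagents))
        if winner is current:
            age += 1
        else:
            current, age = winner, 1  # the previous occupant was displaced ("vetoed")
        if current.is_intention and age >= ACTION_DELAY:
            return f"motor system executes: {current.content}"
    return "no action taken"

# Example: a curiosity-driven intention competes with a distraction pushed by a manager.
poke = MentalObject("poke the stove", is_intention=True)
laundry = MentalObject("remember to check the laundry")
curiosity = Subagent("curiosity", lambda m: 1.0 if m is poke else 0.0)
manager = Subagent("manager", lambda m: 1.5 if m is laundry else 0.0)
print(run_workspace([curiosity, manager], [poke, laundry]))  # -> no action taken
```

The point is that the "final veto" falls out of ordinary competition for the workspace: nothing has to explicitly say no, something just has to outbid the intention before the delay runs out.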
---
Second, the different subagents do not see each other directly: they only see the consequences of each other’s actions, as that’s what’s reflected in the contents of the workspace. In particular, the self-narrative agent has no access to information about which subagents were responsible for generating which physical action. It only sees the intentions which preceded the various actions, and the actions themselves. Thus it might easily end up constructing a narrative which creates the internal appearance of a single agent, even though the system is actually composed of multiple subagents.
Oh! Self-serving bias, confabulation, FAE, myside bias... But what is the difference in practice? It is still a case of self-deception.
---
Third, even if the subagents can’t directly see each other, they might still end up forming alliances. For example, if the robot is standing near the stove, a curiosity-driven subagent might propose poking at the stove (“I want to see if this causes us to burn ourselves again!”), while the default planning system might propose cooking dinner, since that’s what it predicts will please the human owner. Now, a manager trying to prevent a fear model agent from being activated will eventually learn that if it votes for the default planning system’s intentions to cook dinner (which it saw earlier), then the curiosity-driven agent is less likely to get its intentions into consciousness. Thus, no poking at the stove, and the manager’s and the default planning system’s goals end up aligned.
Fourth, this design can make it really difficult for the robot to even become aware of the existence of some managers. A manager may learn to support any other mental processes which block the robot from taking specific actions. It does it by voting in favor of mental objects which orient behavior towards anything else. This might manifest as something subtle, such as a mysterious lack of interest towards something that sounds like a good idea in principle, or just repeatedly forgetting to do something, as the robot always seems to get distracted by something else. The self-narrative agent, not having any idea of what’s going on, might just explain this as “Robby the Robot is forgetful sometimes” in its internal narrative.
Ah! Dunning-Kruger, ignorance, witness psychology, unwarranted self-assurance... But how does this differ from the intentional-agent and biases-and-fallacies perspective?
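Both excerpts describe the same trick: a manager that never observes the other subagents directly can still learn, from outcomes alone, which mental objects are worth supporting. A toy sketch of such a learning rule, again my own and with made-up constants and names:

```python
# A "manager" that only sees (a) which mental object won the workspace and
# (b) whether the fear model activated afterwards, and adjusts its support so
# that fear-preceding objects get crowded out. No subagent is ever observed directly.
import random
from collections import defaultdict

LEARNING_RATE = 0.2  # hypothetical

class Manager:
    def __init__(self) -> None:
        self.support = defaultdict(float)  # bid strength per mental object

    def bid(self, mental_object: str) -> float:
        return self.support[mental_object]

    def update(self, winning_object: str, fear_activated: bool) -> None:
        reward = 0.0 if fear_activated else 1.0  # reward = the fear model stayed quiet
        self.support[winning_object] += LEARNING_RATE * (reward - self.support[winning_object])

manager = Manager()
for _ in range(200):
    # Crude stand-in for the world: stove-poking triggers fear, cooking dinner does not.
    winner = random.choice(["cook dinner", "poke the stove"])
    manager.update(winner, fear_activated=(winner == "poke the stove"))

print(dict(manager.support))
# "cook dinner" ends up with high support, "poke the stove" near zero: the manager
# has allied itself with the planning system without ever knowing it exists.
```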
---
Fifth, the default planning subagent here is doing something like rational planning, but given its weak voting power, it’s likely to be overruled if other subagents disagree with it (unless some subagents also agree with it). If some actions seem worth doing, but there are managers which are blocking them and the default planning subagent doesn’t have an explicit representation of them, this can manifest as all kinds of procrastinating behaviors and numerous failed attempts for the default planning system to “try to get itself to do something”, using various strategies. But as long as the managers keep blocking those actions, the system is likely to remain stuck.
Aha! Akrasia, "irrationality" in the sense of not living up to the homo economicus template, etc...
---
Sixth, the purpose of both managers and firefighters is to keep the robot out of a situation that has been previously designated as dangerous. Managers do this by trying to pre-emptively block actions that would cause the fear model agent to activate; firefighters do this by trying to take actions which shut down the fear model agent after it has activated. But the fear model agent activating is not actually the same thing as being in a dangerous situation. Thus, both managers and firefighters may fall victim to Goodhart’s law, doing things which block the fear model while being irrelevant for escaping catastrophic situations.
But this is missing an evolutionary perspective (which Sotala brings up much later) outside of the individual agent. Systems that are reasonably well adjusted beget offspring with pre-installed settings that also work reasonably well (as long as the environment doesn't change too much).
Goodhart! Yes! Isn't that the perfect summation of every bias in the book!?
It's a (too) tall order to get people to change their evolved picture of themselves and others, from intentional agents to more or less coordinated subsystems.
Normatively, too, we want to act, judge and be judged as intentional agents.
---
Exiles are said to be parts of the mind which hold the memory of past traumatic events, which the person did not have the resources to handle. They are parts of the psyche which have been split off from the rest and are frozen in time at the moment of the traumatic event. When something causes them to surface, they tend to flood the mind with pain. For example, someone may have an exile associated with times when they were romantically rejected in the past.
IFS further claims that you can treat these parts as something like independent subpersonalities. You can communicate with them, consider their worries, and gradually persuade managers and firefighters to give you access to the exiles that have been kept away from consciousness. When you do this, you can show them that you are no longer in the situation which was catastrophic before, and now have the resources to handle it if something similar was to happen again. This heals the exile, and also lets the managers and firefighters assume better, healthier roles.
Very Freudian! Both in a good sense and in a bad one. (And I suspect that the IFS crowd really longs for a true Self - which is exactly what there isn't!)
---
In my earlier post, I remarked that you could view language as a way of joining two people’s brains together. A subagent in your brain outputs something that appears in your consciousness, you communicate it to a friend, it appears in their consciousness, subagents in your friend’s brain manipulate the information somehow, and then they send it back to your consciousness.
If you are telling your friend about your trauma, you are in a sense joining your workspaces together, and letting some subagents in your workspace communicate with the “sympathetic listener” subagents in your friend’s workspace. So why not let a “sympathetic listener” subagent in your workspace hook up directly with the traumatized subagents that are also in your own workspace?
Yeah... This is what Mercier & Sperber get right - social cognition. But it is a bit too idealized a picture of communication with others. There is a lot of "pollution" in those exchanges... Even internal monologues are polluted by irrelevant concerns and noise.
---
Instead of remaining blended, you then use various unblending / cognitive defusion techniques that highlight the way by which these thoughts and emotions are coming from a specific part of your mind. You could think of this as wrapping extra content around the thoughts and emotions, and then seeing them through the wrapper (which is obviously not-you), rather than experiencing the thoughts and emotions directly (which you might experience as your own).
...when I became aware of how much time I spent on useless rumination while on walks, I got frustrated. And this seems to have contributed to making me ruminate less: as the system’s actions and their overall effect were metacognitively represented and made available for the system’s decision-making, this had the effect of the system adjusting its behavior to tune down activity that was deemed useless.
Creativity? Eureka moments? Openness to new impressions? (This is discussed later.)
---
Similarly, several of the experiments which get people to exhibit incoherent behavior rely on showing different groups of people different formulations of the same question, and then indicating that different framings of the same question get different answers from people. It doesn’t work quite as well if you show the different formulations to the same people, because then many of them will realize that differing answers would be inconsistent.
This is the point of contention in the rationality wars!
---
The original question which motivated this section was: why are we sometimes incapable of adopting a new habit or abandoning an old one, despite knowing that to be a good idea? And the answer is: because we don’t know that such a change would be a good idea. Rather, some subsystems think that it would be a good idea, but other subsystems remain unconvinced. Thus the system’s overall judgment is that the old behavior should be maintained.
Yees! But normatively, we can know that something is better, while emotionally we do not experience it that way. This is what a bias is!
---
Nevertheless, a fundamental problem remains: at any point in time, which mode should be allowed to control which component of a task? Daw et al. have used a computational approach to address this problem. Their analysis was based on the recognition that goal-directed responding is flexible but slow and carries comparatively high computational costs as opposed to the fast but inflexible habitual mode. They proposed a model in which the relative uncertainty of predictions made by each control system is tracked. In any situation, the control system with the most accurate predictions comes to direct behavioural output.
Note those last sentences: besides the subsystems making their own predictions, there might also be a meta-learning system keeping track of which other subsystems tend to make the most accurate predictions in each situation, giving extra weight to the bids of the subsystem which has tended to perform the best in that situation. We’ll come back to that in future posts.
Automatic vs controlled processes (system 1 and 2). Again, a tall order to transition from the former to the latter. Energy conservation. But also, built-in inertia to avoid paralysis (see Minsky quote):
"Human self-control is no simple skill, but an ever-growing world of expertise that reaches into everything we do. Why is it that, in the end, so few of our self-incentive tricks work well? Because, as we have seen, directness is too dangerous. If self-control were easy to obtain, we'd end up accomplishing nothing at all."
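Going back to Daw et al.: the arbitration idea can be sketched in a few lines. This is my own simplification, not their actual model; the real proposal tracks uncertainty with far more sophistication, but the skeleton is the same:

```python
# Toy arbitration between a habitual and a goal-directed controller: each keeps a
# running estimate of its own prediction error per situation, and the controller
# whose predictions have been more reliable in that situation directs behaviour.
from collections import defaultdict

DECAY = 0.9  # how slowly old prediction errors are forgotten (hypothetical)

class Controller:
    def __init__(self, name: str) -> None:
        self.name = name
        self.uncertainty = defaultdict(lambda: 1.0)  # per-situation running error

    def observe(self, situation: str, prediction_error: float) -> None:
        old = self.uncertainty[situation]
        self.uncertainty[situation] = DECAY * old + (1 - DECAY) * abs(prediction_error)

def arbitrate(situation: str, controllers):
    # The controller with the lowest tracked uncertainty gets control.
    return min(controllers, key=lambda c: c.uncertainty[situation])

habitual, goal_directed = Controller("habitual"), Controller("goal-directed")

# Pretend the habitual system predicts the familiar kitchen well but fails in traffic.
for _ in range(20):
    habitual.observe("kitchen", 0.1)
    habitual.observe("traffic", 2.0)
    goal_directed.observe("kitchen", 0.5)
    goal_directed.observe("traffic", 0.5)

print(arbitrate("kitchen", [habitual, goal_directed]).name)   # habitual
print(arbitrate("traffic", [habitual, goal_directed]).name)   # goal-directed
```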
---
When there is significant uncertainty, the brain seems to fall back to those responses which have worked the best in the past - which seems like a reasonable approach, given that intelligence involves hitting tiny targets in a huge search space, so most novel responses are likely to be wrong.
Also over evolutionary time, over generations. Biases as hard-coded patterns which have previously constituted the best compromise.
---
...positive or negative moods tend to be related to whether things are going better or worse than expected, and suggest that mood is a computational representation of momentum, acting as a sort of global update to our reward expectations.
Yeeesss!!!
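My reading of that one-liner, as a toy sketch (mine, with made-up constants): mood is a slowly decaying running total of recent reward prediction errors, and it globally colors how new rewards feel, which in turn feeds back into expectations:

```python
# Mood as momentum, toy version: a run of better-than-expected outcomes lifts mood,
# a good mood makes subsequent rewards feel better, and expectations update on the
# felt (not the raw) reward.
MOOD_DECAY = 0.9       # how slowly mood changes (hypothetical)
MOOD_GAIN = 0.5        # how strongly mood colors perceived reward (hypothetical)
LEARNING_RATE = 0.1

expectation, mood = 0.0, 0.0
for actual_reward in [1, 1, 1, 5, 5, 5, 0, 0, 0]:   # things go better, then worse
    perceived = actual_reward + MOOD_GAIN * mood    # good mood inflates the reward
    error = perceived - expectation                 # reward prediction error
    mood = MOOD_DECAY * mood + (1 - MOOD_DECAY) * error
    expectation += LEARNING_RATE * error
    print(f"reward={actual_reward} mood={mood:+.2f} expectation={expectation:.2f}")
```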
---
So to repeat the summary that I had in the beginning: we are capable of changing our behaviors on occasions when the mind-system as a whole puts sufficiently high probability on the new behavior being better, when the new behavior is not being blocked by a particular highly weighted subagent (such as an IFS protector whose bids get a lot of weight) that puts high probability on it being bad, and when we have enough slack in our lives for any new behaviors to be evaluated in the first place. Akrasia is subagent disagreement about what to do.
This is perfectly in line with the bias perspective.
---
Likewise, the subagent frame seems most useful when a person’s goals interact in such a way that applying the intentional stance - thinking in terms of the beliefs and goals of the individual subagents - is useful for modeling the overall interactions of the subagents.
Confusing. Wasn't the whole point to question the intentional agent, the system as a whole, the unmoved mover, the green man at the center of it all?
---
More generally, subagents may be incentivized to resist belief updating for at least three different reasons (this list is not intended to be exhaustive):
1. The subagent is trying to pursue or maintain a goal, and predicts that revising some particular belief would make the person less motivated to pursue or maintain the goal.
2. The subagent is trying to safeguard the person’s social standing, and predicts that not understanding or integrating something will be safer, give the person an advantage in negotiation, or be otherwise socially beneficial. For instance, different subagents holding conflicting beliefs allows a person to verbally believe in one thing while still not acting accordingly - even actively changing their verbal model so as to avoid falsifying the invisible dragon in the garage.
3. Evaluating a belief would require activating a memory of a traumatic event that the belief is related to, and the subagent is trying to keep that memory suppressed as part of an exile-protector dynamic.
Reminds me of Omohundro's thesis on goal-preservation. (And Olle's problematization of the same...)
---
Suppose that a disease, or a monster, or a war, or something, is killing people. And suppose you only have enough resources to implement one of the following two options:
1. Save 400 lives, with certainty.
2. Save 500 lives, with 90% probability; save no lives, 10% probability.
Most people choose option 1. [...] If you present the options this way:
1. 100 people die, with certainty.
2. 90% chance no one dies; 10% chance 500 people die.
Then a majority choose option 2. Even though it's the same gamble. You see, just as a certainty of saving 400 lives seems to feel so much more comfortable than an unsure gain, so too, a certain loss feels worse than an uncertain one.
In my previous post, I presented a model where subagents which are most strongly activated by the situation are the ones that get access to the motor system. If you are hungry and have a meal in front of you, the possibility of eating is the most salient and valuable feature of the situation. As a result, subagents which want you to eat get the most decision-making power. On the other hand, if this is a restaurant in Jurassic Park and a velociraptor suddenly charges through the window, then the dangerous aspects of the situation become most salient. That lets the subagents which want you to flee get the most decision-making power.
Eliezer’s explanation of the saving lives dilemma is that in the first framing, the certainty of saving 400 lives is salient, whereas in the second explanation the certainty of losing 100 lives is salient. We can interpret this in similar terms as the “eat or run” dilemma: the action which gets chosen, depends on which features are the most salient and how those features activate different subagents (or how those features highlight different priorities, if we are not using the subagent frame).
Suppose that you are someone who was tempted to choose option 1 when you were presented with the first framing, and option 2 when you were presented with the second framing. It is now pointed out to you that these are actually exactly equivalent. You realize that it would be inconsistent to prefer one option over the other just depending on the framing. Furthermore, and maybe even more crucially, realizing this makes both the “certainty of saving 400 lives” and “certainty of losing 100 lives” features become equally salient. That puts the relevant subagents (priorities) on more equal terms, as they are both activated to the same extent.
What happens next depends on what the relative strengths of those subagents (priorities) are otherwise, and whether you happen to know about expected value. Maybe you consider the situation and one of the two subagents (priorities) happens to be stronger, so you decide to consistently save 400 or consistently lose 100 lives in both situations. Alternatively, the conflicting priorities may be resolved by introducing the rule that “when detecting this kind of a dilemma, convert both options into an expected value of lives saved, and pick the option with the higher value”.
By converting the options to an expected value, one can get a basis by which two otherwise equal options can be evaluated and chosen between. Another way of looking at it is that this is bringing in a third kind of consideration/subagent (knowledge of the decision-theoretically optimal decision) in order to resolve the tie.
1. 400 survivors is not interpreted as 100 deaths, but rather as "at most 100 deaths".
2. This is Gigerenzer's schtick: "We don't really have any biases. It's just a question of presenting or rephrasing situations so that it becomes obvious how to deal with them." But it is precisely the fact that this is not done which constitutes the bias! (That, and the fact that we don't even experience any need to rephrase the situation.)
3. What is the rationale for preferring expected utility over, say, a sure positive? How does one resolve that conflict, before and after the choice? To oneself? To others?
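For reference, here is the arithmetic that the expected-value rule in the quoted passage relies on; the numbers are the ones from the dilemma itself, I have only restated them:

```python
# Expected value of the two options, in both framings, with exact arithmetic.
from fractions import Fraction

p = Fraction(9, 10)   # probability of the good outcome in the gamble
total = 500           # people at risk

# "Saving lives" framing: expected number of lives saved
ev_save_certain = 400                    # option 1: save 400 with certainty
ev_save_gamble = p * 500 + (1 - p) * 0   # option 2: 9/10 * 500 = 450

# "Deaths" framing: expected number of deaths (the very same gamble, restated)
ev_die_certain = 100                     # option 1: 100 die with certainty
ev_die_gamble = p * 0 + (1 - p) * 500    # option 2: 1/10 * 500 = 50

assert total - ev_save_certain == ev_die_certain   # the two framings really are equivalent
assert total - ev_save_gamble == ev_die_gamble
print(ev_save_gamble, ev_die_gamble)   # 450 50: the gamble wins on expected lives saved
```

Whether that is the right rule to adopt is, of course, exactly the question in point 3.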
---
The structure of the “parking ticket” and “cheque” scenarios is equivalent, in that in both cases you can take an action to be $90 better off after 30 days. If you notice this, then it may be possible for you to re-interpret the action of paying off the parking ticket as something that gains you money, maybe by something like literally looking at it and imagining it as a cheque that you can cash in, until cashing it in starts feeling actively pleasant.
No. In one case, you lose something, or end up owing something that you may not even have. You wouldn't survive if you had to give away your food. In the other case, you go from surviving as usual to receiving a windfall, an extra bonus. This is exactly the kind of ill-conceived homo economicus rationality that even the economists have abandoned.
---
Reading through my notes, I am starting to wonder if what you're really saying is this: "There is no man in the middle, no unmoved mover, no central control to which we can ascribe beliefs and desires, or hold accountable; who is the author of our destiny, the locus of our (free) will."
And of course I agree.
Maybe your point is that viewing ourselves and others as intentional agents create or reinforce these misconceptions; that we need to understand that we *don't* actually have good reasons for thinking, feeling and doing what we do. That to humble ourselves, we need to understand that the self, the agent, is a figment of our imagination, an illusion to explain our subconscious elephant to our translucent rider...
And I agree.
But still: The best way to summarize the totality is the intentional agent. Maybe this is the reason why I am confused: I have always, ever since childhood, been perfectly onboard with a super-cynical view of people as biological contraptions, recently endowed with (an experience of) (self-)awareness; trying to make sense of the (apparent) voices inside our heads.
The intentional agent is a big improvement over many previous centuries of an over-inflated sense of importance. It is a description of how we have evolved to navigate in the world and coordinate with other moving objects. It is an "as if"-model. Nothing more. This is blatantly obvious to me.
The biases-and-fallacies paradigm serves as an educational device for convincing people who think that we know what we (and others around us) are doing that, in fact, we don't. Or at least, that our guesses are just that: shortcuts that try to minimize catastrophic failures in a maximum number of (familiar) situations.