Mere Goodness U: Fake Preferences
Human desires include preferences for how the world is, not just for how they think the world is or how happy they are. (Not For The Sake Of Happiness Alone) People who claim their preferences reduce to a single principle have some other process by which they choose what they want, and then find a rationalisation for how what they want is justified by that principle. (Fake Selfishness) Simple utility functions fail to compress our values, and we suffer from anthropomorphic optimism about what they suggest. (Fake Utility Functions)
People who fear that humans would lack morality without an external threat regard this as bad rather than liberating. This means they like morality, and aren't just forced to abide by it. (Fake Morality)
The detached lever fallacy is the assumption that actions that trigger behaviour from one entity will trigger it from another, without any reason to think the mechanics governing the reaction are present in the second. The actions that make a human compassionate will not make a non-human AI so. (Detached Lever Fallacy) AI design is reducing the mental to the non-mental. Models of an intelligence which can’t predict what it will do other than by analogy to a human are incomplete. (Dreams Of AI Design) The space of possible minds is extremely large. Resist the temptation to generalise over all of mind design space. (The Design Space Of Minds-In-General)
Mere Goodness V: Value Theory
Justifying any belief leads to infinite regress. Rather than accepting any assumption, we should reflect on our mind’s trustworthiness using our current mind as best we can, and accept that. (Where Recursive Justification Hits Bottom) Approach such questions from the standpoint of whether we should want ourselves or an AI using similar principles to change how they choose beliefs. We should focus on improvement, not justification, and expect to change our minds. Don’t exalt consistency in itself, but effectiveness. Separate asking “why” an approach works from whether it “does”. We should reason about our own mind the way we do about the rest of the world, and use all available information. (My Kind Of Reflection)
There are no arguments compelling to all possible minds. For any system that processes information and reaches a conclusion, there is a system with inverted output that reaches the opposite conclusion. This applies to moral conclusions, and it holds regardless of the intelligence of the system. (No Universally Compelling Arguments, Sorting Pebbles Into Correct Heaps) A mind must already have a process that adds beliefs, and a process that acts, or no argument can convince it to believe or act. (Created Already In Motion)
Some properties can be thought of either as a function taking two parameters and giving a result, or as a space of one-parameter functions, with different people using different ones. For example, 'attractiveness(admirer, admired) -> result' vs 'attractiveness_1…9999(admired) -> result'. Currying specifies that a two-parameter function is equivalent to a one-parameter function returning another function, which unifies these views. For example, 'attractiveness(admirer) -> attractiveness_712(admired) -> result'. This reflects both that a measure can be judged independently of who uses it, and that the measure used varies from person to person. (2-Place And 1-Place Words)
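A minimal Python sketch of this distinction (not from the essay; the names, traits, and weights are purely illustrative) showing that the 2-place function and its curried 1-place form compute the same thing:

```python
# Illustrative sketch: "attractiveness" as a 2-place function vs its curried form.
# All names and numbers here are made up for illustration.

def attractiveness(admirer, admired):
    """2-place view: one function of both the admirer and the admired."""
    return sum(admirer["weights"].get(trait, 0) * score
               for trait, score in admired["traits"].items())

def curried_attractiveness(admirer):
    """Curried view: fixing the admirer yields a 1-place function of the admired."""
    def attractiveness_for_this_admirer(admired):
        return attractiveness(admirer, admired)
    return attractiveness_for_this_admirer

fred = {"weights": {"wit": 2, "kindness": 1}}
alice = {"traits": {"wit": 3, "kindness": 5}}

# "attractiveness_fred" is one member of a whole space of 1-place measures,
# but it is just the 2-place function with its first argument fixed.
attractiveness_fred = curried_attractiveness(fred)
assert attractiveness(fred, alice) == attractiveness_fred(alice) == 11
```

Either view is workable; the curried form just makes explicit that the measure is relative to a particular admirer while still being a definite function.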
If your moral framework is shown to be invalid, you can still choose to act morally anyway. (What Would You Do Without Morality?) It's important to have a line of retreat in order to be able to seriously review your metaethics. (Changing Your Metaethics) You must start from a willingness to evaluate in terms of your moral intuition in order to find valid metaethics. (Could Anything Be Right?) What we consider to be right grows out of a starting point. A system that specifies what is right must fit that starting point, which we cannot define fully. (Morality As Fixed Computation) The concepts we develop to describe good behaviour are very complex. Any depiction of them has many possible concepts that fit it, and an algorithm would pick the wrong one. You cannot fix a powerful optimisation process optimising for the wrong thing with patches. (Magical Categories) Value is fragile; optimising for the wrong values creates a dull future. (Value Is Fragile) Our complicated values are the gift that we give to tomorrow. (The Gift We Give To Tomorrow)
The prisoner’s dilemma is a hypothetical in which two people can each either cooperate (C) or defect (D), and each prefers (D, C) > (C, C) > (D, D) > (C, D), listing their own move first. The typical example involves two purely selfish prisoners, but humans can’t genuinely imagine such agents. A better example pits humans trying to save billions of lives against an entity trying to maximise the number of paperclips. (The True Prisoner’s Dilemma)
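A small sketch of that ordering (the payoff numbers are illustrative assumptions, not from the essay) and of why it makes defection dominant:

```python
# Illustrative prisoner's dilemma payoffs satisfying (D,C) > (C,C) > (D,D) > (C,D).
# payoffs[(my_move, their_move)] = my payoff; the specific numbers are made up.
payoffs = {
    ("D", "C"): 3,   # I defect, they cooperate: best outcome for me
    ("C", "C"): 2,   # mutual cooperation
    ("D", "D"): 1,   # mutual defection
    ("C", "D"): 0,   # I cooperate, they defect: worst outcome for me
}

# The ordering that defines the dilemma:
assert payoffs[("D", "C")] > payoffs[("C", "C")] > payoffs[("D", "D")] > payoffs[("C", "D")]

# Defection dominates: whatever the other player does, D pays me more than C.
for their_move in ("C", "D"):
    assert payoffs[("D", their_move)] > payoffs[("C", their_move)]

# Yet two players who both follow that reasoning end up at (D, D), which both
# rank below (C, C). That tension is the dilemma.
```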
We understand others by simulating them with our brains, which creates empathy. It was evolutionarily useful to develop sympathy. An AI wouldn’t use either approach; an alien might. (Sympathetic Minds)
A world with no difficulty would be boring. We prefer real goals to fake ones. We need goals which we prefer working on to having finished, or which have no end state. (High Challenge) A utopia with no problems has no stories. Pain can be more intense than pleasure, and pleasure that scaled like pain would trap us. We can be rid of pain that breaks or grinds people down, and of pointless sorrow, while keeping what we value. Whether we will get rid of pain entirely someday, EY does not know. (Serious Stories)
Mere Goodness W: Quantified Humanism
Scope insensitivity is neglecting the scope of a problem (the number of people or animals, or the area affected) when deciding how important an action is. Groups asked how much they would pay to save 2,000 / 20,000 / 200,000 migrating birds from drowning in oil ponds answered $80, $78, and $88 respectively. We visualise a single bird, react emotionally, and cannot visualise the scope. To be an effective altruist, we must evaluate the numbers. (Scope Insensitivity) Saving one life feels as good as saving many, but is not as good. We do not treat saving lives as a satisficed virtue, such that once you’ve saved one you may ignore the others. (One Life Against The World)
The certainty effect is a bias in which moving from a 99% chance to a certainty of getting what we want is valued far more than moving from, say, a 33% to a 34% chance. This produces the Allais paradox, where we prefer a certain prize over a 33/34 chance of a bigger prize, yet prefer a 33% chance of the bigger prize to a 34% chance of the smaller one. This cannot be explained by non-linear marginal utility of money, it allows money to be pumped out of you, and it shows intuition failing to steer reality. (The Allais Paradox, Zut Allais!)
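A worked sketch of the two Allais gambles as described in the cited posts ($24,000 certain vs a 33/34 chance of $27,000, and a 34% chance of $24,000 vs a 33% chance of $27,000), taking utility as linear in dollars purely for illustration:

```python
from fractions import Fraction

# Worked sketch of the Allais gambles described in the cited posts:
#   1A: $24,000 with certainty        1B: 33/34 chance of $27,000, else nothing
#   2A: 34% chance of $24,000         2B: 33% chance of $27,000
# Utility is taken as linear in dollars (with $0 worth 0) purely for illustration.

def expected_value(p, prize):
    return Fraction(p) * prize

eu_1a = expected_value(1, 24_000)
eu_1b = expected_value(Fraction(33, 34), 27_000)
eu_2a = expected_value(Fraction(34, 100), 24_000)
eu_2b = expected_value(Fraction(33, 100), 27_000)

print(float(eu_1a), float(eu_1b))  # 24000.0 vs ~26205.9
print(float(eu_2a), float(eu_2b))  # 8160.0 vs 8910.0

# The structural point: experiment 2 is experiment 1 played with probability 34%,
# so each option in 2 is the corresponding option in 1 scaled by the same factor.
# Preferring 1A over 1B while preferring 2B over 2A therefore cannot come from
# any consistent valuation of the prizes themselves.
assert eu_2a == Fraction(34, 100) * eu_1a
assert eu_2b == Fraction(34, 100) * eu_1b
```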
A certain loss feels worse than an uncertain one. By changing the point of comparison so the certain outcome is a loss rather than a gain, you reverse intuition. You must multiply out costs and benefits, or you will fail at directing reality. This reduces nice feelings, but they are not the point. (Feeling Moral)
Intuition is what morality is built on, but we must pursue reflective intuitions or we won’t accomplish anything due to circular preferences. (The Intuitions Behind Utilitarianism) Making up probabilities can trick you into thinking they’re more grounded than they are, and override working intuitions. (When (Not) To Use Probabilities)
Ends don’t justify the means among humans. We run on corrupted hardware; we rationalise our way into bad means well past the point where it benefits us, let alone anyone else. Otherwise we wouldn’t have developed ethical injunctions. Follow them as a higher-level consequentialist strategy. (Ends Don’t Justify Means Among Humans, Ethical Injunctions)
To pursue rationality effectively, you must have a higher goal that it serves. (Something To Protect) Newcomb’s problem is a scenario in which an entity that can predict you perfectly offers two boxes, and says that box A contains $1000, and box B contains $1,000,000 if and only if they predicted you would only take box B. Traditional causal decision theory says you should take both boxes, as the money is either already in the box or not. Rationally, you should take only box B. Doing so makes you win more, and rationality is about winning, not about reasonableness or any particular ritual of thought. (Newcomb’s Problem And Regret Of Rationality)
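A short sketch of the payoffs (the accuracy parameter p is an illustrative addition; in the post the predictor is effectively perfect, i.e. p = 1):

```python
# Sketch of Newcomb's problem payoffs. The predictor-accuracy parameter p is an
# illustrative assumption added here; the post's predictor never errs (p = 1).

def expected_winnings(take_both, p=1.0):
    """Expected dollars won, given whether you take both boxes and the
    probability p that the predictor correctly predicted your choice."""
    if take_both:
        # Predictor right (prob p): box B is empty, you get only box A's $1,000.
        # Predictor wrong: box B is full, you get $1,001,000.
        return p * 1_000 + (1 - p) * 1_001_000
    # Predictor right (prob p): box B is full, you get $1,000,000.
    # Predictor wrong: box B is empty, you get $0.
    return p * 1_000_000

print(expected_winnings(take_both=True))   # 1000.0    with a perfect predictor
print(expected_winnings(take_both=False))  # 1000000.0
# Even with p as low as 0.501, one-boxing still has the higher expectation.
```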