Notes on fixed points --------------------- Bernie Pope. First, let us ignore the lambda calculus, and just consider functions over the natural numbers. Given a function f, if there exists a value x, such that x = f(x), then we say that "x is a fixed point of f". A function may have zero fixed points, or it may have one, or many. In fact, a function can have infinitely many fixed points. For example, this function has no fixed points: f(x) = x+1 This function has one fixed point: f(x) = 5 This function has infinitely many fixed points: f(x) = x We can show by a diagonalisation argument that there are uncountably many functions over the natural numbers which do not have fixed points. (See Sudkamp: Languages and Machines, 2nd Edition, page 19). Question: what about the functions over naturals which _do_ have fixed points. Are they countable or uncountable? So what do fixed points have to do with computing? You can think of a program as a finite state machine. The behaviour of the machine is given by a "transition function", which says how the machine goes from one state to the next. You can make the machine do work by applying the transition function to the current state, yielding a new state as output. Some initial state is chosen to start the whole thing off, and a computation emerges when we iterate the process of applying the transition function to successive states of the machine. A key question is: when do you stop? We can designate some special state (or states) as terminals. If the terminal states are reached, the computation is done. But how does the designation work? Well, we can say that the transition function becomes the identity when given a terminal state as input. Then we can say: the computation is finished if we reach a fixed point of the transition function. That is: the output state is equal to the input state. Clearly the machine will make no more progress once it has reached a fixed point. Not all state transition functions have fixed points because we can easily construct examples with loops in them. If x is a fixed point of a function f, then we can say: x = FIX f From this equation, and the general property of fixed points (ie. x = f(x)), we can say: FIX f = f (FIX f) Thus FIX is a "fixed point finder", which takes a function a returns a fixed point (if one exists). Note, we don't specify _which_ fixed point might be found, if there is more than one. If FIX can be defined, then it must satisfy the above equation. The natural question to ask is, can we define the fixed point finder (FIX), and for what functions is it defined? For functions over the natural numbers, we know that no such general fixed point finder exists for all functions. Curiously, in the lambda calculus, the answer is yes, we can define FIX, and it is defined over all terms in the lambda calculus. What is more, FIX can be defined as a lambda term. That is it can be defined within the lambda calculus. This is a surprising result, and it was the cause of much consternation amongst mathematicians and logicians. Something suggests that there is a difference between terms in the lambda calculus, and the "classical mathematical" view of functions. And yes, there is a difference, since the lambda calculus has a computational aspect to it, which is not present in classical mathematics. Indeed, we already know that the (lambda/Turing) computable functions are a proper subset of functions over the natural numbers. Perhaps the most unusual thing about the lambda calculus is that any term can be applied as if it were a function (there is no syntactic restriction on application). Thus, functions in the lambda calculus are also ordinary terms of the lambda calculus. Compare this to the classical view of functions over natural numbers, which makes a clear distinction between the domain of values and the functions over that domain. The nature of the lambda calculus can seem rather incongruous, especially if we have any hope of using it to model the classical view of functions. We may return to this topic later in semester, time permitting. For now, let us cover what was said in the last lecture. I mistakenly said the following lambda term was due to Church, but it seems it is due to Curry: FIX = \h -> (\x -> h (x x)) (\x -> h (x x)) We can verify that it satisfies the important fixed point equation by applying the function to some term f on both sides: FIX f = (\h -> (\x -> h (x x)) (\x -> h (x x))) f The RHS is a beta redex, which can be reduced: FIX f = (\x -> f (x x)) (\x -> f (x x)) The new RHS is also a beta redex, which can be reduced: FIX f = f ((\x -> f (x x)) (\x -> f (x x))) ------------------------------- We have previously shown that the underlined term is convertible (not reducible) to FIX f, thus we can conclude: FIX f = f (FIX f) It turns out that Curry's FIX is not unique, and there are other lambda terms which can also be used. Turing's version is this term: FIX = (\x -> \y -> y (x x y)) (\x -> \y -> y (x x y)) As an exercise, verify that Turing's term satisfies the important equation for fixed point finders. Let us use CURRY_FIX and TURING_FIX to name the two example fixed point finders. Note that both CURRY_FIX and TURING_FIX work independently of their argument terms. That is, they are defined over all terms in the lambda calculus. So every term in the lambda calculus has a fixed point! But, this does not mean that all fixed points have a normal form. For instance, consider: CURRY_FIX (\x -> x) and TURING_FIX (\x -> x) Neither have a normal form. And even more surprisingly, they don't share a common reduct. That is, they do not eventually converge to the same reduction sequence. However, we do think of them as denoting the same thing. What this suggests is that reduction is only a weak form of determining equality between terms in the lambda calculus. From a practical point of view, FIX gives us a way of encoding recursion into languages (such as the lambda calculus) that do not have it as a primitive construct. Let's shift into a language like Haskell for convenience of notation. Here is a Haskell data type encoding the natural Numbers: data Nat = Z | S Nat and here is a procedure which denotes a function: val :: Nat -> Int val Z = 0 val (S n) = 1 + val n We can write the above procedure as one equation: val :: Nat -> Int val = \n -> case n of Z -> 0 S m -> 1 + val m Now, the above definition is a recursive, but we want to write a version which is not recursive. We take the recursive reference to "val" in the body out as a parameter. The idea is that we want to give some meaning to the procedure (find out the function that it denotes), but we don't want the procedure to refer to itself, because it leads to a tangle of self-reference. val_non_rec :: (Nat -> Int) -> (Nat -> Int) val_non_rec v = \n -> case n of Z -> 0 S m -> 1 + v m Notice that val_non_rec takes "v" as a parameter, and "v" is a function of type (Nat -> Int). We can think of val_non_rec as a "state transition function". The input state is some approximation to the function denoted by the val procedure, and the output state is a better approximation. We recover the function denoted by val by taking the fixed point of val_non_rec: val = fix val_non_rec How does this compute the function? Initially we know absolutely nothing about the function denoted by val, so the initial state of the fixed point computation is an empty set: s0 = {} We apply the state transition function (val_non_rec) to the initial state to get the next state: s1 = val_non_rec s0 = \n -> case n of Z -> 0 S m -> 1 + s0 m From this step we obtain a new state: s1 = { (Z -> 0) } That is by one application of the state transition function we have learned what the function does in the base case. We compare the new state (s1) to the old state (s0) and we see that they are different, so we have not reached a fixed point, so we must continue. s2 = val_non_rec s1 = \n -> case n of Z -> 0 S m -> 1 + s1 m Now we can make use of the second branch in the case statement, and we discover slightly more information about the function: s2 = { (Z -> 0), (S Z -> 1) } Since (s2 != s1) we are not at a fixed point, so we continue yet again. Next time we discover: s3 = { (Z -> 0), (S Z -> 1), (S (S Z) -> 2) } In fact, we never reach a fixed point, so this iteration goes on forever. But that is to be expected because the function we are trying to construct is defined over an infinite domain. Therefore the function is itself infinite. Of course in a real program evaluation we do not need, or even want, to compute the whole function, we just want to know what its value is for certain elements of its domain. So in practice, we compute the fixed point on demand. Therefore, all (Turing complete) programming languages are necessarily "lazy" (demand driven) to some extent. End of a rather long transmission.