I don’t have a formal mathematics degree. I found the following problem/puzzle very interesting and tried to solve it.
First things first, I don’t even know what class of problem this is, so I will just show it to you.
It all began with an interesting tweet here:
Let a > 0 and consider the exponential function
F(x) = a^x
where x ≥ 0. Show that the sequence of iterations
a, F(a), F(F(a)), F(F(F(a))), …
converges when a ≤ exp(1/e) = 1.4446…
In particular, it converges for Dillon’s @InertialObservr example a = √2. #math #calculus
— Sam Walters ☕️ (@SamuelGWalters) March 8, 2021
The question is: “Why do such things happen?”
To appreciate the question more, let’s restate it in layman’s terms.
Let’s say you have a function F(x) = a^x. Then you have a number a. Substituting x with a, you will get the function output F(a) = a^a.
If you feed the output to the function again, you will get F(F(a)) for the second iteration. Do this multiple times and you will eventually get a converging number L.
1. Run the function using a = 0: it will return 0, but the delta stays greater than epsilon, which means it doesn’t converge. This makes sense because a is the base of the exponent: if it is 0, any positive power of it will return 0. It will never converge.
2. Run the function using a = 1: it will return 1. Converging, but not interesting. Any power of 1 is 1 indeed.
3. Run the function using a = √2: it will converge (to 2).
4. Run the function using a = 1.5: now it doesn’t converge!
Specifically for #4, the iteration goes unbounded towards infinity. It keeps increasing as the number of iterations grows.
From the results of #3 and #4, we know that there is an upper bound on the possible values of a. We want to know that upper bound.
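To reproduce these four experiments without the widget, here is a minimal throwaway sketch of my own (not the widget’s code) that iterates F(x) = a^x and reports whether the delta ever gets below epsilon:

```python
import math

def iterate_power_tower(a, max_iter=1000, eps=1e-12):
    """Iterate F(x) = a**x starting from x = a.
    Returns (last_value, converged)."""
    x = a
    for _ in range(max_iter):
        if x > 1e3:              # runaway growth; a**x would overflow soon
            return x, False
        nxt = a ** x
        if abs(nxt - x) < eps:   # delta below epsilon: call it converged
            return nxt, True
        x = nxt
    return x, False

# The four experiments from above
for a in (0.0, 1.0, math.sqrt(2), 1.5):
    print(a, iterate_power_tower(a))
```

The guard at 1e3 is an arbitrary cutoff to detect the unbounded case (#4) before the floats overflow.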
We begin backwards, by saying that some upper bound of a exists that causes the iteration to converge. If it converges, then surely it stays within a limit L for a huge number of iterations n.
We define a recursive function first. The function takes two inputs, the base a and the iteration number n:
G(a, 1) = a
G(a, n + 1) = a^G(a, n)
Let’s say that for a huge number of iterations n, G(a, n) converges to a finite value. For a really big n, both G(a, n) and G(a, n + 1) will converge to the same thing, which we call L. If this value really exists, then we can say:
L = a^L
Then, if L does exist, we conclude that its value is determined solely by the number a, regardless of the number of iterations.
Solving this for L is a little bit difficult, however. To do that, you have to group L onto one side of the equation. Let’s put this off for now, because we are just interested in finding the upper bound of a. So, we go to the second part of the statement.
Let’s say we can freely choose the value of x; then we “measure” how close the result F(x) would be to our convergent value L, which is the final value. Let’s define the range of a first. We know it has to be greater than 0, since the iteration is divergent if a is less than 0. It’s also less than 1.5, since we saw that values that big diverge too. So, hey, the range of possible a values is actually small!
We can explore a little bit more. Let’s say we can “variate” x around this range, in small increments. F(x) has to be close to our convergent value L, right? We can then define the “difference” between the function’s output and the convergent output: F(x) − L. We also define the “difference” between the function’s input and the point where it is convergent: x − L. We want to measure these differences and see how “far” x is from the value that we want to find. So, we divide them to have a more “normalized” value, or ratio: (F(x) − L) / (x − L). We call this our measured quantity, the objective function. An objective function can have arbitrary output, but it should tell us how close we are to our target objective value. In our case, our objective is to get x as close as possible to L.
You might be thinking: what good will this be? Well, now I can show you where the magic happens. If we look at our objective function, it looks really similar to a derivative! Surprising! Let’s make it a derivative. To justify it, we say that we variate x as close to L as possible, but never past it (because if we pass it, we get divergent output). So the limit is taken increasing from the left:
lim(x→L⁻) (F(x) − L) / (x − L) = F′(L)
Now, mind you, this equation is only valid around x = L. This is not useful unless we make a connection to something else. The great news is, we do happen to know the value of the derivative when x = L! How lucky are we? The derivative of F(x) = a^x, which is F′(x) = a^x ln(a), is a function that corresponds to the tangent of the graph. Our function has a special property: no matter what a we choose (within the convergent range), it will converge to a value L. When it converges, no matter how many times L is given as input to the function, it will always produce L again. Which is the equality that we had before: L = a^L. If we have a 2D graph, the function converges to the point (L, L).
Something special happens when it converges. By intuition alone, our objective function will say “perfect match” if the value x we choose is actually L, the value when it converges. Perfect match means the ratio between the output difference F(x) − L and the input difference x − L should be the target/optimum value of the objective function, which is:
F′(L) = lim(x→L⁻) (F(x) − L) / (x − L) = 1
It conveniently reduces to 1 because, at a perfect match, numerator and denominator have equal value (near convergence, F(x) and x are the same thing).
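We can sanity-check this numerically with a quick sketch of mine, using the known pair a = √2, L = 2: the one-sided ratio should approach F′(L) = a^L ln(a) = L ln(a):

```python
import math

a = math.sqrt(2)
L = 2.0                          # convergent value for a = sqrt(2)

def F(x):
    return a ** x

# The objective ratio (F(x) - L)/(x - L), with x approaching L from the left
for dx in (1e-2, 1e-4, 1e-6):
    x = L - dx
    print(dx, (F(x) - L) / (x - L))

# Analytic derivative at L: a^L * ln(a) = L * ln(a) = ln(2), comfortably below 1
print((a ** L) * math.log(a))
```

For a = √2 the slope at the fixed point is ln(2) ≈ 0.693 < 1, which is why this particular base converges; the value 1 is only reached at the critical base.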
You might not be satisfied with this intuition. Unfortunately, I’m weak at formal proofs, hahaha. But another reasoning can be derived if you think about how the derivative of F can vary. First hint: we know the tangent line has to keep increasing so that the function can recursively get closer to L, by logic alone. This means that over the entire range up to L, the derivative has to keep increasing, but somewhat stop increasing at x = L. However, due to the nature of the function, past L it has to have a drastically steep tangent/slope so that the function can’t converge (it always increases in rapid strides). This is the second hint.
You might be thinking:
“Meh, it’s a tangent, so let’s just divide function output by input: we get a^L / L = L / L. The tangent is 1.”
While we got the same result, the logic has a flaw, because that is just an average tangent from 0 to L, and we only got lucky because the function doesn’t have any constant term in it.
The correct reasoning is to understand why the function decides to suddenly increase its slope when x gets past L. Something is happening at x = L. For the function to drastically increase, its first derivative has to drastically increase. But for the first derivative to drastically increase, the second derivative has to drastically increase too. We can keep this going, up to the n-th derivative, even to infinity. But wait: before x gets past L, it converges, and that implies a simple fact: at the ∞-th derivative, that derivative is simply ZERO.
The n-th derivative of a^x is just (conveniently) a^x (ln a)^n. The deciding factor is to understand that for it to go to 0, it must be that ln(a) is less than 1. Because otherwise, if ln(a) is greater than 1, the compounding effect will make sure that (ln a)^n goes to infinity. We then deduce a very simple result: the very first derivative itself must be less than one. Which is the same as what we found earlier.
So, back on track. We got the upper bound condition here (completely making the iteration count irrelevant, yay!). Focus on finding the value:
F′(L) = a^L ln(a) ≤ 1
Stop here for a moment. Notice that the left hand side can be a negative number (when a < 1), but we want to retrieve a by using the logarithm, and the input of a logarithm can’t be negative or even zero. So we get the second part of the inequality, the lower bound. This means:
1 < a
Now moving on, assuming that a > 1, and using a^L = L:
L ln(a) ≤ 1
ln(a) ≤ 1/L
This only works when L > 0, so the upper bound is:
a ≤ e^(1/L)
Wow! This really looks complicated, man. Very different from what is conjectured by the tweet above: we relate a with L. So this is in fact an upper bound for a. We can’t say yet that this bound is attained, because it might be something lower.
To eliminate any trace of L, combine it with the fact that
L = a^L
Now we have everything that makes it possible to reduce the inequality! Substitute the upper bound of a into this equality. We get:
L = a^L ≤ (e^(1/L))^L = e
Simpler. But wait… we can reduce it some more :D. Just replace L with the previous equality:
a^L ≤ e
Interesting, we now have two inequalities with clear values: a ≤ e^(1/L) and L ≤ e. Combine these and reorder. With the fact that L = a^L, it just means that e has to be the upper bound of L if we want all of these to actually make sense and be consistent. The combined inequality becomes (we can then kick L out of the inequality by setting it to its upper bound e):
a ≤ e^(1/L), L ≤ e
Oh boy, now we have found the stricter version of the upper bound, which is a ≤ e^(1/e) ≈ 1.4446, or as it is more conveniently written in the tweet, exp(1/e).
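A quick numerical check of the bound (my own sketch, not part of the derivation): iterating right at a = e^(1/e) creeps toward e, while nudging a slightly above it blows up:

```python
import math

def tower(a, n):
    """Iterate x -> a**x, n times, starting from x = a."""
    x = a
    for _ in range(n):
        if x > 1e3:            # diverging; a**x would overflow soon
            return float('inf')
        x = a ** x
    return x

a_crit = math.exp(1 / math.e)      # e^(1/e) ≈ 1.444667...

print(tower(a_crit, 20000))        # slowly approaches e ≈ 2.71828
print(tower(a_crit + 0.01, 5000))  # inf: just above the bound, it diverges
```

Right at the critical base the slope at the fixed point equals 1, so convergence is painfully slow (roughly like 1/n), which is why so many iterations are needed.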
If we trace back the steps, they seem to consist of several magic tricks. How could we even know from the beginning that the solution exists (and is elegant)? This doesn’t seem to be a reliable problem-solving method, hahaha. I get that math is beautiful poetry, but in practice we want a reliable and generic way to solve problems, not one that relies on some “genius” trick that math plebs can’t figure out on their own. I mean, not every one of us is Gauss.
This begs the second part of the article, which explores how even a mere mortal like me can glimpse the final solution.
I began by poking around at the problem, trying to see what happens if I look at it a certain way. At first, you don’t know what the proof is, so you kind of assume that the statement is true. If it’s true, then what happens? We should be able to derive other conclusions that don’t contradict the assumption. If something contradicts, then we have actually proved otherwise.
So, let’s start by again assuming that it resembles a recursive function, and that it has a base case, meaning the limit converges. If it converges, at the very end of the iteration the value will be exactly the same from one step to the next. Whatever that value is, it is the L, and this happens by that definition alone:
L = a^L
We got the above result by comparing the exponents, because both sides of the iteration use the same base a.
Next, as I said before, solving for L is possible, although difficult. We then weigh several options. I list them here (they can become your source of inspiration for other approaches):
1. Expand a^L as a power series, then look at the resulting equation, then fold it back into some kind of closed form for the series.
2. Variate the equation and see what happens around L.
3. Guess at a substitution, usually by doing numerical analysis or plotting the graph.
No. 3 is a big no, because I don’t have that much time to guess at it. Ain’t got time for a full-time math job.
No. 1 would provide a better understanding of the problem, because it would give a clear picture of why it converges. But again, let’s leave that for the pro mathletes. We just want to poke around.
No. 2 seems to be a good candidate, because we can quickly check whether it is valid.
So, in conclusion, the left side and right side of L = a^L come from successive iterations. We can think of them as coming from two different original functions. We want to variate the left and right sides to quickly check when exactly the equality holds, and how. Because we variate it, we take the derivative with respect to L on both the left and right hand sides. We get:
d/dL (L) = d/dL (a^L)
1 = a^L ln(a)
Hold up, this seems dodgy, because the left hand side’s derivative is simply 1, which means we are comparing the function with a straight line! What does that even mean? Anyway, we have no time (yet) to provide the justifications. Just assume the universe lets this be true. What happens now is that we can substitute a^L = L (conveniently, strangely) between both equations. We immediately get 1 = L ln(a), or a = e^(1/L). Then continuing to substitute with the second equation will lead us to:
L = a^L = (e^(1/L))^L = e, and therefore a = e^(1/e)
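We can immediately plug these candidate values back in to check them (a tiny verification sketch of mine): a = e^(1/e) and L = e should satisfy both the fixed-point equation and the tangency condition:

```python
import math

a = math.exp(1 / math.e)   # candidate upper bound e^(1/e)
L = math.e                 # candidate convergent value

print(a ** L - L)                  # fixed point L = a^L      -> ~ 0
print(a ** L * math.log(a) - 1)    # tangency    1 = a^L ln a -> ~ 0
```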
Now that I knew the solution existed, I just had to trace back and justify the proof. I’m quite surprised myself that the upper bound of a doesn’t depend on any variable. As long as a is within that range, the recursive function will eventually converge. Very interesting.
We already established the fact that the final convergent value L depends only on a, through the equation L = a^L. Knowing what the value is, however, is an entirely different problem.
First, just to emphasize: even though we have the relation a = e^(1/L), that doesn’t mean it holds for every a. That equation is for calculating the upper bound of a. But a itself can be less than the bound, consequently making L different. The equation is only valid for a = e^(1/e), which in turn makes L = e.
To find the value of L for any a within the convergence radius, we currently only have the one equation L = a^L. To be honest, I happened to already know about the Lambert-W function, so I could figure out a hint: we can use it to solve this equation. The catch is, Lambert-W is not an elementary function.
Let’s do a quick intro to Lambert-W for those who don’t know it. Otherwise, just skip this.
We call the Lambert-W function Wₖ, or simply W. The k (the branch index) is not important for now, so let’s just call it W. The function’s input is a complex number z, meaning it is defined on the complex domain. Lambert-W is defined as:
W(z) e^(W(z)) = z
For the inverse equation: if
y e^y = z
then y = W(z).
Using this definition, we are going to move backwards. We pretend that we can calculate W(z), then manipulate the inverse equation so that we can substitute our fixed-point equation into it. That’s basically the plan.
Our original equation is a product+exponent, or product logarithm, and we want to manipulate it in such a way that it resembles the inverse of W.
Rewriting L = a^L = e^(L ln(a)) as L e^(−L ln(a)) = 1, and multiplying both sides by −ln(a), we get:
(−L ln(a)) e^(−L ln(a)) = −ln(a)
We now see that y = −L ln(a) and z = −ln(a). We are now going to assume that z is in the complex domain, meaning our equation will also cover the values of ln(a) for the complex logarithm function. Using Lambert-W, we separate the variable L:
−L ln(a) = W(−ln(a))
L = W(−ln(a)) / (−ln(a))
We now have the closed form of L for any arbitrary a. But what good will it be if W is not an elementary function? We can’t calculate it directly. Also, Lambert-W can return multiple outputs depending on the argument (the branches Wₖ). But because L has to be strictly a real number, the real branches W₀ and W₋₁ will suffice.
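Even though W is not elementary, it is easy to evaluate numerically. Here is a small sketch (not the widget’s code; it uses Newton’s method instead of a Taylor series) that evaluates the principal branch and then the closed form L = W(−ln a)/(−ln a):

```python
import math

def lambert_w0(z, tol=1e-12):
    """Principal branch W0: solve w * e^w = z by Newton's method (z >= -1/e)."""
    w = 0.0 if z >= 0 else -0.5    # rough initial guess
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1))
        w -= step
        if abs(step) < tol:
            break
    return w

a = math.sqrt(2)
z = -math.log(a)                  # argument -ln(a)
L = lambert_w0(z) / z             # closed form for the convergent value
print(L)                          # ~ 2.0, matching sqrt(2)^2 = 2
```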
Since Lambert-W has some special properties, we get some facts for free from already established theorems.
First fact: since we are dealing only with real positive values of both a and L, the input of W can be guessed to come from the domain where the principal branch W₀ is available. If the input of W is greater than −1/e, we are using Lambert-W with k = 0. The series for W₀ has radius of convergence 1/e. From this information alone, we can derive the fact that the upper bound of a has to be e^(1/e). Pretty quick, eh?
Look, we even got a more specific lower bound: a ≥ e^(−1/e) ≈ 0.6922. You can check back with the widget above: if we put a below this value, the function will not converge.
To test it, try a value of a very close to the lower bound, with a small enough epsilon: it will not converge. But flip the value of a a little bit higher and it converges. This is because the delta is delicate there.
Now, let’s try a value well below that lower bound. What? We got a convergent result?
So, our lower bound of a here must be wrong. We need to investigate further. If we are careful, our arguments make sure that the upper bound of a is clearly e^(1/e), but the lower bound here was just an assumption. Technically, as long as the argument −ln(a) is positive, W will also return a positive real value, no matter how large the argument is. We need to be more strict.
The deciding factor here would be the value of W(−ln(a)). Specifically, we have to impose the following inequality, because we want W(−ln(a)) to be positive. Note that it is not possible to have L ≤ 0, because then L = a^L > 0 cannot be satisfied. In summary, we have:
L = W(−ln(a)) / (−ln(a)) > 0
W(−ln(a)) ≥ 0
For the second line, note that the sign of the numerator now depends on the sign of −ln(a): W(−ln(a)) switches sign the same way as −ln(a) does. Thus, grouping it like that makes sure the inequality doesn’t change sign.
You probably need to do some mental gymnastics for the last part if you’re not familiar with it. We basically have to say: pick a such that −ln(a) is greater than 0. Since W(0) = 0 and the function increases, the lower bound of −ln(a) is 0. The upper bound of a thus becomes 1, i.e. a ≤ 1.
Let’s stop again for a moment and digest this. We are a little bit confused now. We already derived that the upper bound of a is e^(1/e), but now we have a result saying the upper bound is 1. Something is wrong here and I don’t know what it is :p.
I can only guess that there is a branching point (because W can have multiple outputs) when −ln(a) = 0, that is, when a = 1. At that point the value of L trivially becomes 1 by definition. But it’s not possible to calculate it directly using the closed form above, since W(0)/0 becomes a zero-by-zero division. You have to take the limit.
Consequently, when a gets small, L has to be some small value and decrease as a decreases, then become undefined when a = 0. At this moment, a whole interval of small a becomes suspect, which can be quite big. The closed form says the iteration converges to a small value of L, but if we take the value procedurally, using the recursive definition at the beginning, any base a exponentiated to a small value of L will cause the output to approach 1. In the next iteration, the small base a exponentiated to a value near 1 will cause the output to approach the base a again. So we have values alternating between nearly 1 and nearly a, which makes it impossible for the recursion to converge. But the closed-form function will only consider the smaller one.
In conclusion, I am guessing that within the interval 0 < a < e^(−e) ≈ 0.0659, the function recursively jumps between branches. You can check this using the previous widget: for a in that interval, calculate the value with odd and even max iterations to see that the output alternates.
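Here is a small sketch demonstrating that alternation (a = 0.04 is my own pick, below e^(−e) ≈ 0.0659):

```python
import math

a = 0.04          # below e**(-e) ~ 0.0659, outside the convergence interval

x = a
history = []
for _ in range(200):
    x = a ** x
    history.append(x)

# Instead of settling on one value, the iterates lock into a 2-cycle:
print(history[-4:])   # alternates between two distinct values
```

The even-indexed and odd-indexed subsequences each converge, but to two different values, so the sequence as a whole never does.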
Now that we have the bounds for a, it is possible for us to directly calculate the convergent value L, as long as we keep a within the bounds. To calculate W numerically, we can use its Taylor series to arbitrary precision. I’m not going to write down the implementation details, but here’s the second widget:
FYI, I didn’t test all edge cases in the following widget. Basically, we expect to get a result as long as a is within bounds.
The bounds that I refer to are the bounds that apply to the principal branch of Lambert-W. I only implement that one, so the bounds are e^(−1/e) ≤ a ≤ e^(1/e). Outside of that, it’s not guaranteed that the Taylor series I implemented will converge.
The widget below works by calculating L directly using the Lambert-W function. The Lambert-W function is calculated using its Taylor series.
Because Lambert-W works in the complex domain, we convert its input into a complex number. We limit our implementation by truncating the terms when the real part of a term is lower than the accepted epsilon, which means further terms will not add up significantly. Due to how complex numbers behave, each term will always have a real and an imaginary part. The imaginary parts rotate from term to term. However, we want the final sum of the imaginary parts to get as close to 0 as possible in order to consider the output real. When the final sum has a significant imaginary part, more than epsilon, we consider it to “diverge”. Diverge here is in the sense that the value is not a pure real number; it actually has a value (converges) in the sense of the complex domain (it is analytic).
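The series approach described above can be sketched like this (a simplified stand-in for the widget’s actual implementation, truncating on term magnitude instead). W₀’s Taylor series is W(z) = Σₙ (−n)^(n−1)/n! · zⁿ for |z| < 1/e, and consecutive terms obey tₙ₊₁ = tₙ · (−z)(1 + 1/n)^(n−1), which avoids computing huge factorials:

```python
import math

def lambert_w_taylor(z, eps=1e-9, max_terms=1000):
    """Principal-branch W via its Taylor series; only valid for |z| < 1/e."""
    z = complex(z)
    term = z                                 # n = 1 term
    total = term
    for n in range(1, max_terms):
        term *= -z * (1 + 1 / n) ** (n - 1)  # ratio t_{n+1} / t_n
        total += term
        if abs(term) < eps:                  # truncate once terms stop mattering
            break
    return total

a = math.sqrt(2)
z = complex(-math.log(a))     # |z| ~ 0.3466 < 1/e ~ 0.3679: inside the radius
L = lambert_w_taylor(z) / z
print(L.real, abs(L.imag))    # real part ~ 2, imaginary part ~ 0
```

Note that near |z| = 1/e (i.e. a near either bound) the series converges very slowly, which is consistent with the widget needing many terms there.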
This article is getting long now. I want to explore more using a more graphical approach, but I don’t have time to create the widget.
Anyway, I hope you enjoyed the article. If you are interested in the source code of the widgets (React code), you can download it here.
Alternatively, you can also check it on GitHub.