Converging exponential recursive function

14-Mar-2021

Disclaimer:

I don’t have any formal mathematics degree. I found the following problem/puzzle very interesting and tried to solve it.

First things first, I don’t even know what type of class the problem is, so I will just show you the problem.

It all began from an interesting tweet here

The question is: “Why such things happens?”

In order to appreciate the question more, let’s reiterate the question itself, but now with more layman’s terms.

Let’s say, you have a function F(x)F(x). Then you have a number aa. Substituting xx with aa, you will get the function output F(a)F(a).

If you feed the output to the function again, you will get F(F(a))F(F(a)) for the second iteration. Do it for multiple times, you will eventually get a converging number LL.

This is very procedural (or rather, recursive?), in fact you can make the function and code it, let’s say, using Javascript.

Sample Recursive Runner

Parameters
Outputs

Converging output LL

0

Iteration number

0

Final Δ\Delta . Difference between current output and previous iteration’s output.

0

In the above Javascript snippets, you can try the following:

  1. Run the function using a=0a=0, it will return L=0L=0 but delta is greater than epsilon. Which means, it doesn’t converge

It’s make sense because aa is the basis of the exponent. If it is 0, any power of it will return 0. It will never converge.

  1. Run the function using a=1a=1, it will return L=1L=1. Converging, but not interesting. Any power of 1 is 1 indeed.

  2. Run the function using a=1.4a=1.4, it will converge

  3. Run the function using a=1.5a=1.5, now it doesn’t converge!

Specifically for #4, LL goes unbounded towards infinity. It keeps increasing as the number of iteration goes.

From the result of #3 and #4, we know that there is an upper bound of possible value of aa. We want to know the upper bound.

Solving the problem using limit

We begin backwards, by saying that some upperbound of aa exists that causes LL to converges. If LL converges, then surely it is within a limit, for a huge number of iteration nn.

We define a recursive function first. The function is an input of two variables, the input value xx and the iteration number nn.

F(0,x)=axF(1,x)=F(0,F(0,x))=a(ax)F(2,x)=F(0,F(1,x))=a(a(ax))F(0,x)=a^x \\ F(1,x)=F(0,F(0,x))=a^{(a^x)} \\ F(2,x)=F(0,F(1,x))=a^{(a^{(a^x)})}

Let’s say, for huge number of nn, F(n,x)F(n,x) converge to a finite value LL. However for a really big number of nn and n+1n+1, both of them will converge to the same thing, which is LL. If this value really exists, then we can say:

F(n+1,x)=L=F(n,x)aL=LF(n+1,x)=L=F(n,x) \\ a^L=L

Then if LL does exists, we conclude that it’s value are only determined solely by number aa. Regardless of the number of iteration.

Solving this is a little bit difficult, however. To do that, you had to group LL into one side of the equation. Let’s put this off for now, because we are just interested in finding the upper bound of aa. So, we go to the second part of the statement.

We know that at first F(0,x)F(0,x) won’t be equal to LL. It’s value depend on both aa and xx. In the javascript code above, we set initial value of xx to be aa. The output of that function were feed back, over and over again.

Let’s say, we can freely choose the value of xx, then we “measure” how close the result would be to our convergent value LL, which is the final value. Let’s define the range of xx first. We know it had to be greater than 0, since xx is divergent if less than 0. It’s also less than 1.5, since greater than that is divergent too. So, hey, the range of values is actually small!

We can explore a little bit more. Let’s say we can “variate” xx around these range, in small increment. xx has to be close to our convergent value LL, right? We can then define the “difference” between the function’s output from the convergent output aLa^L: D(x)=F(0,x)aLD(x)=F(0,x)-a^L. We also define the “difference” between the function’s input from where it is convergent: d(x)=xLd(x)=x-L. We want to measure these differences and see how “far” xx is from the value LL that we want to find. So, we divide it to have a more “normalized” value or ratio: O(x)=D(x)d(x)O(x)=\frac{D(x)}{d(x)}. We call this our measured quantity, the objective function. Objective function can have arbitrary output, but it should tell us how close it is to our target objective value. In our case, our objective is to get as close as possible to O(L)O(L).

You might be thinking what good this will be? Well, now, I can show you where the magic happens. If we look at our objective function, it looks really similar to a derivative! Surprising! Let’s make it a derivative. To justify it, we say that we variate xx as close to LL as possible, but not get passed it (because if we pass it, we get divergent output). So the limit is increasing from the left.

limxLD(x)d(x)=F(0,x)limxLaxaLxL=F(0,x)ln(a)ax=F(0,x)\lim_{x \to L^{-}}\frac{D(x)}{d(x)}=F'(0,x) \\ \lim_{x \to L^{-}}\frac{a^x-a^L}{x-L}=F'(0,x) \\ \ln{(a)} a^x = F'(0,x)

Now, mind you, this equation only valid around x=Lx=L. This is not useful unless we made a connection to something else. The great news is, we do happen to know the value of the derivative when x=Lx=L! How lucky are we? A derivative of F(0,x)F(0,x), which is F(0,x)F'(0,x) is a function that corresponds to the tangent of F(0,x)F(0,x) graph. Our function have a special property, no matter what aa we choose, it will converge to a value LL. When it converges, no matter how many times LL are given as input to the function, it will always produce LL again. Which is the equality that we have before, aL=La^L=L. If we have 2D graph, the function converges to a point (L,aL)=(L,L)(L,a^L)=(L,L).

Something special happens when it converges. By intuition alone, our objective function will say “perfect match” if the xx value we choose is actually LL, the value when it converge. Perfect match means the ratio between D(x)D(x) and d(x)d(x) should be the target/optimum value of the objective function, which is:

D(x)d(x)=aLL=1\frac{D(x)}{d(x)}=\frac{a^L}{L}=1

It conveniently reduce to 1 because nominator and denominator have equal value.

You might not be satisfied with this intuition. Unfortunately I’m weak at formal proof, hahaha. But another reasoning can be derived if you think that the derivative of F(0,x)F(0,x) can vary. First hint, we know that if x<Lx<L the tangent line has to keep increasing, so that the function can recursively get closer to LL, by logic alone. This means, in the entire range of xx, the derivative has to always increase, but somewhat stop increasing in x=Lx=L. However, due to the nature of the function, for x>Lx>L, it has to have a drastic steep tangent/slope so that the function can’t converge (it always increase in rapid stride). This is the second hint.

You might be thinking that:

“Meh, it’s a tangent, so let’s just divide function output by input, we get aLL=1\frac{a^L}{L}=1. The tangent is 1”.

While we got the same result, the logic has a flaw because it’s just an average tangent from 0 to LL and we only got lucky because the function axa^x doesn’t have any constant in it.

The correct reasoning, is to understand why the function decided to suddenly increase the slope when xx gets past LL. Something is happening at LL. For the function to drastically increase, that means the first derivative has to drastically increase. But, for the first derivative to drastically increase, the second derivative has to drastically increase too. We can keep this going, until the nn-th derivative, even to infinity. But wait, before it got past LL, it converges, it just means a simple fact that at the nn-th derivative, that derivative is simply ZERO.

The nn-th derivative of F(0,x)=axF(0,x)=a^x is just (conveniently) Fn(0,x)=ln(a)naxF'^n(0,x)=\ln{(a)}^n a^x. The deciding factor is to understand that for it to be 0, it must be that ln(a)<1\ln{(a)}<1. Because, otherwise if ln(a)\ln{(a)} greater than 1, the compounding effect will make sure that ln(a)n\ln{(a)}^n goes to infinity. We then deduce a very simple result, the very first derivative itself must be less than one. Which is the same with what we found earlier.

So, back on track. We got the upper bound here (completely making LL term irrelevant, yay!). Focus on finding the xx value:

ln(a)ax1ln(a)ax\ln{(a)} a^x \leq 1 \\ \ln{(a)} \leq a^{-x} \\

Stop here for a moment. Notice that the left hand side can be a negative number, but we want to retrieve xx by using logarithm. Input of logarithm can’t be negative or even zero. So we have the second part of the inequality, the lower bound. This means:

ln(a)0a1\ln{(a)} \ge 0 \\ a \ge 1

Now moving on, assuming that a1a\ge 1.

ln(a)ax1ln(ln(a))xln(a)xln(ln(a))ln(a)\ln{(a)} a^x \leq 1 \\ \ln{(\ln{(a)})} \leq -x \ln{(a)} \\ x \leq -\frac{\ln{(\ln{(a)})}}{\ln{(a)}}

This only works when x=Lx=L, so the upper bound is:

Lln(ln(a))ln(a)L \leq -\frac{\ln{(\ln{(a)})}}{\ln{(a)}}

Wow! This really looks complicated, man. Very different from what is conjectured by the tweet above. We relate LL with aa. So this is in fact the upper bound of LL. We can’t say yet that this value is the convergent value, because it might be something lower.

To eliminate any trace of LL, combine with the fact that aL=La^L=L

aL=LLln(a)=ln(L)L=ln(L)ln(a)a^L=L \\ L \ln{(a)} = \ln{(L)} \\ L = \frac{\ln{(L)}}{\ln{(a)}}

Now, we have everything that makes it possible to reduce the equality/inequality! Substitute this equality to the previous inequality. We got:

ln(L)ln(a)ln(ln(a))ln(a)ln(L)ln(ln(a))L1ln(a)\frac{\ln{(L)}}{\ln{(a)}} \leq -\frac{\ln{(\ln{(a)})}}{\ln{(a)}} \\ \ln{(L)} \leq -\ln{(\ln{(a)})} \\ L \leq \frac{1}{\ln{(a)}}

Simpler. But wait… We can reduce it some more :D. Just replace LL with the previous equality.

ln(L)ln(a)1ln(a)Le\frac{\ln{(L)}}{\ln{(a)}} \leq \frac{1}{\ln{(a)}} \\ L \leq e

Interesting, we have two inequality now with clear value and free of LL. Combine this and reorder. With the fact that 1ae1 \le a \le e, it just means that 1ln(a)\frac{1}{\ln{(a)}} has to be the upper bound of LL if we want all of these to actually makes sense and consistent. The combined equality becomes (we can then later kick LL out of the inequality):

Le1ln(a)ln(a)1e1ae1eL \leq e \leq \frac{1}{\ln{(a)}} \\ \ln{(a)} \leq \frac{1}{e} \\ 1 \le a \leq e^{\frac{1}{e}}

Oh boy, now we found the more strict version of the upper bound, which is e1ee^{\frac{1}{e}}, or more conveniently written in the tweet as exp(1e)\exp({\frac{1}{e}}).

If we trace back the steps, they seems to consists of several magic tricks. How can we even know that the solution exists (and elegant) from the beginning? This doesn’t seems to be a reliable problem solving method. Hahaha. I get it that math are beautiful poetry, but in practice, we want a reliable and generic way to solve problem, and not relying of any “genious” trick that math plebs can’t figure out from their own. I mean, not everyone of us is Gauss.

This begs the second part of the article, which explore how even a mere mortal like me can even glimpse at the final solution.

What did I do to guess the solution?

I began by poking around at the problem. Trying to see what happens if I look at it at certain way. At first, you don’t know what the proof is, so you kind of assume that it’s true. If it’s true, then what happens? We should be able to derive another conclusion that don’t contradict with the assumptions. If it contradicts, then we actually proof it otherwise.

So, let’s start by again assuming that it resembles a recursive function, and it has a base. Meaning, the limit converges. If it converges, at the very end of the iteration the value LL will be exact same. Whatever xx value we put on axa^x, it is the LL and this happens by that definition alone:

ax=aaxx=axa^x = a^{a^x} \\ x = a^x

We got the above result by comparing the exponent, because we are using the same base aa.

Next, as I said before, solving for xx will be possible, although difficult. We then weigh several options. I listed it here (it can become your source of inspiration for other approach):

  1. Expand axa^x as power series, then look at the the resulting equation, then fold it back as some kind of closure function from the series.
  2. Variate xx and see what happens around x=Lx=L
  3. Guess at the substitution, usually by doing numerical analysis or plotting the graph.

No #3, is a big no, because I don’t have that much time to guess at it. Ain’t got time to do full-time math job.

No #1 will provide a better way of understanding the problem, because it will have a clear picture of why does it converges. But again, let’s leave that for the pro mathlete. We just want to poke around.

No #2 seems to be a good candidate because we can quickly check it if it is valid.

So, in conclusion, left side and right side comes from a successive iteration. We can think of it as coming from a different original function. We want to variate left and right side to quickly check when exactly the equality holds and how. Because we variate it, we make a derivative wrt xx both in left and right hand side. We got:

x=ax1=ln(a)axx = a^x \\ 1 = \ln{(a)} a^x

Hold up, these seems dodgy because the left hand side’s derivative is simply 1. Which means we are comparing axa^x function with a straight line! What does that even mean? Anyway, we have no time (yet) to provide the justifications. Just assume the universe let this be true. What happens now, is we can substitute (conveniently, strange) axa^x from both equation. We immediately got x=1ln(a)x=\frac{1}{\ln{(a)}}. Then continue substituting xx with the second equation 1=ln(a)ax1 = \ln{(a)} a^x will lead us to:

1=ln(a)axln(ln(a))=xln(a)=1a=exp(1e)1 = \ln{(a)} a^x \\ - \ln{(\ln{(a)})} = x \ln{(a)} = 1 \\ a = \exp({\frac{1}{e}})

Now that I knew the solution exists. I just have to trace back and justify the proof. I’m quite surprised myself that the upper bound of aa doesn’t depend on any variable. As long as aa is within that range, the recursive function will converges eventually. Very interesting.

So, now how to compute the convergent value?

We already established the fact that the final convergent value LL is only depends on aa from this equation aL=La^L=L. Knowing what the value is, however, is entirely different problem.

First, just to emphasizes, even though we have relation x=1ln(a)x=\frac{1}{\ln{(a)}}, that doesn’t mean L=1ln(a)L=\frac{1}{\ln{(a)}}. The equation above is for calculating the upper bound of aa. But aa itself can be less than that, consequently making LL different. That equation above only valid for a=exp(1e)a=\exp({\frac{1}{e}}), which in turns make L=eL=e.

To find the value LL for any aa within convergence radius, currently we only have one equation. To be honest, I was happen to know about the Lambert-W function, so I can figure out a hint. We can use this to solve this equation. The catch is, Lambert-W is not an elementary function.

Let’s do a quick intro of Lambert-W for those who don’t know. Otherwise, just skip this.

Lambert-W function intro

We call Lambert-W function as simply Wk(z)W_k(z). The kk is not important for now, so let’s just call it W(z)W(z). The function’s input is a zz, meaning it’s on a complex domain. Lambert-W is defined as:

W(z)=wW(z)=w

For the inverse equation:

wew=zwe^w=z

Using this definition, we are going to move backwards. We pretend that we can calculate W(z)W(z) then manipulate the inverse equation so that we can substitute W(z)W(z). That’s basically the plan.

Solve using Lambert-W

Our original equation is a product+exponent or product logarithm and we want to manipulate it in such a way that it resembles the inverse of W(z)W(z).

aL=Llna eLlna=Llnalna=(Llna)(eLlna)z=wewa^L=L \\ \ln{a} \ e^{L \ln{a}} = L \ln{a} \\ - \ln{a} = (-L \ln{a}) \cdot (e^{-L \ln{a}}) \\ z = we^w

We now get that z=lnaz=-\ln{a} and w=Llnaw=-L\ln{a}. We are now going to assume that lna\ln{a} is in a complex domain. Meaning that our equation will also cover the values of a0a \leq 0 for complex logarithm function. Using Lambert-W we separate the variable:

W(lna)=LlnaL=W(lna)lnaW(-\ln{a})=-L\ln{a} \\ L = - \frac{W(-\ln{a})}{\ln{a}}

We now have the closed form of LL for any arbitrary aa. But what good will it be if W(z)W(z) is not an elementary function? We can’t calculate it. Also, Lambert-W can returns multiple output depending on the kk argument. But because LL has to strictly a real number, then k=0k=0 and k=1k=-1 will suffice.

What to do when we have Lambert-W function representation of LL ?

Since Lambert-W have some special properties. We get some of the facts for free because of the already established theorems.

First fact, since we are dealing with only real positive values of both aa and LL, the input of W(z)W(z) can be guessed to come from the domain that made the principal branch of W(z)W(z) to be available. If the input of W(z)W(z) greater than 1e-\frac{1}{e}, we are using Lambert-W with k=0k=0. W0(z)W_0(z) have radius of convergence 1e\frac{1}{e}. From this information alone, we can derive the fact that the upper bound of aa has to be exp(1e)\exp{(\frac{1}{e})}. Pretty quick, eh?

1elna1e1elna1eexp(1e)aexp(1e)-\frac{1}{e} \leq -\ln{a} \leq \frac{1}{e} \\ \frac{1}{e} \geq \ln{a} \geq -\frac{1}{e} \\ \exp{(\frac{1}{e})} \geq a \geq \exp{(-\frac{1}{e})}

Look, we even got more specific lower bound: exp(1e)\exp{(-\frac{1}{e})}. You can check back with the widget above, if we put aexp(1e)a\leq\exp{(-\frac{1}{e})}, the function will not converge.

To test it, try a=0.6922a=0.6922 (very close to the lower bound), with small enough epsilon ϵ=0.0000000000000001\epsilon=0.0000000000000001 It will not converge. But flip the value of aa a little bit higher to 0.69230.6923 it converges. This is because the delta is delicate enough.

Now, let’s try a=0.1a=0.1. What? We got convergent result?

So, our lower bound of aa here must be wrong. We need to investigate further. If we are careful, our arguments are making sure that the lower bound of zz is clearly 1e-\frac{1}{e}, but the upper bound here is just an assumptions. Technically, as long as zz is positive, W(z)W(z) will also returns positive real value. We need to be more strict.

For z0z \geq 0 the deciding factor would be the value lna\ln{a} here. Specifically, we have to make the following inequality because we want LL to be positive. Note that it is not possible to have L=0L=0 because then aL=La^L=L cannot be satisfied. In summary, we have:

L=W(lna)lna0W(lna)lna0W(lna)0L = - \frac{W(-\ln{a})}{\ln{a}} \ge 0 \\ \frac{W(-\ln{a})}{-\ln{a}} \ge 0 \\ W(-\ln{a}) \ge 0

For the second line, note that the sign in the left side now depends on the sign of lna-\ln{a}. W(z)W(z) switch signs the same way as zz. Thus, grouping it like that making sure the inequality doesn’t change sign.

You probably need to do some mental gymnastic for the last part if you’re not familiar. We basically have to say: pick zz such that the W(z)W(z) greater than 0. Since W(0)=0W(0)=0 and the function increases. That means the lower bound of zz is 0. The upper bound of aa thus becomes 1, i.e a1a \le 1.

Let’s stop again for a moment and digest it. We are a little bit confused. We already described that the upper bound of aa is exp(1e)\exp{(\frac{1}{e})}, but now we have a result that the upper bound aa is 11. Something is wrong here and I don’t know what it is :p.

I can only guess that there is a branching point (because W(z)W(z) can have multiple output) when a=1a=1. At that point the value of LL trivially becomes 11 by definition. But it’s not possible to calculate it directly using above closed form, since LL becomes zero by zero division. You had to take the limit.

Consequently, when 0a10 \le a \le 1, LL has to be some small value and decrease as aa decrease, then becomes undefined when a=0a=0. At this moment, the zz interval becomes 0z0 \le z \le \infty, which can be quite big. The closed form says it converging to a small value of LL, but if we take the value procedurally using recursive definition at the beginning, any base aa exponentiated into a small value of L0L \approx 0, will cause the output to approach 11. In the next iteration, a small base of aa exponentiated into a value of L1L \approx 1, will cause the output to approach the base again. So we have an alternating value of nearly 11 and nearly 00 that made it not possible for LL to converge, recursively. But, the closed form function will only consider the smaller one.

In conclusion, I am guessing that within the interval 0a10 \le a \le 1, the function recursively jumps between branches. You can check this using the previous widget. For a=0.0001a=0.0001, calculate the value between odd and even max iteration to see that the output alternates.

Now that we have the bound for aa, it is possible for us to directly calculate the convergent value LL, as long as we keep aa within the bounds. To calculate W0(z)W_0(z) numerically, we can use Taylor series to arbitrary precisions. I’m not going to write down the implementation details, but here’s the second widget:

Calculating L directly

FYI, I didn’t test all edge cases in the following widget. Basically, we expect to get a result as long as aa is within bounds.

The bounds that I refer to is the bounds that applies to the principal branch of Lambert-W. I only implement that one, so the bounds are exp(1e)aexp(1e)\exp{(\frac{1}{e})} \geq a \geq \exp{(-\frac{1}{e})}. Other than that, it’s not guaranteed that the Taylor series I implemented can converge.

The widget below works by calculating L directly using Lambert-W function. The Lambert-W function is calculated using Taylor series.

Because Lambert-W works in complex domain, we convert it’s input into a complex number. We limit our implementation by truncating the terms when the real part of term is lower then accepted epsilon, which means further terms will not significantly adds up. Due to how complex number behaves, it will always have real and imaginary part for each terms. The imaginary parts will rotate for each term. However we want to make sure the final sum of the imaginary parts gets as close to 0 as possible to consider the output to be real. When the final sum has significant imaginary unit, more than epsilon, then we consider it to “diverge”. Diverge here is in the sense the LL value is not pure real number. It actually has value (converge) in the sense of complex domain (analytical).

Parameters
Outputs

Direct calculation of output LL

0

Iteration number

0

Wrapping Up and Conclusion

This article is getting long now. I want to explore more using a more graphic approach but I don’t have time to create the widget.

Anyway, I hope you enjoy the article. If you are interested in the source code of the widgets (react code), you can download the source code here.

Alternatively, you can also check it in Github


Rizky Maulana Nugraha

Written by Rizky Maulana Nugraha
Software Developer. Currently remotely working from Indonesia.
Twitter FollowGitHub followers