Following up on my previous article about Lp spaces, I want to discuss an interesting topic: the Uncertainty Principle.
When we talk about the “Uncertainty Principle”, we usually link it to the Heisenberg Uncertainty Principle, which historically came out of quantum mechanical observations and their formalism. But in this article we are going to derive it from the bottom up, from just a minimal set of assumptions/axioms.
Start from “information”
When we observe something, we gain information. So if we have a function, we can gain information from observing the function.
Let’s say we encode information from an event that has a probability distribution ρ. It depends on a parameter x, which is usually a state (discrete or continuous) in the sense of statistics or random variables.
We write ρ(x).
If this function is measurable and behaves nicely enough — it can be discrete or continuous, it doesn’t matter as long as it is measurable — then it has a corresponding dual in the Fourier frequency domain.
This is a property of the Fourier Transform, which is just a mathematical fact.
Let’s just call it γ(ξ).
But we have several ways to build this dual function. We can transform ρ(x) directly, or we can apply some other transform first and then take the Fourier Transform. What I want to suggest here is: what if we use the Plancherel Theorem, because it lets us further constrain the relation between the function and its dual. This is also discussed in my previous article about local time entropy.
So we put ourselves in the L2 setting (square integrable, meaning p=2): we assume there exists a function ψ(x) in L2 such that ρ(x) = |ψ(x)|².
We then perform the Fourier Transform on ψ(x) and get a dual function ϕ(ξ). We take γ(ξ) = |ϕ(ξ)|² as the dual distribution.
This way, we can apply the Plancherel Theorem like this:

$$\int |\psi(x)|^2 \, dx = \int |\phi(\xi)|^2 \, d\xi$$

$$\int \rho(x) \, dx = \int \gamma(\xi) \, d\xi = 1$$

The last line is just the total of the probability distribution, so it must equal 1.
Since the Plancherel Theorem applies, the integral of γ(ξ) also equals 1, meaning it is also a probability distribution.
Now here’s the crucial insight. Because both ρ and γ are probability distributions, each of them has a Shannon entropy.
So each of them carries information.
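As a quick sanity check, here is a minimal numerical sketch (my own illustration, not part of the original derivation): take any square-integrable amplitude ψ, form ρ = |ψ|², Fourier-transform ψ (with the e^{−2πi x ξ} convention that also matches the Gaussian example later), form γ = |ϕ|², and confirm that both ρ and γ integrate to 1.

```python
import numpy as np

# Minimal sketch: discretize an arbitrary amplitude psi on a grid, approximate
# its continuous Fourier transform with the FFT, and check Plancherel:
# the integrals of rho = |psi|^2 and gamma = |phi|^2 are both 1.

L, N = 40.0, 2**12
dx = L / N
x = (np.arange(N) - N // 2) * dx

# an arbitrary (hypothetical) square-integrable amplitude: a modulated Gaussian
psi = np.exp(-1.3 * x**2) * np.exp(2j * np.pi * 0.7 * x)
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)        # normalize: integral of |psi|^2 dx = 1

# continuous FT (convention e^{-2*pi*i*x*xi}) approximated by a Riemann sum / FFT
xi = np.fft.fftshift(np.fft.fftfreq(N, d=dx))
phi = dx * np.fft.fftshift(np.fft.fft(np.fft.ifftshift(psi)))
dxi = xi[1] - xi[0]

rho, gamma = np.abs(psi)**2, np.abs(phi)**2
print(np.sum(rho) * dx)     # ~ 1.0
print(np.sum(gamma) * dxi)  # ~ 1.0  -> gamma is also a probability density
```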
Stating the inequalities from Lp space
As we discussed in the previous article regarding Lp spaces, it turns out that p=2 is the only choice that gives us a proper “circle” concept, where distance and angle behave uniformly. In this space, the triangle inequality is a little bit special: it becomes the tightest possible bound. You can’t get any better than that!
Loosely speaking, in 2D Cartesian coordinates (flat Euclidean geometry, which is where the L2 space lives), any right triangle satisfies a natural inequality: the hypotenuse has to be shorter than the other two sides combined.
This simple idea is what underlies another theorem, the Hausdorff-Young inequality, which I will call HY from now on. It is closely tied to the Plancherel Theorem: it can be obtained by interpolating Plancherel (the p=2 case) with the elementary L1 bound on the Fourier Transform.
The full general statement of HY goes like this:
Suppose that p (the exponent of the Lp space) is in the inclusive range [1,2].
There is a corresponding conjugate exponent q with the relation:

$$\frac{1}{p} + \frac{1}{q} = 1$$

Then the inequality takes this form in n dimensions:

$$\left( \int_{\mathbb{R}^n} |\phi(\xi)|^q \, d\xi \right)^{1/q} \le \left( \int_{\mathbb{R}^n} |\psi(x)|^p \, dx \right)^{1/p}$$

In its sharp form (due to Babenko and Beckner), the right-hand side carries an extra constant factor $A(p)^n$, with $A(p) = \left(p^{1/p} / q^{1/q}\right)^{1/2} \le 1$; this is the constant A that will appear in the bound later on.
For our case we will eventually set p=2, but let’s keep the general exponent for now. Simplifying the notation (hiding the argument and the integral by using the norm notation), with the functions we already defined above:

$$\|\phi\|_q \le \|\psi\|_p$$
You might be wondering: since we can choose an arbitrary function to play the role of ψ(x), couldn’t we swap the two functions and apply the inequality the other way around? So how come this is correct?
The reason we can swap the roles of the functions (the Fourier Transform is symmetric, after all) is that the inequality always runs from the lower exponent p ≤ 2 to the higher exponent q ≥ 2: whichever function we transform, its q-norm is controlled by the other one’s p-norm. In the case p=2, the most optimal one, the two sides coincide and the bound becomes an equality due to the Plancherel Theorem.
The important insight from this inequality is that it is not possible to make both sides as concentrated as you like. If you make one function dense/narrow, the inequality ensures that its Fourier dual becomes wider.
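To make this concrete, here is a rough numerical sketch (my own, using the same e^{−2πi x ξ} convention as above) that checks the HY inequality ‖ϕ‖_q ≤ ‖ψ‖_p for a few values of p in [1,2], with a Gaussian ψ of adjustable width. The p = 2 row reproduces the Plancherel equality up to discretization error.

```python
import numpy as np

# Rough check of Hausdorff-Young: for 1 < p <= 2 and 1/p + 1/q = 1,
# the q-norm of the Fourier dual is bounded by the p-norm of the original,
# with equality at p = 2 (Plancherel).

def lp_norm(f, d, p):
    """Riemann-sum approximation of the L^p norm of samples f with grid spacing d."""
    return (np.sum(np.abs(f)**p) * d) ** (1.0 / p)

L, N = 80.0, 2**14
dx = L / N
x = (np.arange(N) - N // 2) * dx
xi = np.fft.fftshift(np.fft.fftfreq(N, d=dx))
dxi = xi[1] - xi[0]

for alpha in (0.5, 2.0, 8.0):          # wider -> narrower test functions
    psi = np.exp(-alpha * x**2)
    phi = dx * np.fft.fftshift(np.fft.fft(np.fft.ifftshift(psi)))
    for p in (1.25, 1.5, 2.0):
        q = p / (p - 1.0)
        print(f"alpha={alpha:4.1f} p={p:.2f}: "
              f"||phi||_q = {lp_norm(phi, dxi, q):.4f} <= ||psi||_p = {lp_norm(psi, dx, p):.4f}")
```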
In our case we could immediately set p=2 and apply the Plancherel Theorem, but we will defer that and do it later. We are going to bring in the entropy first.
Linking inequalities to entropy
We limit our class of functions ψ to those in L2 whose corresponding probability distribution is ρ.
We can then calculate its entropy. We will use a general notion of entropy from information theory called the Rényi entropy.
You will see why this is a natural choice, just from its form.
The Rényi entropy Rα(P) of a probability distribution P(X) of a random variable X is defined as:
$$R_\alpha(P) = \frac{1}{1-\alpha} \ln\left( \|P\|_\alpha^\alpha \right)$$
The formula above says: take the probability P(x) of every state x of the random variable X, raise it to the power α, and collect everything into the α-norm ∥P∥α. As a reminder:

$$\|P\|_\alpha = \left( \int P(x)^\alpha \, dx \right)^{1/\alpha}$$

$$\|P\|_\alpha^\alpha = \int P(x)^\alpha \, dx$$
A very appropriate link to the Lp norms in the HY inequality we used before.
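For readers more comfortable with the discrete case, here is a tiny sketch (my own illustration, using the discrete analogue of the definition above with a sum instead of an integral) showing that the Rényi entropy approaches the Shannon entropy as α → 1.

```python
import numpy as np

# Discrete Renyi entropy: R_alpha(P) = ln(sum_i P_i^alpha) / (1 - alpha).
# As alpha -> 1 it converges to the Shannon entropy -sum_i P_i * ln(P_i).

def renyi_entropy(P, alpha):
    return np.log(np.sum(P ** alpha)) / (1.0 - alpha)

P = np.array([0.5, 0.25, 0.125, 0.125])
shannon = -np.sum(P * np.log(P))

for alpha in (0.5, 0.9, 0.99, 0.999, 1.001, 1.1):
    print(f"alpha = {alpha:6.3f}   R_alpha = {renyi_entropy(P, alpha):.6f}   (Shannon = {shannon:.6f})")
```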
A little bit of algebra
Because |ψ(x)|² = ρ(x) and the Rényi entropy uses P(x)^α, we are going to substitute p = 2α, and correspondingly q = 2β, so that the conjugate relation becomes 1/(2α) + 1/(2β) = 1. With this substitution we have $\|\psi\|_{2\alpha} = \|\rho\|_\alpha^{1/2}$ and $\|\phi\|_{2\beta} = \|\gamma\|_\beta^{1/2}$, so taking the logarithm of the HY inequality (in its sharp form, with the constant $A(2\alpha)^n$) gives:

$$\ln \|\gamma\|_\beta \le 2n \ln A(2\alpha) + \ln \|\rho\|_\alpha$$

From the definition of the Rényi entropy, $\ln \|\rho\|_\alpha = \frac{1-\alpha}{\alpha} R_\alpha(\rho)$ and $\ln \|\gamma\|_\beta = \frac{1-\beta}{\beta} R_\beta(\gamma) = -\frac{1-\alpha}{\alpha} R_\beta(\gamma)$, since the conjugate relation forces $\frac{1-\beta}{\beta} = -\frac{1-\alpha}{\alpha}$.
To clear the factor $\frac{1-\alpha}{\alpha}$, remember that p is in the range [1,2], which means α = p/2 lies in [1/2, 1] and 1−α is never negative.
So if we multiply both sides by $\frac{\alpha}{1-\alpha}$ (for α < 1) and rearrange, we get
$$R_\alpha(\rho) + R_\beta(\gamma) \ge n \cdot \frac{2\alpha}{\alpha - 1} \ln\left( A(2\alpha) \right)$$
On the right-hand side we have n, the number of dimensions of the space.
We also have $\frac{2\alpha}{\alpha - 1} \ln\left( A(2\alpha) \right)$, a constant that depends only on our choice of α.
Because we want to end up at p=2, which means α=1, we can’t evaluate the right-hand side immediately: it would involve dividing by α−1 = 0.
Taking the limit into Shannon Entropy
The constant in the bound, $C(\alpha) = \frac{2\alpha}{\alpha - 1} \ln\left( A(2\alpha) \right)$, has to be evaluated by taking its limit as α → 1.
As α → 1 (and hence β → 1), both entropies Rα and Rβ conveniently become the Shannon entropy.
We already know from above that $\lim_{\alpha \to 1} A(2\alpha) = 1$, which is where the HY inequality matches the Plancherel Theorem. This makes the limit an indeterminate form of type 0/0.
So let’s apply L’Hôpital’s rule.
But before that, let’s expand the expression by substituting

$$A(2\alpha) = \frac{(2\alpha)^{1/(4\alpha)}}{(2\beta)^{1/(4\beta)}}$$
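The remaining steps are not spelled out above, so here is a short sketch of how the limit works out (my own filling-in, using β = α/(2α−1) from the conjugate relation):

$$C(\alpha) = \frac{2\alpha}{\alpha - 1}\left[\frac{\ln(2\alpha)}{4\alpha} - \frac{\ln(2\beta)}{4\beta}\right] = \frac{f(\alpha)}{g(\alpha)}, \qquad f(\alpha) = \frac{\ln(2\alpha)}{4\alpha} - \frac{\ln(2\beta)}{4\beta}, \quad g(\alpha) = \frac{\alpha - 1}{2\alpha}$$

Both f and g vanish at α = 1, so L’Hôpital’s rule applies. Using $\frac{d\beta}{d\alpha} = -\frac{1}{(2\alpha - 1)^2}$, which equals −1 at α = 1:

$$\lim_{\alpha \to 1} C(\alpha) = \frac{f'(1)}{g'(1)} = \frac{\frac{1-\ln 2}{4} + \frac{1-\ln 2}{4}}{\frac{1}{2}} = 1 - \ln 2 = \ln\frac{e}{2}$$

So in the limit, the bound becomes

$$H(\rho) + H(\gamma) \ge n \ln\frac{e}{2}$$

where H denotes the Shannon (differential) entropy.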
This has a profound implication. Any function that has a corresponding probability distribution in the L2 sense carries information that is fundamentally limited by this entropic uncertainty.
From a mathematical observation alone, we know that such a function cannot have its information be certain in both its own domain and its Fourier dual domain.
You can make it as certain as you like in one domain, but you can’t make both certain at the same time. There is a mathematical limit on how certain you can be if you want to know both of them.
The limiting total entropy is exactly $\ln\frac{e}{2}$ per dimension at the very minimum.
Testing Entropic Uncertainty Principle
In the previous article, we derived that the normal/Gaussian distribution is the best possible probability distribution to guess with the least possible amount of information.
We figured that out using the Action principle and the Lagrangian method to derive the normal distribution.
One remarkable thing about this Entropic Uncertainty Principle is that you can immediately say, for any Gaussian/Normal probability distribution, that the total entropy of the distribution and its Fourier dual always has to be exactly $\ln\frac{e}{2}$.
Because, by the Maximum Entropy principle, the Gaussian/Normal distribution is the best distribution you can guess with very limited information: it has the highest entropy you can get (for a given variance). But it also sits at the smallest total entropy bound possible, because of the Entropic Uncertainty Principle.
Combining both principles, the inequality has to become an equality for every Gaussian, essentially by definition.
It’s very easy to derive it mathematically too.
Let’s say that our probability distribution is a Normal distribution of the form

$$\rho(x) = N^2 \exp\left(-2\alpha x^2\right)$$
Such that

$$\psi(x) = N \exp\left(-\alpha x^2\right)$$
Taking the Fourier Transform gives

$$\phi(\xi) = N \sqrt{\frac{\pi}{\alpha}} \exp\left(-\frac{\pi^2 \xi^2}{\alpha}\right)$$
And

$$\gamma(\xi) = N^2 \frac{\pi}{\alpha} \exp\left(-\frac{2\pi^2 \xi^2}{\alpha}\right)$$
The variances:

$$\sigma_x^2 = \frac{1}{4\alpha}, \qquad \sigma_\xi^2 = \frac{\alpha}{4\pi^2}$$
So whatever the original function is, if it is a Gaussian, then the product of the variances is:

$$\sigma_x^2 \, \sigma_\xi^2 = \frac{1}{4\alpha} \cdot \frac{\alpha}{4\pi^2} = \frac{1}{16\pi^2}$$
The parameter α, which controls how dense the Gaussian is (and is unrelated to the Rényi order α used earlier), simply cancels out.
We can then use this product to calculate the entropy.
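To close the loop, here is a small numerical sketch (my own check, not from the article) that plugs the two variances above into the Gaussian entropy formula H = ½ ln(2πeσ²) and confirms that the total always lands exactly on the ln(e/2) bound, independently of α.

```python
import numpy as np

# For the Gaussian family above: sigma_x^2 = 1/(4*alpha), sigma_xi^2 = alpha/(4*pi^2).
# The Shannon entropy of a Gaussian with variance s^2 is 0.5*ln(2*pi*e*s^2),
# so the sum H_x + H_xi should equal ln(e/2) for every alpha.

for alpha in (0.1, 1.0, 25.0):
    var_x = 1.0 / (4.0 * alpha)
    var_xi = alpha / (4.0 * np.pi ** 2)
    H_x = 0.5 * np.log(2 * np.pi * np.e * var_x)
    H_xi = 0.5 * np.log(2 * np.pi * np.e * var_xi)
    print(f"alpha={alpha:5.1f}  var product={var_x * var_xi:.6f}  "
          f"H_x + H_xi={H_x + H_xi:.6f}  ln(e/2)={np.log(np.e / 2):.6f}")
```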