Max Planck's Black Body Radiation, Part 2: Information-theoretic approach
In the previous post, Black Body Radiation derivation, we derived the black body radiation spectrum using Boltzmann statistics and the quantum energy postulate.
In this post, we are going to rederive it using information theory. We then show that the quantization of energy is just a direct consequence of the energy distribution implied by the experimental data.
By using this information-theoretic approach, hopefully you can see that Max Planck's postulate doesn't appear at random. It was rather an educated guess, a piece of scientific intuition, similar to using a Bayesian update to match the model's probability distribution to the real experimental data.
Perceiving radiation as information exchange
When observing black body radiation, we can perceive it as an exchange of information through measurements.
The things we can measure are the macroscopic variables:
- energy intensity
- frequency spectrum
- temperature
The set of measurements is the information, which is a spectrum as a function of frequency $\nu$ and temperature $T$. We have a plot of intensity, with the frequency of the radiation on the x-axis and the temperature controlling the shape of the curve and its peak intensity.
So we have observed some information. The spectrum itself is essentially an un-normalized distribution over all possible frequencies. Radiation can then be thought of as a means to transfer information. When the temperature is at equilibrium, that means:
- It is the maximum amount of information that can be transferred, since the distribution doesn’t change anymore.
- The information is mutual, in the sense that what we observe has to be exactly, or proportional to, the information each microstate wants to send to the observer.
In other words, the information entropy we calculate must be the same as the sum of the information entropies from the microstates of the black body/black box.
Calculating the information entropy of a single chunk of data
We will now consider whether the black body can be divided into an infinite or a finite number of smallest black bodies that can emit information.
Let's start as usual, with the continuous and discrete forms of information entropy. The variable to measure in this case is “energy”, since this is what we observe to differ by frequency.
Continuous entropy
Suppose there is a continuous distribution $p(E)$. With continuous energy, the Shannon (differential) entropy looks like this:
$$S = -\int_0^\infty p(E) \ln p(E)\, dE$$
To apply the Maximum Entropy principle, we gradually add the least-biased constraints.
The usual normalization constraint of a probability distribution:
$$\int_0^\infty p(E)\, dE = 1$$
The average energy constraint, the one we observed from the measurements:
$$\int_0^\infty E\, p(E)\, dE = \langle E \rangle$$
If you are still confused about why we use the average energy as a constraint, remember that the measured energy spectrum of a black body stabilizes into the same curve for the same temperature. In other words, the energy settles to the same value for a given frequency; it doesn't increase indefinitely. So the energy spectrum averages out to the same number.
Applying Lagrange multipliers for the Maximum Entropy principle (or, equivalently, a stationarity condition), we derive:
$$p(E) = e^{-1 + \lambda_0 + \lambda_1 E}$$
From the normalization constraint, this distribution has to integrate to 1 over $E \in [0, \infty)$.
Because a probability can't be negative and the normalization integral has to converge, this means that $\lambda_1$ is a negative constant. If we switch conventions so that $\lambda = -\lambda_1$ is a positive constant by flipping the sign, then:
$$p(E) \propto e^{-\lambda E}$$
The average energy then becomes
$$\langle E \rangle = \frac{1}{\lambda}$$
The probability distribution becomes
$$p(E) = \lambda\, e^{-\lambda E}$$
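For completeness, here is a compact sketch of the variational step behind that result (my own rendering, with explicit multipliers $\lambda_0$ and $\lambda_1$; the sign conventions are mine):
$$\mathcal{L}[p] = -\int_0^\infty p \ln p\, dE + \lambda_0\left(\int_0^\infty p\, dE - 1\right) + \lambda_1\left(\int_0^\infty E\, p\, dE - \langle E \rangle\right)$$
$$\frac{\delta \mathcal{L}}{\delta p(E)} = -\ln p(E) - 1 + \lambda_0 + \lambda_1 E = 0 \quad\Longrightarrow\quad p(E) = e^{-1+\lambda_0}\, e^{\lambda_1 E}$$
Normalizability over $E \in [0, \infty)$ then forces $\lambda_1 < 0$, which is exactly the sign flip to $\lambda = -\lambda_1$ used above.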
Discrete entropy
If we assume the Shannon entropy to be discrete, because the space of states is finite/discrete, the entropy formula uses a sum over the discrete states:
$$S = -\sum_i p_i \ln p_i$$
Applying Max Entropy the same way, using Lagrange multipliers for both the normalization and the average energy constraints, we get:
$$p_i = e^{-1 + \lambda_0 + \lambda_1 E_i}$$
Not much of a difference, don't you think? Requiring the probabilities to be normalizable, we can again choose a positive constant $\lambda = -\lambda_1$. However, this time, because the energy states are discrete, the normalization factor is a discrete sum:
$$\sum_i e^{-\lambda E_i}$$
Notice that we rename the sum as $Z(\lambda)$ for convenience, because it depends on the parameter $\lambda$. The probability distribution can then be simplified into:
$$p_i = \frac{e^{-\lambda E_i}}{Z(\lambda)}, \qquad Z(\lambda) = \sum_i e^{-\lambda E_i}$$
With average energy:
$$\langle E \rangle = \sum_i E_i\, p_i = -\frac{\partial \ln Z}{\partial \lambda}$$
If we substitute $p_i$ back in, then
$$\langle E \rangle = \frac{\sum_i E_i\, e^{-\lambda E_i}}{\sum_i e^{-\lambda E_i}}$$
You might say that this doesn’t look like Planck’s quantum distribution at all. The form is still very similar to the average energy we get from continuous Shannon entropy. So why bother?
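As a quick numerical sanity check (my own sketch, not part of the original derivation), we can maximize the discrete entropy under these two constraints with a generic constrained optimizer and confirm that the solution indeed has the exponential form $p_i \propto e^{-\lambda E_i}$. The energy levels and the target average below are arbitrary illustrative values:

```python
import numpy as np
from scipy.optimize import minimize

# hypothetical discrete energy levels (arbitrary units) and a target average energy
E = np.linspace(0.0, 5.0, 6)
E_target = 1.2

def neg_entropy(p):
    p = np.clip(p, 1e-12, None)
    return np.sum(p * np.log(p))

constraints = (
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},           # normalization
    {"type": "eq", "fun": lambda p: np.sum(p * E) - E_target},  # average energy
)
p0 = np.full(len(E), 1.0 / len(E))
res = minimize(neg_entropy, p0, method="SLSQP",
               bounds=[(0.0, 1.0)] * len(E), constraints=constraints)

# the maximizer should follow p_i ~ exp(-lambda * E_i),
# i.e. log(p_i) should be linear in E_i with slope -lambda
slope, _ = np.polyfit(E, np.log(np.clip(res.x, 1e-12, None)), 1)
print(f"average energy: {np.sum(res.x * E):.3f} (target {E_target})")
print(f"slope of log p_i vs E_i: {slope:.3f} (this is -lambda)")
```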
Energy statistics and information
By assuming that each chunk of the black body emits its average energy as a form of information exchange, we can then relate the information to the physical measurements.
We have experimental data of the spectrum $u(\nu, T)$, which is a function of frequency and temperature. It is usually plotted with a continuous range of $\nu$ on the x-axis and $u(\nu, T)$ on the y-axis, for a specific temperature $T$. This is because it is not easy to vary the temperature smoothly during an observation: all the measurements have to be done at thermal equilibrium, and varying the temperature destroys that equilibrium. So the usual way to measure is to choose an exact temperature for the black body cavity (which can be calibrated using the wall's temperature), wait until the temperature is uniform, record the spectrum, and then move on to a different temperature.
Thus, we can imagine that for one session of observation/experiment, we choose a temperature $T$. The radiation emitted by the black body cavity is detected at a specific frequency $\nu$, and we record the energy density. We then get the value of the intensity $u(\nu, T)$.
From the information-theoretic perspective, the individual black body chunks must have some kind of microstates. It doesn't matter what they are. But since we assume that the microstates depend only on the frequency, we just label them by $\nu$. For a setup with parameters $\nu$ and $T$, we can then imply that the energy has some kind of distribution over the parameters $\nu$ and $T$. The information we retrieve from the radiation must then be proportional to the average energy of that distribution, $\langle E \rangle$.
Because for each session the temperature is fixed, let's call this average energy $\langle E_\nu \rangle$.
It is proportional to the microstate prefactor $g(\nu)$, which is also usually called the number of modes (explained later).
So the relation we have now is:
$$u(\nu, T) = g(\nu)\, \langle E_\nu \rangle$$
We now need to calculate $g(\nu)$, which, unfortunately, needs some intuition about the physics behind the model. So it can't be purely mathematical.
Remember that in the last article, Planck modeled the black body as a set of vibrating resonators. The resonator model uses electromagnetic waves, and its derivation is probably worth another article. But basically, for a frequency $\nu$, the emitted energy exists in multiple modes. So the total energy for a given frequency is the average energy times the total number of modes. The density of modes per unit volume per unit frequency is:
$$g(\nu) = \frac{8\pi\nu^2}{c^3}$$
Expand the mode density calculation here
We are skipping the full derivation of the mode density because it's not the focus of this article. Explaining it would require explaining how the electromagnetic wave model works inside the black body cavity. I'm just going to assume you understand the terminology.
Basically, an electromagnetic wave can travel in any direction in 3D space. An EM wave contains both an electric and a magnetic component, and it has two polarization modes, meaning that switching the polarization still carries the same energy (symmetry under polarization). So the mode count has to be multiplied by 2.
In a cubic cavity of volume $V = L^3$, the energy modes can only live as standing waves, because non-standing waves interfere with each other and cancel out. Only standing waves superpose constructively. However, the standing waves depend on the wavelength. For a cube with dimension $L$, the standing wave has to have a wavelength of $2L$, $L$, $\frac{2L}{3}$, and so on ($\lambda_n = \frac{2L}{n}$).
But we want to count the modes by frequency, not by wavelength.
For EM waves, the frequency is $\nu = \frac{c}{\lambda}$.
If we imagine that a wave can occupy a microstate, or configuration $(n_x, n_y, n_z)$, then it has to be a combination of possible standing waves in 3D space. So we have 3 axes to choose from. Along one axis, the cavity length has to be a multiple of $\frac{\lambda}{2}$, because $\lambda$ is one full wave and $L$ is one full length of the cavity; in terms of frequency, $n_i = \frac{2L\nu_i}{c}$.
Then the total number of modes per unit volume per unit frequency is the polarization factor, times the mode count density, times the spherical volume in mode space, which works out to:
$$g(\nu) = \frac{8\pi\nu^2}{c^3}$$
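As a compact sketch of that counting (my own rendering of the standard argument; the factor $\frac{1}{8}$ keeps only the octant with positive $n_x, n_y, n_z$):
$$\frac{N(\nu)}{V} = \frac{1}{L^3}\cdot 2 \cdot \frac{1}{8} \cdot \frac{4}{3}\pi n^3, \qquad n = \frac{2L\nu}{c} \quad\Longrightarrow\quad g(\nu) = \frac{d}{d\nu}\!\left(\frac{N(\nu)}{V}\right) = \frac{8\pi\nu^2}{c^3}$$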
The value of $g(\nu)$ is invariant to how the information itself is transferred. It's just a scaling factor from the contribution of the total number of microstates, or mode configurations, that the black body has.
So, how do we justify which energy distribution to choose, based on the experimental data?
This is where statistics comes into play. We basically want to check how well each of our two candidate distributions matches the experimental data.
In order to do that, we at least need to know the average and the variance of our predicted models. We then compare which one fits the distribution from our Maximum Entropy principle.
We have two parameters, $\nu$ and $T$. However, the average energy is written in terms of the parameter $\lambda$, assuming $\nu$ is fixed. We need to know how the distribution behaves when we vary $T$ at a fixed $\nu$.
Average Energy
From the thermodynamic relations of the partition function:
$$\langle E \rangle = -\frac{\partial \ln Z}{\partial \lambda}, \qquad \lambda = \frac{1}{k T}$$
The observed object only varies by temperature, and by the thermodynamic relations of the canonical ensemble, $\lambda = \frac{1}{kT}$. If we have multiple measurements of the black body radiation at different temperatures, then we can compute statistics to check how the energy depends on $T$. Knowing $T$ means knowing $\lambda$, so we can collect multiple values of $\langle E_\nu \rangle$ to derive its statistics.
Variance of Energy states
The variance of the energy states can also be computed from the partition function:
$$\mathrm{Var}(E) = \langle E^2 \rangle - \langle E \rangle^2 = \frac{\partial^2 \ln Z}{\partial \lambda^2} = -\frac{\partial \langle E \rangle}{\partial \lambda} = k T^2\, \frac{\partial \langle E \rangle}{\partial T}$$
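The last equality is just the chain rule applied to the identification $\lambda = \frac{1}{kT}$, since $T = \frac{1}{k\lambda}$ gives $\frac{dT}{d\lambda} = -kT^2$:
$$-\frac{\partial \langle E \rangle}{\partial \lambda} = -\frac{\partial \langle E \rangle}{\partial T}\,\frac{dT}{d\lambda} = k T^2\, \frac{\partial \langle E \rangle}{\partial T}$$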
Determining which distribution fits the blackbody spectral curves
Rewriting the equations above, for convenience, so that it is clear which parameters we are comparing:
$$\langle E_\nu \rangle = \frac{u(\nu, T)}{g(\nu)}, \qquad \mathrm{Var}(E_\nu) = k T^2\, \frac{\partial \langle E_\nu \rangle}{\partial T}$$
For sets of temperature measurements, we have data of the average energy per frequency, $\langle E_\nu \rangle$, where $g(\nu)$ corresponds to the total number of energy states (modes), which depends directly on the frequency $\nu$.
So we can calculate the variance of the energy values at each frequency $\nu$ from the varying temperature $T$.
The idea is basically as follows: we compare the theoretical variance with the experimental variance. Let's say the left-hand side is the theoretical variance (we derive it algebraically) and the right-hand side is the experimental variance. For a given frequency $\nu$, the experimental variance of the energy over all possible microstates can be computed at an exactly chosen $T$, from the average energy per frequency $\langle E_\nu \rangle$ and the derivative $\frac{\partial \langle E_\nu \rangle}{\partial T}$, which we can compute numerically.
At a given value of $\nu$ and $T$, the experimental data of $u(\nu, T)$ lets us retrieve $\langle E_\nu \rangle$, the average energy per frequency when the temperature is fixed. However, notice that at that point, due to thermal equilibrium, it is also the average energy per frequency when the frequency is fixed and the temperature varies.
So, essentially, the two averages coincide at that specific combination of $\nu$ and $T$. This way, we can compute the right-hand side numerically.
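Here is a minimal numerical sketch of that recipe (my own illustration, not measured data): take the continuous-entropy prediction $\langle E \rangle = kT$ as a stand-in for the measured average energy and verify that $k T^2\, \partial \langle E \rangle / \partial T$, computed with a finite-difference gradient, reproduces the analytic value $(kT)^2$:

```python
import numpy as np
from scipy.constants import k

# temperatures for one frequency bin (hypothetical measurement sessions)
T_vals = np.linspace(600.0, 2000.0, 25)

# stand-in "measured" average energy per frequency: the continuous
# MaxEnt prediction <E> = kT (i.e. lambda = 1/(kT))
E_avg = k * T_vals

# right-hand side: experimental variance k T^2 d<E>/dT, derivative taken numerically
var_rhs = k * T_vals**2 * np.gradient(E_avg, T_vals)

# analytic check: for <E> = kT, k T^2 d<E>/dT = (kT)^2
print(np.allclose(var_rhs, (k * T_vals)**2))  # True
```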
So the right-hand side will be the same for both the continuous and the discrete energy derivations.
The left-hand side, however, will be different.
If we calculate the variance for the continuous Shannon entropy derivation:
$$\mathrm{Var}(E) = \frac{1}{\lambda^2} = \langle E \rangle^2$$
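This follows from the moments of the exponential distribution $p(E) = \lambda e^{-\lambda E}$, whose second moment is $\langle E^2 \rangle = \frac{2}{\lambda^2}$:
$$\mathrm{Var}(E) = \langle E^2 \rangle - \langle E \rangle^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2} = (kT)^2$$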
Meanwhile, for the discrete Shannon entropy derivation, we won't have a closed-form algebraic solution unless we make some assumption about the relation between the discrete energy states $E_i$ and the frequency $\nu$. So here's where Planck's postulate comes into play.
Planck assumed that the energy comes in integer multiples of a discrete energy quantum $\varepsilon$, which is some function of $\nu$: $E_n = n\,\varepsilon(\nu)$, with $n = 0, 1, 2, \ldots$. This is the simplest arrangement of discrete energy states, where multiple energy packets of the same mode can accumulate without limit.
In other words, $Z(\lambda)$ can be rewritten in closed form, because it is now a geometric series:
$$Z(\lambda) = \sum_{n=0}^{\infty} e^{-\lambda n \varepsilon} = \frac{1}{1 - e^{-\lambda \varepsilon}}$$
The average energy is then:
$$\langle E \rangle = -\frac{\partial \ln Z}{\partial \lambda} = \frac{\varepsilon}{e^{\lambda \varepsilon} - 1}$$
The variance is then:
$$\mathrm{Var}(E) = \frac{\partial^2 \ln Z}{\partial \lambda^2} = \frac{\varepsilon^2\, e^{\lambda \varepsilon}}{\left(e^{\lambda \varepsilon} - 1\right)^2} = \langle E \rangle^2 + \varepsilon\, \langle E \rangle$$
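If you want to double-check that algebra, here is a quick symbolic verification with sympy (my own sketch, not part of the original derivation):

```python
import sympy as sp

lam, eps = sp.symbols("lam eps", positive=True)

# closed-form partition function for E_n = n*eps (geometric series)
Z = 1 / (1 - sp.exp(-lam * eps))

E_avg = -sp.diff(sp.log(Z), lam)    # average energy <E>
E_var = sp.diff(sp.log(Z), lam, 2)  # variance of the energy

# both differences should simplify to zero
print(sp.simplify(E_avg - eps / (sp.exp(lam * eps) - 1)))  # 0
print(sp.simplify(E_var - (E_avg**2 + eps * E_avg)))       # 0
```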
As you can see, the theoretical/algebraic derivation for the variance is different in the two cases, even though both are constrained to the same average energy per frequency. This way, which distribution we should use can be determined quite clearly from the experimental data.
Spoiler alert, the experimental variance only matches the discretized distribution. So we have no choice but to accept that the energy is quantized.
We can easily graph the difference between the continuous and the discretized distribution. As you can see above, the variances differ by $\varepsilon \langle E \rangle$, which is only affected by the unknown discretization factor $\varepsilon$.
Case closed.
A small simulation
If you are interested in seeing how we can compute this from the plot, we can make a little simulation, of course using a synthetic data set instead of actual experimental black body radiation data. We already know that Planck's radiation formula is the correct one.
Python Script (Collapsible)
import numpy as np
from scipy.constants import h, c, k
from scipy.optimize import curve_fit
np.random.seed(0)
nu = np.linspace(1e12, 3e14, 400) # Hz
T_vals = np.linspace(600, 2000, 25) # 25 temperatures
noise_level = 0.01
def planck_u(nu, T):
return (8*np.pi*h*nu**3 / c**3) / (np.exp(h*nu/(k*T)) - 1)
# generate noisy u
u = np.array([planck_u(nu, T)*(1 + noise_level*np.random.randn(len(nu))) for T in T_vals])
u = np.clip(u, a_min=0, a_max=None)
# compute average energy <E_nu> = u / g(nu), with mode density g(nu) = 8*pi*nu^2/c^3
E_nu = (c**3 / (8*np.pi*nu**2)) * u # shape (nT, n_nu)
# partial derivative of E_nu by T
dmu_dT = np.zeros_like(E_nu)
for j in range(len(nu)):
y = E_nu[:, j]
dmu_dT[:, j] = np.gradient(y, T_vals)
# compute experimental variance of energy states
T_matrix = T_vals[:, None]
Var_experimental = k * (T_matrix**2) * dmu_dT
# compute variance difference with continuous energy distribution
R_Var_discrete = Var_experimental - E_nu**2
# choose reference T (middle)
idx_ref = len(T_vals)//2
T_ref = T_vals[idx_ref]
mu_ref = E_nu[idx_ref]
R_ref = R_Var_discrete[idx_ref]
# estimate eps_hat = R/E_nu for E_nu>0
eps_hat = np.full_like(mu_ref, np.nan)
mask = mu_ref > 0
eps_hat[mask] = R_ref[mask] / mu_ref[mask]
# fit linear and power law
mask_fit = mask & np.isfinite(eps_hat) & (eps_hat > 0)
nu_fit = nu[mask_fit]
eps_fit = eps_hat[mask_fit]
def linear(nu, alpha):
return alpha * nu
popt_lin, pcov_lin = curve_fit(linear, nu_fit, eps_fit, maxfev=10000)
alpha_hat = popt_lin[0]
alpha_se = np.sqrt(np.diag(pcov_lin))[0]
log_nu = np.log(nu_fit)
log_eps = np.log(eps_fit)
slope, intercept = np.polyfit(log_nu, log_eps, 1)
p_hat = slope
A_hat = np.exp(intercept)
eps_lin_pred = linear(nu, alpha_hat)
eps_pow_pred = A_hat * nu**p_hat
rss_lin = np.nansum((eps_hat - eps_lin_pred)**2)
rss_pow = np.nansum((eps_hat - eps_pow_pred)**2)
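# sanity check (added): under Planck's postulate eps = h*nu, so the fitted linear
# slope alpha_hat should recover Planck's constant h, and the power-law exponent ~ 1
print(f"fitted slope alpha_hat = {alpha_hat:.3e} (Planck constant h = {h:.3e})")
print(f"power-law exponent p_hat = {p_hat:.2f} (expected ~ 1)")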
results = {
'nu': nu.tolist(),
'R_Var': R_Var_discrete.tolist(),
'E_nu': E_nu.tolist(),
'idx_ref': idx_ref,
'T_ref': T_ref,
'T_vals': T_vals.tolist(),
'eps_hat': eps_hat.tolist(),
'eps_lin_pred': eps_lin_pred.tolist(),
}
results
function generatePlot() {
const result = window.pythonResult
const generatedID = window.pyodideElementID
const plotID =`${generatedID}-plot`
let plotDiv = document.getElementById(plotID)
if(!plotDiv) {
plotDiv = document.createElement('div')
plotDiv.id = plotID
const parentDiv = document.getElementById(generatedID)
parentDiv.prepend(plotDiv)
}
const nu = result.get('nu')
const E_nu = result.get('E_nu')
const R_Var = result.get('R_Var')
const T_vals = result.get('T_vals')
const eps_hat = result.get('eps_hat')
const eps_lin_pred = result.get('eps_lin_pred')
const idx_ref = result.get('idx_ref')
const T_ref = result.get('T_ref')
const layoutVariance = {
title:`Energy variance difference at T=${T_vals[Math.floor(T_vals.length/2)]} K`,
xaxis: {
title: 'Frequency (Hz)',
showgrid: true,
},
yaxis: {
title: 'Variance - ⟨E⟩^2',
showgrid: true,
},
showlegend: true,
}
const trVar = {
x: nu.map((w) => w),
y: R_Var[idx_ref].map((w) => w),
mode: 'markers',
name: 'Variance - ⟨E⟩^2',
marker: {
color: 'red',
size: 3,
}
}
const trace1 = {
x: nu.map((w) => w),
y: eps_hat.map((w) => w),
mode: 'markers',
name: "Variance Ratio Data",
marker: {
color: 'blue',
size: 3
}
};
const trace2 = {
x: nu.map((w) => w),
y: eps_lin_pred.map((w) => w),
mode: 'lines',
name: "Linear Fit of Variance Ratio Data",
line: {
color: 'red',
width: 2
}
};
const layout = {
title: `Variance Ratio vs Frequency at T=${T_vals[Math.floor(T_vals.length/2)]} K`,
xaxis: {
title: 'Frequency (Hz)',
showgrid: true
},
yaxis: {
title: 'Energy quanta (ε = R/⟨E⟩)',
showgrid: true
},
showlegend: true
};
const divVar = document.createElement('div')
const divQuanta = document.createElement('div')
plotDiv.appendChild(divVar)
plotDiv.appendChild(divQuanta)
Plotly.newPlot(divVar, [trVar], layoutVariance)
Plotly.newPlot(divQuanta, [trace1, trace2], layout);
document.getElementById(`${generatedID}-spinner`).classList.add('hidden');
}
generatePlot()
In the graphs above, we can see that in the first graph $\mathrm{Var}(E_\nu) - \langle E_\nu \rangle^2$ is non-zero. This implies that the energy distribution is definitely not continuous but discretized, so that the variance is actually bigger than the classical limit.
The difference in variance is exactly $\varepsilon \langle E_\nu \rangle$, which we already discussed above.
Moreover, since we know $\langle E_\nu \rangle$, the average energy value from the experimental data, we can compute $\varepsilon$. The quantity $\varepsilon$ is simply the variance difference from the first graph, divided by the average energy.
Although the plot deliberately contains noise to perturb the average energy computation, we can clearly see that $\varepsilon$ can be smoothly interpolated by a straight line. This means that $\varepsilon$ scales linearly with $\nu$, which then suggests that the energy quantum is $\varepsilon = h\nu$, where $h$ can be computed from the gradient of the plot.
From this mechanism, we can be sure without a doubt that black body radiation implies that energy is transmitted in discrete chunks, with discrete Shannon information/entropy transfer.