
Sketch of Proof: Deriving the Prime Counting Function from First Principles



This document attempts to reconstruct the path to the explicit formula for the prime counting function $\pi(x)$, building from first principles using information theory, empirical observations, and Fourier analysis. We explore a possible discovery narrative: deriving mathematical tools as they become necessary, then revealing their modern names.

Important disclaimer: This is an exploratory sketch, not a rigorous proof. We stand on the shoulders of giants—Gauss, Riemann, Chebyshev, and countless others who developed these profound ideas over centuries. Our goal is to explore one possible path of understanding, knowing that the actual historical development was far more complex and rigorous. There may be oversimplifications or gaps in this presentation; for complete rigor, consult standard texts on analytic number theory.

The exploration moves from empirical observations (Gauss’s tables) through information-theoretic reasoning (Maximum Entropy principle) to spectral analysis (Fourier transforms), attempting to arrive at Riemann’s explicit formula. Each step asks: “How might we naturally discover this if we approached it fresh, with these particular tools?”


1. Empirical Starting Point: The Staircase Shape

Historical Context: Counting Primes Before Modern Theory

In the late 18th century, mathematicians including Gauss and Legendre compiled extensive tables of prime numbers. By manually checking divisibility, they identified all primes up to several hundred thousand. Gauss, reportedly as young as 15 or 16 years old, studied these tables not to find individual primes, but to understand the pattern of their distribution.

The prime counting function $\pi(x)$ counts how many primes exist less than or equal to $x$. For example:

  • $\pi(10) = 4$, because the primes up to 10 are $\{2, 3, 5, 7\}$
  • $\pi(100) = 25$
  • $\pi(1000) = 168$
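These values are easy to check directly. A minimal sketch using the sieve of Eratosthenes (the helper names are ours) reproduces the table:

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: return all primes <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            for m in range(n * n, limit + 1, n):
                is_prime[m] = False
    return [n for n, flag in enumerate(is_prime) if flag]

def prime_pi(x):
    """The prime counting function pi(x)."""
    return len(primes_up_to(int(x)))

print(prime_pi(10), prime_pi(100), prime_pi(1000))  # 4 25 168
```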

The Staircase Pattern

When we plot $\pi(x)$ as a function of $x$, we observe a distinctive staircase curve. The function remains constant between primes, then suddenly jumps by 1 each time we encounter a prime number. This creates a step function with an irregular, yet not completely random, pattern of jumps.

Despite the irregularity of individual steps, Gauss noticed something remarkable: when viewed from afar, the staircase appears to follow a smooth, upward trend. The density of primes seems to decrease as numbers get larger, but in a predictable way.

Gauss’s Empirical Observation

Through careful analysis of his tables, Gauss conjectured that the average density of primes near a large number $x$ is approximately $\frac{1}{\ln x}$. This means the number of primes up to $x$ should be roughly:

$$\pi(x) \approx \int_2^x \frac{dt}{\ln t}$$

This integral is called the logarithmic integral $\mathrm{Li}(x)$. For large $x$, it behaves asymptotically like $\frac{x}{\ln x}$, giving us the famous asymptotic relation:

$$\pi(x) \sim \frac{x}{\ln x}$$

This conjecture was not proven until 1896 (independently by Hadamard and de la Vallée Poussin), when it became known as the Prime Number Theorem (PNT).
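Both approximations can be compared numerically; a sketch using a simple trapezoidal rule for $\mathrm{Li}(x)$ (the step count and helper names are our choices) shows $\mathrm{Li}(x)$ tracking $\pi(x)$ much more tightly than $\frac{x}{\ln x}$ does:

```python
import math

def Li(x, steps=10000):
    """Logarithmic integral Li(x) = ∫_2^x dt/ln t, via the trapezoidal rule."""
    h = (x - 2.0) / steps
    total = 0.5 * (1 / math.log(2.0) + 1 / math.log(x))
    for i in range(1, steps):
        total += 1 / math.log(2.0 + i * h)
    return total * h

def prime_pi(x):
    """pi(x) by trial division -- slow but fine for small x."""
    count = 0
    for n in range(2, int(x) + 1):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

x = 1000
print(prime_pi(x), round(Li(x), 1), round(x / math.log(x), 1))  # 168 176.6 144.8
```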

Wave Decomposition Strategy

The staircase pattern suggests a natural mathematical approach. In Fourier analysis, we know that well-behaved step functions can be decomposed into a superposition of waves—a smooth principal component plus oscillatory corrections:

$$\pi(x) = \underbrace{\text{principal wave}}_{\text{smooth trend}} + \underbrace{\text{oscillatory modes}}_{\text{the steps}}$$

Gauss’s empirical observation (later the PNT) gives us the principal wave: $\frac{x}{\ln x}$. This represents the “DC component,” or average trend. But what creates the steps? What are the frequencies and amplitudes of the oscillatory modes that combine to produce the sharp jumps exactly at prime locations?

This is our quest: to derive the oscillatory corrections using first principles.

Next Step: Finding the Right Coordinate System

To analyze oscillations using Fourier methods, we need to choose an appropriate coordinate system and measure. The key question is: what symmetry does the prime distribution actually respect? Is it translation-invariant (constant spacing) or something else?


2. First Principle: Scale-Invariant Measure

The Problem of Measuring Prime Density

Consider a naive question: “What is the density of primes?” If we measure density as “primes per unit interval,” we immediately run into a problem. The density in the interval $[100, 200]$ (which contains 21 primes) is very different from the density in $[1000, 1100]$ (which contains only 16 primes), even though both intervals have the same width of 100.

This suggests that prime density is not translation-invariant: moving the interval while keeping its width constant changes the density.

Discovering Scale Invariance

Let’s try a different comparison. Consider these two intervals:

  • $[100, 200]$: width = 100, midpoint = 150, scale factor = 2
  • $[1000, 2000]$: width = 1000, midpoint = 1500, scale factor = 2

Even though the absolute widths differ by a factor of 10, both intervals represent a doubling of the lower bound. Empirically, when we count primes in these intervals and normalize appropriately, the densities become comparable!
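We can put numbers on this claim. Dividing each interval's prime count by the Gauss-style expected count $\text{width}/\ln(\text{midpoint})$ (the normalization choice is ours) brings the two intervals into close agreement:

```python
import math

def count_primes(a, b):
    """Count primes p with a < p <= b, by trial division."""
    def is_prime(n):
        return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))
    return sum(1 for n in range(a + 1, b + 1) if is_prime(n))

# prints: 100 200 21 1.05  /  1000 2000 135 0.99
for a, b in [(100, 200), (1000, 2000)]:
    count = count_primes(a, b)
    expected = (b - a) / math.log((a + b) / 2)  # Gauss: density ~ 1/ln x
    print(a, b, count, round(count / expected, 2))
```

Both normalized densities land close to 1, even though the raw counts (21 vs. 135) differ by a factor of six.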

This observation suggests that prime density respects scale invariance (multiplicative symmetry) rather than translation invariance (additive symmetry).

The Maximum Entropy Approach

In information theory, the Maximum Entropy principle states that when we have incomplete information, we should use the probability distribution that respects our known constraints while making no additional assumptions (maximizing entropy). This gives us the “least biased” framework for analysis.

Here, our constraint is scale invariance: the measure should treat multiplicative shifts symmetrically. The question becomes: what measure $d\mu$ on the real line respects multiplicative symmetry?

Deriving the Scale-Invariant Measure

For translation invariance, the natural measure is simply $dx$ (the standard Lebesgue measure). For scale invariance, we need a measure that remains unchanged under multiplicative scaling.

Consider the transformation $x \mapsto \lambda x$ for constant $\lambda > 0$. Under this scaling:

  • Standard measure: $dx \mapsto \lambda \, dx$ (scales with $\lambda$)
  • Logarithmic measure: $d(\ln x) = \frac{dx}{x} \mapsto \frac{d(\lambda x)}{\lambda x} = \frac{dx}{x}$ (invariant!)

Therefore, the natural scale-invariant measure is:

$$dt = \frac{dx}{x}$$

Integrating both sides:

$$t = \ln(x) + C$$

Since we’re interested in primes starting from 2 (the smallest prime), we can set $C = -\ln(2)$, giving us $t = \ln(x/2)$. For simplicity, we’ll often just work with $t = \ln(x)$, understanding that the constant affects only the reference point.
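A one-line numerical check of the invariance (the endpoints and scale factor below are arbitrary test values): the measure of $[a, b]$ under $dx/x$ is $\ln(b/a)$, which is unchanged when both endpoints are scaled by $\lambda$:

```python
import math

def log_measure(a, b):
    """Measure of [a, b] under dt = dx/x, i.e. ∫_a^b dx/x = ln(b/a)."""
    return math.log(b / a)

a, b, lam = 100.0, 200.0, 7.0
print(log_measure(a, b))              # ln 2 ≈ 0.6931
print(log_measure(lam * a, lam * b))  # same value: the measure is scale-invariant
```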

Working in Log-Space

This derivation tells us that logarithmic space is the natural coordinate system for studying prime distribution. In this coordinate system:

  • Integers are located at positions $t_n = \ln(n)$
  • Multiplicative relationships become additive: $\ln(ab) = \ln(a) + \ln(b)$
  • Scale-invariant properties become translation-invariant in $t$-space

This fundamental choice of measure will guide all subsequent analysis.

Next Step: Information Content of Individual Primes

Now that we have the right measure for density, we need to quantify the “information content” of each prime. When we discover that a number is prime, how much information have we gained? This requires understanding what it means to “locate” a prime in our logarithmic space.


3. Deriving the Information Weight Function

The Question: How Much Information is a Prime?

Imagine we’re systematically checking integers to determine which are prime. We’ve already tested all integers up to some value, and now we examine the next integer $n$. If it turns out to be prime, we’ve gained information—we’ve reduced our uncertainty about the location of primes.

But how much information? In information theory, the information gained from an event (its surprisal, or self-information) is quantified by Shannon’s formula: if an event has probability $p$, the information we gain when it occurs is

$$I = -\ln(p)$$

(using the natural logarithm). This is measured in nats, the natural-logarithm equivalent of bits.

Estimating the Probability of Primality

In our log-space coordinate system $t = \ln(x)$, integers are not uniformly spaced. They’re located at $t_n = \ln(n)$, and the spacing between consecutive integers near $n$ is:

$$\delta t_n = \ln(n+1) - \ln(n) = \ln\left(1 + \frac{1}{n}\right)$$

For large $n$, using the approximation $\ln(1 + \epsilon) \approx \epsilon$ for small $\epsilon$:

$$\delta t_n \approx \frac{1}{n}$$

Now, we derived that the natural measure in our space is $dt = \frac{dx}{x}$. At integer location $n$, the “weight” or “probability measure” of that position is proportional to the spacing $\delta t_n \approx \frac{1}{n}$.

If we model “discovering” a prime as randomly selecting a position in this measure, then finding a prime $p$ corresponds to an event with probability weight $\rho(p) \propto \frac{1}{p}$.

Calculating Information Content

The information content of finding the prime $p$ is:

$$I(p) = -\ln(\rho(p)) = -\ln\left(\frac{1}{p}\right) = \ln(p)$$

This is a beautiful result: the information content of discovering the prime $p$ is simply its natural logarithm.

Extension to Prime Powers

Consider prime powers $p^k$ for $k \geq 1$. The number $p^k$ is “constructed” from the prime $p$ exactly $k$ times (via unique prime factorization). If we think of information as additive—finding $p$ twice gives twice the information—then:

$$I(p^k) = k \cdot \ln(p)$$

Defining the Von Mangoldt Function

We formalize this by defining a function $\Lambda(n)$ that assigns an information weight to each positive integer:

$$\Lambda(n) = \begin{cases} \ln(p) & \text{if } n = p^k \text{ for some prime } p \text{ and positive integer } k \\ 0 & \text{otherwise} \end{cases}$$

This function is zero for $n = 1$ and for any number divisible by two distinct primes (like $6 = 2 \times 3$), because such numbers are “constructed from multiple different primes,” not powers of a single prime. Only prime powers carry this pure information signature.
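The case analysis translates directly into code. This sketch (the function name is ours) finds the smallest prime factor of $n$ and checks whether $n$ is a pure power of it:

```python
import math

def von_mangoldt(n):
    """Λ(n) = ln p if n = p^k for a prime p and k >= 1, else 0."""
    if n < 2:
        return 0.0
    for p in range(2, int(math.sqrt(n)) + 1):
        if n % p == 0:
            # p is the smallest prime factor; n must be a pure power of p
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return math.log(n)  # no factor found: n itself is prime

print(von_mangoldt(8))   # ln 2 ≈ 0.693 (8 = 2^3)
print(von_mangoldt(12))  # 0.0 (12 = 2^2 · 3 is not a prime power)
```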

Historical Note and Modern Name

What we’ve just derived from information-theoretic first principles is known in analytic number theory as the von Mangoldt function, named after the German mathematician Hans von Mangoldt who introduced it in the 1890s in the context of studying prime distribution and the Riemann zeta function.

Our derivation shows why this particular function is natural—it’s not an arbitrary choice, but rather the unique function that correctly weighs integers by their information content in log-space under scale-invariant measure.

Next Step: Accumulating Information

We now have the information weight $\Lambda(n)$ for individual integers. To understand the global distribution pattern of primes, we need to accumulate this information over ranges. How much total information is encoded in all integers up to $x$?


4. Building the Cumulative Information Function

Defining Total Information Up to $x$

Having established that each prime power $p^k$ carries information weight $\Lambda(p^k) = \ln(p)$, we naturally want to accumulate this information to understand the distribution pattern. Define:

$$\psi(x) = \sum_{n \leq x} \Lambda(n) = \sum_{p^k \leq x} \ln(p)$$

where the second sum runs over all prime powers $p^k$ (primes and their integer powers) not exceeding $x$.

This function $\psi(x)$ represents the total information content encoded in all integers up to $x$, weighted by their fundamental prime structure.

Properties and Behavior

Let’s understand how $\psi(x)$ behaves:

Jumps at prime powers: The function is constant between prime powers, then jumps by $\ln(p)$ whenever we encounter $p^k$:

  • At $x = 2$: jump of $\ln(2)$
  • At $x = 3$: jump of $\ln(3)$
  • At $x = 4 = 2^2$: jump of $\ln(2)$ (second power of 2)
  • At $x = 5$: jump of $\ln(5)$
  • At $x = 7$: jump of $\ln(7)$
  • At $x = 8 = 2^3$: jump of $\ln(2)$ (third power of 2)

Dominant contribution from first powers: Most of the contribution to $\psi(x)$ comes from the primes themselves ($k=1$), not from higher powers. This is because:

$$\psi(x) = \sum_{p \leq x} \ln(p) + \sum_{p^2 \leq x} \ln(p) + \sum_{p^3 \leq x} \ln(p) + \cdots$$

The number of primes up to $x$ is roughly $\frac{x}{\ln x}$ (by the PNT), but the number of prime squares up to $x$ is only $\pi(\sqrt{x}) \approx \frac{2\sqrt{x}}{\ln x}$, the number of prime cubes $\pi(x^{1/3}) \approx \frac{3 x^{1/3}}{\ln x}$, and so on. These higher-power terms are increasingly negligible.

Relationship to $\pi(x)$: The prime counting function $\pi(x)$ counts primes with unit weight, while $\psi(x)$ weights them logarithmically. The two are related, but $\psi(x)$ will turn out to have cleaner analytic properties.
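Accumulating $\Lambda(n)$ by brute force makes the linear trend visible; this sketch (reusing a simple $\Lambda$ implementation, names ours) shows $\psi(x)/x$ approaching 1:

```python
import math

def von_mangoldt(n):
    """Λ(n) = ln p if n = p^k for a prime p and k >= 1, else 0."""
    if n < 2:
        return 0.0
    for p in range(2, int(math.sqrt(n)) + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return math.log(n)

def psi(x):
    """Second Chebyshev function ψ(x) = Σ_{n ≤ x} Λ(n)."""
    return sum(von_mangoldt(n) for n in range(2, int(x) + 1))

for x in [10, 100, 1000]:
    print(x, round(psi(x), 2), round(psi(x) / x, 3))
```

The ratio $\psi(x)/x$ creeps toward 1, which is the Prime Number Theorem in the form $\psi(x) \sim x$ derived below.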

Why Use $\psi(x)$ Instead of $\pi(x)$?

At first glance, we’ve moved away from our original goal (analyzing $\pi(x)$). Why introduce this new function $\psi(x)$?

Several reasons:

  1. Natural from information theory: the weights $\ln(p)$ emerged naturally from our Maximum Entropy framework
  2. Multiplicative structure: $\psi(x)$ respects the multiplicative structure of integers (prime factorization)
  3. Analytic properties: as we’ll see, $\psi(x)$ has better behavior under Fourier transforms
  4. Recoverable: we can convert back to $\pi(x)$ from $\psi(x)$ later (Section 8)

Historical Note and Modern Name

The function $\psi(x)$ we’ve constructed is called the Chebyshev function (specifically, the second Chebyshev function), named after Pafnuty Chebyshev, who used it in the 1850s in his work on prime number distribution. Chebyshev proved important bounds on $\pi(x)$ using this function, coming tantalizingly close to proving the Prime Number Theorem decades before it was actually proven.

Our derivation shows this function isn’t an arbitrary mathematical trick—it’s the natural cumulative information function when primes are viewed through the lens of information theory.

Next Step: Spectral Decomposition

We’ve claimed that $\psi(x)$ can be decomposed into a principal wave (smooth trend) plus oscillatory modes (creating the steps). To find these modes, we need to compute the frequency spectrum of our information weight function $\Lambda(n)$. This requires Fourier analysis in the appropriate space.


5. Fourier Analysis: Finding the Frequency Spectrum

The Goal: Decomposing Information into Frequencies

We want to understand $\psi(x) = \sum_{n \leq x} \Lambda(n)$ as a wave superposition. In classical Fourier analysis, we transform a signal from the time/space domain to the frequency domain to understand its spectral components. Here, we’ll transform $\Lambda(n)$ from the “integer domain” to a “frequency domain” parameterized by a complex frequency $s$.

5.1 The Forward Transform: Signal to Frequency

Direction and Analogy: In standard continuous Fourier analysis, the forward transform (signal → frequency) is:

$$\hat{f}(\omega) = \int_{-\infty}^{\infty} f(t) \, e^{-i\omega t} \, dt$$

Note the negative exponent in $e^{-i\omega t}$—this is characteristic of the forward transform.

Our Discrete Case: Our signal $\Lambda(n)$ is defined only at discrete integer points $n = 1, 2, 3, \ldots$. We cannot evaluate $\Lambda(2.5)$ or $\Lambda(\pi)$—only integers have prime factorizations! Therefore, we must use a sum rather than an integral:

$$F(s) = \sum_{n=1}^{\infty} \Lambda(n) \cdot (\text{kernel})$$

Finding the Kernel: Recall that in log-space, the integer $n$ is at position $t_n = \ln(n)$, so $n = e^{t_n}$. By analogy with the continuous case, where the kernel is $e^{-i\omega t}$, our kernel should be:

$$e^{-s t_n} = e^{-s \ln n} = n^{-s}$$

Here $s$ plays the role of a complex frequency (a generalization of $i\omega$).

Our Forward Transform:

$$F(s) = \sum_{n=1}^{\infty} \Lambda(n) \, n^{-s} = \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}$$

This is a Dirichlet series, the discrete analogue of the Laplace transform or Mellin transform. The negative power $n^{-s}$ corresponds to the negative exponent in the forward Fourier transform.

5.2 Connecting to Statistical Mechanics

Challenge: Computing $F(s) = \sum \frac{\Lambda(n)}{n^s}$ directly seems difficult because $\Lambda(n)$ is zero for most integers. Can we find a better way to understand this sum?

Analogy with Statistical Mechanics: In statistical mechanics, the partition function encodes all information about a system:

$$Z = \sum_{\text{states } i} e^{-\beta E_i}$$

where $E_i$ is the energy of state $i$, and $\beta = \frac{1}{kT}$ is the inverse temperature. The average energy is related to the partition function by:

$$\langle E \rangle = -\frac{\partial \ln Z}{\partial \beta} = -\frac{Z'(\beta)}{Z(\beta)}$$

Our Analogue: In our setup:

  • States are labeled by the integers $n$
  • The “energy” of state $n$ is its information content $\Lambda(n)$
  • The parameter $s$ plays the role of the inverse temperature $\beta$
  • The weighting factor $n^{-s} = e^{-s \ln n}$ resembles the Boltzmann factor $e^{-\beta E}$

Our sum $F(s) = \sum \frac{\Lambda(n)}{n^s}$ represents the average information per state, weighted by $n^{-s}$. By analogy with statistical mechanics, this should be related to the derivative of some partition function:

$$F(s) = -\frac{d}{ds} \ln Z(s) = -\frac{Z'(s)}{Z(s)}$$

for an appropriate partition function $Z(s)$.

5.3 Solving for the Partition Function

Setting Up the Differential Equation: We have:

$$-\frac{Z'(s)}{Z(s)} = \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}$$

Equivalently:

$$\frac{d}{ds} \ln Z(s) = -\sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}$$

Using the Prime Power Structure: Recall that $\Lambda(p^k) = \ln(p)$ for prime powers. We can reorganize the sum:

$$\sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s} = \sum_{p \text{ prime}} \sum_{k=1}^{\infty} \frac{\ln(p)}{p^{ks}}$$

Therefore:

$$\frac{d}{ds} \ln Z(s) = -\sum_{p} \sum_{k=1}^{\infty} \frac{\ln(p)}{p^{ks}} = -\sum_{p} \ln(p) \sum_{k=1}^{\infty} \frac{1}{p^{ks}}$$

Integrating Both Sides: We integrate with respect to $s$:

$$\ln Z(s) = -\sum_{p} \ln(p) \int \sum_{k=1}^{\infty} \frac{1}{p^{ks}} \, ds$$

The inner integral, term by term:

$$\int p^{-ks} \, ds = \frac{p^{-ks}}{-k \ln(p)} = -\frac{1}{k \ln(p)} \, p^{-ks}$$

Therefore:

$$\ln Z(s) = -\sum_{p} \ln(p) \sum_{k=1}^{\infty} \left( -\frac{1}{k \ln(p)} \, p^{-ks} \right) = \sum_{p} \sum_{k=1}^{\infty} \frac{p^{-ks}}{k}$$

(The constant of integration is fixed by requiring $\ln Z(s) \to 0$ as $s \to +\infty$, where every term $p^{-ks}$ vanishes.)

Recognizing the Series: We have the Taylor series:

$$-\ln(1-x) = \sum_{k=1}^{\infty} \frac{x^k}{k} \quad \text{for } |x| < 1$$

Setting $x = p^{-s}$:

$$\ln Z(s) = \sum_{p} \sum_{k=1}^{\infty} \frac{p^{-ks}}{k} = -\sum_p \ln(1 - p^{-s}) = \ln \prod_p \frac{1}{1 - p^{-s}}$$

Solution - The Euler Product:

$$Z(s) = \prod_{p \text{ prime}} \frac{1}{1 - p^{-s}}$$

This beautiful product formula over primes is called the Euler product, discovered by Leonhard Euler in 1737.

Connection to All Integers: By the fundamental theorem of arithmetic (unique prime factorization), when we expand this product, each integer appears exactly once:

$$Z(s) = \prod_p \frac{1}{1-p^{-s}} = \prod_p \left(1 + p^{-s} + p^{-2s} + p^{-3s} + \cdots\right) = \sum_{n=1}^{\infty} \frac{1}{n^s}$$

Every integer $n$ has a unique factorization $n = \prod_i p_i^{k_i}$, contributing $\prod_i p_i^{-k_i s} = n^{-s}$ to the sum.
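The identity can be verified numerically at a real point such as $s = 2$, where the sum side is known to equal $\frac{\pi^2}{6}$; truncating the product at the primes below 1000 (our choice of cutoff) already agrees to about three decimal places:

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            for m in range(n * n, limit + 1, n):
                is_prime[m] = False
    return [n for n, f in enumerate(is_prime) if f]

s = 2.0
product = 1.0
for p in primes_up_to(1000):
    product *= 1.0 / (1.0 - p ** -s)  # one Euler factor per prime

print(product, math.pi ** 2 / 6)  # both ≈ 1.645
```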

Historical Note and Modern Name

What we just discovered—solving for the partition function from first principles—is the Riemann zeta function:

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p \text{ prime}} \frac{1}{1-p^{-s}}$$

It is named after Bernhard Riemann, who in his famous 1859 paper extended this function to the complex plane and discovered its profound connection to prime distribution. Euler had studied $\zeta(s)$ for real values in the 1730s, but Riemann’s insight was to treat $s$ as a complex variable and study the function’s zeros.

Our Discovery: The frequency spectrum of the information function $\Lambda(n)$ is:

$$F(s) = -\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}$$

The zeros and poles of $\zeta(s)$ encode the entire structure of prime distribution!
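We can sanity-check this spectrum at a real frequency such as $s = 2$ by computing both sides with brute-force truncation (the cutoffs are our choices); $\zeta'(s)$ is obtained from its defining series $-\sum \frac{\ln n}{n^s}$:

```python
import math

def von_mangoldt(n):
    """Λ(n) = ln p if n = p^k for a prime p and k >= 1, else 0."""
    if n < 2:
        return 0.0
    for p in range(2, int(math.sqrt(n)) + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return math.log(n)

s, N = 2.0, 20000
lhs = sum(von_mangoldt(n) / n ** s for n in range(2, N))

# -zeta'(s)/zeta(s) from the defining series: zeta'(s) = -Σ ln(n)/n^s
zeta = sum(1.0 / n ** s for n in range(1, N))
zeta_prime = -sum(math.log(n) / n ** s for n in range(2, N))
rhs = -zeta_prime / zeta

print(round(lhs, 3), round(rhs, 3))  # both ≈ 0.57
```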

Next Step: The Inverse Transform

We’ve found the frequency spectrum $F(s) = -\frac{\zeta'(s)}{\zeta(s)}$. Now we need to invert the transform to recover $\psi(x) = \sum_{n \leq x} \Lambda(n)$ in the original space. What is the formula for the inverse transform?


6. Inverse Transform: From Frequency Back to Signal

The Challenge: Recovering the Cumulative Sum

We have the frequency spectrum:

$$F(s) = -\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}$$

But we want the cumulative sum $\psi(x) = \sum_{n \leq x} \Lambda(n)$, not the infinite series. How do we extract only the terms with $n \leq x$?

6.1 Deriving the Inverse Transform

Strategy from Complex Analysis: Consider the product:

$$F(s) \cdot x^s = \sum_{n=1}^{\infty} \Lambda(n) \cdot \frac{x^s}{n^s} = \sum_{n=1}^{\infty} \Lambda(n) \left(\frac{x}{n}\right)^s$$

Each term involves $(x/n)^s$. When $n < x$, the ratio $x/n > 1$; when $n > x$, the ratio $x/n < 1$. If we could somehow “pick out” the terms with $x/n > 1$, we’d extract exactly the terms with $n \leq x$.

The Heaviside Step Function in the Complex Domain: There’s a beautiful identity from complex analysis, related to Cauchy’s integral formula:

$$\frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{y^s}{s} \, ds = \begin{cases} 1 & \text{if } y > 1 \\ 0 & \text{if } 0 < y < 1 \end{cases}$$

This integral along a vertical line in the complex plane acts as a Heaviside step function in $y$ (at $y = 1$ it gives $\frac{1}{2}$). The parameter $c > 0$ must be chosen so the vertical line $\operatorname{Re}(s) = c$ lies in the region where the integral converges.
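The step-function behavior can be seen numerically by truncating the contour at height $T$; the choices of $c$, $T$, and grid spacing below are ours, and convergence is slow (error roughly $O(1/T)$), but the 1-versus-0 dichotomy is already clear:

```python
import cmath

def perron_step(y, c=2.0, T=300.0, dt=0.01):
    """Truncated Perron integral (1/2πi) ∫ y^s/s ds along Re(s) = c."""
    n_steps = int(2 * T / dt)
    total = 0j
    for k in range(n_steps + 1):
        s = complex(c, -T + k * dt)  # s = c + it
        total += y ** s / s * dt     # crude Riemann sum
    # ds = i·dt, so the 1/(2πi) prefactor becomes 1/(2π); keep the real part
    return (total / (2 * cmath.pi)).real

print(round(perron_step(2.0), 2))  # close to 1 (y > 1)
print(round(perron_step(0.5), 2))  # close to 0 (y < 1)
```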

Applying to Our Problem: We apply this identity with $y = x/n$:

$$\frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} F(s) \cdot \frac{x^s}{s} \, ds = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \left(\sum_n \Lambda(n) \frac{(x/n)^s}{s}\right) ds$$

Assuming we can interchange sum and integral (valid for appropriate $c$, here $c > 1$):

$$= \sum_n \Lambda(n) \cdot \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{(x/n)^s}{s} \, ds$$

By the step-function identity:

  • When $n < x$: the integral equals 1, contributing $\Lambda(n)$
  • When $n > x$: the integral equals 0, contributing nothing

Result:

$$\psi(x) = \sum_{n \leq x} \Lambda(n) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} F(s) \cdot \frac{x^s}{s} \, ds$$

This is the inverse transform we sought!

Historical Note and Modern Name

What we just derived is known as Perron’s formula, named after Oskar Perron who formalized it in 1908. It’s the discrete analogue of the inverse Mellin transform. Our derivation from Cauchy’s integral formula shows why this particular contour integral formula is natural—it’s precisely the tool needed to extract truncated sums from infinite series.

6.2 Applying to Our Specific Case

Substituting $F(s) = -\frac{\zeta'(s)}{\zeta(s)}$:

$$\psi(x) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \left(-\frac{\zeta'(s)}{\zeta(s)}\right) \cdot \frac{x^s}{s} \, ds$$

This is a complex contour integral along the vertical line $\operatorname{Re}(s) = c$ (with $c > 1$) in the complex plane. To evaluate it, we’ll use the residue theorem from complex analysis.

6.3 Evaluating via Residue Theorem

Strategy: We shift the contour of integration to the left (decreasing the real part), picking up residues at the poles of the integrand $-\frac{\zeta'(s)}{\zeta(s)} \cdot \frac{x^s}{s}$.

Poles of the Integrand:

  1. Simple pole at $s = 1$: The Riemann zeta function has a simple pole at $s = 1$ because the harmonic series $\sum \frac{1}{n}$ diverges. Near $s = 1$, $\zeta(s) \approx \frac{1}{s-1}$, and therefore $-\frac{\zeta'(s)}{\zeta(s)} \approx \frac{1}{s-1}$.

    The residue at $s = 1$ of $-\frac{\zeta'(s)}{\zeta(s)} \cdot \frac{x^s}{s}$ is: $\operatorname{Res}_{s=1} = \frac{x^1}{1} = x$

  2. Poles at the non-trivial zeros of $\zeta(s)$: Whenever $\zeta(\rho) = 0$, the function $-\frac{\zeta'(s)}{\zeta(s)}$ has a pole. For a simple zero, the residue of $-\frac{\zeta'(s)}{\zeta(s)}$ is $-1$, so: $\operatorname{Res}_{s=\rho} = -\frac{x^\rho}{\rho}$

    There are infinitely many such zeros!

  3. Poles at the trivial zeros of $\zeta(s)$: Through analytic continuation, $\zeta(s)$ extends to the entire complex plane and has zeros at the negative even integers $s = -2, -4, -6, \ldots$ (called “trivial zeros”). Each contributes a residue: $\operatorname{Res}_{s=-2k} = -\frac{x^{-2k}}{-2k} = \frac{x^{-2k}}{2k}$

    For large $x$, these terms decay rapidly (like $\frac{1}{x^2}, \frac{1}{x^4}, \ldots$) and contribute negligibly. Their sum has a closed form: $\sum_{k=1}^{\infty} \frac{x^{-2k}}{2k} = -\frac{1}{2}\ln(1-x^{-2})$

    This uses the Taylor series $\ln(1-y) = -\sum_{k=1}^{\infty} \frac{y^k}{k}$ with $y = x^{-2}$.

  4. Pole at $s = 0$: The integrand has a pole at $s = 0$ from two sources:

    • the factor $\frac{1}{s}$ in $\frac{x^s}{s}$
    • the behavior of $\zeta(s)$ near $s = 0$

    To evaluate this residue, we need two values that follow from the functional equation of $\zeta(s)$:

    • $\zeta(0) = -\frac{1}{2}$
    • $\zeta'(0) = -\frac{1}{2}\ln(2\pi)$

    Near $s = 0$ we have $\zeta(s) \approx -\frac{1}{2} - \frac{1}{2}\ln(2\pi) \, s + O(s^2)$, so $-\frac{\zeta'(0)}{\zeta(0)} = -\ln(2\pi)$. Since $\frac{x^s}{s}$ has a simple pole with residue 1 at $s = 0$, the residue of the full integrand is: $\operatorname{Res}_{s=0} = -\ln(2\pi)$

    Note: The functional equation and these specific values come from deeper properties of $\zeta(s)$ that we haven’t derived here. In a complete treatment, one would prove the functional equation using Poisson summation or theta-function techniques. For our sketch, we state these known values.
Collecting All Residues: By the residue theorem, summing the contributions from all poles:

$$\psi(x) = \underbrace{x}_{\text{pole at } s=1} - \underbrace{\sum_{\rho} \frac{x^\rho}{\rho}}_{\text{non-trivial zeros}} \underbrace{-\ \ln(2\pi)}_{\text{pole at } s=0} \underbrace{-\ \frac{1}{2}\ln(1-x^{-2})}_{\text{trivial zeros}}$$

where the sum runs over all non-trivial zeros $\rho$ of $\zeta(s)$ (the zeros in the “critical strip” $0 < \operatorname{Re}(s) < 1$).

For large $x$, the last term becomes negligible, since $\ln(1-x^{-2}) \approx -x^{-2} \to 0$.

This is a remarkable result! Let’s understand what it means.
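To watch the waves at work, we can truncate the sum to the first ten zeros $\rho_k = \frac{1}{2} + i\gamma_k$ (imaginary parts taken from published tables, rounded; pairing each zero with its conjugate gives a contribution $2\operatorname{Re}(x^{\rho}/\rho)$ per pair) and compare against a direct computation of $\psi(x)$. Even ten zero pairs typically land within a few units of the true value:

```python
import math

# Imaginary parts of the first ten non-trivial zeros (rounded published values)
GAMMAS = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062,
          37.586178, 40.918719, 43.327073, 48.005151, 49.773832]

def von_mangoldt(n):
    """Λ(n) = ln p if n = p^k for a prime p and k >= 1, else 0."""
    if n < 2:
        return 0.0
    for p in range(2, int(math.sqrt(n)) + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return math.log(n)

def psi_exact(x):
    """ψ(x) computed directly from its definition."""
    return sum(von_mangoldt(n) for n in range(2, int(x) + 1))

def psi_explicit(x):
    """x - Σ_ρ x^ρ/ρ - ln(2π) - ½ln(1 - x⁻²), truncated to ten zero pairs."""
    waves = 0.0
    for g in GAMMAS:
        rho = complex(0.5, g)
        waves += 2 * (x ** rho / rho).real  # zero plus its conjugate
    return x - waves - math.log(2 * math.pi) - 0.5 * math.log(1 - x ** -2)

x = 100.5  # chosen between jumps of the staircase
print(round(psi_exact(x), 2), round(psi_explicit(x), 2))
```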

Next Step: Interpreting the Formula

We’ve derived an explicit formula for $\psi(x)$ involving the zeros of the zeta function. This formula separates the principal wave from oscillatory corrections—exactly what we set out to find. Let’s interpret each term and understand the wave decomposition.


7. The Explicit Formula: Wave Decomposition Revealed

The Complete Formula

Our derivation has produced:

$$\psi(x) = x - \sum_{\rho} \frac{x^\rho}{\rho} - \ln(2\pi) - \frac{1}{2}\ln(1-x^{-2})$$

where $\rho$ ranges over all non-trivial zeros of the Riemann zeta function.

7.1 Interpreting Each Component

Principal Wave: $x$

The term $x$ is the dominant component for large $x$. This gives the asymptotic linear growth of $\psi(x)$. Recall that $\psi(x) \approx \sum_{p \leq x} \ln(p)$, and by the Prime Number Theorem there are roughly $\frac{x}{\ln x}$ primes up to $x$, each contributing an average logarithm close to $\ln x$. This gives:

$$\psi(x) \approx \frac{x}{\ln x} \cdot \ln x = x$$

This is our “DC component”—the smooth background trend.

Oscillatory Modes: $-\sum_{\rho} \frac{x^\rho}{\rho}$

This is the spectacular discovery! Each non-trivial zero $\rho = \beta + i\gamma$ of the zeta function contributes one oscillatory mode:

$$\frac{x^\rho}{\rho} = \frac{x^{\beta + i\gamma}}{\beta + i\gamma} = \frac{x^\beta \cdot e^{i\gamma \ln x}}{\beta + i\gamma}$$

Let’s unpack this:

  • Amplitude: $x^\beta$ (grows with $x$ if $\beta > 0$, decays if $\beta < 0$)
  • Oscillation: $e^{i\gamma \ln x}$ is a complex exponential
  • Frequency in log-space: in terms of $t = \ln(x)$, this is $e^{i\gamma t}$, a pure wave with frequency $\frac{\gamma}{2\pi}$

Each zero contributes a wave! The imaginary part $\gamma$ determines the frequency of oscillation, while the real part $\beta$ determines how the amplitude scales with $x$.

Small Corrections: The terms $-\ln(2\pi)$ and $-\frac{1}{2}\ln(1-x^{-2})$ are small corrections that become negligible for large $x$.

7.2 The Physical Picture: Resonant Frequencies

The formula reveals a profound physical picture. The staircase pattern of primes is created by an infinite superposition of waves, each corresponding to a zero of the zeta function:

$$\psi(x) = \underbrace{x}_{\text{smooth trend}} - \underbrace{\sum_{\rho} \frac{x^\rho}{\rho}}_{\text{infinitely many waves}} + O(1)$$

In log-space $t = \ln(x)$, the oscillatory term becomes:

$$-\sum_{\rho} \frac{e^{\beta t} \cdot e^{i\gamma t}}{\rho}$$

Relative to the main term $e^t$, these are damped oscillations (when $\beta < 1$) with discrete frequencies $\gamma_1, \gamma_2, \gamma_3, \ldots$. The zeros of the zeta function are the resonant frequencies of the prime distribution!

The “unpredictability” of primes—the irregular spacing of the staircase steps—is encoded in this infinite sum. The waves conspire, through constructive and destructive interference, to create sharp jumps exactly at prime locations.

7.3 The Riemann Hypothesis: Spectral Uniformity

Empirical Observation: When mathematicians numerically compute the zeros $\rho = \beta + i\gamma$, they find something astonishing: all computed non-trivial zeros lie exactly on the line $\beta = \frac{1}{2}$.

The first few zeros are approximately:

  • $\rho_1 = \frac{1}{2} + 14.135i$
  • $\rho_2 = \frac{1}{2} + 21.022i$
  • $\rho_3 = \frac{1}{2} + 25.011i$

Billions of zeros have been computed, and all lie on the critical line $\operatorname{Re}(s) = \frac{1}{2}$.

The Riemann Hypothesis: In his 1859 paper, Riemann conjectured that all non-trivial zeros have real part $\frac{1}{2}$. This is the famous Riemann Hypothesis, one of the most important unsolved problems in mathematics (and one of the Millennium Prize Problems, with a $1 million prize).

Implication for Prime Distribution: If RH is true, all oscillatory terms have the same amplitude scaling:

xρρx1/2ρ\frac{x^\rho}{\rho} \sim \frac{x^{1/2}}{\rho}

This means:

ψ(x)=x+O(xlog2x)\psi(x) = x + O(\sqrt{x} \cdot \log^2 x)

Information-Theoretic Interpretation: The Riemann Hypothesis states that the prime distribution has maximal spectral uniformity. All wave components share the same minimal amplitude growth (x\sqrt{x}); equivalently, they decay as fast as possible relative to the main term xx. This is a profound statement about symmetry and entropy: the primes are distributed as “randomly” as possible while still maintaining their deterministic structure. If RH is true, primes achieve a kind of optimal “pseudorandomness.”
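As a sanity check, the bound ψ(x)=x+O(xlog2x)\psi(x) = x + O(\sqrt{x} \cdot \log^2 x) can be probed numerically for small xx. A sketch (the implied constant is taken as 1 purely for illustration, which is an assumption):

```python
import math

def psi(x):
    """Chebyshev psi via trial division (fine at this small scale)."""
    total = 0.0
    for p in range(2, x + 1):
        if all(p % d for d in range(2, int(math.isqrt(p)) + 1)):
            pk = p
            while pk <= x:
                total += math.log(p)
                pk *= p
    return total

for x in [100, 1000, 10000]:
    err = abs(psi(x) - x)
    bound = math.sqrt(x) * math.log(x) ** 2
    print(x, round(err, 2), round(err / bound, 4))  # ratio stays well below 1
```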

Historical Note and Modern Name

The formula we derived:

ψ(x)=xρxρρ+(small terms)\psi(x) = x - \sum_{\rho} \frac{x^\rho}{\rho} + \text{(small terms)}

is Riemann’s explicit formula (specifically for the Chebyshev ψ\psi function), first stated in Riemann’s groundbreaking 1859 paper “On the Number of Primes Less Than a Given Magnitude.” Our derivation from information theory and Fourier analysis shows why this formula is natural—it’s simply the inverse transform of the frequency spectrum, evaluated via residues.

Next Step: Recovering π(x)\pi(x)

We set out to understand the prime counting function π(x)\pi(x), but we’ve derived a formula for ψ(x)\psi(x) instead. While ψ(x)\psi(x) was more natural for our analysis, we need to convert back to answer our original question. How do we recover π(x)\pi(x) from ψ(x)\psi(x)?


8. Recovering the Prime Counting Function π(x)\pi(x)

The Relationship Between π(x)\pi(x) and ψ(x)\psi(x)

Recall the definitions:

Prime counting function: π(x)=#{px:p prime}\pi(x) = \#\{p \leq x : p \text{ prime}\}

This counts primes with unit weight (each prime contributes 1).

Chebyshev function: ψ(x)=pkxln(p)\psi(x) = \sum_{p^k \leq x} \ln(p)

This weights each prime power pkp^k by ln(p)\ln(p) (the information weight).

8.1 Understanding the Conceptual Difference

The key difference between π(x)\pi(x) and ψ(x)\psi(x) is what they count:

π(x)\pi(x): Counts each prime once (unit weight)

  • Example: π(10)=4\pi(10) = 4 (counting 2, 3, 5, 7)

ψ(x)\psi(x): Weights each prime power pkp^k by ln(p)\ln(p) (information weight)

  • At n=2n=2: Λ(2)=ln(2)\Lambda(2) = \ln(2)
  • At n=3n=3: Λ(3)=ln(3)\Lambda(3) = \ln(3)
  • At n=4=22n=4 = 2^2: Λ(4)=ln(2)\Lambda(4) = \ln(2)
  • At n=5n=5: Λ(5)=ln(5)\Lambda(5) = \ln(5)
  • At n=7n=7: Λ(7)=ln(7)\Lambda(7) = \ln(7)
  • At n=8=23n=8 = 2^3: Λ(8)=ln(2)\Lambda(8) = \ln(2)
  • At n=9=32n=9 = 3^2: Λ(9)=ln(3)\Lambda(9) = \ln(3)

Therefore: ψ(10)=3ln(2)+2ln(3)+ln(5)+ln(7)≈7.83\psi(10) = 3\ln(2) + 2\ln(3) + \ln(5) + \ln(7) \approx 7.83
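The table above is easy to verify directly. A minimal sketch of the von Mangoldt weight and the resulting sum (which evaluates to 3ln2+2ln3+ln5+ln77.833\ln 2 + 2\ln 3 + \ln 5 + \ln 7 \approx 7.83):

```python
import math

def mangoldt(n):
    """Lambda(n) = ln(p) if n is a power of a single prime p, else 0."""
    for p in range(2, n + 1):
        if n % p == 0:
            while n % p == 0:   # strip out the smallest prime factor
                n //= p
            return math.log(p) if n == 1 else 0.0
    return 0.0

# psi(10) as the cumulative information weight up to 10
psi10 = sum(mangoldt(n) for n in range(2, 11))
print(round(psi10, 4))   # ≈ 7.832
```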

The challenge: How do we convert from ψ(x)\psi(x) (which includes all prime powers) back to π(x)\pi(x) (which counts only primes)?

8.2 Introducing the First Chebyshev Function

To bridge ψ(x)\psi(x) and π(x)\pi(x), we introduce an intermediate function that counts only first powers of primes:

θ(x)=pxln(p)\theta(x) = \sum_{p \leq x} \ln(p)

This function θ(x)\theta(x) is called the first Chebyshev function (while ψ(x)\psi(x) is the second).

Connection to π(x)\pi(x): The function θ(x)\theta(x) is directly related to π(x)\pi(x) through a Stieltjes integral:

θ(x)=2xln(t)dπ(t)\theta(x) = \int_2^x \ln(t) \, d\pi(t)

This integral accumulates ln(p)\ln(p) each time π\pi jumps by 1 at a prime pp.

Example: For x=10x = 10: θ(10)=ln(2)+ln(3)+ln(5)+ln(7)≈5.35\theta(10) = \ln(2) + \ln(3) + \ln(5) + \ln(7) \approx 5.35

Compared to ψ(10)≈7.83\psi(10) \approx 7.83, we see that θ\theta excludes the contributions from the prime powers 4,8,94, 8, 9, which together contribute 2ln(2)+ln(3)≈2.492\ln(2) + \ln(3) \approx 2.49.

8.3 Relating ψ\psi to θ\theta via Prime Powers

The Forward Relationship: We can express ψ\psi as a sum of θ\theta evaluated at fractional powers:

ψ(x)=k=1θ(x1/k)\psi(x) = \sum_{k=1}^{\infty} \theta(x^{1/k})

Why this works: Recall that ψ(x)=pkxln(p)\psi(x) = \sum_{p^k \leq x} \ln(p). Reorganizing by the power kk:

ψ(x)=k=1px1/kln(p)=k=1θ(x1/k)\psi(x) = \sum_{k=1}^{\infty} \sum_{p \leq x^{1/k}} \ln(p) = \sum_{k=1}^{\infty} \theta(x^{1/k})

  • θ(x)\theta(x) contributes ln(p)\ln(p) for each prime pxp \leq x (k=1k=1 term)
  • θ(x1/2)\theta(x^{1/2}) contributes ln(p)\ln(p) for each prime pxp \leq \sqrt{x} (k=2k=2 term, accounting for p2p^2)
  • θ(x1/3)\theta(x^{1/3}) contributes ln(p)\ln(p) for each prime px3p \leq \sqrt[3]{x} (k=3k=3 term, accounting for p3p^3)
  • And so on…

Verification: For x=10x=10:

  • θ(10)=ln(2)+ln(3)+ln(5)+ln(7)\theta(10) = \ln(2) + \ln(3) + \ln(5) + \ln(7)
  • θ(101/2)=θ(3.16...)=ln(2)+ln(3)\theta(10^{1/2}) = \theta(3.16...) = \ln(2) + \ln(3) (primes 3.16\leq 3.16)
  • θ(101/3)=θ(2.15...)=ln(2)\theta(10^{1/3}) = \theta(2.15...) = \ln(2) (primes 2.15\leq 2.15)
  • θ(101/k)=0\theta(10^{1/k}) = 0 for k4k \geq 4

Summing: ψ(10)=3ln(2)+2ln(3)+ln(5)+ln(7)\psi(10) = 3\ln(2) + 2\ln(3) + \ln(5) + \ln(7)
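The same identity can be verified at larger scale. A Python sketch (helper names are ours):

```python
import math

def primes_upto(x):
    """All primes <= x by trial division (fine at this scale)."""
    return [p for p in range(2, int(x) + 1)
            if all(p % d for d in range(2, int(math.isqrt(p)) + 1))]

def theta(x):
    """First Chebyshev function: sum of ln(p) over primes p <= x."""
    return sum(math.log(p) for p in primes_upto(x))

def psi(x):
    """Second Chebyshev function: sum of ln(p) over prime powers p^k <= x."""
    total = 0.0
    for p in primes_upto(x):
        pk = p
        while pk <= x:
            total += math.log(p)
            pk *= p
    return total

x = 10000
kmax = int(math.log2(x))          # x^(1/k) drops below 2 beyond this point
rhs = sum(theta(x ** (1 / k)) for k in range(1, kmax + 1))
print(round(psi(x), 6), round(rhs, 6))   # the two sides agree
```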

8.4 Möbius Inversion: From ψ\psi to θ\theta

We have the forward relationship ψ(x)=k=1θ(x1/k)\psi(x) = \sum_{k=1}^{\infty} \theta(x^{1/k}). To recover π(x)\pi(x), we need to invert this: express θ\theta in terms of ψ\psi.

This is a classic application of Möbius inversion.

The Möbius Function: Define the Möbius function μ(n)\mu(n):

μ(n)={1if n=1(1)kif n=p1p2pk (product of k distinct primes)0if n has a squared prime factor\mu(n) = \begin{cases} 1 & \text{if } n = 1 \\ (-1)^k & \text{if } n = p_1 p_2 \cdots p_k \text{ (product of } k \text{ distinct primes)} \\ 0 & \text{if } n \text{ has a squared prime factor} \end{cases}

Examples: μ(1)=1\mu(1) = 1, μ(2)=1\mu(2) = -1, μ(3)=1\mu(3) = -1, μ(4)=0\mu(4) = 0 (since 4=224 = 2^2), μ(6)=1\mu(6) = 1 (since 6=2×36 = 2 \times 3).

Möbius Inversion Formula: If g(x)=k=1f(x1/k)g(x) = \sum_{k=1}^{\infty} f(x^{1/k}), then:

f(x)=k=1μ(k)g(x1/k)f(x) = \sum_{k=1}^{\infty} \mu(k) g(x^{1/k})

Applying to our case: Since ψ(x)=k=1θ(x1/k)\psi(x) = \sum_{k=1}^{\infty} \theta(x^{1/k}), we have:

θ(x)=k=1μ(k)ψ(x1/k)\theta(x) = \sum_{k=1}^{\infty} \mu(k) \psi(x^{1/k})

This is the inverse relationship we need!

Physical interpretation: The Möbius function acts as an “inclusion-exclusion” operator, removing the overcounting from higher prime powers. The alternating signs (1)k(-1)^k implement this correction.
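A numerical check of the inversion, with the Möbius function implemented directly from the definition above (Python sketch; helper names are ours):

```python
import math

def mobius(n):
    """mu(n): 0 if n has a squared prime factor, else (-1)^(#prime factors)."""
    if n == 1:
        return 1
    sign, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0       # squared prime factor
            sign = -sign
        d += 1
    return -sign if n > 1 else sign

def primes_upto(x):
    return [p for p in range(2, int(x) + 1)
            if all(p % d for d in range(2, int(math.isqrt(p)) + 1))]

def theta(x):
    return sum(math.log(p) for p in primes_upto(x))

def psi(x):
    total = 0.0
    for p in primes_upto(x):
        pk = p
        while pk <= x:
            total += math.log(p)
            pk *= p
    return total

x = 1000
kmax = int(math.log2(x))
recovered = sum(mobius(k) * psi(x ** (1 / k)) for k in range(1, kmax + 1))
print(round(theta(x), 6), round(recovered, 6))   # theta recovered from psi
```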

8.5 From θ\theta to π\pi via Integration by Parts

Now we relate θ(x)\theta(x) to π(x)\pi(x) using integration by parts on the Stieltjes integral:

θ(x)=2xln(t)dπ(t)\theta(x) = \int_2^x \ln(t) \, d\pi(t)

Integration by parts formula: udv=uvvdu\int u \, dv = uv - \int v \, du

Let u=ln(t)u = \ln(t) and dv=dπ(t)dv = d\pi(t). Then du=dttdu = \frac{dt}{t} and v=π(t)v = \pi(t):

θ(x)=[ln(t)π(t)]2x−∫2xπ(t)dtt\theta(x) = \left[\ln(t) \cdot \pi(t)\right]_2^x - \int_2^x \pi(t) \cdot \frac{dt}{t}

=ln(x)π(x)−ln(2)π(2)−∫2xπ(t)tdt= \ln(x) \cdot \pi(x) - \ln(2) \cdot \pi(2) - \int_2^x \frac{\pi(t)}{t} \, dt

Since π(2)=1\pi(2) = 1:

π(x)=θ(x)ln(x)+1ln(x)2xπ(t)tdt+ln(2)ln(x)\pi(x) = \frac{\theta(x)}{\ln(x)} + \frac{1}{\ln(x)} \int_2^x \frac{\pi(t)}{t} \, dt + \frac{\ln(2)}{\ln(x)}

The integral term can be removed by one more integration by parts (partial summation), which turns it into an integral against θ\theta. The result is the exact identity:

π(x)=θ(x)ln(x)+∫2xθ(t)tln2(t)dt\pi(x) = \frac{\theta(x)}{\ln(x)} + \int_2^x \frac{\theta(t)}{t \ln^2(t)} \, dt
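Because θ\theta is a step function that is constant between consecutive primes, the integral can be evaluated in closed form on each interval (the antiderivative of 1/(tln2t)1/(t \ln^2 t) is 1/lnt-1/\ln t), and the relation checks out exactly. A Python sketch (helper names are ours):

```python
import math

def primes_upto(x):
    return [p for p in range(2, x + 1)
            if all(p % d for d in range(2, int(math.isqrt(p)) + 1))]

x = 1000
ps = primes_upto(x)
theta_x = sum(math.log(p) for p in ps)

# Integral of theta(t)/(t ln^2 t) over [2, x]: theta equals theta(p_i)
# on each interval [p_i, p_{i+1}), and the antiderivative of
# 1/(t ln^2 t) is -1/ln(t).
integral, theta_p = 0.0, 0.0
for i, p in enumerate(ps):
    theta_p += math.log(p)
    b = ps[i + 1] if i + 1 < len(ps) else x
    integral += theta_p * (1 / math.log(p) - 1 / math.log(b))

pi_from_theta = theta_x / math.log(x) + integral
print(len(ps), round(pi_from_theta, 6))   # both equal pi(1000) = 168
```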

8.6 Combining Everything: The Final Formula

We now have all the pieces to express π(x)\pi(x) in terms of the explicit formula for ψ(x)\psi(x). Let’s combine them systematically.

Step 1: Start with the explicit formula for ψ(x)\psi(x) (from Section 7):

ψ(x)=xρxρρln(2π)+O(x2)\psi(x) = x - \sum_{\rho} \frac{x^\rho}{\rho} - \ln(2\pi) + O(x^{-2})

Step 2: Apply Möbius inversion to get θ(x)\theta(x) (from Section 8.4):

θ(x)=n=1μ(n)ψ(x1/n)\theta(x) = \sum_{n=1}^{\infty} \mu(n) \psi(x^{1/n})

Substituting the explicit formula:

θ(x)=n=1μ(n)[x1/nρxρ/nρln(2π)+O(x2/n)]\theta(x) = \sum_{n=1}^{\infty} \mu(n) \left[x^{1/n} - \sum_{\rho} \frac{x^{\rho/n}}{\rho} - \ln(2\pi) + O(x^{-2/n})\right]

The dominant term is:

n=1μ(n)x1/n=xx1/2x1/3x1/5+x1/6+\sum_{n=1}^{\infty} \mu(n) x^{1/n} = x - x^{1/2} - x^{1/3} - x^{1/5} + x^{1/6} + \cdots

(The signs come from μ(n)\mu(n), and only squarefree integers contribute.)

For large xx, the x1/nx^{1/n} terms for n2n \geq 2 are much smaller than xx, so θ(x)=x+O(x)\theta(x) = x + O(\sqrt{x}) up to the oscillatory terms.

Step 3: Convert θ(x)\theta(x) to π(x)\pi(x) (from Section 8.5):

π(x)=θ(x)ln(x)+2xθ(t)tln2(t)dt\pi(x) = \frac{\theta(x)}{\ln(x)} + \int_2^x \frac{\theta(t)}{t \ln^2(t)} \, dt

Step 4: Substitute θ\theta in terms of ψ\psi:

The calculation is involved, but the key observation is that each term ψ(x1/n)\psi(x^{1/n}) in the Möbius inversion contributes to π(x)\pi(x) through the integral transform. The result, after combining all terms, is:

π(x)=n=1μ(n)n[ψ(x1/n)ln(x1/n)+2x1/nψ(t)tln2(t)dt]\pi(x) = \sum_{n=1}^{\infty} \frac{\mu(n)}{n} \left[\frac{\psi(x^{1/n})}{\ln(x^{1/n})} + \int_2^{x^{1/n}} \frac{\psi(t)}{t \ln^2(t)} \, dt\right]

Step 5: Express using the Logarithmic Integral:

Substituting the leading behavior θ(t)t\theta(t) \approx t, the combination ylny+∫2ydtln2t\frac{y}{\ln y} + \int_2^y \frac{dt}{\ln^2 t} can be expressed more elegantly. Define the logarithmic integral:

Li(y)=2ydtlnt\text{Li}(y) = \int_2^y \frac{dt}{\ln t}

Through integration by parts and careful analysis, the smooth (non-oscillatory) part of the relationship becomes:

π(x)=n=1μ(n)nLi(x1/n)\pi(x) = \sum_{n=1}^{\infty} \frac{\mu(n)}{n} \text{Li}(x^{1/n})
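This Möbius-weighted sum can be evaluated numerically, truncating once x1/nx^{1/n} drops below 2 (where the offset Li\text{Li} vanishes), and it already tracks π(x)\pi(x) closely. A Python sketch using the document's offset Li\text{Li} (lower limit 2) via a simple midpoint quadrature; helper names are ours:

```python
import math

def mobius(n):
    """mu(n) from the definition in the text."""
    if n == 1:
        return 1
    sign, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            sign = -sign
        d += 1
    return -sign if n > 1 else sign

def Li(y, steps=20000):
    """Offset logarithmic integral: integral of dt/ln(t) from 2 to y (midpoint rule)."""
    if y <= 2:
        return 0.0
    h = (y - 2) / steps
    return h * sum(1 / math.log(2 + (i + 0.5) * h) for i in range(steps))

def smooth_pi(x):
    """sum_n mu(n)/n * Li(x^(1/n)), truncated where x^(1/n) <= 2."""
    total, n = 0.0, 1
    while x ** (1 / n) > 2:
        total += mobius(n) / n * Li(x ** (1 / n))
        n += 1
    return total

def pi_exact(x):
    return sum(1 for p in range(2, x + 1)
               if all(p % d for d in range(2, int(math.isqrt(p)) + 1)))

for x in [100, 1000]:
    print(x, pi_exact(x), round(smooth_pi(x), 2))
```

The residual difference is exactly what the oscillatory Li(xρ)\text{Li}(x^\rho) terms in the next step account for.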

Step 6: Substitute the explicit formula ψ(x)=xρxρρ+O(1)\psi(x) = x - \sum_{\rho} \frac{x^\rho}{\rho} + O(1):

When we substitute and collect terms:

π(x)=Li(x)ρLi(xρ)ln(2)+O(xln2x)\pi(x) = \text{Li}(x) - \sum_{\rho} \text{Li}(x^\rho) - \ln(2) + O\left(\frac{x}{\ln^2 x}\right)

where the sum runs over all non-trivial zeros ρ\rho of ζ(s)\zeta(s).

Understanding the Terms:

  • Li(x)\text{Li}(x): The principal term (Gauss’s empirical observation!)
  • ρLi(xρ)-\sum_{\rho} \text{Li}(x^\rho): Oscillatory corrections, one for each zeta zero
  • ln(2)-\ln(2): Small constant correction
  • Error term: smaller than the principal term by a factor of ln(x)\ln(x)

Historical Note and Modern Name

This is Riemann’s explicit formula for π(x)\pi(x), the culminating result of Riemann’s 1859 paper. It gives an exact expression for the number of primes up to xx in terms of:

  1. The logarithmic integral Li(x)\text{Li}(x) (the principal wave)
  2. Corrections from each zero of the zeta function (the oscillatory modes)
  3. Möbius-weighted corrections for prime powers

Gauss’s Observation Recovered: The principal term Li(x)\text{Li}(x) is exactly what Gauss empirically conjectured in the late 18th century! Gauss noticed that Li(x)\text{Li}(x) approximates π(x)\pi(x) far better than xlnx\frac{x}{\ln x}. Riemann’s formula explains why, and supplies the precise oscillatory corrections.

The Möbius inversion π(x)=n=1μ(n)nLi(x1/n)\pi(x) = \sum_{n=1}^{\infty} \frac{\mu(n)}{n} \text{Li}(x^{1/n}) is particularly elegant: it shows how counting primes (with multiplicity 1) requires carefully subtracting the overcounting from prime powers using the inclusion-exclusion principle encoded in μ(n)\mu(n).

8.7 The Complete Picture

We now have the full wave decomposition of the prime counting function:

π(x)=Li(x)smooth trend(Gauss’s conjecture)ρLi(xρ)oscillatory corrections(from zeta zeros)+O(1)\pi(x) = \underbrace{\text{Li}(x)}_{\substack{\text{smooth trend} \\ \text{(Gauss's conjecture)}}} - \underbrace{\sum_{\rho} \text{Li}(x^\rho)}_{\substack{\text{oscillatory corrections} \\ \text{(from zeta zeros)}}} + O(1)

The staircase of primes emerges from:

  1. A smooth principal wave (the logarithmic integral)
  2. An infinite superposition of oscillatory modes, one for each zero of the Riemann zeta function

The zeros “conspire” to create the irregular jumps at exactly the right locations—the prime numbers.

Next Step: Reviewing the Complete Chain

We’ve completed the derivation from empirical observations to the explicit formula. Let’s step back and review the entire logical chain, understanding how each piece fits together.


9. Summary: The Complete Derivation Chain

What We Explored from First Principles

Starting from Gauss’s observation of a staircase pattern in prime tables, we attempted to reconstruct the theory through a logical chain of reasoning. Each “rediscovery” below acknowledges that these tools were actually developed by brilliant mathematicians over centuries:

Section 1: Empirical Foundation

  • Observation: π(x)\pi(x) exhibits a staircase pattern
  • Gauss’s conjecture: π(x)xlnx\pi(x) \sim \frac{x}{\ln x} (smooth trend)
  • Strategy: Decompose into principal wave + oscillatory modes (Fourier perspective)

Sections 2-3: Information Theory - Finding the Right Measure

  • Exploration: Prime density appears scale-invariant, not translation-invariant
  • Maximum Entropy derivation: Natural measure is dt=dxxdt = \frac{dx}{x} → work in log-space t=ln(x)t = \ln(x)
  • Information content: Finding prime pp has information value I(p)=ln(p)I(p) = \ln(p)
  • Rediscovered: Von Mangoldt function Λ(n)=ln(p)\Lambda(n) = \ln(p) for n=pkn = p^k

Section 4: Cumulative Information

  • Definition: ψ(x)=nxΛ(n)\psi(x) = \sum_{n \leq x} \Lambda(n) = total information up to xx
  • Properties: Jumps by ln(p)\ln(p) at each prime power, cleaner analytically than π(x)\pi(x)
  • Rediscovered: Chebyshev ψ\psi function

Section 5: Spectral Analysis - Forward Transform

  • Discrete Fourier transform: F(s)=Λ(n)nsF(s) = \sum \frac{\Lambda(n)}{n^s} (Dirichlet series)
  • Statistical mechanics analogy: F(s)F(s) is average information, related to partition function Z(s)Z(s)
  • Solving differential equation: ZZ=F(s)-\frac{Z'}{Z} = F(s)Z(s)=11psZ(s) = \prod \frac{1}{1-p^{-s}}
  • Rediscovered: Riemann zeta function ζ(s)=1ns\zeta(s) = \sum \frac{1}{n^s} and Euler product
  • Result: Frequency spectrum is F(s)=ζ(s)ζ(s)F(s) = -\frac{\zeta'(s)}{\zeta(s)}

Section 6: Inverse Transform

  • Goal: Recover ψ(x)=nxΛ(n)\psi(x) = \sum_{n \leq x} \Lambda(n) from spectrum
  • Derivation from Cauchy’s formula: Heaviside step function in complex plane
  • Rediscovered: Perron’s formula
  • Evaluation via residue theorem:
    • Pole at s=1s=1: contributes principal term xx
    • Zeros ρ\rho of ζ(s)\zeta(s): each contributes oscillatory term xρρ-\frac{x^\rho}{\rho}

Section 7: The Explicit Formula

  • Result: ψ(x)=xρxρρ+O(1)\psi(x) = x - \sum_{\rho} \frac{x^\rho}{\rho} + O(1)
  • Interpretation: Each zero ρ=β+iγ\rho = \beta + i\gamma is a resonant frequency creating one wave xβeiγlnxx^\beta e^{i\gamma \ln x}
  • Rediscovered: Riemann’s explicit formula for ψ(x)\psi(x)
  • Riemann Hypothesis: All zeros on line β=1/2\beta = 1/2 → maximal spectral uniformity

Section 8: Recovery of π(x)\pi(x)

  • Introduced θ(x)\theta(x): First Chebyshev function θ(x)=pxln(p)=2xln(t)dπ(t)\theta(x) = \sum_{p \leq x} \ln(p) = \int_2^x \ln(t) \, d\pi(t)
  • Prime power relationship: ψ(x)=k=1θ(x1/k)\psi(x) = \sum_{k=1}^{\infty} \theta(x^{1/k}) (accounting for all powers)
  • Möbius inversion: θ(x)=k=1μ(k)ψ(x1/k)\theta(x) = \sum_{k=1}^{\infty} \mu(k) \psi(x^{1/k}) (removing overcounting)
  • Rediscovered: Möbius function and Möbius inversion
  • Integration by parts: π(x)=θ(x)lnx+θ(t)tln2tdt\pi(x) = \frac{\theta(x)}{\ln x} + \int \frac{\theta(t)}{t \ln^2 t} dt
  • Combined formula: π(x)=n=1μ(n)nLi(x1/n)\pi(x) = \sum_{n=1}^{\infty} \frac{\mu(n)}{n} \text{Li}(x^{1/n})
  • Final result: π(x)=Li(x)ρLi(xρ)+O(1)\pi(x) = \text{Li}(x) - \sum_{\rho} \text{Li}(x^\rho) + O(1)
  • Rediscovered: Riemann’s explicit formula for π(x)\pi(x)
  • Recovered: Gauss’s empirical observation Li(x)\text{Li}(x) as the principal term

Key Insights

Mathematical:

  • The “unpredictability” of primes is encoded as an infinite superposition of waves
  • The zeros of ζ(s)\zeta(s) are the resonant frequencies needed to build the staircase
  • Riemann Hypothesis = all waves have the same amplitude scaling (x\sqrt{x})

Physical:

  • Each zero is like a harmonic oscillator with frequency γ\gamma and damping β\beta
  • The zeros “conspire” through interference to create sharp jumps at prime locations
  • Prime distribution is a spectral phenomenon

Information-Theoretic:

  • Information content of prime pp is ln(p)\ln(p) (natural measure in log-space)
  • ψ(x)\psi(x) represents cumulative information encoded in integers
  • RH states primes achieve optimal “pseudorandomness”—maximal entropy consistent with deterministic structure

Standing on the Shoulders of Giants

Every major function we explored—von Mangoldt, Chebyshev, Riemann zeta, Perron formula, Möbius function—represents centuries of mathematical genius. In our reconstruction, these tools appeared to emerge “naturally” from our framework:

  1. Information Theory (Maximum Entropy) suggested weights and measures
  2. Empirical observation (Gauss’s tables) provided the principal wave
  3. Fourier Analysis offered the spectral decomposition
  4. Inclusion-Exclusion principle (through Möbius inversion) handled prime powers

But we must remember: the actual historical development was far more complex, involving countless dead ends, breakthroughs, and insights we’ve glossed over. Our “natural discovery” is only possible because we already know where we’re going. The original mathematicians—Euler, Gauss, Riemann, Chebyshev, Möbius, and others—deserve full credit for these profound discoveries.


10. Notes on Rigor and Methodology

Philosophy of This Derivation

This document is an exploratory sketch, not a rigorous proof. We emphasize discovery and intuition over formal rigor. Our goal has been to explore: “If 18th-century mathematicians had known information theory and Fourier analysis, how might they have approached Riemann’s formula?”

Important caveat: This reconstruction is inevitably colored by hindsight. We may have glossed over subtleties, made unjustified leaps, or oversimplified complex arguments. The actual mathematics requires much more care than presented here. If you find errors or gaps, please consult rigorous texts on analytic number theory.

What We Intentionally Omitted

For brevity and clarity, we’ve omitted several technical details that a fully rigorous treatment would require:

Convergence and Analytic Properties:

  • Convergence conditions for Dirichlet series (requires Re(s)>1\text{Re}(s) > 1 for absolute convergence)
  • Analytic continuation of ζ(s)\zeta(s) to the entire complex plane (requires functional equation)
  • Precise growth bounds for error terms
  • Justification of term-by-term integration and interchange of limits

Complex Analysis:

  • Rigorous contour deformation argument (shifting the integration line)
  • Behavior of ζ(s)\zeta(s) near s=1s=1 (requires Laurent expansion)
  • Proof that the shifted contour contributes negligibly
  • Detailed residue calculations

Number-Theoretic Subtleties:

  • Smoothing at discontinuities (ψ(x)\psi(x) vs ψ0(x)\psi_0(x)—the averaged version)
  • Prime powers vs primes (higher powers contribute O(x)O(\sqrt{x}))
  • Error bounds in the explicit formula

Functional Equation:

  • The functional equation ζ(s)=2sπs1sin(πs2)Γ(1s)ζ(1s)\zeta(s) = 2^s \pi^{s-1} \sin(\frac{\pi s}{2}) \Gamma(1-s) \zeta(1-s)
  • Location of trivial zeros (at negative even integers)
  • Symmetry of non-trivial zeros about Re(s)=1/2\text{Re}(s) = 1/2

What We Emphasized

Complete Logical Chain:

  • Every mathematical object was derived, not assumed
  • Each step flows naturally from the previous
  • The “next step” transitions make the logic explicit

First Principles Approach:

  • Information Theory (Maximum Entropy) for measures and weights
  • Fourier Analysis for spectral decomposition
  • Complex Analysis for inversion (Cauchy’s formula, residue theorem)

Rediscovery Narrative:

  • Pose problems naturally
  • Derive solutions from first principles
  • Reveal modern names afterward

Physical and Information-Theoretic Intuition:

  • Waves and frequencies (spectral picture)
  • Information content and entropy (MaxEnt picture)
  • Statistical mechanics analogies (partition functions)

How This Differs from Standard Treatments

Standard Rigorous Approach:

  1. Define ζ(s)\zeta(s) and carefully prove its properties
  2. Establish functional equation and analytic continuation with full rigor
  3. Apply to prime counting with precise error bounds

Our Exploratory Approach:

  1. Study primes directly (empirical observation)
  2. Explore how ζ(s)\zeta(s) might emerge (partition function analogy)
  3. Attempt to understand what it tells us about primes

The standard approach is more rigorous, more complete, and mathematically correct. Our approach sacrifices rigor for intuition, attempting to show one possible path of understanding. We cannot claim our approach “reveals why these tools exist”—that’s overstating it. Rather, we hope it provides one possible intuitive framework among many.

For readers who want the full rigorous treatment:

  1. Prime Number Theorem - Complete proof and history
  2. Riemann’s 1859 paper - The original (translated)
  3. Analytic Number Theory - General field overview
  4. Edwards, Riemann’s Zeta Function - Comprehensive modern treatment
  5. Davenport, Multiplicative Number Theory - Standard graduate textbook

Final Thought

Mathematics, at its best, reveals deep truths that were always there, waiting to be understood. The explicit formula for primes was implicit in the structure of arithmetic—but it took Riemann’s extraordinary insight (and decades of prior work by Euler, Gauss, and others) to see it.

Our exploration attempted to show one possible path through this beautiful mathematics, using information theory and Fourier analysis as guides. But we stand in awe of the original discoverers who found these truths without hindsight, without modern tools, through pure mathematical intuition and relentless dedication.

The connection between the zeros of the zeta function and the distribution of primes—between an analytic function in the complex plane and the multiplicative structure of integers—remains one of the most profound and mysterious phenomena in mathematics. We hope this sketch, despite its inevitable shortcomings, helps readers appreciate the beauty of this connection.


End of Sketch
