Demystifying the QCD Vacuum – Part 1 – The Standard Story

After being confused for several weeks about various aspects of the QCD vacuum, I now finally feel confident to write down what I understand.

The topic itself isn’t complicated. However, a big obstacle is that there are too many contradictory “explanations” out there. In addition, many steps that are far from obvious are usually treated in like two lines. I’m not going to flame against such confusing attempts to explain the QCD vacuum. Instead, I want to tell a (hopefully) consistent story that illuminates many of the otherwise highly confusing aspects.

The QCD vacuum is currently (again) a big thing. It was discussed extensively in the 80s and is now popular again because lots of people are working on axion physics. The axion mechanism is an attempt to explain what we know so far experimentally about the QCD vacuum. A careful analysis of the structure of the QCD vacuum implies that QCD generically violates CP symmetry. So far, no such violation has been measured. This is a problem, and the axion mechanism is one possibility to explain why this is the case.

However, before thinking about possible solutions, it makes sense to spend some time to understand the problem.

Usually, when we think about the vacuum, we don’t think that there is a lot to talk about. Instead, we have something quite boring in mind. Empty space. Quantum fields doing nothing, because they are, by definition, not excited.

However, it turns out that this naive picture is completely wrong. In particular, the vacuum state of the quantum theory of strong interactions (QCD) has an incredibly rich structure and there are lots of things happening.

In fact, there is so much going on, that the vacuum isn’t fully understood yet. The main reason is, of course, that, so far, we always have to use approximations in quantum field theory. Usually, we use perturbation theory as the approximation method of choice, but it turns out that this is not the correct tool to describe the vacuum state of QCD.

The reason for this is that there are (infinitely) many states with the minimal amount of energy, ground states, and the QCD fields can change from one ground state into another. When we study these multitudes of ground states in detail, we find that they do not lie “next to each other” (not meant in a spatial sense) but instead there are potential barriers between them. The definition of a ground state is that the fields in this configuration have the minimal amount of energy and thus certainly not enough to jump across the potential barriers. Therefore, the change from one ground state into another is not a trivial process. Instead, the fields must tunnel through these potential barriers. Tunneling is a well-known process in quantum mechanics. However, it cannot be described in perturbation theory. In perturbation theory, we consider small perturbations of our fields around one ground state. Thus, we never notice any effects of the other ground states that exist behind the potential barriers. We will see below how this picture of infinitely many ground states with potential barriers between them emerges in practice.

The correct tool to describe tunneling processes in quantum field theory is a semiclassical approximation. At first glance, this certainly seems contradictory. There is no tunneling in a classical theory, so why should a semiclassical approximation help to describe tunneling processes in quantum field theory? The trick that makes the semiclassical approximation work is to substitute $t\to i \tau$, i.e. to make the time imaginary. At first, this looks completely crazy. However, there are good reasons to do this, because it is exactly this trick that allows us to use a semiclassical approximation to describe tunneling processes. Possibly the easiest way to see why this makes sense is to recall the standard quantum mechanical problem of an electron facing a potential barrier. Away from the potential barrier, we have an ordinary oscillating wave function $\sim e^{i\omega t}$. But inside the potential barrier, we find a solution proportional to $e^{-\omega \tau}$. Physically, this means that the probability to find the electron inside the potential barrier decreases exponentially. By comparing the tunneling wave function with the ordinary wave function, we can see that the difference is precisely described by the substitution $t\to i \tau$. In addition, we will see below that the effect of $t\to i \tau$ is basically to flip the potential upside down. Therefore, the potential barrier becomes a valley and there is a classical path across this valley. The technical term for such a tunneling solution is instanton.
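To make the “flipped potential” statement a bit more concrete, here is a minimal sketch for a single quantum mechanical degree of freedom $x(t)$ in a potential $V(x)$ (not yet the full field theory):

$$\frac{d^2 x}{dt^2} = -\frac{\partial V}{\partial x} \quad \xrightarrow{\; t\to i \tau \;}\quad \frac{d^2 x}{d\tau^2} = +\frac{\partial V}{\partial x} = -\frac{\partial (-V)}{\partial x} \, .$$

After the substitution we are left with Newton’s equation for a particle moving in the inverted potential $-V(x)$. A region that is classically forbidden in $V$ becomes an ordinary valley of $-V$, and the classical path across this valley is the object the semiclassical approximation expands around.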

If this didn’t convince you, here is a different perspective: In the path integral approach to quantum mechanics, we need to sum the action for all possible paths a particle can travel between two fixed points. The same is true in quantum field theory, but there we must sum over all possible field configurations between two fixed field configurations. Usually, we cannot compute this sum exactly and must approximate it instead. One idea to approximate the sum is to find the dominant contributions. The dominant contributions to the sum come from the extremal points of the action and these extremal points correspond exactly to the classical paths. For our tunneling processes, the key observation is that there are, of course, no classical paths that describe the tunneling. Thus, without a clever idea, we don’t know how to approximate the sum. The clever idea is, as already mentioned above, to substitute $t\to i \tau$. After this substitution, we can identify the dominant contributions to the path integral sum, because now there are classical paths. At the end of our calculation, after we’ve identified the dominant contributions, we can change back again to real-time $ i \tau \to t$. This is another way to see that a semiclassical approximation can make sense in a quantum field theory.
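Schematically (a rough sketch, with all normalization factors suppressed and the direction of the rotation chosen such that the weight decays), the logic looks like this:

$$ \int \mathcal{D}x\, e^{\,i S[x]/\hbar} \;\xrightarrow{\; t\to i\tau\;}\; \int \mathcal{D}x\, e^{-S_E[x]/\hbar} \;\approx\; \sum_{\text{classical paths}} e^{-S_E[x_{\rm cl}]/\hbar}\,, \qquad S_E = \int d\tau \left[ \frac{1}{2}\left(\frac{dx}{d\tau}\right)^2 + V(x)\right] .$$

The Euclidean action $S_E$ has ordinary classical solutions even in the classically forbidden region (they are the paths in the flipped potential mentioned above), and each of them contributes a factor $e^{-S_E[x_{\rm cl}]/\hbar}$, which is precisely the typical exponentially suppressed form of a tunneling amplitude.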

Now, after this colloquial summary, we need to fill in the gaps and show how all this actually works in practice. We start by discussing how the QCD vacuum picture with infinitely many ground states, separated by potential barriers, comes about. Afterward, we discuss how there can be tunneling between these ground states and then we write down the actual ground state of the theory. This real ground state is a superposition of the infinite number of ground states. The final picture of the QCD vacuum state will be completely analogous to the wave function of an electron in a periodic potential. In nature, such a situation is realized, for example, in a semiconductor. The electron does not sit at one of the many minima, but is instead in a state of superposition, because it can tunnel from one minimum of the potential to another. The correct wave function for the electron in this situation is known as a Bloch wave. We find energy bands that are separated by gaps. The bands are characterized by a parameter $\theta$, which corresponds to the phase that the electron picks up when it tunnels from one minimum to another. Analogously, the real QCD ground state will be written as a Bloch wave and is equally characterized by a phase $\theta$. This phase is a new fundamental constant of nature and can be measured in experiments. However, so far, no experiment has been able to measure $\theta$ for the QCD vacuum and we only know that it is incredibly small. In theory, the measurement is possible, because $\theta$ tells us to what extent strong interactions respect CP symmetry. The surprising smallness of $\theta$ is known as the strong CP problem. QCD alone says nothing about the value of $\theta$ and therefore it could be any number.

The QCD Vacuum Structure

The vacuum of a theory is defined as the state with the minimal amount of energy. In a non-abelian gauge theory, this minimal amount of energy can be defined to be zero and corresponds, for example, to the gauge potential configuration

$$ G_\mu = 0 . $$

However, this is not the only gauge potential configuration with zero energy. Every gauge transformation of this configuration also has zero energy. The gauge potential transforms under a gauge transformation $U$ as:

\begin{equation}
G_{\mu} \to U G_{\mu} U^\dagger -\frac{i}{g}U\partial_{\mu}U^{\dagger} .
\end{equation}

Putting $G_\mu = 0$ into this formula yields all configurations of the gauge potential with zero energy, i.e. all vacuum configurations:

\begin{equation}
G_{\mu}^{\left( pg\right) }=\frac{-i}{g}U\partial_{\mu}U^{\dagger}
\label{pureg}%
\end{equation}

Such configurations are called pure gauge.
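As a quick sanity check, take the abelian case, where all matrices commute: for a $U(1)$ gauge theory with $U(x) = e^{i\alpha(x)}$, the pure gauge configuration reduces to the familiar gradient form

$$ G_{\mu}^{\left( pg\right) } = \frac{-i}{g}\, e^{i\alpha}\, \partial_{\mu} e^{-i\alpha} = \frac{-i}{g}\, e^{i\alpha} \left(-i \partial_{\mu}\alpha\right) e^{-i\alpha} = -\frac{1}{g}\, \partial_{\mu}\alpha \, ,$$

which has vanishing field strength, $\partial_\mu G_\nu^{(pg)} - \partial_\nu G_\mu^{(pg)} = 0$, and therefore zero field energy, as it should be for a vacuum configuration.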

This observation means that we have infinitely many possible field configurations with the minimal amount of energy. Each of these is a “classical” vacuum state of the theory. This may not seem very interesting because all these states are connected by a gauge transformation. Thus aren’t all these “classical” vacua equivalent? Isn’t there just one vacuum state that we can write in many complicated ways by using gauge transformations?

Well… to understand if things are really that simple, we need to talk about gauge transformations. Each “classical” vacuum of the theory corresponds to a specific gauge transformation $U$ via the formula

\begin{equation}
G_{\mu}^{\left( pg\right) }=\frac{-i}{g}U\partial_{\mu}U^{\dagger} .
\end{equation}

Now, the standard way to investigate the situation further is to mention the following two things as casually as possible:

1.) We work in the temporal gauge $A_0 = 0$.
2.) We assume that it is sufficient to only consider those gauge transformations that become trivial at infinity $U(x) \to 1$ for $|x| \to \infty $.

Most textbooks and reviews offer at most one sentence to explain why we do these things. In fact, most authors act like these assumptions are trivial and obvious, or not important at all. As soon as these “nasty” technicalities are out of the way, we can start discussing the beautiful picture of the QCD vacuum that emerges under these assumptions. However, things aren’t really that simple. We will discuss the two assumptions in my second post about the QCD vacuum. Here I just note that they are not obvious choices and that you need a very special perspective to understand them.

For now, we simply summarize what we can say about our gauge transformations under these assumptions.

So, now back to the vacuum. We wanted to talk about gauge transformations, to understand if really all “classical” vacua are trivially equivalent.

We will see in a moment that the gauge transformations which fulfill the extra condition $U(x) \to 1$ for $|x| \to \infty $ fall into distinct subsets that can’t be smoothly transformed into each other. The interpretation of this observation is that these distinct subsets correspond, via the formula $G_{\mu}^{\left( pg\right) }=\frac{-i}{g}U\partial_{\mu}U^{\dagger}$, to distinct vacua. In addition, when we investigate how a change from one such distinct vacuum configuration into another can happen, we notice that it is only possible if the field leaves the pure gauge configurations for a while. This is interpreted as a potential barrier between the distinct vacua.

How does this picture emerge? For simplicity, we consider $SU(2)$ instead of $SU(3)$ as the gauge group, because the results are exactly the same.

“Actually it is sufficient to consider the gauge group $SU(2)$ since a general theorem states that for a Lie group containing $SU(2)$ as a subgroup the instantons are those of the $SU(2)$ subgroup.”

(page 863 in Quantum Field Theory and Critical Phenomena by Zinn-Justin)

Any element of $SU(2)$ can be written as

$$ U(x) = e^{i f(x) \vec{r} \vec{\sigma} },$$

where $\vec{\sigma}=(\sigma_1,\sigma_2,\sigma_3)$ are the usual Pauli matrices and $ \vec{r} $ is a unit vector. The condition $U(x) \to 1$ for $|x| \to \infty $ therefore means $f(x) \to 2\pi n$ for $|x| \to \infty $, where $n$ is an arbitrary integer, because we can write the matrix exponential as

$$e^{i f(x) \vec{r} \vec{\sigma}} = \cos(f(x))\, 1 + i \vec{r} \vec{\sigma} \sin( f(x) ) .$$

( $\sin( 2\pi n ) = 0 $ and $\cos(2\pi n ) =1 $ for an arbitrary integer $n$.)
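If you want to convince yourself of this identity, and of the fact that $f = 2\pi n$ indeed gives the identity matrix, here is a small numerical check (just an illustration; the random test values are my own choice):

```python
import numpy as np
from scipy.linalg import expm

# Pauli matrices
sigma = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

rng = np.random.default_rng(42)
r = rng.normal(size=3)
r /= np.linalg.norm(r)            # random unit vector r
f = rng.uniform(0, 2 * np.pi)     # value of f(x) at some arbitrary point x
r_dot_sigma = sum(ri * si for ri, si in zip(r, sigma))

lhs = expm(1j * f * r_dot_sigma)                            # matrix exponential
rhs = np.cos(f) * np.eye(2) + 1j * np.sin(f) * r_dot_sigma  # claimed identity
print(np.allclose(lhs, rhs))                                # True

# for f = 2*pi*n the exponential becomes the identity matrix, i.e. U -> 1
print(np.allclose(expm(1j * (2 * np.pi * 3) * r_dot_sigma), np.eye(2)))  # True
```

(The identity holds because $(\vec{r} \vec{\sigma})^2 = 1$ for a unit vector $\vec{r}$, so the power series of the exponential splits into a cosine and a sine part.)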

The number $n$ that appears in the limit of the function $f(x)$ as we go to infinity, is called the winding number. (To confuse people there exist several other names: Topological charge, Pontryagin index, second Chern class number, …)

Before we discuss why this name makes sense, we need to talk about why we are interested in this number. To understand this, take note that we can’t transform a gauge potential configuration that corresponds to a gauge transformation with winding number $1$ (i.e. where the function $f(x)$ in the exponential approaches $2 \pi$ as we go to $|x| \to \infty$) into a gauge potential configuration that corresponds to a gauge transformation with a different winding number. In this sense, the corresponding vacuum configurations are distinct.

Similar sentences appear in all books and reviews and confused me a lot. An explicit example of a gauge transformation with winding number $1$ is

\begin{equation}
U^{\left( 1\right) }\left( \vec{x}\right) =\exp\left( \frac{i\pi
x^{a}\tau^{a}}{\sqrt{x^{2}+c^{2}}}\right)
\end{equation}

and a trivial example of a gauge transformation with winding number $0$ is

\begin{equation}
U^{\left( 0\right) }\left( \vec{x}\right) =1 .
\end{equation}

I can define

$$U^\lambda(\vec x) = \exp\left( \lambda \frac{i\pi
x^{a}\tau^{a}}{\sqrt{x^{2}+c^{2}}}\right) $$

and certainly

$$ U^{\lambda=0}(\vec x) = I $$
$$ U^{\lambda=1}(\vec x) = U^{\left( 1\right) }\left( \vec{x}\right) $$

Thus I have found a smooth map that transforms $U^{\left( 1\right) }\left( \vec{x}\right)$ into $U^{\left( 0\right)}(\vec x)$.

The catch is that we restricted ourselves to only those gauge transformations that satisfy $U(x) \to 1$ for $|x| \to \infty $. For an arbitrary $\lambda$, this is certainly not the case for $U^\lambda(\vec x)$. Thus, the correct statement is that we can’t transform $U^{\left( 0\right)}(\vec x)$ into $U^{\left( 1\right) }\left( \vec{x}\right)$ without leaving the subset of gauge transformations that satisfy $U(x) \to 1$ for $|x| \to \infty $. To transform $U^{\left( 1\right) }\left( \vec{x}\right)$ smoothly into $U^{\left( 0\right)}(\vec x)$ requires gauge transformations that do not approach the identity at infinity. (Smoothly means that we can invent a map which is smooth in some parameter $\lambda$ (as I did above in my definition of $U^\lambda(\vec x)$) that yields $U^{\left( 0\right)}(\vec x)$ for $\lambda =0 $ and $U^{\left( 1\right)}(\vec x)$ for $\lambda =1 $.)

Maybe a different perspective helps to understand this important point a little better. As mentioned above, an $SU(2)$ gauge transformation always involves the generators and some function $f(x)$ and can be written as $U(x)=e^{i f(x) \vec{r} \vec{\sigma}}$, where $\vec{r}$ is some unit vector. The generators are just matrices, and therefore the restriction $U(x)  \to 1 $ for $|x| \to \infty$ translates directly into $f(x) \to 2 \pi n$ for $|x| \to \infty$. The crucial point is that only these discrete endpoints are allowed for the functions that appear in the exponent of gauge transformations satisfying $U(x)  \to 1 $ for $|x| \to \infty$. If you now imagine some arbitrary function $f(x)$ that goes to $0$ and another function $g(x)$ that goes to $2 \pi$ at spatial infinity, it becomes clear that you can’t smoothly deform $f(x)$ into $g(x)$ while keeping the endpoint fixed at one of the allowed values! The crucial thing is really that the endpoints at spatial infinity of the functions in the exponent are restricted to the values $2 \pi n$.

Maybe an (admittedly ugly) picture helps to bring this point home:

 

To summarize: by restricting ourselves to a subset of gauge transformations that approach $1$ at infinity, we’re able to classify the gauge transformations according to the number which the function in the exponent approaches. This number is called the winding number and gauge transformations with a different winding number cannot be smoothly transformed into each other without leaving our subset of gauge transformations.

So far, all we found is a method to label our gauge transformations. But what does this mean for our classical vacua?

We can see explicitly that two vacuum configurations that correspond to gauge transformations with a different winding number are separated by a potential barrier. This observation means that our infinitely many vacuum states do not lie next to each other (not meant in a spatial sense). Instead, there is a potential barrier between them.

(Afterwards, we will talk about the so-far a bit unmotivated name “winding number”.)

Origin of the Potential Barrier between Vacua

So we start with a gauge potential $A_i^{(1)}(x)$ that is generated by a gauge transformation that belongs, say, to the equivalence class with the winding number $1$. We want to describe the change of this gauge potential to the gauge potential that is generated by a gauge transformation of winding number $0$, which simply means $A_i^{(0)}=0$. A possible description is

$$  A_i^{(\beta)}(x) = \beta A_i^{(1)}(x) $$

where $\beta$ is a real parameter. For $\beta =0$, we get the gauge potential with winding number $0$, $A_i^{(0)}=0$, and for $\beta =1$, we get the gauge potential with winding number $1$, $A_i^{(1)}(x)$.

For $\beta =1$ and $\beta =0$, our $A_i^{(\beta)}(x)$ corresponds to zero classical energy, because we are dealing with pure gauge potentials.

However, for any other value for $\beta$ in between: $0<\beta <1$, our $A_i^{(\beta)}(x)$ is not pure gauge!

The analogue of the electric field for a non-abelian gauge theory, $E_i \equiv G^{0i}$, still vanishes, because we work in temporal gauge ($A_0=0$) and our $A_i^{(\beta)}(x)$ is time-independent, so $\dot{A}_i^{(\beta)}=0$. In contrast, the analogue of the magnetic field, $V_i \equiv \frac{1}{2} \epsilon_{ijk}G^{jk}$, does not vanish:

\begin{align}  G_{jk} &= \beta\left(\partial_j A_k^{(1)}-\partial_k A_j^{(1)}\right) + \beta^2 [A_j^{(1)},A_k^{(1)} ] \notag \\
&=(\beta^2-\beta)[A_j^{(1)},A_k^{(1)} ] \notag \\
& \neq 0 \quad \text{ for } 0 <\beta < 1,
\end{align}

where in the second line we used that $A_i^{(1)}$ is pure gauge and therefore has vanishing field strength, i.e. $\partial_j A_k^{(1)}-\partial_k A_j^{(1)} = -[A_j^{(1)},A_k^{(1)}]$.

The energy is proportional to $\int Tr(G_{jk}G_{jk})d^3x$ and is therefore non-zero for $0< \beta < 1$. It is important to notice that it is not only non-zero, but also finite. This is because, at the boundary, $A_k^{(1)}$ vanishes sufficiently fast.
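Plugging the result for $G_{jk}$ back in, we can even sketch the shape of the barrier along this particular interpolation (all $\beta$-independent factors are collected into a constant that I call $C$ here):

$$ E(\beta) \;\propto\; \int Tr\!\left(G_{jk}G_{jk}\right) d^3x \;=\; \left(\beta^2-\beta\right)^2 C\,, \qquad C \equiv \int Tr\!\left([A_j^{(1)},A_k^{(1)}][A_j^{(1)},A_k^{(1)}]\right) d^3x \, ,$$

which vanishes at $\beta=0$ and $\beta=1$ (the two vacua) and is largest at $\beta=1/2$, i.e. exactly halfway between them. This is the potential barrier we were after.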

To summarize: $A_i^{(\beta)}(x)$ describes the transition from a vacuum state with winding number $1$ to a vacuum state with winding number $0$. By considering the field energy $\int Tr(G_{jk}G_{jk})d^3x$ explicitly, we can see that during this transition the field does not stay pure gauge all the time. Instead, during the transition from $A_i^{(1)}(x) $ to $A_i^{(0)}(x) $ we necessarily encounter field configurations that correspond to a non-zero, but finite, field energy. In this sense, we can say that there is a finite potential barrier between vacua with different winding numbers.

What is a winding number?

In the previous sections, we simply used the term “winding number”. This term is best understood by considering an easy example with $U(1)$ as the gauge group. In addition, to make things even simpler, we restrict ourselves to only one spatial dimension. Afterward, we will talk about the winding number in the $SU(2)$ and 4D context that we are really interested in here.

Winding Number for a U(1) gauge theory

As a reminder: We are interested in gauge transformations that yield physical gauge field configurations through

\begin{equation}
G_{\mu}^{\left( pg\right) }=\frac{-i}{g}U\partial_{\mu}U^{\dagger}
\end{equation}

Thus, we assume that our gauge transformations $U(x)$ behave nicely everywhere. In particular, this means that $U(x)$ must be a continuous function, because otherwise we would have points with infinite field momentum. The reason is that the field momentum is directly related to the derivative of the field with respect to $x$, and if there were a discontinuous jump somewhere, the derivative of the field would be infinite there.

As casually mentioned above (and as will be discussed in the second post), we restrict ourselves to those gauge transformations $U(x)$ that satisfy $U(x) \to 1 $ for $|x| \to \infty$. This condition allows us to treat the domain of $x$ as the circle $S^1$ instead of the real line $\mathbb{R}$. The reason is that $U(x) \to 1 $ for $|x| \to \infty$ means that $U(x)$ has the same value at $x= -\infty$ and at $x= \infty$. Since all that interests us here is $U(x)$, or functions derived from $U(x)$, we can replace the two points $-\infty$ and $\infty$ by a single point, the point at infinity. Expressed differently, because of the condition $U(x) \to 1 $ for $|x| \to \infty$ we can treat $x= -\infty$ and $x = \infty$ as one point, and this means our $\mathbb{R}$ becomes a circle $S^1$:

source: http://www.iop.vast.ac.vn/theor/conferences/vsop/18/files/QFT-4.pdf

Here’s the same procedure for 3 dimensions:

Source: https://arxiv.org/pdf/hep-th/0010225.pdf

Therefore, our gauge transformations are no longer functions that eat an element of $\mathbb{R}$ and spit out an element of the gauge group $U(1)$, but instead they are now maps from the circle $S^1$ to $U(1)$. Points on the circle can be parameterized by an angle $\phi$ that runs from $0$ to $2\pi$ and therefore, we can write possible maps as follows:

$$ S^1 \to U(1) : g(\phi)= e^{i\alpha(\phi)} \, . $$

A key observation is now that the set of all possible $g(\phi)$ is divided into various topological sectors, which can be labeled by an integer $n$. This can be understood as follows:

The map from the circle $S^1$ to $U(1)$ need not be one-to-one. The degree to which a given map fails to be one-to-one is the winding number. For example, when the map is two-to-one, the winding number is 2. A map from the circle onto elements of $U(1)$ is

$$ S^1 \to U(1) : f_n(\phi)= e^{in\phi} \, . $$

This map eats elements of the circle $S^1$ and spits out $U(1)$ elements. Now, depending on the value of $n$ in the exponent, multiple elements of the circle are mapped to the same $U(1)$ element.

Here is how we can think about a map with winding number 1:

(For simplicity, space is here depicted as a line instead of a circle. Just imagine that the endpoints ∞ and -∞ are identified.)

Each arrow that points in a different direction stands for a different $U(1)$ element. As we move from ∞ to -∞ we encounter each $U(1)$ element exactly once. Similarly, a gauge transformation with winding number $0$ would only consist of arrows that point upwards, i.e. each element is mapped to the same $U(1)$ element, namely the identity. A gauge transformation with winding number 2 would consist of arrows that rotate twice as we move from ∞ to -∞, and so on for higher winding numbers.

Formulated differently, this means that depending on $n$ our map $f_n(\phi)$ maps several points on the circle onto the same $U(1)$ element.

For example, if $n=2$, we have
$$ f_2(\phi)= e^{i2\phi} .$$
Therefore
$$ f_2(\pi/2)= e^{i \pi} = -1 $$
and also
$$ f_2(3\pi/2)= e^{i3 \pi} = e^{i2 \pi} e^{i \pi} = -1 .$$

Therefore, as promised, for $n=2$ the map is two-to-one, because $\phi=\pi/2$ and $\phi= 3\pi/2$ are mapped onto the same $U(1)$ element. Equally, for $n=3$, we get for $\phi=\pi/3$, $\phi=\pi$ and $\phi= 5\pi/3$ the same $U(1)$ element $f_3(\pi/3)=f_3(\pi)=f_3(5\pi/3)=-1$.

In this sense, the map $f_n(\phi)$ determines how often $U(1)$ is wrapped around the circle and this justifies the name “winding number” for the number $n$.

Source: page 80 Selected Topics in Gauge Theories by Walter Dittrich, Martin Reuter

 

As a side remark: The elements of $U(1)$ also lie on a circle in the complex plane. ($U(1)$ is the group of the unit complex numbers). Thus, in this sense, $f_n(\phi)$ is a map from $S^1 \to S^1$.

A clever way to extract the winding number for an arbitrary map $ S^1 \to U(1)$ is to compute the following integral

$$ \int_0^{2\pi} d\phi \frac{f_n'(\phi)}{f_n(\phi)} = 2\pi i n, $$
where $f_n'(\phi)$ is the derivative of $f_n(\phi)$. Such tricks are useful for more complicated structures where the winding number isn’t that obvious.
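For the simple maps $f_n(\phi)= e^{in\phi}$ we can check this formula numerically. The little helper below approximates the integral with finite differences and divides by $2\pi i$, so that it returns the winding number directly (only an illustration; the function and variable names are my own):

```python
import numpy as np

def winding_number(f, n_samples=100_000):
    """Estimate the winding number of a map f: S^1 -> U(1) via
    (1 / (2*pi*i)) * integral of f'(phi)/f(phi) over [0, 2*pi)."""
    phi = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    dphi = 2.0 * np.pi / n_samples
    values = f(phi)
    derivative = (np.roll(values, -1) - values) / dphi  # forward difference for f'(phi)
    integral = np.sum(derivative / values) * dphi
    return integral / (2.0 * np.pi * 1j)

for n in range(-2, 3):
    f_n = lambda phi, n=n: np.exp(1j * n * phi)
    print(n, round(winding_number(f_n).real, 3))  # prints n back, up to discretization error
```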

Winding Number for an SU(2) gauge theory

Now, analogous to the compactification of $\mathbb{R}$ to the circle $S^1$, we compactify our three space dimensions to the three-sphere $S^3$. The argument is the same as before: the restriction $U(x) \to 1 $ for $|x| \to \infty$ means that spatial infinity looks the same no matter from which direction we approach it. Thus there is just one point at infinity and not, for example, the boundary of a hyperplane.

Thus, for an $SU(2)$ gauge theory our gauge transformations are maps from $S^3$ to $SU(2)$. In addition, completely analogous to how we can understand $U(1)$, i.e. the set of unit complex numbers, as the circle $S^1$, we can understand $SU(2)$, the set of unit quaternions, as the sphere $S^3$. Thus, in some sense our gauge transformations are maps

$$ S^3 \to S^3 \quad : \quad U(x) = a_0(x)\, 1 + i a_i(x) \sigma_i \quad \text{with} \quad a_0^2 + a_i a_i = 1 ,$$
where $\sigma_i$ are the Pauli matrices.

Again, we can divide the set of all $SU(2)$ gauge transformations into topologically distinct sets that are labeled by an integer.

Analogous to how we could extract the $U(1)$ winding number from a given gauge transformation, we can compute the $SU(2)$ winding number by using an integral formula (source: page 23 here):

$$n = \frac{1}{24\pi^2} \int_{S^3} d^3x \epsilon_{ijk} Tr\left[ \left( U^{-1} \partial_i U \right)\left(U^{-1} \partial_jU \right)\left(U^{-1} \partial_kU \right) \right] $$

This formula looks incredibly complicated, but can be understood quite easily.  The trick is that we can parametrize elements of $SU(2)$ by Euler angles $\alpha,\beta,\gamma$ and then define a volume element in parameter space

$$ d\mu(U) = \frac{1}{16\pi} \sin\beta d\alpha d\beta d\gamma . $$

Then by an explicit computation one can show that this volume element can be expressed as

$$d\mu(U) = \frac{1}{4\pi} Tr\left[ \left( U^{-1} \partial_i U \right)\left(U^{-1} \partial_jU \right)\left(U^{-1} \partial_kU \right) \right]d\alpha d\beta d\gamma .$$

This allows us to see that the winding number integral indeed counts how often the map covers the $SU(2)$ manifold, which, geometrically, is also a sphere $S^3$. Expressed differently: when $x$ ranges once over all points of the spatial sphere $S^3$, the winding number integral is simply the integral over the volume element of $SU(2)$ and therefore yields the number of times the $SU(2)$ manifold is covered. For example, for the trivial gauge function

$$U=1 ,$$

we cover the $SU(2)$ sphere zero times.

However, for example, for

$$  U^{(1)}(x) = \frac{1}{|x|}\left(x_4+ i\, \vec x \cdot \vec \sigma\right) , \qquad |x| \equiv \sqrt{x_4^2+ \vec x^{\,2}} ,$$

we cover all the points of the sphere $S^3$ exactly once as $x$ ranges once over all points of the spatial $S^3$. Thus this gauge transformation has winding number $1$.

Gauge transformations with an arbitrary winding number can be computed from the gauge transformation with winding number $1$ via

$$ U^{(n)}(x) = [U^{(1)}(x)]^n $$

All this is shown nicely and explicitly on page 90 in the second edition of Quarks, Leptons and Gauge Fields by Kerson Huang. He also shows explicitly why $ U^{(n)}(x) = [U^{(1)}(x)]^n $ holds.

Now it’s probably time for a short intermediate summary.

Intermediate Summary – What have we learned so far?

We started by studying vacuum configurations of Yang-Mills field theory (a gauge theory, for example, with $SU(3)$ gauge symmetry like QCD). Vacuum configurations correspond to field configurations with a minimal amount of field energy. This means they correspond to vanishing field strength tensors and thus to gauge potential configurations that are pure gauge:

$$A_\mu^{(pg)} = \frac{-i}{g}\, U \partial_\mu U^{\dagger} .$$

We then made two assumptions. While the first assumption (temporal gauge $A_0 = 0$) looks okay, the second one is really strange: we restrict ourselves to those gauge transformations that satisfy the condition $U(x) \to 1$ for $|x| \to \infty$. Just to emphasize how strange this assumption is, here is a picture:

Imagine this geometrical object represents all gauge transformations, i.e. each point is a gauge transformation. What we do with the restriction $U(x) \to 1$ for $|x| \to \infty$ is cherry-picking. We pick from this huge set only a very specific set of gauge transformations, denoted by $X$’s in the picture. With this in mind, it’s no wonder that the resulting topology is non-trivial.

However, without discussing this assumption any further we pushed on and discussed the picture of the vacuum that emerges from these assumptions.

We found that the subset of all gauge configurations that satisfy $U(x) \to 1$ for $|x| \to \infty$ can be classified with the help of a label called winding number. We then computed that if the gauge potential changes from one vacuum configuration with a given winding number to a configuration with a different winding number, it needs to go through configurations that correspond to a non-zero field energy. This means that there is a potential barrier between configurations with a different winding number.

We then talked about why the name “winding number” makes sense. The crucial point is that this number really measures how often a gauge transformation wraps the gauge group around the (compactified) space.

Physical Implications of the Periodic Structure

The first ones who came up with the periodic picture of the QCD vacuum described above were Jackiw and Rebbi in 1976. However, they didn’t simply look at QCD and then derive this structure.

Instead, they had a very specific goal when they started their analysis. Their study was motivated by the then-recent discovery of so-called instantons (Alexander Belavin, Alexander Polyakov, Albert Schwarz and Yu. S. Tyupkin, 1975).

Instantons are finite-action solutions of the Yang-Mills equations in Euclidean spacetime. For reasons that will be explained in a second post, this leads to the suspicion that they have something to do with tunneling processes.

(In short: The transformation from Minkowski to Euclidean spacetime is $t \to i \tau$. A “normal” wave function in quantum mechanics looks like $\Psi \sim e^{iEt}$. Now, remember how the quantum mechanical solution looks for a particle tunneling through a potential barrier: $\Psi \sim e^{-E\tau}$. The difference is again $t \to i \tau$! This is the main reason why ordinary solutions in Euclidean spacetime are interpreted as tunneling solutions in Minkowski spacetime.)

The motivation behind the study by Jackiw and Rebbi was to make sense of these instanton solutions in physical terms. What is tunneling? And from where to where?

(While you may not care about the history of physics, this bit of history is crucial to understand the paper by Jackiw and Rebbi; and especially how the standard picture of the QCD vacuum came about. The important thing to keep in mind is that instantons were discovered before the periodic structure of the QCD vacuum. )

The notion “winding number” was already used by Belavin, Polyakov, Schwarz, and Tyupkin. However, no physical interpretation was given. The idea by Jackiw and Rebbi was that instantons describe the tunneling between vacuum states that carry different winding number. Most importantly, they had the idea that vacuum states with different integer winding number are separated by a potential barrier, as already discussed above. Thus, the vacuum states do not lie “next to each other” and the quantum field can only transform itself from one such vacuum state into another through a tunneling process (or, of course, if it carries enough energy, for example, when the temperature is high enough).

The situation then is completely analogous to an electron in a crystal. The crystal is responsible for a periodic potential in which the electron “moves”. Like the QCD gauge field, the electron needs to tunnel to get from one minimum of the crystal potential to the next. Let’s say the minima of the crystal potential are separated by a distance $a$. The periodicity of the potential means that the probability to find the electron must be periodic, too: $|\psi(x)|^2 = |\psi(x+a)|^2$. However, we are dealing with quantum mechanical wave functions, and thus the wave function itself only needs to be periodic up to a phase, which the electron picks up when it tunnels from one minimum to the next: $\psi(x+a) = e^{i\theta} \psi(x)$. This makes no difference for the conclusion that the probability to find the electron must be the same for locations separated by the distance $a$. The correct states of the electron are then not localized at a single minimum, but are superpositions of the wave functions of all minima. Different superpositions are possible, and each one is characterized by a specific value of the phase parameter $\theta$. The resulting wave function is known as a Bloch wave and the phase $\theta$ is related to the Bloch momentum. (You can read much more on this, for example, in Kittel chapter 9 and Ashcroft-Mermin chapter 8.)

The idea of Jackiw and Rebbi was that we have exactly the same situation for the QCD vacuum.

We have a periodic potential, tunneling between the minima, and consequently also a parameter $\theta$, analogous to the Bloch momentum. (Take note that for the QCD vacuum, neighboring minima are not separated by some distance $a$, but by a winding number difference of $1$.) Upon closer inspection, it turns out that the parameter $\theta$ leads to CP violation in QCD interactions and can, in principle, be measured.
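For completeness, here is the formula this analogy leads to (a standard expression, see for example the reviews listed at the end of this post; sign conventions vary). If $|n\rangle$ denotes the “classical” vacuum with winding number $n$, the analogue of the Bloch wave, the so-called $\theta$-vacuum, is the superposition

$$ |\theta\rangle \;=\; \sum_{n=-\infty}^{\infty} e^{\,i n \theta}\, |n\rangle \, .$$

A gauge transformation with winding number $1$ shifts $|n\rangle \to |n+1\rangle$ and therefore changes $|\theta\rangle$ only by an overall phase $e^{-i\theta}$, exactly like a translation by $a$ changes a Bloch wave only by a phase.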

It is important to know the backstory of the paper by Jackiw and Rebbi because otherwise some of their arguments do not seem to make much sense. They already knew about the instanton solutions and had the “electron in a crystal” picture in mind as a physical interpretation of the instantons. Around this idea, they wrote their paper.

The periodic vacuum structure of the QCD vacuum was not discovered from scratch, but with these very specific ideas in mind.

We have seen above, that the periodic structure of the QCD vacuum does not arise without two crucial assumptions. If you know that this structure was first described with instantons and Bloch waves in mind, it makes a lot more sense how the original authors came up with these assumptions. These assumptions are exactly what you need to give the QCD vacuum the nice periodic structure and thus to be able to draw the analogy with an electron in a crystal. As I will describe in a later post, without these assumptions, the QCD vacuum looks very different.

In their original paper, Jackiw and Rebbi motivated one of the assumptions, namely the restriction to gauge transformations that satisfy $g(x) \to 1$ for $|x| \to \infty$, simply with “we study effects which are local in space and therefore”. As far as I know, this reason does not make sense and was never again repeated in any later paper. In subsequent papers, Jackiw came up with all sorts of different reasons for this restriction. However, ultimately in 1980, he concluded: “while some plausible arguments can be given in support of this hypothesis (see below) in the end we must recognize it as an assumption” (Source).

The path to the standard periodic picture of the QCD vacuum was thus not through some rigorous analysis, but rather strongly guided by “physical intuition”. It was the idea that the QCD vacuum could be interpreted in analogy to the quantum mechanical problem of an electron in a crystal which led to the periodic picture of the QCD vacuum.

My point is not that this picture is wrong. Instead, I was puzzled for a long time by the reasons that are given for the restriction $g(x) \to 1$ for $|x| \to \infty$, and I want to help others who are confused by them, too. I will write a lot more about this in a second post, but I hope that the few paragraphs above already help a bit. The path to the periodic vacuum structure is not as straightforward as most authors want you to believe. However, it is important to keep in mind that just because physicists came up with a description through intuition and not via some rigorous analysis does not mean that it is wrong. Even if the original arguments the discoverers gave do not hold up under closer scrutiny, it is still possible that their conclusions are correct. As already mentioned above, after the original publication, Jackiw and Rebbi as well as many other authors came up with lots of other arguments to strengthen the case for the periodic vacuum picture.

However, it is also important to keep in mind that, so far, all experimental evidence points in the direction that $\theta$ is tiny, $\theta \lesssim 10^{-10}$, or even zero. This is hard to understand if you believe in the analogy with the Bloch wave. In this picture, $\theta$ is an arbitrary phase and could take any value between $0$ and $2\pi$. There is no reason why it should be so tiny or even zero. This is famously known as the strong CP problem. (Things aren’t really that simple. The parameter $\theta$ also pops up from a completely different direction, namely from an analysis of the chiral anomaly. Thus, even if you don’t believe in the Bloch wave picture of the QCD vacuum, you end up with a $\theta$ parameter. Much more on this in a later post.)

Outlook (or: which puzzle pieces are still missing?)

Unfortunately, there are still a lot of loose ends. These will be hopefully tied up in future posts.

Most importantly, we need to talk more about the two assumptions:

1.) The choice of the temporal gauge $A_0 = 0$.
2.) The restriction to those gauge transformations that become trivial at infinity $U(x) \to 1$ for $|x| \to \infty $.

In the second post of this series, I will try to elucidate these assumptions, which are only noted in passing in almost all standard discussions of the QCD vacuum.

In a third post, I will show how the QCD vacuum can be understood beautifully from a completely different perspective.

Another important loose end is that we have not talked about instantons so far. These are solutions of the Yang-Mills equations and describe the tunneling processes between degenerate vacua.

Until I have finished these posts, here are some reading recommendations.

Reading Recommendations

The classical papers that elucidated the standard picture of the QCD vacuum are:

  • Vacuum Periodicity in a Yang-Mills Quantum Theory by R. Jackiw and C. Rebbi
  • Toward a theory of the strong interactions by Curtis G. Callan, Jr., Roger Dashen, and David J. Gross
  • The Structure of the Gauge Theory Vacuum by Curtis G. Callan et al.
  • Pseudoparticle solutions of the Yang-Mills equations by A.A. Belavin et al.
  • Concept of Nonintegrable Phase Factors and Global Formulation of Gauge Fields by Tai Tsun Wu and Chen Ning Yang

The standard introductions to instantons and the QCD vacuum are:

  • ABC of instantons by A I Vaĭnshteĭn, Valentin I Zakharov, Viktor A Novikov and Mikhail A Shifman
  • The Uses of Instantons by Sidney Coleman

(However, I found them both to be not very helpful)

Books on the topic are:

  • The QCD Vacuum, Hadrons and Superdense Matter by E. V. Shuryak
  • Solitons and Instantons by Ramamurti Rajaraman (Highly Recommended)
  • Classical Solutions in Quantum Field Theory: Solitons and Instantons by Erick Weinberg
  • Topological Solitons by Manton and Sutcliffe (Highly Recommended)
  • Some Elementary Gauge Theory Concepts by Hong-Mo Chan and Sheung Tsun Tsou
  • Classical Theory of Gauge Fields by Rubakov (Highly Recommended)

Review articles are:

  • Theory and phenomenology of the QCD vacuum by Edward V. Shuryak
  • A Primer on Instantons in QCD by Hilmar Forkel (Highly Recommended)
  • Effects of Topological Charge in Gauge Theories by R.J. Crewther
  • TASI Lectures on Solitons: Instantons, Monopoles, Vortices and Kinks by David Tong
  • Topological Concepts in Gauge Theories by F. Lenz

Textbooks that contain helpful chapters on instantons and the QCD vacuum are:

  • Quarks, Leptons & Gauge Fields by Kerson Huang (Highly Recommended)
  • Quantum Field Theory by Lewis H. Ryder (Highly Recommended)
  • Quantum Field Theory by Mark Srednicki (Highly Recommended)
  • Quantum Field Theory and Critical Phenomena by Zinn-Justin

Another informal introduction is:

  • ’t Hooft and η’ail Instantons and their applications by Flip Tanedo

The same things explained more mathematically can be found in:

  • Geometry of Yang-Mills Fields by M. F. Atiyah
  • plus chapters in Geometry of Physics by Frankel and
  • Topology and Geometry for Physicists by Nash and Sen

Larger Symmetries

“Further progress lies in the direction of making our equations invariant under wider and still wider transformations.”

These prophetic lines were written in 1930 by P. A. M. Dirac in his famous book “The Principles of Quantum Mechanics”. In the following decades, tremendous progress was made exactly as he predicted.

Weak interactions were described perfectly using $SU(2)$ symmetry, strong interactions using $SU(3)$ symmetry and it is well known that electrodynamics can be derived from $U(1)$ symmetry. Other aspects of elementary particles, like their spin, can be understood using the symmetry of special relativity.

A symmetry is a transformation that leaves our equations invariant, i.e. that does not change the equations. A set of symmetry transformations is called a group and, for example, the set of transformations that leaves the equations of special relativity invariant is called the Poincare group.

By making our equations invariant under the quite large set of transformations:

$$ \text{Poincare Group} \times U(1) \times SU(2) \times SU(3) , $$

we are able to describe all known interactions of elementary particles, except for gravity. This symmetry is the core of the standard model of particle physics, which is approximately 40 years old. Since then it has been confirmed many times, for example, through the discovery of the Higgs boson. Just as Dirac predicted, we gained incredible insights into the inner workings of nature by making the symmetry of our equations larger and larger.

Unfortunately, since the completion of the standard model $\sim 40$ years ago, there has been no further progress in this direction. No further symmetry of nature has been revealed by experiments. (At least that’s the standard view, but I don’t think it’s true. More on that later.) In 2017, our equations are still simply invariant under $ \text{Poincare Group} \times U(1) \times SU(2) \times SU(3)$ and no larger symmetry.

I’m a big believer in Dirac’s mantra. Despite the lack of new experimental insights, I do think there are many great ideas for how symmetries could guide us towards the correct theory beyond the standard model.

Before we can discuss some of these ideas, there is one additional thing that should be noted. Although the four groups $ \text{Poincare Group} \times U(1) \times SU(2) \times SU(3) $ are written equally next to each other, they aren’t treated equally in the standard model. The Poincare group is a spacetime symmetry, whereas all other groups describe inner symmetries of quantum fields. Therefore, we must divide the quest for a larger symmetry into two parts. On the one hand, we can enlarge the spacetime symmetry and on the other hand, we can enlarge the inner symmetry. In addition to these two approaches, we can also try to treat the symmetries equally and enlarge them at the same time.

Let’s start with the spacetime symmetry.

Enlargement of the Spacetime Symmetry

The symmetry group of special relativity is the set of transformations that describe transformations between inertial frames of reference and leave the speed of light invariant. As already noted, this set of transformations is called the Poincare group.

Before Einstein discovered special relativity, people used a spacetime symmetry that is called the Galilean group. The Galilean group also describes transformations between inertial frames of reference but does not care about the speed of light.

The effects of special relativity are only important for objects that are moving fast. For everything that moves slowly compared to the speed of light, the Galilean group is sufficient. The Galilean group is an approximate symmetry when objects move slowly. Mathematically this means that the Galilean group is the contraction of the Poincare group in the limit where the speed of light goes to infinity. For an infinite speed of light, nothing can move with a speed close to the speed of light and thus the Galilean group would be the correct symmetry group.
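Here is a minimal sketch of what “contraction” means at the level of the algebra (with the sign conventions I am used to; only the structure matters). In the Poincare algebra the boost generators do not commute with each other, but the offending term is suppressed by $1/c^2$ once we use the generators $B_i \equiv K_i/c$ adapted to velocity parameters:

$$ [K_i, K_j] = -\,i\,\epsilon_{ijk} J_k \quad\Longrightarrow\quad [B_i, B_j] = -\,\frac{i}{c^2}\,\epsilon_{ijk} J_k \;\xrightarrow{\;c\to\infty\;}\; 0 \, ,$$

which reproduces the Galilean statement that boosts commute with each other.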

It is natural to wonder if the Poincare group is an approximate symmetry, too.

One hint in this direction is that the Poincare group is an “ugly” group. The Poincare group is the semi-direct product of the group of translations and the Lorentz group, which describes rotations and boosts. Therefore the Poincare group is not a simple group. The simple groups are the “atoms of groups” from which all other groups can be constructed. However, the spacetime symmetry group that we use in the standard model is not one of these truly fundamental groups.

Already in 1967, Monique Levy‐Nahas studied the question of which groups could yield the Poincare group as a limit, analogous to how the Poincare group yields the Galilean group as a limit.

The answer she found was stunningly simple: “the only groups which can be contracted in the Poincaré group are $SO(4, 1)$ and $SO(3, 2)$”. These groups are called the de Sitter and the anti-de Sitter group.

They consist of transformations between inertial frames of reference that leave the speed of light invariant and, in addition, leave an energy scale invariant. The de Sitter group leaves a positive energy scale invariant, whereas the anti-de Sitter group leaves a negative energy scale invariant. Both contract to the Poincare group in the limit where the invariant energy scale goes to zero.

Levy‐Nahas’ discovery is great news. There isn’t some large pool of symmetries that we can choose from, but only two. In addition, the groups she found are simple groups and therefore much “prettier” than the Poincare group.

Following Dirac’s mantra and remembering the fact that the deformation: Galilean Group $\to $ Poincare Group led to incredible progress, we should take the idea of replacing the Poincare group with the de Sitter or anti de Sitter group seriously. This point was already emphasized in 1972 by Freeman J. Dyson in his famous talk “Missed opportunities”.

Nevertheless, I didn’t hear about the de Sitter groups in any particle physics lecture or read about them in any particle physics book. Maybe because the de Sitter symmetry is not a symmetry of nature? Because there is no experimental evidence?

To answer these questions, we must first answer the question: what is the energy scale that is left invariant?

The answer is: it’s the cosmological constant!

The present experimental status is that the cosmological constant is tiny but nonzero and positive: $\Lambda \approx 10^{-12}$ eV! This smallness explains why the Poincare group works so well. Nevertheless, the correct spacetime symmetry group is the de Sitter group. I’m a bit confused why this isn’t mentioned in the textbooks or lectures. If you have an idea, please let me know!

Can we enlarge the spacetime symmetry even further?

Yes, we can. But as we know from Levy‐Nahas’ paper, only a different kind of symmetry enlargement is possible. There isn’t any other symmetry that could be more exact and yield the de Sitter group in some limit. Instead, we can ask whether there could be a larger, broken spacetime symmetry.

Nowadays the idea of a broken symmetry is well known and already an important part of the standard model. In the standard model, the Higgs field triggers the breaking $SU(2) \times U(1) \to U(1)$.

Something similar could have happened to a spacetime symmetry in the early universe. A good candidate for such a broken spacetime symmetry is the conformal group $SO(4,2)$.

The temperature in the early universe was incredibly high and “[i]t is an old idea in particle physics that, in some sense, at sufficiently high energies the masses of the elementary particles should become unimportant” (Sidney Coleman in Aspects of Symmetry). In the massless limit, our equations become invariant under the conformal group (source). The de Sitter group and the Poincare group are subgroups of the conformal group. Therefore it is possible that the conformal group was broken to the de Sitter group in the early universe.

This idea is interesting for a different reason, too. The only parameter in the standard model that breaks conformal symmetry at tree level is the Higgs mass parameter. This parameter is the most problematic aspect of the standard model and possibly the Higgs mass fine-tuning problem can be solved with the help of the conformal group. (See: On naturalness in the standard model by William A. Bardeen.)

Enlargement of the Inner Symmetry

The inner symmetry group of the standard model $ U(1) \times SU(2) \times SU(3) $ is quite ugly, too. Like the Poincare group, it is not a simple group.

There is an old idea by Howard Georgi and Sheldon Glashow that instead of $ U(1) \times SU(2) \times SU(3) $ we use a larger, simple group $G_{GUT} $. These kinds of theories are called Grand Unified Theories (GUTs).

While GUTs have problems, they are certainly beautiful. One obvious “problem” is that in present-day colliders we do not observe effects of a $G_{GUT}$ structure, and thus we assume the unified gauge symmetry is broken at some high energy scale:

\begin{equation} \label{eq:schematicgutbreaking}
G_{GUT} \stackrel{M_{GUT}}{\rightarrow} \ldots \stackrel{M_I}{\rightarrow} G_{SM} \stackrel{M_Z}{\rightarrow} SU(3)_C \times U(1)_Q \, ,
\end{equation}

where the dots indicate possible intermediate scales between $G_{GUT}$ and $G_{SM}$. In the following, we discuss some of the “mysteries” of the standard model that can be resolved by a GUT.

Quantization of Electric Charge

In the standard model, the electric charges of the various particles must be put in by hand and there is no reason why there should be any relation between the electron and proton charge. However, from experiments it is known that $Q_{\text{proton}}+Q_{\text{electron}}= \mathcal{O}(10^{-20})$ (in units of the elementary charge). In GUTs, one multiplet of $G_{GUT}$ contains both quarks and leptons. This way, GUTs provide an elegant explanation for the experimental fact of charge quantization. For example, in $SU(5)$ GUTs the conjugate $5$-dimensional representation contains the down quark and the lepton doublet:

\begin{equation}
\bar{5} = \begin{pmatrix} \nu_L \\ e_L \\ (d_R^c)_{\text{red}} \\ (d_R^c)_{\text{blue}} \\ (d_R^c)_{\text{green}} \end{pmatrix} \, .
\end{equation}

The standard model generators must correspond to generators of $G_{GUT}$. Thus the electric charge generator must correspond to one Cartan generator of $G_{GUT}$. (The eigenvalues of the Cartan generators of a given gauge group correspond to the quantum numbers commonly used in particle physics.) In $SU(5)$, the Cartan generators can be written as diagonal $5\times 5$ matrices with trace zero. ($SU(5)$ is the set of $5 \times 5$ matrices $U$ with determinant $1$ that fulfil $U^\dagger U = 1$. For the generators $T_a$ this means $\text{det}(e^{i \alpha_a T_a})=e^{i \alpha_a \text{Tr}(T_a)} \stackrel{!}{=}1$ and therefore $\text{Tr}(T_a) \stackrel{!}{=} 0$.) Therefore we have

\begin{align}
\text{Tr}(Q)&= \text{Tr} \begin{pmatrix} Q(\nu_L) & 0 & 0 & 0 &0 \\ 0 & Q(e_L) & 0 & 0 &0 \\ 0 & 0 & Q((d_R^c)_{\text{red}}) & 0 &0\\ 0 & 0 & 0 & Q((d_R^c)_{\text{blue}})&0\\ 0 & 0 & 0 & 0 &Q((d_R^c)_{\text{green}}) \end{pmatrix} \stackrel{!}{=} 0 \notag \\
&\rightarrow Q(\nu_L) + Q(e_L) + 3Q(d_R^c) \stackrel{!}{=} 0 \notag \\
&\rightarrow Q(d_R^c) \stackrel{!}{=} -\frac{1}{3} Q(e_L) \, , \quad \text{using } Q(\nu_L)=0 \, .
\end{align}

Analogously, we can derive a relation between the charges of $e_R^c$, $u_L$ and $u_R^c$. Thus $Q_{\text{proton}}+Q_{\text{electron}}= \mathcal{O}(10^{-20})$ is no longer a miracle, but rather a direct consequence of the embedding of $G_{SM}$ in an enlarged gauge symmetry.

Coupling Strengths

The standard model contains three gauge couplings, which are very different in strength. Again, this is not a real problem of the standard model, because we can simply put these values in by hand. However, GUTs provide a beautiful explanation for this difference in strength. A simple group $G_{GUT}$ implies that we have only one gauge coupling as long as $G_{GUT}$ is unbroken. The gauge symmetry $G_{GUT}$ is broken at some high energy scale in the early universe. Afterward, we have three distinct gauge couplings with approximately equal strength. The gauge couplings are not constant but depend on the energy scale. This is described by the renormalization group equations (RGEs). The RGE for a given gauge coupling depends on the number of particles that carry the corresponding charge. Gauge bosons have the effect that a gauge coupling becomes stronger at lower energies, and fermions have the opposite effect. The adjoint of $SU(3)$ is $8$-dimensional and therefore we have $8$ corresponding gauge bosons. In contrast, the adjoint of $SU(2)$ is $3$-dimensional and thus we have $3$ gauge bosons. For $U(1)$ there is only one gauge boson. As a result, for $SU(3)$ the gauge boson effect dominates and the corresponding gauge coupling becomes stronger at lower energies. For $SU(2)$ the fermion and boson effects almost cancel each other and thus the corresponding gauge coupling is approximately constant. For $U(1)$ the fermions dominate and the $U(1)$ gauge coupling becomes much weaker at low energies. This is shown schematically in the figure below. This way, GUTs provide an explanation why strong interactions are strong and weak interactions are weak.
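To make this a bit more quantitative, here is a small one-loop sketch of the running (my own illustration, not a figure from any of the papers cited here; the one-loop standard model coefficients $b = (41/10, -19/6, -7)$ in the usual GUT normalization of the hypercharge and the rough input values at $M_Z$ are standard numbers, but everything is only approximate):

```python
import numpy as np

# rough inverse couplings at the Z mass (GUT-normalized hypercharge), approximate values
alpha_inv_mz = np.array([59.0, 29.6, 8.5])      # U(1)_Y, SU(2)_L, SU(3)_C
b = np.array([41.0 / 10.0, -19.0 / 6.0, -7.0])  # one-loop standard model beta coefficients
m_z = 91.2  # GeV

def alpha_inv(mu):
    """One-loop running: 1/alpha_i(mu) = 1/alpha_i(M_Z) - b_i/(2*pi) * ln(mu/M_Z)."""
    return alpha_inv_mz - b / (2.0 * np.pi) * np.log(mu / m_z)

for mu in [1e3, 1e8, 1e13, 1e16]:
    print(f"mu = {mu:.0e} GeV  ->  1/alpha_i =", np.round(alpha_inv(mu), 1))

# The three values drift towards each other at very high scales and come close
# (though in the non-supersymmetric standard model they do not meet exactly)
# somewhere around 10^13 - 10^16 GeV.
```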

 

Another interesting aspect of the renormalization group evolution of the gauge couplings is that there is a close connection between the GUT scale and the proton lifetime. Thus proton decay experiments directly yield a bound on the GUT scale: $M_{GUT} \gtrsim 10^{15}$ GeV. On the other hand, we can use the measured values of the gauge couplings and the standard model particle content to calculate how the three standard model gauge couplings change with energy. Thus we can approximate the GUT scale as the energy scale at which the couplings become approximately equal. The exact scale depends on the details of the GUT model, but the general result is a very high scale, which is surprisingly close to the value from proton decay experiments. This is not a foregone conclusion. With a different particle content or different measured values of the gauge couplings, this calculation could yield a much lower scale, and this would be a strong argument against GUTs. In addition, the gauge couplings could run in the “wrong direction”, as shown in the figure. The fact that the gauge couplings run sufficiently slowly and become approximately equal at high energies is therefore a hint in favor of the GUT idea.

 

Further Postdictions

In addition to the “classical” GUT postdictions described in the last two sections, I want to mention two additional postdictions:

  • A quite generic implication of grand unification is small neutrino masses through the type-I seesaw mechanism. Models based on the popular $SO(10)$ or $E_6$ groups automatically contain a right-handed neutrino $\nu_R$. As a result of the breaking chain, this standard model singlet $\nu_R$ gets a superheavy mass $M$. After the last breaking step $G_{SM}\rightarrow SU(3)_C \times U(1)_Q$, the right-handed and left-handed neutrinos mix. This yields a suppressed mass for the left-handed neutrino of order $\frac{m^2}{M}$, where $m$ denotes a typical standard model mass (see the short sketch after this list).
  • GUTs provide a natural framework to explain the observed matter-antimatter asymmetry of the universe. As already noted above, a general implication of GUTs is that protons are no longer stable. Formulated differently, GUTs allow baryon number-violating interactions. This is one of the three central ingredients, known as the Sakharov conditions, needed to produce more baryons than antibaryons in the early universe. Thus, as D. V. Nanopoulos put it, “if the proton was stable it would not exist”.
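As a minimal sketch of the seesaw statement in the first bullet (just the standard $2\times 2$ toy version for a single generation, with $m \ll M$): diagonalizing the neutrino mass matrix

$$ \mathcal{M}_\nu = \begin{pmatrix} 0 & m \\ m & M \end{pmatrix} \qquad\Longrightarrow\qquad m_{1,2} \;=\; \frac{M \mp \sqrt{M^2 + 4m^2}}{2} \;\approx\; -\frac{m^2}{M}\,,\;\; M \, ,$$

shows that the light eigenvalue is suppressed by the heavy scale: a GUT-scale $M$ together with an electroweak-scale $m$ naturally yields a tiny neutrino mass of order $m^2/M$.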

 

What’s next?

 

While the unification of spacetime symmetries was already confirmed by the measurement of the cosmological constant, so far there is no experimental evidence for the correctness of the GUT idea. Thus the unification of internal symmetries still has to wait. However, proton decay could be detected any time soon. When Hyper-Kamiokande starts operating, the limits on the proton lifetime will become one order of magnitude better, and this means there is a realistic chance that we will finally find evidence for Grand Unification.

This, however, would by no means be the end of the road.

Arguably, it would be awesome if we could unify spacetime and internal symmetries into one large symmetry. However, there is one no-go theorem that has blocked progress in this direction: the famous Coleman-Mandula theorem.

Nevertheless, a no-go theorem in physics never really means that something is impossible, only that it isn’t as trivial as one might think. There are several loopholes in the theorem that potentially allow the unification of spacetime and internal symmetries.

At least to me, it seems as if Dirac was right and larger symmetries are the way to go. However, so far, we don’t know which way we should follow.

Resources that helped me understand Grand Unified Theories

I recently finished my master’s thesis on dark matter in Grand Unified Theories. Here are some resources that I found particularly helpful.

Group Theoretical Preliminaries

Unification means that we embed the standard model gauge group $G_{SM} \equiv SU(3)_C \times SU(2)_L \times U(1)_Y$ in a larger gauge group $G_{GUT}$.

Thus the first important questions for me were:

  • Which groups $G_{GUT}$ can be used?
  • How can we embed $G_{SM}$ in $G_{GUT}$ and what does this actually mean?
  • How can we describe and compute the breaking of $G_{GUT}$ to $G_{SM}$? (Although there are other methods to break a symmetry, I restricted myself to the usual Higgs mechanism.)

The answer to the first question is that we use simple groups that have non-self-conjugate representations and that are large enough that $G_{SM}$ can be embedded. I’ve written a long post about the classification of all simple groups. A representation of a Lie group is called non-self-conjugate if it is not equivalent to the corresponding conjugate representation. This means the group elements are represented by complex matrices $R^a$ and it is impossible to get the conjugated matrices $\overline{R}^a$ from the original matrices using a similarity transformation: $U R^a U^\dagger \neq \overline{R}^a$ for every $U$. If there is a map with $U R^a U^\dagger= \overline{R}^a$, the representation is called self-conjugate. In physics, it is conventional to call a self-conjugate representation real and a non-self-conjugate representation complex, although these notions have a different meaning in mathematics. The short version is that GUT models that put the fermions in a self-conjugate representation “will not give the observed chiral structure of weak interactions”. I was quite confused about this for several weeks because I couldn’t find a good explanation. What finally made it click for me was

The Algebra of Grand Unified Theories by John C. Baez and John Huerta
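To make the distinction a bit more tangible, here is a small numerical check (my own illustration, not from the paper above). At the level of Hermitian generators $T^a$, the conjugate representation is generated by $-\overline{T}^a$, so for a self-conjugate representation the eigenvalue spectrum of every generator must be symmetric under negation. The fundamental of $SU(2)$ passes this test (it is even pseudo-real, with $\sigma_2$ providing the similarity transformation), while the Cartan generator $T^8$ already shows that the fundamental of $SU(3)$ is complex:

import numpy as np

def spectrum(T):
    """Sorted (real) eigenvalues of a Hermitian generator."""
    return np.sort(np.round(np.linalg.eigvalsh(T), 6))

# SU(2) fundamental: T^a = sigma^a / 2 is self-conjugate (pseudo-real).
sigma = [np.array([[0, 1], [1, 0]], complex),
         np.array([[0, -1j], [1j, 0]], complex),
         np.array([[1, 0], [0, -1]], complex)]
S = sigma[1]                        # S = sigma_2 satisfies S T^a S^-1 = -conj(T^a)
for s in sigma:
    T = s / 2
    assert np.allclose(S @ T @ np.linalg.inv(S), -np.conj(T))

# SU(3) fundamental: the spectrum of the Cartan generator T^8 is not symmetric
# under negation, so 3 and 3-bar cannot be related by a similarity transformation.
T8 = np.diag([1, 1, -2]) / (2 * np.sqrt(3))
print(spectrum(T8), spectrum(-np.conj(T8)))   # different multisets -> complex representation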

The answer to the second question is that embedding means that we identify the standard model Cartan generators among the Cartan generators of $G_{GUT}$. An embedding is best described by so-called charge axes. Unfortunately, I don’t know any good resource that explains how we can find an embedding of $G_{SM}$ in a given GUT group, and I plan to write about it as soon as I find some time. The standard resource on this kind of thing is

but I wasn’t able to understand his explanations. For the concept of charge axes I found

Dark Matter and Gauge Coupling Unification in Non-supersymmetric SO(10) Grand Unified Models
by Yann Mambrini, Natsumi Nagata, Keith A. Olive, Jeremie Quevillon, Jiaming Zheng

very helpful.
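Since I could not find a concise write-up, here is at least the simplest concrete example of what “identifying the Cartan generators” means, the classic $SU(5)$ case (my own numpy sketch; the helper embed and the normalisation comment are mine, and I assume the standard decomposition $5 \rightarrow (3,1,-1/3) \oplus (1,2,1/2)$):

import numpy as np

def embed(block, where):
    """Place a 3x3 ('colour') or 2x2 ('weak') block into an otherwise empty 5x5 matrix."""
    M = np.zeros((5, 5), complex)
    if where == "colour":
        M[:3, :3] = block
    else:
        M[3:, 3:] = block
    return M

# A few SU(3) and SU(2) generators (Gell-Mann / Pauli matrices over two); the
# check below works the same way for all eight plus three of them.
lam1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]]) / 2
lam8 = np.diag([1, 1, -2]) / (2 * np.sqrt(3))
tau1 = np.array([[0, 1], [1, 0]]) / 2
tau3 = np.diag([1, -1]) / 2
sm_generators = [embed(g, "colour") for g in (lam1, lam8)] + [embed(t, "weak") for t in (tau1, tau3)]

# Hypercharge on the fundamental 5 is the traceless diagonal SU(5) generator
# Y = diag(-1/3, -1/3, -1/3, 1/2, 1/2); in GUT normalisation it is rescaled by sqrt(3/5).
Y = np.diag([-1 / 3, -1 / 3, -1 / 3, 1 / 2, 1 / 2])

print("traceless:", np.isclose(np.trace(Y), 0.0))
print("commutes with the embedded SU(3) x SU(2):",
      all(np.allclose(Y @ T - T @ Y, 0.0) for T in sm_generators))

As far as I understand it, this hypercharge direction is what a charge axis describes in this simple case: a particular direction in the Cartan subalgebra of $G_{GUT}$.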

Unfortunately, I also don’t know any good resource that answers the third question. The breaking of $G_{GUT}$ can be described either using tensor methods or using Dynkin’s methods. In tensor language, a vev is given by a matrix $V$, which is a member of some representation $R$. Then one acts on this matrix with every generator $G$ of the group via the usual Lie algebra product $[G,V]$ (the commutator).

  • If the result $M$ of the commutator $[G,V]=M$ is an element of $R$, i.e. of the same representation as $V$, then the generator $G$ is broken.
  • If the result $M$ of the commutator $[G,V]=M$ is not an element of $R$, then the generator $G$ is unbroken, because $G$ annihilates (by definition) the VEV $V$. If acting with a generator on an element of some representation $R$ does not yield another element of $R$, the result is by definition zero. This is analogous to the ladder operators for angular momentum in quantum mechanics. We can apply these ladder operators only until we reach the highest state. If we act with a raising operator on the highest state, we would get a state that is not part of the representation, and therefore we say that the state gets annihilated.

The main task is always to write down the correct explicit matrices for the various representations. The rest of the computation, i.e. taking the commutator and then checking whether the result is in the same representation as the VEV, can be easily implemented in Mathematica. While tensor methods are easier to explain, I find Dynkin’s methods easier most of the time, especially for a non-trivial group like $SO(10)$ or $E_6$. (However, a colleague of mine recently published a Mathematica package for explicit matrix expressions of $E_6$ representations and this should simplify the computations in $E_6$ models significantly.) I don’t know any good paper or book that explains how symmetry breaking can be described in Dynkin’s framework and therefore I will write about it as soon as I find the time.
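As a toy version of this bookkeeping, here is the simplest possible example, written in Python rather than Mathematica (my own sketch): an adjoint VEV of $SU(2)$, $V = v\,T^3$. For an adjoint VEV the general statement reduces to the familiar criterion that the unbroken generators are those that commute with $V$:

import numpy as np

# SU(2) generators T1, T2, T3 from the Pauli matrices.
sigma = [np.array([[0, 1], [1, 0]], complex),
         np.array([[0, -1j], [1j, 0]], complex),
         np.array([[1, 0], [0, -1]], complex)]
T = [s / 2 for s in sigma]

v = 1.0
V = v * T[2]                          # adjoint VEV pointing in the T3 direction

for a, G in enumerate(T, start=1):
    commutator = G @ V - V @ G
    status = "unbroken" if np.allclose(commutator, 0) else "broken"
    print(f"T{a}: {status}")
# Output: T1 broken, T2 broken, T3 unbroken -> SU(2) breaks to the U(1) generated
# by T3, and the two broken generators correspond to gauge bosons that get a mass.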

For the group-theoretical aspects of GUTs, I found the following resources really helpful:

Grand Unified Theories in General

To get an overview I found the famous review

really helpful, although it is quite old and some things are outdated.

I know of three books about GUTs and all of them are quite old, too. Nevertheless, I think that

Grand Unification with and without Supersymmetry by Costas Kounnas, Antonio Masiero, Dimitri V. Nanopoulos, Keith A. Olive

is still a great book and I found some chapters in

Grand Unified Theories by Graham Ross

really helpful.

The third GUT book I know of is

Unification and Supersymmetry by Rabindra N. Mohapatra 

but it didn’t help me.

Specific Topics

To understand many subtle effects in GUTs I found the review

Effective Field Theory by Howard Georgi

incredibly helpful. It helped me finally understand why the gauge couplings should unify at all and what the GUT scale actually is.

Two other excellent papers that were immensely helpful and elucidate many important GUT problems are

An Exceptional Model for Grand Unification by Riccardo Barbieri, Dimitri V. Nanopoulos

Mass relations and neutrino oscillations in an SO(10) model by J. A. Harvey, D. B. Reiss, P. Ramond

Renormalization Group Running

I’ve written three long posts about the renormalization group running and most papers that I found helpful are linked there. The most helpful paper was

Implications of LEP results for SO(10) grand unification by N.G. Deshpande, E. Keith, Palash B. Pal

and for threshold effects:

Scalar Masses

In this context, the extended survival hypothesis is extremely important. We actually don’t understand scalar masses, but we need to know them in order to compute the renormalization group running. Therefore one usually invokes the extended survival hypothesis, which states that “at every stage of the symmetry breaking chain only those scalars are present that develop a vacuum expectation value (VEV) at the current or the subsequent levels of the spontaneous symmetry breaking.” The two standard papers on this topic are

Higgs Bosons in SO(10) and Partial Unification by F. del Aguila, Luis E. Ibanez

Higgs-boson effects in grand unified theories by Rabindra N. Mohapatra and Goran Senjanović

Breaking Chains

The scalar sector of GUTs is usually too complicated to analyze in full, and thus one must stick to rules of thumb like the extended survival hypothesis. Another very important such rule is Michel’s conjecture, which states that minima of a Higgs potential correspond to vacuum expectation values that break a given algebra to a maximal subalgebra. This conjecture is explained nicely in

Group Structure of Gauge Theories by L. O’Raifeartaigh

Another important aspect explained there is the necessary condition that, in order for a scalar representation to be able to break a group to the subgroups in a given breaking chain, it “must contain singlets with respect to the various subgroups G′, G″, …”.

Induced Vacuum Expectation Values

A third important observation in the scalar sector is that even if we assume that some scalar field which is allowed to get a VEV does not get one, it is possible that it gets a small induced VEV. This is shown nicely in

Proton lifetime and fermion masses in an SO(10) model by G. Lazarides, Q. Shafi,  C. Wetterich

Aspects of the Grand Unification of Strong, Weak and Electromagnetic Interactions by A.J. Buras, John R. Ellis, M.K. Gaillard, Dimitri V. Nanopoulos