Surprising Symmetries at the Other End of the Spectrum

Often in particle physics, we spend a lot of time thinking about what goes on at high energies. Theories that address current problems or puzzles in particle physics are often UV-theories (UV = ultraviolet). A UV-theory is a theory that is valid at high energy scales. For example, a popular class of UV-theories are Grand Unified Theories. In such theories, the standard model gauge group is replaced by a larger, “better” group that contains it as a subgroup. This larger group structure would only become visible at much higher energies.

This is the usual side of the energy spectrum where we expect new symmetries to show up.

While everyone expected new symmetries to show up at high energies, new symmetries were instead discovered at the other end of the energy spectrum. This other end of the spectrum is known as the infrared (IR) region and corresponds to what is going on at low energies.

We can always relate energy scales to length scales. High energies correspond to tiny length scales (short wavelengths), while low energies correspond to long length scales (long wavelengths). The newly discovered symmetries are not relevant for what is going on at tiny length scales, but instead for our description of physics at large length scales. Such symmetries are known as asymptotic symmetries.
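To make the relation between energy and length concrete: a length scale is roughly $\hbar c / E$. A minimal Python sketch, assuming only the standard value $\hbar c \approx 197.327$ MeV fm (the helper name `length_scale_fm` is made up for illustration):

```python
# hbar * c in MeV * femtometre (1 fm = 1e-15 m); standard CODATA value
HBAR_C_MEV_FM = 197.3269804


def length_scale_fm(energy_mev: float) -> float:
    """Length scale L = hbar*c / E (in fm) associated with an energy E (in MeV)."""
    if energy_mev <= 0:
        raise ValueError("energy must be positive")
    return HBAR_C_MEV_FM / energy_mev


if __name__ == "__main__":
    # Higher energies probe shorter distances, lower energies longer ones.
    for label, energy in [("1 TeV", 1.0e6), ("1 GeV", 1.0e3), ("1 MeV", 1.0), ("1 eV", 1.0e-6)]:
        print(f"{label:>6}: {length_scale_fm(energy):.3e} fm")
```

With these numbers, 1 GeV corresponds to roughly 0.2 fm, while 1 eV corresponds to roughly 0.2 micrometres, which is why the IR is the realm of long distances.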

The first discovery of a non-trivial asymptotic symmetry was a complete surprise. In 1962 Bondi, van der Burg, Metzner and Sachs investigated the asymptotic symmetry group of general relativity. They investigated a system that becomes asymptotically flat. Imagine a large sphere in spacetime. Let’s assume that all relevant matter is inside this sphere, and thus inside the sphere spacetime is certainly curved. However, far away from the isolated system in the sphere, spacetime becomes flatter and flatter, because everything that can curve spacetime is inside the sphere and its influence falls off with distance. Usually, to simplify the discussion, we say the sphere is infinitely large and hence that spacetime becomes flat “at infinity”. However, we do not really mean infinity. Instead, what is usually meant by infinity is “sufficiently far away”. We use the same simplification in quantum field theory. To calculate scattering amplitudes we take the integrals from $t=-\infty$ to $t=\infty$ and often integrate over all of space. Analogously, these infinities shouldn’t be taken literally, but simply represent, for example, a long enough time span.

Now, back to the situation investigated by Bondi, van der Burg, Metzner and Sachs. Inside the sphere the relevant symmetry group is the diffeomorphism group. (The diffeomorphism group is basically the set of all smooth, invertible transformations of spacetime, i.e. transformations that do not “destroy” spacetime by ripping holes into it etc.) In the asymptotic region where spacetime becomes flat, the naive expectation is that the relevant symmetry group is simply the symmetry group of flat spacetime, namely the Poincaré group. To everyone’s surprise, this is not what they found. They found a larger, infinite-dimensional symmetry group that contains the Poincaré group as a subgroup and is now called the BMS group (after Bondi, Metzner and Sachs). The BMS group was the first example of a non-trivial asymptotic symmetry. In words, this discovery means that general relativity does not simply reduce to special relativity for weak fields at large distances.

Now immediately several questions pop up:

  • How can we find such asymptotic symmetries?
  • What about the asymptotic symmetries of the other forces like electromagnetism?
  • Why should we care about them?

The answer to the third question is certainly the most important one. If you don’t care about asymptotic symmetries, there is no point in discussing how they are defined or what the asymptotic symmetry of electromagnetism is.

Why do we care about asymptotic symmetries?

I got interested in asymptotic symmetries because I wanted to understand gauge symmetries in general. The usual discussion of gauge symmetries in almost every textbook is extremely sketchy and confused me a lot. A proper discussion of asymptotic symmetries helped me immensely to understand the different types of gauge transformations (small, large, global, local). I’ll write some more about this below.

Other reasons to be interested have to do with the buzzwords: holography, memory effects, black hole information paradoxes and soft-photon theorems.

I actually know too little about these topics to write something sensible. Thus, I’ll just quote people who know more about this:

“A central motivation for these IR investigations is to understand the holographic structure of quantum gravity in 4D asymptotically flat spacetimes, which is a good approximation to the real world. This is how I came into the subject. There has been a very beautiful unfolding story over the last twenty years about the holographic structure of quantum gravity in anti-de Sitter space. The story begins [137] with the identification of the symmetries in anti-de Sitter space with those of its proposed holographic dual. Following this successful example, the very first question we should ask in attempting a holographic formulation of flat space quantum gravity is “What are the symmetries?”. Up until three years ago, the answer to this question was unknown. We now know [41] at the very least that the symmetry group is infinite-dimensional and includes a certain subgroup of, but not all of, the BMS group on past and future null infinity.”

“Although I didn’t start this IR project with black holes in mind, as usual all roads lead to black holes [40,46,156]. The IR structure has important implications for the information paradox [157]. This paradox is intertwined with the deep IR because an infinite number of soft gravitons and soft photons are produced in the process of black hole formation and evaporation. These soft particles carry information with a very low energy cost. They must be carefully tracked in order to follow the flow of information. This is hard to do without a definition of the S-matrix! Moreover, their production is highly constrained by an infinite number of exact quantum conservation laws which correlate them both with energetic hard particles and with the quantum state of the black hole itself. This requires that black holes must carry an infinite number of conserved charges, described as ‘soft hair’ in a recent collaboration with Hawking and Perry [46,156]. The information paradox cannot be clearly stated [158], let alone solved, without accounting for soft particles. The implications of soft hair are recently discussed in [126, 158–175], for example.”

Moreover, it was recently discovered that asymptotic symmetries, memory effects and soft theorems are actually the same thing, just viewed from different perspectives. This is an extremely surprising connection, because at first glance these things have nothing to do with each other.


“Soft theorems characterize universal properties of Feynman diagrams and scattering amplitudes when a massless external particle becomes ‘soft’, i.e. its energy is taken to zero. They tell us that a surprisingly large — in fact infinite — number of soft particles are produced in any physical process, but in a highly controlled manner which is central to the consistency of quantum field theory”

“Soft theorems are relations between n and n + 1 particle scattering amplitudes, where the extra particle is soft. Any linear relation between scattering amplitudes can be recast as an infinitesimal symmetry of the S-matrix. It is gratifying that in some cases the resulting symmetries have turned out to be known space-time or gauge symmetries. For example Weinberg’s soft graviton theorem [20, 21] is equivalent to a symmetry of the S-matrix generated by a certain diagonal subgroup [2] of the product of BMS [22] supertranslations acting on past and future null infinity, $\mathcal{I}^+$ and $\mathcal{I}^-$. This equivalence relation is of interest for several reasons. It “explains” why soft theorems exist and are so universal: they arise from a symmetry principle. Moreover, it imparts observational meaning to Minkowskian asymptotic symmetries, which have at times eluded physical interpretation. The framework has proven useful for establishing new symmetries [14] and new soft theorems [4–6]. In the quantum gravity case, the symmetries provide the starting point for any attempt at a holographic formulation, see e.g. [23]. In the gauge theory case, they are potentially useful for improving the accuracy of collider predictions, see e.g. [24].”

“We often think of gravitational-wave (GW) signals as having an oscillatory amplitude that starts small at early times, builds to some maximum, and then decays back to zero at late times. For example, this is the standard picture of a waveform from a coalescing compact-object binary. However, this picture is incomplete. In reality, all gravitational-wave sources possess some form of gravitational-wave memory. The GW signal from a ‘source with memory’ has the property that the late-time and early-time values of at least one of the GW polarizations differ from zero:
$$ \Delta h_{+,\times}^{\rm mem} = \lim_{t\rightarrow +\infty} h_{+,\times}(t) - \lim_{t\rightarrow -\infty} h_{+,\times}(t), $$
where $t$ is time at the observer.

When a GW without memory passes through a detector, it causes oscillatory deformations but eventually returns the detector to its initial state. After a GW with memory has passed through an idealized detector (one that is truly freely-falling), it causes a permanent deformation—leaving a ‘memory’ of the waves’ passage. High-frequency detectors like bars or LIGO are rather insensitive to the memory from most sources because the detector response timescale is generally much shorter than the rise-time of typical memory signals (the characteristic time for the non-oscillatory piece of the GW signal to build up to its final value). A detector like LISA is better able to detect the memory because of its good sensitivity in the low-frequency band where typical memory sources are stronger”
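The memory formula in this quote can be illustrated with a made-up toy signal: an oscillating burst that vanishes at early and late times, plus a smooth step. The parameter values and the $\tanh$ shape below are assumptions chosen for illustration, not a waveform model from the quoted source. Evaluating $h_+$ at large negative and positive times approximates the two limits in the definition of $\Delta h^{\rm mem}$:

```python
import numpy as np

# Hypothetical toy parameters (not taken from any waveform model).
A_OSC = 1.0       # amplitude of the oscillatory part of h_+
OMEGA = 20.0      # angular frequency of the oscillation
TAU = 1.0         # duration of the burst and rise time of the memory
DELTA_MEM = 0.3   # final offset of h_+, i.e. the memory built into the toy signal


def h_plus(t):
    """Toy h_+(t): an oscillating burst that dies off at early and late times,
    plus a smooth step that rises from 0 to DELTA_MEM."""
    oscillatory = A_OSC * np.sin(OMEGA * t) * np.exp(-((t / TAU) ** 2))
    memory_step = 0.5 * DELTA_MEM * (1.0 + np.tanh(t / TAU))
    return oscillatory + memory_step


def memory_offset(h, t_early: float, t_late: float) -> float:
    """Approximate Delta h^mem = h(+inf) - h(-inf) by evaluating h at large |t|."""
    return float(h(t_late) - h(t_early))


if __name__ == "__main__":
    print(f"Delta h_+^mem ~ {memory_offset(h_plus, -20.0, 20.0):.6f}")
```

The oscillatory part contributes nothing to $\Delta h^{\rm mem}$; only the non-oscillatory step survives, which is the permanent deformation described above.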

“Gravitational waves are observed by geodesic deviation of nearby freely falling observers. An interesting [class] of gravitational waves called ‘bursts with memory’ will induce permanent relative displacement of nearby observers. Such effect is the well known gravitational memory effect. […] The gravitational memory formula is nothing but the Fourier transformation of Weinberg’s formula for soft graviton production. Moreover, accompanied with an earlier discovery [8], a triangular equivalence has been found. The precise ingredients of the three corners are BMS super-translation [9], Weinberg’s soft graviton theorem and gravitational memory effect”


“[T]he memory effect both physically manifests and directly measures the action of the asymptotic symmetries. […] The bigger picture emerging from the triangle is that deep IR physics is extremely rich, perhaps richer than previously appreciated. Every time we breathe, an infinite number of soft photons and gravitons are produced […] The gravitational memory effect imparts a physical meaning to the soft graviton theorem. Soft gravitons may seem a bit unphysical because it takes longer and longer to measure them as the energy goes to zero. Surprisingly, despite this, the memory effect can be measured in a finite time because the Fourier transform of the Weinberg pole is a step function in retarded time. In (unconfined and unhiggsed) nonabelian gauge theory, the color memory effect rotates the relative colors of nearby quarks. If a pulse of gluons passes a pair of initially singlet quarks, it will generically no longer be in a singlet. In abelian gauge theories such as QED, the electromagnetic memory effect gives relative phases to adjacent charged particles, which can be measured by quantum interference or other experiments, as recently discussed in [38, 39, 195]. […] To phrase the issue more generally, soft gravitons are produced in every scattering process. The infrared pole in the soft theorem says that their production is more ubiquitous than might have been expected. In fact, infinitely many are produced in any physical process. The soft modes are correlated with the hard modes and can store information at little or no cost in energy. Many are made in any process of black hole formation/evaporation in a manner which is highly regulated by an infinite number of conservation laws. It strikes us as implausible that we could solve the information paradox in asymptotically flat spacetime without a good understanding of these modes.”

So, now that you are hopefully interested, it’s time to answer the other two questions from above, which I repeat here for convenience: How can we find such asymptotic symmetries? What about the asymptotic symmetries of the other forces like electromagnetism?

To understand asymptotic symmetries, we first need to talk about gauge symmetries in general. There is a lot of confusion surrounding gauge symmetries, and understanding them properly helps a lot. What is especially unhelpful is that almost every author uses different names for the same things. So, let’s first fix the meaning of the terms we use here.

Let’s define local gauge symmetries properly 

A global gauge symmetry $G = \{ g\}$ is a set of transformations that leaves the action invariant. If $G=U(1)$, a global gauge transformation is simply $e^{i a}$, with some real number $a$. In contrast, a local gauge symmetry is a set of group transformations $\mathcal{G}$ parametrized by localized functions of spacetime. Again, for the $U(1)$ example this means that a local gauge transformation is $e^{i f(x)}$, such that $f(x) \to 0$ for $|x| \to \infty$. A function $f(x)$ with this property is what we call a localized function. This aspect of local gauge transformations is usually not discussed or even mentioned, and this is the reason for much of the confusion surrounding gauge transformations.

In words, the restriction that our local gauge transformations are parametrized by localized functions means that they only act non-trivially inside some compact, bounded region. As a formula, we have $g(x) \to 1$ as $|x| \to \infty$. Only such transformations truly deserve the name local gauge transformation. The global gauge group is not a subgroup of the local gauge group, because global transformations do not become trivial at infinity. This is a crucial aspect that is usually overlooked. Without the restriction to localized functions, the global gauge symmetry would simply be the special case of the local group with a constant function.

However, with our restriction to truly local transformations, this is no longer the case. With our definition we can easily keep the local gauge group and the global gauge group apart. The global gauge group is responsible for the conservation laws and is a real symmetry. In contrast, the local gauge symmetry acts trivially on all observables and is merely a redundancy in our description. The local gauge group acts trivially on all physical states, whereas the global gauge group does not: a global $U(1)$ transformation $e^{ia}$ multiplies a state with charge $q$ by the phase $e^{iqa}$.

There is one additional thing I want to mention here, although it’s not directly relevant for the following discussion. A gauge transformation can be trivial at infinity, $g(x) \to 1$ as $|x| \to \infty$, although $f(x)$ is not zero there. This is possible because the function $f(x)$ appears in the exponent, $e^{i f(x)}$, and the exponential is also $1$ for $f(x) = 2 \pi$, $f(x) = 4 \pi$, etc. Gauge transformations that are trivial at infinity, but whose parameter function $f(x)$ is non-zero there, are known as large gauge transformations. These large gauge transformations are the ones that carry a non-zero winding number, and their implications were discussed here.
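A toy version of this can be checked numerically. The sketch below (Python with NumPy; the profile $f(x) = \pi n\,(1+\tanh x)$ and the helper names are hypothetical choices) works on a line instead of three-dimensional space: since $g \to 1$ at both ends, the line effectively closes into a circle, and the winding number counts how many times $e^{if(x)}$ wraps around $U(1)$ along the way. It is only a one-dimensional $U(1)$ illustration.

```python
import numpy as np


def gauge_transformation(x: np.ndarray, n: int) -> np.ndarray:
    """Hypothetical U(1) transformation g(x) = exp(i f(x)) on a line, with
    f(x) = pi * n * (1 + tanh(x)).

    f -> 0 for x -> -infinity and f -> 2*pi*n for x -> +infinity, so g -> 1 at
    both ends, although f itself does not vanish at +infinity when n != 0.
    """
    f = np.pi * n * (1.0 + np.tanh(x))
    return np.exp(1j * f)


def winding_number(g: np.ndarray) -> int:
    """Net number of times the phase of g winds around the circle along the grid.

    np.unwrap removes the artificial 2*pi jumps of np.angle, which only works
    if the phase changes by less than pi between neighbouring grid points.
    """
    phase = np.unwrap(np.angle(g))
    return int(round((phase[-1] - phase[0]) / (2.0 * np.pi)))


# Stand-in for "infinity": |x| = 20 is far enough for tanh to be saturated.
x = np.linspace(-20.0, 20.0, 4001)
for n in (0, 1, 2, -1):
    g = gauge_transformation(x, n)
    trivial_at_ends = bool(np.allclose([g[0], g[-1]], 1.0))
    print(f"n = {n:2d}: g -> 1 at both ends: {trivial_at_ends}, winding = {winding_number(g)}")
```

Every transformation with $n \neq 0$ is trivial at both ends but still has $f \neq 0$ at $+\infty$, which is the large-gauge-transformation situation described above.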

What are asymptotic and global symmetries?

Now, with this in mind, we can define what asymptotic symmetries are. The asymptotic symmetry group $ASG$ for a given gauge theory is defined as

$$ ASG \equiv \frac{\text{all allowed gauge symmetries}}{ \text{all gauge symmetries that are trivial at infinity } }.$$

(If you are unsure what a quotient group is, have a look at this great blog post.)

Expressed differently this means the asymptotic symmetry consists of all gauge transformations that are non-trivial at infinity. As discussed above, we call gauge transformations that are trivial at infinity local gauge transformations (plus large gauge transformations), because, well, they are non-trivial only in a localized region.

Take note that this asymptotic group is not the global gauge group. The global gauge group (GGG) can be defined similarly

$$ GGG \equiv \frac{\text{all gauge symmetries that become constant at infinity}}{ \text{all gauge symmetries that are trivial at infinity} }.$$

Does this definition make sense? Elements of the global gauge group are not parametrized by spacetime coordinates. Hence they do not care about infinity and certainly do not become trivial there. However, at the same time we must recall that elements of the global gauge group are parametrized by constant functions, i.e. numbers, and therefore cannot depend on angular variables as $|x| \to \infty$. Therefore, the global gauge transformations are given by the subset of all gauge transformations that are constant at infinity, modulo all local and large gauge transformations.

When one first encounters this definition of the global gauge group, it is natural to wonder: What about the rest? What about all the transformations that are non-trivial and non-constant at infinity? Well, they are what we call asymptotic symmetries. From the definitions here, we can already see that the global gauge group is a subgroup of the asymptotic symmetry group.

(Source: page 37 in Lectures on the Infrared Structure of Gravity and Gauge Theory by Andrew Strominger)
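The quotient definitions above only care about what a gauge transformation does at infinity. As a toy illustration (assumptions: $U(1)$, the plane, a large circle $|x| = R$ standing in for infinity, and a numerical tolerance; the function name is hypothetical), one can sort transformations by their boundary values $g(\theta) = e^{if(R,\theta)}$:

```python
import numpy as np


def classify_at_infinity(g_boundary: np.ndarray, tol: float = 1e-9) -> str:
    """Toy classification of a U(1) gauge transformation by its boundary values
    g(theta) = exp(i f(R, theta)) on a very large circle |x| = R in the plane.

    'trivial'                  : g -> 1 in every direction (local or large; modded out)
    'constant (global)'        : g -> the same constant != 1 in every direction
    'non-constant (asymptotic)': g depends on the direction theta at infinity
    The last two are both non-trivial elements of the asymptotic symmetry group.
    """
    if np.all(np.abs(g_boundary - 1.0) < tol):
        return "trivial"
    if np.all(np.abs(g_boundary - g_boundary[0]) < tol):
        return "constant (global)"
    return "non-constant (asymptotic)"


theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
examples = {
    "f -> 0": np.zeros_like(theta),
    "f -> 2*pi": np.full_like(theta, 2.0 * np.pi),
    "f -> 0.7": np.full_like(theta, 0.7),
    "f -> cos(theta)": np.cos(theta),
}
for name, f_at_infinity in examples.items():
    print(f"{name:16s} -> {classify_at_infinity(np.exp(1j * f_at_infinity))}")
```

The second example shows that $f \to 2\pi$ still counts as trivial, since only $g$ matters, while the last example is a genuinely asymptotic transformation that depends on the direction at infinity.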

One last thing: What gets broken in the Higgs mechanism?

Finally, I want to mention one last thing that leads to far too much confusion: spontaneous breaking of local gauge symmetries via the Higgs mechanism. It is well known that the standard story told in almost every textbook is wrong. Local gauge symmetries are 1.) not really symmetries and 2.) can’t be spontaneously broken, which was proven by Elitzur. A proper discussion is worth its own essay, but here is just one short comment. A question that always pops up when people mention that spontaneous breaking of a local symmetry is impossible is: “What then is the Higgs mechanism really doing?”. Well, there is symmetry breaking, but not of the local gauge symmetry. Instead, what gets broken is the global gauge group, which we defined above as $\mathcal{G}/\mathcal{G}_*$, where

$$ \begin{aligned}
\mathcal{G}_* &= \left\{ \text{set of all } g(x) \text{ such that } g(x) \to 1 \text{ as } |x| \to \infty \right\}, \\
\mathcal{G} &= \left\{ \text{set of all } g(x) \text{ such that } g(x) \to \text{a constant element of } G \text{, not necessarily } 1 \text{, as } |x| \to \infty \right\}.
\end{aligned} $$

Again, $\mathcal{G}_*$ consists of the unphysical local gauge transformations, which only represent a redundancy and act trivially on all states and observables, together with the set of all large gauge transformations. The quotient group $\mathcal{G}/\mathcal{G}_*$ is the global gauge group, which is responsible for the Noether charges.

(Source: Quantum Field Theory by Nair page 188 and 276)


A Mystery called Wick Rotation or can we understand the “Action” Formalism?

The Wick rotation pops up as a “mere technical trick” in quantum field theoretical calculations. Making the time coordinate imaginary, $t \to -i \tau$, is described as “analytic continuation” and helps to solve integrals. Certainly, there is nothing deep behind this technical trick, right?

Well, I’m no longer so sure.

There is one observation that makes me (and others) wonder:

The difference between the mystical theory of quantum fields and ordinary statistical mechanics is a Wick rotation $t \to -i\hbar/(kT)$.

This is puzzling. On the one hand, we have ordinary statistical mechanics, which we understand perfectly well. When we want to make a statement about a system where we don’t know all the details, we invoke the principle of maximum entropy and get as a result the famous Boltzmann distribution $\propto \exp(-E/kT)$. This distribution tells us the probability to find the system in each of its microscopic states, depending on the energy $E$ of that state. There is nothing mysterious about this. The principle of maximum entropy is simply an optimal guessing strategy in situations where we don’t know all the details. (If you don’t know this perspective on entropy, you can read about it, for example, here and in the references therein.) This interpretation, due to Jaynes, and the derivation of the Boltzmann distribution are completely satisfactory. It is no exaggeration to say that we understand statistical mechanics.
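The maximum-entropy statement can be checked in a toy three-level system. The energies, the temperature (in units with $k = 1$) and the size of the perturbation are arbitrary choices for illustration: any small change of the probabilities that keeps both the normalisation and the mean energy fixed lowers the entropy of the Boltzmann distribution.

```python
import math


def boltzmann(energies, temperature):
    """Boltzmann probabilities p_i = exp(-E_i / T) / Z (units with k = 1)."""
    weights = [math.exp(-e / temperature) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def entropy(probs):
    """Gibbs/Shannon entropy S = -sum_i p_i ln p_i (k = 1)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


energies = [0.0, 1.0, 2.0]
p = boltzmann(energies, temperature=1.0)

# Perturb p without changing normalisation or mean energy:
# delta = eps * (1, -2, 1) has sum 0 and sum(delta_i * E_i) = 0 for E = (0, 1, 2).
eps = 0.01
q = [p_i + eps * d for p_i, d in zip(p, (1.0, -2.0, 1.0))]

print("Boltzmann:", [round(x, 4) for x in p], "S =", round(entropy(p), 6))
print("perturbed:", [round(x, 4) for x in q], "S =", round(entropy(q), 6))
```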

On the other hand, we have the mysterious “probability distribution” in quantum field theory that is known as the path integral. I know no one who claims to understand why it works. In the path integral, each path is weighted by the factor $\exp(iS/\hbar)$, where $S$ denotes the action of that path, and even Feynman admitted:

“I don’t know what action is.”

In his book QED, when he talks about the path integral, he writes

Will you understand what I’m going to tell you? …No, you’re not going to be able to understand it. … I don’t understand it. Nobody does.

It seems as if all the mysteries of the quantum world are encapsulated in a simple Wick rotation.
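To see concretely what the rotation does to the weight $\exp(iS/\hbar)$, here is the standard computation for a single particle in a potential $V(x)$, written in the sign convention $t = -i\tau$ (conventions differ between books):

$$ \begin{aligned}
iS &= i\int dt \left[ \frac{m}{2}\left(\frac{dx}{dt}\right)^2 - V(x) \right], \qquad t = -i\tau:\quad dt = -i\,d\tau,\quad \frac{dx}{dt} = i\,\frac{dx}{d\tau}, \\
iS &\;\to\; i\int (-i\,d\tau) \left[ \frac{m}{2}\left(i\,\frac{dx}{d\tau}\right)^2 - V(x) \right] = -\int d\tau \left[ \frac{m}{2}\left(\frac{dx}{d\tau}\right)^2 + V(x) \right] \equiv -S_E, \\
e^{iS/\hbar} &\;\to\; e^{-S_E/\hbar}.
\end{aligned} $$

The Euclidean action $S_E$ has the form of a kinetic plus potential energy summed along the path, so $e^{-S_E/\hbar}$ looks formally like a Boltzmann weight.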

I’ve been searching for quite a while, but wasn’t able to find any satisfying discussion of this curious fact.

There was some “recent” work by John Baez, which he described in his blog and also in a paper. He also tried to make sense of Wick rotations by using them in a classical mechanics example. (See the “Homework on A spring in imaginary time” here and, additionally, the discussion here.) The lesson there was that “replacing time by ‘imaginary time’ in Lagrangian mechanics turns dynamics problems involving a point particle into statics problems involving a spring.” In addition, several years ago Peter Woit tried to emphasize the confusion surrounding Wick rotations in a blog post. He wrote:

“I’ve always thought this whole confusion is an important clue that there is something about the relation of QFT and geometry that we don’t understand. Things are even more confusing than just worrying about Minkowski vs. Euclidean metrics. To define spinors, we need not just a metric, but a spin connection. In Minkowski space this is a connection on a Spin(3,1)=SL(2,C) bundle, in Euclidean space on a Spin(4)=SU(2)xSU(2) bundle, and these are quite different things, with associated spinor fields with quite different properties. So the whole “Wick Rotation” question is very confusing even in flat space-time when one is dealing with spinors.”

However, apart from that, there don’t seem to be any good discussions of the “meaning” of a Wick rotation, and I still don’t know what to make of it. Yet, it seems clear that something very deep must be going on here. If the Boltzmann distribution can be understood perfectly by invoking the “best guess” strategy known as “maximum entropy”, does the path integral have a similar origin? Probably, but so far no one has been able to find it.

In statistical mechanics, our best guess for the macroscopic state of our system is the state that can be realized through the maximal number of microscopic states. This state is known as the state with maximal entropy. Many microscopic details make no difference for the macroscopic properties, and therefore many microscopic configurations lead to the same macroscopic state. We don’t know all the microscopic details, but want to make a statement about the macroscopic properties. Hence, we must use a best-guess approach, and the best guess is the macroscopic state with maximum entropy.
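A toy version of this counting in Python, assuming $N = 10$ coins as the microscopic degrees of freedom and the number of heads as the macroscopic state (an illustrative choice, not a system from the discussion above):

```python
from math import comb

N = 10  # number of coins (microscopic degrees of freedom); arbitrary toy size

# Macrostate = number of heads k; multiplicity = number of microstates realising it.
multiplicity = {k: comb(N, k) for k in range(N + 1)}
best_guess = max(multiplicity, key=multiplicity.get)

for k, w in multiplicity.items():
    print(f"k = {k:2d} heads: {w:3d} microstates")
print("maximum-multiplicity macrostate:", best_guess)  # k = 5 (252 microstates)
```

The entropy of a macrostate is the logarithm of its multiplicity, so the best guess $k = 5$ is the maximum-entropy macrostate; for large $N$ this peak becomes extremely sharp.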

Aren’t we doing something similar in quantum theory? We admit that we don’t know the fundamental microscopic dynamics. We don’t know which path a given particle takes from point $A$ to point $B$. Nevertheless, when pressured, we make a guess. Our best guess is the path with extremal action.

The observation by John Baez mentioned above that a Wick rotation connects a static description, like in statistical mechanics, with a dynamical description, like in quantum field theory, seems to make sense from this perspective.

Some nice thoughts in this direction are collected by Tommaso Toffoli in his two papers “What Is the Lagrangian Counting?” and “Action, Or the Fungibility of Computation”. For example, he writes:

just as entropy measures, on a log scale, the number of possible microscopic states consistent with a given macroscopic description, so I argue that action measures, again on a log scale, the number of possible microscopic laws consistent with a given macroscopic behaviour. If entropy measures in how many different states you could be in detail and still be substantially the same, then action measures how many different recipes you could follow in detail and still behave substantially the same.

In addition, I think there could be a connection to recent attempts to understand quantum theory as an extended probability theory in which we allow negative probabilities. This line of thought leads to complex probability amplitudes like the ones we know from quantum theory. For a nice introduction to this perspective on quantum theory, see this lecture by Scott Aaronson. Interestingly, he argues that this extension of probability theory is all we need to derive quantum theory. Again, the switch to complex quantities seems to make all the difference.

I think this is a good example of an obvious, but so far not sufficiently understood, connection that could yield deep insights into the quantum world. I usually write about things where I think I have understood something. However, here I mainly wrote this to organize my thoughts and as a reminder to think more about this in the future.

To finish, here is an incomplete list of places where Wick rotations are also crucial:

1.) We use a Wick rotation to classify all irreducible representations of the Lorentz group. In this context, the Wick rotation is often called “Weyl’s unitary trick”.
2.) A Wick rotation is used to analyze tunneling phenomena, like, for example, the famous instanton solutions in QFT.
3.) People who consider QFT at finite temperatures make heavy use of Wick rotations.

Demystifying the QCD Vacuum – Part 5: Anomalies and the Strong CP Problem

There is a deep connection between the non-trivial structure of the QCD vacuum and one of the most mysterious phenomena in QFT: anomalies. In this part, we discuss this connection.

The thing is that, so far, we only talked about the vacuum of the gauge bosons, without saying a word about fermions. We will now see that the fermionic vacuum isn’t trivial either and that there is a close connection to what we discussed earlier for the pure gauge vacuum.

The Chiral and Axial Symmetry

If we take a look at the QCD Lagrangian with, for simplicity, just one massless quark:

$$ \mathcal{L} = -\frac{1}{4}G_a^{\mu\nu} G_{a\mu\nu} + \bar{\Psi} (i\partial_\mu \gamma^\mu -g A_\mu \gamma^\mu ) \Psi$$

we notice that there is a global symmetry

$$ \Psi \to e^{i\varphi} \Psi . $$

The conserved charge that belongs to this symmetry via Noether’s theorem is simply the number of $\Psi$ particles minus the number of $\Psi$ antiparticles. However, there is even more symmetry.

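We can rewrite the Lagrangian in terms of left-chiral and right-chiral spinors with the help of the projection operators $\Psi_{L/R}= \frac{1}{2} (1 \pm\gamma_5) \Psi$. (Many textbooks use the opposite sign convention, $\Psi_L = \frac{1}{2}(1-\gamma_5)\Psi$; here we stick to the one written above.) Then, we have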

$$ \mathcal{L} = -\frac{1}{4}G_a^{\mu\nu} G_{a\mu\nu} + \bar{\Psi}_L (i\partial_\mu \gamma^\mu - g A_\mu \gamma^\mu ) \Psi_L + \bar{\Psi}_R (i\partial_\mu \gamma^\mu - g A_\mu \gamma^\mu) \Psi_R $$

and we can see that we actually have two global symmetries here:

$$ \begin{aligned}
\Psi_L &\to e^{i\alpha} \Psi_L , \\
\Psi_R &\to e^{i\beta} \Psi_R .
\end{aligned} $$

The corresponding Noether charges tell us that, classically, the numbers of left-chiral and right-chiral particles are conserved separately!

We can multiply the right-chiral and left-chiral spinors by completely different phases, because there is no term here that couples left-chiral to right-chiral spinors. (Take note that a mass term couples left-chiral to right-chiral spinors and we discuss the implications of mass terms later).
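The statement that the kinetic term does not mix chiralities while a mass term does can be checked numerically. A minimal sketch with NumPy, assuming the Dirac representation of the gamma matrices and the projector convention $\Psi_{L/R} = \frac{1}{2}(1 \pm \gamma_5)\Psi$ used in this post; the random spinor is just a test vector:

```python
import numpy as np

# Dirac representation of the gamma matrices (one standard choice).
I2 = np.eye(2)
Z2 = np.zeros((2, 2))
sigma = [np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]]), np.array([[1, 0], [0, -1]])]
gamma = [np.block([[I2, Z2], [Z2, -I2]])] + [np.block([[Z2, s], [-s, Z2]]) for s in sigma]
gamma5 = 1j * gamma[0] @ gamma[1] @ gamma[2] @ gamma[3]

# Projectors in the sign convention of this post: Psi_L = (1 + gamma5)/2 Psi, Psi_R = (1 - gamma5)/2 Psi.
P_L = 0.5 * (np.eye(4) + gamma5)
P_R = 0.5 * (np.eye(4) - gamma5)


def bar(psi):
    """Dirac adjoint psi^dagger gamma^0 (as a 1D array, used as a row vector)."""
    return psi.conj() @ gamma[0]


rng = np.random.default_rng(0)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi_L, psi_R = P_L @ psi, P_R @ psi

# Kinetic-type bilinears split into a pure L part plus a pure R part ...
for g in gamma:
    assert np.isclose(bar(psi) @ g @ psi, bar(psi_L) @ g @ psi_L + bar(psi_R) @ g @ psi_R)
# ... whereas the mass term only contains L-R cross terms.
assert np.isclose(bar(psi) @ psi, bar(psi_L) @ psi_R + bar(psi_R) @ psi_L)
assert np.isclose(bar(psi_L) @ psi_L, 0) and np.isclose(bar(psi_R) @ psi_R, 0)
print("chirality checks passed")
```

The key property behind both checks is that $\gamma_5$ anticommutes with every $\gamma^\mu$, so $\gamma^0\gamma^\mu$ commutes with the projectors while $\gamma^0$ alone swaps them.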

At this point you may wonder why we care about symmetries in such an unrealistic situation. Every quark is massive and therefore we don’t actually have these symmetries! However, the masses of the two lightest quarks, the up and the down quark, are so tiny that they can be neglected without making too large an error. In this sense, symmetries that are present in the absence of the masses of the lightest quarks are good approximate symmetries. Such approximate symmetries are often very useful. For example, if we neglect the masses of the up quark and the down quark, we have a chiral $SU(2)_L \times SU(2)_R$ symmetry. This symmetry is spontaneously broken by the QCD vacuum (the quark condensate) down to $SU(2)_V$, and hence we expect Goldstone bosons. In addition, the symmetry is broken explicitly, but only a little, by the small actual masses of the up quark and the down quark. Because the symmetry is only an approximate one, we don’t get exactly massless Goldstone bosons. Yet, we get quasi-Goldstone bosons, called pions, and the approximate symmetry perspective explains why they are so light compared to all other mesons.

However, our motivation here is a bit different. Namely, we will see in a moment that even in the absence of quark masses (a mass term would explicitly break the axial combination of these symmetries), one linear combination is broken anyway! This anomalous breaking of the symmetries has important implications that can actually be measured in experiments.

Now, back to our symmetries.

Noether’s theorem tells us that for each continuous symmetry we get a conserved current. The conserved currents here are

$$ \begin{aligned}
J_L^\mu &= \bar{\Psi}_L \gamma^\mu \Psi_L , \\
J_R^\mu &= \bar{\Psi}_R \gamma^\mu \Psi_R .
\end{aligned} $$

However, upon closer inspection, which will be discussed in a moment, it turns out that these separate currents are not conserved at all. Yet, we can find a linear combination that is conserved:

$$ \begin{aligned}
J_V^\mu &= J_L^\mu + J_R^\mu = \bar{\Psi} \gamma^\mu \Psi , \\
\partial_\mu J_V^\mu &= 0 .
\end{aligned} $$

In contrast, the orthogonal linear combination is not conserved:

$$ \begin{aligned}
J_A^\mu &= J_L^\mu - J_R^\mu = \bar{\Psi} \gamma^\mu \gamma_5 \Psi , \\
\partial_\mu J_A^\mu &\neq 0 .
\end{aligned} $$

The symmetry that corresponds to the conservation of $J_V^\mu$ is known as “vector $U(1)$” and denoted by $U(1)_V$. A $U(1)_V$ transformation is given by

$$ \Psi \to e^{i\phi_v} \Psi . $$

The symmetry that would exist if $J_A^\mu$ were conserved is known as “axial $U(1)$” and denoted by $U(1)_A$. A $U(1)_A$ transformation is given by

$$ \Psi \to e^{i \phi_a \gamma_5} \Psi . $$

The connection to the previous transformations that acted on $\Psi_L$ and $\Psi_R$ is given by $\alpha = \phi_v+\phi_a$ and $\beta = \phi_v-\phi_a$.
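This relation between the two sets of phases follows from $\gamma_5^2 = 1$, which gives $e^{i\phi_a\gamma_5} = \cos\phi_a + i\gamma_5\sin\phi_a$, together with $\gamma_5\Psi_L = +\Psi_L$ and $\gamma_5\Psi_R = -\Psi_R$ in this post's convention. A short numerical check (Python with NumPy; the Dirac-representation form of $\gamma_5$, the helper name and the angles are assumptions for illustration):

```python
import numpy as np

# gamma5 in the Dirac representation; gamma5 @ gamma5 = 1, so
# exp(i*phi*gamma5) = cos(phi) * 1 + i*sin(phi) * gamma5.
gamma5 = np.block([[np.zeros((2, 2)), np.eye(2)], [np.eye(2), np.zeros((2, 2))]])
P_L = 0.5 * (np.eye(4) + gamma5)  # convention of this post: gamma5 Psi_L = +Psi_L
P_R = 0.5 * (np.eye(4) - gamma5)


def u1_vector_axial(phi_v: float, phi_a: float) -> np.ndarray:
    """Matrix of the combined transformation exp(i phi_v) * exp(i phi_a gamma5)."""
    axial = np.cos(phi_a) * np.eye(4) + 1j * np.sin(phi_a) * gamma5
    return np.exp(1j * phi_v) * axial


phi_v, phi_a = 0.4, 1.1
U = u1_vector_axial(phi_v, phi_a)
alpha, beta = phi_v + phi_a, phi_v - phi_a

# On left-chiral spinors U acts as exp(i alpha), on right-chiral ones as exp(i beta).
assert np.allclose(U @ P_L, np.exp(1j * alpha) * P_L)
assert np.allclose(U @ P_R, np.exp(1j * beta) * P_R)
print("alpha = phi_v + phi_a and beta = phi_v - phi_a confirmed")
```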

The situation here is similar to what happens in the standard model. There $SU(2)_L \times U(1)_Y$ gets broken to $U(1)_{em}$. The thing is that $U(1)_{em}$ is not $U(1)_Y$, but a linear combination of $U(1)_Y$ and the Cartan generator of $SU(2)_L$. Here we start with $U(1)_L \times U(1)_R$, and this symmetry “gets broken” to $U(1)_V$.

How does $U(1)_A$ get broken?

Above, we only stated that $U(1)_A$ gets broken. However, that this breaking happens is far from obvious. There is no scalar field in the theory that could be responsible for the breaking. Instead, we are dealing here with a more subtle type of symmetry breaking, called quantum mechanical symmetry breaking. A symmetry that is present in the classical theory, i.e. when we simply look at the Lagrangian, is no longer a symmetry as soon as we use the Lagrangian in a quantum theory.

The conventional name for such quantum mechanical symmetry breaking is “anomalous breaking”.

There are several ways to see that this anomalous breaking happens.

Historically, this was first discovered through a quite complicated computation of a Feynman diagram called the “triangle diagram”.

The result of this computation by Adler, Bell and Jackiw was

$$ \partial_\mu J_5^\mu = \frac{g^2}{16\pi^2} G^{\mu\nu a} \tilde{G}_{\mu \nu}^a $$

This looks shockingly like the term that we added to the Lagrangian due to the complex structure of the QCD vacuum. (This was discussed in part 4). The details regarding this laborious computation can be found in the standard textbooks, but aren’t very illuminating. Thus, we won’t go into the details here.

Instead, I want to focus on the implications and a more illustrative explanation.

Understanding the Axial Anomaly

To understand the axial anomaly, we consider the vacuum in a theory of massless fermions. To study this vacuum, we look at the energy levels of the theory. In practice, this means we calculate the eigenmodes of the Hamiltonian.

The best picture of this vacuum is Dirac’s “sea picture”. All states with negative energy are filled, whereas all positive-energy states are empty. An electron is a filled positive-energy state, whereas a positron is a hole in the sea of negative-energy states.

In the real world, however, fermions are never alone because they carry charges. Thus, we now investigate what happens when we take the presence of gauge fields into account. We will then see that the axial anomaly is nothing but a natural consequence of the interplay between the Dirac sea and gauge fields.

To simplify the discussion, we work in two dimensions and use electromagnetic interactions instead of the more complicated QCD interactions. The theory of massless fermions in two dimensions with only electromagnetic interactions present is known as the Schwinger model. The Schwinger model is incredibly useful to understand many phenomena in quantum field theory and will prove to be invaluable here.

To simplify the discussion even further, we work in the temporal gauge: $A_0=0$. This means our gauge field has only one component $A_1 \equiv A$.

In our two-dimensional theory, we again split our spinor according to its chirality:

$$ \Psi_+ = \begin{pmatrix} 1 & 0 \\ 0 &0 \end{pmatrix} \Psi $$
$$ \Psi_- = \begin{pmatrix} 0 & 0 \\ 0 &1 \end{pmatrix} \Psi $$

Particles with positive “chirality” are here simply particles that move to the left along our one-dimensional spatial axis (the second dimension is the time axis). Formulated differently, positive-energy states of positive “chirality” have negative momentum. Likewise, negative “chirality” states move to the right, so their positive-energy states have positive momentum.

In complete analogy to our four-dimensional problem, we find an anomalous divergence here as well. Here it is proportional to $\epsilon^{\mu\nu} F_{\mu\nu} \propto \partial_t A$. We now want to answer the question: what is the origin of this anomalous divergence?

The Dirac equation for our two-dimensional model reads

$$ H \Psi_E = -\sigma_3 (\hat p - g A) \Psi_E = E\Psi_E. $$

The energy eigenstates are

$$ \begin{aligned} \Psi_+ &= \begin{pmatrix} e^{ipx} \\ 0 \end{pmatrix} \text{ with energy } E=-p+gA, \\ \Psi_- &= \begin{pmatrix} 0 \\ e^{ipx} \end{pmatrix} \text{ with energy } E=p-gA. \end{aligned} $$

Now, in the absence of the gauge field $A$, we have for the vacuum the simple Dirac sea picture outlined above. All the negative energy states are filled, while all the positive energy states are empty.

However, something interesting happens when we switch on the gauge field. As $A$ increases from $0$ to $\delta A$, the energy levels shift by $\pm g\,\delta A$, as follows directly from the energies above.


For $\delta A > 0$, the states with positive chirality (the left-movers) acquire a higher energy, whereas the energy levels of states with negative chirality (the right-movers) are lowered.

For the Dirac sea this means that some states that were negative-energy states, and therefore filled, now become filled positive-energy states. Similarly, some unfilled positive-energy states now have negative energy and move below the zero-energy line, where they show up as holes. In other words, the gauge field produces holes in the negative-energy sea and filled positive-energy states.

Let’s consider, for concreteness, a positive value of the gauge field, $A = \delta A > 0$:

An empty state of negative chirality (a right-mover) with positive energy, and hence positive momentum, now acquires negative energy. It sinks into the sea as an empty level and therefore shows up as a hole, i.e. an antiparticle.

A filled state of positive chirality (a left-mover) with negative energy now acquires positive energy and is therefore lifted out of the sea as a particle.

This means immediately that in the presence of a gauge field $A$ the charge “left-chirality” and the charge “right-chirality” are not separately conserved: one chirality gains a particle, while the other loses one because a hole appears. However, the sum of “left-chirality” and “right-chirality” is still conserved! This is analogous to what we observed for the currents $J_L^\mu$ and $J_R^\mu$, which are not conserved separately, while their sum $J_V^\mu$ is.
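The level-crossing argument can be made quantitative with a small toy calculation. The sketch below is a minimal illustration, assuming a finite spatial circle so that momenta are discrete, $p_k = (k + \tfrac12)\,\Delta p$ (the half-integer offset is chosen only to avoid exact zero-energy levels), and a finite mode cutoff. It uses the energies $E_+ = -p + gA$ and $E_- = p - gA$, with the coupling written as $g$ as in the Dirac equation above, fills every negative-energy level at $A = 0$, then shifts $gA$ to $g\,\delta A$ and counts, for each chirality, the particles (filled levels pushed above zero) and holes (empty levels pushed below zero). All names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ChiralityCount:
    particles: int  # filled levels lifted to positive energy
    holes: int      # empty levels pushed down to negative energy

    @property
    def net_fermion_number(self) -> int:
        # each particle adds +1, each hole (antiparticle) adds -1
        return self.particles - self.holes


def energy(chirality: int, p: float, gA: float) -> float:
    """E_+ = -p + gA for chirality +1 and E_- = p - gA for chirality -1."""
    return -chirality * (p - gA)


def level_crossings(g_delta_A: float, n_modes: int = 20, dp: float = 1.0) -> dict[int, ChiralityCount]:
    """Count Dirac-sea level crossings when gA goes from 0 to g_delta_A."""
    momenta = [(k + 0.5) * dp for k in range(-n_modes, n_modes)]
    counts = {}
    for chirality in (+1, -1):
        particles = holes = 0
        for p in momenta:
            filled = energy(chirality, p, 0.0) < 0  # Dirac sea at A = 0
            e_new = energy(chirality, p, g_delta_A)
            if filled and e_new > 0:
                particles += 1
            elif not filled and e_new < 0:
                holes += 1
        counts[chirality] = ChiralityCount(particles, holes)
    return counts


counts = level_crossings(g_delta_A=2.0)
for chirality, c in counts.items():
    print(f"chirality {chirality:+d}: particles={c.particles}, holes={c.holes}")
total = sum(c.net_fermion_number for c in counts.values())
chiral = counts[+1].net_fermion_number - counts[-1].net_fermion_number
print(f"total fermion number change: {total}, chiral charge change: {chiral}")
```

For this choice of parameters, positive chirality gains two particles and negative chirality acquires two holes, so the total fermion number change is $0$ while $N_+ - N_-$ changes by $4 = 2\,g\,\delta A/\Delta p$. The mode cutoff plays no role here, because only levels near zero energy cross.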

This is the origin of the anomaly! The gauge field produces a non-zero chirality current by lifting some states up from the Dirac sea and by pushing some holes down into the negative energy region.

It is important to note that the shift from $A=0$ to $A= \delta A$ is a gauge transformation! The crazy thing that happens here is that such a gauge transformation produces particles from the empty vacuum, and this is why we get a non-zero current. What we learn here is that it is impossible to separate left-chiral and right-chiral states in a gauge-invariant manner.

The fermionic vacuum, i.e. the Dirac sea, is highly susceptible to the gauge field configurations. The mere presence of the gauge fields changes the structure of the energy eigenstates and hence of the Dirac sea dramatically.

As an aside, which will be discussed in more detail in another post: this type of fermion production through gauge fields is the basis of the most popular explanation for why there is any matter at all. This explanation is known as leptogenesis, and the main idea is that topologically non-trivial changes of the gauge fields can produce a net surplus of baryon number plus lepton number, while baryon number minus lepton number remains unchanged.

Another important lesson here, to quote Roman Jackiw, is that:

“we must assign physical reality to Dirac’s negative energy sea, because it produces the chiral anomaly, whose effects are experimentally observed, principally in the decay of the neutral pion to two photons, but there are other physical consequences as well.”

Now, what does this mean for our axial anomaly in four dimensions?

We know that the axial current $J_5^\mu$ is anomalously non-conserved. This means that the divergence $\partial_\mu J_5^\mu$ is non-zero, and a calculation shows that it is $\propto Tr( F_{\mu\nu} \tilde{F}^{\mu\nu})$. Thus, the corresponding Noether charge

$$ Q = \int d^3 x J^0_5 $$

is not conserved. In particular, in any process where the gauge fields change such that

$$ N = \frac{1}{32 \pi^2} \int d^4x Tr( F_{\mu\nu} \tilde{F}^{\mu\nu}) \neq 0 , $$

the Noether charge $Q$ changes. Such processes were already discussed in the first three parts and are commonly known as instanton and sphaleron processes. These processes change the winding number $N$. Thanks to the connection to the axial anomaly, we now understand that such processes change the number of left-chiral states relative to the number of right-chiral states. Yet, the number of left-chiral plus the number of right-chiral states remains unchanged. The quantum number “left-chirality” minus “right-chirality” is not conserved, and this is the breaking of the axial symmetry.

Topologically non-trivial processes like instantons and sphalerons lift fermions up from the Dirac sea and push unfilled positive-energy states down to negative energies. This way, instantons and sphalerons produce fermion-antifermion pairs.

To quote from Eric Weinberg’s book “Classical Solutions in Quantum Field Theory”:

“any change in winding number must be accompanied by a change in fermion chirality”

If you are interested in learning more about this perspective on anomalies, here are a few good resources:

Implications of the Axial Anomaly

So, the non-conservation of the axial current $ \partial_\mu J_5^\mu \neq 0$ tells us that axial rotations $ \Psi \to e^{i \phi_a \gamma_5} \Psi $ are not a symmetry of the system. Therefore, we can now ask: How does the Lagrangian change under axial rotations?

As for anything that has to do with anomalies, there are many ways to answer this question. But, of course, the final answer is always the same:

An axial rotation $ \Psi \to e^{i \alpha \gamma_5} \Psi $ changes our Lagrangian by

$$ \mathcal{L} \to \mathcal{L} + \frac{\alpha}{16 \pi^2} Tr[G_{\mu\nu} \tilde{G}^{\mu\nu}] . $$

Compare this to the term that we needed to add, because of the complex structure of the QCD vacuum:

$$\Delta \mathcal L = \frac{\theta}{16 \pi^2} Tr[G_{\mu\nu} \tilde{G}^{\mu\nu}]$$

It’s exactly the same!

Thus, we can say that an axial rotation by $\alpha$ shifts the mysterious $\theta$ parameter of the QCD vacuum by:

$$ \theta \to \theta + \alpha .$$

So, why does an axial rotation lead to this new term in the Lagrangian? As already mentioned above, there are different ways to see this.

1.) The standard method that is usually quoted in textbooks is known as the “Fujikawa method”. (It has its own Wikipedia page.) Again, I don’t want to dive into the technical details, which you can find in the standard textbooks. The short version is that one carefully analyzes the behaviour of the path integral under an axial rotation. While the Lagrangian behaves, of course, as expected from the discussion above and stays unchanged, the measure of the path integral isn’t invariant. Instead, the final result of Fujikawa’s analysis is that the change in the path integral measure due to an axial rotation amounts exactly to the change

$$ \mathcal{L} \to \mathcal{L} + \frac{\alpha}{16 \pi^2} Tr[G_{\mu\nu} \tilde{G}^{\mu\nu}] . $$

of the Lagrangian.

2.) Another way to see this is to go directly back to Noether’s theorem. (See Palash Pal’s “An Introductory Course of Particle Physics”, Eq. 4.108 on page 82 and Eq. 21.158 on page 658.)

In the derivation of this theorem in the Lagrangian formalism, we calculate that when a field gets transformed

$$ \Psi^A(x) \to \Psi'^A(x)=\Psi^A(x) + \delta \Psi^A(x), $$

the change of the action is

$$ \delta S = \int d^4 x \sum_r \delta \varphi_r \, \partial_\mu J_r^\mu, $$

where

$$ J_r^\mu = \sum_A \frac{\partial\mathcal{L}}{\partial(\partial_\mu\Psi^A)} \frac{\partial\Psi^A}{\partial \varphi_r} $$

and $\delta\varphi_r$ denote the small changes of the transformation parameters $\varphi_r$.

(This is shown, for example, on pages 106 and 107 of my book “Physics from Symmetry”. In addition, note that, as usual in the derivation of Noether’s theorem, we only consider infinitesimal transformations.)

If we are dealing with a symmetry, the action does not change: $\delta S =0$ and thus we have $ \partial_\mu J_r^\mu =0$, i.e. a conserved current.

However, here we have a situation where we found that $ \partial_\mu J_A^\mu \neq 0$. Thus, the corresponding transformation, an axial rotation $ \Psi \to e^{i \varphi \gamma_5} \Psi $, is not a symmetry. We can therefore conclude that the action changes under such a rotation, and the change of the action is given by

$$ \delta S = \int d^4 x \sum_r \delta \varphi_r \, \partial_\mu J_r^\mu . $$

In our case,
$$ \partial_\mu J_5^\mu = \frac{g^2}{16\pi^2} G^{\mu\nu a} \tilde{G}_{\mu \nu}^a $$

and therefore the action changes by

$$ \delta S = \int d^4 x \, \varphi \, \partial_\mu J_5^\mu = \frac{g^2 \varphi }{16\pi^2} \int d^4 x \, G^{\mu\nu a} \tilde{G}_{\mu \nu}^a . $$

3.) A third method to see this change of the action is the original method by Jackiw and Rebbi (PhysRevLett.37.172). Again, we only discuss the main idea and do not dive into the details.

The basic idea is the following: instead of the non-conserved current $J_5^\mu$, we define a new current that is conserved. The corresponding Noether charge generates the corresponding symmetry. Then we investigate how this Noether charge acts on our ground state $|\theta\rangle$. The result is the same as for the previous two methods:

$$ e^{i \alpha Q_5} |\theta\rangle = |\theta + \alpha \rangle.$$

So, now let’s see how this comes about in a bit more detail.

From the discussion above, we know that $J_5^\mu = \bar{\Psi} \gamma_\mu \gamma_5 \Psi $ is not conserved. Instead, we have

$$ \partial_\mu J_5^\mu = \frac{g^2}{16\pi^2} G^{\mu\nu a} \tilde{G}_{\mu \nu}^a . $$

Now, an important observation is that $G^{\mu\nu a} \tilde{G}_{\mu \nu}^a$ can be written as a total divergence:

$$ G^{\mu\nu a} \tilde{G}_{\mu \nu}^a = \partial_\mu K^\mu, $$

where

$$ K^\mu = 4\, \epsilon^{\mu \alpha\beta \gamma}\, Tr\left(\frac{1}{2} A_\alpha \partial_\beta A_\gamma + \frac{i}{3} g A_\alpha A_\beta A_\gamma\right). $$

(A proof of this statement can be found, for example, on page 89 of “Quarks, Leptons and Gauge Fields” by K. Huang.)

$K^\mu$ is commonly called the Chern-Simons term or Chern-Simons current.
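The total-divergence property is easy to check in the simplest setting. The sketch below is a minimal, stdlib-only check for a single abelian field, where the cubic term is absent and the trace is trivial, assuming the convention $\tilde F^{\mu\nu} = \tfrac12\epsilon^{\mu\nu\rho\sigma}F_{\rho\sigma}$. In that case the identity reads $F_{\mu\nu}\tilde F^{\mu\nu} = \partial_\mu\left(2\epsilon^{\mu\nu\rho\sigma}A_\nu\partial_\rho A_\sigma\right)$. The code takes a random quadratic polynomial gauge field $A_\nu(x)$, whose first and second derivatives are known exactly, and compares both sides at a random point with exact rational arithmetic. Only lower-index quantities contracted with $\epsilon^{\mu\nu\rho\sigma}$ appear, so no metric is needed. The helper names are hypothetical.

```python
import itertools
import random
from fractions import Fraction


def levi_civita(*idx: int) -> int:
    """Totally antisymmetric symbol with epsilon^{0123} = +1."""
    if len(set(idx)) < len(idx):
        return 0
    inversions = sum(1 for i, j in itertools.combinations(range(len(idx)), 2) if idx[i] > idx[j])
    return -1 if inversions % 2 else 1


def check_identity(seed: int = 0) -> tuple[Fraction, Fraction]:
    """Return (F Ftilde, d_mu K^mu) at a random point for a random quadratic A_nu(x)."""
    rng = random.Random(seed)
    dims = range(4)
    # A_n(x) = c[n][a] x_a + d[n][a][b] x_a x_b with random integer coefficients
    c = [[rng.randint(-5, 5) for _ in dims] for _ in dims]
    d = [[[rng.randint(-5, 5) for _ in dims] for _ in dims] for _ in dims]
    x = [Fraction(rng.randint(-9, 9), rng.randint(1, 9)) for _ in dims]

    A = [sum(c[n][a] * x[a] for a in dims) + sum(d[n][a][b] * x[a] * x[b] for a in dims for b in dims)
         for n in dims]
    # dA[m][n] = d_m A_n ; ddA[m][r][n] = d_m d_r A_n (constant for a quadratic field)
    dA = [[c[n][m] + sum((d[n][m][b] + d[n][b][m]) * x[b] for b in dims) for n in dims] for m in dims]
    ddA = [[[d[n][m][r] + d[n][r][m] for n in dims] for r in dims] for m in dims]
    F = [[dA[m][n] - dA[n][m] for n in dims] for m in dims]

    quads = list(itertools.product(dims, repeat=4))
    # Left side: F_{mu nu} Ftilde^{mu nu} = 1/2 eps^{mu nu rho sigma} F_{mu nu} F_{rho sigma}
    lhs = Fraction(1, 2) * sum(levi_civita(m, n, r, s) * F[m][n] * F[r][s] for m, n, r, s in quads)
    # Right side: d_mu K^mu with K^mu = 2 eps^{mu nu rho sigma} A_nu d_rho A_sigma (product rule)
    rhs = 2 * sum(levi_civita(m, n, r, s) * (dA[m][n] * dA[r][s] + A[n] * ddA[m][r][s])
                  for m, n, r, s in quads)
    return lhs, rhs


lhs, rhs = check_identity(seed=1)
print(lhs == rhs)  # True
```

The second-derivative term in the product rule drops out because $\partial_\mu\partial_\rho A_\sigma$ is symmetric in $\mu\rho$ while $\epsilon^{\mu\nu\rho\sigma}$ is antisymmetric; this is the same mechanism that makes $F\tilde F$ a total divergence in the first place.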

With the observation that $ G^{\mu\nu a} \tilde{G}_{\mu \nu}^a$ can be written as total divergence, we can define a new, actually conserved, axial current:

$$ \tilde{J}_5^\mu = J_5^\mu - \frac{g^2}{16\pi^2} K^\mu . $$

The trick here is, of course, that if we now take the divergence of this new current, the two terms simply cancel:

$$ \partial_\mu \tilde{J}_5^\mu = \partial_\mu J_5^\mu - \frac{g^2}{16\pi^2} \partial_\mu K^\mu $$
$$ = \frac{g^2}{16\pi^2} G^{\mu\nu a} \tilde{G}_{\mu \nu}^a - \frac{g^2}{16\pi^2} G^{\mu\nu a} \tilde{G}_{\mu \nu}^a = 0 . $$

The generator $Q_5$ of this $\tilde{U}(1)_A$ is, as always, the corresponding Noether charge

$$ Q_5 \equiv \int d^3x \, \tilde{J}_5^0 = \int d^3x \left[\Psi^\dagger \gamma_5 \Psi - \frac{g^2}{16\pi^2} K^0 \right]. $$

A curious feature of this Noether charge is that it isn’t gauge invariant and therefore not a physical quantity. The reason for this is that $K^\mu$ isn’t gauge invariant.

Nevertheless, we have here the generator of a symmetry and we are now interested in how the $\theta$ vacuum, that we discussed in part 4, behaves under the transformation that is generated by $Q_5$.

To do this, we employ a trick. We already saw in part 4 that if we act with a gauge transformation with winding number $n$ on our vacuum state $|\theta\rangle$, we get $ g_n |\theta\rangle = e^{in \theta} |\theta\rangle$. The idea is now to use this to find out whether $\theta$ gets changed by $Q_5$. In other words, we want to compute

$$ g_n \left( e^{i\alpha Q_5} |\theta\rangle \right) = e^{in\theta'}\left( e^{i\alpha Q_5} |\theta\rangle \right) . $$

The resulting $\theta'$ tells us how $\theta$ is affected by $e^{i\alpha Q_5}$.

To compute this, we need to know how $Q_5$ changes under gauge transformations. The result is (see Jackiw and Rebbi 1976)

$$g_n Q_5 g_n^{-1} = Q_5 + n .$$

With this information at hand, we can calculate

$$ \begin{aligned}
g_1 \left( e^{i\alpha Q_5} |\theta\rangle \right) &= g_1 e^{i\alpha Q_5} g_1^{-1} g_1 |\theta\rangle \\
&= e^{i\alpha (Q_5+1)} g_1 |\theta\rangle \\
&= e^{i\alpha (Q_5+1)} e^{i\theta} |\theta\rangle \\
&= e^{i(\theta+ \alpha)} \left( e^{i\alpha Q_5} |\theta\rangle \right) \\
&\equiv e^{i\theta'} \left( e^{i\alpha Q_5} |\theta\rangle \right)
\end{aligned} $$

and thus we can conclude

$$ e^{i\alpha Q_5} |\theta\rangle = |\theta + \alpha \rangle .$$
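The operator algebra behind this result can be mimicked in a deliberately crude toy model. The sketch below assumes, as an illustration only and not as the actual QCD operators, a space of winding sectors $|n\rangle$ on which a winding-one gauge transformation acts as $g_1|n\rangle = |n+1\rangle$, and on which $Q_5$ acts only through its Chern-Simons part as $Q_5|n\rangle = -n|n\rangle$. These choices reproduce $g_1 Q_5 g_1^{-1} = Q_5 + 1$, and with $|\theta\rangle = \sum_n e^{-in\theta}|n\rangle$ one has $g_1|\theta\rangle = e^{i\theta}|\theta\rangle$. The code checks, on a truncated range of $n$, that $e^{i\alpha Q_5}|\theta\rangle$ has exactly the coefficients of $|\theta+\alpha\rangle$. All names and the truncation are hypothetical.

```python
import cmath

N_MAX = 50  # toy assumption: keep only winding sectors -N_MAX..N_MAX


def theta_vacuum(theta: float) -> dict[int, complex]:
    """Coefficients of |theta> = sum_n exp(-i n theta) |n> (unnormalized)."""
    return {n: cmath.exp(-1j * n * theta) for n in range(-N_MAX, N_MAX + 1)}


def apply_g1(state: dict[int, complex]) -> dict[int, complex]:
    """Winding-one gauge transformation: |n> -> |n+1>."""
    return {n + 1: amp for n, amp in state.items()}


def apply_exp_q5(state: dict[int, complex], alpha: float) -> dict[int, complex]:
    """exp(i alpha Q5) with the toy choice Q5 |n> = -n |n>."""
    return {n: cmath.exp(-1j * alpha * n) * amp for n, amp in state.items()}


theta, alpha = 0.4, 1.1
rotated = apply_exp_q5(theta_vacuum(theta), alpha)
target = theta_vacuum(theta + alpha)
assert all(cmath.isclose(rotated[n], target[n]) for n in target)

# g1 acts on |theta> as the phase exp(i theta); compare on sectors present after the shift.
vac = theta_vacuum(theta)
shifted = apply_g1(vac)
assert all(cmath.isclose(shifted[n], cmath.exp(1j * theta) * vac[n]) for n in range(-N_MAX + 1, N_MAX + 1))
print("exp(i alpha Q5)|theta> matches |theta + alpha> on all truncated sectors")
```

The truncation only matters for the $g_1$ check at the edge sector, which is why that comparison skips $n = -N_{\max}$.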

From the discussion in part 4 we know that the existence of the non-trivial ground state $|\theta\rangle$ implies a new term in the Lagrangian

$$\Delta \mathcal L = \frac{\theta}{16 \pi^2} Tr[G_{\mu\nu} \tilde{G}^{\mu\nu}].$$

The observation that $Q_5$ shifts $\theta$ then means that the $\theta$ appearing in this term gets shifted. Hence, we are again led to the conclusion that a chiral rotation implies a new term in the Lagrangian

$$ \Delta \mathcal L = \frac{g^2 \alpha }{16\pi^2} G^{\mu\nu a} \tilde{G}_{\mu \nu}^a$$

The Strong CP Problem

We saw in the last section that an axial rotation by $\alpha$ shifts the $\theta$ parameter of the QCD vacuum by:

$$ \theta \to \theta + \alpha .$$

Without mass terms, we can define a conserved but non-gauge invariant axial symmetry. Then we can make use of this symmetry to get rid of the parameter $\theta$. We are free to do any rotation we want and therefore, we can easily rotate $\theta$ to zero.

However, if there are mass terms

$$ m \bar \Psi \Psi = m \bar{\Psi}_L \Psi_R + m \bar{\Psi}_R \Psi_L $$

for the quarks, we no longer have this freedom. The axial symmetry is broken explicitly by the mass terms, because we are no longer free to rotate the left-chiral spinors and right-chiral spinors independently. A mass term explicitly couples a right-chiral to a left-chiral spinor. Therefore, the only allowed transformation is now

$$ \begin{aligned} \Psi_L &\to e^{i\alpha} \Psi_L, \\ \Psi_R &\to e^{i\alpha} \Psi_R, \end{aligned} $$

while

$$ \begin{aligned} \Psi_L &\to e^{i\alpha} \Psi_L, \\ \Psi_R &\to e^{-i\alpha} \Psi_R \end{aligned} $$

is no longer a symmetry. Transforming the left-chiral and the right-chiral spinor with the same phase is a $U(1)_V$ transformation, whereas a transformation with opposite phases is a $U(1)_A$ transformation. In this sense, we can say that the mass term breaks $U(1)_A$ explicitly.
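A small numerical illustration of why the mass term distinguishes the two transformations: under $\Psi_L \to e^{i\alpha_L}\Psi_L$ and $\Psi_R \to e^{i\alpha_R}\Psi_R$, the bilinear $\bar\Psi_L\Psi_R$ picks up the phase $e^{-i\alpha_L}e^{i\alpha_R}$ and its Hermitian conjugate the inverse phase. The sketch below tracks only these phases (the spinors themselves are not modeled, and the function name is hypothetical) and checks that equal phases leave the mass term unchanged, while opposite phases multiply $\bar\Psi_L\Psi_R$ by $e^{-2i\alpha}$.

```python
import cmath


def mass_term_phases(alpha_l: float, alpha_r: float) -> tuple[complex, complex]:
    """Phase factors picked up by bar(Psi_L) Psi_R and bar(Psi_R) Psi_L
    under Psi_L -> e^{i alpha_l} Psi_L and Psi_R -> e^{i alpha_r} Psi_R."""
    lr = cmath.exp(-1j * alpha_l) * cmath.exp(1j * alpha_r)  # bar(Psi_L) Psi_R
    rl = cmath.exp(-1j * alpha_r) * cmath.exp(1j * alpha_l)  # bar(Psi_R) Psi_L
    return lr, rl


alpha = 0.25
vector = mass_term_phases(alpha, alpha)  # U(1)_V: same phase for both chiralities
axial = mass_term_phases(alpha, -alpha)  # U(1)_A: opposite phases

# Vector rotation: both terms are untouched, so the mass term is invariant.
assert all(cmath.isclose(z, 1) for z in vector)
# Axial rotation: the two terms pick up e^{-2i alpha} and e^{+2i alpha}.
assert cmath.isclose(axial[0], cmath.exp(-2j * alpha))
assert cmath.isclose(axial[1], cmath.exp(2j * alpha))
print("U(1)_V leaves m*bar(Psi)Psi invariant; U(1)_A does not")
```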

Yet, we are forced to perform an axial rotation. This comes about because, in order to understand the physical content of the theory, we want to work in the mass basis, where the mass matrices are real and diagonal. In general, the mass matrices aren’t real and diagonal, but instead contain complex entries. The transformation

$$ \begin{aligned} \Psi_L &\to U_L\Psi_L, \\ \Psi_R &\to U_R \Psi_R, \end{aligned} $$

where $U_L$ and $U_R$ are unitary matrices that make the mass matrix real and diagonal (we suppress generation indices here), leads to the emergence of the CKM matrix in the gauge sector of the theory.

A crucial observation is now that this rotation that we perform to switch to the mass basis, in general, involves an axial rotation. In particular, the desired transformation involves the rotation

$$ \begin{aligned} \Psi_L &\to e^{-i ArgDet(M)} \Psi_L, \\ \Psi_R &\to e^{i ArgDet(M)} \Psi_R . \end{aligned} $$

(See, Eq. 191 in

Thus, in contrast to the discussion for a massless theory, we are here no longer free to perform arbitrary axial rotations. Instead, there is one very special axial rotation, by the angle $\alpha = ArgDet(M)$ that we need to make the mass matrix $M$ real and diagonal.

From the discussion in the last section, we know that an axial rotation by angle $\alpha$ changes the Lagrangian

$$ \mathcal L \to \mathcal L + \frac{g^2 \alpha }{16\pi^2} G^{\mu\nu a} \tilde{G}_{\mu \nu}^a . $$

If there are mass terms, the angle $\alpha$ is fixed and given by $\alpha = ArgDet(M)$.

Thus, on the one hand, we have a parameter $\theta$ that comes from the detailed study of the QCD vacuum. On the other hand, we have a shift of this parameter through an axial rotation of quark fields by the angle $\alpha = ArgDet(M)$.

To take these two observations into account, one usually introduces a new overall parameter

$$ \bar{\theta} = \theta + ArgDet(M). $$
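To make $\bar{\theta}$ concrete, the sketch below evaluates $ArgDet(M)$ for a made-up complex $2\times2$ mass matrix (the entries are hypothetical, not quark data) and checks a property worth keeping in mind: rotations $\Psi_L \to U_L\Psi_L$, $\Psi_R \to U_R\Psi_R$ with $\det U_L = \det U_R = 1$ change $M$ but not $ArgDet(M)$, so only the overall $U(1)$ phase part of the rotation can move it. The sketch assumes the mass term is written as $\bar\Psi_L M \Psi_R + \text{h.c.}$, so that $M \to U_L^\dagger M U_R$; that form, the helper names, and the numbers are assumptions.

```python
import cmath
import math

Matrix = list[list[complex]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(a: Matrix) -> Matrix:
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def det2(a: Matrix) -> complex:
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def arg_det(m: Matrix) -> float:
    """ArgDet(M): phase of the determinant, in (-pi, pi]."""
    return cmath.phase(det2(m))


def su2(a: complex, b: complex) -> Matrix:
    """SU(2) element [[a, -b*], [b, a*]] after normalising |a|^2 + |b|^2 = 1."""
    norm = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
    a, b = a / norm, b / norm
    return [[a, -b.conjugate()], [b, a.conjugate()]]


# Hypothetical complex mass matrix (arbitrary units, not real quark masses).
M = [[2.0 + 0.3j, 0.1 - 0.2j], [0.05j, 5.0 - 0.4j]]
U_L = su2(0.6 + 0.2j, -0.3 + 0.5j)
U_R = su2(0.1 - 0.7j, 0.4 + 0.1j)
M_rotated = matmul(matmul(dagger(U_L), M), U_R)

print(f"ArgDet(M)                 = {arg_det(M):.6f}")
print(f"ArgDet(U_L^dagger M U_R)  = {arg_det(M_rotated):.6f}")

# theta_bar = theta + ArgDet(M): a tiny theta_bar needs theta close to -ArgDet(M).
theta = -arg_det(M) + 1e-10
theta_bar = theta + arg_det(M)
print(f"theta_bar = {theta_bar:.1e}")
```

The invariance follows from $\det(U_L^\dagger M U_R) = \overline{\det U_L}\,\det M\,\det U_R$, which is why $\bar{\theta}$, rather than $\theta$ or $ArgDet(M)$ separately, is the basis-independent combination.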

From experiments we know, as mentioned at the end of part 4, that $\bar{\theta}$ is tiny: $ \bar{\theta} \lesssim 10^{-9} $. Thus, in some sense the two contributions to $\bar{\theta}$ must cancel very, very precisely. This is usually called a “fine-tuning” problem, because the QCD vacuum angle $\theta$ and the $ArgDet$ must be fine-tuned to extremely high precision to yield such a tiny overall $\bar{\theta}$.

This is often presented as a big mystery. Why should there be a connection between these two seemingly completely unrelated parameters? The parameter $\theta$ was discovered by studying the pure gauge vacuum. The shift of $\theta$ by the angle $\alpha$ comes from an axial rotation of fermionic fields and has its deep origin in the axial anomaly.

However, from the discussion above it should be clear that these two contributions aren’t so unrelated after all. Both originate in non-perturbative processes like instantons.

The emergence of $\theta$ as a parameter that describes the QCD vacuum structure was a result of instanton processes. In the temporal gauge, we discovered

An unrealistic solution of the strong CP problem

One trivial solution to the strong CP problem was, in principle, already mentioned above. Without a mass term, $\bar{\theta}$ wouldn’t be a physical parameter, because we could give it any value we want through axial rotations. However, if there is a mass term, we no longer have this freedom.

In the real world, there are many quarks and therefore, in the absence of mass terms, many axial symmetries: one for each quark. This means immediately that if one quark is massless, say the up quark, we could perform an arbitrary axial rotation of the corresponding spinors. Following the discussion above, this would immediately mean that $\bar{\theta}$ is not a physical quantity, because we can change it at will via this axial rotation.

Only if all quarks have mass is $\bar{\theta}$ a physical parameter. As far as we know, this is actually the case, and therefore $\bar{\theta}$ is physical. Yet, “one massless quark” is commonly quoted as a solution of the strong CP problem.