Jakob Schwichtenberg

Larger Symmetries

“Further progress lies in the direction of making our equations invariant under wider and still wider transformations.”

These prophetic lines were written in 1930 by P. A. M. Dirac in his famous book “The Principles of Quantum Mechanics”. In the following decades, tremendous progress was made exactly as he predicted.

Weak interactions were described perfectly using $SU(2)$ symmetry, strong interactions using $SU(3)$ symmetry and it is well known that electrodynamics can be derived from $U(1)$ symmetry. Other aspects of elementary particles, like their spin, can be understood using the symmetry of special relativity.

A symmetry is a transformation that leaves our equations invariant, i.e. that does not change the equations. A set of symmetry transformations is called a group and, for example, the set of transformations that leaves the equations of special relativity invariant is called the Poincare group.

By making our equations invariant under the quite large set of transformations:

$$ \text{Poincare Group} \times U(1) \times SU(2) \times SU(3) , $$

we are able to describe all known interactions of elementary particles, except for gravity. This symmetry is the core of the standard model of particle physics, which is approximately 40 years old. Since then, it has been confirmed many times, for example through the discovery of the Higgs boson. Just as Dirac predicted, we gained incredible insights into the inner workings of nature by making the symmetry of our equations larger and larger.

Unfortunately, since the completion of the standard model $\sim 40$ years ago, there has been no further progress in this direction. No further symmetry of nature has been revealed by experiments. (At least that’s the standard view, but I don’t think it’s true. More on that later.) In 2017, our equations are still only invariant under $ \text{Poincare Group} \times U(1) \times SU(2) \times SU(3) $ and no larger symmetry.

I’m a big believer in Dirac’s mantra. Despite the lack of new experimental insights, I do think there are many great ideas for how symmetries could guide us towards the correct theory beyond the standard model.

Before we can discuss some of these ideas, there is one additional thing that should be noted. Although the four groups $ \text{Poincare Group} \times U(1) \times SU(2) \times SU(3) $ are written equally next to each other, they aren’t treated equally in the standard model. The Poincare group is a spacetime symmetry, whereas all other groups describe inner symmetries of quantum fields. Therefore, we must divide the quest for a larger symmetry into two parts. On the one hand, we can enlarge the spacetime symmetry and on the other hand, we can enlarge the inner symmetry. In addition to these two approaches, we can also try to treat the symmetries equally and enlarge them at the same time.

Let’s start with the spacetime symmetry.

Enlargement of the Spacetime Symmetry

The symmetry group of special relativity is the set of transformations that describe transformations between inertial frames of reference and leave the speed of light invariant. As already noted, this set of transformations is called the Poincare group.

Before Einstein discovered special relativity, people used a spacetime symmetry that is called the Galilean group. The Galilean group also describes transformations between inertial frames of reference but does not care about the speed of light.

The effects of special relativity are only important for objects that move fast. For everything that moves slowly compared to the speed of light, the Galilean group is sufficient, i.e. it is an approximate symmetry in this limit. Mathematically, this means that the Galilean group is the contraction of the Poincare group in the limit where the speed of light goes to infinity. For an infinite speed of light, nothing could move at a speed close to the speed of light, and thus the Galilean group would be the exact symmetry group.
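To make the idea of a contraction slightly more concrete, here is the textbook example of a single boost along the $x$-axis (a schematic illustration, not a full treatment of the group contraction):

$$ x' = \gamma\,(x - vt), \qquad t' = \gamma\left(t - \frac{v x}{c^2}\right), \qquad \gamma = \frac{1}{\sqrt{1 - v^2/c^2}} \qquad \xrightarrow{\; c \to \infty \;} \qquad x' = x - vt, \qquad t' = t \, . $$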

It is natural to wonder if the Poincare group is an approximate symmetry, too.

One hint in this direction is that the Poincare group is an “ugly” group. The Poincare group is the semi-direct product of the group of translations and the Lorentz group, which describes rotations and boosts. Therefore, the Poincare group is not a simple group. The simple groups are the “atoms of groups” from which all other groups can be constructed. However, the spacetime symmetry group that we use in the standard model is not one of these truly fundamental groups.

Already in 1967, Monique Levy-Nahas studied the question of which groups could yield the Poincare group as a limit, analogous to how the Poincare group yields the Galilean group as a limit.

The answer she found was stunningly simple: “the only groups which can be contracted in the Poincaré group are $SO(4, 1)$ and $SO(3, 2)$”. These groups are called the de Sitter and the anti-de Sitter group.

They consist of the transformations between inertial frames of reference that leave the speed of light invariant and, in addition, leave an energy scale invariant. The de Sitter group leaves a positive energy scale invariant, whereas the anti-de Sitter group leaves a negative energy scale invariant. Both contract to the Poincare group in the limit where the invariant energy scale goes to zero.
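Schematically, the difference shows up in the algebra: in the (anti-)de Sitter case the “translation” generators no longer commute, and their commutator is controlled by the invariant scale (written here in terms of the de Sitter radius $R$; the precise signs and factors depend on conventions):

$$ [\Pi_\mu, \Pi_\nu] \propto \frac{1}{R^2}\, M_{\mu\nu} \qquad \xrightarrow{\; R \to \infty \;} \qquad [P_\mu, P_\nu] = 0 \, , $$

which is the familiar Poincare relation that translations commute.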

Levy‐Nahas’ discovery is great news. There isn’t some large pool of symmetries that we can choose from, but only two. In addition, the groups she found are simple groups and therefore much “prettier” than the Poincare group.

Following Dirac’s mantra and remembering the fact that the deformation: Galilean Group $\to $ Poincare Group led to incredible progress, we should take the idea of replacing the Poincare group with the de Sitter or anti de Sitter group seriously. This point was already emphasized in 1972 by Freeman J. Dyson in his famous talk “Missed opportunities”.

Nevertheless, I didn’t hear about the de Sitter groups in any particle physics lecture or read about them in any particle physics book. Maybe because the de Sitter symmetry is not a symmetry of nature? Because there is no experimental evidence?

To answer these questions, we must first answer the question: what is the energy scale that is left invariant?

The answer is: it’s the cosmological constant!

The present experimental status is that the cosmological constant is tiny but nonzero and positive: $\Lambda \approx 10^{-12}$ eV! This smallness explains why the Poincare group works so well. Nevertheless, the correct spacetime symmetry group is the de Sitter group. I’m a bit confused why this isn’t mentioned in the textbooks or lectures. If you have an idea, please let me know!

Can we enlarge the spacetime symmetry even further?

Yes, we can. But as we know from Levy-Nahas’ paper, only a different kind of symmetry enlargement is possible. There isn’t any other symmetry that could be more exact and yield the de Sitter group in some limit. Instead, we can ask whether there could be a larger, broken spacetime symmetry.

Nowadays the idea of a broken symmetry is well known and already an important part of the standard model. In the standard model, the Higgs field triggers the breaking $SU(2) \times U(1) \to U(1)$.

Something similar could have happened to a spacetime symmetry in the early universe. A good candidate for such a broken spacetime symmetry is the conformal group $SO(4,2)$.

The temperature in the early universe was incredibly high and “[i]t is an old idea in particle physics that, in some sense, at sufficiently high energies the masses of the elementary particles should become unimportant” (Sidney Coleman in Aspects of Symmetry). In the massless limit, our equations become invariant under the conformal group. The de Sitter group and the Poincare group are subgroups of the conformal group. Therefore it is possible that the conformal group was broken to the de Sitter group in the early universe.
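For orientation, the conformal group of four-dimensional spacetime is generated by the ten Poincare generators together with one dilatation and four special conformal transformations, which matches the dimension of $SO(4,2)$:

$$ \underbrace{P_\mu,\, M_{\mu\nu}}_{4+6\ \text{(Poincare)}}, \qquad \underbrace{D}_{1\ \text{(dilatation)}}, \qquad \underbrace{K_\mu}_{4\ \text{(special conformal)}} \qquad \Rightarrow \qquad 4+6+1+4 = 15 = \dim SO(4,2) \, . $$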

This idea is interesting for a different reason, too. The only parameter in the standard model that breaks conformal symmetry at tree level is the Higgs mass parameter. This parameter is the most problematic aspect of the standard model and possibly the Higgs mass fine-tuning problem can be solved with the help of the conformal group. (See: On naturalness in the standard model by William A. Bardeen.)

Enlargement of the Inner Symmetry

The inner symmetry group of the standard model $ U(1) \times SU(2) \times SU(3) $ is quite ugly, too. Like the Poincare group, it is not a simple group.

There is an old idea by Howard Georgi and Sheldon Glashow that instead of $ U(1) \times SU(2) \times SU(3) $ we use a larger, simple group $G_{GUT} $. These kinds of theories are called Grand Unified Theories (GUTs).

While GUTs have problems, they are certainly beautiful. One obvious “problem” is that in present-day colliders we do not observe effects of a $G_{GUT}$ structure, and thus we assume the unified gauge symmetry is broken at some high energy scale:

\begin{equation} \label{eq:schematicgutbreaking}
G_{GUT} \stackrel{M_{GUT}}{\rightarrow} \ldots \stackrel{M_I}{\rightarrow} G_{SM} \stackrel{M_Z}{\rightarrow} SU(3)_C \times U(1)_Q \, ,
\end{equation}

where the dots indicate possible intermediate scales between $G_{GUT}$ and $G_{SM}$. In the following, we discuss some of the “mysteries” of the standard model that can be resolved by a GUT.

Quantization of Electric Charge

In the standard model the electric charges of the various particles must be put in by hand and there is no reason why there should be any relation between the electron and proton charge. However, from experiments it is known that $Q_{\text{proton}}+Q_{\text{electron}}= \mathcal{O}(10^{-20})$ (in units of the elementary charge). In GUTs one multiplet of $G_{GUT}$ contains quarks and leptons. This way, GUTs provide an elegant explanation for the experimental fact of charge quantization. For example, in $SU(5)$ GUTs the conjugate $5$-dimensional representation contains the down quark and the lepton doublet

\begin{equation}
\bar{5} = \begin{pmatrix} \nu_L \\ e_L \\ (d_R^c)_{\text{red}} \\ (d_R^c)_{\text{blue}} \\ (d_R^c)_{\text{green}} \end{pmatrix} \, .
\end{equation}

The standard model generators must correspond to generators of $G_{GUT}$. Thus the electric charge generator must correspond to one Cartan generator of $G_{GUT}$. (The eigenvalues of the Cartan generators of a given gauge group correspond to the quantum numbers commonly used in particle physics.) In $SU(5)$ the Cartan generators can be written as diagonal $5\times 5$ matrices with trace zero. ($SU(5)$ is the set of $5 \times 5$ matrices $U$ with determinant $1$ that fulfil $U^\dagger U = 1$. For the generators $T_a$ this means $\text{det}(e^{i \alpha_a T_a})=e^{i \alpha_a \text{Tr}(T_a)} \stackrel{!}{=}1$ and therefore $\text{Tr}(T_a) \stackrel{!}{=} 0$.) Therefore we have

\begin{align}
\text{Tr}(Q)&= \text{Tr} \begin{pmatrix} Q(\nu_L) & 0 & 0 & 0 &0 \\ 0 & Q(e_L) & 0 & 0 &0 \\ 0 & 0 & Q((d_R^c)_{\text{red}}) & 0 &0\\ 0 & 0 & 0 & Q((d_R^c)_{\text{blue}})&0\\ 0 & 0 & 0 & 0 &Q((d_R^c)_{\text{green}}) \end{pmatrix} \stackrel{!}{=} 0 \notag \\
&\rightarrow Q(\nu_L) + Q(e_L) + 3Q(d_R^c) \stackrel{!}{=} 0 \notag \\
&\rightarrow Q(d_R^c) \stackrel{!}{=} -\frac{1}{3} Q(e_L) \, ,
\end{align}

where in the last step we used that the neutrino is electrically neutral, i.e. $Q(\nu_L)=0$.

Analogously, we can derive a relation between $e_R^c$, $u_L$ and $u_R^c$. Thus $Q_{\text{proton}}+Q_{\text{electron}}= \mathcal{O}(10^{-20})$ is no longer a miracle, but rather a direct consequence of the embedding of $G_{SM}$ in an enlarged gauge symmetry.

Coupling Strengths

The standard model contains three gauge couplings, which are very different in strength. Again, this is not a real problem of the standard model, because we can simply put these values in by hand. However, GUTs provide a beautiful explanation for this difference in strength. A simple group $G_{GUT}$ implies that we have only one gauge coupling as long as $G_{GUT}$ is unbroken. The gauge symmetry $G_{GUT}$ is broken at some high energy scale in the early universe. Afterward, we have three distinct gauge couplings with approximately equal strength.

The gauge couplings are not constant but depend on the energy scale. This is described by the renormalization group equations (RGEs). The RGEs for a gauge coupling depend on the number of particles that carry the corresponding charge. Gauge bosons have the effect that a given gauge coupling becomes stronger at lower energies, and fermions have the opposite effect. The adjoint of $SU(3)$ is $8$-dimensional and therefore we have $8$ corresponding gauge bosons. In contrast, the adjoint of $SU(2)$ is $3$-dimensional and thus we have $3$ gauge bosons. For $U(1)$ there is only one gauge boson. As a result, for $SU(3)$ the gauge boson effect dominates and the corresponding gauge coupling becomes stronger at lower energies. For $SU(2)$ the fermion and boson effects almost cancel each other and thus the corresponding gauge coupling is approximately constant. For $U(1)$ the fermions dominate and the $U(1)$ gauge coupling becomes much weaker at low energies. This is shown schematically in the figure below. This way, GUTs provide an explanation of why strong interactions are strong and weak interactions are weak.

[Figure: schematic running of the three standard model gauge couplings with energy]

Another interesting aspect of the renormalization group evolution of the gauge couplings is that there is a close connection between the GUT scale and the proton lifetime. Proton decay experiments therefore yield a direct bound on the GUT scale: $M_{GUT} \gtrsim 10^{15}$ GeV. On the other hand, we can use the measured values of the gauge couplings and the standard model particle content to calculate how the three standard model gauge couplings change with energy. Thus we can approximate the GUT scale as the energy scale at which the couplings become approximately equal. The exact scale depends on the details of the GUT model, but the general result is a very high scale, which is surprisingly close to the value from proton decay experiments. This is not a foregone conclusion. With a different particle content or different measured values of the gauge couplings, this calculation could yield a much lower scale, and this would be a strong argument against GUTs. In addition, the gauge couplings could run in the “wrong direction”, as shown in the figure. The fact that the gauge couplings run sufficiently slowly and become approximately equal at high energies is therefore a hint in favor of the GUT idea.

[Figure: gauge couplings running in the “wrong direction”]
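To get a feeling for these statements, here is a minimal Mathematica sketch of the one-loop running. It uses the standard one-loop solution $\alpha_i^{-1}(\mu) = \alpha_i^{-1}(M_Z) - \frac{b_i}{2\pi} \ln(\mu/M_Z)$ with the GUT-normalized standard model coefficients $b_i = (41/10, -19/6, -7)$; the input values at $M_Z$ are rough approximations and the whole thing is only meant to reproduce the qualitative picture, not a precision analysis:

(* One-loop running of the inverse gauge couplings in the standard model.
   The input values at MZ are approximate; the U(1) coupling is GUT-normalized. *)
alphaInvMZ = {59.0, 29.6, 8.5};  (* 1/alpha_1, 1/alpha_2, 1/alpha_3 at MZ *)
b = {41/10, -19/6, -7};          (* one-loop beta coefficients *)
MZ = 91.19;                      (* in GeV *)

alphaInv[i_, mu_] := alphaInvMZ[[i]] - b[[i]]/(2 Pi) Log[mu/MZ];

(* plot the three inverse couplings between MZ and 10^18 GeV *)
LogLinearPlot[{alphaInv[1, mu], alphaInv[2, mu], alphaInv[3, mu]}, {mu, MZ, 10^18},
 PlotLegends -> {"1/\[Alpha]1", "1/\[Alpha]2", "1/\[Alpha]3"}]

(* estimate the scale where the first two couplings meet *)
t12 = t /. FindRoot[alphaInv[1, 10^t] == alphaInv[2, 10^t], {t, 15}];
Print["1/alpha1 = 1/alpha2 at roughly 10^", t12, " GeV"]

With the plain standard model particle content the three lines only meet approximately, which is why the exact unification scale depends on the model details mentioned above.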

Further Postdictions

In addition to the “classical” GUT postdictions described in the last two sections, I want to mention two additional postdictions:

  • A quite generic implication of grand unification is small neutrino masses through the type-I seesaw mechanism (see the short sketch after this list). Models based on the popular $SO(10)$ or $E_6$ groups automatically contain a right-handed neutrino $\nu_R$. As a result of the breaking chain, this standard model singlet $\nu_R$ gets a superheavy mass $M$. After the last breaking step $G_{SM}\rightarrow SU(3)_C \times U(1)_Q$ the right-handed and left-handed neutrinos mix. This yields a suppressed mass of the left-handed neutrino of order $\frac{m^2}{M}$, where $m$ denotes a typical standard model mass.
  • GUTs provide a natural framework to explain the observed matter-antimatter asymmetry in the universe. As already noted above, a general implication of GUTs is that protons are no longer stable. Formulated differently, GUTs allow baryon number-violating interactions. This is one of three central ingredients, known as the Sakharov conditions, needed to produce more baryons than antibaryons in the early universe. Thus, as D. V. Nanopoulos put it, “if the proton was stable it would not exist”.
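A minimal sketch of the type-I seesaw in the one-generation case (standard textbook form, with all conventions simplified): the neutrino mass matrix in the $(\nu_L, \nu_R)$ basis is

$$ \begin{pmatrix} 0 & m \\ m & M \end{pmatrix}, \qquad m \ll M \, , $$

and its eigenvalues are approximately $M$ and $-\frac{m^2}{M}$, i.e. one state stays superheavy while the mass of the other is suppressed by the large scale $M$.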


What’s next?


While the unification of spacetime symmetries was already confirmed by the measurement of the cosmological constant, so far there is no experimental evidence for the correctness of the GUT idea. Thus the unification of internal symmetries still has to wait. However, proton decay could be detected any time soon. When Hyper-Kamiokande starts operating, the limits on the proton lifetime will become one order of magnitude better, and this means there is a realistic chance that we will finally find evidence for Grand Unification.

This, however, would by no means be the end of the road.

Arguably, it would be awesome if we could unify spacetime and internal symmetries into one large symmetry. However, there is one no-go theorem that has blocked progress in this direction: the famous Coleman-Mandula theorem.

Nevertheless, a no-go theorem in physics never really means that something is impossible, only that it isn’t as trivial as one might think. There are several loopholes in the theorem that potentially allow the unification of spacetime and internal symmetries.

At least to me, it seems as if Dirac was right and larger symmetries are the way to go. However, so far, we don’t know which way we should follow.

Physics Model Fits in Mathematica

This shouldn’t be hard. We have some physics model and want to find values for the model parameters that reproduce some experimentally measured values. However, there are several small things that aren’t obvious and it took me quite some time to make things work. I wasn’t able to find a good explanation of how such a problem can be solved in Mathematica. Now that I’ve figured it out, I thought it would be a good idea to share what I’ve learned.

A short disclaimer: I’m not a Mathematica expert. There are certainly much better ways to do this. However, I found it so frustrating that there was no good information available that I hope someone finds this post helpful. If you have any recommendations for how I could do things better, a short comment would be awesome! I’ll update this post when I discover further optimizations.

So… the set up is the following:

We have a number of model parameters $(a, b, c, d, \ldots)$ and we seek numerical values for them such that several experimental observables are reproduced as well as possible. The experimental observables are our fit targets.

The connection between the parameters and the observables is given by the model. An explicit example may be helpful: in a Grand Unified Theory, the model parameters are the Yukawa couplings and vacuum expectation values (VEVs). We want to find those Yukawa couplings and VEVs that reproduce the measured masses and mixing angles as well as possible.

The measure of how well a given set of numerical values for the parameters reproduces the observed values is $\chi^2$ (pronounced “chi-square”):
$$ \chi^2 = \sum_i \left( \frac{O_i - F_i}{\text{err}(O_i)} \right)^2 . $$
It is the sum of the squared differences between the fit target values $F_i$ that are computed for a given set of numerical values for the parameters and the actually measured values of the same observables $O_i$. In addition, each term in the sum is weighted by the experimental error of the measured value, $\text{err}(O_i)$. This way one takes into account that it is less bad if some fit target value is a bit off when the experimental value is only vaguely known.

In Mathematica, the model is defined as a function. The variables of the function are the model parameters and, in the end, the function returns a value for $\chi^2$. For each set of numerical values for the model parameters, the model function spits out a number that tells us how well these numerical values reproduce the known experimental values.

Here is an example, where we start with some Yukawa couplings and vacuum expectation values (VEVs) and try to fit the masses such that the experimental values are correctly reproduced.

Massesexperimental = {171.7, 6.19*10^-1, 1.27*10^-3, 2.89, 5.5*10^-2, 2.9*10^-3};
Masserrorsexperimental = {3.0, 8.4*10^-2, 4.6*10^-4, 9.0*10^-2, 1.55*10^-2, 1.215*10^-3};

chisquarefunction[{y2711_,y2712_,y2713_,y2722_,y2723_,y2733_,y35111_,y35122_,y35133_,v1_,v2_,v3_,v4_,vbig1_,vbig2_,vbig3_,vbig4_}]:=
Block[{Y27,Y351,mu,md,MC,MR,Z,a,mdeff,fittedMasses,chisquare},

(* Yukawa coupling matrices: Y27 is symmetric, Y351 is diagonal *)
Y27={{y2711,y2712,y2713},{y2712,y2722,y2723},{y2713,y2723,y2733}};
Y351={{y35111,0,0},{0,y35122,0},{0,0,y35133}};

(* mass matrices built from the Yukawa couplings and the VEVs *)
mu=Y27*v3+Y351*v4;
md=Y27*v1+Y351*v2;
MC=Y27*vbig1+Y351*vbig2;
MR=Y27*vbig3+Y351*vbig4;

(* correction matrix a = (1 + Z^dagger Z)^(-1/2), applied to md *)
Z=Transpose[MC.Inverse[MR]];
a=MatrixPower[IdentityMatrix[3]+ConjugateTranspose[Z].Z,-1/2];
mdeff=md.a;

(* the singular values of mu and mdeff are the fitted masses *)
fittedMasses=Flatten[{SingularValueList[mu],SingularValueList[mdeff]}];

(* chi-square: squared deviations from the measured masses, weighted by the experimental errors *)
chisquare=Sum[((Massesexperimental[[i]]-fittedMasses[[i]])/Masserrorsexperimental[[i]])^2,{i,6}];

Return[chisquare]
];

Now, we want to tell Mathematica that it should compute those model parameters that reproduce the experimental values as well as possible. This means we need to tell Mathematica to minimize our model function, because a minimal $\chi^2$ value corresponds to the best possible fit for the model parameters.

In general, this can only be done numerically. A problem here is that it’s never possible to be 100% certain that a given minimum of the model function is really the global minimum, i.e. the best fit that is possible. Thus, we need to restart the search for the minimal value as often as possible with different starting values. The best-fit point then corresponds to the set of numerical values for the model parameters that yields the smallest $\chi^2$ value.

A numerical search for a minimum of a function is done in Mathematica with NMinimize[]. However, the usage of NMinimize[] is not that trivial. In theory, we simply plug in our function, tell Mathematica what our variables are, and it spits out the values for them that correspond to a minimum.

NMinimize[
{chisquarefunction[{y2711, y2712, y2713, y2722, y2723, y2733, y35111, y35122, y35133, v1, v2, v3, v4, vbig1, vbig2, vbig3, vbig4}]}
, {y2711, y2712, y2713, y2722, y2723, y2733, y35111, y35122, y35133, v1, v2, v3, v4, vbig1, vbig2, vbig3, vbig4}];

However, in practice, there are three things that need to be taken into account additionally.

1.) There are different methods and options for NMinimize[], and different methods and options are suited for different problems. In our case, we are dealing with a function that has many variables and lots of local minima. (For example, each fit point where some mass is fitted to zero corresponds to a local minimum. The “minimal” $\chi^2$ value is in this case simply the squared ratio of the experimental value to its error. For example, if the error is 5% of the experimental value and all other observables are fitted perfectly, this “minimal” value is $\chi^2 = (1/0.05)^2 = 400$.) Some explanations of the various methods that are available for NMinimize[] are given in the Mathematica documentation. In our case, the method DifferentialEvolution yields the best results. In addition, we need to tell Mathematica how long it is allowed to search. This is done by setting MaxIterations -> 1000 or so. Take note that there can be memory problems and unexpected kernel quits if you allow too many iterations. (One way around this is to call NMinimize[] in parallel. Then it is no longer a big deal if one of the subkernels quits, because the master kernel still runs and simply starts a new subkernel.) In addition, because there is no way to make sure that a minimum found by a numerical procedure is indeed a global minimum, one should start the minimization several times from different starting points. This can be achieved by calling the NMinimize function in Mathematica with different “RandomSeed” values. Here is an example:

fct[randnr_] := 
NMinimize[{chisquarefunction[{y2711, y2712, y2713, y2722, y2723, y2733, y35111, y35122, y35133, v1, v2, v3, v4, vbig1, vbig2, vbig3, vbig4}]},
 {y2711, y2712, y2713, y2722, y2723, y2733, y35111, y35122, y35133, v1, v2, v3, v4, vbig1, vbig2, vbig3, vbig4},
 Method -> {"DifferentialEvolution", "RandomSeed" -> randnr}, MaxIterations -> 100];

To perform the minimization for some random seed, we simply execute

fct[11]
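To scan many seeds, one can for example map fct over a list of seeds in parallel and keep the run with the smallest $\chi^2$. A minimal sketch (the number of seeds and the kernel setup are just placeholders):

(* run the fit for 20 different random seeds in parallel;
   each NMinimize call returns {chisquare, parameterRules} *)
LaunchKernels[];
DistributeDefinitions[chisquarefunction, fct];
results = ParallelMap[fct, Range[20]];

(* keep the run with the smallest chi-square *)
best = First[SortBy[results, First]]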

2.) Mathematica tries to do some analytic simplifications of the function that should be minimized. This can be really problematic if we are dealing with a complicated model. In our example from above, especially the MatrixPower and the SingularValueList computations are problematic. Thus, when one executes NMinimize[] for such a complicated function, nothing happens for hours. If there are analytic simplifications that can be done, one possibility is to do them before the minimization and give NMinimize[] the optimized function. A way to avoid Mathematica trying for hours to do analytic manipulations of our function is to use Hold[]. If we tell Mathematica to minimize Hold[ourfunction], it no longer tries anything analytic. For the minimization of Hold[ourfunction], it simply plugs in numerical values for the variables, looks at what $\chi^2$ it gets back and then, based on this information, decides what it does next, i.e. what numerical values it plugs in at the next iteration.

fct[randnr_] := NMinimize[
{Hold[chisquarefunction[{y2711, y2712, y2713, y2722, y2723, y2733, y35111, y35122, y35133, v1, v2, v3, v4, vbig1, vbig2, vbig3, vbig4}]]},
 {y2711, y2712, y2713, y2722, y2723, y2733, y35111, y35122, y35133, v1, v2, v3, v4, vbig1, vbig2, vbig3, vbig4},
 Method -> {"DifferentialEvolution", "RandomSeed" -> randnr}, MaxIterations -> 100];

And again, to perform the minimization for some random seed, we simply execute

fct[14]

3.) NMinimize[] dislikes boundaries. If you want to tell Mathematica that some parameter $p$ should be a really, really large number, say $10^{16}$, a bad way to do this is to execute NMinimize[] with a boundary for the variable. The fit results get considerably worse if boundaries are given. I suppose this is because the fit algorithm then can’t move as freely as it would like. However, without any hint, Mathematica never yields a fit result with such a large number ($10^{16}$) for one of the variables. Instead, it starts with values of order one for the parameters and ends up in local minima with order one values for the parameters. In physics, we often have some scale in mind for some of the parameters. For example, in a Grand Unified Theory, some of the VEVs must be larger than $10^{16}$ and any fit result with smaller values doesn’t make sense. The trick is to rescale all parameters such that they are of order one. For the parameter $p$ that should be around $10^{16}$, we simply define $p' = \frac{p}{10^{16}}$, which yields $p = p' \cdot 10^{16}$, and plug this into our model function. Then, we have a model function that is a function of $p'$ instead of $p$ and it is no longer a problem that Mathematica finds order one solutions for $p'$.

In our example, the big VEVs $vbig1,vbig2,vbig3,vbig4$ should be, for physical reasons, around $10^{16}$. Thus, we define our model function as follows:

Massesexperimental = {171.7, 6.19*10^-1, 1.27*10^-3, 2.89, 5.5*10^-2, 2.9*10^-3};
Masserrorsexperimental = {3.0, 8.4*10^-2, 4.6*10^-4, 9.0*10^-2, 1.55*10^-2, 1.215*10^-3};

chisquarefunction[{y2711_,y2712_,y2713_,y2722_,y2723_,y2733_,y35111_,y35122_,y35133_,v1_,v2_,v3_,v4_,vbig1_,vbig2_,vbig3_,vbig4_}]:=

Block[{Y27,Y351,mu,md,MC,MR,Z,a,mdeff,fittedMasses,chisquare},

Y27=({{y2711,y2712,y2713},{y2712,y2722,y2723},{y2713,y2723,y2733}});
Y351=({{y35111,0,0},{0,y35122,0},{0,0,y35133}});

mu=Y27*v3+Y351*v4;
md=Y27*v1+Y351*v2;
MC=Y27*vbig1*10^16+Y351*vbig2*10^16;  (* vbig1..vbig4 are order one; the physical VEVs are vbig*10^16 *)
MR=Y27*vbig3*10^16+Y351*vbig4*10^16;
Z=Transpose[MC.Inverse[MR]];
a=MatrixPower[({
{1, 0, 0},
{0, 1, 0},
{0, 0, 1}
})+ConjugateTranspose[Z].Z,-1/2];
mdeff=(md.a);

fittedMasses=Flatten[{SingularValueList[mu],SingularValueList[mdeff]}];

chisquare=Sum[((Massesexperimental[[i]]-fittedMasses[[i]])/Masserrorsexperimental[[i]])^2,{i,6}];
Return[chisquare]

];
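Once such a fit has finished, the rescaling has to be undone to obtain the physical values. A minimal sketch, assuming the fit again returns {chisquare, bestFitRules} as above (the factor $10^{16}$ must match the one used inside chisquarefunction):

(* extract the best-fit values and undo the rescaling of the large VEVs *)
{chisq, bestFitRules} = fct[14];
physicalVEVs = ({vbig1, vbig2, vbig3, vbig4} /. bestFitRules)*10^16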

Demystifying the Hierarchy Problem

There is no hierarchy problem in the Standard Model. The Standard Model has only one scale: the electroweak scale. Therefore, there can’t be any hierarchy problem because there is no hierarchy. But, of course, there are good reasons to believe that the Standard Model is incomplete and, almost inevitably, if you introduce new physics at a higher scale, you get problems with hierarchies. Formulated more technically: only when the cutoff $\Lambda$ is physical do we have a real hierarchy problem.

However, whenever someone starts talking about the hierarchy problem, you should ask: which one?

  • There is a tree-level hierarchy problem which you get in many extensions of the Standard Model. As an example, let’s consider GUT models. Here the Standard Model gauge symmetry is embedded into some larger symmetry group. This group breaks at an extremely high scale and the remnant symmetry is what we call the Standard Model gauge symmetry. If you now write down the Higgs potential for a GUT model, the standard assumption is that all parameters in this potential are of the order of the GUT scale because, well, there isn’t any other scale and we need to produce a GUT-scale vacuum expectation value. The mystery is now how the, in comparison, tiny vacuum expectation value of the electroweak Higgs comes about. In a GUT, this Standard Model Higgs lives in the same representation as several superheavy scalars. The superheavy masses of these scalars are no problem if we assume that all parameters in the GUT Higgs potential are extremely large numbers. But somehow these parameters must cancel to yield the tiny mass of the Standard Model Higgs. If you write down two random large numbers, it’s extremely unlikely that they cancel so exactly that you get a tiny result. Such a cancellation needs an explanation and this is what people call the tree-level hierarchy problem. The prefix “tree-level” refers to the fact that no loops are involved here. The problem arises solely by investigating the tree-level Higgs potential.
  • But there is also a hierarchy problem which has to do with loops, i.e. higher orders in perturbation theory. The main observation is that our bare Higgs mass $m$ (the parameter in the Lagrangian) gets modified if we move beyond tree-level. While this happens for all particles, it leads to a puzzle for scalar particles like the Higgs boson because here the loop corrections are directly proportional to the cutoff scale $\Lambda$. Concretely, the physical Higgs mass we can measure is given by $$ m^2_P = m^2 + \sigma (m^2) +\ldots , $$ where $m$ is the bare mass, $m_P$ the physical mass and $\sigma (m^2)$ the one-loop correction. The puzzle is now that if we want to get a light Higgs mass $m_P^2 \ll \Lambda^2$, we need to fine-tune the bare parameter $m^2$: $$ m^2 \approx \Lambda^2 + m_P^2. $$ For example, for a physical Higgs mass $m_P \approx 125$ GeV and a cutoff scale around the Planck scale $\Lambda \approx 10^{19}$ GeV, we find that $$ m^2 = (1+10^{-34}) \Lambda^2 $$ (see the short arithmetic check after this list). This means that our bare mass $m$ must be tuned extremely precisely to yield the light Higgs mass that we observe. This fine-tuning is automatically necessary whenever the cutoff scale is large. If we include higher orders in perturbation theory, the situation gets even worse. At each order of perturbation theory, we must repeat the procedure and fine-tune the bare Higgs mass even further. This is what people usually call the hierarchy problem, because the core of the problem is that the cutoff scale $\Lambda$ is so far above the electroweak scale.
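Just to make the origin of the $10^{-34}$ explicit (simple arithmetic, nothing model-specific):

$$ \left(\frac{m_P}{\Lambda}\right)^2 \approx \left(\frac{125\ \text{GeV}}{10^{19}\ \text{GeV}}\right)^2 \approx 1.6 \times 10^{-34} \, . $$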

Now, here’s the catch. Nature doesn’t know anything about loops. Each loop represents a term in our perturbation series. Perturbation theory is a tool we physicists invented to describe nature. But nature knows nothing about the bare mass and loop corrections. She only knows the whole thing $m_P$, which is what we can measure in experiments. In other words, we can’t measure the bare mass $m$ or, say, the one-loop correction $\sigma (m^2)$. Therefore, these parameters only exist within our description and we can simply adjust them to yield the value for the physical Higgs mass that we need.

The situation could be very different. If we could measure $m$ or $\sigma (m^2)$ (for example, because they could be calculated from other measurable parameters), there would be a real problem. If two measurable parameters cancelled so precisely, we would have every right to wonder. But as long as the bare mass is only something which exists in our description and isn’t measurable, there is not really a deep problem, because we can simply adjust these unphysical parameters at will.

Similar arguments are true for the tree-level hierarchy problem. As long as we haven’t measured the GUT scale Higgs potential parameters, there is nothing to really wonder about. Maybe the large symmetry gets broken differently, was never a good symmetry in the first place or maybe the parameters happen to cancel exactly.

Two great papers which discuss this point of view in more technical terms are

So … there isn’t really a hierarchy problem?

There is one. But to understand it we need to look at the whole situation a bit differently.

Criticality

The main idea of this alternative perspective is to borrow intuition from condensed matter physics. Here, we can also use field theory, because we can excite the atoms our system consists of to get waves. In addition, there can be particle-like excitations, which are usually called phonons. For our problem, the most important observation is that here we also have a cutoff scale $\Lambda$, which corresponds to the inverse atomic spacing, i.e. the inverse lattice spacing. Beyond this scale, our description doesn’t make sense.

With this in mind, we can understand what a hierarchy problem really is from a completely new perspective. Naively, we expect that if we excite our condensed matter system we only get small excitations. In technical terms, we generically only expect correlation lengths of the order of the lattice spacing. All longer correlation lengths need an explanation. The correlation length is inversely proportional to the mass associated with the excitation. Hence, in particle physics jargon we would say that for a system with cutoff $\Lambda$, we only expect particles with a mass of order $\Lambda$, i.e. superheavy particles.
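In formulas (natural units, with the standard relation between the correlation length $\xi$ and the mass $m$ of the excitation):

$$ \xi \sim \frac{1}{m} \qquad \Rightarrow \qquad m \ll \Lambda \;\; \Longleftrightarrow \;\; \xi \gg \frac{1}{\Lambda} \, , $$

i.e. a particle much lighter than the cutoff corresponds to a correlation length of many lattice spacings.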

Now the mystery is that we know that there are light elementary particles although the cutoff scale is presumably extremely high. In condensed matter jargon this means that we know that there are excitations with an extremely long correlation length compared with the fundamental lattice spacing.

This is the mystery we call the hierarchy problem.

But this is not only a helpful alternative perspective. It also allows us to think about possible solutions in completely different terms. We can now ask: under what circumstances do we get excitations with extremely long correlation length compared to the lattice spacing?

The answer is: whenever the system is close to a critical point. (The most famous example is the liquid-vapor critical point.)

A solution of the hierarchy problem, therefore, requires an explanation why nature seems so close to a critical point.

There are, as far as I know, two types of possible answers.

  • Either, someone/something tuned the fundamental parameters externally (whatever that means in this context). Condensed matter systems can be brought to a critical point by adjusting the temperature and pressure.
  • Or, there is a dynamical reason why nature evolved towards a critical point. This is known as self-organized criticality.

In the first category, we have multiverse-anthropic-principle type explanations.

If you are, like me, not a fan of these types of arguments, there is good news: self-organized criticality is a thing in nature. There are many known systems which evolve automatically towards a critical point. The most famous one is a sandpile.

For a brilliant discussion of self-organized criticality in general, see

A defining feature of systems close to a critical point is that we get complexity at all scales. Under normal circumstances, interesting phenomena only happen on scales comparable to the lattice spacing (which in particle physics possibly means the Planck scale).  But luckily, there are complex phenomena at all scales in nature, not just at extremely small scales. Otherwise, humans wouldn’t exist. This, I think, hints beautifully towards the interpretation that nature is fundamentally close to a critical point.

PS: As far as I know, Christof Wetterich was the first to notice the connection between criticality and the hierarchy problem, in the paper mentioned above. Combining it with self-organized criticality was proposed in Self-organizing criticality, large anomalous mass dimension and the gauge hierarchy problem by Stefan Bornholdt and Christof Wetterich. Recently, the connection between self-organized criticality and solutions of the hierarchy problem was emphasized by Gian Francesco Giudice in The Dawn of the Post-Naturalness Era.

PPS: Please let me know if you know any paper in which the self-organized criticality idea is applied to the hierarchy problem in fundamental physics.