This shouldn’t be hard. We have some physics model and want to find values for the model parameters that yield some experimentally measured values. However, there are several small things that aren’t obvious and it took me quite some time to make things work. I wasn’t able to find a good explanation how such a problem could be solved in Mathematica. Now that I figured it out, I thought it would be a good idea to share what I’ve learned.

**A short disclaimer:** *I’m not a Mathematica expert. There are certainly much better ways to do this. However, I found it so frustrating that there was no good information available and maybe someone finds this post helpful. If you have any recommendation how I could do things better, a short comment would be awesome! I’ll update this post when I discover further optimizations.*

So… the set up is the following:

We have a number of model parameters (a,b,c,d,….) and we seek numerical values for them such that several experimental observables are reproduced as good as possible. The experimental observables are our fit targets.

The connection between the parameters and the observables are given by the model. An explicit example may be helpful: In a Grand Unified Theory, the model parameters are the Yukawa couplings and vacuum expectation values (vevs). We want to find those Yukawa couplings and vevs that reproduce the measured masses and mixing angels as good as possible.

The measure of how good a given set of numerical values for the parameters reproduce the observed values is given by $\chi^2$ (speak: chi-square):

$$ \chi^2 = \frac{O_i -F_i}{\text{err}(O_i)}. $$

It is defined as the sum of the squared differences between the fit target values $F_i$ that are computed for a given set of numerical values for the parameters and the actually measured values for the same observables $O_i$. In addition, each term in the sum is weighted by the experimental error of the measured value $\text{err}(O_i) $. This way one takes into account that it is less bad if some fit target value is a bit off when the experimental value is only vaguely known.

In Mathematica, the model is defined as a function. The variables of the function are the model parameters and in the end, the function returns a value for $\chi^2$. For each set of numerical values for the model parameters, the model function spits out a number that tells us how good these numerical values reproduce the known experimental values.

Here is an example, where we start with some Yukawa couplings and vacuum expectation values (VEVs) and try to fit the masses such that the experimental values are correctly reproduced.

Massesexperimental = {171.7, 6.19*10^-1, 1.27*10^-3, 2.89, 5.5*10^-2, 2.9*10^-3}; Masserrorsexperimental = {3.0, 8.4*10^-2, 4.6*10^-4, 9.0*10^-2, 1.55*10^-2, 1.215*10^-3}; chisquarefunction[{y2711_,y2712_,y2713_,y2722_,y2723_,y2733_,y35111_,y35122_,y35133_,v1_,v2_,v3_,v4_,vbig1_,vbig2_,vbig3_,vbig4_}]:= Block[{Y27,Y351,mu,md,MC,MR,Z,a,mdeff,fittedMasses,chisquare}, Y27=({{y2711,y2712,y2713},{y2712,y2722,y2723},{y2713,y2723,y2733}}); Y351=({{y35111,0,0},{0,y35122,0},{0,0,y35133}}); mu=Y27*v3+Y351*v4; md=Y27*v1+Y351*v2; MC=Y27*vbig1+Y351*vbig2; MR=Y27*vbig3+Y351*vbig4; Z=Transpose[MC.Inverse[MR]]; a=MatrixPower[({ {1, 0, 0}, {0, 1, 0}, {0, 0, 1} })+ConjugateTranspose[Z].Z,-1/2]; mdeff=(md.a); fittedMasses=Flatten[{SingularValueList[mu],SingularValueList[mdeff]}]; chisquare=Sum[((Massesexperimental[[i]]-fittedMasses[[i]])/Masserrorsexperimental[[i]])^2,{i,6}]; Return[chisquare] ];

Now, we want to tell Mathematica that it should compute those model parameters that reproduce the experimental values as good as possible. This means we need to tell Mathematica to minimize our model function because a minimal $\chi^2$ value corresponds to the best possible fit for the model parameters.

In general, this can only be done numerically. A problem here is that it’s never possible to be 100% certain that a given minimum for the model function is really the global minimum, i.e. the best fit that is possible. Thus, we need to restart the search for the minimal value as often as possible with different starting values. The best-fit point corresponds then to set of numerical values for the model parameters that yields the smallest $\chi^2$ value.

A numerical search for a minimum of a function is done in Mathematica with NMinimize[]. However, the usage of NMinimize[] is not that trivial. In theory, we simply plug in our function, tell Mathematica what our variables are and he spits out the values for them that correspond to a minimum.

NMinimize[ {chisquarefunction[{y2711, y2712, y2713, y2722, y2723, y2733, y35111, y35122, y35133, v1, v2, v3, v4, vbig1, vbig2, vbig3, vbig4}]} , {y2711, y2712, y2713, y2722, y2723, y2733, y35111, y35122, y35133, v1, v2, v3, v4, vbig1, vbig2, vbig3, vbig4}];

However, in practice, there are three things that need to be taken into account additionally.

1.) There are different methods and options for NMinimize[] and different methods and options are suited for different problems. In our case, we are dealing with a function that has many variables and lots of local minima. (For example, each fit point where some mass is fitted to zero corresponds to a local minimum. The “minimal” $\chi^2$ value is in this case simply the square of the experimental value divided by the experimental error. For example, if the error is 5% and all other observables are fitted perfectly, the “minimal” value is $\chi^2= 1/0.05=400$). Some explanations for the various methods that are available for NMinimize[] are given here. In our case, the method DifferentialEvolution yields the best results. In addition, we need to tell Mathematica how long he is allowed to search. This is done by setting: MaxIterations-> 1000 or so. Take note that there can be memory problems and unexpected kernel quits if you allow too many iterations. (One way around this is to call NMinimize[] in parallel. Then, it is no longer a big deal if one of the subkernels quits because the master kernel still runs and simply starts a new subkernel.) In addition, because there is no way to make sure that a minimum that is found by a numerical procedure is indeed a global minimum, one should start the minimization several times, starting from different starting points. This can be achieved by calling the NMinimize function in Mathematica with different “RandomSeeds”. Here is an example:

fct[randnr_] := NMinimize[{chisquarefunction[{y2711, y2712, y2713, y2722, y2723, y2733, y35111, y35122, y35133, v1, v2, v3, v4, vbig1, vbig2, vbig3, vbig4}]}, {y2711, y2712, y2713, y2722, y2723, y2733, y35111, y35122, y35133, v1, v2, v3, v4, vbig1, vbig2, vbig3, vbig4}, Method -> {"DifferentialEvolution", "RandomSeed" -> randnr}, MaxIterations -> 100];

To perform the minimization for some random seed, we simply execute

fct[11]

2.) Mathematica tries to do some analytic simplifications of the function that should be minimized. This can be really problematic if we are dealing with a complicated model. In our example from above, especially the MatrixPower and the SingularValueDecomposition is problematic. Thus, when one executes NMinimize[] for a complicated function nothing happens for hours. If there are analytic simplifications that can be done, one possibility is to do them before the minimization and give NMinimize[] the optimized function. To avoid that Mathematica tries for hours to do analytic stuff to our function is to use Hold[]. If we tell Mathematica to minimize Hold[ourfunction], he no longer tries analytic stuff. For the minimization of Hold[ourfunction], he simply plugs in numerical values for the variables, looks what $\chi^2$ he gets back and then based on this information decides what he does next, i.e. what numerical values he plugs in, in the next iteration.

fct[randnr_] := NMinimize[ {Hold[chisquarefunction[{y2711, y2712, y2713, y2722, y2723, y2733, y35111, y35122, y35133, v1, v2, v3, v4, vbig1, vbig2, vbig3, vbig4}]]}, {y2711, y2712, y2713, y2722, y2723, y2733, y35111, y35122, y35133, v1, v2, v3, v4, vsu51, vsu52, vSO10, vSO102, phase, ve6, ve62}, Method -> {"DifferentialEvolution", "RandomSeed" -> randnr}, MaxIterations -> 100];

And again, to perform the minimization for some random seed, we simply execute

fct[14]

3.) NMinimize[] dislikes boundaries. If you want to tell Mathematica that some parameter $p$ should be a really, really large number, say $10^{16}$, a bad way to do this is to execute NMinimize[] with a boundary for the variable. The fit results get incredibly worse if boundaries are given. I suppose this is because the fit algorithm then can’t move as freely as he would like. However, without any, hint Mathematica never yields a fit result with such a large number ($10^16$) for one of the variables. Instead, he starts with values of order one for the parameters and ends up in local minima with order one values for the parameters. In physics, we often have some scale in mind for some of the parameters. For example, in a Grand Unified Theory, some of the VEVs must be larger than $10^{16}$ and any fit result with smaller values doesn’t make sense. The trick is to rescale all parameters such that they are of order one. For the parameter $p$ that should be around $10^{16}$, we simply define $p’ = \frac{p}{10^{16}}$, which yields $p= p’ 10^{16}$ and plug this into our model function. Then, we have a model function that is a function of $p’$ instead of $p$ and it is no longer a problem that Mathematica finds order one solutions for $p’$.

In our example, the big VEVs $vbig1,vbig2,vbig3,vbig4$ should be, for physical reasons, around $10^16$. Thus, we define our model function as follows:

Massesexperimantal = {171.7, 6.19*10^-1, 1.27*10^-3, 2.89, 5.5*10^-2, 2.9*10^-3}; Masserrorsexperimental = {3.0, 8.4*10^-2, 4.6*10^-4, 9.0*10^-2, 1.55*10^-2, 1.215*10^-3}; chisquarefunction[{y2711_,y2712_,y2713_,y2722_,y2723_,y2733_,y35111_,y35122_,y35133_,v1_,v2_,v3_,v4_,vbig1_,vbig2_,vbig3_,vbig4_}]:= Block[{Y27,Y351,mu,md,MC,MR,Z,a,mdeff,fittedMasses,chisquare}, Y27=({{y2711,y2712,y2713},{y2712,y2722,y2723},{y2713,y2723,y2733}}); Y351=({{y35111,0,0},{0,y35122,0},{0,0,y35133}}); mu=Y27*v3+Y351*v4; md=Y27*v1+Y351*v2; MC=Y27*vbig1*10^16+Y351*vbig2*10^16; MR=Y27*vbig3*10^16+Y351*vbig4*10^16; Z=Transpose[MC.Inverse[MR]]; a=MatrixPower[({ {1, 0, 0}, {0, 1, 0}, {0, 0, 1} })+ConjugateTranspose[Z].Z,-1/2]; mdeff=(md.a); fittedMasses=Flatten[{SingularValueList[mu],SingularValueList[mdeff]}]; chisquare=Sum[((Massesexperimental[[i]]-fittedMasses[[i]])/Masserrorsexperimental[[i]])^2,{i,6}]; Return[chisquare] ];