Will physicists be replaced by robots?

I always thought that such a suggestion was ridiculous. How could a robot ever do what physicists do? While many jobs seem to be in danger because of recent advances in automation – up to 47% according to recent studies – the last things that will be automated, if ever, are jobs like that of a physicist, which require creativity. Right?

For example, this site, which was featured in many major publications, states that there is only a 10% chance that robots will take the job of physicists:

Recently author James Gleick commented on how shocked professional “Go” players are by the tactics of Google’s software “AlphaGo”:

Sean Carroll answered and summarized how most physicists think about this:

A Counterexample

Until very recently I would have agreed. However, a few weeks ago I discovered this little paper and it got me thinking. The idea of the paper is really simple. Just feed measurement data into an algorithm. Give it a fixed set of objects to play around with and then let the algorithm find the laws that describe the data best. The authors argue that their algorithm is able to rediscover Maxwell’s equations. These equations are still the best equations to describe how light behaves. Their algorithm was able to find these equations “in about a second”. Moreover, they describe their program as a “computational embodiment of the scientific method: observation, consideration of candidate theories, and validation.” That’s pretty cool. Once more I was reminded that “everything seems impossible until it’s done.”

Couldn’t we do the same to search for new laws by feeding such an algorithm the newest collider data? Aren’t the jobs of physicists that safe after all?

What do physicists do?

First of all, the category “physicist” is much too broad to discuss the danger of automation. For example, there are experimental physicists and theoretical physicists. And even inside these subcategories, there are further important sub-sub-categories.

On the experimental side, there are people who actually build experiments. Those are the guys who know how to use a screwdriver. In addition, there are people who analyze the data gathered by experiments.

On the theoretical side, there are theorists and phenomenologists. The distinction here is not so clear. For example, one can argue that phenomenology is a subfield of theoretical physics, and many phenomenologists call themselves theoretical physicists. Broadly, the job of a theoretical physicist is to explain and predict how nature behaves by writing down equations. However, there are many different approaches to writing down new equations. I find the classification outlined here helpful. There is:

  1. Curiosity Driven Research; where “anything goes, that is allowed by basic principles and data. […] In general, there is no further motivation for the addition of some particle, besides that it is not yet excluded by the data.”
  2. Data-Driven Research; where new equations are written down as a response to experimental anomalies.
  3. Theory-Driven Research; which is mostly about “aesthetics” and “intuition”. The prototypical example is, of course, Einstein’s invention of General Relativity.

The job of someone working in one such sub-sub-category is completely different from the jobs in another sub-sub-category. Therefore, there is certainly no universal answer to the question of how likely it is for “robots” to replace physicists. Each of the sub-sub-categories mentioned above must be analyzed on its own.

What could robots do?

Let’s start with the most obvious one. Data analysis is an ideal job for robots. Unsurprisingly, several groups are already experimenting with neural networks to analyze LHC data. In the traditional approach to collider data analysis, people have to invent criteria for how to distinguish different particles in the detector: if the angle between two detected photons is larger than X° and their combined energy is smaller than Y GeV, the event is, with probability Z%, some given particle. In contrast, if you use a neural network, you just have to train it on Monte-Carlo data, where you know which particle is which. Then you can let the trained network analyze the collider data. In addition, after the training, you can investigate the network to see what it has learned. This way, neural networks can be used to find new useful variables that help to distinguish different particles in a detector. I should mention that this approach is not universally favored, because some feel that a neural network is too much of a black box to be trusted.
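To make this concrete, here is a toy sketch in Mathematica (my own illustration, not code from any actual LHC analysis): we generate fake “Monte-Carlo” events described by two invented variables, an opening angle and an energy, train a classifier on them, and then apply it to a new “measured” event. All names and numbers are made up.

(* toy Monte-Carlo sample: each event is {openingAngle, energy} -> particle label *)
mcData = Join[
  Table[{RandomReal[{0, 20}], RandomReal[{10, 60}]} -> "photon", {500}],
  Table[{RandomReal[{15, 90}], RandomReal[{40, 200}]} -> "pion", {500}]];

(* train a neural-network classifier on the labeled Monte-Carlo events *)
eventClassifier = Classify[mcData, Method -> "NeuralNetwork"];

(* classify a new event and inspect the learned probabilities *)
eventClassifier[{10., 30.}]
eventClassifier[{10., 30.}, "Probabilities"]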

What about theoretical physicists?

In the tweet quoted above, Sean Carroll argues that “Fundamental physics is analogous to “the rules of Go.” Which are simple and easily mastered. Go *strategy* is more like bio or neuroscience.” Well, yes and no. Finding new fundamental equations is certainly similar to inventing new rules for a game. This is broadly the job of a theoretical physicist. However, the three approaches to “doing theoretical physics” mentioned above are quite different.

In the first and second approach, the “rules of the game” are pretty much fixed. You write down a Lagrangian and afterward compare its predictions with measured data. The new Lagrangian involves new fields, new coupling constants, etc., but must be written down according to fixed rules. Usually, only terms that respect the rules of special relativity are allowed. Moreover, we know that the simplest possible terms are the most important ones, so you focus on them first. (More complicated terms are “non-renormalizable” and therefore suppressed by some large scale.) Given some new field or fields, writing down the Lagrangian and deriving the corresponding equations of motion is straightforward. Moreover, while deriving the experimental consequences of a given Lagrangian can be quite complicated, the general rules for how to do it are fixed. The framework that allows us to derive predictions for colliders or other experiments starting from a Lagrangian is known as Quantum Field Theory.
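Schematically (this is a standard textbook example, not taken from the paper mentioned above), such a Lagrangian might look like

$$ \mathcal{L} = \bar{\psi}\left(i\gamma^\mu \partial_\mu - m\right)\psi - \frac{1}{4}F_{\mu\nu}F^{\mu\nu} + \frac{c}{\Lambda^2}\left(\bar{\psi}\psi\right)^2 + \ldots , $$

where the first two terms are the simplest (renormalizable) ones and the last term is an example of a more complicated, non-renormalizable term that is suppressed by a large scale $\Lambda$.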

This is exactly the kind of problem that was solved, although in a much simpler setting, by Mark A. Stalzer and Chao Ju in the paper mentioned above. There are already powerful tools, for example SPheno or micrOMEGAs, which are capable of deriving many important consequences of a given Lagrangian almost completely automagically. So, with further progress in this direction, it does not seem completely impossible that an algorithm will be able to find the best possible Lagrangian to describe given experimental data.

As an aside: A funny name for this goal of theoretical physics, the search for the “ultimate Lagrangian of the world”, was coined by Arthur Wightman, who called it “the hunt for the Green Lion”. (Source: Conceptual Foundations of Quantum Field Theory by Tian Yu Cao)

What then remains on the theoretical side is “Theory-Driven Research”. I have no idea how a “robot” could do this kind of research, which is probably what Sean Carroll had in mind in his tweets. For example, the algorithm by Mark A. Stalzer and Chao Ju only searches for laws that consist of predefined objects (vectors, tensors) and uses predefined rules for how to combine them (scalar products, cross products, etc.). It is hard to imagine how paradigm-shifting discoveries could be made by an algorithm like this. General relativity is a good example. The correct theory of gravity needed completely new mathematics that wasn’t previously used by physicists. No physicist around 1900 would have programmed crazy rules such as those of non-Euclidean geometry into the set of allowed rules. An algorithm that was designed to guess Lagrangians will only ever spit out a Lagrangian. If the fundamental theory of nature cannot be written down in Lagrangian form, such an algorithm would be doomed to fail.

To summarize: there will still be physicists in 100 years. However, I don’t think that all jobs currently done by theoretical and experimental physicists will survive. This is probably a good thing. Most physicists would love to have more time to think about fundamental problems, like Einstein did.

Layers of Understanding

Update: I’ve now started a website motivated by the idea outlined in this post. It’s called Physics Travel Guide.com. For each topic, there are different layers, such that everyone can find an explanation in a language he/she understands.

Over the years I’ve had many discussions with fellow students about the question: when do you understand something?

Usually, I’ve taken the strong position that is summarized by this famous Vonnegut quote:

“any scientist who couldn’t explain to an eight-year-old what he was doing was a charlatan.”

In other words: you’ve only understood a given topic if you can explain it in simple terms.

Many disagree. Especially one friend, who studies math, liked to argue that some topics are simply too abstract and that such “low-level” explanations may not be possible.

Of course, the quote is a bit exaggerated. Nevertheless, I think as a researcher you should be able to explain what you do to an interested beginner student.

I don’t think that any topic is too abstract for this. If no “low-level” explanation is available so far, this does not mean that one doesn’t exist, merely that it hasn’t been found yet.

In my first year as a student, I went on a camping trip to Norway. At that time, I knew little math and nothing about number theory or the Riemann zeta function. During the trip, I devoured “The Music of the Primes” by Marcus du Sautoy. Du Sautoy managed to explain to a clueless beginner student why people care about prime numbers (they are like the atoms of numbers), why people find the Riemann zeta function interesting (there is a relationship between the complex zeros of the Riemann zeta function and the prime numbers), and what the Riemann hypothesis is all about. Of course, after reading the book I still didn’t know anything substantial about number theory or the Riemann zeta function. However, the book gave me a valuable understanding of how people who work on these subjects think. In addition, even after several years, I still understand why people get excited when someone proposes something new about the Riemann hypothesis.

I don’t know any topic more abstract than number theory, and if it is possible to explain something as abstract as the Riemann zeta function to a beginner student, it can be done for any other topic, too.

My point is not that all scientists should spend their time on oversimplified pop-sci explanations. Instead, my point is that any topic can be explained in non-abstract terms.

Well maybe, but why should we care? An abstract explanation is certainly the most rigorous and error-free way to introduce the topic. It truly represents the state of the art and how experts think about the topic.

While this may be true, I don’t think that this is where real understanding comes from.

Maybe you are able to follow some “explanation” that involves many abstract arguments or some abstract proof, and maybe afterward you are convinced that the concept or theorem is correct. However, what is still missing is an understanding of why it is correct.

Here is a great example, from the book “Street-Fighting Mathematics” by Sanjoy Mahajan:

There is a formula that tells you the result for the sum of the first $n$ odd numbers:

$$ S_n = 1 + 3 + 5 + \ldots + (2n-1) = \sum_{k=1}^{n} (2k-1) = n^2 $$

You can prove this, for example, by induction. After such a proof you are certainly convinced that the formula $\sum_{k=1}^{n} (2k-1) = n^2$ is correct. But still, you have no idea why it is correct.
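For completeness, here is the induction argument, sketched in two lines (a standard proof, not taken from Mahajan’s book): the base case is $S_1 = 1 = 1^2$, and the induction step is

$$ S_{n+1} = S_n + \big(2(n+1)-1\big) = n^2 + 2n + 1 = (n+1)^2 . $$

It convinces you completely, yet it offers little insight into why the squares appear.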

Now, instead consider the following pictorial explanation:

We draw each odd number as an L-shaped puzzle piece:

Source: Street-Fighting Mathematics by Sanjoy Mahajan

Then, we can draw the sum of the first $n$ odd numbers as follows:

Source: Street-Fighting Mathematics by Sanjoy Mahajan

The odd numbers, drawn as puzzle pieces, fit together into an $n\times n$ square. We can see directly that the sum is $n^2$. After seeing this proof, you’ll never forget why the sum of the first $n$ odd numbers equals $n^2$.

Math books in particular are guilty of relying solely on abstract explanations, without any pictures or analogies. I personally find this incredibly frustrating. No one gets a deep understanding by reading pages full of definitions and proofs. This type of exposition discourages beginner students and simply communicates the message “well, real math is complicated stuff”.

I recently read an interesting hypothesis about how this way of teaching math became the standard. In his book “Not Even Wrong” Peter Woit writes:

“What [Mathematicians] learned long ago was that to get anywhere in the long term, the field has to insist strongly on absolute clarity of the formulation of ideas and the rigorous understanding of their implications. Modern mathematics may be justly accused of sometimes taking these standards too far, to the point of fetishising them. Often, mathematical research suffers because the community is unwilling to let appear in print the vague speculative formulations that motivate some of the best new work, or the similarly vague and imprecise summaries of older work that are essential to any readable expository literature. […] The mathematics literature often suffers from being either almost unreadable or concerned ultimately with not very interesting problems […] I hope that the trend in mathematical teaching, writing, and editing will continue to recoil from the extreme of Bourbakisme, so that explanations and non-trivial examples can be presented and physicists (to say nothing of other scientists) can once more have a fighting chance of understanding what mathematicians are up to, as they did early in the twentieth century. […]’Bourbakisme’ refers to the activities of a very influential group of French mathematicians known collectively by the pseudonym Bourbaki. Bourbaki was founded by Andre Weil and others during the 1930s, partly as a project to write a series of textbooks that would provide a completely rigorous exposition of fundamental mathematical results. They felt such a series was needed in order to have a source of completely clear definitions and theorems to use as a basis for future mathematical progress. This kind of activity is what appalled Gell-Mann, and it did nothing for improving communication between mathematicians and physicists. While their books were arid and free of any examples, in their own research and private communications the mathematicians of Bourbaki were very much engaged with examples, non-rigorous argument and conjecture. […] The Bourbaki books and the point of view from which they emerged had a bad effect on mathematical exposition in general, with many people writing very hard-to-read papers in a style emulating that of the books.

This passage reminded me of this famous quote by Chen-Ning Yang (the Yang in Yang-Mills theory):

“There are only two kinds of math books: Those you cannot read beyond the first sentence, and those you cannot read beyond the first page.”

The good news is that nowadays there exists a third kind of book: books that explain things pictorially and with analogies. This is where beginners should start. Here are some examples:

For example, Needham manages to give you beautiful pictures for the series expansion of the complex exponential function, which otherwise is just another formula. Another example, which I’ve written about here, is what is really going on between a Lie algebra and a given Lie group. You can accept the relationship as some abstract voodoo, or you can draw some pictures and gain a deep understanding that allows you to always remember the most important results.

Overly abstract explanations are not only a problem in mathematics. Many physics books suffer from the same problem. A great example is how quantum field theory is usually explained by the standard textbooks (Peskin & Schroeder and Co.). Most pages are full of complicated computations and comments about high-level stuff. After reading one of these books, you cannot help but get the impression: “well, quantum field theory is complicated stuff”. In contrast, when you read “Student Friendly Quantum Field Theory” by Robert Klauber, you will come to the conclusion that quantum field theory is at its core quite easy. Klauber carefully explains things with pictures and draws lots of analogies. Thanks to this, after reading his book, I was always able to remember the most important, fundamental features of quantum field theory.

Another example from physics is anomalies. Usually they are introduced in a highly complicated way, although there exists a simple pictorial way to understand them. Equally, Noether’s theorem is usually just proven. Students accept its correctness but have no clue why it is correct. On the other hand, there is Feynman’s pictorial proof of Noether’s theorem.

The message here is similar to what I wrote in “One Thing You Must Understand About Studying Physics“. Don’t get discouraged by explanations that are too abstract for your current level of understanding. For any topic, there exists some book or article that explains it in a language that you can understand and that brings you to the next level. Finding this book or article can be a long and difficult process, but it is always worth it. If there really isn’t anything readable on the topic that you are interested in, write it yourself!


Physics Model Fits in Mathematica

This shouldn’t be hard. We have some physics model and want to find values for the model parameters that reproduce some experimentally measured values. However, there are several small things that aren’t obvious, and it took me quite some time to make things work. I wasn’t able to find a good explanation of how such a problem can be solved in Mathematica. Now that I’ve figured it out, I thought it would be a good idea to share what I’ve learned.

A short disclaimer: I’m not a Mathematica expert, and there are certainly much better ways to do this. However, I found it so frustrating that there was no good information available that maybe someone will find this post helpful. If you have any recommendations for how I could do things better, a short comment would be awesome! I’ll update this post when I discover further optimizations.

So… the setup is the following:

We have a number of model parameters (a, b, c, d, …) and we seek numerical values for them such that several experimental observables are reproduced as well as possible. The experimental observables are our fit targets.

The connection between the parameters and the observables is given by the model. An explicit example may be helpful: in a Grand Unified Theory, the model parameters are the Yukawa couplings and vacuum expectation values (VEVs). We want to find those Yukawa couplings and VEVs that reproduce the measured masses and mixing angles as well as possible.

The measure of how well a given set of numerical values for the parameters reproduces the observed values is given by $\chi^2$ (pronounced: chi-square):
$$ \chi^2 = \sum_i \left( \frac{O_i - F_i}{\text{err}(O_i)} \right)^2 . $$
It is defined as the sum of the squared differences between the fit target values $F_i$ that are computed for a given set of numerical values for the parameters and the actually measured values for the same observables $O_i$. In addition, each term in the sum is weighted by the experimental error of the measured value, $\text{err}(O_i)$. This way one takes into account that it is less bad if some fit target value is a bit off when the experimental value is only vaguely known.
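Written directly in Mathematica, this definition is a one-liner (a tiny helper, added here only for illustration; the full model function below computes the same quantity term by term):

(* chi^2: squared deviations between observed and fitted values, weighted by the errors *)
chiSquare[fitted_List, observed_List, errors_List] := Total[((observed - fitted)/errors)^2]

chiSquare[{1.0, 2.1}, {1.1, 2.0}, {0.1, 0.2}] (* returns 1.25 *)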

In Mathematica, the model is defined as a function. The arguments of the function are the model parameters, and in the end the function returns a value for $\chi^2$. For each set of numerical values for the model parameters, the model function spits out a number that tells us how well these numerical values reproduce the known experimental values.

Here is an example, where we start with some Yukawa couplings and vacuum expectation values (VEVs) and try to fit the masses such that the experimental values are correctly reproduced.

Massesexperimental = {171.7, 6.19*10^-1, 1.27*10^-3, 2.89, 5.5*10^-2, 2.9*10^-3};
Masserrorsexperimental = {3.0, 8.4*10^-2, 4.6*10^-4, 9.0*10^-2, 1.55*10^-2, 1.215*10^-3};

chisquarefunction[{y2711_, y2712_, y2713_, y2722_, y2723_, y2733_, y35111_, y35122_, y35133_, v1_, v2_, v3_, v4_, vbig1_, vbig2_, vbig3_, vbig4_}] :=
 Block[{Y27, Y351, mu, md, MC, MR, Z, a, mdeff, fittedMasses, chisquare},
  (* Yukawa coupling matrices built from the model parameters *)
  Y27 = {{y2711, y2712, y2713}, {y2712, y2722, y2723}, {y2713, y2723, y2733}};
  Y351 = {{y35111, 0, 0}, {0, y35122, 0}, {0, 0, y35133}};
  (* mass matrices in terms of the Yukawa couplings and the VEVs *)
  mu = Y27*v3 + Y351*v4;
  md = Y27*v1 + Y351*v2;
  MC = Y27*vbig1 + Y351*vbig2;
  MR = Y27*vbig3 + Y351*vbig4;
  Z = Transpose[MC.Inverse[MR]];
  a = MatrixPower[{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}} + ConjugateTranspose[Z].Z, -1/2];
  mdeff = md.a;
  (* the fitted masses are the singular values of the mass matrices *)
  fittedMasses = Flatten[{SingularValueList[mu], SingularValueList[mdeff]}];
  (* chi^2: squared deviations from experiment, weighted by the experimental errors *)
  chisquare = Sum[((Massesexperimental[[i]] - fittedMasses[[i]])/Masserrorsexperimental[[i]])^2, {i, 6}];
  Return[chisquare]
  ];

Now we want to tell Mathematica to compute those model parameters that reproduce the experimental values as well as possible. This means we need to tell Mathematica to minimize our model function, because a minimal $\chi^2$ value corresponds to the best possible fit for the model parameters.

In general, this can only be done numerically. A problem here is that it’s never possible to be 100% certain that a given minimum of the model function is really the global minimum, i.e. the best fit that is possible. Thus, we need to restart the search for the minimal value as often as possible with different starting values. The best-fit point then corresponds to the set of numerical values for the model parameters that yields the smallest $\chi^2$ value.

A numerical search for a minimum of a function is done in Mathematica with NMinimize[]. However, the usage of NMinimize[] is not that trivial. In theory, we simply plug in our function, tell Mathematica what our variables are, and it spits out the values for them that correspond to a minimum.

NMinimize[
{chisquarefunction[{y2711, y2712, y2713, y2722, y2723, y2733, y35111, y35122, y35133, v1, v2, v3, v4, vbig1, vbig2, vbig3, vbig4}]}
, {y2711, y2712, y2713, y2722, y2723, y2733, y35111, y35122, y35133, v1, v2, v3, v4, vbig1, vbig2, vbig3, vbig4}];

However, in practice, there are three additional things that need to be taken into account.

1.) There are different methods and options for NMinimize[], and different methods and options are suited for different problems. In our case, we are dealing with a function that has many variables and lots of local minima. (For example, each fit point where some mass is fitted to zero corresponds to a local minimum. The “minimal” $\chi^2$ contribution of that observable is then simply the squared ratio of the experimental value to the experimental error. For example, if the relative error is 5% and all other observables are fitted perfectly, the “minimal” value is $\chi^2 = (1/0.05)^2 = 400$.) Some explanations for the various methods that are available for NMinimize[] are given here. In our case, the method DifferentialEvolution yields the best results. In addition, we need to tell Mathematica how long it is allowed to search. This is done by setting MaxIterations -> 1000 or so. Take note that there can be memory problems and unexpected kernel quits if you allow too many iterations. (One way around this is to call NMinimize[] in parallel. Then it is no longer a big deal if one of the subkernels quits, because the master kernel still runs and simply starts a new subkernel.) In addition, because there is no way to make sure that a minimum found by a numerical procedure is indeed a global minimum, one should start the minimization several times from different starting points. This can be achieved by calling the NMinimize function in Mathematica with different “RandomSeed” values. Here is an example:

fct[randnr_] := 
NMinimize[{chisquarefunction[{y2711, y2712, y2713, y2722, y2723, y2733, y35111, y35122, y35133, v1, v2, v3, v4, vbig1, vbig2, vbig3, vbig4}]},
 {y2711, y2712, y2713, y2722, y2723, y2733, y35111, y35122, y35133, v1, v2, v3, v4, vbig1, vbig2, vbig3, vbig4},
 Method -> {"DifferentialEvolution", "RandomSeed" -> randnr}, MaxIterations -> 100];

To perform the minimization for some random seed, we simply execute

fct[11]
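As an aside, here is a sketch of how one could run several seeds in parallel and keep the best result (my own addition; it assumes the definitions above have already been evaluated):

(* make the definitions available on the parallel subkernels *)
DistributeDefinitions[Massesexperimental, Masserrorsexperimental, chisquarefunction, fct];

(* run the minimization for 20 different random seeds in parallel *)
results = ParallelMap[fct, Range[20]];

(* each result has the form {chi2, {y2711 -> ..., ...}}; keep the one with the smallest chi2 *)
bestFit = First[SortBy[results, First]]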

2.) Mathematica tries to do some analytic simplifications of the function that should be minimized. This can be really problematic if we are dealing with a complicated model. In our example from above, especially the MatrixPower and the SingularValueList calls are problematic. As a result, when one executes NMinimize[] for a complicated function, nothing happens for hours. If there are analytic simplifications that can be done, one possibility is to do them before the minimization and give NMinimize[] the optimized function. A way to stop Mathematica from spending hours on analytic manipulations of our function is to use Hold[]. If we tell Mathematica to minimize Hold[ourfunction], it no longer attempts anything analytic. For the minimization of Hold[ourfunction], it simply plugs in numerical values for the variables, looks at what $\chi^2$ it gets back, and based on this information decides what to do next, i.e. which numerical values to plug in during the next iteration.

fct[randnr_] := NMinimize[
{Hold[chisquarefunction[{y2711, y2712, y2713, y2722, y2723, y2733, y35111, y35122, y35133, v1, v2, v3, v4, vbig1, vbig2, vbig3, vbig4}]]},
 {y2711, y2712, y2713, y2722, y2723, y2733, y35111, y35122, y35133, v1, v2, v3, v4, vbig1, vbig2, vbig3, vbig4},
 Method -> {"DifferentialEvolution", "RandomSeed" -> randnr}, MaxIterations -> 100];

And again, to perform the minimization for some random seed, we simply execute

fct[14]

3.) NMinimize[] dislikes boundaries. If you want to tell Mathematica that some parameter $p$ should be a really, really large number, say $10^{16}$, a bad way to do this is to execute NMinimize[] with a boundary for the variable. The fit results get considerably worse if boundaries are given. I suppose this is because the fit algorithm then can’t move as freely as it would like. However, without any hint, Mathematica never yields a fit result with such a large number ($10^{16}$) for one of the variables. Instead, it starts with values of order one for the parameters and ends up in local minima with order-one values for the parameters. In physics, we often have some scale in mind for some of the parameters. For example, in a Grand Unified Theory, some of the VEVs must be larger than $10^{16}$, and any fit result with smaller values doesn’t make sense. The trick is to rescale all parameters such that they are of order one. For the parameter $p$ that should be around $10^{16}$, we simply define $p' = \frac{p}{10^{16}}$, which yields $p = p' \cdot 10^{16}$, and plug this into our model function. Then we have a model function that is a function of $p'$ instead of $p$, and it is no longer a problem that Mathematica finds order-one solutions for $p'$.

In our example, the big VEVs $vbig1, vbig2, vbig3, vbig4$ should be, for physical reasons, around $10^{16}$. Thus, we define our model function as follows:

Massesexperimental = {171.7, 6.19*10^-1, 1.27*10^-3, 2.89, 5.5*10^-2, 2.9*10^-3};
Masserrorsexperimental = {3.0, 8.4*10^-2, 4.6*10^-4, 9.0*10^-2, 1.55*10^-2, 1.215*10^-3};

chisquarefunction[{y2711_,y2712_,y2713_,y2722_,y2723_,y2733_,y35111_,y35122_,y35133_,v1_,v2_,v3_,v4_,vbig1_,vbig2_,vbig3_,vbig4_}]:=

Block[{Y27,Y351,mu,md,MC,MR,Z,a,mdeff,fittedMasses,chisquare},

Y27=({{y2711,y2712,y2713},{y2712,y2722,y2723},{y2713,y2723,y2733}});
Y351=({{y35111,0,0},{0,y35122,0},{0,0,y35133}});

mu=Y27*v3+Y351*v4;
md=Y27*v1+Y351*v2;
MC=Y27*vbig1*10^16+Y351*vbig2*10^16;
MR=Y27*vbig3*10^16+Y351*vbig4*10^16;
Z=Transpose[MC.Inverse[MR]];
a=MatrixPower[{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}+ConjugateTranspose[Z].Z,-1/2];
mdeff=(md.a);

fittedMasses=Flatten[{SingularValueList[mu],SingularValueList[mdeff]}];

chisquare=Sum[((Massesexperimental[[i]]-fittedMasses[[i]])/Masserrorsexperimental[[i]])^2,{i,6}];
Return[chisquare]

];