
[Figure: x^3, x^4, and x^5 plotted.]
Sloppiness comes from parameters which affect the predictions in similar ways. Here the monomials x^3, x^4, and x^5 are all flat near zero, and all increase smoothly near one. We can trade the coefficient a_4 for a suitable combination of a_3 and a_5, and get almost exactly the same final function.

Fitting Polynomials: Where is sloppiness from?

Multiparameter models are usually sloppy - their parameters can vary over huge ranges without changing the fits to data very much. Where does this sloppiness come from? We provide three answers to this question, all really the same answer in disguise.

Consider the standard problem of fitting polynomials to data. Our model is a sum of monomials x^n, with coefficients a_n:

y(a_0, a_1, a_2, ..., a_5) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5

If we view a_0 ... a_5 as the six parameters of our model, it is extremely sloppy (even for polynomials with a rather small number of terms). At right, you can see that the monomials x^n all have roughly the same shape, and can be traded for one another without changing the fits.
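To make the trade-off concrete, here is a minimal numerical sketch (assuming NumPy; the details are illustrative, not from the original page) that finds the combination of x^3 and x^5 best mimicking x^4 on [0, 1] and reports how small the leftover difference is:

```python
import numpy as np

# Sketch: trade the x^4 monomial for a combination of x^3 and x^5 on the
# fitting interval [0, 1], as described in the figure caption above.
x = np.linspace(0.0, 1.0, 1001)
target = x**4                              # the monomial we want to remove
basis = np.column_stack([x**3, x**5])      # the monomials we keep

# Least-squares coefficients c3, c5 such that c3*x^3 + c5*x^5 ~ x^4.
(c3, c5), *_ = np.linalg.lstsq(basis, target, rcond=None)
residual = target - basis @ np.array([c3, c5])

print(f"c3 = {c3:.3f}, c5 = {c5:.3f}")
print(f"max |x^4 - (c3*x^3 + c5*x^5)| on [0, 1] = {np.abs(residual).max():.4f}")
```

The leftover difference should come out at roughly the percent level: a fit can shuffle weight between a_3, a_4, and a_5 almost freely.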

1. Sloppiness comes from having groups of parameters whose effects can be traded for one another.

Indeed, if we are fitting a continuous function in the range between zero and one, we can explicitly calculate the Hessian matrix whose range of eigenvalues we use to decide whether the system is sloppy. Since the derivative of the fit with respect to a_i is x^i, the Hessian entries are the integrals of x^i x^j over [0, 1]. This Hessian is the Hilbert matrix:

H_ij = 1/(i+j+1) =

    [  1     1/2   1/3   1/4   1/5    1/6  ]
    [ 1/2    1/3   1/4   1/5   1/6    1/7  ]
    [ 1/3    1/4   1/5   1/6   1/7    1/8  ]
    [ 1/4    1/5   1/6   1/7   1/8    1/9  ]
    [ 1/5    1/6   1/7   1/8   1/9    1/10 ]
    [ 1/6    1/7   1/8   1/9   1/10   1/11 ]
a matrix famous for being ill-conditioned (having a huge ratio between its largest and smallest eigenvalue).
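To put a number on this, here is a minimal sketch (assuming NumPy) that builds the 6x6 Hilbert matrix and compares its extreme eigenvalues:

```python
import numpy as np

# Build the 6x6 Hilbert matrix H_ij = 1/(i+j+1), the Hessian for the
# monomial coefficients a_0 ... a_5, and inspect its eigenvalue spread.
n = 6
i, j = np.indices((n, n))
H = 1.0 / (i + j + 1)

eigvals = np.linalg.eigvalsh(H)   # symmetric matrix: real eigenvalues, ascending
print("eigenvalues:", eigvals)
print("largest / smallest:", eigvals[-1] / eigvals[0])
```

Even with only six parameters, the ratio of largest to smallest eigenvalue should come out around 10^7.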
[Figure: the wiggly shifted Legendre polynomials L_3, L_4, and L_5 plotted.]
Changing the parameters can remove the sloppiness. When fitting polynomials to data, one is told to vary not the monomial coefficients a_n, but the coefficients b_n of a suitable class of orthogonal polynomials: here the shifted Legendre polynomials L_n. Plotted are L_3, L_4, and L_5. By design, these polynomials can't be traded for one another. The same polynomial fit to the same data, expressed in terms of the new variables b_n, is not sloppy.

Those readers who actually have fit polynomials to data will remember, though, that one doesn't usually calculate the monomial coefficients a_n. Instead, one fits to a sum of orthogonal polynomials

y(..., b_3, b_4, b_5) = ... + b_3 L_3(x) + b_4 L_4(x) + b_5 L_5(x)

(figure at left), designed so that every polynomial is a different "shape". This means that if we use the coefficients b_0 ... b_5 as the parameters in our model, it should not be sloppy. Indeed, the Hessian for the b_n is the identity matrix, and all the eigenvalues are equal to one. Notice that the space of functions is the same (polynomials up to fifth order), the best fit is the same, the ensemble of acceptable fits is the same - all we have changed is the way we've labeled the functions in our model.
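To check this numerically, here is a minimal sketch (assuming NumPy, and using the normalized shifted Legendre polynomials sqrt(2k+1) P_k(2x-1), which are orthonormal on [0, 1], so the diagonal entries come out exactly one):

```python
import numpy as np
from numpy.polynomial import legendre

# The Hessian in the b_n basis: Gram matrix of the normalized shifted
# Legendre polynomials L_k(x) = sqrt(2k+1) * P_k(2x - 1) on [0, 1].
n = 6
nodes, weights = legendre.leggauss(20)     # Gauss-Legendre quadrature on [-1, 1]
x = (nodes + 1) / 2                        # map the nodes to [0, 1]
w = weights / 2                            # rescale the weights accordingly

# Evaluate each normalized shifted Legendre polynomial at the quadrature nodes.
L = np.array([np.sqrt(2*k + 1) * legendre.legval(2*x - 1, np.eye(n)[k])
              for k in range(n)])

H = (L * w) @ L.T                          # H_ij = integral of L_i * L_j over [0, 1]
print(np.round(H, 10))                     # the identity matrix: all eigenvalues equal 1
```

The contrast with the enormous eigenvalue spread of the Hilbert matrix above is the whole point: same model, same fits, different choice of coordinates.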

2. Sloppiness comes from the way we choose the variables in our theory.

Notice that in the case of fitting polynomials, we often don't think that the monomial coefficients are very important: the b variables are almost as natural as the a variables. But in a biological model or an engineering system, the variables we use are usually natural and sensible, and using strange combinations of variables just to avoid sloppiness will usually not make scientific sense.
[Figure: three red arrows, pointing almost in the same direction.]
Skewness. The change of variables from the bare parameters (the monomial coefficients a_n) to parameters natural to the model (the coefficients b_n of the orthogonal polynomials) is not a rotation in parameter space. It is more of a tortured, skewed stretching. The red arrows represent the change in natural coordinates when different bare coordinates are varied. Notice that one can get to nearly the same final behavior (the tip of the skewed cube) by going along any of the three bare coordinates (they all affect the predictions in similar ways). Notice also that the volume of the skewed cube is small; it is proportional to the determinant of the mapping from bare to skewed coordinates.

Why did changing from the monomial coefficients a_n to the orthogonal polynomial coefficients b_n make such a big difference? Unlike many familiar transformations, this change is not just a rotation (in six-dimensional function space); it is instead a very skewed, shearing kind of transformation (figure at left). The meaning of "perpendicular" in parameter space is subtle: most scientifically natural choices for parameters are sloppy for this reason.
3. Sloppiness comes from a severe skewness between the scientifically natural "bare" parameters and the parameters naturally governing the model behavior.
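The skewness can be read off directly from the Hilbert matrix: a unit change of a_i changes the fitted function by x^i, and in the natural (orthonormal) coordinates the inner products of these "red arrow" directions are just the entries H_ij. A minimal sketch (assuming NumPy):

```python
import numpy as np

# Directions in natural (function-space) coordinates produced by unit
# changes of the bare monomial coefficients a_i: their inner products are
# the Hilbert matrix entries H_ij = integral of x^i * x^j over [0, 1].
n = 6
i, j = np.indices((n, n))
H = 1.0 / (i + j + 1)

# Cosine of the angle between the effects of a_i and a_j (the "red arrows").
cosines = H / np.sqrt(np.outer(np.diag(H), np.diag(H)))
print("cos(angle) for a_3, a_4, a_5:\n", np.round(cosines[3:, 3:], 3))

# Volume of the skewed cube spanned by all six unit changes
# (the square root of the Gram determinant).
print("volume of the skewed cube:", np.sqrt(np.linalg.det(H)))
```

The arrows for a_3, a_4, and a_5 come out nearly parallel (cosines above 0.97), and the volume of the six-dimensional skewed cube is of order 10^-9: this is the severe skewness that statement 3 refers to.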

Can we understand the key properties of sloppy models mathematically? See Why sloppiness? The Sloppy Universality Class


Last Modified: June 11, 2008

James P. Sethna, sethna@lassp.cornell.edu; This work supported by the Division of Materials Research of the U.S. National Science Foundation, through grant DMR-070167.

Statistical Mechanics: Entropy, Order Parameters, and Complexity, now available at Oxford University Press (USA, Europe).