May 12, 2003

Equation 0

My favorite equation is the following:

data = model + residual (Eqn 0)

When it was first presented to me it was in the form:

residual = data - model (Eqn 0')

In the 0' form it is a way to compare your understanding of a problem (model) with the way that the world actually works (data). Another way of thinking about Eqn 0' is that the model is the part of the world that you understand and the residual is the part of the world that remains to be explained. Your model of the world is acceptable to the extent that your residual is acceptable. In general, acceptable residuals are very much like low-volume white noise (they have no structure and low amplitude).
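Here is a rough sketch of Eqn 0' in Python, with two crude checks on the residual: its amplitude (root-mean-square) and its structure (lag-1 autocorrelation). The function names, the choice of these two checks, and the numbers are all my own assumptions for illustration; there are many other ways to judge whether a residual looks like white noise.

```python
import math

def residual(data, model):
    """Eqn 0': residual = data - model, point by point."""
    return [d - m for d, m in zip(data, model)]

def rms(values):
    """Root-mean-square amplitude; small means 'low volume'."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def lag1_autocorrelation(values):
    """Correlation of each value with the next one.

    Near 0 hints at no obvious structure; far from 0 hints that
    something systematic is left over. An illustrative check only.
    """
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    if denom == 0:
        return 0.0  # constant residuals: no variation to correlate
    return sum(a * b for a, b in zip(centered, centered[1:])) / denom

# Made-up numbers, for illustration only.
data = [2.0, 4.5, 7.0, 9.5, 12.0]
model = [3.0, 5.0, 7.0, 9.0, 11.0]

res = residual(data, model)
print(res, rms(res), lag1_autocorrelation(res))
```

In this made-up case the residual drifts steadily from negative to positive, so the lag-1 autocorrelation comes out positive: the amplitude is small, but there is structure left to explain.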

Begin Aside
Notice that I have not said that the model is true or false; it is only acceptable or unacceptable.
End Aside


If your model is not acceptable, there are two things that can be done. The first is to refine the parameters of the existing model. Let's say that our model of how much CO2 a car puts into the atmosphere is a linear function of how many miles it is driven. We might write that down as follows:

CO2 = a * miles + b (Eqn 1)

a and b are the parameters of the model. a is the slope of the line and b is the "0 intercept". The values of a and b are choices we make and can be adjusted based on the make and model of the particular car. Cars with better gas mileage will have lower values of a. The intercept value, b, will be very close to 0 and will vary with the driver of the car; in my quick thinking tonight, it might reflect the time that a driver allows her car to warm up before starting off.
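One common way to "refine the parameters" of Eqn 1 is ordinary least squares, which picks the a and b that make the squared residuals as small as possible. The sketch below assumes that approach; the function name, the units, and the trip numbers are made up for illustration and are not measurements of any real car.

```python
def fit_line(miles, co2):
    """Ordinary least squares for Eqn 1: co2 = a * miles + b.

    Returns (a, b). Assumes at least two distinct mileage values.
    """
    n = len(miles)
    mean_x = sum(miles) / n
    mean_y = sum(co2) / n
    sxx = sum((x - mean_x) ** 2 for x in miles)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(miles, co2))
    a = sxy / sxx              # slope: CO2 per mile
    b = mean_y - a * mean_x    # intercept: CO2 at 0 miles (e.g. warm-up)
    return a, b

# Made-up trips: miles driven and CO2 emitted (hypothetical units).
miles = [0.0, 10.0, 20.0, 30.0]
co2 = [0.5, 4.5, 8.5, 12.5]

a, b = fit_line(miles, co2)

# Eqn 0' again: what is left over once the fitted model is subtracted.
residuals = [y - (a * x + b) for x, y in zip(miles, co2)]
print(a, b, residuals)
```

These made-up points sit exactly on a line, so the residuals come out essentially zero. Real data would leave something behind, and that leftover is what you would then judge as acceptable or not.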

The second option if your residuals are not acceptable is to change models. In the context of the example above perhaps the amount of CO2 emitted by a car is some more complicated function of its average speed:

CO2 = c * sqrt(avg speed) (Eqn 2)

In this case c is our adjustable parameter, but we have also introduced a non-linear element (the square root) and an aggregate factor (average speed). (I am not going to go into this further tonight; the important point is that there are alternative possibilities for our explanations of how the world works.)
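To make "changing models" concrete, here is a small sketch that fits c in Eqn 2 by least squares and then compares the size of the residual against a rival model on the same data. The rival (a straight line in average speed with a fixed slope), the function names, and the numbers are hypothetical and chosen only to illustrate the comparison.

```python
import math

def rms_residual(model, inputs, data):
    """Size of (data - model): RMS of the residual for any model function."""
    res = [d - model(x) for x, d in zip(inputs, data)]
    return math.sqrt(sum(r * r for r in res) / len(res))

def fit_c(avg_speeds, co2):
    """Least-squares c for Eqn 2: co2 = c * sqrt(avg_speed)."""
    num = sum(y * math.sqrt(s) for s, y in zip(avg_speeds, co2))
    den = sum(avg_speeds)  # sum of sqrt(s)**2, which is just s
    return num / den

# Made-up data, for illustration only.
avg_speeds = [4.0, 9.0, 16.0]
co2 = [4.0, 6.0, 8.0]

c = fit_c(avg_speeds, co2)

def eqn2(speed):
    """Eqn 2 with the fitted c."""
    return c * math.sqrt(speed)

def rival(speed):
    """Hypothetical rival model: a straight line in average speed."""
    return 0.5 * speed

print(rms_residual(eqn2, avg_speeds, co2))
print(rms_residual(rival, avg_speeds, co2))
```

The model with the smaller, less structured residual is the more acceptable one for this data. Neither is thereby shown to be true.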

OK, that is all fine and good, but what does it have to do with my preferred formulation of this equation, Eqn 0? Well, my preferred form suggests that the data we actually collect reflects what we expect to find plus some surprises. This is a variation on Kuhn's ideas of a paradigm and paradigm shifts. In times of normal science, experiments are designed to explore the details (refine the values of a and b in Eqn 1) of the paradigm (model); we only look for what we expect to find. In times of paradigm shift, the surprise part cannot be ignored and we must replace our models (Eqn 1 vs Eqn 2).

The key issue here is that Eqn 0 and Eqn 0' are the same equation. Each form has surprise in it and models and data are acceptable to the extent that our level of surprise remains acceptable.

Begin Aside
Notice that I have not said that the model is true or false; it is only acceptable or unacceptable.
End Aside