“Be kind, for everyone you meet is fighting a hard battle.” - often attributed to Plato, but more likely from Ian Maclaren (pen name of the Reverend John Watson)

## Sunday, September 30, 2007

### Data and Statistics

I've collected data on the fuel consumption of my LR3 HSE since I purchased it in November 2006. I've missed a couple of fill-ups, plus a few miles my wife drove while I was out of town, but by and large it's a fairly complete data set.

I've driven the vehicle in a way most people would consider normal in terms of speed, acceleration, behavior at lights and on hills, etc., and more recently I've driven it in a way most would regard as extreme in those respects. The graphs alone make it obvious that my fuel-economizing techniques are effective. But what do the statistics say?

I keep track of the mileage for the most recent fill-up, as well as three-tank, five-tank, and ten-tank moving averages. I also track the standard deviation of the mileage, separately for before and after I resumed maximizing fuel economy. The "before" data consists of 35 points, the "after" data of 21 points. Surprisingly, the standard deviation of the "before" data is 0.66 m.p.g., while that of the "after" data is 1.20 m.p.g.
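For anyone who wants to do the same bookkeeping, here is a minimal Python sketch of trailing moving averages and separate "before"/"after" standard deviations. The tank figures and the split point are hypothetical placeholders, not my logged data, and `moving_average` is just an illustrative helper name.

```python
from statistics import mean, stdev

def moving_average(values, window):
    """Trailing moving average: one value per fill-up once `window` tanks are available."""
    return [mean(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]

# Hypothetical per-tank m.p.g. figures, for illustration only (NOT the logged data).
tanks = [15.0, 16.0, 17.0, 15.5, 16.5, 18.0, 17.5, 19.0, 18.5, 20.0, 19.5]

three_tank = moving_average(tanks, 3)
five_tank = moving_average(tanks, 5)
ten_tank = moving_average(tanks, 10)

# Assumed split point between "before" and "after" economizing; purely illustrative.
before, after = tanks[:6], tanks[6:]
sd_before = stdev(before)  # sample standard deviation (n - 1 in the denominator)
sd_after = stdev(after)

print("latest 3/5/10-tank averages:", three_tank[-1], five_tank[-1], ten_tank[-1])
print(f"SD before: {sd_before:.2f}, SD after: {sd_after:.2f}")
```

Each moving average starts only once enough tanks exist to fill its window, so the ten-tank series is always the shortest.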

Standard deviation is a measure of dispersion, that is, of whether the points in a data set cluster closely around the mean (the mathematician's or statistician's term for average) or are scattered far from it. (The mean itself is the measure of "central tendency.") For a so-called "normally distributed" population (one that, when plotted, shows the classic "bell curve"), about 68% of the data points lie within plus or minus one standard deviation of the mean, and about 95% lie within two standard deviations.
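The 68% and 95% figures come straight from the normal distribution. As a quick check, this short Python sketch computes the fraction of a normal population within plus or minus k standard deviations of the mean using the error function from the standard library (`fraction_within` is an illustrative name, not anything standard).

```python
import math

def fraction_within(k):
    """Fraction of a normal population lying within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```

Running it prints 68.3%, 95.4%, and 99.7% for one, two, and three standard deviations.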

We're really looking at an experiment here, though. The question is: how accurately does the mean of my mileage calculations reflect the gas mileage I've actually achieved? What we want is the standard error of the mean, which is the standard deviation of the population (as computed above, itself an estimate from the sample) divided by the square root of the sample size. So for the pre-economizing driving it's 0.66/sqrt(35) = 0.11 m.p.g., and for the post-economizing driving it's 1.20/sqrt(21) = 0.26 m.p.g. The latter number means that, for this driving method, vehicle, and driving regime, the true mean gas mileage lies, with roughly 95% confidence, within 19.61 +/- 2*0.26 m.p.g. Put another way, there's about a 5% chance that the true mileage lies outside this range: about a 2 1/2% chance it's below 19.09 m.p.g. and about a 2 1/2% chance it's above 20.13 m.p.g.
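The arithmetic above is easy to reproduce. This minimal Python sketch uses the standard deviations, sample sizes, and 19.61 m.p.g. mean quoted in this post, with the same "plus or minus two standard errors" rule of thumb; `standard_error` is an illustrative helper name.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sample standard deviation divided by sqrt(sample size)."""
    return sd / math.sqrt(n)

se_before = standard_error(0.66, 35)  # pre-economizing driving, 35 tanks
se_after = standard_error(1.20, 21)   # post-economizing driving, 21 tanks

mean_after = 19.61
low, high = mean_after - 2 * se_after, mean_after + 2 * se_after

print(f"SE before: {se_before:.2f}, SE after: {se_after:.2f}")
print(f"approx. 95% interval: {low:.2f} to {high:.2f} m.p.g.")
```

This prints standard errors of 0.11 and 0.26 and an interval of 19.09 to 20.13 m.p.g. The multiplier of 2 is the normal approximation; with only 21 tanks, a Student's t multiplier (about 2.09 for 20 degrees of freedom) would widen the interval slightly.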

This is far higher than the EPA estimate, and the graphs of my mileage show the improvements leveling off. Still, I can probably begin to draw conclusions from the data I have about what can be done.