The average spreadsheet has an error every hundred cells

Tue, 22 Jun 2021 - Gertjan Filarski

Yesterday I learned something new. Something that I should have known. I found research by Ray Panko tracing the average number of errors in spreadsheets. The conservative number? 1 in every 100 cells is wrong.

Think about that. Now also consider the number of decisions people make based on spreadsheets.

Now tell me that is not a problem.

Spreadsheet Research

Ray Panko is Professor of IT Management at the University of Hawaii. You can find his work at his website Spreadsheet Development Error Experiments.

This page looks at error rates during spreadsheet development. Specifically, it looks at the cell error rate (CER), which is the percentage of cells that is incorrect in the final model, when the developer declares the model finished to his or her satisfaction.

For the nine studies that collected data on errors when developers worked alone, the CERs averaged 1.1% to 5.6%. The results are consistent with error rates in other simple but nontrivial cognitive action such as writing, doing a calculation, or writing a line of software code.

The very clever Marieke van Erp (a former colleague of mine, friend, and leader of the Digital Humanities Research Lab at the Royal Netherlands Academy) pointed me to further research published by Springer. Jonathan Maletic and Andrian Marcus found an error average of 5% introduced by people during data acquisition and cleaning.

Finally, Felienne Hermans at the Delft University analysed the email and attachment archive of the Enron Corporation published at the 2015 International Conference on Software Engineering. The entire dataset contains over 15.000 spreadsheets with an average number of 6191 non-empty cells, and 1286 formulas per spreadsheet. She found that 24% of the spreadsheets have at least one error in a formula - and significant issues with long calculation chains to boost.

So I think I can safely say that both the data and the formulas in the spreadsheets we build are very likely wrong. We need to accept this. We know this for quite some time. Ray Panko has been working on these errors since 1998.

Decision making

Sometimes the consequences are big. 'The London Whale Trader' netted JPMorgan a $6.2 billion trading loss in 2013. James Kwak wrote in his analysis on The Importance of Excel in the debacle:

After subtracting the old rate from the new rate, the spreadsheet divided by their sum instead of their average, as the modeller had intended. This error likely had the effect of muting volatility by a factor of two and of lowering the VaR (value-at-risk)...

Oops.

And juicy as something this big is, the real problem is felt everywhere in all economies, all over the world. After I found these figures I quickly posted them on LinkedIN yesterday. Within half an hour there were some heartfelt replies:

I was working in a project and halfway we ran out of funds. Nobody understood it because there was very precise control. But somebody had forgotten to adjust the hour price of external personnel to the real price, instead of the price predicted before the project start. A small error with a huge outcome 😕

How to deal with errors

You cannot blame people for making mistakes. We are human and science suggests we have, on average, a 5% error rate in non-trivial activities like making calculations or writing lines of code. That is what people do - and quite consistently do. Software engineers deal with this truth all the time and have built some brilliant tools to limit the number of mistakes they make.

formula

The above is an INDEX formula with some hoops that Excel makes you jump through. I won't go into details, but it is an example of common work an Excel power-user does. And it is a nightmare to write. I know some people take pride in this kind of work - but I'd rather be sent to a penal colony doing hard labour.

This is not how you write code. And yes - even in Excel: when you are writing IF-statements and refer to variables then you are, for all intents and purposes, writing code. And you deserve better support.

All the common tools are missing that software development platforms have had for decades. Where is the color coding to separate variables from keywords and values? And for crying out loud: Excel expects me to write this formula on a single line!! Without any indentation to see in which clause or IF statement I am working! Not to mention proper code completion... no wonder people make mistakes.

These tools exist and they are not rocket science. They are the very basic. Software engineers have far more sophisticated solutions: extensive automated mocking and testing processes before release, dedicated debugging platforms, A/B testing, and extensive version control systems. Although it would be fantastic to have that at your disposal in a spreadsheet - I would be content with simple line indentation for starters.

Why isn't that a standard part of Excel? Why are these systems not actively helping to reduce the number of mistakes people make? I mean, it's not as if we take any big decisions based on a spreadsheet, right?

Can we help you?

Audits & Consultancy

An extra set of eyes to see if you are where you want to be, or going where you had imagined.

Grant Acquisition

Both editing and authoring of (EU) funding proposals.

Interim Management

Temporary assignments at home or abroad to get projects and teams up to speed - or back on the rails.