How “helpful” Excel corrupts your data
By Robert Cowham
In this article I want to look at some of the problems with Excel which applies “helpful” formatting and ends up corrupting your data (if you are not careful!).
Excel has a lot of intelligence which tries to interpret the values in particular columns and to automatically format that data in the way that makes most sense. Most of the time this is very useful and a real time saver. However, if you aren’t aware of the potential pitfalls you can end up corrupting your data because of this reformatting.
This problem is particularly frequent when importing CSV files, but it can also occur if you are just manually entering rows into an existing spreadsheet.
Number formatting – large number “corruptions”
A classic example of this is telephone numbers as shown in the screen shot to the right.
There is a column of Telephone numbers, and there are a couple of issues shown.
Why are the telephone number in D2 and D3 left and right aligned respectively – why aren’t they the same? In this example the number with hyphens in it is being treated as text and thus is automatically left justfied. the number in D2 is treated as a number which is automatically right justified. It’s easy to set the formatting so that all cells are justified as you prefer. As an aside, there is a data standards issue if some of your numbers have hyphens and some don’t!
The entry in D4 is being shown in scientific format (3.53+11), although we can see that the cell contents (shown next to fx) are a full Irish telephone number starting with country code 353. This is a problem when printing, since you need every digit to be visible to dial it correctly! One way of getting it to print correctly is to make the first character of the cell the single quote (‘) character. This makes the value be treated as text. Interestingly, if you just change the cell formatting to be Text (right click the cell > Format Cells) it will not change the format – because the cell value has already been entered.
Both of these issues result in the cells being formatted as General, at which point Excel analyses the contents to decide if it is a number or text or a date or whatever.
Number formatting – the leading Zero problem
Another issue is that Excel kindly removes leading zeroes in columns of numeric data.
In the example shown, Cabinet name is 01, 02, 03 etc. I have forced the values for the first two cells to be correct by formatting them as ’01 and ’02. The values 3, 4, 5 are not being shown correctly.
When this sort of thing happens one common result is that the cabinet names do not sort correctly. You get 1, 10, 11, 2, 3, … as the names sort on the first character.
We encounter this so often, that we tend to recommend that all cabinet names start with an character prefix such as A01, A02.
Date formatting problems
Dates are in some ways worse than numbers as Excel tries hard to format what you type and turn it in to a valid date in your local country date format.
When you then think of the multitude of possible date formats you might want, and Excel is trying hard to give you what you want – life can become interesting! It is even more so when dealing with US and UK date formats (mm/dd/yy vs dd/mm/yy) or similar issues.
In the example shown, the same date of 1st September 2013 is formatted in several different ways. Most of these are relatively obvious, although we can see that UK format is being used.
The last example is interesting as this is the hidden numeric representation of a date which is days since 1st Jan 1900.
Problems Importing CSV Files
All of the above issues are particularly bad when you just import a CSV file in Excel using the automated settings. Excel treats all values as General and tries to convert those that look like numbers into numbers, and those that look like dates into date values.
If you just double click a CSV file in Windows Explorer, then the default settings will open it up in Excel, and fields might get corrupted.
The safer way to do things is to run Excel first and then select Data > Get External Data from Text (this is Excel 2010). This results in a wizard which you can step through to select field separators etc. Step 3 is the import one, where you need to select particular columns and classify them as Text or Date rather than General (see screen shot).
Hopefully this has helped hightlight a few of the issues that we regularly come across when working with our clients.
In future posts we will look at other areas where Excel may not be the best tool for the job!