Spreadsheet data problems – inconsistent naming and values
By Robert Cowham
Naming is fundamental part of configuration management – good naming standards and conventions make a huge difference in an organisation. And yet we have seen the problems that our customers often have with doing this.
There are different types of naming issues. The basic issues need to be resolved by having good standards and master lists.
A simple example
Let’s consider the definition of a server model. The following are all perfectly good ways of identifying one type of model:
- DL 380 G7
- DL 380
- DL 380-G7
- HP DL 380 G7
- HP Proliant DL 380 G7
So which is the best one?! There is no one right answer, but there is a good answer for your organisation.
The problem is that if you have more than one way of specifying this particular model then it becomes more tricky to summarise and report your data. Multiply this by the number of different types of server or other equipment that you have, and it becomes a real headache.
Let’s look at an even simpler example – defining the speed of network ports. We have seen people record the same value in various ways:
- 1 Gigabit
Defining lookups for fields in Excel
The problem with simple lookup fields such as Port Speed can be addressed using data validation options within Excel.
You can define cell validation with a small set of lookup values defined elsewhere in the workbook.
This would give you a drop down as shown in the screen shot to the right.
This works nicely where the user is typing the values in, but it can be a little fiddly to setup and administer. The other problem is where the data is being imported via CSV or similar format.
Filtering and sorting columns to find inconsistent values
It is quite easy to use Excel’s standard filtering and sorting facilities so that you can sort rows of data by a particular field.
This makes it easy to at least identify that there are inconsistencies. At that point you filter to get only the invalid ones and manually fix them.
In the example to the right, we have clicked on the Model column in B1 and can see 3 different variations on a particular model name.
A slightly more advanced alternative is to create a pivot table – this would potentially show all the values and you can summarise how many times particular values are used. This indicates the scope of the problem.
So when would you choose the sorting/filtering instead of creating a lookup field? Probably when the number of potential values is quite high. So for example with models of servers, it is not unusual to have tens or hundreds of possible options. That many possible values does not lend itself to drop down validation.
Spreadsheets vs Databases
One issue with going a long way down the lookup validation route is that while it seems simple enough when you start, it can quickly become very complex. You are effectively creating a database equivalent with multiple tables within your spreadsheet! It is often particularly difficult to keep multiple copies or versions of the various spreadsheet lookup tables up-to-date.
This is where using a proper database solution starts to pay dividends. It scales and there is only one place to update lookup values.