The Credit Card Numbers
A few years ago, I read an amusing article about a chap who recorded his credit card details on an Excel spreadsheet.
When he subsequently retrieved this information, the last digit of some numbers had somehow changed, which inevitably caused a good deal of chaos.
He was a victim of the dreaded 'Excel Eccentricities'.
Although an Excel spreadsheet looks simple enough, it does make various assumptions about the data in the absence of specific information—but these assumptions are not always helpful and can give rise to spurious results.
For example, Excel can handle many different types of data including text, numbers, currency, dates, etc.—but, unless you specify the type of data contained in a cell, it will assume a data type based on the actual data entered.
Take a typical credit card number such as: 4508 0001 2345 6789
In Excel 2016, this will be interpreted as a TEXT field because it contains spaces.
However, if the number is entered, without spaces, as 4508000123456789, it is assumed to be a floating-point number—that being the Excel format for handling very large numbers.
In Excel 2016, with no pre-formatting of the cell, the result is likely to be 4.508E+15.
If you then set the cell to display as a normal number, this will most likely produce an error because the internal format is still floating-point for a value of this magnitude.
In Excel 2016, the likely result is 4508000123456780 (note change of last digit).
A smaller number, such as 45080001234, doesn't exhibit these behaviours.
Note that while floating-point numbers offer very high precision, they are often only an approximation of the decimal value you typed and may not give the desired result in accounting-type applications.
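The digit-change above can be mimicked outside Excel. This Python sketch (the function name is invented; the 15-digit cut-off reflects Excel's documented limit of 15 significant figures) simply blanks every digit after the fifteenth, which is what Excel does to a long number on entry:

```python
def excel_entry(digits: str) -> int:
    """Simulate Excel storing a long number: any digits beyond the
    15th significant figure are replaced with zeros."""
    if len(digits) > 15:
        digits = digits[:15] + "0" * (len(digits) - 15)
    return int(digits)

print(excel_entry("4508000123456789"))  # 4508000123456780
print(excel_entry("45080001234"))       # unchanged: 45080001234
```

The shorter number passes through untouched, which is why the smaller 45080001234 doesn't exhibit these behaviours.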
The quick answer here is to recognise that this so-called credit card number is, in fact, a credit card ‘reference’ and not intended for any sort of calculation—it is TEXT and should be specifically defined as such.
The format should be set before entering any numbers.
Note: The reason I mention Excel version numbers is because specific behaviours can vary between versions.
The following picture illustrates a common issue with Excel spreadsheets—the figures in the total column don't add up to the overall total arrived at by 'summing' those figures.
By increasing the number of decimal places in two of the columns, it becomes clear that the VAT calculation sometimes generates results which include fractions of a penny and these are rounded when the figures are displayed to 2 decimal places.
However, this rounding of the displayed numbers does not necessarily change the underlying numbers used for calculation, so adding up the two lists (rounded and unrounded) will often produce a noticeably different result.
For most commercial accounting purposes, working to the nearest penny is good enough, so any greater accuracy in the underlying numbers is going to create this sort of rounding problem.
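The effect is easy to reproduce. In this Python sketch (the 13p item price and 20% VAT rate are made-up figures), ten identical lines each carry VAT of 2.6p, which displays as 3p—so the displayed column totals 30p while the underlying figures total 26p:

```python
net = [0.13] * 10                # ten lines at 13p net (hypothetical)
vat = [n * 0.20 for n in net]    # 0.026 each; displays as 0.03

# Total of the figures as displayed (rounded to 2 decimal places)
displayed_total = sum(round(v, 2) for v in vat)
# Total of the full-precision figures Excel actually calculates with
underlying_total = sum(vat)

print(round(displayed_total, 2))   # 0.3
print(round(underlying_total, 2))  # 0.26
```

A 4p discrepancy from ten rows—scale that up to a few thousand rows and the mismatch becomes very visible.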
The answer to this issue is buried in the Excel 'Options' (Version 2016 is shown)—if you set the 'Set precision as displayed' option (under Advanced, in the 'When calculating this workbook' section), the underlying numbers will be rounded identically to the displayed numbers and the problem is resolved.
Ironically, this generates a message stating that you will permanently lose accuracy though, to my mind, this is the most useful setting for simple accounting applications.
The bigger lesson to be learned here is that, when creating a new spreadsheet, you should always test thoroughly to ensure that it does generate the expected results.
I recently looked at a job where the potential client was having difficulty with copying Excel data and using 'Notepad' to create a text file suitable for import to Sage accounting software.
Using Copy/Paste to move data between applications sounds simple enough but it's not quite so straightforward 'beneath the surface'.
That's because the copy process will often pick up 'hidden' characters such as those used for formatting—for example, 'Newline', 'Carriage Return' and 'Tab' characters may be copied but are 'invisible to the naked eye'.
Also, the Copy/Paste function has several options whose precise functionality can vary between versions of Excel.
In short, Copy/Paste can be a bit of a menace.
In this case, it might have been better to 'Save As' a .TXT or .CSV file, which gives you the option to define the character (delimiter) that separates the fields—usually a Comma, Semi-Colon or Tab.
However, that doesn't really protect you from those delimiters occurring naturally in the data or from control characters, such as Newlines, which may have been inserted manually.
My suggested strategy was to avoid the copy/paste operation altogether and use code to generate a text file directly, containing only those control characters required by Sage and removing any characters likely to cause problems.
Of course, the starting point would be to identify the precise requirements for the Sage input file.
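As a sketch of that strategy (the column layout, sample rows and file name here are invented for illustration—the real layout would come from the Sage import specification), Python's csv module can write the delimited file directly while stripping the troublesome control characters:

```python
import csv
import re

# Hypothetical rows taken from the spreadsheet; the real column
# layout would come from the Sage import specification.
rows = [
    ["A/C001", "Widget, large", "12.50"],
    ["A/C002", "Gadget\nwith a manual note", "7.99"],
]

def clean(field: str) -> str:
    # Replace any Newline, Carriage Return or Tab with a single space
    return re.sub(r"[\r\n\t]+", " ", field).strip()

with open("sage_import.csv", "w", newline="") as f:
    writer = csv.writer(f)  # quotes any field containing the delimiter
    for row in rows:
        writer.writerow(clean(field) for field in row)
```

Letting the csv module do the quoting deals with delimiters that occur naturally in the data, and the clean() helper removes the manually inserted Newlines before they ever reach the file.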
Most Excel users will be familiar with the scenario where the spreadsheet simply 'hangs up' and stops working.
In recent versions of Excel, it may even go through a time-consuming process of 'recovery', even if your preferred option is to shut it down. This frequently happens with spreadsheets that work perfectly well for most of the time.
The precise reasons for this are not well-documented but experience suggests that the issues arise when the data to be processed exceeds the amount of available internal (i.e. RAM) memory.
Also, MS-Office generally does not seem very efficient at freeing resources, even after shutting down the application.
Although an Excel (2016) worksheet can theoretically support 1,048,576 rows by 16,384 columns, experienced Excel users will appreciate that problems soon start to arise as volumes grow.
Sometimes, this gets to the point where the job simply won't run to completion and alternative strategies are required.
The other issue is that Excel, by default, recalculates the entire worksheet when changes are made—for example, if you have 100,000 lines with 10 formulae per line, that's at least a million calculations for each change (more, if any of the formulae use results generated by another).
In that situation, it's worth considering a 'database' approach which doesn't attempt to process the entire volume of data in one go.
Another useful strategy is to process the work in 'batches'.
In one recent job, which generated around 250,000 records from 'data scraping' a web site, I could only make it work by processing batches of 20,000, with a new 'working' spreadsheet being automatically created for each batch.
The generated records were then added to the 'Master' spreadsheet, which did no other processing and was well able to handle the volume.
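That batching approach can be outlined in a few lines of Python (the batch size of 20,000 comes from the job described above; the per-record work and sample data are placeholders):

```python
BATCH_SIZE = 20_000   # the size that proved workable in practice

def process(record):
    # Placeholder for the real per-record work (hypothetical)
    return record.strip().upper()

def in_batches(records, size=BATCH_SIZE):
    """Yield processed records one batch at a time, so no single
    pass ever has to handle the full volume."""
    for start in range(0, len(records), size):
        yield [process(r) for r in records[start:start + size]]

# The master list only accumulates finished results and does no
# other processing, mirroring the 'Master' spreadsheet above.
master = []
for batch in in_batches([" record "] * 50_000):
    master.extend(batch)

print(len(master))  # 50000
```

Each batch is complete and discarded before the next begins, so the working set stays small no matter how many records there are in total.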
Even with small jobs, I find that several hours of Access and Excel development gives rise to an increasing number of 'freezes' and 'crashes', which can only be rectified by rebooting the computer.
It's well-known that rebooting solves about 90% of all PC problems.