Using the new and sadly broken Excel Worksheet 'CustomProperties' collection
( Cut for VBA geekines and trauma for VBA-Phobics... )
I can has aspirin nau?
This afternoon I am working away on some SQL, automating the process by pasting the existing CREATE TABLE scripts in Microsoft Excel and using some elementary string-handling to split the clauses and automate a job that is essentially a series of find-and-replace operations.
( Oh go on... Show us some VBA... )Being geeky, and somewhat bored, I've embedded a Media Player in a spreadsheet and linked to the BBC Wimbledon stream. The Ladies' semi-finals, I think, and a very pleasing distraction it is, too.
This evening I shall trundle to Darkest Somerset, for tomorrow's Dan Grade Group Course at the Ki Federation's dojo. Hopefully I'll be back at a civilised hour on Saturday...
More thoughts on string-matching and measuring the similarity of words, phrases and addresses: this time, I'm looking at the Longest Common String approach, and a modified LCS algorithm that gives consistent results on addresses and sentences without the need for a separate scoring process that examines the word order.
It's up to you to paste this into the fuzzy-matching VLookup functions I posted a few days ago. Let me know how you get on!
Ever had to look up a name or an address in a list that doesn't quite match, so the standard Excel 'VLookup()' and 'Match()' functions don't help you?
Thanks to some suggestions made by friends on a previous post, I've been able to code a lookup function that returns the closest match in a list:
Public Function VFuzzyLookup(Lookup_Value As String, Table_Array As Variant)
Find the best match for a given string in column 1 of an array of data obtained from an Excel range
This is functionally similar to VLookup, but it returns the best match, not the first exact match
This function is not case-sensitive.
- If your data quality is poor, you are advised to display the retrieved index value from column 1 and use the MatchWord function on this index value to reveal the fuzzy-matching 'score' and discard all results below a threshold score.
- Use VFuzzyLookup_Phrase if you are trying to match a phrase or sentence (a sequence of words separated by spaces) as that function is faster, and has additional logic for the word order.
- Use vFuzzyLookup_Address if you are looking up names and addresses: that function has additional logic to normalise the common abbreviations and word-order conventions used in British addresses.
Yes, you read that correctly: you have a best-match lookup, vFuzzyLookup_Address() for lists of names and addresses. It could do with some performance-tuning (I wouldn't try it on lists exceeding 1024 members if you're in any kind of hurry) but it's usable.
Free to use any or all of it: there are limitations to what it can do, but I think this is good enough for everyday use and it plugs an obvious gap in the standard Excel worksheet functions.
An interesting question - easy to answer if two strings "ABCD" and "ABCD" are identical, harder if you have two sets of names and addresses that almost match and need some kind of fuzzy-matching process with a threshold.
What algorithms and metrics are out there?
Which of them are practical in a VBA or VB.NET environment, where simplicity is essential and performance is an issue?
Another day, another dollar... Except that they only accrue interest on weekdays, and I can't pay in the cheque on a bank holiday.
The point is that we need a quick way of knowing if a given day really is a business day, and to shift it forward (or back) automatically if it isn't. More importantly, we need to do so without mounting our own denial-of-service attack on the calendar database - because this is the kind of thing that appears in a spreadsheet with tens of thousands of payment dates, all of which will need to be checked. Most companies have to maintain a database of public holidays: here's an example for stock exchange trading and settlement holidays from an international stockbroker in Singapore*, and if you're going to use the sample code in this post, I would suggest that you grab it and save it into a database table.
This is where it starts getting difficult: I'm a VBA developer, and I don't have the sophisticated data structures available to a C++ developer. The data structures I can use are:
This example retrieves a large recordset of all the calendars, filters it down to the currency or calendar group we want, and applies a neat optimisation to the ADO 'Find' method.
( How the optimisation works: )
( The code is hidden behing a cut tag because there's rather a lot of it... )
*Tullet's have published it freely - which makes them jolly nice chaps, because this information is tedious to collect and very rarely published.
31/08/2005
30/11/2005
28/02/2006
28/05/2006
28/08/2006
28/11/2006
28/02/2007
28/05/2007
28/08/2007
28/11/2007
28/02/2008
' Special handling required for adding months at EOM:
' VBA.DateAdd("m", 1, "28 Feb 2006") = 28/03/2006 (!)
' Business logic is ALWAYS that adding a month to EOM
' gives the end of the following month -31 Mar 2006.
This is a fairly reliable method for recovering VBA from corrupted workbooks, and it runs in all versions of Excel from 97 onwards. ( Of interest to Excel geeks ONLY! You have been warned. )
Yet another specialist post, intended for my own reference and for VBA developers. The main interest is the example code for handling merged cells in VBA; they are far more trouble than they are worth.
Actually, it might interest other programmers, too, as a kind of deterrent to ever working with Excel. Be afraid, be very, very afraid... ( So it's all behind a cut tag )
More Excel stuff: this source code is probably of limited interest, even to other VBA developers.
This is particularly true of the real programmers in the HairyEars LJ Friends List, all of whom could code this up in a matter of minutes with a few dozen lines of C++. The post is very much for my own reference, and for the occasional visiting VBA developer. ( So it's all behind a cut tag )
This is a code module that gains access to the Windows registry API and reads the 'locale' settings that encode Time Zone information. Obtaining information about timezones is made quite difficult in Windows; you can always tell the time here and the daylight-saving work is all done for you - why on earth would anyone want want to know exactly when this happens, or what the time is somewhere else? ( Look behind the cut... )