Home About us Development Training Support |
Deduplicate names and addressesRemoving duplicates is one of the more difficult tasks when you're working with lists of names and addresses. There are three ways of doing this - four if we include the simple-minded approach of leaving it all to the user:
Different businesses have different views of duplicates so you must get managerial support for your strategy. A business sending out cheap mailshots to an unsophisticated audience can afford duplicates. A business sending out samples of high-value products will be more demanding. Let the user do itThis can work with a good operator who is the sole user of a small data set that changes slowly. If there are only a thousand or so records then the user will become familiar with the names and will notice if two names are similar.You may need to provide some extra tools to modify other tables in the database before a duplicate can be removed. For example, the orders that were assigned to the duplicate entry must be reassigned before the duplicate can be removed. Help the userEven the best user cannot be expected to remember all the entries in a large table or in one that changes rapidly. Duplicates are difficult to remove once they become established so the best time to catch them is at the time of data entry. Show the user a list of possible duplicates before committing this new record to the table.A search on the following fields will pick up the majority of potential duplicates:
Semi-automatic toolsIf you are checking for duplicates at the time of data entry then you must do it quickly. The list of possible duplicates must appear in less than a second, especially if your user is taking details from a new customer by phone. You can build a tool that does a more thorough search and include it on the maintenance menu.Add the following additional tests in a maintenance tool:
If you do build such a housekeeping tool then it might also be a good place to look for the predictable joke names such as 'M. Mouse' and 'B. Bunny'. Automatic toolsIt is very difficult to remove duplicates automatically. You must have a rigid data format as a starting point and be confident that names of towns, counties and countries are spelt consistently. You will also have to write algorithms that can identify house names, road names, and suburbs and extract these components of the addresses. These routines must be robust enough to cope with addresses that range from the minimal:
Batheaston House
Bath to the baroque:
Flat 2a
"The Lodge" The Batheaston House Inn High House Lane Off Bath Road Batheaston Near Bath Bath and North East Somerset BA1 1AA Are these duplicates? Only a local could tell you. |
Hints & tips
The textbox class in Visual FoxPro 9 has a new Autocomplete
property which shows the user the previous values that have
been entered in that textbox.
More...
|