Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chriss Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

fuzzy matching query

Status
Not open for further replies.

ad602000

IS-IT--Management
Jul 10, 2002
17
AU
Hi all,
I have a query which appears to run ok and I start getting a result but then
Access just stops responding, I 've tried running it o/nite when there is
little network traffic but the same. My sql for the query is as follows

SELECT initials.Surname, initials.Suburb, PUBLIC_CLIENT.SURNAME,
PUBLIC_CLIENT.ADDRESS_LINE_3
FROM initials, PUBLIC_CLIENT
WHERE
(((Fsoundex([initials]![Surname]))=FSoundex([PUBLIC_CLIENT]![Surname])));


the module this query calls is as follows;

Option Compare Database
Option Explicit



Public Function fSoundex(Word As String) As String

Dim strCode As String
Dim strChar As String
Dim lngWordLength As Long
Dim strLastCode As String
Dim i As Long

'Grabs the first letter
strCode = UCase(Mid$(Word, 1, 1))
strLastCode = GetSoundCodeNumber(strCode)

'Stores the word length
lngWordLength = Len(Word)

'Continues the code, starting at the second letter
For i = 2 To lngWordLength
If Mid$(Word, i, 1) <> &quot; &quot; Then ' If there is a space go to the next
character

strChar = GetSoundCodeNumber(UCase(Mid$(Word, i, 1)))

' If adjacent numbers are the same,
' only count one of them
If Len(strChar) > 0 And strLastCode <> strChar Then
strCode = strCode & strChar
End If
strLastCode = strChar
End If
Next

'Trim it down to a maximum of four characters...
fSoundex = Mid$(strCode, 1, 4)

'... but if it's less than four characters, pad
'it out with a bunch of zeros...
If Len(strCode) < 4 Then
fSoundex = fSoundex & String(4 - Len(strCode), &quot;0&quot;)
End If

End Function

Private Function GetSoundCodeNumber(Character As String) As String

'Accepts a character and returns the
'appropriate number from the Soundex table

Select Case Character
Case &quot;B&quot;, &quot;F&quot;, &quot;P&quot;, &quot;V&quot;
GetSoundCodeNumber = &quot;1&quot;
Case &quot;C&quot;, &quot;G&quot;, &quot;J&quot;, &quot;K&quot;, &quot;Q&quot;, &quot;S&quot;, &quot;X&quot;, &quot;Z&quot;
GetSoundCodeNumber = &quot;2&quot;
Case &quot;D&quot;, &quot;T&quot;
GetSoundCodeNumber = &quot;3&quot;
Case &quot;L&quot;
GetSoundCodeNumber = &quot;4&quot;
Case &quot;M&quot;, &quot;N&quot;
GetSoundCodeNumber = &quot;5&quot;
Case &quot;R&quot;
GetSoundCodeNumber = &quot;6&quot;
End Select

End Function


any help appreciated
peter

 
this appear to be attempting to match on the calculated fields as they are calculated. In general, this is always going to tak a LONG time, as the process will attempt to calculate the soundex for every 'row' in BOTH tables for EACH row in Either Table.

An approach to AVIOD this is to have seperate queries which each 'calculate' the soundex for their respective surnames and then do the standard join of the two tables on the calculated soundex values.

Another consideration here is that there would (in the general Case) be multiple matches for each soundex, so you cross product may be considerably larger than expected. In part, you can limit the additional matches by extending the soundex. You limit it's length to 4, where the original design suggested that six characters were necessary to adequately 'match' names. I have often used as many as eight chars on as few as 1K names - and that was just attempting to find 'duplicates'.

Additionally, you calculate the 'soundex' for the entire length of the surname, and then TRUNCATE the results to four chars. What ever length you finally use, it seems pointless to generate a value which you know will be discarded, so your main routine should check hte generated length of your soundex and stop whenever the desired length (4 / 6 / n) is reached.

Then, again, for soundex to work properly - all of the calculated values need to be the same length, and I do not see where you pad the short entries to the desire length.

One additional issue. With the traditional Soundes routine(s), a MAJOR part of the process is to replace dipthongs whith their counterpart. I also do not see this feature in your implementation. It has been a while since I actually implemented a 'new' soundex function, or studied the requirements - but I do believe that your implementation is deficient in each of the ares mentioned above.

MichaelRed
m.red@att.net

There is never time to do it right but there is always time to do it over
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top