Skip to main content
added explain plan
Source Link
Andreas
  • 213
  • 1
  • 7

Update

This is the explain-plan for the query:

SQL> SET LINESIZE 130
SQL> SET PAGESIZE 0
SQL> SELECT *
2 FROM TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 1462516232

---------------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |    74M|  3041M|  2197K  (3)| 07:19:29 |
|   1 |  NESTED LOOPS      |            |    74M|  3041M|  2197K  (3)| 07:19:29 |
|*  2 |   TABLE ACCESS FULL| CLIENT     | 71843 |  2174K|  2386   (2)| 00:00:29 |
|*  3 |   TABLE ACCESS FULL| TMP_CONTROL|  1033 | 12396 |    31   (4)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):

2 - filter("STAANN"<>'D' AND TO_NUMBER("CODLAN")=3)
3 - filter("UTL_MATCH"."EDIT_DISTANCE_SIMILARITY"("NOMCLI"||'
          '||"PRNCLI","LASTNAME"||' '||"FIRSTNAME"||' '||"MIDDLENAME")>=60)

17 rows selected.


Update

This is the explain-plan for the query:

SQL> SET LINESIZE 130
SQL> SET PAGESIZE 0
SQL> SELECT *
2 FROM TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 1462516232

---------------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |    74M|  3041M|  2197K  (3)| 07:19:29 |
|   1 |  NESTED LOOPS      |            |    74M|  3041M|  2197K  (3)| 07:19:29 |
|*  2 |   TABLE ACCESS FULL| CLIENT     | 71843 |  2174K|  2386   (2)| 00:00:29 |
|*  3 |   TABLE ACCESS FULL| TMP_CONTROL|  1033 | 12396 |    31   (4)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):

2 - filter("STAANN"<>'D' AND TO_NUMBER("CODLAN")=3)
3 - filter("UTL_MATCH"."EDIT_DISTANCE_SIMILARITY"("NOMCLI"||'
          '||"PRNCLI","LASTNAME"||' '||"FIRSTNAME"||' '||"MIDDLENAME")>=60)

17 rows selected.

edited title
Link
Malachi
  • 29.1k
  • 11
  • 87
  • 188

Optimizing a cross join Comparing client lists with 'where UTL_MATCH.EDIT_DISTANCE_SIMILARITY >='Cross Joins

Undo Rev 3 (edit invalidated an answer); edited tags
Source Link
200_success
  • 145.6k
  • 22
  • 191
  • 481

I've written a query to compare the clients in our database with the people in a list that I've received. It needs to check if anyone from the list is one of our clients. I've created a temporary table which has been filled with the names and the query below makes a cross-join with the two tables. That query works very slowly (two very large tables), so I was wondering if there's any way to speed this up. Would it be faster to just join everything in the database and compare strings in code (java or something similar)?

(The second list was just names, so I can't use any indexed columns in our database.)

select C.NUMCLI,
       C.NAAM,
       T.FULLNAME,
       (UTL_MATCH.EDIT_DISTANCE_SIMILARITY(NAAM, FULLNAME))  
as DIFF
  from (
        select LASTNAME ||' '|| FIRSTNAME ||' '|| MIDDLENAME  as FULLNAME
        from TMP_CONTROL
     TMP_CONTROL) T,
     (
        select NUMCLI,
        NOMCLI ||' '|| PRNCLI as NAAM
        from CLIENT
        where CODLAN = 3
        and STAANN <> 'D'
     ) C 
where (UTL_MATCH.EDIT_DISTANCE_SIMILARITY(NAAM, FULLNAME)) >= 60

NOTE: A large portion of people on the list may have their name written in a slightly different manner, which is why I'm using the EDIT_DISTANCE SIMILARITY >= 60. The goal is just to filter out the large differences, so I can easily compare the smaller ones.

I've written a query to compare the clients in our database with the people in a list that I've received. It needs to check if anyone from the list is one of our clients. I've created a temporary table which has been filled with the names and the query below makes a cross-join with the two tables. That query works very slowly (two very large tables), so I was wondering if there's any way to speed this up. Would it be faster to just join everything in the database and compare strings in code (java or something similar)?

(The second list was just names, so I can't use any indexed columns in our database.)

select C.NUMCLI,
       C.NAAM,
       T.FULLNAME,
       (UTL_MATCH.EDIT_DISTANCE_SIMILARITY(NAAM, FULLNAME)) as DIFF
 from (
        select LASTNAME ||' '|| FIRSTNAME ||' '|| MIDDLENAME  as FULLNAME
        from TMP_CONTROL
     ) T,
     (
        select NUMCLI,
        NOMCLI ||' '|| PRNCLI as NAAM
        from CLIENT
        where CODLAN = 3
        and STAANN <> 'D'
     ) C 
where (UTL_MATCH.EDIT_DISTANCE_SIMILARITY(NAAM, FULLNAME)) >= 60

NOTE: A large portion of people on the list may have their name written in a slightly different manner, which is why I'm using the EDIT_DISTANCE SIMILARITY >= 60. The goal is just to filter out the large differences, so I can easily compare the smaller ones.

I've written a query to compare the clients in our database with the people in a list that I've received. It needs to check if anyone from the list is one of our clients. I've created a temporary table which has been filled with the names and the query below makes a cross-join with the two tables. That query works very slowly (two very large tables), so I was wondering if there's any way to speed this up. Would it be faster to just join everything in the database and compare strings in code (java or something similar)?

(The second list was just names, so I can't use any indexed columns in our database.)

select C.NUMCLI, C.NAAM, T.FULLNAME, (UTL_MATCH.EDIT_DISTANCE_SIMILARITY(NAAM, FULLNAME))  
as DIFF from (select LASTNAME ||' '|| FIRSTNAME ||' '|| MIDDLENAME  as FULLNAME from 
TMP_CONTROL) T, (select NUMCLI, NOMCLI ||' '|| PRNCLI as NAAM from CLIENT
where CODLAN = 3 and STAANN <> 'D') C 
where (UTL_MATCH.EDIT_DISTANCE_SIMILARITY(NAAM, FULLNAME)) >= 60

NOTE: A large portion of people on the list may have their name written in a slightly different manner, which is why I'm using the EDIT_DISTANCE SIMILARITY >= 60. The goal is just to filter out the large differences, so I can easily compare the smaller ones.

Added some more clarification
Source Link
Andreas
  • 213
  • 1
  • 7
Loading
Made SQL more readable
Source Link
Andreas
  • 213
  • 1
  • 7
Loading
Make the title reflect the actual method used.
Link
rolfl
  • 98.2k
  • 17
  • 220
  • 419
Loading
Source Link
Andreas
  • 213
  • 1
  • 7
Loading