added explain plan

Source Link

edited Mar 13, 2014 at 9:01

213
1
7

Update

This is the explain-plan for the query:

SQL> SET LINESIZE 130
SQL> SET PAGESIZE 0
SQL> SELECT *
2 FROM TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 1462516232

---------------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |    74M|  3041M|  2197K  (3)| 07:19:29 |
|   1 |  NESTED LOOPS      |            |    74M|  3041M|  2197K  (3)| 07:19:29 |
|*  2 |   TABLE ACCESS FULL| CLIENT     | 71843 |  2174K|  2386   (2)| 00:00:29 |
|*  3 |   TABLE ACCESS FULL| TMP_CONTROL|  1033 | 12396 |    31   (4)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):

2 - filter("STAANN"<>'D' AND TO_NUMBER("CODLAN")=3)
3 - filter("UTL_MATCH"."EDIT_DISTANCE_SIMILARITY"("NOMCLI"||'
          '||"PRNCLI","LASTNAME"||' '||"FIRSTNAME"||' '||"MIDDLENAME")>=60)

17 rows selected.

Update

This is the explain-plan for the query:

SQL> SET LINESIZE 130
SQL> SET PAGESIZE 0
SQL> SELECT *
2 FROM TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 1462516232

---------------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |    74M|  3041M|  2197K  (3)| 07:19:29 |
|   1 |  NESTED LOOPS      |            |    74M|  3041M|  2197K  (3)| 07:19:29 |
|*  2 |   TABLE ACCESS FULL| CLIENT     | 71843 |  2174K|  2386   (2)| 00:00:29 |
|*  3 |   TABLE ACCESS FULL| TMP_CONTROL|  1033 | 12396 |    31   (4)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):

2 - filter("STAANN"<>'D' AND TO_NUMBER("CODLAN")=3)
3 - filter("UTL_MATCH"."EDIT_DISTANCE_SIMILARITY"("NOMCLI"||'
          '||"PRNCLI","LASTNAME"||' '||"FIRSTNAME"||' '||"MIDDLENAME")>=60)

17 rows selected.

edited title

Link

edited Mar 13, 2014 at 5:04

Malachi

29.1k
11
87
188

Optimizing a cross join Comparing client lists with 'where UTL_MATCH.EDIT_DISTANCE_SIMILARITY >='Cross Joins

Undo Rev 3 (edit invalidated an answer); edited tags

Source Link

edited Mar 11, 2014 at 22:13

200_success

145.6k
22
191
481

I've written a query to compare the clients in our database with the people in a list that I've received. It needs to check if anyone from the list is one of our clients. I've created a temporary table which has been filled with the names and the query below makes a cross-join with the two tables. That query works very slowly (two very large tables), so I was wondering if there's any way to speed this up. Would it be faster to just join everything in the database and compare strings in code (java or something similar)?

(The second list was just names, so I can't use any indexed columns in our database.)

select C.NUMCLI,
       C.NAAM,
       T.FULLNAME,
       (UTL_MATCH.EDIT_DISTANCE_SIMILARITY(NAAM, FULLNAME))  
as DIFF
  from (
        select LASTNAME ||' '|| FIRSTNAME ||' '|| MIDDLENAME  as FULLNAME
        from TMP_CONTROL
     TMP_CONTROL) T,
     (
        select NUMCLI,
        NOMCLI ||' '|| PRNCLI as NAAM
        from CLIENT
        where CODLAN = 3
        and STAANN <> 'D'
     ) C 
where (UTL_MATCH.EDIT_DISTANCE_SIMILARITY(NAAM, FULLNAME)) >= 60

NOTE: A large portion of people on the list may have their name written in a slightly different manner, which is why I'm using the EDIT_DISTANCE SIMILARITY >= 60. The goal is just to filter out the large differences, so I can easily compare the smaller ones.

I've written a query to compare the clients in our database with the people in a list that I've received. It needs to check if anyone from the list is one of our clients. I've created a temporary table which has been filled with the names and the query below makes a cross-join with the two tables. That query works very slowly (two very large tables), so I was wondering if there's any way to speed this up. Would it be faster to just join everything in the database and compare strings in code (java or something similar)?

(The second list was just names, so I can't use any indexed columns in our database.)

select C.NUMCLI,
       C.NAAM,
       T.FULLNAME,
       (UTL_MATCH.EDIT_DISTANCE_SIMILARITY(NAAM, FULLNAME)) as DIFF
 from (
        select LASTNAME ||' '|| FIRSTNAME ||' '|| MIDDLENAME  as FULLNAME
        from TMP_CONTROL
     ) T,
     (
        select NUMCLI,
        NOMCLI ||' '|| PRNCLI as NAAM
        from CLIENT
        where CODLAN = 3
        and STAANN <> 'D'
     ) C 
where (UTL_MATCH.EDIT_DISTANCE_SIMILARITY(NAAM, FULLNAME)) >= 60

NOTE: A large portion of people on the list may have their name written in a slightly different manner, which is why I'm using the EDIT_DISTANCE SIMILARITY >= 60. The goal is just to filter out the large differences, so I can easily compare the smaller ones.

I've written a query to compare the clients in our database with the people in a list that I've received. It needs to check if anyone from the list is one of our clients. I've created a temporary table which has been filled with the names and the query below makes a cross-join with the two tables. That query works very slowly (two very large tables), so I was wondering if there's any way to speed this up. Would it be faster to just join everything in the database and compare strings in code (java or something similar)?

(The second list was just names, so I can't use any indexed columns in our database.)

select C.NUMCLI, C.NAAM, T.FULLNAME, (UTL_MATCH.EDIT_DISTANCE_SIMILARITY(NAAM, FULLNAME))  
as DIFF from (select LASTNAME ||' '|| FIRSTNAME ||' '|| MIDDLENAME  as FULLNAME from 
TMP_CONTROL) T, (select NUMCLI, NOMCLI ||' '|| PRNCLI as NAAM from CLIENT
where CODLAN = 3 and STAANN <> 'D') C 
where (UTL_MATCH.EDIT_DISTANCE_SIMILARITY(NAAM, FULLNAME)) >= 60

NOTE: A large portion of people on the list may have their name written in a slightly different manner, which is why I'm using the EDIT_DISTANCE SIMILARITY >= 60. The goal is just to filter out the large differences, so I can easily compare the smaller ones.

Added some more clarification

Source Link

edited Mar 11, 2014 at 14:04

Andreas

213
1
7

Loading

Made SQL more readable

Source Link

edited Mar 11, 2014 at 13:53

Andreas

213
1
7

Loading

Make the title reflect the actual method used.

Link

edited Mar 11, 2014 at 11:48

rolfl

98.2k
17
220
419

Loading

Source Link

asked Mar 11, 2014 at 11:07

Andreas

213
1
7

Loading

Stack Exchange Network

Return to Question

Update

Predicate Information (identified by operation id):

Update

Predicate Information (identified by operation id):

Optimizing a cross join Comparing client lists with 'where UTL_MATCH.EDIT_DISTANCE_SIMILARITY >='Cross Joins