
http://sqlfiddle.com/#!9/b98ea/1 (Sample Table)

I have a table with the following fields:

  1. transfer_id
  2. src_path
  3. DH_USER_ID
  4. email
  5. status_state
  6. ip_address

The src_path field contains several duplicate filenames, each with a different folder name at the beginning of the string.

Example:

  1. 191915/NequeVestibulumEget.mp3
  2. /191918/NequeVestibulumEget.mp3
  3. 191920/NequeVestibulumEget.mp3

I am trying to do the following:

  1. Set the status_state field to 'canceled' for every row whose filename (the last part of src_path) is a duplicate, leaving exactly one row per filename untouched.

I want the results to look like this: http://sqlfiddle.com/#!9/5e65f/2

*I apologize in advance for being a complete noob, but I am taking SQL at college and I need help.

  • Hi again Rudy. The second row starts with a /. Is that OK, or is it a typo? Commented Sep 4, 2015 at 17:27
  • Possible duplicate of Find duplicate rows with PostgreSQL. Commented Sep 4, 2015 at 17:34
  • Eliminating duplicates is a popular task. You didn't show what you have tried so far. Commented Sep 4, 2015 at 17:35
  • I have tried several of the usual examples. But since src_path contains a different path at the beginning of the string, I am lost on how to either pass that to a variable or sort based on the duplicate filenames. So far, I have select substring(src_path, '[^\\//]*$') from priority_transfer; which gives me only the filenames (without the slash). Commented Sep 4, 2015 at 17:38
  • @JakubKania Even though it is similar, the duplicated part has to be calculated first, before the normal process can be applied. Commented Sep 4, 2015 at 17:38

2 Answers


SQL Fiddle Demo

  • fix_os_name: Converts Windows-style paths (backslashes) to Unix format (forward slashes).
  • file_name: Splits the path on / and uses char_length to count the separators, so that split_part returns the last segment, i.e. the filename.
  • drank: Numbers the rows for each filename. A unique filename only gets 1; duplicates also get 2, 3, ...
  • UPDATE: Checks whether the row has rn > 1, which means it is a duplicate.


Note that the syntax highlighting is wrong, but the code runs fine.

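-- Step 1: normalize Windows-style backslashes to forward slashes.
-- ('\' is a literal backslash when standard_conforming_strings is on,
--  which is the default since PostgreSQL 9.1.)
WITH fix_os_name AS (
    SELECT transfer_id, replace(src_path,'\','/') src_path,
    DH_USER_ID, email, status_state, ip_address
    FROM priority_transfer p
),
-- Step 2: extract the filename, i.e. the last '/'-separated segment.
-- (number of slashes) + 1 is the index of the last segment.
file_name AS (
    SELECT
       fon.*,
       split_part(src_path,
                  '/',
                  char_length(src_path) - char_length(replace(src_path,'/','')) + 1
                 ) sfile
    FROM fix_os_name fon
),
-- Step 3: number the rows within each filename.
-- Ordering by transfer_id (instead of sfile) makes the kept row
-- deterministic: the lowest transfer_id per filename gets rn = 1.
drank AS (
    SELECT
        f.*,
        row_number() OVER (PARTITION BY sfile ORDER BY transfer_id) rn
    FROM file_name f
)
-- Step 4: cancel every row that is not the first one for its filename.
UPDATE priority_transfer p
SET status_state = 'canceled'
WHERE EXISTS ( SELECT *
               FROM drank d
               WHERE d.transfer_id = p.transfer_id
               AND  d.rn > 1);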
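
To see how the file_name step finds the last path segment, here is a small standalone sketch that runs without the table. The inline rows in the hypothetical `paths` CTE are for illustration only: the first two come from the question's example, while the backslash variant and the bare filename are assumptions. It also assumes standard_conforming_strings is on, so that '\' is a literal backslash.

```sql
-- Hypothetical inline sample rows (not the real priority_transfer table).
WITH paths(src_path) AS (
    VALUES ('191915/NequeVestibulumEget.mp3'),
           ('/191918/NequeVestibulumEget.mp3'),
           ('191920\NequeVestibulumEget.mp3'),   -- assumed Windows-style variant
           ('NequeVestibulumEget.mp3')           -- assumed path with no folder
),
normalized AS (
    -- Same idea as fix_os_name: turn backslashes into forward slashes.
    SELECT src_path AS original_path,
           replace(src_path, '\', '/') AS unix_path
    FROM paths
)
SELECT original_path,
       -- Removing every '/' shortens the string by the number of slashes.
       char_length(unix_path) - char_length(replace(unix_path, '/', '')) AS slash_count,
       -- A path with n slashes has n + 1 segments; take the last one.
       split_part(unix_path, '/',
                  char_length(unix_path) - char_length(replace(unix_path, '/', '')) + 1) AS sfile
FROM normalized;
```

The slash counts differ per row (1, 2, 1 and 0), but every row should come back with the same sfile, NequeVestibulumEget.mp3, which is what lets the next step partition by filename.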

ADD: One row is left untouched.
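
One way to double-check this after running the UPDATE is to count, per filename, how many rows were left alone. This is only a sketch: it borrows the substring expression from the question's comments to extract the filename, and it assumes that no rows were already 'canceled' before the update.

```sql
-- Per filename: total rows and rows that were NOT canceled.
-- kept_rows should be 1 for every filename if the update worked as intended.
SELECT substring(src_path, '[^\\//]*$') AS sfile,
       count(*) AS total_rows,
       sum(CASE WHEN status_state IS DISTINCT FROM 'canceled' THEN 1 ELSE 0 END) AS kept_rows
FROM priority_transfer
GROUP BY 1
ORDER BY 1;
```

If a filename shows kept_rows = 0, a likely cause is the one reported in the comments: two rows sharing the same transfer_id, so the EXISTS check matches both of them.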



4 Comments

It works great!!... However, the query is updating status_state to 'canceled' for all dups. I need to leave at least one record untouched. Any thoughts? I'll see if I can come up with an answer too.
Please check the picture I added; one row is untouched.
Awesome, I fixed the issue. It works flawlessly... The error was on my side, since both rows had the same transfer_id.
Good that this works for you. But next time, please try to put a little more work into writing the question with all the details.

Use the regexp_matches function to separate the file name from the directory. From there you can use DISTINCT to build a table with unique values for the filename.

select
regexp_matches(src_path, '[a-zA-Z.0-9]*$') , *
from priority_transfer
;
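
As a rough sketch of the "unique values" step, PostgreSQL's DISTINCT ON can return one row per filename. This example swaps in the substring expression from the question's comments instead of regexp_matches, since it returns plain text and handles both / and \ separators; choosing the lowest transfer_id as the surviving row is an assumption, not something the question specifies.

```sql
-- One row per filename: the row with the lowest transfer_id survives.
SELECT DISTINCT ON (sfile)
       transfer_id, src_path, sfile
FROM (
    SELECT transfer_id,
           src_path,
           substring(src_path, '[^\\//]*$') AS sfile  -- text after the last / or \
    FROM priority_transfer
) AS f
ORDER BY sfile, transfer_id;
```

The rows this returns are the ones to keep; every other row for the same filename is a candidate for 'canceled'.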

7 Comments

You should try your query in the fiddle. As you can see, there are both Windows and Unix folders.
You are correct @JuanCarlosOropeza, I didn't see this before. Modified the query, so that it gets all the file names now, regardless of win or unix folders... thnx!
Sorry pat, it still doesn't work. You need to remove the folder portion and keep the filename that remains after '\' or '/'; sometimes there is no folder at all. Let me know if you can work out a regexp; I had to convert all '\' to '/' and then split by '/'.
How about select regexp_matches(src_path, '[^\\//]*$') from priority_transfer; ?
select substring(src_path, '[^\\//]*$') from priority_transfer; ....Only gives me filenames after either / or \