I have a CSV file with two columns: the first is a number and the second is a sequence of characters. Here is a small example:
45373,VAREAKAVVLRDRKSTRLN
1678,SCTFAEGMLFEDCCGP
1524,SCPHFWLAECGP
20,SCTFAEGMLFEDCCGP
I want to find rows whose sequence of characters (2nd field) is identical and, for those rows, sum the corresponding values in the 1st column. For instance, in the example above the 2nd and 4th rows have the same sequence in the 2nd field, so in the expected output below I summed the 1st field of those rows (1678 + 20 = 1698) and put the result in a single row.
Expected output:
45373,VAREAKAVVLRDRKSTRLN
1698,SCTFAEGMLFEDCCGP
1524,SCPHFWLAECGP
To get the expected output, I tried the following AWK command:
awk -F "," '{for(i=2;i<NF;i++){t+=$i==$(i+1)}} t==NF { print }' infile.csv > outfile.csv
But it does not return the expected output. Do you know how to fix it?
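
For reference, this kind of grouping is usually done in AWK with an associative array keyed on the 2nd field. The following is only a minimal sketch, assuming that "similar" means exactly identical sequences, that every line has exactly two comma-separated fields with no quoted commas, and that the output should keep the order in which each sequence first appears. The array names `sum` and `order` and the counter `n` are made up for illustration, and the `printf` line just recreates the sample input from above.

```shell
# Recreate the sample input from the question (for illustration only).
printf '%s\n' \
  '45373,VAREAKAVVLRDRKSTRLN' \
  '1678,SCTFAEGMLFEDCCGP' \
  '1524,SCPHFWLAECGP' \
  '20,SCTFAEGMLFEDCCGP' > infile.csv

# Sum column 1 per distinct value of column 2, keeping first-seen order.
awk -F, -v OFS=, '
    !($2 in sum) { order[++n] = $2 }   # first time this sequence is seen: remember its position
    { sum[$2] += $1 }                  # add the number to the running total for this sequence
    END {
        for (i = 1; i <= n; i++)       # print totals in first-seen order
            print sum[order[i]], order[i]
    }
' infile.csv > outfile.csv
```

On the sample data this should write the three rows shown in the expected output to outfile.csv. If the row order does not matter, the `order` array can be dropped and the END block can loop with `for (k in sum)` instead, but the iteration order of that loop is unspecified.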