We will form a set s2 out of the comma-separated elements of the file B.txt. The code as written expects B.txt to hold a single line.
Then, for each line of A.tsv, we convert the second field into a set and subtract the s2 set from it. This gives us the mutations present in A.tsv that are not found in B.txt. Finally, we join the resulting elements and print them along with the original line.
python3 -c 'import sys
tsv,txt = sys.argv[1:]
fs,rs = "\t","\n"
ofs,dlm = fs,","
with open(txt) as fh, open(tsv) as f:
  s2 = set(*list(map(lambda x:x.rstrip(rs).split(dlm),fh.readlines())))
  for nr,ln in enumerate(f,1):
    l = ln.rstrip(rs)
    if nr == 1: print(l,"mutation_not",sep=ofs)
    else:
      F = l.split(ofs)
      if len(F) < 2: print(l)
      else: print(l,
        dlm.join({*F[1].split(dlm)}-s2),sep=ofs)
' A.tsv B.txt
Result:
id mutation mutation_not
243 siti,toto,mumu toto
254
267 lala,siti,sojo sojo
289 lala
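One caveat with the set subtraction: a Python set is unordered, so when a line has several mutations missing from B.txt, they may come out in a different order than they appear in A.tsv. Below is a minimal sketch of an order-preserving variant. The function name `mutations_not_in` and the sample data are made up for illustration, and the contents of B.txt are assumed to be `siti,mumu,lala`, which is consistent with the result shown but is not given in the post.

```python
def mutations_not_in(field, known, dlm=","):
    """Return the items of `field` that are absent from `known`,
    keeping their original order and dropping repeats."""
    seen = set()
    kept = []
    for m in field.split(dlm):
        if m not in known and m not in seen:
            seen.add(m)
            kept.append(m)
    return dlm.join(kept)

# Assumed contents of B.txt (hypothetical sample data)
known = set("siti,mumu,lala".split(","))

print(mutations_not_in("siti,toto,mumu", known))  # toto
print(mutations_not_in("zulu,siti,alfa", known))  # zulu,alfa
```

In the one-liner, this would replace the `{*F[1].split(dlm)}-s2` expression. Whether the ordering matters depends on what consumes the output.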
This time we will use the GNU sed editor to get the results:
sed -Ee '
1{h;d;}
2s/\tmutation$/&&_not/;t
s/\t\S+$/&&,/;T;G
s/\t/\n/2;ta
:a
s/\n([^,]+),(.*\n(.*,)?\1(,|$))/\n\2/;ta
s/\n([^,\n]+),/\t\1\n/;ta
s/\n.*//
' B.txt A.tsv
The idea is that the contents of B.txt are stored in the hold space (assuming it is one line). The B.txt contents are then appended to each line of A.tsv, and the mutations that are found in B.txt are ticked off. After all mutations have been looked at, the line is printed.