guest_7

We first build a set, s2, from the comma-separated elements of the file B.txt.

Then, for each line of A.tsv, we convert the second field into a set and subtract s2 from it. This leaves the mutations present in A.tsv that are not found in B.txt. We then join the resulting elements and print them after the original line.

python3 -c 'import sys
tsv, txt = sys.argv[1:]
fs, rs = "\t", "\n"
ofs, dlm = fs, ","

with open(txt) as fh, open(tsv) as f:
  # collect every comma-separated element of B.txt, however many lines it has
  s2 = {m for ln in fh for m in ln.rstrip(rs).split(dlm)}

  for nr, ln in enumerate(f, 1):
    l = ln.rstrip(rs)
    if nr == 1: print(l, "mutation_not", sep=ofs)
    else:
      F = l.split(ofs)
      if len(F) < 2: print(l)
      else:
        # set difference; sorted() only makes the output order deterministic
        print(l, dlm.join(sorted({*F[1].split(dlm)} - s2)), sep=ofs)
' A.tsv B.txt

Result:

id  mutation    mutation_not
243 siti,toto,mumu  toto
254
267 lala,siti,sojo  sojo
289 lala    
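
For checking the set arithmetic on its own, the core step can be pulled out of the one-liner into a small function. This is only a sketch: the name `missing_mutations` is made up for illustration, and the B.txt contents used here are an assumption chosen to be consistent with the result above, since B.txt itself is not shown.

```python
def missing_mutations(field, known, dlm=","):
    """Return the elements of `field` that are absent from `known`, joined by `dlm`."""
    # sorted() is only there to make the order of the output deterministic
    return dlm.join(sorted(set(field.split(dlm)) - known))


# Assumed B.txt contents (hypothetical, inferred from the result above).
known = set("siti,mumu,lala".split(","))

for field in ("siti,toto,mumu", "lala,siti,sojo", "lala"):
    print(repr(missing_mutations(field, known)))
```

With that assumed B.txt, the three calls give 'toto', 'sojo' and the empty string, matching the mutation_not column.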

This time we will use the GNU sed editor to get the results:

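sed -Ee '
  # line 1 is B.txt: keep it in the hold space and print nothing
  1{h;d;}
  # line 2 is the A.tsv header: append the mutation_not column
  2s/\tmutation$/&&_not/;t

  # duplicate the mutation field as a work list; skip lines without one
  s/\t\S+$/&&,/;T;G
  # layout is now: original line, newline, work list, newline, B.txt
  s/\t/\n/2;ta

  :a
  # drop the first work-list item when it also occurs in B.txt
  s/\n([^,]+),(.*\n(.*,)?\1(,|$))/\n\2/;ta
  # otherwise move it to the end of the output line
  s/\n([^,\n]+),/\t\1\n/;ta

  # discard what is left after the first newline
  s/\n.*//
' B.txt A.tsv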

The idea is that the contents of B.txt are stored in the hold space (assuming the file is a single line). Each line of A.tsv then has the B.txt contents appended to it, and the mutations that are found in B.txt are ticked off. Once all the mutations have been looked at, the line is printed.
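
The ticking-off idea can also be sketched in Python for readers who find the sed loop hard to follow. This is an illustration of the idea rather than a translation of the sed script: the function name `tick_off` and the sample B.txt line are assumptions, and the output format (comma-joined, after a tab) follows the Python answer above.

```python
def tick_off(line, b_line, fs="\t", dlm=","):
    """Append to `line` the mutations of its last field that `b_line` lacks."""
    head, sep, field = line.rpartition(fs)
    if not sep:                      # no second field: leave the line alone
        return line
    b_items = b_line.split(dlm)      # the single line of B.txt, split up
    work = field.split(dlm)          # working copy of the mutation field
    kept = []
    while work:
        item = work.pop(0)           # look at the first remaining mutation
        if item not in b_items:      # not ticked off by B.txt, so keep it
            kept.append(item)
    return line + fs + dlm.join(kept)


# Hypothetical B.txt line, consistent with the result shown earlier.
b_line = "siti,mumu,lala"
for line in ("243\tsiti,toto,mumu", "254", "267\tlala,siti,sojo"):
    print(tick_off(line, b_line))
```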
