added 510 characters in body
Prabhjot Singh

Using gawk:

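awk 'BEGIN{OFS="\t"}
# B.txt (first file): remember every comma-separated entry
NR==FNR{ar[$1]=$1;next}
# header line of A.tsv: name the new column
FNR==1{$(NF+1) = "mutation_not"}
# data lines: collect the entries of field 2 that are missing from B.txt
FNR>1{n=split($2,a,",");
for(i=1;i<=n;i++) if (!(a[i] in ar))
ncol[$1] = (ncol[$1])? ncol[$1] "," a[i] : a[i];
$(NF+1) = ncol[$1]}1' RS="," B.txt  RS="\n" FS="\t" A.tsv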

This assumes B.txt is a single line of comma-separated entries, so the record separator (RS) is set to a comma for B.txt. It is reset to a newline, with a tab as the field separator (FS), before A.tsv is read.
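To see what that RS setting does on its own, here is a minimal sketch. The contents of B.txt are made-up sample data, not taken from the question; each comma-separated entry should come out as its own record:

```shell
# Hypothetical sample data (an assumption, not from the question).
tmp=$(mktemp -d)
printf 'm1,m2,m3\n' > "$tmp/B.txt"

# With RS="," every entry is a separate record. $1 drops the trailing
# newline of the last entry, because default field splitting ignores it.
records=$(awk '{print NR ": " $1}' RS="," "$tmp/B.txt")
printf '%s\n' "$records"

rm -r "$tmp"
```

The assignment RS="," is written before the file name so that it takes effect just before that file is opened.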

NR==FNR{ar[$1]=$1;next} creates an array ar indexed by the first field of each record of the first file (B.txt).

FNR==1{$(NF+1) = "mutation_not"} appends one more column name to the header line of A.tsv.

split($2,a,",") splits the second field of each remaining line of A.tsv into an array a.

Each entry of a that is not present in B.txt is appended to ncol[$1]. $(NF+1) = ncol[$1] then adds one more column holding that comma-separated list.
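The steps above can be tried end to end with a small sketch. Both input files are hypothetical sample data (the column names id and mutations are my assumption), and the loop is written with a numeric index so that the entries keep the order they have in A.tsv:

```shell
# Hypothetical sample data (an assumption, not from the question).
tmp=$(mktemp -d)
printf 'm1,m2,m3\n' > "$tmp/B.txt"
printf 'id\tmutations\ns1\tm1,m4\ns2\tm2,m3\ns3\tm5,m6\n' > "$tmp/A.tsv"

result=$(awk 'BEGIN{OFS="\t"}
NR==FNR{ar[$1]=$1;next}
FNR==1{$(NF+1) = "mutation_not"}
FNR>1{n=split($2,a,",");
for(i=1;i<=n;i++) if (!(a[i] in ar))
ncol[$1] = (ncol[$1])? ncol[$1] "," a[i] : a[i];
$(NF+1) = ncol[$1]}1' RS="," "$tmp/B.txt" RS="\n" FS="\t" "$tmp/A.tsv")
printf '%s\n' "$result"

rm -r "$tmp"
```

With this input, the new mutation_not column holds m4 for s1, is empty for s2 (both of its entries are in B.txt), and holds m5,m6 for s3.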
