I have a PDB file (coordinates of atoms in a protein) on a Linux machine:
ATOM 1 N GLY A 1 0.535 51.766 5.682 1.00 0.00
ATOM 2 CA GLY A 1 -0.712 50.962 5.596 1.00 0.00
ATOM 3 C GLY A 1 -1.243 50.872 4.179 1.00 0.00
ATOM 4 O GLY A 1 -1.313 51.888 3.492 1.00 0.00
ATOM 5 N GLN A 2 -1.600 49.664 3.737 1.00 0.00
ATOM 6 CA GLN A 2 -2.221 49.468 2.423 1.00 0.00
ATOM 7 C GLN A 2 -3.542 48.719 2.507 1.00 0.00
ATOM 8 O GLN A 2 -3.722 47.844 3.356 1.00 0.00
ATOM 9 CB GLN A 2 -1.280 48.738 1.468 1.00 0.00
ATOM 10 CG GLN A 2 -0.976 47.294 1.830 1.00 0.00
.... .. .. .. . . .... .... .... .... ....
TER SPLIT LINE FOR INTERNAL USE ONLY
ATOM 1 O5' G A 1 -44.412 97.503 31.177 1.00 0.00
ATOM 2 C5' G A 1 -45.447 96.803 31.882 1.00 0.00
ATOM 3 C4' G A 1 -45.225 95.295 31.894 1.00 0.00
ATOM 4 O4' G A 1 -46.441 94.578 31.654 1.00 0.00
ATOM 5 C3' G A 1 -44.328 94.850 30.748 1.00 0.00
ATOM 6 O3' G A 1 -42.943 94.877 31.129 1.00 0.00
ATOM 7 C2' G A 1 -44.804 93.425 30.542 1.00 0.00
ATOM 8 O2' G A 1 -44.163 92.592 31.466 1.00 0.00
ATOM 9 C1' G A 1 -46.304 93.444 30.772 1.00 0.00
ATOM 10 N9 G A 1 -46.965 93.699 29.495 1.00 0.00
.... .. .. . . . ....... ...... ..... .... ...
The TER record explicitly marks the end of a particular amino acid chain. I want to use awk to change the chain ID in the 5th column so that the new chain after TER gets the correct ID.
Expected Output:
ATOM 1 N GLY A 1 0.535 51.766 5.682 1.00 0.00
ATOM 2 CA GLY A 1 -0.712 50.962 5.596 1.00 0.00
ATOM 3 C GLY A 1 -1.243 50.872 4.179 1.00 0.00
ATOM 4 O GLY A 1 -1.313 51.888 3.492 1.00 0.00
ATOM 5 N GLN A 2 -1.600 49.664 3.737 1.00 0.00
ATOM 6 CA GLN A 2 -2.221 49.468 2.423 1.00 0.00
ATOM 7 C GLN A 2 -3.542 48.719 2.507 1.00 0.00
ATOM 8 O GLN A 2 -3.722 47.844 3.356 1.00 0.00
ATOM 9 CB GLN A 2 -1.280 48.738 1.468 1.00 0.00
ATOM 10 CG GLN A 2 -0.976 47.294 1.830 1.00 0.00
TER SPLIT LINE FOR INTERNAL USE ONLY
ATOM 1 O5' G B 1 -44.412 97.503 31.177 1.00 0.00
ATOM 2 C5' G B 1 -45.447 96.803 31.882 1.00 0.00
ATOM 3 C4' G B 1 -45.225 95.295 31.894 1.00 0.00
ATOM 4 O4' G B 1 -46.441 94.578 31.654 1.00 0.00
ATOM 5 C3' G B 1 -44.328 94.850 30.748 1.00 0.00
ATOM 6 O3' G B 1 -42.943 94.877 31.129 1.00 0.00
ATOM 7 C2' G B 1 -44.804 93.425 30.542 1.00 0.00
ATOM 8 O2' G B 1 -44.163 92.592 31.466 1.00 0.00
ATOM 9 C1' G B 1 -46.304 93.444 30.772 1.00 0.00
ATOM 10 N9 G B 1 -46.965 93.699 29.495 1.00 0.00
The spacing between fields must stay exactly as it is in the input. The following arrangement would be wrong:
ATOM 3674 CD1 PHE A 460 2.350 79.471 35.466 1.00 0.00
ATOM 3675 CD2 PHE A 460 1.037 81.443 35.196 1.00 0.00
ATOM 3676 CE1 PHE A 460 2.425 79.321 34.080 1.00 0.00
ATOM 3677 CE2 PHE A 460 1.108 81.298 33.805 1.00 0.00
ATOM 3678 CZ PHE A 460 1.805 80.232 33.250 1.00 0.00
TER SPLIT LINE FOR B USE ONLY
ATOM 1 O5' G B 1 -44.412 97.503 31.177 1.00 0.00
ATOM 2 C5' G B 1 -45.447 96.803 31.882 1.00 0.00
ATOM 3 C4' G B 1 -45.225 95.295 31.894 1.00 0.00
ATOM 4 O4' G B 1 -46.441 94.578 31.654 1.00 0.00
ATOM 5 C3' G B 1 -44.328 94.850 30.748 1.00 0.00
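The wrong arrangement above is consistent with assigning to a field directly: whenever a field is assigned, awk rebuilds the whole record by joining the fields with OFS (a single space by default), and an unguarded `$5` assignment also hits the 5th word of the TER line. A minimal reproduction, assuming a padded sample line that is made up for illustration and not copied from the real file:

```shell
# Hypothetical sample line with padded columns (not taken from the real file).
line="ATOM      1  O5'   G A   1     -44.412  97.503  31.177  1.00  0.00"

# Assigning to $5 makes awk rebuild $0, joining all fields with single spaces.
collapsed=$(printf '%s\n' "$line" | awk '{ $5 = "B"; print }')
printf '%s\n' "$collapsed"
```

The chain ID changes, but every run of spaces collapses to one, so the edit has to be made on the raw line rather than through field assignment.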
In addition, the file ends with this:
TER
ENDMDL
There is a blank line at the end of the file, which needs to be left as it is.
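One way to meet these requirements is to locate where the 5th field starts and overwrite that single character in the raw line, so awk never rebuilds the record. The following is only a sketch under several assumptions: the chain ID is exactly one character, fields are separated by spaces or tabs, each line whose first field is TER starts the next chain letter (A, B, C, ...), only ATOM lines are relabelled, and there are at most 26 chains. The function name `relabel_chains` is made up for illustration.

```shell
# Hypothetical helper: read a PDB file, write the relabelled file to stdout.
relabel_chains() {
    awk '
        BEGIN {
            ids = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            fld = "[^ \t]+[ \t]+"              # one field plus the blanks after it
            re  = "^[ \t]*" fld fld fld fld     # everything before the 5th field
        }
        # Count chain terminators; the TER line itself is printed untouched.
        $1 == "TER" { ter++ }
        # ATOM lines after the first TER: replace the one-character 5th field in place.
        $1 == "ATOM" && ter > 0 && ter < length(ids) {
            if (match($0, re))
                $0 = substr($0, 1, RLENGTH) substr(ids, ter + 1, 1) substr($0, RLENGTH + 2)
        }
        # Every line, including ENDMDL and the trailing blank line, is printed.
        { print }
    ' "$1"
}
```

It could be run as `relabel_chains 9k38.pdb > relabelled.pdb` (the output name is arbitrary). Comparing the result with the input using `diff` should then show differences only in the chain ID character of ATOM lines after the first TER.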
Change A to what in the 5th column? To B? Please be explicit. Why are some changed and others not? For example, I see ATOM 3674 CD1 PHE A 460 in your output. Why is that A and not B? And is there one TER or many? Which one should we use?
sed -n '/\t/p' 9k38.pdb | wc -l returns 0. I think you're just used to seeing the columns aligned, but you likely don't have tabs in your actual file.
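The tab check can also be done without relying on sed understanding `\t`, which is not guaranteed outside GNU sed. A small sketch, with `count_tab_lines` being a made-up helper name:

```shell
# Hypothetical helper: print how many lines of the file contain a tab character.
count_tab_lines() {
    # Inside an awk string, "\t" is a literal tab; index() returns 0 when it is absent.
    awk 'index($0, "\t") { n++ } END { print n + 0 }' "$1"
}
```

Running `count_tab_lines 9k38.pdb` and getting 0 would confirm that the columns are padded with spaces only. To look at the whitespace directly, `sed -n l 9k38.pdb | head` prints tabs as `\t` and marks each line end with `$`.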