  • 1
    @Quasímodo A null RS doesn't cause the field separator to be a newline; it causes the set of field separators to include a newline. That set still also includes blank and tab (or whatever else FS is set to, if it's a single char). Commented Jan 6, 2021 at 15:09
  • 1
    Oh, that's it. Quoting verbatim: "The newline shall always be a field separator." So the newline being a field separator does not mean it is the only one. What a tricky wording in the specification! Commented Jan 6, 2021 at 15:13
  • 1
    Right, and the current version of the POSIX spec is unfortunately wrong where it says "a <newline> shall always be a field separator, no matter what the value of FS is". That's not quite true; it should end with "...if FS is a single char", as that's how all awks actually behave. I'm pretty sure I have a bug report open against the spec about that, let me check... Commented Jan 6, 2021 at 15:14
  • 1
    @Quasímodo Ah, now I remember - I had raised the issue with the gawk providers (see lists.gnu.org/archive/html/bug-gawk/2019-04/msg00029.html) and THEY were going to follow up with the standards folks to get it fixed there. Unfortunately, at that point I lost interest and didn't pursue the standard change myself; I expect it is in the queue somewhere. Commented Jan 6, 2021 at 15:22
  • 1
    @Quasímodo I see that this functionality, and the bug in the spec, is now specifically addressed in the gawk manual (gnu.org/software/gawk/manual/gawk.html#Multiple-Line): "When RS is set to the empty string and FS is set to a single character, the newline character always acts as a field separator. This is in addition to whatever field separations result from FS. ... Note that language in the POSIX specification implies that this special feature should apply when FS is a regexp. However, Unix awk has never behaved that way, nor has gawk. This is essentially a bug in POSIX." Commented Jan 6, 2021 at 15:53