Avoid the single quote in the comment so that the single quoted string does not unexpectedly end when the code is used without first stripping out the comments.

Use Awk or Perl's paragraph mode to process a file paragraph by paragraph, where paragraphs are separated by blank lines.

awk -vRS= '
  NR!=1 {print ""}      # print blank line before every record but the first
  {                     # do this for every record (i.e. paragraph):
    gsub(" *\n *"," "); # replace newlines by spaces, compressing spaces
    sub(" *$","");      # remove spaces at the end of the paragraph
    print
  }
'
perl -000 -pe '             # for every paragraph:
  print "\n" unless $.==1;  # print a blank line, except before the first paragraph
  s/ *\n *(?!$)/ /g;        # replace newlines by spaces, compressing spaces, but not at the end of the paragraph
  s/ *\n+\z/\n/             # normalize the last line end of the paragraph
'
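
To see what the paragraph-mode approach does, here is a small sketch that runs the Awk version on made-up input. The sample text and the variable names `input` and `joined` are hypothetical and only serve the illustration; the Awk program is the one above with its comments removed and `-v RS=` written with a space.

```shell
# Hypothetical sample: two paragraphs separated by a blank line,
# with stray spaces around the first line break.
input=$(printf 'first line  \n  second line\nthird line\n\nnext paragraph\nstill next\n')

# RS set to the empty string makes awk read one paragraph per record.
joined=$(printf '%s\n' "$input" | awk -v RS= '
  NR!=1 {print ""}
  {
    gsub(" *\n *"," ");
    sub(" *$","");
    print
  }
')

printf '%s\n' "$joined"
```

Each paragraph should come out as a single line, with the blank line between the two paragraphs preserved.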

Of course, since this doesn't parse the (La)TeX, it will horribly mutilate comments, verbatim environments and other special syntax. You may want to look into DeTeX or other (La)TeX-to-text converters.
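
As a rough illustration of that caveat, the sketch below joins a line ending in a TeX comment with the line after it. The sample text and the variable names `tex` and `mangled` are made up for this example.

```shell
# Hypothetical input: a TeX comment ends the first line of the paragraph.
tex=$(printf 'Visible text %% a TeX comment\nmore visible text\n')

# Same joining logic as the awk program above, on one line.
mangled=$(printf '%s\n' "$tex" | awk -v RS= '{ gsub(" *\n *"," "); sub(" *$",""); print }')

printf '%s\n' "$mangled"
# prints: Visible text % a TeX comment more visible text
```

Once the lines are joined, "more visible text" sits after the `%` on the same line, so TeX would treat it as part of the comment.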

Gilles 'SO- stop being evil'