multi-line string substitution only if group matches

Question

How can I perform a single-statement substitution on a multi-line string only if a group matches a pattern?

I need to quote "values" (of a YAML-like document) if they contain : or equal -. Consider this (non-working) code below:

$data =~ s/^(\s*\S+): (.+)$/$1: '$2'/mg if $2 =~ /:/ || $2 =~ /^\-$/;

sample input text string

    data:
        normal: text
        timestamp: Wed Aug 23 07:07:07 2023
        time-zone: UTC +03:00, Daylight Saving: +0h
        type: -
        duration: 45h 8m 41s

desired output

    data:
        normal: text
        timestamp: 'Wed Aug 23 07:07:07 2023'
        time-zone: 'UTC +03:00, Daylight Saving: +0h'
        type: '-'
        duration: 45h 8m 41s

working code - that I would like to replace with a more elegant form

my @lines = split "\n", $data;
foreach my $i (0 .. $#lines) {
  my $line = $lines[$i];
  if ($line =~ /^(\s*\S+): (.+)$/) {
    my $key = $1;
    my $val = $2;
    $lines[$i] = "$key: '$val'" if $val =~ /:/ || $val =~ /^\-$/; # quote invalids
  }
}
$data = join "\n", @lines;
say $data;

Can the pattern of interest ever be spread over lines, or is it always fully contained on one line? (In other words, why do you emphasize that the substitution be done on a "multi-line" string?) — zdim, Commented Sep 3, 2023 at 8:06
I think I'm using the wrong terminology. My sample input is a single string with line breaks \r?\n (not a file) — h q, Commented Sep 3, 2023 at 11:08

zdim · Accepted Answer · 2023-09-03 20:21:07Z

 perl -wnlE'say s/^.+?:\s\K (.*[:-].*)/\x27$1\x27/rx' file.txt

The \K drops everything matched before it from $& (the text that gets replaced), so that part stays in the string and we don't have to capture it and put it back. The \x27 is the hex escape for a single quote, the quoting character used in the question.
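To make the effect of \K concrete, here is a minimal sketch that compares it with the capture-and-put-back form. The variable names and the sample line are made up for illustration; both statements should produce the same string.

```perl
use strict;
use warnings;

my $line = "        type: -";

# Capture the "key: " prefix and write it back in the replacement.
(my $with_capture = $line) =~ s/^(.+?:\s)(.*[:-].*)/${1}'$2'/;

# With \K the prefix is matched but kept out of the replaced text,
# so only the value needs a capture group.
(my $with_k = $line) =~ s/^.+?:\s\K(.*[:-].*)/'$1'/;

print "$with_capture\n";
print "$with_k\n";
```

Both print the line with the value quoted, which is why the \K form can drop the first capture group.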

The /r modifier has the substitution return the changed string (or the original if the pattern didn't match), which is then printed; the original isn't changed. See modifiers in perlre. The output can be redirected into a file,

perl -wnlE'...' file.txt > out.txt

or the input file can be changed in-place with the -i switch

perl -i.bak -wnlE'...' file.txt

The .bak part makes it also save a backup with that extension. See switches in perlrun.

This assumes that the patterns of interest are always contained within one line.

Not sure whether one would call it "elegant" ...

As indicated in the question, and clarified in a comment, the input is a multiline string in a program, not a file. In order to process the whole string at once the regex above needs one change and different modifiers

use warnings;
use strict;
use feature 'say';

my $data = <<'EOF';
data:
        normal: text
        timestamp: Wed Aug 23 07:07:07 2023
        time-zone: UTC +03:00, Daylight Saving: +0h
        type: -
        duration: 45h 8m 41s
EOF

$data =~ s/^.+?:[\t ]+\K (.*[:-].*) $/'$1'/gmx;

say $data;

Now we need a literal space instead of \s (after the first :), since \s also matches a newline. That misfires on the very first line (data: with nothing following): \s consumes the newline and the match continues into the next line. A literal space can't match the newline, so the regex abandons that first line and starts matching again from the next ^. I like a character class for a literal space ([ ]) for clarity, and then also add a tab, so [\t ].

Here we also anchor the pattern to the end of each line with $. Since . doesn't match a newline without /s, the greedy .* stops at the line end anyway, but the anchor makes the intent explicit. With the /m modifier the $ (and ^) apply to the lines inside the string (without the modifier they anchor only the whole string, not lines inside). We also need /g to keep going through the string, making changes.
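The \s misfire described above can be reproduced with a short sketch. The sample string and variable names are assumptions for illustration; /r is used only so both variants can run on the same input.

```perl
use strict;
use warnings;

my $data = "data:\n    normal: text\n    type: -\n";

# \s after the colon can match the newline that ends "data:", so the
# match spills onto the next line and that whole line gets quoted.
my $with_s = $data =~ s/^.+?:\s\K (.*[:-].*) $/'$1'/gmxr;

# [\t ]+ cannot match a newline, so the "data:" line is skipped.
my $with_class = $data =~ s/^.+?:[\t ]+\K (.*[:-].*) $/'$1'/gmxr;

print $with_s, "---\n", $with_class;
```

In the first result the line after data: ends up wrapped in quotes as a whole; in the second only the - value is quoted.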

Inside of a program the single quotes ' are not a problem as they are on the command line so now we don't need to use hex for them.

I use a here-doc to introduce the multiline string, with single quotes since we clearly want literal text.
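One thing to be aware of: the [:-] class quotes any value that contains a hyphen, while the question asks for values that contain : or are exactly -. If that distinction matters, an alternation is one way to express it. This is a sketch under that assumption, not part of the tested code above; the range line is invented sample data, and the key is assumed to contain no whitespace (as in the question's own regex).

```perl
use strict;
use warnings;

my $data = <<'EOF';
data:
    normal: text
    timestamp: Wed Aug 23 07:07:07 2023
    type: -
    range: 10-20
EOF

# Quote the value only if it is exactly "-" or contains a colon.
# [\t ]* (rather than \s*) keeps the leading whitespace on one line.
(my $quoted = $data) =~ s/^[\t ]*\S+:[ ]\K(-|.*:.*)$/'$1'/gm;

print $quoted;
```

Here timestamp and type get quoted, while range: 10-20 is left alone because its value neither contains a colon nor is a lone hyphen.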

Or, to avoid those subtleties and still process line by line, break the string into lines, run regex on each, then reassemble by joining with newlines (if needed)

$data = join "\n",
    map { s/^.+?:\s\K (.*[:-].*)$/'$1'/xr } 
    split /\n+/, $data;

This eliminates possible empty lines (none in shown sample data) since I split on all consecutive \n (with +). If that's undesirable use split /\n/ (no +) and empty lines stay.
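The difference between the two split patterns is easy to check on a tiny string with one blank line (the sample text is made up for illustration):

```perl
use strict;
use warnings;

my $text = "a: 1\n\nb: 2\n";

my @collapsed = split /\n+/, $text;   # blank line disappears
my @kept      = split /\n/,  $text;   # blank line kept as an empty string

print scalar(@collapsed), " vs ", scalar(@kept), "\n";
```

In both cases split drops the trailing empty field, so joining with "\n" afterwards does not put back a final newline; append one if the string needs it.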

If this need not be reassembled into a multiline string -- or you'll need individual lines anyway -- then assign to an array (instead of join-ing and assigning back to $data).

Now we again need /r modifier so that the block in map returns the (changed or original) string, but not /g (nor /m).
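A small sketch of why /r matters inside map, using two made-up lines. Without /r the block returns the result of s/// (the substitution count, or a false value when nothing matched) and changes the list elements in place through $_.

```perl
use strict;
use warnings;

my @lines = ("    normal: text", "    type: -");

# With /r the block returns the new (or unchanged) string; @lines is untouched.
my @with_r = map { s/^.+?:\s\K(.*[:-].*)$/'$1'/r } @lines;

# Without /r the block returns the substitution count (false if no match)
# and modifies the elements of @copy through the $_ alias.
my @copy = @lines;
my @without_r = map { s/^.+?:\s\K(.*[:-].*)$/'$1'/ } @copy;

print "$_\n" for @with_r;
```

So @with_r holds the processed lines, whereas @without_r holds only success flags and the edits land in @copy instead.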

Elegant indeed :-) Though I can't get it to work inside a script $data =~ s/^.+?:\s\K (.*[:-].*)/\x27$1\x27/rx; I get: Useless use of non-destructive substitution (s///r) in void context at ./test.pl line XX. — h q, Commented Sep 3, 2023 at 10:16
@hq: The /r option on s/// changes its behaviour so it doesn't change the bound string but instead returns a changed version of the string. Doing that in void context makes no sense. Either assign the result to a variable (my $new_data = $data =~ s/.../.../r) or remove the /r. — Dave Cross, Commented Sep 3, 2023 at 11:00
Thanks again @DaveCross. I'm still unable to run it inside my script: $data =~ s/^.+?:\s\K (.*[:-].*)/\x27$1\x27/x; doesn't produce the desired result. — h q, Commented Sep 3, 2023 at 11:05
@hq "doesn't produce the desired result" -- OK, that's because the code runs on the whole multiline string (in $data) while it's meant to go line by line. Will fix as soon as I get a minute — zdim, Commented Sep 3, 2023 at 19:01
@hq Added to the end, should now work with a multiline string in a program. Also edited a little elsewhere for clarity (hopefully :) — zdim, Commented Sep 3, 2023 at 20:14
