cas

Here's a Perl script that does the job.

You can add more patterns and replacements to the %patterns hash as required. Don't forget the comma at the end of each line.

Note that the patterns are interpreted as regular expressions, not as literal strings, so if your patterns contain any regexp-special characters (such as *, (, ), ?, or +), they need to be escaped with a backslash (e.g. \*, \(, \), \?, \+).

The script changes the output slightly: it joins all the fields with ,\t (a comma and a single tab) where your original input had multiple spaces. If that matters, you can tweak the print statement to produce the same or similar output (e.g. by using printf rather than print join()).

$ cat bissi.pl 
#! /usr/bin/perl

use strict;
use warnings;

# optimisation: use qr// for the search patterns so that
# the hash keys are pre-compiled regular expressions.
# this makes the for loop later MUCH faster if there are
# lots of patterns and lots of input lines to process.
my %patterns = (
    qr/0-4 years low risk/        => 'p1',
    qr/0-4 years high risk/       => 'p2',

    qr/65\+ years low risk/       => 'p19',
    qr/65\+ years pregnant women/ => 'p20',
);


while(<>) { 
    chomp;
    my @line = split /,\s*/;
    foreach my $key (keys %patterns) {
        # perl arrays are zero based, so $line[1] is 2nd field
        if ($line[1] =~ m/$key/) {
            $line[1] = $patterns{$key} ;
            last;
        }
    } 
    print join(",\t",@line), "\n";
}
 

That produces the following output:

$ ./bissi.pl input.txt 
t,  group,  1,  3,  5
0,  p1, 0,  0,  1
0,  p2, 0,  0,  0
0,  p1, 0,  0,  0

To convert all 150 of your files, you'd wrap that in a shell for loop something like this:

mkdir -p new
for i in {1..150} ; do
    ./bissi.pl "scenario$i.csv" > "new/scenario$i.csv"
done
