cas

Here's a Perl script that does the job.

You can add more patterns and replacements to the %patterns hash as required. Don't forget the comma at the end of each line.

Note that the patterns are interpreted as regular expressions, not as literal strings, so if your patterns contain any regexp-special characters (such as *, (, ), ?, or +), they need to be escaped with a backslash (e.g. \*, \(, \), \?, \+).

The script changes the output slightly: it joins all the fields with ,\t (a comma and a single tab) where your original input had multiple spaces. If that matters, you can tweak the print statement to produce the same or similar output (e.g. by using printf rather than print join()).

$ cat bissi.pl 
#! /usr/bin/perl

use strict;
use warnings;

# optimisation: use qr// for the search patterns so that
# the hash keys are pre-compiled regular expressions.
# this makes the for loop later MUCH faster if there are
# lots of patterns and lots of input lines to process.
my %patterns = (
    qr/0-4 years low risk/        => 'p1',
    qr/0-4 years high risk/       => 'p2',

    qr/65\+ years low risk/       => 'p19',
    qr/65\+ years pregnant women/ => 'p20',
);


while(<>) { 
    chomp;
    my @line = split /,\s*/;
    foreach my $key (keys %patterns) {
        # perl arrays are zero based, so $line[1] is 2nd field
        if ($line[1] =~ m/$key/) {
            $line[1] = $patterns{$key} ;
            last;
        }
    } 
    print join(",\t",@line), "\n";
}
 

That produces the following output:

$ ./bissi.pl input.txt 
t,  group,  1,  3,  5
0,  p1, 0,  0,  1
0,  p2, 0,  0,  0
0,  p1, 0,  0,  0

To convert all 150 of your files, you'd wrap that in a shell for loop something like this:

mkdir -p new
for i in {1..150} ; do
    ./bissi.pl "scenario$i.csv" > "new/scenario$i.csv"
done
