added possible perl implementation

edited Sep 1, 2016 at 20:11

FWIW, here's my attempt at a Perl solution, including normalization to the global maximum average, as requested in the comments. DISCLAIMER: I'm a novice Perl programmer, so it may demonstrate poor programming practices.

#!/usr/bin/perl

use strict;
use warnings;

use List::MoreUtils qw(pairwise minmax);
use Math::Round qw(nearest);

my @hdr;
my %sums = ();
my %count = ();
my $key;

while (<>) {
  chomp;
  my @F = split ' ';

  # UGLY: hardcoded to expect exactly 1 header row
  if ($. == 1) {
    @hdr = @F;
    next;
  }

  # sum column-wise, grouped by first column
  $key = shift @F;
  if ( exists $sums{$key} ) {
    $sums{$key} = [ pairwise { $a + $b } @{ $sums{$key} }, @F];
  }
  else {
    $sums{$key} = \@F;
  }

  $count{$key}++;
}


my %avgs = ();
# start undefined so that the first average seen becomes the initial maximum
# (this also copes with data whose averages are all negative)
my $maxavg;

# find the column averages, and the global max of those averages
for $key ( keys %sums ) {
  $avgs{$key} = [ map { $_ / $count{$key} } @{ $sums{$key} } ];
  # NB could use List::Util=max here, but we're already using List::MoreUtils
  my ($kmin, $kmax) = minmax @{ $avgs{$key} };
  $maxavg = $kmax if !defined($maxavg) || $kmax > $maxavg;
}

# normalize and print the results, rounded to nearest 0.01
# (the newline is kept outside join so no trailing tab is emitted)
print join("\t", @hdr), "\n";
for $key ( sort keys %avgs ) {
  print join("\t", $key, map { nearest(0.01, $_ / $maxavg) } @{ $avgs{$key} }), "\n";
}

Saved as colavgnorm.pl and made executable, then run as

$ ./colavgnorm.pl file
K       C1      C2      C3
a       0.77    0.81    0.86
b       0.91    0.95    1

where file is

K   C1  C2  C3
a   12  13  14
b   15  16  17
a   21  22  23
b   24  25  26
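
For comparison, the same normalization can be sketched in awk. This is only a rough equivalent, not a translation of the Perl above: the function name normalized_group_means and the temporary file are made up for illustration, it assumes exactly one header row and a non-zero global maximum, and it prints with a fixed "%.2f" format (so the last value shows as 1.00 rather than 1).

```shell
# Hypothetical sample input: one header row, then key + numeric columns.
sample=$(mktemp)
printf '%s\n' 'K C1 C2 C3' 'a 12 13 14' 'b 15 16 17' 'a 21 22 23' 'b 24 25 26' > "$sample"

normalized_group_means() {
  awk '
    BEGIN { OFS = "\t" }
    NR == 1 { $1 = $1; print; next }   # re-join the header with tabs and pass it through
    {
      c[$1]++                          # number of rows seen for this key
      if (NF > nf) nf = NF             # remember the widest row
      for (i = 2; i <= NF; i++) s[$1, i] += $i   # per-key, per-column sums
    }
    END {
      # first pass: global maximum over all per-key column averages
      first = 1
      for (k in c) {
        for (i = 2; i <= nf; i++) {
          avg = s[k, i] / c[k]
          if (first || avg > max) { max = avg; first = 0 }
        }
      }
      # second pass: print every average divided by that maximum
      for (k in c) {
        printf "%s", k
        for (i = 2; i <= nf; i++) printf "\t%.2f", s[k, i] / c[k] / max
        printf "\n"
      }
    }' "$1" |
  # awk does not guarantee the order of "for (k in c)", so keep the
  # header line first and sort only the data lines that follow it
  { IFS= read -r header; printf '%s\n' "$header"; sort; }
}

normalized_group_means "$sample"
```

Taking the maximum from the first average seen, rather than starting at 0, avoids the initialization caveat noted in the Perl comments.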


answered Sep 1, 2016 at 3:44

steeldriver

Using awk, you could simulate a 2D array by constructing a composite index from the key (first column value) and column index:

awk '
  {
    c[$1]++;
    for (i=2; i<=NF; i++) {
      s[$1"."i] += $i;
    }
  }
  END {
    for (k in c) {
      printf "%s\t", k;
      for (i=2; i<NF; i++) printf "%.1f\t", s[k"."i]/c[k];
      printf "%.1f\n", s[k"."NF]/c[k];
    }
  }' file
a       16.5    17.5    18.5
b       19.5    20.5    21.5
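
As a variation, awk's built-in comma subscripts (which join the parts with SUBSEP) can stand in for the hand-built "." index. The sketch below also skips a header row, which is an assumption about the input rather than part of the command above; the function name group_means and the temporary file are hypothetical.

```shell
# Hypothetical sample input: one header row, then key + numeric columns.
sample=$(mktemp)
printf '%s\n' 'K C1 C2 C3' 'a 12 13 14' 'b 15 16 17' 'a 21 22 23' 'b 24 25 26' > "$sample"

group_means() {
  awk '
    NR == 1 { next }                   # assumption: exactly one header row to skip
    {
      c[$1]++                          # number of rows seen for this key
      if (NF > nf) nf = NF             # remember the widest row instead of relying on NF in END
      for (i = 2; i <= NF; i++) s[$1, i] += $i   # s[key, column], joined internally with SUBSEP
    }
    END {
      for (k in c) {
        printf "%s", k
        for (i = 2; i <= nf; i++) printf "\t%.1f", s[k, i] / c[k]
        printf "\n"
      }
    }' "$1" | sort                     # "for (k in c)" order is unspecified, so sort by key
}

group_means "$sample"
```

Recording the column count in nf while reading means the END block does not depend on NF still holding the last record's value.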

A similar approach may be implemented in perl more directly using a hash of arrays.

Alternatively, there's GNU datamash, which (at least from version 1.1.0) supports group averages very compactly, e.g.

datamash --sort --whitespace groupby 1 mean 2-4 < file
a       16.5    17.5    18.5
b       19.5    20.5    21.5