
My data, inside a folder named data, looks like data1.txt, data2.txt, …, data120.txt.

Inside each .txt file I have four columns (1000 data lines in each column), for example:

data1.txt

1 2 3 4 
4 0 1 3 
3 1 1 2 
2 2 2 1 
........

data2.txt

0 1 3 4 
4 2 1 3 
3 1 3 2 
2 3 2 1 
........

data3.txt

1 0 3 4 
4 0 0 3 
0 1 1 2 
2 0 2 1 
........

data120.txt

1 2 3 1 
4 1 1 3 
3 1 1 1 
2 1 2 1 
........

I want to get the averaged .txt, which looks like the layout below with every element divided by 4, because I used four data samples in this example (for all 120 files it would be divided by 120):

1+0+1+1    2+1+0+2    3+3+3+3    4+4+4+1
4+4+4+4    0+2+0+1    1+1+0+1    3+3+3+3
3+3+0+3    1+1+1+1    1+3+1+1    2+2+2+1
2+2+2+2    2+3+0+1    2+2+2+2    1+1+1+1
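
For example, the first row of the averaged output works out to (1+0+1+1)/4 = 0.75, (2+1+0+2)/4 = 1.25, (3+3+3+3)/4 = 3 and (4+4+4+1)/4 = 3.25.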

I show my data this way just to make it clear. :)

  • Formatted the text a little bit; modify it if it is wrong. (I assumed there were no blank lines between data rows in the files.) Also, the desired output is a bit vague: is it literally the way you have written it, or do you want the average in addition, or only the average? I assume you want 0.75 1.25 3 3.25 in row 1, etc. Is this correct? Commented May 29, 2021 at 0:15
  • Not sure I get what you want here. You want to sum file1:Col1.Row1 + file2:Col1.Row1 + … + file120:Col1.Row1 (120 values), and file1:Col2.Row1 + file2:Col2.Row1 + … + file120:Col2.Row1, and then file1:Col1.Row2 + file2:Col1.Row2 + … + file120:Col1.Row2, and so on? Commented May 29, 2021 at 0:34
  • @ibuprofen thank you so much for your prompt response, and yes, I want exactly what you described: 0.75, 1.25, 3, 3.25 in row 1 and so forth. Somehow I find it difficult to present the data in a clear way. Appreciate it. Commented May 29, 2021 at 4:26
  • If any of the answers here does what you want then see unix.stackexchange.com/help/someone-answers for what to do next. Commented Jun 1, 2021 at 13:04

3 Answers

$ paste data*.txt |
    awk -v numOutFlds=4 '{
        # paste joins matching rows of every file side by side,
        # so each line now holds numOutFlds fields per input file
        numFiles = NF / numOutFlds
        for (outFldNr=1; outFldNr<=numOutFlds; outFldNr++) {
            # sum the same column position across all files, then average it
            sum = 0
            for (fileNr=1; fileNr<=numFiles; fileNr++) {
                inFldNr = outFldNr + ((fileNr - 1) * numOutFlds)
                sum += $inFldNr
            }
            printf "%g%s", sum/numFiles, (outFldNr<numOutFlds ? OFS : ORS)
        }
    }' |
    column -t
0.75  1.25  3     3.25
4     0.75  0.75  3
2.25  1     1.5   1.75
2     1.5   2     1
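
Since numFiles is computed from NF on every line, the same pipeline should work unchanged for the full data set. A usage sketch, assuming the awk program above has been saved to a hypothetical file avg.awk and all 120 files sit in a directory named data:

paste data/data*.txt |               # hypothetical layout: 120 files under data/
    awk -v numOutFlds=4 -f avg.awk |
    column -t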

You can make do with the RPN desk calculator dc. (Note that the divisions by 4 in this answer are hardcoded for the four-file sample; for all 120 files they would become 120.)

paste ./*.txt |
dc -e "2k                        # 2 decimal digits in the output
[q]sq                            # macro q: quit
[0dddsasbscsd]si                 # macro i: reset the four accumulator registers a-d to 0
[la+sa lb+sb lc+sc ld+sd z3<u]su # macro u: add one group of 4 fields to a-d, repeat while fields remain on the stack
[ld4/n9an lc4/n9an
 lb4/n9an la4/n10an lix]sp       # macro p: print the four averages (sum/4) tab-separated, then re-initialize
[?z0=q lux lpx z0=?]s?           # macro ?: read a line, quit at end of input, accumulate, print, loop
lix l?x
"
.75     1.25    3.00    3.25
4.00    .75     .75     3.00
2.25    1.00    1.50    1.75
2.00    1.50    2.00    1.00

Perl can be used as follows (the 4 in the splice is the per-file column count, while the 4.0 divisor is the number of files):

paste *.txt |
perl -lane '
  my @avgs;
  while (@F >= 4) {
    # take the next four fields (one file) and add them to the column sums
    my @tmp = splice(@F, 0, 4);
    $avgs[$#tmp] += pop(@tmp) while @tmp;
  }
  # divide each column sum by the number of files
  print join "\t", map { sprintf "%.2f", $_/4.0 } @avgs;
' -

We can use GNU sed (relying on its e flag, which executes the pattern space as a shell command) in cooperation with bc to get the output:

n='(\S+)'
paste ./*.txt |
sed -Ee "
  s/\s+/ /g;s/^ | \$//g
  s/\$/ /;s/ /\n/4;ta
  :a
    s/^$n $n $n $n\n$n $n $n $n (.*)/printf '%d %d %d %d\n%s' \$((\1 + \5)) \$((\2 + \6)) \$((\3 + \7)) \$((\4 + \8)) '\9'/e
    s/\n/&/
  ta
  s/.*/printf '%d\/4\n' &|bc -l|paste -s/e
  s/(\...)\S*/\1/g
  s/(^|\t)\./\10./g
" -

Assuming I've interpreted it correctly, you could also do something like:

awk (POSIX):

awk -v n_col=4 '
NF != n_col { next }    # skip blank or malformed lines
FILENAME != file {      # new file: restart the running slot index
    file = FILENAME
    k = 0
}
{
    # slot k advances row by row within each file, so matching
    # positions of all files accumulate into the same A[] entry
    for (i = 1; i <= n_col; ++i)
        A[k++] += $i
}
END {
    n_files = ARGC - 1
    for (i = 0; i < k; ) {
        printf "%2.3f%s", A[i] / n_files,
            ++i % n_col == 0 ? "\n" : " "
    }
}
' data*.txt
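
With the four sample files shown in the question, this should print something like:

0.750 1.250 3.000 3.250
4.000 0.750 0.750 3.000
2.250 1.000 1.500 1.750
2.000 1.500 2.000 1.000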

perl:

I am sure this can be done better, but here is a stab at it.

./script.pl <COLUMNS> data*.txt
#!/usr/bin/env perl

use strict;
use warnings;

my @data;               # running sums, indexed by position within a file
my $cols = $ARGV[0];    # number of columns, taken from the first argument
my $ac = $#ARGV;        # number of data files (arguments minus the column count)
shift;
for (@ARGV) {
    my $k = 0;
    open my $fh, '<', $_
        or die "Cannot open '$_' - $!";
    local $/;           # slurp mode: read the whole file at once
    my $fdata = <$fh>;
    close $fh;
    for ($fdata) {
        $data[$k++] += $_ for split;
    }
}

my $i = 0;
for (@data) {
    printf "%.3f%s", $_ / $ac, ++$i % $cols ? "\t" : "\n";
}


bash: (Slow)

As you tagged the question with bash as well, I add, just for fun, a sample doing the same. It is rather slow compared to perl, awk, ...

Note that while it is doable, it is not the best tool for the job.

It uses bashisms in the form of mapfile, read -a, etc.

./script <COLUMNS> data*.txt

#!/bin/bash

declare -i res=1000                 # fixed-point scale: three decimal places
declare -i dec=$(( ${#res} - 1 ))   # number of decimals, derived from res
declare -i n_files
declare -i n_columns
declare -a A                        # running sums, indexed by position within a file

process()
{
    local m a
    mapfile -t m< "$1"              # read every line of the file
    read -ra a<<< "${m[@]}"         # flatten the lines into one array of fields
    for (( i = 0; i < ${#a[*]}; ++i )); do
        (( A[i] += a[i] ))
    done
    (( ++n_files ))
}

n_columns=$1
shift
for f in "$@"; do
    process "$f"
done

for (( i = 0; i < ${#A[@]}; ++i )); do
    (( (i + 1) % n_columns == 0 )) && sep=$'\n' || sep=' '
    # scale the average by res and let printf %f interpret the e-3 exponent
    printf "%3.${dec}f%s" "$(( res * A[i] / n_files ))e-$dec" "$sep"
done

An alternative method for printing the "float":

    (( d = A[i] * res / n_files ))
    printf "%3d.%0${dec}d%s" "$(( d / res ))" "$(( d % res ))" "$sep"

Next sed ... nah, I believe I will only link this: Addition with 'sed' ;)

  • Really appreciate @ibuprofen and Ed Morton!! Thank you so much!! Commented May 29, 2021 at 19:00
  • @saya: NB! I somehow introduced a bug in the awk code while posting. Added the NR < col_len after copying over the code. Should obviously be NF not NR. Sorry for that. Commented May 30, 2021 at 10:46
