

BowPark

Bash script performance

In a bash script, integers in a range are generated, concatenated with fixed strings, and written to a file. The OS is Ubuntu 14.04 and the bash version is 4.3.11(1)-release.

Millions of strings (and therefore lines) have to be written. Here is the hard disk usage reported by iotop at several points in time:

 TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND  
5701 be/4 myuser    408.24 B/s   97.27 K/s  0.00 %  0.23 % bash ./script.sh
5701 be/4 myuser    408.20 B/s   97.27 K/s  0.00 %  0.00 % bash ./script.sh
5701 be/4 myuser    408.41 B/s   95.72 K/s  0.00 %  0.11 % bash ./script.sh

It takes about 16 minutes to write 1 million strings, so the bandwidth of the hard disk is far from saturated. From uptime after more than 15 minutes of execution:

load average: 0.97, 1.14, 1.19

on a 4-core CPU. So neither the CPU nor the hard disk seems to be responsible for this extremely slow execution.

  1. How is it possible to speed up this process?

  2. Is it still possible using bash, or should I switch to another language?


The relevant parts of the script are the following lines:

#!/bin/bash

i=0
a='fixedstring1'
b='fixedstring2'

while [[ $i -le 9999999 ]]; do
[...]
    ur="$a$b$i"
[...]
    echo "$ur" >> "exp1/file$m"
[...]
    (( i++ ))
done

file$m is the m-th generated file (the script writes multiple files with a fixed number of lines each, instead of a single file containing all the lines).