I'm stuck. I have some files with millions of lines of data in them, formatted like so:

username|process name|process time (in minutes)

There are close to 3.4 million lines of this. My task is to write a script that can search through all this data quickly.

Basically, I want to enter a user name on the command line, extract all the lines of data with that user name, sum them up, and display the result: the total process time for that user as well as the total number of processes for that user.

This is what I have so far, and it's not much:

tput cup 19 10
read -p "Please Enter a UserName: " uname

That is all I have. Does anyone have an idea of how I can do this?

1 Answer

Let's take this as a sample input file:

$ cat file
jim|process1|23
bob|process2|5
jim|process3|7

Using awk

Now, let's create this shell script:

$ cat script.sh
#!/bin/sh
# read -p is not POSIX, so print the prompt with printf instead
printf 'Please Enter a UserName: '
read -r uname
awk -v n="$uname" -F\| '$1==n{total+=$3} END{printf "Total for %s is %s minutes\n",n,total}' file

As an example, let's sum up the time used by jim:

$ sh script.sh
Please Enter a UserName: jim
Total for jim is 30 minutes

How it works

awk implicitly loops through every line in the input file. This script uses two variables: n which is the user name and total which is the running total of minutes used by user n.

  • -v n="$uname"

    This creates an awk variable n and assigns to it the value of the shell variable uname.

  • -F\|

    This tells awk to use | as the field separator. The backslash keeps the shell from interpreting | as a pipe.

  • $1==n{total+=$3}

    Every time the first field, $1, matches the user name, n, we increment the running total, total, by the value of the third field, $3.

  • END{printf "Total for %s is %s minutes\n",n,total}

    When we are done reading the file, we print out the result.
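
The question also asks for the number of processes per user, which the script above does not report. Here is a minimal sketch of one way to get both figures in a single awk pass, assuming whole-number minutes as in the sample. summarize_user is a hypothetical helper name, not part of the original script:

```shell
#!/bin/sh
# Hypothetical helper: print the process count and total minutes for one user.
# Usage: summarize_user USERNAME FILE
# Assumes the third field holds whole minutes (printed with %d).
summarize_user() {
    awk -v n="$1" -F'|' '
        $1 == n { count++; total += $3 }   # matching line: one more process, add its minutes
        END { printf "%s: %d processes, %d minutes\n", n, count, total }
    ' "$2"
}

# Demo on the three-line sample input from above.
sample=$(mktemp)
printf '%s\n' 'jim|process1|23' 'bob|process2|5' 'jim|process3|7' > "$sample"
summarize_user jim "$sample"
rm -f "$sample"
```

On the sample data this prints "jim: 2 processes, 30 minutes". A user with no matching lines is reported with 0 processes and 0 minutes, because %d formats the unset awk variables as zero.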

Using shell

Alternatively, we can do the looping in shell:

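$ cat script2.sh
#!/bin/sh
# read -p is not POSIX, so print the prompt with printf instead
printf 'Please Enter a UserName: '
read -r uname
total=0   # start at zero so a user with no matching lines reports 0
while IFS=\| read -r name process minutes; do
    [ "$name" = "$uname" ] && total=$((total+minutes))
done <file
echo "Total for $uname is $total minutes"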

As a demonstration:

$ sh script2.sh
Please Enter a UserName: jim
Total for jim is 30 minutes

I haven't timed the two approaches, but I expect awk to be much faster, since the shell loop does far more work per line than awk does.
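
For anyone who wants to check that expectation, here is a rough sketch of a timing harness. Everything in it is an assumption for illustration: the function names sum_awk and sum_shell, the synthetic data, and the use of date +%s, which is widely available but only has one-second resolution. It makes no claim about what the numbers will be on your machine.

```shell
#!/bin/sh
# Sketch: compare the awk and shell-loop approaches on synthetic data.
# sum_awk and sum_shell are hypothetical wrappers around the two scripts above.

sum_awk() {    # usage: sum_awk USERNAME FILE
    awk -v n="$1" -F'|' '$1 == n { total += $3 } END { print total + 0 }' "$2"
}

sum_shell() {  # usage: sum_shell USERNAME FILE
    total=0
    while IFS='|' read -r name process minutes; do
        if [ "$name" = "$1" ]; then
            total=$((total + minutes))
        fi
    done < "$2"
    echo "$total"
}

# Generate 200000 made-up lines in the username|process name|minutes format.
# Raise the count toward 3.4 million to mimic the real files.
data=$(mktemp)
awk 'BEGIN { for (i = 0; i < 200000; i++) printf "user%d|process%d|%d\n", i % 100, i, i % 60 }' > "$data"

for impl in sum_awk sum_shell; do
    start=$(date +%s)
    result=$("$impl" user7 "$data")
    end=$(date +%s)
    echo "$impl: total=$result, about $((end - start)) s"
done
rm -f "$data"
```

Both functions should report the same total. Only the elapsed time is expected to differ.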
