
I have data in the following form:

 Sub: Size:14Val: 4644613 Some long string here
 Sub: Size:2Val: 19888493 Some other long string here
 Sub: Size:1Val: 6490281 Some other long string here1
 Sub: Size:1Val: 320829337 Some other long string here2
 Sub: Size:1Val: 50281086 Some other long string here3
 Sub: Size:1Val: 209077847 Some other long string here4
 Sub: Size:3Val: 320829337 Some other long string here2
 Sub: Size:3Val: 50281086 Some other long string here3
 Sub: Size:3Val: 209077847 Some other long string here4

I want to extract all of the Size: fields from this file, i.e. the following:

Size:14
Size:2
Size:1
Size:1
Size:1
Size:1
Size:3
Size:3
Size:3

I also want to count how often each Size value occurs (e.g. 14 occurs once, 2 occurs once, 1 occurs four times, 3 occurs three times) and get the counts sorted in two ways: (i) by number of occurrences and (ii) by the Size value itself. That is, I want the following results:

(i). sorted by number of occurrences
1->4
3->3
2->1
14->1

(ii). sorted by the value associated with Size:
1->4
2->1
3->3
14->1

I wrote a Python program that does this sorting, but is there a way to do the same with standard Linux commands such as grep? I am using Ubuntu 12.04.

Answer

To extract the Size fields:

grep -o 'Size:[0-9]*' data
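To try this step on the sample data, here is a minimal sketch assuming bash and GNU grep. It writes the lines from the question to a temporary file (a stand-in for the file called data) and captures the matches in a shell variable named sizes, which is just an illustrative name. With -o, grep prints only the matched part of each line, one match per output line, so each input line contributes exactly one Size:N.

```shell
#!/usr/bin/env bash
# Sketch: extract every "Size:N" token from the sample data.
# Assumptions: bash and GNU grep; a temp file stands in for the file "data".
set -euo pipefail

data_file="$(mktemp)"
trap 'rm -f "$data_file"' EXIT

# Sample lines copied from the question.
cat > "$data_file" <<'EOF'
 Sub: Size:14Val: 4644613 Some long string here
 Sub: Size:2Val: 19888493 Some other long string here
 Sub: Size:1Val: 6490281 Some other long string here1
 Sub: Size:1Val: 320829337 Some other long string here2
 Sub: Size:1Val: 50281086 Some other long string here3
 Sub: Size:1Val: 209077847 Some other long string here4
 Sub: Size:3Val: 320829337 Some other long string here2
 Sub: Size:3Val: 50281086 Some other long string here3
 Sub: Size:3Val: 209077847 Some other long string here4
EOF

# -o prints only the text matched by the pattern, one match per line,
# so "Val: ..." and the trailing string are dropped.
sizes="$(grep -o 'Size:[0-9]*' "$data_file")"
printf '%s\n' "$sizes"
```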

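Counting the distinct values and sorting by number of occurrences can be done with sort | uniq -c | sort -rn: the first sort puts identical lines next to each other so that uniq -c can count them, and sort -rn lists the highest counts first. To sort by value instead, change the first sort to compare the number after the colon numerically (-t : -k2,2n) and leave off the final sort -rn. A simple sed script then turns uniq's "count Size:value" lines into the value->count format you want. For (ii):

grep -o 'Size:[0-9]*' data |
sort -t : -k2,2n | uniq -c |
sed 's/^ *//;s/\([1-9][0-9]*\) Size:\([0-9]*\)/\2->\1/'

For (i), replace the middle line with sort | uniq -c | sort -rn |.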
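Below is a self-contained sketch that produces both orderings from the question's sample data, assuming bash, GNU coreutils (sort, uniq) and GNU sed. The sample lines go into a temporary file standing in for data; the variable names by_count, by_value and to_arrow are illustrative only. LC_ALL=C is set so that sort behaves the same regardless of the user's locale.

```shell
#!/usr/bin/env bash
# Sketch: count each Size value and print "value->count" in two orders.
# Assumptions: bash, GNU coreutils and GNU sed; a temp file stands in
# for the file "data"; variable names are illustrative.
set -euo pipefail
export LC_ALL=C   # make sort's comparisons locale-independent

data_file="$(mktemp)"
trap 'rm -f "$data_file"' EXIT
cat > "$data_file" <<'EOF'
 Sub: Size:14Val: 4644613 Some long string here
 Sub: Size:2Val: 19888493 Some other long string here
 Sub: Size:1Val: 6490281 Some other long string here1
 Sub: Size:1Val: 320829337 Some other long string here2
 Sub: Size:1Val: 50281086 Some other long string here3
 Sub: Size:1Val: 209077847 Some other long string here4
 Sub: Size:3Val: 320829337 Some other long string here2
 Sub: Size:3Val: 50281086 Some other long string here3
 Sub: Size:3Val: 209077847 Some other long string here4
EOF

# uniq -c prints "   COUNT Size:VALUE"; this sed script strips the leading
# blanks and rewrites the line as "VALUE->COUNT".
to_arrow='s/^ *//;s/\([1-9][0-9]*\) Size:\([0-9]*\)/\2->\1/'

# (i) By number of occurrences: group equal lines, count them,
#     then sort the counts numerically, largest first.
by_count="$(grep -o 'Size:[0-9]*' "$data_file" | sort | uniq -c | sort -rn | sed "$to_arrow")"

# (ii) By Size value, ascending: sort numerically on the field after ":",
#      which also keeps equal values adjacent for uniq -c.
by_value="$(grep -o 'Size:[0-9]*' "$data_file" | sort -t : -k2,2n | uniq -c | sed "$to_arrow")"

echo "(i) sorted by number of occurrences"
printf '%s\n' "$by_count"
echo "(ii) sorted by the value associated with Size:"
printf '%s\n' "$by_value"
```

Entries with equal counts (here 2->1 and 14->1) are ordered by sort's fallback comparison of the whole line, so add a secondary key if you need a specific tie order.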