I have a text file with content like this:

duration:       17100
series:         2016
episode:        58
modesizes:      original: hd1=9120MB,hd2=7543MB,sd1=4872MB,high1=2833MB,low1=634MB
runtime:        285


duration:       13740
series:         2016
episode:        59
modesizes:      original: hd1=9024MB,hd2=7203MB,sd1=5104MB,high1=2950MB,low1=570MB
runtime:        229

I would like to extract duration, episode and modesizes. Output should look like this:

13740,59,9024MB,7203MB,5104MB,2950MB,570MB
Comment: Why doesn't the first set of numbers feature in your output? And what have you tried? (Commented Sep 16, 2016 at 11:33)

3 Answers

With awk:

awk '/duration|episode/{printf "%s,", $2} /modesizes/{gsub(/[^=,]+=/,"",$3); print $3}' file

Explanation:

  • /duration|episode/ if the line contains duration or episode
    • printf "%s,", $2 print the second field (the value) followed by a comma, without a newline
  • /modesizes/ if the line contains modesizes
    • gsub(/[^=,]+=/,"",$3) remove each label and its equals sign (hd1=, hd2=, ...) from the third field
    • print $3 print the modified field, which also ends the output line

With your input example it prints:

17100,58,9120MB,7543MB,4872MB,2833MB,634MB
13740,59,9024MB,7203MB,5104MB,2950MB,570MB
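
Note that /duration|episode/ matches those words anywhere on a line. If that is a concern, a stricter variant can compare the first field instead. The following is only a minimal sketch, assuming the layout shown in the question; the function name extract_fields and the file name sample.txt are made up for illustration.

```shell
# Recreate the sample input from the question (sample.txt is a hypothetical name).
cat > sample.txt <<'EOF'
duration:       17100
series:         2016
episode:        58
modesizes:      original: hd1=9120MB,hd2=7543MB,sd1=4872MB,high1=2833MB,low1=634MB
runtime:        285


duration:       13740
series:         2016
episode:        59
modesizes:      original: hd1=9024MB,hd2=7203MB,sd1=5104MB,high1=2950MB,low1=570MB
runtime:        229
EOF

# Hypothetical helper: same logic as the one-liner above, but keyed on the
# exact label in the first field rather than a match anywhere on the line.
extract_fields() {
    awk '
        $1 == "duration:" || $1 == "episode:" { printf "%s,", $2 }
        $1 == "modesizes:" {
            gsub(/[^=,]+=/, "", $3)   # drop "hd1=", "hd2=", ... and keep the sizes
            print $3
        }
    ' "$1"
}

extract_fields sample.txt
```

On the sample this should print the same two lines as the one-liner.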
If your grep supports PCRE (the -P option, as in GNU grep):

$ grep -oP '(duration|episode):\s*\K\d+|\d+MB' ip.txt | pr -ats, -7
17100,58,9120MB,7543MB,4872MB,2833MB,634MB
13740,59,9024MB,7203MB,5104MB,2950MB,570MB
  • (duration|episode):\s*\K matches duration or episode followed by : and zero or more whitespace characters. \K then discards everything matched so far, so this part is not included in the output (the effect is similar to a positive lookbehind)
  • \d+ one or more digits
  • |\d+MB alternate pattern, one or more digits followed by MB

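grep -o prints each match on its own line. pr then arranges these lines into rows of 7 columns with , as the separator (-a fills the columns across each row, -t suppresses the page header and trailer).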
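
To see what each stage contributes, the pipeline can be run in two steps. This is a small sketch, assuming GNU grep with -P and a single record from the question fed on stdin; the variable name record is just for illustration.

```shell
# One record from the question, kept in a variable so the sketch is self-contained.
record='duration:       17100
series:         2016
episode:        58
modesizes:      original: hd1=9120MB,hd2=7543MB,sd1=4872MB,high1=2833MB,low1=634MB
runtime:        285'

# Stage 1: grep -o emits every match on its own line (7 lines for this record).
printf '%s\n' "$record" | grep -oP '(duration|episode):\s*\K\d+|\d+MB'

# Stage 2: pr regroups each run of 7 lines into one comma-separated row.
printf '%s\n' "$record" | grep -oP '(duration|episode):\s*\K\d+|\d+MB' | pr -ats, -7
```

The column count of 7 works because every record yields exactly seven matches (duration, episode and five sizes). A record with a different number of sizes would shift the rows.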

A sed solution (GNU sed, since it relies on \s):

sed -E -e \
    '/duration:/{
    N;N;N;N
    s/duration:\s*([0-9]*).*episode:\s*([0-9]*).*hd1=([0-9]*MB),hd2=([0-9]*MB),sd1=([0-9]*MB),high1=([0-9]*MB),low1=([0-9]*MB).*/\1,\2,\3,\4,\5,\6,\7/
}' < input_file

It outputs:

17100,58,9120MB,7543MB,4872MB,2833MB,634MB


13740,59,9024MB,7203MB,5104MB,2950MB,570MB

It preserves the empty lines between records.

If you don't want them, suppress automatic printing with -n and print each record explicitly:

sed -E -n -e \
   '/duration:/{
    N;N;N;N
    s/duration:\s*([0-9]*).*episode:\s*([0-9]*).*hd1=([0-9]*MB),hd2=([0-9]*MB),sd1=([0-9]*MB),high1=([0-9]*MB),low1=([0-9]*MB).*/\1,\2,\3,\4,\5,\6,\7/
    p
    d
}' < input_file
