count number of words between 2 fixed words

Question

I have a file as below

FHEAD
THEAD
TCUST
TITEM
TTEND
TTAIL
THEAD
TCUST
TCUST
TITEM
TITEM
TTEND
TTAIL
THEAD
TCUST
TITEM
TTEND
TTAIL
THEAD
TCUST
TCUST
TITEM
TTEND
TTAIL

I need to count the occurrences of TCUST records only, between THEAD and TTAIL. Where TCUST occurs more than once in such a block, I need to print those lines and the file name.

There will be multiple files so I need to print the filename as well.

Expected result is

THEAD TCUST TCUST TITEM TITEM TTEND TTAIL THEAD TCUST TCUST TITEM TTEND TTAIL name of file

What line do you want to print? Or is it the line number? Of THEAD, TCUST lines? Or the count of TCUST lines? Each count separately or as a total? An expected result would help. – Stéphane Chazelas, Commented Nov 25, 2016 at 12:21
Hi Stéphane, Thanks for your reply. I want to find all occurrences of TCUST records where it is more than 1 between THEAD and TTAIL records, then print that line from THEAD to TTAIL (with more than 1 TCUST record) and also print the filename – Amit, Commented Nov 25, 2016 at 12:37
Hi Sundeep- expected result is THEAD TCUST TCUST TTAIL THEAD TCUST TCUST TTAIL – Amit, Commented Nov 25, 2016 at 12:43
@Sundeep. Yes. I want to extract lines between THEAD and TTAIL if TCUST occurs more than once – Amit, Commented Nov 25, 2016 at 12:57
can you clarify: 1) the last line should be name of input file? and should that be printed only if there was at least one matching section? 2) can there be lines not matching TCUST between THEAD and TTAIL? – Sundeep, Commented Nov 25, 2016 at 13:03

Sundeep · Accepted Answer · 2016-11-26 13:39:51Z

$ awk '
  /THEAD/{f=1; c=0; a = $0; next}
  f{a = a ORS $0; if(/TCUST/) c++}
  /TTAIL/{f=0; if(c > 1){print a; m=1} }
  ENDFILE{if(m) print FILENAME; m=0}
  ' ip.txt
THEAD
TCUST
TCUST
TITEM
TITEM
TTEND
TTAIL
THEAD
TCUST
TCUST
TITEM
TTEND
TTAIL
ip.txt

/THEAD/{f=1; c=0; a = $0; next}
    Starting pattern: set the flag, initialize the counter, and save the current line for later printing.
f{a = a ORS $0; if(/TCUST/) c++}
    While the flag is set, accumulate input lines in the variable a and increment the counter if the line matches TCUST.
/TTAIL/{f=0; if(c > 1){print a; m=1} }
    Ending pattern: clear the flag. Print the contents of a if the counter is greater than 1, and set m to record that at least one match was found.
ENDFILE{if(m) print FILENAME; m=0}
    After all lines of a file are processed, print the input file name if m is set, then clear m before the next file is processed (thanks @Costas for pointing out the multiple-file requirement).

Note: ENDFILE is specific to GNU awk.

Thanks @Costas for a solution that does not depend on the GNU-specific ENDFILE:

$ awk '
  FNR==1{if(m) print fname; m=0; fname=FILENAME}
  /THEAD/{f=1; c=0; a = $0; next}
  f{a = a ORS $0; if(/TCUST/) c++}
  /TTAIL/{f=0; if(c > 1){print a; m=1} }
  END{if(m) print fname}
  ' *.txt
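To check how the portable version behaves with several files, here is a small self-contained sketch. The temporary directory, the file names a.txt and b.txt, and their contents are made up for illustration; only the awk program is taken from the answer above.

```shell
#!/bin/sh
# Demo of the portable (non-ENDFILE) awk program on two hypothetical files.
tmpdir=$(mktemp -d) || exit 1

# a.txt: the THEAD..TTAIL block has two TCUST records, so it should be reported
printf '%s\n' FHEAD THEAD TCUST TCUST TITEM TTEND TTAIL > "$tmpdir/a.txt"
# b.txt: the block has a single TCUST record, so nothing should be reported
printf '%s\n' FHEAD THEAD TCUST TITEM TTEND TTAIL > "$tmpdir/b.txt"

# Run from inside the directory so FILENAME is the bare file name
result=$(cd "$tmpdir" && awk '
  FNR==1{if(m) print fname; m=0; fname=FILENAME}
  /THEAD/{f=1; c=0; a = $0; next}
  f{a = a ORS $0; if(/TCUST/) c++}
  /TTAIL/{f=0; if(c > 1){print a; m=1} }
  END{if(m) print fname}
  ' a.txt b.txt)

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With these two inputs, the block from a.txt is printed followed by the line a.txt, and nothing is printed for b.txt. The name of a file is only emitted when the first line of the next file is read (or in END for the last file), which is why fname has to lag one file behind FILENAME.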

edited Nov 26, 2016 at 13:39

answered Nov 25, 2016 at 13:20

Sundeep

Hi Sundeep- Perfect.....works well !! Thanks for your help – Amit, Commented Nov 25, 2016 at 14:27
Should add FNR==1{m=0} if several files – Costas, Commented Nov 25, 2016 at 14:51
@Costas, thanks for pointing out fallacy for multiple files.. I have changed to ENDFILE which is gawk specific though.. – Sundeep, Commented Nov 25, 2016 at 15:06
For non-GNU awk you should add FNR==1{if(m) print fname; m=0; fname=FILENAME} together with END{if(m) print fname} – Costas, Commented Nov 26, 2016 at 11:25
@Sundeep - How to move the file (having multiple TCUST records) to some backup dir.. I am getting fatal division by 0 error – Amit, Commented Nov 29, 2016 at 11:06
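
For the follow-up about moving such files to a backup directory, one option is to let awk report only through its exit status whether a file contains a block with more than one TCUST record, and to do the move in the shell. This is a sketch and not part of the original answer: the function name has_multi_tcust, the backup directory, and the sample files are all hypothetical. The failing command is not shown in the comment, but one plausible cause of the error is an unquoted path inside the awk program, where something like backup/dir is parsed as a division of two unset variables.

```shell
#!/bin/sh
# Sketch: move files containing a THEAD..TTAIL block with more than one
# TCUST record into a backup directory (all names here are hypothetical).

# Exit status 0 if file "$1" has such a block, 1 otherwise.
has_multi_tcust() {
  awk '
    /THEAD/ { f = 1; c = 0; next }
    f && /TCUST/ { c++ }
    /TTAIL/ {
      if (f && c > 1) { found = 1; exit }   # exit jumps to END
      f = 0
    }
    END { exit !found }
  ' "$1"
}

workdir=$(mktemp -d) || exit 1
mkdir "$workdir/backup"
printf '%s\n' THEAD TCUST TCUST TITEM TTEND TTAIL > "$workdir/multi.txt"
printf '%s\n' THEAD TCUST TITEM TTEND TTAIL > "$workdir/single.txt"

# The path is handled by the shell, so awk never has to parse it.
for file in "$workdir"/*.txt; do
  if has_multi_tcust "$file"; then
    mv "$file" "$workdir/backup/"
  fi
done

moved=$(ls "$workdir/backup")
kept=$(cd "$workdir" && ls *.txt)
printf 'moved: %s\nkept: %s\n' "$moved" "$kept"
rm -rf "$workdir"
```

Keeping the file handling in the shell avoids quoting paths inside awk altogether; if a path must appear in the awk program, it has to be a quoted string such as "backup/dir".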

Costas · Accepted Answer · 2016-11-25 14:46:10Z

With GNU sed, the task can be done with:

sed -sn '
    /THEAD/{:1;N;/TTAIL/! b1} # collect lines from THEAD to TTAIL
    /TCUST.*TCUST/{p;h}       # print if there are two TCUST and copy to hold space
    ${x;//F}                  # on the last line, check hold space and print the file name if it held two TCUST
    ' file1 file2 …

answered Nov 25, 2016 at 14:46

Costas
