I'm trying to count the occurrences of a pattern built from parenthesized expressions that can nest recursively. In my particular case I want to count, per line or per file, the occurrences of (NP *) (VP *) (NP *). My example file is shown below (line 4 contains a recursive case):
$ more mini.example
<parse> (NP (NN opposition)) (VP et) (NP gouvernement) (NP (NN opposition)) (VP et) (NP gouvernement) (NP (NN opposition)) (VP et) (NP gouvernement) </parse>
<parse> (NP (NN opposition)) (XP et) (NP gouvernement) (NP (NN opposition)) (VP et) (NP gouvernement) (NP (NN opposition)) (VP et) (NP gouvernement) </parse>
<parse> (NP (NN opposition)) (VP et) (NP gouvernement) (NP (NN opposition)) (VP et) (NP gouvernement) </parse>
<parse> (NP (NN opposition)) (VP et) (NP gouvernement (NP (NN opposition)) (VP et) (NP gouvernement)) </parse>
<parse> (NP (NN opposition)) (VP et) (FP gouvernement) (NP (NN opposition)) (RP et) (NP gouvernement) </parse>
<parse> (NP (NN opposition)) (VP et) </parse>
<parse> (VP et) (NP gouvernement) </parse>
I would like output like this (number of occurrences, then line number):
3 1
2 2
2 3
2 4
0 5
0 6
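To pin down the counting rule behind these numbers, here is a minimal Python sketch. It is not a grep solution, only an illustration. It assumes that an occurrence means three adjacent sibling groups labelled NP, VP, NP at any nesting depth, and that a bare word between two groups breaks adjacency. The names `TOKEN`, `count_pattern` and `SAMPLE` are made up for this example.

```python
import re

# A token is "(", ")" or a run of characters that is neither whitespace nor a parenthesis.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def count_pattern(line, pattern=("NP", "VP", "NP")):
    """Count runs of adjacent sibling groups labelled like `pattern`, at any depth.

    Assumption: groups are siblings when they share the same enclosing group.
    """
    n = len(pattern)

    def windows(children):
        # Number of positions where n consecutive child labels equal the pattern.
        return sum(
            1
            for i in range(len(children) - n + 1)
            if tuple(children[i:i + n]) == pattern
        )

    total = 0
    # Stack of [label, child labels]; the bottom frame stands for the whole line.
    stack = [[None, []]]
    expect_label = False
    for tok in TOKEN.findall(line):
        if tok == "(":
            stack.append([None, []])
            expect_label = True
        elif tok == ")":
            expect_label = False
            if len(stack) > 1:  # ignore an unbalanced ")"
                label, children = stack.pop()
                total += windows(children)
                stack[-1][1].append(label)
        elif expect_label:
            stack[-1][0] = tok  # first word after "(" is the group label
            expect_label = False
        else:
            stack[-1][1].append(None)  # bare word: breaks adjacency
    # Count the top level, plus any groups left unclosed.
    total += sum(windows(children) for _, children in stack)
    return total


# First four lines of mini.example.
SAMPLE = [
    "<parse> (NP (NN opposition)) (VP et) (NP gouvernement) (NP (NN opposition)) (VP et) (NP gouvernement) (NP (NN opposition)) (VP et) (NP gouvernement) </parse>",
    "<parse> (NP (NN opposition)) (XP et) (NP gouvernement) (NP (NN opposition)) (VP et) (NP gouvernement) (NP (NN opposition)) (VP et) (NP gouvernement) </parse>",
    "<parse> (NP (NN opposition)) (VP et) (NP gouvernement) (NP (NN opposition)) (VP et) (NP gouvernement) </parse>",
    "<parse> (NP (NN opposition)) (VP et) (NP gouvernement (NP (NN opposition)) (VP et) (NP gouvernement)) </parse>",
]

for lineno, line in enumerate(SAMPLE, 1):
    print(count_pattern(line), lineno)
```

Under these assumptions the sketch gives 3, 2, 2 and 2 for the first four lines. On line 4 it counts one occurrence at the top level and one nested inside the last NP.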
I tried this:
$ grep -Pon '(?<=\(NP ).*(?=\).*(?<=\(VP ).*(?=\).*(?<=\(NP ).*(?=\))))' mini.example | cut -d : -f 1 | uniq -c | sort -k 1
But the output is:
1 1
1 2
1 4
1 5
1 6
This differs from the desired output. It seems to count only the first part of the pattern, even when the whole pattern does not match, and the recursive case is not handled. Thank you for any help.