Revisions to Text Processing - How to get pattern A matching line until first occurrence of pattern B matching line?

added 531 characters in body

edited Feb 9, 2018 at 3:58

POSIX `ex` to the rescue again!

ex is the POSIX-specified scriptable file editor. For anything involving backwards addressing, it's usually a far better solution than Awk or Sed.

The following one-liner works perfectly on your example_file2.txt:

printf '%s\n' 'g/AK5[*]R/?AK2?,.p' | ex example_file.txt

On your example_file.txt, it also works, but because the global command in ex can't write to a separate destination for each range acted upon, the desired two output files are merged like so:

AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5
AK2*777*69696969
AK3*J7*5**3
AK4*3*6969*4
AK5*R*5

However, this is easy enough to handle—with another POSIX tool, csplit, which is designed to split files according to a "context."

Portable POSIX solution:

patA='AK5[*]R'
patB='AK2'

printf '%s\n' "g/$patA/?$patB?,.p" |
  ex example_file.txt |
  csplit -f my_unique_prefix_ -n 1 -s -k - "/$patB/" '{999}'

for f in my_unique_prefix_*; do
  mv "$f" "e${f##my_unique_prefix_}.txt";
done

rm e0.txt
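
To see why the first piece gets deleted, here is a minimal sketch of just the csplit and rename steps, run on a small hand-made sample instead of real ex output. The `part_` prefix, the temporary directory, and the sample lines are assumptions for illustration only. The first AK2 line is also the first line of the input, so the piece numbered 0 (everything before the first match) comes out empty.

```shell
#!/bin/sh
# Sketch only: split merged records at each AK2 line, then rename the pieces.
# The part_ prefix and the sample data are placeholders, not from the question.
workdir=$(mktemp -d)

# Two merged records, shaped like the output of the ex step
printf '%s\n' \
  'AK2*777*7777777' 'AK3*S6*5**3' 'AK5*R*5' \
  'AK2*777*69696969' 'AK3*J7*5**3' 'AK5*R*5' > "$workdir/merged.txt"

# -k keeps the pieces even though csplit fails once /AK2/ stops matching,
# which happens long before 999 repetitions; that failure is expected here.
csplit -f "$workdir/part_" -n 1 -s -k "$workdir/merged.txt" '/AK2/' '{999}' \
  2>/dev/null || true

for f in "$workdir"/part_*; do
  # ${f##*part_} strips everything through the prefix, leaving only the digit
  mv "$f" "$workdir/e${f##*part_}.txt"
done

# Piece 0 holds whatever precedes the first AK2 line: nothing in this sample
rm "$workdir/e0.txt"
```

After this runs, `e1.txt` and `e2.txt` in the temporary directory each hold one record starting at its AK2 line.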

There is one final element to make this a perfect solution, which is to renumber the files in reverse order. I haven't done this portion.
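
One possible way to do that renumbering is sketched below. It is untested against the real data and assumes that "reverse order" means the last piece becomes e1.txt, that there are fewer than ten pieces (matching `-n 1`), and that the hypothetical `tmp_` prefix does not clash with existing files. The three sample files are made up for the demonstration.

```shell
#!/bin/sh
# Sketch only: reverse the numbering of e1.txt .. eN.txt (N < 10).
# The sample files and the tmp_ prefix are assumptions for illustration.
workdir=$(mktemp -d)
printf 'first\n'  > "$workdir/e1.txt"
printf 'second\n' > "$workdir/e2.txt"
printf 'third\n'  > "$workdir/e3.txt"

# Count the pieces
n=0
for f in "$workdir"/e[0-9].txt; do
  [ -e "$f" ] && n=$((n + 1))
done

# Pass 1: move each piece to a temporary name that carries its new number,
# so no file is overwritten while the numbers are being swapped
i=1
while [ "$i" -le "$n" ]; do
  mv "$workdir/e$i.txt" "$workdir/tmp_e$((n + 1 - i)).txt"
  i=$((i + 1))
done

# Pass 2: drop the temporary prefix
for f in "$workdir"/tmp_e*.txt; do
  mv "$f" "$workdir/${f##*/tmp_}"
done
```

The two-pass rename is there because swapping in place would clobber a file whenever the old and new numbers overlap.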

If you don't care about the file numbering being in the same order as the file, and you don't mind the .txt extension being omitted, the files being numbered from e01 rather than e1, and a diagnostic message reporting the size in bytes of each file, then we can simplify:

patA='AK5[*]R'
patB='AK2'

printf '%s\n' "g/$patA/?$patB?,.p" |
  ex example_file.txt |
  csplit -f e -k - "/$patB/" '{999}'

rm e00

added 531 characters in body

edited Feb 9, 2018 at 3:52

Wildcard

POSIX `ex` to the rescue again!

ex is the POSIX-specified scriptable file editor. For anything involving backwards addressing, it's usually a far better solution than Awk or Sed.

The following one-liner works perfectly on your example_file2.txt:

printf '%s\n' 'g/AK5[*]R/?AK2?,.p' | ex example_file.txt

On your example_file.txt, it also works, but because the global command in ex can't write to a separate destination for each range acted upon, the desired two output files are merged like so:

AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5
AK2*777*69696969
AK3*J7*5**3
AK4*3*6969*4
AK5*R*5

However, this is easy enough to handle with another POSIX tool, csplit, which is designed to split files according to a "context."

Portable POSIX solution:

printf '%s\n' 'g/AK5[*]R/?AK2?,.p' |
  ex example_file.txt |
  csplit -f my_unique_prefix_ -n 1 -s -k - '/AK2/' '{999}'

for f in my_unique_prefix_*; do
  mv "$f" "e${f##my_unique_prefix_}.txt";
done

rm e0.txt

answered Feb 9, 2018 at 3:39

Wildcard

POSIX `ex` to the rescue again!

The following one-liner works perfectly on your example_file2.txt:

printf '%s\n' 'g/AK5[*]R/?AK2?,.p' | ex example_file.txt

On your example_file.txt, it also works, but because the global command in ex can't write to a separate destination for each range acted upon, the desired two output files are merged like so:

AK2*777*7777777
AK3*S6*5**3
AK3*A2*5**3
AK4*3*6969*4
AK4*7*6969*4
AK5*R*5
AK2*777*69696969
AK3*J7*5**3
AK4*3*6969*4
AK5*R*5

However, this is easy enough to handle:
