0

I have a CSV file as below :

I want to remove the "INITIAL OFFER" block from this file and remain with only "FINAL OFFER" block I also want to remove comma(,) from the first field and also remove extra spaces from the last column, so as to make search on these columns easier.

INPUT

500076592,      INITIAL OFFER
500076592,|11|1|1|100 MB|2 Minutes|1.0 SAR
500076592,|11|2|3|300 MB|5 Minutes|3.0 SAR
500076592,|1|1|1|100 MB|NA|0.5 SAR
500076592,|1|2|3|300 MB|NA|1.5 SAR
500076592,|1|4|7|1000 MB|NA|5.0 SAR
500076592,|2|1|1|4096 MB|NA|1.5 SAR
500076592,|2|2|3|6144 MB|NA|2.0 SAR
500076592,|2|4|7|10240 MB|NA|4.0 SAR
500076592,|5|1|1|4096 MB|NA|2.0 SAR
500076592,|5|2|3|6144 MB|NA|2.5 SAR
500076592,|5|4|7|10240 MB|NA|5.0 SAR
500076592,|6|1|1|NA|2 Minutes|0.5 SAR
500076592,|6|2|3|NA|5 Minutes|1.5 SAR
500076592,|6|4|7|NA|10 Minutes|3.0 SAR
500076592,
500076592,|FINAL OFFER
500076592,|2|1|1|4096 MB|NA|1.5 SAR
500076592,|2|2|3|6144 MB|NA|2.0 SAR
500076592,|2|4|7|10240 MB|NA|4.0 SAR
500076592,|5|1|1|4096 MB|NA|2.0 SAR
500076592,|5|2|3|6144 MB|NA|2.5 SAR
500076592,|5|4|7|10240 MB|NA|5.0 SAR
500076592,|1|1|1|100 MB|NA|0.5 SAR
500076592,|1|2|3|300 MB|NA|1.5 SAR
500076592,|1|4|7|1000 MB|NA|5.0 SAR
500076592,|11|1|1|100 MB|2 Minutes|1.0 SAR
500076592,|11|2|3|300 MB|5 Minutes|3.0 SAR
500076592,|6|1|1|NA|2 Minutes|0.5 SAR
500076592,|6|2|3|NA|5 Minutes|1.5 SAR
500076592,|6|4|7|NA|10 Minutes|3.0 SAR
500076592,
500028952,      INITIAL OFFER
500028952,|11|1|1|250 MB|2 Minutes|3.0 SAR
500028952,|11|2|3|650 MB|10 Minutes|8.0 SAR
500028952,|11|4|7|1550 MB|30 Minutes|18.5 SAR
500028952,|1|1|1|250 MB|NA|2.5 SAR
500028952,|1|2|3|650 MB|NA|6.5 SAR
500028952,|1|4|7|1550 MB|NA|15.5 SAR
500028952,|2|1|1|4096 MB|NA|1.5 SAR
500028952,|2|2|3|6144 MB|NA|2.0 SAR
500028952,|2|4|7|10240 MB|NA|4.0 SAR
500028952,|5|1|1|4096 MB|NA|2.0 SAR
500028952,|5|2|3|6144 MB|NA|2.5 SAR
500028952,|5|4|7|10240 MB|NA|5.0 SAR
500028952,|6|1|1|NA|2 Minutes|0.5 SAR
500028952,|6|2|3|NA|10 Minutes|1.5 SAR
500028952,|6|4|7|NA|30 Minutes|3.0 SAR
500028952,
500028952,|FINAL OFFER
500028952,|2|1|1|4096 MB|NA|1.5 SAR
500028952,|2|2|3|6144 MB|NA|2.0 SAR
500028952,|2|4|7|10240 MB|NA|4.0 SAR
500028952,|1|1|1|250 MB|NA|2.5 SAR
500028952,|1|2|3|650 MB|NA|6.5 SAR
500028952,|1|4|7|1550 MB|NA|15.5 SAR
500028952,|11|1|1|250 MB|2 Minutes|3.0 SAR
500028952,|11|2|3|650 MB|10 Minutes|8.0 SAR
500028952,|11|4|7|1550 MB|30 Minutes|18.5 SAR
500028952,|5|1|1|4096 MB|NA|2.0 SAR
500028952,|5|2|3|6144 MB|NA|2.5 SAR
500028952,|5|4|7|10240 MB|NA|5.0 SAR
500028952,|6|1|1|NA|2 Minutes|0.5 SAR
500028952,|6|2|3|NA|10 Minutes|1.5 SAR
500028952,|6|4|7|NA|30 Minutes|3.0 SAR
500028952,

OUTPUT

500076592,|FINAL OFFER
500076592,|2|1|1|4096 MB|NA|1.5 SAR
500076592,|2|2|3|6144 MB|NA|2.0 SAR
500076592,|2|4|7|10240 MB|NA|4.0 SAR
500076592,|5|1|1|4096 MB|NA|2.0 SAR
500076592,|5|2|3|6144 MB|NA|2.5 SAR
500076592,|5|4|7|10240 MB|NA|5.0 SAR
500076592,|1|1|1|100 MB|NA|0.5 SAR
500076592,|1|2|3|300 MB|NA|1.5 SAR
500076592,|1|4|7|1000 MB|NA|5.0 SAR
500076592,|11|1|1|100 MB|2 Minutes|1.0 SAR
500076592,|11|2|3|300 MB|5 Minutes|3.0 SAR
500076592,|6|1|1|NA|2 Minutes|0.5 SAR
500076592,|6|2|3|NA|5 Minutes|1.5 SAR
500076592,|6|4|7|NA|10 Minutes|3.0 SAR
500028952,|FINAL OFFER
500028952,|2|1|1|4096 MB|NA|1.5 SAR
500028952,|2|2|3|6144 MB|NA|2.0 SAR
500028952,|2|4|7|10240 MB|NA|4.0 SAR
500028952,|1|1|1|250 MB|NA|2.5 SAR
500028952,|1|2|3|650 MB|NA|6.5 SAR
500028952,|1|4|7|1550 MB|NA|15.5 SAR
500028952,|11|1|1|250 MB|2 Minutes|3.0 SAR
500028952,|11|2|3|650 MB|10 Minutes|8.0 SAR
500028952,|11|4|7|1550 MB|30 Minutes|18.5 SAR
500028952,|5|1|1|4096 MB|NA|2.0 SAR
500028952,|5|2|3|6144 MB|NA|2.5 SAR
500028952,|5|4|7|10240 MB|NA|5.0 SAR
500028952,|6|1|1|NA|2 Minutes|0.5 SAR
500028952,|6|2|3|NA|10 Minutes|1.5 SAR
500028952,|6|4|7|NA|30 Minutes|3.0 SAR
500028952,
3
  • Why do you have a line with just 500028952, in the end of the second block? Should we also keep the 500076592, line from the first one? Commented May 6, 2021 at 10:43
  • its of no need... i will remove this Commented May 6, 2021 at 10:45
  • If it isn't a problem, it is actually easier if we can keep it. Just make sure that the two blocks are consistent. Commented May 6, 2021 at 10:49

4 Answers 4

2
sed -e '/FINAL OFFER/p;/INITIAL OFFER/,/FINAL OFFER/ d' input.csv  > output.csv

This prints the FINAL OFFER line again, because it's about to be deleted by the /INITIAL OFFER/,/FINAL OFFER/ range.

1
  • note: if there happens to be a FINAL OFFER line without a preceding INITIAL OFFER, this will add a spurious FINAL OFFER line to the output. If there are no other duplicate lines, you can fix this by piping to uniq. If there are dupe lines that you need to keep, it becomes a more complicated job and awk or perl would be better suited than sed. Commented May 6, 2021 at 11:00
1

If you use a pipe as a separator the data becomes easy to filter with awk based on the number of fields, e.g.:

awk -F'|' 'NF==2 { f=1 } NF==1 { f=0 } f' infile

Golfed:

awk -F\| 'NF==1{f=0}NF==2{f=1}f'
1
  • +1. or awk '/INITIAL OFFER/ {p=0};/FINAL OFFER/ {p=1};p' Commented May 6, 2021 at 11:52
1

You can use sed to delete everything between INITIAL OFFER and a line that has nothing but numbers and one comma:

$ sed '/INITIAL OFFER/,/^[0-9][0-9]*,$/d' file
500076592,|FINAL OFFER
500076592,|2|1|1|4096 MB|NA|1.5 SAR
500076592,|2|2|3|6144 MB|NA|2.0 SAR
500076592,|2|4|7|10240 MB|NA|4.0 SAR
500076592,|5|1|1|4096 MB|NA|2.0 SAR
500076592,|5|2|3|6144 MB|NA|2.5 SAR
500076592,|5|4|7|10240 MB|NA|5.0 SAR
500076592,|1|1|1|100 MB|NA|0.5 SAR
500076592,|1|2|3|300 MB|NA|1.5 SAR
500076592,|1|4|7|1000 MB|NA|5.0 SAR
500076592,|11|1|1|100 MB|2 Minutes|1.0 SAR
500076592,|11|2|3|300 MB|5 Minutes|3.0 SAR
500076592,|6|1|1|NA|2 Minutes|0.5 SAR
500076592,|6|2|3|NA|5 Minutes|1.5 SAR
500076592,|6|4|7|NA|10 Minutes|3.0 SAR
500076592,
500028952,|FINAL OFFER
500028952,|2|1|1|4096 MB|NA|1.5 SAR
500028952,|2|2|3|6144 MB|NA|2.0 SAR
500028952,|2|4|7|10240 MB|NA|4.0 SAR
500028952,|1|1|1|250 MB|NA|2.5 SAR
500028952,|1|2|3|650 MB|NA|6.5 SAR
500028952,|1|4|7|1550 MB|NA|15.5 SAR
500028952,|11|1|1|250 MB|2 Minutes|3.0 SAR
500028952,|11|2|3|650 MB|10 Minutes|8.0 SAR
500028952,|11|4|7|1550 MB|30 Minutes|18.5 SAR
500028952,|5|1|1|4096 MB|NA|2.0 SAR
500028952,|5|2|3|6144 MB|NA|2.5 SAR
500028952,|5|4|7|10240 MB|NA|5.0 SAR
500028952,|6|1|1|NA|2 Minutes|0.5 SAR
500028952,|6|2|3|NA|10 Minutes|1.5 SAR
500028952,|6|4|7|NA|30 Minutes|3.0 SAR
500028952,

If you don't want the 500076592, and 500028952, lines to be included, either use @cas's simpler approach, or you can do this:

$ sed '/INITIAL OFFER/,/^[0-9][0-9]*,$/d; /^[0-9][0-9]*,$/d' file
500076592,|FINAL OFFER
500076592,|2|1|1|4096 MB|NA|1.5 SAR
500076592,|2|2|3|6144 MB|NA|2.0 SAR
500076592,|2|4|7|10240 MB|NA|4.0 SAR
500076592,|5|1|1|4096 MB|NA|2.0 SAR
500076592,|5|2|3|6144 MB|NA|2.5 SAR
500076592,|5|4|7|10240 MB|NA|5.0 SAR
500076592,|1|1|1|100 MB|NA|0.5 SAR
500076592,|1|2|3|300 MB|NA|1.5 SAR
500076592,|1|4|7|1000 MB|NA|5.0 SAR
500076592,|11|1|1|100 MB|2 Minutes|1.0 SAR
500076592,|11|2|3|300 MB|5 Minutes|3.0 SAR
500076592,|6|1|1|NA|2 Minutes|0.5 SAR
500076592,|6|2|3|NA|5 Minutes|1.5 SAR
500076592,|6|4|7|NA|10 Minutes|3.0 SAR
500028952,|FINAL OFFER
500028952,|2|1|1|4096 MB|NA|1.5 SAR
500028952,|2|2|3|6144 MB|NA|2.0 SAR
500028952,|2|4|7|10240 MB|NA|4.0 SAR
500028952,|1|1|1|250 MB|NA|2.5 SAR
500028952,|1|2|3|650 MB|NA|6.5 SAR
500028952,|1|4|7|1550 MB|NA|15.5 SAR
500028952,|11|1|1|250 MB|2 Minutes|3.0 SAR
500028952,|11|2|3|650 MB|10 Minutes|8.0 SAR
500028952,|11|4|7|1550 MB|30 Minutes|18.5 SAR
500028952,|5|1|1|4096 MB|NA|2.0 SAR
500028952,|5|2|3|6144 MB|NA|2.5 SAR
500028952,|5|4|7|10240 MB|NA|5.0 SAR
500028952,|6|1|1|NA|2 Minutes|0.5 SAR
500028952,|6|2|3|NA|10 Minutes|1.5 SAR
500028952,|6|4|7|NA|30 Minutes|3.0 SAR
0

Using GNU sed with extended regex mode turned ON -E

sed -En '
  /^[^|]*\|?[^|]*$/h
  G;/\n.*\|/P
' file

Note:

  • Stop 🛑 printing when you see a non pipe record.
  • Start printing when you see a record with exactly one pipe.

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.