I have a file with the following data:

"MG1507XXXXXX|" "|020000XXXXXX" "20261031|"     "|3,827.92"     "|3,581.41"     "|542,729.62"   "MBA"
"MG1507XXXXXX|" "|020000XXXXXX" "20261130|"     "|3,680.15"     "|3,729.18"     "|539,000.44"   "MBA"
"MG1507XXXXXX|" "|020000XXXXXX" "20261231|"     "|3,776.70"     "|3,632.63"     "|535,367.81"   "MBA"
"MG1507XXXXXX|" "|020000XXXXXX" "20270131|"     "|3,751.24"     "|3,658.09"     "|531,709.72"   "MBA"
"MG1507XXXXXX|" "|020000XXXXXX" "20270228|"     "|3,365.07"     "|4,044.26"     "|527,665.46"   "MBA"
"MG1507XXXXXX|" "|020000XXXXXX" "20270331|"     "|3,697.28"     "|3,712.05"     "|523,953.41"   "MBA"
"MG1507XXXXXX|" "|020000XXXXXX" "20270430|"     "|3,552.84"     "|3,856.49"     "|520,096.92"   "MBA"
"MG1507XXXXXX|" "|020000XXXXXX" "20270531|"     "|3,644.24"     "|3,765.09"     "|516,331.83"   "MBA"
"MG1507XXXXXX|" "|020000XXXXXX" "20270630|"     "|3,501.16"     "|3,908.17"     "|512,423.66"   "MBA"
"MG1507XXXXXX|" "|020000XXXXXX" "20270731|"     "|3,590.47"     "|3,818.86"     "|508,604.80"   "MBA"
"MG1507XXXXXX|" "|020000XXXXXX" "20270831|"     "|3,563.72"     "|3,845.61"     "|504,759.19"   "MBA"
"MG1507XXXXXX|" "|020000XXXXXX" "20270930|"     "|3,422.68"     "|3,986.65"     "|500,772.54"   "MBA"
"MG1507XXXXXX|" "|020000XXXXXX" "20271031|"     "|3,508.84"     "|3,900.49"     "|496,872.05"   "MBA"

However, I want to change it so that it looks like this:

MG1507XXXXXX|020000XXXXXX|20261031|3,827.92|3,581.41|542,729.62|MBA|
MG1507XXXXXX|020000XXXXXX|20261130|3,680.15|3,729.18|539,000.44|MBA|
MG1507XXXXXX|020000XXXXXX|20261231|3,776.70|3,632.63|535,367.81|MBA|
MG1507XXXXXX|020000XXXXXX|20270131|3,751.24|3,658.09|531,709.72|MBA|
MG1507XXXXXX|020000XXXXXX|20270228|3,365.07|4,044.26|527,665.46|MBA|
MG1507XXXXXX|020000XXXXXX|20270331|3,697.28|3,712.05|523,953.41|MBA|
MG1507XXXXXX|020000XXXXXX|20270430|3,552.84|3,856.49|520,096.92|MBA|
MG1507XXXXXX|020000XXXXXX|20270531|3,644.24|3,765.09|516,331.83|MBA|
MG1507XXXXXX|020000XXXXXX|20270630|3,501.16|3,908.17|512,423.66|MBA|
MG1507XXXXXX|020000XXXXXX|20270731|3,590.47|3,818.86|508,604.80|MBA|
MG1507XXXXXX|020000XXXXXX|20270831|3,563.72|3,845.61|504,759.19|MBA|
MG1507XXXXXX|020000XXXXXX|20270930|3,422.68|3,986.65|500,772.54|MBA|
MG1507XXXXXX|020000XXXXXX|20271031|3,508.84|3,900.49|496,872.05|MBA|

I am not sure what tool to use to achieve this. Any ideas?

5 Answers

You could translate all blanks (spaces and tabs) and double quotes to | (squeezing repeats), then cut from the 2nd character to the end of the line:

tr -s '[[:blank:]"]' \| <infile | cut -c2-
  • Could you explain how cut -c2- acts like cut --complement -c1? Commented Aug 25, 2015 at 14:41
  • @Fiximan: -c2- is standard, i.e. -c followed by a LIST, in this case a single range N-, meaning from the N'th character to the end of the line. --complement is a GNU extension: select for printing the complement of the characters selected with the ‘-c’ option. In other words, do not print the characters specified via that option. Commented Aug 25, 2015 at 15:27
  • I wasn't aware one could use half-bounded intervals for ranges in cut - thanks. Commented Aug 25, 2015 at 15:34
  • Why not tr -s '[[:blank:]"]' \| <infile ? Commented Aug 25, 2015 at 16:08
  • @Costas - lol, because too tired I guess, thanks for the heads-up ! Commented Aug 25, 2015 at 16:25
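To see what each half of the pipeline does, here is a minimal sketch assuming a POSIX shell with the usual tr and cut. The two sample lines are copied from the question, and sample, step1 and convert are illustrative helper names, not part of the answer. The tr output alone still starts with the | produced by the opening double quote; cut -c2- then removes it.

```shell
#!/bin/sh
# Illustrative helper: two sample lines copied from the question.
sample() {
  printf '%s\n' \
    '"MG1507XXXXXX|" "|020000XXXXXX" "20261031|"     "|3,827.92"     "|3,581.41"     "|542,729.62"   "MBA"' \
    '"MG1507XXXXXX|" "|020000XXXXXX" "20261130|"     "|3,680.15"     "|3,729.18"     "|539,000.44"   "MBA"'
}

# Step 1 (the tr part of the answer): blanks and double quotes become "|",
# and -s squeezes each run of "|" down to one. Lines now start with "|".
step1() { tr -s '[[:blank:]"]' \|; }

# Step 2 (the cut part): keep character 2 through end of line,
# which drops that leading "|".
convert() { step1 | cut -c2-; }

echo "after tr:"
sample | step1
echo "after tr | cut:"
sample | convert
```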

Assuming your data is in a file named 'data' and each quoted field is separated from the next by exactly one space:

sed -e 's/^"//' -e 's/|" "|/|/g' -e 's/" "|/|/g' -e 's/" "/|/g' -e 's/"$/|/' data
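To run these expressions on the question's sample, where some fields are separated by several spaces, one option (assuming the separators are spaces, not tabs) is to squeeze each run of spaces to a single space with tr -s ' ' first. A minimal sketch; fix and line are illustrative names:

```shell
#!/bin/sh
# The five sed expressions from this answer, unchanged.
fix() {
  sed -e 's/^"//' -e 's/|" "|/|/g' -e 's/" "|/|/g' -e 's/" "/|/g' -e 's/"$/|/'
}

# One line copied from the question; some fields are separated by runs of spaces.
line='"MG1507XXXXXX|" "|020000XXXXXX" "20261031|"     "|3,827.92"     "|3,581.41"     "|542,729.62"   "MBA"'

# tr -s ' ' squeezes each run of spaces to one space, so the '" "' patterns match.
# prints: MG1507XXXXXX|020000XXXXXX|20261031|3,827.92|3,581.41|542,729.62|MBA|
printf '%s\n' "$line" | tr -s ' ' | fix
```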
sed -i 's/\"//g' filename

You can escape the " character by putting a \ in front of it. If you want to remove all spaces as well, use:

sed -i 's/[" ]//g' filename
  • No. You don't want to escape ". " is not special in sed. On the other hand, \" may be special with some sed implementations. For instance, \' is special in GNU sed (it means end-of-string even when the m flag is used). Commented Aug 25, 2015 at 14:13
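Because -i rewrites the file in place, it can be worth previewing the effect first by running the same expressions from stdin to stdout. A minimal sketch on a shortened line from the question (line is an illustrative variable), written with a plain " as the comment above suggests:

```shell
#!/bin/sh
# First three fields of a line from the question.
line='"MG1507XXXXXX|" "|020000XXXXXX" "20261031|"'

# Same substitutions as this answer, but without -i: read stdin, write stdout.
printf '%s\n' "$line" | sed 's/"//g'      # prints: MG1507XXXXXX| |020000XXXXXX 20261031|
printf '%s\n' "$line" | sed 's/[" ]//g'   # prints: MG1507XXXXXX||020000XXXXXX20261031|
```

On this sample, neither command yields the MG1507XXXXXX|020000XXXXXX|20261031| form the question asks for: the first leaves spaces between fields, and the second runs 020000XXXXXX and 20261031 together.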

Try this:

sed -e 's/["| ]\+/|/g' -e 's/^|//' < file

The first expression replaces each run of one or more |, ", or space characters with a single |. The second removes the | that is left at the start of each line.
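In a basic regular expression, \+ is a GNU extension, so other sed implementations (BSD sed, for example) may not treat it as "one or more". A minimal sketch with the portable spelling, the bracket expression followed by a copy of itself with *, run on two sample lines copied from the question (sample and convert are illustrative names):

```shell
#!/bin/sh
# Illustrative helper: two sample lines copied from the question.
sample() {
  printf '%s\n' \
    '"MG1507XXXXXX|" "|020000XXXXXX" "20261031|"     "|3,827.92"     "|3,581.41"     "|542,729.62"   "MBA"' \
    '"MG1507XXXXXX|" "|020000XXXXXX" "20261130|"     "|3,680.15"     "|3,729.18"     "|539,000.44"   "MBA"'
}

# ["| ]["| ]* matches one or more of |, " or space (portable BRE for ["| ]\+).
# The second expression removes the "|" left at the start of each line.
convert() { sed -e 's/["| ]["| ]*/|/g' -e 's/^|//'; }

sample | convert
```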

Using awk:

awk ' BEGIN { FS="[|\" ]+" ; OFS="|" } { print $2,$3,$4,$5,$6,$7,$8"|" } ' file

Explanation:

BEGIN { FS="[|\" ]+" ; OFS="|" } runs once, before any input is read, and sets:

FS="[|\" ]+": fields are separated by any run of one or more (+) characters from the bracket expression ([]): pipe, double quote (escaped as \" inside the awk string) and space.

OFS="|": separate the output fields with pipes.

print $2,$3,$4,$5,$6,$7,$8"|" prints fields 2 through 8 followed by a trailing pipe. The numbering is shifted by one because each line starts with a double quote, which makes the first field an empty string and moves every other field one position to the right.
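A sketch that makes the shifted numbering visible and avoids hard-coding seven columns, assuming every line both starts and ends with a double quote (so the first and last fields are empty). It loops over fields 2 to NF-1; sample and convert are illustrative helpers, and the two sample lines are copied from the question:

```shell
#!/bin/sh
# Illustrative helper: two sample lines copied from the question.
sample() {
  printf '%s\n' \
    '"MG1507XXXXXX|" "|020000XXXXXX" "20261031|"     "|3,827.92"     "|3,581.41"     "|542,729.62"   "MBA"' \
    '"MG1507XXXXXX|" "|020000XXXXXX" "20261130|"     "|3,680.15"     "|3,729.18"     "|539,000.44"   "MBA"'
}

# Show how FS splits the first line: $1 and $NF are empty strings.
# prints: NF=9 first=[] last=[]
sample | awk 'BEGIN { FS = "[|\" ]+" }
  NR == 1 { printf "NF=%d first=[%s] last=[%s]\n", NF, $1, $NF }'

# Print fields 2 .. NF-1, each followed by "|", whatever the column count.
convert() {
  awk 'BEGIN { FS = "[|\" ]+" } { for (i = 2; i < NF; i++) printf "%s|", $i; print "" }'
}

sample | convert
```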