
I have a large CSV file. One of the fields contains an error: a stray line break, so a single record is split across two lines in the file.

Until now I've been using Notepad++ with this search-and-replace to fix the problem:

\r";" => ";"

How can I do the same with sed?

I've already tried:

sed -i 's/\r";"/";"/g' /path/file.csv
sed -i 's/^";"/";"/g' /path/file.csv

Neither worked. Does anyone here know the right command?

  • It would help if you showed part of the original text, with a few lines before and after the erroneous line. Commented Nov 30, 2014 at 11:55
  • Is it the CR character (0x0D) or the two characters backslash and r? Commented Nov 30, 2014 at 12:07
  • For me sed -i 's/\r";"/";"/g' works (GNU sed 4.2.2). You should prepare a file with a short test line and give us the exact file content with od -t c -t x1 file. Commented Nov 30, 2014 at 12:08
  • tr -d "\r" will delete the carriage returns. Commented Nov 30, 2014 at 13:10
  • This should be on only one line: "1289665","first name","JSKTRADES ","2014-02-24 06:44:56","0","JSK International Trading Company","". At the end of the third row's content I get a carriage return. Commented Nov 30, 2014 at 17:54
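Following the od suggestion in the comments, here is a small sketch of how to check which case applies. The record is made up (it uses the ";" separator from the Notepad++ pattern in the question), and sample_broken_record is just an illustrative helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a made-up record split by a CR LF after the third field.
sample_broken_record() {
  printf '"1289665";"first name";"JSKTRADES \r\n";"2014-02-24 06:44:56";"0"\n'
}

# -t c shows each byte as a character (a real carriage return appears as \r),
# -t x1 shows its hex value (0d for CR, 5c 72 for a literal backslash and r).
sample_broken_record | od -t c -t x1

# Number of lines containing a real CR character (bash $'\r' quoting).
sample_broken_record | grep -c $'\r'
```

Run the same two commands on a few lines of the real file (for example via head) to see what the break actually consists of.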

4 Answers


It is important to understand that sed works on a line-by-line basis. Basically, sed reads a line into its buffer (the pattern space) without the trailing newline, executes your commands on the buffer, prints the buffer (unless you specified the -n flag), reads the next line into its buffer, and so on. So merging two lines with sed requires explicitly making sed handle more than one line at a time. For that, the N, P and D commands are your friends.
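To see what N does to the buffer, sed's l command (which prints the pattern space unambiguously, showing an embedded newline as \n and marking the end with $) can be run on a two-line input. A minimal sketch, assuming GNU sed:

```shell
#!/usr/bin/env bash
# Without N, sed handles each line in its own cycle: two separate buffers.
printf 'first\nsecond\n' | sed -n 'l'
# -> first$
# -> second$

# With N, the second line is appended to the buffer after an embedded
# newline, so a single cycle sees both lines at once.
printf 'first\nsecond\n' | sed -n 'N;l'
# -> first\nsecond$
```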

Now, for your specific problem, giving you a specific, tested answer would require a sample of your actual input, but here are some examples of what can be done:

This merges every two lines into one:

sed $'N;s/[\\n\r]//g'

or, if you are sure you always have \r\n line endings:

sed 'N;s/.\n//'
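The difference between the two pair-merging commands shows up on a small CR LF sample. This is a sketch assuming GNU sed (matching a newline with [\n] inside a bracket expression is a GNU feature); the input is made up and crlf_sample is an illustrative helper name:

```shell
#!/usr/bin/env bash
# Made-up input: four lines with Windows-style CR LF endings.
crlf_sample() { printf 'a\r\nb\r\nc\r\nd\r\n'; }

# Variant 1 (bash $'...' quoting): join each pair and delete every CR and LF
# inside the joined pair. Result: "ab" and "cd", with plain LF endings.
crlf_sample | sed $'N;s/[\\n\r]//g' | od -c

# Variant 2: delete the character before the embedded newline (the CR) and
# the newline itself. The CR at the end of the second line of each pair
# survives, so the output keeps CR LF endings: "ab\r" and "cd\r".
crlf_sample | sed 'N;s/.\n//' | od -c
```

With an odd number of lines, GNU sed prints the unpaired last line as is, while a strictly POSIX sed may drop it.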

For an approach more tailored to what I understood of your question (although not the best solution), this should do the job, provided you use bash or another shell that supports C-style escapes via the $'str' construct:

sed $':l;N;/\r\\n";"/{;s/\r\\n";"/";"/g;n;};bl'

or, without the C-style escape construct, if you have \r\n line endings (non-negotiable):

sed ':l;N;/\n";"/{;s/.\n";"/";"/g;n;};bl'

What it does is basically append the next line to its buffer (N) and test for the string you want (/\r\\n";"/). The script loops (bl branches back to the label :l defined at the beginning) as long as it doesn't find a match. When a match is found, it executes the commands between the curly braces: it replaces all occurrences of \r\\n";" with ";" (s/\r\\n";"/";"/g), then prints the buffer and reads the next line into it (n).
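For readability, the one-liner for \r\n line endings can also be written as a multi-line script with comments. This is a sketch assuming GNU sed (which accepts the comments and indentation used here); join_split_records is a hypothetical wrapper name, and it is meant to behave like the one-liner:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the loop script; reads the named files or stdin.
join_split_records() {
  sed '
    :l
    # Append the next input line to the buffer.
    N
    # Act once some line in the buffer starts with ";"
    /\n";"/{
      # Drop the character (the CR) and the newline in front of each such line.
      s/.\n";"/";"/g
      # Print the buffer and load the next line.
      n
    }
    # Otherwise keep accumulating lines.
    bl
  ' "$@"
}
```

Usage would be something like join_split_records /path/file.csv >fixed.csv.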

Of course, if the file is big and the "errors" are infrequent, this could run for a long time and use a lot of memory, since every line between two matches is held in the buffer. If that is the case, another algorithm could be used, but I would need a better example of what you are up against to be sure that I understood your problem correctly.
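One possible alternative, sketched here assuming GNU sed and breaks that consist of CR LF, keeps at most the current record plus one look-ahead line in the buffer, using the N, P and D commands. Unlike the scripts above, it only joins when the character before the newline is a CR. join_split_records_window is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sliding window (GNU sed):
#   :a      loop label
#   $!N     append the next line, unless the last line has already been read
#   s/.../  if the buffer is "...CR" LF followed by ";", join the two lines
#   ta      after a join, loop to look at the following line too
#   P       otherwise print up to the first newline (the finished record)
#   D       delete it and restart with the look-ahead line, without reading input
join_split_records_window() {
  sed -e ':a' \
      -e '$!N' \
      -e 's/\r\n";"/";"/' \
      -e 'ta' \
      -e 'P' \
      -e 'D' "$@"
}
```

Because each cycle prints a finished record before moving on, memory use no longer grows with the distance between two broken records.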

Also, if you would like to learn a little more about sed, I strongly recommend this site: it may not have the best background color, but IMO it is the best sed tutorial out there.


If you just want to drop the \r characters, the tr filter is simpler:

cat file.csv | tr -d '\r' >newfile.csv

or directly:

tr -d '\r' <file.csv >newfile.csv

man tr is your friend. Caveat: tr is meant to be used as a filter reading from its standard input and it cannot process a file in-place like sed -i.
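Since tr cannot edit in place, a common workaround is to write to a temporary file and move it over the original only if tr succeeded. A minimal sketch; strip_cr_in_place is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove every CR from a file, replacing it "in place".
strip_cr_in_place() {
  local file=$1 tmp
  # Create the temporary file next to the original so mv is a simple rename.
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  if tr -d '\r' <"$file" >"$tmp"; then
    mv -- "$tmp" "$file"
  else
    rm -f -- "$tmp"
    return 1
  fi
}
```

Note that after the mv the file has the temporary file's permissions (mktemp typically creates it readable and writable by the owner only), so adjust them afterwards if that matters.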


If you can live with a perl solution:

perl -pe 's/\r";"/";"/g' foo.csv >foo_r.csv

I had a similar problem to solve, but I ended up using a slightly different version of @Fjor's answer:

cat file.csv | tr -d '\n'

(tr stands for TRanslate: it normally replaces characters from one set with the corresponding characters from another, but with -d it simply deletes every occurrence of each character in the single-quoted set.)

I would have offered it as a comment on Fjor's answer if I had the reputation. Oh well, here it is anyway.

  • I don't see how this is different from @Fjor's answer, other than that you're using cat instead of reading directly from stdin. It qualifies as a useless use of cat. Also here. Commented Nov 26, 2022 at 16:32
  • The difference is just '\n' instead of '\r' Commented Dec 5, 2022 at 0:36
  • It depends only on the line endings used by the specific OS, which differ between Windows, Linux and macOS (CR LF, \r\n, on Windows; LF, \n, on Linux and modern macOS; CR, \r, on classic Mac OS), so we must use the literal representation of the unwanted character in tr -d to get rid of it [CR and LF are the ASCII characters Carriage Return (decimal 13) and Line Feed (decimal 10)]. Commented Apr 24, 2024 at 23:56
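The difference between the two tr answers is easy to see on a small made-up CR LF sample (crlf_lines is an illustrative helper name): tr -d '\r' removes only the carriage returns, while tr -d '\n' removes every line feed, which joins the whole input into one line rather than just the broken record.

```shell
#!/usr/bin/env bash
# Made-up input: two lines with CR LF endings.
crlf_lines() { printf 'a\r\nb\r\n'; }

# Deleting CR keeps one record per line: a LF b LF.
crlf_lines | tr -d '\r' | od -c

# Deleting LF joins everything into a single line and keeps the CRs: a CR b CR.
crlf_lines | tr -d '\n' | od -c
```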
