16

I have a very large CSV file. How would you remove the very last comma in the file with sed (or a similar tool)?

...
[11911,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11913,0,"BUILDER","2014-10-15","BUILDER",0,0],
]

Desired output

...
[11911,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11913,0,"BUILDER","2014-10-15","BUILDER",0,0]
]

The following sed command deletes the last comma on each line, but I want to remove it only once, for the whole file.

sed -e 's/,$//' foo.csv

Nor does this work

sed '$s/,//' foo.csv
  • Is the comma always on the second-to-last line? Commented Oct 15, 2014 at 23:42
  • Yes, the second to last line Commented Oct 15, 2014 at 23:43
  • This looks like broken JSON. The correct solution would be to fix the program that generates the JSON, not to post-process the document with tools that aren't meant to be used to modify structured document formats. Commented Jul 11, 2021 at 19:00

8 Answers

16

Using awk

If the comma is always at the end of the second-to-last line:

$ awk 'NR>2{print a;} {a=b; b=$0} END{sub(/,$/, "", a); print a;print b;}'  input
[11911,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11913,0,"BUILDER","2014-10-15","BUILDER",0,0]
]
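
To see what the one-liner is doing, here is the same two-line-buffer idea spelled out with comments (just an illustrative sketch; any POSIX awk should behave the same way):

awk '
    NR > 2 { print a }       # print the line that is about to leave the two-line buffer
    { a = b; b = $0 }        # keep the last two lines seen in a and b
    END {
        sub(/,$/, "", a)     # a is now the second-to-last line: drop its trailing comma
        print a
        print b              # b is the last line (the closing ])
    }
' input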

Using awk and bash

$ awk -v "line=$(($(wc -l <input)-1))" 'NR==line{sub(/,$/, "")} 1'  input
[11911,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11913,0,"BUILDER","2014-10-15","BUILDER",0,0]
]
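
If you would rather not read the line count with wc, a two-pass awk variant (not from the answer above, just a sketch of the same idea) reads the file twice and edits line n-1 on the second pass:

awk 'NR == FNR { n = NR; next }      # first pass: count the lines
     FNR == n - 1 { sub(/,$/, "") }  # second pass: strip the trailing comma on line n-1
     { print }                       # second pass: print every line
' input input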

Using sed

$ sed 'x;${s/,$//;p;x;};1d'  input
[11911,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11913,0,"BUILDER","2014-10-15","BUILDER",0,0]
]

For OSX and other BSD platforms, try:

sed -e x -e '$ {s/,$//;p;x;}' -e 1d  input
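
The hold-space trick is terse, so here is the same logic as a commented sed script file (a sketch; the file name lastcomma.sed is just for illustration, and comment lines starting with # are accepted by both GNU and BSD sed):

# lastcomma.sed - remove the trailing comma from the second-to-last line
# Swap pattern space and hold space, so every line is printed one cycle late.
x
# On the last line the pattern space holds the second-to-last line:
# strip its trailing comma, print it, then swap back so the last line is auto-printed.
$ {
    s/,$//
    p
    x
}
# The very first swap surfaces an empty hold space; drop that empty line.
1d

Run it with sed -f lastcomma.sed input.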

Using bash

# Keep a two-line buffer: a holds the next-to-last line read so far, b the last.
while IFS= read -r line
do
    [ "$a" ] && printf "%s\n" "$a"   # print lines once they fall out of the buffer
    a=$b
    b=$line
done <input
printf "%s\n" "${a%,}"               # second-to-last line, trailing comma stripped
printf "%s\n" "$b"                   # last line (the closing ])
  • Maybe its because I'm on a mac, but the sed command gives error sed: 1: "x;${s/,$//;p;x}; 2,$ p": extra characters at the end of x command Commented Oct 16, 2014 at 0:15
  • @spuder Yes, OSX has BSD sed and it is often different in subtle ways. I don't have access to OSX to test this, but please try sed -n -e x -e '${s/,$//;p;x;}' -e '2,$ p' input Commented Oct 16, 2014 at 2:19
  • Yes, that second one worked on Mac Commented Oct 16, 2014 at 3:35
  • sed 'x;${s/,$//;p;x;};1d' input - this didn't work for me on ubuntu - I am trying to do a similar thing but for semicolons instead of commas. FYI, the perl solution from Avinash Raj below did work when simply converting the commas to semicolons. Commented Nov 2, 2021 at 11:44
10

You could simply try the Perl one-liner below.

perl -00pe 's/,(?!.*,)//s' file

Explanation:

  • , matches a comma.
  • (?!.*,) is a negative lookahead asserting that no further comma follows the matched one, so only the last comma in the record matches.
  • s is the important part: the DOTALL modifier makes . match newline characters too, so the lookahead scans the rest of the file rather than stopping at the end of the line.
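
One note on the record switch: -00 is paragraph mode, which only slurps the whole file into a single record when the input contains no blank lines. If blank lines are possible, -0777 (true slurp mode) is the safer choice; a sketch:

perl -0777 -pe 's/,(?!.*,)//s' file
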
  • You could also do: perl -0777 -pi -e 's/(.*),(.*?)/\1\2/s'. This works because the first .* is greedy, while the second one isn't. Commented Nov 22, 2015 at 2:43
6
lcomma() { sed '
    $x;$G;/\(.*\),/!H;//!{$!d
};  $!x;$s//\1/;s/^\n//'
}

That should remove only the last occurrence of a , in any input file, and it still prints input in which no , occurs at all. Basically, it buffers sequences of lines that do not contain a comma.

When it encounters a comma, it swaps the current line buffer with the hold buffer, which simultaneously prints out all of the lines that have accumulated since the previous comma and frees the hold buffer.
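
For example, applied to the question's file (the output file name is just illustrative):

lcomma <foo.csv >foo.fixed.csv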

I was just digging through my history file and found this:

lmatch(){ set "USAGE:\
        lmatch /BRE [-(((s|-sub) BRE)|(r|-ref)) REPL [-(f|-flag) FLAG]*]*
"       "${1%"${1#?}"}" "$@"
        eval "${ZSH_VERSION:+emulate sh}"; eval '
        sed "   1x;     \\$3$2!{1!H;\$!d
                };      \\$3$2{x;1!p;\$!d;x
                };      \\$3$2!x;\\$3$2!b'"
        $(      unset h;i=3 p=:-:shfr e='\033[' m=$(($#+1)) f=OPTERR
                [ -t 2 ] && f=$e\2K$e'1;41;17m}\r${h-'$f$e\0m
                f='\${$m?"\"${h-'$f':\t\${$i$e\n}\$1\""}\\c' e=} _o=
                o(){    IFS=\ ;getopts  $p a "$1"       &&
                        [ -n "${a#[?:]}" ]              &&
                        o=${a#-}${OPTARG-${1#-?}}       ||
                        ! eval "o=$f;o=\${o%%*\{$m\}*}"
        };      a(){    case ${a#[!-]}$o in (?|-*) a=;;esac; o=
                        set $* "${3-$2$}{$((i+=!${#a}))${a:+#-?}}"\
                                ${3+$2 "{$((i+=1))$e"} $2
                        IFS=$;  _o=${_o%"${3+$_o} "*}$*\
        };      while   eval "o \"\${$((i+=(OPTIND=1)))}\""
                do      case            ${o#[!$a]}      in
                        (s*|ub)         a s 2 ''        ;;
                        (r*|ef)         a s 2           ;;
                        (f*|lag)        a               ;;
                        (h*|elp)        h= o; break     ;;
                esac;   done;   set -f; printf  "\t%b\n\t" $o $_o
)\"";}

It's actually pretty good. Yes, it uses eval, but it never passes anything to it beyond a numeric reference to its arguments. It builds arbitrary sed scripts for handling a last match. I'll show you:

printf "%d\" %d' %d\" %d'\n" $(seq 5 5 200) |                               
    tee /dev/fd/2 |                                                         
    lmatch  d^.0     \  #all re's delimit w/ d now                           
        -r '&&&&'    \  #-r or --ref like: '...s//$ref/...'      
        --sub \' sq  \  #-s or --sub like: '...s/$arg1/$arg2/...'
        --flag 4     \  #-f or --flag appended to last -r or -s
        -s\" \\dq    \  #short opts can be '-s $arg1 $arg2' or '-r$arg1'
        -fg             #tacked on so: '...s/"/dq/g...'                     

That prints the following to stderr. This is a copy of lmatch's input:

5" 10' 15" 20'
25" 30' 35" 40'
45" 50' 55" 60'
65" 70' 75" 80'
85" 90' 95" 100'
105" 110' 115" 120'
125" 130' 135" 140'
145" 150' 155" 160'
165" 170' 175" 180'
185" 190' 195" 200'

The function's evaled subshell iterates through all of its arguments once. As it walks over them it increments a counter as appropriate for the context of each switch and skips over that many arguments on the next iteration. From then on it does one of a few things per argument:

  • For each option the option parser adds $a to $o. $a is assigned based on the value of $i which is incremented by arg count for each arg processed. $a is assigned one of the two following values:
    • a=$((i+=1)) - this is assigned if either a short-option does not have its argument appended to it or if the option was a long one.
    • a=$i#-? - this is assigned if the option is a short one and does have its arg appended to it.
    • a=\${$a}${1:+$d\${$(($1))\}} - Regardless of the initial assignment, $a's value is always wrapped in braces, and in the -s case $i is sometimes incremented once more and an additional delimited field is appended.

The result is that eval is never passed a string containing any unknowns. Each of the command-line arguments is referred to by its numeric position - even the delimiter, which is extracted from the first character of the first argument and is the only place that character should appear unescaped. Basically, the function is a macro generator - it never interprets the arguments' values in any special way, because sed can (and will, of course) easily handle that when it parses the script. Instead, it just sensibly arranges its args into a workable script.

Here's some debug output of the function at work:

... sed "   1x;\\$2$1!{1!H;\$!d
        };      \\$2$1{x;1!p;\$!d;x
        };      \\$2$1!x;\\$2$1!b
        s$1$1${4}$1
        s$1${6}$1${7}$1${9}
        s$1${10#-?}$1${11}$1${12#-?}
        "
++ sed '        1x;\d^.0d!{1!H;$!d
        };      \d^.0d{x;1!p;$!d;x
        };      \d^.0d!x;\d^.0d!b
        sdd&&&&d
        sd'\''dsqd4
        sd"d\dqdg
        '

And so lmatch can be used to easily apply regexes to data following the last match in a file. The result of the command I ran above is:

5" 10' 15" 20'
25" 30' 35" 40'
45" 50' 55" 60'
65" 70' 75" 80'
85" 90' 95" 100'
101010105dq 110' 115dq 120'
125dq 130' 135dq 140sq
145dq 150' 155dq 160'
165dq 170' 175dq 180'
185dq 190' 195dq 200'

...which, given the subset of the file input that follows the last time /^.0/ is matched, applies the following substitutions:

  • sdd&&&&d - replaces the matched text with four copies of itself.
  • sd'dsqd4 - replaces the fourth single quote found after the last match with sq.
  • sd"d\dqdg - ditto for double quotes, but globally (every " after the last match becomes dq).

And so, to demonstrate how one might use lmatch to remove the last comma in a file:

printf "%d, %d %d, %d\n" $(seq 5 5 100) |
lmatch '/\(.*\),' -r\\1

OUTPUT:

5, 10 15, 20
25, 30 35, 40
45, 50 55, 60
65, 70 75, 80
85, 90 95 100
  • @don_crissti - it's way better now - I dropped the -m option and made it mandatory, switched to multiple arguments for re and repl for -s, and also implemented proper delimiter handling. I think it's bullet-proof. I successfully used both a space and a single quote as the delimiter. Commented Apr 27, 2015 at 13:16
4

If the comma might not be on the second-to-last line

Using awk and tac:

tac foo.csv | awk '/,$/ && !handled { sub(/,$/, ""); handled++ } {print}' | tac

The awk command simply does the substitution the first time the pattern is seen. tac reverses the order of the lines in the file, so the awk command ends up removing the last comma.

I’ve been told that

tac foo.csv | awk '/,$/ && !handled { sub(/,$/, ""); handled++ } {print}' > tmp && tac tmp

may be more efficient.
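
Note that tac is a GNU coreutils tool. On BSD or macOS systems that lack it, tail -r plays the same role; a sketch of the equivalent pipeline:

tail -r foo.csv | awk '/,$/ && !handled { sub(/,$/, ""); handled++ } {print}' | tail -r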

4

See https://stackoverflow.com/questions/12390134/remove-comma-from-last-line

This worked for me:

$ cat input.txt
{"name": "secondary_ua","type":"STRING"},
{"name": "request_ip","type":"STRING"},
{"name": "cb","type":"STRING"},
$ sed '$s/,$//' < input.txt >output.txt
$ cat output.txt
{"name": "secondary_ua","type":"STRING"},
{"name": "request_ip","type":"STRING"},
{"name": "cb","type":"STRING"}

Note that in this input the comma is on the last line itself, unlike in the question. For the question's layout, maybe the best way is to remove the last line, strip the comma that is then at the end of the file, and add the ] character back again.
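
A sketch of that idea (delete the closing ] line, strip the comma that is then at the end of the file, and append ] again; the output file name is just illustrative):

{ sed '$d' foo.csv | sed '$s/,$//'; echo ']'; } > foo.fixed.csv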

2

If you can use tac:

tac file | perl -pe '$_=reverse;!$done && s/,// && $done++;$_=reverse'|tac
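
Roughly: tac emits the lines last-first, reverse flips each line so its last comma becomes its first, the first comma seen in that order (i.e. the file's last comma) is deleted, and then both reversals are undone. The same pipeline spread out with comments (a sketch):

tac file |
  perl -pe '
    $_ = reverse;                # flip the line so its last comma comes first
    !$done && s/,// && $done++;  # delete the first comma seen in the reversed stream
    $_ = reverse;                # flip the line back
  ' |
tac                              # restore the original line order
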
1

Try the following with vi:

  vi "+:$-1s/\(,\)\(\_s*]\)/\2/e" "+:x" file

Explanation:

  • $-1 selects the second-to-last line
  • s substitutes
  • \(,\)\(\_s*]\) matches a comma followed by a ], possibly separated by whitespace or newlines
  • \2 replaces the whole match with the second group, i.e. the whitespace/newlines and the ], dropping the comma
  • e suppresses the error that would otherwise be raised when no match is found
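
The same edit can also be scripted non-interactively; assuming ex is provided by Vim (as on most Linux systems, so that \_s is understood), a sketch along these lines should behave the same way:

ex -sc '$-1s/\(,\)\(\_s*]\)/\2/e' -c x file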
-1

Try the sed command below.

sed -i '$s/,$//' foo.csv
  • This will remove the trailing comma from every line, which is not what the OP wants. Commented Aug 8, 2017 at 15:10
  • @Archemar No, it will only remove it on the last line, but that won't work for the OP's data, where the comma is not on the last line. Commented Jun 4, 2018 at 14:55
