16

I have a very large CSV file. How would you remove the very last comma in the file with sed (or a similar tool)?

...
[11911,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11913,0,"BUILDER","2014-10-15","BUILDER",0,0],
]

Desired output

...
[11911,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11913,0,"BUILDER","2014-10-15","BUILDER",0,0]
]

The following sed command deletes the last comma on each line, but I want to remove it only once, for the whole file.

sed -e 's/,$//' foo.csv

Nor does this work

sed '$s/,//' foo.csv
  • Is the comma always on the second-to-last line? Commented Oct 15, 2014 at 23:42
  • Yes, the second to last line Commented Oct 15, 2014 at 23:43
  • This looks like broken JSON. The correct solution would be to fix the program that generates the JSON, not to post-process the document with tools that aren't meant to be used to modify structured document formats. Commented Jul 11, 2021 at 19:00

8 Answers

16

Using awk

If the comma is always at the end of the second-to-last line:

$ awk 'NR>2{print a;} {a=b; b=$0} END{sub(/,$/, "", a); print a;print b;}'  input
[11911,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11913,0,"BUILDER","2014-10-15","BUILDER",0,0]
]
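
To see what the one-liner is doing, here is the same two-line-buffer idea spelled out with comments (just an illustrative sketch; any POSIX awk should behave the same way):

awk '
    NR > 2 { print a }       # print the line that is about to leave the two-line buffer
    { a = b; b = $0 }        # keep the last two lines seen in a and b
    END {
        sub(/,$/, "", a)     # a is now the second-to-last line: drop its trailing comma
        print a
        print b              # b is the last line (the closing ])
    }
' input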

Using awk and bash

$ awk -v "line=$(($(wc -l <input)-1))" 'NR==line{sub(/,$/, "")} 1'  input
[11911,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11913,0,"BUILDER","2014-10-15","BUILDER",0,0]
]
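
If you would rather not read the line count with wc, a two-pass awk variant (not from the answer above, just a sketch of the same idea) reads the file twice and edits line n-1 on the second pass:

awk 'NR == FNR { n = NR; next }      # first pass: count the lines
     FNR == n - 1 { sub(/,$/, "") }  # second pass: strip the trailing comma on line n-1
     { print }                       # second pass: print every line
' input input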

Using sed

$ sed 'x;${s/,$//;p;x;};1d'  input
[11911,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11913,0,"BUILDER","2014-10-15","BUILDER",0,0]
]

For OSX and other BSD platforms, try:

sed -e x -e '$ {s/,$//;p;x;}' -e 1d  input
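
The hold-space trick is terse, so here is the same logic as a commented sed script file (a sketch; the file name lastcomma.sed is just for illustration, and comment lines starting with # are accepted by both GNU and BSD sed):

# lastcomma.sed - remove the trailing comma from the second-to-last line
# Swap pattern space and hold space, so every line is printed one cycle late.
x
# On the last line the pattern space holds the second-to-last line:
# strip its trailing comma, print it, then swap back so the last line is auto-printed.
$ {
    s/,$//
    p
    x
}
# The very first swap surfaces an empty hold space; drop that empty line.
1d

Run it with sed -f lastcomma.sed input.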

Using bash

# Keep a two-line buffer: a holds the next-to-last line read so far, b the last.
while IFS= read -r line
do
    [ "$a" ] && printf "%s\n" "$a"   # print lines once they fall out of the buffer
    a=$b
    b=$line
done <input
printf "%s\n" "${a%,}"               # second-to-last line, trailing comma stripped
printf "%s\n" "$b"                   # last line (the closing ])
  • Maybe its because I'm on a mac, but the sed command gives error sed: 1: "x;${s/,$//;p;x}; 2,$ p": extra characters at the end of x command Commented Oct 16, 2014 at 0:15
  • @spuder Yes, OSX has BSD sed and it is often different in subtle ways. I don't have access to OSX to test this, but please try sed -n -e x -e '${s/,$//;p;x;}' -e '2,$ p' input Commented Oct 16, 2014 at 2:19
  • Yes, that second one worked on Mac Commented Oct 16, 2014 at 3:35
  • sed 'x;${s/,$//;p;x;};1d' input - this didn't work for me on ubuntu - I am trying to do a similar thing but for semicolons instead of commas. FYI, the perl solution from Avinash Raj below did work when simply converting the commas to semicolons. Commented Nov 2, 2021 at 11:44
10

You could simply try the Perl one-liner below.

perl -00pe 's/,(?!.*,)//s' file

Explanation:

  • , matches a comma.
  • (?!.*,) is a negative lookahead asserting that no further comma follows the matched one, so only the last comma in the record matches.
  • s is the important part: the DOTALL modifier makes . match newline characters too, so the lookahead scans the rest of the file rather than stopping at the end of the line.
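
One note on the record switch: -00 is paragraph mode, which only slurps the whole file into a single record when the input contains no blank lines. If blank lines are possible, -0777 (true slurp mode) is the safer choice; a sketch:

perl -0777 -pe 's/,(?!.*,)//s' file
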
  • You could also do: perl -0777 -pi -e 's/(.*),(.*?)/\1\2/s'. This works because the first .* is greedy, while the second one isn't. Commented Nov 22, 2015 at 2:43
6
lcomma() { sed '
    $x;$G;/\(.*\),/!H;//!{$!d
};  $!x;$s//\1/;s/^\n//'
}

That should remove only the last occurrence of a , in any input file, and it still prints input in which no , occurs at all. Basically, it buffers sequences of lines that do not contain a comma.

When it encounters a comma, it swaps the current line buffer with the hold buffer, which simultaneously prints out all of the lines that have accumulated since the previous comma and frees the hold buffer.
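
For example, applied to the question's file (the output file name is just illustrative):

lcomma <foo.csv >foo.fixed.csv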

I was just digging through my history file and found this:

lmatch(){ set "USAGE:\
        lmatch /BRE [-(((s|-sub) BRE)|(r|-ref)) REPL [-(f|-flag) FLAG]*]*
"       "${1%"${1#?}"}" "$@"
        eval "${ZSH_VERSION:+emulate sh}"; eval '
        sed "   1x;     \\$3$2!{1!H;\$!d
                };      \\$3$2{x;1!p;\$!d;x
                };      \\$3$2!x;\\$3$2!b'"
        $(      unset h;i=3 p=:-:shfr e='\033[' m=$(($#+1)) f=OPTERR
                [ -t 2 ] && f=$e\2K$e'1;41;17m}\r${h-'$f$e\0m
                f='\${$m?"\"${h-'$f':\t\${$i$e\n}\$1\""}\\c' e=} _o=
                o(){    IFS=\ ;getopts  $p a "$1"       &&
                        [ -n "${a#[?:]}" ]              &&
                        o=${a#-}${OPTARG-${1#-?}}       ||
                        ! eval "o=$f;o=\${o%%*\{$m\}*}"
        };      a(){    case ${a#[!-]}$o in (?|-*) a=;;esac; o=
                        set $* "${3-$2$}{$((i+=!${#a}))${a:+#-?}}"\
                                ${3+$2 "{$((i+=1))$e"} $2
                        IFS=$;  _o=${_o%"${3+$_o} "*}$*\
        };      while   eval "o \"\${$((i+=(OPTIND=1)))}\""
                do      case            ${o#[!$a]}      in
                        (s*|ub)         a s 2 ''        ;;
                        (r*|ef)         a s 2           ;;
                        (f*|lag)        a               ;;
                        (h*|elp)        h= o; break     ;;
                esac;   done;   set -f; printf  "\t%b\n\t" $o $_o
)\"";}

It's actually pretty good. Yes, it uses eval, but it never passes anything to it beyond a numeric reference to its arguments. It builds arbitrary sed scripts for handling a last match. I'll show you:

printf "%d\" %d' %d\" %d'\n" $(seq 5 5 200) |                               
    tee /dev/fd/2 |                                                         
    lmatch  d^.0     \  #all re's delimit w/ d now                           
        -r '&&&&'    \  #-r or --ref like: '...s//$ref/...'      
        --sub \' sq  \  #-s or --sub like: '...s/$arg1/$arg2/...'
        --flag 4     \  #-f or --flag appended to last -r or -s
        -s\" \\dq    \  #short opts can be '-s $arg1 $arg2' or '-r$arg1'
        -fg             #tacked on so: '...s/"/dq/g...'                     

That prints the following to stderr. This is a copy of lmatch's input:

5" 10' 15" 20'
25" 30' 35" 40'
45" 50' 55" 60'
65" 70' 75" 80'
85" 90' 95" 100'
105" 110' 115" 120'
125" 130' 135" 140'
145" 150' 155" 160'
165" 170' 175" 180'
185" 190' 195" 200'

The function's evaled subshell iterates through all of its arguments once. As it walks over them it increments a counter as appropriate for the context of each switch and skips over that many arguments on the next iteration. From then on it does one of a few things per argument:

  • For each option the option parser adds $a to $o. $a is assigned based on the value of $i which is incremented by arg count for each arg processed. $a is assigned one of the two following values:
    • a=$((i+=1)) - this is assigned if either a short-option does not have its argument appended to it or if the option was a long one.
    • a=$i#-? - this is assigned if the option is a short one and does have its arg appended to it.
    • a=\${$a}${1:+$d\${$(($1))\}} - Regardless of the initial assignment, $a's value is always wrapped in braces, and in the -s case $i is sometimes incremented once more and an additional delimited field is appended.

The result is that eval is never passed a string containing any unknowns. Each of the command-line arguments is referred to by its numeric position - even the delimiter, which is extracted from the first character of the first argument and is the only place that character should appear unescaped. Basically, the function is a macro generator - it never interprets the arguments' values in any special way, because sed can (and will, of course) easily handle that when it parses the script. Instead, it just sensibly arranges its args into a workable script.

Here's some debug output of the function at work:

... sed "   1x;\\$2$1!{1!H;\$!d
        };      \\$2$1{x;1!p;\$!d;x
        };      \\$2$1!x;\\$2$1!b
        s$1$1${4}$1
        s$1${6}$1${7}$1${9}
        s$1${10#-?}$1${11}$1${12#-?}
        "
++ sed '        1x;\d^.0d!{1!H;$!d
        };      \d^.0d{x;1!p;$!d;x
        };      \d^.0d!x;\d^.0d!b
        sdd&&&&d
        sd'\''dsqd4
        sd"d\dqdg
        '

And so lmatch can be used to easily apply regexes to data following the last match in a file. The result of the command I ran above is:

5" 10' 15" 20'
25" 30' 35" 40'
45" 50' 55" 60'
65" 70' 75" 80'
85" 90' 95" 100'
101010105dq 110' 115dq 120'
125dq 130' 135dq 140sq
145dq 150' 155dq 160'
165dq 170' 175dq 180'
185dq 190' 195dq 200'

...which, given the subset of the file input that follows the last time /^.0/ is matched, applies the following substitutions:

  • sdd&&&&d - replaces the matched text with four copies of itself.
  • sd'dsqd4 - replaces the fourth single quote found after the last match with sq.
  • sd"d\dqdg - ditto for double quotes, but globally (every " after the last match becomes dq).

And so, to demonstrate how one might use lmatch to remove the last comma in a file:

printf "%d, %d %d, %d\n" $(seq 5 5 100) |
lmatch '/\(.*\),' -r\\1

OUTPUT:

5, 10 15, 20
25, 30 35, 40
45, 50 55, 60
65, 70 75, 80
85, 90 95 100
  • @don_crissti - it's way better now - I dropped the -m option and made it mandatory, switched to multiple arguments for re and repl for -s, and also implemented proper delimiter handling. I think it's bullet-proof. I successfully used both a space and a single quote as the delimiter. Commented Apr 27, 2015 at 13:16
4

If the comma might not be on the second-to-last line

Using awk and tac:

tac foo.csv | awk '/,$/ && !handled { sub(/,$/, ""); handled++ } {print}' | tac

The awk command simply does the substitution the first time the pattern is seen. tac reverses the order of the lines in the file, so the awk command ends up removing the last comma.

I’ve been told that

tac foo.csv | awk '/,$/ && !handled { sub(/,$/, ""); handled++ } {print}' > tmp && tac tmp

may be more efficient.
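
Note that tac is a GNU coreutils tool. On BSD or macOS systems that lack it, tail -r plays the same role; a sketch of the equivalent pipeline:

tail -r foo.csv | awk '/,$/ && !handled { sub(/,$/, ""); handled++ } {print}' | tail -r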

4

See https://stackoverflow.com/questions/12390134/remove-comma-from-last-line

This worked for me:

$ cat input.txt
{"name": "secondary_ua","type":"STRING"},
{"name": "request_ip","type":"STRING"},
{"name": "cb","type":"STRING"},
$ sed '$s/,$//' < input.txt >output.txt
$ cat output.txt
{"name": "secondary_ua","type":"STRING"},
{"name": "request_ip","type":"STRING"},
{"name": "cb","type":"STRING"}

Note that in this input the comma is on the last line itself, unlike in the question. For the question's layout, maybe the best way is to remove the last line, strip the comma that is then at the end of the file, and add the ] character back again.
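
A sketch of that idea (delete the closing ] line, strip the comma that is then at the end of the file, and append ] again; the output file name is just illustrative):

{ sed '$d' foo.csv | sed '$s/,$//'; echo ']'; } > foo.fixed.csv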

2

If you can use tac:

tac file | perl -pe '$_=reverse;!$done && s/,// && $done++;$_=reverse'|tac
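
Roughly: tac emits the lines last-first, reverse flips each line so its last comma becomes its first, the first comma seen in that order (i.e. the file's last comma) is deleted, and then both reversals are undone. The same pipeline spread out with comments (a sketch):

tac file |
  perl -pe '
    $_ = reverse;                # flip the line so its last comma comes first
    !$done && s/,// && $done++;  # delete the first comma seen in the reversed stream
    $_ = reverse;                # flip the line back
  ' |
tac                              # restore the original line order
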
1

Try the following with vi:

  vi "+:$-1s/\(,\)\(\_s*]\)/\2/e" "+:x" file

Explanation:

  • $-1 selects the second-to-last line
  • s substitutes
  • \(,\)\(\_s*]\) matches a comma followed by a ], possibly separated by whitespace or newlines
  • \2 replaces the whole match with the second group, i.e. the whitespace/newlines and the ], dropping the comma
  • e suppresses the error that would otherwise be raised when no match is found
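
The same edit can also be scripted non-interactively; assuming ex is provided by Vim (as on most Linux systems, so that \_s is understood), a sketch along these lines should behave the same way:

ex -sc '$-1s/\(,\)\(\_s*]\)/\2/e' -c x file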
-1

Try the sed command below.

sed -i '$s/,$//' foo.csv
  • This will remove the trailing comma from every line, which is not what the OP wants. Commented Aug 8, 2017 at 15:10
  • @Archemar No, it will only remove it on the last line, but that won't work for the OP's data, where the comma is not on the last line. Commented Jun 4, 2018 at 14:55
