Question (score: 156)

Basically, I want to take text from a file as input, remove a line from that file, and send the output back to the same file. Something along these lines, if that makes it any clearer:

grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > file_name

However, when I do this I end up with a blank file. Any thoughts?

2 Comments

  • See this as well: How to make reading and writing the same file in the same pipeline always “fail”? on Unix & Linux Stack Exchange.
  • Several answers here are duplicates, and several deleted answers propose adding a pipe, like grep 'moo' file | cat >file, which of course doesn't help at all. Please review existing answers before adding a new one, and please test any new solution before proposing it.

14 Answers

Answer (score: 145)

Use sponge for this kind of task. It's part of moreutils.

Try this command:

grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | sponge file_name
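
The reason sponge works here is that it soaks up all of its input before opening and writing the output file, so file_name isn't truncated while grep is still reading it. As a hedged aside, moreutils' sponge also has an -a flag to append instead of overwrite, which can be handy in similar pipelines (some_command and archive.log below are hypothetical placeholders):

grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | sponge file_name   # soak up everything, then overwrite
some_command | sponge -a archive.log                                # -a appends instead of truncating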

6 Comments

Thanks for the answer. As a possibly helpful addition: if you're using Homebrew on a Mac, you can use brew install moreutils.
Or sudo apt-get install moreutils on Debian-based systems.
Word of caution, "sponge" is destructive, so if you have an error in your command, you can wipe out your input file (as I did the first time trying sponge). Make sure your command works, and/or the input file is under version control if you are trying to iterate on making the command work.
sudo yum install moreutils on RHEL/CentOS 7; works on Fedora too.
There's also a JavaScript implementation of sponge, here. Handy for package.json scripts and such.
Answer (score: 117)

You cannot do that, because bash processes the redirections first and then executes the command. So by the time grep looks at file_name, it is already empty. You can use a temporary file, though:

#!/bin/sh
tmpfile=$(mktemp) || exit 1                                  # create a temp file; abort on failure
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > "$tmpfile"  # write the filtered output to it
cat "$tmpfile" > file_name                                   # copy the result back over the original
rm -f "$tmpfile"

As shown, mktemp is a convenient way to create the temporary file, but note that it's not POSIX.
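
A slightly more defensive variant of the same idea (a sketch; the trap cleanup and the success check are additions, not part of the original answer):

#!/bin/sh
tmpfile=$(mktemp) || exit 1                    # abort if the temp file can't be created
trap 'rm -f "$tmpfile"' EXIT                   # clean up however the script exits
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > "$tmpfile" &&
    cat "$tmpfile" > file_name                 # overwrite the original only if grep succeeded

Note that grep exits non-zero when it selects no lines at all, in which case this leaves file_name untouched; copying back with cat rather than mv preserves the original file's inode, permissions, and hard links.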

4 Comments

The reason why you can't do that: bash processes the redirections first, then executes the command. So by the time grep looks at file_name, it is already empty.
@glennjackman: by "processes the redirections" do you mean that in the case of > it opens the file and truncates it, and in the case of >> it only opens it?
yes, but of note in this situation, the > redirection will open the file and truncate it before the shell launches grep.
It's perfectly possible to do it with redirections, you just have to remove the file before writing to it.
Answer (score: 23)

Use sed instead:

sed -i '/seg[0-9]\{1,\}\.[0-9]\{1\}/d' file_name

2 Comments

IIRC -i is a GNU-only extension, just noting.
On *BSD (and hence also OSX) you can say -i '' so the extension is not strictly mandatory, but the -i option does require some argument.
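
To illustrate the portability point from these comments, a hedged sketch of both invocations:

# GNU sed: the optional backup suffix must be attached to -i itself
sed -i '/seg[0-9]\{1,\}\.[0-9]\{1\}/d' file_name
# BSD/macOS sed: -i takes a separate backup-suffix argument; pass '' for none
sed -i '' '/seg[0-9]\{1,\}\.[0-9]\{1\}/d' file_name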
Answer (score: 19)

Try this simple one:

grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name

Your file will not be blank this time :) and your output is also printed to your terminal.

4 Comments

I like this solution! And if you don't want it to be printed in the terminal you can still redirect the output to /dev/null or similar places.
This clears the file contents here as well. Is that due to a GNU/BSD difference ? I'm on macOS...
Not guaranteed, same as stackoverflow.com/a/51173807/97439
This does not work with the sed command.
Answer (score: 11)

This is very much possible; you just have to make sure that by the time you write the output, you're writing it to a different file. This can be done by removing the file after opening a file descriptor to it, but before writing to it:

exec 3<file ; rm file; COMMAND <&3 >file ;  exec 3>&-

Or line by line, to understand it better:

exec 3<file       # open a file descriptor reading 'file'
rm file           # remove file (but fd3 will still point to the removed file)
COMMAND <&3 >file # run command, with the removed file as input
exec 3>&-         # close the file descriptor

It's still a risky thing to do, because if COMMAND fails to run properly, you'll lose the file contents. That can be mitigated by restoring the file if COMMAND returns a non-zero exit code:

exec 3<file ; rm file; COMMAND <&3 >file || cat <&3 >file ; exec 3>&-

We can also define a shell function to make it easier to use:

# Usage: replace FILE COMMAND
replace() {
    exec 3<"$1"
    rm "$1"
    "${@:2}" <&3 >"$1" || cat <&3 >"$1"
    exec 3>&-
}

Example:

$ echo aaa > test
$ replace test tr a b
$ cat test
bbb

Also, note that this will keep a full copy of the original file (until the third file descriptor is closed). If you're using Linux and the file you're processing is too big to fit on the disk twice, you can check out this script, which will pipe the file to the specified command block by block while deallocating the already-processed blocks. As always, read the warnings in the usage page.


Answer (score: 9)

Since this question is the top result in search engines, here's a one-liner based on https://serverfault.com/a/547331 that uses a subshell instead of sponge (which often isn't part of a vanilla install, e.g. on OS X):

echo "$(grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name)" > file_name

The general case is:

echo "$(cat file_name)" > file_name

Edit: the above solution has some caveats:

  • printf '%s' <string> should be used instead of echo <string> so that files containing -n don't cause undesired behavior.
  • Command substitution strips trailing newlines (this is a bug/feature of shells like bash) so we should append a postfix character like x to the output and remove it on the outside via parameter expansion of a temporary variable like ${v%x}.
  • Using a temporary variable $v stomps the value of any existing variable $v in the current shell environment, so we should nest the entire expression in parentheses to preserve the previous value.
  • Another bug/feature of shells like bash is that command substitution strips unprintable characters like null from the output. I verified this by calling dd if=/dev/zero bs=1 count=1 >> file_name and viewing it in hex with cat file_name | xxd -p. But echo $(cat file_name) | xxd -p is stripped. So this answer should not be used on binary files or anything using unprintable characters, as Lynch pointed out.

The general solution (albeit slightly slower, more memory-intensive, and still stripping unprintable characters) is:

(v=$(cat file_name; printf x); printf '%s' "${v%x}" > file_name)

Test from https://askubuntu.com/a/752451:

printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do (v=$(cat file_uniquely_named.txt; printf x); printf '%s' ${v%x} > file_uniquely_named.txt); done; cat file_uniquely_named.txt; rm file_uniquely_named.txt

Should print:

hello
world

Whereas calling cat file_uniquely_named.txt > file_uniquely_named.txt in the current shell:

printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do cat file_uniquely_named.txt > file_uniquely_named.txt; done; cat file_uniquely_named.txt; rm file_uniquely_named.txt

Prints an empty string.

I haven't tested this on large files (probably over 2 or 4 GB).

I have borrowed this answer from Hart Simha and kos.

2 Comments

Of course it will not work with large files. This can't possibly be a good solution or work all the time. What is happening is that bash first executes the command, loads the stdout of cat into memory, and passes it as the first argument to echo. Of course non-printable characters will not be output properly and will corrupt the data. Don't try to redirect a file back to itself; it just can't be good.
Here is a newer/better command that takes the place of sponge and is cross-platform if your shell has perl installed: stackoverflow.com/a/69212059/539149 cat file_name.txt | grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' | perl -spe'open(STDOUT, ">", $o)' -- -o=file_name.txt
Answer (score: 8)

You can't use a redirection operator (> or >>) to the same file, because the redirection has a higher precedence and it will create/truncate the file before the command is even invoked. To avoid that, you should use appropriate tools such as tee, sponge, sed -i, or any other tool that can write results to the file itself (e.g. sort file -o file).
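
For instance, sort is safe here because it handles the output file itself and only opens it after all input has been read; POSIX explicitly allows the -o file to be one of the inputs:

sort -o file_name file_name   # sorts file_name in place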

Basically, redirecting input to the same original file doesn't make sense; you should use appropriate in-place editors for that, for example the Ex editor (part of Vim):

ex '+g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' -scwq file_name

where:

  • '+cmd'/-c - run any Ex/Vim command
  • g/pattern/d - remove lines matching a pattern using global (help :g)
  • -s - silent mode (man ex)
  • -c wq - execute :write and :quit commands

You may use sed to achieve the same (as already shown in other answers); however, in-place editing (-i) is a non-standard FreeBSD extension (it may work differently between Unix/Linux), and sed is fundamentally a stream editor, not a file editor. See: Does Ex mode have any practical use?


Answer (score: 6)

One-liner alternative: set the content of the file as a variable:

VAR=`cat file_name`; echo "$VAR"|grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' > file_name
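
Incorporating the feedback from the comments below (prefer $( ) over backticks, and printf over echo), a hedged variant of the same idea:

var=$(cat file_name); printf '%s\n' "$var" | grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' > file_name

Like the original, this still loses trailing blank lines, because command substitution strips trailing newlines.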

2 Comments

Several other similar answers have appeared, some of which have a fuller discussion of how this works, and some caveats. Modern scripts should definitely prefer the modern $(command substitution) syntax over backticks, which were deeply obsolescent already in 2013.
Just to repeat the feedback from elsewhere, you should prefer printf over echo here for robustness; and this will lose any trailing newlines.
Answer (score: 5)

The following will accomplish the same thing that sponge does, without requiring moreutils:

grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | shuf --output=file_name --random-source=/dev/zero

The --random-source=/dev/zero part tricks shuf into doing its thing without doing any shuffling at all, so it will buffer your input without altering it.

However, it is true that using a temporary file is best, for performance reasons. So, here is a function that I have written that will do that for you in a generalized way:

# Pipes a file into a command, and pipes the output of that command
# back into the same file, ensuring that the file is not truncated.
# Parameters:
#    $1: the file.
#    $2: the command. (With $3... being its arguments.)
# See https://stackoverflow.com/a/55655338/773113

siphon()
{
    local tmp file rc=0
    [ "$#" -ge 2 ] || { echo "Usage: siphon filename [command...]" >&2; return 1; }
    file="$1"; shift
    tmp=$(mktemp -- "$file.XXXXXX") || return
    "$@" <"$file" >"$tmp" || rc=$?
    mv -- "$tmp" "$file" || rc=$(( rc | $? ))
    return "$rc"
}
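
For example, applied to the question's use case (a usage sketch of the function above):

siphon file_name grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}'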

8 Comments

$* really needs to be "$@". Otherwise, siphon "two words" becomes identical siphon "two" "words". Other than that, this answer is great.
...also, think about telling mktemp to create the temporary file in the same directory as where the output file lives; if the two locations are on different filesystems, the mv won't be atomic. local tmp=$(mktemp "$1.XXXXXX") is one quick/easy way to do that.
(Also, think about making local tmp file its own line; that way tmp=$(mktemp) will pass through the exit status of mktemp, so you can detect a case where it fails and act appropriately; for example, tmp=$(mktemp) || return to abort the rest of the function if mktemp doesn't succeed; that won't work with local preceding on the same line, since local itself has its own exit status and overrides $?).
I'd also suggest mv -- "$tmp" "$file" so filenames that start with dashes aren't incorrectly parsed as options to mv. See pubs.opengroup.org/onlinepubs/9699919799/basedefs/…, guideline 10.
And think about siphon() { with no preceding function; the function siphon { is a POSIX-incompliant kshism (which in bash doesn't behave quite the way it does in ksh, where it modifies how variable declarations behave within the function body). It's better than function siphon() {, which isn't compatible with either POSIX sh or legacy ksh, but worse than siphon() { with no function at all.
Answer (score: 4)

This does the trick pretty nicely in most of the cases I faced:

cat <<< "$(do_stuff_with f)" > f

Note that while $(…) strips trailing newlines, <<< ensures a final newline, so generally the result is magically satisfying. (Look for “Here Strings” in man bash if you want to learn more.)
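
A quick way to observe that behavior, using od to show the raw bytes (a sketch):

v=$(printf 'abc\n\n')
printf '%s' "$v" | od -c    # command substitution stripped both trailing newlines
cat <<< "$v" | od -c        # <<< appends exactly one newline to the stripped value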

Full example:

#! /usr/bin/env bash

get_new_content() {
    sed 's/Initial/Final/g' "${1:?}"
}

echo 'Initial content.' > f
cat f

cat <<< "$(get_new_content f)" > f

cat f

This does not truncate the file and yields:

Initial content.
Final content.

Note that I used a function here for the sake of clarity and extensibility, but that’s not a requirement.

A common use case is JSON editing:

echo '{ "a": 12 }' > f
cat f
cat <<< "$(jq '.a = 24' f)" > f
cat f

This yields:

{ "a": 12 }
{
  "a": 24
}


Answer (score: 2)

There's also ed (as an alternative to sed -i):

# cf. http://wiki.bash-hackers.org/howto/edit-ed
printf '%s\n' H 'g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' wq |  ed -s file_name
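
Equivalently, feeding the same commands to ed from a here-document (a sketch):

ed -s file_name <<'EOF'
H
g/seg[0-9]\{1,\}\.[0-9]\{1\}/d
wq
EOF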


Answer (score: 0)

You can use slurp with POSIX Awk:

!/seg[0-9]+\.[0-9]/ {    # awk patterns are EREs, so + replaces the BRE \{1,\}
  q = q ? q RS $0 : $0   # accumulate the lines that don't match
}
END {
  print q > ARGV[1]      # the input has been fully read by the time END runs
}

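A usage sketch, assuming the program above is saved as strip.awk (a hypothetical filename):

awk -f strip.awk file_name

This is safe because the output redirection happens inside the END block, after the whole input file has already been read into q.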

1 Comment

It should perhaps be pointed out that "slurp" means "read the entire file into memory". If you have a large input file, maybe you want to avoid that.
Answer (score: -2)

Try this:

echo -e "AAA\nBBB\nCCC" > testfile

cat testfile
AAA
BBB
CCC

echo "$(grep -v 'AAA' testfile)" > testfile
cat testfile
BBB
CCC

3 Comments

A short explanation or even comments may be helpful.
I think it works because the command substitution is executed before the redirection operator, but I don't know exactly.
This is a duplicate of a couple of earlier answers, one of which has a fuller discussion of the caveats. In short, this is buggy for a couple of reasons.
Answer (score: -3)

I usually use the tee program to do this:

grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name

It creates and removes a tempfile by itself.

1 Comment

Sorry, tee isn't guaranteed to work. See askubuntu.com/a/752451/335781.
