Revisions to is 'sed' thread safe

fix links

edited Feb 24, 2020 at 12:52

user313992

I'm not going to nitpick on the awful terminology, but yes, GNU sed with its -i ("in-place") flag can be safely used by more than one process at the same time without any extra locking. sed does not actually modify the file in place: it writes its output to a temporary file in the same directory and, if everything goes well, calls rename(2) to move the temporary file over the original. The rename(2) is guaranteed to be atomic:

$ strace sed -i s/o/e/g foo.txt
open("foo.txt", O_RDONLY)               = 3
...
open("./sedDe80VL", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
...
read(3, "foo\n", 4096)                  = 4
...
write(4, "fee\n", 4)                    = 4
read(3, "", 4096)                       = 0
...
close(3)                                = 0
close(4)                                = 0
rename("./sedDe80VL", "foo.txt")        = 0

At any point, foo.txt will refer either to the complete original file or to the complete processed file, never to something in between the two.
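
The same sequence of calls can be sketched in a few lines of Python. This is only an illustration of the write-to-temporary-then-rename pattern seen in the trace, not sed's actual code; the function name `replace_in_place` and the `sed` prefix for the temporary file are made up for the example.

```python
import os
import tempfile


def replace_in_place(path, old, new):
    """Rewrite `path` with `old` replaced by `new`, via temp file + rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so that the final rename
    # stays within one filesystem; mkstemp opens it with O_CREAT|O_EXCL.
    fd, tmp_path = tempfile.mkstemp(prefix="sed", dir=directory)
    try:
        with os.fdopen(fd, "w") as dst, open(path) as src:
            for line in src:
                dst.write(line.replace(old, new))
        # rename(2) under the hood: readers see the complete old file or the
        # complete new file, never a partially written one.
        os.replace(tmp_path, path)
    except BaseException:
        # Something went wrong before the rename: the original is untouched,
        # so just discard the temporary file.
        os.unlink(tmp_path)
        raise
```

Calling `replace_in_place("foo.txt", "o", "e")` on a file containing `foo` leaves it containing `fee`, with no temporary file left behind.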

Notes:

This does not handle the case where more than one process starts editing a file without waiting for the other processes to finish editing it. In that case only the process which finishes last "wins" (i.e. it wipes out the changes made by the other processes). This is not a matter of data integrity, and cannot be handled without higher-level coordination between the processes (blindly locking the file will lead to deadlocks).
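
As a rough sketch of what such higher-level coordination could look like, assuming every writer is under your control and agrees to take the same advisory lock first (sed itself does nothing of the sort). The names `edit_lock` and `locked_edit` and the `.lock` suffix are hypothetical, and `fcntl.flock` is Unix-only.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def edit_lock(path):
    """Hold an exclusive advisory lock on a sidecar file, `<path>.lock`."""
    # The lock is taken on a separate file: the target gets a new inode on
    # every rename, so a lock held on the target itself would not carry over.
    with open(path + ".lock", "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def locked_edit(path, transform):
    """Read, transform and atomically replace `path` while holding the lock."""
    with edit_lock(path):
        # Reading happens inside the lock, so each edit starts from the
        # result of the previous one instead of from a stale copy.
        with open(path) as src:
            content = transform(src.read())
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as dst:
            dst.write(content)
        os.replace(tmp_path, path)
```

The important part is that the whole read-modify-write cycle sits inside the lock; locking only the final rename would still let the last writer wipe out the others' changes.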

Currently, GNU sed will copy the standard file permissions into the new inode, but not the ACLs and extended attributes. If using sed -i on such a file, all that extra metadata will be lost. IMHO that's more of a feature than a bug or limitation.
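
The "new inode" part is easy to observe. The sketch below mimics the behaviour described above (copy the permission bits, then rename); it is not GNU sed's code, and `rename_replace` is a made-up helper. The hard link is there only to keep the old inode visible after the replacement.

```python
import os
import shutil
import tempfile


def rename_replace(path, new_content):
    """Replace `path` via temp file + rename, copying only the mode bits."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as dst:
        dst.write(new_content)
    # Permission bits only; ACLs and extended attributes are not copied.
    shutil.copymode(path, tmp_path)
    os.replace(tmp_path, path)


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "foo.txt")
hardlink = os.path.join(workdir, "foo-link.txt")
with open(target, "w") as f:
    f.write("foo\n")
os.chmod(target, 0o640)
os.link(target, hardlink)  # second name for the original inode

inode_before = os.stat(target).st_ino
rename_replace(target, "fee\n")
inode_after = os.stat(target).st_ino

print(inode_before != inode_after)           # True: foo.txt is a new inode
print(oct(os.stat(target).st_mode & 0o777))  # 0o640: mode bits were copied
with open(hardlink) as f:
    print(repr(f.read()))                    # 'foo\n': old inode is unchanged
```

Anything attached to the old inode and not explicitly copied (here, the second hard link) stays with the old content.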

perl -i worked very differently from sed -i until version 5.28: it first made a temporary copy of the file, truncated the original file, and redirected the output to it. That preserved the original inode number and extra metadata, but would completely trash the content of the file if the perl -i process was interrupted, or if more than one perl -i process was editing the file at the same time. See the discussion, the original commit (which was subsequently improved) and the changelog in perl5280delta.
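
To illustrate why truncating and rewriting the original is fragile, here is a small simulation of an interrupted rewrite. It only demonstrates the general truncate-then-write pattern, not perl's actual implementation; `truncate_and_rewrite`, `Interrupted` and the `fail_after` parameter are invented for the example.

```python
import os
import tempfile


class Interrupted(Exception):
    """Stands in for the process being killed part-way through."""


def truncate_and_rewrite(path, lines, fail_after=None):
    """Rewrite `path` in place, optionally failing after `fail_after` lines."""
    # Opening with "w" truncates the original before anything is written.
    with open(path, "w") as dst:
        for count, line in enumerate(lines):
            if fail_after is not None and count == fail_after:
                raise Interrupted()
            dst.write(line)


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "foo.txt")
with open(target, "w") as f:
    f.write("one\ntwo\nthree\n")

try:
    truncate_and_rewrite(target, ["ONE\n", "TWO\n", "THREE\n"], fail_after=1)
except Interrupted:
    pass

with open(target) as f:
    survived = f.read()
print(repr(survived))  # 'ONE\n': neither the old nor the new content
```

With the temporary-file-and-rename approach the same interruption would leave foo.txt with its complete original content.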

deleted 30 characters in body

edited Mar 12, 2019 at 14:20

user313992

add some notes, especially about `perl -i` being different until recently

edited Mar 12, 2019 at 14:05

user313992

added 149 characters in body

edited Mar 6, 2019 at 21:54

user313992

answered Mar 6, 2019 at 21:47

user313992
