An easy way to increase performance of file writes is for the OS to just cache the data, tell (lie to) the application that the write went through, and then actually do the write later. This is especially useful if there's other disk activity going on at the same time: the OS can prioritize reads and do the writes later. It can also remove the need for an actual write completely, e.g., in the case where a temporary file is removed quickly afterwards.
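
As a rough sketch of what that means for a program (the file name here is made up): the write() below returns success as soon as the kernel has the data in its page cache, not when the data is on the disk.

    /* A successful write() only means the kernel has accepted the data
       into its page cache; it says nothing about the data being on disk. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("tmpfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1) { perror("open"); return 1; }

        if (write(fd, "hello\n", 6) == -1) { perror("write"); return 1; }
        /* write() reported success, but the data may still live only in RAM;
           if we unlink("tmpfile") right away, it may never hit the disk. */

        close(fd);  /* close() doesn't flush to disk either */
        return 0;
    }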

I said, "is supposed to", since the drive itself might maketell the same lies to the OS, and tellsay that the write is complete, while it'sthe file really only exists in a volatile write cache within the drive. Depending on the drive, there might be no way around that.

In addition to fsync(), there are also the sync() and syncfs() system calls that ask the system to make sure all system-wide writes or all writes on a particular filesystem have hit the disk. The utility sync can be used to call those.
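
Roughly, the calls look like this; a minimal sketch with abbreviated error handling, where the file name is made up (syncfs() is Linux-specific and needs _GNU_SOURCE):

    /* Sketch of the flush-to-disk calls discussed above. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("important.dat", O_WRONLY | O_CREAT, 0644);
        if (fd == -1) { perror("open"); return 1; }

        if (write(fd, "data\n", 5) == -1) { perror("write"); return 1; }

        if (fsync(fd) == -1)   /* flush this one file (data and metadata) */
            perror("fsync");

        if (syncfs(fd) == -1)  /* flush the whole filesystem containing fd */
            perror("syncfs");

        sync();                /* flush all dirty data, system-wide */

        close(fd);
        return 0;
    }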

In practice, on a lightly loaded system, the file should hit the disk within a few seconds. If you're dealing with removable storage, unmount the filesystem before pulling the media to make sure the data is actually sent to the drive, and there's no further activity. (Or have your GUI environment do that for you.)
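
If you want to do the equivalent programmatically, something like this sketch works (the mount point is made up, and umount() needs root):

    /* Sketch: flush everything, then unmount the removable filesystem so
       no further writes can be in flight. Requires CAP_SYS_ADMIN. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/mount.h>

    int main(void)
    {
        sync();  /* push dirty data toward the disks first */
        if (umount("/mnt/usb") == -1) {  /* fails with EBUSY if still in use */
            perror("umount");
            return 1;
        }
        return 0;
    }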


Then there's also the O_DIRECT flag to open(), which is supposed to "try to minimize cache effects of the I/O to and from this file." Removing caching reduces performance, so that's mostly used by applications (databases) that do their own caching and want to be in control of it. (O_DIRECT isn't without its issues; the comments about it in the man page are somewhat amusing.)
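
To give an idea of the hoops involved, a sketch (the file name and the 4096-byte alignment are assumptions; real code should query the device's block size rather than hard-code it):

    /* Sketch of an O_DIRECT write (Linux; O_DIRECT needs _GNU_SOURCE).
       Buffers, offsets and lengths must be suitably aligned. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("direct.dat", O_WRONLY | O_CREAT | O_DIRECT, 0644);
        if (fd == -1) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, 4096, 4096) != 0) {
            fprintf(stderr, "posix_memalign failed\n");
            return 1;
        }
        memset(buf, 'x', 4096);

        /* This bypasses the page cache -- though the drive's own volatile
           write cache may still be in the way. */
        if (write(fd, buf, 4096) == -1) perror("write");

        free(buf);
        close(fd);
        return 0;
    }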


What happens on a power-out also depends on the filesystem. It's not just the file data that you should be concerned about, but the filesystem metadata. Having the file data on disk isn't much use if you can't find it. Just extending a file to a larger size will require allocating new data blocks, and they need to be marked somewhere.

How a filesystem deals with metadata changes and the ordering between metadata and data writes varies a lot. E.g., with ext4, if you set the mount flag data=journal, then all writes, even data writes, go through the journal and should be rather safe. That also means they get written twice, so performance goes down. The default options try to order the writes so that the data is on the disk before the metadata is updated. Other options or other filesystems may be better or worse; I won't even try a comprehensive study.
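
For illustration, the flag can be set at mount time; a sketch using mount(2), where the device and mount point are made up and root is required:

    /* Sketch: mounting an ext4 filesystem with data=journal via mount(2).
       Equivalent to "mount -o data=journal /dev/sdb1 /mnt" from the shell. */
    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
        if (mount("/dev/sdb1", "/mnt", "ext4", 0, "data=journal") == -1) {
            perror("mount");
            return 1;
        }
        return 0;
    }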

