Aside from the limitations of the hard link, which you already touch upon:
Note that they simply do different things; you cannot substitute one for the other in most cases. A symbolic link really contains the target path, and that makes a huge difference: whether or not you can access a path depends on all the components in the path, not just the final file. So, very different semantics. You typically want to know which path you're "actually" opening, so you use a symbolic link. I can honestly find barely any case where I'd prefer not to know where else in my file system the same (mutable!) data can be found.
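To make that concrete, a minimal sketch (the file and target names are made up): a symbolic link can always tell you where it points, a hard link cannot:
ln -s /some/target/path a_symlink   # the target is made up; it doesn't even have to exist
readlink a_symlink                  # prints /some/target/path: the link itself carries that information
ls -l a_symlink                     # shows "a_symlink -> /some/target/path"
echo data > some_file
ln some_file another_name
ls -l another_name                  # just a regular file with link count 2; no hint that "some_file" names the same data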
Each time, I conclude that hard links seem superior in most use cases, especially when Linux is my daily driver
I'll say that this is your (equally valid) opinion. But to me, hard links are one of the sins of early UNIX file system design and shouldn't exist: they are very, very confusing, they break most assumptions you could make about a file, and they are an everlasting source of race conditions. And although its POSIX-y file systems support them, Linux is badly equipped for dealing with them¹.
The file path is the identifier of the data. A hard link breaks that.
For example, when you ask libreoffice to open /home/pie/doc.odt, it asks Linux to open the file at the path "/home/pie/doc.odt" and give it a handle to that. You edit a line in that document and hit "save". What happens now is not special to libreoffice; most programs actually work this way. Instead of fiddling with the original file and potentially leaving it broken for a few split seconds, libreoffice writes a new, temporary file completely, and then renames it to "/home/pie/doc.odt". Linux guarantees that this rename is atomic: if you open the same path before the rename completes, you get the old content; afterwards, the new. At no point in time would you get a "half-written" file.
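In shell terms, that save strategy looks roughly like this (a sketch; libreoffice does this through system calls, and the temporary file name here is made up):
echo "the edited document content" > /home/pie/.doc.odt.tmp   # write the new version completely, next to the original
mv /home/pie/.doc.odt.tmp /home/pie/doc.odt                   # atomic rename within one file system
# anyone opening /home/pie/doc.odt gets either the old or the new content, never a half-written mix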
A hardlink completely breaks that. Try this yourself:
# make two files, with different contents
echo a_content > a_file
echo b_content > b_file
# make hardlink
ln a_file a_linked
# check contents
cat a_file b_file a_linked
# gives
# a_content
# b_content
# a_content
# replace a with b, by renaming "b" to "a"
mv b_file a_file
# check contents
cat a_file b_file a_linked
# gives
# b_content
# b_content
# a_content
### ooops!
Not what the user expected.
Also, hard links can only work within the same file system, and that's not a good thing, because it breaks another POSIX abstraction, namely that a file behaves the same no matter where you move it. Got a hard-linked file that changed in lockstep with its other manifestation in another directory on the same file system? Now you move it somewhere else, and suddenly the link is gone. As a user, you can't even know that happened; you might not have the privileges to ask whether two directories are on the same file system.
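You can watch that happen yourself, assuming /tmp sits on a separate file system (on many distributions it is a tmpfs):
echo data > ~/a_file
ln ~/a_file ~/also_a_file
stat -c %h ~/a_file          # 2: two names for the same data
mv ~/a_file /tmp/a_file      # crossing file systems, so mv silently copies and deletes
stat -c %h /tmp/a_file       # 1: the moved file is a fresh, independent copy
stat -c %h ~/also_a_file     # 1: the lockstep behaviour is gone, without any warning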
On a Unix system, a "file" is essentially an address pointing to data on disk.
I'd say that, on a Unix system, a file is a piece of data stored in a structured way that can have a path (or, with hard links, multiple paths).
For example, if I create a hard link to a file in ~/Documents, I can access the same data from ~/Desktop, and if I move the file from ~/Documents to ~/Images, the hard link still works seamlessly.
As explained above, that doesn't work well: as soon as you realize you're a photographer and ~/Images is your life's earnings, you put it on a separate bunch of mirrored drives, or you run out of space and move it somewhere else, or whatever, and the hard link quietly stops working.
This behavior reminds me of Windows shortcuts but without the fragility: hard links remain valid even after moving the original file.
That's a huge break of trust, right? If I write invoices with illustrations in them and, at the end of 2024, move ~/Documents/invoices/*.html and ~/Documents/invoices/*.png to ~/Documents/invoices/archive/2024/, then I expect that my darn invoices don't suddenly look different when I later check how I illustrated something.
If, during 2025, I work on my ~/Images/illustrations/simple-example.png, which I hard-linked into ~/Documents/invoices/simple-customer-simple-example.png back in 2024, then suddenly everything is broken.
The worst of this is that I have no practical way of knowing whether a hard link only connects things that are already archived and immutable (e.g. my January 2024 invoice uses the same image as ~/Documents/archive/2023/December/customer.html), or whether it is a "dangerous" hard link into my working space that needs to be "de-linked" by creating an actual copy.
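For completeness, the "de-linking" itself is simple enough once you know you need it; a sketch using the paths from the example above:
f=~/Documents/invoices/simple-customer-simple-example.png
cp "$f" "$f.tmp" && mv "$f.tmp" "$f"   # give this name its own, independent copy of the data
stat -c %h "$f"                        # now 1: edits to ~/Images/illustrations/simple-example.png no longer leak into the invoice
The hard part, as said, is knowing which files need this treatment in the first place.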
On the other hand, symbolic links break if the target file is moved, which seems like a significant drawback.
That's a great upside: symlinks are readily recognizable as references to other places. If I need that piece of data to be coherent with some other piece of data, then I need an actual copy.
In the libreoffice example above, your program could resolve the symlink to the actual path of the data and replace that, atomically, for everyone. With hard links, you cannot even find the other manifestations of the same data¹!
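A sketch of what that could look like for a shell-level tool (assuming GNU readlink; the content is made up):
target=$(readlink -f /home/pie/doc.odt)      # resolve the symlink chain to the real path
echo "the edited document content" > "$target.tmp"
mv "$target.tmp" "$target"                   # atomic replace, visible through every symlink pointing at that path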
What are the practical scenarios where symlinks are preferable over hard links?
Basically everywhere: the "can't cross file systems" restriction makes everything super fragile, most packaging strategies try to make systems work no matter whether they are stored on a single volume or split across many file systems, and "spooky action at a distance" is a nightmare in practice.
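The file system restriction is easy to demonstrate, again assuming /tmp is on a different file system than your home directory:
echo data > /tmp/elsewhere
ln -s /tmp/elsewhere ~/points_there   # fine: symlinks may cross file systems
ln /tmp/elsewhere ~/same_data         # fails with "Invalid cross-device link" (EXDEV)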
¹ Without going into too many details: you would think that an operating system that decides to deal with hard links at the file system level would have APIs for dealing with them. But it doesn't.
Linux can tell you that a file has N "manifestations" (by default, only 1); ls -l has a column for that link count, right after the permission bits.
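For example, with a freshly made pair of names for the same data:
echo c_content > c_file
ln c_file c_linked
ls -l c_file        # the number right after the permission bits is the link count: 2
stat -c %h c_file   # prints just that count; nothing tells you where the other name is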
But if you need to figure out all manifestations of the same file, you need to:
- Ask Linux which file system the path is on (on a desktop system, your user is probably allowed to do that; if you're a script on a server, chances are nope: going spelunking through the server's file systems is not part of what, e.g., a WordPress plugin needs to do, and you get blocked)
- Find all mounts of that same file system (something only root in the initial file system namespace can do); they might have different mount options and expose different subsets of files. There might literally be hundreds of them, for example on a server that runs containers.
- On a file system with subvolumes (say, btrfs, zfs, a few others), you might not have the common ancestor volume mounted anywhere, so the other "ends" of the hard link might not even be visible; you would have to mount the common ancestor file system read-only, and only root can do that.
- Ask Linux for the inode number of the file that you know is hard-linked somewhere else
- With that number, go through all these mounts and check every single file on that file system for the same inode number (a rough sketch follows below)
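To give an idea of the shape of that search, a rough sketch for a single mount, run as root, assuming GNU stat and find (the path is a placeholder):
inode=$(stat -c %i /path/to/the/file)   # the inode number of the name you know
mnt=$(stat -c %m /path/to/the/file)     # the mount point it lives under (GNU stat)
find "$mnt" -xdev -inum "$inode"        # exhaustively scan that one mount for the same inode
# ...and then repeat this for every other mount of the same file system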
That's of course very problematic. Your original path might get unlinked or removed while you're looking for the other manifestations, and the "looking through all files" part typically takes anywhere from several seconds to weeks (on really big file systems, think archive storage clusters; then it also costs real money, because you're burning power and wearing out the cold storage media).
I can think of one case where a system intentionally used hard links, within its own storage hierarchy only, as a means of exchanging data in a multiprocess system; that meant it didn't have to look outside for "duplicate manifestations" when it recovered from a crash (it cleaned up the hard links on shutdown anyway). The idea was that systems A, B, C and D look at the same file system and create hard links, following an elegant naming scheme, when they need to communicate that some data is now available for the next processing step; the next step removes the previous step's link as soon as it has started processing a file. If something crashes, you just need to look at which files have a link count of only 1: those are the ones that were currently being processed. So crash recovery means going through all files and building the pairs of files that have already been processed by one step but not yet by the next.
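The recovery check itself is a one-liner; a sketch, with a made-up directory name:
find /srv/pipeline -type f -links 1    # only one name left: the next step had already started on these
find /srv/pipeline -type f -links +1   # still linked from two places: not yet picked up by the next step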
That's all fine and dandy (if a bit Rube Goldberg) until D finishes its crash recovery faster than B and starts removing files that have been completely processed, thereby freeing inode numbers for reuse by C. Suddenly B becomes very confused about its pairs, because inode numbers are not at all guaranteed to describe the same data over time; they can be reused.