11

I frequently find myself googling the difference between symbolic links and hard links. Each time, I conclude that hard links seem superior in most use cases, especially when Linux is my daily driver. However, I find this conclusion unsatisfactory because almost every tutorial, blog post, or Stack Overflow answer discussing links overwhelmingly recommends symbolic links instead.

My understanding so far:

  • On a Unix system, a "file" is essentially an address pointing to data on disk.

  • A hard link allows multiple addresses to reference the same data, making it a powerful tool.

For example, if I create a hard link to a file in ~/Documents, I can access the same data from ~/Desktop, and if I move the file from ~/Documents to ~/Images, the hard link still works seamlessly. This behavior reminds me of Windows shortcuts but without the fragility – hard links remain valid even after moving the original file. On the other hand, symbolic links break if the target file is moved, which seems like a significant drawback.

The only major advantage of symlinks I’ve found is that they can span different filesystems, whereas hard links are restricted to the same filesystem.

Given this, why are symbolic links the standard in most cases? What are the practical scenarios where symlinks are preferable over hard links?

2
  • Windows shortcuts are symbolic links; Windows reparse points are hard links. Commented Mar 10 at 8:58
  • @MSalters Windows shortcuts are pseudo-symbolic links implemented by convention: true symbolic links are distinguished in the filesystem. (Windows NT has both: shortcuts are just .lnk files, and symbolic links are implemented in NTFS.) Commented Mar 10 at 19:47

6 Answers

22

Aside from the limitations of the hard link, which you already touch upon:

Note that they simply do different things; in most cases you cannot substitute one for the other. A symbolic link really contains the information on the target path, and that makes a huge difference: whether or not you can access a path depends on all the components in the path, not just the final file. So, very different semantics. You typically want to know which path you're "actually" opening, so you use a symbolic link. I can honestly find barely any case where I'd prefer not to know where else in my file system the same (mutable!!) data can be found.
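(A quick illustration that a symlink is really just a stored path; the path here is made up:)

# create a symlink; the link stores this string and nothing more
ln -s /some/long/path/to/data here-link
readlink here-link    # prints the stored path: /some/long/path/to/data
# resolving the link requires access to every component of that path;
# if any component is missing or unreadable, the link dangles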

Each time, I conclude that hard links seem superior in most use cases, especially when Linux is my daily driver

I'll say that this is your (equally valid) opinion. But, to me, hard links are one of the sins of early UNIX file system design and shouldn't exist: they are very, very confusing, they break most assumptions you could make about a file, and they are an everlasting source of race conditions; and although they are supported by its POSIX-y file systems, Linux is badly equipped for dealing with them¹.

The file path is the identifier of the data. A hard link breaks that.

For example, when you ask libreoffice to open /home/pie/doc.odt, it asks Linux to open the file at the path "/home/pie/doc.odt" and give it a handle to it. You edit a line in that document and hit "save". What happens now is not special to libreoffice: most programs actually work this way. Instead of fiddling with the original file, and potentially leaving it broken for a split second, libreoffice writes a new, temporary file completely, and then renames it to "/home/pie/doc.odt". Linux guarantees that this rename is atomic: if you open the same path before the rename has completed, you get the old content; afterwards, the new. At no point in time would you get a "half-written" file.
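(As a minimal sketch, that save pattern looks roughly like this; the mktemp template and the generate_new_version command are illustrative:)

# write the complete new version to a temporary file on the same filesystem
tmp=$(mktemp /home/pie/.doc.odt.XXXXXX)
generate_new_version > "$tmp"    # hypothetical: produces the new file content
# atomic replacement: readers see either the old file or the new, never a mix
mv "$tmp" /home/pie/doc.odt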

A hardlink completely breaks that. Try this yourself:

# make two files, with different contents
echo a_content > a_file
echo b_content > b_file

# make hardlink
ln a_file a_linked

# check contents
cat a_file b_file a_linked
# gives
# a_content
# b_content
# a_content

# replace a with b, by renaming "b" to "a"
mv b_file a_file

# check contents
cat a_file b_file a_linked
# gives
# b_content
# b_content
# a_content
### ooops!

Not what the user expected.

Also, they only work within the same file system, and that's not a good thing, because it breaks another POSIX abstraction, namely that a file behaves the same no matter where you move it. Got a hard-linked file that changes in lockstep with its other manifestation in another directory on the same file system? Move it somewhere else, and suddenly the link is gone. You, as a user, can't even know that happened; you might not have the privileges to ask whether two directories are on the same file system.

On a Unix system, a "file" is essentially an address pointing to data on disk.

I'd say that on a Unix system, a file is a piece of data stored in a structured way that can have a path (or, with hardlinks, multiple paths).

For example, if I create a hard link to a file in ~/Documents, I can access the same data from ~/Desktop, and if I move the file from ~/Documents to ~/Images, the hard link still works seamlessly.

As explained above, that doesn't work well: sooner or later you realize you're a photographer and your ~/Images is your life's work, so you put it on a separate set of mirrored drives; or you run out of space and move it somewhere else; or whatever. The moment ~/Images lands on a different filesystem, the "move" is really a copy-plus-delete, and the hard link silently becomes an independent, stale copy.

This behavior reminds me of Windows shortcuts but without the fragility – hard links remain valid even after moving the original file.

That's a huge break of trust, right? If I write invoices with illustrations in them, and at the end of 2024, I moved ~/Documents/invoices/*.html and ~/Documents/invoices/*.png to ~/Documents/invoices/archive/2024/, then I expect that my darn invoices don't suddenly look different when I check how I illustrated something later on.

If I worked, during 2025, on my ~/Images/illustrations/simple-example.png, which I hardlinked into ~/Documents/invoices/simple-customer-simple-example.png back in 2024, then suddenly everything is broken.

Worst of all is that I have no practical way of knowing whether a hardlink only points into things already archived and immutable (e.g. my 2024 January invoice uses the same image as ~/Documents/archive/2023/December/customer.html), or whether it's a "dangerous" hardlink into my working space that needs to be "de-linked" by creating an actual copy.

On the other hand, symbolic links break if the target file is moved, which seems like a significant drawback.

That's a great upside – they are really just recognizable as references to other places. If I need that piece of data to be coherent with some other piece of data, then I need an actual copy.

In the libreoffice example above, your program could check for the actual path of the data, and replace that, atomically, for everyone. With hardlinks, you cannot even check for other manifestations of the same data¹!

What are the practical scenarios where symlinks are preferable over hard links?

Basically everywhere: because the "can't span different file systems" aspect makes everything super fragile (most packaging strategies try to make systems work no matter whether they are stored on a single volume or split across many file systems), and because "spooky action at a distance" is a nightmare in practice.


¹ Without going into too many details: you would think that an operating system that decides to deal with hard links at the file system level would have APIs for dealing with them. But: it doesn't.

Linux can tell you that a file has N "manifestations" (usually just 1; ls -l has a column for that).

But if you need to figure out all manifestations of the same file, you need to

  1. Ask Linux which filesystem the path is on. (On a desktop system, your user is probably allowed to do that. If you're a script on a server, chances are you're not: going spelunking through the server's filesystems is not part of what e.g. a wordpress plugin needs to do, and you get blocked.)
  2. Find all mountings of that same file system (only root in the initial filesystem namespace can enumerate these), which might have different mount options, mounting different subsets of files. These might literally be hundreds, for example on a server that runs containers.
    1. On a file system with subvolumes (say, btrfs, zfs, a few others), you might not have the common ancestor volume mounted anywhere, so the other "ends" of the hardlink might not even be visible. You'd have to mount the common ancestor file system read-only, and only root can do that.
  3. Ask Linux for the inode number of your file that you know is hardlinked somewhere else.
  4. With that number, go through all these mountings and check every single file on each of them for that same inode number (a minimal sketch of steps 3 and 4 follows below).
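(For the simple case of a single mount, steps 3 and 4 look roughly like this with GNU stat and find; some-file and the mount point / are illustrative:)

# inode number and link count of the file
stat -c 'links=%h inode=%i' some-file

# brute-force hunt for its other names, starting at the filesystem's mount point;
# -xdev keeps find from descending into other mounted filesystems
find / -xdev -inum "$(stat -c '%i' some-file)" 2>/dev/null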

That's of course very problematic. Your original path might get unlinked / removed while you're looking for the other manifestations. And the "looking through all files" part would typically take anywhere from multiple seconds to weeks (on really big file systems, think archive storage clusters; then it will also cost real money, because you're using power and wearing out the cold storage media).

I can think of one case where a system intentionally used hardlinks, within its own storage hierarchy only, as a means of exchanging data in a multiprocess system, which meant it didn't have to look outside for "duplicate manifestations" when it recovered from a crash (it cleaned up the hardlinks on shutdown anyway). The idea was that systems A, B, C and D look at the same file system and make hard links according to an elegant naming scheme when they need to communicate that some data is now available for the next processing step; the next step removes the previous step's link as soon as it has started processing a file. If something crashes, you just need to look at which files have a link count of only 1: these are the ones that were currently being processed. So, crash recovery means going through all files and building the pairs of files that have already been processed by the first step but not yet by the next.

That's all fine and dandy (if a bit Rube Goldberg) until D is done with crash recovery faster than B, and starts removing files that have been completely processed, thereby freeing inode numbers for reuse by C. Suddenly, B becomes very confused about pairs, because inode numbers are not at all guaranteed to describe the same data over time and can be reused.

31
  • 10
    "to me, hard links are one of the sins of early UNIX file system design and shouldn't exist". No agreement here. They are the reason moving a file from one directory to another in the same filesystem is extremely fast, lightweight on CPU and disk I/O, and doesn't consume 2x the space during the move. This was an enormous benefit on 1970s/1980s hardware when the FS was invented, and is still a big benefit on modern hardware. There are drawbacks to hard links, just like any other tool. But a sin that shouldn't exist? I disagree completely. Commented Mar 9 at 0:14
  • 5
    Summary: the hammer is a mistake because it breaks screws. Commented Mar 9 at 8:48
  • 6
    @SottoVoce, I don't think efficient moving within the same fs requires hard links (i.e. the possibility to have two names at a time). Consider something like FAT where both the file name and inode metadata (such as it is) is stored within the directory: to move a file, you only need to move all that. Usually, there's some kind of indirection to the actual data anyway, if only because the amount of data per file can vary enormously, between files and within the lifetime of a single file. Commented Mar 9 at 10:52
  • 5
    @SottoVoce what you describe is not at all an argument for or a benefit of hard links :) The ability of your file system to move files from one directory to the other without copying the data has nothing to do with whether it supports hard links, just with whether the storage of which blocks/extents belong to a file is kept in the directory structure or separately. Commented Mar 9 at 11:13
  • 2
    @MarcusMüller, right, but is any other Unix-like / POSIX-compatible OS any better? I.e. exactly because the structure is like it is, answering that question is hard, and I don't see how it would be better in the BSDs, or Solaris, or whathaveyou. Unless of course they do have some magic to do exactly that, in which case do tell. :) And yeah, I forgot, Linux at least has the magic symlinks in /proc/pid/fd/ that actually show the path used to open the file (and I think they track renames too, to some extent). But that's not a standard API and doesn't tell about any other links... Commented Mar 9 at 11:38
12

Pros for hard links and when they are better:

  • if you need two names for the same file, but one of them might be deleted or renamed
  • If the purpose of the linking is to save space and/or reduce redundant files, and the relationship between the link and the target is not important

Cons for hard links:

  • There is no source/target relationship between hard links. All hard links are identical. The inode doesn't store the filenames that hard link to it, so the only way to find other hardlinks to the same file is to search the entire filesystem.
  • Some filesystems put a limit on how many hardlinks can reference the same file, but in practice this is almost never an issue.
  • The inode holds the file permissions, so you can't have two hard links to the same file with different permissions or ownership (see the demo after these lists). In some cases this can cause security issues, with users hard-linking files into directories that the owner can't delete them from. (Some systems block this.)
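(A quick demo of both points: deleting one name leaves the data reachable, and permissions live in the inode, shared by every name:)

echo data > original
ln original backup

rm original       # deleting one name does not delete the data
cat backup        # still prints: data

ln backup other
chmod 600 backup  # permissions are stored in the inode...
ls -l other       # ...so 'other' now shows -rw------- as well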

When symbolic links are better (or the only option):

  • if you want to link to a file on another filesystem
  • if you want to link to a directory
  • if the relationship between the two files matters more than still being able to access the data after the target is deleted or renamed
  • if the information you want to store and view is not the content, but the name of the target in the link
  • if the content of the file needs to change based on situation; examples:
    • a symbolic link to a file in /etc on a network shared filesystem, where the target is local and the symbolic link is not
    • a relative link to a file in another directory, where the current directory might be moved and you want to keep the relationship not the content (where there might be a parallel structure in the new location)
  • Symlinks can implement a primitive version of copy on write, where a second user can make a shadow directory tree of symlinks to another user's files, and then delete each symlink and replace it with a copy of the file before editing it (or have the editor do this automatically); see the sketch after these lists.
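(A minimal sketch of the shadow-tree idea, assuming GNU cp and a made-up project path:)

# build a tree of symlinks mirroring alice's project (cp -s wants absolute sources)
cp -as /home/alice/project ~/shadow-project

# poor man's copy on write: before editing, swap the symlink for a real copy
cd ~/shadow-project
cp --remove-destination "$(readlink notes.txt)" notes.txt
"$EDITOR" notes.txt    # edits now go to our private copy, not alice's file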

Cons of symlinks:

  • Symbolic links can be fragile; if the target of the symlink is moved or renamed, the link is broken.
  • Symlinks to directories (or to other symlinks!) can create a maze of twisty paths, making it confusing what the real path is. (All hardlinks are the real path.)
  • Symlinks to directories can cause loops, and naïve applications that don't check for them and traverse them blindly (like vscode) can get into infinite loops.
  • Symlinks are notorious for causing security issues, where an application can check a symlink's permissions and then another program can replace the symlink in a race before the first application uses it. (There are many mitigations for this.)

Note that use of hardlinks and symlinks is very situational, and that a particular feature can be a pro in one situation and a con in another.

2
  • 2
    I'd actually make the first "con" under hardlinks both a pro and a con. If you want the same read-only file in two wholly unrelated places, you don't want anything linking those two places; thus hard links are great for decoupling while saving space or improving performance. Also, hardlinks work in chroot contexts where symlinks don't, although typically bind mounts get used for that case more than hard links. Commented Mar 10 at 13:14
  • It's certainly very situational, with some overlapping cases. Commented Mar 10 at 22:19
11

To address the question asked in the title:

Why are symbolic links more common than hard links in Unix/LINUX?

1. Because hard links can't span different filesystems, and bridging filesystems is the most common use of symbolic links by OS installers. E.g. on Ubuntu, ls -l /var/lock /var/run :

lrwxrwxrwx 1 root root 9 May 12 2020 /var/lock -> /run/lock
lrwxrwxrwx 1 root root 4 May 12 2020 /var/run -> /run

Older versions of Ubuntu used to put lock and pid files under /var but now they're under /run and the filesystem spanning feature of symbolic links provides backward compatibility for software and users who try to use the old paths.

Also e.g., on MacOS ls -l /var :

lrwxr-xr-x@ 1 root  wheel  11 Feb  4 08:57 /var -> private/var

2. Because symbolic links are self-documenting. E.g., on Ubuntu ls -l /usr/bin/python* :

lrwxrwxrwx 1 root root       7 Apr 15  2020 /usr/bin/python -> python2
lrwxrwxrwx 1 root root       9 Mar 13  2020 /usr/bin/python2 -> python2.7
-rwxr-xr-x 1 root root 3657904 Dec  9 11:35 /usr/bin/python2.7
lrwxrwxrwx 1 root root       9 May 12  2020 /usr/bin/python3 -> python3.8
-rwxr-xr-x 1 root root 5490520 Feb  4 07:02 /usr/bin/python3.8

or ls -l /dev/disk/by-uuid :

lrwxrwxrwx 1 root root 10 Feb 20 15:15 1f15f348-0a07-46be-868c-63a373cbb33c -> ../../sda3
lrwxrwxrwx 1 root root 10 Mar  1 06:39 6239c1e2-1244-48f1-9219-9f1213cb4826 -> ../../sda1
lrwxrwxrwx 1 root root  9 Feb 20 15:15 d0274a61-66ee-4a94-94a2-07615697d7a1 -> ../../sdb
lrwxrwxrwx 1 root root 10 Feb 20 15:15 d0e1cf56-df39-4336-bf61-a0c7853259db -> ../../sda2

3. Because the destination doesn't have to exist with a symbolic link, so programs and scripts can set a symbolic link pointing to a file/device that will be created/installed later.
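(Purely illustrative paths – the link can be created long before the target exists:)

ln -s /data/volume-mounted-later ~/data
ls -l ~/data    # a dangling link, showing where it will point
# once /data/volume-mounted-later appears, ~/data simply starts working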

I don't claim the previous (mostly programming-focused) answers didn't include the above, I'm just giving a shorter and sweeter summary of the top three admin-focused reasons symlinks are used so much.

2
  • Lots of talk in other answers, but these are the key elements. Actually, I'd add in the "self-documenting" section that symbolic links also document the fact that they are links at all. Commented Mar 11 at 0:08
  • You missed the fourth aspect: symbolic links can point to directories, while hard links (usually) cannot. All your examples in the first point would not be possible with hard links on most platforms, even on the same filesystem, because they point at directories. Commented Mar 11 at 14:07
8

In short, the problem with hard links is that they expose the fact that the "identity" of a file, on the technical level, is the inode, while a great many users consider the file name the "identity". (Without the possibility of (multiple) hard links, it'd be a 1:1 mapping and the distinction wouldn't matter.)

If a file is modified, it still exists under the same name, but depending on what actually happened to the inode, the hard link might break, or it might not.

Concretely, there are two ways to save a new version of a file:

  • either open the original for writing, and write new data into it (keeping the inode the same, and keeping hard links intact), or
  • create a new file and rename it over the original (creating a new inode that doesn't have the same hard links)

And yes, some utilities use one way, some the other. Some text editors can be configured to use either. In the usual case of having no hard links around, you don't need to care, and you likely won't even notice. Now, depending on what you want to do, either of those might have the desired effect. E.g. hard links can be used as a way of cheap deduplication, but if you do, you need to be very careful to break the link exactly when you need to. You need to think about the tools used to modify the files, every time you modify them.
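(You can watch both cases play out in the inode numbers; a minimal demo:)

echo old > a && ln a b
stat -c '%i' a b    # the same inode number twice: a and b are linked

echo new > a        # way 1: write into the original
cat b               # prints "new"; the link survived

echo newer > a.tmp && mv a.tmp a    # way 2: replace by rename
stat -c '%i' a b    # two different inode numbers; the link is severed
cat b               # still prints "new"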

Symlinks, by contrast, just unambiguously point to a file name, and that's that.


The other problem with hard links is that of discovering them. Now, which of these files are linked to each other?

-rw-r--r--  2 ilkkachu staff  2 Mar  9 13:12 a.txt
-rw-r--r--  2 ilkkachu staff  2 Mar  9 13:12 b.txt
-rw-r--r--  1 ilkkachu staff  2 Mar  9 13:12 c.txt
-rw-r--r--  2 ilkkachu staff  2 Mar  9 13:12 d.txt
-rw-r--r--  2 ilkkachu staff  2 Mar  9 13:12 e.txt
lrwxr-xr-x  1 ilkkachu staff  5 Mar  9 13:12 l.txt -> a.txt

One link should be obvious. The others, not so much.
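(GNU find can at least answer the question directly, within one filesystem; the output shown is the pairing discussed in the comments below:)

find . -xdev -samefile a.txt
# ./a.txt
# ./d.txt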

3
  • Listing with ls -li would be a little more honest 😉 Commented Mar 9 at 11:28
  • 5
    @MarkSetchell, it would be more revealing about the hard links, yes. But, how many of us regularly run ls -li? As the "default" way of listing a directory? How many graphical file managers expose the inode numbers? I guess the answer is "not too many". But even if one does, they'd see just a bunch of meaningless numbers. Sure, for the above files, you'd see both a.txt and d.txt have a meaningless number of 61741881, answering the question, but you really need to go through the whole list and compare the numbers. And that's with all links in the same directory, readily available... Commented Mar 9 at 11:34
  • 2
    Obviously upvoted this answer. And as @ilkkachu says, it's not sensible to assume that "in a list of many files, two have the same number as an attribute" is something anyone can make sense of; even machines struggle (see the discussion under my answer, where ilkkachu raised excellent points) Commented Mar 9 at 13:38
4

Consider also reflinks

Several answers here mention that when one side is mutated, the association might survive OR get severed, depending on the access pattern.
Severing the link might be a bug or a feature, depending on your goals, but either way it's bad that it's unreliable!
[E.g. saving in some editors does a direct write(); others write to a temp file, then atomically replace with rename(temp, f).]

=> Several Unixes and filesystems offer a newer third option: the "reflink", which has explicit copy-on-write semantics on both sides.
Where supported, it saves space through an O(1) copy that shares data, but afterwards that sharing is semantically transparent to users; in all ways, both sides behave "as if" they were independent copies, separately changeable.

Operation               ln target (hardlink)       ln -s target (symlink)     cp --reflink target (reflink)
write() to target       🔗 both sides see change   🔗 both sides see change   💔 separate files
atomic replace target   💔 separate files          🔗 both sides see change   💔 separate files
move target             🔗 follows inode           💔 broken link             🔗 follows content
write() to link         🔗 both sides see change   🔗 both sides see change   💔 separate files
atomic replace link     💔 separate files          💔 separate files          💔 separate files
move link               🔗 follows inode           🔗 follows target path     🔗 follows content

So reflinks are "just an optimization" and only cover the dedup use cases, but they are safer for those than both "hardlink farms" and symlinks...
Moreover, by being "undetectable", they can safely support fine-grained sharing of even parts of a file (which happens anyway once you change a small portion of a copy-on-write copy).
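(A minimal demo, assuming GNU cp on a filesystem with reflink support, such as btrfs or XFS; the file names are made up:)

# an O(1) "copy": no data is duplicated until one side is written to
cp --reflink=always big.img clone.img

# from here on, both names behave as fully independent files
echo change >> clone.img
cmp big.img clone.img    # they differ, exactly as a normal copy would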

2

I think the main reason why "symbolic links are more common than hard links in Unix/Linux" is that many users, and even many programmers, do not understand hard links. I think this comes from the idea that "the name is the file". Note that this idea is wrong for Unix: in Unix, a file can exist with one name, a thousand names, or even zero names.
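(The "zero names" case is easy to demonstrate in a shell; a_file is illustrative:)

echo data > a_file
exec 3< a_file    # open the file on descriptor 3
rm a_file         # remove its only name; the inode lives on
cat <&3           # still prints: data
exec 3<&-         # closing the last reference finally frees the storage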

I think the other thing you should be aware of is that there are non-Unix OSes that include hard links, though they may be broken. I have seen this on VAX/VMS, and I believe it was the case in Microsoft Windows. In both cases, when the capability was added, there was no link count, so when any name was deleted, the file was deleted. Worse, this left a dangling name, which would then be attached to whatever file was next created at the same whatever-their-term-for-an-inode-is. I've heard that later versions of both OpenVMS and Windows fixed the problem to make them fully functional hard links. I have not heard of either of them being easy to create.

Having said all that, I much prefer hard links (though editors occasionally surprise me). I'd love to play around with hard-linked directories, but that became a root-only feature in Version 7 Unix, and Linux's ext2 doesn't even allow it.

5
  • I second this answer. I believe the widespread use of symbolic links is a result of lack of understanding. Take the "confusing" example from @Marcus Müller's answer, where mv unlinked the target directory entry instead of putting the source's content to the file that target used to point to. I argue that this is the expected behaviour, if we remind ourselves that directory entries are just names, or pointers, to actual files. mv then should move names, just like the rename() system call. Thus, if the target name is already linked to another file, naturally it is unlinked first. Commented Mar 12 at 17:47
  • If one uses Unix and its derived systems, then one should accept the Unix philosophy that a directory entry is nothing but a name and a pointer to an actual file. Like one may be called by many names, a file can also be. One should abandon the assumption that a path is a unique identifier for a file in Unix. It is still an identifier, but not unique. Commented Mar 12 at 17:54
  • I would like to complement this answer by giving an analogy. Take two pointers in C, int *p1, *p2;. Suppose we want to make p2 also point to what p1 is pointing to (c.f. renaming a file so that p2 becomes a name for it), then we would do p2 = p1;. Note that this statement in C always gives us what we want whether p2 already points to something else or not. If p2 does, then that thing just loses a pointer (name) to it. Think how absurd it would be if by p2 = p1 we also implicitly got *p2 = *p1! Commented Mar 12 at 18:16
  • Thanks for the support. But I think there is a problem in the last sentence of your analogy. Given int pointers p1 and p2, after p2 = p1, then *p2 == *p1. I think you mean that it would be absurd if p2 = p1 meant to do instead *p2 = *p1. Commented Mar 12 at 21:04
  • That's right. Thanks for pointing it out. Commented Mar 12 at 21:39
