Why was the colon (:) chosen as the path separator?

Note that I mean "path separator" and not "directory separator". The path separator is the symbol placed between the entries in the PATH environment variable.

PATH="/usr/local/sbin:/usr/local/bin:/usr/bin:..."
                     ^ this symbol

Everything in computers and software was once a deliberate decision made by someone somewhere: for example, why the tilde represents the home directory, or why hjkl are the direction keys in vi. I would like to know the background for this decision.


Some random facts:

Having the colon as the path separator means that a directory with a colon in its name cannot be added to the path.

From POSIX:

Since <colon> is a separator in this context, directory names that might be used in PATH should not include a <colon> character.

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html

It seems that the colon cannot be escaped. @Random832 on Stack Overflow inspected the source code that handles PATH and found no escape mechanism.

https://stackoverflow.com/questions/14661373/how-to-escape-colon-in-path-on-unix
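
The consequence of having no escape mechanism can be illustrated with a minimal Python sketch. It is not how any particular shell or libc parses PATH; `split_search_path` and the directory `/opt/my:tools` are hypothetical names used only for illustration, and the sketch assumes a lookup that splits on the colon and nothing else.

```python
def split_search_path(path_value: str) -> list[str]:
    """Split a PATH-style string on ':' with no escape handling.

    This mirrors the behaviour described above: every colon is a
    separator, so there is no way to mark one as part of a name.
    """
    return path_value.split(":")


# A directory whose name contains a colon (hypothetical example).
dirs = ["/usr/local/bin", "/opt/my:tools", "/usr/bin"]

joined = ":".join(dirs)
recovered = split_search_path(joined)

print(joined)     # /usr/local/bin:/opt/my:tools:/usr/bin
print(recovered)  # ['/usr/local/bin', '/opt/my', 'tools', '/usr/bin']

# A backslash does not help either: it is just another character.
print(split_search_path("/opt/my\\:tools"))  # ['/opt/my\\', 'tools']
```

Three directories go in and four entries come out, which is why POSIX advises against colons in directory names that might be used in PATH.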

  • 3
    That's also the separator for /etc/passwd (that also contains paths in the home and shell columns). Commented Sep 21, 2016 at 13:06
  • 15
    I spent about half an hour yesterday researching this question. I read the 1971 Unix Programmer's Manual, which specifies the use of a colon but not the reason why the colon was chosen over (e.g.) the pipe symbol. I also read as much as I could about Multics but it, apparently, only had one directory in its PATH (so no need for a separator). I doubt we'll get a good answer here but if there's a chance that some veteran Unix user could answer this question, I'd like them to have the opportunity, so I'm voting to re-open. Commented Sep 22, 2016 at 8:25
  • 7
    There might not have been a shell/environment variable called PATH before the introduction of Unix Version 7 (in 1979), but there was a :-delimited search path as early as 1977. PWB/Unix (Programmer’s Workbench) used the Mashey shell, written by John R. Mashey, which fell chronologically between the Thompson shell and the Bourne shell. … (Cont’d) Commented Sep 22, 2016 at 21:50
  • 5
    (Cont’d) …  The Mashey shell supported 26 shell variables (guess what their names were), and variable p was the search path (called “the Shell directory search sequence for command execution”), with directories separated by colons. Fun fact: while the Mashey shell processed the .profile file, it also allowed you to specify an initial $p value in a file called .path. Commented Sep 22, 2016 at 22:00
  • 1
    @rudimeier: Well, back in the 1970s, there weren't popular file systems; there was the Unix file system. Then, when Unix Version 7 came along, there was the Unix Version 7 file system. But to answer your question, it has always been the case that all characters are allowed in filenames except for / (slash) and nul. Commented Sep 24, 2016 at 1:22

1 Answer

After some digging I don't have a real answer, but I do have some new information, supported by historical facts, to add to this conversation.

In one of his talks about the shell, Peter Chubb explains, around the 19:00 mark, why e is the alias for the default editor in Unix shells: older terminals were neither comfortable nor easy to use, and typing on them was an unpleasant experience.

He mentions a specific model, the Teletype Model 33.

After some research I found that this machine only lets you pick from a pool of 64 characters, not even the full US ASCII set: 2 to the power of 6 characters, i.e. a 6-bit combination.

As far as I can tell, this machine does not simply implement ASCII: it doesn't just support the first 64 characters of ASCII, but uses a differently arranged set of inputs and a set of characters that is probably not standard by modern measures.

The ASR 33 teletype can print 64 characters which only allowed for UPPER CASE LETTERS, numbers, and symbols. (Source)

This suggests that its character set is not just the first 64 codes of US ASCII: in ASCII the uppercase letters sit beyond the 64-character mark (above the value 63 in decimal, if you follow the table), so reaching them by plain ASCII code value takes more than 6 bits.

 0 NUL    16 DLE    32      48 0    64 @    80 P    96 `   112 p 
 1 SOH    17 DC1    33 !    49 1    65 A    81 Q    97 a   113 q 
 2 STX    18 DC2    34 "    50 2    66 B    82 R    98 b   114 r 
 3 ETX    19 DC3    35 #    51 3    67 C    83 S    99 c   115 s 
 4 EOT    20 DC4    36 $    52 4    68 D    84 T   100 d   116 t 
 5 ENQ    21 NAK    37 %    53 5    69 E    85 U   101 e   117 u 
 6 ACK    22 SYN    38 &    54 6    70 F    86 V   102 f   118 v 
 7 BEL    23 ETB    39 '    55 7    71 G    87 W   103 g   119 w 
 8 BS     24 CAN    40 (    56 8    72 H    88 X   104 h   120 x 
 9 HT     25 EM     41 )    57 9    73 I    89 Y   105 i   121 y 
10 LF     26 SUB    42 *    58 :    74 J    90 Z   106 j   122 z 
11 VT     27 ESC    43 +    59 ;    75 K    91 [   107 k   123 { 
12 FF     28 FS     44 ,    60 <    76 L    92 \   108 l   124 | 
13 CR     29 GS     45 -    61 =    77 M    93 ]   109 m   125 } 
14 SO     30 RS     46 .    62 >    78 N    94 ^   110 n   126 ~ 
15 SI     31 US     47 /    63 ?    79 O    95 _   111 o   127 DEL 
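
The arithmetic behind that observation can be checked with a short Python sketch. The second window (codes 32 to 95) is purely an illustrative assumption to show that some 64-code slice of the table does contain the uppercase letters; nothing here establishes which codes the Model 33 actually used. The helper `covers` is a hypothetical name.

```python
import string

BITS = 6
CODES = 2 ** BITS  # number of distinct values a 6-bit code can represent

# Decimal ASCII values of A..Z, as listed in the table above.
upper = [ord(c) for c in string.ascii_uppercase]

# Window 1: the first 64 ASCII codes (0..63).
first_64 = range(0, CODES)
# Window 2: a hypothetical 64-code slice starting at the space character.
shifted_64 = range(32, 32 + CODES)


def covers(window: range, codes: list[int]) -> bool:
    """Return True if every code in `codes` falls inside `window`."""
    return all(code in window for code in codes)


print(CODES)                      # 64
print(min(upper), max(upper))     # 65 90
print(covers(first_64, upper))    # False
print(covers(shifted_64, upper))  # True
print(ord(":"))                   # 58
```

The colon (58) falls inside both windows, so a 64-character limit on its own does not rule it in or out as a separator.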

So we get 64 characters out of this machine, apparently without a standard code table behind them, and no lowercase letters: just uppercase letters, symbols, and numbers.

Thanks to this website I can show you the input layout of such a keyboard:

ASR33 keyboard layout

and by pressing SHIFT you also get

ASR33 keyboard layout (second layer)

There is also a bit more information about how the physical connections that generate the characters are coded (the page also clarifies that ASR33 and ASCII chars are different down to the bit level).

I think it's interesting to note that there are no { or } but only ( and ), which means that creating subshells with ( ) was probably fine, but grouping commands with { } was probably not easy, or not possible at all, from this terminal.

In the end I don't think there is a real scientific answer: the colon was probably a "free" character waiting for a special meaning. One thing seems sure, though: shells and terminals are older than ASCII, and thinking about ASCII or any coded table as we know them today is probably not going to solve the mystery.
