11

I recently discovered the following on a RHEL6 machine, in the output of ls -lbi:

917921 -rw-r-----. 1 alex pivotal  5245 Dec 17 20:36 application.yml
917922 -rw-r-----. 1 alex pivotal  2972 Dec 17 20:36 application11.yml
917939 -rw-r-----. 1 alex pivotal  3047 Dec 17 20:36 application11.yml
917932 -rw-r-----. 1 alex pivotal  2197 Dec 17 20:36 applicationall.yml

I was wondering how something like this can be achieved?

7
  • 3
    try ls -lb to show you any trailing spaces etc. Commented Dec 20, 2019 at 10:36
  • Hi Steve, the names are identical, I've checked multiple times, and these 2 particular files are also baffling the autocomplete. Commented Dec 20, 2019 at 10:43
  • 1
    Maybe expand the question to include the command you're running to get that output, and also the output of ls -lib? Commented Dec 20, 2019 at 10:44
  • I was able to move one of them: find . -inum INODE -exec mv {} new_file_name \; but I am very curious how this feat was accomplished. Commented Dec 20, 2019 at 10:44
  • If you have GNU ls, you can run ls -lQ to see a quoted version of the filename. One of the files may have trailing whitespace. Commented Dec 20, 2019 at 11:14

2 Answers

11

I was able to reproduce that behavior. See for example:

ls -lib
268947 -rw-r--r-- 1 root root  8 Dez 20 12:32 app
268944 -rw-r--r-- 1 root root 24 Dez 20 12:33 aрр

This is on my system (Linux debian 4.9.0-7-amd64 #1 SMP Debian 4.9.110-3+deb9u2 (2018-08-13) x86_64 GNU/Linux).

I have a UTF-8 locale, and the character p in the two lines above is not the same, although it looks similar. In the first line it is a LATIN SMALL LETTER P and in the second line a CYRILLIC SMALL LETTER ER (see https://unicode.org/cldr/utility/confusables.jsp?a=p&r=None). This is just an example; it could be any character in the filename, even the dot.

When I use a UTF-8 locale, my shell gives the above output. But if I use a locale that does not cover all Unicode characters, for example the default locale C, the output looks as follows (you can change the locale by setting LC_ALL):

LC_ALL=C ls -lib
268947 -rw-r--r-- 1 root root  8 Dec 20 12:32 app
268944 -rw-r--r-- 1 root root 24 Dec 20 12:33 a\321\200\321\200

This is because CYRILLIC SMALL LETTER ER is not present in ASCII, so ls -b prints its UTF-8 bytes (D1 80) as octal escapes.
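To reproduce this in a scratch directory, something along the following lines should work. It is only a sketch, assuming a POSIX shell, `od`, an `ls` that supports `-b`, and a filesystem that stores names as raw bytes (typical on Linux). The helper names `name_bytes` and `make_lookalikes` and the variable `demo_dir` are made up for illustration.

```shell
#!/bin/sh
# Sketch only: reproduce two look-alike file names from explicit byte sequences.
# Assumes a filesystem that stores names as raw bytes (typical on Linux).

# Print the bytes of a string in hex, for example "app" gives 61 70 70.
name_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d '\n'
    echo
}

# Create both files inside the directory given as $1.
make_lookalikes() {
    latin='app'                               # LATIN SMALL LETTER P twice (70 70)
    cyrillic=$(printf 'a\321\200\321\200')    # CYRILLIC SMALL LETTER ER twice (D1 80 D1 80)
    touch -- "$1/$latin" "$1/$cyrillic"
}

demo_dir=$(mktemp -d)    # hypothetical scratch directory
make_lookalikes "$demo_dir"

# Octal escapes for non-ASCII bytes, regardless of the user's own locale.
LC_ALL=C ls -b "$demo_dir"

# Hex dump of each name; the two lines differ after the first byte.
for path in "$demo_dir"/*; do
    name_bytes "${path##*/}"
done
```

Dumping the bytes avoids depending on how the terminal or the locale renders the name, which is why it is a more reliable check than comparing the names by eye.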

4
  • But in your answer, the names are different, whereas the OP explicitly commented: "Hi Steve, the names are identical, I've checked multiple times". Commented Dec 20, 2019 at 19:31
  • 2
    @JörgWMittag The user never said anything about their locale though, so it's plausible that "checking multiple times" never included setting the locale to the C locale, and that they therefore got the same-looking output every time in their UTF-8 locale. Commented Dec 20, 2019 at 19:50
  • 1
    @JörgWMittag yeah, but that isn't possible. The file names will be different, it's just not a difference that can be seen when using simple ls. Commented Dec 20, 2019 at 19:50
  • 3
    It may help to know that, on all implementations of Unix except for MacOS, file names are treated as opaque sequences of bytes, not characters, by the kernel. You can put absolutely any sequence of bytes you want in a directory entry, as long as none of the bytes have value 0x2F (path component separator, ASCII '/') or 0x00 (C string terminator, ASCII NUL). Interpreting the byte sequence as some encoding of characters is entirely user space's problem. Commented Dec 20, 2019 at 21:06
0

I just had the same problem: two files with different inodes but apparently the same (French) name. Chaos' excellent answer put me on the right track: the two é characters in the name of the second file are different from those in the first. The files don't have the same inode:

me@ubuntu:~$ ls -li 2020\ 06\ 03\ CR\ R*
9586921 -rw-rw-r-- 1 francis francis 107933 jun 4 18:53 '2020 06 03 CR Réunion équipe.docx'
9569690 -rw-rw-r-- 1 francis francis 107933 jun 4 17:11 '2020 06 03 CR Réunion équipe.docx'

and they don't have exactly the same name:

me@ubuntu:~$ LC_ALL=c ls 2020\ 06\ 03\ CR\ R*
bash: warning: setlocale: LC_ALL: cannot change locale (c)
'2020 06 03 CR Re'$'\314\201''union e'$'\314\201''quipe.docx' '2020 06 03 CR R'$'\303\251''union '$'\303\251''quipe.docx'

and their contents are identical:

me@ubuntu:~$ cmp '2020 06 03 CR Réunion équipe.docx' '2020 06 03 CR Réunion équipe.docx'

The explanation is that in the name of the second file, é is encoded in UTF-8 as the single character LATIN SMALL LETTER E WITH ACUTE (C3 A9), while in the first it is LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT (65 CC 81):

(Image: hex dump of the two file names)

Note: pasting LATIN SMALL LETTER E + COMBINING ACUTE ACCENT into a web form can silently convert it into LATIN SMALL LETTER E WITH ACUTE.
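Because the two spellings render identically, comparing raw bytes is the dependable check. The sketch below builds both forms of é from the byte values given above, dumps them, and then lists names containing bytes outside printable ASCII. It assumes a POSIX shell, `od`, and a `find` whose `-name` matching is byte-wise under `LC_ALL=C` (as with GNU find); `hex_of` and `list_non_ascii` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: compare the two encodings of "é" described above.

# Print the bytes of a string in hex.
hex_of() {
    printf '%s' "$1" | od -An -tx1 | tr -d '\n'
    echo
}

# List entries under directory $1 whose names contain a byte outside
# the printable ASCII range (space through tilde).
list_non_ascii() {
    LC_ALL=C find "$1" -name '*[! -~]*'
}

precomposed=$(printf '\303\251')    # LATIN SMALL LETTER E WITH ACUTE (C3 A9)
decomposed=$(printf 'e\314\201')    # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT (65 CC 81)

hex_of "$precomposed"
hex_of "$decomposed"

# The strings render alike but are not byte-equal.
if [ "$precomposed" = "$decomposed" ]; then
    echo "same bytes"
else
    echo "different bytes"
fi

# Example: scan the current directory for names worth a closer look.
list_non_ascii .
```

Any name reported by `list_non_ascii` can then be passed to `hex_of` to see exactly which bytes it contains.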
