I have written the following shell script to see what the Unicode characters look like on my terminal.

#!/bin/bash

X=0

while [ $X -lt 65536 ]; do
    HEX=`bc <<< "obase=16; $X"`
    HEX="0x${HEX}"
    UCODENAME=`printf "%0*x\n" 4 $HEX`
    UCODECHAR=`printf "\u%0*x\n" 4 $HEX`
    echo -e "Unicode ${UCODENAME} = ${UCODECHAR}"
    X=$((X + 1))
done

When I run the script I receive the following output:

print_unicode: line 9: printf: missing unicode digit for \u
Unicode 0188 = ƈ

The second line is exactly what I am looking for.

I also tried using only printf in an attempt to eliminate the error.

#!/bin/bash

X=0

while [ $X -lt 65536 ]; do
    HEX=`bc <<< "obase=16; $X"`
    HEX="0x${HEX}"
    printf 'Unicode %0*x = \u%0*x\n' 4 $HEX 4 $HEX
    X=$((X + 1))
done

I get the following output:

print_unicode: line 8: printf: missing unicode digit for \u
Unicode 037f = \u037f

The second line is not what I am looking for, and I still get the same error message.

How do I fix this error?

Bonus: What is a more elegant solution for this?

3 Answers

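The reason for the error is that the builtin printf understands \U (or \u) only when it is followed by actual hexadecimal digits. In your format string it is followed by %0*x, and the escape is read from the format text itself, so it never sees the substituted value. With real digits it works: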

$ printf '\U0021'
!

To build the number and then convert it, a two-step printf is needed (the doubled \ makes the inner printf emit a single literal backslash):

$ printf '%b' "$(printf '\\U%04X' 33)"
!

As you want it:

$ printf '%b' "$(printf '\\u%0*X' 4 33)"
!

This also works:

$ printf '%b' "$(printf '\\U%0*X' 8 33)"
!
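
The two-step conversion shown above can be wrapped in a small helper. This is only a minimal sketch, not part of the original answer: the function name codepoint_to_char is made up for illustration, and it assumes bash 4.2 or later (where the builtin printf understands \U) and, for non-ASCII characters, a UTF-8 locale.

```shell
#!/bin/bash

# codepoint_to_char (hypothetical helper): print the character for a
# code point given as a decimal or 0x-prefixed hexadecimal number.
codepoint_to_char() {
    local escape
    # Step 1: build the escape text, e.g. \U00000021. The doubled
    # backslash makes printf emit one literal backslash.
    escape=$(printf '\\U%08X' "$1")
    # Step 2: let %b interpret the finished escape sequence.
    printf '%b' "$escape"
}

codepoint_to_char 33      # prints !
echo
codepoint_to_char 0x41    # prints A
echo
```

Keeping the two steps in separate statements makes it easy to inspect the intermediate escape text when something goes wrong.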

There is no need to use bc to give bash hexadecimal numbers.
Bash understands them perfectly well:

$ a=$(( 0xdef )); echo $(( a + 1 ))
3568

And to get the hexadecimal value of a number printf is good enough:

$ printf '0x%06x' 3568
0x000df0
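
As a rough illustration of doing both conversions without bc, the two one-liners above could be packaged like this. The helper names hex_to_dec and dec_to_hex are assumptions made for this sketch; the 16# prefix is bash's base#number arithmetic notation.

```shell
#!/bin/bash

# hex_to_dec (hypothetical helper): bare hex digits in, decimal out.
hex_to_dec() {
    echo "$(( 16#$1 ))"
}

# dec_to_hex (hypothetical helper): decimal in, zero-padded hex out.
dec_to_hex() {
    printf '0x%06x\n' "$1"
}

hex_to_dec def     # prints 3567
dec_to_hex 3568    # prints 0x000df0
```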

The loop could be simplified to:

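#!/bin/bash

len=6    # number of hex digits handed to \U

# Start at 0x20 to skip the control characters.
for (( cp=0x20; cp<0x010000; cp++ )); do
    # Build the escape text first, then let %b interpret it.
    Ucode="$(printf '%b' "$(printf '\\U%0*X' "$len" "$cp")")"
    printf 'Unicode U%0*x = %s\n' 4 "$cp" "$Ucode"
done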

BEWARE: the range from 0x20 to 0x010000 produces a lot of output (about 64k lines).
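
To avoid printing all of those lines at once, the same loop can be limited to a range. The following is a sketch under the same assumptions as the loop (bash 4.2 or later); the function name print_unicode_range and its two arguments are invented for this example.

```shell
#!/bin/bash

# print_unicode_range (hypothetical helper): list the code points from
# $1 to $2 inclusive, one per line.
print_unicode_range() {
    local cp char
    for (( cp=$1; cp<=$2; cp++ )); do
        # Two-step printf: build the \U escape, then interpret it.
        char=$(printf '%b' "$(printf '\\U%06X' "$cp")")
        printf 'Unicode U%04x = %s\n' "$cp" "$char"
    done
}

print_unicode_range 0x41 0x43
```

For the full range, piping the output into a pager such as less keeps the terminal usable.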

I increased len to 6 because Unicode code points go up to 10FFFF.

Of course, Ucode is fully defined by this:

Ucode="$(printf '%b' "$(printf '\\U%0*X' $len "$cp")")"

Note that code points (cp) below decimal 32 (hex 0x20) are control characters.

Even though the code works for such code points, I do not recommend playing with them.

The one exception is U+0000: it does not survive the loop, because there the value is assigned to a variable and a bash variable cannot hold a NUL byte.

Printed directly, this produces a NUL byte (\0):

$ printf '%b' "$(printf '\\U%0*X' "6" "0")"

Confirm with xxd:

$ printf '%b' "$(printf '\\U%0*X' "6" "0")" | xxd
0000000: 00
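
The difference between printing U+0000 directly and storing it in a variable can be checked by counting bytes. This is a small sketch, assuming a bash version that emits the NUL byte as in the xxd output; the function names are made up for the demonstration.

```shell
#!/bin/bash

# nul_bytes_piped (hypothetical helper): byte count when U+0000 is
# written straight into a pipe.
nul_bytes_piped() {
    printf '%b' '\U000000' | wc -c | tr -d ' '
}

# nul_bytes_in_var (hypothetical helper): byte count after the same
# output has passed through a command substitution into a variable.
nul_bytes_in_var() {
    local value
    # Newer bash versions warn about the ignored null byte; hide that.
    { value=$(printf '%b' '\U000000'); } 2>/dev/null
    printf '%s\n' "${#value}"
}

nul_bytes_piped     # prints 1
nul_bytes_in_var    # prints 0
```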

CAVEAT: Bash versions below 4.3 fail to encode values between U0080 and U00FF correctly in UTF-8. Please use version 4.3 or 4.4.

  • "\\u" still gives the error "printf: missing unicode digit for \u"; I found that "\\\u" works better and outputs \u as expected. Commented Aug 25, 2019 at 10:16

I kept experimenting and found a solution.

#!/bin/bash

X=0

while [ $X -lt 65536 ]; do
    HEX=`bc <<< "obase=16; $X"`
    HEX="0x${HEX}"
    UCODE=`printf "%0*x\n" 4 $HEX`
    printf "Unicode ${UCODE} = \u${UCODE}\n"
    X=$((X + 1))
done

I got the idea to try printf this way from: https://stackoverflow.com/questions/5947742/how-to-change-the-output-color-of-echo-in-linux

I'm still open to seeing more elegant solutions.

You can do this in a different way (since bash appears to ignore escaped backslashes around the u in "\u"):

#!/bin/bash

X=0

while [ $X -lt 65536 ]; do
    HEX=$(bc <<< "obase=16; $X")
    HEX="0x${HEX}"
    UCODENAME=$(printf "%0*x\n" 4 $HEX)
    UCODECHAR="\\u$(printf "%0*x" 4 $HEX)"
    echo -e "Unicode ${UCODENAME} = ${UCODECHAR}"
    X=$((X + 1))
done

Of course, the script is still bash-specific. A few other comments:

  • most people would suggest using $( and ) rather than backticks.
  • bash's printf can print Unicode directly (no need for the echo).
  • the extra printf for UCODECHAR is redundant.

Eliminating the redundancy:

#!/bin/bash

X=0

while [ $X -lt 65536 ]; do
    HEX=$(bc <<< "obase=16; $X")
    HEX="0x${HEX}"
    UCODENAME=$(printf "%0*x\n" 4 $HEX)
    UCODECHAR="\\u${UCODENAME}"
    echo -e "Unicode ${UCODENAME} = ${UCODECHAR}"
    X=$((X + 1))
done
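
The point about printf printing Unicode directly could look roughly like this: %b interprets the backslash escape, so neither echo -e nor a second variable is needed. This is a hedged sketch, not the answer's own code; show_codepoint is a name invented here, and bash 4.2 or later is assumed.

```shell
#!/bin/bash

# show_codepoint (hypothetical helper): print one table line for the
# code point in $1 (decimal or 0x-prefixed hexadecimal).
show_codepoint() {
    local name
    name=$(printf '%04x' "$1")
    # %s prints the hex name as is; %b interprets \uXXXX.
    printf 'Unicode %s = %b\n' "$name" "\\u${name}"
}

show_codepoint 0x41    # prints: Unicode 0041 = A
```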
  • Please read my answer: bash does not ignore the backslash, but getting it to work correctly can be tricky sometimes. Commented Mar 30, 2016 at 1:08
