Different encoding/Unicode interpretation using terminal vs using shell script
Andrew15_5

I was working on a keymap script (it maps keys from one language's keyboard layout to another). After a long struggle to get everything working, I found out that different characters are treated differently by different programs (Perl, Python). I then ran a simple test (simplified here) in a terminal (kitty or gnome-terminal; it doesn't matter which):

python -c 'import sys;print(len(sys.argv[1]))' テスト

and got the expected result:

3

But if I run the same command from a sh/bash script file (Unix line endings, UTF-8 encoded):

#!/usr/bin/env bash
# or
#!/bin/sh
python -c 'import sys;print(len(sys.argv[1]))' テスト

Running it as ./test.sh, I get:

9

This is why all the UTF-8 encode/decode/upgrade/downgrade calls I tried in Perl didn't work (if I ran the command manually from the terminal, it would probably work without any of those extra encoding functions).
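To make the 9 vs. 3 difference concrete, here is a minimal Python 3 sketch (an illustration, not the Perl code from my script) that looks at the same string as raw UTF-8 bytes and as decoded text. A program that receives the argument as bytes and never decodes it counts 9; decoding it first gives 3:

```python
# テスト is three katakana characters; each one takes 3 bytes in UTF-8.
text = "テスト"

raw = text.encode("utf-8")     # the form a program receives in argv: bytes
decoded = raw.decode("utf-8")  # what a decode step turns those bytes back into

print(len(raw))       # 9: counts bytes
print(len(decoded))   # 3: counts characters (code points)
print(list(raw[:3]))  # [227, 131, 134]: the three bytes of テ (U+30C6)
```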

So my question is: why does the exact same command give different results depending on the execution environment (terminal emulator vs. shell script)? How can I fix this?
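For reference, this is roughly how Python 3 treats arguments: the kernel passes argv as bytes, and Python 3 decodes each one to `str` using the filesystem encoding (UTF-8 here), while `os.fsencode()` recovers the original bytes. Python 2 leaves `sys.argv` entries as byte strings, which is what makes `len()` return 9. A minimal sketch, assuming Python 3.7+ on Linux; the `PYTHONUTF8=1` setting is my own addition so the child decodes argv as UTF-8 regardless of the caller's locale:

```python
import os
import subprocess
import sys

# Child program: print the length of argv[1] as decoded text and as raw bytes.
CHILD = "import os, sys; a = sys.argv[1]; print(len(a), len(os.fsencode(a)))"

def arg_lengths(arg: bytes) -> str:
    env = dict(os.environ, PYTHONUTF8="1")  # force UTF-8 decoding of argv
    result = subprocess.run(
        [sys.executable, "-c", CHILD, arg],  # bytes arguments are accepted on POSIX
        capture_output=True, text=True, env=env, check=True,
    )
    return result.stdout.strip()

print(arg_lengths("テスト".encode("utf-8")))  # "3 9": 3 characters, 9 bytes
```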

Update:

I forgot about this alias of mine:

alias python='python3'
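That explains the difference: a script started as ./test.sh does not read ~/.bashrc (assuming that is where such an alias is defined), and non-interactive bash does not expand aliases by default anyway, so `python` in the script is looked up on PATH instead. A minimal sketch, assuming bash is installed, that runs a two-line script through non-interactive bash with and without `shopt -s expand_aliases`; the alias name `greet` is purely hypothetical:

```python
import subprocess

# Line 1 defines a hypothetical alias; line 2 tries to use it.
SCRIPT = 'alias greet="echo hello"\ngreet\n'

def run_noninteractive(script: str) -> subprocess.CompletedProcess:
    # "bash -c" is non-interactive, just like running ./test.sh
    return subprocess.run(["bash", "-c", script], capture_output=True, text=True)

default = run_noninteractive(SCRIPT)
forced = run_noninteractive("shopt -s expand_aliases\n" + SCRIPT)

print(default.returncode, default.stderr.strip())  # alias not expanded: command not found
print(forced.stdout.strip())                        # alias expanded after shopt
```

In practice, calling `python3` explicitly in the script (as below) is simpler than enabling alias expansion.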

So with Python, running python3 explicitly gives the same result in both cases. With Perl, on the other hand:

echo 'print length $ARGV[0];' | perl -l -- - テスト

This behaves the same in both environments, but both times it outputs 9. With Perl there is no version mix-up: mine is 5.30.0, and it reports exactly the same version in both cases. Do I have to add something to the Perl code itself to make it behave like Python 3, where one Unicode character has length 1 rather than the number of bytes in its UTF-8 encoding (3 for each of these characters)?
