Became Hot Network Question

occurred Jan 25, 2023 at 6:37

Added question about Perl arguments encoding

edited Jan 24, 2023 at 23:09

I was working on a keymap script (one that maps keys from one language's keyboard layout to another). After a lot of trouble getting everything to work, I found that the same characters are treated differently by different programs (Perl, Python). I then ran a simple test (simplified here) in a terminal (kitty, gnome-terminal; it doesn't matter which):

python -c 'import sys;print(len(sys.argv[1]))' テスト

And got the expected result:

But if I run this from a sh/bash script file (Unix line endings, UTF-8 encoded):

#!/usr/bin/env bash
# or
#!/bin/sh
python -c 'import sys;print(len(sys.argv[1]))' テスト

I get (./test.sh):

And that's the reason none of the encode/decode/upgrade/downgrade UTF-8 handling worked in Perl (if I ran the command manually from a terminal, it would probably work without all these additional encoding functions).

Now I have a problem: why does the exact same command give me different results depending on the execution environment (terminal emulator vs. shell script)? How can I fix this?

Update:

I forgot about my:

alias python='python3'

So with Python, running python3 explicitly makes everything behave the same in both cases. With Perl, on the other hand:
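To confirm which interpreter each environment actually starts, a small diagnostic along these lines could be saved to a file and run both from the terminal and from the script. This is only a sketch: `describe_arg` is a hypothetical helper name, and it assumes the mismatch comes from `python` resolving to Python 2 inside the script, since bash does not expand aliases in non-interactive shells by default.

```python
import sys

def describe_arg(arg):
    """Return (major Python version, type name of the argument, its length)."""
    return (sys.version_info[0], type(arg).__name__, len(arg))

if __name__ == "__main__":
    # Use the first command-line argument, or an empty string if none was given.
    arg = sys.argv[1] if len(sys.argv) > 1 else ""
    # Path of the interpreter that is really running this code.
    print(sys.executable)
    # Python 3 decodes arguments to text, so the length counts characters;
    # Python 2 keeps them as a byte string, so the length counts bytes.
    print(describe_arg(arg))
```

If the two environments print different interpreter paths or major versions, the difference lies in which `python` gets started rather than in the terminal or the file encoding.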

echo 'print length $ARGV[0];' | perl -l -- - テスト

This behaves the same in both cases, but both times it outputs 9. Perl has no such version split, and mine is 5.30.0 (the same version is reported in both cases). Do I have to add some code in Perl itself to make it behave like Python 3 (so that the length of one Unicode character is 1, not 1-3 bytes)?
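The 9 that Perl reports matches the number of UTF-8 bytes in the argument, not the number of characters. A minimal Python 3 sketch of that distinction follows (it illustrates the two counts and is not a Perl fix; `byte_and_char_lengths` is a hypothetical helper name):

```python
def byte_and_char_lengths(text, encoding="utf-8"):
    """Return (length in encoded bytes, length in decoded characters)."""
    raw = text.encode(encoding)          # the bytes a program receives in argv
    return len(raw), len(raw.decode(encoding))

if __name__ == "__main__":
    # Each of the three katakana characters takes 3 bytes in UTF-8.
    print(byte_and_char_lengths("テスト"))  # (9, 3)
```

A program that measures the undecoded argument sees the first number; one that decodes it as UTF-8 first sees the second.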

asked Jan 24, 2023 at 22:37

Andrew15_5

Different encoding/Unicode interpretation using terminal vs using shell script

I was working on a keymap script (one that maps keys from one language's keyboard layout to another). After a lot of trouble getting everything to work, I found that the same characters are treated differently by different programs (Perl, Python). I then ran a simple test (simplified here) in a terminal (kitty, gnome-terminal; it doesn't matter which):

python -c 'import sys;print(len(sys.argv[1]))' テスト

And got the expected result:

But if I run this from a sh/bash script file (Unix line endings, UTF-8 encoded):

#!/usr/bin/env bash
# or
#!/bin/sh
python -c 'import sys;print(len(sys.argv[1]))' テスト

I get (./test.sh):

And that's the reason none of the encode/decode/upgrade/downgrade UTF-8 handling worked in Perl (if I ran the command manually from a terminal, it would probably work without all these additional encoding functions).

Now I have a problem: why does the exact same command give me different results depending on the execution environment (terminal emulator vs. shell script)? How can I fix this?

shell-script terminal unicode character-encoding
