55

I've created a UTF-8 encoded PowerShell script that contains non-ASCII characters.

characters.ps1:

Write-Host "ç â ã á à"

When I run the script in the PowerShell console, it outputs the wrong characters.

[Screenshot: console output of the script, showing garbled characters]

However, if I write the chars directly in the console, they are shown as expected:

[Screenshot: the same characters typed directly in the console, displayed correctly]

Does anyone know what causes this behavior?

The problem arose from a script I wrote that has hardcoded paths containing non-ASCII characters. When I pass such a path as an argument to a command (in this case I was going to robocopy a folder), the command fails because it cannot find the path, which is also displayed incorrectly on screen.

With the character "Ä" (capital ä) it's even worse: as soon as you write it between double quotation marks, it produces an error if the file is encoded as UTF-8 without BOM. Commented Jan 31, 2018 at 9:10

4 Answers

118

Changing the encoding of the script to UTF-8 with BOM solved the issue.

I was using SublimeText with the EncodingHelper plugin to control the character-set of the script. It was set correctly to UTF8.

I changed the encoding of the script in SublimeText to "UTF-8 with BOM" and the output was shown correctly.

I created the same script with Notepad++, which defaults to "UTF-8 with BOM", and the string was shown correctly in the console.

I changed the encoding of the script in Notepad++ to "UTF-8 without BOM" and it was shown incorrectly.

It seems PowerShell cannot correctly detect the encoding of UTF-8 files that have no BOM.
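If you'd rather not depend on an editor setting, the same fix can be scripted. This is only a minimal sketch: `Add-Utf8Bom` is a hypothetical helper name, and it assumes the file really is UTF-8 to begin with. It goes through the .NET file APIs because the meaning of `-Encoding utf8` on the cmdlets differs between PowerShell versions.

```powershell
# Hypothetical helper (not a built-in cmdlet): rewrites a file as UTF-8 with a BOM.
# Assumption: the file content is currently valid UTF-8, with or without a BOM.
function Add-Utf8Bom {
    param([Parameter(Mandatory = $true)][string]$Path)

    # .NET resolves relative paths against the process directory, not the
    # current PowerShell location, so convert to a full path first.
    $fullPath = Convert-Path -LiteralPath $Path

    # Decode as UTF-8. An existing BOM is dropped while reading,
    # so running the helper twice does not stack BOMs.
    $text = [System.IO.File]::ReadAllText($fullPath, [System.Text.Encoding]::UTF8)

    # UTF8Encoding($true) emits the BOM bytes EF BB BF when writing.
    $utf8WithBom = New-Object System.Text.UTF8Encoding($true)
    [System.IO.File]::WriteAllText($fullPath, $text, $utf8WithBom)
}
```

Usage would be something like `Add-Utf8Bom -Path .\characters.ps1`, followed by running the script again.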


5 Comments

This is pathetic. Especially considering how pointless and useless UTF-8 BOM is. +1 for enlightening information though.
Tried like 10 commands with no result, and it was just that... Thanks bro
I would guess that in the absence of the BOM, Windows assumes Windows-1252 encoding for legacy reasons, unlike Linux which assumes UTF-8.
This happened to me too. I created a PowerShell script with VS Code that created an Azure AD group with accented characters in the group description. Something was mangling the description, and it looks like that something was PowerShell. VS Code created the script as UTF-8 with no BOM, but I used Notepad++ to add the BOM and that fixed it.
In VS Code, my file was UTF8 by default. I saved with encoding UTF8 with BOM and it worked like magic. (Click on the encoding, bottom right part of editor)
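The Windows-1252 guess in the comments above is easy to try out. The sketch below takes the "ç" from the question, encodes it as UTF-8, and decodes those bytes as Windows-1252. It assumes code page 1252 is available in your session, and it only illustrates the mechanism; it does not prove which code page your console actually falls back to. The character is built with `[char]` so the snippet itself stays pure ASCII.

```powershell
# U+00E7 is the "c with cedilla" from the question.
$original = [string][char]0xE7

# UTF-8 stores that character as the two bytes C3 A7.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($original)

# Decode the same bytes as Windows-1252 (assumption: code page 1252 is available).
$cp1252  = [System.Text.Encoding]::GetEncoding(1252)
$garbled = $cp1252.GetString($utf8Bytes)

# Each byte becomes its own character: U+00C3 followed by U+00A7.
$codePoints = ($garbled.ToCharArray() | ForEach-Object { 'U+{0:X4}' -f [int]$_ }) -join ' '
$codePoints
```

One character turning into two is the same kind of garbling as in the question's output.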
15

In my case the problem was caused by creating a new PowerShell script with Visual Studio Code, which defaults to UTF-8 without BOM. Setting the encoding to "Windows 1252" solved the problem.

It seems that PowerShell can't handle UTF-8 without BOM; it needs either the "Windows 1252" or the "UTF-8 with BOM" encoding.


1

There is a reliable way to detect UTF-8 without BOM (https://unicodebook.readthedocs.io/guess_encoding.html). Like a lot of other little things, this seems to work better in PowerShell 6. Even my beloved Emacs 25 for Windows gets the encoding wrong.

PS C:\users\admin> pwsh
PowerShell 6.1.0
Copyright (c) Microsoft Corporation. All rights reserved.

https://aka.ms/pscore6-docs
Type 'help' to get help.

PS C:\users\admin> "write-host 'ç â ã á à'" | set-content -Encoding utf8NoBOM accent.ps1
PS C:\users\admin> .\accent
ç â ã á à
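For what it's worth, here is a rough sketch of one way to check a file for valid UTF-8: decode it with a strict decoder and see whether it throws. `Test-ValidUtf8` is a made-up helper name, and this is not meant to describe what PowerShell 6 does internally.

```powershell
# Hypothetical helper: returns $true when the bytes decode as strict UTF-8.
function Test-ValidUtf8 {
    param([Parameter(Mandatory = $true)][byte[]]$Bytes)

    # First argument: no BOM on output. Second argument: throw on invalid bytes
    # instead of silently substituting a replacement character.
    $strictUtf8 = New-Object System.Text.UTF8Encoding($false, $true)
    try {
        [void]$strictUtf8.GetString($Bytes)
        return $true
    }
    catch {
        # The decoder rejected the input, so it is not valid UTF-8.
        return $false
    }
}
```

To check a script, you could call `Test-ValidUtf8 -Bytes ([System.IO.File]::ReadAllBytes((Convert-Path .\accent.ps1)))`. Note that pure ASCII files also pass, since ASCII is a subset of UTF-8.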


0

Check which version you are running by executing $PSVersionTable. If PSVersion is 5.1, you're running Windows PowerShell (you can also tell from $PSHOME pointing into system32), which is an extremely dated version.

If you instead use modern PowerShell, you'll find that this and other behaviors are improved.
The problem is gone for me on the current 7.4 version, and someone else posted that 6.1 is also fine.

Alternatively, save your scripts as UTF-8 with BOM to make 5.1 behave better.
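A small sketch of that check, in case you want a script to tell you which situation it is in. The message wording is mine, and the "expected to be read correctly" part only reflects what is reported in this thread for 6.1 and 7.4.

```powershell
# Windows PowerShell stops at 5.1; PowerShell 6 and later report Major >= 6.
$version = $PSVersionTable.PSVersion

if ($version.Major -lt 6) {
    $message = 'Windows PowerShell {0}: save scripts with non-ASCII text as UTF-8 with BOM.' -f $version
}
else {
    $message = 'PowerShell {0}: UTF-8 without BOM is expected to be read correctly.' -f $version
}

$message
```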
