1

I am converting my string to a byte array with ASCII encoding, using the code below.

string data = "<?xml version=\"1.0\" encoding=\"utf-8\"?><ns0:ReceivedPayment Amount=\"1.01\"/>";
byte[] buffer = Encoding.ASCII.GetBytes(data);

The problem I am facing is that a "?" is being added to my string.

Now, if I convert the byte array back to a string

var str = System.Text.Encoding.Default.GetString(buffer);

my string becomes

string str = "?<?xml version=\"1.0\" encoding=\"utf-8\"?><ns0:ReceivedPayment Amount=\"1.01\"/>";

Does anyone know why a "?" is being added to my string, and how to remove it?

9
  • I could not reproduce this, but using mismatched encodings for encoding and decoding is wrong anyway (even if it had worked). Commented Feb 12, 2016 at 19:05
  • Isn't encoding="utf-8" a hint that you should use Encoding.UTF8? Commented Feb 12, 2016 at 19:05
  • encoding="utf-8" is just part of the string. Even if I remove it, the behavior is the same. Commented Feb 12, 2016 at 19:08
  • @user1104946 It means that the receiving end will (or at least should) decode it as UTF-8. If the bytes are not actually UTF-8, that could be bad. Commented Feb 12, 2016 at 19:10
  • I just changed Encoding.ASCII.GetBytes(data) to Encoding.UTF8.GetBytes(data) and I am still facing the same issue. Commented Feb 12, 2016 at 19:12

3 Answers

5

It seems that you showed only simplified code. Am I right that you read the data from a file? If so, check for a BOM (byte order mark) at the beginning of the file. A BOM can appear at the start of UTF-8, UTF-16 and UTF-32 files to signal the encoding.
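
The symptom in the question can be reproduced if the string is assumed to start with U+FEFF, the character a BOM decodes to. The following is a minimal sketch under that assumption (the class name and the shortened sample string are illustrative, not the asker's actual code). ASCII has no mapping for U+FEFF, so `Encoding.ASCII.GetBytes` substitutes `?` for it.

```csharp
using System;
using System.Text;

class BomDemo
{
    static void Main()
    {
        // Assumption: the string begins with U+FEFF, as it would if a BOM had been decoded into it.
        string data = "\uFEFF<?xml version=\"1.0\" encoding=\"utf-8\"?>";

        // ASCII cannot represent U+FEFF, so the encoder emits '?' (0x3F) in its place.
        byte[] buffer = Encoding.ASCII.GetBytes(data);
        string roundTripped = Encoding.ASCII.GetString(buffer);
        Console.WriteLine(roundTripped.StartsWith("?<?xml", StringComparison.Ordinal)); // True

        // Removing the leading U+FEFF before encoding avoids the stray character.
        string cleaned = data.TrimStart('\uFEFF');
        byte[] cleanBuffer = Encoding.ASCII.GetBytes(cleaned);
        Console.WriteLine(Encoding.ASCII.GetString(cleanBuffer) == cleaned); // True
    }
}
```

Checking `data[0] == '\uFEFF'` is a more targeted test than a culture-sensitive `StartsWith`, which may treat U+FEFF as ignorable.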


2 Comments

Yes, I only showed simplified code. My string had a hidden character. I used a regular expression to remove it.
There are many ways to remove a BOM. If you are sure that the first significant character in your string is always '<', then you can use code like this: int index = data.IndexOf("<"); if (index > 0) data = data.Substring(index);
0

There are several things wrong here. One is not showing the relevant code.

Nonetheless, if you use valid methods to read text from a UTF-8, UTF-32, etc. file, you won't have a BOM in your string, because the string holds the text and the BOM is not part of the text.
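
As a rough illustration of that difference, the sketch below writes a small UTF-8 file with a BOM and reads it back in two ways. The temp file and its `<root/>` content are hypothetical stand-ins; only standard `System.IO` and `System.Text` calls are used.

```csharp
using System;
using System.IO;
using System.Text;

class ReadDemo
{
    static void Main()
    {
        // Hypothetical temp file written as UTF-8 with a BOM (bytes EF BB BF).
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "<root/>", new UTF8Encoding(true));

        // A text-aware reader detects the BOM and leaves it out of the string.
        string viaReader = File.ReadAllText(path);
        Console.WriteLine(viaReader[0] == '<'); // True

        // Decoding the raw bytes yourself keeps the BOM as a leading U+FEFF character.
        byte[] raw = File.ReadAllBytes(path);
        string viaBytes = Encoding.UTF8.GetString(raw);
        Console.WriteLine(viaBytes[0] == '\uFEFF'); // True

        File.Delete(path);
    }
}
```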

On the other hand, if you are reading an XML file, it is not a "text" file. You should use an XML reader. That would take care of using the encoding that is (most likely) indicated in the file.

And, when you write an XML file (which I presume you'll be doing with the byte array), you should use an XML writer. That would take care of using the encoding you specify and writing it into the file.
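
A minimal sketch of that reader/writer approach follows. The question's snippet does not declare the `ns0` prefix, so the namespace URI `urn:example:payments` here is a made-up placeholder, and the in-memory stream stands in for whatever file or transport is really used.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

class XmlRoundTrip
{
    static void Main()
    {
        // Hypothetical namespace URI; the original question does not show one.
        XNamespace ns0 = "urn:example:payments";
        var doc = new XDocument(
            new XElement(ns0 + "ReceivedPayment",
                new XAttribute(XNamespace.Xmlns + "ns0", ns0.NamespaceName),
                new XAttribute("Amount", "1.01")));

        // The writer emits the XML declaration for the chosen encoding; UTF8Encoding(false) omits the BOM.
        var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
        byte[] buffer;
        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            }
            buffer = stream.ToArray();
        }
        Console.WriteLine(buffer[0] == (byte)'<'); // True: no BOM bytes precede the declaration

        // The reader picks the encoding from the byte stream and the declaration.
        using (var stream = new MemoryStream(buffer))
        {
            XDocument loaded = XDocument.Load(stream);
            Console.WriteLine(loaded.Root.Attribute("Amount").Value); // 1.01
        }
    }
}
```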

Keep in mind, though, that conversion from Unicode (for which UTF-8 is one encoding) to some other character set can silently corrupt your data by substituting a replacement character (typically '?') for characters that are not in the target character set.
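
To make the silent replacement visible, here is a small sketch using an arbitrary sample string ("café", chosen only for illustration). It also shows one way to turn the silent substitution into an error, by requesting an ASCII encoding with an exception fallback.

```csharp
using System;
using System.Text;

class FallbackDemo
{
    static void Main()
    {
        string text = "caf\u00E9"; // "café": the last character is outside ASCII

        // Default behavior: the unmappable character is silently replaced with '?'.
        byte[] lossy = Encoding.ASCII.GetBytes(text);
        Console.WriteLine(Encoding.ASCII.GetString(lossy)); // caf?

        // With an exception fallback, the same conversion fails loudly instead.
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strictAscii.GetBytes(text);
        }
        catch (EncoderFallbackException ex)
        {
            // CharUnknown is the character that could not be encoded.
            Console.WriteLine("Cannot encode U+" + ((int)ex.CharUnknown).ToString("X4")); // Cannot encode U+00E9
        }
    }
}
```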


-1

Here is my extension method:

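    public static byte[] ToByteArray(this string str)
    {
        // Copies the string's raw UTF-16 code units (2 bytes per char); no encoding conversion is applied.
        var bytes = new byte[str.Length * sizeof(char)];
        Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }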

1 Comment

So is it normal that converting a string to a byte array adds a "?" to the string?
