
I have a string that I need to convert to the equivalent array of bytes in .NET.

This ought to be easy, but I am having a brain cramp.

4 Answers


You need to use an encoding (System.Text.Encoding) to tell .NET what you expect as the output. For example, in UTF-16 (= System.Text.Encoding.Unicode):

    var result = System.Text.Encoding.Unicode.GetBytes(text);
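A small sketch of what that call produces, assuming a C# 9 / .NET 5+ project so top-level statements compile. The sample string "hi" is just an illustration. Encoding.Unicode is UTF-16 little-endian, so every char becomes two bytes. GetBytes does not emit a byte-order mark.

```csharp
using System;
using System.Text;

// Illustrative input; any string works.
string text = "hi";

// UTF-16 LE: 'h' (0x68) and 'i' (0x69) each become a low byte followed by 0x00.
byte[] result = Encoding.Unicode.GetBytes(text);
Console.WriteLine(BitConverter.ToString(result)); // 68-00-69-00

// Decoding with the same encoding gives the original string back.
string roundTripped = Encoding.Unicode.GetString(result);
Console.WriteLine(roundTripped == text); // True
```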

3 Comments

There are a lot more encodings in System.Text.Encoding than just Unicode: make sure you understand which one you need.
Joel: Hence I wrote “for example”. ;-) But your comment is of course valid.
:) I was trying to help show where the non-UTF-16 encodings are; I probably could have worded it better.

First work out which encoding you want: you need to know a bit about Unicode first.

Next, work out which System.Text.Encoding that corresponds to. My Core .NET refcard describes most of the common ones, and how to get an instance (e.g. via a static property of Encoding, or by calling Encoding.GetEncoding).
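For illustration, here is a rough sketch of both ways of getting an instance, assuming C# 9 / .NET 5+ for the top-level statements. Code page 28591 (ISO-8859-1) is used because it is built in even on .NET Core. Many other code pages (e.g. windows-1252) need an extra encoding provider registered there.

```csharp
using System;
using System.Text;

// Static property vs. lookup by name or by code page number.
Encoding viaProperty = Encoding.UTF8;
Encoding viaName = Encoding.GetEncoding("utf-8");
Encoding latin1 = Encoding.GetEncoding(28591); // ISO-8859-1

Console.WriteLine(viaProperty.WebName); // utf-8
Console.WriteLine(viaName.WebName);     // utf-8
Console.WriteLine(latin1.WebName);      // iso-8859-1

// The same one-char string ("é") turns into a different number of bytes.
string s = "\u00E9";
Console.WriteLine(viaProperty.GetBytes(s).Length); // 2
Console.WriteLine(latin1.GetBytes(s).Length);      // 1
```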

Finally, work out whether you want all the bytes at once (the easiest way of working: call Encoding.GetBytes(string) once and you're done) or whether you need to break the text into chunks, in which case you'll want to use Encoding.GetEncoder and then encode a bit at a time. The encoder keeps state between calls, in case a chunk boundary falls halfway through a character, for example.
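A sketch of the chunked approach, assuming C# 9 / .NET 5+. The chunk size of 2 and the sample string are deliberately contrived so that a surrogate pair (U+1F600, stored as two chars) is split across two chunks. A single Encoder carries the dangling high surrogate over to the next call. Encoding each chunk independently does not, and produces replacement characters instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// "a", U+1F600 (one code point, two UTF-16 chars), "b".
string text = "a\U0001F600b";
const int chunkSize = 2; // tiny on purpose so the surrogate pair straddles two chunks

// One Encoder for the whole stream: it remembers state between calls.
Encoder encoder = Encoding.UTF8.GetEncoder();
var chunked = new List<byte>();

// For comparison: encode each chunk on its own, with no shared state.
var naive = new List<byte>();

for (int start = 0; start < text.Length; start += chunkSize)
{
    int count = Math.Min(chunkSize, text.Length - start);
    char[] chunk = text.ToCharArray(start, count);
    bool isLastChunk = start + count == text.Length;

    // GetMaxByteCount leaves room for a surrogate carried over from the previous call.
    byte[] buffer = new byte[Encoding.UTF8.GetMaxByteCount(count)];
    int written = encoder.GetBytes(chunk, 0, count, buffer, 0, flush: isLastChunk);
    chunked.AddRange(buffer[..written]);

    // A lone surrogate here is replaced with U+FFFD (EF-BF-BD).
    naive.AddRange(Encoding.UTF8.GetBytes(chunk));
}

byte[] oneShot = Encoding.UTF8.GetBytes(text);
Console.WriteLine(BitConverter.ToString(chunked.ToArray())); // 61-F0-9F-98-80-62
Console.WriteLine(chunked.SequenceEqual(oneShot));           // True
Console.WriteLine(naive.SequenceEqual(oneShot));             // False
```

Passing flush: true only on the last chunk is what tells the encoder it can stop waiting for the rest of a character.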

Comments

@Mehrdad: You absolutely do. An encoding defines what the conversion from a string to a byte array does. Compression and encryption are entirely different matters. Otherwise it's like saying the image format doesn't matter when you want to save a picture as a file - many different image formats may be okay, but there has to be one involved, by definition.
@Mehrdad: No, the user does need to know the encoding. Just because UTF-16 is in some sense the natural encoding for .NET doesn't mean it's the encoding he wants to use. The point of writing data out is so that it can be read again - and that will need to use the same encoding. The fact that the OP referred to "the equivalent array of bytes" suggests that they're unaware that encodings even exist, and it's vitally important to understand encodings if you're going to convert between text and binary representations.
I've seen countless people fail to preserve information correctly because they haven't understood encodings. In my experience, educating them about the topic is a much better approach than using Buffer.BlockCopy and assuming it's what they want.
@Mehrdad: But someone is going to interpret the bytes later. You're right in saying that the compression/encryption part doesn't need to care, but whatever's going to later turn it back into a string absolutely does... and if no-one's ever going to interpret the data, there's not much point in it being there. So yes, you do still need to choose an encoding, and make sure it's used consistently. Which encoding you decide to use is somewhat arbitrary so long as it can encode all your text, although it will affect space etc. Arbitrary isn't the same as irrelevant though.
@Mehrdad: Yes, absolutely. Just like you must choose an image format if you want to save a picture to disk. Use that analogy as far as you can. Strings aren't made of bytes (conceptually) so in order to convert to bytes, you have to go through some sort of conversion... and that is precisely the encoding.
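To make the "whoever reads it back must use the same encoding" point concrete, here is a minimal sketch, assuming C# 9 / .NET 5+. The choice of UTF-8 for writing and ISO-8859-1 for the mistaken reader is just an example of a mismatch.

```csharp
using System;
using System.Text;

string original = "caf\u00E9"; // "café"

// Writer and reader agree on UTF-8: the text survives.
byte[] bytes = Encoding.UTF8.GetBytes(original);
string sameEncoding = Encoding.UTF8.GetString(bytes);

// Reader wrongly assumes ISO-8859-1: the two UTF-8 bytes for 'é' (C3 A9)
// are read back as two separate characters, 'Ã' and '©'.
string wrongEncoding = Encoding.GetEncoding("iso-8859-1").GetString(bytes);

Console.WriteLine(sameEncoding == original); // True
Console.WriteLine(wrongEncoding.Length);     // 5
```

No exception is thrown in the mismatched case. The data is just silently wrong, which is why the mistake is easy to miss.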

What Encoding are you using? Konrad's got it pretty much down, but there are others out there and you could get goofy results with the wrong one:

    byte[] bytes = System.Text.Encoding.XXX.GetBytes(text);

Where XXX can be:

ASCII
BigEndianUnicode
Default
Unicode
UTF32
UTF7
UTF8
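A quick sketch comparing several of these on one string, assuming C# 9 / .NET 5+. "héllo" is an arbitrary sample with one non-ASCII character. Default is left out because it depends on the runtime: the system ANSI code page on .NET Framework, UTF-8 on .NET Core and .NET 5+. UTF7 is left out because it is marked obsolete starting with .NET 5. ASCII shows the "goofy results" case: it cannot represent 'é' and silently substitutes '?'.

```csharp
using System;
using System.Text;

string text = "h\u00E9llo"; // "héllo": five chars, one outside ASCII

var encodings = new (string Name, Encoding Enc)[]
{
    ("ASCII", Encoding.ASCII),                       // 5 bytes, lossy
    ("UTF8", Encoding.UTF8),                         // 6 bytes
    ("Unicode", Encoding.Unicode),                   // 10 bytes (UTF-16 LE)
    ("BigEndianUnicode", Encoding.BigEndianUnicode), // 10 bytes (UTF-16 BE)
    ("UTF32", Encoding.UTF32),                       // 20 bytes
};

foreach (var (name, enc) in encodings)
{
    byte[] bytes = enc.GetBytes(text);
    // Decode with the same encoding to check whether the text survived.
    Console.WriteLine($"{name,-16} {bytes.Length,2} bytes, round trip: {enc.GetString(bytes)}");
}

// ASCII has no 'é', so it was replaced with '?' on the way in.
string asciiRoundTrip = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(text));
```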


Like this:

    using System.Text;

    string test = "text";
    byte[] arr = Encoding.UTF8.GetBytes(test);
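Expanding that into a runnable sketch (assuming C# 9 / .NET 5+ for top-level statements) shows the bytes you get and the reverse conversion. For pure-ASCII input like "text", UTF-8 gives exactly one byte per character.

```csharp
using System;
using System.Text;

string test = "text";
byte[] arr = Encoding.UTF8.GetBytes(test);

// 't' = 116, 'e' = 101, 'x' = 120.
Console.WriteLine(string.Join(", ", arr)); // 116, 101, 120, 116

// Converting back uses the same encoding.
string back = Encoding.UTF8.GetString(arr);
Console.WriteLine(back); // text
```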
