4

I want to translate each byte of a byte[] into a char, then put those chars into a String. This is the so-called "binary" encoding of some databases. So far, the best I could find is this huge boilerplate:

byte[] bytes = ...;
char[] chars = new char[bytes.length];
for (int i = 0; i < bytes.length; ++i) {
    chars[i] = (char) (bytes[i] & 0xFF);
}
String s = new String(chars);

Is there another option from Java SE or perhaps from Apache Commons? I wish I could have something like this:

final Charset BINARY_CS = Charset.forName("BINARY");
String s = new String(bytes, BINARY_CS);

But I'm not willing to write a Charset and its codecs (yet). Is there such a ready-made binary Charset in the JRE or in Apache Commons?

3 Comments

How is that "huge boilerplate"? Just wrap it in a method that takes a byte array and returns a string. Commented Dec 7, 2011 at 17:13
Not entirely sure of your problem. Won't ISO 8859-1 (Latin-1) do the job? It is an 8-bit single byte encoding... Commented Dec 7, 2011 at 17:28
@ColinD That wrapping was done before. But I had to code this method a few times in distinct projects with no shared library between them. And I don't want to build a library only for this. That's why we use stuff like java.util, java.text, java.lang and Apache Commons. Commented Dec 7, 2011 at 17:34

4 Answers

9

You could use the ASCII encoding for 7-bit characters:

String s = "Hello World!";
byte[] b = s.getBytes("ASCII");
System.out.println(new String(b, "ASCII"));

or ISO-8859-1 (Latin-1) for 8-bit characters:

String s = "Hello World! \u00ff";
byte[] b = s.getBytes("ISO-8859-1");
System.out.println(new String(b, "ISO-8859-1"));
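To check that ISO-8859-1 really behaves like the "binary" charset the question asks for, here is a minimal sketch that round-trips all 256 byte values. It assumes Java 7 or later for `StandardCharsets` (on older versions, `Charset.forName("ISO-8859-1")` should serve the same purpose). The class and method names are hypothetical, chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of the JDK or Apache Commons.
public class Latin1RoundTrip {

    // Decodes each byte to the char with the same unsigned value (0..255).
    static String toBinaryString(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Inverse operation; only lossless for strings whose chars are all <= 0xFF.
    static byte[] fromBinaryString(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Build an array holding every possible byte value once.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }

        String s = toBinaryString(all);

        // Every char should equal the unsigned value of the byte it came from.
        boolean sameValues = true;
        for (int i = 0; i < all.length; i++) {
            sameValues &= (s.charAt(i) == (all[i] & 0xFF));
        }
        System.out.println("one char per byte: " + (s.length() == all.length));
        System.out.println("values preserved:  " + sameValues);
        System.out.println("round trip equal:  " + Arrays.equals(all, fromBinaryString(s)));
    }
}
```

This gives the same result as the manual `(char) (bytes[i] & 0xFF)` loop from the question, without a custom Charset.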

EDIT

System.out.println("ASCII => " + Charset.forName("ASCII"));
System.out.println("US-ASCII => " + Charset.forName("US-ASCII"));
System.out.println("ISO-8859-1 => " + Charset.forName("ISO-8859-1"));

prints

ASCII => US-ASCII
US-ASCII => US-ASCII
ISO-8859-1 => ISO-8859-1

13 Comments

Thanks, but there will be some 8-bit characters.
Other way around: "ASCII" is an alias for "US-ASCII". Obviously both will work; I'm just saying that "US-ASCII" is the "official" name Java uses.
@Peter Lawrey: From the article you linked: "US-ASCII is the Internet Assigned Numbers Authority (IANA) preferred charset name for ASCII." Also, I believe the charset that's required to be present in all Java implementations is "US-ASCII".
ISO-8859-1 did the trick. Very interesting. I thought that it would map some bytes to 0x7F, because not all byte values have meaning in this encoding (according to en.wikipedia.org/wiki/ISO/IEC_8859-1).
The standard charset identifiers are listed in the Charset Javadoc: docs.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html
1

You could skip the intermediate char array and append directly to a StringBuilder (or a StringBuffer if you are worried about multi-threading). My example uses StringBuilder.

byte[] bytes = ...;
StringBuilder sb = new StringBuilder(bytes.length);
for (int i = 0; i < bytes.length; i++) {
  sb.append((char) (bytes[i] & 0xFF));
}

return sb.toString();

I know it doesn't answer your other question. Just seeking to help with simplifying the "boilerplate" code.

2 Comments

There is no good reason not to use StringBuilder instead of StringBuffer if you're using it as a local variable like in your example.
@ColinD Modified to StringBuilder. You are right there. I'm used to StringBuffer, as that was all we had before Java 5. Also, we have a multi-threaded app, so StringBuffer works well for us. But ++ to your point.
0

There is a String constructor that takes an array of bytes and a string naming the charset the bytes are encoded in:

String s = new String(bytes, "UTF-8");   // if the charset is UTF-8
String s = new String(bytes, "ASCII");   // if the charset is ASCII
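Whether this constructor gives one char per byte depends entirely on the charset passed in. The sketch below, assuming Java 7 or later for `StandardCharsets`, illustrates the two failure modes for arbitrary binary data: UTF-8 can merge several bytes into one char, and US-ASCII replaces bytes above 0x7F. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing why UTF-8 and US-ASCII are not byte-preserving.
public class CharsetPitfalls {

    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static String decodeAscii(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // 0xC3 0xA9 is the UTF-8 encoding of U+00E9: two bytes become one char.
        byte[] twoBytes = {(byte) 0xC3, (byte) 0xA9};
        System.out.println("UTF-8 length: " + decodeUtf8(twoBytes).length());

        // 0xFF is outside the 7-bit range, so the decoder substitutes U+FFFD.
        byte[] highByte = {(byte) 0xFF};
        char c = decodeAscii(highByte).charAt(0);
        System.out.println("ASCII char:   U+" + Integer.toHexString(c).toUpperCase());
    }
}
```

In both cases the original byte values cannot be recovered from the resulting String.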

3 Comments

UTF-8 will decode some multi-byte sequences into a single char, so it won't work. ASCII only handles 7-bit characters, and there will be some 8-bit characters.
Why would this get a downvote? I told you that the String constructor does exactly what you want it to do. Sorry for not doing your research for you as to what charset to use...
This downvote was not from me. By the way, thanks for trying to answer.
0

You can use Base64 encoding. Apache Commons Codec provides an implementation:

http://commons.apache.org/codec/

Base 64 http://commons.apache.org/codec/apidocs/org/apache/commons/codec/binary/Base64.html
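As a rough illustration of what Base64 does to binary data, here is a minimal sketch. Note the substitution: it uses `java.util.Base64` from the JDK (an assumption of Java 8 or later) rather than the Apache Commons Codec class linked above, so that it runs without extra dependencies. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; uses the JDK's Base64, not Apache Commons Codec.
public class Base64Sketch {

    static String encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xFF, 0x00, 0x01};
        String text = encode(bytes);

        // Three input bytes become four output characters.
        System.out.println(text + " (" + text.length() + " chars)");
        System.out.println("round trip equal: " + Arrays.equals(bytes, decode(text)));
    }
}
```

Base64 is lossless, but it produces four characters for every three bytes, so it is not the one-char-per-byte mapping the question asks for.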
