0

I'm trying to convert a byte array to a String and then back to a byte array. The first part (byte[] to String) works, but when I convert the String back to a byte array and compare it with my initial byte array, the two are different. I'm guessing it's an encoding issue. I tried different solutions (using UTF-8, ISO-8859-1, UTF-16LE and others), but none seem to work.

Would anyone know how to solve this problem? Thanks in advance

    Path path = Paths.get("C:\\folder1", "profil1.bmp");

    try {
        //file to byte[]
        byte[] byte_array = Files.readAllBytes(path);
        System.out.println(Arrays.toString(byte_array));

        //byte[] to string
        String byte_string = Arrays.toString(byte_array);

        //String to byte[]
        byte[] string_byte = byte_string.getBytes();

        System.out.println(Arrays.equals(byte_array, string_byte));
    } catch (IOException e) {
        System.out.println(e);
    }

Here's the output (the result was too long, so I cut off part of it):

[66, 77, -10, -44, 1, 0, 0, 0, 0, 0, 1, -1, ....... ,-1]
false

4 Comments

  • Why do you want to treat your BMP data as a String ? Commented Jul 16, 2015 at 14:46
  • I want to send it in an ArrayList<HashMap<String, String>> alongside other information (which are all strings) Commented Jul 16, 2015 at 14:48
  • Obviously the character set in both byte[] is different. Try big-endian or little-endian (depends on your OS) Commented Jul 16, 2015 at 14:50
  • Try US-ASCII encoding (but make sure you use it for both encoding and decoding). It's generally a bad idea to treat raw data as Strings, but if you've no choice... Commented Jul 16, 2015 at 14:51

2 Answers

3

Arrays.toString(byte[]) doesn't just convert the byte[] into a String, it converts it to a human-readable format. When you then call getBytes() on that String, it is converting the characters that represent the original byte information into a byte[], along with the formatting characters, such as the brackets and commas.
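To make that concrete, here is a minimal sketch of what the question's code is effectively doing, using three made-up bytes instead of the BMP file. The class and method names are only illustrative, and US-ASCII is chosen explicitly here so the result does not depend on the platform default.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ToStringDemo {
    // Formats the array as human-readable text, then encodes that text.
    static byte[] viaArraysToString(byte[] data) {
        String text = Arrays.toString(data);              // e.g. "[66, 77, -10]"
        return text.getBytes(StandardCharsets.US_ASCII);  // one byte per character of the text
    }

    public static void main(String[] args) {
        byte[] original = {66, 77, -10};
        byte[] result = viaArraysToString(original);

        // The "round-tripped" array holds the characters '[', '6', '6', ',', ' ' and so on.
        System.out.println(new String(result, StandardCharsets.US_ASCII)); // [66, 77, -10]
        System.out.println(original.length);                               // 3
        System.out.println(result.length);                                 // 13
        System.out.println(Arrays.equals(original, result));               // false
    }
}
```

The second array is longer than the first because it stores the digits, commas, spaces and brackets of the printed form, so no choice of charset can make the two compare equal.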

If you want to create a String from a byte[] use the String constructor which takes a byte[] to explicitly create a String object containing your data:

    ...
    //byte[] to string
    String byte_string = new String(byte_array);

    //String to byte[]
    byte[] string_byte = byte_string.getBytes();

    System.out.println(Arrays.equals(byte_array, string_byte));

As pointed out by others, not all binary data is cleanly represented in all character sets, so you might be able to get the conversion to work by explicitly specifying the encoding.

For instance, the above sample code still outputs false when I try it on an executable program file (.exe), but compares as true if I specify ISO_8859_1, which maps every byte value to exactly one character:

    //byte[] to string
    String byte_string = new String(byte_array, StandardCharsets.ISO_8859_1);

    //String to byte[]
    byte[] string_byte = byte_string.getBytes(StandardCharsets.ISO_8859_1);

    System.out.println(Arrays.equals(byte_array, string_byte));
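A quick way to check a charset for this purpose is to round-trip all 256 possible byte values through it. The sketch below does that for ISO-8859-1 and UTF-8; the class and helper names are hypothetical and not part of the original code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetRoundTrip {
    // True if decoding and re-encoding with the same charset returns the same bytes.
    static boolean survives(byte[] data, Charset charset) {
        String text = new String(data, charset);
        return Arrays.equals(data, text.getBytes(charset));
    }

    // Builds an array containing every byte value from 0x00 to 0xFF once.
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        System.out.println(survives(all, StandardCharsets.ISO_8859_1)); // true
        System.out.println(survives(all, StandardCharsets.UTF_8));      // false
    }
}
```

ISO-8859-1 passes because each byte maps to the char with the same numeric value. UTF-8 fails because the bytes 0x80 to 0xFF are not valid on their own and get replaced during decoding.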

The safest way to convert your data to a String and back is Base64 encoding, as suggested by this answer (Base64 below is the Apache Commons Codec class):

    //file to byte[] 
    byte[] byte_array = Files.readAllBytes(path);
    byte[] encoded = Base64.encodeBase64(byte_array);

    //byte[] to string
    String byte_string = new String(encoded, StandardCharsets.US_ASCII);

    //String to byte[]
    byte[] string_byte = byte_string.getBytes(StandardCharsets.US_ASCII);
    byte[] decoded = Base64.decodeBase64(string_byte);

    System.out.println(Arrays.equals(byte_array, decoded));
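If you are on Java 8 or later, the same idea can be sketched with the JDK's own java.util.Base64, with no extra library. This is an alternative to the snippet above, not what it uses. The sample bytes stand in for the file contents, and the map key "image" is a made-up name for illustration.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

class Base64RoundTrip {
    // Encodes arbitrary bytes as ASCII-only text.
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Recovers the original bytes from the Base64 text.
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Stand-in for Files.readAllBytes(path); includes negative values and zero.
        byte[] original = {66, 77, -10, -44, 1, 0, -1};

        // The encoded text can sit next to other String values.
        Map<String, String> record = new HashMap<>();
        record.put("image", toText(original)); // "image" is a hypothetical key

        byte[] restored = fromText(record.get("image"));
        System.out.println(Arrays.equals(original, restored)); // true
    }
}
```

The encoded String is about a third larger than the raw data, which is the price for a representation that survives any charset and any text-only container.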

2 Comments

I tried it, but I'm still getting 'false'. I also tried specifying which encoding to use (following Joop Eggen's answer), but it didn't work either.
@HusaynHakeem I've updated my answer to (hopefully) fix that.
1

char and String hold Unicode text by design (unlike in some other languages). That means they

  • always convert to and from binary data (byte[]) using an encoding (the charset of the bytes);
  • cannot hold arbitrary binary data, because bytes that are not well-formed in that encoding are lost;
  • may mix several scripts (Latin, Cyrillic, Arabic, symbols).

So:

byte[] b = s.getBytes(StandardCharsets.UTF_8);
s = new String(b, StandardCharsets.UTF_8);

Without the charset parameter, the platform-dependent default encoding is used. The conversion may substitute placeholders for non-representable characters, and binary data that is malformed in that encoding will not survive the round trip.
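The placeholder substitution is easy to observe with a single byte. The sketch below decodes the byte -10 (0xF6, the third byte in the question's output) as UTF-8 and encodes it again; the class and method names are only illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReplacementDemo {
    // Decodes the bytes as UTF-8 text and encodes that text back to bytes.
    static byte[] utf8RoundTrip(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = {(byte) 0xF6}; // -10, not a valid UTF-8 sequence

        // The decoder substitutes U+FFFD (the replacement character) for the bad byte.
        String decoded = new String(original, StandardCharsets.UTF_8);
        System.out.println(decoded.equals("\uFFFD")); // true

        // Encoding U+FFFD yields three different bytes, so the original value is gone.
        System.out.println(Arrays.toString(utf8RoundTrip(original))); // [-17, -65, -67]

        // This is the charset that new String(bytes) and getBytes() fall back to.
        System.out.println(Charset.defaultCharset());
    }
}
```

Once a byte has been replaced by U+FFFD there is no way to tell which byte it originally was, which is why the comparison keeps returning false for image data.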

Text (String/char) is entirely separate from binary data (byte). Also note that a char is a 2-byte UTF-16 code unit, whereas a byte is 1 byte.

1 Comment

Thanks for explaining, I now understand my mistake. I changed my code and tried doing what you said, but I'm still getting "false" (so the two strings are still different)
