Timeline for Java takes 2 bytes to represent character?
Current License: CC BY-SA 4.0
8 events
| when | what | by | license | comment |
|---|---|---|---|---|
| May 17, 2023 at 19:42 | comment added | skomisa | | It's worth noting that since your answer was posted, JEP 254: Compact Strings was implemented in Java 9. With that change, a string whose characters are all Latin-1 (i.e. code points <= 255) is stored internally with one byte per character rather than two. So the "horrible bodge" has become even more horrible, though the enhancement itself is entirely reasonable and understandable. |
| Jul 11, 2018 at 16:43 | audit (Low quality posts) | | | |
| Jul 11, 2018 at 16:44 | | | | |
| Jul 9, 2018 at 3:09 | vote accept | user3198603 | | |
| Jul 6, 2018 at 21:02 | comment added | gnasher729 | | The Swift standard library is flexible: it uses either ASCII or UTF-16. But that's only the representation. A `Character` is an extended grapheme cluster, and many characters consist of more than one Unicode code point. |
| Jul 6, 2018 at 16:12 | comment added | amon | | As an addendum to this answer: Java provides methods like `Character.isSurrogate(char ch)` to check whether one Java character (= UTF-16 code unit) is half of an encoded Unicode code point, and related methods to get the code point as an `int`. Because of these surrogate pairs, string operations like `length()`, `charAt()` and `substring()` are really tricky (they can give “wrong” or corrupted results). The simple interview question “how do you reverse a string?” is also a complete horror when considering surrogate pairs (or other Unicode specialties like combining characters or bidi marks). |
| Jul 6, 2018 at 15:35 | comment added | Deduplicator | | @BgrWorker .NET does that because Windows NT also went from UCS-2 to UTF-16, and .NET is a Microsoft design. |
| Jul 6, 2018 at 15:23 | comment added | BgrWorker | | I'll also add that Java is not the only language to do that: .NET languages also use UTF-16 for their internal character representation. |
| Jul 6, 2018 at 14:40 | history answered | Simon B | CC BY-SA 4.0 | |
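The surrogate-pair pitfalls described in amon's comment can be seen in a short, self-contained Java sketch. It is illustrative only: the class `SurrogateDemo` and its helpers (`naiveReverse`, `codePointReverse`, `hasUnpairedSurrogate`, `graphemeCount`) are hypothetical names, and `BreakIterator.getCharacterInstance()` is used only as an approximation of user-perceived characters. The sketch shows that `length()` counts UTF-16 code units rather than code points, that reversing `char` by `char` splits a surrogate pair, that `StringBuilder.reverse()` keeps pairs intact, and that a combining accent is a separate code point even though it renders as one character. The Java 9 Compact Strings change only affects internal storage, so these `char`-based results are the same before and after it.

```java
import java.text.BreakIterator;

public class SurrogateDemo {

    // Naive reversal: reverses UTF-16 code units one by one, so a surrogate
    // pair (high, low) comes out as (low, high), which is not valid UTF-16.
    static String naiveReverse(String s) {
        char[] out = new char[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[s.length() - 1 - i] = s.charAt(i);
        }
        return new String(out);
    }

    // StringBuilder.reverse() treats each surrogate pair as a single unit,
    // so supplementary characters survive the reversal intact.
    static String codePointReverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Returns true if the string contains a lone surrogate: a high surrogate
    // not followed by a low one, or a low surrogate with no preceding high one.
    static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a valid pair
                    continue;
                }
                return true;
            }
            if (Character.isLowSurrogate(c)) {
                return true;
            }
        }
        return false;
    }

    // Counts character boundaries as reported by BreakIterator, which keeps
    // a base letter together with the combining marks that follow it.
    static int graphemeCount(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "a", U+1F600 (outside the BMP, so it needs a surrogate pair), "b"
        String s = "a\uD83D\uDE00b";
        System.out.println(s.length());                                // 4 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length()));           // 3 code points
        System.out.println(Character.isSurrogate(s.charAt(1)));        // true
        System.out.println(Character.charCount(s.codePointAt(1)));     // 2

        System.out.println(hasUnpairedSurrogate(naiveReverse(s)));     // true
        System.out.println(hasUnpairedSurrogate(codePointReverse(s))); // false

        // "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points, one visible character
        String combined = "e\u0301";
        System.out.println(combined.codePointCount(0, combined.length())); // 2
        System.out.println(graphemeCount(combined));                        // 1
    }
}
```

`StringBuilder.reverse()` handles surrogate pairs but not combining sequences: reversing `"e\u0301"` moves the accent in front of its base letter, which is why the interview question stays hard once combining characters are involved.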