May 17, 2023 at 19:42 comment added skomisa It's worth noting that since your answer was posted, JEP 254: Compact Strings was implemented in Java 9. The effect of that change was to internally store Latin-1 characters (i.e. those with code points <= 255) in a single byte rather than two. So the "horrible bodge" has become even more horrible, though the enhancement itself is entirely reasonable and understandable.
Jul 6, 2018 at 21:02 comment added gnasher729 Swift's standard library is flexible: it uses either ASCII or UTF-16 internally. But that's only the representation. A Swift Character is an extended grapheme cluster, and many characters consist of more than one Unicode code point.
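The gap between code units, code points, and user-perceived characters that this comment describes can be sketched in Java (chosen because the answer concerns Java strings); `java.text.BreakIterator` provides a rough grapheme-cluster count:

```java
import java.text.BreakIterator;

public class GraphemeDemo {
    public static void main(String[] args) {
        // "é" built from 'e' + U+0301 COMBINING ACUTE ACCENT:
        // one user-perceived character, but two code points.
        String s = "e\u0301";

        System.out.println(s.length());                      // 2 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length())); // 2 code points

        // Count user-perceived characters (grapheme clusters).
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int graphemes = 0;
        while (it.next() != BreakIterator.DONE) graphemes++;
        System.out.println(graphemes);                       // 1 visible character
    }
}
```

Note that older `BreakIterator` implementations do not handle every modern cluster (e.g. some emoji ZWJ sequences), but the combining-mark case above is handled.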
Jul 6, 2018 at 16:12 comment added amon As an addendum to this answer: Java provides methods like Character.isSurrogate(char ch) to check whether one Java char (= UTF-16 code unit) is half of an encoded Unicode code point, and related methods to get the code point as an int. Because of these surrogate pairs, string operations like length(), charAt() and substring() are really tricky (i.e. can give “wrong” or corrupted results). The simple interview question “how do you reverse a string?” is also complete horror when considering surrogate pairs (or other Unicode specialties like combining characters or bidi-marks).
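A short sketch of the surrogate-pair pitfall this comment describes (method names `naiveReverse` and `codePointReverse` are illustrative, not standard API): reversing a string char by char splits a surrogate pair, while iterating by code points preserves it.

```java
public class SurrogateReverse {
    // Naive reverse: iterates char by char, splitting surrogate pairs.
    static String naiveReverse(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) out.append(s.charAt(i));
        return out.toString();
    }

    // Surrogate-aware reverse: appends whole code points in reverse order.
    static String codePointReverse(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = s.length(); i > 0; ) {
            int cp = s.codePointBefore(i);
            out.appendCodePoint(cp);
            i -= Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // "a😀b" — U+1F600 needs a surrogate pair

        System.out.println(s.length());                         // 4 code units
        System.out.println(s.codePointCount(0, s.length()));    // 3 code points
        System.out.println(Character.isSurrogate(s.charAt(1))); // true

        String broken = naiveReverse(s);  // "b" + two lone surrogates in the wrong order + "a"
        String ok = codePointReverse(s);  // "b😀a"
        System.out.println(ok.equals("b\uD83D\uDE00a")); // true
        System.out.println(broken.equals(ok));           // false — the pair was split
    }
}
```

(Note that StringBuilder.reverse() itself is documented to keep surrogate pairs intact; the hand-rolled char loop above is the classic buggy version.)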
Jul 6, 2018 at 15:35 comment added Deduplicator @BgrWorker .Net does that because Windows NT went from UCS-2 to UTF-16 too, and it's a MS design.
Jul 6, 2018 at 15:23 comment added BgrWorker I'll also add that it's not the only language to do that. .NET languages also use UTF-16 for their internal character representation.
Jul 6, 2018 at 14:40 history answered Simon B CC BY-SA 4.0