May 17, 2023 at 19:42 comment added skomisa It's worth noting that since your answer was posted, JEP 254: Compact Strings was implemented in Java 9. The effect of that change was to internally store Latin-1 characters (i.e. those with code points <= 255) in a single byte rather than two. So the "horrible bodge" has become even more horrible, though the enhancement itself is entirely reasonable and understandable.
Jul 6, 2018 at 21:02 comment added gnasher729 Swift's standard library is flexible: it uses either ASCII or UTF-16 internally. But that's only the representation. A Swift Character is an extended grapheme cluster, and many characters consist of more than one Unicode code point.
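The gap between code units, code points, and user-perceived characters that this comment describes can be sketched in Java (chosen because the answer concerns Java strings); `java.text.BreakIterator` provides a rough grapheme-cluster count:

```java
import java.text.BreakIterator;

public class GraphemeDemo {
    public static void main(String[] args) {
        // "é" built from 'e' + U+0301 COMBINING ACUTE ACCENT:
        // one user-perceived character, but two code points.
        String s = "e\u0301";

        System.out.println(s.length());                      // 2 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length())); // 2 code points

        // Count user-perceived characters (grapheme clusters).
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int graphemes = 0;
        while (it.next() != BreakIterator.DONE) graphemes++;
        System.out.println(graphemes);                       // 1 visible character
    }
}
```

Note that older `BreakIterator` implementations do not handle every modern cluster (e.g. some emoji ZWJ sequences), but the combining-mark case above is handled.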
Jul 6, 2018 at 16:12 comment added amon As an addendum to this answer: Java provides methods like Character.isSurrogate(char ch) to check whether one Java char (= UTF-16 code unit) is half of an encoded Unicode code point, and related methods to get the code point as an int. Because of these surrogate pairs, string operations like length(), charAt() and substring() are really tricky (i.e. can give “wrong” or corrupted results). The simple interview question “how do you reverse a string?” is also complete horror when considering surrogate pairs (or other Unicode specialties like combining characters or bidi-marks).
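A short sketch of the surrogate-pair pitfall this comment describes (method names `naiveReverse` and `codePointReverse` are illustrative, not standard API): reversing a string char by char splits a surrogate pair, while iterating by code points preserves it.

```java
public class SurrogateReverse {
    // Naive reverse: iterates char by char, splitting surrogate pairs.
    static String naiveReverse(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) out.append(s.charAt(i));
        return out.toString();
    }

    // Surrogate-aware reverse: appends whole code points in reverse order.
    static String codePointReverse(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = s.length(); i > 0; ) {
            int cp = s.codePointBefore(i);
            out.appendCodePoint(cp);
            i -= Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // "a😀b" — U+1F600 needs a surrogate pair

        System.out.println(s.length());                         // 4 code units
        System.out.println(s.codePointCount(0, s.length()));    // 3 code points
        System.out.println(Character.isSurrogate(s.charAt(1))); // true

        String broken = naiveReverse(s);  // "b" + two lone surrogates in the wrong order + "a"
        String ok = codePointReverse(s);  // "b😀a"
        System.out.println(ok.equals("b\uD83D\uDE00a")); // true
        System.out.println(broken.equals(ok));           // false — the pair was split
    }
}
```

(Note that StringBuilder.reverse() itself is documented to keep surrogate pairs intact; the hand-rolled char loop above is the classic buggy version.)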
Jul 6, 2018 at 15:35 comment added Deduplicator @BgrWorker .Net does that because Windows NT went from UCS-2 to UTF-16 too, and it's a MS design.
Jul 6, 2018 at 15:23 comment added BgrWorker I'll also add that it's not the only language to do that. .NET languages also use UTF-16 for their internal character representation.
Jul 6, 2018 at 14:40 history answered Simon B CC BY-SA 4.0