4 events
when  what  by  license  comment
Dec 20, 2021 at 17:55 history edited Davislor CC BY-SA 4.0
added 77 characters in body
Dec 20, 2021 at 17:53 comment added Davislor @Deduplicator I think an unordered_map with char32_t as the key type and a count as the value type would be the way to go (a hashed canonical representation of a multi-byte string could also work as the key).
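The data structure suggested in the comment above might look roughly like this in C++. It is a minimal sketch, not code from either answer. The function name `count_code_points` is hypothetical. The sketch also assumes the input has already been decoded to UTF-32 (`std::u32string_view`, C++17), so each `char32_t` holds exactly one code point.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <unordered_map>

// Hypothetical helper: counts how often each code point occurs in a
// UTF-32 string, with char32_t as the key and a count as the value.
// Assumes decoding from UTF-8/UTF-16 has already happened elsewhere.
std::unordered_map<char32_t, std::size_t>
count_code_points(std::u32string_view text) {
    std::unordered_map<char32_t, std::size_t> counts;
    for (char32_t cp : text) {
        ++counts[cp];  // operator[] value-initializes a missing count to 0
    }
    return counts;
}
```

To count multi-code-point units instead (for example, a base character plus its combining marks), one option is to key the map on a `std::u32string` holding a canonical normalization form. The standard library provides `std::hash<std::u32string>`, but it does not provide Unicode normalization, so that step would need a separate library.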
Dec 20, 2021 at 17:45 comment added Deduplicator Going full Unicode, or at least catering to MBCS, is interesting; in particular, deciding to throw away combining characters changes things. BTW: My answer doesn't assume minimal-size bytes anywhere, though I do mention that the table could become infeasibly large if a byte is too big.
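The following shows what a fixed lookup table indexed by byte value might look like, to illustrate why the comment above says such a table could grow infeasibly large. This is a hypothetical sketch, not the code from Deduplicator's answer. The names `count_bytes` and `kTableSize` are invented for illustration. The table has `1 << CHAR_BIT` slots. That is 256 on platforms with 8-bit bytes, but the size grows exponentially as `CHAR_BIT` increases.

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>
#include <string_view>

// Hypothetical table size: one counter per possible unsigned char value.
// 256 entries when CHAR_BIT == 8; exponentially more for wider bytes.
constexpr std::size_t kTableSize = std::size_t{1} << CHAR_BIT;

// Counts each byte value in the input using a direct-indexed table.
std::array<std::size_t, kTableSize> count_bytes(std::string_view text) {
    std::array<std::size_t, kTableSize> counts{};  // zero-initialized
    for (char ch : text) {
        // Cast through unsigned char so negative char values index correctly.
        ++counts[static_cast<unsigned char>(ch)];
    }
    return counts;
}
```

With 8-bit bytes this is a fast, compact alternative to a hash map. With a hypothetical 32-bit byte, however, the same table would need 2^32 counters, which is impractical.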
Dec 20, 2021 at 17:24 history answered Davislor CC BY-SA 4.0