Skip to main content
Use real superscripts (so that it's copyable without loss)
Source Link
Toby Speight
  • 88.3k
  • 14
  • 104
  • 327

I realize that this is old, but I just saw it today and don't see anyone else making this point explicitly.

I doubt O(N^2) is the best we can do here.

It absolutely is not. As you outlined, you can sort the data to find the answer. It's possible to implement a sort in loglinear (\$\mathcal{O}(n \log{n})\$) time and constant space (Heap Sort; Shell Sort). You can either modify the sort to return if it finds a duplicate or just scan the sorted results in linear time. Note that this solution won't work well in Java because a String is immutable. But if you could switch the input from a String to some manipulable data type (e.g. a character array), you could sort that in constant space. And languages like C can sort characters in strings that way (because C stores strings as character arrays and offers direct access to the underlying array).

If you are allowed an auxiliary data structure of the same size as the alphabet, you could find an answer in linear time. For example, if the string can contain only lowercase letters, you could do it within a 32-bit integer (or a 26-bit integer, but that is uncommon size). However, that won't work if the string can contain any UTF-8 character. I mean, you could still build the data structure and it could be constant size (relative to the input length). But the size would be \$2^{21} = 2,097,152\$2²¹ = 2,097,152 bits or bytes or words or whatever (UTF-8 has a direct translation to UTF-32 and only requires 21 bits of that for all current characters). So if you use one bit per character, full UTF-8 still requires 262,144 bytes even if you map the characters to simple integer indexes.

This seems like a rather cheesy use of the term "constant" space, but it is arguably accurate. I would prefer the description linear in the alphabet size, but it is certainly constant if you consider the alphabet size to be constant. And in the specific example of a 26-letter Latin alphabet, it could be represented by a 32-bit integer.

I realize that this is old, but I just saw it today and don't see anyone else making this point explicitly.

I doubt O(N^2) is the best we can do here.

It absolutely is not. As you outlined, you can sort the data to find the answer. It's possible to implement a sort in loglinear (\$\mathcal{O}(n \log{n})\$) time and constant space (Heap Sort; Shell Sort). You can either modify the sort to return if it finds a duplicate or just scan the sorted results in linear time. Note that this solution won't work well in Java because a String is immutable. But if you could switch the input from a String to some manipulable data type (e.g. a character array), you could sort that in constant space. And languages like C can sort characters in strings that way (because C stores strings as character arrays and offers direct access to the underlying array).

If you are allowed an auxiliary data structure of the same size as the alphabet, you could find an answer in linear time. For example, if the string can contain only lowercase letters, you could do it within a 32-bit integer (or a 26-bit integer, but that is uncommon size). However, that won't work if the string can contain any UTF-8 character. I mean, you could still build the data structure and it could be constant size (relative to the input length). But the size would be \$2^{21} = 2,097,152\$ bits or bytes or words or whatever (UTF-8 has a direct translation to UTF-32 and only requires 21 bits of that for all current characters). So if you use one bit per character, full UTF-8 still requires 262,144 bytes even if you map the characters to simple integer indexes.

This seems like a rather cheesy use of the term "constant" space, but it is arguably accurate. I would prefer the description linear in the alphabet size, but it is certainly constant if you consider the alphabet size to be constant. And in the specific example of a 26-letter Latin alphabet, it could be represented by a 32-bit integer.

I realize that this is old, but I just saw it today and don't see anyone else making this point explicitly.

I doubt O(N^2) is the best we can do here.

It absolutely is not. As you outlined, you can sort the data to find the answer. It's possible to implement a sort in loglinear (\$\mathcal{O}(n \log{n})\$) time and constant space (Heap Sort; Shell Sort). You can either modify the sort to return if it finds a duplicate or just scan the sorted results in linear time. Note that this solution won't work well in Java because a String is immutable. But if you could switch the input from a String to some manipulable data type (e.g. a character array), you could sort that in constant space. And languages like C can sort characters in strings that way (because C stores strings as character arrays and offers direct access to the underlying array).

If you are allowed an auxiliary data structure of the same size as the alphabet, you could find an answer in linear time. For example, if the string can contain only lowercase letters, you could do it within a 32-bit integer (or a 26-bit integer, but that is uncommon size). However, that won't work if the string can contain any UTF-8 character. I mean, you could still build the data structure and it could be constant size (relative to the input length). But the size would be 2²¹ = 2,097,152 bits or bytes or words or whatever (UTF-8 has a direct translation to UTF-32 and only requires 21 bits of that for all current characters). So if you use one bit per character, full UTF-8 still requires 262,144 bytes even if you map the characters to simple integer indexes.

This seems like a rather cheesy use of the term "constant" space, but it is arguably accurate. I would prefer the description linear in the alphabet size, but it is certainly constant if you consider the alphabet size to be constant. And in the specific example of a 26-letter Latin alphabet, it could be represented by a 32-bit integer.

Source Link
mdfst13
  • 22.4k
  • 6
  • 34
  • 70

I realize that this is old, but I just saw it today and don't see anyone else making this point explicitly.

I doubt O(N^2) is the best we can do here.

It absolutely is not. As you outlined, you can sort the data to find the answer. It's possible to implement a sort in loglinear (\$\mathcal{O}(n \log{n})\$) time and constant space (Heap Sort; Shell Sort). You can either modify the sort to return if it finds a duplicate or just scan the sorted results in linear time. Note that this solution won't work well in Java because a String is immutable. But if you could switch the input from a String to some manipulable data type (e.g. a character array), you could sort that in constant space. And languages like C can sort characters in strings that way (because C stores strings as character arrays and offers direct access to the underlying array).

If you are allowed an auxiliary data structure of the same size as the alphabet, you could find an answer in linear time. For example, if the string can contain only lowercase letters, you could do it within a 32-bit integer (or a 26-bit integer, but that is uncommon size). However, that won't work if the string can contain any UTF-8 character. I mean, you could still build the data structure and it could be constant size (relative to the input length). But the size would be \$2^{21} = 2,097,152\$ bits or bytes or words or whatever (UTF-8 has a direct translation to UTF-32 and only requires 21 bits of that for all current characters). So if you use one bit per character, full UTF-8 still requires 262,144 bytes even if you map the characters to simple integer indexes.

This seems like a rather cheesy use of the term "constant" space, but it is arguably accurate. I would prefer the description linear in the alphabet size, but it is certainly constant if you consider the alphabet size to be constant. And in the specific example of a 26-letter Latin alphabet, it could be represented by a 32-bit integer.