Revisions to Finding first unique character in string

added 210 characters in body

Source Link

edited Sep 5, 2022 at 11:16

113.1k
15
154
396

Consider a simpler algorithm:

Pass 1: Build a map of counts of characters. You can use a simple int array to store this mapping.
Pass 2: For each index, if the count of the current character is 1, return the index.
If pass 2 terminates without returning, then there are no unique characters, return -1.

This is simple, straightforward logic. The constant number of passes doesn't change the time complexity.

You could do it in a single pass too, but I would prefer to avoid the added complexity unless there is a compelling reason.

Example implementation of the above algorithm:

std::array<int, 26> counts {};

for (auto c : s) {
    counts[c - 'a']++;
}

for (std::size_t index = 0; index < s.size(); ++index) {
    auto c = static_cast<unsigned char>(s[index]);
    if (counts[c - 'a'] == 1) return index;
}

return -1;

Note that the hack of using c - 'a' to map lowercase English letters to the range [0 .. 25] works with the ASCII character set, and may not be suitable for others. If that's a concern, you can use the approach of the other answer, or use a proper std::set.

If there are more than INT_MAX repetitions of a character (as pointed out by @Toby), the count will overflow, and may even reach 1 again, producing incorrect result. If you need to add support for such input, then it will be better to consider a similar algorithm (alluded to by @greybeard):

Pass 1: Build a set of characters seen, and another set of characters seen twice.
Pass 2: For each index, if the character is not in the set of characters seen twice, return the index.
If pass 2 terminates without returning, then there are no unique characters, return -1.

This uses an additional set (constant space), but has the advantage that it supports inputs of arbitrary length and repetitions.

Example implementation:

std::array<bool, 26> seen {}, repeated {};

for (auto c : s) {
    repeated[c - 'a'] = seen[c - 'a'];
    seen[c - 'a'] = true;
}

for (std::size_t index = 0; index < s.size(); ++index) {
    auto c = static_cast<unsigned char>(s[index]);
    if (!repeated[c - 'a']) return index;
}

return -1;

Consider a simpler algorithm:

Pass 1: Build a map of counts of characters. You can use a simple int array to store this mapping.
Pass 2: For each index, if the count of the current character is 1, return the index.
If pass 2 terminates without returning, then there are no unique characters, return -1.

This is simple, straightforward logic. The constant number of passes doesn't change the time complexity.

You could do it in a single pass too, but I would prefer to avoid the added complexity unless there is a compelling reason.

Example implementation of the above algorithm:

std::array<int, 26> counts {};

for (auto c : s) {
    counts[c - 'a']++;
}

for (std::size_t index = 0; index < s.size(); ++index) {
    auto c = static_cast<unsigned char>(s[index]);
    if (counts[c - 'a'] == 1) return index;
}

return -1;

If there are more than INT_MAX repetitions of a character (as pointed out by @Toby), the count will overflow, and may even reach 1 again, producing incorrect result. If you need to add support for such input, then it will be better to consider a similar algorithm (alluded to by @greybeard):

Pass 1: Build a set of characters seen, and another set of characters seen twice.
Pass 2: For each index, if the character is not in the set of characters seen twice, return the index.
If pass 2 terminates without returning, then there are no unique characters, return -1.

This uses an additional set (constant space), but has the advantage that it supports inputs of arbitrary length and repetitions.

Example implementation:

std::array<bool, 26> seen {}, repeated {};

for (auto c : s) {
    repeated[c - 'a'] = seen[c - 'a'];
    seen[c - 'a'] = true;
}

for (std::size_t index = 0; index < s.size(); ++index) {
    auto c = static_cast<unsigned char>(s[index]);
    if (!repeated[c - 'a']) return index;
}

return -1;

Consider a simpler algorithm:

Pass 1: Build a map of counts of characters.
Pass 2: For each index, if the count of the current character is 1, return the index.
If pass 2 terminates without returning, then there are no unique characters, return -1.

This is simple, straightforward logic. The constant number of passes doesn't change the time complexity.

You could do it in a single pass too, but I would prefer to avoid the added complexity unless there is a compelling reason.

Example implementation of the above algorithm:

std::array<int, 26> counts {};

for (auto c : s) {
    counts[c - 'a']++;
}

for (std::size_t index = 0; index < s.size(); ++index) {
    auto c = static_cast<unsigned char>(s[index]);
    if (counts[c - 'a'] == 1) return index;
}

return -1;

Note that the hack of using c - 'a' to map lowercase English letters to the range [0 .. 25] works with the ASCII character set, and may not be suitable for others. If that's a concern, you can use the approach of the other answer, or use a proper std::set.

If there are more than INT_MAX repetitions of a character (as pointed out by @Toby), the count will overflow, and may even reach 1 again, producing incorrect result. If you need to add support for such input, then it will be better to consider a similar algorithm (alluded to by @greybeard):

Pass 1: Build a set of characters seen, and another set of characters seen twice.
Pass 2: For each index, if the character is not in the set of characters seen twice, return the index.
If pass 2 terminates without returning, then there are no unique characters, return -1.

This uses an additional set (constant space), but has the advantage that it supports inputs of arbitrary length and repetitions.

Example implementation:

std::array<bool, 26> seen {}, repeated {};

for (auto c : s) {
    repeated[c - 'a'] = seen[c - 'a'];
    seen[c - 'a'] = true;
}

for (std::size_t index = 0; index < s.size(); ++index) {
    auto c = static_cast<unsigned char>(s[index]);
    if (!repeated[c - 'a']) return index;
}

return -1;

added 1252 characters in body

Source Link

edited Sep 5, 2022 at 9:52

janos

113.1k
15
154
396

Consider a simpler algorithm:

Pass 1: Build a map of counts of characters. You can use a simple int array to store this mapping.
Pass 2: For each index, if the count of the current character is 1, return the index.
If pass 2 terminates without returning, then there are no unique characters, return -1.

This is simple, straightforward logic. The constant number of passes doesn't change the time complexity.

You could do it in a single pass too, but I would prefer to avoid the added complexity unless there is a compelling reason.

Example implementation of the above algorithm:

std::array<int, 26> counts {};

for (auto c : s) {
    counts[c - 'a']++;
}

for (std::size_t index = 0; index < s.size(); ++index) {
    auto c = static_cast<unsigned char>(s[index]);
    if (counts[c - 'a'] == 1) return index;
}

return -1;

If there are more than INT_MAX repetitions of a character (as pointed out by @Toby), the count will overflow, and may even reach 1 again, producing incorrect result. If you need to add support for such input, then it will be better to consider a similar algorithm (alluded to by @greybeard):

Pass 1: Build a set of characters seen, and another set of characters seen twice.

Pass 2: For each index, if the character is not in the set of characters seen twice, return the index.

If pass 2 terminates without returning, then there are no unique characters, return -1.

This uses an additional set (constant space), but has the advantage that it supports inputs of arbitrary length and repetitions.

Example implementation:

std::array<bool, 26> seen {}, repeated {};

for (auto c : s) {
    repeated[c - 'a'] = seen[c - 'a'];
    seen[c - 'a'] = true;
}

for (std::size_t index = 0; index < s.size(); ++index) {
    auto c = static_cast<unsigned char>(s[index]);
    if (!repeated[c - 'a']) return index;
}

return -1;

Consider a simpler algorithm:

Pass 1: Build a map of counts of characters. You can use a simple int array to store this mapping.
Pass 2: For each index, if the count of the current character is 1, return the index.
If pass 2 terminates without returning, then there are no unique characters, return -1.

This is simple, straightforward logic. The constant number of passes doesn't change the time complexity.

You could do it in a single pass too, but I would prefer to avoid the added complexity unless there is a compelling reason.

Example implementation of the above algorithm:

std::array<int, 26> counts {};

for (auto c : s) {
    counts[c - 'a']++;
}

for (std::size_t index = 0; index < s.size(); ++index) {
    auto c = static_cast<unsigned char>(s[index]);
    if (counts[c - 'a'] == 1) return index;
}

return -1;

Consider a simpler algorithm:

Pass 1: Build a map of counts of characters. You can use a simple int array to store this mapping.
Pass 2: For each index, if the count of the current character is 1, return the index.
If pass 2 terminates without returning, then there are no unique characters, return -1.

This is simple, straightforward logic. The constant number of passes doesn't change the time complexity.

You could do it in a single pass too, but I would prefer to avoid the added complexity unless there is a compelling reason.

Example implementation of the above algorithm:

std::array<int, 26> counts {};

for (auto c : s) {
    counts[c - 'a']++;
}

for (std::size_t index = 0; index < s.size(); ++index) {
    auto c = static_cast<unsigned char>(s[index]);
    if (counts[c - 'a'] == 1) return index;
}

return -1;

If there are more than INT_MAX repetitions of a character (as pointed out by @Toby), the count will overflow, and may even reach 1 again, producing incorrect result. If you need to add support for such input, then it will be better to consider a similar algorithm (alluded to by @greybeard):

Pass 1: Build a set of characters seen, and another set of characters seen twice.

Pass 2: For each index, if the character is not in the set of characters seen twice, return the index.

If pass 2 terminates without returning, then there are no unique characters, return -1.

This uses an additional set (constant space), but has the advantage that it supports inputs of arbitrary length and repetitions.

Example implementation:

std::array<bool, 26> seen {}, repeated {};

for (auto c : s) {
    repeated[c - 'a'] = seen[c - 'a'];
    seen[c - 'a'] = true;
}

for (std::size_t index = 0; index < s.size(); ++index) {
    auto c = static_cast<unsigned char>(s[index]);
    if (!repeated[c - 'a']) return index;
}

return -1;

deleted 80 characters in body

Source Link

edited Sep 4, 2022 at 19:04

janos

113.1k
15
154
396

Consider a simpler algorithm:

Pass 1: Build a map of counts of characters. You can use a simple int array to store this mapping.
Pass 2: For each index, if the count of the current character is 1, return the index.
If pass 2 terminates without returning, then there are no unique characters, return -1.

This is simple, straightforward logic. The constant number of passes doesn't change the time complexity.

You could do it in a single pass too, for example using a queue to keep track of the ordering of what was seen first, but I would prefer to avoid the added complexity unless there is a compelling reason.

Example implementation of the above algorithm:

std::array<int, 26> counts {};

for (auto c : s) {
    counts[c - 'a']++;
}

for (std::size_t index = 0; index < s.size(); ++index) {
    auto c = static_cast<unsigned char>(s[index]);
    if (counts[c - 'a'] == 1) return index;
}

return -1;

Consider a simpler algorithm:

Pass 1: Build a map of counts of characters. You can use a simple int array to store this mapping.
Pass 2: For each index, if the count of the current character is 1, return the index.
If pass 2 terminates without returning, then there are no unique characters, return -1.

This is simple, straightforward logic. The constant number of passes doesn't change the time complexity.

You could do it in a single pass too, for example using a queue to keep track of the ordering of what was seen first, but I would prefer to avoid the added complexity unless there is a compelling reason.

Example implementation of the above algorithm:

std::array<int, 26> counts {};

for (auto c : s) {
    counts[c - 'a']++;
}

for (std::size_t index = 0; index < s.size(); ++index) {
    auto c = static_cast<unsigned char>(s[index]);
    if (counts[c - 'a'] == 1) return index;
}

return -1;

Consider a simpler algorithm:

Pass 1: Build a map of counts of characters. You can use a simple int array to store this mapping.
Pass 2: For each index, if the count of the current character is 1, return the index.
If pass 2 terminates without returning, then there are no unique characters, return -1.

This is simple, straightforward logic. The constant number of passes doesn't change the time complexity.

You could do it in a single pass too, but I would prefer to avoid the added complexity unless there is a compelling reason.

Example implementation of the above algorithm:

std::array<int, 26> counts {};

for (auto c : s) {
    counts[c - 'a']++;
}

for (std::size_t index = 0; index < s.size(); ++index) {
    auto c = static_cast<unsigned char>(s[index]);
    if (counts[c - 'a'] == 1) return index;
}

return -1;

added 351 characters in body

Source Link

edited Sep 4, 2022 at 12:28

janos

113.1k
15
154
396

Loading

added 209 characters in body

Source Link

edited Sep 4, 2022 at 8:12

janos

113.1k
15
154
396

Loading

Source Link

answered Sep 4, 2022 at 7:43

janos

113.1k
15
154
396

Loading

Stack Exchange Network

Return to Answer