1

At a recent interview I was asked to write an algorithm which:

Given a string text, and string subtext, finds the starting character positions of each subtext found within the text.

"hey hi how are you hi" = 5, 20

I was forbidden to use any System.String functions (Substring, IndexOf etc).

I wrote this in C#:

IEnumerable<int> CalculateSubtextPositions(string text, string subtext)
{
    var charIndexToMatch = 0;
    for (int i = 0; i < text.Length; i++)
    {
        if (text[i] != subtext[charIndexToMatch])
            charIndexToMatch = 0;

        if (text[i] == subtext[charIndexToMatch])
        {
            charIndexToMatch++;
            if (charIndexToMatch == subtext.Length)
            {
                yield return i - charIndexToMatch + 2;
                charIndexToMatch = 0;
            }
        }
    }
}

Which was described as 'disappointing'. It may well be I've missed something, but I can't see what's hugely wrong with the above. It works, it's efficient, and reasonably easy to read compared to many approaches (at least in my opinion, please correct me if I'm wrong).

Would people be able to suggest where I might have messed up? It may have been because I was asked to make the code 'reusable', but I'm not sure to what extent that can be applied when you're only asked to write an algorithm.

Please do criticise! :)

7
  • 4
    Did they offer any feedback beyond describing it as "disappointing"? I haven't checked whether it would definitely work, but it looks like a very neat solution to the challenge to me. I'd put it down to the interviewer being a "expert beginner" or similar poor quality developer who probably didn't understand yield and didn't want to admit it to you. You are likely better off not working there, therefore... Commented Jan 13, 2016 at 19:27
  • 3
    Are you sure it works? Suppose the input text is "aaabc" and the text to find is "aabc". Also consider "aaaa" and "aa". Did you ask the interviewer about overlapping matches? Similar problems arise constantly in real-world programming. Serious companies are looking for the people who will find and resolve those problems before the code is delivered. Commented Jan 13, 2016 at 19:32
  • Also "charIndexToMatch" is unnecessarily long but worse, doesn't say which of the two strings it is indexing. "subtextIndex" would be shorter and better. All this is off-topic here. Maybe repost to Code Review. Commented Jan 13, 2016 at 19:38
  • @DavidArno: The interviewer should have asked the candidate about the failing test cases, rather than just saying it was "disappointing". However, this code would not meet the bar at most of the places I have worked. The use of "yield" is not the problem. Commented Jan 13, 2016 at 19:40
  • 3
    @kevincline, and those serious companies would not expect someone to work out an algorithm like this on paper .. they'd set up pair-programming sessions and allow the candidate to develop the solution through exploring ideas and TDD. This code would certainly pass such a paper-based test with me as the use of excellent names like charIndexToMatch would count far more than working out edge cases in one's head. Each to their own, I guess... Commented Jan 13, 2016 at 20:20

1 Answer 1

8

Your code isn't correct enough.

When you detect a match failure, resetting your match counter to 0 is too aggressive; as kevin pointed out, this can miss valid matches if the text is constructed misleadingly. You have to reset to the last possible global match start position, and that isn't easy to do. In an interview, I'd probably go for the obvious double-loop solution instead.

3
  • Could you please provide an example which would fail? Commented Jan 13, 2016 at 19:37
  • Actually I see your point - well spotted. Commented Jan 13, 2016 at 19:45
  • @user666254: I provided two examples in my comments. Commented Jan 13, 2016 at 20:04

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.