What is wrong with this substring-matching algorithm? [closed]

Question

Closed. This question is off-topic. It is not currently accepting answers.

Questions asking for assistance in writing or debugging existing code are off-topic on Programmers. These questions can be asked on Stack Overflow if they include the desired behavior, a specific problem or error, and the shortest code necessary to reproduce it in the question itself. See How To Create a Minimal, Complete, and Verifiable Example.

Closed 9 years ago.

Improve this question

At a recent interview I was asked to write an algorithm which:

Given a string text, and string subtext, finds the starting character positions of each subtext found within the text.

"hey hi how are you hi" = 5, 20

I was forbidden to use any System.String functions (Substring, IndexOf etc).

I wrote this in C#:

IEnumerable<int> CalculateSubtextPositions(string text, string subtext)
{
    var charIndexToMatch = 0;
    for (int i = 0; i < text.Length; i++)
    {
        if (text[i] != subtext[charIndexToMatch])
            charIndexToMatch = 0;

        if (text[i] == subtext[charIndexToMatch])
        {
            charIndexToMatch++;
            if (charIndexToMatch == subtext.Length)
            {
                yield return i - charIndexToMatch + 2;
                charIndexToMatch = 0;
            }
        }
    }
}

Which was described as 'disappointing'. It may well be I've missed something, but I can't see what's hugely wrong with the above. It works, it's efficient, and reasonably easy to read compared to many approaches (at least in my opinion, please correct me if I'm wrong).

Would people be able to suggest where I might have messed up? It may have been because I was asked to make the code 'reusable', but I'm not sure to what extent that can be applied when you're only asked to write an algorithm.

Please do criticise! :)

Did they offer any feedback beyond describing it as "disappointing"? I haven't checked whether it would definitely work, but it looks like a very neat solution to the challenge to me. I'd put it down to the interviewer being a "expert beginner" or similar poor quality developer who probably didn't understand yield and didn't want to admit it to you. You are likely better off not working there, therefore... — David Arno
– David Arno, Commented Jan 13, 2016 at 19:27
Are you sure it works? Suppose the input text is "aaabc" and the text to find is "aabc". Also consider "aaaa" and "aa". Did you ask the interviewer about overlapping matches? Similar problems arise constantly in real-world programming. Serious companies are looking for the people who will find and resolve those problems before the code is delivered. — kevin cline
– kevin cline, Commented Jan 13, 2016 at 19:32
Also "charIndexToMatch" is unnecessarily long but worse, doesn't say which of the two strings it is indexing. "subtextIndex" would be shorter and better. All this is off-topic here. Maybe repost to Code Review. — kevin cline
– kevin cline, Commented Jan 13, 2016 at 19:38
@DavidArno: The interviewer should have asked the candidate about the failing test cases, rather than just saying it was "disappointing". However, this code would not meet the bar at most of the places I have worked. The use of "yield" is not the problem. — kevin cline
– kevin cline, Commented Jan 13, 2016 at 19:40
@kevincline, and those serious companies would not expect someone to work out an algorithm like this on paper .. they'd set up pair-programming sessions and allow the candidate to develop the solution through exploring ideas and TDD. This code would certainly pass such a paper-based test with me as the use of excellent names like charIndexToMatch would count far more than working out edge cases in one's head. Each to their own, I guess... — David Arno
– David Arno, Commented Jan 13, 2016 at 20:20

Kilian Foth · Accepted Answer · 2016-01-13 19:31:42Z

8

Your code isn't correct enough.

When you detect a match failure, resetting your match counter to 0 is too aggressive; as kevin pointed out, this can miss valid matches if the text is constructed misleadingly. You have to reset to the last possible global match start position, and that isn't easy to do. In an interview, I'd probably go for the obvious double-loop solution instead.

answered Jan 13, 2016 at 19:31

Kilian Foth

111k45 gold badges301 silver badges323 bronze badges

Could you please provide an example which would fail?

FBryant87
– FBryant87

2016-01-13 19:37:55 +00:00
Commented Jan 13, 2016 at 19:37
Actually I see your point - well spotted.

FBryant87
– FBryant87

2016-01-13 19:45:30 +00:00
Commented Jan 13, 2016 at 19:45
@user666254: I provided two examples in my comments.

kevin cline
– kevin cline

2016-01-13 20:04:16 +00:00
Commented Jan 13, 2016 at 20:04

Add a comment |

Stack Exchange Network

What is wrong with this substring-matching algorithm? [closed]

1 Answer 1

Hot Network Questions

What is wrong with this substring-matching algorithm? [closed]

1 Answer 1

Related

Hot Network Questions