Al Amin

DSA Pattern: A Clean Way to Parse Words from a String

🚀 A Clean Pattern for Word-by-Word String Parsing in PHP

Recently, while solving problems like:

  • ✅ Counting segments in a sentence
  • ✅ Reversing words
  • ✅ Manual string tokenization

I discovered a powerful template for breaking down a string into individual words without using built-in functions like explode() or str_word_count().

Here's the simplified version of the logic:

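// $s holds the input string; this sample value is only for illustration
$s = 'split  this sentence';
$sentence = [];
$word = '';
$length = strlen($s); // Compute the length once, not on every iteration

for ($i = 0; $i < $length; $i++) {
    if ($s[$i] !== ' ') {
        // Keep building the word until a space is found
        $word .= $s[$i];
    } else {
        // Space hit: if a word is in progress, it is finished, so push it to the array
        if ($word !== '') {
            $sentence[] = $word;
            $word = ''; // Reset word builder
        }
    }
}

// After the loop ends, push the last word (if any)
if ($word !== '') {
    $sentence[] = $word;
}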

🧠 What's happening here?

  1. We loop through each character.
  2. If the character is not a space → it's part of a word → append it to the word being built.
  3. If the character is a space and a word is in progress → that word is finished → store it → reset. If no word is in progress (repeated or leading spaces), nothing is stored.
  4. At the end of the loop, if a word is still in progress, we save it.
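
The steps above can be wrapped in a function and reused for two of the problems mentioned at the start. This is only a sketch: the name splitWords() and the sample sentence are my own assumptions for illustration, not part of the original snippet.

```php
<?php
// Hypothetical helper: the four steps above wrapped in a function.
function splitWords(string $s): array
{
    $words = [];
    $word = '';
    $length = strlen($s);

    for ($i = 0; $i < $length; $i++) {
        if ($s[$i] !== ' ') {
            $word .= $s[$i];      // Steps 1-2: build the current word
        } elseif ($word !== '') {
            $words[] = $word;     // Step 3: a space ends a non-empty word
            $word = '';
        }
    }

    if ($word !== '') {
        $words[] = $word;         // Step 4: save the word still in progress
    }

    return $words;
}

$words = splitWords('  Hello,   my name is  John ');
// $words is ['Hello,', 'my', 'name', 'is', 'John']

// Counting segments: the number of words collected.
$segmentCount = count($words); // 5

// Reversing words: walk the array backwards and rebuild the sentence.
$reversed = '';
for ($i = count($words) - 1; $i >= 0; $i--) {
    $reversed .= $words[$i];
    if ($i > 0) {
        $reversed .= ' ';
    }
}
// $reversed is 'John is name my Hello,'
```

Note that the comma stays attached to "Hello," because only the space character counts as a separator here.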

💥 Why is this helpful?

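  • Works even when multiple spaces separate words, or when the string has leading or trailing spaces: the empty-word check skips them, so no preg_replace('/\s+/', ' ', $s) normalization is needed. Only the space character is treated as a separator, though, so tabs and newlines end up inside words.
  • Doesn't rely on tokenizing built-ins (only strlen()), which gives you full control over what counts as a word.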
  • Can be adapted to parse custom delimiters or handle punctuation-sensitive input.
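
As one possible adaptation to custom delimiters, here is a sketch where every character in a delimiter string acts as a separator. The function name splitOn() and its $delimiters parameter are hypothetical. It uses strpos() for the membership check purely for brevity; an inner loop over $delimiters would keep it free of built-ins.

```php
<?php
// Hypothetical variant: every character in $delimiters acts as a separator.
function splitOn(string $s, string $delimiters = " \t\n"): array
{
    $tokens = [];
    $token = '';
    $length = strlen($s);

    for ($i = 0; $i < $length; $i++) {
        // strpos() returns false when the character is not in the delimiter set
        if (strpos($delimiters, $s[$i]) === false) {
            $token .= $s[$i];
        } elseif ($token !== '') {
            $tokens[] = $token;   // Delimiter hit: store the finished token
            $token = '';
        }
    }

    if ($token !== '') {
        $tokens[] = $token;       // Save the last token, if any
    }

    return $tokens;
}

$fields = splitOn('a,b;;c , d', ',; ');
// $fields is ['a', 'b', 'c', 'd']
```

Runs of mixed delimiters (like ";;" or " , ") collapse naturally, for the same reason repeated spaces do in the original loop.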

✨ Bonus Insight:

The final word in a string is usually not followed by a space, so it never hits the "else" block. That's why the if ($word !== '') check after the loop is crucial. Without it, the last word would be lost whenever the string doesn't end with a space!
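
To see the edge case in action, here is a small demo of the same loop with the final check made optional. The function collectWords() and its $flushLast flag are hypothetical and exist only for this comparison.

```php
<?php
// Hypothetical demo: the same loop, with the final "push the last word" step made optional.
function collectWords(string $s, bool $flushLast): array
{
    $words = [];
    $word = '';
    $length = strlen($s);

    for ($i = 0; $i < $length; $i++) {
        if ($s[$i] !== ' ') {
            $word .= $s[$i];
        } elseif ($word !== '') {
            $words[] = $word;
            $word = '';
        }
    }

    // The post-loop check under discussion
    if ($flushLast && $word !== '') {
        $words[] = $word;
    }

    return $words;
}

$without = collectWords('one two three', false);
// $without is ['one', 'two'], so "three" is lost

$with = collectWords('one two three', true);
// $with is ['one', 'two', 'three']

$trailing = collectWords('one two three ', false);
// $trailing is ['one', 'two', 'three'] because the trailing space reaches the else branch
```

The third call shows why the bug is easy to miss: an input that happens to end with a space hides it.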


📌 Template takeaway:
If you're building tools that deal with sentence parsing, custom formatting, or you're preparing for string-related DSA problems, this small but powerful pattern will keep showing up!

Let me know what you think, or how you'd adapt this for other tasks!

Top comments (6)

Nevo David

Pretty cool seeing someone ditch built-ins and actually walk through it - I always get a kick out of handling stuff character by character.

Al Amin

Thanks! I've been trying to solve problems more manually lately to really understand the underlying logic.

Built-ins are great, but walking through things character by character forces me to think deeper and improve my problem-solving muscle.

Glad you appreciated it, means a lot! 🙌

Nathan Tarbert

Pretty cool, I always end up forgetting that last-word edge case. This actually helps me when I want more control. Nice!

Al Amin

Yes, this is a common mistake to make. Only the things we revise regularly stay in memory for long.

Dotallio

Love this approach, super clean and easy to adapt for tricky cases! How would you tweak this to handle punctuation or special characters inside words?

Al Amin

Thank you so much! 😊
Really appreciate your kind words.

Great question! Punctuation and special characters can definitely complicate parsing. In this post, I kept it simple by splitting only on spaces to stay focused on the DSA concept. But for trickier inputs (like "don't stop-believing!"), here are a few ways to handle it:

Strategy Options:

1. Character check with custom conditions:
We can allow characters like ' or - if they're considered part of a word:

if (ctype_alpha($char) || in_array($char, ["'", "-"])) {
    $word .= $char;
}

2. Regex-based splitting:
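For complex rules, something like preg_split('/[^a-zA-Z\'-]+/', $str, -1, PREG_SPLIT_NO_EMPTY) can split on anything that's not a valid word character. The PREG_SPLIT_NO_EMPTY flag drops the empty strings that leading or trailing punctuation would otherwise produce.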

3. Filter after building words:
Build all chunks first, then clean/filter them based on your needs. This is great for modularity.
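
To make the first two strategies concrete, here is a rough sketch that runs both on the sample input. The function name splitWordsKeepingPunctuation() is hypothetical, and treating only ASCII letters, ' and - as word characters is an assumption carried over from the snippets above.

```php
<?php
// Hypothetical helper for strategy 1: letters, apostrophes and hyphens build a word;
// any other character ends the word in progress.
function splitWordsKeepingPunctuation(string $s): array
{
    $words = [];
    $word = '';
    $length = strlen($s);

    for ($i = 0; $i < $length; $i++) {
        $char = $s[$i];
        if (ctype_alpha($char) || $char === "'" || $char === '-') {
            $word .= $char;
        } elseif ($word !== '') {
            $words[] = $word;
            $word = '';
        }
    }

    if ($word !== '') {
        $words[] = $word;         // Save the last word, if any
    }

    return $words;
}

$manual = splitWordsKeepingPunctuation("don't stop-believing!");
// $manual is ["don't", "stop-believing"]

// Strategy 2 on the same input; PREG_SPLIT_NO_EMPTY drops the empty
// string that the trailing "!" would otherwise leave at the end.
$regex = preg_split('/[^a-zA-Z\'-]+/', "don't stop-believing!", -1, PREG_SPLIT_NO_EMPTY);
// $regex is ["don't", "stop-believing"]
```

Both approaches agree on this input. The manual loop is easier to extend with per-character rules, while the regex is shorter when the rule fits in a character class.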
