I have a text in PHP stored in the variable $row. I'd like to find the positions of a certain word, and that part is easy. What's not so easy is making my code recognize whether the match it has found is exactly the word I'm looking for or just part of a larger word. Is there a way to do it?

Example of what I'd like to obtain

CODE:

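$row = "some ugly text of some kind i'd like to find in someway";
$token = "some";
$pos = -1;
// counts every occurrence of $token, including those inside longer words
$counter = substr_count($row, $token);
for ($h = 0; $h < $counter; $h++) {
    // resume the search just past the previous hit
    $pos = strpos($row, $token, $pos + 1);
    echo $pos . ' ';
}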

OUTPUT:

what I obtain:

0 17 47

what I'd like to obtain

0 17

Any hint?

3
  • You mean 0, 18, 48 :) Have you tried regexp with word boundaries? Commented Mar 12, 2014 at 12:06
  • try giving $token = " some "; (i.e. space before and after your token) if you want the position of that word only... Hope I got the question correctly... if not then please try to elaborate Commented Mar 12, 2014 at 12:08
  • @sumitb.mdi this could work almost perfectly.. but what if the token is at the start or at the end of the string? Commented Mar 12, 2014 at 12:13

3 Answers

3

Use preg_match_all() with word boundaries (\b):

$search = preg_quote($token, '/');
preg_match_all("/\b$search\b/", $row, $m, PREG_OFFSET_CAPTURE);

Here, preg_quote() is used to escape the user input so that it can be used safely in our regular expression. Some characters have special meaning in regular expression language; without proper escaping, those characters will be interpreted as regex syntax rather than as literal text, and your regex might not work as intended.
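As a minimal sketch of why the escaping matters, the snippet below compares an escaped and an unescaped pattern. The token a.c and the sample text are made up for illustration and are not from the question.

```php
<?php
// Hypothetical token containing a regex metacharacter (the dot).
$token = 'a.c';
$text  = 'abc a.c';

// Unescaped: "." matches any character, so "abc" is matched as well.
$unescaped = preg_match_all('/' . $token . '/', $text, $m1);

// Escaped: preg_quote() turns "." into "\.", so only the literal "a.c" matches.
$escaped = preg_match_all('/' . preg_quote($token, '/') . '/', $text, $m2);

echo $unescaped, ' ', $escaped, PHP_EOL; // prints "2 1"
```

The second argument to preg_quote() is the pattern delimiter, so a "/" inside the token gets escaped too.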

In the preg_match_all() statement, we are supplying the following regex:

/\b$search\b/

Explanation:

  • / - starting delimiter
  • \b - word boundary. A word boundary, in most regex dialects, is a position between a word character (\w) and a non-word character (\W), or between a word character and the start or end of the string.
  • $search - escaped search term
  • \b - word boundary
  • / - ending delimiter
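To check how \b behaves when the token sits at the very start or end of the string (the case raised in the comments on the question), here is a small sketch. The sample strings are invented for illustration.

```php
<?php
// Hypothetical samples: token at the start, at the end, and embedded in longer words.
$samples = array('some text', 'give me some', 'someway', 'awesome');

$hits = array();
foreach ($samples as $s) {
    // preg_match() returns 1 if the pattern matches, 0 if it does not
    $hits[$s] = preg_match('/\bsome\b/', $s);
}

print_r($hits);
// 'some text' => 1, 'give me some' => 1, 'someway' => 0, 'awesome' => 0
```

Keep in mind that \w also covers digits and the underscore, so some_thing would not count as a whole-word match either.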

In simple English, it means: find all occurrences of the given word (here, some) where it stands alone as a whole word.

Note that we're also using the PREG_OFFSET_CAPTURE flag here. If this flag is passed, for every occurring match the appendant string offset will also be returned. See the documentation for more information.

To obtain the results you want, you can simply loop through the $m array and extract the offsets:

$result = implode(' ', array_map(function($arr) {
    return $arr[1];
}, $m[0]));

echo $result;

Output:

0 18

Demo

4 Comments

The answer seeker wishes 0 17 as output.. Can you please suggest how can he get that from your code?
@Tzar: The output they're hoping to get is probably wrong. some ugly text of s — I see 18 characters before the second s. Maybe it was a counting mistake in the original question?
You're missing the point my friend.. Am talking about the formatting & display.. Needs position numbers separated by spaces..
@Tzar: I thought that was easy. Anyway I've updated the answer to include an explanation. Thanks for the heads-up.
2

What you're looking for is a regex with a word-boundary pattern (\b), combined with the flag that returns the offset of each match (PREG_OFFSET_CAPTURE).

PREG_OFFSET_CAPTURE

If this flag is passed, for every occurring match the appendant string offset will also be returned. Note that this changes the value of matches into an array where every element is an array consisting of the matched string at offset 0 and its string offset into subject at offset 1.

$row= "some ugly text of some kind i'd like to find in someway";
$pattern= "/\bsome\b/i";
preg_match_all($pattern, $row, $matches, PREG_OFFSET_CAPTURE);

And we get something like this:

Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => some
                    [1] => 0
                )
            [1] => Array
                (
                    [0] => some
                    [1] => 18
                )
        )
)

Then loop through the matches and extract the offset at which the needle was found in the haystack.

// store the positions of the match
$offsets = array();
foreach($matches[0] as $match) {
    $offsets[] = $match[1];
}

// display the offsets
echo implode(' ', $offsets);
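If the token comes from a variable rather than being hard-coded, the steps can be wrapped in a small helper. This is only a sketch: whole_word_offsets() is a hypothetical name, not a PHP built-in, and it assumes a case-sensitive search.

```php
<?php
// Hypothetical helper (not part of PHP): returns the offsets of every
// whole-word occurrence of $token in $text.
function whole_word_offsets($text, $token)
{
    // escape the token so regex metacharacters are treated literally
    $pattern = '/\b' . preg_quote($token, '/') . '\b/';
    preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

    // each entry of $matches[0] is array(matched string, offset)
    $offsets = array();
    foreach ($matches[0] as $match) {
        $offsets[] = $match[1];
    }
    return $offsets;
}

$row = "some ugly text of some kind i'd like to find in someway";
echo implode(' ', whole_word_offsets($row, 'some')), PHP_EOL; // prints "0 18"
```

The offsets are byte offsets, so they can differ from character positions in multibyte (e.g. UTF-8) text. Also, \b only helps when the token starts and ends with a word character.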

4 Comments

The answer seeker wishes 0 17 as output.. Can you please suggest how can he get that from your code?
I've added some snippet on how to extract the offset. Thanks for pointing out @Tzar
@Max Yeah! That's much better now!
To be fair, your answer was made 4 minutes after Amal's, so not just a bit faster in terms of SO :)
-1

Use preg_match():

if(preg_match("/some/", $row))
// [..]

The first argument is a regex, which can match virtually anything you want to match. There are, however, dire warnings about using regexes to parse things like HTML.
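For comparison, here is a rough sketch of what this pattern reports on the string from the question. The variable names $found, $first, $total and $all are chosen for illustration.

```php
<?php
$row = "some ugly text of some kind i'd like to find in someway";

// preg_match() stops at the first match and only says whether one exists.
$found = preg_match('/some/', $row, $first, PREG_OFFSET_CAPTURE);
echo $found, ' ', $first[0][1], PHP_EOL; // prints "1 0"

// Without \b, preg_match_all() also counts the "some" inside "someway".
$total = preg_match_all('/some/', $row, $all);
echo $total, PHP_EOL; // prints "3"
```

So on its own, this approach neither lists every position nor excludes matches inside longer words.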

2 Comments

Don't really think this will solve the OP's problem, but I've edited the answer to "fix" the code. And removed my downvote :)
You're right, the selected answer is much better - and besides, I misread the question. But thank you -
