I have some code that extracts the string between two other strings (a "sandwich"). It works, but I need to loop through several different "sandwich" strings.

//needle in haystack
$result = 'sandwich: Today is a nice day.
    sandwich: Today is a cloudy day.
    sandwich: Today is a rainy day.
    sandwich type 2: Yesterday I had an awesome time. 
    sandwich type 2: Yesterday I had an great time.';

$beginString = 'Today is a'; // strpos() is case-sensitive, so the case must match the haystack
$endString = 'day';

function extract_unit($haystack, $keyword1, $keyword2) {
    $return = array();
    $a = 0;                                           // search offset, starts at the beginning

    // compare against false explicitly: a match at position 0 is a valid result
    while (($a = strpos($haystack, $keyword1, $a)) !== false) {
        $a += strlen($keyword1);                      // move the offset to just after $keyword1

        $b = strpos($haystack, $keyword2, $a);        // look for $keyword2 after that offset
        if ($b !== false) {
            $return[] = trim(substr($haystack, $a, $b - $a)); // store the text in between
        }
    }
    return $return;
}

$text = $result;
$unit = extract_unit($text, $beginString, $endString);
print_r($unit);

// $unit contains: nice, cloudy, rainy

I need to loop through different types of sentences/sandwiches and capture all the adjectives (nice, cloudy, rainy, awesome, great):

//needle in haystack
$result = 'sandwich: Today is a nice day.
    sandwich: Today is a cloudy day.
    sandwich: Today is a rainy day.
    sandwich type 2: Yesterday I had an awesome time. 
    sandwich type 2: Yesterday I had an great time.';

$beginString1 = 'Today is a';
$endString1 = 'day';
$beginString2 = 'Yesterday I had an';
$endString2 = 'time';

[scraping code with loop...]
print_r($unit);

The goal is to end up with this array:

Array ( [0] => nice [1] => cloudy [2] => rainy [3] => awesome [4] => great ) 

Any ideas? Much appreciated.

1 Answer

You could use a regular expression to extract the strings. If you don't mind using arrays instead of separate strings, here is some sample code that does it:

$starts = array('Today is a', 'Yesterday I had an');
$ends = array('day', 'time');

$haystack = array(
    'Today is a nice day.',
    'Today is a cloudy day.',
    'Today is a rainy day.',
    'Yesterday I had an awesome time.',
    'Yesterday I had an great time.'
);

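function extract_unit($haystack, $starts, $ends){

    $return = array(); // initialise so an empty array is returned when nothing matches

    // any start marker, then the captured text, then any end marker
    $reg = '/.*?(?:' . implode('|', $starts) . ')(.*?)(?:' . implode('|', $ends) . ').*/';

    foreach($haystack as $str){

        // take the captured group directly and trim the surrounding spaces
        if(preg_match($reg, $str, $match)) $return[] = trim($match[1]);

    }

    return $return;

}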

print_r (extract_unit($haystack, $starts, $ends));
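For comparison, the strpos() approach from the question can also be kept and simply run once per start/end pair. This is only a minimal sketch: the names extract_between and extract_all are hypothetical, and it assumes the haystack is the single multi-line string from the question.

```php
<?php
// Hypothetical helper: collect every substring found between $start and $end.
function extract_between($haystack, $start, $end) {
    $found = array();
    $offset = 0;
    while (($a = strpos($haystack, $start, $offset)) !== false) {
        $a += strlen($start);                 // position just after the start marker
        $b = strpos($haystack, $end, $a);     // first end marker after it
        if ($b === false) {
            break;                            // no closing marker left, stop searching
        }
        $found[] = trim(substr($haystack, $a, $b - $a));
        $offset = $b + strlen($end);          // continue after the end marker
    }
    return $found;
}

// Hypothetical wrapper: run the helper once per (start, end) pair and merge the results.
function extract_all($haystack, array $pairs) {
    $all = array();
    foreach ($pairs as $pair) {
        $all = array_merge($all, extract_between($haystack, $pair[0], $pair[1]));
    }
    return $all;
}

$text = 'sandwich: Today is a nice day.
    sandwich: Today is a cloudy day.
    sandwich: Today is a rainy day.
    sandwich type 2: Yesterday I had an awesome time.
    sandwich type 2: Yesterday I had an great time.';

// Each start marker is tied to its own end marker.
$pairs = array(
    array('Today is a', 'day'),
    array('Yesterday I had an', 'time'),
);

$unit = extract_all($text, $pairs);
print_r($unit);
```

Unlike the single combined pattern, each start marker here is only ever paired with its own end marker, so 'Today is a' cannot be closed by 'time'.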

EDIT

Following @ven's comments, I've made some changes to the code; it is now more precise:

//---Array with all sandwiches
$between = array(
    array('hay1=', 'hay=Gold'),
    array('hay2=', 'hay=Silver')
);

$haystack = 'Data set 1: hay2= this is a bunch of hay  hay1= Gold_Needle hay=Gold
             Data Set 2: hay2=Silver_Needle hay=Silver';

function extract_unit($haystack, $between){

    $return = array();

    foreach($between as $item){

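        // start marker, optional whitespace, captured needle, optional whitespace, end marker
        $reg = '/.*?' . $item[0] . '\s*(.*?)\s*' . $item[1] . '.*?/';

        preg_match_all($reg, $haystack, $found);

        // $found[1] holds the captured needles for this pair
        $return = array_merge($return, $found[1]);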

    }

    return $return;

}

print_r (extract_unit($haystack, $between));

The result will be:

Array
(
    [0] => Gold_Needle
    [1] => Silver_Needle
)

A working sample of this code is available on Ideone.
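One caveat: the start and end markers are inserted into the pattern as-is, so characters such as ".", "?" or "/" inside a marker would be read as regex syntax. A hedged sketch of one way to guard against that uses preg_quote(); the function name extract_unit_quoted is hypothetical, and the data is the same as in the example above.

```php
<?php
// Hypothetical variant: escape each marker before building the pattern.
function extract_unit_quoted($haystack, array $between) {
    $return = array();

    foreach ($between as $item) {
        // preg_quote() escapes regex metacharacters; '/' is passed as the delimiter in use
        $start = preg_quote($item[0], '/');
        $end   = preg_quote($item[1], '/');

        $reg = '/' . $start . '\s*(.*?)\s*' . $end . '/';

        preg_match_all($reg, $haystack, $found);

        // $found[1] holds the captured needles for this pair
        $return = array_merge($return, $found[1]);
    }

    return $return;
}

$between = array(
    array('hay1=', 'hay=Gold'),
    array('hay2=', 'hay=Silver'),
);

$haystack = 'Data set 1: hay2= this is a bunch of hay  hay1= Gold_Needle hay=Gold
             Data Set 2: hay2=Silver_Needle hay=Silver';

$needles = extract_unit_quoted($haystack, $between);
print_r($needles);
```

Because "." does not match a newline by default, a needle can only be captured when its start and end markers sit on the same line of the haystack.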


6 Comments

@ideone - thanks so much! ... but I get an error on the line preg_match_all($reg, $haystack, $return); which states "Parse error: syntax error, unexpected 'preg_match_all' (T_STRING) in". I guess it works for you? I tried the edited version, where the haystack is a single string.
There is an online example at the end of the answer; you can fork that code and run your own tests. Which PHP version are you using?
I forgot the ";" at the end of the previous line in the example code; that was my mistake. I've fixed it.
Thanks for the help. The code works, but it breaks down in the scenario I posted here: ideone.com/QATj5a ... Any ideas how to ensure that only the desired pairs of "hay" are used to find the needles?
Have a look at the latest changes.