I have a list of URLs; each page corresponds to a specific category:

http://www.site.com/category-1/page.html
http://www.site.com/category-2/page.html
http://www.site.com/category-3/page.html

Each page contains, let's say, 4 items. I want to extract every item on each page and assign it its corresponding category number, i.e.:

category-1_ITEM - CAT-1  
category-1_ITEM - CAT-1  
category-1_ITEM - CAT-1  
category-1_ITEM - CAT-1 

category-2_ITEM - CAT-2 
category-2_ITEM - CAT-2  
category-2_ITEM - CAT-2  
category-2_ITEM - CAT-2  

category-3_ITEM - CAT-3  
category-3_ITEM - CAT-3  
category-3_ITEM - CAT-3  
category-3_ITEM - CAT-3   

I figured this would be pretty straightforward, but now I'm having to deal with apparent looping issues. Here's the code; I've removed all irrelevant lines for simplicity's sake:

$urls = array(
    "http://www.site.com/category-1/page.html",
    "http://www.site.com/category-2/page.html",
    "http://www.site.com/category-3/page.html"
);

foreach ($urls as $url) {

    //Load Page, find items

    foreach ($items as $item) {

        preg_match('#http\:\/\/www\.site\.com\/(.*?)\/page\.html#is', $url, $result);

        switch ($result[1]) {
            case "category-1": $cat = 'CAT-1'; break;
            case "category-2": $cat = 'CAT-2'; break;
            case "category-3": $cat = 'CAT-3'; break;
        }

        echo $item . ' - ' . $cat . '<br>';
    }
}

This is what it outputs:

category-1_ITEM - CAT-1  
category-1_ITEM - CAT-1  
category-1_ITEM - CAT-1  
category-1_ITEM - CAT-1 

category-1_ITEM - CAT-2  
category-1_ITEM - CAT-2  
category-1_ITEM - CAT-2 
category-1_ITEM - CAT-2 

category-2_ITEM - CAT-2  
category-2_ITEM - CAT-2  
category-2_ITEM - CAT-2 
category-2_ITEM - CAT-2 

category-1_ITEM - CAT-3  
category-1_ITEM - CAT-3  
category-1_ITEM - CAT-3
category-1_ITEM - CAT-3 

category-2_ITEM - CAT-3  
category-2_ITEM - CAT-3  
category-2_ITEM - CAT-3
category-2_ITEM - CAT-3 

category-3_ITEM - CAT-3  
category-3_ITEM - CAT-3  
category-3_ITEM - CAT-3
category-3_ITEM - CAT-3 

Any ideas on what I'm doing wrong? I have a feeling it's a simple mistake, I'm just not seeing it.

  • These guys are probably right, you need to add the code for "//Load Page, find items" for any real answer. Commented Aug 26, 2010 at 21:40

2 Answers

The problem is in this code:

//Load Page, find items

If I may be so bold as to make a guess, you're probably doing something like:

$items[] = "some content";
$items[] = "some content";

Not with constants, of course, but the key is the $items[] on the left of the equals sign. That syntax always appends to the end of the array, so after the first pass you have the items from the first page. On the second pass the second page's items are added to those, and the array holds both. In other words: you are forgetting to reset $items. Add $items = array(); at the beginning of //Load Page, find items and you should be fine.
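
A minimal sketch of that fix, under assumptions: the original code for "//Load Page, find items" isn't shown, so findItems() below is a hypothetical stand-in that just returns four fake item names per URL, and $counts exists only to make the effect visible.

```php
<?php
// Hypothetical stand-in for the elided "Load Page, find items" step:
// returns four fake item names based on the category in the URL.
function findItems($url)
{
    preg_match('#/(category-\d+)/#', $url, $m);
    return array_fill(0, 4, $m[1] . '_ITEM');
}

$urls = array(
    "http://www.site.com/category-1/page.html",
    "http://www.site.com/category-2/page.html",
    "http://www.site.com/category-3/page.html"
);

$counts = array();

foreach ($urls as $url) {
    // Reset the array so items from earlier pages are not carried over.
    $items = array();

    // Appending with $items[] only ever grows the array.
    foreach (findItems($url) as $found) {
        $items[] = $found;
    }

    // Record how many items this pass of the outer loop is holding.
    $counts[] = count($items);
}
```

With the reset in place, each pass holds only the current page's four items. Removing the $items = array(); line makes the array grow across passes instead, which matches the duplicated output in the question.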

If you are coming from another language, the problem is perhaps better explained in more technical terms: in PHP, code blocks don't create a new scope. Basically, only functions do.
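
To illustrate the scoping point, here is a small sketch. The variable and function names are made up for the example and are not from the question.

```php
<?php
// A variable first assigned inside a loop body is still set after the loop,
// because the braces of a foreach block do not open a new scope.
foreach (array(1, 2) as $n) {
    $insideLoop = $n * 10;
}
$afterLoop = $insideLoop; // still holds the value from the last iteration

// A function body does get its own scope, so $insideLoop is not visible in it.
function seesOuterVariable()
{
    return isset($insideLoop);
}
$visibleInFunction = seesOuterVariable();
```

This is why an array that is only ever appended to inside the outer foreach keeps its contents from one pass to the next unless it is reset explicitly.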

Edit: I believe your problem is that you're not clearing the $items array on each iteration of the outer loop.

I've tested the following code:

$urls = array(
    "http://www.site.com/category-1/page.html",
    "http://www.site.com/category-2/page.html",
    "http://www.site.com/category-3/page.html"
);

$id = 0;

foreach ($urls as $url) {

    // Build a fresh $items array for every URL (stand-in for the page scrape).
    $items = array(
        "i" . $id++,
        "i" . $id++,
        "i" . $id++
    );

    foreach ($items as $item) {

        // Capture the category segment of the URL (dots escaped so they match literally).
        preg_match('#http://www\.site\.com/(.*?)/page\.html#is', $url, $result);

        switch ($result[1]) {
            case "category-1": $cat = 'CAT-1'; break;
            case "category-2": $cat = 'CAT-2'; break;
            case "category-3": $cat = 'CAT-3'; break;
        }

        echo $item . ' - ' . $cat . '<br>';
    }
    echo "<br/>";
}

and I get the following output:

i0 - CAT-1
i1 - CAT-1
i2 - CAT-1

i3 - CAT-2
i4 - CAT-2
i5 - CAT-2

i6 - CAT-3
i7 - CAT-3
i8 - CAT-3
