
I have a working script, but I'm sure that my method of managing arrays could be better. I've searched for a solution and haven't found one, but I'm sure that I should be using the functionality of associative arrays to do things more efficiently.

I have two arrays, one from a CSV file and one from a DB. I've created the CSV array as numerically indexed and the DB array as associative (although I'm aware that the difference is blurry in PHP).

I'm trying to find a record in the DB array where the value in one field matches a value in the CSV array. Both arrays are multi-dimensional.

Within each record in each array there is a reference number. It appears once in the CSV array and may appear in the DB array. If it does, I need to take action.

I'm currently doing this (simplified):

$CSVarray = array(
    array('reference01', 'blue', 'small'),
    array('reference02', 'red', 'large'),
    array('reference03', 'pink', 'medium'),
);

$Dbarray = array(
    0 => array('ref' => 'reference01', 'name' => 'tom', 'type' => 'mouse'),
    1 => array('ref' => 'reference02', 'name' => 'jerry', 'type' => 'cat'),
    2 => array('ref' => 'reference03', 'name' => 'butch', 'type' => 'dog'),
);



foreach ($CSVarray as $CSVrecord) {
    foreach ($Dbarray as $DBrecord) {
        // Column 0 of each CSV row holds the reference number
        if ($CSVrecord[0] == $DBrecord['ref']) {
            // do something with the various values in $DBrecord
        }
    }
}

This is horribly slow, as the arrays each contain thousands of records.

I don't just want to know whether matching values exist; I want to retrieve data from the matching record, so functions like array_search() don't do what I want, and array_walk() doesn't seem any better than my current approach.
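For what it's worth, array_search() returns the key of the first matching element (or false), so it can be used to reach the matching record. The sketch below assumes PHP 5.5+ for array_column() and that $Dbarray is a plain zero-based list, because array_column() re-indexes its result from 0. Each call is still a linear scan, so running it once per CSV row keeps the overall cost quadratic.

```php
<?php
// Sample data mirroring the question's $Dbarray
$Dbarray = [
    ['ref' => 'reference01', 'name' => 'tom',   'type' => 'mouse'],
    ['ref' => 'reference02', 'name' => 'jerry', 'type' => 'cat'],
    ['ref' => 'reference03', 'name' => 'butch', 'type' => 'dog'],
];

// Pull out the 'ref' field of every record; keys run 0..n-1,
// which lines up with $Dbarray only because it is a zero-based list.
$refs = array_column($Dbarray, 'ref');

// array_search() gives back the key of the match, which locates the record.
$key = array_search('reference02', $refs, true);
$found = ($key === false) ? null : $Dbarray[$key];

echo $found['name'], "\n"; // jerry
```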

What I really need is something like this (gibberish code):

foreach ($CSVarray as $CSVrecord) {
    WHERE $Dbarray['key']['key'] == $CSVrecord[$numerickey] {
        do something with the other values in $Dbarray['key']
    }
}

I'm looking for a way to match the values using the keys (either numeric or associative) rather than walking the arrays. Can anyone offer any help please?

  • Doesn't in_array() just search for values? I need to take action when values are found and I don't see a method for in_array() of referencing the record that it has found. Am I missing something? Commented Jul 26, 2012 at 15:42
  • try the array_walk_recursive function. Commented Jul 26, 2012 at 15:43
  • @Simon instead of foreach ($Dbarray as $DBrecord) { if ($CSVarray[$numerickey] == $DBrecord['key'] { do something with the various values in the $DBrecord } } if (in_array($CSVarray[$numerickey], $DBRecord['key'])) { // do something } Commented Jul 26, 2012 at 15:43
  • Actually, @Anorflame has a really good solution. Commented Jul 26, 2012 at 15:45
  • I'm going to try Anorflame's solution, but thanks for the input. Commented Jul 26, 2012 at 15:55

3 Answers


Use a hash map: take one array and map each record's key to that record. Then iterate over the second array, checking for each record whether the hash map has anything set for its key. Regarding your example:

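$Hash = array();
foreach ($Dbarray as $DBrecord) {
    // Index each DB record by its reference number
    $Hash[$DBrecord['ref']] = $DBrecord;
}

foreach ($CSVarray as $CSVrecord) {
    // Column 0 of each CSV row holds the reference number
    if (isset($Hash[$CSVrecord[0]])) {
        $DBrecord = $Hash[$CSVrecord[0]];
        // do stuff with $DBrecord and $CSVrecord
    }
}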

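This solution runs in O(n): each array is traversed once, and each isset() lookup on a PHP array (which is a hash table) takes constant time on average, whereas the nested loops run in O(n^2).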
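The same hash-map idea can be sketched with built-in functions instead of hand-written loops. This assumes PHP 5.5+ for array_column(), whose null column key returns whole rows re-keyed by the given index key; array_intersect_key() then keeps only the DB records whose reference also appears in the CSV. The 'reference04' row is a hypothetical addition to show a non-match.

```php
<?php
$CSVarray = [
    ['reference01', 'blue',  'small'],
    ['reference02', 'red',   'large'],
    ['reference04', 'green', 'tiny'], // hypothetical row with no DB match
];
$Dbarray = [
    ['ref' => 'reference01', 'name' => 'tom',   'type' => 'mouse'],
    ['ref' => 'reference02', 'name' => 'jerry', 'type' => 'cat'],
    ['ref' => 'reference03', 'name' => 'butch', 'type' => 'dog'],
];

// Re-key whole rows by reference: 'ref' for DB records, column 0 for CSV rows.
$dbByRef  = array_column($Dbarray, null, 'ref');
$csvByRef = array_column($CSVarray, null, 0);

// Keep only the DB records whose reference key also exists in the CSV index.
$matches = array_intersect_key($dbByRef, $csvByRef);

foreach ($matches as $ref => $DBrecord) {
    $CSVrecord = $csvByRef[$ref];
    echo $ref, ': ', $DBrecord['name'], ' is ', $CSVrecord[1], "\n";
}
```

Like the hand-built hash, array_column() keeps only the last record when two rows share a reference.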


2 Comments

Thank you. I've updated my script to use hash maps. The sub-routines in question went from taking 11-12 seconds to run to taking 0.5-0.6 seconds to run. This is obviously a massive improvement. Thanks again.
I don't know if you noticed, but the code I wrote is a simplification. In the code above, if two or more DB records share the same key, only the last one stored will be retrieved later. To overcome this, you can make $Hash[$key] hold an array of records.
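A minimal sketch of that fix, assuming the question's field names: each hash entry becomes a list of every DB record carrying that reference, and the lookup loop walks that list. The duplicate 'reference01' record is hypothetical data added to show the effect.

```php
<?php
$Dbarray = [
    ['ref' => 'reference01', 'name' => 'tom',   'type' => 'mouse'],
    ['ref' => 'reference01', 'name' => 'jerry', 'type' => 'mouse'], // hypothetical duplicate ref
    ['ref' => 'reference02', 'name' => 'spike', 'type' => 'dog'],
];
$CSVarray = [
    ['reference01', 'blue', 'small'],
    ['reference03', 'pink', 'medium'],
];

// Map each reference to a list of all DB records that carry it.
$Hash = [];
foreach ($Dbarray as $DBrecord) {
    $Hash[$DBrecord['ref']][] = $DBrecord;
}

$matched = [];
foreach ($CSVarray as $CSVrecord) {
    if (isset($Hash[$CSVrecord[0]])) {
        // Visit every DB record sharing this reference, not just the last one.
        foreach ($Hash[$CSVrecord[0]] as $DBrecord) {
            $matched[] = $DBrecord['name'];
        }
    }
}

echo implode(', ', $matched), "\n"; // tom, jerry
```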

You can use foreach loops like this too:

foreach ($record as $key => $value) {
  switch($key)
  {
    case 'asd':
      // do something
      break;
    default:
      // Default
      break;
  }
}

A switch may be what you are looking for also :)

6 Comments

Wouldn't that still require a foreach within a foreach? I could use a switch to take action once the required values were found, but I don't see how that would reduce the amount of array traversal.
Yes, it would still require a foreach inside a foreach; I was just demonstrating how you can use it. Unless you are dealing with millions of records, there is nothing wrong with traversing two lists...
Thanks. The problem is that each array contains thousands of records, so I'm in effect doing tens of millions of comparisons and my poor laptop takes a while to chew over it :)
Understood. I am not sure whether @Anorflame's approach will work, but going down from O(n^2) to O(n) increases the speed by a huge factor. My question is: why are you not using a database? Or are you? Databases are meant to handle these kinds of things and pick out only the particular records you want. For instance, in MySQL, SELECT * FROM table WHERE id = 1 AND extra = 'asd' would do most of the work for you, and then you just have to loop through the results once and do whatever you want with them.
You're right; it's just that I've always used languages like COBOL and Perl, and that's how my mind approaches things now :)

Load the CSV into the database and, if possible, use the database (not the DB array) for retrieval. Index the reference ID field.
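A rough sketch of that approach using PDO. Assumptions: the pdo_sqlite extension is available, an in-memory SQLite database stands in for the real one, and the table and column names (db_records, csv_import, colour, size) are hypothetical. In practice the CSV rows would come from fgetcsv() rather than literals.

```php
<?php
// In-memory SQLite stands in for the real database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical tables: db_records mirrors $Dbarray, csv_import holds the CSV rows.
$pdo->exec('CREATE TABLE db_records (ref TEXT, name TEXT, type TEXT)');
$pdo->exec('CREATE TABLE csv_import (ref TEXT, colour TEXT, size TEXT)');
// Index the reference column on both sides so the join can use it.
$pdo->exec('CREATE INDEX idx_db_ref ON db_records (ref)');
$pdo->exec('CREATE INDEX idx_csv_ref ON csv_import (ref)');

$insDb = $pdo->prepare('INSERT INTO db_records VALUES (?, ?, ?)');
foreach ([['reference01', 'tom', 'mouse'], ['reference02', 'jerry', 'cat'], ['reference03', 'butch', 'dog']] as $row) {
    $insDb->execute($row);
}

$insCsv = $pdo->prepare('INSERT INTO csv_import VALUES (?, ?, ?)');
foreach ([['reference01', 'blue', 'small'], ['reference02', 'red', 'large'], ['reference04', 'green', 'tiny']] as $row) {
    $insCsv->execute($row);
}

// Let the database do the matching with an indexed join.
$stmt = $pdo->query(
    'SELECT c.ref, d.name, c.colour
       FROM csv_import c
       JOIN db_records d ON d.ref = c.ref
      ORDER BY c.ref'
);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

foreach ($rows as $r) {
    echo $r['ref'], ': ', $r['name'], ' is ', $r['colour'], "\n";
}
```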

2 Comments

I hadn't thought of that approach, thanks. I'll bear it in mind for future problems.
When working with constantly changing high volume csvs, I find it much easier to just load into the db and let db handle everything.
