
I'm using the .reduce method to iterate through an array of objects in order to return the array index of the object that best fits certain conditions. My array has around 30,000 elements right now, and I'm shooting for well over a million. The trouble is that iterating through the array using .reduce takes forever: almost 4 seconds at the moment, so imagine what happens once the array reaches my projected 1 million elements. The array is dense, and I'm not connected to a database or server. Here is my code:

var startMatchMaking = function () {
    var loopCounter = 0;
    var length = personArray.length;
    do {
        var manCounter = 0;
        loopCounter++;
        for (var i = length; i--;) {
            // look for a non-single man who has not been matched yet
            if (!personArray[i].isSingle && personArray[i].sex === "Male" &&
                personArray[i].isAvailable === true) {
                manCounter++;
                // tolerated age gap (normalRandomScaled is defined elsewhere)
                var num = normalRandomScaled(2.1, 12.44);

                // scan the WHOLE array, keeping the index of the last compatible woman
                var result = personArray.reduce(function (p, c, k) {
                    return c.sex !== personArray[i].sex &&
                        !c.isSingle && c.isAvailable === true &&
                        c.age <= (personArray[i].age + num) &&
                        c.age >= (personArray[i].age - num) ? k : p;
                }, 0);

                // reduce falls back to index 0, so re-check that it really matches
                result = !personArray[result].isSingle &&
                    personArray[result].sex !== personArray[i].sex &&
                    personArray[result].age <= (personArray[i].age + num) &&
                    personArray[result].age >= (personArray[i].age - num) ? result : -1;

                if (result >= 0) {
                    householdArray.push(new Household(personArray[i], personArray[result]));
                    personArray[result].isAvailable = false;
                    personArray[i].isAvailable = false;
                }
            }
        }
        document.write("<br>Mancounter is: " + manCounter +
            " loopCounter is: " + loopCounter + " households: " + householdArray.length);
    } while (manCounter > 0 && loopCounter <= 5);
};

startMatchMaking();

CONTEXT: I'm trying to develop a standalone application that runs an agent-based demographics simulation. personArray essentially contains 30,000 human beings. The code above is part of the initial setup of the population: Persons were previously created and pushed to the array. Each Person object has firstName, lastName, sex, age and isSingle properties, all assigned random values, plus the isAvailable flag used above. At this stage in the program, I need to take the Persons who are meant not to be single and match them with a suitable mate of the opposite sex and a compatible age to form households.
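
For reference, a minimal sketch of what such a Person might look like (the actual constructor isn't shown here; property names follow the description above):

function Person(firstName, lastName, sex, age, isSingle) {
    this.firstName = firstName;
    this.lastName = lastName;
    this.sex = sex;           // "Male" or "Female"
    this.age = age;
    this.isSingle = isSingle; // false = meant to be matched into a household
    this.isAvailable = true;  // cleared once the person joins a household
}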

How can I optimize this in order to increase performance significantly? I'm open to small changes or completely different alternatives that would output the same result.

  • If you're invoking that code inside another loop, then that's your problem.
  • Are you using reduce correctly? It seems like k is being used in a weird place, and where does i come from?
  • Whatever you're doing with that much data should probably be done in a database, not with JS. Especially since your code looks much like a query, not a reduction.
  • OK, there you go. You're iterating through personArray, and on each iteration you iterate again via that .reduce() call. That means that the .reduce() callback will be invoked nine hundred million times. Nothing you do to speed up the .reduce() callback will help.
  • Basically what @Pointy is saying is: at the moment you're using brute force; what you want to do is search for a better clustering strategy. Step one would be to split your data up into the categories sex, single, available and only search the right category/ies, i.e. removing the overhead of checking the wrong categories personArray.length times. Step two could be splitting those categories by age ranges. This enables you to search only those ranges that intersect with the age range you're interested in.

3 Answers


As already mentioned in the comments, you call reduce, which itself iterates over all elements, inside a loop that also iterates over all elements. This gives the algorithm quadratic complexity: if you double the number of persons, the runtime is multiplied by 4. Processing millions of people this way is completely infeasible.

It seems to me that the inner iteration over all elements is not necessary. You can replace reduce with an ordinary loop and stop iterating as soon as a match is found. The current solution takes the last match found; is there anything that makes the last one better than the first, or am I missing something? What about picking some indices at random when searching for a match and stopping when one is found? This is a solution that does not require many changes, and I expect it to make a big difference except for very young and very old people (outliers).
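
A minimal sketch of that early-exit idea, assuming the same personArray structure as in the question (findMatch is an illustrative name):

function findMatch(man, num) {
  // return the index of the FIRST acceptable partner instead of scanning everything
  for (var k = 0; k < personArray.length; k++) {
    var c = personArray[k];
    if (c.sex !== man.sex && !c.isSingle && c.isAvailable &&
        c.age <= man.age + num && c.age >= man.age - num) {
      return k; // early abort
    }
  }
  return -1; // no match found
}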

A solution that requires more changes is to map people by their properties, as already suggested in the comments, so that you can do something like matchCandidates = people[oppositeSex][ageWithSomeRandomness]. Have a look at this post for more information about possible implementations of maps and hash tables in JavaScript.
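
A rough sketch of such a lookup structure, using a plain object as a hash map keyed by sex and integer age (all names here are illustrative):

// build the map once
var people = { Male: {}, Female: {} };
personArray.forEach(function (p) {
  if (!p.isSingle && p.isAvailable) {
    var bySex = people[p.sex];
    (bySex[p.age] = bySex[p.age] || []).push(p);
  }
});

// a lookup then only touches candidates of the right sex and age
var oppositeSex = "Female";
var targetAge = 34; // e.g. the man's age plus some random offset
var matchCandidates = people[oppositeSex][targetAge] || [];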

An additional improvement might be achieved by filtering the people at the beginning so that no singles are included, i.e. copying the people that are not single into a new array and only accessing the new array in the algorithm.
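
For example, assuming the structure from the question:

// work only on the people who actually need matching
var toMatch = personArray.filter(function (p) {
  return !p.isSingle && p.isAvailable;
});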

If your code runs in a browser, you can use web workers to keep the browser from freezing during the computation.
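
A minimal sketch of the worker setup (matcher.js and runMatchMaking are placeholder names; note that postMessage structured-clones the data, so Person objects arrive in the worker as plain objects without their prototype methods):

// main.js: hand the data to a worker so the UI thread stays responsive
var worker = new Worker("matcher.js");
worker.postMessage(personArray);
worker.onmessage = function (e) {
  console.log("formed " + e.data.length + " households");
};

// matcher.js
onmessage = function (e) {
  var result = runMatchMaking(e.data); // whatever matching algorithm you settle on
  postMessage(result);
};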


I think you'll need some pre-processing in order to speed this up.

For instance:

  • split population into men and women
  • sort men by age
  • iterate on the sorted array of men and compute a new set of matching women only when a new age is being processed
  • just pick women from the current set while it's up-to-date and not empty

EDIT: We can optionally optimize the number of matches by sorting the set of women by age difference, so that couples with a small age difference are created first.

Below is some example code.

var personArray = [];

// create test population
for(var n = 0; n < 30000; n++) {
  personArray.push({
    isSingle: Math.random() < 0.5,
    age: Math.round(18 + Math.random() * 80),
    sex: Math.random() < 0.5 ? 'M' : 'F',
    isAvailable: true
  });
}

var num = 7, // instead of num = normalRandomScaled(2.1, 12.44)
    sex = [ [], [] ],
    curAge = -1, subset,
    houseHold = [],
    ts = performance.now();

// split population into men & women
personArray.forEach(function(p) {
  sex[p.sex == 'M' ? 0 : 1].push(p);
});

// sort men by age
sex[0].sort(function(a, b) { return a.age - b.age; });

// iterate on men
sex[0].forEach(function(m) {
  if(m.age != curAge) {
    // create set of matching women for this age
    subset = sex[1].filter(function(w) {
      return w.isAvailable && w.isSingle && Math.abs(m.age - w.age) <= num;
    });
    // sort by age difference, so that women with
    // a small age difference are picked first
    subset.sort(function(a, b) {
      return Math.abs(m.age - b.age) - Math.abs(m.age - a.age);
    });
    curAge = m.age;
  }
  if(m.isSingle && subset.length) {
    // pick woman from set
    var w = subset.pop();
    m.isAvailable = false; // (not really necessary)
    w.isAvailable = false;
    houseHold.push([ m, w ]);
  }
});

console.log(
  'Found ' + houseHold.length + ' matches ' +
  'in ' + Math.round(performance.now() - ts) + 'ms'
);
console.log(
  'Random example:',
  houseHold[(Math.random() * houseHold.length) | 0]
);

2 Comments

Interesting, I think I can work with something like that. The only thing is that I really wanted my householdArray to contain Household objects with different properties. Two of those properties would be man and woman, and their values would be the appropriate Person objects from my personArray. The Person object would also have a household property whose value would be the appropriate Household object from the householdArray. This way, I could do things like personArray[x].household.woman or householdArray[x].man.age. It's all very neat and organized in my head.
@neoflash - Sure. I slightly simplified your original code so that it could be executed as a snippet without the definition of your Household object. Just use householdArray.push(new Household(m, w)) and you should be all set to go.

As others and I have stated in the comments: your approach is brute force, which in this case is quadratic in the input size. There are several optimisation possibilities. For binary values (i.e. booleans), splitting the array into categories is trivial. Numeric values like age can be clustered, e.g. into ranges. And you should definitely adopt the early abort mentioned by mm759. TL;DR: there's a table and a conclusion at the bottom.

Consider the bruteforce approach (for reference):

// The result is a list of matches [[candidate.id, match.id (or -1)]]
function bruteforce(arr) {
  var matches = [];
  for(var i = 0; i < arr.length; ++i) {
    var candidate = arr[i], num = 12;
    var isCandidate = !candidate.isSingle && candidate.isAvailable && candidate.sex == 1;
    var cSex = candidate.sex;
    var cSexPref = candidate.sexPref;
    var cAgeMin = candidate.age - num;
    var cAgeMax = candidate.age + num;
    var result = !isCandidate ? -1 : arr.reduce(function(p,c,k,a){
      return k != i &&
        c.sex == cSexPref &&
        c.sexPref == cSex &&
        !c.isSingle && c.isAvailable &&
        c.age <= cAgeMax &&
        c.age >= cAgeMin ? k : p;
    }, -1);
    if(isCandidate)
      matches.push([i, result]);
  }
  return matches;
}

The category approach might look like this:

function useTheCategory(arr) {
  // preprocessing the data
  var wmNonsingleAvailables = [];
  var wwNonsingleAvailables = [];
  var mwNonsingleAvailables = [];
  var mmNonsingleAvailables = [];
  // split the data into categories
  arr.forEach(function(c) {
    if(!c.isSingle && c.isAvailable) {
      if(c.sex == 0) {
        if(c.sexPref == 1)
          wmNonsingleAvailables.push(c);
        else
          wwNonsingleAvailables.push(c);
      } else {
        if(c.sexPref == 0)
          mwNonsingleAvailables.push(c);
        else
          mmNonsingleAvailables.push(c);
      }
    }
  });

  var matches = [];
  for(var i = 0; i < arr.length; ++i) {
    var candidate = arr[i], num = 12;
    var isCandidate = !candidate.isSingle && candidate.isAvailable && candidate.sex == 1;
    var cSex = candidate.sex;
    var cSexPref = candidate.sexPref;
    var cAgeMin = candidate.age - num;
    var cAgeMax = candidate.age + num;
    if(isCandidate) {
      var category = null;
      // find the relevant category (in this case)
      // a more complex approach/split might include multiple categories here
      if(cSex == 0) {
        if(cSexPref == 1)
          category = mwNonsingleAvailables;
        else if(cSexPref == 0)
          category = wwNonsingleAvailables;
      } else if(cSex == 1) {
        if(cSexPref == 0)
          category = wmNonsingleAvailables;
        else if(cSexPref == 1)
          category = mmNonsingleAvailables;
      }
      var result = -1;
      if(category == null) {
        // always handle the error case...
        console.log("logic error: missing category!");
        console.log("candidate: " + JSON.stringify(candidate));
      } else {
        // the tests for matching sex/single/availability are left-overs and not necessarily required,
        // they are left in here to show that the reduce is not the culprit of your slowdown
        var match = category.reduce(function(p,c,k,a){
          return c.id != i &&
            c.sex == cSexPref &&
            c.sexPref == cSex &&
            !c.isSingle && c.isAvailable &&
            c.age <= cAgeMax &&
            c.age >= cAgeMin ? k : p;
        }, -1);
        // translate to arr index
        if(match != -1)
          result = category[match].id;
      }
      matches.push([i, result]);
    }
  }
  return matches;
}

And the age range bucket approach might look like this:

function useAgeRange(arr) {
  // preprocessing the data
  var ranges = [1, 2, 3, 4, 5]; // find appropriate ranges to spread the entries evenly (analyse your data, more below...)
  var ageBuckets = [];
  // find the range of age values
  var ageRange = arr.length == 0 ? [0, 0] : arr.reduce(function(p,c) {
    var min = c.age < p[0] ? c.age : p[0];
    var max = c.age > p[1] ? c.age : p[1];
    return [min, max];
  }, [arr[0].age, arr[0].age]);
  // build the buckets (floor for nicer values)
  for(var age = Math.floor(ageRange[0]), maxAge = ageRange[1], step = 0; age <= maxAge; age += step) {
    // update step size
    if(step == 0)
      step = ranges[0];
    else
      step = ranges[Math.min(ranges.length - 1, ranges.indexOf(step) + 1)];
    ageBuckets.push({
      nextAge: age + step,
      bucket: [],
    });
  }
  function findBucketIndex(age) {
    // smallest i with age < ageBuckets[i + 1].nextAge
    for(var i = 0, maxi = ageBuckets.length - 1; i < maxi; ++i)
      if(age < ageBuckets[i + 1].nextAge)
        return i;
    return -1;
  }
  arr.forEach(function(c) {
    ageBuckets[findBucketIndex(c.age)].bucket.push(c);
  });

  var matches = [];
  for(var i = 0; i < arr.length; ++i) {
    var candidate = arr[i], num = 12;
    var isCandidate = !candidate.isSingle && candidate.isAvailable && candidate.sex == 1;
    var cSex = candidate.sex;
    var cSexPref = candidate.sexPref;
    var cAgeMin = candidate.age - num;
    var cAgeMax = candidate.age + num;
    if(isCandidate) {
      // Find range intersection with ageBuckets
      var startBucket = findBucketIndex(cAgeMin);
      var endBucket = findBucketIndex(cAgeMax);
      if(startBucket < 0) startBucket = 0;
      if(endBucket < 0) endBucket = ageBuckets.length - 1;
      var result = -1;
      // now only search those candidate buckets
      for(var b = startBucket; b <= endBucket; ++b) {
        var bucket = ageBuckets[b].bucket;
        var match = bucket.reduce(function(p,c,k,a){
          return c.id != i &&
            c.sex == cSexPref &&
            c.sexPref == cSex &&
            !c.isSingle && c.isAvailable &&
            c.age <= cAgeMax &&
            c.age >= cAgeMin ? k : p;
        }, -1);
        // translate to arr index
        if(match >= 0)
          result = bucket[match].id;
      }
      matches.push([i, result]);
    }
  }
  return matches;
}

I created a benchmark on jsfiddle showing the improvements of the two approaches. Both are effective by themselves, even including the preprocessing (values will differ across systems and browsers):

N        Search space   Brute force   Categories   Range buckets
         (#matches)         (relative timing values)
20000    2500           200           34           140
40000    5000           1400          180          556
80000    10000          5335          659          2582
160000   20000          17000         2450         16900

Analysing your data to find which approach is appropriate is everything: my benchmark generates an exponential distribution (ages 18-20 make up 28% of the data points, 21-32 another 27%, 33-52 another 27%, and 53-77 the remaining 18%). The range approach doesn't deal well with this distribution, as the timings above show (that's for a fixed num = 12 years and 14 buckets), because for most queries an age range of 24 years spans 55% of the data.

