System/Language: I'm using plain Javascript to write this algorithm. Environment: Node 17.4.0 with a system that has ~7000 MiB of Memory and ~400GB of Storage.
Input Data
- An (
AoP) Array ofPairs (a Pair is an array of two elements) where the first element of each Pair is aStringand the second element is aNumber. - The program can ASSUME that the input AoP is sorted in ascending order by the first element of each Pair - the String.
Example of the Input Data
[ ["A", 3], ["A", 2], ["B", 9], ["C", 1], ["C", 2], ["C", 3], ["D", 2], ["E", 0], ["E", 1] ]
Algorithm
Reduce the input AOP such that all Pairs that have the same String element are merged into one pair with the same string. The Number element should be the sum of all Number elements of the Pairs merged.
Example of the Output Data
[ ["A", 5], ["B", 9], ["C", 6], ["D", 2], ["E", 1] ]
Initial Algorithm
I'm more comfortable writing functions in recursive form (coming from a Scheme background) so I quickly came up with the following:
function sumSameKeysSortedRecursive(arr) {
if (arr.length === 0) {
return [];
} else {
let [fst, ...rst] = arr;
let recurAns = sumSameKeysSortedRecursive(rst);
if (recurAns.length === 0) {
return [fst];
} else {
let [f, ...r] = recurAns;
return f[0] === fst[0]
? [[f[0], f[1] + fst[1]], ...r]
: [fst, ...recurAns];
}
}
}
However, my input data has 11 million lines and won't scale with the recursive version (even a tail-recursive version won't work on Node). The second version uses loops:
function sumSameKeysSortedLoop(arr) {
let res = [];
let i = 0; // tracks first occurrence a pair with a unique String
let j = 0; // tracks all subsequent occurrences of Pairs with the same String
let len = arr.length;
while (i < len) {
let sum = 0;
let iname = arr[i][0];
let jname = iname;
while (jname === iname) {
sum = sum + arr[j][1];
j++;
if (j === len) { // have reached beyond the array bound
break;
} else {
jname = arr[j][0];
}
}
res.push([iname, sum]);
i = j; // jump over to the new unique element
if (i === len) {
break;
} else {
iname = arr[i][0];
}
}
return res;
}
Possible Improvements?
I think v2 has space for improvement in at least 2 areas:
- Readability: In v1, I can read and understand exactly what is happening (can improve it further by adding helper functions), while in v2 a future reader of the code might take more time understanding it. (Could be subjective)
- while + break;: Is there an easy way to translate the program such that its functionality stays the same but uses a for-loop instead? Or where it does not use
break;(a more general version)? Or where it does not have to keep track of so many variables (i, j, sum, iname, jname)?