2

I have a data aggregation function that collects statistics from multiple sources. The problem is that whenever I add a new metric, I need to manually update the code in three different places, which is error-prone and hard to maintain.

Current Code Structure:

// Part 1: Initialize data object
const storeData = {
    collectors_count: 0,
    toys_count: 0,
    toys_inspected: 0,
    toys_cleaned: 0,
    inspection_percent: 0,
    cleaning_percent: 0,
};

// Part 2: Accumulate data from each source
for (const collector of collectors) {
    const details = await db.getCollectorDetails(collector.id);
    
    if (!details) continue;

    storeData.collectors_count++;
    storeData.toys_count += details.toy_count;
    storeData.toys_inspected += details.inspected_count;
    storeData.toys_cleaned += details.cleaned_count;
}

// Part 3: Calculate derived metrics
storeData.inspection_percent = (storeData.toys_inspected / storeData.toys_count) * 100;
storeData.cleaning_percent = (storeData.toys_cleaned / storeData.toys_count) * 100;

return storeData;

The Problem:

  • When I want to add a new metric (e.g., toys_repaired), I have to:
  • Add it to the initial object with value 0
  • Add accumulation logic in the loop: storeData.toys_repaired += details.repaired_count
  • Possibly add calculation logic if it's a derived metric: storeData.repair_percent = ...

This becomes tedious with many metrics and I sometimes forget to update all three places. Is there a way to refactor this code to be more extensible and maintainable?

1
  • Some languages and/or tools will warn you if you have an unused variable/field, or that you only write to it, not read and vice versa (e.g. Rust does this). But there's absolutely no way for a compiler to warn you that it needs to be used in two places. So either you just discipline yourself or add an intermediate data structure between the steps. Which increases code bloat (and potentially degrades performance) but might be worth it. Commented Oct 1 at 7:28

2 Answers 2

5

1

Do it the other way around:

  1. Add a unit test that would assert that the calculation is correct. Once you do it, if you forget anything else, the test will remain you about that.

  2. Then change the calculation logic. Once you do that, there is no way your code would work if you don't add the new metric.

  3. Then add the accumulator, and finally the new metric.

2

An additional thing you may use is to move some of the code closer. Instead of having an anonymous type to represent storeData, you can create a class DataCollector, with its own logic when it comes to adding one collector to another (using operator overloading, if supported, or a custom method if not). Same applies for the calculations. Your accumulation and the use of calculations would therefore look like this:

// Part 2: Accumulate data from each source
for (const collector of collectors) {
    const details = await db.getCollectorDetails(collector.id);

    if (details) {
        storeData.add(details);  // No more individual metrics here.
    }
}

// Part 3: Calculate derived metrics
const something = storeData.get_inspection_percent();
const other = storeData.get_cleaning_percent();

3

A final thing is to add check, through unit testing, if all metrics are used. Say you followed my second advice and you implemented DataCollector.add() method. If you are afraid that you'll forget using my first advice and will add a metric, without using it in add, then create a test that would (1) list all the metrics, (2) mutate them one by one, and (3) ensure that add would change.

2
  • Good answer. Still I would it make more apparent that your class DataCollector is intended to be the type of the variable storeData - I had to read this two times before I got this. Commented Oct 2 at 5:58
  • @DocBrown: thanks. It's fixed now. Commented Oct 2 at 13:38
0

The problem you have there is in some sense similar to what JavaScripts's Array.reduce function is designed to tackle (other languages provide alike facilities). It generalizes various data accumulation/summarizing procedures, so you could use that for inspiration.

Here's a quick overview: The function iterates over an array, and you pass in a lambda that takes in an "accumulator" object and the next element in the iteration sequence. Inside the lambda, you update the accumulator and return it. Optionally, you can pass in the initial accumulator value (otherwise, the first element will be used).

const initialVal = 0;
let sum = [1, 2, 3].reduce((acc, next) => acc + next, initialVal);

// (0, 1) => 0 + 1      acc input is 'initialVal', returns the new acc value for the next iteration
// (1, 2) => 1 + 2      post execution, acc is 3
// (3, 3) => 3 + 3      post execution, acc is 6
// sum == 6             final result

You can do sums, products, and all kinds of other things. It's not hard to roll your own:

function reduce(collection, reducer, initialValue) {
    let acc = initialValue;
    
    for(const item of collection)
        acc = reducer(acc, item);
        
    return acc;
}

You can adapt this design to serve your purposes. I'm going to assume that it makes sense to pre-fetch all the data beforehand, and store it in an in-memory array for later processing.

Your array might look something like this:

[
    { toy_count: 10, inspected_count: 5, cleaned_count: 4 },
    { toy_count: 20, inspected_count: 12, cleaned_count: 6 },
    { toy_count: 15, inspected_count: 15, cleaned_count: 10 },
    ...
]

Now, the problem you have is that related changes are spread-out; this makes it easy to forget to make all the necessary edits. Let's bring all the related stuff together. First, create a "config" object that keeps all the information about collecting data in one place. Here's one way to do it:

const config = {
    metrics: [
        {
            initial_value: 0, 
            prop: "collectors_count",
            reducer: (acc, _) => acc + 1,
        },
        { 
            initial_value: 0, 
            prop: "toys_count",
            reducer: (acc, details) => acc + details.toy_count,
        },
        { 
            initial_value: 0,
            prop: "toys_inspected",
            reducer: (acc, details) => acc + details.inspected_count,
        },
        { 
            initial_value: 0,
            prop: "toys_cleaned",
            reducer: (acc, details) => acc + details.cleaned_count,
        },
    ],
    derived_metrics: [
        (storeData) => storeData.inspection_percent = (storeData.toys_inspected / storeData.toys_count) * 100,
        (storeData) => storeData.cleaning_percent = (storeData.toys_cleaned / storeData.toys_count) * 100,
    ]
}

Now create a reduce function that can work with this config object. Compared to the vanilla reduce function from before, the parameter list is a bit different, and there are now two loops, but it's the same general idea:

function reduce(detailsList, storeData, config) {
    for (const details of detailsList) {
        for (const metricSpec of config.metrics) {
            let acc = storeData[metricSpec.prop] || metricSpec.initial_value;
            acc = metricSpec.reducer(acc, details);
            storeData[metricSpec.prop] = acc;
        }
    }
}

And another function for the derived metrics:

function deriveMetrics(storeData, config) {
    for (const metricCalculator of config.derived_metrics) {
        metricCalculator(storeData);
    }
}

Finally, you can use them like this:

let storeData = {};

reduce(detailsList, storeData, config);
deriveMetrics(storeData, config);

// at this point storeData will be filled in 

Note that now you don't have to initialize any properties in storeData, everything is driven by the config object. You also won't have to touch reduce or deriveMetrics, assuming your changes don't require you to fundamentally alter the accumulation logic.

If you for some reason cannot pre-fetch the data, and need reduce to be async and work on the fly, you can do that too:

async function reduce(collectors, storeData, config) {
    
    for (const collector of collectors) {
        const details = await db.getCollectorDetails(collector.id);
        if (!details) continue;
      
        for (const metricSpec of config.metrics) {
            let acc = storeData[metricSpec.prop] || metricSpec.initial_value;
            acc = metricSpec.reducer(acc, details);
            storeData[metricSpec.prop] = acc;
        }
    }
}

// And then elsewhere

let storeData = {};

await reduce(detailsList, storeData, config);
deriveMetrics(storeData, config);

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.