0

PROBLEM

In PowerQuery I need to go from this table (input) enter image description here

to this table (output)
enter image description here

where each YYYYMM column (except the first, in this case 202401) is transformed by subtracting the value in the column immediately to the left.
It's guaranteed there will be no skipped months in the sequence.
The number of month columns is unknown at each refresh, so I need to do this dynamically.


SO FAR
After researching I found a promising approach in this SO answer.

This works:

output = Table.TransformRows(
    input,
    (r) =>  Record.TransformFields(
        r,
        {{"202405", each _ - r[202404]},
        {"202404", each _ - r[202403]},
        {"202403", each _ - r[202402]},
        {"202402", each _ - r[202401]},
        {"202401", each _ - r[202312]}}
    )
)

but as you can see, this approach requires hardcoded values so it needs to be generalized.

Record.TransformFields accepts a list of TransformOperations, so I tried to generate the list of transformations dynamically, but ultimately failed and finally succeeded (see UPDATE below).

periods = List.Sort(List.Skip(Table.ColumnNames(input), 1), Order.Ascending)

output = Table.TransformRows(
    input,
    (r) =>  Record.TransformFields(
        r,
        List.Transform(
            List.Skip(periods, 1),
            each (p) => {p, each (e) => e - Record.Field(r, Text.From(Number.From(p) - 1))}
        )
    )
)

The code above doesn't work for two reasons:

a. List.Transform is not returning valid TransformOperations since each row transformation errors out with

Expression.Error: Expected a TransformOperations value.
Details:
    [List]

b. It wouldn't handle Januaries since the previous column would have a different year and the month wraps (eg.: when I need to subtract 202312 from 202401). I guess this could be handled with an if statement placed in the List.Transform transformation (subtract 89 if last digit is 1).

I also researched Table.TransformColumns but I believe the transformOperations can't access values outside of the column being transformed.

I am not even sure this is the right approach and I couldn't find anything else, so I'd appreciate any help with this.


UPDATE - WORKING BUT EXTREMELY SLOW SOLUTION
I managed to make the above approach work. I was incorrectly using each with the explicit function declaration (a) => something(a). I also plugged in the logic for when the year wraps.

periods = List.Sort(List.Skip(Table.ColumnNames(input), 1), Order.Ascending)

output = Table.TransformRows(
    input,
    (r) =>  Record.TransformFields(
        r,
        List.Transform(
            List.Skip(periods, 1),
            (p) => if Text.EndsWith(p, "01")
            then {p, (e) => e - Record.Field(r, Text.From(Number.From(p) - 89))}
            else {p, (e) => e - Record.Field(r, Text.From(Number.From(p) - 1))}
        )
    )
)  

The reason I am not using this answer for my own question is that this works fast on the test table I provided, but it is incredibly slow on my main table with hundreds, and possibly thousands of rows.

Not sure at this point if I should even attempt this, but I am pretty sure this can be done with reasonable performance. If I find anything more compelling I'll update/answer the question.

2 Answers 2

1

I'd be interested if this is any faster. Another approach would be to

  • Unpivot all except the ID column
  • Group by ID
  • Within each group
    • Add a Shifted Value column (to avoid using an Index column)
    • Do the subtractions
    • Pivot the results
  • Re-expand the groups

Data
enter image description here

M Code

let
    Source = Excel.CurrentWorkbook(){[Name="Table8"]}[Content],
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"ID"}, "Attribute", "Value"),
    #"Grouped Rows" = Table.Group(#"Unpivoted Other Columns", {"ID"}, {
        {"Subtract", (t)=>let 
                #"Add Shifted" = 
                    Table.FromColumns(
                        Table.ToColumns(t) &
                        {{null} & List.RemoveLastN(t[Value])},
                        {"ID", "Attribute","Value","Shifted Value"}),
                #"New Value" = Table.AddColumn(#"Add Shifted","New", each ([Value] - [Shifted Value])??[Value], type number),
                #"Remove Columns" = Table.RemoveColumns(#"New Value",{"Value","Shifted Value"}),
                #"Rename" = Table.RenameColumns(#"Remove Columns",{{"New","Value"}}),
                #"Pivot" = Table.Pivot(#"Rename", #"Rename"[Attribute], "Attribute","Value")    
            in 
                #"Pivot"
            }}),
    #"Expanded Subtract" = Table.ExpandTableColumn(#"Grouped Rows", "Subtract", List.RemoveFirstN(Table.ColumnNames(Source),1)),
    #"Type Data" = Table.TransformColumnTypes(#"Expanded Subtract", 
        List.Transform(List.RemoveFirstN(Table.ColumnNames(Source),1), each {_, Int64.Type}))
in
    #"Type Data"

Results
enter image description here

Edit
For a technique using the Table.ReplaceValue function, along with List.Accumulate, I suggest the following:

let
    Source = Excel.CurrentWorkbook(){[Name="Table8"]}[Content],
    #"Change Type" = Table.TransformColumnTypes(Source,
        {{"ID", type text}} & List.Transform(List.RemoveFirstN(Table.ColumnNames(Source),1), each {_, Int64.Type})),

//Column pairs to process
    cp= List.Reverse(List.RemoveLastN(
            List.Zip({List.RemoveFirstN(Table.ColumnNames(Source),1), List.RemoveFirstN(Table.ColumnNames(Source),2)})
            ,1)),
    
    #"Replace Values" = List.Accumulate(
        cp,
        #"Change Type",
        (s,c)=> Table.ReplaceValue(
            s,
            each Record.Field(_,c{1}),
            each Record.Field(_,c{0}),
            (x,y,z) as number => y-z,
            {c{1}}
        )
    )

in
    #"Replace Values"
Sign up to request clarification or add additional context in comments.

1 Comment

I've marked this as the accepted answer since not only this is faster than my solution when adapted to my specific dataset (<10s against ~15s), but it also removes a pivot step (the data I presented in the question was pivoted earlier in the query to show months on columns), so shifting values is superior in my case. Thanks for the feedback, good to see a different approach.
0

FAST SOLUTION
I am posting this as an answer since using Table.AddColumn inside List.Accumulate is extremely fast and it solves my problem.

periods = List.Sort(List.Skip(Table.ColumnNames(input), 1), Order.Ascending)

output = Table.RemoveColumns(
    List.Accumulate(
        periods,
        input,
        (tbl, item) => Table.AddColumn(
            tbl,
            "N" & item,
            (e) => if Text.EndsWith(item, "1")
            then try Record.Field(e, item) - Record.Field(e, Text.From(Number.From(item) - 89))
                otherwise Record.Field(e, item)
            else try Record.Field(e, item) - Record.Field(e, Text.From(Number.From(item) - 1))
                otherwise Record.Field(e, item), 
            type number
        )
    ),
    periods
)

The entire query including this step runs in ~15s on my real dataset. The other approach would take 5 minutes and above on the same dataset.

Using try ... otherwise ... has the added benefit of not needing to skip the first element in the periods list, and avoids any query failure when columns are missing.

This is not exactly transforming the columns, since I am creating new columns with temporary names ("N" & period, e.g.: N202401), but Power Query is extremely slow when attempting to transform columns in place, and I am not sure why (would love to understand more though).

So, in order to restore the original column names I need an extra step:

clean = Table.RenameColumns(output, List.Transform(periods, (p) => {"N" & p, p}))

This step doesn't introduce any noticeable overhead and I still obtain the desired result in few seconds, so I can definitely live with this.


ALTERNATIVE BUT SLOW SOLUTION WITH Table.ReplaceValue
Since I originally wanted to achieve a solution without resorting to new columns, I followed up on my first working approach mentioned in the question.
I came up with a second solution that uses List.Accumulate and Table.ReplaceValue but it's still extremely slow, I am just adding it for completeness.

periods = List.Sort(List.Skip(Table.ColumnNames(input), 1), Order.Ascending)

output = List.Accumulate(
    periods,
    input,
    (tbl, item) => Table.ReplaceValue(
        tbl,
        (src) => Record.Field(src, item),
        (dest) => if Text.EndsWith(item, "1")
            then Record.Field(dest, item) - Record.Field(dest, Text.From(Number.From(item) - 89))
            else Record.Field(dest, item) - Record.Field(dest, Text.From(Number.From(item) - 1)), 
        Replacer.ReplaceValue,
        periods
    )
)

This approach, other than being slow, has the drawback of changing columns type back to Any. Not a big deal, but still far from optimal.
Also, this needs to be started from the last month towards the first (periods needs to be sorted with Order.Descending), since values in columns are now changed when further replacing values in successive columns.

The query with this step takes around 2m45s, around half of the original working method run time, but still ~x10 more than my preferred solution.


OTHER SOLUTIONS
Even though this is enough of a solution for me, I would love to see different approaches from other PowerQuery users.

If you come up with a better solution, feel free to post it and I will change the verified answer if it's faster and more straightforward than mine.

1 Comment

BTW, you can assign primitive data types in Table.ReplaceValue by using a custom replacer function. eg Table.ReplaceValue(#"Change Type", each [202405], each [202404],(x,y,z) as number => y-z,{"202405"} would assign the number type. Unfortunately,you cannot type as other than a primitive type.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.