Complex transformation in Excel Power Query

Question

I have 2 input Tables. Input Table1 is the source data and input Table 2 is a criteria table.

+--------------------------+----------+  +--------------------------+-------+
|       TABLE 1 (Source data)         |  |       TABLE 2 (Criterias)        |
+-------------------------------------+  +----------------------------------+

+-------------------------------------+  +----------------------------------+
| DESCRIPTION              | VALUE    |  | PREFIX                   | CODE  |
+-------------------------------------+  +----------------------------------+
| ID                       | 0        |  | 7235                     | ABX1  |
| NAME                     | JFMSC    |  | 3553                     | POWQ  |
| TYPE                     | UHELQ    |  | 7459                     | UWEER |
| DFRUL                    | F4       |  | 10012                    | ABX1  |
| ADDR                     | 10012002 |  | 430                      | ABX1  |
| RRUL                     | P1       |  +--------------------------+-------+ 
| ADDR                     | 723      |  
| RRUL                     | P1       |  
| ID                       | 2        |  
| NAME                     | PLLSJS   |  
| TYPE                     | UHELQ    |  
| DFRUL                    | P3       |  
| ID                       | 4        |  
| NAME                     | AAAARR   |  
| TYPE                     | UHELQ    |  
| DFRUL                    | T7       |  
| ADDR                     | 35531156 |  
| RRUL                     | P1       |  
| ADDR                     | 72358    |  
| RRUL                     | P1       |  
| ADDR                     | 86401    |  
| RRUL                     | K9       |  
| ID                       | 0        |  
| NAME                     | PPROOA   |  
| TYPE                     | RRHN     |  
| DFRUL                    | P1       |  
| ADDR                     | 43001    |  
| RRUL                     | T8       |  
| ADDR                     | 7459001  |  
| RRUL                     | D4       |  
| ADDR                     | 430457   |  
| RRUL                     | W2       |  
| ADDR                     | 745913   |  
| RRUL                     | P1       |  
| ADDR                     | 74598001 |  
| RRUL                     | Y5       |  
+--------------------------+----------+

My goal is to get the an output table like below (Would be the Table #4), that shows the CODE that is THE MOST similar compared with each number of field "ADDR" based on criterias of "TABLE 2". If there are repeated CODEs for each ID, I only want to show one (unique codes list).

I explain in more detail in Sample file attached here SampleV1.xlsx.

I want to Transform the data based in Input Table 1 and 2 to get an output table like this (Desired OUTPUT TABLE #2 in file attached):

+----+--------+-------+-------+-------+------+
| ID | NAME   | TYPE  | DFRUL | CODE  | RRUL |
+----+--------+-------+-------+-------+------+  
| 0  | JFMSC  | UHELQ | P1    | ABX1  | P1   |
| 2  | PLLSJS | UHELQ | P3    |       |      |
| 4  | AAAARR | UHELQ | T7    | POWQ  | P1   |
|    |        |       |       | ABX1  | P1   |
|    |        |       |       | 86401 | K9   |
| 0  | PPROOA | RRHN  | P1    | ABX1  | P1   |
|    |        |       |       | UWEER | P1   |
+----+--------+-------+-------+-------+------+

I hope someone could help me with this. Thanks in advance.

There's too much going on in this question. It seems to me your main issue is attempting to match a full ADDR to a prefix, so I'd recommend posting a more specific question targeted at just that. An mcve is ideal. You've got plenty of detail, but it's more likely to attract a solution and be useful for future readers if it's a bit more focused and minimal. — Alexis Olson
– Alexis Olson, Commented Mar 16, 2019 at 15:30
I understand, but if I don't put anything else than input and output desired normally they ask, what have you done? Hehe — Ger Cas
– Ger Cas, Commented Mar 16, 2019 at 17:18
@Alexis Olson I've edited the question, I hope be better in this way. Thanks — Ger Cas
– Ger Cas, Commented Mar 16, 2019 at 17:27

Andrey Minakov · Accepted Answer · 2019-03-24 15:23:20Z

Below is the UPDATED solution.

In general, I compiled the solution in order to be as less vulnerable to problems with data, as possible.

The only constrains to data are:

Field sets must have ID field, which must be the first field of set.
all the RRUL and ADDR have to be in pairs,
Duplicates of RRUL/ADDR pairs inside one ID are acceptable or absent.

I also compiled the solution in a way to correctly find the closest value in all possible variants of ADDR and PREFIX. By the way - there is one case, not covered in your bigsample - when PREFIX is shorter then ADDR but not equal to it. If there are such cases - my solution handles them correctly but demands some performance overhead for this particular situation.

let
        Source = #"Source data",
    #"Added Index1" = Table.AddIndexColumn(Source, "Index", 0, 1),

    #"Added Custom" = Table.AddColumn(#"Added Index1", "Main Key", each if [DESCRIPTION] = "ID" then [Index] else null, type number),

    #"Added Custom10" = Table.AddColumn(#"Added Custom", "Last notADDR", each 
        if [DESCRIPTION] <> "ADDR" and [DESCRIPTION] <> "RRUL" then [Index] else null),

    #"Filled Down" = Table.FillDown(#"Added Custom10",{"Main Key", "Last notADDR"}),

    #"Added Custom2" = Table.AddColumn(#"Filled Down", "Key", each [Main Key] + (
        if [DESCRIPTION] = "RRUL" then [Index] - [Last notADDR] - 2 
            else if [DESCRIPTION] = "ADDR" then [Index] - [Last notADDR] - 1 else 0)),

    #"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"Index", "Main Key", "Last notADDR"}),

    #"Pivoted Column1" = Table.Pivot(#"Removed Columns", 
        List.Distinct(#"Removed Columns"[DESCRIPTION]), "DESCRIPTION", "VALUE"),

    #"Added Custom3" = Table.AddColumn(#"Pivoted Column1", "CODE", each if [ADDR] = null then null else let t = Table.AddIndexColumn(Table.SelectRows(Criterias, (x)=> 
        let s=List.Sort({x[PREFIX], [ADDR]}, each Text.Length(_)) in Text.StartsWith(s{1}, s{0})), "Index")
            in if Table.RowCount(t) > 0 then Table.First(Table.Sort(t, (y)=> Number.BitwiseShiftLeft(Number.Abs(Text.Length([ADDR]) - Text.Length(y[PREFIX])), 16) + y[Index]))[CODE] 
            else "Not Found"),
    #"Removed Columns1" = Table.RemoveColumns(#"Added Custom3",{"Key", "ADDR"}),
    #"Filled Down1" = Table.FillDown(#"Removed Columns1",{"ID", "NAME", "TYPE", "DFRUL"})
in
    #"Filled Down1"

Hi Andrey, thanks so much for answer and take time and detailed and clear code that can be understandable better. Your solution produces a closer output. I'll try to show you the differences asap.
Hi Andrey, your output shows similar vs my desired output, but in my output the CODEs are shown only once. In your solution the CODEs appeared the same number of times that an ADDR value related with some CODE. Thanks for the help. Regards

Collectives™ on Stack Overflow

Complex transformation in Excel Power Query

1 Answer 1

2 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Linked

Related