Greedo

Too big for a comment, so here is an alternative approach:

Suppose you have several files like this:

[Image: the three source tables]

The image shows three tables of data, each representing a different file. They all share the same column names: a File Created column, an ID column and some data fields.

I've also highlighted where a new piece of data differs from the same ID in a previous table.

To merge these, we can use the following algorithm:

  • Load all tables and append them
  • Group duplicate data (rows where the ID and all fields are identical) into a single row, and set its "Last Modification Date" to the oldest date in that group, i.e. the date since which the data has remained unchanged
    • e.g. M002 is identical in the 1-Jan (blue) and 15-Jan (orange) tables, so those rows are grouped into a single row with a last modification date of 1-Jan
  • Now for each ID, keep only the most recent "Last Modification Date", as this is the most up-to-date data
    • This can be achieved by sorting the "Last Modification Date" newest -> oldest and dropping any rows with duplicate IDs
  • Finally, sort alphabetically by ID or by modification date, whichever you find most logical.
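The group, sort and de-duplicate steps can be tried without any files. Below is a minimal sketch that runs them on a small table built inline with #table; the IDs, values and dates are made up for illustration, and FieldNames is trimmed to a single data field. M002 changes from B to C in the 15-Jan file, while M001 never changes.

```powerquery
let
    // Hypothetical sample data standing in for the appended files:
    // one row per ID per file, dated by that file's creation date
    Source = #table(
        type table [#"Date created" = datetime, #"MRP ID" = text, #"Field 1" = text],
        {
            {#datetime(2021, 1, 1, 0, 0, 0), "M001", "A"},
            {#datetime(2021, 1, 1, 0, 0, 0), "M002", "B"},
            {#datetime(2021, 1, 15, 0, 0, 0), "M001", "A"},
            {#datetime(2021, 1, 15, 0, 0, 0), "M002", "C"},
            {#datetime(2021, 2, 2, 0, 0, 0), "M001", "A"},
            {#datetime(2021, 2, 2, 0, 0, 0), "M002", "C"}
        }),
    FieldNames = {"MRP ID", "Field 1"},
    IDField = List.First(FieldNames),
    // Identical rows collapse into one, dated by their oldest appearance
    Squished = Table.Group(Source, FieldNames,
        {{"Last Modified", each List.Min([Date created]), type datetime}}),
    // Newest first (buffered so the order is kept), then one row per ID
    NewestFirst = Table.Buffer(Table.Sort(Squished, {{"Last Modified", Order.Descending}})),
    Latest = Table.Distinct(NewestFirst, {IDField}),
    // Sort by ID for display
    Result = Table.Sort(Latest, {{IDField, Order.Ascending}})
in
    Result
```

Pasted into a blank query, this should give two rows: M001 with A (last modified 1-Jan) and M002 with C (last modified 15-Jan).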

Following that algorithm you get a table like this:

[Image: the merged table]

Note how M004 has remained unchanged the whole time, so its last update was 1-Jan, while M005 was last updated on 2-Feb, and so on.

Hopefully this is what you are after. The whole thing can be done with Excel's built-in Power Query.

Add a blank query to your master workbook, then go to View -> Advanced Editor and paste the following code:

let
    // Edit to match the columns in your workbook; the first entry is used as the ID column
    FieldNames = {"MRP ID", "Field 1", "Field 2", "Field 3"},
    IDField = List.First(FieldNames),
    SourceFolder = "C:\path\MASTER SHEETS\MRP Data",
    Source = Folder.Contents(SourceFolder),
    #"Filter MRP Files" = Table.SelectRows(Source, each ([Extension] = ".xlsx") and ([Name] = "MRP1.xlsx" or [Name] = "MRP2.xlsx" or [Name] = "MRP3.xlsx") and ([Attributes]?[Hidden]? <> true)),
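    // Take the first table in each workbook (a {[Kind="Table"]} lookup errors if a workbook holds more than one table)
    #"Read Tables from Files" = Table.AddColumn(#"Filter MRP Files", "First Table", each Table.SelectRows(Excel.Workbook([Content]), each [Kind] = "Table"){0}[Data]),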
    #"Discard Other Columns" = Table.SelectColumns(#"Read Tables from Files", {"Date created", "First Table"}),
    #"Merge Tables" = Table.ExpandTableColumn(#"Discard Other Columns", "First Table", FieldNames),
    #"Squish Unchanged Data" = Table.Group(#"Merge Tables", FieldNames, {{"Last Modified", each List.Min([Date created]), type nullable datetime}}),
    // Table.Buffer pins the sorted order so Table.Distinct keeps the newest row per ID
    #"Force Most Recent files to top" = Table.Buffer(Table.Sort(#"Squish Unchanged Data",{{"Last Modified", Order.Descending}})),
    #"Drop out-dated Data" = Table.Distinct(#"Force Most Recent files to top", {IDField}),
    #"Sort Alphabetically" = Table.Sort(#"Drop out-dated Data",{{IDField, Order.Ascending}})
in
    #"Sort Alphabetically"
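The #"Filter MRP Files" step lists each file name explicitly. If more files following the same naming pattern will be added later (an assumption about how your files are named), one option is to match on a prefix with Text.StartsWith instead. Here is a sketch run against a small made-up stand-in for the Folder.Contents output, keeping only the columns the filter reads:

```powerquery
let
    // Hypothetical stand-in for Folder.Contents: only Name, Extension and Attributes
    Source = #table(
        type table [Name = text, Extension = text, Attributes = record],
        {
            {"MRP1.xlsx", ".xlsx", [Hidden = false]},
            {"MRP4.xlsx", ".xlsx", [Hidden = false]},
            {"MRP9.xlsx", ".xlsx", [Hidden = true]},
            {"Notes.xlsx", ".xlsx", [Hidden = false]},
            {"MRP1.csv", ".csv", [Hidden = false]}
        }),
    // Any visible .xlsx whose name starts with "MRP", instead of a fixed list of names
    Result = Table.SelectRows(Source, each [Extension] = ".xlsx"
        and Text.StartsWith([Name], "MRP")
        and [Attributes]?[Hidden]? <> true)
in
    Result
```

Only MRP1.xlsx and MRP4.xlsx survive: the hidden file fails the Hidden check, Notes.xlsx fails the prefix test and the .csv fails the extension test. In the real query, the same condition would replace the one inside #"Filter MRP Files".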

You should end up with something like this:

[Image: the resulting query output]

Now, in the Home tab of the Power Query editor, click Close & Load To... and load the result to a table in your workbook. To update it later, use Data -> Refresh All in Excel.

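The reasons to do it this way are:

  • Very fast compared to what's easily achievable in Python without a lot more thought, as Power Query is optimised for working with tabular data
  • Simple, expressive code is easier to maintain; again, PQ is the tool for the job and makes it easier to write simple code in this instance
  • Built into Excel, so there are no added dependencies, which makes it easier to maintain and distribute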

It is possible to pass parameters such as file paths and column names from VBA to Power Query if you want interactivity. The query can also be set to refresh automatically when the workbook is opened (Query Properties -> Refresh data when opening the file).
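As a sketch of one way to pass a parameter in, assuming a hypothetical single-cell named range called SourceFolderPath in the master workbook, the hard-coded SourceFolder line can read its value from the sheet instead. VBA (or the user) can then write a new path into that cell and call ThisWorkbook.RefreshAll.

```powerquery
let
    // Hypothetical named range "SourceFolderPath": one cell holding the folder path.
    // Excel.CurrentWorkbook() lists the workbook's tables and named ranges; a named
    // range's Content is a table without promoted headers, so its column is Column1.
    SourceFolder = Excel.CurrentWorkbook(){[Name = "SourceFolderPath"]}[Content]{0}[Column1],
    Source = Folder.Contents(SourceFolder)
in
    Source
```

In the main query, only the SourceFolder line would change; the remaining steps stay the same.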

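  • Note: modify the FieldNames parameter in the query to match the columns in your workbook, and make sure there is an ID column (right now the query uses the first entry in FieldNames as the ID)
  • This uses the file creation date to find the most recent data.
  • If a row of data is changed and then changed back, both copies of the reverted values are grouped together and their "Last Modified" date is the first occasion the product had those values, so the intermediate values can end up being kept instead. You can get around this by deleting old data, or by adjusting the algorithm
  • Table.Buffer is needed to prevent PQ from lazily evaluating the sort, since this would result in random records being dropped, not necessarily the oldest ones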
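One possible adjustment, sketched here rather than taken from the query above: group by ID only, and for each ID walk back from the newest file while the values stay the same, so the "Last Modified" date is the start of the most recent unchanged run. With the original steps, the made-up sample below (M001 goes A -> B -> A) would keep the intermediate value B dated 15-Jan, because the two A rows are grouped and dated 1-Jan. The sample data and the LastChange helper are hypothetical; in the real query, Source would be #"Merge Tables" and these steps would replace the squish, sort and distinct steps.

```powerquery
let
    // Hypothetical sample data: M001 goes A -> B -> A across three files
    Source = #table(
        type table [#"Date created" = datetime, #"MRP ID" = text, #"Field 1" = text],
        {
            {#datetime(2021, 1, 1, 0, 0, 0), "M001", "A"},
            {#datetime(2021, 1, 15, 0, 0, 0), "M001", "B"},
            {#datetime(2021, 2, 2, 0, 0, 0), "M001", "A"}
        }),
    FieldNames = {"MRP ID", "Field 1"},
    IDField = List.First(FieldNames),
    // For one ID's rows: take the newest values, then walk back through older files
    // while the values are unchanged; the oldest file in that run is the last change
    LastChange = (rows as table) as record =>
        let
            Records = Table.ToRecords(Table.Sort(rows, {{"Date created", Order.Descending}})),
            Latest = Record.SelectFields(Records{0}, FieldNames),
            Unchanged = List.FirstN(Records, each Record.SelectFields(_, FieldNames) = Latest),
            WithDate = Record.AddField(Latest, "Last Modified", List.Last(Unchanged)[Date created])
        in
            WithDate,
    Grouped = Table.Group(Source, {IDField}, {{"Latest", LastChange, type record}}),
    Result = Table.FromRecords(Grouped[Latest])
in
    Result
```

For this sample the result is a single row: M001 with A, last modified 2-Feb.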