I have a handful of tables that contain payment data, and we need to identify the most recent and the oldest payment for each combination of the person the payment was made to and the payment type.
We assumed the best way to achieve this was a calculated column using RANKX. However, the server runs out of memory when we run a recalc, and the calculated columns are the cause.
Part of the reason is the size of these tables (approximately 10 million rows).
This is probably easier to see with a mocked-up example. In the table below, the RANK column is our expected output (EndDate values are dd/mm/yyyy). We want to group records by PersonRef and ElementId, then sort each group by EndDate, so that the record with the LATEST EndDate gets a RANK of 1. A NULL EndDate counts as the latest.
+-----------+-----------+------------+------+
| PersonRef | ElementId | EndDate    | RANK |
+-----------+-----------+------------+------+
| 123456    | 1000      | NULL       | 1    |
| 123456    | 1000      | 01/01/2017 | 2    |
| 123456    | 6010      | 31/03/2018 | 1    |
| 123456    | 6010      | 12/01/2018 | 2    |
| 789789    | 999       | NULL       | 1    |
| 789789    | 999       | 25/02/2018 | 2    |
| 789789    | 999       | 01/03/2016 | 3    |
| 789789    | 1000      | 25/02/2018 | 1    |
| 789789    | 1000      | 01/03/2016 | 2    |
+-----------+-----------+------------+------+
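The expected RANK column can be pinned down with a small Python sketch that reproduces it from the mock rows, outside SSAS. It assumes two things: a NULL EndDate means the payment is still open and should count as the latest, and equal end dates share a rank (dense ranking). `rank_payments` and `sort_key` are hypothetical helper names, not part of our model.

```python
from collections import defaultdict
from datetime import date
from typing import Optional

# Mock rows from the table above: (PersonRef, ElementId, EndDate).
# None stands for a NULL EndDate (a payment that has not ended yet).
Row = tuple[int, int, Optional[date]]

ROWS: list[Row] = [
    (123456, 1000, None),
    (123456, 1000, date(2017, 1, 1)),
    (123456, 6010, date(2018, 3, 31)),
    (123456, 6010, date(2018, 1, 12)),
    (789789, 999, None),
    (789789, 999, date(2018, 2, 25)),
    (789789, 999, date(2016, 3, 1)),
    (789789, 1000, date(2018, 2, 25)),
    (789789, 1000, date(2016, 3, 1)),
]


def sort_key(end: Optional[date]) -> date:
    # Assumption: a NULL EndDate sorts after every real date, so it ranks as the latest.
    return date.max if end is None else end


def rank_payments(rows: list[Row]) -> list[int]:
    """Dense rank per row within (PersonRef, ElementId); 1 = latest EndDate."""
    distinct_dates: dict[tuple[int, int], set[date]] = defaultdict(set)
    for person, element, end in rows:
        distinct_dates[(person, element)].add(sort_key(end))
    # Sort each group's distinct dates once, newest first, and map date -> rank.
    rank_of = {
        group: {d: rank for rank, d in enumerate(sorted(dates, reverse=True), start=1)}
        for group, dates in distinct_dates.items()
    }
    return [rank_of[(person, element)][sort_key(end)] for person, element, end in rows]


print(rank_payments(ROWS))  # [1, 2, 1, 2, 1, 2, 3, 1, 2]
```

The design choice here is to sort each group's distinct dates once and look ranks up, rather than rescanning the table for every row.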
We can't really do this in our SQL table either. The SSAS table is partitioned, and every 5 minutes we process only the relevant partitions. If a new entry came in at rank 1, we would have to update that person's existing records in SQL. That would cause all the SSAS partitions to be processed, which is too inefficient for us.
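A small Python sketch shows why a stored rank is awkward. It uses one (PersonRef, ElementId) group from the mock table and adds a hypothetical new open-ended payment (NULL EndDate). That payment lands at rank 1, and every existing row in the group changes rank, so each of those source rows would need rewriting. `dense_ranks` is a hypothetical helper, and NULL is assumed to count as the latest date, as in the expected output.

```python
from datetime import date
from typing import Optional


def dense_ranks(end_dates: list[Optional[date]]) -> list[int]:
    """Dense rank within one (PersonRef, ElementId) group; 1 = latest, None = latest."""

    def key(d: Optional[date]) -> date:
        # Assumption: an open payment (NULL EndDate) sorts after every real date.
        return date.max if d is None else d

    newest_first = sorted({key(d) for d in end_dates}, reverse=True)
    position = {d: rank for rank, d in enumerate(newest_first, start=1)}
    return [position[key(d)] for d in end_dates]


# PersonRef 789789, ElementId 1000 from the mock table.
before = [date(2018, 2, 25), date(2016, 3, 1)]
# Hypothetical new open-ended payment for the same person and element.
after = before + [None]

print(dense_ranks(before))  # [1, 2]
print(dense_ranks(after))   # [2, 3, 1]

# Indexes of pre-existing rows whose rank moved (zip stops at the shorter list).
changed = [
    i
    for i, (old, new) in enumerate(zip(dense_ranks(before), dense_ranks(after)))
    if old != new
]
print(changed)  # [0, 1]
```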
We tried this as a calculated column, and its memory usage was intense:
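VAR CurrentPersonRef = 'Payment'[Person_Ref]
VAR CurrentPayElement = 'Payment'[ElementId]
RETURN
    RANKX (
        // Every row for the same person and pay element
        FILTER (
            'Payment',
            'Payment'[Person_Ref] = CurrentPersonRef
                && 'Payment'[ElementId] = CurrentPayElement
        ),
        // A blank end date ranks as the latest; DATE keeps both IF branches typed as dates
        IF (
            ISBLANK ( 'Payment'[Pay End Date] ),
            DATE ( 2999, 1, 1 ),
            'Payment'[Pay End Date]
        ),
        , // value omitted: RANKX ranks the current row's own end date
        DESC, // latest end date gets rank 1, matching the expected output
        DENSE
    )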
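To make the cost of the formula above more concrete, here is a rough Python analogue of how the calculated column is evaluated. For every row, FILTER scans the whole 'Payment' table for matching rows, and RANKX then ranks the current row's end date among them. The sketch ranks descending with dense ties (latest = 1) to match the expected output. `rank_per_row`, `end_value` and `OPEN_ENDED` are hypothetical names.

The visit counter shows the work growing as rows × table size: 81 visits for the 9 mock rows, and on the order of 10^14 at roughly 10 million rows. This is only an analogy for the shape of the computation. It does not model how the SSAS engine actually allocates memory.

```python
from datetime import date
from typing import Optional

# The nine mock rows from the table above: (Person_Ref, ElementId, Pay End Date); None = blank.
ROWS: list[tuple[int, int, Optional[date]]] = [
    (123456, 1000, None),
    (123456, 1000, date(2017, 1, 1)),
    (123456, 6010, date(2018, 3, 31)),
    (123456, 6010, date(2018, 1, 12)),
    (789789, 999, None),
    (789789, 999, date(2018, 2, 25)),
    (789789, 999, date(2016, 3, 1)),
    (789789, 1000, date(2018, 2, 25)),
    (789789, 1000, date(2016, 3, 1)),
]

# Stand-in for IF(ISBLANK([Pay End Date]), DATE(2999, 1, 1), [Pay End Date]).
OPEN_ENDED = date(2999, 1, 1)


def end_value(end: Optional[date]) -> date:
    return OPEN_ENDED if end is None else end


def rank_per_row(rows):
    """Mimic the calculated column: one full-table FILTER scan per row.

    Returns the dense ranks (1 = latest) and the total number of row visits.
    """
    ranks: list[int] = []
    rows_visited = 0
    for person, element, end in rows:  # the column is evaluated once per row
        # FILTER('Payment', same Person_Ref && same ElementId) touches every row.
        rows_visited += len(rows)
        peers = [r for r in rows if r[0] == person and r[1] == element]
        # Descending dense rank: 1 + number of distinct peer values above this row's value.
        distinct_values = {end_value(r[2]) for r in peers}
        current = end_value(end)
        ranks.append(1 + sum(1 for v in distinct_values if v > current))
    return ranks, rows_visited


ranks, visited = rank_per_row(ROWS)
print(ranks)    # [1, 2, 1, 2, 1, 2, 3, 1, 2]
print(visited)  # 81, i.e. 9 rows x 9 rows
```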
Any other suggestions would be appreciated!