Fixing a Memory-Intensive Calculated Column

Question

I have a handful of tables containing payment data, and we need to identify which payment is the most recent and which is the oldest, partitioned by the person the payment was made to and by the type of payment.

We assumed the best way to do this was with a calculated column using RANKX. However, the server runs out of memory when we run a recalculation, and the calculated columns are the cause.

Part of the reason is the number of rows in these tables (approximately 10 million).

This is probably easier to see with a mocked-up example. In the table below, the RANK column is our expected output (dates are DD/MM/YYYY). We want to group records by PersonRef and ElementId, then sort each group by EndDate so that the record with the latest EndDate gets a RANK of 1. A NULL EndDate counts as the latest.

+-----------+-----------+------------+------+
| PersonRef | ElementId |  EndDate   | RANK |
+-----------+-----------+------------+------+
|    123456 |      1000 | NULL       |    1 |
|    123456 |      1000 | 01/01/2017 |    2 |
|    123456 |      6010 | 31/03/2018 |    1 |
|    123456 |      6010 | 12/01/2018 |    2 |
|    789789 |       999 | NULL       |    1 |
|    789789 |       999 | 25/02/2018 |    2 |
|    789789 |       999 | 01/03/2016 |    3 |
|    789789 |      1000 | 25/02/2018 |    1 |
|    789789 |      1000 | 01/03/2016 |    2 |
+-----------+-----------+------------+------+
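To pin down what the RANK column should contain, here is a small Python reference sketch (not DAX, and not part of the SSAS model) that reproduces the mock table. It treats a NULL EndDate as a far-future date, mirroring the 2999-01-01 substitution in the DAX attempt, and assigns a dense rank within each (PersonRef, ElementId) group, latest first. The function name and the tuple layout of the rows are illustrative assumptions.

```python
from datetime import date
from itertools import groupby

# Sentinel standing in for a NULL EndDate, so NULL sorts as the latest date
FAR_FUTURE = date(2999, 1, 1)


def adjusted(end_date):
    """Replace a missing end date with the far-future sentinel."""
    return FAR_FUTURE if end_date is None else end_date


def dense_rank_latest_first(rows):
    """rows: list of (person_ref, element_id, end_date or None).
    Returns ranks aligned with the input order: within each
    (person_ref, element_id) group, the latest end date gets 1 and
    tied dates share a rank without skipping (dense ranking)."""
    ranks = [0] * len(rows)
    group_key = lambda i: (rows[i][0], rows[i][1])
    order = sorted(range(len(rows)), key=group_key)
    for _, members in groupby(order, key=group_key):
        members = list(members)
        # Distinct adjusted end dates in this group, latest first
        distinct = sorted({adjusted(rows[i][2]) for i in members}, reverse=True)
        position = {d: r for r, d in enumerate(distinct, start=1)}
        for i in members:
            ranks[i] = position[adjusted(rows[i][2])]
    return ranks


# The mocked-up rows from the table (DD/MM/YYYY dates converted to date objects)
SAMPLE = [
    (123456, 1000, None),
    (123456, 1000, date(2017, 1, 1)),
    (123456, 6010, date(2018, 3, 31)),
    (123456, 6010, date(2018, 1, 12)),
    (789789, 999, None),
    (789789, 999, date(2018, 2, 25)),
    (789789, 999, date(2016, 3, 1)),
    (789789, 1000, date(2018, 2, 25)),
    (789789, 1000, date(2016, 3, 1)),
]

print(dense_rank_latest_first(SAMPLE))  # [1, 2, 1, 2, 1, 2, 3, 1, 2]
```

The output matches the RANK column row for row, which makes it a handy check when comparing a DAX formula's results against the intended logic on a small extract.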

We can't easily do this in our SQL table either. The SSAS table is partitioned, and we process only the relevant partitions every 5 minutes. If a new entry came in at rank 1, we would have to update that person's existing records in SQL, which would cause all the SSAS partitions to be processed; that is too inefficient for us.

We tried this as a calculated column, and its memory usage was very high:

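// Values of the current row (row context of the calculated column)
VAR CurrentPersonRef = 'Payment'[Person_Ref]
VAR CurrentPayElement = 'Payment'[ElementId]
RETURN
RANKX
(
    // Evaluated for every row: FILTER iterates the whole Payment table
    // to find the rows with the same Person_Ref and ElementId
    FILTER
    (
        'Payment',
        'Payment'[Person_Ref] = CurrentPersonRef &&
        'Payment'[ElementId] = CurrentPayElement
    ),
    // Treat a blank end date as far in the future (DATE keeps both branches
    // the same data type); DESC so the latest end date gets rank 1
    IF(ISBLANK('Payment'[Pay End Date]), DATE(2999, 1, 1), 'Payment'[Pay End Date]), , DESC, DENSE
)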

Any other suggestions would be appreciated!

Alexis Olson · Accepted Answer · 2018-05-11 14:58:25Z


I'm not positive that this will give better performance, but give it a try:

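Ranked = 
    VAR EndDate = Payment[AdjEndDate]
    RETURN CALCULATE(
               // Rank this row's end date among the visible rows, latest first
               RANK.EQ(EndDate, Payment[AdjEndDate], DESC),
               // Context transition filters on the current row; ALLEXCEPT then
               // keeps only the Person_Ref and ElementId filters
               ALLEXCEPT(Payment, Payment[Person_Ref], Payment[ElementId]))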

where Payment[AdjEndDate] is a column that replaces blank end dates with a far-future date, so those rows rank first:

AdjEndDate = IF(ISBLANK(Payment[Pay End Date]), DATE(2999,1,1), Payment[Pay End Date])
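One difference worth checking: the question's RANKX used DENSE ties, while RANK.EQ uses standard competition ranking (tied values share a rank and the next rank skips ahead). The two agree whenever no two rows in a (Person_Ref, ElementId) group share an AdjEndDate. The Python sketch below only illustrates the two tie rules; it is not DAX, and the partition is hypothetical, with a tie added on purpose.

```python
from datetime import date


def rank_eq_desc(value, column_values):
    """Competition ranking, descending: 1 + number of values strictly
    greater than `value` (ties share a rank; the next rank skips)."""
    return 1 + sum(1 for v in column_values if v > value)


def dense_rank_desc(value, column_values):
    """Dense ranking, descending: 1 + number of *distinct* values
    strictly greater than `value` (no gaps after ties)."""
    return 1 + len({v for v in column_values if v > value})


# Hypothetical partition (same Person_Ref and ElementId) with a tied end date;
# date(2999, 1, 1) plays the role of a blank end date after adjustment
partition = [date(2999, 1, 1), date(2018, 2, 25), date(2018, 2, 25), date(2016, 3, 1)]

print([rank_eq_desc(d, partition) for d in partition])     # [1, 2, 2, 4]
print([dense_rank_desc(d, partition) for d in partition])  # [1, 2, 2, 3]
```

Only the row after the tie differs (4 versus 3), so this matters only if tied end dates can occur within a group and the exact numbering after a tie is important.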


2 Comments

Celador

Thanks for your time, Alexis. I won't be able to try this until Monday, but I didn't want you to think I was ignoring your response! I'll update once I've given it a try.

Celador

Hi Alexis, I've given this a try, and it does appear to be much more efficient, using about half the memory of the other approaches we've tried.
