
I have a handful of tables containing payment data, and we need to identify the most recent and the oldest payment for each combination of payee (the person the payment was made to) and payment type.

We assumed the best way to achieve this was a calculated column using RANKX. However, the server runs out of memory when we recalculate the model, and the calculated columns are the cause.

Part of the reason is the size of the tables in question (approximately 10 million records).

It's probably easier to see with a mocked-up example. In the table below, the RANK column is our expected output: records are grouped by PersonRef and ElementId, then sorted by EndDate, and the record with the LATEST EndDate gets a RANK of 1. A NULL EndDate counts as the latest.

+-----------+-----------+------------+------+
| PersonRef | ElementId |  EndDate   | RANK |
+-----------+-----------+------------+------+
|    123456 |      1000 | NULL       |    1 |
|    123456 |      1000 | 01/01/2017 |    2 |
|    123456 |      6010 | 31/03/2018 |    1 |
|    123456 |      6010 | 12/01/2018 |    2 |
|    789789 |       999 | NULL       |    1 |
|    789789 |       999 | 25/02/2018 |    2 |
|    789789 |       999 | 01/03/2016 |    3 |
|    789789 |      1000 | 25/02/2018 |    1 |
|    789789 |      1000 | 01/03/2016 |    2 |
+-----------+-----------+------------+------+
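The expected RANK column is a dense rank within each (PersonRef, ElementId) group, latest EndDate first, with NULL treated as later than any real date. As a minimal sketch outside SSAS, assuming the sample rows above and the same far-future 2999-01-01 stand-in for NULL that the DAX below uses (`rank_payments`, `adjusted` and `OPEN_ENDED` are hypothetical names):

```python
from datetime import date
from itertools import groupby

# Stand-in for a NULL EndDate, so open-ended payments sort as the latest.
OPEN_ENDED = date(2999, 1, 1)


def adjusted(end_date):
    return OPEN_ENDED if end_date is None else end_date


def rank_payments(rows):
    """Return ranks aligned with `rows`, each row being
    (person_ref, element_id, end_date_or_None).

    Within each (person_ref, element_id) group the latest end date gets
    rank 1, and equal dates share a rank (dense ranking).
    """
    def group_key(i):
        return rows[i][0], rows[i][1]

    ranks = [0] * len(rows)
    # Sort row indexes by group so groupby sees each group as one run.
    for _, members in groupby(sorted(range(len(rows)), key=group_key), key=group_key):
        members = list(members)
        # Distinct adjusted dates in this group, latest first.
        dates = sorted({adjusted(rows[i][2]) for i in members}, reverse=True)
        position = {d: n for n, d in enumerate(dates, start=1)}
        for i in members:
            ranks[i] = position[adjusted(rows[i][2])]
    return ranks


# The mocked-up rows from the table (dates are DD/MM/YYYY in the table).
sample = [
    (123456, 1000, None),
    (123456, 1000, date(2017, 1, 1)),
    (123456, 6010, date(2018, 3, 31)),
    (123456, 6010, date(2018, 1, 12)),
    (789789, 999, None),
    (789789, 999, date(2018, 2, 25)),
    (789789, 999, date(2016, 3, 1)),
    (789789, 1000, date(2018, 2, 25)),
    (789789, 1000, date(2016, 3, 1)),
]

print(rank_payments(sample))  # [1, 2, 1, 2, 1, 2, 3, 1, 2]
```

Because the groups are formed once by sorting, each row is handled a small number of times rather than once for every other row in the table.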

We can't really do this in the SQL source table either. The SSAS table is partitioned, and we process only the relevant partitions every 5 minutes. If a new entry arrived with Rank 1, we would have to update that person's existing records in SQL, which would force all the SSAS partitions to be reprocessed; that is too inefficient for us.

We tried this as a calculated column, and its memory usage was very high:

VAR CurrentPersonRef = 'Payment'[Person_Ref]
VAR CurrentPayElement = 'Payment'[ElementId]
RETURN
RANKX 
(
    FILTER 
    (
        'Payment',
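        'Payment'[Person_Ref] = CurrentPersonRef &&
        'Payment'[ElementId] = CurrentPayElement
    ),
    // Blank end dates become a far-future DATE (not a text string) so they rank as the latest;
    // DESC gives the latest date rank 1, matching the expected output above
    IF(ISBLANK('Payment'[Pay End Date]), DATE(2999, 1, 1), 'Payment'[Pay End Date]), , DESC, DENSE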
)
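To see why this column is expensive, it helps to count the work the formula describes: for every row, FILTER('Payment', ...) iterates the whole Payment table to find that row's group. The Python sketch below is a rough model of that logic only, not of how the SSAS engine actually evaluates DAX (the engine may optimize parts of it). With n rows it performs n × n row visits, so the 9 sample rows cost 81 visits, while 10 million rows would describe 10^14. The names `naive_dense_rank` and `adjusted` are hypothetical.

```python
from datetime import date

# Stand-in for a NULL end date (mirrors the 2999-01-01 value in the DAX).
OPEN_ENDED = date(2999, 1, 1)


def adjusted(end_date):
    return OPEN_ENDED if end_date is None else end_date


def naive_dense_rank(rows):
    """Rough model of the calculated column: for every row, scan the whole
    table (as FILTER over 'Payment' does) to collect the row's group, then
    dense-rank its date, latest first. Returns (ranks, row_visits)."""
    ranks, row_visits = [], 0
    for person, element, end in rows:
        group_dates = set()
        for other_person, other_element, other_end in rows:  # full-table scan
            row_visits += 1
            if other_person == person and other_element == element:
                group_dates.add(adjusted(other_end))
        # Dense rank: 1 + number of distinct later dates in the group.
        ranks.append(1 + sum(1 for d in group_dates if d > adjusted(end)))
    return ranks, row_visits


sample = [
    (123456, 1000, None),
    (123456, 1000, date(2017, 1, 1)),
    (123456, 6010, date(2018, 3, 31)),
    (123456, 6010, date(2018, 1, 12)),
    (789789, 999, None),
    (789789, 999, date(2018, 2, 25)),
    (789789, 999, date(2016, 3, 1)),
    (789789, 1000, date(2018, 2, 25)),
    (789789, 1000, date(2016, 3, 1)),
]

ranks, visits = naive_dense_rank(sample)
print(ranks)   # [1, 2, 1, 2, 1, 2, 3, 1, 2]
print(visits)  # 81
```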

Any other suggestions would be appreciated!

Answer

I'm not positive that this will give better performance, but give it a try:

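Ranked = 
    VAR EndDate = Payment[AdjEndDate]
    RETURN CALCULATE(
               // Rank this row's date among the dates visible in the filter context, latest first
               RANK.EQ(EndDate, Payment[AdjEndDate], DESC),
               // Context transition turns the current row into filters; keep only Person_Ref and ElementId
               ALLEXCEPT(Payment, Payment[Person_Ref], Payment[ElementId]))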

where Payment[AdjEndDate] is a calculated column that replaces blank end dates with a far-future date, so open-ended payments rank first:

AdjEndDate = IF(ISBLANK(Payment[Pay End Date]), DATE(2999,1,1), Payment[Pay End Date])
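One difference from the question's formula: RANK.EQ uses standard competition ranking, so tied end dates share a rank and the next rank is skipped, whereas the question's RANKX uses DENSE. The two only differ when a person has two payments for the same element with the same end date, which the sample data does not contain. A small Python model of both tie rules on a hypothetical tied group (the function names are illustrative, not DAX):

```python
from datetime import date


def competition_rank(values):
    """Ties share a rank and the next rank is skipped (RANK.EQ behaviour),
    highest value first."""
    return [1 + sum(1 for v in values if v > x) for x in values]


def dense_rank(values):
    """Ties share a rank and the next rank is not skipped (RANKX ... DENSE),
    highest value first."""
    distinct = set(values)
    return [1 + sum(1 for v in distinct if v > x) for x in values]


# One hypothetical (PersonRef, ElementId) group with a tie on 25/02/2018;
# 2999-01-01 is the AdjEndDate stand-in for a blank end date.
adj_end_dates = [
    date(2999, 1, 1),
    date(2018, 2, 25),
    date(2018, 2, 25),
    date(2016, 3, 1),
]

print(competition_rank(adj_end_dates))  # [1, 2, 2, 4]
print(dense_rank(adj_end_dates))        # [1, 2, 2, 3]
```

If only rank 1 (the most recent payment) and the oldest payment matter, either rule identifies the same rows; the skipped numbers only change the value shown for the oldest payment when ties exist.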

Comments

Thanks for your time, Alexis. I won't be able to try this until Monday, but I didn't want you to think I was ignoring your response! I'll update once I've given it a try.
Hi Alexis, I've given this a try and it does appear to be much more efficient: it uses about half the memory of the other approaches we've tried.
