This is a kind of CDC (Change Data Capture) scenario: I am comparing new data (in tblNewData) with old data (in tblOldData) and logging the changes into a log table (tblExpectedDataLog), including the name of each changed column, the new value, and the old value in that column. The script that creates the sample tables and inserts the data is below. The challenge is that the actual new and old tables have more than 250 columns (I have shortened them here for clarity), and I want to avoid writing 250 separate conditions to check whether each column's value has changed.

Create Table tblNewData
(
    TransactionID   string,
    ReportDate  date,
    Comments    string,
    Premium     int
);

Insert Into tblNewData
Select '1116_0025' As TransactionID,    '2025-07-30' As ReportDate, 'ghi' As Comments,  105 As Premium
Union
Select '2540_0038' As TransactionID,    '2025-07-30' As ReportDate, 'jkl' As Comments,  100 As Premium
Union
Select '3459_0001' As TransactionID,    '2025-07-30' As ReportDate, 'pqr' As Comments,  80 As Premium
Union
Select '4870_0041' As TransactionID,    '2025-08-01' As ReportDate, 'bbbb' As Comments, 80 As Premium;


Create Table tblOldData
(
    TransactionID   string,
    ReportDate  date,
    Comments    string,
    Premium     int,
    ActiveFlag  boolean
);

Insert Into tblOldData
Select '1116_0025' As TransactionID,    '2025-07-30' As ReportDate, 'def' As Comments,  95 As Premium,  1 As ActiveFlag
Union
Select '1116_0025' As TransactionID,    '2025-07-30' As ReportDate, 'abc' As Comments,  90 As Premium,  0 As ActiveFlag
Union
Select '2540_0038' As TransactionID,    '2025-07-30' As ReportDate, 'jkl' As Comments,  100 As Premium, 1 As ActiveFlag
Union
Select '3459_0001' As TransactionID,    '2025-07-30' As ReportDate, 'mno' As Comments,  70 As Premium,  1 As ActiveFlag
Union
Select '4870_0041' As TransactionID,    '2025-07-01' As ReportDate, 'bbbb' As Comments, 80 As Premium,  1 As ActiveFlag;

Create Table tblExpectedDataLog
(
    TransactionID                   string,
    Column_name_of_changed_value    string,
    PreviousValue                   string,
    NewValue                        string,
    LogDateTime                     timestamp
);

So I thought of getting the column names from information_schema.columns and comparing them dynamically. To do this, I tried my usual approach of using STUFF() with FOR XML PATH('') to concatenate the 250 columns into a query and generate the SQL dynamically. Unfortunately, Azure Databricks does not support FOR XML PATH().

So I am here to ask: is there any possible solution?

  • String_agg?.... Commented Aug 1 at 5:08
  • Thanks, good idea, I'll give it a try. Commented Aug 2 at 4:37

1 Answer

It is possible to generate the SQL described in your question dynamically using the information_schema.columns system view. The main thing to take care of is casting all the values to the same data type (::text in my code). That way, the generated code can be used to INSERT (or MERGE INTO) a row for every column whose value differs between the old and new tables.
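As a rough illustration of that idea outside SQL, here is a minimal Python sketch (Python is my assumption, chosen because Databricks notebooks can run it; it is not part of the original answer). It builds one SELECT branch per column and glues the branches together with UNION, which is the job STUFF() with FOR XML PATH('') or STRING_AGG does in SQL. The function name build_change_log_sql, the aliases n and o, and the hard-coded column list are hypothetical. CAST(... AS STRING) is used on the assumption that the target is Databricks SQL, whereas the query below uses ::text.

```python
def build_change_log_sql(columns, key="TransactionID",
                         new_table="tblNewData", old_table="tblOldData"):
    """Return one SQL statement listing every changed column value as a string.

    Hypothetical helper: names and defaults are illustrative only.
    """
    branches = []
    for col in columns:
        # One branch per column: same key, different value in that column.
        # Both values are cast to STRING so all branches share one row shape.
        branches.append(
            f"SELECT n.{key}, '{col}' AS Column_name_of_changed_value,\n"
            f"       CAST(o.{col} AS STRING) AS PreviousValue,\n"
            f"       CAST(n.{col} AS STRING) AS NewValue\n"
            f"FROM {new_table} n\n"
            f"JOIN {old_table} o\n"
            f"  ON o.{key} = n.{key} AND o.{col} != n.{col}"
        )
    # str.join concatenates the branches, as STRING_AGG does in SQL.
    return "\nUNION\n".join(branches)


# Assumed column list; in practice it would be read from information_schema.columns.
sql_cmd = build_change_log_sql(["ReportDate", "Comments", "Premium"])
print(sql_cmd)
```

The column list would normally be read from information_schema.columns rather than typed by hand, and the outer SELECT that adds the log timestamp is left out to keep the sketch short. The pure-SQL generator follows.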

SELECT 'SELECT transactionID, Column_name_of_changed_value, PreviousValue, NewValue, now() as logdatetime FROM (' ||  Chr(10) || 
  STRING_AGG( 'Select   g.transactionID, ' || '''' || column_name || ''''  || ' as Column_name_of_changed_value, ' || Chr(10) ||
'g.old_' || column_name || '::text as PreviousValue, ' || 'g.new_' || column_name || '::text as NewValue ' || Chr(10) || 
'From  (Select tnew.transactionid, tnew.' || column_name || ' as new_' || column_name || ', ' || Chr(10) || 
'told.' || column_name || ' as old_' || column_name || Chr(10) ||   
'From tblNewData tnew 
Left Join tblOldData told ON( told.TransactionID = tnew.TransactionID And told.' || column_name || ' != tnew.' || column_name || ') ' || Chr(10) || 
'Where told.transactionid Is Not Null ) g ', 
  Chr(10) || 'UNION ' || Chr(10) ORDER BY ordinal_position) || ') ORDER BY transactionid' as sql_cmd
FROM information_schema.columns
Where table_schema = 'public' And  table_name = 'tblnewdata' And ordinal_position > 1;

Result (sql_cmd):

SELECT transactionID, Column_name_of_changed_value, PreviousValue, NewValue, now() as logdatetime FROM (
Select g.transactionID, 'reportdate' as Column_name_of_changed_value,
g.old_reportdate::text as PreviousValue, g.new_reportdate::text as NewValue
From (Select tnew.transactionid, tnew.reportdate as new_reportdate,
told.reportdate as old_reportdate
From tblNewData tnew
Left Join tblOldData told ON( told.TransactionID = tnew.TransactionID And told.reportdate != tnew.reportdate)
Where told.transactionid Is Not Null ) g
UNION
Select g.transactionID, 'comments' as Column_name_of_changed_value,
g.old_comments::text as PreviousValue, g.new_comments::text as NewValue
From (Select tnew.transactionid, tnew.comments as new_comments,
told.comments as old_comments
From tblNewData tnew
Left Join tblOldData told ON( told.TransactionID = tnew.TransactionID And told.comments != tnew.comments)
Where told.transactionid Is Not Null ) g
UNION
Select g.transactionID, 'premium' as Column_name_of_changed_value,
g.old_premium::text as PreviousValue, g.new_premium::text as NewValue
From (Select tnew.transactionid, tnew.premium as new_premium,
told.premium as old_premium
From tblNewData tnew
Left Join tblOldData told ON( told.TransactionID = tnew.TransactionID And told.premium != tnew.premium)
Where told.transactionid Is Not Null ) g ) ORDER BY transactionid

The code generated above can be used to insert the data into your tblExpectedDataLog table:

INSERT INTO tblExpectedDataLog 

SELECT transactionID, Column_name_of_changed_value, PreviousValue, NewValue, now() as logdatetime FROM (
Select   g.transactionID, 'reportdate' as Column_name_of_changed_value, 
g.old_reportdate::text as PreviousValue, g.new_reportdate::text as NewValue 
From  (Select tnew.transactionid, tnew.reportdate as new_reportdate, 
told.reportdate as old_reportdate
From tblNewData tnew 
Left Join tblOldData told ON( told.TransactionID = tnew.TransactionID And told.reportdate != tnew.reportdate) 
Where told.transactionid Is Not Null ) g 
UNION 
Select   g.transactionID, 'comments' as Column_name_of_changed_value, 
g.old_comments::text as PreviousValue, g.new_comments::text as NewValue 
From  (Select tnew.transactionid, tnew.comments as new_comments, 
told.comments as old_comments
From tblNewData tnew 
Left Join tblOldData told ON( told.TransactionID = tnew.TransactionID And told.comments != tnew.comments) 
Where told.transactionid Is Not Null ) g 
UNION 
Select   g.transactionID, 'premium' as Column_name_of_changed_value, 
g.old_premium::text as PreviousValue, g.new_premium::text as NewValue 
From  (Select tnew.transactionid, tnew.premium as new_premium, 
told.premium as old_premium
From tblNewData tnew 
Left Join tblOldData told ON( told.TransactionID = tnew.TransactionID And told.premium != tnew.premium) 
Where told.transactionid Is Not Null ) g ) ORDER BY transactionid;
-- Check the table content
SELECT * FROM tblExpectedDataLog;

Result:

transactionid  column_name_of_changed_value  previousvalue  newvalue    logdatetime
1116_0025      comments                      abc            ghi         2025-08-03 00:09:10.105595
1116_0025      comments                      def            ghi         2025-08-03 00:09:10.105595
1116_0025      premium                       90             105         2025-08-03 00:09:10.105595
1116_0025      premium                       95             105         2025-08-03 00:09:10.105595
3459_0001      comments                      mno            pqr         2025-08-03 00:09:10.105595
3459_0001      premium                       70             80          2025-08-03 00:09:10.105595
4870_0041      reportdate                    2025-07-01     2025-08-01  2025-08-03 00:09:10.105595
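The rows above can be cross-checked without a database. The following is a minimal Python sketch, not part of the original answer, that applies the same rule to the sample data from the question: match on TransactionID, then report every column whose value differs. The function name diff_rows and the dictionary layout are hypothetical. ActiveFlag is left out of the old rows because the generated query only walks the columns of tblNewData, so every old row is compared regardless of its flag.

```python
# Sample rows copied from the question (ActiveFlag omitted, see note above).
new_rows = [
    {"TransactionID": "1116_0025", "ReportDate": "2025-07-30", "Comments": "ghi", "Premium": 105},
    {"TransactionID": "2540_0038", "ReportDate": "2025-07-30", "Comments": "jkl", "Premium": 100},
    {"TransactionID": "3459_0001", "ReportDate": "2025-07-30", "Comments": "pqr", "Premium": 80},
    {"TransactionID": "4870_0041", "ReportDate": "2025-08-01", "Comments": "bbbb", "Premium": 80},
]
old_rows = [
    {"TransactionID": "1116_0025", "ReportDate": "2025-07-30", "Comments": "def", "Premium": 95},
    {"TransactionID": "1116_0025", "ReportDate": "2025-07-30", "Comments": "abc", "Premium": 90},
    {"TransactionID": "2540_0038", "ReportDate": "2025-07-30", "Comments": "jkl", "Premium": 100},
    {"TransactionID": "3459_0001", "ReportDate": "2025-07-30", "Comments": "mno", "Premium": 70},
    {"TransactionID": "4870_0041", "ReportDate": "2025-07-01", "Comments": "bbbb", "Premium": 80},
]


def diff_rows(new_rows, old_rows, key="TransactionID"):
    """Return sorted (key, column, old value, new value) tuples for changed values."""
    changes = set()  # a set removes duplicates, as UNION does in the SQL
    for new in new_rows:
        for old in old_rows:
            if old[key] != new[key]:
                continue  # only compare rows with the same TransactionID
            for col in new:
                if col != key and old[col] != new[col]:
                    # Cast both sides to str so every log row has one shape.
                    changes.add((new[key], col, str(old[col]), str(new[col])))
    return sorted(changes)


for change in diff_rows(new_rows, old_rows):
    print(change)
```

This should yield the same seven changes as the table above (with the column names in their original capitalization), and nothing for 2540_0038, whose values are identical in both tables.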

