It's a CDC (Change Data Capture) scenario in which I am trying to compare new data (in tblNewData) with old data (in tblOldData) and log the changes into a log table (tblExpectedDataLog), including the name of each changed column, its new value, and its old value. The script to create the sample tables and insert the data is below. The challenge is that the actual new and old tables have more than 250 columns (I have shortened them here to make the problem easier to follow), and I want to avoid writing 250 separate conditions to check whether each column's value has changed.
Create Table tblNewData
(
TransactionID string,
ReportDate date,
Comments string,
Premium int
);
Insert Into tblNewData
Select '1116_0025' As TransactionID, '2025-07-30' As ReportDate, 'ghi' As Comments, 105 As Premium
Union
Select '2540_0038' As TransactionID, '2025-07-30' As ReportDate, 'jkl' As Comments, 100 As Premium
Union
Select '3459_0001' As TransactionID, '2025-07-30' As ReportDate, 'pqr' As Comments, 80 As Premium
Union
Select '4870_0041' As TransactionID, '2025-08-01' As ReportDate, 'bbbb' As Comments, 80 As Premium;
Create Table tblOldData
(
TransactionID string,
ReportDate date,
Comments string,
Premium int,
ActiveFlag boolean
);
Insert Into tblOldData
Select '1116_0025' As TransactionID, '2025-07-30' As ReportDate, 'def' As Comments, 95 As Premium, true As ActiveFlag
Union
Select '1116_0025' As TransactionID, '2025-07-30' As ReportDate, 'abc' As Comments, 90 As Premium, false As ActiveFlag
Union
Select '2540_0038' As TransactionID, '2025-07-30' As ReportDate, 'jkl' As Comments, 100 As Premium, true As ActiveFlag
Union
Select '3459_0001' As TransactionID, '2025-07-30' As ReportDate, 'mno' As Comments, 70 As Premium, true As ActiveFlag
Union
Select '4870_0041' As TransactionID, '2025-07-01' As ReportDate, 'bbbb' As Comments, 80 As Premium, true As ActiveFlag;
Create Table tblExpectedDataLog
(
TransactionID string,
Column_name_of_changed_value string,
PreviousValue string,
NewValue string,
LogDateTime timestamp
);
So I thought of getting the column names from information_schema.columns and comparing them dynamically. To do this, I tried my usual approach of using STUFF() with FOR XML PATH('') to build a query covering all 250 columns and generate the SQL dynamically. Unfortunately, Azure Databricks does not support FOR XML PATH('').
Is there any way to achieve this in Azure Databricks?
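One direction that might work, as a rough sketch and not a tested solution: build the comparison statement as a string from the list of column names, and unpivot each new/old column pair into its own row with Spark SQL's stack() generator. The helper name build_change_log_sql and its parameters are hypothetical. The sketch also assumes that rows are matched on TransactionID and that only old rows with ActiveFlag set should be compared, neither of which is confirmed above.

```python
def build_change_log_sql(columns, key="TransactionID",
                         new_table="tblNewData", old_table="tblOldData",
                         log_table="tblExpectedDataLog",
                         old_filter="o.ActiveFlag"):
    """Build one INSERT statement that logs a row per changed column.

    Hypothetical helper: the join key and the ActiveFlag filter are assumptions.
    """
    # Compare every column except the join key.
    compare_cols = [c for c in columns if c != key]
    # Each column contributes a (name, new value, old value) triple to stack().
    # Values are cast to STRING so all triples share one type.
    triples = ",\n             ".join(
        f"'{c}', CAST(n.`{c}` AS STRING), CAST(o.`{c}` AS STRING)"
        for c in compare_cols
    )
    # <=> is null-safe equality, so NULL vs NULL is not reported as a change.
    return f"""INSERT INTO {log_table}
SELECT {key}, ColumnName, PreviousValue, NewValue, current_timestamp()
FROM (
  SELECT n.{key},
         stack({len(compare_cols)},
             {triples}) AS (ColumnName, NewValue, PreviousValue)
  FROM {new_table} n
  JOIN {old_table} o ON n.{key} = o.{key}
  WHERE {old_filter}
) AS unpivoted
WHERE NOT (NewValue <=> PreviousValue)"""


# Column list hard-coded here for illustration only.
sql = build_change_log_sql(["TransactionID", "ReportDate", "Comments", "Premium"])
print(sql)
```

In a Databricks notebook the column list could presumably come from spark.table("tblNewData").columns (or from information_schema.columns) instead of being hard-coded, and the generated statement could be run with spark.sql(sql). I have not verified this against the real 250-column tables.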