I want to release a simple jq-based tool that transforms some common Office 365 Unified Audit Log events into a tabular report, but I am having trouble with the way certain key arrays are nested. In particular, Folders[] contains sets of Ids and Paths plus FolderItems[], which in turn contains rows of message IDs and sizes. I can't figure out how to keep the related values from these arrays in sync (collated). Instead I get every combination of every value, as though I'm unintentionally iterating through each array independently.
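To illustrate the symptom, here is a minimal reproduction. It is a sketch under assumptions: the two-folder `event` below is a hypothetical, cut-down record, not real audit data. Each `.Folders[]` in the filter is its own iterator, so jq emits one result per combination of their outputs.

```shell
# Hypothetical two-folder event, reduced to the fields that matter here.
event='{"Folders":[{"Id":"A","Path":"Outbox"},{"Id":"B","Path":"Inbox"}]}'

# Two independent .Folders[] iterators in one object construction:
# jq emits one object per combination, so 2 folders yield 4 objects.
printf '%s\n' "$event" |
  jq -c '{FolderId: .Folders[].Id, FolderPath: .Folders[].Path}'

# Counting the results makes the blow-up explicit.
combos=$(printf '%s\n' "$event" |
  jq '[{FolderId: .Folders[].Id, FolderPath: .Folders[].Path}] | length')
echo "$combos"
```

With FolderItems[] nested one level deeper, the same effect multiplies again.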
Here's some sample data:
{"CreationTime":"2024-02-06T12:13:14","Id":"abcdabcd-1234-1234-5555-888888888888","Operation":"MailItemsAccessed","ResultStatus":"Succeeded","UserId":"[email protected]","ClientIPAddress":"5.5.5.5","Folders":[{"FolderItems":[{"InternetMessageId":"<[email protected]>","SizeInBytes":12345},{"InternetMessageId":"<[email protected]>","SizeInBytes":11122},{"InternetMessageId":"<[email protected]>","SizeInBytes":88888}],"Id":"EEEEEEEE","Path":"\\Outbox"},{"FolderItems":[{"InternetMessageId":"<[email protected]>","SizeInBytes":44444},{"InternetMessageId":"<[email protected]>","SizeInBytes":100000},{"InternetMessageId":"<[email protected]>","SizeInBytes":109000},{"InternetMessageId":"<[email protected]>","SizeInBytes":22000},{"InternetMessageId":"<[email protected]>","SizeInBytes":333333}],"Id":"FFFFFFFFFFFFFFFFFAB","Path":"\\Inbox"}]}
{"CreationTime":"2024-02-06T20:00:00","Id":"abcdabcd-1234-1234-6666-9999999999999","Operation":"MailItemsAccessed","ResultStatus":"Succeeded","UserId":"[email protected]","ClientIPAddress":"7.7.7.7","Folders":{"FolderItems":[{"InternetMessageId":"<[email protected]>","SizeInBytes":77777},{"InternetMessageId":"<[email protected]>","SizeInBytes":888888},{"InternetMessageId":"<[email protected]>","SizeInBytes":99999}],"Id":"12341234","Path":"\\Temp"}}
Desired output:
| CreationTime | Id | UserId | ClientIPAddress | FolderId | FolderPath | InternetMessageId | SizeInBytes |
|---|---|---|---|---|---|---|---|
| 2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | [email protected] | 5.5.5.5 | EEEEEEEE | \Outbox | [email protected] | 12345 |
| 2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | [email protected] | 5.5.5.5 | EEEEEEEE | \Outbox | [email protected] | 11122 |
| 2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | [email protected] | 5.5.5.5 | EEEEEEEE | \Outbox | [email protected] | 88888 |
| 2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | [email protected] | 5.5.5.5 | FFFFFFFFFFFFFFFFFAB | \Inbox | [email protected] | 44444 |
| 2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | [email protected] | 5.5.5.5 | FFFFFFFFFFFFFFFFFAB | \Inbox | [email protected] | 100000 |
| 2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | [email protected] | 5.5.5.5 | FFFFFFFFFFFFFFFFFAB | \Inbox | [email protected] | 109000 |
| 2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | [email protected] | 5.5.5.5 | FFFFFFFFFFFFFFFFFAB | \Inbox | [email protected] | 22000 |
| 2024-02-06T12:13:14 | abcdabcd-1234-1234-5555-888888888888 | [email protected] | 5.5.5.5 | FFFFFFFFFFFFFFFFFAB | \Inbox | [email protected] | 333333 |
| 2024-02-06T20:00:00 | abcdabcd-1234-1234-6666-9999999999999 | [email protected] | 7.7.7.7 | 12341234 | \Temp | [email protected] | 77777 |
| 2024-02-06T20:00:00 | abcdabcd-1234-1234-6666-9999999999999 | [email protected] | 7.7.7.7 | 12341234 | \Temp | [email protected] | 888888 |
| 2024-02-06T20:00:00 | abcdabcd-1234-1234-6666-9999999999999 | [email protected] | 7.7.7.7 | 12341234 | \Temp | [email protected] | 99999 |
Note that the .Folders element sometimes arrives as a JSON-encoded string, which I was able to load conditionally with fromjson. For example:
[...]"Folders": "[{\"FolderItems\":[{\"InternetMessageId\":\"<[email protected]>\",\"SizeInBytes\":12345},[...]
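Since .Folders can arrive in several shapes (a JSON-encoded string as above, an array as in the first sample event, or a single object as in the second), it may help to bring it into one shape before doing anything else. The following is a minimal sketch, not a tested production filter. The `normalize` variable and the three tiny input records are hypothetical. It assumes that an empty string or a missing value should become an empty array, and note that it also adds an empty Folders key to events that have none.

```shell
# Hypothetical helper filter: bring .Folders into one shape (an array of folder objects).
normalize='
  .Folders |= (
    if type == "string" then (if . == "" then [] else fromjson end) else . end
    | if type == "object" then [.]      # single folder object: wrap it
      elif type == "array" then .       # already the expected shape
      else [] end                       # null or anything else: no folders
  )'

# Three hypothetical inputs: encoded string, single object, empty string.
out=$(printf '%s\n' \
  '{"Folders":"[{\"Id\":\"A\"}]"}' \
  '{"Folders":{"Id":"B"}}' \
  '{"Folders":""}' | jq -c "$normalize")
printf '%s\n' "$out"
```

After this step every event carries an array in .Folders, so later stages need only one code path.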
Code so far:
cat | jq '
if has("Folders") then
if(.Folders | type=="string") and .Folders != "" then .Folders |= fromjson end |
if(.Folders | type=="string") and .Folders == "" then .Folders = null end
end | .' | # works up to here at least
jq '
if has("Item") then .Item |= (if type=="string" and .!="" then fromjson else {} end) else .Item|={} end |
if has("Item") then
if .Item | has("Id") then .ItemId = .Item.Id else .ItemId={} end |
if .Item | has("ParentFolder") then
.ItemParentFolderId=.Item.ParentFolder.Id? |
.ItemParentFolderPath=.Item.ParentFolder.Path? |
.ItemParentFolderName=.Item.ParentFolder.Name?
end
end | . ' | # works up to here at least
jq '
if has("Folders") then
if (.Folders | select(type=="array")) then
.Folders[].Id? |
.FoldersPath=.Folders[].Path? |
.FoldersFolderItems=.Folders[].FolderItems?
else . end
end
' |
jq -r '. | (.TimeGenerated // .CreationTime) as $EventTime |
.ClientIP = if .ClientIP == "" then null else .ClientIP end |
.ClientIP_ = if .ClientIP_ == "" then null else .ClientIP_ end |
.Client_IPAddress = if .Client_IPAddress == "" then null else .Client_IPAddress end |
.ClientIPAddress = if .ClientIPAddress == "" then null else .ClientIPAddress end |
.ActorIpAddress = if .ActorIpAddress == "" then null else .ActorIpAddress end |
(.ClientIP // .ClientIP_ // .Client_IPAddress // .ClientIPAddress // .ActorIpAddress) as $IPAddress |
(.UserId // .UserId_) as $LogonUser |
.FFIIMI as $InternetMessageId |
.FFISIB as $SizeInBytes |
{EventTime: $EventTime, IPAddress: $IPAddress, LogonUser: $LogonUser, InternetMessageId: $InternetMessageId, SizeInBytes: $SizeInBytes} + . |
[.Id, .EventTime, .IPAddress, .LogonUser, .MailboxOwnerUPN, .Operation, .InternetMessageId, .SizeInBytes] | @csv'
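One possible way to keep the folder and item values paired, offered as a sketch under assumptions rather than a drop-in fix: bind each folder to a variable, then bind each of that folder's items to a second variable, and only then build the row. Because the item iterator runs inside the folder it belongs to, no cross-folder combinations are produced. The `flatten` variable and the shortened `event` are hypothetical. The column order follows the desired output above, and the sketch assumes .Folders is already parsed (an array or a single object).

```shell
# Sketch: bind each folder and each item to a variable so related values stay paired.
flatten='
  (.Folders | if type == "object" then [.] elif type == "array" then . else [] end) as $folders
  | $folders[] as $f                 # one folder at a time
  | ($f.FolderItems // [])[] as $i   # one item of that folder at a time
  | [ .CreationTime, .Id, .UserId, .ClientIPAddress,
      $f.Id, $f.Path, $i.InternetMessageId, $i.SizeInBytes ]
  | @csv'

# Hypothetical shortened event: two folders with two items and one item.
event='{"CreationTime":"2024-02-06T12:13:14","Id":"E1","UserId":"u","ClientIPAddress":"5.5.5.5","Folders":[{"Id":"F1","Path":"Outbox","FolderItems":[{"InternetMessageId":"<m1>","SizeInBytes":1},{"InternetMessageId":"<m2>","SizeInBytes":2}]},{"Id":"F2","Path":"Inbox","FolderItems":[{"InternetMessageId":"<m3>","SizeInBytes":3}]}]}'

rows=$(printf '%s\n' "$event" | jq -r "$flatten")
printf '%s\n' "$rows"
```

Inside the `as $f | ...` body, `.` still refers to the whole event, which is why the top-level fields can be read next to `$f` and `$i`. An event with no folders or no items produces no rows in this sketch.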