2

I want to release a simple tool based on jq to transform some common Office 365 Unified Audit Log events into a tabular report format, but having challenges with the way certain key arrays are nested. In particular, when i get down to Folders[] that contain sets of Ids, Paths, and FolderItems[] that contain rows of message IDs and sizes, I can't figure out a way to make the related values from the arrays stay in sync / collate - instead I am getting massive combinations of every value as though I'm unintentionally iterating through them.

Here's some sample data:

{"CreationTime":"2024-02-06T12:13:14","Id":"abcdabcd-1234-1234-5555-888888888888","Operation":"MailItemsAccessed","ResultStatus":"Succeeded","UserId":"[email protected]","ClientIPAddress":"5.5.5.5","Folders":[{"FolderItems":[{"InternetMessageId":"<[email protected]>","SizeInBytes":12345},{"InternetMessageId":"<[email protected]>","SizeInBytes":11122},{"InternetMessageId":"<[email protected]>","SizeInBytes":88888}],"Id":"EEEEEEEE","Path":"\\Outbox"},{"FolderItems":[{"InternetMessageId":"<[email protected]>","SizeInBytes":44444},{"InternetMessageId":"<[email protected]>","SizeInBytes":100000},{"InternetMessageId":"<[email protected]>","SizeInBytes":109000},{"InternetMessageId":"<[email protected]>","SizeInBytes":22000},{"InternetMessageId":"<[email protected]>","SizeInBytes":333333}],"Id":"FFFFFFFFFFFFFFFFFAB","Path":"\\Inbox"}]}
{"CreationTime":"2024-02-06T20:00:00","Id":"abcdabcd-1234-1234-6666-9999999999999","Operation":"MailItemsAccessed","ResultStatus":"Succeeded","UserId":"[email protected]","ClientIPAddress":"7.7.7.7","Folders":{"FolderItems":[{"InternetMessageId":"<[email protected]>","SizeInBytes":77777},{"InternetMessageId":"<[email protected]>","SizeInBytes":888888},{"InternetMessageId":"<[email protected]>","SizeInBytes":99999}],"Id":"12341234","Path":"\\Temp"}}

Desired output:

CreationTime Id UserId ClientIPAddress FolderId FolderPath InternetMessageId SizeInBytes
2024-02-06T12:13:14 abcdabcd-1234-1234-5555-888888888888 [email protected] 5.5.5.5 EEEEEEEE \Outbox [email protected] 12345
2024-02-06T12:13:14 abcdabcd-1234-1234-5555-888888888888 [email protected] 5.5.5.5 EEEEEEEE \Outbox [email protected] 11122
2024-02-06T12:13:14 abcdabcd-1234-1234-5555-888888888888 [email protected] 5.5.5.5 EEEEEEEE \Outbox [email protected] 88888
2024-02-06T12:13:14 abcdabcd-1234-1234-5555-888888888888 [email protected] 5.5.5.5 FFFFFFFFFFFFFFFFFAB \Inbox [email protected] 44444
2024-02-06T12:13:14 abcdabcd-1234-1234-5555-888888888888 [email protected] 5.5.5.5 FFFFFFFFFFFFFFFFFAB \Inbox [email protected] 100000
2024-02-06T12:13:14 abcdabcd-1234-1234-5555-888888888888 [email protected] 5.5.5.5 FFFFFFFFFFFFFFFFFAB \Inbox [email protected] 109000
2024-02-06T12:13:14 abcdabcd-1234-1234-5555-888888888888 [email protected] 5.5.5.5 FFFFFFFFFFFFFFFFFAB \Inbox [email protected] 22000
2024-02-06T12:13:14 abcdabcd-1234-1234-5555-888888888888 [email protected] 5.5.5.5 FFFFFFFFFFFFFFFFFAB \Inbox [email protected] 333333
2024-02-06T20:00:00 12341234 [email protected] 7.7.7.7 12341234 \Temp [email protected] 77777
2024-02-06T20:00:00 12341234 [email protected] 7.7.7.7 12341234 \Temp [email protected] 888888
2024-02-06T20:00:00 12341234 [email protected] 7.7.7.7 12341234 \Temp [email protected] 99999

Note that the .Folders element can sometimes come in string format but that I was able to easily conditionally load using fromjson. For example:

[...]"Folders": "[{\"FolderItems\":[{\"InternetMessageId\":\""Fo<[email protected]>\",\"SizeInBytes\":12345},[...]

Code so far:

cat | jq '
    if has("Folders") then
        if(.Folders | type=="string") and .Folders != "" then .Folders |= fromjson  end |
        if(.Folders | type=="string") and .Folders == "" then .Folders = null end
    end | .' |     # works up to here at least
    jq '
if has("Item") then .Item |= (if type=="string" and .!="" then fromjson else {} end) else .Item|={}  end |
    if has("Item") then
            if .Item | has("Id") then .ItemId = .Item.Id else .ItemId={} end |
            if .Item | has("ParentFolder") then
                .ItemParentFolderId=.Item.ParentFolder.Id? |
                    .ItemParentFolderPath=.Item.ParentFolder.Path? |
                    .ItemParentFolderName=.Item.ParentFolder.Name?
            end
        end | . ' | cat # works up to here at least
    jq '
    if has("Folders") then
        if (.Folders | select(type=="array")) then
            .Folders[].Id? |
            .FoldersPath=.Folders[].Path? |
            .FoldersFolderItems=.Folders[].FolderItems?
        else . end
    end
    ' |
jq -r '. | (.TimeGenerated // .CreationTime) as $EventTime |
.ClientIP = if .ClientIP == "" then null else .ClientIP end |
.ClientIP_ = if .ClientIP_ == "" then null else .ClientIP_ end |
.Client_IPAddress = if .Client_IPAddress == "" then null else .Client_IPAddress end |
.ClientIPAddress = if .ClientIPAddress == "" then null else .ClientIPAddress end |
.ActorIpAddress = if .ActorIpAddress == "" then null else .ActorIpAddress end |
(.ClientIP // .ClientIP_ // .Client_IPAddress // .ClientIPAddress // .ActorIpAddress) as $IPAddress |
(.UserId // .UserId_) as $LogonUser |
.FFIIMI as $InternetMessageId |
.FFISIB as $SizeInBytes |
{EventTime: $EventTime, IPAddress: $IPAddress, LogonUser: $LogonUser, InternetMessageId: $InternetMessageId, SizeInBytes: $SizeInBytes} + . |
[.Id, .EventTime, .IPAddress, .LogonUser, .MailboxOwnerUPN, .Operation, .InternetMessageId, .SizeInBytes] | @csv'
0

1 Answer 1

0

I'm starting with the JSON that you have provided as sample data. It is unclear whether this JSON is preprocessed in some way or not.

Since the top-level Folders array seems to not be an array if it contains a single item, we first need to turn it into an array if it isn't one already. In jq, we can do that with

.Folders |= (if type == "array" then . else [.] end)

The general idea of the rest of the transformation is to duplicate the upper-level data, i.e. some of the key+value pairs from the top-most level and the Id and Path key from each Folders element, into the lowest level, the FolderItems elements. We can then convert each FolderItems element into a CSV record.

To avoid keys with duplicate names, we also need to rename the Folders element's Id key to FolderId (and rename Path from the same level to FolderPath, for consistency).

The data from the top-most level that we want can be picked out and transferred down in an internal variable using

pick(.CreationTime, .Id, .UserId, .ClientIPAddress) as $record

This will create $record as

{
  "CreationTime": "2024-02-06T12:13:14",
  "Id": "abcdabcd-1234-1234-5555-888888888888",
  "UserId": "[email protected]",
  "ClientIPAddress": "5.5.5.5"
}

... for the first JSON object. Note that we get the keys in the order we want in the final CSV output.

We can then extract only the Folders elements with .Folders[] and pick out the Id and Path for each. Since we want to rename these, we can't use pick() again, so we need to be a bit more hands-on:

.Folders[] | { FolderId: .Id, FolderPath: .Path } as $folder

We may then use .FolderItems[] to get a set of FolderItems elements, which we can add $record and $folder in front of:

.FolderItems[] | $record + $folder + .

As a single jq expression:

.Folders |= (if type == "array" then . else [.] end) |
pick(.CreationTime, .Id, .UserId, .ClientIPAddress) as $record |
.Folders[] | { FolderId: .Id, FolderPath: .Path } as $folder |
.FolderItems[] | $record + $folder + .

The result, given the data in the question:

{
  "CreationTime": "2024-02-06T12:13:14",
  "Id": "abcdabcd-1234-1234-5555-888888888888",
  "UserId": "[email protected]",
  "ClientIPAddress": "5.5.5.5",
  "FolderId": "EEEEEEEE",
  "FolderPath": "\\Outbox",
  "InternetMessageId": "<[email protected]>",
  "SizeInBytes": 12345
}
{
  "CreationTime": "2024-02-06T12:13:14",
  "Id": "abcdabcd-1234-1234-5555-888888888888",
  "UserId": "[email protected]",
  "ClientIPAddress": "5.5.5.5",
  "FolderId": "EEEEEEEE",
  "FolderPath": "\\Outbox",
  "InternetMessageId": "<[email protected]>",
  "SizeInBytes": 11122
}

(etc.)

I would personally convert this set of JSON objects to CSV using Miller:

$ jq '.Folders |= (if type == "array" then . else [.] end) | pick(.CreationTime, .Id, .UserId, .ClientIPAddress) as $record | .Folders[] | { FolderId: .Id, FolderPath: .Path } as $folder | .FolderItems[] | $record + $folder + .' file | mlr --j2c cat
CreationTime,Id,UserId,ClientIPAddress,FolderId,FolderPath,InternetMessageId,SizeInBytes
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,EEEEEEEE,\Outbox,<[email protected]>,12345
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,EEEEEEEE,\Outbox,<[email protected]>,11122
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,EEEEEEEE,\Outbox,<[email protected]>,88888
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,FFFFFFFFFFFFFFFFFAB,\Inbox,<[email protected]>,44444
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,FFFFFFFFFFFFFFFFFAB,\Inbox,<[email protected]>,100000
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,FFFFFFFFFFFFFFFFFAB,\Inbox,<[email protected]>,109000
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,FFFFFFFFFFFFFFFFFAB,\Inbox,<[email protected]>,22000
2024-02-06T12:13:14,abcdabcd-1234-1234-5555-888888888888,[email protected],5.5.5.5,FFFFFFFFFFFFFFFFFAB,\Inbox,<[email protected]>,333333
2024-02-06T20:00:00,abcdabcd-1234-1234-6666-9999999999999,[email protected],7.7.7.7,12341234,\Temp,<[email protected]>,77777
2024-02-06T20:00:00,abcdabcd-1234-1234-6666-9999999999999,[email protected],7.7.7.7,12341234,\Temp,<[email protected]>,888888
2024-02-06T20:00:00,abcdabcd-1234-1234-6666-9999999999999,[email protected],7.7.7.7,12341234,\Temp,<[email protected]>,99999

To do the conversion to CSV using jq in place of Miller, replace the mlr command with something like

jq -s -r '(first|keys|@csv), map([.[]]|@csv)[]'

This would pick the keys from the first input object as the CSV headers and would CSV-format all values into one record per object.

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.